In our blog post “Evernote’s Three Laws of Data Protection”, Phil touches on some of the measures we take to protect your data and our goal of being a trusted place for it. There is much more we do, so I wanted to talk a bit about an important aspect: what happens when hard drives fail?
You have probably read stories of people buying previously owned computers and finding they contain all sorts of information from the previous owner, sometimes including customer data. This is why, at Evernote, we take the decommissioning of our drives very seriously.
At Evernote we use both hard drives and solid state drives (SSDs) as part of the infrastructure that stores user data. Hard drives are mechanical in nature and thus, as with all things containing moving parts, they will eventually fail. SSDs have a different failure mode: data can be written to the memory banks in each SSD only a finite number of times, after which the drive becomes read only. We use hardware RAID controllers and replication to provide redundancy for all our storage devices. This means that when a disk fails your data is safe. Furthermore, we take proactive steps to identify drives that may be failing by monitoring the media errors and predictive failure statistics the drives provide. If a hard drive reaches a threshold (number of media errors or predictive failure counter), we will replace the drive before it fails (usually on the same or next business day). Sometimes drives simply fail without warning; it is our policy to replace these drives as soon as possible.
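To make the replacement rule concrete, here is a minimal sketch of that kind of threshold check in Python. It is not our actual monitoring code: the `DriveStats` type, the `needs_replacement` function, and both threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; real values depend on
# the drive model and the monitoring tooling in use.
MEDIA_ERROR_THRESHOLD = 10
PREDICTIVE_FAILURE_THRESHOLD = 1


@dataclass
class DriveStats:
    """Counters a drive or RAID controller might report (assumed shape)."""
    serial: str
    media_errors: int
    predictive_failures: int
    online: bool = True


def needs_replacement(
    stats: DriveStats,
    media_error_threshold: int = MEDIA_ERROR_THRESHOLD,
    predictive_failure_threshold: int = PREDICTIVE_FAILURE_THRESHOLD,
) -> bool:
    """Return True if the drive should be pulled and replaced."""
    # A drive that has already failed outright is always replaced.
    if not stats.online:
        return True
    # Otherwise replace proactively once either counter reaches its threshold.
    return (
        stats.media_errors >= media_error_threshold
        or stats.predictive_failures >= predictive_failure_threshold
    )
```

A healthy drive with both counters at zero is left in place, while a drive that reaches either threshold, or that has dropped offline, is flagged for replacement.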
The end result of all this is a supply of broken drives which may contain user data. The ATA instruction set has a Secure Erase function which will write over every track on the drive, thereby making it nearly impossible to recover the data. This is great, but it requires a working drive. In our case, most of our failed drives are non-functional, so we cannot make use of this feature.
Drives are expensive and generally include a good warranty (three years or more). Manufacturers usually require that the customer return the failed hard drives in order to receive a replacement as part of the warranty program. Since our failed drives may contain user data and we cannot use the secure erase tools, we can’t send a drive in for repair/replacement and risk the data, no matter how damaged the drive may be. Thankfully most drive manufacturers offer a “Black Hole” replacement program for this very case. The specifics vary across manufacturers, but it generally requires that the customer send in the faceplate of the drive and some form or written statement attesting to the destruction of the drive.
Our overall approach to handling failed drives is to destroy them. However, we take a “belt & suspenders” approach to this process.
The National Institute of Standards and Technology (NIST) is a US government agency whose mission includes developing and publishing guidelines for other US agencies regarding technology issues. These guidelines are available online (free of charge) and are generally accepted as industry standards. The NIST publication 800-88 (“Guidelines for Media Sanitization”) covers both physical and electronic records. Our approach is based on this publication.
We handle failed drives as follows:
- Failed drives are kept in a secure location pending destruction.
- The faceplate of each drive is removed (Figure #1, #2, #3); this requires a few different Torx bits (the iFixit kits are handy for this).
- The drive is then wiped using a degausser (Garner Products HD-2). Degaussing basically means “to de-magnetize”; this is intended to blank the drive (Figure #4).
- The drive is then physically destroyed using a device which bends/breaks the drive by driving a wedge into the drive (Garner Products PD-4, Figure #5 & #6). This renders the drive inoperable (Figure #7 & #8).
- The broken drive parts are then sent off for recycling.
- The drive faceplates are then sent to the respective manufacturers and the replacement drives are put into the spares pool (Figure #9).
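The steps above form a strict sequence: a drive should never reach the recycler without having been degaussed and broken first. As a rough sketch of how such an ordering could be tracked and enforced in software, assuming a hypothetical `FailedDrive` record and `Step` enum that are not part of any real tooling described here:

```python
from enum import Enum


class Step(Enum):
    """Handling steps in the order listed above (names are illustrative)."""
    SECURE_STORAGE = 1
    FACEPLATE_REMOVED = 2
    DEGAUSSED = 3
    PHYSICALLY_DESTROYED = 4
    RECYCLED = 5
    FACEPLATE_RETURNED = 6


class FailedDrive:
    """Tracks one failed drive and refuses out-of-order steps."""

    def __init__(self, serial: str) -> None:
        self.serial = serial
        # Every failed drive starts out in secure storage.
        self.completed = [Step.SECURE_STORAGE]

    def advance(self, step: Step) -> None:
        """Record the next step; raise ValueError if a step is skipped."""
        last = self.completed[-1]
        if last is Step.FACEPLATE_RETURNED:
            raise ValueError(f"{self.serial}: handling already complete")
        expected = Step(last.value + 1)
        if step is not expected:
            raise ValueError(
                f"{self.serial}: expected {expected.name}, got {step.name}"
            )
        self.completed.append(step)

    @property
    def data_destroyed(self) -> bool:
        """True once the drive has been both degaussed and physically broken."""
        return Step.PHYSICALLY_DESTROYED in self.completed
```

With this sketch, calling `advance(Step.RECYCLED)` on a drive that has not yet been degaussed and destroyed raises a `ValueError` instead of silently recording the step. Treating recycling and the faceplate return as strictly sequential is a simplification made for the example.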
The goal of this is to make sure that no user data is EVER at risk due to unsafe handling of the drives. This approach is in line with the guidelines from NIST and other standards bodies, which means these are tried and true methods.