Evernote’s ability to search for text within images is a popular feature. In this post, I’ll describe how the process works and answer some frequently-asked questions.
How images are processed
When a note is sent to Evernote (via synchronization), any Resources included in the note that match the MIME types for PNG, JPG or GIF are sent to a separate set of servers whose sole job is to perform Optical Character Recognition (OCR) on the supplied image and report back with whatever they find. These results are added to the note in the form of a hidden metadata attribute (that is, one not visible when viewing the note) called recoIndex. The full recoIndex node is visible when a note is exported as an ENEX file.
For example, I dug around and found an old note in my account containing only a single photo of a bottle of beer.
When I export this note as an ENEX file (a portable XML export format for Evernote notes) and jump to the bottom of the file, I find the recoIndex element. Contained within recoIndex are a number of item nodes. Each item represents a rectangle that Evernote’s OCR system believes to contain text. Each item contains four attributes: x and y, indicating the coordinates of the top-left corner of the area represented by the item, as well as w and h, representing the width and height of the item.
As an image is evaluated for textual content, a set of possible matches is created as child elements of the corresponding item. Each match is assigned a weight (represented by the w attribute of the match element, which is distinct from the width stored in the item’s own w attribute): a numeric value indicating the likelihood that the given match text is the same as the text in the image.
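To make the structure concrete, here is a minimal Python sketch that reads a recoIndex fragment into rectangles with weighted candidates. The sample XML is invented for illustration (the coordinates, weights and text are not taken from a real note), and the function name parse_reco_index is hypothetical. It assumes the layout described above: item elements carrying x, y, w and h, each holding t children whose w attribute is the weight.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: values are made up, not copied from a real export.
SAMPLE_RECO_INDEX = """
<recoIndex>
  <item x="120" y="340" w="410" h="95">
    <t w="87">PALE ALE</t>
    <t w="42">PALE ALF</t>
    <t w="31">PAIE ALE</t>
  </item>
  <item x="150" y="460" w="220" h="60">
    <t w="64">BREWING</t>
  </item>
</recoIndex>
"""


def parse_reco_index(xml_text):
    """Return one dict per item: its rectangle plus (weight, text) candidates."""
    root = ET.fromstring(xml_text)
    regions = []
    for item in root.iter("item"):
        # On a t element, w is the match weight (on an item, w is the width).
        candidates = [(int(t.get("w")), t.text or "") for t in item.findall("t")]
        regions.append({
            "x": int(item.get("x")),
            "y": int(item.get("y")),
            "width": int(item.get("w")),
            "height": int(item.get("h")),
            # Highest weight first.
            "candidates": sorted(candidates, reverse=True),
        })
    return regions


regions = parse_reco_index(SAMPLE_RECO_INDEX)
for region in regions:
    weight, text = region["candidates"][0]
    print(region["x"], region["y"], region["width"], region["height"], text, weight)
```

Sorting by weight puts the most likely reading of each rectangle first, while the less likely candidates stay available.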
The OCR results are embedded in the note, which is subsequently synchronized back to the user’s client applications. At this point, the text found in the image is available for search.
In the recoIndex element of the note shown earlier, most of the item elements contain multiple t (match) elements, and each is assigned the weight value described earlier. When a user issues a search within an Evernote client, the content of the t elements is searched.
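As a rough illustration of why keeping several candidates per rectangle helps search, the sketch below runs a query against every t element. This is not how Evernote’s clients implement search. The case-insensitive substring test, the min_weight parameter, the function name find_matches and the sample data are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample data; real recoIndex content will differ.
SAMPLE = """
<recoIndex>
  <item x="40" y="25" w="300" h="80">
    <t w="79">Meeting notes</t>
    <t w="55">Meeting motes</t>
  </item>
  <item x="40" y="130" w="180" h="50">
    <t w="70">Agenda</t>
  </item>
</recoIndex>
"""


def find_matches(xml_text, query, min_weight=0):
    """Return (x, y, w, h) for each rectangle with a candidate containing query."""
    needle = query.casefold()
    hits = []
    for item in ET.fromstring(xml_text).iter("item"):
        for t in item.findall("t"):
            text = (t.text or "").casefold()
            if int(t.get("w")) >= min_weight and needle in text:
                hits.append(tuple(int(item.get(k)) for k in ("x", "y", "w", "h")))
                break  # one hit per rectangle is enough
    return hits


print(find_matches(SAMPLE, "notes"))
print(find_matches(SAMPLE, "motes"))
print(find_matches(SAMPLE, "motes", min_weight=60))
```

Both "notes" and the less likely reading "motes" locate the same rectangle. Raising min_weight drops the weaker candidate, trading recall for precision.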
How PDFs are processed
Evernote’s OCR system can also process PDF files, but they’re handled differently from images. When a PDF is processed, a second PDF document that contains the recognized text is created and embedded in the note containing the original PDF. This second PDF is not visible to the user and exists only to facilitate search. It also doesn’t count against the user’s monthly upload allowance.
For a PDF to be eligible for OCR, it must meet certain requirements:
- It must contain a bitmap image
- It must not contain selectable text (or, at least, a minimal amount)
In practical terms, this eliminates many PDFs generated by other applications from text-based formats, such as word processors and other authoring applications. PDFs that are generated by hardware scanners generally meet the above requirements. If the scanner software performs its own OCR on the PDF, it won’t be processed by Evernote’s OCR service.
If you export a note containing a PDF that has been processed by the OCR system, there will be two nodes in the document: the data node, which contains a Base64-encoded version of the original PDF, and the alternative-data node, which holds the searchable version of the same PDF.
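The following sketch shows one way to pull both PDFs out of an exported resource. It builds a tiny stand-in resource rather than reading a real ENEX file. The element names are taken from the description above and passed in as parameters, so they can be adjusted if an actual export names them differently. The function name extract_pdfs, the wrapping resource element and the assumption that both nodes are Base64-encoded are illustrative.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in payloads; a real export would hold complete PDF files.
original = b"%PDF-1.4 original scan"
searchable = b"%PDF-1.4 searchable copy"

SAMPLE_RESOURCE = (
    "<resource>"
    f"<data>{base64.b64encode(original).decode('ascii')}</data>"
    f"<alternative-data>{base64.b64encode(searchable).decode('ascii')}</alternative-data>"
    "</resource>"
)


def extract_pdfs(resource_xml, original_tag="data", searchable_tag="alternative-data"):
    """Return (original_pdf, searchable_pdf) as bytes; None where a node is absent."""
    root = ET.fromstring(resource_xml)

    def decode(tag):
        node = root.find(tag)
        if node is None or not node.text:
            return None
        # b64decode skips embedded newlines, which long encoded bodies often contain.
        return base64.b64decode(node.text)

    return decode(original_tag), decode(searchable_tag)


original_pdf, searchable_pdf = extract_pdfs(SAMPLE_RESOURCE)
print(len(original_pdf), len(searchable_pdf))
```

A PDF that was never processed by OCR would simply yield None for the second value.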
What kind of text can be recognized?
Anything that the OCR system believes to be text. Typewritten text (e.g., street signs or posters) and handwritten notes (even if your handwriting isn’t the neatest in the world) are both evaluated by the OCR service, provided the service can detect them.
The orientation of the text is a factor, as well. Text found within an image will be evaluated if it matches one of the following orientations within a few degrees:
- 0° — normal horizontal orientation
- 90° — vertical orientation
- 270° — vertical orientation
Text that does not match one of these orientations will be ignored (including diagonal and inverted text).
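The orientation rule can be expressed as a small check. This is only a sketch: the post says "within a few degrees" without giving a number, so the 5° default tolerance is an assumption, as is the function name.

```python
ACCEPTED_ORIENTATIONS = (0, 90, 270)  # degrees, as listed above


def is_recognizable_orientation(angle_degrees, tolerance=5.0):
    """True if the angle is within `tolerance` degrees of an accepted orientation.

    The tolerance value is an assumption; the actual threshold is not published.
    """
    angle = angle_degrees % 360
    for target in ACCEPTED_ORIENTATIONS:
        diff = abs(angle - target)
        diff = min(diff, 360 - diff)  # circular distance, so 358 counts as near 0
        if diff <= tolerance:
            return True
    return False


for angle in (0, 3, 92, 180, 45, 268, 358):
    print(angle, is_recognizable_orientation(angle))
```

Inverted text (180°) and diagonal text (45°) fall outside every accepted range, matching the behavior described above.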
It’s important to remember that no OCR system is perfect and it’s possible that text you expect to be recognized may not be. That said, the OCR engine is being constantly refined and tuned for better accuracy.
Can Evernote’s OCR be used to create a text version of an image that contains text?
No. As described before, the matching done by the OCR system doesn’t produce one-to-one matches. Rather, there will usually be several potential matches for a given rectangle containing text and many of them will be inexact.
How long does it take for an image to be processed by OCR?
When a user syncs a note containing an image, the image is sent to the aforementioned group of servers for OCR processing. The system is queue-based, meaning the submitted image takes its place in line and will be processed after all other images ahead of it in the queue. Images synced by Premium users, however, are moved to the front of the queue ahead of all images synced by free users.
How long processing takes depends on the size of the queue when the image is submitted. For Premium users, image processing generally completes within a few minutes, though it can take longer in some instances. For free users, the wait can be substantially longer if there is a large number of images in the processing queue.
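The queueing behavior described above can be modeled with a two-level priority queue. This is a toy model for intuition, not Evernote’s implementation. The class and method names are hypothetical, and it assumes Premium images are processed in arrival order among themselves.

```python
import heapq
import itertools


class OcrQueue:
    """Toy model: Premium images are served before free ones, FIFO within each tier."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, image_id, premium=False):
        priority = 0 if premium else 1  # lower number is served first
        heapq.heappush(self._heap, (priority, next(self._arrival), image_id))

    def next_image(self):
        """Pop the next image to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = OcrQueue()
queue.submit("free-1")
queue.submit("free-2")
queue.submit("premium-1", premium=True)
queue.submit("free-3")
queue.submit("premium-2", premium=True)

order = []
while (image := queue.next_image()) is not None:
    order.append(image)
print(order)
```

In this model, a free user’s wait grows with both the number of free images already queued and the number of Premium images that arrive before the free image reaches the front.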
How many languages does Evernote’s OCR system support?
Currently, Evernote’s OCR system can index 28 typewritten languages and 11 handwritten languages. New languages are added regularly and existing languages are optimized and improved. Users can control which language is used when indexing their data by changing the Recognition Language setting in their account’s Personal Settings.
Can I use the Evernote API only to OCR images?
No. Using Evernote’s API only for the OCR capabilities is a violation of the API License Agreement.
Where can I learn more about the infrastructure that powers Evernote’s OCR system?
We have published two articles to the Evernote Tech Blog that outline the recognition architecture and processes in greater detail:
- Evernote’s Indexing System by Evernote VP of Research and Development, Alex Pashintsev.
- Even Grittier Details on Evernote’s Indexing System by Evernote CTO, Dave Engberg.