This guide contains instructions on how to set up and then monitor the NetDocuments OCR service. It also covers how OCR functions and what to expect when OCRing documents with NetDocuments OCR.
If you have any feedback, questions, or issues, submit a request via our support site.
Many documents in your NetDocuments repository are image documents. These include image formats such as TIFF and JPEG, and also PDF documents whose content is effectively a photograph of the page, e.g., scanned documents.
Such a document contains no text information that a user can search for, only millions of dots of various colors and shades that form an image of the page.
There is no simple way for a person to tell whether a PDF document is text-searchable. The only way is to open the document and try searching for or selecting text.
As a result, a search for documents containing a particular word or phrase will not find these documents. Likewise, a user who wants to review a 1000-page document can open it in a PDF application and select Find to search for text, but no text will be found until the user has waited 10-15 minutes for the document to be OCRd.
Neither of these scenarios allows the NetDocuments user to efficiently find and use their documents, which is one of the many reasons you have implemented NetDocuments in your organization.
What NetDocuments OCR Does
NetDocuments OCR will:
- Using OCR technology, create and apply a text layer to non-text-searchable PDF documents
- Perform character recognition using the Tesseract OCR engine
- Convert image documents (BMP, JPEG, PNG, and TIFF) to text-searchable PDF documents that retain all their original image content
- Analyze MS Outlook emails (MSG) whose attachments are non-text-searchable PDF or image documents, and convert those attachments to searchable PDF documents. Emails that are themselves attached to an email, and their attachments, are analyzed and processed in the same way
- Process your entire existing NetDocuments repository as well as instantly check and process any new documents you save
- Ensure that any annotations in PDF documents such as comments, handwriting, notes, and stamps remain as annotations for future editing
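The list above can be read as a simple routing rule by file type. The following is a minimal sketch of that rule, not NetDocuments' actual implementation: the function name, the action labels, and the extra extensions (.jpg, .tif) are assumptions made for illustration, and whether a PDF already has a text layer is passed in as a flag rather than detected.

```python
from pathlib import PurePath

# Image formats named in the list above; ".jpg" and ".tif" are assumed aliases.
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def planned_action(filename: str, pdf_has_text_layer: bool = False) -> str:
    """Return a hypothetical action label for a document, based on its type."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        # Image documents become text-searchable PDFs that keep the image content.
        return "convert_to_searchable_pdf"
    if ext == ".pdf":
        # Only PDFs without a text layer need one created and applied.
        return "skip" if pdf_has_text_layer else "add_text_layer"
    if ext == ".msg":
        # Emails are analyzed for PDF or image attachments (and nested emails).
        return "inspect_attachments"
    return "skip"
```

For example, `planned_action("scan.TIFF")` returns `"convert_to_searchable_pdf"`, while a PDF that already contains text is skipped.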
NetDocuments OCR Two Modes
NetDocuments OCR has two modes for processing documents. When you sign up for the NetDocuments OCR product via your NetDocuments account manager, it will be configured with:
- Only the Active Monitoring service, or
- Both the Backlog Processing service and the Active Monitoring service
Backlog Processing Service
The Backlog Processing service is designed to crawl through your existing NetDocuments repository, searching for documents saved over many years. It checks all documents that could potentially be processed and separates those that meet the processing requirements from those that do not require processing.
The documents that require OCRing (and optionally, compressing) are processed, and once each document is successfully processed, it is then saved back into NetDocuments.
At this point, the NetDocuments indexing engine will automatically index that document so text in the document can be searched using the full-text search features of NetDocuments.
The document will either be stored as a New Version or, as a future enhancement, replace the existing document version.
When is Backlog Required?
If you have been a NetDocuments customer for:
- 6 months or less, you must purchase the Backlog Processing service with NetDocuments OCR
- More than 6 months, you have the choice to purchase or not purchase the Backlog Processing service
The only caveat to this rule is if you purchased the NetDocuments Ingestion Service from NetDocuments and your documents have already been ingested/added to your NetDocuments repository PRIOR to switching on NetDocuments OCR. In this case, you have the choice to purchase or not purchase the Backlog Processing service.
However, note that if you do not purchase the Backlog Processing service, then none of your existing documents stored in NetDocuments at the moment you switch on the NetDocuments OCR process will be OCRd or compressed.
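The purchase rule above can be written out as a small decision function. This is only an illustrative sketch: the function and parameter names are hypothetical, and it encodes nothing beyond the 6-month rule and the Ingestion Service exception stated in this section.

```python
def backlog_purchase_required(months_as_customer: float,
                              ingested_before_ocr: bool = False) -> bool:
    """Return True if the Backlog Processing service must be purchased.

    ingested_before_ocr: True if the NetDocuments Ingestion Service was
    purchased and documents were ingested PRIOR to switching on OCR.
    """
    if ingested_before_ocr:
        # The stated exception: the purchase becomes optional.
        return False
    # 6 months or less: required. More than 6 months: optional.
    return months_as_customer <= 6
```

Note that a result of `False` means the purchase is optional, not that it is unnecessary: without the Backlog Processing service, documents already stored when OCR is switched on are not OCRd or compressed.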
Active Monitoring Service
The Active Monitoring service is designed to watch for any newly saved documents in NetDocuments. Every minute, this service checks for documents that have been newly saved into NetDocuments, assesses whether they need processing, and then OCRs and, optionally, compresses them as required.
All customers who sign up for the NetDocuments OCR process will have at a minimum this Active Monitoring service. The same assessment, processing and saving steps are performed as in the Backlog Processing service.
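Conceptually, Active Monitoring behaves like a polling loop that runs once a minute. The sketch below illustrates that pattern only. It does not call any NetDocuments API: `fetch_new_documents`, `needs_ocr`, and `process` are hypothetical callables supplied by the caller, and the real service's internals are not described in this guide.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_SECONDS = 60  # "every minute", per the description above


def monitor(fetch_new_documents: Callable[[], Iterable[str]],
            needs_ocr: Callable[[str], bool],
            process: Callable[[str], None],
            cycles: int,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll for newly saved documents and process those that need OCR.

    Returns the number of documents processed across all cycles.
    """
    processed = 0
    for cycle in range(cycles):
        for doc in fetch_new_documents():
            if needs_ocr(doc):      # assess
                process(doc)        # OCR (and optionally compress), then save
                processed += 1
        if cycle < cycles - 1:
            sleep(POLL_INTERVAL_SECONDS)  # wait before the next check
    return processed
```

Passing `sleep` as a parameter is a design choice that lets the loop be exercised in tests without actually waiting a minute between cycles.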
An Administrative dashboard including details regarding documents processed, documents in queue, and documents in your backlog will be available in the future.
Enabling Your NetDocuments OCR Service
After you purchase NetDocuments OCR, you will see a section in the NetDocuments Admin console titled OCR Dashboard. To enable OCR for your NetDocuments Repository, you will need to configure your OCR from this section. Select this option to continue to OCR configuration.
Configure Your OCR Service
Start Using OCR
Once on the OCR Setup page, select Configure OCR.
Turn on the toggle for each cabinet you want OCR applied to, and then select Next.
In the OCR Settings dialog, turn on the toggle for each file format you want to include in the search for files and email attachments.
In the Save Settings section, select how you want the text-searchable document added back to NetDocuments for the cabinets you have turned on:
- Select the Save as new version option to create a new version of the document in NetDocuments.
- Select the Overwrite current version option to replace the current version with the new one.
NOTE: Overwriting the current version will save the original document as a document attachment to the OCR version.
Select the Retain 'Locked' status checkbox if you use the Locked status indicator on documents in NetDocuments and want that status retained on the new versions saved by NetDocuments OCR.
In the Backlog Scope section, select how far back you want to OCR your documents.
IMPORTANT: Once you select the Update button, you cannot make any changes to the Backlog Scope.
Select the Update button.
In the upper-right corner of the screen, select the Settings icon to change the File Formats and Save Settings.
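To summarize which settings remain editable after setup, the configuration above can be modeled as a small data structure. This is a hypothetical model for illustration only; the class, field, and method names are assumptions and do not correspond to any NetDocuments API. The Backlog Scope is held as a free-form string because this guide does not list its possible values.

```python
from dataclasses import dataclass, field
from enum import Enum


class SaveMode(Enum):
    NEW_VERSION = "Save as new version"
    OVERWRITE = "Overwrite current version"  # original kept as an attachment


@dataclass
class OcrSettings:
    cabinets: set = field(default_factory=set)       # cabinets toggled on
    file_formats: set = field(default_factory=set)   # formats toggled on
    save_mode: SaveMode = SaveMode.NEW_VERSION
    retain_locked: bool = False
    backlog_scope: str = ""                          # placeholder value
    _updated: bool = False

    def update(self) -> None:
        """Model selecting the Update button."""
        self._updated = True

    def change_backlog_scope(self, scope: str) -> None:
        """Backlog Scope can only be set before Update is selected."""
        if self._updated:
            raise ValueError("Backlog Scope cannot be changed after Update")
        self.backlog_scope = scope
```

In this model, `file_formats` and `save_mode` can still be reassigned after `update()`, mirroring the Settings icon, while `change_backlog_scope` raises an error.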
NetDocuments OCR FAQ
Please see our FAQ article.