For more information about NetDocuments OCR, see NetDocuments OCR Admin Help.
NetDocuments OCR is an Optical Character Recognition (OCR) and image compression technology delivered as a secure cloud-to-cloud service, without requiring any on-premises software installation.
NetDocuments OCR is a background process that continually monitors files that users import manually or that are mass imported into the customer's NetDocuments repository.
NetDocuments OCR converts image-based documents and image files such as TIFF into text-searchable documents, so the built-in NetDocuments full-text search can make those documents accessible to all appropriate users. The output is a searchable PDF, added to the original document as the official version. The original modified date is retained.
What technology does NetDocuments OCR use?
NetDocuments OCR is powered by ContentCrawler, a proven DocsCorp (www.docscorp.com) technology used by many law firms globally. NetDocuments OCR provides a unique and exclusive service to NetDocuments customers, delivering on the NetDocuments Cloud First, Cloud Only mantra. This aligns with the NetDocuments Trusted Cloud Platform Compute Fabric strategy, which encourages the industry to migrate client and server technologies to the cloud and relieves customers of the burdens so often associated with legacy DM technologies: installing and testing software upgrades and resolving compatibility challenges.
What needs to be installed for the customer to use NetDocuments OCR?
No hardware or software installation is required. The background process runs on the DocsCorp Cloud Platform, which is hosted on Microsoft Azure, and connects securely to the NetDocuments Cloud Platform. Read more on Azure compliance. This implementation is a strategic integration in which the development teams of both companies worked together to offer an efficient service.
What are the OCR options available?
There are two services that customers can subscribe to and enable; in most cases, customers need both:
- Backlog/Bulk Upload (initial one-year minimum commitment). Crawls through the entire repository and processes the appropriate documents. This applies to new or existing NetDocuments customers adding OCR to their existing repository, or to a customer whose user count increases by 10 percent or more due to a single event, such as a merger.
- Active Monitoring (main service after the initial year). Automatically processes documents/images as they are added or imported into the NetDocuments repository either manually or in bulk.
What are the supported OCR features?
NetDocuments OCR provides the following features:
- The OCR engine is ABBYY FineReader 11 or later.
- Supports the following file formats: TIFF, JPG, PNG, PDF, and MSG.
- Creates new image-based email attachments with PDFs.
- Intelligent OCR technology ensures document fidelity to the original source.
- OCRd documents are converted to text-searchable PDFs using standard JPEG, JPEG2000, and JBIG2 compression formats.
- OCRd documents can be saved as a new version of the original document.
- Centralized administration dashboard for monitoring and reporting.
- Multi-language recognition of over 180 languages, including Asian and Arabic character sets.
- Error reporting for documents that are corrupt, PDFs with security passwords, and other issues. Administrators can view these documents for correction.
NetDocuments OCR uses the enormous power of the cloud to OCR my documents – why doesn’t it process all my documents almost instantly?
OCRing documents is a complex and processor-intensive function that takes around 1 to 5 seconds per page (depending on image quality, language, and other factors). Each document is OCRd on a single processor core in the Azure cloud infrastructure, so a larger document takes longer. NetDocuments OCR is designed to maximize document throughput (the highest possible number of documents processed in a given time period) rather than to process a single document in the fastest possible time, which would reduce overall efficiency.
The NetDocuments OCR Backlog process searches for and assesses every document you have stored in NetDocuments, and OCRs and re-saves it if necessary. If NetDocuments tried to do this in just a few hours or days, it could degrade the performance of your system while you are opening and saving your current documents. To avoid this impact on NetDocuments system performance, processing of the backlog of older documents is typically spread over a six-month period, although it may complete much sooner if your document volume is smaller.
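To make the per-page figures above concrete, here is a back-of-the-envelope calculation. It is a minimal sketch that assumes a flat 1 to 5 seconds per page and one document per core, as described above; the function names and the 8-core figure are hypothetical, and real timings vary with image quality and language.

```python
def single_core_ocr_seconds(pages: int, min_sec_per_page: float = 1.0,
                            max_sec_per_page: float = 5.0) -> tuple[float, float]:
    """Return the (fastest, slowest) estimated time to OCR one document on one core."""
    return pages * min_sec_per_page, pages * max_sec_per_page


def pages_per_hour(cores: int, sec_per_page: float) -> float:
    """Aggregate pages per hour when each core works on its own document."""
    return cores * 3600 / sec_per_page


fastest, slowest = single_core_ocr_seconds(200)
print(f"200-page document: {fastest / 60:.1f} to {slowest / 60:.1f} minutes")
# Hypothetical pool of 8 cores, each OCRing a different document at 5 s/page.
print(f"8 cores at 5 s/page: {pages_per_hour(8, 5.0):.0f} pages per hour")
```

Because a single document is not split across cores, a 200-page document takes roughly 3 to 17 minutes on its own, even though many smaller documents can be processed in parallel on other cores during that time.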
I am a new 10-user NetDocuments customer with only one year's volume of documents saved in NetDocuments. Why does it take NetDocuments OCR the same amount of time to process my document backlog as it takes a large 1,000-user firm with millions of documents?
NetDocuments OCR carefully allocates its processing power on a per-user basis. For example, a 1,000-user firm can process documents at 100 times the rate of a 10-user firm. This allocation is based on NetDocuments estimates of the average number of documents stored per user across all NetDocuments customers globally, and it ensures that all firms have equal access to the system on a per-user basis. If you have fewer than the global average number of documents per user stored in NetDocuments, your backlog process can complete more quickly.
How long does NetDocuments OCR take to process my document backlog?
NetDocuments OCR takes approximately six months to OCR your entire backlog of documents, regardless of the size of your organization. NetDocuments OCR allocates processing power based on the number of users you are licensed for: the more users you have, the more processing power is allocated to you. A 100-user firm is allocated 10 times the processing power of a 10-user firm. By design, spreading the work of interrogating, assessing, OCRing, and re-saving documents over this timeframe avoids impacting the performance of your day-to-day work.
With its Active Monitoring process, NetDocuments OCR gives new documents priority over backlog documents.
How quickly is a new document OCRd once it is saved to NetDocuments?
NetDocuments OCR searches for new documents once every minute and immediately assesses them to determine whether processing is needed. Many document types that are already searchable, such as Word and Excel documents, need no processing. Documents that require OCRing are prioritized ahead of the Backlog queue; this is the Active Monitoring process. NetDocuments OCR aims to process documents within a few minutes to one hour, but the actual time depends on several factors, including the number of pages in the document, the number of documents you have saved into NetDocuments in a short amount of time, and how many other users in your firm are saving documents at the same time. So, if you manually upload hundreds of very large documents at one time, processing can take a little longer. But prepare to be surprised: NetDocuments OCR can often process even large document volumes quickly.
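The reason every firm sees roughly the same backlog timeframe can be shown with a simplified proportional-share model. This is a minimal sketch, not the actual NetDocuments scheduler: it assumes capacity is split purely by licensed user count, and the shared capacity of 111,000 pages per hour and the 5,000 backlog pages per user are hypothetical figures chosen for illustration.

```python
from fractions import Fraction


def allocate_capacity(licensed_users: dict[str, int],
                      total_pages_per_hour: int) -> dict[str, Fraction]:
    """Split a shared capacity across firms in proportion to licensed users."""
    total_users = sum(licensed_users.values())
    return {
        firm: Fraction(total_pages_per_hour * users, total_users)
        for firm, users in licensed_users.items()
    }


def backlog_hours(backlog_pages: int, pages_per_hour: Fraction) -> Fraction:
    """Time to clear a backlog at a fixed allocated rate."""
    return backlog_pages / pages_per_hour


# Hypothetical figures: 111,000 pages/hour shared by three firms,
# each storing the same (hypothetical) 5,000 backlog pages per user.
users = {"small": 10, "medium": 100, "large": 1000}
rates = allocate_capacity(users, 111_000)
for firm, count in users.items():
    hours = backlog_hours(count * 5_000, rates[firm])
    print(firm, rates[firm], "pages/hour,", hours, "hours")
```

In this model the 1,000-user firm receives 100 times the rate of the 10-user firm, and every firm finishes in the same 50 hours because backlog size and allocated rate both scale with user count. A firm with fewer documents per user than the assumed average finishes sooner, as described above.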
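The ordering described above, where Active Monitoring work is taken before Backlog work and already-searchable files are skipped, behaves like a priority queue. The following is a minimal sketch under stated assumptions: the class, method, and constant names and the list of already-searchable extensions are hypothetical, and the one-minute polling loop is not modeled.

```python
import heapq
import itertools
from typing import Optional

ACTIVE_MONITORING = 0  # new documents: a lower number is served first
BACKLOG = 1            # older documents found by the backlog crawl

# Hypothetical: extensions treated as already searchable (no OCR needed).
ALREADY_SEARCHABLE = {".doc", ".docx", ".xls", ".xlsx"}


class OcrQueue:
    """Priority queue: Active Monitoring work is always taken before Backlog work."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._order = itertools.count()  # keeps FIFO order within a priority

    def assess(self, doc_id: str, extension: str, priority: int) -> bool:
        """Queue a document if it needs OCR; return False if it was skipped."""
        if extension.lower() in ALREADY_SEARCHABLE:
            return False
        heapq.heappush(self._heap, (priority, next(self._order), doc_id))
        return True

    def next_document(self) -> Optional[str]:
        """Pop the highest-priority document, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = OcrQueue()
queue.assess("old-1", ".pdf", BACKLOG)
queue.assess("old-2", ".tif", BACKLOG)
queue.assess("new-1", ".docx", ACTIVE_MONITORING)  # skipped: already searchable
queue.assess("new-2", ".pdf", ACTIVE_MONITORING)
print([queue.next_document() for _ in range(3)])  # ['new-2', 'old-1', 'old-2']
```

The sequence counter breaks ties so that documents with the same priority are processed in the order they were found, which keeps older backlog items from being starved by later backlog items.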
How do I track what documents have been updated by NetDocuments OCR?
NetDocuments OCR makes minimal changes to a document, including keeping the original date created and date modified when it saves the OCRd version as a new version. However, there are a few simple things you can do to determine whether a document has been processed.
- Open the PDF and search for a word; it should now be found in the document.
- If you check the document properties in the PDF application, you will see that the PDF producer is contentCrawler Cloud.
The NetDocuments OCR administrator portal provides information on the number of documents processed. A report is also available in NetDocuments that provides the Doc IDs of the affected documents. In NetDocuments, go to Admin and select Request Activity Logs, then export a date-based report to XML. Load this report into Excel and filter it for the Save as new version activity and for the user account used by NetDocuments OCR. The result shows the documents processed and saved by NetDocuments OCR in that timeframe. For more detailed instructions, see the NetDocuments OCR Administration Guide.
How do I track documents that failed to OCR?
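If you prefer a script to Excel, the same filter can be applied to the exported XML. This is a minimal sketch only: the element names (`activity`, `docId`, `user`, `action`), the service account name `ndocr.service`, and the sample Doc IDs are hypothetical, so check the structure of your actual export and adjust the names before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real element names in the exported log may differ.
SAMPLE_LOG = """
<activities>
  <activity><docId>DOC-0001</docId><user>ndocr.service</user><action>Save as new version</action></activity>
  <activity><docId>DOC-0002</docId><user>jsmith</user><action>Save as new version</action></activity>
  <activity><docId>DOC-0003</docId><user>ndocr.service</user><action>View</action></activity>
</activities>
"""


def docs_saved_by_ocr(xml_text: str, service_account: str,
                      action: str = "Save as new version") -> list[str]:
    """Return Doc IDs that the OCR service account saved as a new version."""
    root = ET.fromstring(xml_text.strip())
    return [
        entry.findtext("docId")
        for entry in root.iter("activity")
        if entry.findtext("action") == action
        and entry.findtext("user") == service_account
    ]


print(docs_saved_by_ocr(SAMPLE_LOG, "ndocr.service"))  # ['DOC-0001']
```

Filtering on both the activity and the service account mirrors the two Excel filters above; saves by ordinary users and non-save activity by the service account are excluded.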
Each week, NetDocuments OCR sends a digest report to the administrator email address you specify in the NetDocuments OCR wizard. The report lists the Doc IDs and version numbers of documents that failed to OCR, along with an indication of the reason for each failure. It also lists documents that were assessed for processing but did not require OCRing.
What could cause OCR to fail on certain documents?
Although the vast majority of documents process correctly, you will most likely find some documents that were not processed. No harm is done to these documents; the original document remains in NetDocuments without alteration. Reasons a document might fail to process include:
- A document is password protected.
- PDF document has a digital certificate (modifying this document would invalidate the certificate).
- Document contents do not match the specified document type (extension is PDF but it’s not a PDF).
- A document is unreadable or corrupted.
- A document is in use or checked out by another user.
What happens if a document is in use (currently checked out)?
NetDocuments OCR automatically assesses a document when it is saved into NetDocuments to determine whether it requires OCRing. In some cases, immediately after a document is saved into NetDocuments, a user checks out the document for further editing. If NetDocuments OCR detects that a document is checked out, it does not process the document at that time.
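One of the failure reasons above, a file whose contents do not match its extension, can be detected by checking the file's leading signature ("magic") bytes. The following is a minimal sketch of that idea, not how NetDocuments OCR actually performs the check; the function name is hypothetical. It checks only the very start of the file, and because some PDF readers tolerate a few bytes before the `%PDF-` header, a production check might be more lenient.

```python
import os

# Well-known leading signature bytes for some formats NetDocuments OCR handles.
MAGIC_BYTES = {
    ".pdf": (b"%PDF-",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".tif": (b"II*\x00", b"MM\x00*"),   # little- and big-endian TIFF
    ".tiff": (b"II*\x00", b"MM\x00*"),
    ".msg": (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",),  # OLE2 compound file
}


def content_matches_extension(filename: str, head: bytes) -> bool:
    """Return False when the file's first bytes contradict its extension."""
    ext = os.path.splitext(filename)[1].lower()
    signatures = MAGIC_BYTES.get(ext)
    if signatures is None:
        return True  # no known signature for this extension: don't flag it
    return head.startswith(signatures)


print(content_matches_extension("contract.pdf", b"%PDF-1.7\n"))   # True
print(content_matches_extension("contract.pdf", b"PK\x03\x04"))   # False: ZIP data
```

In practice you would read only the first few bytes of each file (for example, `open(path, "rb").read(16)`) and pass them as `head`.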
What happens if I make changes to a document while it is being OCRd?
Occasionally, a document is being OCRd while a user has it checked out for editing. This should not cause any issues for users: NetDocuments OCR never prevents a document from being edited. Once NetDocuments OCR has processed a document and is ready to save it back into NetDocuments, it first checks that no user has updated the document since the copy was obtained for OCRing. If the document has been modified, that OCR task is abandoned and the document is requeued to be OCRd again. If the document in NetDocuments has not changed but is still checked out, NetDocuments OCR retries saving the document, leaving increasingly long intervals between attempts. If the document is still checked out after 10 days, the attempts end and the document is flagged as unable to OCR because it is checked out.
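The save-back behavior described above combines an optimistic concurrency check with retries at increasing intervals. The sketch below models it under stated assumptions: the `DocState` version counter, the 5-minute first wait, and the doubling factor are hypothetical; only the 10-day limit and the three outcomes (save, requeue, flag as checked out) come from the text.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable

RETRY_WINDOW = timedelta(days=10)  # from the text: give up after 10 days


@dataclass
class DocState:
    version: int        # hypothetical: increments whenever a user saves
    checked_out: bool


def backoff_waits(first: timedelta = timedelta(minutes=5), factor: int = 2,
                  window: timedelta = RETRY_WINDOW) -> list[timedelta]:
    """Increasing waits whose running total stays within the retry window."""
    waits, total, wait = [], timedelta(0), first
    while total + wait <= window:
        waits.append(wait)
        total += wait
        wait *= factor
    return waits


def save_ocr_result(copied_version: int, get_state: Callable[[], DocState],
                    waits: list[timedelta]) -> str:
    """Optimistic save: abandon if the document changed, retry while checked out."""
    for _ in range(len(waits) + 1):       # one immediate try, then one per wait
        state = get_state()
        if state.version != copied_version:
            return "requeued"             # a user saved changes: OCR again later
        if not state.checked_out:
            return "saved"
        # Still checked out: a real service would wait before the next attempt.
        # The wait is omitted here so the sketch runs instantly.
    return "flagged: checked out"


waits = backoff_waits()
print(save_ocr_result(3, lambda: DocState(3, False), waits))  # saved
print(save_ocr_result(3, lambda: DocState(4, True), waits))   # requeued
print(save_ocr_result(3, lambda: DocState(3, True), waits))   # flagged: checked out
```

Checking the version before checking the checkout state matters: a user's newer edits always win, and the OCR result is discarded rather than overwriting them.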
I have annotations in my PDF – are they retained?
It is quite common to annotate a PDF with comments, highlighting, freehand drawings, and so on. NetDocuments OCR has the unique ability to OCR these documents while ensuring that all annotations remain as they were, in fully editable format, so you can continue editing and adding further annotations.
My documents contain lots of graphics and pictures without text – what happens if OCRing finds no text – do I lose my graphics?
OCRing does not affect the graphics contained in your document in any way. NetDocuments OCR attempts to find any words in graphics and overlays an invisible layer of text in the same location as the graphic version of that text. The original graphics remain completely unchanged and fully visible to the user. So, if your page or document contains only photographs with no characters in them, no OCRd text is added, and no changes are made to that page or document.
Why am I required to configure NetDocuments OCR with a user account different from existing admin accounts?
NetDocuments requires a separate account, distinct from any other user's, so that it remains available even after an individual user leaves the organization. A dedicated account also makes it easy to track which documents NetDocuments OCR reviews and processes, which is useful for reporting and audit purposes.
You also use the NetDocuments OCR Service Account to access the NetDocuments OCR dashboard.
What is the price for NetDocuments OCR?
Please contact your NetDocuments sales executive for pricing and a simple sign up process.
Is NetDocuments OCR available through all sales channels including partners?
Yes. It is a standard NetDocuments add-in and is available now in the following regions:
- US (Vault)
- AU (Asia Pacific region)
The EU data centre is coming soon.
How does an existing customer of ContentCrawler migrate to NetDocuments OCR?
Contact your DocsCorp sales representative to transition your license from DocsCorp to NetDocuments. They will coordinate with the NetDocuments personnel for a smooth transition.
Who do I contact for more information?
Please see here for more information.