Find information for NetDocuments OCR at NetDocuments OCR Admin Help.
NetDocuments OCR is an Optical Character Recognition (OCR) and image compression technology delivered as a secure cloud-to-cloud service, with no on-premises software to install.
NetDocuments OCR is a background process that continually monitors files imported into the customer’s NetDocuments repository, whether manually by users or through mass import.
NetDocuments OCR converts image-based documents and image files, such as TIFF, BMP, PNG, and JPG, into text-searchable documents so that the built-in NetDocuments full-text search can make them accessible to all appropriate users. It also converts PDF documents whose content is effectively a photograph of the page, such as scanned documents. The output is a searchable PDF that is added to the original document as its official version. The original modified date is maintained.
1. What technology does NetDocuments OCR use?
Since August 1, 2021, NetDocuments has provided optical character recognition (“OCR”) services directly for newly added documents and no longer uses DocsCorp Pty Ltd. as a subprocessor for its ndOCR Services.
NOTE: As an existing customer, there is no change to your current billing cycle or license fees. Customers continue to enjoy fast, high-quality background OCR processing of content.
2. What needs to be installed for the customer to use NetDocuments OCR?
There is no hardware or software installation required. The NetDocuments OCR service functions within NetDocuments’ controlled servers and infrastructure and employs all technical, administrative, and security safeguards in place for the rest of our Service.
3. What are the OCR options available?
There are two services that customers can subscribe to and enable; in most cases, customers need both:
- Backlog/Bulk Upload. Crawls through the entire repository and processes the appropriate documents. This applies to new or existing NetDocuments customers who are adding OCR to an existing repository, and to customers whose user count increases by 10 percent or more due to a single event, such as a merger.
- Active Monitoring. Automatically processes documents/images as they are added or imported into the NetDocuments repository either manually or in bulk.
4. What are the supported OCR features?
NetDocuments OCR provides the following features:
- The OCR engine is ABBYY FineReader 11+.
- Supports the following file formats: TIFF, BMP, PNG, JPG, PDF, EML, and MSG.
- Replaces image-based email attachments with PDFs.
- Intelligent OCR technology ensures document fidelity to the source.
- OCRd documents are converted to text-searchable PDFs and compressed using standard JPEG, JPEG2000, and JBIG2 formats.
- OCRd documents can be saved as a new version of the original document or can overwrite the current version.
- Centralized administration dashboard for monitoring and reporting.
- Multi-language recognition of over 180 languages, including Asian and Arabic character sets.
- Error reporting for corrupt documents, PDFs with security passwords, and other issues. Administrators can view these documents for correction.
5. Why doesn't NetDocuments OCR process all my documents almost instantly?
OCRing and compressing documents is a complex, processor-intensive task that takes around 1 to 5 seconds per page (depending on image quality, language, and other factors). NetDocuments OCR is designed to maximize overall document throughput (the highest possible number of documents processed in a given time period) rather than to process any single document in the fastest possible time, which would reduce overall efficiency.
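To make the per-page figure concrete, here is a minimal sketch that estimates the raw OCR time for a document from its page count, using the 1 to 5 seconds per page range quoted above. The function name and the treatment of the range as simple lower and upper bounds are illustrative assumptions; queueing and prioritization are not modelled.

```python
# Illustrative only: bounds taken from the "1 to 5 seconds per page" figure above.
MIN_SECONDS_PER_PAGE = 1
MAX_SECONDS_PER_PAGE = 5


def estimate_ocr_seconds(pages: int) -> tuple[int, int]:
    """Return (best_case, worst_case) raw OCR time in seconds for a document."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * MIN_SECONDS_PER_PAGE, pages * MAX_SECONDS_PER_PAGE


best, worst = estimate_ocr_seconds(200)
print(f"200 pages: {best} to {worst} seconds")  # 200 pages: 200 to 1000 seconds
```

By this estimate, a 200-page scan needs between roughly 3 and 17 minutes of OCR time on its own, before any time spent waiting in a queue.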
The NetDocuments OCR Backlog process will search for, assess, OCR/compress, and, if necessary, save every document you have stored in NetDocuments. If NetDocuments were to try to do this in just a few hours or days, it could degrade the performance of your system while you are trying to open and save your current documents. To avoid this impact on NetDocuments system performance, the processing of a backlog of older documents is spread out over time. See the processing document backlog.
6. I am a new 10-user NetDocuments customer with one year’s volume of saved documents – why will it take NetDocuments OCR the same amount of time to process my document backlog as it will take for a large 1,000-user firm with millions of documents?
NetDocuments OCR allocates its processing power on a per-user basis. A 1,000-user firm can therefore process documents at 100 times the rate of a 10-user firm. The allocation is based on NetDocuments estimates of the average number of documents stored per user across all NetDocuments users globally. This ensures that, per user, all firms have equal access to the system. If you have fewer than the global average number of documents per user stored in NetDocuments, your backlog process can complete more quickly than others.
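The per-user allocation can be illustrated with simple arithmetic. The sketch below assumes throughput scales linearly with user count; the rate of 50 documents per user per day and the function name are hypothetical values chosen for illustration, not published NetDocuments figures.

```python
# Hypothetical figure for illustration; the real per-user rate is not stated here.
DOCS_PER_USER_PER_DAY = 50


def backlog_days(users: int, backlog_documents: int) -> float:
    """Days to clear a backlog when throughput scales linearly with user count."""
    if users <= 0:
        raise ValueError("users must be positive")
    return backlog_documents / (users * DOCS_PER_USER_PER_DAY)


# Both firms store 5,000 documents per user, so both backlogs take the same time.
small_firm = backlog_days(users=10, backlog_documents=50_000)
large_firm = backlog_days(users=1_000, backlog_documents=5_000_000)
print(small_firm, large_firm)  # 100.0 100.0
```

Under the same assumptions, a 10-user firm with only 25,000 stored documents would finish in half the time, which matches the point about firms below the global average completing sooner.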
7. How long will NetDocuments OCR take to process my document backlog?
Depending on the size of your backlog, processing can take from a few weeks to a few months.
8. How quickly will a new document be OCRd once I have saved it into NetDocuments?
NetDocuments OCR checks every minute for newly saved documents and immediately assesses whether they require processing.
NOTE: Many document types such as Word and Excel documents need no processing.
Documents requiring OCRing will be prioritized for processing ahead of Backlog Process document queues. This is referred to as Active Monitoring. NetDocuments OCR aims to process documents within a few minutes to one hour, but that depends on several factors:
- the number of pages in the document;
- the number of documents that you have saved into NetDocuments in a very short amount of time;
- how many other users in your firm are also saving documents at the same time.
So, if you manually upload hundreds of large documents at one time, processing may take a little longer. However, prepare to be surprised: the power of NetDocuments OCR means processing can happen quickly, even for large document volumes.
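The prioritization of Active Monitoring over the Backlog process can be pictured as a two-tier priority queue. The following is a toy model only; the class, tier constants, and document IDs are hypothetical and say nothing about how the service is actually implemented.

```python
import heapq
import itertools

# Lower number = higher priority. The two tiers mirror the description above.
ACTIVE_MONITORING = 0
BACKLOG = 1


class OcrQueue:
    """Toy queue: Active Monitoring items are always served before Backlog items."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # preserves arrival order within a tier

    def add(self, document_id: str, tier: int) -> None:
        heapq.heappush(self._heap, (tier, next(self._counter), document_id))

    def next_document(self) -> str:
        _tier, _order, document_id = heapq.heappop(self._heap)
        return document_id

    def __len__(self) -> int:
        return len(self._heap)


queue = OcrQueue()
queue.add("old-scan-1", BACKLOG)
queue.add("old-scan-2", BACKLOG)
queue.add("new-upload", ACTIVE_MONITORING)
order = [queue.next_document() for _ in range(len(queue))]
print(order)  # ['new-upload', 'old-scan-1', 'old-scan-2']
```

The newly saved document jumps ahead of the older backlog items even though it arrived last, while backlog items keep their original order.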
9. What could cause documents not to be OCRd?
Although the vast majority of documents will process correctly, you will most likely find that some documents are not processed. No harm is done to these documents; the original document remains in NetDocuments without alteration. Reasons for a document failing to process may include:
- A document is password protected.
- A PDF document has a digital certificate (modifying this document would invalidate the certificate).
- Document contents do not match the specified document type (for example, the extension is PDF, but the file is not a PDF).
- A document is unreadable or corrupted.
- A document is in use or checked out by another user, so it cannot be OCRd.
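If you want to spot one of these cases yourself before uploading, the mismatch between a PDF extension and non-PDF contents is easy to check, because PDF files begin with the marker `%PDF-`. The sketch below is a local pre-check under that assumption; the function names are hypothetical, and it is not the test that NetDocuments OCR itself performs.

```python
def looks_like_pdf(data: bytes) -> bool:
    """True if the content starts with the PDF header marker '%PDF-'."""
    # Strict check: some PDF readers tolerate a few leading bytes, this does not.
    return data.startswith(b"%PDF-")


def extension_matches_content(filename: str, data: bytes) -> bool:
    """Check only the PDF case; other extensions are passed through unchecked."""
    if filename.lower().endswith(".pdf"):
        return looks_like_pdf(data)
    return True


print(extension_matches_content("contract.pdf", b"%PDF-1.7\n"))   # True
print(extension_matches_content("contract.pdf", b"PK\x03\x04"))   # False
```

The second call fails because the bytes are a ZIP header, the kind of file that ends up with a .pdf extension after a bad rename.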
10. What happens if a document is currently checked out and being used?
NetDocuments OCR automatically assesses a document when it is saved into NetDocuments to determine whether it needs to be OCRd. In some cases, a user checks out the document for further editing immediately after it is saved. If NetDocuments OCR detects that a document is checked out, the document will not be taken for processing at that time.
11. What happens if I edit a document while it is being OCRd?
Occasionally, a document is being OCRd while a user has the document checked out for editing. This should not cause any issue for users, as NetDocuments OCR does not prevent any document from being edited at any time. After NetDocuments OCR has processed a document and is ready to save it back into NetDocuments, it first checks whether a user has updated the document since the copy was obtained for OCRing.
If that document has been modified, that specific OCR task will be abandoned, and the official version of the document will be re-queued to be OCRd and compressed again.
If the document in NetDocuments has not been changed but is still checked out, NetDocuments OCR will retry saving the document, leaving increasingly long intervals between attempts. If the document is still checked out after the tenth attempt, the attempts end and the document is flagged as unable to be saved because it is checked out.
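The save-back rules described in this answer can be summarized as a small decision function. This is a sketch of the described behavior only: the limit of 10 attempts comes from the text, while the one-minute base delay, the doubling schedule, and all names are assumptions, since the actual intervals are not stated.

```python
from enum import Enum

MAX_SAVE_ATTEMPTS = 10     # from the text above
BASE_DELAY_SECONDS = 60    # assumption: the first wait is one minute


class SaveOutcome(Enum):
    SAVED = "saved"
    REQUEUED = "re-queued for OCR"
    RETRY_LATER = "retry later"
    FLAGGED_CHECKED_OUT = "flagged: checked out"


def retry_delay_seconds(attempt: int) -> int:
    """Wait after failed attempt `attempt` (1-based); doubling is an assumption."""
    return BASE_DELAY_SECONDS * 2 ** (attempt - 1)


def decide_save(modified_since_copy: bool, checked_out: bool, attempt: int) -> SaveOutcome:
    """Decide what to do when an OCRd copy is ready to be saved back."""
    if modified_since_copy:
        return SaveOutcome.REQUEUED             # abandon this task, OCR the new version
    if not checked_out:
        return SaveOutcome.SAVED
    if attempt >= MAX_SAVE_ATTEMPTS:
        return SaveOutcome.FLAGGED_CHECKED_OUT  # give up after the tenth attempt
    return SaveOutcome.RETRY_LATER


print(decide_save(modified_since_copy=False, checked_out=True, attempt=3))
print(retry_delay_seconds(3))  # 240
```

Checking for modification first reflects the order given above: a changed document is always re-queued, whether or not it is still checked out.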
12. I have annotations in my PDF – are they retained?
It is quite common to annotate a PDF with comments, highlighting, freehand drawings, etc. NetDocuments OCR has the unique ability to OCR and compress documents while ensuring that all these annotations remain as they were, in fully editable format, so you can continue to edit them and add further annotations.
13. My documents contain lots of graphics and pictures without text – what happens if OCRing finds no text – do I lose my graphics?
OCRing does not affect the graphics contained in your document in any way. NetDocuments OCR attempts to find any words in graphics and overlays an invisible layer of text in the same location as the graphic version of the text. However, any original graphics remain completely unchanged and fully visible to the user. So, if your page or document contains only photographs that have no characters in them, no OCR text will be added, and no changes will be made to that page or document.
14. Why am I asked to configure NetDocuments OCR with a user account different from existing admin accounts?
This makes it clearer within NetDocuments which documents have been reviewed and processed by NetDocuments OCR, which is useful for reporting and audit purposes.
15. What is the price for NetDocuments OCR?
Please contact your NetDocuments sales executive for pricing and a simple sign-up process.
16. Is NetDocuments OCR available through all sales channels, including partners?
Yes. It is a standard NetDocuments add-in and is available now in the following regions:
- US (Vault)
- AU (Asia Pacific region)
- UK (United Kingdom)
- DE (Germany)
17. Who do I contact for more information?
18. How many languages does NetDocuments OCR support?
NetDocuments OCR supports 185 languages. See the list of supported languages below.
NOTE: You can select up to 16 languages for recognition in your documents.
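If you prepare your language list before configuring the service, a simple check against the 16-language limit can catch an oversized selection early. The sketch below is illustrative: the function name and the choice to de-duplicate the list are assumptions, and it does not call any NetDocuments API.

```python
MAX_SELECTED_LANGUAGES = 16  # limit stated in the note above


def validate_language_selection(languages: list[str]) -> list[str]:
    """Return the de-duplicated selection, or raise if it exceeds the limit."""
    unique = list(dict.fromkeys(languages))  # drop duplicates, keep order
    if len(unique) > MAX_SELECTED_LANGUAGES:
        raise ValueError(
            f"at most {MAX_SELECTED_LANGUAGES} languages can be selected, got {len(unique)}"
        )
    return unique


print(validate_language_selection(["English", "German", "English"]))
# ['English', 'German']
```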