Find information for NetDocuments OCR at NetDocuments OCR Admin Help.
NetDocuments OCR is an Optical Character Recognition (OCR) and image compression technology delivered as a secure cloud-to-cloud service that requires no on-premises software installation.
NetDocuments OCR is a background process that continually monitors files that users import manually or in bulk into the customer’s NetDocuments repository.
NetDocuments OCR converts image-based documents and image files, such as TIFF, BMP, PNG, and JPG, into text-searchable documents so that the built-in NetDocuments full-text search can make them accessible to all appropriate users. It also converts PDF documents whose content is effectively a photograph of the page, for example scanned documents. The output is a searchable PDF that is added to the original document as its official version. The original modified date is maintained.
1. What technology does NetDocuments OCR use?
Since August 1, 2021, NetDocuments has provided optical character recognition (“OCR”) services directly for newly added documents and no longer uses DocsCorp Pty Ltd. as a subprocessor for its ndOCR Services.
NOTE: As an existing customer, there is no change to your current billing cycle or license fees. Customers continue to enjoy fast, high-quality background OCR processing of content.
2. What needs to be installed for the customer to use NetDocuments OCR?
There is no hardware or software installation required. The NetDocuments OCR service functions within NetDocuments’ controlled servers and infrastructure and employs all technical, administrative, and security safeguards in place for the rest of our Service.
3. What are the OCR options available?
Customers can subscribe to and enable two services; in most cases, customers need both:
- Backlog/Bulk Upload. Crawls the entire repository and processes the documents that need OCR. It applies to both new and existing NetDocuments customers.
- Active Monitoring. Automatically processes documents/images as they are added or imported into the NetDocuments repository either manually or in bulk.
4. What are the supported OCR features?
NetDocuments OCR provides the following features:
- The OCR engine is Tesseract.
- Supports the following file formats: TIFF, BMP, PNG, JPG, and PDF.
- Creates new image-based email attachments with PDFs.
- Intelligent OCR technology ensures document fidelity to the source.
- OCRd documents are converted to text-searchable PDFs, using standard JPEG, JPEG2000, and JBIG2 formats.
- OCRd documents can be saved as a new version of the original document or can overwrite the current version.
5. Why doesn’t NetDocuments OCR process all my documents almost instantly?
OCRing documents is a complex, processor-intensive function that takes around 1 to 5 seconds per page (depending on image quality, language, and other factors). NetDocuments OCR is designed to maximize overall throughput (the highest possible number of documents processed in a given time period) rather than to process any single document in the fastest possible time, which would reduce overall efficiency.
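As a rough illustration of the per-page figure above, the following Python sketch estimates pure OCR compute time from a page count. The function name, the one-million-page backlog, and the worker count are hypothetical values chosen only for illustration; real processing time also depends on queueing and on how the service schedules its work.

```python
def ocr_compute_hours(pages: int, seconds_per_page: float, parallel_workers: int = 1) -> float:
    """Return pure OCR compute time in hours, ignoring queueing and throttling."""
    if pages < 0 or seconds_per_page < 0 or parallel_workers < 1:
        raise ValueError("pages and seconds_per_page must be >= 0, parallel_workers >= 1")
    return pages * seconds_per_page / parallel_workers / 3600


# Hypothetical backlog of one million pages, using the 1 to 5 seconds per page range.
best_case = ocr_compute_hours(1_000_000, 1.0)
worst_case = ocr_compute_hours(1_000_000, 5.0)
print(f"Single worker: {best_case:.0f} to {worst_case:.0f} hours of compute")
```

Even before any deliberate spreading of the load, a single worker would need hundreds of hours for a backlog of this size, which is why throughput matters more than the speed of any one document.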
The NetDocuments OCR Backlog process will search for, assess, OCR, and, if necessary, save every document you have stored in NetDocuments. If NetDocuments were to try to do this in just a few hours or days, it could affect the performance of your system while you are trying to open and save your current documents. To avoid this impact on NetDocuments system performance, processing a backlog of older documents is spread out over time. See the next question on processing the document backlog.
6. How long will NetDocuments OCR take to process my document backlog?
Depending on the size of your backlog, processing can take from a few weeks to a few months.
7. How quickly will a new document be OCRd once I have saved it into NetDocuments?
NetDocuments OCR checks every minute for newly saved documents and assesses whether they require processing.
NOTE: Many document types such as Word and Excel documents need no processing.
New documents requiring OCRing will be prioritized for processing ahead of Backlog Process document queues. This is referred to as Active Monitoring. NetDocuments OCR aims to process documents within a few minutes to one hour, but that depends on several factors:
- The number of pages in the document;
- The number of documents that you have saved into NetDocuments in a very short amount of time;
- How many other users in your firm are also saving documents at the same time.
So, if you manually upload hundreds of large documents at one time, processing may take a little longer. However, prepare to be surprised: the power of NetDocuments OCR means processing can happen quickly, even for large document volumes.
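The prioritization described above, where newly saved documents are handled ahead of backlog documents, can be pictured as a priority queue. The following is a minimal Python sketch of that idea, not the actual NetDocuments OCR implementation; the class name, method names, and document IDs are hypothetical.

```python
import heapq
import itertools
from typing import Optional

ACTIVE = 0   # newly saved documents (Active Monitoring): served first
BACKLOG = 1  # older documents found by the backlog crawl: served afterwards


class OcrQueue:
    """Serve active documents before backlog documents, first in first out within each group."""

    def __init__(self) -> None:
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def add(self, doc_id: str, source: int) -> None:
        heapq.heappush(self._heap, (source, next(self._order), doc_id))

    def next_document(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = OcrQueue()
queue.add("old-1", BACKLOG)
queue.add("new-1", ACTIVE)
queue.add("old-2", BACKLOG)
queue.add("new-2", ACTIVE)
processing_order = [queue.next_document() for _ in range(4)]
print(processing_order)
```

In this sketch a backlog document is only taken when no active document is waiting, which matches the behavior the answer describes.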
8. What could cause documents not to be OCRd?
Although the vast majority of documents process correctly, you will most likely find that some are not processed. No harm is done to these documents: the original remains in NetDocuments unaltered. Reasons for a document failing to process may include:
- A document is password protected.
- A PDF document has a digital certificate (modifying this document would invalidate the certificate).
- Document contents do not match the specified document type (extension is PDF, but it’s not a PDF).
- A document is unreadable or corrupted.
- A document is in use or checked out by another user, so it cannot be OCRd.
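One of the reasons above, a file whose contents do not match its extension, can be checked before a document is uploaded. The following Python sketch compares the first bytes of a file with the well-known signature for its extension. It is only an illustration and does not describe how NetDocuments OCR itself performs this check; the function name and the signature table are assumptions made for the example.

```python
from pathlib import PurePath
from typing import Optional

# Leading byte signatures for the formats listed in this FAQ.
SIGNATURES = {
    ".pdf": (b"%PDF-",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".bmp": (b"BM",),
    ".tif": (b"II*\x00", b"MM\x00*"),
    ".tiff": (b"II*\x00", b"MM\x00*"),
}


def content_matches_extension(filename: str, head: bytes) -> Optional[bool]:
    """Return True/False for known extensions, or None if the extension is not in the table.

    `head` is the first few bytes of the file (16 bytes is enough for these formats).
    """
    expected = SIGNATURES.get(PurePath(filename).suffix.lower())
    if expected is None:
        return None
    return head.startswith(expected)


print(content_matches_extension("scan.pdf", b"%PDF-1.7\n"))
print(content_matches_extension("scan.pdf", b"PK\x03\x04"))
```

A result of False for a file named `.pdf` corresponds to the "extension is PDF, but it’s not a PDF" case in the list above.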
9. What happens if a document is currently checked out and being used?
NetDocuments OCR automatically assesses a document when it is saved into NetDocuments to determine whether it needs to be OCRd. In some cases, a user checks out the document for further editing immediately after it is saved. If NetDocuments OCR detects that a document is checked out, it will not take the document for processing at that time.
10. What happens if I edit a document while it is being OCRd?
Occasionally, a document is being OCRd while a user has the document checked out for editing. This should not cause any issue for users, as NetDocuments OCR does not prevent any document from being edited at any time. After NetDocuments OCR has processed a document and is ready to save it back into NetDocuments, it first checks that a user has not updated the document since the copy used for OCRing was obtained.
If that document has been modified, that specific OCR task will be abandoned, and the official version of the document will be re-queued to be OCRd again.
If the document in NetDocuments has not been changed but is still checked out, NetDocuments OCR will retry saving the document, leaving increasingly long intervals between attempts. If the document is still checked out after a set number of attempts, the attempts end and the document is flagged as unable to be saved because it is checked out.
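The save-back behavior described in this answer can be sketched in Python as follows. This is an illustration under stated assumptions, not the NetDocuments implementation: the names `DocState`, `Outcome`, and `save_back`, the limit of five attempts, and the doubling 60-second delay are all hypothetical, since the actual retry count and intervals are not published here.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SAVED = "saved"
    REQUEUED = "modified since OCR copy was taken; re-queued"
    CHECKED_OUT = "flagged: could not save because document is checked out"


@dataclass
class DocState:
    version: int       # changes whenever a user updates the document
    checked_out: bool  # True while a user has the document checked out


def save_back(
    read_state: Callable[[], DocState],
    snapshot_version: int,
    save: Callable[[], None],
    max_attempts: int = 5,      # hypothetical limit
    base_delay: float = 60.0,   # hypothetical first wait, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    for attempt in range(max_attempts):
        state = read_state()
        if state.version != snapshot_version:
            return Outcome.REQUEUED          # abandon this OCR task
        if not state.checked_out:
            save()
            return Outcome.SAVED
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait longer before each retry
    return Outcome.CHECKED_OUT
```

Passing `sleep` as a parameter keeps the sketch testable: a caller can substitute a function that records the delays instead of actually waiting.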
11. I have annotations in my PDF – are they retained?
It is quite common to annotate a PDF with comments, highlighting, freehand drawings, etc. NetDocuments OCR can OCR and compress documents while ensuring that all these annotations remain as they were, in fully editable format, so you can continue to edit them and add further annotations.
12. My documents contain lots of graphics and pictures without text. What happens if OCRing finds no text? Do I lose my graphics?
OCRing does not affect the graphics in your document in any way. NetDocuments OCR attempts to find any words in graphics and overlays an invisible layer of text in the same location as the graphic version of the text. However, all original graphics remain completely unchanged and fully visible to the user. So, if your page or document contains only photographs that have no characters in them, no OCR text will be added, and no changes will be made to that page or document.
13. What is the price for NetDocuments OCR?
Please contact your NetDocuments sales executive for pricing and a simple sign-up process.
14. Is NetDocuments OCR available through all sales channels, including partners?
Yes. It is a standard NetDocuments add-in and is available now in the following regions:
- US (Vault)
- AU (Asia Pacific region)
- UK (United Kingdom)
- DE (Germany)