- What technology does NetDocuments OCR use?
- What needs to be installed to use NetDocuments OCR?
- What are the OCR options available?
- What are the supported OCR features?
- Why doesn't OCR process all my documents almost instantly?
- Why does OCR take the same amount of time to process different volumes of documents?
- How long will NetDocuments OCR take to process my document backlog?
- How quickly will a new document be OCRd?
- How do I track what documents have been updated?
- What could cause documents not to be OCRd?
- What happens if a document is currently checked out and being used?
- What happens if I edit a document while it is being OCRd?
- I have annotations in my PDF – are they retained?
- Do I lose my graphics and pictures?
- Why do I need to configure NetDocuments OCR with a different user account?
- What is the price for NetDocuments OCR?
- Is NetDocuments OCR available through all sales channels?
- How does a ContentCrawler customer migrate to NetDocuments OCR?
- Who do I contact for more information?
- How many languages does NetDocuments OCR support?
Find information for NetDocuments OCR at NetDocuments OCR Admin Help.
NetDocuments OCR is an Optical Character Recognition (OCR) and image compression technology delivered as a secure cloud-to-cloud service that requires no on-premises software installation.
NetDocuments OCR is a background process that continually monitors files imported either manually by users or mass imported into the customer’s NetDocuments repository.
NetDocuments OCR converts image-based documents and image files such as TIFF into text-searchable documents so the built-in NetDocuments full-text searching capability can make the documents accessible to all appropriate users. The output is a searchable PDF that is added to the original document as the official version. It keeps the original modified date.
1. What technology does NetDocuments OCR use?
NetDocuments OCR is powered by ContentCrawler, a proven DocsCorp (www.docscorp.com) technology used by many law firms globally. NetDocuments OCR provides a unique and exclusive service to NetDocuments customers by delivering on its Cloud First, Cloud Only mantra. This aligns with the NetDocuments Trusted Cloud Platform Compute Fabric strategy, which encourages the industry to migrate client and server technologies to the cloud. Doing so removes the burden historically placed on customers of installing and testing software upgrades and resolving the compatibility challenges so often associated with legacy DM technologies.
2. What needs to be installed for the customer to use NetDocuments OCR?
No hardware or software installation is required. The background process runs on the DocsCorp Cloud Platform, which is hosted on Microsoft Azure and connects securely to the NetDocuments Cloud Platform. Read more on Azure compliance. This implementation is a strategic integration in which both companies' development teams worked together to offer an efficient service.
3. What are the OCR options available?
There are two services that customers can subscribe to and enable; in most cases, customers need both:
- Backlog/Bulk Upload (initial one-year minimum commitment). Crawls through the entire repository and processes the appropriate documents. This applies to new or existing NetDocuments customers who are adding OCR to an existing repository, and to customers whose user count increases by 10 percent or more due to a single event, such as a merger.
- Active Monitoring (main service after the initial year). Automatically processes documents/images as they are added or imported into the NetDocuments repository either manually or in bulk.
4. What are the supported OCR features?
NetDocuments OCR provides the following features:
- The OCR engine is ABBYY FineReader 11+.
- Supports the following file formats: TIFF, JPG, PNG, PDF, and MSG.
- Creates new image-based email attachments with PDFs.
- Intelligent OCR technology ensures document fidelity to an original source.
- OCRd documents are converted to text-searchable PDFs and compressed using standard JPEG, JPEG2000, and JBIG2 formats.
- OCRd documents can be saved as a new version of the original document.
- Centralized administration dashboard for monitoring and reporting.
- Multi-language recognition of over 180 languages, including Asian and Arabic character sets.
- Error reporting for documents that are corrupt, PDFs with security passwords, and other issues. Administrators can view these documents for correction.
5. NetDocuments OCR uses the cloud’s enormous power to OCR and compress documents – why doesn't it process all my documents almost instantly?
OCRing and compressing documents is a complex and processor-intensive function, taking around 1 to 5 seconds per page to OCR (depending on image quality, language, and other factors). Each document is OCRd on a single-core processor in the Azure cloud infrastructure, so a bigger document takes longer. NetDocuments OCR is designed to maximize overall throughput (the highest possible number of documents processed in a given time period) rather than to process a single document in the fastest possible time, which would reduce overall efficiency.
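As a rough illustration of the per-page figure above, the following Python sketch turns a page count into a low and high time estimate for a single document. The function name and defaults are illustrative assumptions, not part of NetDocuments OCR; only the 1 to 5 seconds per page range comes from the text.

```python
def estimate_ocr_seconds(pages, min_per_page=1.0, max_per_page=5.0):
    """Return a (low, high) estimate in seconds for OCRing one document.

    The 1-5 seconds/page defaults are the approximate range quoted in the
    text; real times depend on image quality, language, and other factors.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * min_per_page, pages * max_per_page


# A document is processed on one core, so time grows with page count.
low, high = estimate_ocr_seconds(200)
print(f"200 pages: about {low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions, a 200-page scan would take roughly 3 to 17 minutes on its own, before any queueing.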
The NetDocuments OCR Backlog process will search for, assess, OCR/compress and, if necessary, save every document you have stored in NetDocuments. If NetDocuments were to try to do this in just a few hours or days, it could affect the performance of your system while you are trying to open and save your current documents. To avoid this impact on NetDocuments system performance, processing a backlog of older documents is spread out over time. See question 7 on processing your document backlog.
6. I am a new 10-user NetDocuments customer with one year’s volume of saved documents – why will it take NetDocuments OCR the same amount of time to process my document backlog as it will a large 1,000 user firm with millions of documents?
NetDocuments OCR carefully averages its processing power on a per-user basis. So, a 1,000-user firm can process documents at a rate 100 times that of a 10-user firm, based on NetDocuments estimates of the average number of documents stored by all users of NetDocuments globally. This ensures that all firms have equal access to the system on a per-user basis. If you have fewer than the global average number of documents per user stored in NetDocuments, your backlog process can complete more quickly than others.
7. How long will NetDocuments OCR take to process my document backlog?
NetDocuments OCR allocates processing power based on your number of licensed users. The more users you have, the more processing power is allocated to you. A 100-user firm is allocated 10 times the processing power of a 10-user firm. For a site that has under 10,000 backlog documents ‘per user’, the backlog process takes approximately 6 months to complete, regardless of the size of your organization. The time taken may be longer for sites with a higher count than 10,000 backlog documents ‘per user’. (Note: Only PDF, MSG, and image document types are counted as backlog documents ‘per user’ for this purpose).
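The arithmetic in this answer can be sketched in Python as follows. The 10,000 documents per user threshold and the approximate 6-month figure come from the text. The linear scaling above the threshold is an assumption made purely for illustration, since the text only says the time "may be longer". All names are hypothetical.

```python
BASELINE_DOCS_PER_USER = 10_000  # threshold quoted in the text
BASELINE_MONTHS = 6.0            # approximate duration quoted in the text


def relative_processing_power(users, reference_users=10):
    """Processing power relative to a reference firm (proportional to users)."""
    return users / reference_users


def estimate_backlog_months(backlog_docs, licensed_users):
    """Rough backlog duration in months.

    backlog_docs should count only PDF, MSG, and image documents.
    """
    if licensed_users <= 0:
        raise ValueError("licensed_users must be positive")
    docs_per_user = backlog_docs / licensed_users
    if docs_per_user <= BASELINE_DOCS_PER_USER:
        return BASELINE_MONTHS
    # ASSUMPTION: duration grows linearly with documents per user above
    # the threshold. The source does not state the actual relationship.
    return BASELINE_MONTHS * docs_per_user / BASELINE_DOCS_PER_USER


print(relative_processing_power(100))           # 10.0
print(estimate_backlog_months(50_000, 10))      # 6.0
print(estimate_backlog_months(3_000_000, 100))  # 18.0 (under the linear assumption)
```

Because both backlog size and allocated power are expressed per user, the firm's overall size cancels out of the estimate, which is why a small firm and a large firm see similar durations.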
8. How quickly will a new document be OCRd once I have saved it into NetDocuments?
NetDocuments OCR checks every minute for newly saved documents and immediately assesses whether they require processing. (Many document types, such as Word and Excel documents, need no processing.) Documents that require OCRing are prioritized ahead of the Backlog process document queues. This is referred to as Active Monitoring. NetDocuments OCR aims to process documents within a few minutes to 1 hour, but this depends on several factors, including the number of pages in the document, the number of documents that you have saved into NetDocuments in a very short time, and how many other users in your firm are saving documents at the same time. So, if you manually upload hundreds of very large documents at one time, processing may take a little longer. However, prepare to be surprised: the power of the NetDocuments OCR cloud means processing can happen quickly even for large document volumes.
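The prioritization described above, where newly saved documents jump ahead of backlog documents, can be pictured as a two-tier priority queue. The Python sketch below is only a conceptual model; the class, tier names, and document IDs are hypothetical and say nothing about how the service is actually implemented.

```python
import heapq
import itertools

ACTIVE, BACKLOG = 0, 1  # lower number = served first


class OcrQueue:
    """Two-tier queue: Active Monitoring documents before backlog documents."""

    def __init__(self):
        self._heap = []
        # Counter keeps first-in, first-out order within the same tier.
        self._counter = itertools.count()

    def add(self, doc_id, tier):
        heapq.heappush(self._heap, (tier, next(self._counter), doc_id))

    def next_document(self):
        """Return the next document ID to process, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = OcrQueue()
queue.add("backlog-001", BACKLOG)
queue.add("backlog-002", BACKLOG)
queue.add("new-scan-001", ACTIVE)  # saved later, but processed first
order = [queue.next_document() for _ in range(3)]
print(order)  # ['new-scan-001', 'backlog-001', 'backlog-002']
```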
9. How do I track what documents have been updated by NetDocuments OCR?
NetDocuments OCR makes minimal changes to a document, keeping the modified date and the modified user when saving the OCRd document as a New Version. However, there are a few simple things a user can do to determine whether the document has been processed. Firstly, open the PDF and search for a word; the word should now be found in the document. Secondly, if you check the document properties in the PDF application, you should see that the PDF producer is contentCrawler cloud.
The NetDocuments OCR administrator portal provides information on the number of documents processed. A report is available in NetDocuments that provides the Doc IDs of the affected documents. In NetDocuments, go to Admin and select Request Activity Logs. Export a date-based report to XML. This can be loaded into Excel, where you can filter by the Save as new version activity and by the specific user account used for NetDocuments OCR. This shows the documents processed and saved by NetDocuments OCR in this timeframe. For more detailed information on how to do this, see the NetDocuments OCR Administration Guide.
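The same filtering can be scripted instead of done in Excel. The Python sketch below is a minimal example that assumes a made-up XML layout: the element names (`entry`, `docId`, `activity`, `user`), the sample Doc IDs, and the account name are all hypothetical, because the real export schema is not described here. Adjust them to match your actual activity log export.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample: element names and values are invented for illustration.
SAMPLE_EXPORT = """<log>
  <entry><docId>0000-0000-0001</docId><activity>Save as new version</activity><user>ocr.service</user></entry>
  <entry><docId>0000-0000-0002</docId><activity>Edit profile</activity><user>jsmith</user></entry>
  <entry><docId>0000-0000-0003</docId><activity>Save as new version</activity><user>jsmith</user></entry>
</log>"""


def docs_saved_by_ocr(xml_text, ocr_user, activity="Save as new version"):
    """Return Doc IDs whose log entry matches both the activity and the OCR account."""
    root = ET.fromstring(xml_text)
    doc_ids = []
    for entry in root.iter("entry"):
        if entry.findtext("activity") == activity and entry.findtext("user") == ocr_user:
            doc_ids.append(entry.findtext("docId"))
    return doc_ids


print(docs_saved_by_ocr(SAMPLE_EXPORT, "ocr.service"))  # ['0000-0000-0001']
```

Filtering on both the activity and the dedicated account is what separates OCR saves from versions saved by ordinary users.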
10. What could cause documents not to be OCRd?
Although the vast majority of documents will process correctly, you will most likely find that some documents are not processed. No harm is done to these documents; the original document remains in NetDocuments without alteration. Reasons for a document failing to process may include:
- A document is password protected.
- PDF document has a digital certificate (modifying this document would invalidate the certificate).
- Document contents do not match the specified document type (extension is PDF but it’s not a PDF).
- A document is unreadable or corrupted.
- A document is in use or checked out by another user, so it cannot be OCRd.
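One of the causes listed above, a file whose extension says PDF but whose contents are not a PDF, can be checked before upload. The Python sketch below relies on the fact that PDF files begin with the marker `%PDF-`. The function name is hypothetical, and this is a simple pre-check an administrator might run, not the validation NetDocuments OCR itself performs.

```python
def extension_matches_content(filename, data):
    """Return False when a .pdf file does not start with the PDF header marker.

    Only the PDF case is checked in this sketch; other extensions pass.
    """
    if filename.lower().endswith(".pdf"):
        return data.startswith(b"%PDF-")
    return True


print(extension_matches_content("scan.pdf", b"%PDF-1.7\n..."))      # True
print(extension_matches_content("scan.pdf", b"\x89PNG\r\n\x1a\n"))  # False
```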
11. What happens if a document is currently checked out and being used?
NetDocuments OCR automatically assesses a document when it is saved into NetDocuments to determine if it is required to be OCRd. In some cases, immediately after a document is saved into NetDocuments, a user checks out the document for further editing. If NetDocuments OCR detects a document is checked out, it will not be taken for processing at that time.
12. What happens if I edit a document while it is being OCRd?
Occasionally, a document is being OCRd while a user has it checked out for editing. This should not cause any issue for users, as NetDocuments OCR does not prevent any document from being edited at any time. After NetDocuments OCR has processed a document and is ready to save it back into NetDocuments, it first checks that the document has not been updated by a user since the copy was obtained for OCRing.
If that document has been modified, that specific OCR task will be abandoned, and the official version of the document will be re-queued to be OCRd and compressed again.
If the document in NetDocuments has not been changed but is still checked out, NetDocuments OCR will retry saving the document, leaving increasingly long periods of time between attempts. If the document is still checked out after the tenth attempt, the attempts end and the document is flagged as unable to be saved because it is checked out.
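The save-back behavior described in this answer can be sketched as a retry loop in Python. The limit of 10 attempts comes from the text. The doubling delay is an assumption, since the text says only that the gaps grow "increasingly long", and the three callbacks are hypothetical stand-ins for checks against the repository.

```python
import time
from enum import Enum

MAX_ATTEMPTS = 10  # the text says attempts end after the tenth


class Outcome(Enum):
    SAVED = "saved as new version"
    REQUEUED = "abandoned and re-queued"
    CHECKED_OUT = "flagged: still checked out"


def save_back(was_modified, is_checked_out, save, base_delay=60.0, sleep=time.sleep):
    """Try to save an OCRd copy, following the rules described in the text."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if was_modified():
            # A user changed the document: drop this result and OCR again later.
            return Outcome.REQUEUED
        if not is_checked_out():
            save()
            return Outcome.SAVED
        if attempt < MAX_ATTEMPTS:
            # ASSUMPTION: the delay doubles each time; the source gives no figures.
            sleep(base_delay * 2 ** (attempt - 1))
    return Outcome.CHECKED_OUT


# Demo with a fake sleep: checked out for two checks, then released.
waits = []
checks = iter([True, True, False])
result = save_back(lambda: False, lambda: next(checks), lambda: None,
                   base_delay=1.0, sleep=waits.append)
print(result.name, waits)  # SAVED [1.0, 2.0]
```

Checking for modification before every save attempt means a user's edits are never overwritten by a stale OCR result.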
13. I have annotations in my PDF – are they retained?
It is quite common to annotate a PDF with comments, highlighting, freehand drawings, etc. NetDocuments OCR has the unique ability to OCR and compress documents but ensures that all these annotations remain as they were in fully editable format, so you can continue to edit and add further annotations.
14. My documents contain lots of graphics and pictures without text – what happens if OCRing finds no text – do I lose my graphics?
OCRing does not affect the graphics in your document in any way. NetDocuments OCR attempts to find any words in graphics and overlays an invisible layer of text in the same location as the graphic version of the text. However, any original graphics remain completely unchanged and fully visible to the user. So, if your page or document contains only photographs that have no characters in them, no OCR text will be added, and no changes will be made to that page or document.
15. Why am I asked to configure NetDocuments OCR with a user account different from existing admin accounts?
This makes it clearer within NetDocuments what documents have been reviewed and processed by NetDocuments OCR. This is useful for reporting and audit purposes.
16. What is the price for NetDocuments OCR?
Please contact your NetDocuments sales executive for pricing and a simple sign-up process.
17. Is NetDocuments OCR available through all sales channels including partners?
Yes. It is a standard NetDocuments add-in and is available now in the following regions:
- US (Vault)
- AU (Asia Pacific region)
- EU (European region)
18. How does an existing customer of ContentCrawler migrate to NetDocuments OCR?
Contact your DocsCorp sales representative to transition your license from DocsCorp to NetDocuments. They will coordinate with NetDocuments personnel for a smooth transition.
19. Who do I contact for more information?
20. How many languages does NetDocuments OCR support?
NetDocuments OCR supports 185 languages. See the list of supported languages below.
Note: You can select up to 16 languages to be recognized in your documents.