Desktop and Mobile Systems Support

Adobe - Optical Character Recognition & Exporting to Excel

When you create an Acrobat document (PDF) from Word, the actual text of what was written is stored in that document. This lets you select the text so it can be copied and pasted into another document. It also allows newer versions of Microsoft Word to open that file so you can edit it. (Side note: While this works, Word doesn’t do a great job of preserving the formatting.) However, when paper documents are scanned into a computer, the PDF file is created from an image of each page, without the actual text.

Without the embedded text, you can’t copy and paste, you can’t open the file in Word for editing, and, perhaps most importantly, the document can’t be searched. Fortunately, Acrobat can analyze the image, recognize the text, and add it to the document.

Here’s how…

We’ll use a sample document that was printed out and scanned in. The PDF has been embedded so you can try it yourself.

pdfadobe-sample.pdf

Open the file and you’ll see the chart of numbers below. If you try to select the text, you’ll find that you can’t…