This article explains how to edit the text of a digital document created by scanning a paper copy. The technology that turns the image of a scanned text document into editable content is called OCR, short for "Optical Character Recognition". To extract the text from a scanned document and make it editable, you can use the "New OCR" website, although any formatting information will be lost. If you need to process more complex PDF files, you can use the "Online OCR" web service, which requires you to create an account first.
Steps
Method 1 of 2: Use the New OCR website
Step 1. Scan the document to create a PDF
This step is very important because many OCR services are optimized for processing PDF files and not images (for example TIFF).
If possible, scan the document in black and white rather than in color. This makes it easier for the OCR software to recognize the characters in the text.
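If your scanner can only produce color output, the page can be reduced to black and white afterwards. The following is a minimal Python sketch of that idea, not a description of what New OCR or any scanner does internally. The function name, the default threshold of 128, and the list-of-tuples pixel representation are all assumptions chosen for illustration.

```python
def to_black_and_white(pixels, threshold=128):
    """Map each (r, g, b) pixel to 0 (black) or 255 (white).

    Hypothetical helper: `pixels` is a list of rows, each row a list of
    (r, g, b) tuples with values from 0 to 255. The threshold is an
    assumed default and may need tuning for a particular scan.
    """
    result = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights approximate perceived brightness.
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            new_row.append(255 if luma >= threshold else 0)
        result.append(new_row)
    return result


# A tiny 1x2 "image": dark blue ink next to light yellow paper.
sample = [[(10, 10, 80), (250, 245, 200)]]
print(to_black_and_white(sample))
```

In practice an image library would read and write the actual scan; the sketch only shows the thresholding step that separates ink from paper.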
Step 2. Go to the New OCR website in your preferred browser
This web service automatically converts a scanned document into an editable text file.
Step 3. Click the Choose File button
It is gray and located at the top of the page. This will open a File Explorer window (on Windows) or a Finder window (on Mac).
Step 4. Select the PDF file to be processed
This is the file produced by scanning the paper document.
To locate the correct PDF file, you may first need to select the folder that contains it using the sidebar on the left of the dialog box.
Step 5. Click the Open button
It is located in the lower right corner of the window. This way the PDF file will be uploaded to the website server.
Step 6. Press the Upload + OCR button
It is visible at the bottom of the page. The PDF file will be imported and converted into an actual text document.
Step 7. Scroll down the page to select the Download option
It is located on the left side of the screen. A small drop-down menu will appear.
Step 8. Choose Microsoft Word (DOC)
It is one of the options in the menu that appeared. This way the content of the PDF file will be downloaded to your computer as a Microsoft Word document.
If you don't have Microsoft Word installed on your computer, you can download the TXT version of the file by choosing the Plain text (TXT) option from the same drop-down menu. You can then make the necessary changes using Notepad (on Windows) or TextEdit (on Mac).
Step 9. Edit the text document you just downloaded
Double-click the Word file to open it in Microsoft Word, then review and edit the text extracted from the original PDF file.
- Some portions of the text may be impossible to edit because of errors during the conversion of the original PDF file.
- Before you can start editing the text in the document, you may need to click the Enable editing button at the top of the Word window.
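If you downloaded the Plain text (TXT) version instead, some routine clean-up can be automated before you proofread by hand. The sketch below is one possible approach, not a feature of New OCR. The function name `tidy_ocr_text` and the particular rules it applies are assumptions made for illustration.

```python
import re


def tidy_ocr_text(text):
    """Apply a few conservative clean-ups to OCR output (illustrative only)."""
    # Rejoin words split by a hyphen at the end of a line:
    # "docu-\nment" becomes "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces left at the end of each line.
    text = re.sub(r" +\n", "\n", text)
    return text.strip()


raw = "This docu-\nment was   scanned \nyesterday.  "
print(tidy_ocr_text(raw))
```

Note that the hyphen rule also joins genuinely hyphenated compounds that happen to break across lines, so the result still needs a manual read-through.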
Step 10. Save the Word document in PDF format after editing is complete
Follow these instructions:
- Windows: open the File menu, choose Save As, select the "Word Document" drop-down menu, choose the PDF option, then click Save.
- Mac: open the File menu, choose Save As, type the name you want to give the file, click the "Format" field, select PDF, then click Save.
Method 2 of 2: Use the Online OCR Website
Step 1. Scan the document to create a PDF
This step is very important because many OCR services are optimized for processing PDF files and not images (for example TIFF).
If possible, scan the document in black and white rather than in color. This makes it easier for the OCR software to recognize the characters in the text.
Step 2. Go to the Online OCR website
This web service automatically converts a scanned document into an editable text file while retaining elements of the original formatting. The Online OCR website lets you convert only the first 50 pages of a document for free.
Step 3. Click on the SIGN UP link
It is located in the upper right corner of the page. This will take you to the registration screen for a new user account.
Step 4. Create an account
Creating a user profile on the Online OCR site is completely free and lets you process multiple pages of the same PDF file at once. To create an account, provide the following information:
- Username: enter the name you want to assign to your account in the "Username" text field;
- Password: type the password that will protect access to your profile in the "Password" and "Confirm password" text fields;
- E-mail address: enter your e-mail address in the "E-Mail" text field;
- Captcha code: in the "Enter Captcha code" text field, type the sequence of numbers shown in the captcha box.
Step 5. Click on the Sign Up button
It is green and located at the bottom of the page. This will create a new Online OCR account based on the information you provided.
Step 6. Log in to your profile
Click the LOGIN link in the upper right corner of the page, enter your username and password, then click the green Log in button. You will be redirected to your Dashboard, where you can configure the conversion settings for your PDF file.
Step 7. Select a language
This is the language in which the text of the PDF file is written. Use the box on the left of the page.
For example, if the original PDF is written in Italian, choose the Italian option.
Step 8. Select the "Microsoft Word (docx)" option
It is visible in the "Output formats" column of the "Step 1" section of the page.
Step 9. Select the "All Pages" option
It is located in the "Multipage document" column of the "Step 1" section of the page.
Step 10. Click the Select file… button
It is blue in color and is located within the "Step 2" section of the page. A dialog box will appear.
Step 11. Select the PDF file to be processed
Simply click the icon of the file obtained from the scan of the original paper document.
To locate the correct PDF file, you may first need to select the folder that contains it using the sidebar on the left of the dialog box.
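If you are unsure whether a scan was actually saved as a PDF rather than as an image such as TIFF, the first bytes of the file usually tell you. The following Python sketch checks the well-known file signatures for both formats. The helper names `detect_scan_format` and `detect_file` are hypothetical, and the check is independent of anything the Online OCR site does.

```python
def detect_scan_format(header):
    """Guess the file type from its first bytes (hypothetical helper)."""
    # PDF files begin with the marker "%PDF-" followed by a version number.
    if header.startswith(b"%PDF-"):
        return "pdf"
    # TIFF files begin with a byte-order mark ("II" or "MM")
    # followed by the number 42 in that byte order.
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    return "unknown"


def detect_file(path):
    """Read only the first few bytes of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return detect_scan_format(f.read(8))
```

For example, calling `detect_file` with the path of your scan returns "pdf" for a file whose content starts with the PDF marker, regardless of its file extension.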
Step 12. Click the Open button
It is located in the lower right corner of the window. This way the PDF file will be uploaded to the website server. When the progress bar to the right of the Select file… button reaches 100%, you can continue.
Step 13. Click the CONVERT button
It is blue and located within the "Step 3" section of the page. When the Online OCR website has finished converting the chosen file, you will be redirected to the download page.
Step 14. Select the Word document name
At the bottom of the page you will see a blue link with the name of the file created by the conversion. Click it to download the text document directly to your computer.
Step 15. Review and edit the text-converted version of the original PDF file
Double-click the Word file you just downloaded to open it in Microsoft Word. At this point you can make any changes you want to its content.
- Some portions of the text may be impossible to edit because of errors during the conversion of the original PDF file.
- Before you can start editing the text in the document, you may need to click the Enable editing button at the top of the Word window.
Step 16. Save the Word document in PDF format after editing is complete
Follow these instructions:
- Windows: open the File menu, choose Save As, select the "Word Document" drop-down menu, choose the PDF option, then click Save.
- Mac: open the File menu, choose Save As, type the name you want to give the file, click the "Format" field, select PDF, then click Save.