
Scanning text and documents: reliable OCR

Documents are a little simpler to scan than pictures. Even so, answer one question before you scan a document: what do you intend to do with it? As with pictures, how you plan to use the document after it's scanned determines how you should scan it.

You have two choices for scanning text and documents: you can scan the document as an image and leave it that way, or you can scan it for OCR (optical character recognition) processing. In most cases, leaving a scanned document as an image makes sense only if your sole purpose is to store it. If that's the case and you're certain you'll never need to edit the document, no harm will be done by scanning and storing it as an image at 150 dpi. That should give you enough clarity to view or print the document if you need to.

However, if you plan to scan documents and then change the text, scanning for OCR makes more sense. OCR software converts scanned text into text that you can edit in a word processing program.

Reliable OCR results

Scanning for OCR is generally much simpler than scanning images. The choices of file format, resolution, and color settings are more straightforward. Pay close attention to a few details, and you'll do just fine.

There are two ways to scan for OCR. The first is to capture an image with a standard scanning utility and then process it with an OCR application. The second is to let the OCR software handle every step, from scanning to processing. Because each OCR application interacts with scanners differently, this discussion assumes that you're not scanning from within the OCR application itself, but preparing files for later OCR processing.

If you use your OCR software to control the scanner, some of this general advice still applies, but other details may not; consult the user guide or online help for your OCR software. Your best bet is to experiment a little at the start, fine-tune the settings, and go from there. Your results are bound to vary.

Here is a summary of the preferred settings for OCR scanning. Check your OCR software's documentation, but the following guidelines should generally apply.

Format

Scanning packages support different file formats. In general, select TIFF, BMP, or JPEG if they're supported. Although scanned TIFF files are often stored with little or no compression, files destined for OCR can still be kept to a reasonable size if you use the proper color and resolution settings. BMP is another option on Microsoft Windows systems, but BMP files are normally uncompressed, so they won't help if you're low on disk space.
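To see why color and resolution matter more than the choice of format, it helps to estimate how much raw pixel data a page produces. The short Python sketch below assumes a US Letter page (8.5 x 11 inches) and ignores file headers, per-row padding, and any compression a format may apply, so real files will differ; it simply multiplies the pixel count by the bits per pixel.

```python
import math


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate raw pixel data in bytes (no headers, padding, or compression)."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches.
    settings = [
        ("24-bit color, 300 dpi", 300, 24),
        ("8-bit gray, 150 dpi", 150, 8),
        ("1-bit black and white, 150 dpi", 150, 1),
    ]
    for label, dpi, bits in settings:
        size = uncompressed_bytes(8.5, 11, dpi, bits)
        print(f"{label}: {size:,} bytes")
```

For a Letter page, the three settings work out to about 25.2 MB, 2.1 MB, and 263 KB of raw pixel data, which is why grayscale or black and white at a moderate resolution keeps OCR files manageable even in a lightly compressed format.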

Resolution

When you scan for OCR, you're preparing a file for input (not output at this point), so you have a bit more flexibility in your resolution settings. Consider a lower resolution: 150 dpi works well for clean, normal-size text, although small or faint type may need a higher setting such as 300 dpi. A lower resolution speeds up the process and makes the files easier to work with. Check your OCR documentation for other specifics.
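Resolution also determines how many pixels each character gets. A typographic point is 1/72 inch, so a font's em height in pixels is point size × dpi / 72. The sketch below computes that figure and, given a minimum character height in pixels, the lowest dpi that reaches it. The 20-pixel minimum used in the example is a hypothetical placeholder, not a requirement of any particular OCR package; take the real figure from your OCR documentation. Because most letters are shorter than the full em, treat these heights as upper bounds.

```python
import math

POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def em_height_px(point_size: float, dpi: int) -> float:
    """Height of one em (the font's nominal size) in pixels at a given dpi."""
    return point_size * dpi / POINTS_PER_INCH


def min_dpi_for(point_size: float, min_px: float) -> int:
    """Lowest whole-number dpi at which one em is at least `min_px` pixels tall."""
    return math.ceil(min_px * POINTS_PER_INCH / point_size)


if __name__ == "__main__":
    for size in (8, 10, 12):
        print(f"{size} pt: {em_height_px(size, 150):.1f} px at 150 dpi, "
              f"{em_height_px(size, 300):.1f} px at 300 dpi")
    # Hypothetical threshold: assume the OCR engine wants about 20 px per em.
    print("8 pt text needs at least", min_dpi_for(8, 20), "dpi")
```

At 150 dpi, 12-point text gets an em of 25 pixels and 8-point text about 16.7; doubling to 300 dpi doubles those heights but quadruples the number of pixels on the page.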

Color

OCR scanning has no real need for color, since you're converting the page to text. Even if your OCR application can recognize colored characters, color would probably be more trouble than it's worth. Scan in 256 shades of gray (8-bit grayscale) or in black and white (1-bit).
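Black-and-white scanning amounts to applying a threshold to gray values: each pixel darker than a cutoff becomes ink, everything else becomes paper, and eight pixels then fit in a single byte. The Python sketch below illustrates the idea with one fixed cutoff of 128, which is an assumption; scanning software typically exposes a comparable threshold or brightness control in black-and-white mode, and some packages may use adaptive methods instead.

```python
def to_black_and_white(gray_rows, threshold=128):
    """Map 8-bit gray values (0 = black, 255 = white) to 1 for ink, 0 for paper."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def pack_row(bits):
    """Pack a row of 0/1 values into bytes, 8 pixels per byte, first pixel in the high bit."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # pad a short final chunk with paper (0) bits
        packed.append(byte)
    return bytes(packed)


if __name__ == "__main__":
    gray = [[12, 40, 200, 250, 90, 130, 255, 0, 5]]
    bw = to_black_and_white(gray)
    print(bw)               # one value per pixel
    print(pack_row(bw[0]))  # 9 pixels fit in 2 bytes instead of 9
```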

Tip: Watch for stray marks on the pages you're scanning, especially if they're second- or third-generation copies. The OCR software will most likely try to read anything it finds as a letter or character, and that can lead to trouble. Use correction fluid (whiteout) to cover any stray marks or smudges on the paper before you scan it.
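Correction fluid is the low-tech fix. A software counterpart, swapped in here purely as an illustration, is a despeckle filter that clears tiny specks from a black-and-white scan before OCR. The minimal Python sketch below handles only the simplest case: a single ink pixel with no ink among its eight neighbors. Larger specks need a more elaborate filter, and any filter should be checked against punctuation such as periods, which can also be small.

```python
def remove_isolated_pixels(bitmap):
    """Return a copy of a 1-bit bitmap (1 = ink) with lone ink pixels cleared.

    A pixel counts as isolated when none of its eight neighbors is ink.
    The input bitmap is left unchanged.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1:
                continue
            has_ink_neighbor = any(
                bitmap[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_ink_neighbor:
                cleaned[y][x] = 0
    return cleaned


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],  # lone speck
        [0, 0, 0, 1, 1],  # part of a stroke
        [0, 0, 0, 1, 0],
    ]
    for row in remove_isolated_pixels(page):
        print(row)
```

A filter like this only catches the smallest debris; cleaning the original page, as the tip suggests, remains the more reliable fix.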

Bad text, worse results

OCR software has come a long way since its clumsy beginnings, but it still converts text as is: it can't infer what a word should be from context or correct obvious misspellings. Its accuracy depends on the quality of the source file, the right configuration settings, and your constant oversight.
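One way to make that oversight concrete is to proofread a sample page by hand and measure how far the raw OCR output was from it. The sketch below computes the Levenshtein edit distance (the minimum number of single-character insertions, deletions, and substitutions) and divides it by the length of the reference text to get a character error rate. The hand-proofread sample is an assumption of the sketch. The example also shows why a spell checker is not enough: misreading "rn" as "m" turns "modern" into "modem", a correctly spelled word that a spell checker would accept.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # Dynamic programming over prefixes, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                          # delete char_a
                current[j - 1] + 1,                       # insert char_b
                previous[j - 1] + int(char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance between OCR output and proofread text, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    reference = "modern printer"
    ocr_output = "modem printer"  # "rn" misread as "m"
    print(levenshtein(reference, ocr_output))                     # 2
    print(round(character_error_rate(reference, ocr_output), 3))  # 0.143
```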

If you plan to use OCR in the workplace, always budget review time for any text processed through OCR. This step is important, and a simple spell check won't suffice. Depending on the amount of text and its condition, OCR might not be a good solution at all. A document might need such extensive editing that it's cheaper to have someone with good typing skills rekey it. That's the crucial equation in OCR scanning: time saved by scanning versus time spent checking the output text for accuracy.
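The trade-off in that last sentence can be written as simple arithmetic: OCR pays off when scanning time plus review time per page is less than the time it would take to rekey the page. The sketch below compares the two totals; the `PerPageMinutes` structure is a hypothetical helper, and all the per-page times in the example are made-up figures to be replaced with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class PerPageMinutes:
    scan: float    # feeding the page and running OCR
    review: float  # proofreading and correcting the OCR output
    rekey: float   # typing the page from scratch instead


def minutes_saved_by_ocr(pages: int, t: PerPageMinutes) -> float:
    """Positive: OCR plus review is faster than rekeying. Negative: rekeying wins."""
    ocr_total = pages * (t.scan + t.review)
    rekey_total = pages * t.rekey
    return rekey_total - ocr_total


def break_even_review_minutes(t: PerPageMinutes) -> float:
    """Review time per page at which OCR and rekeying take equally long."""
    return t.rekey - t.scan


if __name__ == "__main__":
    clean_copy = PerPageMinutes(scan=1, review=8, rekey=12)     # hypothetical
    poor_copy = PerPageMinutes(scan=1, review=14, rekey=12)     # hypothetical
    print(minutes_saved_by_ocr(50, clean_copy))    # OCR saves time
    print(minutes_saved_by_ocr(50, poor_copy))     # rekeying is faster
    print(break_even_review_minutes(clean_copy))
```

In this hypothetical case, OCR stays ahead as long as proofreading takes less than 11 minutes per page; a badly degraded original that pushes review time past that point is better rekeyed.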


Article 141
Created 5-25-2005
Modified 5-25-2005
Author Ken
Copyright © 2003 LaserQuipt.com