Documents are a little simpler to scan than pictures are. Regardless, you should answer this question before scanning your document: What do you intend to do with it? Once again, how you plan to use the document after it's scanned determines how you should scan it.

You have two choices for scanning text and documents. You can scan the document as an image and leave it as an image, or you can scan it for OCR (optical character recognition) processing.

In most cases, leaving a scanned document as an image makes sense only if your sole purpose is to store the scan. If that's the case and you're certain you'll never need to edit the document, no harm will be done by scanning and storing it as an image at 150 dpi. This should give you enough clarity to view or print the document should you need to.

However, if you're planning to scan documents and then make changes to the text, scanning for OCR makes more sense. OCR software converts a scanned image of text into text that you can edit in a word processing program.

Reliable OCR results

Scanning for OCR is generally a much simpler exercise than image scanning. File format choice, resolution, and color settings are more straightforward. Pay close attention to a few details, and you'll do just fine.

There are two ways to scan for OCR. The first involves using a standard scanning utility to capture an image, and then processing it with an OCR application. The second involves using OCR software to handle every step of the process, from scanning to processing.

Because all OCR applications interact differently with scanners, this OCR discussion assumes that you're not scanning from within the OCR application itself, and that you're preparing files for eventual OCR processing. If you're using your OCR software to access the scanner, some of this general advice still applies, but other details may not. Consult the user guide or online help for your OCR software.
Your best bet is to do a little experimenting at the start, perfect the settings, and go from there. Your results are bound to vary.

Let's summarize the preferred settings for OCR scanning. Check your OCR software's documentation, but the following guidelines should apply.

Format

Various scanning packages can handle different formats. In general, you should select TIFF, BMP, or JPEG, if they're supported. Although the TIFF format offers little compression, files destined for OCR can be efficiently sized if you use the proper color and resolution settings. You might also consider BMP files if you're working on a Microsoft Windows system, but keep in mind that BMP files are usually stored uncompressed, so they won't help if you're low on disk space.

Resolution

Because you're preparing a file for input when you scan for OCR processing (not for output at this point), you have a bit more flexibility in your resolution settings. Consider a lower resolution: 150 dpi is good. This speeds up the process and makes the files easier to work with. Check your OCR documentation for other specifics.

Color

OCR scanning has no real need for color, since you're converting to text. Even if your OCR application supports recognizing color characters, it would probably be more trouble than it's worth. Scan in 256 shades of gray (8-bit) or in black and white.

Tip: Especially if you're scanning documents that are second- or third-generation copies, be on the lookout for stray marks on the page you're scanning. The OCR software will most likely attempt to read anything it finds as a letter or character, and that can lead to trouble. Use correction fluid to cover any stray marks or smudges on the paper you're scanning.

Bad text, worse results

OCR software has come a long way since its clumsy beginnings, but it still simply scans and converts text as is. It can't anticipate something based on context or correct obvious misspellings. It relies on the quality of the source file, the relevant configuration settings, and your constant oversight.
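The earlier point that color and resolution settings keep OCR files efficiently sized can be checked with a little arithmetic. The sketch below is only an illustration: the function name, the US Letter page size, and the three setting combinations are assumptions chosen for the example, and the result is the raw pixel data only (real TIFF or BMP files add headers and may pad rows differently).

```python
import math


def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) pixel data of a scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row is rounded up to a whole number of bytes,
    # which matters for 1-bit (black and white) scans.
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


# Hypothetical example page: US Letter, 8.5 x 11 inches.
PAGE = (8.5, 11.0)

for label, dpi, bits in [
    ("black and white, 150 dpi", 150, 1),
    ("8-bit gray, 150 dpi", 150, 8),
    ("24-bit color, 300 dpi", 300, 24),
]:
    size = uncompressed_scan_bytes(*PAGE, dpi, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, a 150 dpi page is roughly 0.3 MB in black and white and 2.1 MB in 8-bit gray, while the same page in 24-bit color at 300 dpi is about 25 MB before compression. That gap is why an uncompressed format can still be practical for OCR input.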
If you're planning to use OCR in the workplace, always factor in some review time for any text that's processed via OCR. A simple spell check won't suffice, because a misread can produce a valid word: "modern" read as "modem" passes a spell check. Depending on the amount of text and its condition, OCR might not be a good solution. A document might require such extensive editing that it's cheaper to have it rekeyed by someone with good typing skills. That's the crucial equation in OCR scanning: time saved by scanning versus time spent double-checking the output text for accuracy.
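The equation above can be roughed out in a few lines. This is a minimal sketch, not a measured model: the function names and every default figure (proofreading speed, seconds per correction, typing speed) are hypothetical placeholders to replace with your own timings, and it assumes rekeyed text needs no further review.

```python
def ocr_minutes(words, error_rate, proofread_wpm, fix_seconds_per_error):
    """Minutes to proofread OCR output and correct its misread words."""
    proofreading = words / proofread_wpm
    fixing = words * error_rate * fix_seconds_per_error / 60
    return proofreading + fixing


def rekey_minutes(words, typing_wpm):
    """Minutes for a typist to rekey the document from scratch."""
    return words / typing_wpm


def ocr_is_worth_it(words, error_rate, proofread_wpm=200,
                    fix_seconds_per_error=15, typing_wpm=60):
    """True if reviewing OCR output is faster than rekeying (assumed defaults)."""
    review = ocr_minutes(words, error_rate, proofread_wpm, fix_seconds_per_error)
    return review < rekey_minutes(words, typing_wpm)


# Hypothetical 5,000-word document: a clean original versus a poor photocopy.
for label, error_rate in [("clean original", 0.01), ("poor copy", 0.20)]:
    review = ocr_minutes(5000, error_rate, 200, 15)
    rekey = rekey_minutes(5000, 60)
    verdict = "use OCR" if ocr_is_worth_it(5000, error_rate) else "rekey"
    print(f"{label}: review {review:.1f} min, rekey {rekey:.1f} min -> {verdict}")
```

With these placeholder figures, OCR wins easily when 1 word in 100 is misread and loses badly when 1 in 5 is. The break-even point is somewhere under 5 percent of words, which shows how quickly a poor source document tips the balance toward rekeying.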