Boreham Library Staff Manuals
OCR Scanning Manual
version 2007.07.26.a
Note: minor changes in menus and wording may occur with each upgrade. Just read the menu carefully and select the correct item wherever it is.
FineReader OCR
Read ALL instructions before beginning.
WARNING: While it is possible to scan pages continuously with this system, experience has shown that some of the scans will not be readable. Fixing this means going back and rescanning pages into their proper order, so the somewhat slower method given here is more reliable.
This is "learning" software. It will learn from what is approved in a document and apply that in the future. However, when documents (and typefaces) change, it will have to "learn" again.
- At the back end of the scanner, on the right, turn ON the scanner using the rocker switch.
Remember to turn OFF the scanner when done, to save the light element.
- Log in as yourself. You must already be authorized to use this software.
- Open the INNOPAC folder and click on ABBYY FineReader Professional 6.0, or select it from the Program List if the icon is missing. Ignore any other FineReader choices at this time.
- If you are adding to an existing batch, go to the Adding to an Existing Batch section.
- Click on File (upper left corner), then select New Batch.
- Save in C:\Documents and Settings\library\My Documents and add a name consisting of the document name followed by an underscore followed by your initials.
- Under Tools select Options and the General tab.
- Select Load and select the generic batch template to load. Then close the Options.
- Place the first page or cover on the scanner:
- For a single page, place it face down, top nearest you (at the front of the scanner), with the side along the long edge nearest you (the left).
- For a book, or when reading more than one page at a time, place the top face down, along the long edge nearest you, and the side along the short edge nearest you (the front of the scanner).
- To scan multiple pages, click on the Scan and Read button along the right side with the arrow to get the dropdown menu, and select Scan multiple images.
Otherwise, just click on the Scan and Read button.
- A popup window says Reading and lists a page count. (The scanner may say it is calibrating first; this is okay.) Nothing else may happen at this point; this is okay.
- If the FineScanManager window does not come up by itself, click on FineScanManager on the Task Bar at the bottom of the screen.
- Click on the Overview button. It is okay to delete all previous images.
- Adjust the dotted square with click-and-drag to enclose the desired area on the scanner screen for the document.
- Click on Prescan. This will show a full view of the selected area.
- Click on Scan.
- Switch back to FineReader (click on FineReader on the Task Bar).
- Ignore or move the popup Fine Objects window (unless this is the last page of the session; then click on Stop).
- Check to see if the green outline boxes (and red, if present) outline all the text.
- IF an image of the document appears on the right side of the screen with a little white page icon at the lower right, FineReader has already read it.
- If the document has not yet been read (the right side says "page not recognized"), click on the Read button.
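The batch naming rule given in the New Batch step above (document name, then an underscore, then your initials) can be sketched in a few lines of Python. This is only an illustration: the function name batch_name is hypothetical, and replacing characters that Windows does not allow in file names is an added assumption, not something the manual requires.

```python
import re


def batch_name(document_name: str, initials: str) -> str:
    """Build a batch name: document name, underscore, initials."""
    # Assumption: swap characters Windows forbids in file names for a hyphen.
    cleaned = re.sub(r'[\\/:*?"<>|]', "-", document_name).strip()
    initials = initials.strip()
    if not cleaned or not initials:
        raise ValueError("document name and initials are both required")
    return f"{cleaned}_{initials}"


print(batch_name("Annual Report 1923", "JD"))  # Annual Report 1923_JD
```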
Reading and Check Spelling functions
- When the Read function is done, the page will show in the left sidebar with a small recognized icon (a little white page at the lower left corner of the page).
- Click on the arrow on the side of Save Text to get the dropdown menu, then select Save text to file.
- Choose MS Word document as the format and the following options:
- Create a single file for all pages
- Retain full page layout
- Keep pictures
Next Page
- Change page(s) on the scanner.
- If the Fine Objects window is still open, click on it to continue. If not, use the Scan and Read button as before.
- Click on the Scan button.
- Open FineScanManager again if necessary.
- IF the page size has not changed, skip to Prescan.
- IF the page size has changed (from one page to two pages, for example), use Overview again to select the dotted line area for the new size. It is okay to delete all previous images (they are already saved as Read).
- Prescan if the size changed, then Scan and continue as with the last page.
- Pages will be added to the left sidebar as scanned, and the "recognized" icon added as they are read.
IMPORTANT: Do not scan more pages until the current scan has been read; not all scans are good and some may need to be repeated.
- Click on the Read button if necessary (the white page icon is not present).
(See the Reading and Check Spelling functions, below.)
- When done, the file may be closed using File, Close and then reopened and added to as necessary later.
Adding to an Existing Batch
- Go to File and select Open Batch.
- Select the batch from those under C:\Documents and Settings\library\ and open it.
- Place the first page or cover on the scanner:
- For a single page, place it face down, top nearest you (at the front of the scanner), with the side along the long edge nearest you (the left).
- For a book, or when reading more than one page at a time, place the top face down, along the long edge nearest you, and the side along the short edge nearest you (the front of the scanner).
- To scan multiple pages, click on the Scan and Read button along the right side with the arrow to get the dropdown menu, and select Scan multiple images.
Otherwise, just click on the Scan and Read button.
- When the scanning is completed and the popup asks Scan next page? answer No.
ON THE SCREEN: The left column shows all the pages scanned so far. The left page is the scanned image. The right page is the OCR interpretation of what the scanned image says. The red highlights and marks show parts which the OCR is not sure about, which must be checked.
- The left column shows all the pages scanned so far, with the latest at the bottom. The last page (the right side page) of the ones just scanned is displayed. Go to the page before it.
Note: page numbers for this program do not correspond directly to the page numbers in the item you are scanning.
- With the first (left) page displayed, be sure the green border encloses all the text on the page, adjusting the border if necessary.
- Click the Read button and confirm to read again if necessary.
- Check the OCR text (see the Reading and Check Spelling functions, below).
- When both scanned pages are complete with no colored marks, save as a Word document in C:\Documents and Settings\library\My Documents under the same title as the item scanned, with the following options:
- Save as type: MS Word document
- Save all pages
- Create single file for all pages
- Retain full page layout
- Give permission to overwrite the file each time so a complete copy of all pages, with added pages, is written. When complete, all the pages show a little diskette icon next to the white page icon at the lower left.
- Repeat beginning with the Scan and Read button. When done, go to File and select Close Batch to properly close the batch.
Reading and Check Spelling functions
- Correct the text on the right side (click in the right side window) as needed to match the actual text in the scan on the left side. If the left side text is hard to read, look in the window below these two windows for an enlarged view of the scanned text.
- If a line is missing, move to where it should be, use the Enter key to drop the second line, then move back up and type in the missing text.
- Use the F4 key to jump to the next highlighted problem character.
- Enter the correct text and then delete the highlighted text (even if it is correct) so all the color disappears.
- The OCR may try to figure out partial characters. Just delete those.
- Punctuation confuses the OCR as it varies a lot between fonts and printings. It often thinks an entire word is wrong because it has a comma or period after the last letter. Just type in the correct data and delete the highlighted data.
- OCR may have trouble with the letter i, especially if the printing has no dot over it. The lower-case letter L may be confused with the digit 1 (one). Letters printed close together (especially the letter i close to another letter) may be hard to recognize.
- For genuine typographical errors, see instructions below for using [sic] to mark them.
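As a rough illustration of the letter L versus digit 1 confusion described above, the sketch below lists words in already-saved text that mix letters with the digit 1. It is a hypothetical helper, not a FineReader feature, and it assumes the text has been exported to plain text first. It will also flag legitimate items such as "1st", so every hit still needs to be checked against the scan.

```python
import re
from typing import List


def suspicious_tokens(text: str) -> List[str]:
    """Return tokens that contain both a letter and the digit 1."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(c.isalpha() for c in token)
        if has_letter and "1" in token:
            # Possible lower-case L (or i) misread as the digit one.
            flagged.append(token)
    return flagged


print(suspicious_tokens("The 1ibrary opened in 1923 with 1itt1e funding."))
# ['1ibrary', '1itt1e']
```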
Spell Checking
- If the word is correct but not in the dictionary, it can be added to the dictionary.
Be sure to define the word as a noun, adjective, etc. Some of the variations may not be correct, but those can be ignored.
- Uncertain characters: FineReader is guessing at what the character is supposed to be. If correct, click on Ignore. If not, select from suggestions or correct it and Confirm.
- Typographical and other errors in the original document may be important. In such cases, it's traditional to signal to readers that the oddities are really in the original, and not a mistake of the OCR process. The signal is "[sic]": square brackets for an interpolation, and the Latin word sic, "thus, this way." (Since it's a foreign word, it's always in italics; since it's a whole word and not an abbreviation, it gets no period.) It amounts to saying, "It really is this way, so don't blame me."
- First, confirm that the error is actually present and not just an OCR mistake.
- If an actual error, do NOT add it to the dictionary.
- After the word, put [sic]
Example: He ghew [sic] up in a small town.
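The [sic] convention above amounts to a simple text rule: leave the original word untouched and put the marker right after it. A minimal Python sketch follows, using the manual's own example sentence. The function name mark_sic is hypothetical, and it marks only the first whole-word occurrence.

```python
import re


def mark_sic(sentence: str, word: str) -> str:
    """Insert ' [sic]' after the first whole-word occurrence of word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    # The original spelling is kept; only the marker is added.
    return pattern.sub(lambda m: m.group(0) + " [sic]", sentence, count=1)


print(mark_sic("He ghew up in a small town.", "ghew"))
# He ghew [sic] up in a small town.
```

If the word does not appear in the sentence, the sentence is returned unchanged.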
Saving as a Word Document
Save the document in C:\Documents and Settings\library\My Documents under the title of the item scanned, with the following options:
- Save as type: MS Word document
- Save all pages
- Create single file for all pages
- Retain full page layout
Give permission to overwrite the file each time so a complete copy of all pages, with added pages, is written. When complete, all the pages show a little diskette icon next to the white page icon at the lower left.
Default Settings Changed
ScanWizard: Preferences, Overview, Setup: scan enlarged to cover entire scanner plate.
FineReader
Scan, Open Image:
- Use the Fine Reader Interface
- Scanner settings: 300 dpi; Stop between pages
- Display Options dialog before scanning
- Split dual pages
- Detect image orientation
- Despeckle image
Saved as generic batch template.
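For a sense of what the 300 dpi scanner setting means in practice, the image size in pixels is the page size in inches multiplied by the resolution. The sketch below works this out. The 8.5 x 11 inch page is an assumed example, not a size named in this manual, and scan_pixels is a hypothetical helper.

```python
from typing import Tuple


def scan_pixels(width_in: float, height_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions of a scan: inches times dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed example page: 8.5 x 11 inches at the manual's 300 dpi setting.
print(scan_pixels(8.5, 11))  # (2550, 3300)
```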