
Boreham Library Staff Manuals

FSHS Scanning Manual

version 2007.07.26.a

Note: menus and wording may change slightly with each upgrade. Read each menu carefully and select the matching item, wherever it appears.



Instructions for USB-interface scanner (Epson 4490 silver)

These instructions refer to scanners which use a USB 2.0 port connection to the PC. These scanners can be moved to any PC which has a USB 2.0 port AND the scanner software. You MUST have the software installed BEFORE connecting the scanner.

Scanning Into Your Computer

  1. Turn on the Epson 4490 scanner (the switch is low on the long right side).
    When the green light on the front stops blinking, the scanner is ready.

  2. If it does not already exist, create a directory on the computer's C:\ drive called 1work (one work), so that it always appears near the top of directory listings.
    Example: C:\1work

  3. Load ABBYY FineReader Professional.

  4. On the top bar, use File to create a New Batch, using this naming system:

The code for the entire issue is the volume number, a hyphen, the issue number, an underscore, the four-digit year, and the common abbreviation of the month.
This will keep the pages in the PDF document consistent with the pagination in the issue as much as possible.

Example: 29-2_2005Sept
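For staff who prepare batch names in a script or spreadsheet, the convention above can be sketched in Python. This is a minimal illustration only and is not part of FineReader; the function name issue_code is hypothetical, and the month abbreviation is passed in exactly as staff would type it (for example "Sept"), because the manual does not fix a list of abbreviations.

```python
def issue_code(volume: int, issue: int, year: int, month: str) -> str:
    """Build an issue code such as 29-2_2005Sept.

    `month` is the abbreviation as staff would type it ("Sept", "April");
    this sketch does not try to derive it from a month number.
    """
    if not 1000 <= year <= 9999:
        raise ValueError("year must have four digits")
    return f"{volume}-{issue}_{year}{month}"


print(issue_code(29, 2, 2005, "Sept"))  # 29-2_2005Sept
```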

  • Be sure the Settings for this file are correct. On the top bar, use Tools, Options to open the Options window.

  • General tab

    If you know a template named JFSHS already exists, click the Load button.
    You will get a warning. Answer Yes.


    Select the JFSHS template to load from the popup window.

    The template should be named JFSHS and saved in the C:\1work directory.

    Then skip ahead to Scanning.

    If no template exists yet on this PC, create one as follows:

    Recognition tab



    Select Autodetect layout and check Clear background noise.



    Set Print type to Autodetect.



    Leave the Tables settings blank unless they are needed.


    Check Do not use user patterns if you don't want to take time to train the software.

    Spelling tab

    Check the following:
      • Stop at words with uncertain characters
      • Stop at words not found in dictionary
      • Ignore word with digits



    Set Error display level to Standard.

    Formatting tab

    Leave the settings on this tab as they are and click the Formats Settings button.

    PDF tab

    Select Text under the page image.
    "The entire image is saved as a picture. The recognized text is put under it. This option is useful if you export your text to document archives: the full page layout is retained and the full-text search is available if you save in this mode."

    Check Replace uncertain words with images

    Select Use standard fonts

    Set Reduce picture resolution to: 300 dpi

    Set JPEG quality to 90.

    Click OK.

    Scan/Open Image tab

    Set to the correct TWAIN driver for your scanner.

    Check Detect image orientation
    (this will handle turning pages right-side-up if you need to scan upside down)

    General tab

    If no template has been saved for these settings yet, click the Save button.

    Save Batch Template As

    Save the settings under the name JFSHS in the C:\1work directory.

    Click Save.

    Click OK to close all the Options windows.
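    The template values above can also be kept as a checklist for checking a PC's setup. The Python sketch below is hypothetical: it does not read or write FineReader templates, and the setting names are simply copied from this manual. It compares values someone has read off the Options tabs against the list and reports anything missing or different. The optional Do not use user patterns setting is deliberately left out.

```python
# Hypothetical checklist of the JFSHS template values listed in this manual.
# Plain data for a human check; FineReader does not read this.
JFSHS_TEMPLATE = {
    "Recognition": {
        "Autodetect layout": True,
        "Clear background noise": True,
        "Print type": "Autodetect",
    },
    "Spelling": {
        "Stop at words with uncertain characters": True,
        "Stop at words not found in dictionary": True,
        "Ignore word with digits": True,
        "Error display level": "Standard",
    },
    "PDF": {
        "Text under the page image": True,
        "Replace uncertain words with images": True,
        "Use standard fonts": True,
        "Reduce picture resolution to": 300,
        "JPEG quality": 90,
    },
    "Scan/Open Image": {
        "Detect image orientation": True,
    },
}


def mismatches(observed):
    """Compare settings read off the Options tabs with the template.

    `observed` uses the same {tab: {setting: value}} layout. Each missing or
    different value is reported as (tab, setting, expected, found).
    """
    problems = []
    for tab, settings in JFSHS_TEMPLATE.items():
        for name, expected in settings.items():
            found = observed.get(tab, {}).get(name)
            if found != expected:
                problems.append((tab, name, expected, found))
    return problems
```

    For example, if JPEG quality had been left at 75, mismatches would return [("PDF", "JPEG quality", 90, 75)].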




    Scanning

    Conventions to Remember

    • The Journal of the Fort Smith Historical Society does not count the cover as a numbered page, so if you scan the cover first, none of the PDF page numbers will match the issue's pagination.
      To avoid confusion, the cover's page number is adjusted later in Adobe Acrobat Professional.

    • Do NOT fold the Journal back to scan it. As you get further into the issue, the thickness of the folded pages will cause part of the page to be cut off (this has already been tested).
      Instead, alternate between right-side-up and upside-down pages as you turn the issue around to lay the full flat page in the correct position.
      The Detect image orientation setting will turn the page right side up for you.

    1. Put the first page to be scanned on the scanner table.

    2. Place the document face down on the scanner table in the upper right corner. If necessary, it can be placed upside down in order to get into the upper right corner.

    3. In ABBYY FineReader Pro, click on the Scan & Read button.

    4. In the Epson Scan window:

      Set Mode to Professional Mode







      Set Document Type to Reflective for documents.

      Set Document Source to Document Table

      Set Auto Exposure Type to Document



      Select Image Type as 16 bit Grayscale for black and white,
      OR: 24 bit Color for color materials.

      Select Resolution as 300 dpi for good quality text in PDFs.

      Change Target Size only if the area is larger than 8.5" x 11".



      Unsharp Mask Filter is normally checked. Default level is Medium.

      Descreening Filter is normally NOT checked, unless some areas show a wavy or rippled pattern called "moiré."

      Dust Removal is normally NOT checked for paper. It is used only for removing specks and similar debris, especially on photos and slides.



      Click the Preview button.


    5. The scanner will make noises (some of them alarming, but don't worry).

    6. The Preview window will open to show the page.

    7. In the Preview window:

      Use the green-box-inside-the-orange-box icon to automatically locate the image.
      Then adjust the dashed lines to fit the image and eliminate the unnecessary areas of the table.


    8. Click and drag the dotted box so that it encloses only the document area on the scanner screen.

    9. Back on the Epson Scan window, click on the Scan button.

    10. The scanner will scan again, making noises as it does so (including some staccato woodpecker noises) while a progress bar displays.

    11. FineReader should come up automatically (if not, click FineReader on the taskbar).

    12. FineReader will show a divided window.

    13. FineReader left column

      The far left column shows the pages scanned. Pages that have been "read" have a small white page icon on the lower left corner of the page.


      FineReader left window

      The inner left window shows the page, with lines marking the blocks that have been found.

      The blocks have different colored lines, according to FineReader's guess as to what the blocks contain.

      By clicking in a block and changing its type, you can stop FineReader from trying to "read" a graphic and keep it as a plain graphic.


      FineReader right window

      The right window shows a large view of part of the page, with the doubtful parts highlighted so you can check whether FineReader guessed correctly.


      FineReader lower window

      The lower window changes according to where the cursor is and gives you a close-up of that part of the page, which makes it easier to distinguish between characters.


    Editing the OCR

    Graphic blocks

    By right-clicking the red lines around the JFSHS logo, you can expand the block to cover the entire logo, so FineReader won't break it up while trying to read it.



    The word "Journal" is being treated as text (green-line block), but it would work better as a graphic (notice how "The" inside the letter O causes the O to be read as a zero).

    Right-click the green-line block and change it to a Picture block.
    It now changes into a proper graphic.

    Correcting example 1

    When you place the cursor on one of the orange blocks that FineReader is unsure about, that area is shown enlarged in the lower window.

    Here the enlargement shows that what FineReader guessed was a "u" is really an "h", so correct it in the right window until the orange disappears.

    Important Note: you don't have to change the orange text unless it is actually wrong.
    If FineReader's guess is correct, leave it alone and save time.

    You don't need to check spelling, as this is intended to be an exact copy of the original.

    Other editing instructions:

    • If a line is missing, move the cursor to where it should be, press Enter to push the following line down, then move back up and type in the missing text.

    • Use F4 key to jump to the next problem character highlighted.

    • Enter the correct text and then delete the highlighted text (even if it is correct) so all the color disappears.

    • The OCR may try to figure out partial characters. Just delete those.

    • Punctuation confuses the OCR because it varies a lot between fonts and printings. The OCR often flags an entire word as wrong because a comma or period follows the last letter. Just type in the correct text and delete the highlighted text.

    • OCR may have trouble with the letter i, especially if the printing has no dot over it. The lower-case letter L may be confused with the digit 1 (one). Letters printed close together (especially the letter i close to another letter) may be hard to recognize.

    • If the scan shows a small yellow triangle with an exclamation point, hover the mouse over the triangle to see what FineReader wants done. You will probably need to delete just that page and scan it again, perhaps at a slightly higher resolution (for example, 400 dpi instead of 300), so the characters are large enough for OCR to work properly.
      Remember to change the resolution back to 300 dpi when you have finished the difficult pages.

    • There is no way to enter the cents sign (¢), and the instructions on this are unclear. Use a "c" instead.


    Saving the PDF

    1. Click on the Save button.

    Save PDF window

    Save the file in the C:\1work directory.

    Use the standard naming convention: volume, hyphen, issue number, underscore, year and month.
    In the example, Volume 10, Issue 1, April 1986 is saved as 10-1_1986April.

    Set Save as type to PDF Document.

    Save All pages and select Create a single file for all pages

    You will get a warning about deleting all other versions. That's okay.


    FineReader shows as saved

    The small diskette icon on the page now shows that the page has been saved as part of the PDF file you are building.


    Next Page

  • Change page(s) on the scanner.

  • In FineReader, click on the Scan & Read button.
    Once the scan area is set, you should normally not need to Preview again, as long as each page is placed in the same position.

  • Pages will be added to the left sidebar as scanned, and the "recognized" icon added as they are read.

  • IMPORTANT: Do not scan more than a few pages until the current scans have been corrected and saved; not all scans are good and some may need to be repeated.
    Save the entire file every 2 to 4 scans.

  • The last page scanned will be the cover outside (and inside, if present).

  • When done, the file may be closed using File, Close and then reopened and added to as necessary later.





    Adobe Acrobat Professional Actions

    Load Adobe Acrobat Professional (not the Reader) and then load the PDF file to be edited.

    Pagination

    1. In the tabs on the left side of the screen, click Pages to show the page thumbnails in a column on the left.

    2. Click on the cover once to highlight it.

    3. Click Options at the top of the column and select Number Pages.

    4. For the cover only, set Style to A, B, C, ... and set Start to A.

    5. Click OK.

    6. If there is a numbered inside cover page, change it to page B. Number any printed back cover pages that lack page numbers as C and D, as needed.
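    The numbering above can be previewed with a short Python sketch before editing in Acrobat. It is hypothetical and does not touch any PDF: it takes the total page count and the 0-based positions of the cover pages, gives those pages letters in document order, and numbers every other page consecutively from first_number (assumed to be 1). The letter sequence follows the PDF specification's uppercase-letter page label style (A to Z, then AA to ZZ), which Acrobat's A, B, C, ... style is assumed to produce.

```python
def letter_label(n):
    """Uppercase-letter label for the n-th lettered page: 1 -> A, 26 -> Z, 27 -> AA."""
    if n < 1:
        raise ValueError("n must be 1 or more")
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def page_labels(total_pages, cover_pages, first_number=1):
    """Predict the label of each PDF page (0-based positions).

    Pages whose position is in cover_pages get A, B, C, ... in document order;
    every other page is numbered consecutively from first_number.
    """
    labels = []
    letters_used = 0
    next_number = first_number
    for position in range(total_pages):
        if position in cover_pages:
            letters_used += 1
            labels.append(letter_label(letters_used))
        else:
            labels.append(str(next_number))
            next_number += 1
    return labels


# Hypothetical 6-page file: outside front cover, inside front cover, then pages 1-4.
print(page_labels(6, {0, 1}))  # ['A', 'B', '1', '2', '3', '4']
```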

    Linking Contents Page in Adobe Acrobat Professional

    1. Load Adobe Acrobat Professional (the editing software, not the reader).

    2. Go to the Contents page of the file.

    3. Use Tools, Advanced Editing, and select Link Tool.

    4. This gives you a small crosshair cursor for drawing boxes. Use the Link Tool to draw a box around the first contents item.

    5. The popup box defaults to linking to another page in the same document. Change the page number to the proper page for that content item.

    6. Right-click the box you created and select Properties from the menu.

    7. In the Link Properties window:

      • Link Type is Visible Rectangle.
      • Line Style is Underline
      • Highlight Style is None
      • Color is red (click on the color box and select red)
      • Line Thickness is Medium
      • Click on Close

    8. Right-click the link again and select Use Current Appearance as New Default.

    9. Repeat linking for each content item. You should not have to set the link properties again.

    10. Do the same for the COVER description, linking it to the cover page (which should be page A).

    11. Save the file to the C:\1work directory.

    Extracting Articles

    Articles are named with the volume number, a hyphen, the issue number, an underscore, and then the title, with underscores between words.
    Example: 10-1_National_Weather_Service.pdf
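    A minimal Python sketch of the article naming convention, assuming the title is typed as it appears in the Contents. The helper name article_filename is hypothetical. Stripping the characters Windows does not allow in file names (\ / : * ? " < > |) is an added assumption; the manual does not say how to handle punctuation in titles.

```python
import re

# Characters that Windows does not allow in file names.
FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def article_filename(volume: int, issue: int, title: str) -> str:
    """Build a name such as 10-1_National_Weather_Service.pdf from a title."""
    cleaned = FORBIDDEN.sub("", title)
    words = cleaned.split()  # splitting on whitespace also drops doubled spaces
    return f"{volume}-{issue}_{'_'.join(words)}.pdf"


print(article_filename(10, 1, "National Weather Service"))
# 10-1_National_Weather_Service.pdf
```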

    1. In the top Toolbar, select Document.

    2. Select Pages.

    3. Select Extract.

    4. In the popup window, select the pages containing the article to be copied.
      Do NOT check Delete Pages After Extracting.

    5. If the article continues on non-consecutive pages later in the issue, extract the whole range from the article's first page to its last page. Then, with the extracted pages on the screen, click the Pages tab on the left side of the window to show the page thumbnails. Click the first unwanted page, then Shift-click the last unwanted page to select all the unwanted pages between them. Use the Options menu at the top of the column to delete the selected pages. You will be left with only the pages that contain the article.

    6. When the article is on the screen, select File from the top Toolbar.

    7. Select Save As and save the properly named file in the same directory as your main issue file.
      Example: 10-1_National_Weather_Service.pdf

    8. Continue until all the articles normally indexed are done.
    9. You can close the individual articles and get back to the main issue by using the lower X in the upper right corner.

      You can save time if you have more than one complete article on a page. Just save the page once with both titles combined in the file name, and later you can link both entries to the same page.

    10. Extract the index page, if present, into its own PDF file as well, named with the main issue code followed by an underscore and Index.
      Example: 29-2_2005Sept_Index

    This naming produces a directory listing in which each issue's files are grouped together.
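    Step 5 above assumes a single block of unwanted pages between the parts of an article. The Python sketch below works out, for any set of article pages, which range to extract and which runs of pages to delete afterwards; each run is one click plus Shift-click selection. Runs are counted as positions from 1 within the extracted file, which may differ from any printed page numbers. The function name and the example page numbers are illustrative only.

```python
def extraction_plan(article_pages):
    """Return the page range to extract for one article and the runs of
    unwanted pages to delete afterwards.

    article_pages: 1-based page positions in the issue PDF that hold the article.
    Runs are (first, last) positions counted from 1 within the extracted file.
    """
    if not article_pages:
        raise ValueError("article_pages must not be empty")
    wanted = set(article_pages)
    first, last = min(wanted), max(wanted)
    runs = []
    run_start = None
    for page in range(first, last + 1):
        if page not in wanted:
            if run_start is None:
                run_start = page  # start of a block of unwanted pages
        elif run_start is not None:
            # The block ended on the previous page; convert to extracted-file positions.
            runs.append((run_start - first + 1, page - first))
            run_start = None
    return (first, last), runs


# Hypothetical article on PDF pages 3-5, continued on pages 20-21.
print(extraction_plan([3, 4, 5, 20, 21]))  # ((3, 21), [(4, 17)])
```

    In that example you would extract pages 3 to 21, then in the extracted file click thumbnail 4, Shift-click thumbnail 17, and delete the selection.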





    Creating TWIKI pages

    1. Load TWIKI, select Admin, select Library, select Web Index.

    2. If you are adding to an existing volume, select the appropriate volume page and then skip down.

    3. Select a recent sample JFSHS Vol page to copy.

    4. Select Edit to get the TWIKI HTML coding to copy.

    5. Copy the TWIKI HTML from the sample page to the Windows Clipboard (highlight, then use Ctrl-C).

    6. Cancel the edit.

    7. Enter JFSHS Vol followed by the volume number in the Jump to Topic box. You will be told there is no such page. Click Create (at the far left) to create the page.

    8. Paste the sample TWIKI HTML into the new page, and then change as needed to create the page for the new volume.

    9. Remove the old issue and article links.

    10. Use Attach to create new links and upload the files, using the article title as the Comment for each one. You can do these one after another.

    11. Arrange the articles on the page in the same order as the Contents in the issue.

    12. Upload the FSHS logo and the TheJournal logo. These two files are the only ones that do not need a link at the end of the page.

    13. Preview, then Save the page.

    14. Using the Web Index, go to the Journal of the Fort Smith Historical Society page and Edit it.

    15. Adjust the table as needed to add the new issue, and link to the new page.










    Notes on Settings

    Resolution Note

    REMEMBER: any viewer used to check this may not show the scan as clearly as it will eventually appear online. Check the resolution online first by scanning/importing into Millennium Media and using the viewer there.

    PDF files can and should have higher (300 dpi or more) resolution than files to be used with the III/Acordex viewer. The III viewer actually functions better with a lower 75 to 150 dpi resolution.

    Increase the resolution to improve the sharpness.

    For OCR, lower resolutions may actually be better. Higher resolutions capture more detail, which the OCR can treat as "fuzziness," making characters harder to recognize. If the OCR is having this problem, lowering the resolution may give better results.
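    To see what a resolution change means in practice, the Python sketch below does the arithmetic for an 8.5" x 11" page: pixel dimensions and uncompressed data size. Raising the resolution from 300 to 400 dpi multiplies the pixel count by (400/300) squared, about 1.78. Saved PDFs are compressed (JPEG quality 90 in the JFSHS template), so real file sizes will be much smaller; these figures are only for comparison. The function name scan_size is hypothetical.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel):
    """Pixel dimensions and uncompressed size in bytes of a flatbed scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


# 16-bit grayscale, as set in Epson Scan for black-and-white material.
for dpi in (150, 300, 400):
    w, h, size = scan_size(8.5, 11, dpi, 16)
    print(f"{dpi} dpi: {w} x {h} px, {size / 1_000_000:.1f} MB")
# 150 dpi: 1275 x 1650 px, 4.2 MB
# 300 dpi: 2550 x 3300 px, 16.8 MB
# 400 dpi: 3400 x 4400 px, 29.9 MB
```

    Here MB means millions of bytes. A 24-bit color scan needs 3 bytes per pixel instead of 2, so it is 1.5 times larger at the same resolution.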



    Target Note

    To change Target Size, click on the + next to it to expand choices, and set the size.
    REMEMBER: for areas smaller than 8.5" x 11", you can simply select part of the scan in Preview instead of changing Target Size. That avoids having to change it back, or scanning too small an area the next time you scan.



    Unsharp Mask Filter Note

    The Unsharp Mask Filter is used to bring out all of the hidden detail in a well-focused image. This odd name comes from the fact that the sharpen algorithm only sharpens areas of the image which have edges or lots of detail. Areas which do not "measure up" are left alone, or masked off from the sharpen algorithm. This technique will NOT work for poorly focused or out-of-focus images. It will also NOT work for most digital camera images. [explanation from Adobe Photoshop] If the detail is not sharp enough overall, turn this filter off, scan again, and then check the results.
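    The idea behind the filter can be sketched on a single row of pixel values in Python. This is the textbook unsharp-mask formula, sharpened = original + amount * (original - blurred), applied only where the difference from the blurred copy exceeds a threshold; the threshold is what leaves flat areas "masked off." It uses a simple box blur, whereas Photoshop's filter is based on a Gaussian blur, and it is not Epson's actual implementation, which this manual does not describe. The function names are hypothetical.

```python
def box_blur(values, radius=1):
    """Average each sample with its neighbours; edge samples use fewer neighbours."""
    blurred = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask(values, amount=1.0, radius=1, threshold=0.0):
    """Sharpen one row of 8-bit pixel values with the unsharp-mask formula."""
    blurred = box_blur(values, radius)
    result = []
    for original, soft in zip(values, blurred):
        detail = original - soft  # edge signal: original minus blurred copy
        if abs(detail) > threshold:
            sharpened = original + amount * detail
            result.append(max(0, min(255, sharpened)))  # keep within 0..255
        else:
            result.append(original)  # low-detail area: masked off, left alone
    return result


row = [50, 50, 50, 200, 200, 200]  # one row across a dark-to-light edge
print(unsharp_mask(row))
# The dark side of the edge gets darker and the light side lighter;
# the flat areas at either end are unchanged.
```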



    Descreening Filter Note

    This is especially important for photographs!

    The Descreening Filter can remove moiré patterns from a scanned image. Moiré is a wavy or rippled pattern that tends to appear in areas of halftone color, such as skin tones.
    It can also improve results when scanning magazine or newspaper images, which include screening (a pattern made up of tiny dots) from their original printing process. Choose from the Newspaper (85 lpi), Magazine (125 lpi), or Fine Prints (175 lpi) settings.