Appendix F
Technical Justification for a Digitization Standard for the Consortium
A major premise in the technical underpinnings of the new consortial model is that a relatively inexpensive scanner can be located in the academic libraries of consortium members. After evaluating virtually every scanning device on the market, including some in laboratories under development, we concluded that the 400 dot-per-inch (dpi) scanner from Minolta was fully adequate for the purpose of scanning all the hundreds of chemical sciences journals in which we were interested. Thus, for our consortium, the Minolta 400 dpi scanner was taken to be the digitization standard. The standard that was adopted preserves 100% of the informational content required by our end users.
More formally, the standard for digitization in the consortium is defined as follows:
The scanner captures 256 levels of gray in a single pass with a density of 400 dots per inch and converts the grayscale image to black and white using threshold and edge-detection algorithms.
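The grayscale-to-bilevel step in this definition can be illustrated with a minimal sketch. The scanner's actual threshold and edge-detection algorithms are not described here, so the code below assumes only a fixed global cutoff (the value 128 and the function name threshold_to_bilevel are hypothetical) and omits edge detection entirely.

```python
def threshold_to_bilevel(gray, cutoff=128):
    """Convert rows of 8-bit gray values (0 = black, 255 = white) to 1-bit pixels.

    Returns 1 for ink (darker than the cutoff) and 0 for paper.
    The cutoff of 128 is an assumed midpoint, not the scanner's setting.
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray]


# A tiny hypothetical 3 x 4 grayscale patch: a dark stroke on light paper.
patch = [
    [250, 240, 30, 245],
    [248, 120, 20, 250],
    [252, 235, 25, 130],
]

bilevel = threshold_to_bilevel(patch)
for row in bilevel:
    # Show ink as '#' and paper as '.' for a quick visual check.
    print("".join("#" if bit else "." for bit in row))
```

A single global cutoff is the simplest possible choice; a production pipeline would likely adapt the cutoff locally, which is presumably part of what the edge-detection stage contributes.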
We arrived at this standard by considering our fundamental requirements:
• Handle the smallest significant information presented in the source documents of the chemical sciences literature, which is the lowercase e in superscript or subscript as it occurs in footnotes
• Satisfy both legibility and fidelity to the source document
• Minimize scanning artifacts or "noise" from background
• Operate in the range of preservation scanning
• Be affordable by academic and research libraries
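To make the first requirement concrete, the following sketch estimates how many scanner dots span a very small character at a given resolution. The character height of 2.5 points is purely an assumed figure for illustration, not a measurement from our tests, and the function name dots_across is hypothetical. The only fixed fact used is that a desktop-publishing point is 1/72 inch.

```python
POINTS_PER_INCH = 72  # one desktop-publishing point is 1/72 inch


def dots_across(size_in_points, dpi):
    """Return how many scanner dots span a feature of the given size."""
    return size_in_points / POINTS_PER_INCH * dpi


# Assumed height of a superscript lowercase e in a footnote (illustrative only).
assumed_e_height_pt = 2.5

for dpi in (300, 400, 600):
    print(dpi, "dpi:", round(dots_across(assumed_e_height_pt, dpi), 1), "dots")
```

Under that assumption, 400 dpi places roughly 14 dots across the height of the character, which gives a rough sense of the margin available for rendering the closed loop of the e.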
The scanning standard adopted by this project was tested on footnoted information, and 100% of the occurrences of these smallest characters (the superscript and subscript lowercase e) were captured in both image and character modes and recognized for displaying and searching.
At 400 dpi, the Minolta scanner works in the range of preservation quality scanning as defined by researchers at the Library of Congress (Fleischhauer and Erway 1992).
We were also cautioned about the problems unique to very high resolution scanning in which the scanner produces artifacts or "noise" from imperfections in the paper used. We happily note that we did not encounter this problem in this project because the paper used by publishers of chemical sciences journals is coated.
When more is less: images scanned at 600 dpi produce larger files than those scanned at 400 dpi, so 600 dpi is less efficient than 400 dpi. Further, in a series of tests that we conducted, a 600 dpi scanner actually produced an image of effectively lower resolution than the 400 dpi scanner did. This loss of information appears to occur when the scanned image is viewed on a computer screen with relatively heavy use of anti-aliasing in the display. When viewed with software that permits zooming in on details of the scanned image (which both PDF and TIFF viewers support), the 600 dpi anti-aliased image had lower resolution than an image produced from the same source document by the 400 dpi Minolta scanner according to our consortium's digitization standard. With the 600 dpi scanner, the only way for the end user to see the full resolution was to download the image and print it out. In a side-by-side comparison of the soft-copy displayed images, our end users deemed the presentation quality of the 600 dpi image unacceptable; the 400 dpi image was just right. Thus, our delivery approach is more useful to the scholar who needs to examine fine details on-screen.
We also tested reconstructing the journal page from the scanned image by printing it out on a Xerox DocuTech 6135 (600 dpi). We found that the smallest fonts and fine details of the articles were uniformly excellent. Interestingly, in many of the tests we performed, our faculty colleagues judged the end result by their own "acid test": how the scanned image, when printed out, compared with the image produced by a photocopier. For the consortium standard, they were satisfied with the result and pleased with the improvement in quality that the 400 dpi scanner provided in comparison with conventional photocopying of the journal page.
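The file-size difference between 600 dpi and 400 dpi noted above can be estimated with simple arithmetic. The sketch below assumes a hypothetical 8.5 x 11 inch page stored as an uncompressed bilevel image (1 bit per pixel); the page size and the function name uncompressed_bytes are illustrative assumptions, and compressed files as actually delivered would be smaller and would not necessarily scale by the same ratio.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Return the raw size in bytes of a page scanned at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size (US letter); journal trim sizes vary.
size_400 = uncompressed_bytes(8.5, 11, 400)
size_600 = uncompressed_bytes(8.5, 11, 600)

print("400 dpi:", size_400, "bytes")
print("600 dpi:", size_600, "bytes")
print("ratio:", size_600 / size_400)
```

Because pixel count grows with the square of the resolution, the raw 600 dpi page is (600/400)^2 = 2.25 times the size of the 400 dpi page, whatever page dimensions are assumed.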