Technology and Scholarly Communication

Conversion Benchmarking

Determining what constitutes informational content becomes the first step in the conversion benchmarking process. This can be done objectively or subjectively. Let's consider an objective approach first.

Objective Evaluation

One way to perform an objective evaluation would be to determine conversion requirements based on the process used to create the original document. Take resolution, for instance. Film resolution can be measured by the size of the silver crystalline clusters suspended in an emulsion, whose distinct characteristics are appreciated only under microscopic examination. Should we aim for capturing the properties of the chemical process used to create the original? Or should we peg resolution requirements at the recording capability of the camera or printer used?

There are objective scientific tests that can measure the overall information carrying capacity of an imaging system, such as the Modulation Transfer Function, but such tests require expensive equipment and are still beyond the capability of most institutions except industrial or research labs.^[9] In practical applications, the resolving power of a microfilm camera is measured by means of a technical test chart: the number of distinct black and white lines discerned is multiplied by the reduction ratio used, which gives the number of line pairs per millimeter. A system resolution of 120 line pairs per millimeter (lppm) is considered good; above 120 is considered excellent. To digitally capture all the information present on a 35 mm frame of film with a resolution of 120 lppm would take a bitonal film scanner with a pixel array of 12,240.^[10] There is no such beast on the market today.
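As a rough illustration of the test-chart arithmetic just described, the sketch below multiplies a resolved pattern value by the reduction ratio. The function name and the sample reading (a 5.0 pattern at a 24x reduction) are assumptions chosen for illustration; they are not values taken from the text.

```python
def system_resolution_lppm(pattern_value: float, reduction_ratio: float) -> float:
    """Return system resolution in line pairs per millimeter (lppm).

    pattern_value: finest test-chart pattern still resolved on the film.
    reduction_ratio: reduction used when filming (24 means 24x).
    """
    return pattern_value * reduction_ratio


# Hypothetical reading: the 5.0 pattern resolved at a 24x reduction.
resolution = system_resolution_lppm(5.0, 24)
print(resolution)         # 120.0
print(resolution >= 120)  # True: reaches the "good" level cited above
```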

How far down this path should we go? It may be appropriate to require that the digital image accurately depict the gouges of a woodcut or the scoops of a stipple engraving, but what about the exact dot pattern and screen ruling of a halftone? The strokes and acid bite of an etching? The black lace of an aquatint that becomes visible only at a magnification above 25×? Offset publications are printed at 1200 dpi. Should we choose that resolution as our starting point for scanning text?

Significant information may well be present at that level in some cases, as may be argued for medical X rays, but in other cases, attempting to capture all possible information will far exceed the inherent properties of the image as distinct from the medium and process used to create it. Consider, for instance, a 4" × 5" negative of a badly blurred photograph. The negative is incredibly information dense, but the information it conveys is not significant.

Obviously, any practical application of digital conversion would be overwhelmed by the recording, computing, and storage requirements that would be needed to support capture at the structure or process level. Although offset printing may be produced at 1200 dpi, most individuals would not be able to discern the difference between a 600 dpi and a 1000 dpi digital image of that page, even under magnification. The higher resolution adds more bits and increases the file size but with little to no appreciable gain. The difference between 300 dpi and 600 dpi, however, can be easily observed and, in my opinion, is worth the extra time and expense to obtain. The relationship between resolution and image quality is not linear: at some point as resolution increases, the gain in image quality will level off. Benchmarking will help you to determine where the leveling begins.

Subjective Evaluation

I would argue, then, that determining what constitutes informational content is best done subjectively. It should be based on an assessment of the attributes of the document rather than the process used to create that document. Reformatting via digital (or analog) techniques presumes that the essential meaning of an original can somehow be captured and presented in another format. There is always some loss of information when an object is copied. The key is to determine whether that informational loss is significant. Obviously for some items, particularly those of intrinsic value, a copy can serve only as a surrogate, not as a replacement. This determination should be made by those with curatorial responsibility and a good understanding of the nature and significance of the material. Those with a trained eye should consider the attributes of the document itself as well as the immediate and potential uses that researchers will make of its informational content.

Determining Scanning Resolution Requirements for Replacement Purposes

To illustrate benchmarking for conversion, let's consider the brittle book. For brittle books published during the last century and a half, detail has come to represent the size of the smallest significant character in the text, usually the lowercase e. To capture this information, which consists of black ink on a light background, resolution is the key determinant of image quality.

Benchmarking of resolution requirements in a digital world has its roots in micrographics, where standards for predicting image quality are based on the Quality Index (QI). QI provides a means for relating system resolution and text legibility. It is based on multiplying the height of the smallest significant character, h, by the smallest line pair pattern resolved by a camera on a technical test target, p: QI = h × p. The resulting number is called the Quality Index, and it is used to forecast the level of image quality, whether marginal (3.6), medium (5.0), or high (8.0), that will be achieved on the film. This approach can be used in the digital world, but the differences in the ways microfilm cameras and scanners capture detail must be accounted for.^[11] Specifically, it is necessary to make the following adjustments:

1. Establish levels of image quality for digitally rendered characters that are analogous to those established for microfilming. In photographically reproduced images, quality degradation results in a fuzzy or blurred image. Usually degradation with digital conversion is revealed in the ragged or stairstepping appearance of diagonal lines or curves, known as aliasing, or "jaggies."

2. Rationalize system measurements. Digital resolution is measured in dots per inch; classic resolution is measured in line pairs per millimeter. To calculate QI based on scanning resolution, you must convert from one to the other. One millimeter equals 0.039 inches, so to determine the number of dots per millimeter, multiply the dpi by 0.039.

3. Equate dots to line pairs. Again, classic resolution refers to line pairs per millimeter (one black line and one white line), and since a dot occupies the same space as a line, two dots must be used to represent one line pair. This means the dpi must be divided by two to be made equivalent to p.
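Adjustments 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration that uses the rounded 0.039 conversion factor from the text; the function names and the 600 dpi sample value are assumptions made for the example.

```python
INCHES_PER_MM = 0.039  # rounded value used in the text (about 0.03937)


def dots_per_mm(dpi: float) -> float:
    """Adjustment 2: convert dots per inch to dots per millimeter."""
    return dpi * INCHES_PER_MM


def line_pairs_per_mm(dpi: float) -> float:
    """Adjustment 3: two dots are needed to represent one line pair."""
    return dots_per_mm(dpi) / 2


print(round(dots_per_mm(600), 2))        # 23.4
print(round(line_pairs_per_mm(600), 2))  # 11.7
```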

With these adjustments, we can modify the QI formula to create a digital equivalent. From QI = p × h, we now have QI = (0.039 × dpi × h) / 2, which can be simplified to QI = 0.0195 × dpi × h.

For bitonal scanning, we would also want to adjust for possible misregistration due to sampling errors brought about in the thresholding process in which all pixels are reduced to either black or white. To be on the conservative side, the authors of AIIM TR26-1993 advise increasing the input scanning resolution by at least 50% to compensate for possible image detector misalignment. In the formula, that 50% margin means dividing by 3 rather than 2, so the formula would then be QI = (0.039 × dpi × h) / 3, which can be simplified to QI = 0.013 × dpi × h.
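The two forms of the digital formula can be compared side by side in a short sketch. The function name, its parameters, and the sample values (a 1 mm character at 400 dpi) are illustrative assumptions; only the 0.039 factor and the divisors 2 and 3 come from the text.

```python
INCHES_PER_MM = 0.039  # rounded conversion factor used in the text


def digital_qi(dpi: float, char_height_mm: float, bitonal: bool = False) -> float:
    """Digital Quality Index for a given resolution and character height.

    General form: two dots per line pair (divide by 2).
    Bitonal form: divide by 3 instead, reflecting the 50% resolution margin.
    """
    divisor = 3 if bitonal else 2
    return INCHES_PER_MM * dpi * char_height_mm / divisor


# Illustrative values (assumed): a 1 mm character scanned at 400 dpi.
print(round(digital_qi(400, 1.0), 1))                # 7.8
print(round(digital_qi(400, 1.0, bitonal=True), 1))  # 5.2
```

At the same resolution, the bitonal adjustment lowers the predicted QI by a third, which is why bitonal capture calls for a higher input resolution to reach the same quality level.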

So How Does Conversion Benchmarking Work?

Consider a printed page that contains characters measuring 2 mm high or greater. If the page were scanned at 300 dpi, what level of quality would you expect to obtain? By plugging the dpi and the character height into the bitonal formula and solving for QI (0.013 × 300 × 2 = 7.8), you would discover that you can expect a QI of about 8, or excellent rendering.

You can also solve the equation for the other variables. Consider, for example, a scanner with a maximum of 400 dpi. You can benchmark the size of the smallest character that you could capture with medium quality (a QI of 5), which would be 0.96 mm high. Or you can calculate the input scanning resolution required to achieve excellent rendering of a character that is 3 mm high (about 200 dpi).
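The three calculations above can be reproduced with a small sketch built on the bitonal formula QI = 0.013 × dpi × h. The function names are assumptions introduced for illustration. The last result comes out near 205 dpi, which the text rounds to 200.

```python
BITONAL_FACTOR = 0.039 / 3  # 0.013, from the bitonal formula in the text


def qi(dpi: float, char_height_mm: float) -> float:
    """Predicted Quality Index for bitonal scanning."""
    return BITONAL_FACTOR * dpi * char_height_mm


def min_char_height_mm(dpi: float, target_qi: float) -> float:
    """Smallest character height captured at target_qi for a given dpi."""
    return target_qi / (BITONAL_FACTOR * dpi)


def required_dpi(char_height_mm: float, target_qi: float) -> float:
    """Input resolution needed to reach target_qi for a given height."""
    return target_qi / (BITONAL_FACTOR * char_height_mm)


print(round(qi(300, 2), 1))                  # 7.8  (2 mm type at 300 dpi)
print(round(min_char_height_mm(400, 5), 2))  # 0.96 (medium quality at 400 dpi)
print(round(required_dpi(3, 8)))             # 205  (excellent quality, 3 mm type)
```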

With this formula and an understanding of the nature of your source documents, you can benchmark the scanning resolution needs for printed material. We took this knowledge and applied it to the types of documents we were scanning: brittle books published from 1850 to 1950. We reviewed printers' type sizes commonly used by publishers during this period and discovered that virtually none utilized type fonts smaller than 1 mm in height, which, according to our benchmarking formula, could be captured with excellent quality using 600 dpi bitonal scanning. We then tested these benchmarks by conducting an extensive on-screen and in-print examination of digital facsimiles for the smallest font-sized Roman and non-Roman type scripts used during this period. This verification process confirmed that an input scanning resolution of 600 dpi was indeed sufficient to capture the monochrome text-based information contained in virtually all books published during the period of paper's greatest brittleness. Although many of those books do not contain text that is as small as 1 mm in height, a sufficient number of them do. To avoid the labor and expense of performing item-by-item review, we currently scan all books at 600 dpi resolution.^[12]

Conversion Benchmarking beyond Text

Although we've conducted most of our experiments on printed text, we are beginning to benchmark resolution requirements for nontextual documents as well. For non-text-based material, we have begun to develop a benchmarking formula that would be based on the width of the smallest stroke or mark on the page rather than a complete detail. This approach was used by the Nordic Digital Research Institute to determine resolution requirements for the conversion of historic Icelandic maps and is being followed in the current New York State Kodak Photo CD project being conducted at Cornell on behalf of the Eleven Comprehensive Research Libraries of New York State.^[13] The measurement of such fine detail will require the use of a 25× to 50× loupe with a metric hairline that differentiates below 0.1 mm.

Benchmarking for conversion can be extended beyond resolution to tonal reproduction (both grayscale and color); to the capture of depth, overlay, and translucency; to assessing the effects of compression techniques and levels of compression used on image quality; and to evaluating the capabilities of a particular scanning methodology, such as the Kodak Photo CD format. Benchmarking can also be used for evaluating quality requirements for a particular category of material (halftones, for example) or to examine the relationship between the size of the document and the size of its significant details, a very challenging relationship that affects both the conversion and the presentation of maps, newspapers, architectural drawings, and other oversized, highly detailed source documents.

In sum, conversion benchmarking involves both subjective and objective components. There must be the means to establish levels of quality (through technical targets or samples of acceptable materials), the means to identify and measure significant information present in the document, the means to relate one to another via a formula, and the means to judge results on-screen and in-print for a sample group of documents. With this information in hand, benchmarking enables informed decision making, which often leads to a balancing act involving tradeoffs between quality and cost, between quality and completeness, between completeness and size, or between quality and speed.