Defining the Costs
When the product or service is one that has not previously been offered, projecting potential costs is more art than science. Even if one has some experience providing a version of the product, as JSTOR had because of the Mellon initiative, one finds that the costs that have been incurred during the initial start-up period are irregular and unstable and thus not reliable for projecting beyond that phase. Even now, with nearly 200 paying participants, we still have much to learn about what our stable running costs are likely to be.
What we have learned is that our costs fall into six categories:
1. Production: identifying, finding, and preparing the complete run; defining indexing guidelines to inform a scanning subcontractor; and performing quality control on the work of the scanning subcontractor.
2. Conversion: scanning, OCR, and inputting of index information to serve as the electronic table of contents (performed by a scanning subcontractor).
3. Storage and access: maintaining the database (at a number of mirror sites), which involves continuous updating of hardware and systems software.
4. Software development: migrating the data to new platforms and systems and providing new capabilities and features to maximize its usefulness to scholars as technological capabilities evolve.
5. User support: providing adequate user help desk services for a growing user base.
6. Administration and oversight: managing the overall operations of the enterprise.
Some of these costs are one-time (capital) expenditures and some of them are ongoing (operating) costs. For the most part, production and conversion (#1 and #2 above) are one-time costs. We hope that we are digitizing from the paper to the digital equivalent only once.[4] The costs in the other categories will be incurred regardless of whether new journals are added to the database and are thus a reflection of the ongoing costs of the enterprise.[5]
Because the most visible element of what JSTOR provides is the database of page images, many people tend to think that the cost of scanning is the only cost factor that needs to be considered. Although the scanning cost is relevant, it does not reflect the total cost of conversion for a database like JSTOR. In fact, scanning is not even the most expensive factor in the work done by our scanning contractor. During the conversion process, JSTOR's scanning vendor creates an electronic table of contents, which is just as costly as the scanning. In addition, because creating a text file suitable for searching requires manual intervention after running OCR software, that step has proven to be even more expensive than scanning. All told, the direct incremental cost of creating the three-part representation of a journal page in the JSTOR database (page image, electronic table of contents entry, and text file) is approximately $0.75 to $1.00 per page.
Payments to the scanning bureau do not represent the complete production cost picture. Converting 100,000 pages per month requires a full-time staff to prepare the journals and to give the scanning bureau instructions to ensure that table of contents and indexing entries are made correctly. At present production levels, these staff costs are approximately equal to the outlays made to the scanning bureau. On average, then, JSTOR production costs approach $2.00 per page.
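The per-page arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function name and the `staff_ratio` parameter are hypothetical, the assumption that in-house preparation costs exactly equal the bureau outlays is a simplification of "approximately equal," and the monthly totals are derived here rather than reported figures.

```python
# Figures taken from the text: the scanning bureau charges roughly
# $0.75 to $1.00 per page, and about 100,000 pages are converted per month.
BUREAU_COST_PER_PAGE = (0.75, 1.00)  # (low, high) in dollars
PAGES_PER_MONTH = 100_000


def production_cost_per_page(bureau_cost, staff_ratio=1.0):
    """Return total production cost per page.

    staff_ratio is a hypothetical parameter: in-house preparation cost
    expressed as a multiple of the bureau outlay. The text says the two
    are "approximately equal," which is modeled here as 1.0.
    """
    return bureau_cost * (1.0 + staff_ratio)


low, high = (production_cost_per_page(c) for c in BUREAU_COST_PER_PAGE)

# Derived, illustrative monthly totals at the stated conversion rate.
monthly_low = low * PAGES_PER_MONTH
monthly_high = high * PAGES_PER_MONTH

print(f"Production cost per page: ${low:.2f} to ${high:.2f}")
print(f"Implied monthly production cost: ${monthly_low:,.0f} to ${monthly_high:,.0f}")
```

Under these assumptions the total falls between $1.50 and $2.00 per page, which is consistent with the statement that production costs "approach" $2.00 at the upper end of the range.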
Other costs of operating JSTOR are less easily segregated into their respective functional "departments." Our present estimates are that once all of the 100 Phase I journals are available in the database, operating costs (independent of the one-time costs associated with production) will be approximately $2.5 million annually.