VISIONS OF THE FUTURE
Licensing, Copyright, and Fair Use
The Thesauron Project (Toward an ASCAP for Academics)
The Thesauron Project takes its name from the ancient Greek term meaning treasury or inventory. This project envisions the creation of a digital depository and licensing and tracking service for unpublished "academic" works, including working papers, other works-in-progress, lectures, and other writings that are not normally published in formal academic journals. A centralized digital clearinghouse for this material confers a number of benefits on the academic authors and on users, particularly users of university libraries, including students, professors, and other researchers.
First, a centralized depository offers a more systematic and convenient means to discover the unpublished literature than does wandering around individual professors' or departments' Web pages. The depository's detailed and dynamic catalog of its works, identifying new and revised submissions, will significantly enhance the accessibility of this material.
Second, academic authors may not always have a significant financial stake in the electronic exploitation of their works (whether the works are unpublished or published; in the latter instance, many academics may have assigned all rights to publishers, sometimes inadvertently). But academics do have a very significant glory interest. A depository that undertakes what one might call "prestige accounting" for the authors adds an important feature and may serve as an incentive to participation.
What is "prestige accounting"? It is the tracking of use in a way that would permit authors to interrogate the depository to learn if and how their works are being used, for example, on reserve or in course packs at identified universities, for identified courses. Currently, academic authors generally do not know, apart from general sales figures (if they receive any), what has been the dissemination of their works. With some prodding of publishers, one might find out which bookstores placed orders for the book and thus infer which schools were using the work.
However, this kind of information is not generally available (or, at any rate, disseminated) for photocopied course packs, even when rights are cleared.
Third, and especially important to the digital environment, a service of this kind would add considerable value if it could ensure that the digital version made available is authentic. Many works may be traveling on the Web, but the user may not (or should not) be confident that the document downloaded is completely consistent with the work as created. This quality control is particularly significant when many different versions (e.g., prior drafts) are accessible at multiple Internet sites (not all of them with the author's permission).
Defining the Thesauron Universe
What Kinds of Works Will the Thesauron Depository Include?
At least as an initial matter, the depository will be confined to unpublished works such as drafts, lectures, occasional pieces, conference proceedings, master's theses, and perhaps doctoral dissertations. This definition should help avoid possible conflict with publishers (or those that are the copyright holders of works written by academics) who are or will be undertaking their own licensing programs. Moreover, the universe of "unpublished" works may grow as that of formal academic publications shrinks.
Whose Works Will Be Included in the Thesauron Depository?
Any academic (term to be defined; e.g., anyone with an institutional IP address) who wishes to deposit a work will be welcome to do so. There will be no screening or peer review.
Participating authors will register with the Thesauron depository and will receive a password (Thesauron registration information will also be relevant to terms and conditions and to authenticity; the password will tie into use reporting; see IIC, IVA, VB, infra).
Entry of Works
Deposits must be made by or under the authority of the author (if living) or successor in title (if dead); the depository will not accept submissions from unauthorized third parties.
Deposited works should be sent in HTML format.
Upon depositing, the author will supply information necessary to cataloging the work, including author name and the title of the work, and will categorize the work for the Thesauron catalog by selecting from LC classifications and sub-classifications supplied on menu screens (see also IIIC, infra).
Every work deposited in Thesauron will automatically receive an identifying ISBN-type number ("Thesauron number"). The number will be communicated to each author upon deposit as well as maintained in the catalog.
Exit of Works
The author, upon submitting the work, may demand that it self-delete from the depository by a date selected. Any document so designated should bear a legend that indicates at what date it will no longer be included in the depository.
The author may also demand deletion from the depository at any time. The catalog (see IIIC, infra) will indicate the date that a work has been deleted and whether it has been replaced by an updated version. A morgue catalog will be established to keep a record of these deletions.
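The entry and exit rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (Depository, Deposit, delete_on, morgue) are hypothetical, and sequential integers stand in for the "ISBN-type" Thesauron number, whose actual format the proposal does not specify.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Deposit:
    thesauron_number: int             # assumed format: a simple sequential integer
    author: str
    title: str
    delete_on: Optional[date] = None  # author-selected self-deletion date, if any


class Depository:
    """Hypothetical in-memory depository with a live catalog and a morgue catalog."""

    def __init__(self) -> None:
        self._numbers = count(1)
        self.catalog: Dict[int, Deposit] = {}
        self.morgue: List[dict] = []  # record of works deleted from the depository

    def deposit(self, author: str, title: str,
                delete_on: Optional[date] = None) -> int:
        """Register a work and return the Thesauron number assigned to it."""
        number = next(self._numbers)
        self.catalog[number] = Deposit(number, author, title, delete_on)
        return number

    def delete(self, number: int, when: date,
               replaced_by: Optional[int] = None) -> None:
        """Remove a work at the author's demand and log the deletion."""
        work = self.catalog.pop(number)
        self.morgue.append({
            "number": number,
            "title": work.title,
            "deleted": when,
            "replaced_by": replaced_by,  # number of an updated version, if any
        })

    def purge_expired(self, today: date) -> List[int]:
        """Delete every work whose self-deletion date has arrived."""
        expired = [n for n, w in self.catalog.items()
                   if w.delete_on is not None and w.delete_on <= today]
        for n in expired:
            self.delete(n, today)
        return expired
```

A periodic job calling purge_expired would honor the dates authors selected on deposit, while delete covers deletion on demand; both paths write to the same morgue record.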
Terms and Conditions
With each deposit, a participating author who wishes to impose terms and conditions on use of the work may select from a menu of choices. These choices will include:
What kind of access to permit (e.g., browsing only)
What purpose (e.g., personal research but not library reserve or course packs)
Whether to charge for access, storage, or further reproductions
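One way to picture the menu of terms and conditions is as a small record attached to each deposit, checked whenever a user requests the work. The sketch below is an assumption-laden illustration: the enumerations, the Terms record, and the is_permitted function are hypothetical names, and the menu choices shown are only the examples mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Access(Enum):
    BROWSE = "browse"
    COPY = "copy"


class Purpose(Enum):
    PERSONAL_RESEARCH = "personal research"
    LIBRARY_RESERVE = "library reserve"
    COURSE_PACK = "course pack"


@dataclass(frozen=True)
class Terms:
    """Author-selected terms for one deposited work (hypothetical structure)."""
    allowed_access: FrozenSet[Access]
    allowed_purposes: FrozenSet[Purpose]
    fee_cents: int = 0  # 0 means the author charges nothing


def is_permitted(terms: Terms, access: Access, purpose: Purpose) -> bool:
    """A request is allowed only if both the kind of access and the purpose are."""
    return access in terms.allowed_access and purpose in terms.allowed_purposes


# Example: browsing for personal research only, no charge.
browse_only = Terms(
    allowed_access=frozenset({Access.BROWSE}),
    allowed_purposes=frozenset({Purpose.PERSONAL_RESEARCH}),
)
```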
What Users May Access the Thesauron Depository?
As a starting point, access will be limited to university-affiliated (or research institute-affiliated) users. These users will make their first contact with Thesauron from their institutional host in order to establish a user ID number from which they may subsequently gain access from both institutional and noninstitutional hosts (i.e., work or home).
When registering, the user will indicate a user category (e.g., professor, postdoctoral, graduate, undergraduate) and disciplines (research and teaching subject matter areas); this information will be relevant to the depository's catalog and tracking functions (see IIIC, VA, infra).
A second phase of the project would extend access to independent scholars who do not have institutional affiliations. At a later date, access to the depository might be expanded to the general public.
Conditions on Use
When registering, the user will encounter a series of screens setting forth the general conditions on using Thesauron. These conditions include agreement to abide by the terms and conditions (if any) that each author has imposed on the deposited works (e.g., the author permits browsing and personal copying, but not further copying or distribution). The user will also agree that in the event of a dispute between the user and Thesauron, or between the user and a Thesauron author, any judicial proceeding will be before the U.S. District Court for the Southern District of New York (or, if that court lacks subject matter jurisdiction, before the New York State Supreme Court) and will be governed by U.S. copyright law and New York law. (The choice of forum and of state law assumes that Thesauron will be established at Columbia University.)
How Will Users Know Thesauron's Holdings?
The depository will include an electronic catalog searchable by keyword or by Boolean logic. The catalog will also be organized in a scroll-through format employing LC subject headings. The catalog will be dynamic so as to reflect new submissions or revisions of material (and will also indicate when an author has deleted material from the depository).
The catalog will be dynamic in another way. Along the lines of SmartCILP (Current Index to Legal Periodicals) and similar products, it will regularly e-mail registered users with information about new submissions in the subject matter categories that the Thesauron user has requested.
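A keyword search with Boolean logic over catalog entries might look like the following. This is a minimal sketch, not a design for the actual catalog: the function names and the idea of searching a single free-text description per Thesauron number are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def tokens(text: str) -> Set[str]:
    """Lower-case the words of a description and strip surrounding punctuation."""
    return {w.strip(".,;:()\"'").lower() for w in text.split()} - {""}


def search(catalog: Dict[int, str],
           all_of: Iterable[str] = (),
           any_of: Iterable[str] = (),
           none_of: Iterable[str] = ()) -> List[int]:
    """Return Thesauron numbers whose description satisfies AND / OR / NOT terms."""
    all_terms = [t.lower() for t in all_of]
    any_terms = [t.lower() for t in any_of]
    none_terms = [t.lower() for t in none_of]
    hits = []
    for number, description in catalog.items():
        words = tokens(description)
        if not all(t in words for t in all_terms):
            continue  # an AND term is missing
        if any_terms and not any(t in words for t in any_terms):
            continue  # none of the OR terms is present
        if any(t in words for t in none_terms):
            continue  # a NOT term is present
        hits.append(number)
    return sorted(hits)
```

The same function could drive the e-mailed updates: running each user's saved subject terms against only the newly submitted entries would yield the list of works to announce.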
How Will Users Access Material from the Thesauron Depository?
After finding the requested work's Thesauron number in the general on-line catalog or in the e-mailed updates, the registered user will click on the catalog listing or type in the Thesauron number to receive the work.
It is also possible to envision links to specific works in the depository from online course syllabi or other on-line reading lists.
In addition to the general conditions screens encountered on first registration with Thesauron, the terms and conditions (if any) pertinent to each work will appear on the initial screen prefacing each work. In order to access the rest of the document, the user will be obliged to click on a consent to those terms and conditions.
Delivery from the Thesauron Depository
Documents in the depository will be authentic when submitted by the author. The depository will add digital signatures or other marking material to identify the author, the work, and its date of submission.
Subsequent Generations of Documents Originally Obtained from the Depository
The Thesauron project does not now contemplate attempting to prevent users from making or circulating further copies of works obtained from the depository. But it is important to provide the means for anyone who obtains a document of uncertain provenance to compare it with the authentic version to ensure that no alterations have occurred. Thus, if a registered user has obtained a copy from a source other than Thesauron, the user should verify that copy against the version in the depository.
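The comparison of a circulating copy with the deposited version can be sketched with a cryptographic digest. Note the assumption: the proposal speaks of "digital signatures or other marking material" without naming a technique, so the SHA-256 fingerprint used here is a stand-in chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """Digest recorded by the depository when the author submits the work."""
    return hashlib.sha256(document).hexdigest()


def matches_depository(copy: bytes, recorded_fingerprint: str) -> bool:
    """True only if the copy is byte-for-byte identical to the deposited version."""
    return hmac.compare_digest(fingerprint(copy), recorded_fingerprint)


# Example: any alteration, however small, changes the fingerprint.
deposited = b"<html><body>Working paper, draft of March 1997</body></html>"
recorded = fingerprint(deposited)
altered = deposited.replace(b"March", b"April")
```

A digest of this kind detects alteration but does not by itself identify the author; a true digital signature would add that, at the cost of managing keys for each registered author.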
Identification of Uses
Registered users will respond to a menu screen indicating the purpose of their access, e.g., library reserve, course pack, personal research.
Registered authors will have electronic "prestige" reports that they may interrogate at any time to learn:
The number of hits each deposited work has received
The source of the hit (institution, department, user category; names of users will not be divulged)
The nature of the use (library reserve, course pack, research)
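A "prestige" report of this kind amounts to counting logged accesses along the three dimensions just listed. The sketch below assumes a hypothetical Hit record that stores only institution, department, user category, and purpose, so that user names never enter the report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hit:
    """One logged access; deliberately carries no user name (hypothetical schema)."""
    thesauron_number: int
    institution: str
    department: str
    user_category: str  # e.g., professor, postdoctoral, graduate, undergraduate
    purpose: str        # e.g., library reserve, course pack, research


def prestige_report(hits: Iterable[Hit], thesauron_number: int) -> Dict[str, object]:
    """Summarize how often, by whom, and for what purpose one work was accessed."""
    relevant = [h for h in hits if h.thesauron_number == thesauron_number]
    return {
        "hits": len(relevant),
        "by_source": Counter(
            (h.institution, h.department, h.user_category) for h in relevant
        ),
        "by_purpose": Counter(h.purpose for h in relevant),
    }
```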
If the author has requested payment for access or copying, the registered user will need a debit account to access the work; the debit would be credited to the author's account. These operations may be implemented through links to a participating bank.
Other Potential Applications of Thesauron
As currently conceived, Thesauron's universe is unpublished academic works. But once all its features have been put into place, Thesauron could either expand its holdings or work in tandem with copyright owners of published works to supplement whatever rights clearance system the publisher has devised. Similarly, in situations in which authors have not assigned their copyrights or have at least retained electronic rights, Thesauron could work together with collective licensing agencies, such as the Authors' Registry, to supplement their rights clearance and reporting mechanisms.
Costs of Implementation and Maintenance
The primary initial costs will be in acquiring hardware to accommodate the depository and in creating or adapting the software for the various components of the system: author registration; deposit; cataloging; user registration; use tracking and reporting; billing. It will also be important to publicize Thesauron to potential participating institutions, authors, and users; some portion of the initial budget should be allocated to this promotion.
Because most of the information in Thesauron is author- or user-generated, the maintenance costs should be largely limited to general system maintenance and gradual expansion of disk storage. It may be desirable to provide for part-time help line assistance.
Paying for Thesauron
It will be necessary to seek a grant to support the initial setup of and publicity for the system. The maintenance and help line costs should be covered by a modest subscription from participating institutions in exchange for the service of receiving and delivering works into and from the depository.
If the payment feature becomes a significant aspect of Thesauron, a portion of the access or copying charges could go to defray maintenance expenses.
Appendix A The Thesauron Project: Annotated Bibliography of On-line Sources
Compiled by Deirdre von Dornum, J.D., Columbia, 1997.
Defining the Thesauron Universe
What Kinds of Works Will the Thesauron Depository Include?
1. See University of Texas Copyright Management Center (www.utsystem.edu/OGC/intellectualproperty/ ) for overview of "Faculty as Authors, Distributors and Users: The Roles of Libraries and Scholarly Presses in the Electronic Environment."
Whose Works Will Be Included in the Thesauron Depository?
1. General information on universities and copyright: Copyright Management Center at email@example.com.
2. For definition of "educator," see Educational Fair Use Guidelines for Digital Images 1.4, available at www.utsystem.edu/OGC/intellectualproperty/imagguid.htm.
3. The WATCH file (Writers And Their Copyright Holders), a database of names and addresses of copyright holders of unpublished works primarily housed in libraries and archives in North America and the United Kingdom: www.lib.utexas.edu/Libs/HRC/WATCH.
4. Hypatia Electronic Library, a directory of research workers in computer science and pure mathematics, and a library of their papers: http://hypatia.dcs.qmw.ac.uk.
Entry of Works
1. Existing depository: CORDS (lcweb.loc.gov/copyright/): U.S. Copyright Office's electronic system for receipt and processing of copyright applications; working with small number of representative copyright owners to establish digital depository.
2. How to assemble depository: ACM has guidelines for electronic submission of works to start a depository (www.acm.com ).
Terms and Conditions
1. Copyright Clearance Center (www.copyright.com ): has templates of rights and pricing schemes; individual publishers fill in specifics.
2. JSTOR (www.jstor.org/about/license.html ): model licensing agreement.
What Users May Access the Thesauron Depository?
1. European Copyright User Platform (accessible via arl.cni.org/scomm/sum.html) has a grid model for legitimate access considering the following dimensions:
a. type of library: national, university, public, and so on.
b. whether user groups are open (the general public), closed (a specific subset who have a formal relationship with the organization), or registered (individuals who have authorized passwords).
c. types of permissible activities, including digitization and storage, viewing, downloading, copying, exporting, and so on.
2. Project MUSE (Johns Hopkins University Press) (126.96.36.199/proj_descrip/rights.html): allows access through universities only to faculty, students, and staff (access expected to be enforced by subscribing universities).
3. University of Texas system (www.utsystem.edu/OGC/intellectualproperty/l-resele.htm) discusses restriction of electronic distribution of copyrighted materials to enrolled students.
4. Virginia Tech (ei.cs.vt.edu/courses.html) digital library in use for computer science courses.
Conditions on Use
1. Nontechnological means of control
a. University of Texas system (http://www.utsystem.edu/OGC/intellectualproperty/rsrvguid.htm ) suggests: retrieval of works in electronic reserve systems by course number or instructor name, but not by author or title of work.
b. ASCAP (www.ascap.com ): collective on-line licensing for all copyrighted musical works in ASCAP's repertory; does not allow reproduction, copy, or distribution by any means (enforced contractually, not technologically).
2. Technological devices
a. CORDS (lcweb.loc.gov/copyright/): individual digital works will be assigned "handles" that code for access terms and conditions established by rights holders.
b. Ivy League (www.cultech.yorku.ca/projects/docs/ivymain.html ): Canadian consortium of companies, universities, and rights clearance organizations; employs encryption, fingerprinting, tagging, and copy prohibition to enforce limitations on user and use.
c. IMPRIMATUR (www.imprimatur.alcs.co.uk/ ): U.K. consortium in development for copyright managed Internet server; interested in using numbering system and cryptography to limit access.
d. Technology providers (information available through IMPRIMATUR site or www.ncri.com/articles/rights_management/ifrro95.html )
How Would Users Access Material from the Thesauron Depository?
1. Course syllabi/electronic reserve lists
a. For summary of fair use and academic environment, see arl.cni.org/aau/IPi.html#Background.
b. For a computer science-oriented digital library already in use with computer science courses at Virginia Tech, see ei.cs.vt.edu/courses.html.
c. Links to further information: www.columbia.edu/~rosedale.
2. General search engine
a. The Computation and Language E-print Archive, an electronic archive and distribution server for papers on computational linguistics, natural language processing, speech processing, and related fields, is accessible by a variety of means, such as: title/author search; abstract number; most recent acquisitions; form interface searches. See http://xxx.lanl.gov/cmp-lg/.
Delivery from the Thesauron Depository
1. CORDS (lcweb.loc.gov/copyright/): authenticity initially verified by Copyright Office, and then guaranteed.
2. Clickshare (www.clickshare.com ) operates across the Internet as an authentication and payment facilitator.
Subsequent Generations of Documents Originally Obtained from the Depository
1. ACM project (www.acm.org ): very concerned about authenticity of documents.
Identification of Uses
1. Copyright Clearance Center (www.copyright.com ): currently licenses on behalf of over 9,200 publishers, representing hundreds of thousands of authors; collects usage information from meters (appears to be volume and billing rather than specific use) and reports to rights holders.
2. Technological devices: Clickshare (www.clickshare.com ) operates across the Internet as an authentication and payment facilitator; can also provide user profiling and user-access verification services. Publishers maintain their own content on their own Internet server; the Clickshare software enables the provider to track and receive royalties from users who click on content pages; the publishers retain the copyrights.
1. Authors' Registry (www.webcom.com/registry/ ): accounting system for paying royalties to registered authors for electronic media uses.
2. ASCAP (www.ascap.com ): collective on-line licensing for all copyrighted musical works in ASCAP's repertory; four different rate schedules (on-line service providers select one).
3. Publication Rights Clearinghouse (National Writers' Union) (www.nwu.org/nwu/ ): rights to previously published articles by freelance writers sold to fax-for-fee database. PRC sets royalties and forwards to authors when articles used, minus 20% fee.
This project was developed by Jane C. Ginsburg, Morton L. Janklow Professor of Literary and Artistic Property Law, Columbia University School of Law, in consultation with James Hoover, Professor of Law and Associate Dean for Library and Computer Services, Columbia University School of Law; Carol Mandel, Deputy University Librarian, Columbia University; David Millman, Manager, Academic Information Systems, Columbia University; and with research assistance from Deirdre von Dornum, Columbia University School of Law, class of 1997.
Technical Standards and Medieval Manuscripts
Medieval manuscripts (that is, handwritten codices produced between the fifth century and the late fifteenth century) are counted among the greatest intellectual treasures of western civilization. Manuscripts are significant to scholars of medieval culture, art historians, calligraphers, musicologists, paleographers, and other researchers for a multiplicity of reasons. They contain what remains of the classical literary corpus; and they chronicle the development of religion, history, law, philosophy, language, and science from the Middle Ages into early modern times.
Even though manuscripts represent the most voluminous surviving artifact from the Middle Ages, the very nature of this resource presents challenges for usage. Each manuscript, as a handwritten document, is a unique creation. As such, copies of a particular work may contain variances that make all copies, wherever they might be, necessary for review by an interested scholar. Also, access to unique manuscripts that are spread across several countries or continents can be both costly and limited. A scholar wishing to consult manuscripts must often travel throughout Europe, the United States, and other countries to find and study manuscripts of interest. Such research is costly and time-consuming. The universities, museums, and libraries that own these manuscripts may lack the space and personnel to accommodate visiting scholars, and in some cases research appointments need to be arranged months in advance. Compounding these difficulties can be the challenge of inconvenient geography. While eminent collections reside in the great capitals of Europe, other collections of scholarly interest are housed in remote sites with no easy access at all. And finally, the uniqueness of each manuscript presents special issues of preservation. Because manuscripts represent finite and nonrenewable resources, librarians concerned with the general wear and tear on manuscripts have begun to restrict access to these codices.
In an effort to preserve medieval manuscripts and to create broader and more economical access to their contents, many libraries have in recent decades sought to provide filmed copies of their manuscripts to users. This practice has been a long-established one at such institutions as the British Library, the Bibliothèque Nationale, and the Vatican Library. Additionally, some libraries have been established for the specific purpose of microfilming manuscript collections. The Institut de Recherche et d'Histoire des Textes in Paris, for example, for decades has been filming the manuscripts of the provincial libraries in France. Since its founding in 1965, the Hill Monastic Manuscript Library (HMML) at Saint John's University in Minnesota has filmed libraries in Austria, Germany, Switzerland, Spain, Portugal, Malta, and Ethiopia. And at the Vatican Film Library at Saint Louis University, one can find microfilms of 37,000 manuscript codices from the Biblioteca Apostolica Vaticana in Rome. Instead of traveling from country to country and from library to library, researchers may make a single trip to one of these microfilm libraries to consult texts, or in certain circumstances, they may order microfilm copy by mail. Microfilm was a great step forward in providing access to manuscripts, and it still offers tremendous advantages of economy and democratic access to scholars. Still, there are certain limitations because in some situations researchers must visit the microfilm institutions to consult directly, and the purchase of microfilm, even if ordered from a distance, can entail long waits for delivery. And compounding these difficulties can be the inconsistency or inadequacy of existing descriptions of medieval manuscripts.
Access to manuscripts in particular collections is guided by the finding aids that have been developed through the centuries. The medieval shelf list has given way to the modern catalog in most cases, but challenges in locating particular manuscripts and in acquiring consistent information abound. Traditionally, libraries in Europe, the United States, and elsewhere have published manuscript catalogs to describe their handwritten books. These catalogs are themselves scholarly works that combine identification of texts with a description of the codex as a physical object. Although these catalogs are tremendously valuable to scholars, they are not without their shortcomings. With respect to manuscript catalogs, there is presently no agreement within the medieval community on the amount and choice of detail reported, on the amount of scholarly discussion provided, and on the format of presentation. Moreover, to consult these published books in the aggregate requires access to a research library prepared to maintain an increasingly large collection of expensive and specialized books. And beyond that, the production of a modern catalog requires expertise of high caliber and the financial resources that facilitate the work. Because many libraries do not have such resources available, many collections have gone uncataloged or have been cataloged only in an incomplete fashion. The result for the scholar is a paucity of the kind of information that makes manuscript identification and location possible.
Existing and emerging electronic technologies present extraordinary opportunities for overcoming these challenges and underscore the need to create a long-term vision for the Electronic Access to Medieval Manuscripts project. Electronic access both to manuscript images and to bibliographic information presents remarkable opportunities. First, the distance between the manuscript and the reader vanishes, providing the opportunity for a researcher anywhere to consult the image of a manuscript in even the remotest location. Second, electronic access obviates the security issues and the preservation concerns that accompany usage. Furthermore, electronic access will permit the scholar to unite the parts of a manuscript that may have been taken apart, scattered, and subsequently housed at different sites. It also allows for image enhancement and manipulation that conventional reproductions simply do not make available. Electronic access will also make possible comprehensive searches of catalog records, research information, texts, and tools, with profound implications in terms of cost to the researcher and a more democratic availability of materials to a wider public.
One may imagine a research scenario that contrasts sharply with the conventional methods that have been the mainstay of manuscript researchers. Using a personal computer in an office, home, educational institution, or library, scholars will be able to log on to a bibliographic utility (e.g., RLIN or OCLC) or to an SGML database on the World Wide Web and browse catalog records from the major manuscript collections around the world. To make this vision a reality requires adherence to standards, however: content standards to ensure that records include the information that scholars need, and encoding standards to ensure that that information will be widely accessible both now and in the future.
This point may be demonstrated by considering several computer cataloging projects developed since the mid-1980s. These efforts include the Benjamin Catalogue for the History of Science, the International Computer Catalog of Medieval Scientific Manuscripts in Munich, the Zentralinventar Mittelalterlicher Handschriften (ZIH) at the Deutsche Staatsbibliothek in Berlin, MEDIUM at the Institut de Recherche et d'Histoire des Textes in Paris, and PhiloBiblon at the University of California, Berkeley. The Hill Monastic Manuscript Library has also embarked on several electronic projects to increase and enhance scholarly access to its manuscript resources. In 1985, Thomas Amos, then Cataloger of Western Manuscripts at HMML, began development of the Computer Assisted Cataloging Project, a relational database that he used to catalog manuscripts from Portuguese libraries filmed by HMML.
These electronic databases as well as others from manuscript institutions around the world represent an enormous advancement in scholarly communication in the field of manuscript studies. As in the case of printed catalogs and finding aids, however, these data management systems fall short of the ideal on several counts. First, each is a local system that must be consulted on-site or purchased independently. Second, the development and maintenance of these various databases involve duplication of time, money, and human resources. All rely on locally developed or proprietary software, which has posed problems for the long-term maintenance and accessibility of the information. Finally, and probably most important, each system contains its own unique set of data elements and rules and procedures for data entry and retrieval. When each of these projects was begun, its founders decided independently what information to record about a manuscript, how to encode it, and how to retrieve it. Each of the databases adopted a different solution to the basic problems of description and indexing, and the projects differed from each other with regard to completeness of the data entered and the modes in which it could be retrieved.
The lessons to be drawn from these experiences are clear and enunciate the hazards for the future if approaches distinctively different from the ones now being used are not pursued. First of all, local institutions could not maintain locally developed software and systems. In the instances of projects that chose to rely on proprietary software, it became apparent that the latter was dependent on support from the manufacturer, whose own longevity in business could not be guaranteed or who could easily abandon such software programs when advances provided new opportunities. Furthermore, experience has demonstrated that such material is not always easily translated into other formats, and if modified, it poses the same problems of maintenance as locally developed software. Beyond that, different projects made substantially different decisions about record content, and those decisions were sometimes influenced by the software that was available. This lack of consistency made it difficult to disseminate the information gathered by each project, and for their part funding agencies were reluctant to continue their support for such limited projects. All of which reiterates the fundamental need for content standards to ensure that records include the information that scholars need and encoding standards to ensure the wide accessibility of that information both now and into the future. It is the objective of Electronic Access to Medieval Manuscripts to address these issues.
Electronic Access to Medieval Manuscripts is sponsored by the Hill Monastic Manuscript Library, Saint John's University, Collegeville, Minnesota, in association with the Vatican Film Library, Saint Louis University, and has been funded by a grant from The Andrew W. Mellon Foundation. It is a three-year project to develop guidelines for cataloging medieval and renaissance manuscripts in electronic form. For this purpose it has assembled an international team of experts in manuscript studies and library and information science that will examine the best current manuscript cataloging practice in order to identify the information appropriate to describing and indexing manuscripts on two levels, core and detailed. Core-level descriptions, which will contain the basic or minimum elements required for the identification of a manuscript, will be useful for describing manuscripts that have not yet been fully cataloged, and may also be used to give access to detailed descriptions or to identify the sources of digital images or other information extracted from manuscripts. Guidelines for detailed or full descriptions will be designed to accommodate the kinds of information found in full scholarly manuscript cataloging.
In addition to suggesting guidelines for content, Electronic Access to Medieval Manuscripts will also develop standards for encoding both core-level and detailed manuscript descriptions in both MARC and SGML. The MARC (Machine-Readable Cataloging) format underlies most electronic library catalogs in North America and the United Kingdom, and it is used also as a vehicle for international exchange of bibliographic information. MARC bibliographic records are widely accessible through local and national databases, and libraries with MARC-based cataloging systems can be expected to maintain them for the foreseeable future. SGML (Standard Generalized Markup Language) is a platform-independent and extremely flexible way of encoding electronic texts for transmission and indexing. It supports the linking of texts and images, and SGML-encoded descriptions are easily converted to HTML for display on the World Wide Web. In developing standards for SGML encoding of manuscript descriptions, Electronic Access to Medieval Manuscripts will work closely with the Digital Scriptorium, a project sponsored jointly by the Bancroft Library at the University of California, Berkeley, and the Butler Library at Columbia University.
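To make the idea of an SGML-encoded core-level description concrete, the sketch below serializes a handful of fields as tagged text. Everything specific in it is an assumption for illustration: the project's guidelines had yet to be drafted, so the element names (msdesc, repository, shelfmark, title), the placeholder values, and the to_sgml function are hypothetical and do not represent the project's encoding.

```python
from html import escape
from typing import Dict


def to_sgml(record: Dict[str, str], root: str = "msdesc") -> str:
    """Wrap each field of a core-level description in an element of the same name.

    Element names are taken from the record's keys; they are illustrative
    placeholders, not elements defined by any published guideline.
    """
    lines = [f"<{root}>"]
    for element, value in record.items():
        # Escape &, <, and > so field text cannot be mistaken for markup.
        lines.append(f"  <{element}>{escape(value, quote=False)}</{element}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


# Placeholder core-level record (invented values, for illustration only).
core_record = {
    "repository": "Example Library",
    "shelfmark": "MS 1",
    "title": "Psalter & Hours",
}
encoded = to_sgml(core_record)
```

Because the markup is plain text with named elements, the same record could later be extended with the fields of a detailed description, or transformed into HTML for display, without depending on any one vendor's software.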
The project working group for Electronic Access to Medieval Manuscripts consists of representatives from a number of North American and European institutions. Drafts produced by the working group will be advertised and circulated to the international community of manuscript scholars for review and suggestions. The cataloging and encoding guidelines that result from the work of the project will be made freely available to any institution that wishes to use them.
For the purposes of Electronic Access to Medieval Manuscripts, the standards for cataloging medieval manuscripts are crucial, but so too is the application of content standards to the two encoding standards whose existence and ubiquitous usage address the issues noted earlier. At the risk of stating the obvious, Electronic Access to Medieval Manuscripts has chosen to work with two existing and widely used encoding standards because it is unwise for medievalists to reinvent the wheel and waste resources on solutions that are temporary and that will require added resources to take them into future applications.
With regard to encoding standards, the universal acceptance of MARC and the accessibility of MARC records on-line make it a particularly attractive option. But other compelling reasons make MARC an excellent choice. First, most libraries already have access to a bibliographic utility (such as OCLC or RLIN) that utilizes MARC-based records, and these institutions have invested considerable resources in creating catalog records for their printed books and other collections. Second, since most catalog records for printed books and reference materials are already in MARC-based systems, placing manuscript records in the same system makes good sense from the standpoint of proximity and one-stop searching. Third, by using MARC, local libraries need not develop or maintain their own database systems. Finally, although it may be unrealistic to expect that all manuscript catalog records will one day reside in a single database, thereby allowing for a universal search of manuscript records, it is likely that a majority of manuscript institutions in the United States will be willing to place their manuscript records in this bibliographic utility rather than in other existing environments.
Thus the value of selecting MARC as an encoding standard seems clear. MARC systems exist; they are widely accessible; they are supported by other, broader interests; and enough bibliographic data already exists in MARC to guarantee its maintenance or its automatic transfer to any future platform. A significant number of records for medieval manuscripts or microfilms of them, prepared and entered by the various institutions that hold these items, already exist in USMARC (RLIN and OCLC databases). Regrettably, there is generally little consistency in description, indexing, or retrieval for these records, which points back to the need for content standards as well as encoding standards. Furthermore, MARC as it currently exists has limits in its abilities to describe medieval manuscripts (e.g., it does not provide for the inclusion of incipits), but nonetheless it offers possibilities for short records that point to broader sets of data in other contexts. Still, MARC, with its records in existing bibliographic databases, is particularly advantageous for small institutions with few manuscript holdings, and it remains for them perhaps the most promising vehicle for disseminating information about their collections.
The second viable encoding option, particularly in light of the recent success of the Archival Finding Aid Project at the University of California, Berkeley, is the use of SGML. As a universal standard for encoding text, SGML can be used to encode and index catalog records and other data including text, graphics, images, and multimedia objects such as video and sound. A more flexible tool than MARC, SGML is more easily adapted to complex hierarchical structures such as traditional descriptions of medieval manuscripts, and it offers broad possibilities for encoding and indexing existing, as well as new, manuscript catalogs. As an encoding scheme, SGML demonstrates its value as a nonproprietary standard. In many respects it is much more flexible than MARC or any established database program, and it is possible to write a Document Type Definition (DTD) that takes into account the particular characteristics of any class of document. SGML offers the further advantage that encoded descriptions can be linked directly to digital images, sound clips (e.g., for musical performances), or other bodies of digital information relating to a manuscript. Numerous initiatives using SGML suggest great promise for the future. The experience of the American archival profession with the Encoded Archival Description (EAD) suggests that the latter can be a good approach to encoding manuscript descriptions, which have many structural analogies to archival finding aids. The Canterbury Tales project, based at Oxford, has demonstrated that SGML, based on a Text Encoding Initiative (TEI) format, can be used successfully to give sophisticated access to images of manuscripts, text transcriptions, and related materials. In addition, several English libraries have already experimented with SGML DTDs, mostly TEI-conformant, for manuscripts. And finally, MASTER, an Oxford-based group, is interested in developing a standard DTD for catalog descriptions of medieval manuscripts, and it and Electronic Access to Medieval Manuscripts have begun to coordinate their efforts toward achieving this common goal.
The emerging interconnectivity of MARC and SGML presents tremendous opportunities for Electronic Access to Medieval Manuscripts. Currently there is work on a DTD for the MARC format that will allow automatic conversion of MARC-encoded records into SGML. Recently, a new field (856) was added to the MARC record that will accommodate Web addresses. Implementation of this field will allow researchers seeking access to a cataloging record in a bibliographic utility to read the URL (Uniform Resource Locator) and then enter the address into a Web browser and link directly to a Web site containing a detailed manuscript record or other scholarly information. In the future, researchers who enter the bibliographic utility through a Web browser will find field 856 to be an active hypertext link. Electronic Access to Medieval Manuscripts envisions an environment in which institutions can enter their manuscript catalog records into MARC, display them in a bibliographic utility to maximize economy and access, and then embed a hypertext link to a more detailed catalog record, an image file, or scholarly information on an SGML server.
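The linking step described above can be illustrated with a minimal sketch in Python. It is not the project's software, and it uses no MARC library: the MarcField class, the two helper functions, the sample record, and the example.org address are all hypothetical. The only detail taken from the MARC format itself is that field 856 carries the Web address in its subfield u.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Tuple


@dataclass
class MarcField:
    """Hypothetical, simplified stand-in for one MARC field."""
    tag: str                          # three-character field tag, e.g. "856"
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs


def detailed_record_urls(record: List[MarcField]) -> List[str]:
    """Collect every Web address held in subfield u of an 856 field."""
    return [
        value
        for fld in record
        if fld.tag == "856"
        for code, value in fld.subfields
        if code == "u"
    ]


def html_link(url: str, text: str) -> str:
    """Render an address as the active hypertext link a Web catalog might show."""
    return '<a href="{}">{}</a>'.format(escape(url), escape(text))


# Hypothetical core-level record pointing to a fuller description elsewhere.
record = [
    MarcField("245", [("a", "Book of hours (sample core-level record)")]),
    MarcField("856", [("u", "http://example.org/manuscripts/ms-0001"),
                      ("z", "Detailed description")]),
]

for url in detailed_record_urls(record):
    print(html_link(url, "Detailed description"))
```

A researcher using a catalog without hypertext support would copy the address by hand; a Web-based catalog could emit the link directly, as the last two lines of the sketch do.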
The cumulative experience of recent years has shaped the development and goals of Electronic Access to Medieval Manuscripts. Concerned with arriving at standards for cataloging manuscripts in an electronic environment, the project seeks to provide standards for both core-level and full, or detailed, manuscript records that will serve the expectations and needs of scholars who seek consistent information from one library to another; at the same time, these standards will afford flexibility to those catalogers and libraries wishing to provide various levels of information about their individual manuscripts. In structuring its program and goals, Electronic Access to Medieval Manuscripts also has sought to arrive at guidelines for encoding into MARC and SGML formats that will provide useful, economic, and practical long-term alternatives to the libraries that select one of these options in the future.
A Unifying or Distributing Force?
There are several future trends that everyone seems to agree upon. They include
• widespread availability of computers for all college and university students and faculty
• general substitution of electronic for paper information
• library purchase of access to scholarly publications rather than physical copies of them
Early steps in these directions have been followed by many libraries. Much of this movement has taken the form of digitization. Unfortunately some of the digitized material is not used as much as we would like. This lack of interest may reflect the choice of the material to convert; realistically, nineteenth-century books that have never been reprinted or microfilmed may have been obscure for good reasons and will not be used much in the future. But some more general problems with the style of much electronic library material suggest that the difficulties may be more pervasive.
The primary means today whereby people gain access to electronic material is over the World Wide Web. The growth of the Web is amply documented at http://www.cyberatlas.com and similar sites. Predictions for the number of Web users worldwide in the year 2000 run up to 1 billion (Negroponte 1996); students have the highest Web usage of any demographic group, with about 40% of them in 1996 showing medium or high Web usage; and people have been predicting the end of paper libraries since at least 1964 (Samuel 1964). Web surfing appears to be substituting for TV viewing and CD-ROM purchasing, taking its share of the approximately 7 hours per day that the average American spends dealing with media of all forms. Advertisers are lining up to investigate Web users and find the best way to send product messages to them (Novak and Hoffman 1996). Figure 21.1 shows the growth of Web hosts just in the last few years.
On-Line Journals and the Web
Following the move of information to digital form, there have been many experiments with on-line journals. Among the best known projects of this sort are the TULIP project of Elsevier (Borghuis 1996) and the CORE project of Cornell, the American Chemical Society, Bellcore, Chemical Abstracts, and OCLC. These projects achieved more or less usage, but none of them approached the degree of epidemic success shown by the Web. The CORE project, for example, logged 87,000 sessions of 75 users, but when we ended access to primary chemical journals at Cornell, nobody stormed the library demanding the restoration of service. Imagine what would happen if the Cornell administration were to cut access to the Web.
In the CORE project (see Entlich 1996), the majority of the usage was from the Chemistry and Materials Science departments. They provided 70% of active users and 86% of all sessions with the journals. Various other departments at Cornell use chemical information (Food Sciences, Chemical Engineering, etc.) but make less use of the on-line journals. Apparently the overhead of starting to use the system and learning its use discouraged those who did not have a primary interest in it. Many of the users printed out articles rather than read them on-line. About one article was printed for every four viewed, and people tended to print an article rather than flip through the bitmap images. People accessed articles through both browsing and searching, but they read the same kinds of articles they would have read otherwise; they did not change their reading habits.
Some years ago the CORE project had compared the ability of people to read bitmaps versus reformatted text and found that people could read screen bitmaps just as fast as new text (Egan 1991). Yet in the actual use of the journals, the readers did not seem to like the page images. The Scepter interface provided a choice of page image or text format, and readers only looked at about one image page in every four articles. "This suggests that despite assertions by some chemists in early interviews that they particularly liked the layout of ACS journal pages, for viewing on-line they prefer reformatted text to images of those pages, even though they can read either at the same speed. The Web-like style is preferred for on-line viewing."
Perhaps it is not surprising that the Web is more popular than scientific journals. After all, Analytical Chemistry has never had the circulation among undergraduates of Time or Playboy. But the Web is not being used only to find out sports scores or other nonscholarly activities (30% of all Alta Vista queries are about sex; Weiderhold 1997). The Web is routinely used by students to access all kinds of information needed in classroom work or for research. When I taught a course at Columbia, the students complained about reading that was assigned on paper, much preferring the reading that was available on the Web. The Web is preferred not just because it has recreational content but also because it is a way of getting scholarly material.
The convenience of the Web is obvious. If I need a chart or quote from a Mellon Foundation report, I can bring it up in a few tens of seconds at most on my workstation. If I need to find it on paper and it isn't in my office, I'm faced with a delay of a few minutes (to visit the Bellcore library) and probably a few weeks (because, like most libraries, they are cutting back on acquisitions and will have to borrow it from somewhere else). The Web is so convenient that I frequently use it even to read publications that I do have in my office.
Web use is greeted so enthusiastically that volunteers have been typing in (or scanning) out-of-copyright literature on a large scale, as for example for Project Gutenberg. Figure 21.2 shows the number of books added to the Project Gutenberg archive each year in the 1990s; by comparison, in the entire 1980s, only two books were entered.
By comparison, some of the electronic journal trials seem disappointing. Some of the reasons that digital library experiments have been less successful than they might have been involve the details of access. Whereas Web browsers are by now effectively universal on campuses, the specific software needed for the CORE project, as an example, was somewhat of a pain for users to install and use. Many of the electronic library projects involve scanned images, which are difficult to manipulate on small screens, and they have rarely involved material that was designed for the kind of use common on computer systems. By contrast, most HTML material is written with the knowledge of the format in which it will be read and is adapted to that style. I even note anecdotal complaints that Acrobat documents are not as easy to read as normal Web pages.
Web pages in particular may have illustrations in color, and even animations, beyond the practical ability of any conventional publisher. Only one in a thousand pages of a chemical journal, for example, is likely to have a color illustration. Yet most popular Web pages have color (although the blinking colored ad banners might be thought to detract rather than help Web users). Also, Web pages need not be written to the traditional standards of publishing; the transparencies that represent the talk associated with a scholarly paper may be easier to read than the paper itself.
Such arguments suggest that the issue with the popularity of the Web compared with digital library experiments is not just content or convenience but also style. In the same way that Scientific American is easier to read than traditional professional journals, Web pages can be designed to be easier for students to read than the textbooks they buy now. Reasons might include the way material is broken into fairly short units, each of which is easy to grasp; the informal style; the power of easy cross-referencing, so that details need not be repeated; the extreme personality shown by some Web pages; and the use of illustrations as mentioned before. Perhaps some of these techniques, well known to professional writers, could be encouraged by universities for research writing.
The attractiveness of the newer Web material also suggests that older material will become less and less read. In the same way that vinyl records have suddenly become very old or that TV stations refuse to show black-and-white movies, libraries may find that the nineteenth-century material in many libraries disappears from the view of the students. Mere scanning to produce bitmaps results in material that cannot be searched and does not look like newly written text; scanning may produce material that, although more accessible than the old volumes, is still not as welcome to students as new material. How much conversion of the older bitmaps can be justified? Of course, many vinyl recordings are reissued on CD and some movies are colorized, but libraries are unlikely to have resources to do much updating. How can we present the past in a way that students will use? Perhaps the present will become a golden age for scholars because nearly the entire world supply of reference books will have to be rewritten for HTML.
Risks of the Web
Of course, access to Web pages typically does not involve the academic library or bookstore at all. What does this fact mean for the future of access to information at a university? There are threats to various traditional values of the academic system.
• Shared experience. Santayana wrote that it didn't matter what books students read as long as they all read the same thing. Will the great scattering of material on the Web mean that few undergraduates will be able to find somebody else who has been through the same courses reading the same books? When I was an undergraduate I had a friend who would look at people's bookshelves and recite the courses they had taken. This activity will become impossible.
• Diversity. Since we can always fear two contradictory dangers, perhaps the ease of getting a few well-promoted Web sites will mean that fewer sources are read. If nobody wants to waste time on a Web site that does not have cartoons, fancy color pictures, and animation, then only a few well-funded organizations will be able to put up Web sites that get an audience. Again, the United States publishes about 50,000 books each year, but produces fewer than 500 movies. Will the switch to the Web increase or decrease the variety of materials read at a campus?
• Quality. Much of the material on the Web is junk; Gene Spafford refers to Usenet as a herd of elephants with diarrhea. Are students going to come to rely on this junk as real? Would we stop believing that slavery or the Holocaust really happened if enough followers of revisionist history put up a predominance of Web pages claiming the reverse?
• Loyalty. It has already been a problem for universities that the typical faculty member in surface effect physics, for example, views as colleagues other experts in surface effect physics around the world rather than the other members of the same physics department. Will the Web create this disloyalty in undergraduates as well? Will University of Michigan undergraduates read Web pages from Ohio State? Can the Midwest survive that?
• Equality of access. Will the need for computers to find information produce barriers for people who lack money, good eyesight, or some kinds of interface-using skills? Universities want to be sure that all students can use whatever information delivery techniques are offered; is the Web acceptable to at least as wide a span of students as the traditional library is?
• Recognition. Traditionally, faculty obtain recognition and status from publishing in prestigious journals. High-energy physicists used to get their latest information from Physical Review Letters; today they rely on Ginsparg's preprint bulletin board at Los Alamos National Laboratory. Since this Web site is not refereed, how do people select what to read? Typically, they choose papers by authors they have heard of. So the effect of the switch to electronic publishing is that it is now harder for a new physicist to attract attention.
A broader view of threats posed by electronics to the university, not just those threats arising from digital library technology, has been presented by Eli Noam (1995). Noam worries more about videotapes and remote teaching via television and about the possibility that commercial institutions might attempt to supplant universities by offering cheap education based entirely on electronic technologies.
Should these institutions succeed in attracting enough customers to force traditional universities to lower tuition costs, the financial structure of present-day higher education would be destroyed. Noam recommended that universities emphasize personal mentoring and one-to-one instruction to take the greatest advantage of physical presence.
Similarly, Van Alstyne and Brynjolfsson (1996) have warned of balkanization caused by the preference of individuals to select specialized contacts. They point to past triumphs involving cross-field work, such as the history of Watson and Crick, trained in physics and zoology respectively. In their view, search engines can be too effective, since letting people read only exactly what they were looking for may encourage overspecialization.
As an example of the tendency toward seeking collaborators away from one's base institution, Figure 21.3 shows the tendency of multiauthored papers to come from more than one institution. The figures were compiled by taking the first issue each year from the SIAM Journal on Control and Optimization (originally named SIAM Journal on Control) and counting the fraction of multiauthored papers in which all the authors came from one institution. The results were averaged over each decade. Note the drop in the 1990s. There has also, of course, been an increase in the total number of multiauthored papers (in 1965 the first issue had 14 papers and every paper had only one author; the first issue in 1996 had 17 papers and only two were single-authored). But few of the multiple-authored papers today come from only one research institution.
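The tabulation behind such a figure is simple enough to sketch in Python. What follows is an illustration under stated assumptions, not the procedure actually used: the function names and the sample data are invented, each paper is represented only by the list of its authors' institutions, and an issue with no multiauthored papers (such as the 1965 issue) is assumed to be left out of the decade average because its fraction is undefined.

```python
from collections import defaultdict


def single_institution_fraction(papers):
    """Fraction of multiauthored papers whose authors share one institution.

    Each paper is a list with one institution name per author.
    Returns None when the issue has no multiauthored papers.
    """
    multi = [p for p in papers if len(p) > 1]
    if not multi:
        return None
    same = sum(1 for p in multi if len(set(p)) == 1)
    return same / len(multi)


def decade_averages(issues):
    """Average the yearly fractions over each decade.

    `issues` maps a year to the papers in that year's first issue.
    Years with an undefined fraction are skipped (an assumption).
    """
    by_decade = defaultdict(list)
    for year, papers in issues.items():
        fraction = single_institution_fraction(papers)
        if fraction is not None:
            by_decade[year // 10 * 10].append(fraction)
    return {d: sum(v) / len(v) for d, v in sorted(by_decade.items())}


# Invented sample data, for illustration only.
issues = {
    1965: [["A"], ["B"]],                                   # all single-authored
    1975: [["A", "A"], ["A", "B"], ["C"]],                  # fraction 0.5
    1976: [["A", "A"], ["B", "B"]],                         # fraction 1.0
    1995: [["A", "B"], ["A", "C"], ["D", "D"], ["E", "F"]], # fraction 0.25
}

print(decade_averages(issues))  # {1970: 0.75, 1990: 0.25}
```

Averaging the yearly fractions, rather than pooling all papers in a decade, gives each year equal weight regardless of how many papers its first issue contained; either choice would be defensible.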
Of course, there are advantages to the new technology as well, not just threats. And it is clear that the Web is coming, whatever universities do; this is the first full paper I have written directly in HTML rather than prepared for a typesetting language. Much of the expansiveness of the Web is all to the good; for many purposes, access to random undergraduate opinions, and certainly to their fact gathering, may well be preferable to ignorance. It is hard to imagine students or faculty giving up the speed with which information can be accessed from their desktops any more than we would give up cars because it is healthier to walk or ecologically more desirable to ride trains. How, then, can we ameliorate or prevent the possible dangers elaborated before?
Bellcore, like many corporations, has a formal policy for papers published under its name. These papers must be reviewed by management and others, reducing the chance that something sufficiently erroneous to be embarrassing or something that poses a legal risk to the corporation will appear. Many organizations do not yet have any equally organized policy for managing their Web pages (Bellcore does have such a policy that deals with an overlapping set of concerns). Should universities have rules about what can appear on their Web pages? Should such rules distinguish between what goes out on personal versus organizational pages?
Should the presence of a page on a Harvard Web page connote any particular sign of quality, similar to the appearance of a book under the Harvard University Press imprint? Perhaps a university should have an approved set of pages, providing some assurance of basic correctness, decency of content, and freedom from viruses; then people wishing to search for serious content might restrict their searches to these areas.
The creation of a university Web site as the modern version of a university press or a journal offers a sudden switch from publishers back to the universities as the providers of information. Could a refereed, high-prestige section of a university Web site attract the publication that now goes to journals? Such a Web site would provide a way for students to find quality material and would build institutional loyalty and shared activities among the members of the university community. Perhaps the easiest way of accomplishing this change would be to make tenure depend on contributions to the university Web site instead of contributions to journals.
The community could even be extended beyond the faculty. Undergraduate papers could be placed on a university Web site; one can easily imagine different parts of the site for different genres ranging from the research monograph to the quip of the day. This innovation would let all students participate and get recognition; but some quality control would need to be imposed, and presence on the Web site would need to be recognized as an honor.
In addition to supporting better quality, a university Web site devoted to course reading could make sure that a diversity of views is supported. On-line reading lists, just like paper reading lists, can be compiled to avoid the problem of everyone relying on the same few sites. This diversity would be fostered if many of the search engines were to start making money by charging people to be listed higher in the list of matches (a recurrent rumor, but perhaps an urban legend). Such an action would also push students to look at sites that perhaps lack fancy graphics and animation.
Excessive reliance on a university Web site could produce too much inbreeding. If university Web sites replace the publications that now provide general prestige, will it be possible for a professor at a less prestigious university to put an article on, say, the Harvard or Stanford Web site? If not, how will anyone ever advance? I do not perceive that this problem will occur soon; the reverse (a total lack of organizational identification) is more likely.
Web sites of this sort would probably not include anonymous contributions. The Net is somewhat overrun right now with untraceable postings that often contain annoying or inflammatory material ranging from the merely boring commercial advertising to the deliberately outrageous political posting. Having Web sites that did not allow this kind of material might help to civilize the Net and make it more productive.
Some professors already provide Web reading lists that correspond to the traditional lists of paper material. The average Columbia course, for example, has 3,000 pages of paper reading (with an occasional additional audiotape in language courses). The lack of quality on the Web means that faculty must provide guidance to undergraduates about what to read there.
More important, it will be necessary for faculty to teach the skill of looking purely at the text of a document and making a judgment as to its credibility. Much of our ability to evaluate a paper document is based on the credibility of the publisher. On the Web, students will have to judge by principles like those of paleography. What do we know, if anything, about the source? Is there a motive for deception? Does the wording of the document read as credible or as excessively emotional? Do facts that we can check elsewhere agree with those other sources?
The library will also gain a new role. Universities should provide a training service for how to search the Web, and the library is the logical place to provide that training. This logic is partly because librarians are trained in search systems, which are rarely studied formally by any other groups. In addition, librarians will need to keep the old information sources until most students are converted, which will take a while.
The art of learning to retrieve information may also bring students together. I once asked a Columbia librarian whether the advent of computers and networks in the dormitory rooms was creating a generation of introverted nerds lacking social skills. She replied that the reverse was true. In the days of card catalogs, students were rarely seen together; each person searched the cards alone. Now, she said, she frequently sees groups of two or three students at the OPAC terminals, one explaining to the others how to do something. Oh, I said, so you're improving the students' social skills by providing poor human interface software. Not intentionally, she replied. Even with good software, however, there is still a place for students helping each other find information, and universities can try to encourage this interaction.
Much has been written about the information rich versus the information poor and the fear that once information must be obtained via machines that cost several thousand dollars, poor people will be placed at a still greater disadvantage in society than they are today. In the university context, money may not be the key issue, since many university libraries provide computers for general use. However, some people face nonfinancial barriers to the use of electronic systems. These barriers may include limited eyesight or hearing (which of course also affect the use of conventional libraries). More important, perhaps, is the difficulty that some users may have with some kinds of interface design. These difficulties range from relatively straightforward issues such as color blindness to complex perceptual issues involving different kinds of interfaces and their demands on different individuals. So far, we do not know whether some users will have trouble with whatever becomes the standard information interface; in fact, we do not know whether some university students in the past had particular difficulties learning card catalogs.
The library may also be a good place to teach aspects of collaboration and sharing that will grow out of researching references, as hyperlinking replaces traditional citation. Students are going to use the Web to cooperate in writing papers as well as in finding information for them. The ease of including (or pointing to) the work of others is likely to greatly expand the extent to which student work becomes collaborative. Learning how to do collaborative work effectively and fairly is an important skill that students can acquire. In particular, the desire to make attractive multimedia works, which may need expertise in writing, drawing, and perhaps even composing music, will drive us to encourage cooperative work.
Students could also be encouraged to help organize all the information on the local Web site. Why should a student create a Web page that prefers local resources? Perhaps because the student receives some kind of academic credit for doing so. University Web sites, to remain useful, will require constant maintenance and updating. Who is going to do that? Realistically, students.
Applets that implement animation, interactive games, and many other new kinds of presentation modes are proliferating on the Web. The flowering of creativity in these presentations should be encouraged. In the early days of movies and television, the amount of equipment involved was beyond the resources of amateurs, and universities did not play a major role in the development of these technologies. By contrast, universities are important in American theater and classical music. The Web is also an area in which equipment is not a limitation, and universities have a chance to play a role.
This innovation represents a chance for the university art and music departments to join forces with the library. Just as the traditional tasks of preparing reading lists and scholarly articles can move onto a university Web site, so can the new media. The advantage of collaborating with the library is that we can actually save the beginnings of a new form of creativity. We lack the first e-mail message; nobody understood that it was worth saving. Much of early film (perhaps half the movies made before 1950) no longer survives. The 1950s television entertainment is mostly gone for lack of recording devices. In an earlier age, the Elizabethans did not place a high value on saving their dramatic works; of the plays performed by the Admiral's Men (a competitor to Shakespeare's company), we have only 10% or 15% today. We have a chance not to make the same mistake with innovative Web page designs, provided that such pages are supported in some organized way rather than on computers in individual student dorm rooms.
Recognizing software as a type of scholarship is a change for the academic community. The National Science Foundation tends to say, "we don't pay for software, we pay for knowledge," drawing a sharp distinction between the two. Even computer science departments have said that they do not award a Ph.D. for writing a program. The new kinds of creativity will need a new kind of university recognition. Will we have honorary Web pages instead of honorary degrees? We need undergraduate course credit and tenure consideration for Web pages.
Software and data are new kinds of intellectual output that are not considered creative. Traditionally, for example, the design of a map was considered copyrightable; the data on the map, although representing more of the work, were not considered part of the design and were not protected. In the new university publishing model, data should be a first-class item whose accumulation and collection is valuable and leads to reward.
A switch toward honoring a Web page rather than a paper does have consequences for style, as discussed above. Web pages also have no size constraints; in principle, there is no reason why a gigabyte could not be published by an undergraduate. Universities will need to develop both tools and rules for summarizing and accessing very large items as needed.
Academic institutions, in order to preserve access to quality information while also preserving some sense of community in a university, should take a more active view of their Web sites. By using the Web as a reward and as a way of building links between people, universities could serve a social purpose as well as an information purpose. The ample space and low cost of Web publishing provide a way to extend the intellectual community of a university and to make it more inclusive. Web publishing may encourage students and faculty to work together, maintaining a local bonding of the students. The goal is to use university Web publishing, information searching mechanisms, and rewards for new kinds of creativity to build a new kind of university community.
Borghuis, M., H. Brinckman, A. Fischer, K. Hunter, E. van der Loo, R. Mors, P. Mostert, and J. Zilstra. 1996. TULIP Final Report. (New York: Elsevier Science Publishers, 1996, ISBN 0-444-82540-1). See http://www1.elsevier.nl/homepage/about/resproj/trmenu.htm on the Web.
Egan, D., M. Lesk, R. D. Ketchum, C. C. Lochbaum, J. Remde, M. Littman, and T. K. Landauer. 1991. "Hypertext for the Electronic Library: CORE Sample Results." Proc. Hypertext 91, 229-312. San Antonio, Texas, 15-18 December.
Entlich, R., L. Garson, M. Lesk, L. Normore, J. Olsen and S. Weibel. 1996. "Testing a Digital Library: User Response to the CORE Project." Library Hi Tech 14, no. 4: 99-118.
Negroponte. 1996. "Caught Browsing Again." Wired, issue 4.05 (May). See http://www.hotwired.com/wired/4.05/negroponte.html on the Web.
Noam, Eli M. 1995. "Electronics and the Dim Future of the University." Science 270, no. 5234 (13 October): 247.
Novak, T. P., and D. L. Hoffman. 1996. "New Metrics for New Media: Toward the Development of Web Measurement Standards." Project 2000 White Paper, available on the Web at http://www2000.ogsm.vanderbilt.edu.
Samuel, A. L. 1964. "The Banishment of Paperwork." New Scientist 21, no. 380 (27 February): 529-530.
Van Alstyne, Marshall, and Erik Brynjolfsson. 1996. "Could the Internet Balkanize Science?" Science 274, no. 5292 (29 November): 1479-1480.
Wiederhold, Gio. 1997. Private communication.
Digital Documents and the Future of the Academic Community
Today the academic community is the subject of an experiment in technological innovation. That experiment is the introduction of digital documents as a new currency for scholarly communication, an innovation that some think will replace the system of print publication that has evolved over the past century. What will be the long-term consequences of this innovation for the conduct of research and teaching, for the library and the campus as places, and ultimately for our sense of academic community?
This work on technology and scholarly communication has been focused upon one dimension of this process of innovation: the economics of scholarly publishing. The central focus has been to understand the formation and dynamics of new markets for digital information: will digital modes of publication be more cost-effective than print modes, both for publishers and for libraries? We must learn how readers use on-line journals and how the journal format itself will evolve in a digital medium: will digital publications change the form and content of scholarly ideas? Together these papers investigate the emerging outline of a new marketplace for ideas, one that will probably be reshaped by new kinds of intellectual property law, but that will certainly include new kinds of pricing, new products, and new ways of using information. These questions are important economically, yet if we knew the answers, would we know enough to understand the impact of technological innovation upon the academic community for which the system of scholarly publication serves as an infrastructure?
One reason this question must be asked is the debate about what economists call "the productivity paradox." This is the observation that the introduction of information technology into the office has not increased the productivity of knowledge workers thus far, unlike the productivity gains that technology has brought to the process of industrial production. Yet Peter Drucker has described the productivity of knowledge workers as the key management problem of the twenty-first century. And more recently, Walter Wriston has described information as a new kind of capital that will be the key to wealth in the economy of the future, saying: "The pursuit of wealth is now largely the pursuit of information, and the application of information to the means of production." Why, then, does it seem that information technology has not increased the productivity of knowledge workers?
Erik Brynjolfsson has defined three dimensions within which an explanation of the productivity paradox might be found: (1) perhaps this paradox is a measurement problem, since the outcomes of work mediated by information technology may not fit traditional categories and are perhaps difficult to measure with traditional methodologies; (2) the productivity paradox might be a consequence of the introduction of very different cultures of work based on new kinds of incentives and skills; hence productivity gains may require redesign and reorganization of work processes previously based on printed records; (3) perhaps information technology creates new kinds of economic value (such as variety, timeliness, and customized service), which change the very nature of the enterprise by introducing new dimensions and qualities of service.
Although the analysis of the impact of information technology on scholarly communication has only indirectly been a discussion about productivity thus far, it should be understood that the question of productivity raises issues about changes in academic culture and the organization of academic work, not just about its efficiency. For the purposes of this discussion, however, what is of immediate interest to me as a political theorist and university librarian is the way the productivity paradox frames the possible dimensions of the dynamics of technological innovation, thereby setting a research agenda for the future. How might our understanding of the outcomes or impact of research, teaching, and learning change? How might the incentives for academic work evolve, and would the organization of the process of research and teaching change? Will new kinds of value be introduced into academic work, changing its cultures, and will traditional kinds of value be lost? This broader research agenda provides a context for discussing the price, supply, and demand for digital publications.
In sum, how might the substance and organization of academic work change as information technology changes the infrastructure of scholarly communication? To borrow a term from a very different economic tradition, the question of the social impact of information technology concerns the mode of production, that is, the complex of social relationships and institutions within which academic work is organized, within which the products of academic work are created and consumed, and within which cultural valuation is given to academic work. In the course of this exploration of the changing modes of production that govern knowledge work, it will be necessary to think seriously about whether printed knowledge and digital information are used in the same way, if we are to understand the nature of demand; about the new economic roles of knowledge, if we are to understand issues of price and supply; and about how the management of knowledge might be a strategy for increasing the productivity of knowledge workers, if we are to understand the future of institutions like the research library and disciplinary guilds.
The System of Scholarly Communication
The idea that there is a system of scholarly communication was popularized by the American Council of Learned Societies newsletter Scholarly Communication, which began by publishing a survey on the impact of personal computers on humanities research in 1985. "Scholarly communication" is a term invented to frame both print publication and digital communication within a single functional schema, tacitly asserting a continuity between them. It is this continuity that is in question, not least because the term "scholarly communication" encompasses the very research processes that are obviously being transformed by information technology, resulting in the creation of new kinds of information products and services that were not part of the scholarly publishing marketplace in the print era. These products and services include, for example, patents on methodological procedures and genetic information; software for gathering, visualizing and analyzing data; information services, such as document delivery and databases; network services; electronic mail, mailing lists, and Web pages; and electronic journals and CD-ROMs.
Today each of the institutional parts of the system of scholarly communication built over the past 50 years (research universities, publishing, and libraries) is changing, and it is unlikely that a new equilibrium will resemble the old. This system is unusual, perhaps, in that different participants perceive it from very different, perhaps contradictory, perspectives. From the perspective of the academic community, both the production and consumption of scholarly information are governed by a culture of gift exchange: production by the faculty as members of scholarly guilds, and consumption by free access to information in libraries. In gift exchange cultures, information is exchanged primarily (although not necessarily exclusively) in order to create and sustain a sense of community greater than the fragmenting force of specialization and markets. From the perspective of publishing, however, production for the academic marketplace is centered on the faculty as authors, a relationship governed by contract, and consumption is centered on academic research libraries, governed by copyright law in print publishing and by contract law in digital publishing. It is this variation in perspective, perhaps, that leads each side to hope that digital documents will replace printed journals without changing other aspects of the system of scholarly communication.
Gift and market exchange are symbiotic, not opposites. If scholarly publishing is governed by the rules of market exchange, it must manage the boundaries between two gift cultures, that within which knowledge is created and that within which knowledge is consumed. The crisis of scholarly communication has made these boundaries very difficult to manage, as ideas from the university are turned into intellectual property, then sold back to the university to be used as a common good in the library.
Why the crisis in boundary management? The immediate crisis that has destabilized the system is the problem of sharply increasing costs for scholarly journals in science, technology, and medicine. The causes of the crisis are varied, but they begin with the commercialization of scholarly publishing, a dramatic shift from nonprofit to for-profit publishing since the 1950s, that created the hybrid gift/market system. In turn, the historic growth in the amount of scientific, technical, and medical information, driven by federal funding, has increased costs, particularly as specialization has created journals with very small markets. And the waning of a sense of the legitimacy of library collection costs within the university has allowed the rate of growth of collection budgets to fall far below the rate of price increases. Even with cost/price increases, the academic gift economy still subsidizes the market economy, for faculty give away their intellectual property to the publishers, yet remarkably, those who subsidize research do not yet make a property claim on the research they support. Subsidies include, for example, the federal funding of research, institutional subsidies, and the voluntary labor of faculty in providing editorial services to publishers.
This system evolved at the turn of the twentieth century as a subsidy for nonprofit university presses and disciplinary society publishers in order to circulate scholarly information and build a national intellectual infrastructure. Since 1950, however, federal research funding and commercial publishing have reshaped the system, creating the hybrid market-gift exchange system with many unrecognized cross subsidies.
Higher education is both the producer and consumer of scholarly publications. As creators of scholarship, faculty are motivated by nonmarket incentives, primarily promotion and tenure; yet at the same time, faculty see themselves as independent entrepreneurs, managing a professional career through self-governed disciplinary guilds that cross all educational institutions. This guildlike structure is a deliberate anachronism, perhaps, but one that sustains a sense of professional identity through moral as well as material rewards.
Scholarly publications are consumed within a gift culture institution called the library, a subsidized public good within which knowledge appears to the reader as a free good. Publishers would add that this gift culture is, in turn, subsidized by the owners of intellectual property through the fair use and first sale doctrines, which generally allow copyrighted information to be consumed for educational purposes.
The ambiguity at the boundary of gift and market extends to institutions of higher education as well, which are simultaneously corporation and community. But the dominant factor that has shaped the last 50 years of higher education is that universities have become a kind of public interest corporation that serves national policy goals. Just as the Morrill Act created land grant colleges to promote research and education for the development of the agricultural economy, modern research universities have been shaped by federal research funding since World War II as "milieus of innovation," functioning as tacit national laboratories for a polity uncomfortable with the idea of a formal industrial policy.
This system of scholarly communication is in crisis. Consider, for example, the possible consequences for this system if some of the ideas and questions being debated nationally were to come to pass:
• What is the future of university research? Does the research university still play a central role as a national milieu for innovation, or has the corporation become the focus of innovative research and national information policy?
• What is the future scope of higher education? Historically, colleges and universities have had a tacit monopoly of the education market based on accreditation and geographical proximity, but instructional technology and distance education have created new markets for education. With the Western Governors' University proposal, a national market for education would be created based on selling teaching services that will be evaluated by examination rather than the accreditation of traditional institutional settings for education. Moreover, corporate training and for-profit education are on the verge of competing directly with some sectors of education.
• What is the future of the library as a public good? In the polity, the idea of a national digital library has been modeled upon the universal access policies governing telephone and electric utilities. Here the public good is fulfilled by the provision of "access," but it will be the consumer's responsibility to pay for information used. The cultural legitimation crisis of public institutions within the polity extends to the funding of academic research libraries within the university as well.
• What is the future of the academic disciplines in a world of increasing specialization that makes it difficult for traditional disciplines to find common terms of discourse and at a time in which disciplinary metamorphosis is now creating new fields like molecular biology, neuroscience, cultural studies, and environmental science?
• What is the future of fair use? Rights that exist in print are not being automatically extended to the use of digital works. Federal policy discussions about intellectual property in the digital environment have not included fair use, giving priority to the creation of a robust market in digital publication and the creation of incentives for the publication of educational works.
These are questions, not predictions, but they are questions that are being discussed in the polity, so they are not mere speculation. They are intended only to point out that the system of scholarly communication is a historical creation, a response to certain conditions that may no longer exist.
Three new factors define the conditions within which a system of scholarly communication may evolve. First, the emergence of a global economy in which intellectual property is an important source of wealth creates a context in which the value of scholarly research may be a matter of national interest extending far beyond the traditional concerns of the academy. Second, the end of the cold war as a stimulus for national information policy that took the form of federal patronage of university research may fundamentally change the shape and content of federal funding for research. And third, information technology has created global communication, enabling new links between researchers around the world and creating the possibility that the intellectual disciplines of the future will develop paradigms and concerns that transcend national boundaries.
Digital Documents and Academic Productivity
What is the nature of digital documents as an innovation, that it is possible to ask whether they might affect the value of information and its use and the organization of academic research? Geoffrey Nunberg has identified two differences between digital and mechanical technologies that affect both the value of knowledge and the organization of its reproduction.
Unlike mechanical antecedents like the printing press, the typewriter, or the telegraph, the computer isn't restricted to a single role in production and diffusion. In fact, the technology tends to erase distinctions between the separate processes of creation, reproduction and distribution that characterize the classical industrial model of print commodities, not just because the electronic technology employed is the same at each stage, but because control over the processes can be exercised at any point.
... The second important difference between the two technologies follows from the immateriality of electronic representations and the resulting reductions in the cost of reproduction.
The fundamental consequence of these differences, Nunberg argues, is that the user has much greater control of the process of digital reproduction of knowledge as well as its content, essentially transforming the meaning of publication by allowing the reader to replace the author in determining the context and form of knowledge.
However, these differences in the process of the reproduction of ideas do not apply to all digital information, only to information that is "born digital," sometimes also called "digital documents." Today's marketplace consists largely of digitized documents, that is, works written for and reproduced in printed journals, then scanned and distributed on the network. Digitized documents conform to the modes of production of print journals: to the rhetorical rules of the genre of the scientific article and to the traditional relationships between author, publisher, and reader. If prior processes of technological innovation hold in this case, however, digitized documents represent only a transitional stage, one in which the attempt is made to use new technologies to increase the productivity of traditional modes of production and to reinforce traditional authority patterns. CD-ROM technology is a good example of the attempt to preserve the traditional modes of production yet take advantage of the capability of digital signals to include multimedia, by packaging them within a physical medium that can be managed just like a printed commodity. The immateriality of networked information is much more difficult to control, although encryption and digital watermarking are technologies that give to digital signals some of the characteristics that enable print copyright to be regulated.
The interesting points to watch will be whether the content of digital and print versions of the same works begin to diverge and whether readers will be allowed to appropriate published digital works and reuse them in new contexts. Markets are made by consumers, not just by publishers, and the fundamental question concerns the future of readers' behavior as the consumers of information. What, for example, is the unit of knowledge? Will readers want to consume digital journals by subscription? Or consume single articles and pay for them as stand-alone commodities through document delivery? Or treat a journal run as a database and pay for access to it as a searchable information service? As Nunberg points out, the intersection of technology and markets will be determined by the nature of the digital signal, which unifies the processes of production, reproduction, and use of information.
In thinking about the nature of digital documents and the kind of social relationships that they make possible, consider the credit card, which may well be the most successful digital document thus far. The credit card itself is only an interface to liquid cash and credit, taking advantage of mainframe computer technology and computer networks to manage market transactions wherever they occur around the world. It replaces printed currency and portable forms of wealth such as letters of credit and traveler's checks with a utility service. It creates new kinds of value: liquidity, through an interface to a worldwide financial system; timeliness and access, through 24-hour service anywhere in the world; and customized or personalized service, through credit. These new kinds of value are not easily measured by traditional measures of productivity; Brynjolfsson notes that by traditional measures, the ATM seems to reduce productivity by reducing the use of checks, the traditional output measure of banks. Yet to characterize the new kinds of value simply as improvements in the quality of service is not a sufficient description of the value of credit or debit cards, since they have created entirely new kinds of markets for financial services and a new interface for economic activity that supports entirely new styles of life, creating a mobile society.
One of these new markets is worthy of a second look, not only as an example of innovation, but to explore the reflexive quality of digital documents. When I use a debit card, a profile of my patterns of consumption is created, information that is of economic value for advertising and marketing; thus I often receive coupons for new or competing products on the back of my grocery receipt. Information about my use of information is a new kind of economic value and the basis of a new kind of market when used by advertisers and market analysts. In tracking the use of digital services, network technologies might also be described as keeping the consumer under surveillance. Issues of privacy aside, and they are not sufficiently recognized as yet, this tracking will make possible an entirely new, direct, and unmediated relationship between consumer and publisher.
Thus the discussion of protecting intellectual property on the Internet has focused not only on technologies that allow for the control of access to copyrighted material, but also on technologies that audit the use of information, including requirements for the authentication of the identity of the user and tracking patterns of use. The consequences of this reflexivity may well reflect a fundamental shift in how we conceive of the value of information. While markets for physical commodities were regulated by laws and inventory management techniques, markets for digital services will focus on both the content and use of information and will use the network as a medium for knowledge management techniques.
To summarize this process of innovation, credit cards might be described in productivity terms as an efficient new way to manage money, but they might also be described as creating entirely new genres of wealth, literally a new kind of currency; as new ways of life that create new kinds of social and geographical mobility; and in terms of the new kinds of markets and organizations that they make possible. Digitized documents may lower the costs of reproduction and distribution of print journals and perhaps some first-copy costs, but they also create new kinds of value in faster modes of access to information, new techniques for searching, and more customized content. And in the longer run, true digital documents will produce new genres of scholarly discourse, new kinds of information markets, and perhaps new kinds of educational institutions to use them.
At the moment these new possibilities tend to be discussed in terms of the capacity of the new technology to disrupt the laws, cultures, and organizations that have managed research, reading, publishing, and intellectual property in the era of print. Most prominent among these disruptions has been the discussion of the protection of copyright on the Internet, but there is also active concern about the social impacts of digital documents. For example, we have just identified the problem of privacy and surveillance of networked communication, a capacity for surveillance that has already begun to change the nature of supervision in the workplace. Or, to take a second kind of social impact, pornography on the Web has been defined as a social problem involving the protection of children. But these problems are only two examples of a broader issue concerning the impact of a global communications medium on local norms, for the scope of the network transcends the jurisdiction even of national regulatory authorities. There is discussion about the quality of social relationships in Cyberia, negatively manifested by the problem of hostile electronic mail and positively manifested by emerging forms of virtual community. And in national information policy, debate continues about the proper balance between the public interest in access to information and the commercialization of information in order to create robust information markets.
To summarize, digital technology is not so much about the introduction of intelligent machines, a process that Wriston described as "the application of information to the means of production," as it is about the productivity of knowledge workers. The process of technological innovation implies social and economic change and will be marked by changing knowledge cultures and new genres, which support new styles of life; by changing modes of production, which are likely to be manifested in new kinds of rhetoric, discourse, and new uses of information; and by new forms of communication and community, which will be the foundation of new markets and institutions.
Digital Documents and Academic Community
In an essay called "The Social Life of Documents," John Seely Brown and Paul Duguid have argued that documents should not be interpreted primarily as containers for content, but as the creators of a sense of community. They say, "the circulation of documents first helps make and then helps maintain social communities and institutions in ways that looking at the content alone cannot explain. In offering an alternative to the notion that documents deliver meaning, [there is a] connection between the creation of communities and the creation of meaning." That is, the central focus of analysis should not be on the artifact itself or even, perhaps, on the market as the primary social formation around documents, but instead the focus should be on the function of the artifact and market in creating and sustaining the social worlds, or communities, of the readers. Here, at last, we have identified the missing subject for discussions of the impact of technology and digital documents on social change: the academic community.
Recently the business management literature has begun to consider an interesting variant of this thesis, namely that the formation of a sense of community within electronic commerce on digital networks is the precondition for the creation and sustenance of markets for digital services. For example, John Hagel III and Arthur G. Armstrong argue that producers of digital services must adapt to the culture of the network.
By giving customers the ability to interact with each other as well as with the company itself, businesses can build new and deeper relationships with customers. We believe that commercial success in the on-line arena will belong to those who organize virtual communities to meet multiple social and commercial needs.
Whereas producers controlled traditional markets, they argue, the information revolution shifts the balance of power to the consumer by providing tools to select the best value, creating entirely new modes of competition. The markets of the future will take the form of virtual communities that will be a medium for "direct channels of communication between producers and customers" and that will "threaten the long-term viability of traditional intermediaries" (p. 204). In the context of scholarly communication, "traditional intermediaries" would mean libraries and perhaps educational institutions as well.
The questions concerning productivity and technological innovation might now be reconstituted as a kind of sociology of knowledge: What kind of academic community first created print genres and was in turn sustained by them? What kind of community is now creating digital genres and is in turn sustained by them? And what is the relationship between the two, now and in the future?
On a larger scale, the relationship between community and digital documents is a problem in national information policy. In the United States, national information policy has tended to focus on the creation of information markets, but the broader discussion of the social and political impact of digital communications has been concerned with issues of community. For example, the Communications Decency Act and subsequent judicial review have concentrated on Internet pornography and its impact on the culture and mores of local communities. Social and political movements ranging from Greenpeace to militia movements have used the Internet to organize dissent and political action; is this protected free speech and association? Universities are concerned about the impact of abusive electronic mail on academic culture.
The bridge between technology and community is suggested by the elements in the analysis of productivity: how new technologies add new value, create new incentives, and enable new kinds of organization. Brown and Duguid argue that our nation's sense of political community was created by newspapers, not so much in the content of the stories, but in their circulation:
Reaching a significant portion of the population, newspapers helped develop an implicit sense of community among the diverse and scattered populace of the separate colonies and the emerging post-revolutionary nation.... That is, the emergence of a common sense of community contributed as much to the formation of nationhood as the rational arguments of Common Sense. Indeed the former helped create the audience for the latter.
Similarly, and closer to the issue of scholarly communication, the scientific letters that circulated among the Fellows of the Royal Society were the prototype for scientific journals, which in turn sustained scholarly disciplines, which are the organizing infrastructure for academic literature and departments. New forms of value, which is to say new uses of information, create new genres of documents, which in turn create literature, which serves as the historical memory for new forms of community.
In the case of print and digital documents, change is not evolutionary because these two kinds of information offer different kinds of value, but they are not opposites. Genre, for example, has been shaped by the physical characteristics of the print medium, including the design within which information is presented (e.g., page layout, font, binding) as well as the rhetorical norms governing the structure of information (e.g., essay, scientific article, novel). Rhetoric has been described as a structure to govern the allocation of attention, the scarcest resource of modern times. Our frequent complaints of information overload may well reflect the early stage in the development of rhetorical structures for digital media. Certainly we face more information and more kinds of information, but the real problem reflects the difficulty in determining the quality of digital information (e.g., the lack of reputation and branding); or the difficulty of knowing which kind of information is relevant for certain kinds of decisions (e.g., the problem of productivity); or the relatively primitive rhetorical practices that govern new media (e.g., the problem of flaming in electronic mail).
Consider, for example, the technology of scientific visualization and multimedia. Thus far, visual media tend to be consumed as entertainment, which requires us to surrender our critical judgment in order to enjoy the show. Thus the problem of the quality of multimedia information is not simply technical, but requires the development of new genres and rhetorical norms within which visual media are used in a manner consistent with academic values such as critical judgment.
Or, consider some of the new genres for digital documents, which might well be described as adding new kinds of value to information: hypertext, the Boolean search, and the database. Most of us did not learn to read these new genres in childhood as part of the process of becoming literate, and we lack conceptual models for learning them other than those derived from our experience with print; there is, perhaps, a generational difference in this regard. The database raises new problems in defining the unit of knowledge: will consumers read the digital journal or the digital articles, or will the unit of knowledge be the screen, the digital analog of the paragraph, which is identified by a search engine or agent? HTML raises the question: who is responsible for the context of information, the author or the reader? As readers jump from text to text, linking things that had not previously been linked, they create context and therefore govern meaning, and reading becomes a kind of performing art. These questions might be described, perhaps, as a legitimation crisis, in that the traditional authorities that governed or mediated the structure and quality of print are no longer authoritative: the author, the editor, the publisher, and the library. Who are the new authorities?
Information technology was designed to solve the information problems of engineers and scientists, becoming instantiated into the hardware and software templates within which new genres and rhetorical forms have evolved, thence into computer literacy training for nontechnical users, and thence the user skills and modes of reading that the younger generation thinks of as intuitive relationships with the machine. Hypertext, for example, turns narrative into a database, which is a highly functional strategy for recovering specific bits of information in scientific research, as, for example, in searching for information with which to solve a problem. Electronic mail is a highly efficient means for exchanging messages but has little scope for lexical or rhetorical nuance. This limitation has little effect on groups sharing a common culture and background, but it becomes a problem given the diverse social groups that use electronic mail as a medium for communication today, hence, the frequency of flaming and misunderstanding.
In any case, as sociologists like Bruno Latour have noted, the original intent of the designers of a technology does not necessarily govern the process of technological innovation, for the meaning and purpose of a technology mutates as it crosses social contexts. Thus the problem may not best be posed in terms of an emerging cultural hegemony of the sciences and technology over academic institutions, although many of us do not find it intuitive to be giving "commands" to a computer or pressing "control" and "escape" keys.
But the cultural and organizational consequences of information technology need to be thought about, as technologies designed for the military, business, and science are introduced across the academic community. Thus far the discussion of this topic has occurred primarily in the context of thinking about the uses of distance education, which is to say, the extension of the scope of a given institution's teaching services to a national, or perhaps global, market. But there is a broader question about the nature of the academic community itself in a research university: what is the substance of this sense of community, and what sustains it?
It is often claimed that digital communication can sustain a sense of virtual community, but what is meant by virtual, and what is meant by community in this context? The literature on social capital, for example, argues that civic virtue is a function of participation, and those who participate in one voluntary social activity are highly likely to participate in others, creating a social resource called civil society or community. Robert Putnam argues that television, and perhaps other media, are a passive sort of participation that replace and diminish civic communities. The question is whether today's virtual communities represent a kind of social withdrawal or whether they might become resources for social participation and community. If this community is an important goal of digital networks, how might they be better designed to accomplish this purpose? Can networks be designed to facilitate the moral virtues of community, such as trust, reciprocity, and loyalty?
And finally, to return to the question of the productivity of knowledge workers in an information society, and mindful of the heuristic principle that documents can be understood in terms of the communities they sustain, is not the research library best conceptualized as the traditional knowledge management strategy of the academic community? If so, how well does the digital library perform this function, at least as we understand it thus far? The digital library is generally conceived of only as an information resource, as if the library were only a collection, rather than as a shared intellectual resource and site for a community.
The social functions of the library are not easily measured in terms of outcomes, but are an element in the productivity of faculty and students. To some extent, perhaps, libraries have brought this problem on themselves by measuring quality in terms of fiscal inputs and size of collections, and they must begin to define and measure their role in productivity. But in another sense, the focus on the content and format of information to the exclusion of consideration of the social contexts and functions of knowledge is a distortion of the nature and dynamics of scholarly communication and the academic community.
The Economics of Electronic Journals
It is now practically universally accepted that scholarly journals will have to be available in digital formats. What is not settled is whether they can be much less expensive than print journals. Most traditional print publishers still claim, just as they have claimed for years, that switching to an electronic format can save at most 30% of the costs, namely the expenses of printing and mailing. Prices of electronic versions of established print journals are little, if any, lower than those of the basic paper versions. What publishers talk about most in connection with electronic publishing are the extra costs they bear, not their savings [BoyceD]. On the other hand, there is also rapid growth of electronic-only journals run by scholars themselves and available for free on the Internet.
Will the free electronic journals dominate? Most publishers claim that they will not survive (see, for example, [Babbitt]) and will be replaced by electronic subscription journals. Even some editors of the free journals agree with that assessment. My opinion is that it is too early to tell whether subscriptions will be required. It is likely that we will have a mix of free and subscription journals and that for an extended period neither will dominate. However, I am convinced that even the subscription journals will be much less expensive than the current print journals. The two main reasons are that modern technology makes it possible to provide the required services much more cheaply, and that in scholarly publishing, authors have no incentive to cooperate with the publishers in maintaining a high overhead system.
In section 2 I summarize the economics of the current print journal system. In section 3 I look at the electronic-only journals that have sprung up over the last few years and are available for free on the Net. I discuss the strange economic incentives that exist in scholarly publishing in section 4. Finally, in section 5 I present some tentative conclusions and projections.
This article draws heavily on my two previous papers on scholarly publishing, [Odlyzko1, Odlyzko2], and the references given there. For other references on electronic journals, see also [Bailey, PeekN]. It should be stressed that only scholarly journal publishing is addressed here. Trade publishing will also be revolutionized by new technology. However, institutional and economic incentives are different there, so the outcome will be different.
Scholarly publishing is a public good, paid for largely (although often indirectly) by taxpayers, students' parents, and donors. The basic assumption I am making in this article is that its costs should be minimized to the largest extent consistent with delivering the services required by scholars and by the society they serve.
Costs of Print Journals
Just how expensive is the current print journal system? While careful studies of the entire scholarly journal system had been conducted in the 1970s [KingMR, Machlup], they were obsolete by the 1990s. Recent studies, such as those in [AMSS, Kirby], address primarily prices that libraries pay, and they show great disparities. For example, among the mathematics journals considered in [Kirby], the price per page ranged from $0.07 to $1.53, and the price per 10,000 characters, which compensates for different formats, from under $0.30 to over $3.00. Such statistics are of greatest value in selecting journals to purchase or (much more frequently) to drop, especially when combined with measures of the value of journals, such as the impact factors calculated by the Science Citation Index. However, those measures are not entirely adequate when studying the entire scholarly journal publishing system. For example, in the statistics of [Kirby], the Duke Mathematical Journal (DMJ), published by Duke University Press, is among the least expensive journals at $0.19 per page. On the other hand, using the same methodology as that in [Kirby], the International Mathematics Research Notices (IMRN), coming from the same publisher as DMJ, would have been among the most expensive ones several years ago and would be around the median now (its size has expanded while the price has stayed about constant). The difference appears to come from the much smaller circulation of IMRN than of DMJ and not from any inefficiencies or profits at Duke University Press. (This case is considered in more detail in section 4.)
To estimate the systems cost of the scholarly journal publishing system, it seems advisable to consider total costs associated with an article. In writing the "Tragic Loss" essay [Odlyzko1], I made some estimates based on a sample of journals, all in mathematics and computer science. They were primary research journals, purchased mainly by libraries. The main identifiable costs associated with a typical article were the following:
1. revenue of publisher: $4,000
2. library costs other than purchase of journals and books: $8,000
3. editorial and refereeing costs: $4,000
4. authors' costs of preparing a paper: $20,000
Of these costs, the publishers' revenue of $4,000 per article (i.e., the total revenue from sales of a journal, divided by the number of articles published in that journal) attracts the most attention in discussions of the library or journal publishing "crises." It is also the easiest to measure and most reliable. However, it is also among the smallest, and this fact is a key factor in the economics of scholarly publishing. The direct costs of a journal article are dwarfed by various indirect costs and subsidies.
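The relative sizes of these four items are easier to see when they are added up. The following is a minimal Python sketch that uses only the rough per-article estimates listed above; the dictionary keys and the helper name `share` are illustrative labels, not terms from the cited studies.

```python
# Rough per-article cost estimates quoted in the list above (US dollars).
costs = {
    "publisher revenue": 4_000,
    "library costs other than purchases": 8_000,
    "editorial and refereeing": 4_000,
    "authors' preparation of the paper": 20_000,
}

# Total systems cost associated with a typical article.
total = sum(costs.values())


def share(item):
    """Fraction of the total systems cost attributable to one item."""
    return costs[item] / total


for name, amount in costs.items():
    print(f"{name:38s} ${amount:>6,}  {share(name):6.1%}")
print(f"{'total':38s} ${total:>6,}")
```

On these estimates the total comes to $36,000 per article, of which publisher revenue is one ninth (about 11%). The library cost alone is twice the publisher revenue.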
The cost estimates above are only rough approximations, especially those for the indirect costs of preparing a paper. There is no accounting mechanism in place to associate the costs in items (3) and (4) with budgets of academic departments. However, those costs are there, and they are large, whether they are half or twice the estimates presented here.
Even the revenue estimate (1) is a rough approximation. Most publishers treat their revenue and circulation data as confidential. There are some detailed accounts, such as that for the American Physical Society (APS) publications in [Lustig] and for the Pacific Journal of Mathematics in [Kirby], but they are few.
The estimate of $4,000 in publishers' revenue per article made in [Odlyzko1] has until recently been just about the only one available in the literature. It is supported by the recent study of Tenopir and King [TenopirK], which also estimates that the total costs of preparing the first copy of an article are around $4,000. The estimate in [Odlyzko1] was based primarily on data in [AMSS] and so is about five years out of date. If I were redoing my study, I would adjust for the rapid inflation in journal prices in the intervening period, which would inflate the costs. On the other hand, in discussing general scholarly publishing, I would probably deflate my estimate to account for the shorter articles that are prevalent in most areas. (The various figures for size of the literature and so on derived in [Odlyzko1] were based on samples almost exclusively from mathematics and theoretical computer science, which were estimated to have articles of about 20 pages each. This figure is consistent with the data for these areas in [TenopirK]. However, the average length of an article over all areas is about 12 pages.) Thus, on balance, the final estimate for the entire scholarly literature would probably still be $3,000 to $4,000 as the publisher revenue from each article.
The $4,000 revenue figure was the median of an extremely dispersed sample. Among the journals used in [Odlyzko1] to derive that estimate, the cost per article ranged from under $1,000 for some journals to over $8,000 for others. This disparity in costs brings out another of the most important features of scholarly publishing, namely lack of price competition. Could any airline survive with $8,000 fares if a competitor offered $1,000 fares?
Wide variations in prices for seemingly similar goods are common even in competitive markets, but they are usually associated with substantial differences in quality. For example, one can sometimes purchase round-trip trans-Atlantic tickets for under $400, provided one travels in the off-season in coach, purchases the tickets when the special sales are announced, travels on certain days, and so on. On the other hand, a first-class unrestricted ticket bought at the gate for the same plane can cost 10 times as much. However, it is easy to tell what the difference in price buys in this case. It is much harder to do so in scholarly publishing. There is some positive correlation between quality of presentation (proofreading, typography, and so on) and price, but it is not strong. In the area that matters the most to scholars, that of quality of material published, it is hard to discern any correlation. In mathematics, the three most prestigious journals are published by a commercial publisher, by a university, and by a professional society, respectively, at widely different costs. (Library subscription costs per page differ by more than a factor of 7 [Kirby], and it is unlikely that numbers of subscribers differ by that much.) In economics, the most prestigious journals are published by a professional society, the American Economic Association, and are among the least expensive ones in that field.
Many publishers argue that costs cannot be reduced much, even with electronic publishing, since most of the cost is the first-copy cost of preparing the manuscripts for publication. This argument is refuted by the widely differing costs among publishers. The great disparity in costs among journals is a sign of an industry that has not had to worry about efficiency. Another sign of lack of effective price competition is the existence of large profits. The economic function of high profits is to attract competition and innovation, which then reduce those profits to average levels. However, as an example, Elsevier's pretax margin exceeds 40% [Hayes], a level that is "phenomenally high, comparable as a fraction of revenues to the profits West Publishing derives from the Westlaw legal information service, and to those of Microsoft" [Odlyzko2]. Even professional societies earn substantial profits on their publishing operations.
Not-for-profit scientific societies, particularly in the United States and in the UK, also often realize substantial surpluses from their publishing operations.... Net returns of 30% and more have not been uncommon. [Lustig]
Such surpluses are used to support other activities of the societies, but in economic terms they are profits. Another sign of an industry with little effective competition is that some publishers keep over 75% of the revenues from journals just for distributing those journals, with all the work of editing and printing being done by learned societies.
Although profits are often high in scholarly publishing, it is best to consider them just as an indicator of an inefficient market. While they are a substantial contributor to the journal crisis, they are not its primary cause. Recall that the publisher revenue of $4,000 per article is only half of the $8,000 library cost (i.e., costs of buildings, staff, and so on) associated with that article. Thus even if all publishers gave away their journals for free, there would still be a cost problem. The growth in the scholarly literature is the main culprit.
Even in the print medium, costs can be reduced. That they have not been is due to the strange economics of scholarly publishing, which will be discussed in Section 4. However, even the least expensive print publishers still operate at a cost of around $1,000 per article. Electronic publishing offers the possibilities of going far below even that figure and of dramatically lowering library costs.
Costs of "Free" Electronic Journals
How low can the costs of electronic publishing be? One extreme example is provided by Paul Ginsparg's preprint server [Ginsparg]. It currently processes about 20,000 papers per year. These 20,000 papers would cost $40 to $80 million to publish in conventional print journals (and most of them do get published in such journals, creating costs of $40 to $80 million to society). To operate the Ginsparg server in its present state would take perhaps half the time of a systems administrator, plus depreciation and maintenance on the hardware (an ordinary workstation with what is by today's standards a modest disk system). These expenses might come (with overheads) to a maximum of $100,000 per year, or about $5 per paper.
In presentations by publishers, one often hears allusions to big National Science Foundation (NSF) grants and various hidden costs in Ginsparg's operation. Ginsparg does have a grant from NSF for $1 million, spread over three years, but it is for software development, not for the operation of his server. However, let us take an extreme position, and let us suppose that he has an annual subsidy of $1 million. Let us suppose that he spends all his time on the server (which he manifestly does not, as anyone who checks his publications record will realize), and let us toss in a figure of $300,000 for his pay (including the largest overhead one can imagine that even a high-overhead place like Los Alamos might have). Let us also assume that a large new workstation had to be bought each year for the project, say at $20,000, and let us multiply that by 5 to cover the costs of mirror sites. Let us in addition toss in $100,000 per year for several T1 lines just for this project. Even with all these outrageous overestimates, we can barely come to the vicinity of $1.5 million per year, or $75 per paper. That is dramatically less than the $2,000 to $4,000 per paper that print journals require. (I am using a figure of $2,000 for each paper here as well as that of $4,000 from [Odlyzko1] since APS, the publisher of the lion's share of the papers in Ginsparg's server and among the most efficient publishers, collects revenues of about $2,000 per paper.) As Andy Grove of Intel points out [Grove], any time that anything important changes in a business by a factor of 10, it is necessary to rethink the whole enterprise. Ginsparg's server lowers costs by about two orders of magnitude, not just one.
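The two per-paper figures for the preprint server, and the print comparison, follow from simple division. Here is a short Python sketch of that arithmetic using only the numbers quoted in the text; the variable and function names are illustrative, and the "overestimate" items are the deliberately inflated suppositions made above, not actual budget lines.

```python
PAPERS_PER_YEAR = 20_000  # approximate volume quoted for the server


def cost_per_paper(annual_costs):
    """Annual costs (a dict of dollar amounts) divided by papers processed."""
    return sum(annual_costs.values()) / PAPERS_PER_YEAR


# Estimate of present operating expenses, with overheads (stated maximum).
realistic = {"operation, with overheads": 100_000}

# The deliberately extreme suppositions made in the text.
overestimate = {
    "supposed annual subsidy": 1_000_000,
    "full-time pay, with overhead": 300_000,
    "new workstation each year, times 5 for mirrors": 5 * 20_000,
    "T1 lines": 100_000,
}

print(cost_per_paper(realistic))     # 5.0
print(cost_per_paper(overestimate))  # 75.0

# Print publication of the same papers at $2,000 to $4,000 each.
print_low, print_high = 2_000, 4_000
print(PAPERS_PER_YEAR * print_low, PAPERS_PER_YEAR * print_high)  # 40000000 80000000

# How many times more print costs per paper than each server estimate.
for label, annual in (("realistic", realistic), ("overestimate", overestimate)):
    per_paper = cost_per_paper(annual)
    print(label, print_low / per_paper, print_high / per_paper)
```

Even the inflated total of $1.5 million per year leaves print at roughly 27 to 53 times the per-paper cost. Against the $5 figure, the ratio is 400 to 800.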
A skeptic might point out that there are other "hidden subsidies" that have not been counted yet, such as those for the use of the Internet by the users of Ginsparg's server. Those costs are there, although the bulk is not for the Internet, which is comparatively inexpensive, but for the workstations, local area networks, and users' time coping with buggy operating systems. However, those costs would be there no matter how scholarly papers are published. Publishers depend on the postal system to function, yet are not charged the entire cost of that system. Similarly, electronic publishing is a tiny part of the load on the computing and communications infrastructure and so should not be allocated much of the total cost.
Ginsparg's server is an extreme example of minimizing costs. It also minimizes service. There is no filtering of submissions nor any editing, the features that distinguish a journal from a preprint archive. Some scientists argue that no filtering is necessary and that preprints are sufficient to allow the community to function. However, such views are rare, and most scholars agree that journals do perform an important role. Even though some scholars argue that print plays an essential role in the functioning of the journal system (see the arguments in [Rowland] and [Harnad] for opposing views on this issue), it appears that electronic journals can function just as well as print ones. The question in this paper is whether financial costs can be reduced by switching to electronic publishing.
Hundreds of electronic journals are operated by their editors and available for free on the Net. They do provide all the filtering that their print counterparts do. However, although their ranks appear to double every year [ARL], they are all new and small. The question is whether a system of free journals is durable and whether it can be scaled to cover most of scholarly publishing.
Two factors make free electronic journals possible. One is advances in technology, which make it possible for scholars to handle tasks such as typesetting and distribution that used to require trained experts and a large infrastructure. The other factor is a peculiarity of the scholarly journal system that has already been pointed out above. The monetary cost of the time that scholars put into the journal business as editors and referees is about as large as the total revenue that publishers derive from sales of the journals. Scholarly journal publishing could not exist in its present form if scholars were compensated financially for their work. Technology is making their tasks progressively easier. They could take on new roles and still end up devoting less effort to running the journal system than they have done in the past.
Most scholars are already typesetting their own papers. Many were forced to do so by cutbacks in secretarial support. However, even among those, few would go back to the old system of depending on technical typists if they had a choice. Technology is making it easier to do many tasks oneself than to explain to others how to do them.
Editors and referees are increasingly processing electronic submissions, even for journals that appear exclusively in print. Moreover, the general consensus is that this procedure makes their life much easier. Therefore, if the additional load of publishing an electronic journal were small enough, one might expect scholars to do everything themselves. That is what many editors of the free electronic journals think is feasible. As the volume of papers increases, one can add more editors to spread the load, as the Electronic Journal of Combinatorics [EJC] has done recently (and as print journals have done in the past). The counterargument (cf. [Babbitt, BoyceD]) is that there will always be too many repetitive and tedious tasks to do and that even those scholars who enjoy doing them now, while they are a novelty, will get tired of them in the long run. If so, it will be necessary to charge for access to electronic journals to pay for the expert help needed to run them. Some editors of the currently free electronic journals share this view. However, none of the estimates of what would be required to produce acceptable quality come anywhere near the $4,000 per article that current print publishers collect. In [Odlyzko1] I estimated that $300 to $1,000 per article should suffice, and many others, such as Stevan Harnad, have come up with similar figures. In the years since [Odlyzko1] was written, much more experience in operations of free electronic-only journals has been acquired. I have corresponded and had discussions with editors of many journals, both traditional print-only and free electronic-only. The range of estimates of what it would cost to run a journal without requiring authors, editors, and referees to do noticeably more than they are doing now is illustrated by the following two examples (both from editors of print-only journals):
1. The editor-in-chief of a large journal, which publishes around 200 papers per year (and processes several times that many submissions) and brings in revenues of about $1 million per year to the publisher, thinks he could run an electronic journal of equivalent quality with a subsidy of about $50,000 per year to pay for an assistant to handle correspondence and minor technical issues. He feels that author-supplied copies are usually adequate and that the work of technical editors at the publisher does not contribute much to the scientific quality of the journal. If he is right, then $250 per paper is sufficient.
2. An editor of a much smaller journal thinks that extensive editing of manuscripts is required. In his journal, he does all the editing himself, and the resulting files are then sent directly to the printer, without involving any technical staff at the publisher. He estimates that he spends between 30 minutes and an hour per page and thinks that having somebody with his professional training and technical skills do the work leads to substantially better results. If we assume a loaded salary of $100,000 per year (since such work could often be done by graduate students and junior postdocs looking for some extra earnings in their spare time), we have an estimate of $25 to $50 per page, or $250 to $1,000 per article, for the cost of running an electronic journal of comparable quality.
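The per-article figures in these two examples can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: a working year of 2,000 hours and article lengths of 10 and 20 pages are assumptions chosen here to match the quoted range, and the function names are illustrative.

```python
HOURS_PER_YEAR = 2_000  # assumption: about 50 weeks of 40 hours


def per_paper_from_subsidy(annual_subsidy, papers_per_year):
    """Example 1: an annual subsidy spread over the papers published."""
    return annual_subsidy / papers_per_year


def per_page_from_editing(loaded_salary, hours_per_page):
    """Example 2: editing time per page priced at a loaded salary."""
    hourly_rate = loaded_salary / HOURS_PER_YEAR
    return hourly_rate * hours_per_page


# Example 1: $50,000 per year for a journal of about 200 papers.
print(per_paper_from_subsidy(50_000, 200))  # 250.0

# Example 2: 30 minutes to one hour per page at $100,000 per year.
low_page = per_page_from_editing(100_000, 0.5)
high_page = per_page_from_editing(100_000, 1.0)
print(low_page, high_page)  # 25.0 50.0

# Assumed article lengths of 10 and 20 pages give the quoted range.
print(low_page * 10, high_page * 20)  # 250.0 1000.0
```

With other assumptions about hours worked or article length the endpoints shift, but they stay far below the $4,000 per article collected by print publishers.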
All the estimates fit in the range of $300 to $1,000 per article that was projected in [Odlyzko1] and do not come close to the $4,000 per article charged by traditional publishers. Why is there such a disparity in views on costs? It is not caused by a simple ignorance of what it takes to run a viable journal on the part of advocates of free or low-priced publications, since many of them are running successful operations. The disparity arises out of different views of what is necessary.
It has always been much easier to enlarge a design or add new features than to slim down. This tendency has been noted in ship design [Pugh], cars, and airplanes as well as in computers, where the mainframe builders were brought to the brink of ruin (and often beyond) before they learned from the PC industry. Established publishers are increasingly providing electronic versions of their journals, but usually only in addition to the print version. It is no surprise therefore that their costs are not decreasing. The approach of the free electronic journal pioneers has been different, namely to provide only what can be done with the resources available. They are helped by what are variously called the 80/20 or 70/30 rules (the last 20% of what is provided costs 80% of the total, etc.). By throwing out a few features, publishers can lower costs dramatically. Even in the area of electronic publishing, the spectrum of choices is large. Eric Hellman, editor of The MRS Internet Journal of Nitride Semiconductor Research [MRS], which provides free access to all readers but charges authors $275 for each published paper, commented [private communication] that with electronic publishing,
$250/paper gets you 90% of the quality that $1000/paper gets you.
Electronics offers many choices of quality and price in publishing.
An example of large differences in costs is provided by projects that make archival information available digitally. Astrophysicists are in the process of digitizing about a million pages of journal articles (without doing optical character recognition, OCR, on the output) and are making them available for free on the Web. The scanning project (paid for by a grant from NASA) is carried out in the United States, yet still costs only $0.18 per page in spite of the high wages. On the other hand, the costs of the JSTOR project, which was cited in [Odlyzko2] as paying about $0.20 per page for scanning, are more complicated. JSTOR pays a contractor around $0.40 per page for a combination of scanning, OCR, and human verification of the OCR output, and the work is done in a less-developed country that has low wage costs. However, JSTOR's total costs are much higher, about $1 to $2 per page, since they rely on trained professionals in the United States to ensure that they have complete runs of journals, that articles are properly classified, and so on. Since JSTOR aims to provide libraries with functionality similar to that of bound volumes, it is natural for it to strive for high quality. This goal raises costs, unfortunately.
It is important to realize how easy it is to raise costs. Even though lack of price competition in scholarly publishing has created unusually high profits [Hayes], most of the price that is paid for journals covers skilled labor. The difference in costs between the astrophysics and JSTOR projects is dramatic, but it does not come from any extravagance. Even at $2 per page, the average scholarly article would cost around $25 to process. At a loaded salary of $100,000 per year for a trained professional, that $25 corresponds to only half an hour of that person's time. Clearly one can boost the costs by doing more, and JSTOR must be frugal in the use of skilled labor.
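The labor arithmetic in this paragraph can be checked with a short sketch. The dollar figures are the ones quoted above; the figure of 2,000 working hours per year is an assumption introduced here to convert the salary into an hourly rate, and the variable names are purely illustrative.

```python
# Figures quoted in the text (USD).
COST_PER_PAGE_HIGH = 2.00   # upper end of JSTOR's total cost per page
COST_PER_ARTICLE = 25.00    # approximate processing cost of an average article
LOADED_SALARY = 100_000     # loaded annual salary of a trained professional

# Assumption, not stated in the text: roughly 2,000 working hours per year.
HOURS_PER_YEAR = 2_000

# Article length implied by $25 per article at $2 per page.
implied_pages_per_article = COST_PER_ARTICLE / COST_PER_PAGE_HIGH

# Hourly cost of the professional, and the time that $25 buys.
hourly_rate = LOADED_SALARY / HOURS_PER_YEAR
hours_per_article = COST_PER_ARTICLE / hourly_rate

print(f"Implied article length: {implied_pages_per_article} pages")
print(f"Hourly rate: ${hourly_rate:.2f}")
print(f"Skilled time per article: {hours_per_article} hours")
```

Under that assumption the hourly rate comes to $50, so $25 buys half an hour of skilled time per article, which is the figure given in the text.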
Is the higher quality of the JSTOR project worth the extra cost? It is probably essential for JSTOR to succeed in its mission, which is to eliminate the huge print collections of back issues of journals. Personally I feel that JSTOR is a great project, the only one I am aware of in scholarly publishing that benefits all three parties: scholars, libraries, and publishers. Whether it will succeed is another question. It does cost more than just basic scanning, and it does require access restrictions. One can argue that the best course of action would be simply to scan the literature right away while there are still low-wage countries that will do the work inexpensively. The cost of the manual work of cutting open volumes and feeding sheets into scanners is not likely to become much smaller. At $0.20 per page, the entire scholarly literature could probably be scanned for less than $200 million. (By comparison, the world is paying several billion dollars per year just for one year of current journals, and the Harvard libraries alone cost around $60 million per year to operate.) Once the material was scanned, it would be available in the future for OCR and addition of other enhancements.
The main conclusion to be drawn from the discussion in this section is that the monetary costs of scholarly publishing can indeed be lowered, even in print. Whether they will be is another question, one closely bound up with the strange economics of the publishing industry.
The Perverse Incentives in Scholarly Publishing
Competition drives the economy, but it often works in strange ways. A study done a few years ago (before managed care became a serious factor) compared hospital costs in mid-sized U.S. cities that had either one or two hospitals. An obvious guess might be that competition between hospitals would lead to lower costs in cities that had two hospitals. However, the results were just the opposite, with the two-hospital cities having substantially higher prices. This result did not mean that basic economic laws did not apply. Competition was operating, but at a different level. Since it was doctors who in practice determined what hospital a patient went to, hospitals were competing for doctors by purchasing more equipment, putting in specialty wards, and the like, which was increasing their costs (but not making any noticeable difference in the health of the population they served). The patients (or, more precisely, their insurers and employers) were paying the extra price.
Scholarly publishing as a business has many similarities to the medical system, except that it is even more complicated. Journals do not compete on price, since that is not what determines their success. There are four principal groups of players. The first one consists of scholars as producers of the information that makes journals valuable. The second consists of scholars as users of that information. However, as users, they gain access to journals primarily through the third group, the libraries. Libraries purchase journals from the fourth group, the publishers, usually in response to requests from scholars. These requests are based overwhelmingly on the perceived quality of the journals, and price seldom plays a role (although that is changing under the pressure to control growth of library costs). The budgets for libraries almost always come from different sources than the budgets for academic departments, so that scholars as users do not have to make an explicit trade-off between graduate assistantships and libraries, for example.
Scholars as writers of papers determine what journals their work will appear in and thus how much it will cost society to publish their work. However, scholars have no incentive to care about those costs. What matters most to them is the prestige of the journals they publish in. Often the economic incentives are to publish in high-cost outlets. It has often been argued that page charges are a rational way to allocate costs of publishing, since they make the author (or the author's institution or research grant) cover some of the costs of the journal, which, after all, is motivated by a desire to further the author's career. However, page charges are less and less frequent. As an extreme example, in the late 1970s, Nuclear Physics B, published by Elsevier, took over as the "journal of choice" in particle physics and field theory from Physical Review D, even though the latter was much less expensive. This takeover happened because Phys. Rev. D had page charges, and physicists decided they would rather use their grant money for travel, postdocs, and the like. Note that the physicists in this story behaved in a perfectly rational way. They did not have to use their grants to pay for the increase in library costs associated with the shift from an inexpensive journal to a much pricier one. Furthermore, even if they had to pay for that cost, they would have come out ahead; the increase in the costs of just their own library associated with an individual decision to publish in Nucl. Phys. B instead of the less expensive Phys. Rev. D (could such a small change have been quantified) would have been much smaller than the savings on page charges. Most of the extra cost would have been absorbed by other institutions.
To make this argument more explicit, consider two journals: H (high priced) and L (low priced). Suppose that each one has 1,000 library subscriptions and no individual ones. L is a lean operation, and it costs them $3,000 to publish each article. They collect $1,000 from authors through page charges and the other $2,000 from subscribers, so that each library in effect pays $2 for each article that appears in L. On the other hand, H collects $7,000 in revenue per article, all from subscriptions, which comes to $7 per article for each library. (It does not matter much whether the extra cost of H is due to profits, higher quality, or inefficiency.)
From the standpoint of the research enterprise or of any individual library, it would be desirable to steer all authors toward publishing in L, as that would save a total of $4,000 for each article. However, look at this situation from the standpoint of the author. If she publishes in L, she loses $1,000 that could be spent on graduate students, conferences, and so on. If she publishes in H, she gets to keep that money. She does not get charged for the extra cost to any library, at least not right away. Eventually the overhead rates on her contract might go up to pay for the higher library spending at her institution. However, this effect is delayed and is weak. Even if we had accounting mechanisms that would provide instantaneous feedback (which we do not, with journal prices set more than a year in advance and totally insensitive to minor changes caused by individual authors deciding where to publish), our hypothetical author would surely only get charged for the extra $5 that she causes her library to spend ($7 for publication in H as opposed to $2 in L) and not for the costs to all the other 999 libraries. She would still save $995 ($1,000 - $5) of her grant money. Is it any wonder if she chooses to publish in H?
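The two-journal example can be laid out as a small calculation. This is only a sketch of the arithmetic in the preceding paragraphs: all dollar figures come from the text, while the variable and function names are illustrative.

```python
# Figures from the two-journal example in the text (USD per article).
LIBRARIES = 1_000            # library subscriptions held by each journal
L_COST_PER_ARTICLE = 3_000   # what it costs L to publish one article
L_PAGE_CHARGES = 1_000       # portion L collects from the author
H_COST_PER_ARTICLE = 7_000   # H's revenue per article, all from subscriptions


def per_library_cost(subscription_revenue: float, libraries: int = LIBRARIES) -> float:
    """Share of one article's subscription revenue paid by each library."""
    return subscription_revenue / libraries


# What each library pays per article in L and in H.
l_per_library = per_library_cost(L_COST_PER_ARTICLE - L_PAGE_CHARGES)
h_per_library = per_library_cost(H_COST_PER_ARTICLE)

# Saving to the research enterprise as a whole if the article appears in L.
system_saving_if_l = H_COST_PER_ARTICLE - L_COST_PER_ARTICLE

# The author's side, even with instantaneous and perfect cost feedback:
# she is charged only for the extra spending at her own library.
extra_cost_to_own_library = h_per_library - l_per_library
author_gain_from_h = L_PAGE_CHARGES - extra_cost_to_own_library

# The remainder of the extra subscription spending falls on the other libraries.
cost_shifted_to_others = extra_cost_to_own_library * (LIBRARIES - 1)

print(f"Per-library cost per article: L ${l_per_library:.2f}, H ${h_per_library:.2f}")
print(f"System-wide saving per article from choosing L: ${system_saving_if_l:,}")
print(f"Author's net gain from choosing H: ${author_gain_from_h:,.2f}")
print(f"Extra cost borne by the other 999 libraries: ${cost_shifted_to_others:,.2f}")
```

The calculation makes the asymmetry explicit: the author keeps $995, while the $4,000 of extra system cost is spread so thinly that no single library sees more than $5 of it.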
A secondary consideration for authors is to ensure that their papers are widely available. However, this factor has seldom played a major role, and with the availability of preprints through e-mail or home pages it is becoming even less significant. Authors are not told what the circulation of a journal is (although for established publications, they probably have a rough idea of how easy it is to access them). Further, it is doubtful this information would make much difference, at least in most areas. Authors can alert the audience they really care about (typically a few dozen experts) through preprints, and the journal publication is for the résumé more than to contact readers.
In 1993-94, there was a big flap about the pricing of International Mathematics Research Notices (IMRN), a new research announcement journal spun off from the Duke Mathematical Journal. The institutional subscriptions cost $600 per year, and there were not many papers in it. The director of publishing operations for Duke University Press then responded in the Newsletter on Serials Pricing Issues [NSPI] by saying that his press was doing the best it could to hold down prices. It's just that their costs for IMRN were going to be $60,000 per year, and they expected to have 100 [sic] subscriptions, so they felt they had to charge $600 per subscription. Now, one possibility is that the Duke University Press miscalculated and that it might have been easier for them to sell 400 subscriptions at $150 than 100 at $600, since IMRN did establish a good reputation as an insert to Duke Math. J. However, if their decision was right, then there seem to be two possibilities: (1) scholars will decide that it does not make sense to publish in a journal that is available in only 100 libraries around the world, or (2) scholars will continue submitting their papers to the most prestigious journals they can find (such as IMRN) no matter how small their circulation, since prestige is what counts in tenure and promotion decisions and since everybody that they want to read their papers will be able to get them electronically from preprint servers in any case. In neither case are journals such as IMRN likely to survive in their present form. (IMRN itself appears to have gained a longer lease on life, since it seems to have gained considerably more subscribers and, while it has not lowered its price, it is publishing many more papers, lowering its price per page, as mentioned in Section 2.)
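The pricing logic attributed to the press amounts to simple cost recovery: divide the annual cost by the expected number of subscriptions. A minimal sketch, using only the figures quoted above (the function name is illustrative, and the press's actual pricing method is not documented here):

```python
def break_even_price(annual_cost: float, subscriptions: int) -> float:
    """Subscription price at which revenue exactly covers annual cost."""
    return annual_cost / subscriptions


IMRN_ANNUAL_COST = 60_000  # USD per year, as reported in the text

# The press's stated expectation: 100 subscriptions.
price_at_100 = break_even_price(IMRN_ANNUAL_COST, 100)

# The alternative raised in the text: 400 subscriptions.
price_at_400 = break_even_price(IMRN_ANNUAL_COST, 400)

print(f"Break-even price at 100 subscriptions: ${price_at_100:.2f}")
print(f"Break-even price at 400 subscriptions: ${price_at_400:.2f}")
```

Both price points recover the same $60,000; the difference is that the lower price puts the journal in four times as many libraries.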
The perverse incentives in scholarly publishing that are illustrated in the examples above have led to the current expensive system. They are also leading to its collapse. The central problem is that scholars have no incentive to maintain the current system. In book publishing, royalties align the authors' interests with those of publishers because both wish to maximize revenues. (This situation is most applicable in the trade press or in textbooks. In scholarly monograph publishing, the decreasing sales combined with the typical royalty rate of, at most, 15% are reducing the financial payoff to authors and appear to be leading to changes, with monographs becoming available electronically for free.) For the bulk of scholarly publishing, though, the market is too small to provide a significant financial payoff to the authors.
Although scholars have no incentive to maintain the current journal system, they currently also have no incentive to dismantle it. Even the physicists who rely on the Ginsparg preprint server continue to publish most of their papers in established print journals. The reason is that it costs them nothing to submit papers to such journals and also costs them nothing to have their library buy the journals. The data from the Association of Research Libraries [ARL] show that the average cost of the library system at leading research universities is about $12,000 per faculty member. (It is far higher at some, with Princeton spending about $30,000 per year per faculty member.) This figure, however, is not visible to the scholars, and they have no control over it. They are not given a choice between spending for the library and for other purposes.
Until the academic library system is modified, with the costs and trade-offs made clear to scholars and administrators, it is unlikely there will be any drastic changes. We are likely to see slow evolution (cf. [Odlyzko3]), with continuing spread of preprints (in spite of attempts of journals in certain areas, such as medicine, to play King Canute roles and attempt to stem this natural growth). Electronic journals will become almost universal but most of them will be versions of established print journals and will be equally expensive. Free or inexpensive electronic journals will grow, but probably not too rapidly. However, this situation is not likely to persist for long. I have been predicting [Odlyzko1, Odlyzko2] that change will come when administrators realize just how expensive the library system is and that scholars can obtain most of the information they need from other sources, primarily preprints. Over the decade from 1982 to 1992, library expenditures grew by over a third even after adjusting for general inflation [ARL]. However, they fell by about 10% as a share of total university spending. Apparently the pressure from scholars to maintain library collections has not been great enough, and other priorities have been winning. At some point in the future more drastic cuts are likely.
Although library budgets are likely to be cut, total spending on information systems is unlikely to decrease. We are entering the Information Age, after all. What is likely to happen is that spending on information will increasingly flow through other channels. Traditional scholarly books and journals have always been only a part of the total communication system. Until recently they were the dominant part, but their relative value has been declining as the phone, the fax, and cheap airplane travel as well as the more widely noted computers and data networks have opened up other channels of communications among scholars. This decline, together with the feasibility of lower cost journal systems, is likely to lead to cutbacks in library budgets. How those cuts will be distributed is uncertain. In discussions of the library crisis, most attention is devoted to journal costs. However, for each $1 spent on journal acquisitions, other library costs come to $2. If publishers can provide electronic versions of not only their current issues, but also older ones (either themselves or through JSTOR), they can improve access to scholarly materials and lower the costs of the library system (buildings, staff, maintenance) without lowering their own revenues. It is doubtful whether those savings will be enough, though, and it is likely that spending on traditional scholarly journals as well as the rest of the library system will decrease. To maintain their position, publishers will have to move to activities in which they provide more value instead of relying on scholars to do most of the work for them.
I thank Erik Brynjolfsson, Joe Buhler, Peter Denning, Mark Doyle, Paul Ginsparg, Stevan Harnad, Steve Heller, Eric Hellman, Carol Hutchins, Don King, Rob Kirby, Gene Klotz, Silvio Levy, Laszlo Lovasz, Harry Lustig, Robert Miner, Ann Okerson, Bernard Rous, Arthur Smith, Ron Stern, Edward Vielmetti, Lars Wahlbin, Bernd Wegner, and Ronald Wigington for their comments and the information they provided.
[AMSS] Survey of American research journals, Notices Amer. Math. Soc. 40 (1993): 1339-1344.
[ARL] Association of Research Libraries, http://arl.cni.org.
[Babbitt] D. G. Babbitt, Mathematical journals: Past, present and future-a personal view, Notices Amer. Math. Soc. (Jan. 1997). Available at http://www.ams.org.
[Bailey] C. W. Bailey Jr., Scholarly electronic publishing bibliography, available at http://info.lib.uh.edu/sepb/sepb.html.
[BoyceD] P. B. Boyce and H. Dalterio, Electronic publishing of scientific journals, Physics Today (Jan. 1996): 42-47. Available at http://www.aas.org/~pboyce/epubs/.
[EJC] The Electronic Journal of Combinatorics, http://www.combinatorics.org/.
[Ginsparg] P. Ginsparg, Winners and losers in the global research village, available at http://xxx.lanl.gov/blurb/pg96unesco.html.
[Grove] A. Grove, Only the Paranoid Survive, Bantam Doubleday Dell, 1996.
[Harnad] S. Harnad, The paper house of cards (and why it's taking so long to collapse), Ariadne, issue # 8 (March 1997). Web version at http://www.ariadne.ac.uk/.
[Hayes] J. R. Hayes, The Internet's first victim? Forbes (Dec. 18, 1995): 200-201.
[KingMR] D. W. King, D. D. McDonald, and N. K. Roderer, Scientific Journals in the United States: Their Production, Use, and Economics, Hutchinson Ross, 1981.
[Kirby] R. Kirby, Comparative prices of math journals, available at http://math.berkeley.edu/~kirby/journals.html.
[Lustig] H. Lustig, Electronic publishing: economic issues in a time of transition, in Electronic Publishing for Physics and Astronomy, A. Heck, ed., Kluwer, 1997.
[Machlup] F. Machlup, K. Leeson, and associates, Information Through the Printed Word: The Dissemination of Scholarly, Scientific, and Intellectual Knowledge, vol. 2: Praeger, 1978.
[MRS] The MRS Internet Journal of Nitride Semiconductor Research, available at http://nsr.mij.mrs.org/.
[NSPI] Newsletter on Serials Pricing Issues, published electronically at Univ. of North Carolina, available at http://www.lib.unc.edu/prices/.
[Odlyzko1] A. M. Odlyzko, Tragic loss or good riddance? The impending demise of traditional scholarly journals, Intern. J. Human-Computer Studies (formerly Intern. J. Man-Machine Studies) 42 (1995): 71-122. Also in the electronic J. Univ. Comp. Sci., pilot issue, 1994, http://hyperg.iicm.tu-graz.ac.at. Available at http://www.research.att.com/~amo.
[Odlyzko2] A. M. Odlyzko, On the road to electronic publishing, Euromath Bulletin 2, no. 1 (June 1996): 49-60. Available at author's home page, http://www.research.att.com/~amo.
[Odlyzko3] A. M. Odlyzko, The slow evolution of electronic publishing, in Electronic Publishing-New Models and Opportunities, A. J. Meadows and F. Rowland, eds., ICCC Press, 1998. Available at http://www.research.att.com/~amo.
[PeekN] R. P. Peek and G. B. Newby, eds., Scholarly Publishing: The Electronic Frontier, MIT Press, 1996.
[Pugh] P. Pugh, The Cost of Seapower: The Influence of Money on Naval Affairs from 1815 to the Present Day, Conway, 1986.
[Rowland] F. Rowland, Print journals: Fit for the future? Ariadne, issue # 7 (Jan. 1997). Web version at http://www.ariadne.ac.uk/.
[TenopirK] C. Tenopir and D. W. King, Trends in scientific scholarly journal publishing in the U. S., J. Scholarly Publishing 48, no. 3 (April 1997): 135-170.
Cost and Value in Electronic Publishing
James J. O'Donnell
This chapter is perhaps best read through binocular lenses. On the one hand, it is an account of the value and function today and for the foreseeable future of electronic networked texts. But on the other hand, it questions our ability to account for such value and function. In search of the particular, it risks the anecdotal; in defense of value, it expresses skepticism about calculations of cost and price.
I am a student of the works of St. Augustine and shall begin accordingly with confession. For my own scholarship, the single most transforming feature of cyberspace as we inhabit it in 1997 can be found in a warehouse on the edges of downtown Seattle. I mean the nerve center of www.amazon.com. The speed with which a half-conceived interest in a book converts itself to a real book in my mailbox (48 to 72 hours later) has implications, retrospective and prospective, on the finances of my sector of higher education that could well be catastrophic.
If my approach seems whimsical, do not be misled. The real habits of working scholars often fall outside the scope of discussion when new and old forms of publication are considered. I will have some things to say shortly about the concrete results of surveys we have done for the Bryn Mawr Reviews project funded by Mellon, and more of our data appear in the paper by my colleague Richard Hamilton (see chapter 12), but first I want to emphasize a few points by personalizing them.
First, and most important, Amazon books is a perfect hybrid: a cyberspace service that delivers the old technology better and faster than ever before.
Second, my ritual allusion to the paradox of scholars wallowing in information that they do not actually read is not merely humorous: it is a fact of life. The file drawers full of photocopies, read and unread, that every working humanist seems now to possess are a very recent innovation. Photocopying is a service that has declined sharply in price (if measured in real terms) over the past 20 years, and it is certainly the case that graduate and undergraduate students can tell the same joke on themselves today. Perhaps only full professors today reach the point where they can joke similarly about books, but if so surely we are the leading edge of a wedge.
But abundance is not wealth, for wealth is related to scarcity. This, I think, is the point of our jokes. When each new book, pounced on with delight in a bookstore, was an adventure, and when each scholarly article was either a commitment of time or it was nothing, the mechanical systems of rationing that kept information scarce also kept it valuable. But if we now approach a moment when even quite serious books are abundantly available, then their individual value will surely decline.
I am fond of historical illustration. A student of mine at Penn is now working hard on a dissertation that involves late medieval indulgences: not just the theological practice of handing out remission of punishment but the material media through which that remission was attested. It turns out there were indeed some very carefully produced written indulgences before printing was introduced, but indulgences were among the first printed artifacts ever. The sixteenth century saw a boom in the indulgence business as mass production made the physical testimony easier to distribute and obtain. The "information economy" of indulgences showed a steady rise through several generations. (The price history of indulgences seems still obscure, for reasons my student has not yet been able to fathom; it would be interesting to see if supply and demand had more to do with the availability of the artifact or, rather, was measured by the number of years of purgatorial remission.) But there came a point at which, almost at a stroke, the superabundance of printed indulgences was countered by loud assertions of the worthlessness of the thing now overpriced and oversold. There followed the familiar cycle of business process reengineering in the indulgence business: collapse of market, restructuring, downsizing, and a focusing on core competencies. The indulgence business has never been the same.
A third and last confessional point: as founding coeditor of Bryn Mawr Classical Review (BMCR) since 1990, I may reasonably assert that I have been thinking about and anticipating the benefits of networked electronic communication for scholars for some time now. Yet as I observe my own practices, I must accept that my powers of prognostication have been at best imprecisely focused. Yes, a network connection at my desktop has transformed the way I work, but it has done so less through formal deployment of weighty scholarly resources and more through humbler tools. I will list a few:
1. On-line reference: Though I happened to have owned the compact OED for over 20 years and now, in fact, own a set of the Encyclopedia Britannica, I rarely used the former and rarely remember to look at the latter. But their electronic avatars I consult now daily: "information" sources on myriad topics far more detailed and scholarly than any previously in regular use.
2. On-line productivity information: Under this category I include far better information about weather and travel weather than ever before; access to current airline schedules and other travel information including hotel directories; nationwide telephone directories including yellow pages; on-line newspapers and news feeds.
3. E-mail as productivity tool: The positive impact of e-mail communication on scholarship for me cannot be overestimated. Relatively little of my e-mail has to do with my scholarship, but that proportion is important first of all: news of work in progress, often including copies of papers, and ongoing conversation with specialists elsewhere is a great boon, no question.
4. Formal on-line publishing endeavors: I confess that I use the kinds of resources that Mellon grants support far less than I might have expected. I did indeed point my students to a specific article in a MUSE journal a few months ago, and I browse and snoop, but it was only in writing this paper that I had the excellent idea to bookmark on my browser MUSE's Journal of Early Christian Studies and JSTOR's Speculum; they appear just below the exciting new URL for the New York Times Book Review on-line.
So we, or at least I, live in a world where electronic and print information are already intermarrying regularly, where the traditional content of print culture is declining in value, and where the value of electronic information is not so much in the content as in the interconnectedness and the greater usefulness it possesses. For a work as explicitly devoted as this one is to carrying traditional resources into electronic form, all three of those observations from experience should give pause. In fact, I am going to argue that the intermediacy and incompleteness of the mixed environment we inhabit today is an important and likely durable consideration. Later in this chapter I will return to the implications of this argument. To give them some weight, let me recount and discuss some of our experiences with BMCR. Some familiar tales will be told here, but with, I hope, fresh and renewed point.
When we began BMCR, we wrote around to publishers with classics lists and asked for free books. An engaging number responded affirmatively, considering we had no track record. Oxford Press sent many books, Cambridge Press did not respond: a 50% success rate with the most important British publishers seemed very satisfactory for a start-up. During our first year, we reviewed many OUP books but few if any Cambridge titles. There then appeared, sometime in 1991 or 1992, an OUP Classics catalog with no fewer than two dozen titles appending blurbs from Bryn Mawr Classical Review. (From this we should draw first the lesson that brand names continue to have value: OUP could have chosen to identify its blurbs, as it more commonly does, by author of the review rather than by title of the journal, but we had chosen our "brand" well.) Approximately two weeks after the OUP catalog appeared, we received unsolicited a first handsome box of books from Cambridge, and we now have a happy and productive relationship with both publishers. Our distinctive value to publishers is our timeliness: books reviewed in time to blurb them in a catalog while the books are still in their prime selling life, not years later. The practical value to scholars is that information about and discussion of current work moves more rapidly into circulation. (Can a dollar price be placed on such value? I doubt it. I will return later to my belief that one very great difficulty in managing technology transitions affecting research and teaching is that our economic understanding of traditional practices is often too poor and imprecise to furnish a basis for proper analysis. In this particular case, we must cope with the possibility that a short-term advantage will in the long term devalue the information by increasing its speed of movement and decreasing its lifetime of value.)
We began BMCR in part because we already had a circle of collaborators in place. Rick Hamilton had created Bryn Mawr Commentaries in 1980, offering cheap, serviceable, reliable texts of Greek and Latin authors with annotation designed to help real American students of our own time; in a market dominated by reprints of texts for students in the upper forms of British public schools in another century, the series was an immediate hit. It quickly became the most successful textbook series in American classics teaching. I had joined that project in 1984 and in slightly over a decade we had almost 100 titles in print. In the course of that project, Hamilton had assembled a team of younger scholars of proven ability to do good work on a short deadline without exclusive regard for how it would look on a curriculum vitae (textbook writing is notoriously problematic for tenure committees). This group formed the core of both our editorial board and our reviewing team. If you had asked us in 1990 what we were doing, we would have said that we were getting our friends to review books for us. This statement was true insofar as it meant that we could do a better job more quickly of getting good reviews moving because we had already done the work of building the community on which to draw.
But what surprised us most was that a little more than a year after we began work, we looked at the list of people who had reviewed for us and found that it had grown rapidly beyond the circle of our friends and even the friends of our friends. A book review journal seems unusually well situated to build community in this way because it does not wait for contributions: it solicits them and even offers small compensation (free books) to win people over. If then it can offer timely publication, at least in this field, it is possible to persuade even eminent and computer-hostile contributors to participate. (To be sure, there are no truly computer-hostile contributors left. The most recent review we have published by someone not using at least a word processor is three years old.)
But the fact of networked communication meant that the reviewer base could grow in another way. A large part of our working practice, quite apart from our means of publication, has been facilitated by the Internet. Even if we only printed and bound our product, what we do would not be possible without the productivity enhancement of e-mail and word processing. We virtually never "typeset" or "keyboard" texts, a great savings at the outset. But we also do a very high proportion of our communication with reviewers by e-mail. Given the difficulties that persist even now of moving formatted files across platforms, we still receive many reviews on floppy disks with accompanying paper copies to ensure accuracy, but that step is only a last one in a process greatly improved by the speed of optical fiber.
Further, in July 1993 our imitation of an old practice led to a fresh transformation of our reviewing population. We began to publish a listing of books received; enough were arriving to make publishing this list seem like a reasonable practice, one we now follow every month. By stroke of simple intuition and good luck, Hamilton had the idea to prepend to that list a request for volunteers to review titles yet unplaced. (I may interpose here that Hamilton and I both felt acutely guilty in the early years every time one or two books were left unplaced for review after several months. Only when we read some time later the musings of a book review editor for a distinguished journal in another field well known for its reviews and found that he was publishing reviews of approximately 5% of the titles that came to his desk did we start to think that our own practice [reviewing, on a conservative estimate, 60 to 70% of titles] was satisfactory.) The request for volunteers drew an unexpected flood of responses. We have now institutionalized that practice to the point that each month's publication of the "books received" list needs to be coordinated for a time when both Hamilton and I are prepared to handle the incoming flood of requests: 30 to 40 a month for a dozen or so still-available titles.
But the result of this infusion of talent has been an extraordinary broadening of our talent pool. Though a few reviewers (no more than half a dozen) are household names to our readers as authors of more than a dozen reviews over the seven years of our life, we are delighted to discover that we have published, in the classical review journal alone, 430 different authors from a total of about 1,000 reviews. Our contributors come from several continents: North America, Europe, Africa, Asia, and Australia. By the luck of our having begun with a strategy based in praxis rather than ideology (beginning, that is, with people who had contributed to our textbook series), we have succeeded in creating a conversation that ranges widely across disciplinary and ideological boundaries. The difficulty of establishing working relations with European publishers remains an obstacle that perplexes us: but that difficulty chiefly resides in the old technology of postal delays and the fact that even e-mail does not eradicate the unfamiliarity that inheres when too few opportunities for face-to-face encounter exist.
Our experience with Bryn Mawr Medieval Review has been instructively different. There we began not with a cadre of people and an idea, but merely with an idea. Two senior editors, including myself, recruited a managing editor who tried to do in a vacuum what Hamilton and I had done with the considerably greater resources described above. It never got off the ground. We put together an editorial board consisting of smart people, but people who had no track record of doing good work in a timely way with us: they never really engaged. There was no cadre of prospective reviewers to begin with, and so we built painstakingly slowly. In the circumstances, there was little feedback in the form of good reviews and a buzz of conversation about them, and publication never exceeded a trickle.
We have speculated that some intrinsic differences between "classics" and "medieval studies" as organized fields in this country are relevant here. Classicists tend to self-identify with the profession as a whole and to know and care about materials well beyond their immediate ken. A professor of Greek history can typically tell you in a moment who the leading people in a subfield of Latin literature are and even who some of the rising talent would be. But a medievalist typically self-identifies with a disciplinary field (like "history") at least as strongly as with "medieval studies," and the historian of Merovingian Gaul neither knows nor cares what is going on in Provençal literature studies. I am disinclined to emphasize such disparities, but they need to be kept in mind for what follows.
After two and a half years of spinning our wheels, with, to be sure, a fair number of reviews, but only a fair number, and with productivity clearly flagging, we made the decision to transfer the review's offices to new management. We were fortunate in gaining agreement from Professor Paul Szarmach of the Medieval Institute of Western Michigan University to give the journal a home and some institutional support. Western Michigan has been the host for a quarter century of the largest "come-all-ye" in medieval studies in the world, the annual Kalamazoo meetings. Suddenly we had planted the journal at the center of a network of self-identified medievalists. The managing editorship has been taken up by two WMU faculty, Rand Johnson in Classics and Deborah Deliyannis in History, and since they took over the files in spring 1996, the difference has been dramatic. In the last months of 1996, they had the most productive months in the journal's life and on two occasions distributed more reviews in one month than BMCR did. BMCR looks as if it will continue to outproduce BMMR over the next twelve months by an appreciable pace, but the gap is narrowing.
Both BMCR and BMMR stand to gain from our Mellon grant. Features such as a new interface on the World Wide Web, a mechanism for displaying Greek text in Greek font, and enhanced search capabilities will be added to what is still the plain-ASCII text of our archives, which are still, I am either proud or embarrassed to claim, on a gopher server at the University of Virginia Library. Indeed, when we began our conversations with Richard Ekman and Richard Quandt in 1993, one chief feature of our imagined future for BMCR was that we would not only continue to invent the journal of the future, but we would put ourselves in the position of packaging what we had done for distribution to others who might wish to emulate the hardy innovation of an electronic journal. About the time we first spoke those words, Mosaic was born; about the time we received notice of funding from The Mellon Foundation, Netscape sprang to life. Today the "NewJour"
archive based on a list comoderated by myself and Ann Okerson on which we distribute news of new electronic journals suggests that there have been at least 3,500 electronic journals born, some flourishing, some already vanished. Though BMCR is still one of the grandfathers of the genre (Okerson's 1991 pathbreaking directory of e-journals listed 29 titles including BMCR, and that list was near exhaustive), we are scarcely exemplary: it's getting crowded out here.
But meanwhile, a striking thing has happened. Our users have, with astonishing unanimity, not complained about our retro tech appearance. To be sure, we have always had regrets expressed to us about our Greekless appearance and our habit of reducing French to an accentless state otherwise seen in print chiefly in Molly Bloom's final soliloquy in the French translation of Ulysses. But those complaints have not increased. Format, at a moment when the Web is alive with animation, colors, Java scripts, and real audio, turns out to be of far less importance than we might have guessed. Meanwhile, to be sure, our usage has to some extent plateaued. During the first heady years, I would send regular messages to my coeditors about the boom in our numbers. That boom has never ended, and I am very pleased to say that we have always seen fewer losses than gains to our subscription lists, but we are leveling out. Although Internet usage statistics continue to seek the stratosphere, we saw a "mere" 14% increase in subscriptions between this time 12 months ago and today. (Our paper subscriptions have always remained very consistent and very flat.) It is my impression that we are part of a larger Internet phenomenon that began in 1996, when the supply of sites began to catch up to demand and everyone's hits-per-site rate began to level off.
But we are still a success, in strikingly traditional ways. Is what we do worth it? How can we measure that? My difficulty in answering such questions is that in precisely the domain of academic life that feels most like home to me, we have always been astonishingly bad at answering such questions. Tony Grafton and Lisa Jardine, in their important book on Renaissance education, From Humanism to the Humanities, make it clear how deeply rooted the cognitive dissonance in our profession is between what we claim and what we do. Any discussion of the productivity of higher education is going to be inflammatory, and any attempt to measure what we do against the standards of contemporary service industries will evoke defenses of a more priestly vision of what we are and what we can be, in the face of economic pressures that defer little, if at all, to priesthoods.
But I will also suggest one additional reason why it is premature to begin measuring too closely what we do. Pioneers are entitled to be fools. Busting sod on the prairie was a disastrous mistake for many, a barely sustainable life for many more (read Wallace Stegner's luminous memoir Wolf Willow for chapter and verse), and an adventure rewarding to few. But it was also a necessary stage toward a productive and, I think we would all agree, valuable economy and culture. I suggest that if we do not know how to count and measure what we do now on the western
frontier with any certainty, we do already know how to fret about it. We know what the issues are, and we know the range of debate.
By contrast, any attempt to measure the value of electronic texts and images or of the communities they facilitate is premature in a hundred ways. We have no common space or ground on which to measure them, for one thing: a thousand or a million experiments are not yet a system. We do not know what survives, what scales, what has value that proves itself to an audience willing to pay to sustain it. We can measure some of the costs, but academic enterprises are appallingly bad at giving fully loaded costs, inasmuch as faculty time, library resources, and the heat that keeps the fingers of the assistant typing HTML from freezing are either unaccounted for or accounted for far more arbitrarily than is the case for, for example, amazon.com. We can measure some of the benefits, but until there is an audience making intelligent choices about electronic texts and their uses, those measures will be equally arbitrary.
Let me put it this way. Was an automobile a cost-effective purchase in 1915? I know just enough of the early history of telegraphy to surmise, but not enough to prove, that the investment in the first generation of poles and wires (Ezra Cornell's great invention) could never possibly have recouped itself to investors. In fact, as with many other new technologies of the nineteenth century, one important stage in development was the great crash of bankruptcies, mergers, and reorganizations that came at the end of the first generation. Western Union, in which Cornell was a principal shareholder, was one economic giant to emerge in that way. A similar crash happened to railroads in the late nineteenth century. Such a reading of history suggests that what we really want to ask is not whether we can afford the benefits of electronic texts but whether and how far we can allow universities and other research institutions to afford the risks of such investment.
For we do not know how to predict successes: there are no "leading economic indicators" in cyberspace to help us hedge and lay our bets. Those of us who have responsibility for large institutional ventures at one level or another find this situation horribly disconcerting, and our temptation over the next months and years is always going to be to ask the tough, green-eyeshade questions, as indeed we must. But at the same time, what we must be working for is an environment in which not every question is pressed to an early answer and in which opportunity and openness are sustained long enough to shape a new space of discourse and community. We are not yet ready for systems thinking about electronic information, for all that we are tempted to it: the pace of change and the shifts of scale are too rapid. The risk is always that we will think we discern the system of the future and so seek to institutionalize it as rapidly as possible, to force a system into existing by closing it off by main force of software, hardware, or text-encoding choices. To do so now, I believe, is a mistake.
For one example: Yahoo and Alta Vista are powerful tools to help organize cyberspace in 1997. But they are heavily dependent on the relative sizes of the spaces they index for the effectiveness of their results: they cannot in present form scale
up. Accordingly, any and all attempts to measure their power and effectiveness are fruitless. For another example: there is as yet no systemic use of information technology in higher education beyond the very pedestrian and pragmatic tools I outlined above. Any attempt to measure one experiment thus falls short of its potential precisely because no such experiment is yet systemic. There is nothing to compare it with, no way to identify the distortions introduced by uniqueness or by the avenues in which the demands of present institutional structures distort an experiment so as to limit its effectiveness.
What we still lack is any kind of economic model for the most effective use of information technology in education and scholarship: that much must be freely granted. The interest and value of the Mellon grants, I would contend, lie in the curiosity with which various of our enterprises push our camel-like noses under one or another tent flap in search of rewarding treats. Until we find those treats, we must, however, be content to recognize that from a distance we all appear as so many back ends of camels showing an uncanny interest in a mysterious tent.
The Future of Electronic Journals
Hal R. Varian
It is widely expected that a great deal of scholarly communication will move to an electronic format. The Internet offers much lower cost of reproduction and distribution than print, the scholarly community has excellent connectivity, and the current system of journal pricing seems to be too expensive. Each of these factors is helping push journals from paper to electronic media.
In this paper I want to speculate about the impact this movement will have on the form of scholarly communication. How will electronic journals evolve?
Each new medium has started by emulating the medium it replaced. Eventually the capabilities added by the new medium allow it to evolve in innovative and often surprising ways. Alexander Graham Bell thought that the telephone would be used to broadcast music into homes. Thomas Edison thought that recordings would be mostly of speech rather than music. Marconi thought that radio's most common use would be two-way communication rather than broadcast.
The first use of the Internet for academic communication has been as a replacement for the printed page. But there are obviously many more possibilities.
Demand and Supply
In order to understand how journals might evolve, it is helpful to start with an understanding of the demand and supply for scholarly communication today.
Supply of Scholarly Communication
The academic reward system is structured to encourage the production of ideas. It does this by rewarding the production and dissemination of "good" ideas, that is, ideas that are widely read and acknowledged.
Scholarly publications are produced by researchers as part of their jobs. At
most universities and research organizations, publication counts significantly toward salary and job security (e.g., tenure). Not all publications are created equal: competition for space in top-ranked journals is intense.
The demand for space in those journals is intense because they are highly visible and widely read. Publication in a topflight journal is an important measure of visibility. In some fields, citation data have become an important observable proxy for "impact." Citations are a way of proving that the articles that you publish are, in fact, read.
Demand for Scholarly Communication
Scholarly communication also serves as an input to academic research. It is important to know what other researchers in your area are doing so as to improve your own work and to avoid duplicating their work. Hence, scholars generally want access to a broad range of academic journals.
The ability of universities to attract topflight researchers depends on the size of the library collection. Threats to cancel journal subscriptions are met with cries of outrage by faculty.
The Production of Academic Journals
Tenopir and King have provided a comprehensive overview of the economics of journal production. According to their estimates, the first-copy costs of an academic article are between $2,000 and $4,000. The bulk of these costs are labor costs, mostly clerical costs for managing the submission, review, editing, typesetting, and setup.
The marginal cost of printing and mailing an issue of a journal is on the order of $6. A special-purpose, nontechnical academic journal that publishes four issues per year with 10 articles each issue would have fixed costs of about $120,000. The variable costs of printing and mailing would be about $24 per year. Such a journal might have a subscriber list of about 600, which leads to a break-even price of $224.
Of course, many journals of this size are sold by for-profit firms and the actual prices may be much higher: subscription prices of $600 per year or more are not uncommon for journals of this nature.
If the variable costs of printing and shipping were eliminated, the break-even price would fall to $200. This simple calculation illustrates the following point: fixed costs dominate the production of academic journals; reduction in printing and distribution costs because of electronic distribution will have negligible effect on break-even prices.
Of course, if many new journals are produced and distributed electronically, the resulting competition may chip away at the $600 monopoly prices. But if these new journals use the same manuscript-handling processes, the $200 cost per subscription will remain the effective floor to journal prices.
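The break-even arithmetic above can be checked with a short sketch. The function name and parameters are illustrative only, and the $3,000 first-copy cost per article is an assumption: it is the midpoint of the $2,000 to $4,000 range quoted earlier and is the value implied by the $120,000 fixed-cost figure for 40 articles.

```python
def break_even_price(first_copy_cost, articles_per_year, issues_per_year,
                     marginal_cost_per_issue, subscribers):
    """Annual subscription price at which revenue just covers cost."""
    # Fixed costs do not depend on the number of subscribers.
    fixed_costs = first_copy_cost * articles_per_year
    # Variable cost of printing and mailing one subscriber's issues for a year.
    variable_cost = marginal_cost_per_issue * issues_per_year
    return fixed_costs / subscribers + variable_cost


# Four issues of 10 articles each, $6 per printed issue, 600 subscribers.
# $3,000 per article is an assumed midpoint, not a figure from the text.
print_price = break_even_price(3000, 40, 4, 6, 600)
electronic_price = break_even_price(3000, 40, 4, 0, 600)
print(print_price, electronic_price)
```

Setting the marginal cost to zero removes only the $24 of variable cost, which is why the floor stays at $200 as long as the first-copy costs are unchanged.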
Two other costs should be mentioned. First is the cost of archiving. Cooper estimates that the present value of the storage cost of a single issue of a journal to a typical library is between $25 and $40.
Another interesting figure is yearly cost per article read. This figure varies widely by field, but I can offer a few order-of-magnitude guesses. According to a chart in Lesk [1997, p. 218], 22% of scientific papers published in 1984 were not cited in the ensuing 10-year period. The figure rises to 48% for social science papers and a remarkable 93% for humanities papers!
Odlyzko estimates that the cost per reader of a mathematical article may be on the order of $200. By comparison, the director of a major medical library has told me that his policy is to cancel journals for which the cost per article read appears to be over $50.
It is not commonly appreciated that one of the major impacts of on-line publication is that use can be easily and precisely monitored. Will academic administrators really pay subscription rates implying costs per reading of several hundred dollars?
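Once reads are recorded, a cancellation rule like the one the medical library director describes is easy to express. The following is only a sketch: the journal names, prices, and read counts are invented for illustration, and only the $50 threshold comes from the text.

```python
def cost_per_read(subscription_price, reads_per_year):
    """Yearly subscription price divided by recorded article reads."""
    if reads_per_year <= 0:
        return float("inf")  # never read: cost per read is unbounded
    return subscription_price / reads_per_year


def cancellation_candidates(journals, threshold=50.0):
    """Names of journals whose cost per read exceeds the threshold."""
    return [name for name, (price, reads) in journals.items()
            if cost_per_read(price, reads) > threshold]


# Hypothetical usage data: name -> (yearly price in dollars, reads per year).
journals = {
    "Journal A": (600, 3),
    "Journal B": (600, 40),
    "Journal C": (224, 0),
}
print(cancellation_candidates(journals))
```

With these made-up numbers, Journal A costs $200 per read and Journal C is never read, so both exceed the threshold, while Journal B at $15 per read does not.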
Reengineering Journal Production
It seems clear that reduction in the costs of academic communication can only be achieved by reengineering the manuscript handling process. Here I use "reengineering" in both its original sense (rethinking the process) and its popular sense (reducing labor costs).
The current process of manuscript handling is not particularly mysterious. The American Economic Review works something like this. The author sends three paper copies of an article to the main office in Princeton. The editor assigns each manuscript to a coeditor based on the topic of the manuscript and the expertise of the coeditor. (The editor also reviews manuscripts in his own area of expertise.) The editor is assisted in these tasks by a staff of two to three FTE clerical workers.
The manuscripts arrive in the office of the coeditor, who assigns them to two or more reviewers. The coeditor is assisted in this task by a half-time clerical worker. After some nudging, the referees usually report back and the coeditor makes a decision about whether the article merits publication. At the AER, about 12% of the submitted articles are accepted.
Typically the author revises accepted articles for both content and form, and the article is again sent to the referees for further review. In most cases, the article is then accepted and sent to the main office for further processing. At the main office, the article is copyedited and further prepared for publication. It is then sent to be typeset. The proof sheets are sent to the author for checking. After corrections are made, the article is sent to the production facilities where it is printed, bound, and mailed.
Much of the cost in this process is in coordinating the communication: the author sends the paper to the editor, the editor sends it to the coeditor, the coeditor sends it to referees, and so on. These costs require postage and time, but most important, they require coordination. This role is played by the clerical assistants.
Universal use of electronic mail could undoubtedly save significant costs in this component of the publication process. The major enabling technologies are standards for document representation (e.g., Microsoft Word, PostScript, SGML) and multimedia e-mail.
Revelt sampled Internet working paper sites to determine what formats were being used. According to his survey, PostScript and PDF are the most popular formats for e-prints, with TeX being common in technical areas and HTML for nontechnical areas. It is likely that standardization on two to three formats would be adequate for most authors and readers. My personal recommendation would be to standardize on Adobe PDF since it is readily available, flexible, and inexpensive.
With respect to e-mail, the market seems to be rapidly converging to MIME as a standard for e-mail inclusion; I expect this convergence to be complete within a year or two.
These developments mean that the standards are essentially in place to move to electronic document management during the editorial and refereeing process. Obviously, new practices would have to be developed to ensure security and document integrity. Systems for time-stamping documents, such as Electronic Postmarks, are readily available; the main barrier to their adoption is the training necessary for their use.
Impact of Reengineering
If all articles were submitted and distributed electronically, I would guess that the costs of the editorial process would drop by about 50% due to the reduction in clerical labor costs, postage, photocopying, and so on. Such costs comprise about half the first-copy costs, so this savings would be noteworthy for small journals. (See Appendix A for the cost breakdown of a small mathematics journal.)
Once the manuscript was accepted for publication, it would still have to be copyedited and converted to a uniform style. In most academic publishing, copy editing is rather light, but there are exceptions. Conversion to a uniform style is still rather expensive because of the idiosyncrasies of authors' word processing systems and writing habits.
It is possible that journals could distribute electronic style sheets that would help authors achieve a uniform style, but experience thus far has not given great reason for optimism on this front. Journals that accept electronic submissions report significant costs in conversion to a uniform style.
One question that should be taken seriously is whether these conversion costs for uniform style are worth it. Typesetting costs are about $15 to $25 per page for
moderately technical material. Markup costs probably require two to three hours of a copyeditor's time. These figures mean that preparation costs for a 20-page article are on the order of $500. If a hundred people read the article, is the uniform style worth $5 apiece to them? Or, more to the point, if 10 people read the article, is the uniform style worth $50 apiece?
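The per-reader question can be made concrete with a few lines of arithmetic. This is a rough sketch: the function names are illustrative, and the $500 total is the order-of-magnitude figure from the text rather than a computed sum, since no hourly rate for the copyeditor is given.

```python
def typesetting_cost_range(pages, low_per_page=15, high_per_page=25):
    """Low and high typesetting cost for an article of the given length."""
    return pages * low_per_page, pages * high_per_page


def formatting_cost_per_reader(preparation_cost, readers):
    """Share of the preparation cost attributable to each reader."""
    return preparation_cost / readers


print(typesetting_cost_range(20))            # typesetting alone, 20 pages
print(formatting_cost_per_reader(500, 100))  # a hundred readers
print(formatting_cost_per_reader(500, 10))   # ten readers
```

Typesetting alone for 20 pages falls between $300 and $500 on these rates, which is consistent with a total on the order of $500 once markup time is included.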
The advent of desktop publishing dramatically reduced the cost of small-scale publication. But it is not obvious that the average quality of published documents went up. The earlier movement from hard type to digital typography had the same impact. As Knuth observes, digitally typeset documents cost less but had lower quality than did documents set manually.
My own guess about this benefit-cost trade-off is that the quality from professionally formatted documents isn't worth the cost for material that is only read by small numbers of individuals. The larger the audience, the more beneficial and cost-effective formatting becomes. I suggest a two-tiered approach: articles that are formatted by authors are published very inexpensively. Of these, the "classics" can be "reprinted" in professionally designed formats.
A further issue arises in some subjects. Author-formatted documents may be adequate for reading, but they are not adequate for archiving. It is very useful to be able to search and manipulate subcomponents of an article, such as abstracts and references. This archiving capability means that the article must be formatted in such a way that these subcomponents can be identified. Standard Generalized Markup Language (SGML) allows for such formatting, but it is rather unlikely that it could be implemented by most authors, at least using tools available today.
The benefits from structured markup are significant, but markup is also quite costly, so the benefit-cost trade-off is far from clear. I return to this point below.
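To illustrate what identifiable subcomponents buy, here is a minimal sketch of pulling the abstract and references out of a structurally marked-up article. It uses XML, a simplified descendant of SGML, only because Python's standard library can parse it; the element names (article, abstract, references, ref) and the sample content are hypothetical and do not follow any particular journal's document type.

```python
import xml.etree.ElementTree as ET

# A hypothetical article with its subcomponents explicitly tagged.
ARTICLE = """<article>
  <title>A Hypothetical Article</title>
  <abstract>One-paragraph summary of the argument.</abstract>
  <body>Full text of the article.</body>
  <references>
    <ref>First cited work</ref>
    <ref>Second cited work</ref>
  </references>
</article>"""


def extract_components(xml_text):
    """Return the abstract text and the list of reference strings."""
    root = ET.fromstring(xml_text)
    abstract = root.findtext("abstract", default="")
    refs = [ref.text for ref in root.findall("./references/ref")]
    return abstract, refs


abstract, refs = extract_components(ARTICLE)
print(abstract)
print(refs)
```

The same extraction from an author-formatted page image or word processor file would require guessing where the abstract ends and the references begin, which is the archiving problem described above.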
In summary, reengineering the manuscript-handling process by moving to electronic submission and review may save close to half of the first-copy costs of journal production. If we take the $2,000 first-copy costs per article as representative, we can move the first-copy costs to about $1,000. Shifting the formatting responsibility to authors would reduce quality, but would also save even more on first-copy costs. For journals with small readership, this trade-off may be worth it. Indeed, many humanities journals have moved to on-line publication for reasons of reduced cost.
Odlyzko estimates that the cost of Ginsparg's electronic preprint server is between $5 and $75 per paper. These papers are formatted entirely by the authors (mostly using TeX) and are not refereed. Creation and electronic distribution of scholarly work can be very inexpensive; you have to wonder whether the value added by traditional publishing practices is really worth it.
Up until now we have only considered the costs of preparing the manuscript for publication. If the material were subsequently distributed electronically, there would be further savings. We can classify these as follows:
• Shelf space savings to libraries. As we've seen, these savings could be on the order of $35 per volume in present value. However, electronic archiving is not free. Running a Web server or creating a CD is costly. Even more costly is updating the media. Books that are hundreds of years old can easily be read today. Floppy disks that are 10 years old may be unreadable because of obsolete storage media or formatting. Electronic archives will need to be backed up, transported to new media, and translated. All these activities are costly. (Of course, traditional libraries are also costly; the ARL estimates this cost to be on the order of $12,000 per faculty member per year. Electronic document archives will undoubtedly reduce many of the traditional library costs once they are fully implemented.)
• Monitoring. As mentioned above, it is much easier to monitor the use of electronic media. Since the primary point of the editorial and refereeing process is to economize on readers' attention, it should be very useful to have some feedback on whether articles are actually read. This feedback would help university administrators make more rational decisions about journal acquisition, faculty retention, and other critical resource allocation issues.
• Search. It is much easier to search electronic media. References can be immediately displayed using hyperlinks. Both forward and reverse bibliographic searches can be done using on-line materials, which should greatly aid literature analysis.
• Supporting materials. The incremental costs to storing longer documents are very small, so it is easy to include data sets, images, detailed analyses, simulations, and so on that can improve scientific communication.
Chickens and Eggs
The big issue facing those who want to publish an electronic journal is how to get the ball rolling. People will publish in electronic journals that have lots of readers; people will read electronic journals that contain lots of high-quality material.
This kind of chicken-and-egg problem is known in economics as a "network externalities" problem. We say that a good (such as an electronic journal) exhibits network externalities if an individual's value for the product depends on how many other people use it. Telephones, faxes, and e-mail all exhibit network externalities. Electronic journals exhibit a kind of indirect form of network externalities since the readers' value depends on how many authors publish in the journal and the number of authors who publish depends on how many readers the journal has.
There are several ways around this problem, most of which involve discounts for initial purchasers. You can give the journal away for a while, and eventually charge for it, as the Wall Street Journal has done. You can pay authors to publish, as the Bell Journal of Economics did when it started. It is important to realize that the
payment doesn't have to be a monetary one. A very attractive form of payment is to offer prizes for the best articles published each year in the journal. The prizes can offer a nominal amount of money, but the real value is being able to list such a prize on your curriculum vitae. In order to be credible, such prizes should be juried and promoted widely. This reward system may be an attractive way to overcome young authors' reluctance to publish in electronic journals.
When Everything is Electronic
Let us now speculate a bit about what will happen when all academic publication is electronic. I suggest that (1) publications will have more general form; (2) new filtering and refereeing mechanisms will be used; (3) archiving and standardization will remain a problem.
The fundamental problem with specialized academic communication is that it is specialized. Many academic publications have fewer than 100 readers. Despite these small numbers, the academic undertaking may still be worthwhile. Progress in academic research comes by dividing problems up into small pieces and investigating these pieces in depth. Painstaking examination of minute topics provides the building blocks for grand theories.
However, much can be said for the viewpoint that academic research may be excessively narrow. Rumor has it that a ghost named Pedro haunts the bell tower at Berkeley. The undergrads make offerings to Pedro at the Campanile on the evening before an exam. Pedro, it is said, was a graduate student in linguistics who wanted to write his thesis on Sanskrit. In fact, it was a thesis about one word in Sanskrit. And, it was not just one word, but in fact was on one of this word's forms in one of the particularly obscure declensions of Sanskrit. Alas, his thesis committee rejected Pedro's topic as "too broad."
The narrowness of academic publication, however, is not entirely due to the process of research, but is also due to the costs of publication. Editors encourage short articles, partly to save on publication costs but mostly to save on the attention costs of the readers. Physics Letters is widely read because the articles are required to be short. But one way that authors achieve the required brevity is to remove all "unnecessary" words, such as conjunctions, prepositions, and articles.
Electronic publication eliminates the physical costs of length, but not the attention costs. Brevity will still be a virtue for some readers; depth will be a virtue for others. Electronic publication allows for mass customization of articles, much like the famous inverted triangle in journalism: there can be a one-paragraph abstract, a one-page executive summary, a four-page overview, a 20-page article, and a 50-page appendix. User interfaces can be devised to read this "stretchtext."
Some of these textual components can be targeted toward generalists in a field,
some toward specialists. It is even possible that some components could be directed toward readers who are outside the academic specialty represented. Reaching a large audience would, presumably, provide some incentive for the time and trouble necessary to create such stretchtext documents.
This possibility for variable-depth documents that can have multiple representations is very exciting. Well-written articles could appeal both to specialists and to those outside the specialty. The curse of the small audience could be overcome if the full flexibility of electronic publication were exploited.
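One way to picture such a variable-depth document is as an ordered list of layers from which an interface picks the deepest one a reader has time for. The sketch below is purely illustrative: the layer lengths follow the example given earlier, except that the quarter-page length for the abstract is an assumption, and the selection rule is one simple possibility among many.

```python
# Layers of a hypothetical stretchtext article, shortest first: (name, pages).
# The 0.25-page length for the abstract is an assumption.
LAYERS = [
    ("abstract", 0.25),
    ("executive summary", 1),
    ("overview", 4),
    ("article", 20),
    ("appendix", 50),
]


def select_layer(page_budget):
    """Return the name of the longest layer that fits within page_budget."""
    chosen = LAYERS[0][0]  # always offer at least the abstract
    for name, pages in LAYERS:
        if pages <= page_budget:
            chosen = name
    return chosen


print(select_layer(5))    # a reader willing to read about five pages
print(select_layer(0.1))  # a reader who only wants a glance
```

A generalist with a five-page budget would be shown the overview, while a specialist with more time would be taken to the full article or appendix.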
As I noted earlier, one of the critical functions of the academic publishing system is to filter. Work cannot be cumulative unless authors have some faith that prior literature is accurate. Peer review helps ensure that work meets appropriate standards for publication.
There is a recognized pecking order among journals, with high-quality journals in each discipline having a reputation for being more selective than others. This pecking order helps researchers focus their attention on areas that are thought by their profession to be particularly important.
In the last 25 years many new journals have been introduced, with the majority coming from the private sector. Nowadays almost anything can be published somewhere; the only issue is where. Publication itself conveys little information about quality.
Many new journals are published by for-profit publishers. They make money by selling journal subscriptions, which generally means publishing more articles. But the value of peer review comes in being selective, a value almost diametrically opposed to increasing the output of published articles.
I mentioned above that one of the significant implications of electronic publication was that monitoring costs are much lower. It will be possible to tell with some certainty what is being read. This monitoring will allow for more accurate benefit-cost comparisons with respect to purchase decisions. But perhaps even more significantly, it will allow for better evaluation of the significance of academic research.
Citation counts are often used as a measure of the impact of articles and journals. Studies in economics [Laband and Piette 1994] indicate that most of the citations are to articles published in a few journals. More articles are being published, a smaller fraction of which are read [de Sola Pool 1983]. It is not clear that the filtering function of peer review is working appropriately in the current environment.
Academic hiring and promotion policies contribute an additional complication. Researchers choose narrower specialties, making it more difficult to judge achievement locally. Outside letters of evaluation have become worthless because of the lack of guarantees of privacy. All that is left is the publication record and the quantity of publication, whose merits are easier to convey to nonexperts than quality of publication.
The result is that young academics are encouraged to publish as much as possible in their first five to six years. Accurate measures of the impact of young researchers' work, such as citation counts, cannot be accumulated in this short a time period. One reform that would probably help matters significantly would be to put an upper limit on the number of papers submitted as part of tenure review. Rather than submitting everything published in the last six years, assistant professors could submit only their five best articles. This reform would, I suggest, lead to higher quality work and higher quality decisions on the part of review boards.
Dimensions of Filtering
If we currently suffer from a glut of information, electronic publication will only make matters worse. Reduced cost of publication and dissemination is likely to make more and more material available. This proliferation isn't necessarily bad; it simply means that the filtering tools will have to be improved.
I would argue that journals filter papers on two dimensions: interest and correctness. The first thing a referee should ask is, "is this interesting?" If the paper is interesting, the next question should be, "is this correct?" Interest is relatively easy to judge; correctness is substantially more difficult. But there isn't much value in determining correctness if interest is lacking.
When publication was a costly activity, it was appropriate to evaluate papers prior to publication. Ideally, only interesting and correct manuscripts would undergo the expensive transformation of publication. Furthermore, publication is a binary signal: either a manuscript is published or not.
Electronic publication is cheap. Essentially everything should be published, in the sense of being made available for downloading. The filtering process will take place ex post, so as to help users determine which articles are worth downloading and reading. As indicated above, the existing peer review system could simply be translated to this new medium. But the electronic media offer possibilities not easily accomplished in print media. Other models of filtering may be more effective and efficient.
A Model for Electronic Publication
Allow me to sketch one such model for electronic publishing that is based on some of the considerations above. Obviously it is only one model; many models should and will be tried. However, I think that the model I suggest has some interesting features.
First, the journal assembles a board of editors. The function of the board is not just to provide a list of luminaries to grace the front cover of the journal; they will actually have to do some work.
Authors submit (electronic) papers to the journal. These papers have three parts: a one-paragraph abstract, a five-page summary, and a 20- to 30-page conventional paper. The abstract is a standard part of academic papers and needs no further discussion. The summary is modeled after the Papers and Proceedings issue of the American Economic Review: it should describe what question the author addresses, what methods were used to answer the question, and what the author found. The summary should be aimed at as broad an audience as possible. This summary would then be linked to the supporting evidence: mathematical proofs, econometric analysis, data sets, simulations, and so on. The supporting evidence could be quite technical and would probably end up being similar in structure to current published papers.
Initially, I imagine that authors would write a traditional paper and pull out parts of the introduction and conclusion to construct the summary section. This method would be fine to get started, although I hope that the structure would evolve beyond this.
The submitted materials will be read by two to three members of the editorial board who will rate them with respect to how interesting they are. The editors will be required only to evaluate the five-page summary and will not necessarily be responsible for evaluating the correctness of the entire article. The editors will use a common curve; e.g., no more than 10% of the articles get the highest score. The editorial score will be attached to the paper and be made available on the server. Editors will be anonymous; only the score will be made public.
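The "common curve" constraint can be illustrated with a small check. This is only a sketch under stated assumptions: the function name exceeds_curve, the 1-5 scale with 5 as the top score, and the parameter names are hypothetical choices made for illustration; the only figure taken from the text is the 10% cap on the highest score.

```python
def exceeds_curve(scores, top_score=5, max_top_percent=10):
    """Return True if more than max_top_percent of the scores equal top_score.

    scores is a list of integer editorial ratings, one per paper.
    The names and the 1-5 scale are illustrative assumptions.
    """
    if not scores:
        return False
    top_count = sum(1 for s in scores if s == top_score)
    # Integer comparison avoids any floating-point rounding at the boundary.
    return top_count * 100 > max_top_percent * len(scores)


if __name__ == "__main__":
    within = [5, 4, 3, 3, 2, 4, 1, 3, 2, 4]   # 1 of 10 papers at the top score
    over = [5, 5, 3, 3, 2, 4, 1, 3, 2, 4]     # 2 of 10 papers at the top score
    print(exceeds_curve(within), exceeds_curve(over))
```

A journal could run a check of this kind over each editor's scores, or over the pooled scores, before they are posted; the text does not say which, so that choice is left open here.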
Note that all papers will be accepted; the current rating system of "publish or not" is replaced by a scale of (say) 1-5. Authors will be notified of the rating they received from the editors, and they can withdraw the paper at this point if they choose to do so. However, once they agree that their paper be posted, it cannot be withdrawn (unless it is published elsewhere), although new versions of it can be posted and linked to the old one.
Subscribers to the journal can search all parts of the on-line papers. They can also ask to be notified by e-mail of all papers that receive scores higher than some threshold or that contain certain keywords. When subscribers read a paper, they also score it with respect to its interest, and summary statistics of these scores are also (anonymously) attached to the paper.
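One way to picture the notification and reader-scoring rules is the following sketch. Everything named here (the Paper class, its fields, and the should_notify function) is a hypothetical illustration rather than part of any existing system; it assumes the 1-5 editorial scale mentioned earlier and treats "higher than some threshold" as a strict comparison.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Paper:
    title: str
    keywords: set
    editor_score: int                                   # editorial rating, 1-5
    reader_scores: list = field(default_factory=list)   # individual scores stay private

    def reader_summary(self):
        """Summary statistics that could be attached to the paper anonymously."""
        if not self.reader_scores:
            return {"count": 0, "mean": None}
        return {"count": len(self.reader_scores), "mean": mean(self.reader_scores)}


def should_notify(paper, min_score=None, keywords=()):
    """True if the editor score is above min_score or any keyword matches."""
    above = min_score is not None and paper.editor_score > min_score
    matched = bool(paper.keywords & set(keywords))
    return above or matched


if __name__ == "__main__":
    paper = Paper("Pricing information goods", {"pricing", "bundling"}, 4)
    paper.reader_scores.extend([3, 5, 4])
    print(should_notify(paper, min_score=3))
    print(should_notify(paper, keywords={"auctions"}))
    print(paper.reader_summary())
```

Only the count and mean are exposed by reader_summary, which matches the idea that individual readers' scores remain anonymous while the aggregate travels with the paper.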
Since all evaluations are available on-line, it would be possible to use them in quite creative ways. For example, I might be interested in seeing the ratings of all readers with whom my own judgments are closely correlated (see Konstan et al. [1997] for an elaboration of this scheme). Or I might be interested in seeing all papers that were highly rated by Fellows of the Econometric Society or the Economic History Society.
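The idea of consulting readers whose judgments correlate with one's own can be sketched as follows. This is a simplified illustration, not the GroupLens algorithm itself: it uses a plain Pearson correlation over the papers two readers have both scored, and the function names, the 0.5 cutoff, and the sample ratings are all assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two readers' scores over the papers both rated.

    a and b map paper identifiers to scores. Returns 0.0 when there is too
    little overlap or no variation to measure agreement.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[p] for p in shared]
    ys = [b[p] for p in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def like_minded(me, others, min_corr=0.5):
    """Readers whose correlation with my scores is at least min_corr."""
    result = {}
    for name, scores in others.items():
        r = pearson(me, scores)
        if r >= min_corr:
            result[name] = r
    return result


def peer_scores(paper_id, me, others, min_corr=0.5):
    """Scores given to paper_id by like-minded readers who have rated it."""
    peers = like_minded(me, others, min_corr)
    return {n: others[n][paper_id] for n in peers if paper_id in others[n]}


if __name__ == "__main__":
    me = {"p1": 5, "p2": 1, "p3": 4}
    others = {
        "ann": {"p1": 4, "p2": 2, "p3": 5, "p4": 5},
        "bob": {"p1": 1, "p2": 5, "p3": 2, "p4": 1},
    }
    print(peer_scores("p4", me, others))
```

In the sample data one reader tends to agree with "me" and the other tends to disagree, so only the first reader's score for the unread paper p4 is returned.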
This sort of "social recommender" system will help people focus their attention on research that their peers, whoever they may be, find interesting. Papers that are deemed interesting can then be evaluated with respect to their correctness.
Authors can submit papers that comment on or extend previous work. When they do so, they submit a paper in the ordinary way with links to the paper in question as well as to other papers in this general area. This discussion of a topic forms a thread that can be traversed using standard software tools. See Harnad for more on this topic.
Papers that are widely read and commented on will certainly be evaluated carefully for their correctness. Papers that aren't read may not be correct, but that presumably has low social cost. The length of the thread attached to a paper indicates how many people have (carefully) read it. If many people have read the paper and found it correct, a researcher may have some faith that the results satisfy conventional standards for scientific accuracy.
This model is unlike the conventional publishing model, but it addresses many of the same design considerations. The primary components are as follows:
• Articles have varying depths, which allows them to appeal to a broad audience as well as satisfy specialists.
• Articles are rated first with respect to interest by a board of editors. Articles that are deemed highly interesting are then evaluated with respect to correctness.
• Readers can contribute to the evaluation process.
• The unit of academic discourse becomes a thread of discussion. Interesting articles that are closely read and evaluated can be assumed to be correct and therefore serve as bases for future work.
Appendix A: Cost of a Small Math Journal
The production costs of the Pacific Journal of Mathematics have been examined by Kirby [1997]. This journal publishes 10 issues of about 200 pages each per year. A summary of its yearly costs is given in Table 25.1.
The PJM charges $275 per subscription and has about 1,000 subscribers. The journal also prints about 500 additional copies per year, which go to the sponsoring institutions in lieu of rent, secretarial support, office equipment, and so on.
The first-copy costs per page are about $64, while the variable cost per page printed and distributed is about 3.5 cents. The average article in this journal is about 20 pages long, which makes the first-copy cost per article about $1,280, somewhat smaller than the $2,000 figure in Tenopir and King [1996]. However, the PJM does not pay for space and for part of its secretarial support; adding in these costs would reduce the difference. The cost of printing and distributing a 200-page issue is about $7 per issue, consistent with the figure used in this paper.
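The arithmetic behind these figures can be checked directly. The sketch below uses only the numbers quoted in this appendix; the variable names are illustrative, and Decimal is used so that the 3.5-cent figure is not distorted by binary floating-point rounding.

```python
from decimal import Decimal

# Figures quoted in the appendix (all approximate in the source).
FIRST_COPY_COST_PER_PAGE = Decimal("64")      # dollars per page
VARIABLE_COST_PER_PAGE = Decimal("0.035")     # dollars per page printed and distributed
PAGES_PER_ARTICLE = 20
PAGES_PER_ISSUE = 200
TENOPIR_KING_FIRST_COPY = Decimal("2000")     # dollars per article, for comparison

# First-copy cost of an average article: 20 pages at $64 per page.
first_copy_per_article = FIRST_COPY_COST_PER_PAGE * PAGES_PER_ARTICLE

# Gap between the comparison figure and the PJM figure.
gap = TENOPIR_KING_FIRST_COPY - first_copy_per_article

# Printing and distribution cost of one 200-page issue at 3.5 cents per page.
print_cost_per_issue = VARIABLE_COST_PER_PAGE * PAGES_PER_ISSUE

if __name__ == "__main__":
    print("First-copy cost per article:", first_copy_per_article)
    print("Gap to the $2,000 figure:", gap)
    print("Printing and distribution per issue:", print_cost_per_issue)
```

The $720 gap is what the unpaid space and secretarial support would have to account for, at least in part, to bring the two estimates together.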
Research support from NSF grant SBR-9320481 is gratefully acknowledged.
Michael Cooper. A cost comparison of alternative book storage strategies. Library Quarterly, 59(3), 1989.
Ithiel de Sola Pool. Tracking the flow of information. Science, 221(4611):609-613, 1983.
Paul Ginsparg. Winners and losers in the global research village. Technical report, Los Alamos, 1996. http://xxx.lanl.gov/blurb/pg96unesco.html.
Stevan Harnad. The paper house of cards (and why it's taking so long to collapse). Ariadne, 8, 1997. http://www.ariadne.ac.uk/issue8/harnad/.
Stevan Harnad. The post-Gutenberg galaxy: How to get there from here. Times Higher Education Supplement, 1995. http://cogsci.ecs.soton.ac.uk:80/~harnad/THES/thes.html.
Rob Kirby. Comparative prices of math journals. Technical report, UC Berkeley, 1997. http://math.berkeley.edu/~kirby/journals.html.
Donald Knuth. TeX and Metafont: New Directions in Typesetting. American Mathematical Society, Providence, R.I., 1979.
Joseph A. Konstan, Bradley N. Miller, David Maltz, Jonathan L. Herlocker, Lee R. Gordon, and John Riedl. Grouplens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.
David N. Laband and Michael J. Piette. The relative impact of economics journals: 1970-1990. Journal of Economic Literature, 32(2):640-66, 1994.
Michael Lesk. Books, Bytes, and Bucks: Practical Digital Libraries. Morgan Kaufmann, San Francisco, 1997.
Andrew Odlyzko. The economics of electronic journals. Technical report, AT&T Labs, 1997.
David Revelt. Electronic working paper standards. Technical report, UC Berkeley, 1996. http://alfred.sims.berkeley.edu/working-paper-standards.html.
Carol Tenopir and Donald W. King. Trends in scientific scholarly journal publishing. Technical report, School of Information Sciences, University of Tennessee, Knoxville, 1996.