Preferred Citation: Ekman, Richard, and Richard E. Quandt, editors. Technology and Scholarly Communication. Berkeley, Calif.: University of California Press, published in association with The Andrew W. Mellon Foundation, 1999. http://ark.cdlib.org/ark:/13030/ft5w10074r/



Technology and Scholarly Communication

Edited By
Richard Ekman and Richard E. Quandt

UNIVERSITY OF CALIFORNIA PRESS
Berkeley · Los Angeles · Oxford
© 1999 The Regents of the University of California



PREFACE

The Andrew W. Mellon Foundation has a long-standing interest in the vitality of both research libraries and scholarly publishing. In the 1970s and early 1980s, the Foundation made large grants for the cataloging of collections and for the general support of the leading independent research libraries and university libraries. The Foundation also offered assistance to university presses. In the late 1980s, escalating operating costs at research libraries, especially for acquisitions, prompted a detailed empirical study of trends in both materials acquired and expenditures. The resulting work, University Libraries and Scholarly Communication,[1] demonstrated that research libraries were spending more of their funds on acquisitions, but were buying a smaller share of what was available or what might be regarded as desirable to purchase by any reasonable standard. The situation seemed inherently unstable, and changes were inevitable.

At the time our report appeared, there was also reason to hope that solutions to at least some of these problems might be found through thoughtful utilization of the then-new information technologies. There was, however, only very limited experience with the application of these technologies to the scholarly communication process-from the electronic publication of works of scholarship, to ways of organizing and cataloging materials, to the provision of electronic access to the source materials for doing scholarship. We therefore decided that the Foundation might be able to make a significant contribution by supporting a variety of natural experiments in different fields of study using diverse formats-including the electronic equivalents of books, journals, manuscripts, sound recordings, photographs, and working papers. This initiative was launched in 1994, and to date the Foundation has made 30 grants totaling $12.8 million in support of projects that attempt to evaluate the effects on actual patterns of scholarly use and measurable costs when electronic approaches to scholarly communication are introduced.



Selection of these projects has been guided by Richard Ekman, secretary of the Foundation and a senior program officer, and Richard E. Quandt, Hughes-Rogers Professor of Economics Emeritus at Princeton University and senior advisor to the Foundation.[2] Ekman and Quandt have worked closely with the directors of these projects since their inception and, late in 1996, concluded that the time had come for the first exchange of reports on results achieved through projects funded by the earliest grants. Accordingly, they organized a conference, graciously hosted by Emory University in Atlanta in April 1997, at which some two dozen papers were presented (most of which were authored by the directors of these early projects). Some 60 other individuals participated in the conference, including librarians, publishers, and leaders in the field of information technology. Sessions were organized under the following headings: "Economics of Electronic Publishing-Cost Issues"; "The Evolution of Journals"; "Journal Pricing and User Acceptance"; "Patterns of Usage"; "Technical Choices and Standards"; "Licenses, Copyright, and Fair Use"; "Multi-Institutional Cooperation"; and "Sustaining Change."

The papers in this volume constitute most of those that were presented at the Atlanta conference. None of the conclusions put forth in these papers is definitive-it is too early for that. But they do demonstrate that a substantial amount of learning has already taken place about how to operate more effectively in light of changed economic and technological circumstances; how to set prices for scholarly products in order to make them sustainable in the long run; and how to make these new electronic resources as attractive and useful to students and scholars as traditional books and journals have been. In addition to reporting on "where we have been," these papers offer insights into the potential of information technology and well-informed statements about the future of the publishing and library worlds. The Foundation intends to follow these projects for some years to come and also to learn from the results of additional projects that have been approved more recently and could not be included in the April 1997 conference.

The papers in this volume are best regarded, then, as contributions to the opening of areas of inquiry rather than as fixed judgments about fields that are continuing to change very rapidly. It is a pleasure to applaud the work of the authors of these papers and also to congratulate Richard Ekman and Richard Quandt on the excellent work they have done in organizing and stimulating new thinking in the broad field of scholarly communication. The most daunting challenge, in my view, is how to improve scholarship, learning, and teaching while simultaneously reducing the costs of libraries and other institutions that will continue to work under intense budgetary pressures. The papers in this extremely useful volume will serve their purpose if they provoke as well as inform.



WILLIAM G. BOWEN



INTRODUCTION:
ELECTRONIC PUBLISHING, DIGITAL LIBRARIES, AND THE SCHOLARLY ENVIRONMENT

Richard E. Quandt and Richard Ekman

Introduction

By now it is commonplace to observe that the economic position of research libraries has been deteriorating for at least 20 years and that derivative pressures have been experienced by publishers of scholarly monographs. The basic facts have been discussed in Cummings et al. (1992), and the facts and their interpretations have been analyzed in countless articles[1]-ground that we do not need to cover from the beginning. Contemporaneously with these unfavorable changes, an explosive growth has occurred in information technology: processing speeds of computers have doubled perhaps every 18 months, hard-disk storage capacities have grown from 10 Mbytes for the first IBM XTs to 6 to 8 Gbytes, and networks have grown in speed, capacity, and pervasiveness in equal measure. In chapter 21 of this volume, Michael Lesk shows that the number of Internet hosts has grown 10-fold in a four-year period. Parallel with the hardware changes has come the extraordinary development of software. The Web is now a nearly seamless environment about which the principal complaint may be that we are being inundated with too much information, scholarly and otherwise.

Some five years ago, more or less isolated and enthusiastic scholars started to make scholarly information available on the growing electronic networks, and it appeared that they were doing so at very low cost (per item of information). A few electronic journals started to appear; the hope was voiced in some quarters that modern information technology would supplant the traditional print-based forms of scholarly communication and do so at a substantially lower cost. The Association of Research Libraries, compiler of the Directory of Electronic Scholarly Journals, Newsletters, and Academic Discussion Lists since 1991, reports that there are currently approximately 4,000 refereed electronic journals.

The Andrew W. Mellon Foundation, which has had a long-standing commitment to support research libraries in their mission, announced a new initiative in 1994 with the dual objective of (1) supporting electronic and digital publishing and library projects that would make significant contributions to assisting scholarly communication, and (2) supporting them in a manner that would, at the same time, permit detailed and searching studies of the economics of these projects. The objective of the projects was not so much the creation of new hardware or software, but the thoughtful application of existing hardware and software to problems of scholarly communication-in Joseph Schumpeter's terms, the emphasis was to be more on "innovation" than on "invention." The Foundation also planned to diversify its portfolio of projects along functional lines-that is, to work with publishers as well as libraries; to deal with journals as well as monographs, reference works, and multimedia approaches; and to support liberal arts colleges as well as research universities. All grantees were required to include in their proposals a section that outlined the methodology to be used in the project to track the evolution of developmental and capital costs as well as continuing costs (the supply side of the equation) and to measure the usage of any new product created, preferably under varying pricing scenarios (the demand side of the equation). Out of these efforts, it was hoped, the outlines of a "business plan" would emerge from which one could ultimately judge the long-term viability of the product created by the project and examine whether it did, indeed, save libraries money in comparison with the conventional print-based delivery mechanism for an analogous product (Ekman and Quandt 1994).

The papers in the present volume represent, for the most part, the findings and analyses that have emerged from the first phase of the Foundation's grant making in this area. They were all presented and discussed at a conference held under the auspices of the Foundation at the Emory University Conference Center in Atlanta, Georgia, on April 24-25, 1997.[2] They fall roughly into five categories: (1) papers that deal with important technical or methodological issues, such as techniques of digitizing, markup languages, or copyright; (2) papers that attempt to analyze what has, in fact, happened in particular experiments to use electronic publishing of various materials; (3) papers that deal specifically with the patterns of use and questions of productivity and long-term viability of electronic journals or books; (4) papers that consider models of how electronic publishing could be organized in the future; and (5) papers that deal with broader or more speculative approaches.

The purpose of this introductory essay is not to summarize each paper. Although we will refer to individual papers in the course of discussion, we would like to raise questions or comment on issues emerging from the papers in the hope of stimulating others to seek answers in the coming months and years.

Information Technology and the Productivity Puzzle

The argument in favor of the wholesale adoption of the new information technology (IT) in universities, publishing houses, libraries, and scholarly communication rests on the hope-indeed the dogma-that IT will substantially raise productivity. It behooves us to take a step back to discuss briefly the general relationship between IT, on the one hand, and economic growth and productivity increases on the other.[3]

There seems to be solid agreement among experts that the trend growth rate of real GDP in the United States has been between 2.0 and 2.5% per annum during the past 20 years, with 2.2% being perhaps the best point estimate. About 1.1% of this increase is accounted for by the growth in the labor force, leaving 1.1% for annual productivity growth. This figure is very unremarkable in light of the miracles that are supposed to have occurred in the past 20 years in IT. Technology, communications, and information gathering have grown tremendously: for example, some steel factories now have hardly any workers in them, and banking is done by computers. Yet the productivity figures do not seem to reflect these savings. Admittedly, there are measurement problems: to the extent that we overestimate the rate of inflation (and there is some evidence that we do), we also underestimate the rate of growth of GDP and productivity. It may also be true that our measurement of inflation and hence productivity does not correctly measure the quality improvements caused by IT: perhaps an argument in support of that view is that the worst productivity performance is seen to be in industries in which measurement of output is chronically very difficult (such as in financial intermediaries). But it is difficult to escape the conclusion that IT has not delivered what the hype surrounding it has claimed.

What can we say about the effects of IT in universities, libraries, and publishing houses, or in teaching, research, and administration? Productivity increases are clearly a sine qua non for improvement in the economic situation of universities and libraries, but labor productivity increases are not enough. If every worker in a university produces a greater output than before and the university chooses to produce the greater output without diminishing the number of workers employed (who now need more and more expensive equipment to do their work), its economic situation will not improve. As a minimum, we must secure increases in "total factor productivity"; that is, real output per real composite input ought to rise. But will this improvement be forthcoming? And how do we measure labor or total factor productivity in an institution with highly varied products and inputs, most of which cannot be measured routinely by the piece or by weight or volume? What is the "output contribution" of students being able to write papers with left and right margins aligned and without-thanks to spell checkers-too many spelling errors? What is the output contribution (that is, the contribution to producing "truth" in particle physics) of Ginsparg's preprint server in Los Alamos?

Scott Bennett's important contribution to this volume (chapter 4) gives a specific example of how one might tackle the question of the effect of IT on productivity in teaching and shows that the answer depends very much on the time horizon one has in mind. This conclusion is important (and extremely reasonable). The investments that are necessary to introduce IT in teaching need to be amortized, and that takes time. One need look only as far as the eighteenth and nineteenth centuries to recognize that the inventions that fueled the Industrial Revolution did not achieve their full effects overnight; on the contrary, it took many decades for the steam engine, railroads, and later, electricity to diffuse throughout the economy. Hence, even the most productive IT breakthroughs in an isolated course will not show up in overall university productivity figures: the total investment in IT is too small a fraction of aggregate capital to make much difference.[4]

In the papers by Malcolm Getz (chapter 6) and Robert Shirrell (chapter 10), we have examples of the potential impact on research productivity. Getz illustrates why libraries will prefer to buy large packages of electronic journals, and Shirrell stresses that productivity is likely to be higher (and costs lower) over longer horizons. By coincidence, both authors happened to choose as their specific illustration the American Economic Review.

Bennett's, Getz's, and Shirrell's papers, among others, suggest that much could be learned by studying academic productivity more systematically and in more detail. We need to study particular examples of innovation in teaching and to analyze the productivity changes that occur over suitably long horizons and with full awareness that a complete understanding of productivity in teaching must cope with the problem of how to measure whether students have learned faster or better or more. We also need to pay more explicit attention to research productivity, mindful of the possibility that research productivity may mean different things in different disciplines.

But these considerations raise particularly murky questions. When journals are electronic and access to information is much faster and requires less effort, do scholars in the sciences and social sciences write more papers and do humanists write more books? Or do they write the same number of articles and books, but these writings are better than they would have been without the IT aids? What measures do we have for scholarly productivity? Obviously, every self-respecting tenure-and-promotion committee will shudder at the thought that productivity is appropriately measured by the quantity of publications; but how do we measure quality? And what is the relationship between the quality of access to information and the quality of ideas? While we agree with Hal Varian's view (chapter 25) that journals tend to have an agreed-upon pecking order and we find his suggestions for a new model of electronic publishing fascinating and promising, we still believe that in the short run, much could be learned by studying the impact of particular IT advances on scholarly productivity in specific fields.

It is possible that in the short run our views about productivity enhancements from IT in universities, libraries, and publishing houses must be expressions of faith. But unlike previous eras, when inventions and innovations did not always lead to self-conscious and subsequently documented examinations of the productivity effects, we have remarkable opportunities to measure the productivity effects in the discrete applications of IT by universities, libraries, and scholarly presses, and thus provide earlier feedback on the innovation process than would otherwise occur.



Measuring Demand and Supply: The Foundations for Pricing Strategies and Survival

It may be too facile a generalization to say that the early, "heroic" period of electronic library products was characterized by enormous enthusiasm on the part of their creators and not much concern about costs, usage, and business plans. But the early history of electronic publishing is filled with examples of devoted academics giving freely of their time and pursuing their dreams in "borrowed" physical space and with purloined machine cycles on computers that were originally obtained for other purposes. An instructive (and amusing) example of this phenomenon can be found in the papers by Richard Hamilton (chapter 12) and James J. O'Donnell (chapter 24), which describe the early days of the Bryn Mawr Reviews and how the editors improvised to provide space, hardware, and labor for this effort. Creating electronic library products seemed to be incredibly easy, and it was.

But as we gradually learned what was technically possible,[5] started to learn what users might like or demand, and realized the scope of the efforts that might be involved in, say, digitizing large bodies of materials, it was unavoidable that sooner or later even not-for-profit efforts would be informed by the realities of the marketplace. For example, if we create an electronic counterpart to an existing print-based journal, should the electronic counterpart look identical to the original? What search capabilities should there be? Should the corpus of the electronic material be added to in the future? What are the staffing requirements of creating and maintaining an electronic publication? (See, for example, Willis G. Regier [chapter 9].) Marketplace realities were further compounded by the recognition that commercial publishers were looking to enter the field of electronic publication. In their case the question of pricing had to be explicitly considered, as is amply illustrated in the paper by Karen Hunter (chapter 8).

Of course, pricing cannot be considered in the abstract, and the "proper" pricing strategy will generally depend on (1) the objectives to be accomplished by a pricing policy, (2) costs, and (3) demand for the product. While it is much too early in the development of electronic information products to propose anything beyond casual answers, it is not too early to consider the dimensions of these problems. We shall briefly discuss each of these three key elements on which pricing depends.

Objectives to Be Accomplished

The important fact is that there are numerous agents in the chain from the original creator of intellectual property to the ultimate user. And the creator-the author-may himself have divided interests. On the one hand, he may want to have the largest conceivable circulation of the work in question in order to spread his academic reputation. On the other hand-and this point is characteristically relevant only for books-he may want to maximize royalty income. Or, indeed, the author may have a compromise solution in mind in which both royalty income and circulation get some weight.

Next in line comes the publisher who is well aware that he is selling a differentiated product that confers upon him some monopoly power: the demand curve for such products is downward sloping and raising the price will diminish the quantity demanded.[6] The market motivations of commercial and not-for-profit publishers may not be very different, but in practice, commercial publishers appear to charge higher prices. Since print-based materials are difficult to resell (or copy in their entirety), publishers are able to practice price discrimination-that is, sell the identical product at different prices to different customers, as in the case of journal subscriptions, which are frequently priced at a higher level for libraries than for individuals.[7] The naive view might be that a commercial publisher would charge a price to maximize short-term profit. But the example of Elsevier, particularly in its TULIP Project, suggests that the picture is much more complicated than that. While Elsevier's design of TULIP may not be compatible with long-run profit maximization, the correct interpretation of that project is still open to question.

Scholars and students want access to scholarly materials that is broad and inexpensive to them, although they do not much care whether their universities bear a large cost in acquiring these materials. On the other hand, academic administrators want to contain costs,[8] perhaps even at the risk of reducing the flow of scholarly information, but also have a stake in preserving certain aspects of the journal and book production process (such as refereeing) as a way of maintaining their ability to judge academic excellence, even if this approach adds to the cost of library materials. The libraries, on the other hand, would like to provide as large a flow of information to their clients as possible and might seek the best combination of different library materials to accomplish this objective.

While none of us can clearly foresee how the actual prices for various types of electronic library products will evolve, there are two general ways in which electronic library products can be defined and two general ways in which they can be priced. Either the product itself can be an individual product (for example, a given journal, such as The Chicago Journal of Theoretical Computer Science, or a particular monograph, or even an individual paper or chapter), or it can be a bundle of journals or monographs with the understanding that the purchaser in this latter case buys the entire bundle or nothing. If the product is a bundle, a further question is whether the items bundled are essentially similar (that is, good substitutes for one another), as would be the case if one bundled into a single product 20 economics journals; whether the items bundled are sufficiently dissimilar so that they would not be good substitutes for one another, such as Project MUSE; or whether the bundle is a "cluster of clusters" in the sense that it contains several subsets that have the characteristic of high substitutability within but low substitutability across subsets, as is the case in JSTOR.[9]

With regard to pricing, the vendor may offer site licenses for the product (however defined in light of the discussion above), which provide the purchaser with very substantial rights of downloading, printing, and so on, or charge the user each time the user accesses the product (known as "charging by the drink"). The principal difference here is not in who ultimately bears the cost, since even in the latter case universities may cover the tabs run up by their members. Two contrasting approaches, JSTOR and Project MUSE, are described in the papers by Kevin M. Guthrie (chapter 7) and Regier, respectively. In the case of JSTOR, both initial fees and annual maintenance charges vary by the size of the subscribing institution. Project MUSE, in addition, offers-not unproblematic-discounts for groups of institutions joined in consortia.

Costs

Several papers in this volume discuss the issue of costs. The complexity of the cost issues is staggering. Everybody agrees that journals as well as monographs have first-copy costs, which much resemble what the economist calls fixed costs, and variable costs. Printing, binding, and mailing are fairly sizable portions of total costs (23% for the American Economic Review if we ignore the fixed component and more like 36% if we include it), and it is tempting to hope that electronic publications will completely avoid these costs. (It is particularly bothersome that the marginal cost of producing an additional unit of an electronic product is [nearly] zero; hence a competitive pricing strategy would prescribe an optimal price of zero, at which, however, the vendor cannot make ends meet.) While it is true that publishers may avoid these particular costs, they clearly incur others, such as hardware, which periodically needs to be replaced, and digitizing or markup costs. Thus, estimates by Project MUSE, for example, are that to provide both the print-based and the electronic copies of journals costs 130% of the print-based publication by itself, whereas for Immunology Today, the combined price is set at 125% of the print version (see chapter 8). But these figures just underscore how much in the process is truly variable or adjustable: one could, presumably, save on editorial costs by requiring authors to submit papers ready to be placed on the Web (to be sure, with some risk of deteriorating visual quality).[10]

Most important, the cost implications of electronic publication are not limited to the costs of actually producing the product. Suddenly, the costs incurred by other entities are also affected. First is the library. Traditionally the library has borne costs as a result of providing access to scholarly information: the book or journal has to be ordered, it has to be cataloged, sometimes bound (and even rebound), shelved and reshelved, circulated, and so on. But electronic products, while they may offer some savings, also bring new costs. Libraries, for example, now have to provide workstations at which users can access the relevant materials; they must devote resources to archiving electronic materials or to providing help desks for the uninitiated. The university's computer center may also get involved in the process. But equally important, the costs to a user may also depend on the specific type of electronic product. Meanwhile, to the extent that a professor no longer has to walk to the library to consult a book, a benefit is conferred that has the effect of a de facto cost reduction. But let us agree that university administrators may not care much about costs that do not get translated into actual dollars and cents that have to be disbursed (as Bennett points out). Nevertheless, there may well be actual costs that can be reduced. For example, a digital library of rare materials may obviate the need for a professor to undertake expensive research trips to distant libraries (which we may therefore call avoided costs). This factor may represent a saving to the university if it normally finances such trips or may be a saving to the National Science Foundation or the National Endowment for the Humanities if they were to end up paying the tab. The main point is that certain costs that used to be deemed external to the library now become internal to a broader system, and the costs of the provision of information resources must be regarded, as a minimum, on a university-wide basis. Hence these costs belong not only in the librarian's office (whose occupant would not normally care about the costs of professors' research trips) but in the provost's office as well.

Finally, we should note that many types of electronic products have up-front development costs that, given the current state of the market for such products, may not be recouped in the short run. (See, for example, Janet H. Fisher [chapter 5].) But to the extent that electronic library products will be more competitive at some future time, investing in current development efforts without the expectation of a payback may be analogous to the infant industry argument for tariff protection and may well have a good deal of justification.

Usage and Demand

One area that we know even less about than costs is usage and demand. The traditional view has been that scientists will adapt rapidly to electronic publications, whatever they may be, and the humanists will adapt rather slowly, if at all. The picture is probably more complicated than that.

Some kinds of usage-for example, hits on the Web-may be easy to measure but tell us correspondingly little. Because hits may include aimless browsing or be only a few seconds in duration, the mere occurrence of a hit may not tell us a great deal. Nor are we able to generate in the short run the type of information from which the econometrician can easily estimate a demand function, because we do not have alternative prices at which alternative quantities demanded can be observed. But we can learn much from detailed surveys of users in which they describe what they like and what they do not like in the product and how the product makes their lives as researchers or students easier or harder (see the surveying described by Mary Summerfield and Carol A. Mandel in chapter 17). Thus, for example, it appears that critical mass is an important characteristic of certain types of electronic products, and the TULIP project may have been less than fully successful because it failed to reach the critical mass.



Electronic library products make access to information easier in some respects and certainly faster; but these benefits do not mean that the electronic information is always more convenient (reading the screen can be a nuisance in contrast to reading the printed page), nor is it clear that the more convenient access makes students learn better or faster. In fact, the acceptance of electronic products has been slower than anticipated in a number of instances. (See the papers about the Chicago Journal of Theoretical Computer Science [chapter 5], Project MUSE [chapters 9 and 15], JSTOR [chapters 7 and 11], and the Columbia On-line Books project [chapter 17].) But all the temporary setbacks and the numerous dimensions that the usage questions entail make it imperative that we track our experiences when we create an electronic or digital product; only in the light of such information will we be able to design products that are readily acceptable and marketable at prices that ensure the vendor's long-term survival.

A special aspect of usage is highlighted by the possibility that institutions may join forces for the common consortial exploitation of library resources, as in the case of the Associated Colleges of the South (Richard W. Meyer [chapter 14]) and Case Western Reserve/Akron Universities (Raymond K. Neff [chapter 16]). These approaches offer potentially large economies but may face new problems in technology, relations with vendors, and consortial governance (Andrew Lass [chapter 13]). When the consortium is concerned not only with shared usage, but also with publishing or compilation of research resources (as in the cases of Project MUSE and the Case Western/Akron project), the issues of consortial governance are even more complex.

A Look into the Future: Questions But No Answers (Yet)

A key question is not whether electronic publishing will grow in the future at the expense of print-based publishing nor whether electronic access to scholarly materials in universities will account for an increasing share of all access to such materials. The answers to both these broad questions are clearly "yes." But some of the more important and interrelated questions are the following:[11] (1) How will the costs of electronic and conventional publishing evolve over time? (2) How will products be priced? (3) What kind of use will be made of electronic information products in teaching and in research? (4) How will the use affect the productivity of all types of academic activities? (5) What will be the bottom line for academic institutions as a result of the changes that are and will be occurring?

At present, the cost comparison between electronic and conventional publications may be ambiguous, and the ambiguity is due, in part, to our inability to reduce first-copy costs substantially: electronic publications save on fulfillment costs but require high initial investments and continued substantial editorial involvement. Andrew Odlyzko, in chapter 23, argues that electronic journals can be published at a much lower per-page cost than conventional journals, but this reduction does not appear to be happening yet. Even if we were able to reduce costs substantially by turning to electronic journals wholesale, the question for the future of library and university budgets is how the costs of electronic journals will increase over time relative to other university costs.

Curiously, most studies of the determinants of journal prices have focused on what makes one journal more expensive than another and not on what makes journal prices increase faster than, say, the prices of classroom seats.[12] If electronic journals were half the price of comparable print-based journals, universities could realize a one-time saving by substituting electronic journals for print-based ones; but if these journals increased in price over time as rapidly as paper journals, eventually universities would observe the same budget squeeze that has occurred in the last decade.
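A hypothetical illustration, with invented numbers, makes the point concrete. Suppose a library spends $100,000 a year on print journals, the electronic versions would cost half as much, the library's budget grows at 3% a year, and journal prices grow at 8% a year. The switch saves $50,000 in the first year, but since 0.5 x (1.08/1.03)^n reaches 1 when n is about 15, after roughly fifteen years electronic journals absorb as large a share of the budget as print journals did at the outset, and that share keeps rising thereafter.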

In speculating about the future evolution of costs, we can paint both optimistic and pessimistic scenarios. Hardware capabilities have increased enormously. PC processors have gone from 8-bit processors running at just over 4 MHz to 32-bit processors running at 450 MHz. Any given piece of software will run blindingly fast on a PC of the latter type in comparison with one of the former. But there is a continuing escalation of software: as soon as faster PCs appear on the market, more demanding software is created that will not perform adequately on an older PC. Will software developments continually make our hardware obsolete? If so, we may be able to carry out more elaborate functions, but some may not serve directly the objective of efficient access to scholarly information and all will come at the cost of an unending stream of equipment upgrades or replacements. On the other hand, some software improvements may reduce first-copy costs directly. It is not difficult to imagine a "learned-paper-writing software" that has the feel of a Windows 95 application, with a drop-down menu that allows the user to select the journal in whose style the paper is to be written, and similarly to select mathematical notation, and so on.[13] Perhaps under such circumstances editing and copyediting might consist of little more than finding errors of logic or substance. Furthermore, as the complexity of hardware and software grows, will the need for technical support staff continue to grow and perhaps represent an increasing share of the budget?[14] It would take a crystal ball to answer all these questions. The questions provide, perhaps, some justification for O'Donnell's skeptical paper in this volume about the possibilities of measurement at this early stage in the history of electronic libraries.

Other questions asked at the beginning of this section cannot be answered in isolation from one another. The usage made of electronic products-which, one imagines, will be paid for mostly by universities and other academic institutions-will depend on the price, and the price will clearly depend on the usage: as in the standard economic model, quantity and price are jointly determined, neither being the independent cause of the other. But it is certain that usage will lag behind the hype about usage. Miller (1997) cites a state legislator who believes that the entire holdings of the Harvard University library system have (already) been digitized and are available to the public free of charge, and at least one East European librarian has stated that conventional library acquisitions are no longer relevant, since all important material will be available electronically.

In returning to the productivity puzzle, it is important to be clear about what productivity means. One may be awed by the fact that some 30 years ago the number of shares traded daily on the New York Stock Exchange was measured in the millions or perhaps tens of millions, whereas today a day with 700 million shares traded is commonplace. However, the productivity of the brokerage industry is not measured by the number of shares traded, but by the value added per worker in that industry, a figure that exhibits substantially lower rates of growth. Likewise, in instruction or research, productivity is not measured by the number of accesses to information but by the learning imparted (in instruction) or by the number of ideas or even papers generated (in research). If information gathering is a relatively small portion of total instructional activity (that is, if explanation of the underlying logic of an argument or the weaving of an intellectual web represent a much larger fraction), the productivity impact in teaching may end up being small. If information gathering is a small portion of research (that is, if performing laboratory experiments or working out solutions to mathematical models are much larger fractions), then the productivity impact in research may end up being low. And in these fundamental instructional and research activities there may be no breakthroughs resulting from the information technology revolution, just as you still need exactly four people to perform a string quartet and cannot increase productivity by playing it, say, twice as fast (see Baumol and Bowen 1966).

Teaching and research methods will change, but less rapidly than some may expect. The change will be more rapid if searching for the relevant information can be accomplished effectively. Effective search techniques are less relevant for instructional units (courses) in which the professor has canned the access procedures to information (such as the art course materials described in Bennett's paper in this volume). But individualized electronic products are not likely to sweep the broad ranges of academia: the specific art course at Yale, discussed by Bennett, is likely to be taught only at Yale despite the fact that it is similar to courses taught at hundreds of universities. For scholars who are truly searching for new information, Web searches that report 38,732 hits are not useful and suggest that neither students nor faculty members are well trained in effective search techniques.[15] It is fortunate that important new research into better search algorithms is being carried out. Librarians will play a vital role in this process by helping to guide scholars toward the best electronic sources, just as they have helped generations of scholars to find their way among print-based sources. This new role may, of course, require that librarians themselves redefine their functions to some extent and acquire new expertise; but these changes are happening anyway.

And what about the bottom line? This question is the most difficult one of all, and only the wildest guesses can be hazarded at this time. Taking a short-run or intermediate-run perspective (that is, a period of time up to, say, seven years from now), we do not expect that the revolution in information technology is going to reduce university costs; in fact it may increase them. Beyond the intermediate horizon, things may well change. Hardware costs may decline even more precipitously than heretofore, software (including search engines) may become ever more effective, and the cost savings due to information technology that are not centered on library activities may become properly attributed to the electronic revolution. In the long run, the budgetary implications are probably much more favorable than the short- or intermediate-term implications. But what we need to emphasize is that the proper way of assessing the longer-term evolution of costs is not by way of one part of an institution-say, the library-or even by way of an individual institution viewed as a whole system, but ideally by way of an interdependent multi-institutional system. Just as electronic technology may provide a university with savings that do not fall within the traditional library budget, thus implying that savings are spread university-wide, so too will the savings be spread over the entire higher educational system. Even though some costs at individual institutions may rise, we are confident that system costs will eventually fall.



PART ONE—
TECHNOLOGICAL FUNDAMENTALS



Chapter 1—
Making Technology Work for Scholarship
Investing in the Data

Susan Hockey

The introduction of any kind of new technology is often a painful and time-consuming process, at least for those who must incorporate it into their everyday lives. This is particularly true of computing technology, where the learning curve can be steep, what is learned changes rapidly, and ever more new and exciting things seem to be perpetually on the horizon. How can the providers and consumers of electronic information make the best use of this new medium and ensure that the information they create and use will outlast the current system on which it is used? In this chapter we examine some of these issues, concentrating on the humanities, where the nature of the information studied by scholars can be almost anything and where the information can be studied for almost any purpose.

Today's computer programs are not sophisticated enough to process raw data sensibly. This situation will remain true until artificial intelligence and natural language processing research has made much more progress. Early on in my days as a humanities computing specialist, I saw a library catalog that had been typed into the computer without anything to separate the fields in the information. There was no way of knowing what was the author, title, publisher, or call number of any of the items. The catalog could be printed out, but the titles could not be searched at all, nor could the items in the catalog be sorted by author name. Although a human can tell which is the author or title from reading the catalog, a computer program cannot. Something must be inserted in the data to give the program more information. This situation is a very simple example of markup, or encoding, which is needed to make computers work better for us. Since we are so far from having the kind of intelligence we really need in computer programs, we must put that intelligence in the data so that computer programs can be informed by it. The more intelligence there is in our data, the better our programs will perform. But what should that intelligence look like? How can we ensure that we make the right decisions in creating it so that computers can really do what we want? Some scholarly communication and digital library projects are beginning to provide answers to these questions.

New Technology or Old?

Many current technology and digital library projects use the new technology as an access mechanism to deliver the old technology. These projects rest on the assumption that the typical scholarly product is an article or monograph and that it will be read in a sequential fashion as indeed we have done for hundreds of years, ever since these products began to be produced on paper and be bound into physical artifacts such as books. The difference is that instead of going only to the library or bookstore to obtain the object, we access it over the network-and then almost certainly have to print a copy of it in order to read it. Of course there is a tremendous savings of time for those who have instant access to the network, can find the material they are looking for easily, and have high-speed printers. I want to argue here that delivering the old technology via the new is only a transitory phase and that it must not be viewed as an end in itself. Before we embark on the large-scale compilation of electronic information, we must consider how future scholars might use this information and what are the best ways of ensuring that the information will last beyond the current technology.

The old (print) technology developed into a sophisticated model over a long period of time.[1] Books consist of pages bound up in sequential fashion, delivering the text in a single linear sequence. Page numbers and running heads are used for identification purposes. Books also often include other organizational aids, such as tables of contents and back-of-the-book indexes, which are conventionally placed at the beginning and end of the book respectively. Footnotes, bibliographies, illustrations, and so forth, provide additional methods of cross-referencing. A title page provides a convention for identifying the book and its author and publication details. The length of a book is often determined by publishers' costs or requirements rather than by what the author really wants to say about the subject. Journal articles exhibit similar characteristics, also being designed for reproduction on pieces of paper. Furthermore, the ease of reading printed books and journals is determined by their typography, which is designed to help the reader by reinforcing what the author wants to say. Conventions of typography (headings, italic, bold, etc.) make things stand out on the page.

When we put information into electronic form, we find that we can do many more things with it than we can with a printed book. We can still read it, though not as well as we can read a printed book. The real advantage of the electronic medium is that we can search and manipulate the information in many different ways. We are no longer dependent on the back-of-the-book index to find things within the text, but can search for any word or phrase using retrieval software. We no longer need the whole book to look up one paragraph but can just access the piece of information we need. We can also access several different pieces of information at the same time and make links between them. We can find a bibliographic reference and go immediately to the place to which it points. We can merge different representations of the same material into a coherent whole and we can count instances of features within the information. We can thus begin to think of the material we want as "information objects."[2]

To reinforce the arguments I am making here, I call electronic images of printed pages "dead text" and use the term "live text" for searchable representations of text.[3] For dead text we can use only those retrieval tools that were designed for finding printed items, and even then this information must be added as searchable live text, usually in the form of bibliographic references or tables of contents. Of course most of the dead text produced over the past fifteen or so years began its life as live text in the form of word-processed documents. The obvious question is, how can the utility of that live text be retained and not lost forever?

Electronic Text and Data Formats

Long before digital libraries became popular, live electronic text was being created for many different purposes, most often, as we have seen, with word-processing or typesetting programs. Unfortunately this kind of live electronic text is normally searchable only by the word-processing program that produced it and then only in a very simple way. We have all encountered the problems involved in moving from one word-processing program to another. Although some of these problems have been solved in more recent versions of the software, maintaining an electronic document as a word-processing file is not a sensible option for the long term unless the creator of the document is absolutely sure that this document will be needed only in the short-term future and only for the purposes of word processing by the program that created it. Word-processed documents contain typographic markup, or codes, to specify the formatting. If there were no markup, the document would be much more difficult to read. However, typesetting markup is ambiguous and thus cannot be used sensibly by any retrieval program. For example, italics can be used for titles of books, or for emphasized words, or for foreign words. With typographic markup, we cannot distinguish titles of books from foreign words, which we may, at some stage, want to search for separately.

Other electronic texts were created for the purposes of retrieval and analysis. Many such examples exist, ranging from the large text databases of legal statutes to humanities collections such as the Thesaurus Linguae Graecae (TLG) and the Trésor de la langue française. The scholars working on these projects all realized that they needed to put some intelligence into the data in order to search it effectively. Most project staff devised markup schemes that focus on ways of identifying the reference citations for items that have been retrieved; for example, in the TLG, those items would be the name of the author, work, book, and chapter number. Such markup schemes do not easily provide for representing items of interest within a text, for example, foreign words or quotations. Most of these markup schemes are specific to one or two computer programs, and texts prepared in them are not easily interchangeable. A meeting in 1987 examined the many markup schemes for humanities electronic texts and concluded that the present situation was "chaos."[4] No existing markup scheme satisfied the needs of all users, and much time was being wasted converting from one deficient scheme to another.

Another commonly used method of storing and retrieving information is a relational database such as Microsoft Access or dBASE or the mainframe program Oracle. In a relational database, data is assumed to take the form of one or more tables consisting of rows and columns, that is, the form of rectangular structures.[5] A simple table of biographical information may have rows representing people and columns holding information about those people, for example, name, date of birth, occupation, and so on. When a person has more than one occupation, the data becomes clumsy and the information is best represented in two tables, in which the second has a row for each occupation of each person. The tables are linked, or related, by the person. A third table may hold information about the occupations. It is not difficult for a human to conceptualize the data structures of a relational database or for a computer to process them. Relational databases work well for some kinds of information, for example, an address list, but not much data in the real world fits well into rectangular structures. Such a structure means that the information is distorted when it is entered into the computer, and processing and analyses are carried out on the distorted forms, whose distortion tends to be forgotten. Relational databases also force the allocation of information to fixed data categories, whereas, in the humanities at any rate, much of the information is subject to scholarly debate and dispute, requiring multiple views of the material to be represented. Furthermore, getting information out of a relational database for use by other programs usually requires some programming knowledge.
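A sketch of the two-table arrangement just described may help; the people and the column names are invented for the illustration:

   PERSON table:
   id   name                 born
   1    Jane Austen          1775
   2    Benjamin Franklin    1706

   OCCUPATION table:
   person_id   occupation
   1           novelist
   2           printer
   2           statesman
   2           inventor

Each row of the OCCUPATION table points back to a row of the PERSON table through person_id, so Franklin's three occupations no longer have to be squeezed into a single rectangular row.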

The progress of too many retrieval and database projects can be characterized as follows. A project group decides that it wants to "make a CD-ROM." It finds that it has to investigate possible software programs for delivery of the results and chooses the one that has the most seductive user interface or most persuasive salesperson. If the data include some nonstandard characters, the highest priority is often given to displaying those characters on the screen; little attention is paid to the functions needed to manipulate those characters. Data are then entered directly into this software over a period of time during which the software interface begins to look outmoded as technology changes. By the time the data have been entered for the project, the software company has gone out of business, leaving the project staff with a lot of valuable information in a proprietary software format that is no longer supported. More often than not, the data are lost and much time and money has been wasted. The investment is clearly in the data, and it makes sense to ensure that these data are not dependent on one particular program but can be used by other programs as well.

Standard Generalized Markup Language (SGML)

Given the time and effort involved in creating electronic information, it makes sense to step back and think about how to ensure that the information can outlast the computer system on which it is created and can also be used for many different purposes. These are the two main principles of the Standard Generalized Markup Language (SGML), which became an international standard (ISO 8879) in 1986.[6] SGML was designed as a general purpose markup scheme that can be applied to many different types of documents and in fact to any electronic information. It consists of plain ASCII files, which can easily be moved from one computer system to another. SGML is a descriptive language. Most encoding schemes prior to SGML use prescriptive markup. One example of prescriptive markup is word-processing or typesetting codes embedded in a text that give instructions to the computer such as "center the next line" or "print these words in italic." Another example is fielded data that is specific to a retrieval program, for example, reference citations or author's names, which must be in a specific format for the retrieval program to recognize them as such. By contrast, a descriptive markup language merely identifies what the components of a document are. It does not give specific instructions to any program. In it, for example, a title is encoded as a title, or a paragraph as a paragraph. This very simple approach ultimately allows much more flexibility. A printing program can print all the titles in italic, a retrieval program can search on the titles, and a hypertext program can link to and from the titles, all without making any changes to the data.
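A schematic contrast may make the difference concrete (the codes and tags below are illustrative only, not those of any particular system):

   Prescriptive (typographic) markup:
      .ce
      \f3Pride and Prejudice\f1 appeared in 1813.

   Descriptive markup:
      <p><title>Pride and Prejudice</title> appeared in 1813.</p>

The first fragment instructs a formatter to center a line and to switch fonts; the second merely records that the phrase is a book title, leaving a printing program free to set it in italic, a retrieval program free to index it, and a hypertext program free to link to it.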

Strictly speaking, SGML itself is not a markup scheme, but a kind of computer language for defining markup, or encoding, schemes. SGML markup schemes assume that each document consists of a collection of objects that nest within each other or are related to each other in some other way. These objects or features can be almost anything. Typically they are structural components such as title, chapter, paragraph, heading, act, scene, speech, but they can also be interpretive information such as parts of speech, names of people and places, quotations (direct and indirect), and even literary or historical interpretation. The first stage of any SGML-based project is document analysis, which identifies all the textual features that are of interest and identifies the relationships between them. This step can take some time, but it is worth investing the time since a thorough document analysis can ensure that data entry proceeds smoothly and that the documents are easily processable by computer programs.

In SGML terms, the objects within a document are called elements. They are identified by a start and end tag as follows: <title>Pride and Prejudice</title>.



The SGML syntax allows the document designer to specify all the possible elements as a Document Type Definition (DTD), which is a kind of formal model of the document structure. The DTD indicates which elements are contained within other elements, which are optional, which can be repeated, and so forth. For example, in simple terms a journal article consists of a title, one or more author names, an optional abstract, and an optional list of keywords, followed by the body of the article. The body may contain sections, each with a heading followed by one or more paragraphs of text. The article may finish with a bibliography. The paragraphs of text may contain other features of interest, including quotations, lists, and names, as well as links to notes. A play has a rather different structure: title; author; cast list; one or more acts, each containing one or more scenes, which in turn contain one or more speeches and stage directions; and so on.
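A minimal DTD fragment along the lines of the journal article just described might read as follows (a sketch only; the element names and content models are invented for illustration):

   <!ELEMENT article   - - (title, author+, abstract?, keywords?, body, bibliography?)>
   <!ELEMENT body      - - (section+)>
   <!ELEMENT section   - - (heading, paragraph+)>
   <!ELEMENT paragraph - O (#PCDATA | quotation | name)*>
   <!ELEMENT (title | author | abstract | keywords | heading | bibliography | quotation | name) - O (#PCDATA)>

Read aloud, the first line says that an article consists of a title, one or more authors, an optional abstract, an optional keyword list, a body, and an optional bibliography, in that order; a parser can then enforce exactly that structure.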

SGML elements may also have attributes that further specify or modify the element. One use of attributes is to normalize the spelling of names for indexing purposes. For example, the name Jack Smyth could be encoded as <name norm="SmithJ">Jack Smyth</name> but indexed under S as if it were Smith. Attributes can also be used to normalize date forms for sorting, for example, <date norm=19970315>the Ides of March 1997</date>. Another important function of attributes is to assign a unique identifier to each instance of each SGML element within a document. These identifiers can be used as cross-references by any kind of hypertext program. The list of possible values for an attribute may be defined as a closed set, allowing the encoder to pick from a list, or it may be left entirely open.
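
A hypothetical fragment may make the identifier mechanism clearer. The element names note and noteref and the attribute target are invented for illustration (noteref would be declared as an empty element), but the underlying ID/IDREF mechanism is standard SGML:

    <note id="n12">A short biographical sketch of the person mentioned.</note>
    ...
    as discussed in the earlier letter <noteref target="n12">

A hypertext program can resolve target="n12" to the element whose identifier is n12, wherever it occurs in the document.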

SGML has another very useful feature. Any piece of information can be given a name and be referred to by that name in an SGML document. These named pieces of information are called entities, and a reference to one is written with an ampersand before the name and a semicolon after it. One use is for nonstandard characters: é, for example, can be encoded as &eacute;, ensuring that it can be transmitted easily across networks and from one machine to another. A standard list of these characters exists, but the document encoder can also create more. Entity references can also be used for any boilerplate text. This use avoids repetitive typing of words and phrases that recur, thus also reducing the chance of errors. An entity reference can be resolved to any amount of text, from a single letter up to the equivalent of an entire chapter.
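
For example, a project might declare an entity for a frequently repeated phrase and then refer to it wherever it is needed (the entity name here is invented for illustration):

    <!ENTITY aas "American Antiquarian Society">
    ...
    <p>The original is held by the &aas;.</p>

When the document is processed, every occurrence of &aas; is replaced by the full text given in the declaration.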

The formal structure of SGML means that the encoding of a document can be validated automatically, a process known as parsing. The parser makes use of the SGML DTD to determine the structure of the document and can thus help to eliminate whole classes of encoding errors before the document is processed by an application program. For example, an error can be detected if the DTD specifies that a journal article must have one or more authors, but the author's name has been omitted accidentally. Mistyped element names can be detected as errors, as can elements that are wrongly nested-for example, an act within a scene when the DTD specifies that acts contain scenes. Attributes can also be validated when there is a closed set of possible values. The validation process can also detect unresolved cross-references that use SGML's built-in identifiers. The SGML document structure and validation process mean that any application program can operate more efficiently because it derives information from the DTD about what to expect in the document. It follows that the stricter the DTD, the easier it is to process the document. However, very strict DTDs may force the document encoder to make decisions that simplify what is being encoded. Looser DTDs might better reflect the nature of the information but usually require more processing. Another advantage of SGML is very apparent here. Once a project is under way, if a document encoder finds a new feature of interest, that feature can simply be added to the DTD without the need to restructure work that has already been done. Many documents can be encoded and processed with the same DTD.
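
As a sketch of how this works, suppose the DTD for a play contains the following declarations (simplified for illustration):

    <!ELEMENT play  - - (title, author?, castlist?, act+)>
    <!ELEMENT act   - - (title?, scene+)>
    <!ELEMENT scene - - (title?, (speech | stagedir)+)>

A parser working from these declarations would reject an <act> nested inside a <scene>, a misspelled element such as <speach>, or a <play> containing no acts at all, before the document ever reached a retrieval or typesetting program.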

Text Encoding Initiative

The humanities computing community was among the early adopters of SGML, for two very simple reasons. Humanities primary source texts can be very complex, and they need to be shared and used by different scholars. They can be in different languages and writing systems and can contain textual variants, nonstandard characters, annotations and emendations, multiple parallel texts, and hypertext links, as well as complex canonical reference systems. In electronic form, these texts can be used for many different purposes, including the preparation of new editions, word and phrase searches, stylistic analyses, and research on syntax and other linguistic features. By 1987 it was clear that many encoding schemes existed for humanities electronic texts, but none was sufficiently powerful to allow for all the different features that might be of interest. Following a planning meeting attended by representatives of leading humanities computing projects, a major international project called the Text Encoding Initiative (TEI) was launched.[7] Sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing, the TEI enlisted the help of volunteers all over the world to define what features might be of interest to humanities scholars working with electronic text. It built on the expertise of groups such as the Perseus Project (then at Harvard, now at Tufts University), the Brown University Women Writers Project, the Alfa Informatica Group in Groningen, Netherlands, and others who were already working with SGML, to create SGML tags that could be used for many different types of text.

The TEI published its Guidelines for the Encoding and Interchange of Electronic Texts in May 1994 after more than six years' work. The guidelines identify some four hundred tags, but of course no list of tags can be truly comprehensive, and so the TEI builds its DTDs in a way that makes it easy for users to modify them. The TEI SGML application is built on the assumption that all texts share some common core of features to which can be added tags for specific application areas. Very few tags are mandatory, and most of these are concerned with documenting the text and will be further discussed below. The TEI Guidelines are simply guidelines. They serve to help the encoder identify features of interest, and they provide the DTDs with which the encoder will work. The core consists of the header, which documents the text, plus basic structural tags and common features, such as lists, abbreviations, bibliographic citations, quotations, simple names and dates, and so on. The user selects a base tag set: prose, verse, drama, dictionaries, spoken texts, or terminological data. To this are added one or more additional tag sets. The options here include simple analytic mechanisms, linking and hypertext, transcription of primary sources, critical apparatus, names and dates, and some methods of handling graphics. The TEI has also defined a method of handling nonstandard alphabets by using a Writing System Declaration, which the user specifies. This method can also be used for nonalphabetic writing systems, for example, Japanese. Building a TEI DTD has been likened to the preparation of a pizza, where the base tag set is the crust, the core tags are the tomato and cheese, and the additional tag sets are the toppings.
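
In the TEI's SGML implementation, the "pizza" is assembled by setting parameter entities in the document type declaration. The following sketch shows roughly how a prose base with the linking tag set might be selected; the details vary with the version of the Guidelines and the local setup:

    <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
      <!ENTITY % TEI.prose   "INCLUDE">
      <!ENTITY % TEI.linking "INCLUDE">
    ]>

Additional tag sets are switched on in the same way.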

One of the issues addressed at the TEI planning meeting was the need for documentation of an electronic text. Many electronic texts now exist about which little is known: what source text they were taken from, what decisions were made in encoding the text, or what changes have been made to the text. All this information is extremely important to a scholar wanting to work on the text, since it will determine the academic credibility of his or her work. Unknown sources are unreliable at best and lead to inferior work. Experience has shown that electronic texts are more likely than printed ones to contain errors or to have pieces missing, and these problems are harder to detect than in printed material. It seems that one of the main reasons for this lack of documentation for electronic texts was simply that there was no common methodology for providing it.

The TEI examined various models for documenting electronic texts and concluded that some SGML elements placed as a header at the beginning of an electronic text file would be the most appropriate way of providing this information. Since the header is part of the electronic text file, it is more likely to remain with that file throughout its life. It can also be processed by the same software as the rest of the text. The TEI header contains four major sections.[8] One section is a bibliographic description of the electronic text file using SGML elements that map closely onto some MARC fields. The electronic text is an intellectual object different from the source from which it was created, and the source is thus also identified in the header. The encoding description section provides information about the principles used in encoding the text, for example, whether the spelling has been normalized, treatment of end-of-line hyphens, and so forth. For spoken texts, the header provides a way of identifying the participants in a conversation and of attaching a simple identifier to each participant that can then be used as an attribute on each utterance. The header also provides a revision history of the text, indicating who made what changes to it and when.
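
In outline, a TEI header looks something like the following (content abbreviated; the four section names are those defined by the Guidelines):

    <teiHeader>
      <fileDesc> ... </fileDesc>         <!-- bibliographic description of the file and of its source -->
      <encodingDesc> ... </encodingDesc> <!-- editorial principles: normalization, hyphenation, and so on -->
      <profileDesc> ... </profileDesc>   <!-- languages, participants in spoken texts, and the like -->
      <revisionDesc> ... </revisionDesc> <!-- who changed what, and when -->
    </teiHeader>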

As far as can be ascertained, the TEI header is the first systematic attempt to provide documentation for an electronic text that is a part of the text file itself. A good many projects are now using it, but experience has shown that it would perhaps benefit from some revision. Scholars find it hard to create good headers. Some elements in the header are very obvious, but the relative importance of the remaining elements is not so clear. At some institutions, librarians are creating TEI headers, but they need training in the use and importance of the nonbibliographic sections and in how the header is used by computer software other than the bibliographic tools that they are familiar with.

Encoded Archival Description (EAD)

Another SGML application that has attracted a lot of attention in the scholarly community and archival world is the Encoded Archival Description (EAD). First developed by Daniel Pitti at the University of California at Berkeley and now taken over by the Library of Congress, the EAD is an SGML application for archival finding aids.[9] Finding aids are very suitable for SGML because they are basically hierarchic in structure. In simple terms, a collection is divided into series, which consist of boxes, which contain folders, and so on. Prior to the EAD, there was no effective standard way of preparing finding aids. Typical projects created a collection level record in one of the bibliographic utilities, such as RLIN, and used their own procedures, often a word-processing program, for creating the finding aid. Possibilities now exist for using SGML to link electronic finding aids with electronic representations of the archival material itself. One such experiment, conducted at the Center for Electronic Texts in the Humanities (CETH), has created an EAD-encoded finding aid for part of the Griffis Collection at Rutgers University and has encoded a small number of the items in the collection (nineteenth-century essays) in the TEI scheme.[10] The user can work with the finding aid to locate the item of interest and then move directly to the encoded text and an image of the text to study the item in more detail. The SGML browser program Panorama allows the two DTDs to exist side by side and in fact uses an extended pointer mechanism devised by the TEI to move from one to the other.
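
In skeletal form, an EAD-encoded finding aid mirrors that hierarchy. The fragment below is illustrative only, with invented titles and with most required elements omitted:

    <ead>
      <eadheader> ... </eadheader>
      <archdesc level="collection">
        <did><unittitle>Papers of an invented collector</unittitle></did>
        <dsc>
          <c01 level="series"><did><unittitle>Correspondence</unittitle></did>
            <c02 level="file"><did><unittitle>Letters, 1870-1880</unittitle>
              <container type="box">3</container></did></c02>
          </c01>
        </dsc>
      </archdesc>
    </ead>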

Other Applications of SGML

SGML is now being widely adopted in the commercial world as companies see the advantage of investing in data that will move easily from one computer system to another. It is worth noting that the few books on SGML that appeared early in its life were intended for an academic audience. More recent books are intended for a commercial audience and emphasize the cost savings involved in SGML as well as the technical requirements. This is not to say that these books are of no value to academic users. The SGML Web pages list many projects in the areas of health, legal documents, electronic journals, rail and air transport, semiconductors, the U.S. Internal Revenue Service, and more. SGML is extremely useful for technical documentation, as is evidenced by the list of customers on the Web page of one of the major SGML software companies, INSO/EBT. This list includes United Airlines, Novell, British Telecom, AT&T, Shell, Boeing, Nissan, and Volvo.

SGML need not be used only with textual data. It can be used to describe almost anything. SGML should not therefore be seen as an alternative to Acrobat, PostScript, or other document formats but as a way of describing and linking together documents in these and other formats, forming the "underground tunnels" that make the documents work for users.[11] SGML can be used to encode the searchable textual information that must accompany images or other formats in order to make them useful. With SGML, the searchable elements can be defined to fit the data exactly and can be used by different systems. This encoding is in contrast with storing image data in some proprietary database system, which is common practice. We can imagine a future situation in which a scholar wants to examine the digital image of a manuscript and also have available a searchable text. He or she may well find something of interest on the image and want to go to occurrences of the same feature elsewhere within the text. In order to do this, the encoded version of the text must know what that feature of interest is and where it occurs on the digital image. Knowing which page it is on is not enough. The exact position on the page must be encoded. This information can be represented in SGML, which thus provides the sophisticated kind of linking needed for scholarly applications. SGML structures can also point to places within a recording of speech or other sound and can be used to link the sound to a transcription of the conversation, again enabling the sound and text to be studied together. Other programs exist that can perform these functions, but the problem with all of them is that they use a proprietary data format that cannot be used for any other purpose.
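
No standard element set for this kind of image-text linking existed at the time of writing, but the idea can be sketched as follows; every element and attribute name here is invented for illustration:

    <zone id="f27r.3" image="folio27r" ulx="1020" uly="348" lrx="1490" lry="410">
    ...
    <name ref="f27r.3">Abigail Adams</name>

The attributes record the corner coordinates of a region on the digital image, so that a viewer can highlight the region when the reader selects the name in the transcription, and vice versa.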

SGML, HTML, and XML

The relationship between SGML and the Hypertext Markup Language (HTML) needs to be clearly understood. Although not originally designed as such, HTML is now an SGML application, even though many HTML documents exist that cannot be validated according to the rules of SGML. HTML consists of a set of elements that are interpreted by Web browsers for display purposes. The HTML tags were designed for display and not for other kinds of analysis, which is why only crude searches are possible on Web documents. HTML is a rather curious mixture of elements. Larger ones, such as <body>; <h1>, <h2>, and so on for head levels; <p> for paragraph; and <ul> for unordered list, are structural, but the smaller elements, such as <b> for bold and <i> for italic, are typographic and, as we have seen above, ambiguous and thus cannot be searched effectively. HTML version 3 attempts to rectify this ambiguity somewhat by introducing a few semantic-level elements, but these are very few in comparison with those identified in the TEI core set. HTML can be a good introduction to structured markup. Since it is so easy to create, many project managers begin by using HTML and graduate to SGML once they become used to working with structured text and begin to see the weakness of HTML for anything other than the display of text. SGML can easily be converted automatically to HTML for delivery on the Web, and Web clients have been written for the major SGML retrieval programs.
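
A small example shows why the typographic tags are ambiguous. In HTML, a book title, a foreign phrase, and an emphasized word may all be tagged identically, whereas a descriptive scheme along TEI lines keeps them distinct:

    HTML:        <i>Pride and Prejudice</i>   <i>ex officio</i>   <i>never</i>
    Descriptive: <title>Pride and Prejudice</title>   <foreign>ex officio</foreign>   <emph>never</emph>

A search for book titles can make use of the second encoding but not the first.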

The move from HTML to SGML can be substantial, and in 1996 work began on XML (Extensible Markup Language), a simplified version of SGML intended for delivery on the Web. It is "an extremely simple dialect of SGML," the goal of which "is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML" (see http://www.w3.org/TR/REC-xml ). XML is being developed under the auspices of the World Wide Web Consortium, and the first draft of the specification was available by the SGML conference in December 1996. Essentially, XML is SGML with some of the more complex and esoteric features removed. It has been designed for interoperability with both SGML and HTML-that is, to fill the gap between HTML, which is too simple, and full-blown SGML, which can be complicated. As yet there is no specific XML software, but the work of this group has considerable backing, and the design of XML has proceeded quickly.[12]
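
The flavor of the simplification can be seen in a small fragment. XML requires every element to be explicitly closed, every attribute value to be quoted, and empty elements to be marked with a trailing slash; the element names below are invented for illustration:

    <article type="review">
      <title>An Example</title>
      <pagebreak n="38"/>
      <para>Every start tag has a matching end tag.</para>
    </article>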

SGML and New Models of Scholarship

SGML's objectlike structures make it possible for scholarly communication to be seen as "chunks" of information that can be put together in different ways. Using SGML, we no longer have to squeeze the product of our research into a single linear sequence of text whose size is often determined by the physical medium in which it will appear; instead we can organize it in many different ways, privileging some for one audience and others for a different audience. Some projects are already exploiting this potential, and I am collaborating in two that are indicative of the way I think humanities scholarship will develop in the twenty-first century. Both projects make use of SGML to create information objects that can be delivered in many different ways.

The Model Editions Partnership (MEP) is defining a set of models for electronic documentary editions.[13] Directed by David Chesnutt of the University of South Carolina, with the TEI Editor, C. Michael Sperberg-McQueen, and myself as co-coordinators, the MEP also includes seven documentary editing projects. Two of these projects are creating image editions, and the other five are preparing letterpress publications. These documentary editions provide the basic source material for the study of American history by adding the historical context that makes the material meaningful to readers. Much of this source material consists of letters, which often refer to people and places by words that only the author and recipient understand. A good deal of the source material is in handwriting that can be read only by scholars specializing in the field. Documentary editors prepare the material for publication by transcribing the documents, organizing the sources into a coherent sequence that tells the story (the history) behind them, and annotating them with information to help the reader understand them. However, the printed page is not a very good vehicle for conveying the information that documentary editors need to present. It forces one organizing principle on the material (the single linear sequence of the book) when the material could well be organized in several different ways (chronologically, for example, or by recipient of letters). Notes must appear at the end of the item to which they refer or at the end of the book. When the same note-for example, a short biographical sketch of somebody mentioned in the sources-is needed in several places, it can appear only once, and after that it is cross-referenced by page numbers, often to earlier volumes. Something that has been crossed out and rewritten in a source document can only be represented clumsily in print even though it may reflect a change of mind that altered the course of history.

At the beginning of the MEP project, the three coordinators visited all seven partner projects, showed the project participants some very simple demonstrations, and then invited them to "dream" about what they would like to do in this new medium. The ideas collected during these visits were incorporated into a prospectus for electronic documentary editions. The MEP sees SGML as the key to providing all the functionality outlined in the prospectus. The MEP has developed an SGML DTD for documentary editions that is based on the TEI and has begun to experiment with delivery of samples from the partner projects. The material for the image editions is wrapped up in an "SGML envelope" that provides the tools to access the images. This envelope can be generated automatically from the relational databases in which the image access information is now stored. For the letterpress editions, many more possibilities are apparent. If desired, it will be possible to merge material from different projects that are working on the same period of history. It will be possible to select subsets of the material easily by any of the tagged features. This means that editions for high school students or the general public could be created almost automatically from the archive of scholarly material. With a click of a mouse, the user can go from a diplomatic edition to a clear reading text and thus trace the author's thoughts as the document was being written. The documentary editions also include very detailed conceptual indexes compiled by the editors. It will be possible to use these indexes as an entry point to the text and also to merge indexes from different projects. The MEP sees the need for making dead text image representations of existing published editions available quickly and believes that these can be made much more useful by wrapping them in SGML and using the conceptual indexes as an entry point to them.

The second project is even more ambitious than the MEP, since it is dealing with entirely new material and has been funded for five years. The Orlando Project at the Universities of Alberta and Guelph is a major collaborative research initiative funded by the Social Sciences and Humanities Research Council of Canada.[14] Directed by Patricia Clements, the project is to create an Integrated History of Women's Writing in the British Isles, which will appear in print and electronic formats. A team of graduate research assistants is carrying out basic research for the project in libraries and elsewhere. The research material they are assembling is being encoded in SGML so that it can be retrieved in many different ways. SGML DTDs have been designed to reflect the biographical details for each woman writer as well as her writing history, other historical events that influenced her writing, a thesaurus of keyword terms, and so forth. The DTDs are based on the TEI, but they incorporate much descriptive and interpretive information, reflecting the nature of the research and the views of the literary scholars on the team. Tag sets have been devised for topics such as authorship and attribution, genre, and the reception of an author's work.

The Orlando Project is thus building up an SGML-encoded database of many different kinds of information about women's writing in the British Isles. The SGML encoding, for example, greatly assists in the preparation of a chronology by allowing the project to pull out all chronology items from the different documents and sort them by their dates. It facilitates an overview of where the women writers lived, their social background, and what external factors influenced their writing. It helps with the creation and consistency of new entries, since the researchers can see immediately if similar information has already been encountered. The authors of the print volumes will draw on this SGML archive as they write, but the archive can also be used to create many different hypertext products for research and teaching.

Both Orlando and the MEP are, essentially, working with pieces of information, which can be linked in many different ways. The linking, or rather the interpretation that gives rise to the linking, is what humanities scholarship is about. When the information is stored as encoded pieces of information, it can be put together in many different ways and used for many different purposes, of which creating a print publication is only one. We can expect other projects to begin to work in this way as they see the advantages of encoding the features of interest in their material and of manipulating them in different ways.

It is useful to look briefly at some other possibilities. Dictionary publishers were among the first to use SGML. (Although not strictly SGML since it does not have a DTD, the Oxford English Dictionary was the first academic project to use structured markup.) When well designed, the markup enables the dictionary publishers to create spin-off products for different audiences by selecting a subset of the tagged components of an entry. A similar process can be used for other kinds of reference works. Tables of contents, bibliographies, and indexes can all be compiled automatically from SGML markup and can also be cumulative across volumes or collections of material.
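
A dictionary entry encoded along TEI lines suggests how such spin-offs work; the entry below is invented and heavily abbreviated:

    <entry>
      <form><orth>benchmark</orth></form>
      <gramGrp><pos>noun</pos></gramGrp>
      <sense n="1"><def>a point of reference for measurement or comparison</def></sense>
      <sense n="2"><def>a surveyor's mark cut in durable material</def></sense>
    </entry>

A concise edition could be generated by selecting only the <orth>, <pos>, and first <sense> elements; a learner's edition might select a different subset again.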

The MEP is just one project that uses SGML for scholarly editions. A notable example is the CD-ROM of Chaucer's Wife of Bath's Prologue, prepared by Peter Robinson and published by Cambridge University Press in 1996. This CD-ROM contains all fifty-eight pre-1500 manuscripts of the text, with encoding for all the variant readings as well as digitized images of every page of all the manuscripts. Software programs provided with the CD-ROM can manipulate the material in many different ways, enabling a scholar to collate manuscripts, move immediately from one manuscript to another, and compare transcriptions, spellings, and readings. All the material is encoded in SGML, and the CD-ROM includes more than one million hypertext links generated by a computer program, which means that the investment in the project's data is carried forward from one delivery system to another, indefinitely, into the future.

Making SGML Work Effectively

Getting started with SGML can seem to be a big hurdle to overcome, but in fact the actual mechanics of working with SGML are nowhere near as difficult as is often assumed. SGML tags are rarely typed in, but are normally inserted by software programs such as Author/Editor. These programs can incorporate a template that is filled in with data. Like other SGML software, these programs make use of the DTD. They know which tags are valid at any position in the document and can offer only those tags to the user, who can pick from a menu. They can also provide a pick list of attributes and their values if the values are a closed set. These programs ensure that what is produced is a valid SGML document. They can also toggle the display of tags on and off very easily-Author/Editor encloses them in boxes that are very easy to see. The programs also incorporate style sheets that define the display format for every element.

Nevertheless, inserting tags in this way can be rather cumbersome, and various software tools exist to help in the translation of "legacy" data to SGML. Of course, these tools cannot add intelligence to data if it was not there in the legacy format, but they can do a reasonable and low-cost job of converting material for large-scale projects in which only broad structural information is needed. For UNIX users, the freely available parsers sgmls and its successor, sp, are excellent tools for validating SGML documents and can be incorporated in processing programs. There are also ways in which the markup can be minimized. End tags can be omitted in some circumstances, for example, in a list where the start of a new list item implies that the previous one has ended.
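
The list example works as follows (a simplified sketch; the element names are illustrative). If the DTD declares that the end tag of an item may be omitted (the O in the second position of the declaration), the encoder can leave it out and the parser will supply it:

    <!ELEMENT list - - (item+)>
    <!ELEMENT item - O (#PCDATA)>

    <list>
    <item>First point
    <item>Second point
    </list>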

There is no doubt that SGML is considered expensive by some project managers, but further down the line the payoff can be seen many times over. The quick and dirty solution to a computing problem does not last very long, and history has shown how much time can be wasted converting from one system to another or how much data can be lost because they are in a proprietary system. It is rather surprising that the simple notion of encoding what the parts of a document are, rather than what the computer is supposed to do with them, took so long to catch on. Much of the investment in any computer project is in the data, and SGML is the best way we know so far of ensuring that the data will last for a long time and that they can be used and reused for many different purposes. It also ensures that the project is not dependent on one software vendor.

The amount of encoding is obviously a key factor in the cost, and so any discussion about the cost-effectiveness of an SGML project should always be made with reference to the specific DTD in use and the level of markup to be inserted. Statements that SGML costs x dollars per page are not meaningful without further qualification. Unfortunately, such qualification is rarely given at present, and misconceptions often result. It is quite possible, although clearly not sensible, to have a valid SGML document that consists of one start tag at the beginning and one corresponding end tag at the end with no other markup in between. At the other extreme, each word (or even letter) in the document could have several layers of markup attached to it. What is clear is that the more markup there is, the more useful the document is and the more expensive it is to create. As far as I am aware, little research has been done on the optimum level of markup, but at least with SGML it is possible to add markup to a document later without prejudicing what is already encoded.

In my view, it is virtually impossible to make some general cost statements for SGML-based work. Each project needs to be assessed differently depending on its current situation and its objectives.[15] However, I will attempt to discuss some of the issues and the items that make up the overall cost. Many of the costs of an SGML-based project are no different from those of other computer-based projects in that both have start-up costs and ongoing costs.

Start-up costs can depend on how much computing experience and expertise there already is in the organization. Projects that are being started now have the advantage of not being encumbered by large amounts of legacy data and proprietary systems, but they also will need to be started from scratch with the three things that make any computer project work: hardware, software, and skilled people. Hardware costs are insignificant these days, and SGML software will work on almost any current PC or UNIX-based hardware. It does not need an expensive proprietary system. An individual scholar can acquire PC software for creating and viewing SGML-encoded text for under $500. Public domain UNIX tools cost nothing to acquire. That leaves what is, in my view, the most essential component of any computing project, namely, people with good technical skills. Unfortunately, these people are expensive. The market is such that they can expect higher salaries than librarians and publishers receive at the equivalent stages in their careers. However, I think that it is unwise for any organization to embark on a computer-based project without having staff with the proper skills to do the work. Like people in all other disciplines, computer people specialize in one or two areas, and so it is important to hire staff with the right computing skills and thus important for the person doing the hiring to understand what those skills should be. There are still not many SGML specialists around, but someone with a good basic background in computing could be trained in SGML at a commercial or academic course in a week or so, with some follow-up time for experimentation. This person can then use mailing lists and the SGML Web site to keep in touch with new developments. Having the right kind of technical person around early on in any computing project also means that there is somebody who can advise on the development of the system and ensure that expensive mistakes are not made by decision makers who have had little previous involvement with computing systems. The technical person will also be able to see immediately where costs can be saved by implementing shortcuts.

The one specific start-up cost with SGML is the choice or development of the DTD. Many digital library projects are utilizing existing DTDs-for example, the cut-down version of the TEI called TEILite-either with no changes at all or with only light modifications. However, I think that it is important for project managers to look hard at an existing DTD to see whether it really satisfies their requirements rather than just decide to use it because everyone else they know is using it. A project in a very specialized area may need to have its own DTD developed. This could mean the hiring of SGML consultants for a few days plus time spent by the project's own staff in specifying the objectives of the project in great detail and in defining and refining the features of interest within the project's documents.

Computer-based projects seldom proceed smoothly, and in the start-up phase, time must be allowed for false starts and revisions. SGML is no different here, but by its nature it does force project managers to consider very many aspects at the beginning and thus help prevent the project from going a long way down a wrong road. SGML elements can also be used to assist with essential aspects of project administration, for example, tags for document control and management.

Ongoing costs are largely concerned with document creation and encoding, but they also include general maintenance, upgrades, and revisions. If the material is not already in electronic form, it may be possible to convert it by optical character recognition (OCR). The accuracy of the result will depend on the quality of the type fonts and paper of the original, but the document will almost certainly need to be proofread and edited to reach the level of quality acceptable to the scholarly community. OCR also yields a typographic representation of a document, which is ambiguous for other kinds of computer processing. Whether it comes from word processors or OCR, typographic encoding needs to be converted to SGML. It is possible to write programs or purchase software tools to do this, but only those features that can be unambiguously defined can be converted in this way. Any markup that requires interpretive judgment must be inserted manually at the cost of human time. Most electronic text projects in the humanities have had the material entered directly by keyboarding, not only to attain higher levels of accuracy than with OCR, but also to insert markup at the same time. More often than not, project managers employ graduate students for this work, supervised by a textbase manager who keeps records of decisions made in the encoding and often assisted by a programmer who can identify shortcuts and write programs where necessary to handle these shortcuts.

There are also, of course, costs associated with delivering SGML-encoded material once it has been created. These costs fall into much the same categories as the costs for creating the material. Start-up costs include the choice and installation of delivery software. In practice, most digital library projects use the Opentext search engine, which is affordable for a library or a publisher. The search engine also needs a Web client, which need not be a big task for a programmer. Naturally it takes longer to write a better Web client, but a better client may save end users much time as they sort through the results of a query. Opentext is essentially a retrieval program, and it does not provide much in the way of hypertext linking. INSO/EBT's suite of programs, including DynaText and DynaWeb, provides a model of a document that is much more like an electronic book with hypertext links. INSO's Higher Education Grant Program has enabled projects like MEP and Orlando to deliver samples of their material without the need to purchase SGML delivery software. INSO offers some technical support as part of the grant package, but skilled staff are once again the key component for getting a delivery system up and running. When any delivery system is functioning well, the addition of new SGML-encoded material to the document database can be fully automated with little need for human intervention unless something goes wrong.

Experience has shown that computer-based projects are rarely, if ever, finished. They will always need maintenance and upgrades and will incur ongoing costs more or less forever if the material is not to be lost. SGML seems to me to be the best way of investing for the future, since there are no technical problems in migrating it to new systems. However, I find it difficult to envisage a time when there will be no work and no expense involved with maintaining and updating electronic information. It is as well to understand this and to budget for these ongoing costs at the beginning of a project rather than have them come out gradually as the project proceeds.

SGML does have one significant weakness. It assumes that each document is a single hierarchic structure, but in the real world (at least of the humanities) very few documents are as simple as this.[16] For example, a printed edition of a play has one structure of acts, scenes, and speeches and another of pages and line numbers. A new act or scene does not normally start on a new page, and so there is no relationship between the pages and the act and scene structure. It is simply an accident of the typography. The problem arises even with paragraphs in prose texts, since a new page does not start with a new paragraph or a new paragraph with a new page. For well-known editions the page numbers are important, but they cannot easily be encoded in SGML other than as "empty" tags that simply indicate a point in the text, not the beginning and end of a piece of information. The disadvantage here is that the processing of information marked by empty tags cannot make full use of SGML's capabilities. Another example of the same problem is quotations that span paragraphs: they have to be closed and then opened again, with attributes to indicate that they are really all parts of the same quotation.
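
In practice, the page breaks of a source edition are usually recorded with empty milestone tags, as in the following TEI-style fragment (the page number is invented for illustration), where <pb> simply marks the point at which page 84 begins:

    <sp><speaker>Hamlet</speaker>
    <l>To be, or not to be: that is the question:</l>
    <pb n="84">
    <l>Whether 'tis nobler in the mind to suffer</l>
    </sp>

The tag records where the page boundary falls, but, as noted above, a program cannot treat the page itself as an element with content.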

For many scholars, SGML is exciting to work with because it opens up so many more possibilities for working with source material. We now have a much better way than ever before of representing in electronic form the kinds of interpretation and discussion that are the basis of scholarship in the humanities. But as we begin to understand these new possibilities, some new challenges appear.[17] What happens when documents from different sources (and thus different DTDs) are merged into the same database? In theory, computers make it very easy to do this, but how do we merge material that has been encoded according to different theoretical perspectives and retain the identification and individuality of each perspective? It is possible to build some kind of "mega-DTD," but the mega-DTD may become so free in structure that it is difficult to do any useful processing of the material.

Attention must now turn to making SGML work more effectively. Finding better ways of adding markup to documents is a high priority. The tagging could be speeded up by a program that can make intelligent tagging guesses based on information it has derived from similar material that has already been tagged, in much the same way that some word class tagging programs "learn" from text that has already been tagged manually. We also need to find ways of linking encoded text to digital images of the same material without the need for hand coding. Easier ways must be found for handling multiple parallel structures. All research leading to better use of SGML could benefit from a detailed analysis of documents that have already been encoded in SGML. The very fact that they are in SGML makes this analysis easy to do.

Chapter 2—
Digital Image Quality
From Conversion to Presentation and Beyond

Anne R. Kenney

There are a number of significant digital library projects under way that are designed to test the economic value of building digital versus physical libraries. Business cases are being developed that demonstrate the economies of digital applications to assist cultural and research institutions in their response to the challenges of the information explosion, spiraling storage and subscription costs, and increasing user demands. These projects also reveal that the costs of selecting, converting, and making digital information available can be staggering and that the costs of archiving and migrating that information over time are not insignificant.

Economic models comparing the digital to the traditional library show that digital will become more cost-effective provided the following four assumptions prove true:

1. that institutions can share digital collections,

2. that digital collections can alleviate the need to support full traditional libraries at the local level,

3. that use will increase with electronic access, and

4. that the long-term value of digital collections will exceed the costs associated with their creation, maintenance, and delivery.[1]

These four assumptions-resource sharing, lower costs, satisfaction of user demands with timely and enhanced access, and continuing value of information-presume that electronic files will have relevant content and will meet baseline measures of functionality over time. Although a number of conferences and publications have addressed the need to develop selection criteria for digital conversion and to evaluate the effective use of digitized material, more rhetoric than substantive information has emerged regarding the impact on scholarly research of creating digital collections and making them accessible over networks.

I believe that digital conversion efforts will prove economically viable only if they focus on creating electronic resources for long-term use. Retrospective sources should be carefully selected based on their intellectual content; digital surrogates should effectively capture that intellectual content; and access should be more timely, usable, or cost-effective than is possible with original source documents. In sum, I argue that long-term utility should be defined by the informational value and functionality of digital images, not limited by technical decisions made at the point of conversion or anywhere else along the digitization chain. I advocate a strategy of "full informational capture" to ensure that digital objects rich enough to be useful over time are created in the most cost-effective manner.[2]

There is much to be said for capturing the best possible digital image. From a preservation perspective, the advantages are obvious. An "archival" digital master can be created to replace rapidly deteriorating originals or to reduce storage costs and access times to office back files, provided the digital surrogate is a trusted representation of the hard copy source. It also makes economic sense, as Michael Lesk has noted, to "turn the pages once" and produce a sufficiently high-level image so as to avoid the expense of reconverting at a later date when technological advances require or can effectively utilize a richer digital file.[3] This economic justification is particularly compelling as the labor costs associated with identifying, preparing, inspecting, and indexing digital information far exceed the costs of the scan itself. In recent years, the costs of scanning and storage have declined rapidly, narrowing the gap between high-quality and low-quality digital image capture. Once created, the archival master can then be used to generate derivatives to meet a variety of current and future user needs: high resolution may be required for printed facsimiles, for on-screen detailed study,[4] and in the future for intensive image processing; moderate to high resolution may be required for character recognition systems and image summarization techniques;[5] and lower resolution images, encoded text, or PDFs derived from the digital masters may be required for on-screen display and browsing.[6] The quality, utility, and expense of all these derivatives will be directly affected by the quality of the initial scan.[7]

If there are compelling reasons for creating the best possible image, there is also much to be said for not capturing more than you need. At some point, adding more resolution will not result in greater quality, just a larger file size and higher costs. The key is to match the conversion process to the informational content of the original. At Cornell, we've been investigating digital imaging in a preservation context for eight years. For the first three years, we concentrated on what was technologically possible-on determining the best image capture we could secure. For the last five years, we've been striving to define the minimal requirements for satisfying informational capture needs. No more, no less.

Digital Benchmarking

To help us determine what is minimally acceptable, we have been developing a methodology called benchmarking. Digital benchmarking is a systematic procedure used to forecast a likely outcome. It begins with an assessment of the source documents and user needs; factors in relevant objective and subjective variables associated with stated quality, cost, and/or performance objectives; involves the use of formulas that represent the interrelationship of those variables to desired outcomes; and concludes with confirmation through carefully structured testing and evaluation. If the benchmarking formula does not consistently predict the outcome, it may not contain the relevant variables or reflect their proper relationship-in which case it should be revised.

Benchmarking does not provide easy answers; rather, it provides a means to evaluate possible answers for how best to balance quality, costs, timeliness, user requirements, and technological capabilities in the conversion, delivery, and maintenance of digital resources. It is also intended as a means to formulate a range of possible solutions on the macro level rather than on an individual, case-by-case basis. For many aspects of digital imaging, benchmarking is still uncharted territory. Much work remains in defining conversion requirements for certain document types (photographs and complex book illustrations, for example); in conveying color information; in evaluating the effects of new compression algorithms; and in providing access on a mass scale to a digital database of material representing a wide range of document types and document characteristics.

We began benchmarking with the conversion of printed text. We anticipate that within several years, quality benchmarks for image capture and presentation of the broad range of paper- and film-based research materials-including manuscripts, graphic art, halftones, and photographs-will be well defined through a number of projects currently under way.[8] In general, these projects are designed to be system independent and are based increasingly on assessing the attributes and functionality characteristic of the source documents themselves coupled with an understanding of user perceptions and requirements.

Why Do Benchmarking?

Because there are no standards for image quality and because different document types require different scanning processes, there is no "silver bullet" for conversion. This frustrates many librarians and archivists who are seeking a simple solution to a complex issue. I suppose if there really were the need for a silver bullet, I'd recommend that most source documents be scanned at a minimum of 600 dpi with 24-bit color, but that would result in tremendously large file sizes and a hefty conversion cost. You would also be left with the problems of transmitting and displaying those images.

We began benchmarking with conversion, but we are now applying this approach to the presentation of information on-screen. The variables that govern display are many, and it will come as no surprise that they preclude the establishment of a single best method for presenting digital images. But here, too, the urge is strong to seek a single solution. If display requirements paralleled conversion requirements-that is, if a 600 dpi, 24-bit image had to be presented on-screen-then at best, with the highest resolution monitors commercially available, only documents whose physical dimensions did not exceed 2.7" × 2.13" could be displayed, and they could not be displayed at their native size. Now most of us are interested in converting and displaying items that are larger than postage stamps, so these "simple solutions" are for most purposes impractical, and compromises will have to be made.

The object of benchmarking is to make informed decisions about a range of choices and to understand in advance the consequences of such decisions. The benchmarking approach can be applied across the full continuum of the digitization chain, from conversion to storage to access to presentation. Our belief at Cornell is that benchmarking must be approached holistically, that it is essential to understand at the point of selection what the consequences will be for conversion and presentation. This is especially important as institutions consider inaugurating large-scale conversion projects. Toward this end, the advantages of benchmarking are several.

1. Benchmarking is first and foremost a management tool, designed to lead to informed decision making. It offers a starting point and a means for narrowing the range of choices to a manageable number. Although clearly benchmarking decisions must be judged through actual implementations, the time spent in experimentation can be reduced, the temptation to overstate or understate requirements may be avoided, and the initial assessment requires no specialized equipment or expenditure of funds. Benchmarking allows you to scale knowledgeably and to make decisions on a macro level rather than to determine those requirements through item-by-item review or by setting requirements for groups of materials that may be adequate for only a portion of them.

2. Benchmarking provides a means for interpreting vendor claims. If you have spent any time reading product literature, you may have become convinced, as I have, that the sole aim of any company is to sell its product. Technical information will be presented in the most favorable light; it is often incomplete and intended to discourage product comparisons. One film scanner, for instance, may be advertised as having a resolution of 7500 dpi; another may claim 400 dpi. In reality, these two scanners could provide the very same capabilities, but it may be difficult to determine that without additional information. You may end up spending considerable time on the phone, first getting past the marketing representatives and then closely questioning those with a technical understanding of the product's capabilities. If you have benchmarked your requirements, you will be able to focus the discussion on your particular needs.


3. Benchmarking can assist you in negotiating with vendors for services and products. I've spent many years advocating the use of 600 dpi bitonal scanning for printed text, and invariably when I begin a discussion with a representative of an imaging service bureau, he will try to talk me out of such a high resolution, claiming that I do not need it or that it will be exorbitantly expensive. I suspect the representative is motivated to make those claims in part because he believes them and in part because the company may not provide that service and the salesperson wants my business. If I had not benchmarked my resolution requirements, I might be persuaded by what this salesperson has to say.

4. Benchmarking can lead to careful management of resources. If you know up front what your requirements are likely to be and the consequences of those requirements, you can develop a budget that reflects the actual costs, identify prerequisites for meeting those needs, and, perhaps most important, avoid costly mistakes. Nothing will doom an imaging project more quickly than buying the wrong equipment or having to manage image files that are not supported by your institution's technical infrastructure.

5. Benchmarking can also allow you to predict what you can deliver under specific conditions. It is important to understand that an imaging project may break at the weakest link in the digitization chain. For instance, if your institution is considering scanning its map collection, you should be realistic about what ultimately can be delivered to the user's desktop. Benchmarking lets you predict how much of the image and what level of detail can be presented on-screen for various monitors. Even with the most expensive monitor available, presenting oversize material completely, with small detail intact, is impractical.

Having spent some time extolling the virtues of digital benchmarking, I'd like to turn next to describing this methodology as it applies to conversion and then to move to a discussion of on-screen presentation.

Conversion Benchmarking

Determining what constitutes informational content becomes the first step in the conversion benchmarking process. This can be done objectively or subjectively. Let's consider an objective approach first.

Objective Evaluation

One way to perform an objective evaluation would be to determine conversion requirements based on the process used to create the original document. Take resolution, for instance. Film resolution can be measured by the size of the silver crystalline clusters suspended in an emulsion, whose distinct characteristics are appreciated only under microscopic examination. Should we aim for capturing the properties of the chemical process used to create the original? Or should we peg resolution requirements at the recording capability of the camera or printer used?

There are objective scientific tests that can measure the overall information-carrying capacity of an imaging system, such as the Modulation Transfer Function, but such tests require expensive equipment and are still beyond the capability of most institutions except industrial or research labs.[9] In practical applications, the resolving power of a microfilm camera is measured by means of a technical test chart in which the distinct number of black and white lines discerned is multiplied by the reduction ratio used to determine the number of line pairs per millimeter. A system resolution of 120 line pairs per millimeter (lppm) is considered good; above 120 is considered excellent. To digitally capture all the information present on a 35 mm frame of film with a resolution of 120 lppm would take a bitonal film scanner with a pixel array of 12,240.[10] There is no such beast on the market today.

How far down this path should we go? It may be appropriate to require that the digital image accurately depict the gouges of a woodcut or the scoops of a stipple engraving, but what about the exact dot pattern and screen ruling of a halftone? the strokes and acid bite of an etching? the black lace of an aquatint that becomes visible only at a magnification above 25×? Offset publications are printed at 1200 dpi-should we choose that resolution as our starting point for scanning text?

Significant information may well be present at that level in some cases, as may be argued for medical X rays, but in other cases, attempting to capture all possible information will far exceed the inherent properties of the image as distinct from the medium and process used to create it. Consider for instance a 4" × 5" negative of a badly blurred photograph. The negative is incredibly information dense, but the information it conveys is not significant.

Obviously, any practical application of digital conversion would be overwhelmed by the recording, computing, and storage requirements that would be needed to support capture at the structure or process level. Although offset printing may be produced at 1200 dpi, most individuals would not be able to discern the difference between a 600 dpi and a 1000 dpi digital image of that page, even under magnification. The higher resolution adds more bits and increases the file size but with little to no appreciable gain. The difference between 300 dpi and 600 dpi, however, can be easily observed and, in my opinion, is worth the extra time and expense to obtain. The relationship between resolution and image quality is not linear: at some point as resolution increases, the gain in image quality will level off. Benchmarking will help you to determine where the leveling begins.

Subjective Evaluation

I would argue, then, that determining what constitutes informational content is best done subjectively. It should be based on an assessment of the attributes of the document rather than the process used to create that document. Reformatting via digital-or analog-techniques presumes that the essential meaning of an original can somehow be captured and presented in another format. There is always some loss of information when an object is copied. The key is to determine whether that informational loss is significant. Obviously for some items, particularly those of intrinsic value, a copy can serve only as a surrogate, not as a replacement. This determination should be made by those with curatorial responsibility and a good understanding of the nature and significance of the material. Those with a trained eye should consider the attributes of the document itself as well as the immediate and potential uses that researchers will make of its informational content.

Determining Scanning Resolution Requirements for Replacement Purposes

To illustrate benchmarking for conversion, let's consider the brittle book. For brittle books published during the last century and a half, detail has come to represent the size of the smallest significant character in the text, usually the lowercase e. To capture this information-which consists of black ink on a light background-resolution is the key determinant of image quality.

Benchmarking resolution requirements in a digital world has its roots in micrographics, where standards for predicting image quality are based on the Quality Index (QI). QI provides a means for relating system resolution and text legibility. It is based on multiplying the height of the smallest significant character, h, by the smallest line pair pattern resolved by a camera on a technical test target, p: QI = h × p. The resulting number is called the Quality Index, and it is used to forecast levels of image quality-marginal (3.6), medium (5.0), or high (8.0)-that will be achieved on the film. This approach can be used in the digital world, but the differences in the ways microfilm cameras and scanners capture detail must be accounted for.[11] Specifically, it is necessary to make the following adjustments:

1. Establish levels of image quality for digitally rendered characters that are analogous to those established for microfilming. In photographically reproduced images, quality degradation results in a fuzzy or blurred image. Usually degradation with digital conversion is revealed in the ragged or stairstepping appearance of diagonal lines or curves, known as aliasing, or "jaggies."

2. Rationalize system measurements. Digital resolution is measured in dots per inch; classic resolution is measured in line pairs per millimeter. To calculate QI based on scanning resolution, you must convert from one to the other. One millimeter equals 0.039 inches, so to determine the number of dots per millimeter, multiply the dpi by 0.039.

3. Equate dots to line pairs. Again, classic resolution refers to line pairs per millimeter (one black line and one white line), and since a dot occupies the same space as a line, two dots must be used to represent one line pair. This means the dpi must be divided by two to be made equivalent to p.



With these adjustments, we can modify the QI formula to create a digital equivalent. From QI = p × h, we now have QI = (0.039 × dpi × h)/2, which can be simplified to QI = 0.0195 × dpi × h.

For bitonal scanning, we would also want to adjust for possible misregistration due to sampling errors brought about in the thresholding process in which all pixels are reduced to either black or white. To be on the conservative side, the authors of AIIM TR26-1993 advise increasing the input scanning resolution by at least 50% to compensate for possible image detector misalignment. The formula would then be QI = (0.039 × dpi × h)/3, which can be simplified to QI = 0.013 × dpi × h.
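
Expressed as a short calculation, the two simplified formulas look like the following sketch in Python. This is purely illustrative; the function names are mine, and the constants 0.0195 and 0.013 come directly from the simplified formulas above.

    # Conversion benchmarking formulas (illustrative sketch; function names are
    # the editor's, constants come from the simplified formulas above).

    def qi_digital(dpi, h_mm):
        # General digital Quality Index: QI = (0.039 * dpi * h) / 2
        return 0.0195 * dpi * h_mm

    def qi_bitonal(dpi, h_mm):
        # Bitonal scanning with the AIIM TR26-1993 50% allowance for
        # detector misalignment: QI = (0.039 * dpi * h) / 3
        return 0.013 * dpi * h_mm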

So How Does Conversion Benchmarking Work?

Consider a printed page that contains characters measuring 2 mm high or greater. If the page were scanned at 300 dpi, what level of quality would you expect to obtain? By plugging in the dpi and the character height and solving for QI, you would discover that you can expect a QI of 8, or excellent rendering.

You can also solve the equation for the other variables. Consider, for example, a scanner with a maximum of 400 dpi. You can benchmark the size of the smallest character that you could capture with medium quality (a QI of 5), which would be .96 mm high. Or you can calculate the input scanning resolution required to achieve excellent rendering of a character that is 3 mm high (200 dpi).
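The same worked examples can be checked by solving the bitonal relation for each unknown in turn. Again a sketch only, under the assumptions stated in the formulas above; the rounding matches the figures quoted in the text.

    # Solve QI = 0.013 * dpi * h (bitonal formula) for each variable in turn.

    def qi(dpi, h_mm):
        return 0.013 * dpi * h_mm

    def smallest_character(dpi, target_qi):
        return target_qi / (0.013 * dpi)

    def required_dpi(h_mm, target_qi):
        return target_qi / (0.013 * h_mm)

    print(qi(300, 2.0))               # ~7.8, i.e. a QI of about 8 (excellent)
    print(smallest_character(400, 5)) # ~0.96 mm at medium quality on a 400 dpi scanner
    print(required_dpi(3.0, 8))       # ~205 dpi, roughly the 200 dpi cited above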

With this formula and an understanding of the nature of your source documents, you can benchmark the scanning resolution needs for printed material. We took this knowledge and applied it to the types of documents we were scanning-brittle books published from 1850 to 1950. We reviewed printers' type sizes commonly used by publishers during this period and discovered that virtually none utilized type fonts smaller than 1 mm in height, which, according to our benchmarking formula, could be captured with excellent quality using 600 dpi bitonal scanning. We then tested these benchmarks by conducting an extensive on-screen and in-print examination of digital facsimiles for the smallest font-sized Roman and non-Roman type scripts used during this period. This verification process confirmed that an input scanning resolution of 600 dpi was indeed sufficient to capture the monochrome text-based information contained in virtually all books published during the period of paper's greatest brittleness. Although many of those books do not contain text that is as small as 1 mm in height, a sufficient number of them do. To avoid the labor and expense of performing item-by-item review, we currently scan all books at 600 dpi resolution.[12]

Conversion Benchmarking beyond Text

Although we've conducted most of our experiments on printed text, we are beginning to benchmark resolution requirements for nontextual documents as well. For non-text-based material, we have begun to develop a benchmarking formula that would be based on the width of the smallest stroke or mark on the page rather than a complete detail. This approach was used by the Nordic Digital Research Institute to determine resolution requirements for the conversion of historic Icelandic maps and is being followed in the current New York State Kodak Photo CD project being conducted at Cornell on behalf of the Eleven Comprehensive Research Libraries of New York State.[13] The measurement of such fine detail will require the use of a 25 to 50× loupe with a metric hairline that differentiates below 0.1 mm.

Benchmarking for conversion can be extended beyond resolution to tonal reproduction (both grayscale and color); to the capture of depth, overlay, and translucency; to assessing the effects of compression techniques and levels of compression used on image quality; to evaluating the capabilities of a particular scanning methodology, such as the Kodak Photo CD format. Benchmarking can also be used for evaluating quality requirements for a particular category of material- halftones, for example-or to examine the relationship between the size of the document and the size of its significant details, a very challenging relationship that affects both the conversion and the presentation of maps, newspapers, architectural drawings, and other oversized, highly detailed source documents.

In sum, conversion benchmarking involves both subjective and objective components. There must be the means to establish levels of quality (through technical targets or samples of acceptable materials), the means to identify and measure significant information present in the document, the means to relate one to another via a formula, and the means to judge results on-screen and in-print for a sample group of documents. Armed with this information, benchmarking enables informed decision making-which often leads to a balancing act involving tradeoffs between quality and cost, between quality and completeness, between completeness and size, or between quality and speed.

Display Benchmarking

Quality assessments can be extended beyond capture requirements to the presentation and timeliness of delivery options. We began our benchmarking for conversion with the attributes of the source documents. We begin our benchmarking for display with the attributes of the digital images.

I believe that all researchers in their heart of hearts expect three things from displayed digital images: (1) they want the full-size image to be presented on-screen; (2) they expect legibility and adequate color rendering; and (3) they want images to be displayed quickly. Of course they want lots of other things, too, such as the means to manipulate, annotate, and compare images, and for text-based material, they want to be able to conduct key word searches across the images. But for the moment, let's just consider those three requirements: full image, full detail and tonal reproduction, and quick display.

Unfortunately, for many categories of documents, satisfying all three criteria at once will be a problem, given the limitations of screen design, computing capabilities, and network speeds. Benchmarking screen display must take all these variables into consideration, along with the attributes of the digital images themselves, as user expectations are weighed one against the other. We are just beginning to investigate this interrelationship at Cornell, and although our findings are still tentative and not broadly confirmed through experimentation, I'm convinced that display benchmarking will offer the same advantages as conversion benchmarking to research institutions that are beginning to make their materials available electronically.[14]

Now for the good news: it is easy to display the complete image and it is possible to display it quickly. It is easy to ensure screen legibility-in fact, intensive scrutiny of highly detailed information is facilitated on-screen. Color fidelity is a little more difficult to deliver, but progress is occurring on that front.[15]

Now for the not-so-good news: given common desktop computer configurations, it may not be possible to deliver full 24-bit color to the screen-the monitor may have the native capability but not enough video memory, or its refresh rate cannot sustain a nonflickering image. The complete image that is quickly displayed may not be legible. A highly detailed image may take a long time to deliver, and only a small percentage of it will be seen at any given time. You may call up a photograph of Yul Brynner only to discover you have landed somewhere on his bald pate.

Benchmarking will allow you to predict in advance the pros and cons of digital image display. Conflicts between legibility and completeness, between timeliness and detail, can be identified and compromises developed. Benchmarking allows you to predetermine a set process for delivering images of uniform size and content and to assess how well that process will accommodate other document types. Scaling to 72 dpi and adding 3 bits of gray may be a good choice for technical reports produced at 10-point type and above but will be totally inadequate for delivering digital renderings of full-size newspapers.

To illustrate benchmarking as it applies to display, consider the first two user expectations: complete display and legibility. We expect printed facsimiles produced from digital images to look very similar to the original. They should be the same size, preserve the layout, and convey detail and tonal information that is faithful to the original. Many readers assume that the digital image on-screen can also be the same, that if the page were correctly converted, it could be brought up at approximately the same size and with the same level of detail as the original. It is certainly possible to scale the image to be the same size as the original document, but most likely the information contained therein will not be legible.

If the scanned image's dpi does not equal the screen dpi, then the image on-screen will appear either larger or smaller than the original document's size. Because scanning dpi most often exceeds the screen dpi, the image will appear larger on the screen-and chances are that not all of it will be represented at once. This is because monitors have a limited number of pixels that can be displayed both horizontally and vertically. If the number of pixels in the image exceeds those of the screen and if the scanning dpi is higher, the image will be enlarged on the screen and will not be completely presented.

The problems of presenting completeness, detail, and native size are more pronounced in on-screen display than in printing. In the latter, very high printing resolutions are possible, and the total number of dots that can be laid down for a given image is great, enabling the creation of facsimiles that are the same size- and often with the same detail-as the original.

The limited pixel dimensions and dpi of monitors can be both a strength and a weakness. On the plus side, detail can be presented more legibly and without the aid of a microscope, which, for those conducting extensive textual analysis, may represent a major improvement over reviewing the source documents themselves. For instance, papyrologists can rely on monitors to provide the enlarged view of fragment details required in their study. When the original documents themselves are examined, they are typically viewed under a microscope at 4 to 10× magnification.[16] Art historians can zoom in on high-resolution images to enlarge details or to examine brush strokes that convey different surfaces and materials.[17] On the downside, because the screen dpi is often exceeded by the scanning dpi and because screens have very limited pixel dimensions, many documents cannot be fully displayed if legibility must be conveyed. This conflict between overall size and level of detail is most apparent when dealing with oversized material, but it also affects a surprisingly large percentage of normal-sized documents.

Consider the physical limitations of computer monitors: typical monitors offer resolutions from 640 × 480 at the low end to 1600 × 1200 at the high end. The lowest level SVGA monitor offers the possibility of displaying material at 1024 × 768. These numbers, known as the pixel matrix, refer to the number of horizontal by vertical pixels painted on the screen when an image appears.

In product literature, monitor resolutions are often given in dpi, which can range from 60 to 120 depending on the screen width and horizontal pixel dimension. The screen dpi can be a misleading representation of a monitor's quality and performance. For example, when SVGA resolution is used on 14", 17", and 21" monitors, the screen dpi decreases as screen size increases. We might intuitively expect image resolution to increase, not decrease, with the size of the monitor. In reality, the same amount of an image-and level of detail-would be displayed on all three monitors set to the same pixel dimensions. The only difference would be that the image displayed on the 21" monitor would appear enlarged compared to the same image displayed on the 17" and 14" monitors.

The pixel matrix of a monitor limits the number of pixels of a digital image that can be displayed at any one time. And if there is insufficient video memory, you will also be limited in how much gray or color information can be supported at any pixel dimension. For instance, while the three-year-old 14" SVGA monitor on my desk supports a 1024 × 768 display resolution, it came bundled with half a megabyte of video memory. It cannot display an 8-bit grayscale image at that resolution and it cannot display a 24-bit color image at all, even if it is set at the lowest resolution of 640 × 480. If I increased its VRAM, I would be bothered by an annoying flicker, because the monitor's refresh rate is not great enough to support a stable image on-screen at higher resolutions. It is not coincidental that while the most basic SVGA monitors can support a pixel matrix of 1024 × 768, most of them come packaged with the monitor set at a resolution of 800 × 600. As others have noted, network speeds and the limitations of graphical user interfaces will also profoundly affect user satisfaction with on-screen presentation of digital images.

So How Does Display Benchmarking Work?

Consider the brittle book and how best to display it. Recall that it may contain font sizes at 1 mm and above, so we have scanned each page at 600 dpi, bitonal mode. Let's assume that the typical page averages 4" × 6" in size. The pixel matrix of this image will be 4 × 600 by 6 × 600, or 2400 × 3600-far above any monitor pixel matrix currently available. Now if I want to display that image at its full scanning resolution on my monitor, set to the default resolution of 800 × 600, it should be obvious to many of you that I will be showing only a small portion of that image-approximately 5% of it will appear on the screen. Let's suppose I went out and purchased a $2,500 monitor that offered a resolution of 1600 × 1200. I'd still only be able to display less than a fourth of that image at any one time.

Obviously for most access purposes, this display would be unacceptable. It requires too much scrolling or zooming out to study the image. If it is an absolute requirement that the full image be displayed with all details fully rendered, I'd suggest converting only items whose smallest significant detail represents nothing smaller than one third of 1% of the total document surface. This means that if you had a document with a one-millimeter-high character that was scanned at 600 dpi and you wanted to display the full document at its scanning resolution on a 1024 × 768 monitor, the document's physical dimensions could not exceed 1.7" (horizontal) × 1.3" (vertical). This document size may work well for items such as papyri, which are relatively small, at least as they have survived to the present. It also works well for items that are physically large and contain large-sized features, such as posters that are meant to be viewed from a distance. If the smallest detail on the poster measured 1", the poster could be as large as 42" × 32" and still be fully displayed with all detail intact.[18]
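
The arithmetic behind these examples can be made explicit. The following sketch (my own illustration, with hypothetical function names, not a Cornell tool) computes the pixel matrix of a scanned page, the fraction of it visible on a given monitor at full scanning resolution, and the largest document that fits on the screen in its entirety at that resolution.

    # Illustrative sketch: how much of a scanned page fits on screen at full
    # scanning resolution, and how large a document can be shown whole.

    def pixel_matrix(width_in, height_in, dpi):
        return (round(width_in * dpi), round(height_in * dpi))

    def fraction_visible(image_px, screen_px):
        img_w, img_h = image_px
        scr_w, scr_h = screen_px
        return (min(img_w, scr_w) * min(img_h, scr_h)) / (img_w * img_h)

    def max_document_size_inches(screen_px, dpi):
        scr_w, scr_h = screen_px
        return (scr_w / dpi, scr_h / dpi)

    page = pixel_matrix(4, 6, 600)                  # (2400, 3600)
    print(fraction_visible(page, (800, 600)))       # ~0.056, about 5%
    print(fraction_visible(page, (1600, 1200)))     # ~0.22, under a fourth
    print(max_document_size_inches((1024, 768), 600))  # ~(1.7, 1.3) inches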

Most images will have to be scaled down from their scanning resolutions for on-screen access, and this can occur in a number of ways. Let's first consider full display on the monitor, and then consider legibility. In order to display the full image on a given monitor, the image pixel matrix must be reduced to fit within the monitor's pixel dimensions. The image is scaled by setting one of its pixel dimensions to the corresponding pixel dimension of the monitor.[19]



To fit the complete page image from our brittle book on a monitor set at 800 × 600, we would scale the vertical dimension of our image to 600; the horizontal dimension would be 400 to preserve the aspect ratio of the original. By reducing the 2400 × 3600 pixel image to 400 × 600, we will have discarded 97% of the information in the original. The advantages to doing this are several: it facilitates browsing by displaying the full image, and it decreases file size, which in turn decreases the transmission time. The downside should also be obvious. There will be a major decrease in image quality as a significant number of pixels are discarded. In other words, the image can be fully displayed, but the information contained in that image may not be legible. To determine whether that information will be useful, we can turn to benchmarking formulas for legible display.
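
As a rough illustration of this scaling step (again a sketch with hypothetical function names), fitting the 2400 × 3600 image inside an 800 × 600 screen while preserving the aspect ratio yields a 400 × 600 image, which discards about 97% of the original pixels.

    # Illustrative sketch: scale an image to fit a monitor, preserving aspect
    # ratio, and report how many of the original pixels are discarded.

    def fit_to_screen(image_px, screen_px):
        img_w, img_h = image_px
        scr_w, scr_h = screen_px
        scale = min(scr_w / img_w, scr_h / img_h)
        return (round(img_w * scale), round(img_h * scale))

    image = (2400, 3600)              # 4" x 6" page scanned at 600 dpi
    scaled = fit_to_screen(image, (800, 600))
    print(scaled)                     # (400, 600)
    retained = (scaled[0] * scaled[1]) / (image[0] * image[1])
    print(round(1 - retained, 3))     # ~0.972, i.e. about 97% of the pixels discarded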

Here are the benchmarking resolution formulas for scaling bitonal and grayscale images for on-screen display:[20]

[Figure: benchmarking resolution formulas for scaling bitonal and grayscale images for on-screen display; not reproduced here.]

Note: Recall that in the benchmarking resolution formulas for conversion, dpi refers to the scanning resolution. In the scaling formulas, dpi refers to the image dpi (not to be confused with the monitor's dpi).

Let's return to the example of our 4" × 6" brittle page. If we assume that we need to be able to read the 1-mm-high character but that it doesn't have to be fully rendered, then we set our QI requirement at 3.6, which should ensure legibility of characters in context. We can use the benchmarking formula to predict the scaled image dpi:

[Figure: worked calculation solving the scaling formula for the image dpi, with QI = 3.6 and a 1 mm character; not reproduced here.]

The image could be fully displayed with minimal legibility on a 120 dpi monitor. The pixel dimensions for the scaled image would be 120 × 4 by 120 × 6, or 480 × 720. This full image could be viewed on SVGA monitors set at 1024 × 768 or above; slightly more than 80% of it could be viewed on my monitor set at 800 × 600.
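
The scaling formula itself appears in the figure above, which is not reproduced here. The two worked examples in the text (120 dpi for a 1 mm character here, and 75 dpi for a 1.6 mm character in the next paragraph, both at QI 3.6) are consistent with a relation of roughly QI = 0.03 × image dpi × h. Treating that constant as an inferred assumption rather than the original formula, the calculation can be sketched as follows.

    # Sketch of the display-scaling calculation. The constant 0.03 is an
    # inference from the worked examples in the text (120 dpi / 1 mm and
    # 75 dpi / 1.6 mm, both at QI 3.6); the original formula is in the
    # figure, which is not reproduced here.

    K_DISPLAY = 0.03   # assumed: QI ~= K_DISPLAY * image_dpi * h_mm

    def scaled_image_dpi(target_qi, h_mm):
        return target_qi / (K_DISPLAY * h_mm)

    def scaled_pixel_matrix(width_in, height_in, image_dpi):
        return (round(width_in * image_dpi), round(height_in * image_dpi))

    dpi_1mm = scaled_image_dpi(3.6, 1.0)                  # ~120 dpi
    print(dpi_1mm, scaled_pixel_matrix(4, 6, dpi_1mm))    # (480, 720)
    print(scaled_image_dpi(3.6, 1.6))                     # ~75 dpi, as below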

We can also use this formula to determine a preset scaling dpi for a group of documents to be conveyed to a particular clientele. Consider a scenario in which your primary users have access to monitors that can effectively support an 800 × 600 resolution. We could decide whether the user population would be satisfied with receiving only 80% of the document if it meant that they could read the smallest type, which may occur only in footnotes. If your users are more interested in quick browsing, you might want to benchmark against the body of the text rather than the smallest typed character. For instance, if the main text were in 12-point type and the smallest lowercase e measured 1.6 mm in height, then our sample page could be sent to the screen with a QI of 3.6 at a pixel dimension of 300 × 450, or an image dpi of 75-well within the capabilities of the 800 × 600 monitor.

You can also benchmark the time it will take to deliver this image to the screen. If your clientele are connected via ethernet, this image (with 3 bits of gray added to smooth out rough edges of characters and improve legibility) could be sent to the desktop in less than a second-providing readers with full display of the document, legibility of the main text, and a timely delivery. If your readers are connected to the ethernet via a 9600-baud modem, however, the image will take 42 seconds to be delivered. If the footnotes must be readable, the full text cannot be displayed on-screen and the time it will take to retrieve the image will increase. Benchmarking allows you to identify these variables and consider the trade-offs or compromises associated with optimizing any one of them.
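
The delivery-time figures can be approximated with a back-of-the-envelope calculation, assuming an uncompressed image at 3 bits per pixel, a nominal 10 Mbps ethernet connection, and no protocol overhead; these are my assumptions for illustration, not a description of the actual delivery path.

    # Rough delivery-time estimate: uncompressed image, 3 bits per pixel,
    # no protocol overhead (editor's assumptions, for illustration only).

    def delivery_seconds(width_px, height_px, bits_per_pixel, bits_per_second):
        return (width_px * height_px * bits_per_pixel) / bits_per_second

    # 300 x 450 pixel page image with 3 bits of gray:
    print(delivery_seconds(300, 450, 3, 10_000_000))  # ethernet: well under a second
    print(delivery_seconds(300, 450, 3, 9600))        # 9600-baud modem: ~42 seconds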

Conclusion

Benchmarking is an approach, not a prescription. It offers a means to evaluate choices for how best to balance quality, costs, timeliness, user requirements, and technological capabilities in the conversion, delivery, and presentation of digital resources. The value of this approach will best be determined by extensive field testing. We at Cornell are committed to further refinement of the benchmarking methodology, and we urge others to consider its utility before they commit considerable resources to bringing about the brave new world of digitized information.



Chapter 3—
The Transition to Electronic Content Licensing
The Institutional Context in 1997

Ann S. Okerson

Introduction

The public discourse about electronic publishing, as heard at scholarly and library gatherings on the topic of scholarly communications, has changed little over the past several years. Librarians and academics fret about the serials crisis, argue about the influence of commercial offshore publishers, wonder when the academic reward system will begin to take electronic publications into account, and debate what steps to take to rationalize copyright policy in our institutions. There is progress in that a wider community now comes together to ponder these familiar themes, but to those of us who have been party to the dialogue for some years, the tedium of ritual sometimes sets in.

At Yale, subject-specialist librarians talk to real publishers every day about the terms on which the library will acquire their electronic products: reference works, abstracts, data, journals, and other full-text offerings. Every week, or several times a week, we are swept up in negotiating the terms of licenses with producers whose works are needed by our students and faculty. Electronic publications are now a vital part of libraries' business and services. For example, at a NorthEast Research Libraries Consortium (NERL) meeting in February 1997, each of the 13 research library representatives at the table stated that his or her library is expending about 6-7% of its acquisitions budget on electronic resources.

This essay will offer some observations on the overall progress of library licensing negotiations. But the main point of this essay will be to make this case: in the real world of libraries, we have begun to move past the predictable, ritual discourse. The market has brought librarians and publishers together; the parties are discovering where their interests mesh; and they are beginning to build a new set of arrangements that meet needs both for access (on the part of the institution) and remuneration (on the part of the producer). Even though the prices for electronic resources are becoming a major concern, libraries are able to secure crucial and significant use terms via site licenses, use terms that often allow the customer's students, faculty, and scholars significant copying latitude for their work (including articles for reserves and course packs), at times more latitude than what is permitted via the fair use and library provisions in the Copyright Act of the United States. In short, institutions and publishers perhaps do not realize how advanced they are in making a digital market, more advanced at that, in fact, than they are at resolving a number of critical technological issues.[1]

© 1999 by Ann Okerson. Readers of this article may copy it without the copyright owner's permission if the author and publisher are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.

Why Do Contracts or Licenses (Rather Than Copyright) Govern Electronic Content?

Society now faces what seems to be a powerful competitor for copyright's influence over the marketplace of cultural products, one that carries its own assumptions about what intellectual property is, how it is to be used, how it can be controlled, and what economic order can emerge as a result.

For convenience's sake, the codification of intellectual property is dated to the early eighteenth century, when the evolving notion of copyright was enacted into law, shaping a marketplace for cultural products unlike any seen before. In that eighteenth-century form, copyright legislation depended in three ways on the technologies of the time:

1. The power of copyright was already being affirmed through the development of high-speed printing presses that increased the printer's at-risk capital investment and greatly multiplied the number of copies of a given original that could be produced (and thus lowered the selling price).

2. An author could begin to realize financial rewards through signing over copyright to a publisher. Owning the copyright meant that the publisher, who had assumed the expense and risk of publication, stood to gain a substantial portion of the publication revenue.

3. Punishment for breaking the law (i.e., printing illegal copies) was feasible, for the ability to escape detection was relatively slight. The visibility and the capital costs of establishing and operating a printing press meant that those who used such presses to violate copyright were liable to confiscatory punishment at least commensurate with the injury done by the crime itself.

In the 1970s, technological advances produced the photocopier, an invention that empowered the user to produce multiple copies cheaply and comparatively unnoticed. In the 1980s, the fax machine took the world by storm, multiplying copies and speeding up their distribution. Computer networking technology of the 1990s marries convenience, affordability, and ease of distribution, eclipsing the power of all previous technologies. We can attribute the exponential increase in electronic content, at least indirectly, to the current inhabitants of the White House. The Clinton-Gore campaign of 1992 introduced the Internet to the general public, and this administration has been passionately committed to rapid development of the National Information Infrastructure (NII) and determined to advance the electronic marketplace. Part of that commitment arises from national leaders' unwavering faith that electronic networks create an environment and a set of instruments vital to the overall economic growth of the United States.

While copyright (that is, the notion that creative works can be owned) is still and probably always will be recognized as a fundamental principle by most players in the information chain, many believe that its currently articulated "rules" do not effectively address either the technical capabilities or reader needs of a high-speed information distribution age. It could be argued (and many educators do) that the nineteenth- and twentieth-century drafters of copyright law intended to lay down societally beneficial and, by extension, technologically neutral principles about intellectual property ownership and copying,[2] but in fact Thomas Jefferson knew nothing of photocopiers, and the legislators who crafted the 1976 Copyright Act of the United States knew nothing of computer networks. Had they even begun to imagine such things, the law might have been written differently-and in fact the case can be made that it should now be written differently.[3] So to many people, the gulf between copyright laws or treaties and the universe that those laws ought to address today feels vast and deep. Therefore, instead of relying on national copyright law, surrounding case law, international treaties, and prevailing practice to govern information transactions for electronic information, copyright holders have turned to contracts (or licenses, as they are more commonly called in the library world) as the mechanism for defining the owner, user, and uses of any given piece of information.

That is, the license-contract is invoked because the prospective deal is for both parties a substantial transaction (in cash or in consequence). The new atmosphere creates a new kind of marketplace or a market for a new kind of product, and neither the selling nor the buying parties are sure of the other or of their position vis-à-vis the law and the courts. Publishers come to the table with real anxieties that their products may be abused by promiscuous reproduction of a sort that ultimately saps their product's marketability, while libraries are fearful that restrictions on permitted uses will mean less usable or more expensive products.

In short, what licensing agreements have in common with the copyright regime is that both accept the fundamental idea of the nature of intellectual property-that even when intangible, it can be owned. Where they differ is in the vehicle by which they seek to balance creators', producers', and users' rights and to regulate the economy that springs up around those rights. Copyright represents a set of general regulations negotiated through statutory enactment. Licenses, on the other hand, represent a market-driven approach to this regulation through deals struck between buyers and sellers.

When Did This Mode of Doing Business Begin for Libraries?

The concept of a license is old and fundamentally transparent. A license is essentially a means of providing use of a piece of property without giving up the ownership. For example, if you own a piece of property and allow another to use it without transferring title, you may, by law of contract, stipulate your conditions; if the other party agrees to them, then a mutually agreeable deal has come into being. A similar transaction takes place in the case of performance rights for films and recordings. This example moves from the tangible property mode of real estate, in which exclusive licenses (granting of rights to only one user) are common, to the intangible property mode of intellectual property such as copyright, in which nonexclusive licenses are the norm. The owner of a movie theater rarely owns the cans of film delivered weekly to the cinema, holding them instead under strict conditions of use: so many showings, so much payment for each ticket sold, and so on. With the right price, such an arrangement, like the economic relationship between author and publisher that is sanctioned by copyright, can be extraordinarily fruitful. In the license mode of doing business (precisely defined by the legal contract that describes the license), the relationships are driven entirely by contract law: the owner of a piece of property is free to ask whatever price and set whatever conditions on use the market will bear. The ensuing deal is pure "marketplace": a meeting of minds between a willing buyer and a willing seller. A crucial point here is that the license becomes a particularly powerful tool for that property owner who has a copyright-protected monopoly.

Most academics began to be parties to license agreements when personal computer software (WordStar, WordPerfect) appeared in the 1980s in shrink-wrap packages for the first time. Some purchasers of such software may have read the fine print on the wrapper that detailed the terms and conditions of use, but most either did not or have ceased to do so. The thrust of such documents is simple: by opening the package the purchaser has agreed to certain terms, terms that include limited rights of ownership and use of the item paid for. In many ways, this mode of licensing raises problematic questions,[4] but in other ways, such as sheer efficiency, shrink-wrap licensing suggests the kind of transaction that the scholarly information marketplace needs to achieve. It is noteworthy that the shrink-wrap license has moved easily into the World Wide Web environment, where it shows itself in clickable "I agree" form. The user's click supposedly affirms that he or she has said yes to the user terms and is ready to abide by them. The downsides and benefits are similar to those of shrink-wrapped software.

The phenomenon of institutional licensing for electronic content has evolved in a short time. Over the past 20 years or so, the licensing of software has become a way of life for institutions of higher education. These kinds of licenses are gen-


57

erally for systems that run institutional computers or on-line catalogs or software packages (e.g., for instruction or for office support). The licenses, often substantial in scale and price, are arranged by institutional counsel (an increasingly overworked segment of an educational institution's professional staff) along with information technology managers.

Libraries' entrée into this arena has been comparatively recent and initially on a small scale. In fact, the initial library business encounter with electronic content may not have happened via license at all, but rather via deposit account. Some 20 years ago, academic and research libraries began accessing electronic information through mediated searching of indexing and abstracting services provided by consolidators such as Dialog. Different database owners levied different per hour charges (each database also required its own searching vocabularies and strategies), and Dialog (in this example) aggregated them for the educational customer. For the most part, libraries established accounts to which these searches (usually mediated by librarians or information specialists) were charged.

By the late 1980s, libraries also began to purchase shrink-wrapped (prelicensed) content, though shrink-wrapped purchases did not form-and still do not-any very visible part of library transactions. Concurrently, a number of indexing and abstracting services offered electronic versions directly to libraries via CD-ROM or through dial-up access (for example, an important early player in this arena was ISI, the Institute for Scientific Information). It was at this point, within the last ten years, that library licenses gradually became recognized as a means to a new and different sort of information acquisition or access. Such licenses were often arranged by library subject specialists for important resources in well-defined areas of use. The license terms offered to libraries were accepted or not, the library customer regarding them mostly as nonnegotiable. Nonacceptance was most often a matter of affordability, and there seemed to be little room for the library customer to affect the terms. Complaints about terms of licenses began to be (and persist in being) legion, for important reasons such as the following:

Potential loss of knowledge. By definition, licenses are arranged for specific periods of time. At the end of that time, librarians rapidly discovered, if the license is not renewed, prior investment can become worthless as the access ceases (for example, where a CD-ROM must be returned or perhaps can no longer be read, or where connections to a remote server are severed).

License restrictions on use and users. In order to reduce or curtail the leakage of electronic information, institutions are often asked to ensure that only members of the institution can use that information.

Limitations on users' rights. Initial license language not infrequently asks that institutional users severely limit what and how much they may copy from the information resource and may prescribe the means by which such copying can be done.



Cost. In general, electronic licenses for indexing and abstracting services cost significantly more than print equivalents.[5]

What Has Happened to Increase Libraries' Awareness of Licenses?

1. Sheer numbers have increased. Thousands of information providers have jumped into the scholarly marketplace with electronic products of one sort or another: CDs, on-line databases, full-text resources, multimedia. Many scientific publishers, learned societies, university presses, full-text publishers, and vendor/aggregators, as well as new entrants to the publishing arena, now offer beta or well-tested versions of either print-originating or completely electronic information. The numbers have ballooned in a short two to three years, with no signs of abating. For example, Newfour, the on-line forum for announcing new e-journals, magazines, and newsletters, reports 3,634 titles in its archive as of April 5, 1997, and this figure does not include the 1,100 science journal titles that Elsevier is now making available in electronic form.[6] The Yale University Library licenses more than 400 electronic resources of varying sizes, types, media, and price, and it reviews about two new electronic content licenses a week.

2. The attempt by various players in the information chain to create guidelines about electronic fair use has not so far proved fruitful. In connection with the Clinton Administration's National Information Infrastructure initiative, the Working Group on Intellectual Property Rights in the Electronic Environment called upon copyright stakeholders to negotiate guidelines for the fair use of electronic materials in a variety of nonprofit educational contexts. Anyone who wished to participate was invited to do so, and a large group calling itself CONFU, the Conference on Fair Use, began to negotiate such guidelines for a variety of activities (such as library reserves, multimedia in the classroom, interlibrary loans, etc.) in September 1994.[7] The interests of all participants in the information chain were represented, and the group quickly began to come unstuck in reaching agreements on most of the dozen or more areas defined as needing guidelines. Such stalemates should come as no surprise; in fact, they are healthy and proper. Any changes to national guidelines, let alone national law or international treaty, should happen only when the public debate has been extensive and consensus has been reached. What many have come to realize during the current licensing activities is that the license arrangements that libraries currently are making are in fact achieving legislation's business more quickly and by other means. Instead of waiting on Congress or CONFU and allowing terms to be dictated to both parties by law, publishers and institutions are starting to make their peace together, thoughtfully and responsibly, one step at a time. Crafting these agreements and relationships is altogether the most important achievement of the licensing environment.



3. Numerous formal partnerships and informal dialogues have been spawned by the capabilities of new publications technologies. A number of libraries collaborate with the publishing and vendor communities as product developers or testers. Such relationships are fruitful in multiple ways. They encourage friction, pushback, and conversation that lead to positive and productive outcomes. Libraries have been offered-and have greatly appreciated-the opportunity to discuss at length the library licenses of various producers, for example, JSTOR, and libraries feel they have had the opportunity to shape and influence these licenses with mutually satisfactory results.

4. Library consortia have aggressively entered the content negotiating arena. While library consortia have existed for decades and one of their primary aims has been effective information sharing, it is only in the 1990s (and mostly in the last two to three years) that a combination of additional state funding (for statewide consortia), library demands, and producers' willingness to negotiate with multiple institutions has come together to make the consortial license an efficient and perhaps cost-effective way to manage access to large bodies of electronic content. An example of a particularly fruitful marketplace encounter (with beautiful as well as charged moments) occurred from February 3 to 5, 1997, as a group of consortial leaders, directors, and coordinators who had communicated informally for a year or two through mailing list messages arranged a meeting at the University of Missouri-St. Louis. The Consortium of Consortia (COC, as we sweepingly named ourselves) invited a dozen major electronic content vendors to describe their products briefly and their consortial working arrangements in detail.[8] By every account, this encounter achieved an exceptional level of information swapping, interaction, and understandings, both of specific resources and of the needs of producers and customers. That said, the future of consortial licensing is no more certain than it is for individual library licenses, though for different reasons.[9]

5. Academia's best legal talent offers invaluable support to libraries. Libraries are indebted to the intelligent and outspoken lawyerly voices in institutions of higher learning in this country. The copyright specialists in universities' general counsel offices have, in a number of cases, led in negotiating content licenses for the institution and have shared their strategies and knowledge generously. Law school experts have published important articles, taught courses, contributed to Internet postings, and participated in national task forces where such matters are discussed.[10]

6. The library community has organized itself to understand the licensing environment for its constituents. The Association of Research Libraries (ARL) has produced an introductory licensing brochure,[11] the Council on Library Resources/Commission on Preservation and Access has supported Yale Library's creation of an important Web site about library content licensing,[12] and the Yale Library offers the library, publisher, vendor, and lawyer world a high-quality, moderated, on-line list where the issues of libraries and producers are aired daily.[13]

7. Options are limited. Right now, licensing and contracts are the only way to obtain the increasing number of electronic information resources that library users need for their education and research.

Some Notable Challenges of the Library Licensing Environment Today

I identify these challenges because they are important and need to be addressed.

1. Terms of use. This area needs to be mentioned at the outset, as it has caused some of the most anguished discussions between publishers and libraries. Initially, many publishers' contract language for electronic information was highly restrictive about both permitted users and permitted uses. Assumptions and requirements about how use ought to be contained have been at times ludicrous, for example, in phrases such as "no copies may be made by any means electronic or mechanical." Through dialogue between librarians and producers, who are usually eager to market their work to happy customers, much of this language has disappeared from the first draft contracts presented to library customers. Where libraries are energetic and aggressive on behalf of their users, the terms of use can indeed be changed to facilitate educational and research goals. The Yale Library, for example, is now party to a number of licenses that permit substantial amounts of copying and downloading for research, individual learning, in-the-classroom learning, library reserves, course packs, and related activities. Interlibrary loan and transmission of works to individual scholars in other organizations are matters that still need a great deal of work. However, the licenses of 1996 and 1997 represent significant all-around improvements and surely reinforce the feeling that rapid progress is being made.

2. Scalability. Institutional electronic content licenses are now generally regarded as negotiable, mostly because the library-customer side of the marketplace is treating them as such (which publishers seem to welcome). Successes of different sorts have ensued (success being defined as a mutually agreeable contract), making all parties feel that they can work together effectively in this new mode. However, negotiations are labor intensive. Negotiation requires time (to develop the expertise and to negotiate), and time is a major cost. The current method of one-on-one negotiations between libraries and their publishers seems at the moment necessary, for many reasons, and at the same time it places new demands on institutional staff. Scalability is the biggest challenge for the licensing environment.

• Clearly, it is too early to shift the burden onto intermediaries such as subscription agencies or other vendors who have vested interests of their own. So far their intervention has been absent or not particularly successful. In fact, in some of the situations in which intermediaries purvey electronic databases, library customers secure less advantageous use terms than those libraries could obtain by licensing directly from the publishers. This is because those vendors are securing commercial licenses from the producers whereas libraries are able to obtain educational licenses. Thus, it is no surprise that in unveiling their latest electronic products and services, important organizations such as Blackwell's (Navigator) and OCLC (EJO-Electronic Journals Online) leave license negotiating for the journal content as a matter between the individual journal publishers and their library customers.

• The contract that codifies the license terms is a pervasive document that covers every aspect of the library/producer relationship, from authorized uses and users to technology base, duration, security mechanisms, price, liability, responsibility, and so on. That is, the license describes the full dimensions of the "deal" for any resource. The library and educational communities, in their attempts to draft general principles or models to address content licensing, characteristically forget this important fact, and the results inevitably fall short in the scaling-up efforts.

3. Price. Pricing models for electronic information are in their infancy; they tend to be creative, complicated, and often hard to understand.[14] Some of these models can range from wacky to bizarre. Consortial pricing can be particularly complex. Each new model solves some of the equity or revenue problems associated with earlier models but introduces confusion of its own. While pricing of electronic resources is not, strictly speaking, a problem with the license itself, price has been a major obstacle in making electronic agreements. The seemingly high price tags for certain electronic resources leave the "serials crisis" in the dust.[15] It is clear that academic libraries, particularly through their consortial negotiators, expect bulk pricing arrangements, sliding scales, early signing bonuses, and other financial inducements that publishers may not necessarily feel they are able to offer. Some of the most fraught moments at the St. Louis COC meeting involved clashes between consortial representatives who affirmed that products should be priced at whatever a willing buyer can or will pay, even if this means widely inconsistent pricing by the vendor, and producers who affirmed the need to stick with a set price that enables them to meet their business plan.

4. The liability-trust conundrum. One of the most vexing issues for producers and their licensees has been the producers' assumption that institutions can and ought to vouch for the behavior of individual users (in licenses, the sections that deal with this matter are usually called "Authorized or Permitted Users" and what users may do under the terms of a license is called an "Authorized or Permitted Use") and the fact that individual users' abuses of the terms of a license can kill the deal for a library or a whole group of libraries. Working through this matter with provider after provider in a partnership/cooperative approach poses many challenges. In fact, this matter may be a microcosm of a larger issue: the development of the kind of trust that must underlie any electronic content license. Generally the marketplace for goods is not thought of in terms of trust; it is regarded as a cold cash (or virtual cash) transaction environment. Yet the kinds of scaled-up scholarly information licenses that libraries are engaging with now depend on mutual understanding and trust in a way not needed for the standard trade-or even the print-market to work. In negotiating electronic content licenses, publishers must trust-and, given the opening up of user/use language, it seems they are coming to trust-their library customers to live up to the terms of the deal.

In part, we currently rely on licenses because publishers do not trust users to respect their property and because libraries are fretful that publishers will seek to use the new media to tilt the economic balance in their favor. Both fears are probably overplayed. If libraries continue to find, as they are beginning to do, that publishers are willing to give the same or even more copying rights via licenses as copyright owners, both parties may not be far from discovering that fears have abated, trust has grown, and the ability to revert to copyright as the primary assurance of trust can therefore increase. But many further technological winds must blow-for example, the cybercash facility to allow micropayment transactions-before the players may be ready to settle down to such a new equilibrium.

5. The aggregator aggravation (and opportunity). The costly technological investments that producers need to make to move their publications onto an electronic base; the publishing processes that are being massively reconceived and reorganized; and not least, the compelling vision of digital libraries that proffer information to the end user through a single or small number of interfaces, with a single or modest number of search engines, give rise to information aggregators of many sorts:[16] those who develop important searching, indexing, and/or display software (Alta Vista, OpenText, etc.); those who provide an interface or gateway to products (Blackwell's, etc.); and those who do all that plus offer to deliver the information (DIALOG @CARL, OCLC, etc.). Few publishers convert or create just one journal or publication in an electronic format. From the viewpoint of academic research libraries, it appears that the electronic environment has the effect of shifting transaction emphasis from single titles to collections or aggregations of electronic materials as marketplace products.

In turn, licensing collections from aggregators makes libraries dependent on publishers and vendors for services in a brand new way. That is, libraries' original expectation for electronic publications, no more than five years ago, was that publishers would provide the data and the subscribing library or groups of libraries would mount and make content available. But mounting and integrating electronic information requires a great deal of capital, effort, and technological sophistication as well as multiple licenses for software and content. Thus, the prognosis for institutions meeting all or most of their users' electronic information needs locally is slim. The currently emerging mode, thus, takes us to a very different world in which publishers have positioned themselves to be the electronic information providers of the moment.[17]

The electronic collections offered to the academic library marketplace are frequently not in configurations that librarians would have chosen for their institutions had these resources been unbundled. This issue has surfaced in several of Yale Library's negotiations. For example, one publisher of a large number of high-quality journals made only the full collection available in e-form and only through consortial sale. By this means, the Yale Library recently "added" 50 electronic journal titles to its cohort, titles it had not chosen to purchase in print. The pricing model did not include a cost for those additional 50 titles; it was simply easier for the publisher to include all titles than to exclude the less desirable ones. While this forum is not the place to explore this particular kind of scaling up of commercial digital collections, it is a topic of potentially great impact on the academic library world.

6. The challenge of consortial dealings. Ideally, groups of libraries acting in concert to license electronic resources can negotiate powerfully for usage terms and prices with producers. In practice, both licensors and licensees have much to learn about how to approach this scaled-up environment. Here are some of the particularly vexing issues:

• Not all producers are willing to negotiate with all consortia; some are not able to negotiate with consortia at all.

• In the early days of making a consortial agreement, the libraries may not achieve any efficiencies because all of them (and their institutional counsel) may feel the need or desire to participate in the negotiating process. Thus, in fact, a license for 12 institutions may take nearly as long to negotiate as 12 separate licenses.

• Consortia overlap greatly, particularly with existing bodies such as cataloging and lending utilities that are offering consortial deals to their members. It seems that every library is in several consortia these days, and many of us are experiencing a competition for our business from several different consortia at once for a single product's license.

• No one is sure precisely what comprises a consortial "good deal." That is, it is hard to define and measure success. The bases for comparison between individual institutional and multiple institutional prices are thin, and the stated savings can often feel like a sales pitch.



• Small institutions are more likely to be unaffiliated with large or powerful institutions and left out of seemingly "good deals" secured by the larger, more prosperous libraries. Surprisingly enough, private schools can be at a disadvantage since they are generally not part of state-established and funded consortial groups.

• In fact, treating individual libraries differently from collectives may, in the long run, not be in the interests of publishers or those libraries.

7. Institutional workflow restructuring. How to absorb the additional licensing work (and create the necessary expertise) within educational institutions is a challenge. I can foresee a time when certain kinds of institutional licenses (electronic journals, for example) might offer standard, signable language, for surely producers are in the same scaling-up bind that libraries are. At the moment, licenses are negotiated in various departments and offices of universities and libraries. Many universities require that license negotiation, or at least a review and signature, happen through the office of general counsel and sometimes over the signature of the purchasing department. In such circumstances, the best result is delay; the worst is that the library may not secure the terms it deems most important. Other institutions delegate the negotiating and signing to library officers who have an appropriate level of responsibility and accountability for this type of legal contract. Most likely the initial contact between the library and the electronic provider involves the public service or collections librarians who are most interested in bringing the resource to campus.

One way of sharing the workload is to make sure that all selector staff receive formal or informal training in the basics and purposes of electronic licenses, so that they can see the negotiations through as far as possible and leave only the final review and approval to those with signing authority.[18] In some libraries, the licensing effort is coordinated from the acquisitions or serials departments, the rationale being that this is where purchase orders are cut and funds released for payment. However, such an arrangement can have the effect of removing the publisher interaction from the library staff best positioned to understand a given resource and the needs of the library readers who will be using it. Whatever the delegation of duties may be at any given institution, it is clear that the tasks must be carved out in a sensible fashion, for it will be a long time before the act of licensing electronic content becomes transparent. Clearly, this new means of working is not the "old" acquisitions model. How does everyone in an institution who should be involved in crafting licensing "deals" get a share of the action?

Succeeding (Not Just Coping)

On the positive side, both individual libraries and consortia of libraries have reported negotiating electronic content licenses with a number of publishers who have been particularly understanding of research library needs. In general, academic publishers are proving to be willing to give and take on license language and terms, provided that the licensees know what terms are important to them. In many cases, librarians ask that the publisher reinstate the "public good" clauses of the Copyright Act into the electronic content license, allowing fair use copying or downloading, interlibrary loan, and archiving for the institutional licensee and its customers. Consortial negotiations are having a highly positive impact on the usefulness and quality of licenses.

While several downsides to the rapidly growing licensing environment have been mentioned, the greatest difficulty at this point is caused by the proliferation of licenses that land on the desks of librarians, university counsel, and purchasing officers. The answers to this workload conundrum might lie in several directions.

1. National or association support. National organizations such as ARL and the Council on Library and Information Resources (CLIR) are doing a great deal to educate as many people as possible about licensing. Practicing librarians treasure that support and ask that licensing continue to be part of strategic and funding plans. For example, the Yale Library has proposed next-step ideas for the World Wide Web Liblicense project. Under discussion are such possibilities as: further development of prototype licensing software that will enable librarians to create licenses on the fly, via the World Wide Web, for presentation to producers and vendors as a negotiating position;[19] and assembling a working group meeting that involves publisher representatives in order to explore how many pieces of an academic electronic content license are amenable to standardization. Clearly, academic libraries are working with the same producers to license the same core of products over and over again. It might be valuable for the ARL and other organizations to hire a negotiator to develop acceptable language for certain key producers-say the top 100-with the result that individual libraries would not need to work out this language numerous times. Pricing and technology issues, among others, might nonetheless need to remain as items for local negotiation.

2. Aggregators. As libraries, vendors, and producers become more skilled as aggregators, the scaling issues will abate somewhat. Three aggregating directions are emerging:

• Information bundlers, such as Lexis-Nexis, OCLC, UMI, IAC, OVID, and a number of others, offer large collections of materials to libraries under license. Some of these are sizeable take-it-or-leave-it groupings; others allow libraries to choose subsets or groups of titles.

• Subscription agents are beginning to develop gateways to electronic resources and to offer to manage libraries' licensing needs.

• Consortia of libraries can be considered as aggregators of library customers for publishers.


3. Transactional licensing. This paper treats only institutional licenses, be they site licenses, simultaneous user/port licenses, or single-user types. An increasing number of library transactions demand rights clearance for one piece at a time (situations that involve, say, course reserves or the provision, through a document supplier such as CARL, of articles not held in the library). Mechanisms for easy or automatic rights clearance are of surpassing importance, and various entities are applying considerable energies to them. The academic library community has been skittish about embracing the services of rights management or licensing organizations, arguing that participation would abrogate fair use rights. It seems important, particularly in light of recent court decisions, that libraries pay close attention to their position vis-à-vis individual copies (when they are covered by fair use and when they are not, particularly in the electronic environment) and take the lead in crafting appropriate and fair arrangements to simplify the payment of fees in circumstances when such fees are necessary.[20]

Beyond the License?

As we have seen, the content license comes into play when the producer of an electronic resource seeks to define a "deal" and an income stream to support the creation and distribution of the content. Yet other kinds of arrangements are possible.

1. Unrestricted and for free. Some important resources are funded up front by, for example, governments or institutions, and the resources are available to all end users. Examples include the notable Los Alamos High Energy Physics Preprints; the various large genome databases; the recent announcement by the National Institutes of Health of MEDLINE's availability on-line; and numerous university-based electronic scholarly journals or databases. The number of such important resources is growing, though they may always be in the minority of scholarly resources. Characteristically, such information is widely accessible, the restrictions on use are minimal or nonexistent, and license negotiations are largely irrelevant or very straightforward.

2. For a subscription fee and unrestricted to subscribers. Some producers are, in fact, charging an on-line subscription fee, but licenses need not be crafted or signed. The terms of use are clearly stated and generous. The most significant and prominent example of such not-licensed but paid-for resources is the rapidly growing collection of high-impact scientific and medical society journals published by Stanford University's HighWire Press.[21]

Both of these trends are important; they bear watching and deserve to be nurtured. In the first case, the up-front funding model seems to serve the needs of large scientific or academic communities very well without directly charging users or institutions; the databases are products of public- or university-funded research. In the second instance, although users are paying for access to the databases, the gap between the copyright and licensed ways of doing business seems to have narrowed, and in fact the HighWire publications are treated as if copyright-governed. Over time, it would not be unreasonable to expect this kind of merger of the two constructs (copyright and contract) and to benefit from the subsequent simplification that the merger would bring.

In short, much is still to be learned in the content licensing environment, but much has been learned already. We are in a period of experimentation and exploration. All the players have real fears about the security of their livelihood and mission; all are vulnerable to the risks of information in new technologies; many are learning to work together pragmatically toward at least modest midterm solutions and are, in turn, using those modest solutions as stepping-stones into the future.


PART TWO—
ELECTRONIC PUBLISHING: EMPIRICAL STUDIES


Chapter 4—
Information-Based Productivity

Scott Bennett

Convenience is a key word in the library lexicon. As service organizations, libraries give high priority to enhancing the convenience of their operations. Readers themselves regularly use the word to describe what they value.[1] By contrast, when LEXIS-NEXIS describes itself as a sponsor of public radio, it emphasizes not convenience but productivity for professionals. Does LEXIS-NEXIS know something that we are missing?

I think so. Talk about productivity is unambiguously grounded in the discourse of economics, whereas talk about convenience rarely is. Quite notably, The Andrew W. Mellon Foundation has self-consciously insisted that its programs in scholarly communication operate within the realm of economics. Foundation president William G. Bowen explains this focus, in speaking of the Foundation's JSTOR project, by observing that "when new technologies evolve, they offer benefits that can be enjoyed either in the form of more output (including opportunities for scholars to do new things or to do existing tasks better) or in the form of cost savings.... In universities electronic technologies have almost always led to greater output and rarely to reduced costs.... This proclivity for enjoying the fruits of technological change mainly in the form of 'more and better' cannot persist. Technological gains must generate at least some cost savings."[2] In its JSTOR project and the other scholarly communication projects it supports, the Foundation calls for attention "to economic realities and to the cost-effectiveness" of different ways of meeting reader needs. The Foundation wishes to promote change that will endure because the changes embody "more effective and less costly ways of doing [the] business" of both libraries and publishers.[3]

© 1999 by Scott Bennett. Readers of this article may copy it without the copyright owner's permission if the author and publisher are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.


Productivity is the underlying measure of such effectiveness, so I want briefly to recall what economists mean by the word and to reflect on the problematic application of productivity measures to higher education. I will then describe a modest project recently undertaken to support one of the most famous of Yale's undergraduate courses. I will conclude with some observations about why the productivity of libraries and of higher education must command our attention.

Productivity

Productivity is one of the most basic measures of economic activity. Comparative productivity figures are used to judge the degree to which resources are efficiently used, standards of living are changed, and wealth is created.[4] Productivity is the ratio of what is produced to the resources required to produce it, or the ratio of economic outputs to economic inputs:

Productivity = Outputs/Inputs

Outputs can be any goods, services, or financial outcomes; inputs are the labor, services, materials, and capital costs incurred in creating the output. If outputs increase faster than inputs, productivity increases. Conversely, if inputs increase faster than outputs, productivity falls. Technological innovation has historically been one of the chief engines of productivity gain.[5]
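
To make the ratio concrete, the following short Python sketch computes productivity for two hypothetical periods and the resulting percentage change; the output and input figures are invented for illustration and do not come from any study cited in this chapter.

    def productivity(outputs, inputs):
        # Productivity is the ratio of economic outputs to economic inputs.
        return outputs / inputs

    # Hypothetical figures: outputs in units of service delivered, inputs in dollars.
    year_one = productivity(outputs=1000, inputs=500)   # 2.00 units per dollar
    year_two = productivity(outputs=1100, inputs=520)   # about 2.12 units per dollar

    change = (year_two - year_one) / year_one
    print(f"Productivity change: {change:.1%}")          # about +5.8%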

Useful indicators of productivity require that both inputs and outputs be clearly defined and measured with little ambiguity. Moreover, the process for turning inputs into outputs must be clearly understood. And those processes must be susceptible to management if productivity increases are to be secured. Finally, meaningful quality changes in outputs need to be conceptually neutralized in measuring changes in productivity.

One need only list these conditions for measuring and managing productivity to understand how problematic they are as applied to higher education.[6] To be sure, some of the least meaningful outputs of higher education can be measured, such as the number of credit hours taught or degrees granted. But the outputs that actively prompt people to pursue education-enhanced knowledge, aesthetic cultivation, leadership ability, economic advantage, and the like-are decidedly difficult to measure. And while we know a great deal about effective teaching, the best of classroom inputs remains more an art in the hands of master teachers than a process readily duplicated from person to person. Not surprisingly, we commonly believe that few teaching practices can be consciously managed to increase productivity and are deeply suspicious of calls to do so.

Outside the classroom and seminar, ideas of productivity have greater acceptance. Productive research programs are a condition of promotion and tenure at research universities; and while scholars express uneasiness about counting research productivity, it certainly happens. The ability to generate research dollars and the number of articles and books written undeniably count, along with the intellectual merit of the work. There is little dispute that many other higher education activities are appropriately judged by productivity standards. Some support services, such as the financial management of endowment resources, are subject to systematic and intense productivity analysis. Other academic support activities, including the provision of library services, are expected to be efficient and productive, even where few actual measures of their productivity are taken.[7]

In many cases, discussion of productivity in higher education touches highly sensitive nerves.[8] Faculty, for instance, commonly complain that administration is bloated and unproductive. Concern for the productivity of higher education informs a significant range of the community's journalistic writing and its scholarship.[9] This sensitivity reflects both the truly problematic application of productivity measures to much that happens in education and the tension between concerns about productivity and quality. But it also reflects the fact that we are "unable and, on many campuses, unwilling to answer the hard questions about student learning and educational costs" that a mature teaching enterprise is inescapably responsible for answering.[10]

The Scully Project

A modest digital project undertaken in 1996 at Yale offers an opportunity to explore productivity matters. The project aimed at improving the quality of library support and of student learning in one of the most heavily enrolled undergraduate courses at Yale. We wished to do the project as cost-effectively as possible, but initially we gave no other thought to productivity matters. To echo Bowen's words, we wanted to take the fruits of digital technology in the form of more output, as "more and better." But the project provided an opportunity to explore possibilities for cost savings, for reduced inputs. The project, in spite of its modest objectives and scale (or perhaps exactly for those reasons!), became an instructive "natural experiment" in scholarly communication very much like those supported by the Mellon Foundation.

For decades, now Emeritus Professor Vincent Scully has been teaching his renowned Introduction to the History of Art, from Prehistory to the Renaissance. The course commonly enrolls 500 students, or about 10% of the entire undergraduate student body at Yale. Working with Professor Mary E. Miller, head of the History of Art department, and with Elizabeth Owen and Brian Allen, head Teaching Fellows with substantial experience in Professor Scully's course, Max Marmor, the head of Yale's Arts Library, and his colleague Christine de Vallet undertook to provide improved library support for this course. Their Scully Project was part of a joint program between the University Library and Information Technology Services at Yale designed to offer targeted support to faculty as they employ digital technologies for teaching, research, and administration. The Scully Project was also our first effort to demonstrate what it could mean to move from film-based to digital-based systems to support teaching in art history.[11]


The digital material created for Professor Scully's students included:

• An extensive and detailed course syllabus, including general information about the course and requirements for completing it.

• A schedule of section meetings and a roster of the 25 Teaching Fellows who help conduct the course, complete with their e-mail addresses.

• A list of the four required texts and the six journal articles provided in a course pack.

• A comprehensive list of the works of art discussed in the course, along with detailed information about the artists, dates of creation, media and size, and references to texts that discuss the works.

Useful as this textual material is, it would not meet the course's key information need for images. The Scully Project therefore includes 1,250 images of sculptures, paintings, buildings, vases, and other objects. These images are presented in a Web image browser that is both handsome and easy to use and that contains a written guide advising students on study strategies to make the best use of the Web site.[12]

How did the Scully project change student learning? To answer that question, I must first describe how the library used to meet the course's need for study images. The library traditionally selected mounted photographs closely related to, but not necessarily identical to, the images used in Professor Scully's lectures. We hung the photographs in about 480 square feet of study gallery space in the History of Art department. Approximately 200 photographs were available to students for four weeks before the midterm exam and 400 photographs for four weeks before the final exam. In those exams, students are asked to identify images and to comment on them. With 500 students enrolled and with the photos available in a relatively small space for just over half of the semester, the result was extreme crowding of students primarily engaged in visual memorization. To deal with the obvious imperfections of this arrangement, some of Professor Scully's more entrepreneurial students made videotapes of the mounted photos and sold them for study in the residential colleges. Less resourceful students simply stole the photos from the walls.

The Scully Project employed information technology to do more and better.

• Students can study the slide images that Professor Scully actually uses in class rather than photographs that frequently differed from those slides, were often in black-and-white, and sometimes carried outdated identifying labels.

• The 1,250 digital images on the Web site include not only those that Professor Scully uses in class, but also other views of the same object and still other images that the Teaching Fellows refer to in discussion sessions. Students now have easy access to three times the number of images they could see in the study gallery space. For instance, where before students viewed one picture of Stonehenge, they now can view eight, including a diagram of the site and drawings showing construction methods and details.


• Digital images are available for study throughout the semester, not just before term exams. They are also available at all hours of day and night, consistent with student study habits.

• The digital images are available as a Web site anywhere there is a networked computer at Yale. This includes the residential colleges, where probably three-fourths of undergraduates have their own computers, as well as computing clusters at various locations on campus.

• The images are usually of much better quality than the photographs mounted on the wall; they load on the screen quickly in three different magnifications; and they are particularly effective on 17" and larger monitors.

• The digital images cannot be stolen or defaced. They are always available in exactly the form intended by Professor Scully and his Teaching Fellows.

Student comments on the Scully Project emphasized the convenience of the Web site. Comments like "convenient, comfortable, detailed all at the push of a button," and "fantastic for studying for exams" were common, as were grateful comments on the 24-hour-a-day availability of the images and the need not to fight for viewing space in the study gallery. One student told us, "it was wonderful. It made my life so much easier." Another student said, "it was very, very convenient to have the images available on-line. That way I could study in my own room in small chunks of time instead of having to go to the photo study. I mainly just used the Web site to memorize the pictures like a photo study in my room."[13]

Visual memory training is a key element in the study of art history, and the Scully Web site was used primarily for memorization. Reports from Teaching Fellows on whether the digital images enhanced student learning varied, and only two of the Fellows had taught the course before and could make comparisons between the photo study space and the Web site. The following statements represent the range of opinion:

• Students "did think it was 'cool' to have a web site but [I] can't say they wrote better or learned more due to it."

• "I don't think they learned more, but I do think it [the Web site] helped them learn more easily."

• The head Teaching Fellow for the course reported that student test performance on visual recognition was "greatly enhanced" over her previous experience in the course. Another Teaching Fellow reported that students grasped the course content much earlier in the semester because of the earlier availability of the Web site images.

• One Teaching Fellow expressed an unqualified view that students learned more, wrote better papers, participated in class more effectively, and enjoyed the course more because of the Scully Project.[14]


• Another Teaching Fellow commented, "I wish we had such a thing in my survey days!"

The Web site apparently contributed significantly to at least one key part of Professor Scully's course-that part concerned with visual memory training. We accomplished this improvement at reasonable cost. The initial creation of digital images cost about $2.25 an image, while the total cash outlay for creating the Web site was $10,500. We did not track computing costs or the time spent on the project by permanent university staff, but including these costs might well drive the total to about $17,200 and the per image cost to around $14. Using this higher cost figure, one might say we invested $34 for every student enrolled in the course, or $11 per student if one assumes that the database remains useful for six years and the course is offered every other year.
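
The arithmetic behind these per-image and per-student figures can be reproduced directly from the numbers just given; the sketch below simply restates it (the rounding is mine).

    # Cost figures for the Scully Project as reported in the text.
    images = 1250
    estimated_full_cost = 17_200   # cash outlay plus untracked staff and computing time
    enrollment = 500               # students per offering
    offerings_in_six_years = 3     # course taught every other year

    print(estimated_full_cost / images)       # about 13.8, i.e., "around $14" per image
    print(estimated_full_cost / enrollment)   # about 34.4, i.e., "$34" per enrolled student
    print(estimated_full_cost / (enrollment * offerings_in_six_years))  # about 11.5, i.e., "$11"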

This glow of good feeling about reasonable costs, quality products, improved learning, and convenience for readers is often as much as one has to guide decisions on investing in information technology. Last year, however, Yale Professor of Cardiology Carl Jaffe took me up short by describing a criterion by which he judges his noteworthy work in instructional media.[15] For Professor Jaffe, improved products must help solve the cost problem of good education. One must therefore ask whether the Scully Project passes not only the test of educational utility and convenience set by Professor Scully's Teaching Fellows, but also the productivity test set by Professor Jaffe. Does the Scully Project help solve cost problems in higher education? Does it allow us to use university resources more productively?

Achieving Information-Based Productivity Gains

For more than a generation, libraries have been notably successful in improving the productivity of their own operations with digital technology. It is inconceivable that existing staffing levels could handle today's circulation workload if we were using McBee punch cards or-worse yet-typewriter-written circulation cards kept in book pockets and marked with date stamps attached to the tops of pencils. While libraries have an admirable record of deploying information technology to increase the productivity of their own operations, and while there is more of this to be done, the most important productivity gains in the future will lie elsewhere. The emergence of massive amounts of textual, numeric, spatial, and image information in digital formats, and the delivery of that information through networks, is decisively shifting the question to one of teacher and reader productivity.

What does the Scully Project tell us about library, teacher, and reader productivity? To answer that question, I will comment first on a set of operational issues that includes the use of library staff and Teaching Fellows to select and prepare images for class use; the preservation of the images over time; and the use of space. I will assess the Scully Project both as it was actually deployed, with little impact on the conduct of classroom instruction, and as one might imagine it being deployed, as the primary source of images in the classroom. The operations I will describe are more or less under the university's administrative control, and savings achieved in any of them can at least theoretically be pushed to the bottom line or redirected elsewhere. I will also comment on student productivity. This is a much more problematic topic because we can barely imagine controlling or redirecting for productivity purposes any gains readers might achieve.

Productivity Gains Subject to Administrative Control

The comparative costs of selecting images and preparing them for instructional use in both the photographic and digital environments are set out in the four tables that follow. These tables are built from a cost model of more than three dozen facts, estimates, and assumptions about Professor Scully's course and the library support it requires.[16] The appendix presents the model, with some information obscured to protect confidentiality. I do not explain the details of the cost model here[17] but focus instead on what it tells us. One cautionary word is in order. The cost model generates the numbers given in the tables (rounded to the nearest dollar, producing minor summing errors), but these numbers are probably meaningful only to the nearest $500. In the discussion that follows, I round the numbers accordingly.

Table 4.1 compares the cost of library support for Professor Scully's course in its former dependence on photos exhibited in the study gallery and in its present dependence on digital images delivered in a Web site.[18]

Before the Scully Project, the university incurred about $7,000 in academic support costs for Professor Scully's course in the year it was taught. These costs over a six-year period, during which the course would be taught three times, are estimated at $22,000. As deployed in the fall of 1996, Web-site support for Professor Scully's course cost an estimated $21,000, or $34,000 over a six-year period. The result is a $12,500 balance arguing against digital provision of images in Professor Scully's course, or a 36% productivity loss in the use of university resources. However, a longer amortization period clearly works in favor of digital provision. The cost model suggests that the break-even point on the productive use of university resources comes in about 16 rather than 6 years.[19] This gradual improvement happens for the following reasons:

• The higher absolute cost of the digital images results from the one-time staff and vendor costs of converting analog images to digital format. While there is some incremental growth in these costs over six years, staff costs for providing analog images grow linearly. The long-term structure of these costs favors digital provision.

• The cost of the "real" space of bricks and mortar needed to house the photo collection is substantial and grows every year. Similarly, the operation and maintenance of physical space carries the relatively high cost increases associated with staff and energy. By contrast, the "virtual" space of digital media is relatively inexpensive to begin with, and its unit cost is falling rapidly. Again, the long-term structure of costs favors digital provision; a rough break-even sketch follows below.
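
The break-even logic can be sketched as follows. The annual figures used here are illustrative assumptions chosen only to mirror the structural point of the two items above (a recurring analog cost against a largely one-time digital cost); they are not taken line for line from the cost model, though they are tuned so that break-even falls near the sixteenth year reported above.

    # Illustrative break-even comparison. The three figures below are assumptions
    # for the sketch, not values from the chapter's cost model.
    analog_annual = 3_600      # assumed recurring cost of photo-based support
    digital_setup = 17_200     # assumed one-time conversion and design cost
    digital_annual = 2_500     # assumed recurring storage, access, and upkeep

    analog_total = digital_total = 0
    for year in range(1, 31):
        analog_total += analog_annual
        digital_total += digital_annual + (digital_setup if year == 1 else 0)
        if digital_total <= analog_total:
            print(f"Digital provision breaks even in year {year}")
            break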


TABLE 4.1. "As Done" Condition: 1,250 Images Used Primarily for Memory Training
1st Year and Cumulative 6-Year Expenses

                                                     400 Photos               1,250 Digital Images
                                                1st Year   6-Year Total      1st Year   6-Year Total
Preparation of images
  Full-time library staff for photo collection      797       2,392            6,200       7,440
  Library student staff                               10          30
  Selection of digital images for digitizing                                    6,200       7,440
  Digitization of images                                                        2,800       3,360
  Web site design                                                               1,500       1,500
Selection of images for class use
  Library student staff (mounting photos, etc.)      310         930
  Teaching Fellows (selecting photos)                980       2,940
  Teaching Fellows (selecting slides)              1,120       3,360            1,120       3,360
Preservation of images
  Library student staff                               45         271
  Collection shelving space (capital)                 70         417
  Collection shelving space (maintenance)             19         113
  Digital storage and access                                                      470       2,049
Study space
  Photo study gallery (capital)                    2,986       8,959
  Photo study gallery (maintenance)                  812       2,436
  Network connections                                                           3,000       9,000
Totals                                            $7,149     $21,849          $21,290     $34,149
Film/photo less digital                                                      ($14,141)   ($12,300)
Productive (unproductive) use of resources                                                    -36%
Funding source
  Library                                          1,163       3,624           17,170      21,789
  Art history department                           2,100       6,300            1,120       3,360
  University space costs                           3,887      11,925            3,000       9,000
Totals                                            $7,149     $21,849          $21,290     $34,149

Along with the amortization period, the number of images digitized is another major variable that can be used to lower the total cost of digital provision and so move toward a productive use of resources. For years, it has been possible to mount no more than 400 photos in the study gallery. As Table 4.2 shows, if the Scully Web site had contained 400 digital images rather than 1,250 images, conversion costs (the figures that change from Table 4.1) would drop significantly, and the six-year cost of digital provision ($20,500) would be just under the cost of analog provision ($22,000).


TABLE 4.2. "What If" Condition #1: 400 Images Used Primarily for Memory Training
1st Year and Cumulative 6-Year Expenses

                                                     400 Photos               400 Digital Images
                                                1st Year   6-Year Total      1st Year   6-Year Total
Preparation of images
  Full-time library staff for photo collection      797       2,392            2,067       2,480
  Library student staff                               10          30
  Selection of digital images for digitizing                                    2,067       2,480
  Digitization of images                                                          933       1,120
  Web site design                                                               1,500       1,500
Selection of images for class use
  Library student staff (mounting photos, etc.)      310         930
  Teaching Fellows (selecting photos)                980       2,940
  Teaching Fellows (selecting slides)              1,120       3,360            1,120       3,360
Preservation of images
  Library student staff                               45         271
  Collection shelving space (capital)                 70         417
  Collection shelving space (maintenance)             19         113
  Digital storage and access                                                      157         682
Study space
  Photo study gallery (capital)                    2,986       8,959
  Photo study gallery (maintenance)                  812       2,436
  Network connections                                                           3,000       9,000
Totals                                            $7,149     $21,849          $10,843     $20,622
Film/photo less digital                                                       ($3,694)      $1,227
Productive (unproductive) use of resources                                                      6%
Funding source
  Library                                          1,163       3,624            6,723       8,262
  Art history department                           2,100       6,300            1,120       3,360
  University space costs                           3,887      11,925            3,000       9,000
Totals                                            $7,149     $21,849          $10,843     $20,622


There is a 6% productivity gain over six years favoring digital provision.

The choice between 400 and 1,250 images has a dramatic impact on costs and productivity. That being so, one must ask what motivates the choice and what impact it has on student learning. Further consideration of this "what if" case is best deferred to the discussion of student productivity.

Speculation about another "what if" case is worthwhile. Professor Scully and his Teaching Fellows made no use of the Web site in the lecture hall or discussion sessions.[20] What if they had been able to depend on the Web site instead of traditional slides for their face-to-face teaching? There is of course a warm debate on whether digital images can match film images in quality or ease of classroom use. The question posed here speculatively assumes no technological reason to favor either analog or digital media and focuses solely on what happens to costs when classroom teaching is factored in.

Two changes from Table 4.1 are identified in Table 4.3. They are (1) the cost savings when Teaching Fellows no longer need to assemble slides for the three classroom discussion sessions that each conducts during the term and (2) the added cost of equipping a classroom for digital instruction.

This "what if" modeling of the Scully Project shows an $11,000 negative balance, or a 34% loss in productivity. While digital provision in this scenario is not productive within six years, the significant comparison is with the 36% loss in productivity without using digital images in the classroom (Table 4.1). The conclusion is that substituting digital technology for the labor of selecting slides is itself productive and moves the overall results of digital provision modestly toward a productive use of university resources. This conclusion is strongly reinforced if one considers a variant "what if" condition in which the Teaching Fellows teach not just 3 of these discussion sessions in a classroom but all 14 of them, and in which each Fellow selects his or her own slides instead of depending in considerable measure on slides selected by the head Teaching Fellow. This scenario is modeled in Table 4.4. As a comparison of Tables 4.3 and 4.4 indicates, the weekly cost of selecting slides in this new scenario increases 12-fold, while the use of the electronic classroom increases fivefold. That the classroom costs are absolutely the lower number to begin with also helps drive this scenario to the highly favorable result of a 44% increase in productivity.

It is important to emphasize that these scenarios all assume that funds for Teaching Fellows are fungible in the same way that the library's operating and capital budgets are assumed to be fungible. Faculty and graduate students are most unlikely to make that assumption. Graduate education is one of the core products of a research university. The funds that support it will not be traded about in the way one imagines trades between the operating and capital funds being made for a unit, like the library, that supports education but does not constitute its core product.


83
 

TABLE 4.3. "What If" Condition #2: 1,250 Images Used for Memorization and Instruction

1 st Year and Cumulative 6-Year Expenses

400 Photos

1,250 Digital Images

 

1st Year

6-Year Total

1st Year

6-Year Total

Preparation of images

       

Full-time library staff for photo collection

   797

2,392

6,200

7,440

Library student staff

    10

    30

   

Selection of digital images for digitizing

   

6,200

7,440

Digitization of images

   

2,800

3,360

Web site design

   

1,500

1,500

Selection of images for class use

       

Library student staff(mounting photos, etc.)

  310

   930

   

Teaching Fellows (selecting photos)

   980

2,940

   

Teaching Fellows(selecting slides)

1,120

3,360

       0

     0

Preservation of images

       

Library student staff

   45

   271

   

Collection shelving space (capital)

   70

    417

   

Collection shelving space (maintenance)

    19

     113

   

Digital storage and access

   

     470

2,049

Study space

       

Photo study gallery (capital)

2,986

  8,959

   

Photo study gallery (maintenance)

   812

  2,436

   

Digitally equipped classroom (capital)

   

      692

2,075

Digitally equipped classroom (maintenance)

   

       69

     208

Network connections

   

  3,000

   9,000

Totals

$7,149

$21,849

$20,931

$33,071

Film/photo less digital

   

($13,782)

($11,222)

Productive (unproductive) use of resources

     

    -34%

Funding source

       

Library

1,163

   3,624

17,170

  21,789

Art history department

2,100

   6,300

          0

            0

University space costs

3,887

  11,925

   3,761

  11,283

Totals

$7,149

$21,849

$20,931

$33,071


84
 

TABLE 4.4. "What If" Condition #3: 1,250 Images Used for Memorization and Instruction

1st Year and Cumulative 6-Year Expenses

400 Photos

1,250 Digital Images

 

1st Year

6-Year Total

1st Year

6-Year Total

Preparation of images

       

Full-time library staff for photo collection

  797

2,392

6,200

7,440

Library student staff

    10

     30 

   

Selection of digital images for digitizing

   

6,200

7,440

Digitization of images

   

2,800

3,360

Web site design

   

1,500

1,500

Selection of images for class use

       

Library student staff (mounting photos, etc.)

  310

    930

   

Teaching Fellows (selecting photos)

   980

  2,940

   

Teaching Fellows (selecting slides, 700 hours)

14,000

42,000

0

0

Preservation of images

       

Library student staff

     45

    271

   

Collection shelving space (capital)

     70

    417

   

Collection shelving space
(maintenance)

      19

     113

   

Digital storage and access

   

470

2,049

Study space

       

Photo study gallery (capital)

  2,986

  8,959

   

Photo study gallery (maintenance)

     812

  2,436

   

Digitally equipped classroom (capital)

   

3,358

10,075

Digitally equipped classroom (maintenance)

   

336

1,008

Network connections

   

3,000

9,000

Totals

$20,029

$60,489

$23,864

$41,871

Film/photo less digital

   

($3,835)

$18,618

Productive (unproductive) use of resources

     

44%

Funding source

       

Library

  1,163

   3,624

17,170

21,789

Art history department

14,980

44,940

0

0

University space costs

   3,887

  11,925

       6,694

20,083

Totals

$20,029

$60,489

$23,864

$41,871


Productivity Gains Subject to Reader Control

Having accounted for the costs and potential productivity gains that are substantially under the university's administrative control, I will look briefly at potential productivity gains that lie beyond such control-the productivity of readers. In doing this we must consider the value of the qualitative differences between film and digital technologies for supporting Professor Scully's course. The availability of the images throughout the semester at all times of day and night, rather than just before exams, and the large increase in the number of images available for study constitute improvements in quality that make any discussion of increased productivity difficult-but interesting and important as well.

Students were enthusiastic about the convenience of the Web site. They could examine the images more closely, without competing for limited viewing space, at any time they wished. Without question this availability made their study time more efficient and possibly-though the evidence is inconclusive-more effective.

Let us focus first on the possibility that, as one of the Teaching Fellows observed, students learned more easily but did not learn more. Let us imagine, arbitrarily, that on average students were able to spend two hours less on memory training over the course of the semester because of easy and effective access to digital images. What is the value of this productivity gain for each of Professor Scully's 500 students? It would probably be possible to develop a dollar value for it, related to the direct cost and the short-term opportunity cost of attending Yale. Otherwise, there is no obvious way to answer the question, because each student will appropriately treat the time as a trivial consideration and use it with no regard for the resources needed to provide it. Whether the time is used for having coffee with friends, for sleeping, for volunteer community work, for additional study and a better term paper, or in some other way, the student alone will decide about the productive use of this time. And because there is no administrative means to cumulate the time saved or bring the student's increased productivity to bear on the creation of the information systems that enable the increase, there is no way to use the values created for the student in the calculation of how productive it was to spend library resources on creating the Scully Project.
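
Only as an illustration of how such a dollar value might be developed (the hourly figure is an assumption, not a number from the text), one could price the time saved as follows.

    # Purely illustrative valuation of the time saved; the hourly rate is assumed.
    students = 500
    hours_saved_each = 2              # the arbitrary figure used in the text
    assumed_value_per_hour = 15       # hypothetical dollars per student hour

    total_hours = students * hours_saved_each              # 1,000 student hours
    total_value = total_hours * assumed_value_per_hour     # $15,000 at the assumed rate
    print(total_hours, total_value)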

The possibility that students would use the time they gain to prepare better for tests or to write a better paper raises the issue of quality improvements. How are we to think about the possibility that the teaching and learning that libraries support with digital information might become not only more efficient and productive, but also just better? What are the measures of better, and how were better educational results actually achieved? Was it, for instance, better to have 1,250 images for study rather than 400? The head Teaching Fellow answered with an unequivocal yes, affirming that she saw richer, more thoughtful comparisons among objects being made in student papers. But some student responses suggested they wanted to have on the Web site only those images they were directly responsible for memorizing-many fewer than 1,250. Do more images create new burdens or new opportunities for learning? Which objectives and what standards should guide decisions about enhancing instructional support? In the absence of some economically viable way to support additional costs, how does one decide on quality enhancements?

Such questions about quality traditionally mark the boundary of productivity studies. Considerations of quality drive us to acknowledge that, for education, we generally do not have the two essential features needed to measure productivity: clear measures of outputs and a well-understood production technology that allows one to convert inputs into outputs.[21] In such an environment, we have generally avoided talking about productivity for fear that doing so would distort goals-as when competency-based evaluation produces students who only take tests well.[22] Moreover, the rhetoric of productivity can undermine socially rather than empirically validated beliefs among students, parents, and the public about how higher education achieves its purposes. All institutions of higher education depend fundamentally on the maintenance of such socially validated beliefs.

So I end this account of the Scully Project by observing that what we actually did was not productive, but could be made so by extending the amortization period for the project or by reducing the number of images provided to students.[23] It also appears that the project made study much more convenient for students and may well have enhanced their learning. Such quality improvement, even without measurable productivity gain, is one of the fundamental objectives of the library.

These are conditionally positive findings about the economic productivity and educational value of a shift from photographs to digital images to support instruction in the history of art. Such findings should be tested in other courses and, if confirmed, should guide further investment in digital imaging. The soft finding that the use of digital images in the classroom may be productive is heartening, given that digital images may support improvements in the quality of teaching by simplifying the probing of image details and by enabling much more spontaneity in classroom instruction.[24]

All of my arguments about the Scully Project posit that new investment in digital technology would be supported by reduced spending elsewhere. However, such reductions would be difficult, forcing us to regard capital and operating budgets-especially the funds that support both "real" and "virtual" space-as fungible. Other possible cost shifts involve even more fundamental difficulties. It is, for instance, a degree requirement at Yale that graduate students in the History of Art participate in undergraduate instruction. Teaching discussion sections in Professor Scully's course is often the first opportunity graduate students take for meeting this academic requirement. For this reason and others, none of the shifts imagined in the scenarios described above would be easily achieved, and some would challenge us to revisit strongly embedded administrative practices and academic values. Funds rarely flow across such organizational boundaries. Failing to make at least some of these shifts would, however, imperil our ability to improve the quality and productivity of higher education.

Productivity as an Urgent Concern of Higher Education

For a long time, higher education has behaved as if compelling opportunities for improving student learning should be pursued without much attention to productivity issues. Our community has focused on desirable results, on the outputs of the productivity formula, without disciplined attention to the inputs part of the equation.[25] One result has been that expenditures per student at public universities in the United States grew between 1979 and 1989 at an average annual rate of 1.82% above inflation. The annual growth rate for private universities was a much higher 3.36%.[26]
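
Compounded over the decade in question, those annual rates imply roughly 20% and 39% cumulative real growth in spending per student; the sketch below makes the compounding explicit.

    # Average annual real growth in expenditures per student, 1979-1989, from the text.
    public_rate, private_rate = 0.0182, 0.0336
    years = 10

    public_growth = (1 + public_rate) ** years - 1    # about 0.20, i.e., 20% over the decade
    private_growth = (1 + private_rate) ** years - 1  # about 0.39, i.e., 39% over the decade
    print(f"Public: {public_growth:.0%}   Private: {private_growth:.0%}")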

It is hard to believe that such patterns of cost increase can be sustained much longer or that we can continue simply to increase the price of higher education as the principal means for improving it and especially for meeting apparently insatiable demands for information technology. We must seriously engage with issues of productivity. Otherwise, there will be little to determine the pace of technology innovation except the squeaky wheel of student or faculty demand or, less commonly, an institutional vision for technology-enhanced education. In neither case is there economically cogent guidance for the right level of investment in information technology. We are left to invest as much as we can, with nothing but socially validated political and educational ideas about what the phrase "as much as we can" actually means. Because we so rarely close the economic loop between the productivity value we create for users and our investment in technology, the language for decision making almost never reaches beyond that of improving convenience and enhancing quality. I believe it is vitally important for managers of information technology to understand the fundamental economic disconnect in the language of convenience and service we primarily use and to add the language of productivity to our deliberations about investing in information technology.

In connecting productivity gains with technology investment, we may find-as analysis of the Scully Project suggests-that some improvements can be justified while others cannot. Productivity measures should not be the sole guide to investment in information technology. But by insisting on securing productivity gains where we can, we will at least identify appropriate if sometimes only partial sources for funding new investments and thereby lower the rate at which overall costs rise in higher education above those in the rest of the economy.[27]

The stakes for higher education in acting on the productivity problems confronting it are immense. Today, it is regularly asserted that administrative activities are wasteful and should be made more productive. But turning to core academic activities, especially teaching, we feel that no productivity gains can be made without compromising quality. Teaching is rather like playing in a string quartet. A string quartet required four musicians in Mozart's day, and it still does. To talk about making the performance of a string quartet more productive is to talk patent nonsense. To talk about making classroom teaching more productive seems to many almost as objectionable. The observable result is that higher education has had to live off the productivity gains of other sectors of the economy. The extreme pressure on all of higher education's income sources suggests that we are coming to the end of the time when people are willing uncritically to transfer wealth to higher education. Socially validated beliefs about the effectiveness of higher education are in serious jeopardy.[28] If our community continues to stare blindly at these facts, if we refuse to engage seriously with productivity issues on an institutional and community-wide basis, we will bring disaster upon the enterprise of teaching and learning to which we have devoted our professional lives.

If this seems alarmist, consider the work of 10 governors in the western United States intent on creating a high-tech, virtual university, the Western Governors' University.[29] Faced with growing populations and burgeoning demand for higher education, but strong taxpayer resistance to meeting that demand through the traditional cost structures of higher education, state officials are determined to create a much more productive regional system of higher education. That productivity is the key issue is evident in the statement of Alvin Meiklejohn, the chairman of the State Senate Education Committee in Colorado. "Many students in Colorado," he said, "are now taking six years to get an A.B. degree. If we could reduce that by just one year ... it would reduce the cost to the student by one-sixth and also free up some seats in the classrooms for the tidal wave we see coming our way."[30]

Senator Meiklejohn is looking for a 17% increase in productivity. I think library and information technology managers know where some of that gain may be found. If, however, we scoff at the idea of increasing student productivity through the use of information technologies, if we insist that the job of measuring and redirecting the productivity gains we create with information technology is impossible, if we trap ourselves in the language of convenience and fail to engage with issues of productivity, then the consequences-at least in the West-are clear. Major new investment in higher education will be directed not to established institutions but to new organizations that can meet the productivity standards insisted on by Senator Meiklejohn and the taxpayers he represents.

A second and larger groundswell in American life is also instructive on the question of productivity. Health care reform and managed care are both driven by the idea that the high cost and poor delivery of health care must change, that costs must be controlled-that health care services must become much more productive.[31] Arguments about the incompatibility of higher productivity and the maintenance of quality care resonate strongly with parallel arguments about the impossibility of making higher education more productive without compromising quality. What makes the health care debate so instructive is that we already know which side will prevail. Everywhere we turn, medical institutions and the practitioners who lead them are scrambling to find ways to survive within a managed care environment. Survival means the preservation of quality care, to be sure, but the ineluctable reality is that quality will now be defined within terms set by managed care. We are beginning to find ways to talk about increased productivity and quality as complementary rather than as antithetical ideas.

Given the current state of public opinion about higher education, it is impossible for me to believe we will not soon follow health care. We will almost certainly find ourselves embroiled in divisive, rancorous debates about higher education reform. I hope we will avail ourselves in these debates of a language about information technology that continues to embrace ideas of convenience but reaches strongly beyond them. We will need to talk meaningfully about productivity and link our ability to create productivity gains with investment in information technology. And I hope we will follow the medical community in working to make productivity and quality regularly cognate rather than always antagonistic ideas.

For the past 150 years or so, libraries have been the guardians in the Western world of socially equitable access to information. Libraries have become public institutions instead of institutions serving powerful elites, as they once were. This is a noble heritage and a worthy ongoing mission for our profession. And information technology will play a key role in advancing it. As Richard Lanham argues in a landmark essay, "if our business is general literacy, as some of us think, then electronic instructional systems offer the only hope for the radically leveraged mass instruction the problems of general literacy pose."[32] But unless information technologies are employed productively, they will not offer the leverage on information access and literacy for which Lanham and others of us hope. Indeed, unless those who manage libraries and other instruments of scholarly discourse are prepared to embrace the language of productivity, we will find our ability to provide socially equitable access to information weakened as decisions are made about where investments for democratic education will be directed. I look at managed health care and the Western Governors' University and fear that traditional universities and their libraries will lose ground, not because we have failed to embrace information technology, but because we have failed to embrace it productively. I fear that outcome most because it imperils the wonderful accomplishment of libraries and because it could significantly weaken the public good that free libraries have been creating for the past 150 years.


Appendix
Cost Model for the Scully Project

The cost model uses the following facts, estimates, and assumptions:

Introduction to the History of Art, 112a

-Course offered once every two years; three times in six years

-Number of students enrolled in Scully course = 500/term

-Number of weeks Scully photos available in study space = 9 weeks per term

-Length of term = 14 weeks

-Number of Teaching Fellows for Scully course = 25

-Approximate value/hour of Teaching Fellow time = $20

-Hourly wage for library student staff = $6.46

Staff costs for selection, maintenance, and display of slide and photo images

-1 FTE permanent staff devoted to photo collection = $xx,xxx for salary and benefits

-% of permanent library staff effort devoted to Scully course = x%

-Library student staff devoted to photo collection = 40% of $11,500 = $4,600 at $6.46/hr = 712 hrs

-Library student staff devoted to exhibiting Scully photos = 48 hrs/year

-Time spent by Teaching Fellows assembling photo study = 3.5 hr/wk × 14 wks = 49 hrs

-Time spent by Teaching Fellows assembling slides for review classes = 56 hrs

Cost to prepare digital images for instructional use

-Number of images in Scully Project = 1,250

-Digitization of images (outsourced) = $2,800

-Change in Scully Project Web site content over 6 years = 20%

-Selection and creation of images (by 2 Teaching Fellows) = $6,200

-Web site design = $1,500

Preservation and access costs for slide, photo, and digital images

-Library student staff hours spent on mending and maintenance of photos = 7 hrs/year

-Disk space required for Scully Project = .855 GB

-Disk space required per volume for Project Open Book = .015 GB

-Scully Project images = 57 Open Book vols

-Digital Storage costs = $2.58/year/Open Book vol


-Digital access costs = $5.67/year/Open Book vol

-Storage and access cost inflation = -13%/year (see the sketch following this appendix)

Study and other space costs

-Number of items in photo collection = 182,432

-Number of Scully photos mounted in study space = 200 for midterm; 400 for final

-NSF of photo collection in Street Hall = 1,733

-NSF collection shelving for Scully photos = 400/182,432 × (1,733 - 500) = 2.7

-NSF of photo study space = 2,019 + .25 × 1,500 = 2,394

-% of photo study space devoted to Scully photos per term = 20%

-NSF of photo study space available for Scully photos = 2,394 × .2 × (9/28) = 154

-NSF of photo study space utilized during term = 154 × 75% = 116

-Annual cost of space maintenance = $7 NSF

-Cost of new construction = $300 NSF

-Amortization of capital costs at 8% over 35 yrs = $85.81 per $1,000

-Capital cost of converting existing classroom for digital display = $50,000 depreciated over 6 years

-Maintenance of digital classroom hardware and software = 10% of capital cost/year = $5,000/year

-Availability of digital classroom = 8 class hours × 5 days/wk × 28 wks × .8 efficiency factor = 896 sessions/yr

-Need by Scully grad assistants for digital classroom sessions = 25 × 3 = 75 sessions/yr = 8.3% of available sessions

-Annual cost of maintaining a network connection = $300

-% use of network connection for study of Scully Web site = 2%
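
Assuming the "digital storage and access" line of Table 4.1 is built from the preservation and access items above (57 Open Book volume-equivalents, $2.58 plus $5.67 per volume per year, unit costs falling 13% a year), it can be reproduced roughly as follows. This is my reading of the model, offered as a check rather than as the author's own calculation.

    # Rough reconstruction of the "digital storage and access" line of Table 4.1
    # from the appendix figures above. The interpretation is mine.
    volumes = 0.855 / 0.015             # 57 Open Book volume-equivalents
    cost_per_volume_year = 2.58 + 5.67  # storage plus access, dollars per volume per year
    annual_decline = 0.13               # unit costs assumed to fall 13% a year

    first_year = volumes * cost_per_volume_year
    six_year = sum(first_year * (1 - annual_decline) ** (y - 1) for y in range(1, 7))
    print(round(first_year), round(six_year))   # about 470 and 2,049, matching Table 4.1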


Chapter 5—
Comparing Electronic Journals to Print Journals
Are There Savings?

Janet H. Fisher

Three years ago the rhetoric of academics and librarians alike urged publishers to get on with it-to move their publications from print to electronic formats. The relentless pressure on library budgets from annual increases of 10 to 20% in serials prices made many academics and librarians look to electronic publication as the savior that would allow librarians to retain their role in the scholarly communication chain. Academics and university administrators were urged to start their own publications and take back ownership of their own research. The future role of publishers was questioned: What did they do, after all? Since so many scholars were now creating their own works on computer, why couldn't they just put them up on the Net? Who needs proofreading, copyediting, and design anymore? And since technology had made it possible for everyone to become a publisher, surely electronic publication would be cheaper than print.

Quite a few experiments in the last three years have tried to answer some of the questions posed by the emergence of the Internet, but few have yielded hard numbers to date. Most experiments have been focused on developing electronic versions of print products. MIT Press took a piece of the puzzle that we saw as important in the long run and within the capabilities of a university-based journal publisher with space and staff constraints. Many of our authors had been using e-mail, mailing lists, discussion groups, and so on, for 10 years or more, and we wanted to be visible on the Internet early.

We decided it was easier, cheaper, and less of a financial risk to try publishing a purely electronic journal than to reengineer our production and delivery process for our print journals when we had so little feedback about what authors and customers really wanted. Starting with Chicago Journal of Theoretical Computer Science (CJTCS), which was announced in late 1994 and began publication in June of 1995, we began publishing our first purely electronic journals. CJTCS, Journal of Functional and Logic Programming (JFLP), and Journal of Contemporary Neurology (JCN) are all published article-by-article. We ask subscribers to pay an annual subscription fee, but we have not yet installed elaborate mechanisms to ensure that only those who pay have access to the full text. Studies in Nonlinear Dynamics and Econometrics (SNDE), begun in 1996, is published quarterly in issues, with the full text password protected. Another issue-based electronic journal, Videre: Journal of Computer Vision Research, began publishing in the fall of 1997. You can view these publications at our Web site (http://mitpress.mit.edu/).

The lack of a single format for all material available electronically has been a problem for these electronic journals and for our production staff. The publication format varies from journal to journal based on several criteria:

• the format most often received from authors

• the content of the material (particularly math, tables, special characters)

• the cost to implement

• the availability of appropriate browser technology

CJTCS and JFLP are published in LaTeX and PostScript in addition to PDF (Adobe's Portable Document Format), which was added in 1997. JCN is published in PDF and HTML (Hypertext Markup Language, the language of the World Wide Web) because the PostScript files were too large to be practical. SNDE is published in PostScript and PDF. Videre is published in PDF.

Here I will present our preliminary results on the costs of electronic-only journals and compare them to the costs of traditional print journals. I will use Chicago Journal of Theoretical Computer Science as the model but will include relevant information from our experience with our other electronic journals.

Background on the Project

CJTCS was announced in fall of 1994 and began publication in June of 1995. Material is forwarded to us from the journal editor once the review process and revisions have been completed. Four articles were published from June through December of 1995, and six articles were published in 1996. The Web site is hosted at the University of Chicago, with entry from the MIT Press Web site. The production process includes the following steps:

1. manuscript is copyedited

2. copyedited manuscript is returned to author

3. author's response goes back to copyeditor

4. final copyedited article goes to "typesetter"

5. typesetter enters edits/tagging/formatting

6. article is proofread

7. author sees formatted version

8. typesetter makes final corrections

9. article is published (i.e., posted on the site)


Tagging and "typesetting" have been done by Michael J. O'Donnell, managing editor of CJTCS and a professor at the University of Chicago.

The subscription price is $30 per year for individuals and $125 per year for institutions. When an article is published, subscribers receive an e-mail message announcing its publication. Included are the title, the author, the abstract, the location of the file, and a list of the articles published to date in the volume. Articles are numbered sequentially in the volume (e.g., 1996-1, 1996-2). Individuals and institutions are allowed to use the content liberally, with permission to do the following:

• read articles directly from the official journal servers or from any other server that grants them access

• copy articles to their own file space for temporary use

• form a permanent archive of articles, which they may keep even after their subscription lapses

• display articles in the ways they find most convenient (on computer, printed on paper, converted to spoken form, etc.)

• apply agreeable typographical styles from any source to lay out and display articles

• apply any information retrieval, information processing, and browsing software from any source to aid their study of articles

• convert articles to other formats from the LaTeX and PostScript forms on the official servers

• share copies of articles with other subscribers

• share copies of articles with nonsubscribing collaborators as a direct part of their collaborative study or research

Library subscribers may also

• print individual articles and other items for inclusion in their periodical collection or for placing on reserve at the request of a faculty member

• place articles on their campus network for access by local users, or post article listings and notices on the network

• share print or electronic copy of articles with other libraries under standard interlibrary loan procedures

In February 1996, Michael O'Donnell installed a HyperNews feature to accompany each article, which allows readers to give feedback on articles. Forward pointers, which were planned to update the articles with appropriate citations to other material published later, have not yet been instituted.


Archiving arrangements were made with (1) the MIT Libraries, which is creating archival microfiche and archiving the PostScript form of the files; (2) MIT Information Systems, which is storing the LaTeX source on magnetic tape and refreshing it periodically; and (3) the Virginia Polytechnic Institute Scholarly Communications Project, which is mirroring the site (http://scholar.lib.vt.edu).

Direct Costs of Publication

To date, CJTCS has published ten articles with a total of 244 pages. I have chosen to compare the direct costs we have incurred in publishing those 244 pages with the direct costs we incurred for a 244-page issue (Volume 8, Number 5, July 1996) of one of our print journals, Neural Computation (NC). NC has a print run of approximately 2,000 copies, and typesetting is done from LaTeX files supplied by the authors (as is the case for CJTCS) (Table 5.1). Several important differences in production processes affect these costs:

1. The number of articles published is different (10 in CJTCS, 12 in NC).

2. The copyeditor handles author queries for NC and bills us hourly. This contributed $100 to its copyediting bill.

3. Composition for CJTCS is done on a flat-fee basis of $200 per article. Tagging and formatting has been done by Michael O'Donnell, the journal's managing editor at the University of Chicago, because we were unable to find a traditional vendor willing to tag on the basis of content rather than format. The $200 figure was developed in conjunction with a LaTeX coding house that we planned to use initially but that was unable to meet the journal's schedule requirements. In comparison, the typesetting cost per article for NC is approximately $326, which includes a $58 per article charge for producing repro pages to send to the printer and a $21 per article charge for author alterations. These items are not included on the CJTCS composition bills (a rough per-article check follows this list).
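
As a rough check (ours, not from the chapter), the per-article rates quoted here approximately reproduce the composition totals reported in Table 5.1:

    # Per-article composition rates times article counts, versus the billed
    # composition totals shown in Table 5.1.
    cjtcs_articles, nc_articles = 10, 12
    cjtcs_flat_fee = 200        # flat fee per CJTCS article
    nc_per_article = 326        # approximate NC rate, including repro pages and author alterations

    print(cjtcs_articles * cjtcs_flat_fee)    # -> 2000, vs. $2,070 billed
    print(nc_articles * nc_per_article)       # -> 3912, vs. $3,914 billed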

For comparison, Table 5.2 shows the direct costs associated with three other electronic journals to date: Journal of Contemporary Neurology (JCN), Journal of Functional and Logic Programming (JFLP), and Studies in Nonlinear Dynamics and Econometrics (SNDE). JCN's cost per page is much higher than that of the other e-journals because the typesetter produces PDF and HTML formats and deals with complex images.

The issue-based electronic journal Studies in Nonlinear Dynamics and Econometrics (SNDE) is comparable in direct costs with a standard print journal, with the only difference being the lack of printing and binding costs. Table 5.3 is a comparison of the direct costs incurred for SNDE, Volume 1, Number 1, April 1996 (76 pages) and an 80-page issue (Volume 8, Number 4, Fall 1995) of one of our print journals, Computing Systems (COSY), that follows a similar production path.

Composition cost per page is comparable in these journals, but the total production cost per page of SNDE is only 24% of that of COSY, which includes the printing and binding costs associated with a 6,000-copy print run.

TABLE 5.1. Production Costs by Article of Electronic and Print Journals

                                     CJTCS       NC          % Difference
  Copyediting/proofreading           $1,114      $1,577      +42%
  Composition                        $2,070      $3,914      +89%
  Printing and binding               -           $6,965      -
  Total production cost              $3,184      $12,456     +291%
  Composition cost per page          $8.48       $16.24      +92%
  Total production cost per page     $13.05      $51.05      +291%

TABLE 5.2. Cost per Page Comparison of Electronic Journals

           #Pages     #Articles/Issues     Direct Costs     Cost/Pg
  JCN      34         6 articles           $1,666           $49.00
  JFLP     280        7 articles           $2,204           $7.87
  SNDE     152        2 issues             $4,184           $27.53

TABLE 5.3. Cost per Issue Comparison of Electronic and Print Journals

                                     SNDE 1:1     COSY 8:4
  Copyediting/proofreading           $551         $554
  Composition                        $1,383       $1,371
  Printing and binding               -            $6,501
  Total production cost              $1,934       $8,426
  Composition cost per page          $18.20       $17.57
  Total production cost per page     $25.44       $105.33
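
The per-page figures in Tables 5.1 and 5.3 are simply the reported totals divided by page counts (244 pages for CJTCS and the NC issue; 76 for SNDE 1:1 and 80 for COSY 8:4). A short sketch of the arithmetic:

    # Per-page production costs and the SNDE/COSY ratio quoted in the text.
    pages = {"CJTCS": 244, "NC": 244, "SNDE": 76, "COSY": 80}
    totals = {"CJTCS": 3_184, "NC": 12_456, "SNDE": 1_934, "COSY": 8_426}

    per_page = {j: totals[j] / pages[j] for j in pages}
    print({j: round(cost, 2) for j, cost in per_page.items()})
    # -> per-page costs matching Tables 5.1 and 5.3 to within a cent of rounding

    print(f"{per_page['SNDE'] / per_page['COSY']:.0%}")    # -> 24%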

Indirect Costs

The overhead costs associated with CJTCS and the comparable issue of NC vary greatly. Overhead for our print journals is allocated on the following basis:

• Production-charged to each journal based on the number of issues published


• Circulation-charged to each journal based on the number of subscribers, the number of issues published, whether the journal has staggered or nonstaggered renewals, and whether copies are sold to bookstores and newsstands

• Marketing/General and Administrative-divided evenly among all journals

For CJTCS, MIT Press incurs additional overhead costs associated with the Digital Projects Lab (DPL). These include the cost of staff, and the cost of hardware and software associated with the Press's World Wide Web server. These costs are allocated to each electronic publication on the following basis:

• Costs of hardware and software for the file server, network drops, staff time spent maintaining the server, and so on, are allocated to each e-journal based on the percentage of disk space that the journal files occupy as a function of all Web-related files on our server

• Amount of time per issue or article that DPL staff work on the journal is multiplied by the rate per hour of staff

Table 5.4 shows a comparison of overhead costs associated with CJTCS and the comparable issue of NC. CJTCS's production overhead is much higher than NC's because it is almost the same amount of work to traffic an individual article as it is to traffic an entire issue. Even though each batch of material was much smaller in terms of pages than an issue of NC would have been, it still required virtually the same tracking and oversight. Correspondingly, the general and administrative overhead from the journals division for CJTCS is dramatically higher than NC's because of the small amount of content published in CJTCS. The overhead costs associated with publishing CJTCS for a year and a half had to be allocated to only 244 published pages, whereas NC published 2,320 pages in the same period of time.
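
A small sketch of the overhead arithmetic, using the totals reported in Table 5.4 and the 244-page basis of the comparison:

    # Overhead per published page and the percentage gap cited in the conclusion.
    cjtcs_overhead, nc_overhead, pages = 44_358, 3_301, 244

    print(round(cjtcs_overhead / pages), round(nc_overhead / pages))   # -> 182 and 14
    print(f"{cjtcs_overhead / nc_overhead - 1:+.0%}")                  # -> +1244%, quoted as about 1,240% in the conclusion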

JCN takes additional time from our DPL staff because of the HTML coding and linking of illustrations, which adds about $7 per page to its costs. The total of direct and indirect costs per page for JCN is, therefore, in line with our print journals even though there is no printing and binding expense. SNDE incurs an additional $1,400 per issue in indirect costs for the staff, hardware, and software in the DPL.

Market Differences

The other side of the picture is whether the market reacts similarly to electronic-only products. Since this question is outside the scope of this paper, I will only generalize here from our experience to date. For the four electronic journals we have started, the average paid circulation to date is approximately 100, with 20 to 40 of those being institutional subscriptions. For the two print journals we started in 1996 (both in the social sciences), the average circulation at the end of their first volumes (1996) was 550, with an average of 475 individuals and 75 institutions.



TABLE 5.4. Indirect Cost Comparison by Article of Electronic and Print Journals

                                         CJTCS       NC 8:5
  Journals department
    Production                           $8,000      $1,000
    Fulfillment cost per subscriber      $108        $1
    General and administrative           $31,050     $2,300
  Digital Projects Lab
    Staff                                $200        -
    Hardware and software                $5,000      -
  Total overhead per subscriber          $44,358     $3,301
  OH costs per page published            $182        $14

There appears to be a substantial difference in the readiness of the market to accept electronic-only journals at this point, as well as reluctance on the part of the author community to submit material. It is, therefore, more difficult for the publisher to break even with only one-fifth of the market willing to purchase, unless subscription prices are increased substantially. Doing this would likely dampen the paid subscriptions even more.

Conclusion

From the comparison between CJTCS and NC, it seems that the direct costs of publishing an electronic journal are substantially below those of a print journal with a comparable number of pages. The overhead costs, however, are much higher-1,240% higher in this case-but that figure is adversely affected by the small amount of content published in CJTCS over the 18 months during which those overhead costs accrued, compared with NC, which published 12 issues over the same period. The disparity in the markets for electronic products and print products is, at this point in time, a very big obstacle to their financial viability, as is the conservatism of the author community.


Chapter 6—
Electronic Publishing in Academia
An Economic Perspective

Malcolm Getz

The Library at Washington University reports 150,000 hits per year on its electronic, networked Encyclopedia Britannica at a cost to the library of $.04 per hit.[1] This use rate seems to be an order of magnitude larger than the use rate of the print version of the document in the library. At the same time, the volunteer Project Gutenberg, whose goal was to build an electronic file of 10,000 classic, public domain texts on the Internet, has failed to sustain itself.[2] The University of Illinois decided it could no longer afford to provide the electronic storage space and no other entity stepped forward to sustain the venture.[3]

A first lesson here is that production values, the quality of indexing and presentation, the packaging and marketing of the work, matter. Those ventures that take the approach of unrestricted free access don't necessarily dominate ventures that collect revenues. When a shopper asks, "What does it cost?" we can naturally respond, "What is it worth to you?" Electronic communication among academics is growing when it is valuable. In contemplating investments in electronic publishing, the publishers', and indeed academia's, goal is to create the most value for the funds invested. Generally, the freebie culture that launched the Internet represents only a subset of a much wider range of possible uses. Many quality information products that flow through the Net will be generating revenue flows sufficient to sustain them.

The Encyclopedia gives a second lesson, namely, that the costs of electronic distribution may be significantly less than those of print. Serviceable home encyclopedias on CD now cost about $50, and Britannica CD '98 Multimedia Edition is $125, a small fraction of the $1,500 price for the 32-volume print edition of the same Encyclopedia. Britannica also offers a World Wide Web subscription at $85 per year or $8.50 per month, with a discount to purchasers of the print or CD product. The World Wide Web service is updated thrice annually and offers more articles than the print edition. Of course, the price charged for a given format may reflect differences in the price elasticities of demand. Nevertheless, the lower price for the electronic product is consistent with a considerable cost advantage.

Indeed, the latest word processing software includes tools that will allow anyone who uses word processing to create documents tagged for posting on the World Wide Web. Essentially, anyone who owns a current-vintage computer with a sufficient network connection can make formatted text with tables and graphics available instantly to everyone on the Net. The cost of such communication is a small fraction of the cost of photocopying and mailing documents.

An important consequence of the dramatic decline in the cost of sharing documents is the likelihood of a dramatic increase in the quantity of material available. Everyone who writes may post the whole history of their work on the Web at little incremental cost. Availability is then hardly an issue.

The challenge to academia is to invest in services that will turn the ocean of data into sound, useful, compelling information products. The process of filtering, labeling, refining, and packaging, that is, the process of editing and publishing, takes resources and will be shaped by the electronic world in significant ways. This essay is concerned with this process.

Scholar

Begin with first principles. Academia may become more useful to our society at large by communicating electronically. When electronic scholarship is more valuable, our institutions will invest more.

Scholarship plays three roles in our society. First, academia educates the next generation of professionals, managers, and leaders. Second, it makes formal knowledge available to society at large, stimulating the development of new products, informing debates on public policy, and improving understanding of our culture. Third, it develops new knowledge. Digital communication ought ultimately to be judged by how well it serves these three activities: teaching, service, and research. Consider each in turn.

Access to networked, digital information is already enhancing education. More students at more institutions have access to more information because of the World Wide Web. About 60% of high school graduates now pursue some college, and President Clinton has called for universal access to two years of college.[4] The importance of the educational mission is growing. Of course, today networked information is sporadic and poorly organized relative to what it might someday become. Still, the available search services, rapid access, and the wide availability of the network are sufficient to demonstrate the power of the tool. Contrast the service with a conventional two-year college library whose size depends on the budget of the institution, where access often depends on personal interaction with a librarian, and where a student must plan a visit and sometimes even queue for service. Access to well-designed and supported Web-based information gives promise of promoting a more active style of education. Students may have greater success with more open-ended assignments, may participate in on-line discussion with others pursuing similar topics, and may get faster feedback from more colorful, more interactive materials. Integrating academic information into the wider universe of Web information seems likely to have important benefits for students when it is done well.

Similarly, many audiences for academic information outside the walls of the academy already use the World Wide Web. Engineering Information, Inc. (EI), for example, maintains a subscription Web site for both academic and nonacademic engineers.[5] A core feature of the service is access to the premier index to the academic engineering literature with a fulfillment service. But EI's Village offers on-line access to professional advisers, conversations with authors, and services for practicing engineers. Higher quality, more immediate access to academic information seems likely to play an increasing role in the information sectors of our society, including nearly every career in which some college is a common prerequisite. Higher education seems likely to find wider audiences by moving its best materials to the networked, digital arena.

In the business of generating new knowledge, the use of networked information is already accelerating the pace. Working papers in physics, for example, are more rapidly and widely accessible from the automated posting service at Los Alamos than could possibly be achieved by print.[6] In text-oriented fields, scholars are able to build concordances and find patterns in ways impossible with print. Duke University's digital papyrus, for example, offers images of papyri with rich, searchable descriptive information in text.[7] In economics, the Web gives the possibility of mounting data sets and algorithmic information and so allows scholars to interact with the work of others at a deeper level than is possible in print. For example, Ray Fair maintains his 130-equation model of the U.S. economy on the Web with data sets and a solution method.[8] Any scholar who wants to experiment with alternative estimations and forecasting assumptions in a fully developed simulation model may do so with modest effort. In biology, the Human Genome Project is only feasible because of the ease of electronic communication, the sharing of databases, and the availability of other on-line tools.[9] In visually oriented fields, digital communication offers substantial benefits, as video and sound may be embedded in digital documents. Animated graphics with sound may have significant value in simulation models in science. In art and drama, digital files may allow comparative studies previously unimaginable. Digital communication, then, may have its most significant consequence in accelerating the development of new knowledge.

The pace of investment in digital communication within academia may well be led by its value in education, service broadly defined, and research. In each case, institutional revenues and success may depend on effective deployment of appropriate digital communication. Of course, individual scholars face a significant challenge in mastering the new tools and employing them in appropriate ways. It is also worth emphasizing that not all things digital are valuable. However, when digital tools are well used, they are often significantly more valuable than print.

Publisher

The evolution of the digital arena will be strongly influenced by cost and by pricing policies. Cost is always a two-way street, a reflection, on the one hand, of the choices of authors and publishers who commit resources to publication and, on the other, of the choices of readers and libraries who perceive value. Publishers are challenged to harvest raw materials from the digital ocean and fashion valuable information products. Universities and their libraries must evaluate the possible ways of using digital materials and restructure budgets to deploy their limited resources to best advantage. Between publisher and library stands the electronic agent who may broker the exchange in new ways. Consider first the publisher.

The opportunity to distribute journals electronically has implications for the publishers' costs and revenues. On the cost side, the digital documents can be distributed at lower cost than paper. The network may also reduce some editorial costs. However, sustaining high production values will continue to involve considerable cost because quality editing and presentation are costly. On the revenue side, sale of individual subscriptions may, to some degree, yield to licenses for access via campus intranets and to pay-per-look services.

Publisher Costs

The central fact of the publishing business is the presence of substantial fixed cost with modest variable cost. The cost of gathering, filtering, refining, and packaging shapes the quality of the publication but does not relate to distribution. The cost of copying and distributing the publication is a modest share of the total expense. A publication with high production values will have high fixed costs. Of course, with larger sale, the fixed costs are spread more widely. Thus, popular publications have lower cost per copy because each copy need carry only a bit of the fixed cost. In thinking about a digital product, the publisher is concerned to invest sufficiently in fixed costs to generate a readership that will pay prices that cover the total cost.
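
The cost structure described here reduces to one line: average cost per copy equals the fixed cost spread over the print run plus the variable cost of a copy. A minimal illustration with invented numbers (not the AEA's):

    # Average cost per copy = fixed cost / copies + variable cost per copy.
    def average_cost_per_copy(fixed_cost, variable_cost, copies):
        return fixed_cost / copies + variable_cost

    for copies in (500, 5_000, 50_000):
        print(copies, round(average_cost_per_copy(100_000, 2.00, copies), 2))
    # -> 202.0 per copy at 500 copies, 22.0 at 5,000, 4.0 at 50,000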

There is a continuum of publications, from widely distributed products with high fixed costs but lower prices to narrowly distributed products with low fixed costs but higher prices. We might expect an even wider range of products in the digital arena.

To understand one end of the publishing spectrum, consider a publisher who reports full financial accounts and is willing to share internal financial records, namely, the American Economic Association (AEA). The AEA is headquartered in Nashville but maintains editorial offices for each of its three major journals in other locations. The AEA has 21,000 members plus 5,500 additional journal subscribers. Membership costs between $52 and $73 per year (students $26), and members get all three journals. The library rate is $140 per year for the bundle of three journals. The association had revenues and expenditures of $3.7 million in 1995.

The AEA prints and distributes nearly 29,000 copies of the American Economic Review (AER), the premier journal in economics. The AER receives nearly 900 manuscripts per year and publishes about 90 of them in quarterly issues. A Papers and Proceedings issue adds another 80 or so papers from the association's annual meeting. The second journal, the Journal of Economic Perspectives (JEP), invites authors to contribute essays and publishes more topical, less technical essays, with 56 essays in four issues in 1995. The third journal, the Journal of Economic Literature (JEL), contains an index to the literature in economics that indexes and abstracts several hundred journals, lists all new English-language books in economics, and reviews nearly 200 books per year. The JEL publishes more than 20 review essays each year in four quarterly issues. The three journals together yield about 5,000 pages, about 10 inches of linear shelf space, per year. The index to the economic literature published in JEL is cumulated and published as an Index of Economic Articles in Journals in 34 volumes back to 1886 and is distributed electronically as EconLit with coverage from 1969. The Index and EconLit are sold separately from the journals.

This publisher's costs are summarized in Figure 6.1. Some costs seem unlikely to be affected by the digital medium, while others may change significantly. The headquarters function accounts for 27% of the AEA's budget. The headquarters maintains the mailing lists, handles the receipts, and does the accounting and legal work. It conducts an annual mail ballot to elect new officers and organizes an annual meeting that typically draws 8,000 persons.[10] The headquarters function seems likely to continue in about its current size as long as the AEA continues as a membership organization, a successful publisher, and a coordinator of an annual meeting.[11] Declining membership or new modes of serving members might lead to reduction in headquarters costs. In the short run, headquarters costs are not closely tied to the number of members or sale of journals.

The AEA's second function is editing, the second block in Figure 6.1. Thirty-six percent of the AEA's annual expenditures goes to the editorial function of its three journals. Eighty-eight percent of the editorial cost is for salaries. The editorial function is essential to maintaining the high production values that are necessary for successful information products.

Operating digitally may provide some cost saving in the editorial function for the American Economic Review. The editors could allow manuscripts to be posted on the Internet, and referees could access network copies and dispatch their comments via the network. The flow of some 1,600 referee reports that the AER manages each year might occur faster and at lower cost to both the journals and the referees if the network were used in an effective way.[12] However, the editorial cost will continue to be a significant and essential cost of bringing successful intellectual products to market. Top quality products are likely to have higher editorial costs than are lower quality products.

Figure 6.1. American Economic Association Expenses 1995
Source: Elton Hinshaw, "Treasurer's Report," American Economic Review, May 1996, and unpublished reports.
Note: Percentages do not sum to 100% due to rounding.

The top two blocks shown in Figure 6.1 describe the 38% of the AEA's total budget that goes to printing and mailing. These functions are contracted out and have recently gone through a competitive bid process. The costs are likely to be near industry lows. The total printing and mailing costs split into two parts. One part doesn't vary with the size of the print run and is labeled as fixed cost. It includes design and typesetting and thus will remain, to a significant degree, as a necessary function in bringing high quality products to market.[13] The variable-cost part of printing and mailing reflects the extra cost of paper, printing, and mailing individual paper issues. This 23% of total association expenditures, $800,000 out of $3.7 million total, might be reduced considerably by using distribution by network. However, as long as some part of the journal is distributed in print, the association will continue to incur significant fixed costs in printing.


In short, distribution of the journals electronically by network might lower the AEA's expenditures by as much as 23%.[14]

Publisher Revenue

Figure 6.2 summarizes the American Economic Association's revenues in six categories. Thirty-eight percent of revenue comes from individual memberships. Another 5% comes from the sale of advertising that appears in the journals. Nineteen percent comes from the sale of subscriptions, primarily to libraries. Another 19% comes from royalties on licenses of the EconLit database; most of these royalties come from SilverPlatter, a distributor of electronic databases. Less than half of one percent of revenues comes from selling rights to reprint journal articles. Finally, 18% of revenues come from other sources, primarily income from the cumulated reserves as well as net earnings from the annual meeting.[15]

Distributing the journals electronically by network seems likely to change the revenue streams. What product pricing and packaging strategies might allow the AEA to sustain the journals? If the journals are to continue to play an important role in the advance of the discipline, then the association must be assured that revenue streams are sufficient to carry the necessary costs.

If the library subscription includes a license for making the journals available by network to all persons within a campus, then a primary reason for membership in the association may be lost. With print, the main distinction between the library subscription and the membership subscription is that the member's copy can be kept at hand while the library copy is at a distance and may be in use or lost. With electronic delivery, access may be the same everywhere on the campus network. The license for electronic network distribution may then undercut revenues from memberships, a core 38% of AEA revenues.

The demand for advertising in the journals is probably motivated by distribution of the journals to individual members. If individual subscriptions lag, then advertising revenue may fall as well. Indeed, one may ask the deeper question of whether ads will be salient at all when the journals are distributed electronically. The potential for advertising may be particularly limited if the electronic journals are distributed through intermediaries. If a database intermediary provides an index to hundreds of journals and provides links to individual articles on demand, advertising revenue may accrue to the database vendor rather than to the publisher of the individual journal.

Figure 6.2. American Economic Association Revenues 1995
Source: Elton Hinshaw, "Treasurer's Report," American Economic Review, May 1996, and unpublished reports.

The AEA might see 43% of its revenues (the 38% from member fees plus the 5% from advertising) as vulnerable to being cannibalized by network licensure of its journals. With only a potential 23% saving in cost, the association will be concerned to increase revenues from other sources so as to sustain its journals. The 20% shortfall is about $750,000 for the AEA. Here are three strategies: (1) charge libraries more for campus-use licenses, (2) increase revenues from pay-per-look services, (3) enhance services for members so as to sustain member revenues. Each of these strategies may provide new ways of generating revenue from existing readers, but importantly, may attract new readers.
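
The arithmetic behind the $750,000 figure follows directly from the budget shares above; a brief sketch:

    # Revenue at risk (memberships plus advertising) less the potential saving
    # from dropping print distribution, as shares of the $3.7 million budget.
    budget = 3_700_000
    at_risk = 0.38 + 0.05      # membership dues + advertising
    saving = 0.23              # variable printing and mailing costs

    print(round((at_risk - saving) * budget))   # -> 740000, i.e., roughly $750,000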

The Campus License The association could charge a higher price to libraries for the right to distribute the electronic journals on campus networks. There are about four memberships for each library or other subscription. If membership went to zero because the subscriptions all became campus intranet licenses, then the AEA would need to recoup the revenues from four memberships from each campus license to sustain current revenues. If network distribution lowered AEA costs by 20%, then the campus intranet license need only recoup the equivalent of two memberships. Libraries currently pay double the rate of memberships, so the campus intranet license need be only double the current library subscription rate. That is, the current library rate of $140 would need to go to about $280 for a campus-wide intranet license for the three journals.[16] Of course, many campuses have more than one library subscription, say one each in the social science, management, law, and agriculture libraries. The association might then set a sliding scale of rates from $280 for a small (one library print subscription) campus to $1,400 for a large (five library print subscription) campus.[17] These rates would be the total revenue required by the association for campus subscriptions, assuming that the library's print subscriptions are abandoned. A database distributor would add some markup.
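
The license arithmetic can be written out explicitly; the $70 figure below is implied by the statement that libraries pay double the membership rate, and the variable names are ours:

    # Campus intranet license needed to replace lost membership revenue.
    library_rate = 140
    avg_membership = library_rate / 2    # about $70: libraries pay double the membership rate
    memberships_to_recoup = 2            # four lost memberships per library, halved by the ~20% cost saving

    one_sub_license = library_rate + memberships_to_recoup * avg_membership
    print(one_sub_license)               # -> 280.0, double the current library rate
    print(5 * one_sub_license)           # -> 1400.0 for a five-print-subscription campus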

The campus intranet rate for electronic access is easily differentiated from the print library subscription because it provides a license for anyone on the campus intranet to use the journals in full electronic format. This rate could be established as a price for a new product, allowing the print subscriptions to continue at library rates. Transition from print to electronic distribution could occur gradually with the pace of change set by libraries. Libraries would be free to make separate decisions about adding the campus intranet service and, later, dropping the print subscription.

Individual association members could continue their print subscriptions as long as they wish, reflecting their own tastes for the print product and the quality of service of the electronic one as delivered. Indeed, individual members might get passwords for direct access to the on-line journals. Some members may not be affiliated with institutions that subscribe to network licenses.

It is possible that the campus intranet license will be purchased by campuses that have not previously subscribed to the AEA's journals. If the institution's cost of participating in network delivery is much less than the cost entailed in sustaining the print subscription-for example, the avoidance of added shelf space as will be discussed below-then more campuses might sign on. This effect may be small for the AEA because it is the premier publisher in economics, but might be significant for other journal publishers.

Pay-Per-Look The AEA has had minimal revenues from reprints and royalties on copies. Indeed, it pioneered in guaranteeing in each issue of its journals a limited right to copy for academic purposes without charge.[18] The association adopted the view that the cost of processing the requests to make copies for class purposes (which it routinely granted without charge) was not worth incurring. By publishing a limited, no-charge right to copy, it saved itself the cost of managing the granting of permissions and saved campuses the cost of seeking them.

With electronic distribution, the campus intranet license will automatically grant permission for the journals to be used in course reserves and in print-on-demand services for classes.

On campuses with too little commitment to instruction in economics to justify a library subscription or a campus intranet license, there may still be occasional interest in use of journal articles. There may be law firms, businesses, consulting enterprises, and public interest groups who occasionally seek information and would value the intensity of exploration found in academic journals. With the ubiquitous Internet, they should be able to search a database on-line for a modest usage fee, identify articles of interest, and then call up such articles in full-image format on a pay-per-look basis. Suppose the Internet reaches a million people who are either on campuses without print library subscriptions today or are not on campuses at all but who would have interest in some occasional use of the academic material. This market represents a new potential source of revenue for the AEA that could be reached by an Internet-based pay-per-look price.

What rate should the association set per page to serve the pay-per-look market without unduly cannibalizing the sale of campus intranet licenses? Take the $280-per-year rate for a campus with one print library subscription, which buys access to about 3,500 published pages of journal articles (leaving aside the index and abstracts). One look at each published article page per year at $.08 per page would equal the $280 license. A campus whose users averaged one look at each page would thus break even with the campus intranet license at a pay-per-look rate of $.08 per page. This rate is the rate of net revenue to the association; the database distributor may add a markup. For discussion, suppose the database distributor's markup is 100%. If Internet users beyond the campus intranet licenses looked at 2 million pages per year at $.16 per page, including fees to the Internet service provider, the association would recoup from this source nearly a quarter of the membership revenue lost to the intranet licenses.
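
A sketch of the break-even arithmetic just described (the 2-million-page figure is the hypothetical level of off-campus use from the text):

    # Break-even pay-per-look rate implied by the $280 single-subscription
    # campus license and roughly 3,500 published article pages per year.
    license_rate = 280
    article_pages_per_year = 3_500

    net_per_page = license_rate / article_pages_per_year    # -> $0.08 to the association
    gross_per_page = net_per_page * 2                       # assuming a 100% distributor markup
    print(round(net_per_page, 2), round(gross_per_page, 2))

    pages_viewed_off_campus = 2_000_000
    print(pages_viewed_off_campus * net_per_page)           # -> 160000.0, roughly a fifth to a quarter of the $750,000 gap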

A critical issue for the emergence of a pay-per-look market is the ability to account for and collect the charges with a low cost per transaction. If accounting and billing costs $10 per hit with hits averaging 20 pages, then the charge might be $14 per hit ($10 to the agent, $4 to the AEA). Such a rate compares well with the $30-per-exchange cost incurred in conventional interlibrary loan. Yet such high transaction costs will surely limit the pay-per-look market.

A number of enterprises are offering or plan to offer electronic payment mechanisms on the Internet.[19] In the library world, RLG's WebDOC system may have some of the necessary features. These systems depend on users being registered in advance with the Web bank. As registered users, they have accounts and encrypted "keys" that electronically establish their identity to a computer on the Net. To make a transaction, users need only identify themselves to the electronic database vendor's computer using the "key" for authentication. The vendor's computer checks the authentication and debits the reader's account at the Web bank. In this fashion, secure transactions may occur over the network without human intervention at costs of a few cents per hit. If such Web banks become a general feature of the Internet, Web money will be used for a variety of purposes. The incremental cost of using Web banks for access to information should be modest and should allow the pay-per-look market to gain in importance. Markups per transaction might then be quite modest, with gross charges per page in the vicinity of $.10 to $.20. This rate compares with the $.04-per-hit cost of the Britannica mentioned in the opening sentence of this essay.
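
The transaction flow described here can be sketched schematically. Everything below is illustrative: the account, key, and rate are hypothetical, and real systems of the period (RLG's WebDOC among them) differed in their details.

    # Schematic pay-per-look transaction: the reader's key authenticates the
    # request, the "Web bank" account is debited, and the article is served,
    # all without human intervention.
    accounts = {"reader-key-123": 25.00}    # balances held at the Web bank
    price_per_page = 0.10

    def serve_article(key, pages):
        charge = pages * price_per_page
        if key not in accounts:
            return "not registered"
        if accounts[key] < charge:
            return "insufficient funds"
        accounts[key] -= charge
        return f"article delivered; charged ${charge:.2f}"

    print(serve_article("reader-key-123", pages=20))   # -> article delivered; charged $2.00
    print(accounts["reader-key-123"])                  # -> 23.0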

The core idea here is that individual readers make the decisions about when to look at a document under a pay-per-look regime. The reader must face a budget constraint, that is, have a limited set of funds for use in buying information products or other services. The fund might be subsidized by the reader's institution, but the core choices about when to pay and look are made individually. When the core decision is made by the reader with limited funds, then the price elasticity of demand for such services may be high. With a highly elastic demand, even for-profit publishers will find that low prices dominate.

Current article fulfillment rates of $10 to $20 could fall greatly. The MIT Press offers to deliver individual articles from its electronic journals for $12. EI Village delivers reprints of articles by fax or other electronic means for fees in this range.

Enhanced Member Services A third strategy for responding to the possible revenue shortfall from the loss of memberships at the AEA would be to enhance membership services. One approach, proposed by Hal Varian, would be to offer superior access to the electronic journals to members only.[20] The electronic database of journal articles might be easily adapted to provide a personal notification to each member as articles of interest are posted. The association's database service for members might then have individual passwords for members and store profiles of member interests so as to send e-mail notices of appropriate new postings. The members' database might also contain ancillary materials, appendices to the published articles with detailed derivations of mathematical results offered in software code (for example, as Mathematica notebooks), copies of the numerical data sets used in empirical estimation, or extended bibliographies. The members' database might support monitored discussions of the published essays, allowing members to post questions and comments and allowing an opportunity for authors to respond if they wish. These enhancements generally take advantage of the personal relationship a member may want to have with the published literature, a service not necessarily practical or appropriate for libraries.

Indeed, one divide in the effort to distinguish member from library access to the journal database is whether the enhancement would have value to libraries if offered. Libraries will be asked to pay a premium price for a campus intranet license. They serve many students and faculty who are not currently members of the AEA and who are unlikely to become members in any event, for example, faculty from disciplines other than economics. Deliberately crippling the library version of the electronic journals by offering lower resolution pages, limited searching strategies, a delay in access, or only a subset of the content will be undesirable for libraries and inconsistent with the association's goal of promoting discussion of economics. However, there may be some demand for lower quality access at reduced prices. The important point is that for membership to be sustained, it must carry worthwhile value when compared to the service provided by the campus license.

Another approach is simply to develop new products that will have a higher appeal to members than to libraries. Such products could be included in the membership fee, but offered to libraries at an extra cost. One such product would be systematic access to working papers in economics. Indexes, abstracts, and in some cases, the full text of working papers are available without charge at some sites on the World Wide Web today. The association might ally itself with one of these sites, give the service an official status, and invest in the features of the working paper service to make it more robust and useful. Although freebie working paper services are useful, an enhanced working paper service for a fee (or as part of membership) might be much better.[21]

To the extent that enhanced services can sustain memberships in the face of readily available campus intranet access to journals, the premium for campus intranet access could be lower.

The AEA might offer a discount membership rate to those who opt to use the on-line version of the journals in lieu of receiving print copies. Such a discounted rate would reflect not only the association's cost saving with reduced print distribution but also the diminished value of membership given the increased prospect of campus intranet licenses.

To the extent that the pay-per-look market generates new revenue, the campus intranet rate could also be lower. The total of the association's revenues need only cover its fixed and variable costs. (The variable cost may approach zero with electronic distribution.) If membership revenues dropped by two-thirds and pay-per-look generated one-quarter of the gap, then the premium rate for the campus intranet license need be only one-third to one-half above current rates, say, $200 for a one library print subscription campus to $1,000 for a five library print subscription campus (net revenue to the association after the distributor's markup).

Other Publishers

At the other end of the publishing spectrum from the AEA are those publishers who produce low-volume publications. Some titles have few personal subscriptions and depend primarily on library subscriptions that are already at premium rates. For these titles, replacing the print subscription with an intranet license will simply lower costs. The Johns Hopkins University Press offers its journals electronically at a discount in substitution for the print.

Some titles may have mostly personal subscriptions with no library rate, including popular magazines like the Economist. Such publications might simply be offered as personal subscriptions on the Internet with an individual password for each subscriber. The distribution by network would lower distribution costs and so ought to cause the profit-maximizing publisher to offer network access to individuals at a discount from the print subscription rate. Such a publication may not be available by campus intranet license.

The Journal of Statistics Education (JSE) is distributed via the Internet without charge. It began with an NSF/FIPSE grant to North Carolina State University in 1993. The JSE receives about 40 manuscripts per year and, after peer review, publishes about 20 of them.[22] The published essays are posted on a Web site, and a table of contents and brief summaries are dispatched by e-mail to a list of about 2,000 interested persons. JSE's costs amount to about $25,000 per year to sustain the clerical work necessary to receive manuscripts, dispatch them to suitable referees, receive referee reports, and return them to the author with the editor's judgment. The JSE also requires a part-time system support person to maintain the server that houses the journal. The JSE has not charged for subscriptions, receives no continuing revenue, and needs about $50,000 per year to survive. Merger with a publisher of other statistics journals may make sense, allowing the JSE to be bundled in a larger member service package. Alternatively, it might begin to charge a subscription fee for individuals and a campus license rate for libraries. Making the transformation from a no-fee to a fee-based publication may prove difficult. A critical issue is how much fixed cost is necessary to maintain reasonable production values in a low-volume publication. At present, JSE is seeking a continuing source of finance.

In general, a publisher will consider three potential markets: (1) the campus intranet license/library sale, (2) the individual subscription, and (3) the pay-per-look/individual article sale. These three markets might be served by one title with shared fixed costs. The issue of whether to offer the title in each market and at what price will reflect the incremental cost of making the title available in that market, the elasticity of demand in each market, and the cross price elasticities between markets. For example, the price of the campus license will have an effect on individual subscription sales and vice versa, and the price of the individual subscriptions will have an effect on the sale of individual articles and vice versa. The more elastic the demands, the lower the prices, even for for-profit publishers. The higher the substitution among the three forms, the closer the prices will be across them.[23]

Economies of Scope

To this point, the analysis applies essentially to one journal at a time, as though the journal were the only size package that counted. In fact, of course, the choice of size of package for information could change. Two centuries ago, the book was the package of choice. Authors generally wrote books. Libraries bought books. Readers read books. In the past 50 years, the size of package shifted to the journal in most disciplines. Authors write smaller packages, that is, articles, and get their work to market more quickly in journals. The elemental information product has become more granular. Libraries commit to journals and so receive information faster and at lower cost per unit. In deciding what to read, readers depend on the editors' judgment in publishing articles. In short, libraries buy bigger packages, the journals, while authors and readers work with smaller units, the articles.

With electronic distribution, the library will prefer to buy a still larger package, a database of many journals. A single, large transaction is much less expensive for a library to handle than are multiple, small transactions. Managing many journal titles individually is expensive. Similarly, readers may prefer access to packages smaller than journal articles. They are often satisfied with abstracts. The electronic encyclopedia is attractive because it allows one to zip directly to a short, focused package of information with links to more. Authors, then, will be drawn to package their products in small bundles embedded in a large database with links to other elements of the database with related information. Information will become still more granular.

If the database becomes the dominant unit of trade in academic information, then publishers with better databases may thrive. The JSTOR enterprise appears to have recognized the economies of scope in building a database with a large number of related journal titles. JSTOR is a venture spawned by the Mellon Foundation to store archival copies of the full historic backfiles of journals and make them available by network. The core motive is to save libraries the cost of storing old journals. JSTOR plans to offer 100 journal titles within a few years. Some of the professional societies, in psychology and chemistry for example, exploit economies of scope in the print arena by offering dozens of journal titles in their disciplines. Elsevier's dominance in a number of fields is based in part on the exploitation of scope with many titles in related subdisciplines. The emergence of economies of scope in the electronic arena is illustrated by Academic Press's offer to libraries in Ohio LINK. For 10% more than the cost of the print subscriptions the library had held, it could buy electronic access to the full suite of Academic Press journals on Ohio LINK.

To take advantage of the economies of scope, the electronic journal might begin to include hot links to other materials in the database. The electronic product would then deliver more than the print version. Links to other Web sites are among the attractive features of the Web version of the Encyclopedia Britannica. An academic journal database could invite authors to include the electronic addresses of references and links to ancillary files. Higher quality databases will have more such links.

The American Economic Association eschews scope in the print arena, preferring instead to let a hundred flowers bloom and to rely on competition to limit prices. Its collection of three journals does not constitute a critical mass of journal articles for an economics database, and so it must depend on integration with other economics journals at the database level. The Johns Hopkins University Press's MUSE enterprise suffers similar lack of scope. Although it has 45 journal titles, they are scattered among many disciplines and do not, collectively, reach critical mass in any field.

The emergence of more powerful, network-based working paper services seems likely to lower the cost of the editorial process, as mentioned above. A common, well-managed electronic working paper service might make the cost of adding a journal title much lower than starting a title from scratch without access to electronic working papers. The enterprise that controls a capable working paper service may well control a significant part of the discipline and reap many of the advantages of scope in academic publishing.


In fact, a capable electronic working paper service could support multiple editors of a common literature. One editor might encourage an author to develop a work for a very sophisticated audience and publish the resulting work in a top academic journal. Another editor might invite the author to develop the same ideas in a less technical form for a wider audience. Both essays might appear in a common database of articles and link to longer versions of the work, numerical data sets, bibliographies, and other related material. The published essays will then be front ends to a deeper literature available on the Net.

Rents

In addition to limiting the number of journals it produces, the American Economic Association differs from many publishers by emphasizing low cost. The price of its journals is less than half the industry average for economics journals, and the differential between library and individual rates is low.[24] If the AEA's goal were to maximize profit, it could charge authors more, charge members and libraries more, make more revenue from its meetings, and launch more products to take advantage of its reputation by extending its scope. The rents available in this marketplace are then left to the authors, members, libraries, and competing publishers. The AEA is not maximizing its institutional rents.

Other nonprofit publishers may seek higher revenues to capture more of the available rents and use the proceeds to generate more products and association services. Lobbying activities, professional certification and accreditation, more meetings, and more journals are common among professional societies.

Many for-profit publishers seek to maximize the rents they can extract from the marketplace for the benefit of their shareholders. In considering how to package and price electronic products, the for-profit publishers will continue to be concerned with finding and exploiting the available rents. The profit-maximizing price for a journal is determined by the price elasticity of demand for the title and the marginal cost of producing it. With convenient network access, there may be an increase in demand that would allow a higher price, other things being equal. How the price elasticity of demand might change with network access is unknown. The fall in marginal cost with electronic distribution need not lead to a lower price.
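
A textbook linear-demand example (ours, not the author's) makes the last point concrete: with demand Q = a - bP and marginal cost c, the profit-maximizing price is P* = (a/b + c)/2, so a fall in c lowers the price by only half as much, and any increase in demand from convenient network access can offset it entirely.

    # Profit-maximizing price for demand Q = a - b*P and marginal cost c:
    # maximize (P - c)*(a - b*P), which gives P* = (a/b + c)/2.
    def optimal_price(a, b, c):
        return (a / b + c) / 2

    print(optimal_price(a=1_000, b=5, c=40))   # print distribution: P* = 120.0
    print(optimal_price(a=1_000, b=5, c=10))   # cheaper distribution alone: P* = 105.0
    print(optimal_price(a=1_400, b=5, c=10))   # cheaper distribution plus stronger demand: P* = 145.0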

One might then ask how a shift to electronic publishing may affect the size of the rents and their distribution. A shift to the database as the optimal size package with falling marginal costs would seem both to increase the size of potential rents and to make easier their exploitation for profit. Suppose control of a powerful working paper service gives a significant cost advantage to journal publishers. Suppose further that academic institutions find major advantages in subscribing to large databases of information rather than making decisions about individual journal titles. The enterprise that controls the working paper service and the database of journals may then have considerable rent-capturing ability. The price elas-


117

ticities of demand for such large packages may be low and the substitutes poor, and so the markups over costs may be substantial. The possibility of a significant pay-per-look market with high price elasticity of demand might cause the profit-maximizing price to be lower. The possibility of self-publication at personal or small-scale Web sites offers a poor substitute for integration in a database because Web search engines are unlikely to point to those sites appropriately.

Library

In contemplating how to take advantage of electronic publications, universities and their libraries face two problems. First, they face decisions about scaling back costly conventional operations so as to make resources available for acquiring electronic licenses. Second, the potential cost savings accrue in a variety of budget areas, each with its own history, culture, and revenue sources. Although many boards of trustees and their presidents might like all the funds within their institutions to be fungible, in fact they face limitations on their ability to reduce expenditures in one area so as to spend more in another. If donors or legislatures are more willing to provide funds for buildings than for electronic subscriptions, then the dollar cost of a building may not be strictly comparable to the dollar cost of electronic subscriptions. Universities are investing more in campus networks and computer systems and are pruning elsewhere as the campuses become more digital. The following paragraphs consider how conventional operations might be pruned so as to allow more expenditure on electronic information products.

Conventional Library Costs

It is possible that some universities will view electronic access to quality academic journals as sufficiently attractive to justify increasing their library budget to accommodate the electronic subscriptions when publishers seek premium prices for electronic access. Some universities place particular emphasis on being electronic pioneers and seem willing to commit surprising amounts of resources to such activities. Other universities owe a debt to these pathfinders for sorting out what works. For most institutions, however, the value of the electronic journals will be tested by middle management's willingness to prune other activities so as to acquire more electronic journals. The library director is at the front line for such choices, and an understanding of the basic structure of the library's expenditures will help define the library director's choices.

Figure 6.3 provides a summary picture of the pattern of costs in conventional academic libraries. The top four blocks correspond to the operating budgets of the libraries. Acquisitions account for about a third of the operating budget. To give a complete picture, the bottom section of the figure also accounts for the costs of library buildings. The cost of space is treated as the annual lease value of the space including utilities and janitorial services. The total of the operating budget plus


118

Figure 6.3. Conventional Library Costs. Source: Heuristic characterization based on Association of Research Libraries Annual Statistical Survey on expenditures on materials and operating budgets and on the author's own studies of library space and technical service costs.

the annualized cost of the building space represents a measure of the total institutional financial commitment to the library.

Library management typically has control only of the operating budget. Let's suppose that, on average, campus intranet licenses to electronic journals come at a premium price, reflecting both the electronic database distributor's costs and adjustments in publishers' pricing behavior as discussed above. The library, then, confronts the need to increase its acquisition expenditure, possibly doubling it.

A first choice is to prune expenditures on print so as to commit resources to digital materials. Some publishers offer lower prices for swapping digital for paper and in this case, swapping improves the library's budget. Some publishers may simply offer to swap digital for print at no change in price. However, many publishers may expect a premium gross price for digital access on the campus intranet. The library manager may seek to trim other acquisition expenditures so as to commit to more digital access. For several decades, academic libraries have been reducing the quantity of materials acquired so as to adjust to increases in prices.


119

Making substantial cuts in the quantity of acquisitions in order to afford a smaller suite of products in electronic form seems unappealing, however, and so this approach may have limited effect.

A second possible budget adjustment is to prune technical service costs. The costs of processing arise from the necessity of tracking the arrival of each issue, claiming those that are overdue, making payments, adjusting catalog records, and periodically binding the volumes. If the electronic journal comes embedded in a database of many journals, the library can make one acquisition decision and one payment. It need have little concern for check-in and the claiming of issues. Testing the reliability of the database will be a concern, but presumably large database providers have a substantial incentive to build in considerable redundancy and reliability and will carefully track and claim individual issues. The library will avoid binding costs. The library will likely have some interest in building references to the electronic database into its catalog. Perhaps the database vendor will provide suitable machine readable records to automate this process.

A third possibility is to trim the library's public service operations. Until a substantial quantity of materials is available and widely used via the network, the demand for conventional library hours, reference, and circulation services may change only modestly. In 1996, one-third to one-half of the references in my students' essays were to World Wide Web sources. However, these sources generally were complements of conventional sources rather than substitutes for them. As frontline journals become commonly accessible by campus networks, the demand for conventional library services may decline. For example, campuses that operate departmental and small branch libraries primarily to provide convenient access to current journals for faculty might be more likely to consolidate such facilities into a master library when a significant number of the relevant journals are available on the Net. These changes are likely to take a number of years to evolve.

A fourth possibility concerns the cost of library buildings. When journals are used digitally by network, the need for added library space declines. Libraries will need less stack space to hold the addition of current volumes. In many larger libraries, lesser used, older volumes are currently held in less expensive, off-site facilities, with new volumes going into the prime space. The marginal stack space, then, is off-site, with costs of perhaps $.30 per volume per year for sustaining the perpetual storage of the added volumes.[25] Replacing a 100-year run of a journal with an electronic backfile ought to save about $30 per year in continuing storage costs at a low-cost, remote storage facility. Reductions in the extent of processing and in public services will also reduce requirements for space.

The library building expenses typically do not appear in operating budgets, so saving space has no direct effect on the library budget. The capital costs of buildings are frequently raised philanthropically or paid through a state capital budget, keeping the costs out of the university current accounts. Even utilities and janitorial services may appear in a general university operating budget rather than within the library account. Savings in building costs will accrue to those who fund


120

capital projects and to university general budgets but often not to the library operating budget. University presidents and boards may redirect their institutions' capital funds to more productive uses. Of course, the interests of philanthropy and the enthusiasm of state legislators may pose some limit on the ability to make such reallocations. Moreover, library building projects occur relatively infrequently, say every 25 years or so. The savings in capital may not be apparent for some time or, indeed, ever if capital budgets are considered independently of operating budgets. Library buildings, particularly the big ones in the middle of campuses, come to play a symbolic role, an expression of the university's importance, a place of interdisciplinary interaction, a grand presence. Because symbols are important, the master library facility will continue to be important. The marginal savings in building expense will probably be in compact or remote storage facilities and in departmental and smaller branch libraries. Digital access ought then to save the larger campus community some future commitment of capital, but the savings will be visible mostly to the president and board.

A fifth possibility is savings in faculty subscriptions. In law, business, and other schools in which faculty have university expense accounts, faculty may be accustomed to paying for personal subscriptions to core journals from the accounts. If the university acquires a campuswide network license for such journals, the faculty members may rely on the campus license and deploy their expense accounts for other purposes. By adjusting the expense account downward in light of the offering of campus licenses for journals, the university may reclaim some of the cost of the journals. On those campuses and in those departments in which faculty members do not have expense accounts and in which personal copies of core journals are necessary for scholarly success, the faculty salaries might be adjusted downward over a course of time to reflect the fact that faculty may use the campus license rather than pay for personal subscriptions. Indeed, when the personal subscriptions are not deductible under federal and state income taxes, the cost of subscriptions to the faculty in after-tax dollars may be greater than the cost to the university using before-tax dollars. As a result, a shift to university site licenses for core journals should be financially advantageous for faculty and the university.
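
The after-tax point can be illustrated with a small, purely hypothetical calculation; the subscription price and the combined marginal tax rate below are assumptions chosen only to show the mechanics.

    # Hypothetical illustration: a faculty member paying for a personal
    # subscription out of after-tax salary must earn more in pre-tax income
    # than the subscription price, while the university pays the price
    # directly with before-tax dollars. Figures are assumed, not actual.

    subscription_price = 150.00     # assumed price of a core-journal subscription
    marginal_tax_rate = 0.35        # assumed combined federal and state rate

    # Pre-tax salary required for the faculty member to net the price
    pre_tax_salary_needed = subscription_price / (1 - marginal_tax_rate)
    print(round(pre_tax_salary_needed, 2))   # -> 230.77

    # Cost to the university of covering the same title through a site
    # license, ignoring any premium the publisher might charge for it
    print(subscription_price)                # -> 150.0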

In sum, the university may find a number of ways to economize by shifting to digital journals distributed by network. Although direct subscription prices may go up in some cases, the university may trim technical and public services, save space, and offer more perquisites to faculty at some saving in cost.

Electronic Agent

Publishers could establish their own digital distribution function by creating a Uniform Resource Locator (URL) for each title. The publisher would deal directly with libraries and individual readers. For a number of reasons, the publisher is likely to prefer to work with an agent for electronic distribution. Just as the typesetting and printing are usually performed by contractors, so the design and distri-


121

bution of electronic products is likely to involve specialized agents. However, the role of electronic distribution agent is becoming more important than that of the printer for two important reasons. The first arises because of economies of scale in managing access to electronic services. The second concerns the potential advantages of integrating individual journals into a wider database of academic information. The electronic agent accepts materials, say journal titles, from publishers and mounts them on electronic services to be accessed by the Internet. The agent captures economies of scale in maintaining the service and in supporting a common payment mechanism and a common search interface and search engine, and may take other steps to integrate articles and journal titles so that the whole is greater than the sum of the parts.

OCLC was an early entrant in the market for electronic distribution of academic journals with Online Clinical Trials. Online Clinical Trials was priced at $220 for institutions and $120 for individuals.[26] OCLC shifted to a World Wide Web interface in January 1997. In 1998, OCLC's First Search Electronic Collections Online offers access to hundreds of titles from many publishers. Most of the journals deliver page images using Adobe's PDF. OCLC's new approach offers publishers the opportunity to sell electronic access to journals by both subscription and pay-per-look.[27] It charges libraries an access fee based on the number of simultaneous users to be supported and the number of electronic journals to which the library subscribes. Libraries buy subscriptions from publishers. Publishers may package multiple titles together and set whatever rates they choose. The following discussion puts the strategies of OCLC and other electronic agents in a broader context.

Storage and Networks

With electronic documents, there is a basic logistical choice. A storage-intensive strategy involves using local storage everywhere. In this case, the network need not be used to read the journal. At the other extreme, the document might be stored once-for-the-world at a single site with network access used each time a journal is read. Between these two extremes is a range of choices. With the cost saving of fewer storage sites comes the extra cost of increased reliance on data communication networks.

Data storage is an important cost. Although the unit costs of digital storage have fallen and will continue to fall sharply through time, there is still a considerable advantage to using less storage. Data storage systems involve not simply the storage medium itself, but a range of services to keep the data on-line. A data center typically involves sophisticated personnel, backup and archiving activities, and software and hardware upgrades. If 10 campuses share a data storage facility, the storage cost per campus should be much less than if each provides its own. Having one storage site for the world might be the lowest storage cost per campus overall.


122

To use a remote storage facility involves data communication. The more remote the storage, the greater the reliance on data networks. A central problem for data communication is congestion. Data networks typically do not involve traffic-based fees. Indeed, the cost of monitoring traffic so as to impose fees may be prohibitive. Monitoring network traffic so as to bill individuals on the basis of use would require keeping track of the origin of each packet of data and accounting for it by tallying a register that notes source, time, and date. Because even simple mail messages may be broken into numerous packets for network shipment, the quantity of items to be tracked is far greater than for telephone calls. If every packet must go through the toll plaza, the opportunity for delay and single points of failure may be substantial. Because each packet may follow a different route, tracking backbone use with a tally on each leg would multiply the complexity. Traffic-based fees seem to be impractical for the Internet. Without traffic-based fees, individual users do not face the cost of their access. Like a driver on an urban highway at rush hour, each individual sees only his or her own trip, not the adverse effect of his or her trip in slowing others down. An engineering response to highway congestion is often to build more highways. Yet the added highways are often congested as well. In data networking, an engineering solution is to invent a faster network. Yet individuals deciding to use the network will see only their personal costs and so will have little incentive to economize. The demand for bandwidth on networks will surely grow with the pace of faster networks, for example, with personal videophones and other video-intensive applications. Without traffic-based pricing, congestion will be endemic in data networks.
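
A rough, hypothetical calculation suggests the scale of the accounting burden; the message size, per-packet payload, and daily message volume below are assumptions used only to illustrate why packet-level billing records would vastly outnumber call records.

    # Back-of-envelope comparison of billing-record volume: one record per
    # telephone call versus one record per packet. A 10 KB mail message and
    # roughly 1,460 bytes of payload per packet (typical of Ethernet-era
    # links) are assumed; all figures are illustrative only.

    import math

    message_bytes = 10 * 1024           # assumed size of a simple mail message
    payload_per_packet = 1460           # assumed usable payload per packet

    packets_per_message = math.ceil(message_bytes / payload_per_packet)
    print(packets_per_message)          # -> 8 packets, hence 8 billing records

    # If a campus sends 100,000 such messages a day, per-packet accounting
    # creates roughly 800,000 records where per-call accounting creates 100,000.
    messages_per_day = 100_000
    print(messages_per_day * packets_per_message)   # -> 800000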

Another response to network congestion is to build private networks with controlled access. Building networks dedicated to specific functions seems relatively expensive, but may be necessary to maintain a sufficient level of performance. Campus networks are private, and so access can be controlled. Perhaps investments in networking and technical change can proceed fast enough on individual campuses so as to allow the campus network to be reliable enough for access to journals and other academic information.

Because the telephone companies have launched data network services, they seem likely to introduce time-of-day pricing. Higher rates in prime time and higher rates for faster access speeds are first steps in giving incentives to economize the use of the network and so to reduce congestion. America Online (AOL) ran into serious difficulty when, in late 1996, it shifted from a per hour pricing strategy to a flat monthly rate to match other Internet service providers. AOL was swamped with peak period demand, demand it could not easily manage. The long distance telephone services seem to be moving to simpler pricing regimes, dime-a-minute, for example. The possibility of peak period congestion, however, likely means that some use of peak period pricing in telephones and in network services will remain desirable. In the end, higher education's ability to economize on data storage will depend on the success of the networks in limiting congestion.


123

Figure 6.4. Network Intensity and Database Integration

Some milestones in the choice of storage and networks are illustrated along the horizontal margin of Figure 6.4. The rapid growth of the World Wide Web in the last couple of years has represented a shift toward the right along this margin, with fewer storage sites and more dependence on data communication. The World Wide Web allows a common interface to serve many computer platforms, replacing proprietary tools. Adobe's Portable Document Format (PDF) seems to offer an effective vehicle to present documents in original printed format with equations, tables, and graphics, yet allow text searching and hypertext links to other Web sites. The software for reading PDF documents is available without charge, is compatible with many Web browsers, and allows local printing. Some of the inconveniences of older network-based tools are disappearing.

That rightward shift may offer the electronic agent an advantage over either the publisher or the library. That is, the electronic agent may acquire rights from publishers and sell access to libraries, while taking responsibility for an optimal


124

choice of storage sites and network access. Storage might end up in a low-cost location with the electronic agent responsible for archiving the material and migrating the digital files to future hardware and software environments.

Integration into a Database

The second advantage for an electronic agent is in integrating individual journal titles and other electronic materials into a coherent database. The vertical margin of Figure 6.4 sketches a range of possibilities. At root, a journal title stands as a relatively isolated vehicle for the distribution of information. In the digital world, each title could be distributed on its own CD or have its own URL on the Web. Third party index publishers would index the contents and provide pointers to the title and issue and, perhaps, to the URL. Indeed, the pointer might go directly to an individual article.

However, relatively few scholars depend on a single journal title for their work. Indeed, looking at the citations shown in a sampling of articles of a given journal reveals that scholars typically use a range of sources. A database that provides coherent access to several related journals, as in the second tier of Figure 6.4, offers a service that is more than the sum of its parts.

At yet a higher level, an agent might offer a significant core of the literature in a discipline. The core of journals and other materials might allow searching by words and phrases across the full content of the database. The database then offers new ways of establishing linkages.

At a fourth level, the organizing engine for the database might be the standard index to the literature of the discipline, such as EconLit in economics. A search of the database might achieve a degree of comprehensiveness for the published literature. A significant fraction of the published essays might be delivered on demand by hitting a "fulfill" button. Fulfillment might mean delivery of an electronic image file via network within a few seconds or delivery of a facsimile within a few minutes or hours.

At a fifth level, the database might include hot links from citations in one essay to other elements of the database. The database might include the published works from journals with links to ancillary materials, numeric data sets, computer algorithms, and an author's appendices discussing methods and other matters. The database might invite commentary, and so formal publications might link to suitably moderated on-line discussions.

In integrating materials from a variety of sources into a coherent database, the electronic agent may have an advantage over publishers who offer only individual journal titles. The agent might set standards for inclusion of material that specify metatags and formats. The agent might manage the index function; indeed, the index might be a basis for forward integration with database distribution, as EI has done. This issue is discussed more fully below.

Integration of diverse materials into a database is likely to come with remote


125

storage and use of networks for access. Integrating the material into a database by achieving higher levels of coherence and interaction among diverse parts may cost less for an electronic agent than for publishers of individual journals or for individual libraries. The agent is able to incur the cost of integration and storage once for the world.

Agent's Strategy

Given the interest of publishers in licensing their products for campus intranets and the universities' interest in securing such licenses, there is opportunity for enterprises to act as brokers, to package the electronic versions of the journals in databases and make them accessible, under suitable licenses, to campus intranets. The brokers may add a markup to reflect their cost of mounting the database. The size of the markup will reflect the extent of integration as well as the choice of storage strategy.

SilverPlatter became the most successful vendor of electronic index databases by making them available on CDs for use on campus intranets with proprietary software. OCLC plays an important role in offering such databases from its master center in Ohio. Ovid, a third vendor, supports sophisticated indexing that integrates full text with Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML) tagging. A number of other vendors have also participated in the index market and are likely to seek to be brokers for the electronic distribution of journals.

A core strategy will probably be to mount the database of journals on one or more servers on the World Wide Web, with access limited to persons authorized for use from licensed campuses or through other fee-paid arrangements. This strategy has three important parts: (1) the database server, (2) the Internet communication system, and (3) the campus network.

The advantage of the World Wide Web approach is that the data can be made accessible to many campuses with no server support on any campus. A campus intranet license can be served remotely, saving the university the expense of software, hardware, and system support for the service.
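
The essay does not specify how a remotely hosted service would recognize authorized users; one common possibility, sketched below purely as an assumption, is to check whether the requesting address falls within network ranges registered for a licensed campus. The campus names and address ranges are invented.

    # Minimal sketch, assuming IP-range authorization, of how a remote
    # server might enforce a campus intranet license without any software
    # installed on campus. Campuses and address ranges are hypothetical.

    from ipaddress import ip_address, ip_network
    from typing import Optional

    LICENSED_CAMPUSES = {
        "Example State University": [ip_network("192.0.2.0/24")],
        "Example College": [ip_network("198.51.100.0/25")],
    }

    def licensed_campus(request_ip: str) -> Optional[str]:
        """Return the licensed campus whose range contains the address, if any."""
        addr = ip_address(request_ip)
        for campus, networks in LICENSED_CAMPUSES.items():
            if any(addr in net for net in networks):
                return campus
        return None

    print(licensed_campus("192.0.2.17"))    # -> Example State University
    print(licensed_campus("203.0.113.9"))   # -> None; not a licensed address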

The risk of the Web strategy lies with the Internet itself and its inherent congestion. OCLC used a private data communication network so as to achieve a higher level of reliability than the Internet and will do the same to ensure high-quality TCP/IP (the Internet protocol suite) access. Some campuses may prefer to mount database files locally, using CD-ROMs and disk servers on the campus network. Some high-intensity campuses may prefer to continue to mount the most heavily used parts of databases locally, even at extra cost, as insurance against deficiencies in Internet services.

The third element, after storage and the Internet, is the campus network. Campus networks continue to evolve. Among the hundred universities seeking to be top-ten universities, early investment in sophisticated networking may play a


126

strategic role in the quest for rank. On such campuses, network distribution of journals should be well supported and popular. Other campuses will follow with some lag, particularly where funding depends primarily on the public sector. Adoption within 10 years might be expected.[28]

The electronic agent, then, must choose a strategy with two elements: (1) a storage and network choice and (2) an approach to database integration.

Journal publishers generally start at the bottom left of Figure 6.4, the closest to print. They could make a CD and offer it as an alternative to print for current subscribers. The AEA offers the Journal of Economic Literature on CD instead of print for the same price.

Moves to the upper left seem to be economically infeasible. Integrating more materials together increases local storage costs and so tilts the storage-network balance toward less storage and more network. With more data integration, the agent's strategy will shift to the right.

Moves to the lower right, with reduced storage costs and more dependence on networks, should involve considerable cost savings but run risks. One risk is of network congestion. A second is of loss of revenues because traditional subscribers drop purchases in favor of shared network access. The viability of these strategies depends on the level of fees that may be earned from network licenses or pay-per-look.

Moves along the diagonal up and to the right involve greater database integration with cost savings from lower storage costs and more dependence on networks. The advantage of moves upward and to the right is the possibility that integration creates services of significantly more value than the replication of print journals on the Internet. When database integration creates significantly more value, subscribers will be willing to pay premium prices for products that rely on remote storage and network delivery. Of course, network congestion will remain a concern.

A move toward more database integration raises a number of interesting questions. The answers to these questions will determine the size of the markup by the electronic agent. How much should information from a variety of sources be integrated into a database with common structure, tags, and linkages? For a large database, more effort at integration and coherence may be more valuable. Just how much effort, particularly how much hand effort, remains an open question. If the electronic agent passively accepts publications from publishers, the level of integration of materials may be relatively low. The publisher may provide an abstract and metatags and might provide URLs for linking to other network sites. The higher level of integration associated with controlled vocabulary indexing and a more systematic structure for the database than comes from journal titles would seem to require either a higher level of handwork by an indexer or the imposition of standard protocols for defining data elements. Is a higher level of integration of journal material from a variety of sources sufficiently valuable to justify its cost? The index function might be centralized with storage of individual journals distributed around the Net. Physical integration of the


127

database is not necessary for logical integration, but will common ownership be required to achieve the control and commonality that high levels of integration demand?

A second question concerns how an agent might generate a net revenue stream from its initial electronic offerings sufficient to allow it to grow. The new regime will not arrive as a whole entity; rather, it will evolve in relatively small steps. Each step must generate a surplus to be used to finance the next step. Early steps that generate larger surpluses will probably define paths that are more likely to be followed. Experimentation with products and prices is already under way. Those agents finding early financial success are likely to attract publishers and libraries and to be imitated by competitors.

JSTOR has captured the full historic run of a significant number of journals and promises 100 titles, in suites from major disciplines, within three years. However, it does not yet have a program for access to current journals. Its program is primarily to replace archival storage of materials that libraries may or may not have already acquired in print.

OCLC's approach is to sell libraries access services while publishers sell subscriptions to the information. The publisher can avoid the cost of the distribution in print, a saving if the electronic subscriptions generate sufficient revenue. The unbundling of access from subscription sales allows the access to be priced on the basis of simultaneous users, that is, akin to the rate of use, while the information is priced on the basis of quantity and quality of material made available. Of course, the information may also be priced on a pay-per-look basis and so earn revenue as it is used. What mix of pay-per-look and subscription sales will ultimately prevail is an open question.

A third question is whether publishers will establish exclusive arrangements with electronic agents or whether they will offer nonexclusive licenses so as to sustain competition among agents. Some publishers may prefer to be their own electronic agents, retaining control of the distribution channels. If database integration is important, this strategy may be economic only for relatively large publishers with suites of journals in given disciplines. Many publishers may choose to distribute their products through multiple channels, to both capture the advantages of more integration with other sources and promote innovation and cost savings among competing distributors.

As the electronic agents gain experience and build their title lists, competition among them should drive down the markups for electronic access. If the store-once-and-network strategy bears fruit, the cost savings in access should be apparent. If higher levels of database integration prove to be important, the cost savings may be modest. Cost savings here are measured per unit of access. As the cost of access falls, the quantity of information products used may increase. The effect on total expenditure, the product of unit cost and number of units used, is hard to predict. If the demand for information proves to be price elastic, then as unit costs and unit prices fall, expenditures on information will increase.
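
The elasticity point can be illustrated with a constant-elasticity demand curve; the scale constant and the elasticity of 1.5 in the sketch below are hypothetical.

    # Illustration with constant-elasticity demand q = A * p**(-e): when
    # e > 1, a fall in the unit price raises total expenditure p * q.
    # The scale constant and the elasticity are hypothetical.

    def expenditure(price: float, scale: float = 1_000.0, elasticity: float = 1.5) -> float:
        quantity = scale * price ** (-elasticity)
        return price * quantity

    print(round(expenditure(10.0), 2))   # -> 316.23
    print(round(expenditure(5.0), 2))    # -> 447.21; price halved, spending rises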


128

The electronic agents will gather academic journals from publishers and distribute them in electronic formats to libraries and others. They will offer all available advantages of scale in managing electronic storage, optimize the use of networks for distribution, offer superior search interfaces and engines, and take steps to integrate materials from disparate sources into a coherent whole. The agent will be able to offer campus intranet licenses, personal subscriptions, and pay-per-look access from a common source. The agent may manage sales, accounting, billing, and technical support. Today, agents are experimenting with both technical and pricing strategies. It remains to be seen whether single agents will dominate given content areas, whether major publishers can remain apart, or whether publishers and universities can or should sustain a competitive market among agents.

Conclusion

Higher education faces a significant challenge in discovering what academic information will succeed on the Net. In 1996, the MIT Press launched Studies in Nonlinear Dynamics and Econometrics (SNDE), one of six titles that the Press distributes by network. The price per year is $40 for individuals and $130 for libraries.[29] MIT's strategy seems to be to launch titles in disciplines in which an electronic journal has some extra value, for example, links to computer code and data sets. The rates for the journals seem to be well below those quoted by OCLC's electronic journal program and lower than at least some new print journals. The cost of launching a new journal electronically seems to be falling. It remains to be seen whether the electronic journals will attract successful editors and valued manuscripts from authors, but the venture shows promise. The number and quality of electronic journals continues to grow. MIT has decided to forgo the use of an electronic agent and to depend only on conventional, independent indexing services for database integration, an incremental approach. Yet, the potential seems greater than an individual journal title reveals.

When Henry Ford launched the first mass-produced automobile, he chose a design that carried double the load, traveled three times farther, and went four times faster than the one-horse buggy it replaced, and yet was modestly priced. Successful digital information products for academia seem likely to exploit the inherent advantages of the digital arena, the timeliness, the sophisticated integration of new essays into the existing stock, the links from brief front-end items to more elaborate treatment, and the opportunity to interact with the material by asking for "fulfillment," "discussion," and the "underlying data." Network delivery will make possible both the campus intranet license and the sale of information on a pay-per-look basis. It will allow the material to be more readily consulted in circles beyond the academy.

Electronic agents will play significant new roles as intermediaries between publishers and campuses by handling the electronic storage and distribution and by integrating material into a more coherent whole. Universities and their libraries


129

will make adjustments in operations so as to expend less on conventional activities and more on digital communication.

Of course, there are unknowns. Agents and publishers will experiment to discover optimal pricing strategies. Agents will explore different ways of storing and delivering electronic products and different approaches to integration. Campuses and libraries will consider just what extra dimensions of service are worth their price. The process here is one of bringing order, meaning, and reliability to the emerging world of the Internet, of discovering what sells and what doesn't.

In the end, universities should be drawn to the electronic information services because of their superiority in instruction, their reach beyond the academy, and their power in the creation of new ideas. American higher education is largely shaped by competitive forces-the competition for faculty, students, research funding, and public and philanthropic support. In different ways, the private and public sector, the large institutions and the small, the two-year and four-year institutions all share the goal of doing a better, more cost-effective job of expanding the human potential. When artfully done, the digital sharing of ideas seems likely to expand that potential significantly.

I appreciate the help of Elton Hinshaw and the American Economic Association in understanding its operations, and the comments of Paul Gherman, David Lucking-Reiley, and Flo Wilson on an earlier draft of this essay.


133

Chapter 7—
JSTOR
The Development of a Cost-Driven, Value-Based Pricing Model

Kevin M. Guthrie

In the spring of 1996, when I was first asked for this contribution and was informed that my topic was pricing and user acceptance, I remember thinking it was quite a leap of faith, since JSTOR had neither a business model with prices, nor users. And we surely did not have user acceptance. Much has happened in a relatively short period of time, most notably the fact that JSTOR signed up 199 charter participants during the first three months of 1997. Our original projections were to have 50 to 75 participating institutions, so we are very encouraged to be off to such a good start.

The purpose of this brief case report is to summarize how JSTOR's economic model was developed, what we have learned along the way, and what we think the future challenges are likely to be. JSTOR is a work-in-progress, so it is not possible, nor would it be wise, to try to assert that we have done things "right." The jury is out and will be for quite some time. My goal is only to describe our approach to this point in the hope that doing so will provide useful experience for others working in the field of scholarly communication. In providing this summary I will try not to stray far from the organizing topic assigned to me-pricing and user acceptance-but I think it is impossible to separate these issues from more general aspects of a not-for-profit's organizational strategy and particularly its mission.

History

JSTOR began as a project of The Andrew W. Mellon Foundation designed to help libraries address growing and persistent space problems. Couldn't advances in technology help reduce the systemwide costs associated with storing commonly held materials like core academic journals? A decision was made to test a prototype system that would make the backfiles of core journals available in electronic


134

form. Mellon Foundation staff signed up journal publishers in history and economics and, working through a grant to the University of Michigan, began to create a database with associated controlling software that was made available to several test site libraries. It became evident very soon both that the concept was extremely complicated to implement and that it held great promise.

JSTOR was established as an independent not-for-profit organization with its own board of trustees in August 1995. From the outset, JSTOR was given the charge to develop a financial plan that would allow it to become self-sustaining-the Mellon Foundation was not going to subsidize the concept indefinitely. At the same time, JSTOR is fortunate to have had Mellon's initial support because enormous resources have been invested in getting the entity launched that never have to be paid back. Apart from the direct investments of funds in the development of software, production capacity, and mirror sites through grants to Michigan and Princeton, there were large investments of time and effort by Mellon Foundation staff. JSTOR has received, in effect, venture capital for which it need not produce an economic return. We have tried to translate these initial grants into lower prices for the services that we provide to JSTOR participants.

Defining The "Product"

Although JSTOR does not have to repay initial investments, it must have a mechanism to recover its ongoing costs. In developing a plan for cost recovery, our first step was to define exactly what it is that our "customers" would pay for-what is the "product"? On the face of it, this step sounds simple, but it is anything but that, especially given the rate of change of technology affecting the Internet and World Wide Web. For example, those publishers reading this paper who are working to put current issues in electronic form will know that even choosing the display format can be extremely difficult. Should the display files be images or text? If text, should they be SGML, PDF, HTML, SGML-to-HTML converted in advance, SGML-to-HTML converted on the fly, or some combination of these or other choices? The format that is chosen has far-reaching implications for present and future software capabilities, charging mechanisms, and user acceptance. It is easy to imagine how this decision alone can be paralyzing.

For nonprofit institutions like JSTOR, a key guidepost for making decisions of this type is the organization's mission. Nonprofits do not set out to maximize profits or shareholder wealth. In fact, they have been created to provide products or services that would not typically be made available by firms focused on maximizing profit. Consequently, not-for-profits cannot rely solely on quantitative approaches for decision making, even when such decisions are quantitative or financial in nature. Without such tools, having a clearly defined mission and using it to inform decisions is essential.

A good example of how JSTOR has relied on its mission for decision making is the question mentioned briefly above-choosing an appropriate display format.


135

We have decided to use a combination of images and text for delivery of the journal pages. We provide the images for display-so a user reads and can print a perfect replication of the original published page-and in the background we allow users to search the full text. This decision has been criticized by some people, but it is an appropriate approach for us, given the fact that our goal is to be a trusted archive and because JSTOR is now chiefly concerned with replicating previously published pages. There would be benefits to tagging the full text with SGML and delivering 100% corrected text files to our users, but because we also are committed to covering our costs, that approach is not practical. We are building a database of millions of pages and the effort required to do so is enormous. Digitizing even a single JSTOR title is a substantial undertaking. I have heard some people wonder why JSTOR is including "only" 100 journals in its first phase when other electronic journal initiatives are projecting hundreds, even thousands of journals. Presently, the 20 JSTOR journals that are available on-line have an average run of more than 50 years. So converting a single title requires roughly 30 to 50 times the effort of publishing an electronic version of a single year of a journal. That imposes very real constraints.

Having a clear understanding of our fundamental mission has also allowed us to remain flexible as we confront a rapidly evolving environment. Trying to keep up with the technology is a never-ending task. We work hard to remain open to change, and at the same time we are committed to using the appropriate technology to fulfill our objective-no more, no less. Progress can grind to a halt quickly when so much is unknown and so much is changing, but our simple goal is to keep making progress. We recognize that by pushing forward relentlessly we will make some mistakes, but we are convinced that we cannot afford to stop moving if we are to build something meaningful in this dynamic environment.

So we established goals consistent with our mission and have made adjustments as we have gained experience. As mentioned previously, one of our fundamental goals is to serve as a trusted archive of the printed record. That means that output produced by the database has to be at least as good as the printed journals. A key determining factor in the quality of JSTOR printouts is the initial resolution at which the journal pages are scanned. Our original inclination was to scan pages at a resolution of 300 dots per inch (dpi). Anne Kenney[1] was a key advocate for scanning at 600 dpi when most people advised that 300 dpi was adequate and 600 dpi too expensive. Kenney made a strong case that scanning at 600 dpi is not just better than scanning at 300 dpi, but that, for pages composed mainly of black-and-white text, there are rapidly diminishing perceivable improvements in the appearance of images scanned at resolutions greater than 600 dpi. It made sense, given the predominance of text in our database, to make the additional investment to gain the assurance that the images we were creating would continue to be acceptable even as technologies continued to improve. We are pleased that we made this


136

choice; the quality of output now available from the JSTOR database is generally superior to a copy made from the original.

Another illustration of how it has been important for us to remain flexible concerns delivery of current issues. In the early days of JSTOR, several scholarly associations approached us with the idea that perhaps we could publish their current issues. The notion of providing scholars with access to the complete run of the journal-from the current issue back to the first issue-had (and has) enormous appeal. On the face of it, it seemed to make sense for JSTOR also to mount current issues in the database, and we began to encourage associations to think about working with us to provide both current issues and the backfiles. It was soon evident, however, that this direction was not going to work for multi-title publishers. These publishers, some of which publish journals owned by other entities such as scholarly associations, justifiably regarded a JSTOR initiative on current issues to be competition. They were not about to provide the backfile of a journal to us only to risk that journal's owners turning to JSTOR for electronic publication of current and future issues. Again, we had to make adjustments. We are now committed to working with publishers of current issues to create linkages that will allow seamless searches between their data and the JSTOR archive, but we will not ourselves publish current issues.[2] If we are to have maximum positive impact on the scholarly community, we must provide a service that benefits not only libraries and scholars but also publishers of all types, commercial and not-for-profit, multi-title and single-title. It is part of having a systemwide perspective, something that has been a central component of our approach from JSTOR's first days.

Determining Viability

Once we had framed the basic parameters of what we were going to offer, the key question we had to ask ourselves was whether the organization could be economically viable. Unfortunately, definitive answers to this question are probably never known in advance. The fact of the matter is that during their earliest phase, projects like JSTOR, even though they are not-for-profit, are still entrepreneurial ventures. They face almost all the same risks as for-profit start-ups, and the same tough questions must be asked before moving forward. Is there a revenue-generating "market" for the service to be provided?[3] Does the enterprise have sufficient capital to fund up-front costs that will be incurred before adequate revenue can be generated? Is the market large enough to support the growth required to keep the entity vibrant?

Pursuing this analysis requires a complicated assessment of interrelated factors. What are the costs for operating the entity? That depends on how much "product" is sold. How much product can be sold, and what are the potential revenues? That depends on how it is priced. What should be the product's price? That depends on the costs of providing it. Because these factors are so closely related,


137

none of them can be analyzed in isolation from the others; however, it is natural for a not-for-profit project focused on cost recovery to begin its assessment with the expense side of the ledger.

Defining the Costs

When the product or service is one that has not previously been offered, projecting potential costs is more art than science. Even if one has some experience providing a version of the product, as JSTOR had because of the Mellon initiative, one finds that the costs that have been incurred during the initial start-up period are irregular and unstable and thus not reliable for projecting beyond that phase. Even now, with nearly 200 paying participants, we still have much to learn about what our stable running costs are likely to be.

What we have learned is that our costs fall into six categories:

1. Production: identifying, finding, and preparing the complete run; defining indexing guidelines to inform a scanning subcontractor; and performing quality control on the work of the scanning subcontractor.

2. Conversion: scanning, OCR, and inputting of index information to serve as the electronic table of contents (performed by a scanning subcontractor).

3. Storage and access: maintaining the database (at a number of mirror sites), which involves continuous updating of hardware and systems software.

4. Software development: migrating the data to new platforms and systems and providing new capabilities and features to maximize its usefulness to scholars as technological capabilities evolve.

5. User support: providing adequate user help desk services for a growing user base.

6. Administration and oversight: managing the overall operations of the enterprise.

Some of these costs are one-time (capital) expenditures and some of them are ongoing (operating) costs. For the most part, production and conversion (#1 and #2 above) are one-time costs. We hope that we are digitizing from the paper to the digital equivalent only once.[4] The costs in the other categories will be incurred regardless of whether new journals are added to the database and are thus a reflection of the ongoing costs of the enterprise.[5]

Because the most visible element of what JSTOR provides is the database of page images, many people tend to think that the cost of scanning is the only cost factor that needs to be considered. Although the scanning cost is relevant, it does not reflect the total cost of conversion for a database like JSTOR. In fact, scanning is not even the most expensive factor in the work done by our scanning contractor. During the conversion process, JSTOR's scanning vendor creates an electronic table of contents, which is just as costly as the scanning. In addition, because


138

creating a text file suitable for searching requires manual intervention after running OCR software, that step has proven to be even more expensive than scanning. All told, the direct incremental cost of creating the three-part representation of a journal page in the JSTOR database (page image, electronic table of contents entry, and text file) is approximately $.75 to $1.00 per page.

Payments to the scanning bureau do not represent the complete production cost picture. Converting 100,000 pages per month requires a full-time staff to prepare the journals and to give the scanning bureau instructions to ensure that table of contents and indexing entries are made correctly. At present production levels, these costs are approximately equal to the outlays made to the scanning bureau. On average then, JSTOR production costs approach $2.00 per page.
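
Taking the roughly $2.00-per-page figure above, a simple calculation gives a sense of the one-time cost of converting a long backfile; the pages-per-year and run-length figures below are assumptions, not JSTOR data.

    # Rough, hypothetical estimate of the one-time cost of converting a
    # journal backfile at about $2.00 per page (the all-in figure above).
    # Pages per year and run length are assumed for illustration.

    cost_per_page = 2.00       # approximate production cost per page (from text)
    pages_per_year = 1_000     # assumed: e.g., four issues of ~250 pages each
    years_in_run = 50          # assumed run length, roughly the JSTOR average

    total_pages = pages_per_year * years_in_run
    print(total_pages)                             # -> 50000 pages
    print(f"${total_pages * cost_per_page:,.0f}")  # -> $100,000 for one title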

Other costs of operating JSTOR are less easily segregated into their respective functional "departments." Our present estimates are that once all of the 100 Phase I journals are available in the database, operating costs (independent of the one-time costs associated with production) will be approximately $2.5 million annually.

Defining Pricing

On the one hand, the obvious goal is to develop a pricing plan that will cover the $2.5 million in projected annual expenses plus whatever one-time production-related expenses are incurred in converting the journals. These production costs, of course, depend on the rate at which the content is being digitized. For projects designed to recover costs by collecting fees from users, it is also important to assess whether the value of the service to be provided justifies the level of expenditures being projected.

In JSTOR's case, we evaluated the benefits to participants of providing a new and more convenient level of access to important scholarly material while also attempting to calculate costs that might be saved by participants if JSTOR allowed them to free expensive shelf space. A central part of the reason for our founding was to provide a service to the scholarly community that would be both better and cheaper. That goal is one that remains to be tested with real data, but it can and will be tested as JSTOR and its participating institutions gain more experience.

Our initial survey of the research indicated that the cost of library shelf space filled by long runs of core journals was substantial. Using a methodology devised by Malcolm Getz at Vanderbilt and cost data assembled by Michael Cooper at UC-Berkeley, we estimated that the capital cost for storing a single volume ranged between $24 and $41.[6] It follows that storing the complete run of a journal published for 100 years costs the holding institution between $2,400 and $4,100. In addition, operating costs associated with the circulation of volumes are also significant, and resources could be saved by substituting centrally managed electronic access to the material. Estimates of these costs for some of our original test site libraries indicated that costs in staff time for reshelving and other maintenance functions ranged from $45 annually for a core journal at a small college to $180


139

per title at a large research library with heavy use. These estimates of savings do not take into account the long-term costs of preservation or the time saved by users in finding articles of interest to them.

Although these estimates were not used to set prices, they did give us confidence that a pricing strategy could be developed that would offer good value for participating institutions. We set out to define more specifically the key components of the service we would offer and attempted to evaluate them both in the context of our mission and our cost framework. We found that deciding how to price an electronic product was extraordinarily complex, and it was clear that there was no correct answer. This list is by no means exhaustive, but here are some of the key factors that we weighed in our development of a pricing approach:

• Will access be offered on a pay-per-use model, or by subscription, or both?

• If by subscription, will the resource be delivered to individuals directly or via a campus site license?

• If by site license, how is the authorized community of users defined?

• Will there be price differentiation or a single price?

• If the price varies in some way for different types of licensees, what classifying approach will be used to make the determinations?

In making decisions, we weighed the merits of various options by evaluating which seemed most consistent with JSTOR's fundamental objectives. For example, we wanted to provide the broadest possible access to JSTOR for the academic community. Because pricing on a pay-per-use model usually yields prices higher than the marginal cost of providing the product, we determined that this approach was not consistent with our goal. We did not want to force students and scholars to have to decide whether it would really be "worth it" to download and print an article. We wanted to encourage liberal searching, displaying, and printing of the resource. In a similar vein, we concluded that it would be better to begin by offering institutional site licenses to participating institutions. We defined the site license broadly by establishing that authorized users would consist of all faculty, staff, and students of the institution, plus any walk-up patrons using library facilities.[7]

Another decision made to encourage broad access was our determination that different types of users should pay different prices for access. This approach is called price differentiation, which is very common in industries with high fixed costs and low marginal costs (like airlines, telecommunications, etc.). We decided to pursue a value-based pricing approach that seeks to match the amount that institutions would contribute with the value they would receive from participation. By offering different prices to different classes of institutions, we hoped to distribute the costs of operating JSTOR over as many institutions as possible and in a fair way.


140

Once we had decided to offer a range of price levels, we had to select an objective method to place institutions into different price categories. We chose the Carnegie Classification of Institutions of Higher Education for pricing purposes. Our reason for choosing the Carnegie Classes was that these groupings reflect the degree to which academic institutions are committed to research. Because the JSTOR database includes journals primarily used for scholarly research and would therefore be most highly valued by research institutions, the Carnegie Classes offered a rubric consistent with our aims. In addition to the Carnegie Classes, JSTOR factors in the FTE enrollment of each institution, making adjustments that move institutions with smaller enrollments into classes with lower price levels. We decided to break higher education institutions into four JSTOR sizes: Large, Medium, Small, and Very Small.

Having established four pricing classes and a means for determining what institutions would fill them, we still had to set the prices themselves. In doing so, we thought about both the nature of our cost structure and the potential for revenue generation from the likely community of participants. We noted immediately that the nature of JSTOR's cost structure for converting a journal-a large one-time conversion cost followed by smaller annual maintenance costs-was matched by the nature of the costs incurred by libraries to hold the paper volumes. In the case of libraries holding journals, one-time or capital costs are reflected in the cost of land, building, and shelves, while annual outlays are made for such items as circulation/reshelving, heat, light, and electricity. We decided, therefore, to establish a pricing approach with two components: a one-time fee (which we called the Database Development Fee, or DDF) and a recurring fee (which we called the Annual Access Fee, or AAF).

But what should those prices be? As mentioned previously, the long-term goal was to recover $2.5 million in annual fees while also paying the one-time costs of converting the journals to digital formats. Because it was impossible to model potential international interest in JSTOR, we limited our plan to U.S. higher education institutions. We conducted an assessment of the potential number of participants in each of our four pricing classifications. The number of U.S. higher education institutions in each category is shown in Table 7.1.

After thorough analysis of various combinations of prices, participation levels, and cost assumptions, we arrived at a pricing plan that we felt offered a reasonable chance of success. One other complicating aspect that arose as we developed the plan was how to offer a one-time price for a resource that was constantly growing. To deal with that problem, we defined our initial product, JSTOR-Phase I, as a database with the complete runs of a minimum of 100 titles in 10 to 15 fields. We promised that this database would be complete within three years. Prices for participation in JSTOR-Phase I are shown in Table 7.2.

These prices reflect the availability of the complete runs of 100 titles. For a Large institution, perpetual access to 80 years of the American Economic Review (1911-1991) would cost just $400 one-time and $50 per year. For a Small institution,


141
 

TABLE 7.1. Number of U.S. Higher Education Institutions by JSTOR Class

JSTOR Class       Number of Institutions
Large                        176
Medium                       589
Small                        166
Very Small                   471
Total                      1,402

 

TABLE 7.2. JSTOR Prices-Phase I

JSTOR Class       One-Time Database Development Fee (DDF)       Annual Access Fee (AAF)
Large                            $40,000                               $5,000
Medium                            30,000                                4,000
Small                             20,000                                3,000
Very Small                        10,000                                2,000

the cost would be only $200 one-time and $30 per year. For comparison, consider that purchasing microfilm costs more but offers far less convenient access. Also, institutions that find it possible to move print copies to less expensive warehouses or even to remove duplicate copies from library shelves will capture savings consisting of some or all of the shelving and circulation costs outlined earlier in this paper. (For 80 volumes, that analysis projected capital costs of $24 to $41 per volume, or $1,920 to $3,280 for an 80-volume run; annual circulation costs were estimated at $180 per year for a Large institution.)
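The per-title arithmetic above can be checked directly against Table 7.2. A minimal sketch, using only the figures quoted in this chapter:

PHASE_I_TITLES = 100  # Phase I promises complete runs of at least 100 titles

FEES = {  # (one-time DDF, annual AAF) from Table 7.2
    "Large": (40000, 5000),
    "Medium": (30000, 4000),
    "Small": (20000, 3000),
    "Very Small": (10000, 2000),
}

for cls in ("Large", "Small"):
    ddf, aaf = FEES[cls]
    print(f"{cls}: ${ddf / PHASE_I_TITLES:.0f} one-time and "
          f"${aaf / PHASE_I_TITLES:.0f} per year per title")
# Large: $400 one-time and $50 per year per title
# Small: $200 one-time and $30 per year per title

# Print-holding costs for an 80-volume run, from the earlier analysis:
capital_low, capital_high = 24 * 80, 41 * 80   # $1,920 to $3,280 in capital costs
annual_circulation = 180                        # per year, Large institution
print(f"Print holdings: ${capital_low:,}-${capital_high:,} capital, ${annual_circulation}/year circulation")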

We purposely set our prices low in an effort to involve a maximum number of institutions in the endeavor. We are often asked how many participating institutions are needed for JSTOR to reach "breakeven." Because the total revenue generated will depend upon the distribution of participants in the various class sizes, there is no single number of libraries that must participate for JSTOR to reach a self-sustaining level of operations. Further, since our pricing has both one-time and recurring components, breakeven could be defined in a number of ways. One estimate would be to say that breakeven will be reached when revenues from annual access fees match non-production-related annual operating expenditures (since the production-related costs are primarily one-time). Although this guide is useful, it is not totally accurate because, as mentioned previously, there are costs related to production that are very difficult to segregate from other expenses. Another approach would be to try to build an archiving endowment and to set a target endowment size that would support the continuing costs of maintaining and migrating the Phase I archive, even if no additional journals or participants were


142

added after the Phase I period. Our plan combines these two approaches. We believe it is important to match the sources of annual revenues to the nature of the purposes for which they will be used. We require sufficient levels of annual inflows to cover the costs of making JSTOR available to users (user help desk, training, instruction, etc.). These inflows should be collected by way of annual access fees from participants. There is also, however, the archiving function that JSTOR provides, which is not directly attributable to any particular user. Like the role that libraries fill by keeping books on the shelves just in case they are needed, JSTOR's archiving is a public good. We must build a capital base to support the technological migration and other costs associated with this archiving function.
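As a rough illustration of the breakeven logic just described, the sketch below checks a hypothetical mix of participants against the $2.5 million annual-fee goal mentioned earlier. The participant counts are invented for illustration only; they are not JSTOR's actual figures.

AAF = {"Large": 5000, "Medium": 4000, "Small": 3000, "Very Small": 2000}
DDF = {"Large": 40000, "Medium": 30000, "Small": 20000, "Very Small": 10000}

# Hypothetical participant mix (not actual JSTOR data).
participants = {"Large": 150, "Medium": 250, "Small": 80, "Very Small": 120}
annual_fee_goal = 2_500_000  # long-term annual-fee target stated in this chapter

annual_revenue = sum(AAF[c] * n for c, n in participants.items())
one_time_revenue = sum(DDF[c] * n for c, n in participants.items())

print(f"Annual access fees: ${annual_revenue:,}")
print(f"One-time fees toward conversion costs and an archiving endowment: ${one_time_revenue:,}")
print("Meets the annual-fee goal:", annual_revenue >= annual_fee_goal)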

As with other aspects of our organizational plan, we remain open to making adjustments in pricing when doing so is fair and appropriate and does not put our viability at risk. One step we took was to offer a special charter discount for institutions that chose to participate in JSTOR prior to April 1, 1997. We felt it was appropriate to offer this discount in recognition of participants' willingness to support JSTOR in its earliest days. We have also made minor adjustments in how Carnegie Classes are slotted into the JSTOR pricing categories. In our initial plan, for example, we included all Carnegie Research (I and II) and Doctoral (I and II) institutions in the Large JSTOR category. Subsequent conversations with librarians and administrators made it clear that including Doctoral II institutions in this category was not appropriate: there proved to be a significant difference in the nature of these institutions and in the resources they invest in research, so they were moved to the Medium class. Any such adjustments have been made not for a single institution, but for all institutions that share a definable characteristic. We strive to be fair; therefore, we do not negotiate special deals.

One component of our pricing strategy needs some explanation because it has been a disappointment to some people: JSTOR's policy toward consortia. JSTOR's pricing plan was developed to distribute the costs of providing a shared resource among as many institutions as possible. The same forces that have encouraged the growth of consortia-namely, the development of technologies to distribute information over networks-are also what make JSTOR possible. It is not necessary to have materials shelved nearby in order to read them. A consequence of this fact is that the marginal costs of distribution are low and the economies of scale substantial. Those benefits have already been taken into account in JSTOR's economic model. In effect, JSTOR is itself a consortial enterprise that has attempted to spread its costs over as much of the community as possible. Offering further discounts to large groups of institutions would put JSTOR's viability at risk, and with it the potential benefits to the scholarly community.

A second significant factor that prevents JSTOR from offering access through consortia at deep discounts is that the distribution of organizations in consortia is uneven and unstable. Many institutions are members of several consortia, while some are in none at all (although there are increasingly few of those


143

remaining). If the consortial arrangements were more mature and if there were a one-to-one relationship between the institutions in JSTOR's community and consortial groups, it might have been possible for JSTOR to build a plan that would distribute costs fairly across those groups. If, for example, every institution in the United States were a member of one of five separate consortia, a project like JSTOR could divide its costs by five and a fair contribution could be made by all. But there are not five consortia; there are hundreds. The patchwork of consortial affiliations is so complex that it is extremely difficult, if not impossible, to establish prices that will be regarded as fair by participants. JSTOR's commitment to share as much of what it learns with the scholarly community as possible requires that there be no special deals, that we be open about the contributions that institutions make and their reasons for making them. Our economic model would not be sustainable if two very similar institutions contributed different amounts simply because one was a member of a consortium that drove a harder bargain. Instead, we rely on a pricing unit that is easily defined and understood-the individual institution. And we rely on a pricing gradient, the Carnegie Classification, which distributes those institutions objectively into groupings that are consistent with the nature and value of our resource.

Conclusion

The initial response to JSTOR's charter offer in the first three months of this year is a strong signal that JSTOR will be a valued resource for the research community; however, it is still far too early to comment further on "user acceptance." Tom Finholt and JoAnn Brooks's research (see chapter 11) into usage at the test site libraries provides a first snapshot, but this picture was taken prior to there being any effort to increase awareness of JSTOR in the community and on the specific campuses. There is much to learn. JSTOR is committed to tracking usage data both for libraries and publishers and to providing special software tools to enable users to create usage reports tailored to their own needs and interests. We will continue to keep the academic community informed as we learn more.

While we are encouraged by the positive reaction of the library community to JSTOR, we recognize that this good start has raised expectations and has created new challenges. In addition to the challenges of reaching our 100-title goal before the end of 1999, trying to encourage the next 200 libraries to participate, and keeping up with changing technologies, we face other complex challenges, including how to make JSTOR available outside the United States and how to define future phases of JSTOR. Addressing these issues will require the development of new strategic plans and new economic and pricing models. In creating those plans, we know that we will continue to confront complicated choices. As we make decisions, we will remain focused on our mission, making adjustments to our plans as required to keep making progress in appropriate ways.


144

145

Chapter 8—
The Effect of Price:
Early Observations

Karen Hunter

Introduction

Scientific journal publishers have very little commercial experience with electronic full-text distribution, and it is hard, if not impossible, to isolate the effect of pricing on user acceptance and behavior from other influences. Most experiments or trial offers have been without charge to the user. Most paid services have targeted institutional rather than individual buyers. Nevertheless, we can look at some of the known experiences and at ongoing and proposed experiments to get some sense of the interaction of pricing and acceptance and of the other factors that seem to affect user behavior. We can also look at institutional buying concerns and pricing considerations.

In the Basic Paper World

Many journals have offered reduced prices to individuals. In the case of journals owned by societies or other organizations, there are generally further reductions in the prices for members. It is important to the society that members not only receive the lowest price but can clearly see that price as a benefit of membership. The price for members may be at marginal cost, particularly if (1) the size of the membership is large, (2) subscriptions are included as a part of the membership dues, and (3) there is advertising income to be gained from the presence of a large individual subscription base. This third factor is commonly seen in clinical medical journals, where the presence of 15,000 or 30,000 or more individual subscribers leads to more than $1 million in advertising income-income that would be near zero without the individual subscription base. Publishers can "afford" to sell the subscriptions at cost because of the advertising.

For many other journals, including most published by my company, there either are no individual rates or the number of individual subscribers is trivial. This


146

is largely because the journals are large and their prices correspondingly high (averaging $1,600), making it difficult to set a price for individuals that would be attractive. Even a 50% reduction in price does not bring the journal into the range that attracts individual purchasers.

One alternative is to offer a reduced rate for personal subscriptions to individuals affiliated with an institution that has a library subscription. This permits the individual rate to be lower, but it is still not a large source of subscriptions in paper. The price is still seen as high (e.g., the journal Gene has an institutional price of $6,144 in 1997 and an associated personal rate of $533; the ratio is similar for Earth and Planetary Sciences Letters, at $2,333 for an institutional subscription and $150 for individuals affiliated with a subscribing institution). This alternative still draws only a very limited number of subscribers.

We have not recently (this decade) rigorously tested alternative pricing strategies for this type of paper arrangement nor talked with scientists to learn specifically why they have or have not responded to an offer. This decision not to do market research reflects a view that there is only limited growth potential in paper distribution and that the take-up by individuals (if it is to happen) will be in an electronic world.

Altering Services

There is some experience with free distribution, which may be relevant. Over the last decade we have developed a fairly large number of electronic and paper services designed to "alert" our readers to newly published or soon-to-be-published information. These services take many forms, including lists of papers accepted for publication; current tables of contents; groupings of several journals in a discipline; journal-specific alerts; and inclusion of additional discipline-specific news items. Some are mailed. Some are electronically broadcast. Others are electronically profiled and targeted to a specific individual's expressed interest. Finally, some are simply on our server and "pulled" on demand.

All are popular and all are sent only to users who have specifically said they want to receive these services. The electronic services are growing rapidly, but the desire for those that are paper-based continues. We even see "claims" for missing issues should a copy fail to arrive in the mail. What we conclude from this response is that there is a demand for information about our publications-the earlier the better-and that so long as it is free and perceived as valuable, it will be welcomed. Note, however, that in the one case where, together with another publisher, we tried to increase the perceived value of an alerting service by adding more titles to the discipline cluster and adding some other services, there was noticeable resistance to paying a subscription for the service.

Electronic Pricing

In developing and pricing new electronic products and services, journal publishers may consider many factors, including (in no particular order):


147

• the cost of creating and maintaining the service;

• the possible effect of this product or service on other things you sell ("cannibalization" or substitution);

• the ability to actually implement the pricing (site or user community definitions, estimates of the anticipated usage or number of users, security systems);

• provision for price changes in future years;

• what competitors are doing;

• the functionality actually being offered;

• the perceived value of the content and of the functionality;

• the planned product development path (in markets, functionality, content);

• the ability of the market to pay for the product or service;

• the values that the market will find attractive (e.g., price predictability or stability);

• the anticipated market penetration and growth in sales over time;

• the market behavior that you want to encourage;

• and, not inconsequentially, the effect on your total business if you fail with this product or service.

To make informed judgments, you have to build up experience and expertise. Pricing has long been an important strategic variable in the marketing mix for the more mature electronic information players, who have more knowledge of how a market will react to new pricing models. For example, more than five years ago, at an Information Industry Association meeting you would see staff from business, financial, and legal on-line services with titles such as Vice President, Pricing. Nothing comparable existed within the journal publishing industry: a price was set, take it or leave it, and there was little room for nuance or negotiation.

This situation is now changing. Many large journal publishers are actively involved in either negotiating pricing agreements or, under fixed terms, negotiating other aspects of the licensed arrangement that relate to the effective price being paid (such as number of users, number of simultaneous accesses, etc.). At Elsevier in 1996, we engaged consultants to make a rigorous study to assist us in developing pricing models for electronic subscriptions and other electronic services. What we found was that we could not construct algorithms to predict buying behavior in relation to price. That finding has not stopped us from trying to pursue more sophistication in pricing-and indeed, we have now hired our own first full-time Director of Pricing-but until we build up more experience, our pricing decisions are still often a combination of tradition, strategic principle, gut feeling, and trial and error. We do have, however, a view on the desired long-term position and how we want to get there.


148

Too often, buyers argue that pricing should be based solely on cost (often without understanding what goes into that cost). They sometimes express the simplistic view that electronic journals are paper journals without the paper and postage and should therefore be priced at a discount. That view is naive because it overlooks all of the new, additional costs that go into creating innovative electronic products (as well as maintaining two product lines simultaneously). Indeed, if you were to price right now simply on the basis of cost, the price for electronic products would likely be prohibitively high.

It is equally doubtful whether you can accurately determine the value added from electronic functionality and set prices based exclusively on the value, with the notion that as more functionality is added, the value-therefore, the price-can be automatically increased. Some value-based pricing is to be expected and is justified, but in this new electronic market there are also limited budgets and highly competitive forces, which keep prices in check. At the same time, it is not likely that the "content" side of the information industry will totally follow the PC hardware side, in other words, that the prices will stay essentially flat, with more and more new goodies bundled in the product. Hardware is much more of a competitive commodity business.

Pricing components are now much more visible and subject to negotiation. In discussions with large accounts, it is assumed that there will be such negotiation. This trend is not necessarily a positive development for either publishers or libraries. I hope that collectively we won't wind up making the purchase of electronic journals the painful equivalent of buying a car ("How about some rust proofing and an extended warranty?").

There is and will continue to be active market feedback and participation on pricing. The most obvious feedback is a refusal to buy, either because the price is too high (the price-value trade-off is not there) or because of other terms and conditions associated with the deal. Other feedback will come via negotiation and public market debates. Over time, electronic journal pricing will begin to settle into well-understood patterns and principles. At the moment, however, there are almost as many definitions and models as there are publishers and intermediaries. One need only note the recent discussions on the e-list on library licensing moderated by Ann Okerson of Yale University to understand that we are all in the early stages of these processes. An early 1997 posting gave a rather lengthy list of pricing permutations.

End User Purchasing

If we talk of pricing and "user acceptance," an immediate question is: who is the user? Is it the end user or is it the person paying the bill, if they are not one and the same? We presume that the intention was to reflect the judgments made by end users when those end users are also the ones bearing the economic consequences of their decisions. In academic information purchasing (as with consumer


149

purchasing), the end user has traditionally been shielded from the full cost (often any cost) of information. Just as newspaper and magazine costs are heavily subsidized by advertising, and radio and television (excluding cable) are paid for entirely by advertisers, so academic journal users benefit from the library as the purchasing agent.

In connection with the design of its new Web journal database and host service, ScienceDirect, Elsevier Science in 1996 held a number of focus groups with scientists in the United States and the United Kingdom. Among the questions asked was the amount of money currently spent personally (including from grant funds) annually on the acquisition of information resources. The number was consistently below $500 and was generally between $250 and $400, often including society dues, which provided journal subscriptions as part of the dues. There was almost no willingness to spend more money, and there was a consistent expectation that the library would continue to be the provider of services, including new electronic services.

This finding is consistent with the results of several years of direct sales of documents through the (now) Knight-Ridder CARL UnCover service. When it introduced its service a few years ago, UnCover had expected to have about 50% of the orders coming directly from individuals, billed to their credit cards. In fact, as reported by Martha Whitaker of CARL during the 1997 annual meeting of the Association of American Publishers, Professional/Scholarly Publishing Division in February, the number has stayed at about 20% (of a modestly growing total business).

From their side, libraries are concerned that users have little or no appreciation of the cost to the library of fulfilling their requests. In two private discussions in February 1997, academic librarians told me of their frustration when interlibrary loan requests are made, the articles procured, and the requesters notified, but the articles are never picked up. There is a sense that this service is "free," even though it is well documented (via a Mellon study) that the cost is now more than $30 per ILL transaction.

In this context, discussions with some academic librarians about the introduction of electronic journal services have not always brought the expected reactions. It had been our starting premise that electronic journals should mimic paper journals in certain ways, most notably that once you have paid the subscription, then you have unlimited use within the authorized user community. However, one large library consortium negotiator has taken the position that such an approach may not be desirable, that it may be better to start educating users that information has a cost attached to it.

Similarly, other librarians have expressed concern about on-line facilities that permit users to acquire individual articles on a transactional basis from nonsubscribed titles (e.g., in a service such as ScienceDirect ). While the facilities may be in place to bill the end user directly, the librarians believe the users will not be willing to pay the likely prices ($15-25). Yet, if the library is billed for everything, either


150

the cost will run up quickly or any prepaid quota of articles will be used equally rapidly. The notion that was suggested was to find some way to make a nominal personal charge of perhaps $1 or $2 or $3 per transaction. It was the librarians' belief that such a charge would be enough to make the user stop and think before ordering something that would result in a much larger ultimate charge to the library.

The concern that demand could swamp the system if unregulated is one that would be interesting to test on a large scale. While there have been some experiments, which I will describe further below, we have not yet had sufficient experience to generalize. Journal users are, presumably, different from America Online customers, who so infamously swamped the network in December 1996 when pricing was changed from time-based to unlimited use for $19.95 per month. Students, faculty, and other researchers read journals for professional business purposes and generally try to read as little as possible. They want to be efficient in combing and reviewing the literature and not to read more and more without restraint. The job of a good electronic system is to increase that efficiency by providing tools to sift the relevant from the rest.

It is interesting to note that in a paper environment, the self-described "king of cancellations," Chuck Hamaker, formerly of Louisiana State University, reported during the 1997 mid-winter ALA meeting that he had canceled $738,885 worth of subscriptions between 1986 and 1996 and substituted free, library-sanctioned, commercial document delivery services. The cost to the library has been a fraction of what the subscription cost would have been. He now has about 900 faculty and students who have profiles with the document deliverer (UnCover) and who order directly, on an unmediated basis, with the library getting the bill. He would like to see that number increase (there are 5,000 faculty and students who would qualify). It will be interesting to see whether the same pattern occurs when the article is physically available on the network and the charge is incurred as a result of viewing or downloading. Will users be more inclined to print (because it is immediate and easy) than to order from a document delivery service?

This question highlights one of the issues surrounding transactional selling: how much information is sufficient to ensure that the article being ordered will be useful? Within the ScienceDirect environment we hope to answer this question by creating services specifically for individual purchase that offer the user an article snapshot or summary (SummaryPlus), which includes much more than the usual information about the article (e.g., it includes all tables and graphs and all references). The summary allows the user to make a more informed decision about whether to purchase the full article.

TULIP (The University LIcensing Program)

Elsevier Science has been working toward the electronic delivery of its journals for nearly two decades. Its early discussions with other publishers about what became


151

ADONIS started in 1979. Throughout the 1990s there have been a number of large and small programs, some experimental, some commercial. Each has given us some knowledge of user behavior in response to price, although in some cases the "user" is the institution rather than the end user. The largest experimental program was TULIP (The University LIcensing Program).

TULIP was a five-year experimental program (1991-1995) in which Elsevier partnered with nine leading U.S. universities (including all the universities within the University of California system) to test desktop delivery of electronic journals. The core of the experiment was the delivery of 43 journals in materials science, with an optional 40 more added later. The files were in bitmapped (TIFF) format, with searchable ASCII headers and unedited, OCR-generated ASCII full text. The universities received the files and mounted them locally, using a variety of hardware and software configurations. The notion was to integrate or otherwise present the journals consistently with the way other information was offered on campus networks. No two institutions used the same approach, and the extensive learning that was gained has been summarized in a final report (available at http://www.elsevier.com/locate/TULIP).

Here are a few relevant observations from that report. First, the libraries (through whom the experiment was managed) generally chose a conservative approach in a number of discretionary areas. For example, while there was a document delivery option for titles not subscribed to (each library received the electronic counterparts of its paper subscriptions), no one opted to use it. Similarly, the full electronic versions of nonsubscribed titles were offered at a highly discounted rate (30% of list) but essentially found no takers. The most frequently expressed view was that a decision had been made at some time not to subscribe to the title, so its availability even at a reduced rate was not a good purchasing decision.

Second, one of the initial goals of this experiment was to explore economic issues. Whereas the other goals (technology testing and evaluating user behavior) were well explored, the economic goal was less developed. That resulted perhaps from a failure in the initial expectations and in the experimental design. From our side as publisher, we were anxious to try out different distribution models on campus, including models where there would be at least some charge for access. However, the charging of a fee was never set as a requirement, nor were individual institutions assigned to different economic tests. And, in the end, all opted to make no charges for access. This decision was entirely understandable, because of both the local campus cultures and the other issues to be dealt with in simply getting the service up and running and promoting it to users. However, it did mean that we never gathered any data in this area.

From the universities' side, there was a hope that more progress would be made toward developing new subscription models. We did have a number of serious discussions, but again, not as much was achieved as might have been hoped for if the notion was to test a radical change in the paradigm. I think everyone is now more experienced and realizes that these issues are complex and take time to evolve.


152

Finally, the other relevant finding from the TULIP experiment is that use was very heavily related to the (lack of) perceived critical mass. Offering journals to the desktop is only valuable if they are the right journals and if they are supplied on a timely basis. Timeliness was compromised because the electronic files were produced after the paper-a necessity at the time but not how we (or other publishers) are currently proceeding. Critical mass was also compromised because, although there was a great deal of material delivered (11 GB per year), materials science is a very broad discipline and the number of journals relevant for any one researcher was still limited. If the set included "the" journal or one of the key journals that a researcher (or more likely, graduate student) needed, use was high. Otherwise, users did not return regularly to the system. And use was infrequent even when there was no charge for it.

Elsevier Science Experiences with Commercial Electronic Journals

Elsevier Electronic Subscriptions

The single largest Elsevier program of commercial electronic delivery is the Elsevier Electronic Subscriptions (EES) program. This is the commercial extension of the TULIP program to all 1,100 Elsevier primary and review journals. The licensing negotiations are exclusively with institutions, which receive the journal files and mount them on their local networks. The license gives the library unlimited use of the files within its authorized user community. As far as we are aware, academic libraries are not charging their patrons for use of the files, so there is no data relating user acceptance to price. At least one corporate library charges use back to departments, but this practice is consistent for all of its services and has not affected use as far as we know.

If you broaden the term user to include the paying institution, as discussed above, then there is clearly a relation between pricing and user acceptance. If we can't reach an agreement on price in license negotiations, there is no deal. And it is a negotiation. The desire from the libraries is often for price predictability over a multiyear period. Because prices are subject to both annual price increases and the fluctuation of the dollar, there can be dramatic changes from year to year. For many institutions, the deal is much more "acceptable" if these increases are fixed in advance.

The absolute price is also, of course, an issue. There is little money available, and high pricing of electronic products will result in a reluctant end to discussions. Discussions are both easier and more complicated with consortia. It is easier to make the deal a winning situation for the members of the consortium (with virtually all members getting access to some titles that they previously did not have), but it is more complicated because of the number of parties who have to sign off on the transaction.

Finally, for a product such as EES, the total cost to the subscribing institution


153

goes beyond what is paid to Elsevier as publisher. There is the cost of the hardware and software to store and run the system locally, the staff needed to update and maintain the system, local marketing and training time, and so on. It is part of Elsevier's sales process to explain these costs to the subscribing institution, because it is not in our interest or theirs to underestimate the necessary effort only to have it become clear during implementation. To date, our library customers have appreciated that approach.

Immunology Today Online (ITO)

Immunology Today is one of the world's leading review journals, with an ISI impact factor of more than 24. It is a monthly magazine-like title, with a wide individual and institutional subscription base. (The Elsevier review magazines are the exception to the rule in that they have significant individual subscriptions.) In 1994 Immunology Today's publishing staff decided it was a good title to launch also in an electronic version. They worked with OCLC to make it a part of the OCLC Electronic Journals Online collection, initially offered via proprietary Guidon software and launched in January 1995.

As with other journals then and now making their initial on-line appearance, the first period of use was without charge. A test bed developed, consisting of about 5% of the individual subscribers to the paper version and 3% of the library subscribers. In time, there was a conversion to paid subscriptions, with the price for a combined paper and electronic personal subscription set at 125% of the paper price. (Subscribers were not required to take both the paper and electronic versions, but only three people chose to take electronic only.) When OCLC ended the service at the end of 1996 and we began moving subscribers to a similar Web version of our own, the paid subscription level was up to about 7.0% of the individual subscribers and 0.3% of the institutional subscribers.

The poor take-up by libraries was not really a surprise. At the beginning, libraries did not know how to evaluate or offer to patrons a single electronic journal subscription as opposed to a database of journals. (There is steady improvement in this area, provoked in part by the journals-notably The Journal of Biological Chemistry-offered via HighWire Press.) How do you let people know it is available? How and where is it available? And is a review journal-even a very popular review journal-the place to start? It apparently seemed like more trouble than it was worth to many librarians.

In talking with the individual subscribers-and those who did not subscribe-it was clear that price was not a significant factor in their decisions. The functionality of the electronic version was the selling point. It has features that are not in the paper version and is, of course, fully searchable. That means the value was, in part, in efficiency-the ease with which you find that article that you recalled reading six months ago but don't remember the author or precise month or the


154

ease with which you search for information on a new topic of interest. The electronic version is a complement to the paper, not a substitute. Those individuals who chose not to subscribe either were deterred by the initial OCLC software (which had its problems) and may now be lured back by our Web version, or they have not yet seen a value that adds to their satisfaction with paper. But their hesitation has not been a question of price.

Journal of the American College of Cardiology

A project involving the Journal of the American College of Cardiology (JACC) was somewhat different. This flagship journal is owned by a major society and has been published by Elsevier Science since its beginning in the early 1980s. In 1995, in consultation with the society, Elsevier developed a CD-ROM version. The electronic design-style, interface, and access tools-is quite good. The cost of the CD-ROM is relatively low ($295 for institutions, substantially less for members), and it includes not only the journal but also five years of JACC abstracts, the abstracts from the annual meeting, and one year (six issues) of another publication, entitled ACC Current Reviews.

But the CD-ROM has sold only modestly well. Libraries, again, resist CD-ROMs for individual journals (as opposed to journal collections). And the doctors have not found it a compelling purchase. Is it price per se? Or is it the notion of paying anything more, when the paper journal comes bundled as part of the membership dues? Or is there simply no set of well-defined benefits? Clearly, the perceived value to the user is not sufficient to cause many to reach for a credit card.

GeneCOMBIS and Earth and Planetary Sciences Letters Online

I mentioned above that for some paper journals we have personal rates for individuals at subscribing institutions. This model has been extended to Web products related to those paper journals. In addition to the basic journal Gene, mentioned earlier, we publish an electronic section called GeneCOMBIS (for Computing for Molecular Biology Information Service ), which is an electronic-first publication devoted to the computing problems that arise in molecular biology. It publishes its own new papers. The papers are also published in hard copy, but the electronic version includes hypertext links to programs, data sets, genetics databases, and other software objects. GeneCOMBIS is sold to individuals for $75 per year, but only to those individuals whose institutions subscribe to Gene.

The same model is repeated with the electronic version of a leading earth sciences journal, Earth and Planetary Sciences Letters. The affiliated rate for the electronic version was introduced in 1997, with a nominal list price of $90 and a half-price offer for 1997 of $45. The electronic version provides on-line access to the journal and to extra material such as data sets for individuals affiliated with subscribing institutions.


155

It is too early to know whether this model will work. There certainly has been interest. In the case of GeneCOMBIS, its success will ultimately depend on the quality and volume of the papers it attracts. With EPSL Online, success will be determined by the perceived value of the electronic version and its added information. In neither case is price expected to have a significant effect on subscriptions. More likely, there will be pressure to extend the subscriptions to individuals working outside institutions that have the underlying paper subscriptions.

Experiences of Others

It is perhaps useful to note also some of the experiences of other publishers.

Red Sage Experiment

The Red Sage experiment started in 1992 and ran through 1996. It was initially started by Springer-Verlag, the University of California at San Francisco, and AT&T Bell Labs. Ultimately, several other publishers joined in, and more than 70 biomedical journals were delivered to the desktops of medical students and faculty at UCSF. As with TULIP, the experiment proved much harder to implement than had been originally hoped for. To the best of my knowledge, there were no user charges, so no data is available on the interplay of price and user acceptance. But what is notable is that there was greater critical mass of user-preferred titles among the Red Sage titles and, as a result, usage was very high. The horse will drink if brought to the right water.

Society CD-ROM Options

A second anecdote comes from discussions last year with a member of the staff of the American Institute of Physics. At least one of their affiliated member societies decided to offer members an option to receive their member subscriptions on CD-ROM rather than on paper, at the same price (i.e., the amount allocated from their member dues). The numbers I recall are that more than 1,500 members of the society took the option, finding the CD-ROM a more attractive alternative. I suspect that had they tried to sell the CD-ROM on top of the cost of the basic subscription, there would have been few takers. However, if you ignore the initial investment to develop the CD, the CD option saved the society money because it was cheaper on an incremental cost basis to make and ship the CDs than to print and mail the paper version. In this case, the economics favored everyone.

BioMedNet

The final observation relates to an electronic service that started last year called BioMedNet. It is a "club" for life scientists, offering some full text journals, Medline, classified ads (the most frequently used service), marketplace features, news,


156

and other items. To date, membership is free. There are more than 55,000 members, and another 1,000 or more come in each week. The site is totally underwritten at the moment by its investors, with an expectation of charging for membership at some later date but with the plan that principal revenues will come from advertising and a share of marketplace transactions. The observation here is that while the membership is growing steadily, usage is not yet high per registered member. There is a core of heavy users, but it is rather small (2-3%). So, again, behavior and acceptance are not a function of price but of perceived value. Is it worth my time to visit the site?

Peak: The Next Experiment

As was mentioned above, the aspect of the TULIP experiment that produced the least data was the economic evaluation. One of the TULIP partners was the University of Michigan, which is now also an EES subscriber for all Elsevier journal titles. As part of our discussions with Michigan, we agreed to further controlled experimentation in pricing. Jeffrey MacKie-Mason, an associate professor of economics and information at the University of Michigan, designed the experiment and serves as project director for its economic aspects.

This pricing field trial is called Pricing Electronic Access to Knowledge (PEAK). Michigan will create a variety of access models and administer a pricing system. The university will apply these models to other institutions, which will be serviced from Michigan as the host facility. Some institutions will purchase access on a more or less standard subscription model. Others will buy a generalized or virtual subscription, which allows for prepaid access to a set of N articles, where the articles can be selected from across the database. Finally, a third group will acquire articles strictly on a transactional basis. Careful thought has, of course, gone into the relationship among the unit prices under these three schemes, the absolute level of the prices, and the relationship among the pricing, the concepts of value, and the publishers' need for a return.
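To make the three access models concrete, here is a minimal sketch of how a library's cost might be computed under each scheme. All of the prices and quantities are hypothetical; the actual PEAK price levels are not reported in this chapter.

def traditional_cost(journals_subscribed, price_per_journal):
    """Standard subscriptions: unlimited reading, but only in subscribed journals."""
    return journals_subscribed * price_per_journal

def generalized_cost(bundles, articles_per_bundle, bundle_price, articles_read, per_article_price):
    """Generalized (virtual) subscription: prepaid access to N articles chosen
    from anywhere in the database, with extra articles bought transactionally."""
    prepaid = bundles * articles_per_bundle
    extra = max(0, articles_read - prepaid)
    return bundles * bundle_price + extra * per_article_price

def transactional_cost(articles_read, per_article_price):
    """Pure pay-per-article access."""
    return articles_read * per_article_price

articles = 300  # hypothetical annual demand
print(traditional_cost(journals_subscribed=10, price_per_journal=600))             # 6000
print(generalized_cost(2, 120, 550, articles_read=articles, per_article_price=7))  # 1520
print(transactional_cost(articles, per_article_price=7))                           # 2100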

The experiment should begin in early 1998 and run at least through August 1999. We are all looking forward to the results of this research.

In Conclusion

Journal publishers have relatively little experience with offering electronic full text to end users for a fee. Most new Web products either are free or have a free introductory period. Many are now in the process of starting to charge (Science, for example, instituted its first subscription fees as of January 1997 and sells electronic subscriptions only to paper personal subscribers). However, it is already clear that a price perceived as fair is a necessary but not sufficient factor in gaining users. Freely available information will not be used if it is not seen as being a productive


157

use of time. Novelty fades quickly. If a Web site or other electronic offering does not offer more (job leads, competitive information, early reporting of research results, discussion forums, simple convenience of bringing key journals to the desktop), it will not be heavily used. In designing electronic services, publishers have to deal with issues of speed, quality control, comprehensiveness-and then price. The evaluation of acceptance by the user will be on the total package.


158

Chapter 9—
Electronic Publishing Is Cheaper

Willis G. Regier

Electronic publishing is cheaper than many kinds of publishing. Cheap electronic publishing proliferates newsletters, fanzines, vanity publishing, testimonials, political sniping, and frantic Chicken Littles eager to get the word out. Cheaper publishing has always meant more publishing. But students, scholars, and libraries complain that there is already an overproduction of academic writing. Electronic publishing would just make matters worse, unless it comes with additional features to manage its quantity. Scholarly publishing is prepared to enter electronic publishing, but will not let go of print. Why? Because the demands of scholars and libraries for enhanced electronic publishing make it more expensive.

Electronic publishing comes with a long menu of choices: differing speeds of access, adjustable breadth and depth of content, higher or lower visibility, flexibility, durability, dependability, differentiation, and ease of use. In such a field of choices, there is not a basic cost or an optimum one or an upper limit. Until the wish for more and the desire to pay less find equilibrium, there will be discomfort and hesitation in the shift from paper to ether.

At present, most mainstream digital publications remain dependent on print, either as a publication of record, as with most scholarly journals, or as a nexus for electronic sites, as with the Web sites for Wired, numerous newspapers and magazines, publishers of all stripes, book clubs, and booksellers. In this parallel-publishing environment, print costs remain in place; the costs of mounting and maintaining a digital presence are added on.

Some publishers have established Web sites, with little expectation of recovering those added costs, in order to maintain an up-to-date profile, to market directly to customers, and to be sure that when and if the Web market matures, they will be ready to compete for it.

Those who declare that electronic publishing is cheaper than print focus chiefly


159

on perceived savings in reproduction and distribution. Once the first copy is prepared, its reproduction and transmission reduce or eliminate the costs of printing, paper, ink, packaging, shipping, spoilage, and inventory. The manufacturing cost of a typical print journal in the humanities, for example, consumes about 50% of the journal's operating budget, and shipping and warehousing can eat up another 10%. Such costs are incidental in the electronic environment.

But electronic publishing adds numerous new costs to preparation of the first copy. Further, the savings enjoyed by the publisher are made possible only if the end user, whether a library or an individual, has also invested a hefty sum in making it possible to receive the publication. The scholarly publisher and the end user alike depend upon even greater costs being borne by colleges and universities.

As costs became more routine for Project MUSE, Marie Hansen calculated that the cost of preparing parallel print and electronic journals is about 130% of the cost of print alone. Even if print versions were dropped, the cost to produce the first copy ready for mounting on a server would be as high as 90% of the cost of a paper journal.[1] The cost savings for printing, storage, shipping, and spoilage are substantial, but in the digital realm they are replaced by the costs of system administration, content cataloging, tagging, translating codes, checking codes, inserting links, checking links, network charges, computer and peripherals charges, and additional customer service. The Internet's susceptibility to revision and its vulnerability to piracy impose still other costs.[2]
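A quick worked example makes the relationship concrete. The dollar figure below is hypothetical; the 130% and 90% ratios are Hansen's.

print_only = 500000                  # hypothetical annual cost of a print-only program
parallel = 1.30 * print_only         # print plus electronic, per Hansen's estimate
electronic_only = 0.90 * print_only  # electronic-only first copy

print(f"Print only: ${print_only:,.0f}")                    # $500,000
print(f"Parallel print + electronic: ${parallel:,.0f}")     # $650,000
print(f"Electronic-only first copy: ${electronic_only:,.0f}")  # $450,000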

There are also high costs for acquisitions. It has taken longer than expected to negotiate contracts with journal sponsors, to obtain permissions, and to acclimate journal editors to the steps required for realizing the efficiencies of the digital environment. Electronic editors play fast and loose with copyright, always waving the banner of "fair use" while blithely removing copyright notices from texts and images. Explaining to electronic editors why copyright is in their best interest, and thus worthy of observance, has been just one time-consuming task. As Project MUSE matures, we see more clearly the costs of rearing it.

The Supra of the Infra

The costs of building a university infrastructure are enormous. The Homewood campus at Johns Hopkins is home to 5,200 students, faculty, and staff who want connections to the Internet. The start-up costs for rewiring the campus for UTPs (Unshielded Twisted Pairs)-at a rate of about $150 per connection-would have been impossibly high for the university if not for $1 million in help from the Pew Trust. According to Bill Winn, formerly associate director for academic computing at the Hopkins, it costs $20 per person per month to connect to the campus network. The network itself costs $1 million per year to maintain and an additional $200,000 to support PPP (point-to-point protocol) connections. The annual


160

bill to provide Internet access to the 900 students who live off campus is an additional $200,000. The fee to the campus's Internet service provider for a 4-megabit-per-second link, plus maintenance and management, is about $50,000 per year.

Students, Winn says, require high maintenance: if their connections are insecure, it is often because the connections have been ripped from the wall. Last year, students in engineering attempted to install a software upgrade for a switch, with results that exceeded their wildest dreams: it shut down the university's system for more than a week. That adds up to about $20,000 of lost Internet access, not to mention the costs of repair.

In 1996, Johns Hopkins University budgeted $70,000 for hardware maintenance and $175,000 for hardware upgrades, chiefly to handle rapidly increasing traffic. The million-dollar budget supports a staff of three technicians, an engineer, a software analyst, and a director for networking. Their skills are in high demand, the salaries they can command are rising rapidly, and they are notoriously hard to retain.

A $15- to $20-per-month access charge is comparable to charges at other campuses in the United States. When it costs $180 to $240 per person per year to link a computer to the Internet, a university's administration confronts a huge recurring cost. And the costs go deeper: it is typical for each academic department to bear most of the costs of its own infrastructure, and often some department systems are incompatible with others. To make the initial investment worthwhile, expensive investments must be made regularly: upgrades, peripherals, database access fees, consultants, and specialized software. It is no wonder that many colleges have second thoughts about their level of commitment to Internet access.
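A rough consistency check, combining only the figures quoted in the preceding paragraphs:

people = 5200                        # network users on the Homewood campus
monthly_low, monthly_high = 15, 20   # typical per-person access charge, $/month

yearly_low, yearly_high = 12 * monthly_low, 12 * monthly_high
print(f"Per person per year: ${yearly_low}-${yearly_high}")            # $180-$240, as stated
print(f"Campus-wide per year: ${yearly_low * people:,}-${yearly_high * people:,}")
# Roughly $0.9-$1.25 million, in line with the ~$1 million annual network maintenance budget.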

To some extent, electronic publishers are stymied by the lag between the Internet's ability to produce and its readers' ability to receive. The lag bears a price tag, and so does any effort to close it. Some institutions cannot or will not pay, most state governments cannot pick up the bill, and the federal government is increasingly reluctant to reserve space or investment for scholarly networking. It becomes a matter for the publisher and the market to decide.

Optimum Optimism

Digital prophets soothsay that electronic publishing will exacerbate monopolies and class divisions, or that a slow, steady spread of access will lower costs and promote democratization. In 1951 a new technology led Theodor Adorno to predict a publishing revolution: "In a world where books have long lost all likeness to books, the real book can no longer be one. If the invention of the printing press inaugurated the bourgeois era, the time is at hand for its repeal by the mimeograph, the only fitting, the unobtrusive means of dissemination."[3] By contrast, Mario Morino, founder of the Legent Corporation, electrifies campus audiences


161

by asking, "Which corporation will be the first to acquire a university?"[4] Costs are not everything. Even if they were, the Internet is full of threads on the inconsistent costs of access from place to place. If the digital revolution is a revolution rather than a colossal marketing scheme, it is because so many people and institutions are involved and invested.[5]

It may be that computers will be as ubiquitous as television sets and an Internet connection as cheap as a telephone,[6] but when I look at the role of the Internet in higher education, I see higher costs and foresee only more differentiation between universities based upon their ability to pay those costs. The conversion from print to pixels is not merely an expensive change of clothes: it is an enormous expansion of capability. The chief reason that scholarly electronic publishing costs more than print is that it offers more, much more, and students, faculty, and libraries want all of it.

Under the domain plan that Project MUSE, JSTOR, ARTFL, and other experiments are refining, electronic publishing achieves no less than seven advances in scholarly transmission: (1) instead of a library maintaining one copy of a work that can be read by one person at one time, the work can now be read by an entire campus simultaneously; (2) instead of having to search for a location and hope that a work is not checked out or misshelved, a user can find the full text at the instant it is identified; (3) the work can be read in the context of a large and extensible compilation of books and journals, including back issues, each as easily accessible as the first; (4) the work is capable of being transformed without disturbing an original copy; pages can be copied without being ripped out; copies can be made even if a photocopier is jammed or out of toner; (5) the work can be electronically searched; (6) there is no worry about misplacing the work or returning it by a due date; and (7) the electronic library can be open all night every day of the year. The increased value, even when offered at a correspondingly higher price, permits libraries to spend a little more to acquire much more: more content, more access, more use. Librarians pay close attention to what they pay for, and many are willing to purchase ambitious electronic publishing projects. Project MUSE has already attracted 100 library subscribers who previously subscribed to no Johns Hopkins print journals, including libraries in museums and community colleges (see Figure 9.1).

If some claims for the digital revolution are laughably inflated, it is not for lack of information: the revolution has occurred with unprecedented self-consciousness and organizational care. That care comes from many sources. Foundation support has proved essential. The Association of American Publishers has led the way for standardization, defense of copyright, vigilance against piracy, and scrutiny of current and pending legislation. At Hopkins, Stanford, Chicago, and many other places, frank and frequent discussions between publishers and librarians have focused on the price and appeal of potential projects. Conversations with Jim Neal remind me that libraries are the original multimedium. For multiple reasons, librarians' reactions to the systemic costs of digitalization are immediately relevant
to publishing decisions. Many libraries are asked to acquire extraordinarily expensive databases without a clue about the relationship between price and actual costs, but partnering libraries know better.

[Figure 9.1. Project MUSE Subscription Base]

For Project MUSE, the greatest cost is for personnel. For decades, it has been possible to maintain a journals program staffed by literate and dedicated people; MUSE employees also have to be adept with computers, software, protocols, and platforms. To raise MUSE from infancy, its employees must also be creative, patient, resourceful, and endowed with heroic stamina. Because their jobs require higher and higher levels of education and technical skill, starting positions are more expensive. Disregarding administrative costs, the staff of MUSE cost about 20% more per capita per month than the staff of print journals, and the differential is rising.

We are just beginning to understand the costs of hiring, training, and retaining qualified staff. Because the skills of the Project MUSE team are pioneering, those
who succeed are subject to recruitment raiding for higher salaries. Due to the inordinate pressures put upon them-the stress of tight schedules, the frustrations of downtime, the frictions of incompatible programming and opposed ideas-these young people are prone to rapid burnout.

Excluding independent contractor costs, personnel costs account for 46% of the start-up and maintenance costs for Project MUSE. Including independent contractor costs, which are themselves chiefly a matter of personnel, that percentage rises to 59%.

The second-largest expense has been hardware, accounting for 12% of total costs. Third is rent, at 3.3%. Fourth, surprisingly, has been travel, requiring 2.9% of investment. The travel budget is a consequence of the need to parlay and negotiate on every frontier: with the learned societies and editorial boards that run the journals, with the librarians who buy them, and with editors who contemplate moving their journals to MUSE. In the first two years of MUSE's development, our efforts to build MUSE were distracted by the novelties of the Internet-training staff, dealing with journal sponsors, conversing with libraries-each a task as vital as the selection of software or the conversion of codes. Marketing was kept to a minimum until MUSE had a complete package to deliver. With the completion of the 40-journal base in December 1996, Hopkins is now in high gear marketing MUSE. Travel and exhibits will have higher costs as MUSE strives to attract a subscription base strong enough to become self-supporting.

The Electronic Market

Marketing on the Web is a different creature than marketing via print or radio, because it must contend with misinformation and with building an audience. Misinformation about an electronic site shows up in the same search that finds the site itself and may require quick response. MUSE responds readily enough to the Internet's search engines, but only if the person is searching. Even then, the searcher can read text only if the searcher's library has already subscribed. At the December 1996 Modern Language Association exhibit, about half the persons who expressed their wish that they could subscribe to MUSE belonged to universities that already did, but the scholars didn't know it. With usage data looming as a subscription criterion, we cannot rest after a subscription is sold; we still have to reach the end user and solicit use. Otherwise scholars and libraries alike will be unable to determine the value of what is available on-line.

The marketplace itself is changing. Most conspicuously, the unexpected formation of library consortia has reshaped many a business plan. Expectations of library sales have often hung fire while libraries consorted, but in the long run it is likely that by stimulating these consortia, electronic publishing will have served an important catalytic function for discovering and implementing many kinds of efficiencies.

The Net market is enormous and enormously fragmented.[7] In the next year there will be numerous marketing experiments on the Web. New and improved
tools emerge every month that will help us reply to scholars with specific requests, complaints, and inquiries. Publishers are cautiously optimistic that electronic marketing will prove more advantageous than bulk mail, and it will certainly be cheaper. Already most university presses have their catalogs on-line and many are establishing on-line ordering services.

Customer service is another high cost, at present much higher than for print journals. Today it takes one customer service agent to attend to 400 Project MUSE subscriptions, while a customer service agent for print journals manages about 10,000 subscriptions. But the future offers bright hope. In February 1997, our customer service agent for Project MUSE sent an e-mail message to 39 past-due subscribers to MUSE who were not with a consortium. Within 24 hours of sending the message, she received 29 responses, and 4 more arrived the next day. Each thanked her for sending the letter, and all 33 renewed for the year. Here the advantages of on-line communication are obvious and immediate.

There are also costs that are difficult or impossible to track or quantify, like intellectual costs. It is these costs that have emerged as the next vexed problem in the development of electronic scholarly resources. The problem has three prongs.

One is scholarly skepticism about the value of electronic publishing for tenure and promotion. Rutgers University has put a policy in place; the University of Illinois and Arizona University are in the process of setting their policies. Like everything else in the digital environment, these policies will likely need frequent change.

The fluidity of the Web, gushing with nautical metaphors, often seems a murky sea. Journal editors are anxious about the futures of their journals and hesitant about entrusting them to a medium as fleeting as electricity. Well aware of past losses to flood and fire, scholars prefer durable media. This preference is firmly based: scholarship studies its own history, its history is full of ideas that took years to hatch, and the Web seems unstable, engulfing, and founded on a premise of perpetual replacement. Scholars care about speed, but they care more that their work endures; that it is a heritage; that if they care for it well, it will live longer than they do. Scholars who use the Net frequently encounter defunct URLs, obsolete references, nonsense, wretched writing, and mistakes of every kind. Ephemera appear more ephemeral on-screen. Chief among the concerns expressed by librarians interested in purchasing an electronic publication is whether the publication is likely to be around next year and the year after.

The third prong is the sharpest: will electronic publishing be able to recover its operating costs, the costs of editing, of maintaining a membership, of defending a niche? If journals are to migrate to electronic formats, they will have to be able to survive there and survive the transition, too: the current competition is part endurance, part sprint. Since parallel publishing in print and on-line costs more, libraries will have to pay more to sustain dual-format journals, choose between formats, or cut other purchases.

In the short term, there is reassurance in numbers. Rather than erode reader
and subscription base, electronic versions of journals may increase them (see Figure 9.2). Even if paper subscriptions dwindle, the increase in subscriptions and readership may last. Perhaps, perhaps. Means for cost recovery for each journal must also last, which is why different publishers are trying different pricing strategies.

[Figure 9.2. Types of Journal Subscriptions 1997]

Competition in the electronic environment is expensive and aggressive (a favorite book for Netizens is Sun Tzu's Art of War).[8] Foundation assistance was essential for enabling university presses and libraries to enter the competition, but it is uncertain whether their publications can compete for very long when foundation support ends. Scholarship has deep reservoirs of learning and goodwill, but next to no savings; one year of red ink could wipe out a hundred-year-old journal. Unless journal publishers and editors migrate quickly and establish a system to recover costs successfully, the razzle-dazzle of paper-thin monitors will cover a casualty list as thick as a tomb.

The risks of migration increase the costs of acquisition. Publishers and their partners are trying to determine what costs must be paid to attract scholars to contribute to their sites. It is obvious that a moment after a scholar has completed a work, a few more keystrokes can put the work on the Web without bothering a publisher, librarian, faculty committee, or foundation officer. Electronic publishing is cheaper than print, if you rule out development, refereeing, editing, design, coding, updating, marketing, accounting, and interlinking. Further, there are numerous scholars who believe they should be well paid for their scholarship or their editing. Stipends paid by commercial publishers have raised their editors' financial expectations, which in turn exacerbate the current crisis in sci-tech-med journals. Retention of such stipends will devour savings otherwise achieved by digitalization.

How much added value is worthwhile? Competitive programs are now testing the academic market to see whether scanned page images are preferable to HTML, whether pricing should sequester electronic versions or bundle them into an omnibus price, what degree of cataloging and linking and tagging is desired, what screen features make sense, and a realm of other differentia, not least of which is the filtering of the true from the spew. We expect the differences between the costs and prices of scientific and humanities journals to grow larger; with library partners scrutinizing real usage and comparative costs, we expect these differences will be less and less defensible. We expect to see gradual but salutary changes in scholarship itself as different disciplines come to terms with the high visibility of electronic media. We expect to see publishers' reputations reshaped, with all that a reputation is worth, as professionally managed electronic media distance their offerings from the Web sites of hobbyists, amateurs, and cranks. Finally, we expect to see increasing academic collaboration between and within disciplines. As electronic publishing increases its pressure on hiring, evaluation, tenure, and promotion, the certification and prestige functions of publishers will increasingly depend on their attention to the emerging criteria of e-publishing, in which costs are measured against benefits that print could never offer. Because students, faculty, and libraries want these benefits, scholarly electronic publishing is not cheaper.



Chapter 10—
Economics Of Electronic Publishing—Cost Issues
Comments on Part Two

Robert Shirrell

I have a few brief comments on the very interesting and stimulating chapters prepared by Janet Fisher (chapter 5), Malcolm Getz (chapter 6), and Bill Regier (chapter 9). I'll focus on their presentations of publisher costs. I'll add a few words about the electronic publishing efforts we have undertaken at the University of Chicago Press and contrast the model we have adopted with the ones that have been mentioned earlier.

Janet Fisher, from the MIT Press, gave us costs related both to the electronic journals that they are publishing and to two of MIT's print journals. In Table 10.1, I've reworked the numbers and computed "first-copy" costs on a per-page basis. What I mean by first-copy cost is simply the cost for editing, typesetting, and producing materials that can subsequently be duplicated and distributed to several hundred or several thousand subscribers. The total first-copy costs for electronic journals at MIT Press range from approximately $15 to $56 per page, and the total first-copy costs for the print journals are $22 and $24 per page. In computing these costs, I did not include what Janet labeled as the G&A costs, the general and administrative costs, but I did include the portion of the cost of the Digital Projects Lab (DPL) that is related to first-copy production.

Several things here are important and worth comment. First, the DPL cost, that is, the cost of preparing an electronic edition after editing and typesetting, is a significant portion of the total. Although the percentage varies between 13% and 62% (as indicated in Table 10.1), the cost is close to 50% of the total first-copy costs of publishing these particular electronic journals.

This breakdown raises the questions, why are these costs so high? and, will they decline over time? I think the expense reflects the fact that there are handcrafted aspects of electronic production, which are expensive, and substantial hardware costs that need to be allocated among a relatively small number of publications and pages. As for the future, the per-page costs at the DPL can be expected to go
down as pages increase and new processing techniques are developed, but even if they do go down to 40%, the totals for digital production are going to be a significant portion of the publisher's total cost. This is important.

TABLE 10.1. MIT Press First-Copy Cost per Page

Electronic Journals
                  JFLP     SNDE    CJTCS      JCN
MS editing           -     7.25     4.57        -
Composition          -    18.20     8.48        -
Subtotal          7.87    25.45    13.05    49.00
Lab               7.68    18.42    21.31     7.00
Total            15.55    43.87    34.36    56.00
Lab %              49%      42%      62%      13%

Print Journals
                    NC     COSY
MS editing        6.46     6.93
Composition      16.04    17.57
Subtotal         22.50    24.50
Lab                  -        -
Total            22.50    24.50

Another point about these costs. Note that the total first-copy costs of the electronic journals average $40-$43 per page, whereas those for the print journals average about $23 per page-roughly a $20 difference in the costs. For a 200-page issue, that would amount to about $4,000. That is, it is $4,000 more expensive to produce materials for the reproduction and distribution of 200 pages in electronic form than for the reproduction and distribution of 200 pages in hard-copy form.

If $4,000 will pay for printing and distribution of a 200-page issue to 500 subscribers, which is a reasonable estimate, then MIT can produce a print edition less expensively than an electronic edition when the distribution is under 500. That conclusion is important: at this point, for the MIT Press, it's cheaper to produce journals in paper than to do them electronically, if the circulation is small, i.e., less than 500. That situation may evolve over time, but right now, the additional costs of electronic processing are not offset by sufficiently large reductions in printing and distribution costs.
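This comparison can be restated as a simple break-even calculation. The sketch below is illustrative only: the $20-per-page first-copy premium, the 200-page issue, and the $4,000 printing-and-distribution estimate come from the discussion above, while the per-subscriber print cost is inferred from those figures rather than reported, and any per-subscriber cost of electronic delivery is assumed to be negligible.

    # Rough restatement of the break-even arithmetic above (figures from the text;
    # the per-subscriber print cost is implied by them, not reported directly).

    first_copy_premium_per_page = 20.0        # electronic minus print first-copy cost, $/page
    pages_per_issue = 200                     # size of the illustrative issue
    print_cost_per_subscriber = 4000.0 / 500  # $4,000 covers printing/mailing to 500 subscribers

    extra_first_copy_cost = first_copy_premium_per_page * pages_per_issue  # = $4,000

    # Circulation at which avoided printing/mailing equals the extra electronic
    # first-copy cost, ignoring any per-subscriber cost of electronic delivery.
    break_even_circulation = extra_first_copy_cost / print_cost_per_subscriber

    print(f"Extra first-copy cost of the electronic edition: ${extra_first_copy_cost:,.0f}")
    print(f"Break-even circulation: {break_even_circulation:.0f} subscribers")
    # Below roughly 500 subscribers the print edition is cheaper; above it, electronic wins.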

Now let me turn to the paper by Malcolm Getz. Malcolm presented some numbers from the American Economic Association (AEA), and the numbers in Table 10.2 are approximately the same as the ones he presented. I have also presented numbers from the University of Chicago Press for 37 of our titles. That
figure is not the total of our serial publications; we publish 54 in all. The figure excludes The Astrophysical Journal, our largest single title, and a number of journals that we publish in cooperation with other not-for-profit organizations. The journals that are included are principally titles in the humanities and social sciences, with some in medicine and biology.

TABLE 10.2. Cost Breakdown by Percentage for AEA (3 Journals) and University of Chicago Press (37 Journals)

                    AEA     Press
Edit                36%     32%
Typeset             13%     10% (to 18%)
Print and mail      23%     24%
Other               27%     34%

NOTE: Percentages may not sum to 100% due to rounding.

The breakdown of costs for the Press and for the AEA is quite similar. Editorial costs are 36% for AEA and 32% for the Press. Typesetting is 13% for AEA and 10% at the Press, though it varies substantially by journal. Distribution costs are similar. Overall, these numbers are very close, and they are, it seems to me, reasonable numbers industry-wide.

It is possible to provide a more detailed breakdown of the numbers for the Press, and in Table 10.3, I have broken down the 32% that is related to editorial into the portion that is related to the peer review of manuscripts, which is 22% of the total, and the portion that is related to manuscript editing, which is 10% of the total. Because of the manner in which some of the Press's costs are recorded, the number I have shown for manuscript editing may be somewhat higher, but the breakdown between peer review and manuscript editing is a reasonably accurate division of costs in traditional journal publishing. I think this revised breakdown of costs provides an interesting context for reviewing the way in which costs evolve in an electronic publishing environment, and I would like to turn now to make a few remarks about the possibilities for cost restructuring and cost reduction.

The electronic publishing model we have been discussing is structured so that, basically, electronic costs are add-on costs: you do everything you do in print, and then you do some more. I have outlined the process in Table 10.4. The process includes the traditional functions of peer review, manuscript editing, typesetting, and printing and mailing and adds new functions and new costs for the derivation of electronic materials from the typesetting process and for the management of electronic services.

In this model, as for the vast majority of journals, so long as we continue to produce both print and electronic editions, the total cost is not going to decrease. The reason is that, even if a significant portion of the subscribers convert from paper to electronic editions, the additional costs for electronic processing are not offset by reductions in the printing and distribution costs. As we all know, the marginal
cost of printing and mailing is small, much smaller than the average cost, and the additional costs for electronic processing are substantial. The consequence is that, in this model, electronic costs turn out to be added costs, costs in addition to the total that would exist if only a print edition were being produced.

TABLE 10.3. Cost Breakdown by Percentage for University of Chicago Press (37 Journals)

Edit
  Peer review         22%
  MS edit             10%
Typeset               10% (to 18%)
Print and mail        24%
Other                 34%

TABLE 10.4. Cost Breakdown for Electronic Publishing, Model One

Edit
  Peer review         22%
  MS edit             10%
Typeset               10%-18%
Derive e-materials    New cost
Print and mail        24%
Other                 34%
Manage e-services     New cost

This is exactly what is argued by Regier. He reported that for Project MUSE, the electronic publishing venture of the Johns Hopkins University Press, the total costs for both print and electronic editions were about 130% of the print-only costs. This increase is significant, and I believe it is representative of efforts that are based on deriving electronic materials from typesetting files, as a separate stage of production, undertaken subsequent to the typesetting process.
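To make the marginal-cost point concrete, the following is a small illustrative calculation of the Model One (add-on) structure. Only the 130% dual-format figure echoes the Project MUSE estimate above; the print-only total, the subscriber count, and the marginal print cost per subscriber are hypothetical numbers chosen for the sketch.

    # Illustrative only: hypothetical numbers showing why "add-on" electronic publishing
    # (Model One) raises total cost even as subscribers migrate to the electronic edition.
    # The 30% add-on echoes the Project MUSE figure cited above; the marginal print cost
    # per subscriber is an assumption for this sketch, not a reported value.

    print_only_total = 100_000.0                 # annual cost of a print-only program (hypothetical)
    electronic_addon = 0.30 * print_only_total   # deriving and serving e-materials (~130% total)
    marginal_print_cost = 15.0                   # printing + mailing saved per subscriber who drops print
    subscribers = 1_000

    for share_electronic_only in (0.0, 0.25, 0.50):
        dropped_print = share_electronic_only * subscribers
        total = print_only_total + electronic_addon - marginal_print_cost * dropped_print
        print(f"{share_electronic_only:>4.0%} electronic-only: total = ${total:,.0f} "
              f"({total / print_only_total:.0%} of print-only)")
    # Even with half the subscribers off print, the saving (1,000 * 0.5 * $15 = $7,500)
    # is far smaller than the $30,000 add-on, so total cost stays well above 100%.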

I would now like to discuss another approach to electronic publishing, another way to obtain electronic materials and to do electronic dissemination. This process is quite different from the one I have just described, with different cost structures and different total costs. The process is outlined in Table 10.5. In this process, data are converted to SGML form in the earliest stages of editing. Then the SGML database is used to derive both the typeset output for hard copy printing and the electronic materials for electronic dissemination.
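The economics of this approach turn on deriving every output from a single structured source. The following is a minimal single-source sketch, not the Press's production system: it substitutes a small XML-conformant fragment for a full SGML document type definition and uses only the Python standard library to derive an HTML rendering and a plain-text stand-in for the typeset galley. Production systems of the period relied on SGML tools and dedicated typesetting engines for the same division of labor.

    # A minimal single-source sketch in the spirit of Model Two: one marked-up file
    # drives both the on-line (HTML) output and a print-oriented proof.

    import xml.etree.ElementTree as ET

    ARTICLE = """<article>
      <title>On First-Copy Costs</title>
      <author>A. Scholar</author>
      <abstract>Electronic editions can be derived from the same source as print.</abstract>
      <section><head>Introduction</head><p>Mark up once, publish twice.</p></section>
    </article>"""

    def to_html(root):
        """Derive the electronic (HTML) edition from the marked-up source."""
        parts = [f"<h1>{root.findtext('title')}</h1>",
                 f"<p><em>{root.findtext('author')}</em></p>",
                 f"<blockquote>{root.findtext('abstract')}</blockquote>"]
        for sec in root.findall("section"):
            parts.append(f"<h2>{sec.findtext('head')}</h2>")
            parts.extend(f"<p>{p.text}</p>" for p in sec.findall("p"))
        return "\n".join(parts)

    def to_print_proof(root):
        """Derive a plain-text galley, standing in for the typesetting pass."""
        lines = [root.findtext("title").upper(), f"by {root.findtext('author')}", ""]
        for sec in root.findall("section"):
            lines.append(sec.findtext("head"))
            lines.extend("    " + p.text for p in sec.findall("p"))
        return "\n".join(lines)

    root = ET.fromstring(ARTICLE)
    print(to_html(root))
    print("-" * 40)
    print(to_print_proof(root))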

This process generates costs quite different than those for the model we looked at before. The costs are summarized in Table 10.6. Most important, there is a substantial increase in the cost at the beginning of the process, that is, in the conversion of data to SGML form and the editing of it in that format. SGML editing is not easy and it is not cheap. However, because manuscripts are extensively marked
up and formatted in this process, a typeset version can be derived from the SGML database inexpensively, and, of course, the electronic files for distribution in electronic form are also straightforward and inexpensive to derive. Overall, the additional costs for conversion and editing are offset in large part by reductions in typesetting costs.

TABLE 10.5. Process Analysis for Electronic Publishing, Model Two

Edit
  Peer review
  Data conversion to SGML
  MS edit in SGML
Derive e-materials from SGML
Typeset from SGML
Print and mail
Other
Manage e-services

TABLE 10.6. Cost Analysis for Electronic Publishing, Model Two

Edit
  Peer review
  Data conversion to SGML         Additional cost
  MS edit in SGML                 Additional cost
Derive e-materials from SGML      New cost, less than for Model One
Typeset from SGML                 Reduced cost
Print and mail
Other
Manage e-services                 New cost

This process is the approach that we have undertaken with The Astrophysical Journal at the University of Chicago Press and are now implementing for other publications. The Astrophysical Journal, sponsored by the American Astronomical Society, is the world's leading publication in astronomy, issuing some 25,000 pages each year, in both print and on-line editions. The conclusions we have reached in our efforts for that journal are that a reduction in the typesetting costs can offset other additional costs and that this method of producing the journal is less expensive than any alternative way of generating the electronic materials that we want to obtain for the on-line edition.

These general conclusions are probably applicable to most scientific and technical journals, because this method, based on processing in SGML form, results in substantial reductions in the cost of typesetting tabular and mathematical matter. For those publications, we will be able to produce electronic editions for, at most,
10% more than the cost of producing print editions alone. In some cases it may be possible to produce electronic versions in addition to the print versions at no additional total cost.

Let me add one other point. Because we are converting manuscripts to SGML immediately and editing in SGML, we can obtain materials for electronic distribution much faster than in the traditional print model. In 1997 we published papers in the on-line edition of The Astrophysical Journal Letters 14 days after acceptance by the editor. That turnaround is possible because we obtain the electronic version immediately from our SGML database and do not derive it by postprocessing typesetting files.

In sum, this process will allow us, in certain circumstances, to publish complex scientific material in a sophisticated electronic version both less expensively and more rapidly than by employing alternative means. This sort of processing is an important alternative approach to electronic publishing.



PART THREE—
USE OF ELECTRONIC JOURNALS AND BOOKS: EMPIRICAL STUDIES



Chapter 11—
Analysis of JSTOR
The Impact on Scholarly Practice of Access to On-line Journal Archives

Thomas A. Finholt and JoAnn Brooks

Innovations introduced over the past thirty years, such as computerized library catalogs and on-line citation indexes, have transformed scholarly practice. Today, the dramatic growth of worldwide computer networks raises the possibility for further changes in how scholars work. For example, attention has focused on the Internet as an unprecedented mechanism for expanding access to scholarly documents through electronic journals (Olsen 1994; Odlyzko 1995), digital libraries (Fox et al. 1995), and archives of prepublication reports (Taubes 1993). Unfortunately, the rapid evolution of the Internet makes it difficult to accurately predict which of the many experiments in digital provision of scholarly content will succeed. As an illustration, electronic journals have received only modest acceptance by scholars (Kling and Covi 1996). Accurate assessment of the scholarly impact of the Internet requires attention to experiments that combine a high probability of success with the capacity for quick dissemination. According to these criteria, digital journal archives deserve further examination. A digital journal archive provides on-line access to the entire digitized back archive of a paper journal. Traditionally, scholars make heavy use of journal back archives in the form of bound periodicals. Therefore, providing back archive content on-line may significantly enhance access to a resource already in high demand. Further, studying the use of experimental digital journal archives may offer important insight into the design and functionality of a critical Internet-based research tool. This paper, then, reports on the experience of social scientists using JSTOR, a prototype World Wide Web application for viewing and printing the back archives of ten core journals in history and economics.

The JSTOR System

JSTOR represents an experiment in the technology, politics, and economics of online provision of journal content. Details of JSTOR's evolution and development
are covered elsewhere in this volume (see chapter 7). At the time of this study, early 1996, the faculty audience for JSTOR consisted of economists, historians, and ecologists-reflecting the content of JSTOR at that time. This paper focuses on reports of JSTOR use shortly after the system became officially available at the test sites. Respondents included historians and economists at five private liberal arts colleges (Bryn Mawr College, Denison University, Haverford College, Swarthmore College, and Williams College) and one public research university (the University of Michigan). The core economics journals in JSTOR at the time of this study included American Economic Review, Econometrica, Quarterly Journal of Economics, Journal of Political Economy, and Review of Economics and Statistics. The core history journals included American Historical Review, Journal of American History, Journal of Modern History, William and Mary Quarterly, and Speculum. In the future, JSTOR will expand to include more than 150 journal titles covering dozens of disciplines.

Journal Use in the Social Sciences

To understand JSTOR use requires a general sense of how social scientists seek and use scholarly information. In practice, social scientists apply five main search strategies. First, social scientists use library catalogs. Broadbent (1986) found that 69% of a sample of historians used a card catalog when seeking information, while Lougee, Sandler, and Parker (1990) found that 97% of a sample of social scientists used a card catalog. Second, journal articles are a primary mechanism for communication among social scientists (Garvey 1979; Garvey, Lin, and Nelson 1970). For example, in a study of social science faculty at a large state university, Stenstrom and McBride (1979) found that a majority of the social scientists used citations in articles to locate information. Third, social scientists use indexes and specialty publications to locate information. As an illustration, Stenstrom and McBride found that 55% of social scientists in their sample reported at least occasional use of subject bibliographies, and 50% reported at least occasional use of abstracting journals. Similarly, Olsen (1994) found that in a sample of sociologists, 37.5% reported regular use of annual reviews. Fourth, social scientists browse library shelves. For instance, Lougee et al. and Broadbent both found that social scientists preferred to locate materials by browsing shelves. Sabine and Sabine (1986) found that 20% of a sample of faculty library users reported locating their most recently accessed journal via browsing. On a related note, Stenstrom and McBride found that social scientists used departmental libraries more heavily than the general university library. Finally, social scientists rely on the advice of colleagues and students. For example, various studies show that colleagues have particular value when searching for a specific piece of information (Stenstrom and McBride; Broadbent; Simpson 1988). Also, students working on research projects often locate background material that social scientists find useful (Olsen; Simpson). Simi-
larly, faculty report a valuable but infrequent role for librarians in seeking information (Stenstrom and McBride; Broadbent; Lougee et al.).

Computer-based tools do not figure prominently in the preceding description of how social scientists search for scholarly information. Results from previous studies show that the primary application of digital information technology for social scientists consists of computerized searching, which social scientists do at lower rates than physical scientists but at higher rates than humanists (Lougee et al. 1990; Olsen 1994; Broadbent 1986). Lougee et al. and Olsen both report sparse use of on-line catalogs by social scientists. Evidence of the impact of demographic characteristics on use of digital resources is mixed. For example, Lougee et al. found a negative correlation between age and use of digital information technology, while Stenstrom and McBride (1979) found no correlation. Finally, in a comparison of e-mail use by social scientists and humanists, Olsen found higher use rates among the social scientists, apparently correlated with superior access to technology.

In terms of journal access, previous studies indicate that economics faculty tend to subscribe to more journals than do faculty in other social science disciplines (Simpson 1988; Schuegraf and van Bommel 1994). Journal subscriptions are often associated with membership in a professional society. For example, in their analysis of a liberal arts faculty, Schuegraf and van Bommel found that 40.9% of faculty journal subscriptions-including 12 of the 15 most frequently subscribed-to journals-came with society memberships. Stenstrom and McBride (1979) found that membership-related subscriptions often overlapped with library holdings. However, according to Schuegraf and van Bommel, other personal subscriptions included journals not held in library collections. In terms of journal use, Sabine and Sabine (1986) found that only 4% of faculty in their sample reported reading the entire contents of journals, while 9% reported reading single articles, and 87% reported reading only small parts, such as abstracts. Similarly, at least among a sample of sociologists, Olsen (1994) found that all respondents reported using abstracts to determine whether to read an article. Having found a relevant article, faculty often make copies. For instance, Sabine and Sabine found that 47% of their respondents had photocopied the most recently read journal article, Simpson found that 60% of sampled faculty reported "always" making copies, and all the sociologists in Olsen's sample reported copying important articles.

Goals of this Study

The research described above consists of work conducted prior to the advent of the World Wide Web and widespread access to the Internet. Several recent studies suggest that Internet use can change scholarly practice (Finholt and Olson 1997; Hesse, Sproull, and Kiesler 1993; Walsh and Bayma 1997; Carley and Wendt 1991). However, most of these studies focused on physical scientists. A key goal of this study is to create a snapshot of the effect of Internet use on social scientists,
specifically baseline use of JSTOR. Therefore, the sections that follow will address core questions about the behavior of JSTOR users, including: (1) how faculty searched for information; (2) which faculty used JSTOR; (3) how journals were used; (4) how the Internet was used; and (5) how journal use and Internet use correlated with JSTOR use.

Method

Participants

The population for this study consisted of the history and economics faculty at the University of Michigan and at five liberal arts colleges: Bryn Mawr College, Denison University, Haverford College, Swarthmore College, and Williams College. History and economics faculty were targeted because the initial JSTOR selections drew on ten journals, reflecting five core journals in each of these disciplines. The institutions were selected based on their status as Andrew W. Mellon Foundation grant recipients for the JSTOR project.

Potential respondents were identified from the roster of full-time history and economics faculty at each institution. With the permission of the respective department chairs at each school, faculty were invited to participate in the JSTOR study by completing a questionnaire. No incentives were offered for respondents, and participation was voluntary. Respondents were told that answers would be confidential, but not anonymous due to plans for matching responses longitudinally. The resulting sample contained 161 respondents representing a response rate of 61%. In this sample, 46% of the respondents were economists, 76% were male, and 48% worked at the University of Michigan. The average respondent was 47.4 years old and had a Ph.D. granted in 1979.

Design and Procedure

Respondents completed a 52-item questionnaire with questions on journal use, computer use, attitudes toward computing, information search behavior, demographic characteristics, and JSTOR use. Respondents had the choice of completing this questionnaire via a telephone interview, via the Web, or via a hard-copy version. Questionnaires were administered to faculty at the five liberal arts colleges and to the faculty at the University of Michigan in the spring of 1996.

Journal Use Journal use was assessed in four ways. First, respondents reported how they traditionally accessed the journal titles held in JSTOR, choosing from: no use; at the library; through a paid subscription; or through a subscription received with membership in a professional society. Second, respondents ranked the journals they used in order of frequency of use for a maximum of ten journals. For each of these journals, respondents indicated whether they had a personal subscription to the journal. Third, respondents described their general use of
journals in terms of the frequency of browsing journal contents, photocopying journal contents, saving journal contents, putting journal contents on reserve, or passing journal contents along to colleagues (measured on a 5-point scale, where 1 = never, 2 = rarely, 3 = sometimes, 4 = frequently, and 5 = always). Finally, respondents indicated the sections of journals they used, including the table of contents, article abstracts, articles, book reviews, reference lists, and editorials.

Computer Use Computer use was assessed in three ways. First, respondents described their computer systems in terms of the type of computer (laptop versus desktop), the computer family (e.g., Apple versus DOS), the specific model (e.g., PowerPC), and the operating system (e.g., Windows 95). Second, respondents reported their level of use via a direct network connection (e.g., Ethernet) of the World Wide Web, e-mail, databases, on-line library catalogs, and FTP (measured on a 5-point scale, where 1 = never, 2 = 2-3 times per year, 3 = monthly, 4 = weekly, and 5 = daily). Finally, respondents reported their level of use via a modem connection of the Web, e-mail, databases, on-line library catalogs, and FTP (using the same scale as above).

Attitudes toward Computing Attitudes toward computing were assessed by respondents' reported level of agreement with statements about personal computer literacy, computer literacy relative to others, interest in computers, the importance of computers, confusion experienced while using computers, and the importance of programming knowledge (measured on a 5-point scale, where 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, and 5 = strongly agree).

Information Search Behavior Information search behavior was assessed in three ways. First, respondents indicated their use of general search strategies, including: searching/browsing on-line library catalogs; searching/browsing paper library catalogs; browsing library shelves; searching/browsing on-line indexes; searching/browsing paper indexes; browsing departmental collections; reading citations from articles; and consulting colleagues. Second, respondents described the frequency of literature searches within their own field and the frequency of on-line literature searches within their own field (both measured on a 5-point scale, where 1 = never, 2 = 2-3 times per year, 3 = monthly, 4 = weekly, and 5 = daily). Finally, respondents described the frequency of literature searches outside their field and the frequency of on-line literature searches outside their field (measured on the same 5-point scale used above).

Demographic Characteristics Respondents were asked to provide information on demographic characteristics, including age, sex, disciplinary affiliation, institutional affiliation, highest degree attained, and year of highest degree.

JSTOR Use Finally, JSTOR use was assessed in two ways. First, respondents reported whether they had access to JSTOR. Second, respondents described the
frequency of JSTOR use (measured on a 5-point scale, where 1 = never, 2 = 2-3 times per year, 3 = monthly, 4 = weekly, and 5 = daily).

Results

The data were analyzed to address five core questions related to the impact of JSTOR: (1) how faculty searched for information; (2) which faculty used JSTOR; (3) how journals were used; (4) how the Internet was used; and (5) how journal use and Internet use correlated with JSTOR use.

Information Searching

Table 11.1 summarizes data on how faculty searched for information. The proportion of faculty using the search strategies did not differ significantly by institution or discipline, with the exception of three strategies. First, the proportion of Michigan economists who reported browsing library shelves (46%) was significantly less than the proportion of five-college historians who used this strategy (86%). Second, the proportion of Michigan economists who reported searching card catalogs (14%) was significantly less than the proportion of five-college historians who used this strategy (39%). And finally, the proportion of Michigan economists who reported browsing departmental collections (48%) was significantly greater than the proportion of five-college historians who used this strategy (4%).[1]

Who Used JSTOR

Overall, 67% of the faculty did not use JSTOR,[2] 14% used JSTOR once a year, 11% used JSTOR once a month, and 8% used JSTOR once a week. None of the faculty used JSTOR daily. Table 11.2 summarizes JSTOR frequency of use by type of institution and discipline. A comparison of use by type of institution shows a higher proportion of JSTOR users at the five colleges (42%) than at the University of Michigan (27%). A further breakdown by discipline shows that the five college economists had the highest proportion of users (46%), followed by the Michigan economists (40%), the five-college historians (39%), and the Michigan historians (16%). One way to put JSTOR use into perspective is to compare this activity with similar, more familiar on-line activities, such as literature searching. Overall, 21% of the faculty did not do on-line searches, 25% searched once a year, 25% searched once a month, 25% searched once a week, and 4% searched daily. Table 11.3 summarizes data on the frequency of on-line searching by type of institution and discipline for the same faculty described in Table 11.2. A comparison of on-line searching by type of institution shows a higher proportion of on-line searchers at the five colleges (85%) than at the University of Michigan (76%). A further breakdown by discipline shows that the five-college economists had the highest proportion of searchers (89%), followed by the five-college historians (82%), and the Michigan economists and historians (both 76%).


 

TABLE 11.1. Percentage of Faculty by Search Strategy, Type of Institution, and Discipline (n = 151a)

                                              University of Michigan       Five Colleges
Search Strategies                             Economics    History         Economics    History
                                              (n = 44)     (n = 54)        (n = 25)     (n = 28)
Use citations from related publications         84%          96%            100%         100%
Consult a colleague                             93%          85%             96%          89%
Search electronic catalogs for a known item     80%          89%             88%          89%
Browse library shelves                          46%a         83%             72%          86%b
Browse electronic catalogs                      57%          56%             80%          79%
Use electronic indexes                          59%          59%             84%          64%
Use printed indexes                             34%          57%             64%          82%
Search card catalogs for a known item           14%a         32%             17%          39%b
Browse departmental collections                 48%a         11%             20%           4%b
Browse card catalogs                             2%          20%             24%          25%

Note: Means with different subscripts differ significantly at p < .01 in the Tukey honestly significant difference test.
a Nine cases were unusable due to incomplete data.

 

TABLE 11.2. Percentage of Faculty by Frequency of JSTOR Use, Type of Institution, and Discipline (n = 147a)

                        University of Michigan                 Five Colleges
Frequency of Use        Overall    Economics   History         Overall    Economics   History
                        (n = 93)   (n = 43)    (n = 50)        (n = 54)   (n = 26)    (n = 28)
neverb                    73%        60%         84%             58%        54%         61%
once a year               12%        17%          8%             17%        15%         18%
once a month               9%        14%          4%             14%        19%         10%
once a week                6%         9%          4%             11%        12%         11%
daily                      0%         0%          0%              0%         0%          0%

a Thirteen cases were unusable due to incomplete data.
b The "never" category also includes faculty who were unaware of JSTOR.

Figure 11.1 shows a plot of the cumulative percentage of faculty per institution who used JSTOR and who did on-line searches versus the frequency of these activities. For example, looking at the values plotted on the y-axis against the "Monthly" category shows that over three times as many Michigan faculty searched once a month or more (51%) compared with those who used JSTOR at least once a month (15%). Similarly, over two times as many of the five-college faculty searched once a month or more (62%) compared with those who used JSTOR at least once a month (25%). A further breakdown by discipline shows that
over twice as many of the five-college economists searched once a month or more (73%) than used JSTOR at least once a month (31%), that over six times as many of the Michigan historians searched once a month or more (54%) than used JSTOR at least once a month (8%), that over twice as many of the five-college historians searched once a month or more (50%) than used JSTOR at least once a month (21%), and that over twice as many of the Michigan economists searched once a month or more (48%) than used JSTOR at least once a month (23%).

TABLE 11.3. Percentage of Faculty by Frequency of On-Line Searching, Type of Institution, and Discipline (n = 147a)

                          University of Michigan                 Five Colleges
Frequency of Searches     Overall    Economics   History         Overall    Economics   History
                          (n = 93)   (n = 43)    (n = 50)        (n = 54)   (n = 26)    (n = 28)
never                       24%        24%         24%             15%        11%         18%
once a year                 25%        28%         22%             24%        16%         32%
once a month                25%        22%         28%             26%        34%         18%
once a week                 23%        19%         26%             30%        35%         25%
daily                        3%         7%          0%              6%         4%          7%

a Thirteen cases were unusable due to incomplete data.

Journal Use

Table 11.4 summarizes how faculty used features of journals. Across all journal features, patterns of use were similar except in two areas. First, the proportion of Michigan historians who used article abstracts (31%) was significantly smaller than the proportion of Michigan economists (81%), five-college economists (89%), and five-college historians (61%) who used abstracts. Second, the proportion of Michigan economists who used book reviews (49%) was significantly smaller than the proportion of five-college historians (100%), Michigan historians (98%), and five college economists (85%) who used book reviews.

Overall, faculty in the sample reported that they regularly used 8.7 journals, that they subscribed to 4.1 of these journals, and that 2.2 of these journals were also in JSTOR. Table 11.5 summarizes journal use by institution and discipline. There were no significant differences in the number of journals used across institution and discipline, although Michigan historians reported using the most journals (8.9). There were also no significant differences across institution and discipline in the number of paid journal subscriptions among the journals used, although again Michigan historians reported having the most paid subscriptions (4.6). There was a significant difference in the number of journals used regularly by the economists that were also titles in JSTOR (M = 2.9) compared with those used by the historians ([M = 1.7], t [158] = 5.71, p < .01).


[Figure 11.1. Cumulative percentage of on-line searchers versus JSTOR users, by frequency of use and type of institution (n = 147)]

Further examination of differences in use of journals shows a much greater consensus among the economists about the importance of the economics journals in JSTOR than among the historians about the history journals in JSTOR. For example, Table 11.6 shows the economists' ranking in order of use of the five economics journals chosen for JSTOR. The American Economic Review was cited among the top ten most frequently used journals by over 75% of both the Michigan and the five-college economists; the Journal of Political Economy was cited
among the top ten by over 60% of both the Michigan and the five-college economists; and the Quarterly Journal of Economics and the Review of Economics and Statistics were cited among the top ten by over 50% of the Michigan economists and by over 40% of the five-college economists. By contrast, Table 11.7 shows the historians' ranking in order of use of the five history journals chosen for JSTOR. The American Historical Review was cited among the top ten most frequently used journals by over 60% of both the Michigan and the five-college historians. However, none of the other four journals were used by a majority of the historians at Michigan or at the five colleges.

TABLE 11.4. Percentage of Faculty by Use of Journal Features, Institution, and Discipline (n = 159a)

                       University of Michigan       Five Colleges
Journal Feature        Economics    History         Economics    History
                       (n = 47)     (n = 58)        (n = 26)     (n = 28)
Articles                 96%          98%            100%         100%
Tables of contents       81%          86%            100%          96%
Bibliographies           60%          71%             89%          82%
Book reviews             49%b         98%a            85%a        100%a
Article abstracts        81%a         31%b            89%a         61%a
Editorials               13%          24%             35%          43%

Note: Means with different subscripts differ significantly at p < .01 in the Tukey honestly significant difference test.
a One case was unusable due to incomplete data.

TABLE 11.5. Number of Journals Used, Number of Paid Subscriptions, and Number of JSTOR Target Journals by Institution and Discipline (n = 160)

                                         University of Michigan       Five Colleges
Journals Used                            Economics    History         Economics    History
                                         (n = 48)     (n = 58)        (n = 26)     (n = 28)
Total                                      8.6          8.9             8.4          8.7
Number that are paid subscriptions         3.7          4.6             4.0          3.6
Number that are JSTOR target journals      3.1a         1.6b            2.5          1.9b

Note: Means with different subscripts differ significantly at p < .01 in the Tukey honestly significant difference test.

TABLE 11.6. Percentage of Economics Faculty Ranking JSTOR Economics Journals as Top Five Most Frequently Used, Next Five Most Frequently Used, and Not Used (n = 74)

                                      University of Michigan (n = 48)      Five Colleges (n = 26)
Journal                               Top Five   Next Five   Not Used      Top Five   Next Five   Not Used
American Economic Review                79%         6%         15%           66%        15%         19%
Journal of Political Economy            52%        10%         38%           32%        26%         42%
Quarterly Journal of Economics          41%        15%         44%           16%        26%         58%
Econometrica                            26%        30%         44%            8%        15%         77%
Review of Economics and Statistics      18%        28%         54%           12%        34%         54%

Internet Use

Overall, faculty reported weekly use of e-mail (M = 4.3), monthly use of on-line catalogs (M = 3.2) and the Web (M = 3.0), and two or three uses per year of FTP (M = 2.3) and on-line database (M = 2.1). Table 11.8 summarizes the use of these Internet applications by institution and discipline. In terms of e-mail use, Michigan historians (M = 3.3) were significantly lower than the Michigan economists (M = 4.9), the five-college economists (M = 5.0), and the five-college historians (M = 4.7). In terms of World Wide Web use, Michigan historians (M = 1.8) were significantly lower than everyone, while the five-college historians (M = 2.9) were significantly lower than the five-college economists (M = 4.2) and the Michigan economists (M = 3.9). In terms of FTP use, the Michigan historians (M = 1.4) and the five-college historians (M = 1.7) differed significantly from the Michigan economists (M = 3.4) and the five-college economists (M = 2.7). In terms of on-line database use, the Michigan historians (M = 1.6) were significantly lower than the five-college economists (M = 2.9). Faculty did not differ significantly in terms of on-line catalog use.

The Relationship of Journal and Internet Use to JSTOR Use

Examination of the frequency of JSTOR use among faculty aware of JSTOR (n = 78) showed that 58% of the respondents had varying levels of use, while 42% reported no use. Using the frequency of JSTOR use as the dependent variable, the faculty who reported no use were censored on the dependent variable. The standard zero, lower-bound tobit model was designed for this circumstance (Tobin 1958). Most important, by adjusting for censoring, the tobit model allows inclusion of negative cases in the analysis of variation in frequency of use among positive cases, which greatly enhances degrees of freedom. Therefore, hierarchical tobit regression analyses were used to examine the influence of demographic characteristics, journal use, search preferences, Internet use, and attitude toward computing on the frequency of JSTOR use. Independent variables used in these analyses were selected on the basis of significance in univariate tobit regressions
on the frequency of use variable. Table 11.9 summarizes the independent variables used in the multiple tobit regression analyses.

TABLE 11.7. Percentage of History Faculty Ranking JSTOR History Journals as Top Five Most Frequently Used, Next Five Most Frequently Used, and Not Used (n = 86)

                               University of Michigan (n = 58)      Five Colleges (n = 28)
Journal                        Top Five   Next Five   Not Used      Top Five   Next Five   Not Used
American Historical Review       44%        19%         37%           58%        24%         18%
Journal of American History      31%         6%         63%           39%         4%         57%
Journal of Modern History        15%        10%         75%           18%        11%         71%
William and Mary Quarterly       13%         6%         81%           15%         3%         82%
Speculum                          9%         3%         88%           11%        10%         79%

TABLE 11.8. Mean Frequency of Computer Application Use over Direct Connection (High-Speed Network) by Institution and Discipline (n = 158a)

                                   University of Michigan       Five Colleges
Computer Application               Economics    History         Economics    History
                                   (n = 47)     (n = 57)        (n = 26)     (n = 28)
E-mail                               4.9a         3.3b            5.0a         4.7a
On-line catalogs                     3.3          2.8             3.6          3.7
On-line databases                    2.3          1.6a            2.9b         2.1
World Wide Web                       3.9a         1.8b            4.2a         2.9c
File Transfer Protocol (FTP)         3.4a         1.4b            2.7a         1.7b

Note: Frequency of use was reported on a 5-point scale (1 = never; 2 = 2-3 times per year; 3 = monthly; 4 = weekly; 5 = daily). Means with different subscripts differ significantly at p < .01 in the Tukey honestly significant difference test.
a Two cases were unusable due to incomplete data.

Table 11.10 summarizes the results of the hierarchical tobit regression of demographic, journal use, search preference, Internet use, and computing attitude variables on frequency of JSTOR use. The line second from the bottom in Table 11.10 summarizes the log likelihood score for each model. Analysis of the change in log likelihood score between adjacent models gives a measure of the significance of independent variables added to the model. For example, in Model 1, the addition of the demographic variables failed to produce a significant change in the log likelihood score compared to the null model. By contrast, in Model 2, the addition of journal use variables produced a significant change in the log likelihood score compared to Model 1-suggesting that the addition of the journal
use variables improved the fit in Model 2 over Model 1. Similarly, the addition of search variables in Model 3 and of Internet use variables in Model 4 both produced significant improvements in fit, but the addition of the computer attitude variable in Model 5 did not. Therefore, Model 4 was selected as the best model. From Model 4, the coefficients for gender, article copying, abstract reading, and searching on-line catalogs are all positive and significant. These results suggest that, controlling for other factors, men were 0.77 points higher on frequency of JSTOR use than were women, that there was a 0.29-point increase in the frequency of JSTOR use for every point increase in the frequency of article copying, that faculty who read article abstracts were 0.82 points higher on frequency of JSTOR use than were faculty who didn't read abstracts, and that there was a 1.13-point increase in the frequency of JSTOR use for every point increase in the frequency of on-line catalog searching. From Model 4, the coefficients for affiliation with an economics department and the number of paid journal subscriptions are both negative and significant. These results suggest that, controlling for other factors, economists were 0.88 points lower on frequency of JSTOR use than were historians and that there was a 0.18-point decrease in frequency of JSTOR use for every unit increase in the number of paid journal subscriptions.

TABLE 11.9. Descriptive Statistics for Faculty Aware of JSTOR (n = 78)

Variable                              Mean     Std
At Michigan                           49%       -
In economics                          54%       -
Male                                  82%       -
Years since degree                    17.2     11.5
Copies articles                        3.09     0.91
Puts articles on reserve               2.73     1.15
Reads abstracts                       68%       -
Total # subs., JSTOR                   2.5      1.5
Total # subs., all                     8.8      1.96
# paid subs.                           4.04     2.43
Uses on-line indexes                  60%       -
Searches on-line catalog              85%       -
Browses on-line catalog               65%       -
Frequency of on-line catalog use       3.47     1.25
Frequency of on-line database use      2.33     1.31
Frequency of WWW use                   3.47     1.62
Frequency of FTP use                   2.39     1.42
Attitude toward computing              3.52     0.70
Frequency of JSTOR use                 2.05     2.09
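For readers unfamiliar with the method, the zero, lower-bound tobit specification and the model-comparison statistic can be written out as follows. This is a sketch of the standard formulation, not a reproduction of the authors' estimation code; the chi-square entries in Table 11.10 are consistent with likelihood-ratio tests computed from the reported log likelihoods (Model 2 against Model 1 is shown as an example).

    % Zero, lower-bound tobit: observed frequency of use y_i is the latent index y_i^*
    % censored at zero (nonusers), following Tobin (1958).
    % \phi and \Phi denote the standard normal density and distribution function.
    \begin{align*}
      y_i^* &= \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,
              \qquad \varepsilon_i \sim N(0, \sigma^2), \\
      y_i   &= \max(0,\; y_i^*), \\[1ex]
      \ln L &= \sum_{y_i > 0} \ln\!\left[ \tfrac{1}{\sigma}\,
               \phi\!\left(\tfrac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right) \right]
             + \sum_{y_i = 0} \ln\!\left[ 1 -
               \Phi\!\left(\tfrac{\mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right) \right].
    \end{align*}
    % Nested models are compared with a likelihood-ratio statistic, e.g., Model 2
    % versus Model 1 using the log likelihoods reported in Table 11.10:
    \[
      \chi^2 = 2\,(\ln L_{2} - \ln L_{1}) = 2\,\bigl(-98.08 - (-111.94)\bigr) = 27.72 .
    \]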


 

TABLE 11.10. Tobit Regression on Frequency of JSTOR Use among Faculty Aware of JSTOR (n = 78)

Variable                            Model 1   Model 2    Model 3    Model 4    Model 5
Constant                              0.56     -2.45*    -3.89***   -3.86***   -4.63***
At Michigan                          -0.11       .28        .47        .47        .47
In economics                          0.20      -.73       -.48       -.88*      -.94**
Male                                   .77       .82*       .91**      .77*       .77*
Years since degree                   -0.04**   -0.02      -0.00       0.00       0.00
Copies articles                                  .29        .28        .29*       .29*
Puts articles on reserve                         .28*       .33**      .24        .22
Reads abstracts                                 1.38***    1.22***     .82**      .86**
Total # subs., JSTOR                             .27*       .26*       .21        .23
Total # subs., all                              0.03      -0.02      -0.02      -0.03
# paid subs.                                    -.17**     -.16**     -.18**     -.19**
Uses on-line indexes                                        .37        .22        .25
Searches on-line catalog                                   1.34**     1.13*      1.17*
Browses on-line catalog                                   -0.02       -.15       -.25
Frequency of on-line catalog use                                      0.02       0.01
Frequency of on-line database use                                     0.02      -0.00
Frequency of WWW use                                                   .22        .19
Frequency of FTP use                                                   .20        .15
Attitude toward computing                                                         .31
-Log likelihood                     111.94     98.08      93.56      89.31      88.70
Chi-square                            6.72     27.72***    9.04**     8.5*       1.2

Note: -Log likelihood for the null model = 115.30.

* = p < .10; ** = p < .05; *** = p < .01.
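
The model comparisons reported in the bottom rows of Table 11.10 can be reproduced directly from the -log likelihood values. The short Python sketch below is not part of the original study; it computes the likelihood-ratio chi-square for each step as twice the drop in -log likelihood, with degrees of freedom inferred from the number of variables added at each step.

```python
# A minimal sketch, not the authors' code, of the likelihood-ratio comparisons
# reported in Table 11.10. The -log likelihood values are copied from the table;
# the degrees of freedom are the number of predictors added at each step.
from scipy.stats import chi2

neg_log_lik = {
    "null": 115.30,
    "Model 1": 111.94,   # + 4 demographic variables
    "Model 2": 98.08,    # + 6 journal use variables
    "Model 3": 93.56,    # + 3 search preference variables
    "Model 4": 89.31,    # + 4 Internet use variables
    "Model 5": 88.70,    # + 1 computing attitude variable
}
df_added = {"Model 1": 4, "Model 2": 6, "Model 3": 3, "Model 4": 4, "Model 5": 1}

previous = "null"
for model, k in df_added.items():
    # LR statistic: twice the drop in -log likelihood between nested models
    lr = 2 * (neg_log_lik[previous] - neg_log_lik[model])
    p = chi2.sf(lr, k)
    print(f"{model}: chi-square = {lr:.2f}, df = {k}, p = {p:.4f}")
    previous = model
```

Running this reproduces the chi-square row of the table (6.72, 27.72, 9.04, 8.5, and about 1.2) and yields p-values consistent with the significance levels shown: only the journal use, search, and Internet use blocks improve the fit at conventional levels.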

Discussion

This study addressed five questions related to the preliminary impact of JSTOR: (1) how faculty searched for information; (2) which faculty used JSTOR; (3) how journals were used; (4) how the Internet was used; and (5) how journal use and Internet use correlated with JSTOR use.

Summary of Findings

In terms of how faculty searched for information, results were consistent with earlier findings reported in the literature. Specifically, a strong majority of the faculty reported relying on citations from related publications, on colleagues, on electronic catalogs, and on browsing library shelves when seeking information. Faculty did not differ dramatically in selection of search strategies, except that Michigan


191

economists were less likely to browse library shelves and less likely to search card catalogs.

In terms of JSTOR use, Michigan faculty were less likely to know about JSTOR than were the five-college faculty, and Michigan faculty were less likely to use JSTOR than were the five-college faculty. These results probably reflected the delayed rollout and availability of JSTOR at Michigan. Economists were more likely to use JSTOR than historians were. Of the faculty who reported JSTOR use, frequency of use did not differ dramatically from frequency of use of a related, more traditional technology: on-line searching. That is, 58% of the faculty who used JSTOR said they used JSTOR once a month or more, while 69% of the faculty who did on-line searches reported doing searches once a month or more. Note, however, that over twice as many faculty reported doing on-line searches (75%) as reported use of JSTOR (33%).

In terms of journal use, faculty did not vary greatly in their use of journal features, except that Michigan historians were less likely to use article abstracts and that Michigan economists were less likely to use book reviews. Economists and historians did not differ in the total number of journals used; however, there was greater consensus among the economists about core journals. Specifically, two of the five economics titles included in JSTOR (the American Economic Review and the Journal of Political Economy) were cited among the top 10 most frequently used journals by a majority of the economists, while four of the five titles (the two mentioned above plus the Quarterly Journal of Economics and the Review of Economics and Statistics) were cited among the top 10 most frequently used journals by a majority of the Michigan economists. By contrast, only one of the five history titles included in JSTOR (the American Historical Review) was cited among the top 10 most frequently used journals by a majority of the historians.

In terms of Internet use, the Michigan historians lagged their colleagues in economics at Michigan and the five-college faculty. For example, the Michigan historians reported less use of e-mail, the World Wide Web, FTP, and on-line databases than did the other faculty. The economists were more likely to use FTP and more likely to use the World Wide Web than the historians were. Faculty used online catalogs at similar rates.

In terms of factors correlated with JSTOR use, the tobit regressions showed that a model including demographic factors, journal use factors, search factors, and Internet use factors offered the best fit to the data on frequency of JSTOR use. The addition of the computer attitude variable did not improve the fit of this model. In the best fit model, gender, article copying, abstract reading, and searching on-line catalogs were all positively and significantly related to frequency of JSTOR use. Also from the best fit model, affiliation with an economics department and greater numbers of journal subscriptions were negatively and significantly related to frequency of JSTOR use.


192

Limitations of the Study

These data represent a snapshot of faculty response to JSTOR at an extremely early stage in the evolution of the JSTOR system. In the spring of 1996, JSTOR had been available to the five-college faculty for less than six months, while at Michigan, the system had not yet been officially announced to faculty. Therefore, the results probably underestimate eventual use of the mature JSTOR system. Further, as a survey study, self-reports of use were crude compared to measures that could have been derived from actual behavior. For example, we had intended to match use reports with automated usage statistics from the JSTOR Web servers, but the usage statistics proved too unreliable. Another problem was that the survey contained no items on the frequency of traditional journal use. Therefore, it is unknown whether the low use of JSTOR reported by the faculty reflected dissatisfaction with the technology or simply a low base rate for journal use. Finally, the faculty at Michigan and at the five colleges were atypical in the extent of their access to the Internet and in the modernity of their computing equipment. Faculty with older computers and slower network links would probably be even less likely to use JSTOR.

Implications for the JSTOR Experiment

Although extremely preliminary, these early data suggest trends that merit further exploration as JSTOR expands. First, it is encouraging to discover that among faculty who have used JSTOR, rates of use are already comparable to rates for use of on-line searching-a technology that predates JSTOR by two decades. It will be interesting to see if JSTOR use grows beyond this modest level to equal the use of key Internet applications, like e-mail and Web browsing. Second, there appear to be clear differences in journal use across disciplinary lines. For example, economists focus attention on a smaller set of journals than is the case in history. Therefore, it may be easier to satisfy demand for on-line access to back archives in fields that have one or two flagship journals than in more diverse fields where scholarly attention is divided among dozens of journals. This conclusion may lead commercial providers of back archive content to ignore more diverse disciplines in favor of easier-to-service, focused disciplines. Finally, the negative correlation between the number of journal subscriptions and JSTOR use suggests the possibility of a substitution effect (i.e., JSTOR for paper). However, the significance of this correlation is difficult to determine, since there is no way to know the direction of causality in a cross-sectional study.

Preparation of this article was supported by a grant to the University of Michigan from the Andrew W. Mellon Foundation. JSTOR is the proprietary product of JSTOR, a nonprofit


193

corporation dedicated to provision of digital access to the back archives of scholarly journals. For more information, please consult www.jstor.org.

We gratefully acknowledge the assistance of Kristin Garlock, Marcia Heringa, Christina Maresca, William Mott, Sherry Piontek, Tony Ratanaproeksa, Blake Sloan, and Melissa Stucki in gathering the data for this study. Also, we thank Ann Bishop, Joan Durrance, Kristin Garlock, Kevin Guthrie, Wendy Lougee, Sherry Piontek, Sarah Sully, and the participants of The Andrew W. Mellon Foundation Scholarly Communication and Technology Conference for comments on earlier drafts. Finally, we thank the history and economics faculty of Bryn Mawr College, Denison University, Haverford College, Swarthmore College, the University of Michigan, and Williams College for their patience and cooperation as participants in this research.

Requests for copies should be sent to: (1) Thomas Finholt, Collaboratory for Research on Electronic Work, C-2420 701 Tappan Street, Ann Arbor, MI 48109-1234; or (2) finholt@umich.edu.

References

Broadbent, E. A. (1986). Study of humanities faculty library information seeking behavior. Cataloging and Classification Quarterly, 6, 23-37.

Carley, K., & Wendt, K. (1991). Electronic mail and scientific communication: A study of the SOAR extended research group. Knowledge: Creation, Diffusion, Utilization, 12, 406-440.

Finholt, T. A., & Olson, G. M. (1997). From laboratories to collaboratories: A new organizational form for scientific collaboration. Psychological Science, 8, 28-36.

Fox, E. A., Akscyn, R. M., Furuta, R. K., & Leggett, J. J. (Eds.). (1995). Digital libraries [Special issue]. Communications of the ACM, 38(4).

Garvey, W. D. (1979). Communication: The essence of science. Toronto: Pergamon Press.

Garvey, W. D., Lin, N., & Nelson, C. E. (1970). Communication in the physical and the social sciences. Science, 170, 1166-1173.

Hesse, B. W., Sproull, L. S., & Kiesler, S. B. (1993). Returns to science: Computer networks in oceanography. Communications of the ACM, 36(8), 90-101.

Kling, R., & Covi, L. (1996). Electronic journals and legitimate media. The Information Society, 11, 261-271.

Lougee, W. P., Sandler, M. S., & Parker, L.L. (1990). The Humanistic Scholars Project: A study of attitudes and behavior concerning collection storage and technology. College and Research Libraries, 51, 231-240.

Odlyzko, A. (1995). Tragic loss or good riddance? The impending demise of traditional scholarly journals. International Journal of Human-Computer Studies, 42, 71-122.

Olsen, J. (1994). Electronic journal literature: Implications for scholars. Westport, CT: Mecklermedia.

Sabine, G. A., & Sabine, P. L. (1986). How people use books and journals. Library Quarterly, 56, 399-408.


194

Schuegraf, E. J., & van Bommel, M. F. (1994). An analysis of personal journal subscriptions of university faculty. Part II: Arts and professional programs. Journal of the American Society for Information Science, 45, 477-482.

Simpson, A. (1988). Academic journal usage. British Journal of Academic Librarianship, 3, 25-36.

Stenstrom, P., & McBride, R. B. (1979). Serial use by social science faculty: A survey. College and Research Libraries, 40, 426-431.

Taubes, G. (1993). Publication by electronic mail takes physics by storm. Science, 259, 1246-1248.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

Walsh, J. P., & Bayma, T. (1997). Computer networks and scientific work. In S. B. Kiesler (Ed.), Culture of the Internet. Hillsdale, NJ: Lawrence Erlbaum Associates.


195

Chapter 12—
Patterns of Use for the Bryn Mawr Reviews

Richard Hamilton

Historical Background

Bryn Mawr Classical Review (BMCR), one of the first electronic journals in the humanities, was started in 1990 to provide timely reviews of books in the classics. To lend solidity, a paper version was produced as well, and the two were issued simultaneously until late 1995, when the electronic reviews began to be published individually, more or less as they were received, and the paper versions were issued four times a year. In 1993 a sister journal, Bryn Mawr Medieval Review (BMMR), was created to review books in medieval studies, and the two journals were combined to form the Bryn Mawr Reviews (BMR). After about two years of activity BMMR became dormant, and toward the end of 1996 both its location and management were shifted.[1] Since then it has become tremendously active, at one point even surpassing BMCR in its monthly output.[2] Comparisons should be considered with this history in mind. (For more detail, see chapter 24.)

Data

We have two sets of users: subscribers and gopher hitters.[3] For data from the former we have subscription lists, which are constantly updated, and periodic surveys that we have conducted; for the latter we have monthly reports of gopher hits and gopher hitters (but not what the hitters hit). In considering this data our two main questions have been (1) how are we doing? and (2) how can we afford to keep doing it?

Gopher Reports

Our analysis of the monthly gopher reports has concentrated on the hitters rather than the hits. After experimenting rather fruitlessly in 1995 with microanalysis of


196

the data from the Netherlands and Germany hitter by hitter month by month for a year, we decided to collect only the following monthly figures:

• total number of users

• total by address (country, domain, etc.)

• list of top hits (those reviews that received 15+ hits/month and are over a year old[4])

• list of top hitters (those who use the system 30+/month)

Analysis shows that use has leveled off at a peak of about 3,800 users a month. With a second full year of gopher use to study, we can see the seasonal fluctuation more easily. The one area of growth seems to be non-English foreign sites. If we compare the top hitters in the first ten months of 1995 with the comparable period in 1996, we find that the total increased only 5% but the total number of non-English heavy users increased 120% (Table 12.1). Three countries were among the heavy users in both 1995 and 1996 (France, Germany, Netherlands); two appeared only in 1995 (South Africa, Taiwan), and eight only in 1996 (Brazil, Italy, Ireland, Poland, Portugal, Russia, Spain, Venezuela).
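
The kind of monthly tally this involves is simple to reproduce. The Python sketch below is purely hypothetical (the hostnames are invented and this is not the BMR reporting script); it counts distinct users and groups them by top-level domain, which is the raw material for the per-country and per-domain comparisons discussed here.

```python
# A hypothetical sketch (invented hostnames, not the BMR gopher reports) of the
# monthly tally described above: count distinct users and group them by
# top-level domain, the basis for the per-country comparisons.
from collections import Counter

# one hostname per distinct monthly user, as a gopher report might list them
hitters = [
    "lib.upenn.edu", "athena.mit.edu", "ccat.sas.upenn.edu",
    "sun1.uni-koeln.de", "hermes.leidenuniv.nl", "acme.provider.net",
]

by_domain = Counter(host.rsplit(".", 1)[-1] for host in hitters)
print("total users:", len(hitters))
for domain, count in by_domain.most_common():
    print(f".{domain}: {count}")
```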

In terms of total number of users from 1995 to 1996 there was an overall increase of 10.8%, although the increase among U.S. users was only 9.1%. Conversely, most foreign countries showed a marked increase in total use over the ten months of 1996 versus 1995: Argentina, 16 to 27; Australia, 542 to 684; Brazil, 64 to 165; Denmark, 80 to 102; Spain, 107 to 197; Greece, 41 to 80; Ireland, 50 to 69; Israel, 89 to 108; Italy, 257 to 359; Japan, 167 to 241; Korea, 26 to 40; Netherlands, 273 to 315; Portugal, 16 to 26; Russia, 9 to 27; (former) USSR, 13 to 20; and South Africa, 63 to 88. On the other hand, Iceland went from 22 to 8, Malaysia from 30 to 21, Mexico from 68 to 56, Sweden from 307 to 250, and Taiwan from 24 to 14. Also, among U.S. users there was a large drop in the .edu domain, from 7,073 to 5,962, and a corresponding rise in the .net domain, from 1,570 to 4,118, perhaps because faculty members are now using commercial providers for home access.[5]

In the analysis of top hits (Table 12.2), a curious pattern emerges: BMMR starts out with many more top hits despite offering a much smaller number of reviews (about 15% of BMCR's number), but toward the end of 1995 the pattern shifts and BMMR's share drops as it becomes inactive.

The shift is easily explained because it occurs about the time BMMR was becoming inactive, but the original high density is still surprising.[6] Also surprising is that medieval books receive noticeably more attention: 32 medieval titles made the top hits list 116 times (avg 3.6), while 81 classical titles made the list only 219 times (avg 2.7), despite including two blockbuster titles, Amy Richlin's Pornography and Representation (10 times) and John Riddle's Contraception and Abortion (14 times).[7] My guess is that medievalists, being more widely dispersed in interests and location, have found the Net more important than have classicists, who are mostly located


197
 

TABLE 12.1. BMCR/BMMR Top Hitters (30+ hits a month)

        U.S.   English   Non-English   Total
1995     47       8           5          60
1996     42      10          11          63

 

TABLE 12.2. Top Hits (averaging 15+ hits per month for at least one year)

Month   1/95   2/95   3/95   4/95   5/95   6/95   7/95   8/95   9/95   10/95
BMMR      2     15     10      2      5     16      3     12      41      46
BMCR      1     11      6      3      5     20      1     14     116     170

Month   1/96   2/96   3/96   4/96   5/96   6/96   7/96   8/96   9/96   10/96
BMMR     38     14     15     19      6      9      7      8      20      14
BMCR     81     69     74     50     25     13     16     19      48      54

in a classics department and whose professional work is more circumscribed (and has a longer history).

Subscriptions

Subscriptions to the e-journals continue to grow at a rate of 5% per quarter, although there is considerable seasonal fluctuation (see Table 12.3). Looking more broadly we see a steady slowdown in growth of all but the joint subscriptions (see Table 12.4).

If we look at the individual locations (Table 12.5), we find again that while the U.S. subscriptions continue to grow, they are becoming a steadily smaller share of the whole, going from 77% of the total in 1993 to 68% in 1996. English-speaking foreign countries have remained about the same percentage of the whole; it is non-English-speaking foreign countries that have shown the greatest increase, going from 4% of the total in 1993 to 13% of the total in 1996.

Subscriber Surveys

As opposed to the gopher statistics, which give breadth but little depth, our surveys offer the opportunity for deeper study of our users but at the expense of breadth. We cannot survey our subscribers too often or they will not respond.[8] A further limitation is that we felt we could not survey those who take both BMCR and BMMR, a significant number, without skewing the results, since many subscribers lean heavily toward one journal or the other and the journals are significantly different in some ways. So far we have conducted five surveys:


198
 

TABLE 12.3. Subscriptions over Two Years

         3/95    6/95           9/95           3/96           6/96          10/96          3/97
BMCR    1,072   1,067 (-.4%)   1,135 (+6%)    1,253 (+10%)   1,273 (+2%)   1,317 (+3%)    1,420 (+8%)
BMMR      711     755 (+6%)      865 (+13%)     931 (+8%)      964 (+4%)     995 (+3%)    1,091 (+10%)
Joint     568     562 (-1%)      599 (+7%)      672 (+12%)     685 (+2%)     770 (+12%)     844 (+10%)
Total   2,351   2,384 (+1%)    2,599 (+9%)    2,856 (+10%)   2,922 (+2%)   3,082 (+5%)    3,355 (+9%)

 

TABLE 12.4. Subscriptions over Three Years

         9/93    9/94          9/95           10/96
BMCR      651    882 (+35%)   1,135 (+29%)   1,317 (+16%)
BMMR      257    498 (+94%)     865 (+74%)     995 (+15%)
Joint     261    460 (+76%)     599 (+30%)     770 (+29%)

1. a 20-question survey in November 1995 to BMCR subscribers

2. a 21-question survey in February 1996 to BMMR subscribers

3. a 2-question survey in October 1996 to all subscribers[9]

4. a 15-question survey in January 1997 to all BMCR reviewers whose e-mail addresses we knew

5. a 2-question survey in March 1997 to those who have canceled subscriptions in the past year

Table 12.6 presents the subscriber profile as revealed in the surveys. Many of the differences are easily explained by the checkered history of BMMR or by the differing natures of the two readerships.[10] I doubt many readers will be surprised to learn that medievalists are more often female and less often faculty. The paucity of reader-reviewers of BMMR reflects the paucity of BMMR reviews. To me, the most surprising statistic is the low use of gopher by subscribers to either journal.

The key question, of course, is willingness to pay for subscriptions. With that in mind, we did some correlation studies for the BMCR survey, first seeing what variables correlated with a willingness to pay $5 for a subscription.[11] We found posi-


199
 

TABLE 12.5. BMCR Subscribers

                         1993         1994         1995         1996
Total                     730        1,019        1,130        1,349
.edu                      529          701          703          779
.com                       22           44           72          103
.gov                        3            6            4            4
.mil                        2            2            2            2
.org                        5            6            7           12
.net                        3            5            8           17
U.S. Total            564 (77%)    764 (75%)    796 (70%)    917 (68%)
Foreign Total             154          254          332          428
.ca                        58           87          106          114
.uk                        31           45           57           77
.au                        21           33           38           43
.nz                         4            6            7            6
.za                         8           12           14           18
.ca/.uk/.au/.nz/.za   122 (17%)    183 (18%)    222 (20%)    258 (19%)
Non-English            32 (4%)      71 (7%)    110 (10%)    170 (13%)
.de                         5           11           16           27
.nl                         7           10           16           24
.ie                         1            4            5            5
.fi                         3            8            9           12
.br                         0            2            2            2
.fr                         1            4            7            9
.es                         0            0            1            3
.it                         2            4            7           17
.hu                         0            2            2            2
.ve                         1            1            1            1
.se                         3            4            6            7
.gr                         0            1            3            8
.il                         2            6           11           14
.dk                         1            1            1            0
.no                         3            4            4            4
.kr                         0            0            1            1
.be                         0            2            5            7
.us                         0            2            2            4
.jp                         1            2            3            4
.ch                         1            2            4           12
.pt                         0            0            1            1
.at                         0            0            1            2
.hk                         0            1            1            1
.my                         0            0            1            1
.tr                         0            0            1            1
.pl                         0            0            0            2


200
 

TABLE 12.6. Subscriber Profiles

                                       BMCR (%)   BMMR (%)
Male                                     74.1       52.8
Female                                   25.9       47.2
High school degree                         .5        2.6
A.B.                                      5.6        9.8
M.A.                                     11.1       15.9
ABD                                      13.0       18.5
Ph.D.                                    67.7       50.6
M.D., etc.                                2.1        2.6
No academic affiliation                   4.7        9.7
Faculty                                  65.5       45.0
Adjunct, research                         7.1        6.6
Grad student                             15.3       24.1
Undergrad                                  .8        2.3
Other                                     6.6       12.3
Check e-mail daily                       90.8       87.4
Read review on-screen                    68.2       65.8
Print immediately                         6.7        6.1
Read on-screen to decide                 25.1       27.3
Never/rarely delete without reading      83.7       86.3
Make printed copy sometimes/often        57.2       52.5
Copy on disk sometimes/often             52.0       51.6
Have used gopher                         42.5       16.0
Reviewed for this journal                25.1        9.6
Heard reference to this journal          71.5       31.0
Start no/a few reviews                   20.2       20.7
Start many/most reviews                  70.5       65.8
Start almost all reviews                  9.3       13.5
Finish no/a few reviews                  43.1       42.2
Finish many/most reviews                 53.8       54.6
Finish almost all reviews                 3.1        3.2
Review useful for teaching               56.9       45.5
Review useful for research               88.1       81.2
Willing to pay $5 subscription           69.8       53.8

tive correlation (Pearson product-moment correlation) with the following categories:

• ever found review useful for teaching (r = .19, .00037 likelihood of a chance correlation)

• ever found review useful for research (r = .21, .00005)

• ever hear a reference to BMCR (r = .23, .00001)

• ever written a review for BMCR (r = .17, .00084)


201

Further correlations were found, some not at all surprising (a brief computational sketch follows this list):

• start to read many/most reviews//heard a reference to BMCR (r = .20, .00014)

• willing to review//heard a reference to BMCR (r = .22, .00002)

• get paper BMCR//have written review (r = .22, .00002)

• have written review//will write in future (r = .24, .00000)

• will write in future//library gets BMCR (r = .21, .00005)

• Ph.D.//willing to review (r = .24, .00000)

• institutional affiliation//useful for teaching (r = .21, .00009)

• useful for teaching//useful for research (r = .25, .00000)

• heard a reference//willing to review (r = .22, .00002)
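
As a minimal illustration of how the figures above pair up, the Python sketch below (hypothetical data, not the survey's actual analysis) computes a Pearson product-moment correlation between two yes/no survey items; the two-tailed p-value plays the role of the "likelihood of a chance correlation" quoted in parentheses.

```python
# A minimal illustration, with hypothetical data, of the statistics quoted above:
# a Pearson product-moment correlation between two yes/no survey items, where the
# two-tailed p-value serves as the "likelihood of a chance correlation."
from scipy.stats import pearsonr

# 1 = yes, 0 = no; one entry per (invented) respondent
useful_for_research = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
willing_to_pay_5    = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]

r, p = pearsonr(useful_for_research, willing_to_pay_5)
print(f"r = {r:.2f}, likelihood of a chance correlation = {p:.5f}")
```

With several hundred respondents, even modest correlations in the r = .17 to .25 range can produce p-values as small as those listed above.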

A follow-up two-question survey done in October 1996 asked whether subscribers would prefer to pay for e-mail subscription, receive advertisements from publishers, or cancel. Fourteen percent preferred to pay, 82% to receive advertisements, and 4% to cancel.

Our most recent survey, of those who had for one reason or another dropped from the list of subscribers, revealed that almost a third were no longer valid addresses and so were not true cancellations. Of those who responded, almost half (40, or 44%) of the unsubscriptions were only temporary (Table 12.7). The reason for cancellation was rarely the quality of the review.

Conclusions

If we return to our two questions-progress and cost recovery-we can see that our progress is satisfactory but that cost recovery is still uncertain.

BMCR is growing at the rate of 30% a year.[12] The major American classics organization (The American Philological Association) has about 3,000 members, and on that basis we estimate very roughly that the total world population of classicists is somewhere between 7,000 and 10,000. BMCR, then, presently reaches between 22% and 32% of its total market. Presumably, only half of that market has access to computers, so BMCR's real penetration may be over 50%. If so, at its present rate of growth, BMCR may saturate its market in as few as five years. It is much more difficult to estimate the total world market for BMMR, but it is certainly greater than that for BMCR. With BMMR's present growth rate of perhaps 30%,[13] it will take somewhat longer to reach saturation.
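
The arithmetic behind this estimate can be laid out explicitly. The Python sketch below is a back-of-the-envelope projection, not the author's calculation: it combines the 3/97 subscription counts from Table 12.3 (BMCR plus joint subscriptions) with the roughly 30% annual growth rate and the 7,000 to 10,000 market estimate quoted above, reproducing the 22% to 32% current-reach range and a saturation horizon on the order of five years.

```python
# A back-of-the-envelope projection, not the author's calculation. Subscription
# figures are from Table 12.3 (3/97); the growth rate and market bounds are the
# estimates quoted in the text.
import math

current_reach = 1420 + 844      # BMCR subscriptions plus joint subscriptions
annual_growth = 0.30            # roughly 30% per year

for market in (7000, 10000):
    share = current_reach / market
    years = math.log(market / current_reach) / math.log(1 + annual_growth)
    print(f"market of {market}: current reach {share:.0%}, "
          f"about {years:.1f} years to saturation at 30% growth")
```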

BMCR's unrecovered costs are about $4,000 per year for over 700 pages of reviews.[14] About half the cost goes for producing the paper version, and we anticipate costs of between $1,500 and $2,000 per year for preparing reviews for the Web.[15] Uncompensated editorial time averages 34 hours per month. Therefore, total out-of-pocket expenses could be as high as $6,000 if the paper version con-


202
 

TABLE 12.7. BMCR Unsubscriber Survey (January 1996 through February 1997)

317 total: 103 addresses no longer valid; 91 responses

Identity of Respondents
  15 unaffiliated with academic institution
  46 faculty (4 retired, 9 adjunct or research)
   7 librarians
   8 students (2 undergraduates)
   7 other

Reason Given for Unsubscribing                            No. Giving Reason   No. of Those Who Are Faculty
Never subscribed                                                  2                       1
Never meant to unsubscribe                                        2                       1
Unsubscribed from old, subscribed to new address                 16                      14
Suspended subscription while away                                15                     9+1
Decided reviews not sufficiently relevant to interests           22                     6+2
Decided review quality not high enough                            2                      +1
Too much e-mail                                                11+3                     6+3
No longer have time to read reviews                             7+1                      +2
Other (5 shifted to BMR, 1 to BMCR, mistake)                    7+1                     4+1

Question         Unaffiliated   Faculty   Librarian   Student   Other
Not relevant           8          6+2         1          2        2
Too much mail          2            7         -          2        -
No time                4           +2         -          -        1
Total                 14         13+4         1          4        3

Note: "+" numbers signify respondents who marked more than one category.

tinues and if markup continues to be done by hand. A third possible reduction in costs, besides elimination of the paper version and automatic markup, is a "fast-track" system whereby the review never leaves the Net: it is e-mailed to the editor, who sends it to a member of the editorial board; when the two have made changes, it is sent back to the reviewer for approval and then published on the Net. The great advantage for the reviewer is that this system cuts publication time by a month; the disadvantage is that the reviewer is asked to do some simple markup on the text before sending it.[16]

Possible revenue sources include advertising, subscriptions, and institutional support. As we have seen, our subscribers much prefer receiving advertising to paying for a subscription, but we have no idea how successful we will be in attracting advertising.[17] Hal Varian has suggested that we try to arrange something with Amazon Books, and we have made a tentative agreement to list their URL on our Web reviews.[18] We will not consider charging for subscriptions until BMCR is


203

on the Web; at that point we could charge for timely delivery of the review, perhaps several months before universal access. We also want to wait for wide acceptance of a simple electronic cash transfer system. Institutional support seems to us the most obvious way to cover costs, since the college gets considerable exposure for what seems to us a small cost.


205

Chapter 13—
The Crosscurrents of Technology Transfer
The Czech and Slovak Library Information Network

Andrew Lass

Introduction

One would have no great difficulty in estimating the demand function, i.e., the relationship between the price and the quantity that can be sold at that price for, say, tomatoes. But one would have considerable problems in making sales predictions at various hypothetical alternative prices for a new product that looks like a blue tomato and tastes like a peach. (Quandt 1996, 20)

This vivid image of an odd-looking vegetable that tastes like a fruit is meant to highlight the difficulty of estimating the demand side in the overall cost picture of producing and distributing new products, such as electronic publications. Compared to the traditional printed material, electronic products are new, from their internal architecture to the mechanisms of production, distribution, and access that stem from it. After all, the world of readers is not a homogeneous social group, a market with a simple set of specific needs. Yet we assume that a segment of this market-the scholarly community-takes easily and more or less quickly to supplementing their long-established habits (of associating the printed text with a paper object) with different habits, experienced as equally convenient, of searching for and reading electronic texts. While this observation may be correct, it should be emphasized at this point that precisely in the expression "more or less" is where the opportunity lies-for those of us interested in transitions-to see what is involved in this change of habit and why it is not just a "matter of time." As anyone who has tried to explain the possibilities of electronic text delivery to an educated friend will attest, the idea can be viewed with anxiety and taken to mean the end of the book. The Minister of Culture of the Czech Republic, a well-known author and dissident, looked at me with surprise as I tried to explain the need for library automation (and therefore for his ministerial support): he held both hands clasped together as if in prayer and then opened them up like a book


206

close to his face. He took a deep breath, exhaled, and explained how much the scent of books meant to him. His jump from on-line cataloging and microfilm preservation to the demise of his personal library was a rather daring leap of the imagination but not an uncommon one, even among those who should know better. It is not just the community of scholars, then, but of politicians and even librarians who must change their attitudes and habits. The problem is further compounded if we consider that in the case of Eastern Europe this new product is being introduced into a setting where the very notion of a market is itself unsettled. The question of demand is quite different in a society that had been dominated by a political economy of command.

In the pages that follow I will give a brief account of an extensive interlibrary automation and networking project that The Andrew W. Mellon Foundation initiated and funded abroad, in the Czech and Slovak republics.[1] While most of the papers in this volume deal with digital libraries, this one points to the complexities that affect the ability of any library to change its ways and expand its mandate to include access to digitized materials. My aim is critical rather than comprehensive. By telling the reader about some of the obstacles that were confronted along the way, I hope to draw attention to the kinds of issues that need to be kept in mind when we think of establishing library consortia-the seemingly natural setting for the new technologies-in other countries.

The Caslin Projects

The Mellon-funded proposal to establish the Czech and Slovak Library Information Network (CASLIN) commenced in January 1993. In its original stage it involved four libraries in what has now become two countries: the National Library of the Czech Republic (in Prague), the Moravian Regional Library (in Brno), the Slovak National Library (in Martin), and the University Library of Bratislava. These four libraries had signed an agreement (a Letter of Intent) that they would cooperate in all matters that pertained to fully automating their technical services and, eventually, in developing and maintaining a single on-line Union Catalogue. They also committed themselves to introducing and upholding formats and rules that would enable a "seamless" integration into the growing international library community. For example, compliance with the UNIMARC format was crucial in choosing the library system vendor (the bid went to ExLibris's ALEPH). Similarly, Anglo-American cataloging rules (AACR2 ) have been introduced, and most recently, there is discussion of adopting the LC subject headings. Needless to say, the implementation was difficult and the fine-tuning of the system is not over yet, though most if not all of the modules are up and running in all four libraries. The first on-line OPAC terminals were made available to readers during 1996. At present, these electronic catalogs reflect only the library's own collection-there are no links to the other libraries, let alone to a CASLIN Union Catalogue-though


207

they do contain a variety of other databases (for example, a periodicals distribution list is available on the National Library OPAC that lists the location of journals and periodicals in different libraries in Prague, including the years and numbers held). A record includes the call number-a point of no small significance-but does not indicate the loan status, nor does the system allow users to "Get" or "Renew" a book.[2] In spite of these shortcomings, the number of users of these terminals has grown sharply, especially among university students, and librarians are looking for ways to finance more (including some graphics terminals with access to the WWW).

In the period between 1994 and 1996, several additional projects (conceived as extensions of the original CASLIN project) were presented to The Mellon Foundation for funding. It was agreed that the new partners would adopt the same cataloging rules as well as any other standards and that they would (eventually) participate in the CASLIN Union Catalogue. Each one of these projects poses a unique opportunity to use information technology as an integrator of disparate and incongruous institutional settings.

The Library Information Network of the Czech Academy of Sciences (LINCA) was projected as a two-tiered effort that would (1) introduce library automation to the central library of the Czech Academy of Sciences and thereby (2) set the stage for the building of an integrated library-information network that would connect the specialized libraries of all the 60 scientific institutes into a single web with the central library as their hub. At the time of this writing the central library's LAN has been completed and most of the hardware installed, including the high-capacity CD-ROM (UltraNet) server. The ideal of connecting all the institutes will be tested against reality as the modular library system (BIBIS by Square Co., Holland) is introduced together with workstations and/or miniservers in the many locations in and outside the city of Prague.[3]

The Košice Library Information Network (KOLIN) is an attempt to draw together three different institutions (two universities and one research library) into a single library consortium. If successful, this consortium in eastern Slovakia would comprise the largest on-line university and research library group in that country. The challenge lies in the fact that the two different types of institutions come under two different government oversight ministries (of education and of culture), which further complicates the already strained budgetary and legislative setup. Furthermore, one of the universities-the University of Pavel Josef Safarik (UPJS)-at that time had two campuses (in two cities 40 km apart) and its libraries dispersed among thirteen locations. UPJS is also the Slovak partner in the Slovak-Hungarian CD-ROM network (Mellon-funded HUSLONET) that shares in the usage and the costs of purchasing database licenses.[4]

Finally, the last of the CASLIN "add-ons" involves an attempt to bridge incompatibilities between two established library software systems by linking two university and two state scientific libraries in two cities (Brno and Olomouc) into a


208

single regional network, the Moravian Library Information Network (MOLIN). The two universities-Masaryk University in Brno and Palacký University in Olomouc-have already completed their university-wide library network with TinLib (of the United Kingdom) as their system of choice. Since TinLib records do not recognize the MARC structure (the CASLIN standard adopted by the two state scientific libraries), a conversion engine has been developed to guarantee full import and export of bibliographic records. Though it is too soon to know how well the solution will actually work, it is clear already that its usefulness goes beyond MOLIN, because TinLib has been installed in many Czech universities.[5]

Fortunately, storage, document preservation, retrospective conversion, and connectivity have all undergone substantial changes over the past few years. They are worth a brief comment:

1. Up until the end of the Communist era, access to holdings was limited not only by the increasingly ridiculous yet strict rules of censorship but also by the worsening condition of the physical plant and, in the case of special collections, the actual poor condition of the documents. The National Library in Prague was the most striking example of this situation; it was in a state of de facto paralysis. Of its close to 4 million volumes, only a small percentage was accessible. The rest were literally "out of reach" because they were either in milk crates and unshelved or in poorly maintained depositories in different locations around the country.[6] This critical situation turned the corner in January 1996 when the new book depository of the NL was officially opened in the Prague suburb of Hostivar. Designed by the Hillier Group (Princeton, N.J.) and built by a Czech contractor, it is meant to house 4.5 million volumes and contains a rare book preservation department (including chemical labs) and a large microfilm department. Because more than 2 million volumes were cleaned, moved, and reshelved by the end of 1996, it is now possible to receive the books ordered at the main building (a book shuttle guarantees overnight delivery).[7] Other library construction has been under way, or is planned, for other major scientific and university libraries in the Czech Republic.[8] There is no comparable library construction going on in Slovakia.

2. The original CASLIN Mellon project included a small investment in microfilm preservation equipment, including a couple of high-end cameras (GRATEK) with specialized book cradles-one for each of the National Libraries-as well as developers, reader-printers, and densitometers. The idea was to (1) preserve the rare collection of nineteenth- and twentieth-century periodicals (that are turning to dust), (2) significantly decrease the turnaround time that it takes to process a microfilm request (from several weeks to a few days), and (3) make it technically possible to meet the highest international standards in microfilm preservation. The program has since evolved to a full-scale digitalization project (funded by the Ministry of Culture) that includes the collections of other libraries.[9]


209

3. The most technologically ambitious undertaking, and one that also has the most immediate and direct impact on document accessibility, is the project for the retrospective conversion of the general catalog of the National Library in Prague. Known under the acronym RETROCON, it involves a laboratory-like setup of hardware and software (covered by a Mellon Foundation grant) that would-in an assembly-line fashion-convert the card catalog into ALEPH-ready electronic form (UNIMARC). RETROCON is designed around the idea of using a sophisticated OCR in combination with specially designed software that semiautomatically breaks down the converted ASCII record into logical segments and places them into the appropriate MARC field. This software, developed by a Czech company (COMDAT) in cooperation with the National Library, operates in a Windows environment and allows the librarian to focus on the "editing" of the converted record (using a mouse and keyboard, if necessary) instead of laboriously typing in the whole record. As an added benefit, the complete scanned catalog has now been made available for limited searching (under author and title in a Windows environment), thereby replacing the original card catalog. One of the most interesting aspects of this project has been the outsourcing of the final step in the conversion to other libraries, a sort of division of labor (funded in part by the Ministry of Culture) that increases the pool of available expert catalogers.[10] (A simplified sketch of the segmentation idea appears after this list.)

4. For the most part, all installations of the LAN have proceeded with minimal problems, and the library automation projects, especially those involving technical services, are finally up and running. Unfortunately, the same cannot be said for the statewide infrastructure, especially not the phone system. Up until the end of 1997, the on-line connections between libraries were so poor that it was difficult to imagine, let alone test, what an on-line library network would have to offer. Needless to say, this delay has had an adverse effect on library management, especially of the CASLIN consortium as a whole.[11]
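
To make the semiautomatic segmentation idea in item 3 more concrete, the following toy Python sketch is offered. It is purely illustrative and is not the COMDAT/National Library software: it takes an OCR'd catalog-card record as plain text, uses simple heuristics to propose a few UNIMARC-style fields (700 for the author heading, 200 for the title, 210 for the imprint; the tags and rules are simplified assumptions), and leaves anything it cannot place for the cataloger to edit by hand.

```python
# A toy illustration of the semiautomatic segmentation idea in item 3; it is not
# the COMDAT/National Library software, and the UNIMARC field choices (700 author
# heading, 200 title, 210 imprint) and heuristics are simplified assumptions.
import re

def segment_card(ocr_text: str) -> dict:
    """Propose UNIMARC-like fields for an OCR'd catalog card; collect leftovers."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    record, unresolved = {}, []
    for line in lines:
        if "700 $a" not in record and re.match(r"^[^,:]+,\s", line):
            record["700 $a"] = line    # heading shaped like "Surname, Forename"
        elif "210" not in record and re.search(r"\b1[5-9]\d\d\b|\b20\d\d\b", line):
            record["210"] = line       # imprint line: contains a plausible year
        elif "200 $a" not in record:
            record["200 $a"] = line    # first remaining line is taken as the title
        else:
            unresolved.append(line)    # anything else is left for manual editing
    record["needs review"] = unresolved
    return record

card = """Komenský, Jan Amos
Labyrint světa a ráj srdce
Praha : Odeon, 1984"""
print(segment_card(card))
```

In the workflow described above, fields proposed this way would still pass through the librarian's editing step before being accepted into the electronic catalog.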

Crosscurrents

A comparison of the present condition and on-line readiness of research and university libraries in Central Europe with the status quo as it arrived at the doorstep of the post-1989 era leaves no doubt that dramatic improvements have taken place. But even though the once empty (if not broken) glass is now half filled, it also remains half empty. Certainly that is how most of the participants tend to see the situation, perhaps because they are too close to it and because chronic dissatisfaction is a common attitude. Yet the fact remains that throughout the implementation and in all of the projects, obstacles appeared nearly every step of the way. While most of the obstacles were resolved, although not without some cost, all of them can be traced to three basic sources of friction: (1) those best attributed


210

to external constraints-the budgetary, legal, political, and for the most part, bureaucratic ties that directly affect a library's ability to function and implement change; (2) those caused by cultural misunderstandings-the different habits, values, and expectations that inform the activity of localization; and (3) the internal problems of the libraries themselves, no doubt the most important locus of micropolitical frictions and therefore of problems and delays.[12] In what follows, I will focus on the first source of friction (with some attention paid to the second), since my emphasis here is on the changing relations between what are taken to be separate institutional domains (particularly between libraries and other government organizations or the market) as I try to make sense of the persistently problematic relationships between libraries (particularly within the CASLIN group). Obviously, while these analytical distinctions are heuristically valuable, in reality, these sources of friction are intertwined and further complicated by the fact that the two countries are undergoing post-Communist aftershocks and an endless series of corrections. Not only are the libraries being transformed, but so is the world of which they form a part. To make sense of this double transition and to describe the multifaceted process that the library projects have moved through may pose some difficulties. But the task also offers a unique opportunity to observe whether, and if so how, the friction points move over time. What could have been predicted when the initial project commenced-that implementation and system localization would also mean giving in to a variety of constraints-is only beginning to take on the hard contours of reality four years later. In several instances, the results differ from our initial conception, but I don't think it would be fair to assume that the final outcome will be a compromise. Instead, the success of the Mellon library projects in Eastern Europe (of which CASLIN is only one) should be judged by the extent to which they have been accepted and have taken on a life of their own, initially distinguishable but finally inseparable from the library traditions already in place. After all, if the projects were designed to effect a change in the library system-and by "system," we must understand a complex of organizational structures, a real culture, and an actually existing social network-then we must also expect that the library system will respond that way, that is, as a complex sociocultural system. What appeared at first as a series of stages (goals) that were to follow one another in logical progression and in a "reasonable" amount of time may still turn out to have been the right series. It's just that the progression will have followed another (cultural) logic, one in which other players-individuals and the organizational rules that they play by-must have their part. As a result, the time it actually takes to get things done seems "unreasonable," and some things even appear to have failed because they have not taken place as and when expected. What is the meaning of these apparent problems? A seemingly philosophical issue takes on a very real quality as we wonder, for example, about the future of the CASLIN consortium. If establishing a network of library consortia was one of the central aims of the Mellon project, then it is precisely this goal that we have failed to reach, at least now, when it was supposed to be long in place according to our


211

scheme of things. There is no legal body, no formal association of participating libraries in place. This deficiency is particularly important and, needless to say, frustrating for those of us who take for granted the central role that networking and institutional cooperation play in education and scholarly research. But behind this frustration another one hides: it is probably impossible to say whether what is experienced as the status quo, in this case as a failure or shortcoming, is not just another unexpected curve in a process that follows an uncharted trajectory.[13]

As I have noted above, in 1992 a Letter of Intent had been signed by the four founding CASLIN members. It was a principal condition of the project proposal. In January 1996, when this part of the project was-for all intents and purposes-brought to a close, there was still no formally established and registered CASLIN association with a statute, membership rules, and a governing body in place. Although the four libraries had initially worked together to choose the hardware and software, the work groups that had been formed to decide on specific standards (such as cataloging rules, language localization, or the structure of the Union Catalogue record) had trouble cooperating and their members often lacked the authority to represent their institution. Tasks were accomplished more because of the enthusiasm of individuals and the friendly relations that developed among them than because of a planned, concerted effort on the part of the library leadership guided by a shared vision. The initial stages of the implementation process were characterized by an uneven commitment to the shifting of priorities that would be necessary in order to carry the intent through. There was even a sense, in some instances, that the prestige of the project was more important than its execution or, more exactly, that while the funding for library automation was more than welcome, so was the political capital that came with being associated with this U.S.-funded project, even if such an attitude meant using this political capital at a cost to the consortium. As is well documented from many examples of outside assistance in economic development, well-intentioned technology transfer is a prime target for subversion by other, local intentions; it can be transformed with ease into a pawn in another party's game. Potential rivalries and long-standing animosities that existed among some of the libraries, instead of being bridged by the project, seemed to be exacerbated by it. In one instance, for example, affiliation with the Mellon project was used by a library to gain the attention of high government officials (such as the cultural minister) responsible for policies affecting their funding and, most important, their mandate. The aim, as it now turns out, was to gain the status of a national library. This library's target, that is, the library that already had this status, was the Slovak National Library, its primary CASLIN partner. While both libraries participated in the CASLIN project's implementation and even cooperated in crucial ways at the technical level (as agreed), their future library cooperation was being undermined by a parallel, semiclandestine, political plot. Needless to say, this situation has left the CASLIN partnership weakened and the managements of both libraries dysfunctional.[14]

As the additional library projects mentioned earlier were funded and the new


212

libraries joined the original CASLIN group, it became clear that the new, larger group existed more in rhetoric than in fact. From the newcomer's point of view there was not much "there" to join. "What is in this for us, and at what cost?" seemed to be the crucial question at the January 1996 meeting at which a written proposal for a CASLIN association was introduced by the National Library in Prague. This meeting was not the first time that an initiative had been presented but failed to take hold. Nor was it the last. The discussion about the proposal resulted in a squabble. An e-mail discussion group was established to continue the discussion but nothing came of it nor of several other attempts.

If the point of a consortium is for libraries to cooperate in order to benefit (individually) from the sharing of resources so as to provide better on-line service, then a situation such as this one must be considered counterproductive. How does one explain the chronic inability of CASLIN to get off the ground as a real existing organization? Where does the sense of apathy, reluctance, or even antagonism come from? Most of the answers (and there are many) lie hidden within the subtleties of society and history. But of these answers, a few stand out clearly: the fact that all the original CASLIN libraries come under the administrative oversight of the Ministry of Culture is one key piece of the puzzle. The dramatic cuts in the ministries' overall budgets are passed down to the beneficiaries who find themselves competing for limited goods. Another answer is in the lingering nature of the relationship: if the difference from the previous setup (under the "planned" socialist economy) lies with the fact that the library has the status of a legal subject that designs and presents its own budget, its relationship to the ministry-very tense and marked by victimization-seems more like the "same old thing." In other words, certain aspects of organizational behavior continue not only by force of habit (a not insignificant factor in itself), but also because these aspects are reinforced by a continuing culture of codependency and increased pressure to compete over a single source of attention. The situation appears as if, from our point of view, the formal command economy has been transformed into a market economy only to the extent that strategic and self-serving positioning is now more obvious and potentially more disruptive. So-called healthy competition (so called by those whose voices dominate in the present government and who believe in the self-regulating spirit of "free market forces") seems to show only its ugly side: we see the Mellon project embraced with eagerness in part because of the way its prestige could be used to gain a competitive advantage over other libraries. In the case of CASLIN partners, we see it take the form of suspicion, envy, and even badmouthing expressed directly to the Mellon grants administrator (myself).[15]

What are the constraints under which a research or national library operates, and in what way is the present situation different from the "socialist" era [1948-1989]? An answer to these questions will give us a better sense of the circumstances under which attempts to bring these institutions up to international standards-and get them to actively cooperate-must unfold.


213

Figures 13.1 and 13.2 illustrate the external ties between a library and other important domains of society that affect its functioning and co-define its purpose before and after 1989 (while keeping in mind that economic, legal, and regulatory conditions have been in something of a flux in the years since 1989 and, therefore, that the rules under which a library operates continue to change).

1. Under "party" rule the library, like all other organizations, came under direct control of its ministry, in this case the Ministry of Culture [MK]. One could even say, by comparison with the present situation, that the library was an extension of the ministry. However, the ministry was itself an extension of the centralized political rule (the Communist party), including the watchful eye of the secret police [STB]. The director was appointed "from above" [PARTY] and the budget arrived from there as well. While requests for funding were entertained, it was hard to tell what would be funded and under what ideological disguise.[16] For the most part the library was funded "just in order to keep it alive," though if the institution ran out of money in any fiscal year, more could be secured to "bail it out" [hence "Soft" Budget]. In addition to bureaucratic constraints (regarding job descriptions and corresponding wage tables, building maintenance and repairs, or the purchase of monographs and periodicals), many of which remain in place, there were political directives regarding employability[17] and, of course, the ever-changing and continuously growing list of prohibited materials to which access was to be denied [Index]. In contrast, the library is now an independent legal body that can more or less decide on its priorities and is free to establish working relationships with other (including foreign) organizations. The decision making, including organizational changes, now resides within the library. While the budget is presented to the ministry and is public knowledge, it is also a "hard" budget that is set at the ministerial level as it matches its cultural policies against those of the Ministry of Finance [MF] (and therefore of the ruling government coalition). After an initial surge in funds (all marked for capital investment only), the annual budgets of the libraries have been cut consistently over the past five years (i.e., they are not even adjusted for inflation but each year are actually lower than the previous one). These cuts have seriously affected the ability of the libraries to carry out their essential functions, let alone purchase documents or be in the position to hire qualified personnel.[18] For this reason, I prefer to speak of a relationship of codependence. The Ministry of Culture still maintains direct control over the library's ability to actualize its "independence"-though it has gradually shifted from an antagonistic attitude to one of genuine concern. The point is that whereas the Ministry of Culture is supposed to oversee the well-being of its institutions, it is, as is usually the case in situations of government supervision, perceived as the powerful enemy.


214

Figure 13.1. Czech Research Library before 1990: External Ties

2. The publishing world was strictly regulated under the previous regime: all publishing houses were state enterprises (any other attempt at publishing was punishable by law), and all materials had to pass the scrutiny of the state (political) censor. Not everything that was published was necessarily political trash, and editions were limited; the resulting economy of shortage created a high demand for printed material, particularly modern fiction, translations from foreign languages, and the literary weekly [hence "Seller's Market"]. Libraries benefited from this situation. Because all state scientific and research libraries were recipients of the legal deposit, their (domestic) acquisitions were, de facto, guaranteed. At present the number of libraries covered by the deposit requirement has been reduced from some three dozen to half a dozen. This change was meant to ease the burden on publishers and give the libraries a freer hand in building their collection in a "competitive marketplace." But considering the severe cuts in the budget, many of the libraries cannot begin to fulfill even the most Spartan acquisitions policy. For the same reason publishers, of whom there are many and all of whom are private and competing for the readers' attention, do not consider libraries as important parts of their market. Furthermore, many of the small and often short-lived houses do not bother to apply for the ISBN or to send at least one copy (the legal deposit law is impossible to enforce) to the National Library, which, in turn, cannot fulfill its mandate of maintaining the national bibliographic record.


215

Figure 13.2. Czech Research Library after 1990: External Ties

3. During the Communist era, access to materials was limited for several obvious reasons: political control (books on the Index, limited number of books from Western countries, theft) or deliberate neglect (the progressively deteriorating storage conditions eventually made it impossible to retrieve materials). Over the years there was less and less correspondence between the card catalogs in the circulation room and the actual holdings, and as a result, students and scholars stopped using the National Library in Prague because it was increasingly unlikely that their requests would be filled. This situation was also true for current Czech or Slovak publications because of an incredible backlog in cataloging or because the books remained unshelved. Of course, in such a system there was no place for user feedback. Since then, some notable improvements-many of them due to Mellon and other initiatives-have been made in public services, such as self-service photocopying machines and, to remain with the example of the National Library, quick retrieval of those volumes that have been reshelved in the new depository. Also, readers are now used to searching the electronic OPACs or using the CD-ROM databases in the reference room. On the other hand, the backlog of uncataloged books is said to be worse than before and, with acquisitions cut back and the legal deposit not observed, the reader continues to leave the circulation desk empty-handed. The paradoxical situation is not lost on the reader: if the books are out of print or, as is more often the case these days, their price beyond what readers could


216

afford, going to the library may not be a solution either. So far the basic library philosophy has remained the same as it has throughout its history: although there is concern for the user, libraries are not genuinely "user driven" (only a few university libraries have adopted an open stack policy) and, as far as I can tell, user feedback is not a key source of information actively sought, analyzed, and used in setting priorities.

4. Under the policies of the socialist economy, full employment was as much a characteristic of the library as it was of the rest of society. In effect, organizations hoarded labor (as they did everything else) with a sort of just-in-case philosophy in mind, since the point was to fulfill "the plan" at just about any cost and provide full benefits for all with little incentive for career development (other than through political advancement). Goods and services became known for their poor quality; the labor force became known for its extremely low productivity and its lousy work morale. More time seemed to be spent in learning how to trick the system than in working with it, to the point where micropolitical intrigue-the backbone of the "second" economy-competed very well with the official chain of command. The introduction of a market economy after 1990 did very little to help change this situation in a library, a state organization with no prestige. Simply put, the novelty and promise of the private sector, coupled with its high employment rate and good wages, have literally cut the library out of the competitive market for qualified labor. Between the budget cuts and the wage tables still in place, there is little space left for the new management to negotiate contracts that would attract and keep talented people in the library, certainly not those people with an interest in information technologies and data management.[19]

5. As mentioned above, the first information technologies arrived in the state scientific and national libraries in the late 1980s. Their impact on budgets was minimal (UNESCO's ISIS is freeware) as was their effect on technical services. On the other hand, the introduction of information technologies into these libraries, in particular the CASLIN group, was the single most visible and disruptive change-a sort of wedge that split the library organizations open-that has occurred since 1990 (or, according to some, during the last century). The dust has not yet settled, but in view of our present discussion, one thing is clear already: between the Mellon funds and the initial capital investment that followed, libraries have become a significant market for the local distributors of hardware and for the library software vendors (in contrast to the relationship with publishers). But as we all know, these purchases are not one-time purchases but only the first investments into a new kind of dependency, a new external tie that the library must continue to support and at no small cost. And the cost is not just financial. The ongoing complications with the technology and the chronic delays in systems localization only


217

contribute to the present sluggish state of affairs and thus lend support to the ever cynical factions within the organization that knew "all along" that "the whole automation project was a mistake." Obviously, the inability to attract qualified professionals doesn't help.

What I have painted here is but part of the picture (the other part would be made up of a detailed analysis of the micropolitics that actually go on, both inside the organization and in relation to other organizations, particularly other libraries). But the above discussion should help us see how and why the libraries feel trapped in a vicious circle from which they perceive little or no way out other than continuing to battle for their place in the sun. Of course, their tactics and battle cries only reinforce the relationship of codependency as well as their internal organizational problems. And that is exactly what the public and government officials see: that these institutions need to grow up and learn what real work is before more money is poured down the drain. Needless to say, a sizable portion of the blame must be carried by a government that has made a conscious choice against long-term investment into the educational, scientific, and information sectors.

If the long-standing administrative ties between libraries and the Ministry of Culture inform and override the building of new, potentially powerful ties to other libraries, then the flip side of this codependency, its result, is a lack of experience with building and envisioning the practical outcome of a horizontally integrated (i.e., nonhierarchical) association of independent organizations. The libraries had only limited exposure to automation, and the importance of long-term strategic planning was lost on some of them.[20] At least two other factors further reinforced this situation: the slow progress (the notorious delays mentioned above) in the implementation of the new system, which had involved what seemed like impractical and costly steps (such as working in UNIMARC), and the sluggish Internet connection. These factors suggest that at present, a traditional understanding of basic library needs (which themselves are overwhelming) tends to take precedence over scenarios that appear much too radical and not grounded in a familiar reality. Since the on-line potential is not fully actualized, its impact is hard to imagine, and so the running of the organization in related areas continues to be predominantly reactive rather than proactive. In other words, in-house needs are not related to network solutions, especially when such solutions appear to be counterintuitive for the established (and more competitive) relationship between the libraries.

Cooperation among the libraries exists at the level of system librarians and other technical experts. Without this cooperation, the system would not have been installed, certainly not as an identical system in all four libraries. In addition (and, I should say, ironically) the CASLIN project has now received enough publicity to make it a household name among librarians. The acronym


218

has a life of its own, and there is a growing interest among other scientific libraries in joining this "prestigious" group (that both does and does not exist).[21] But are we waiting for a moment at which the confluence of de facto advances in technical services and a growing interest of other libraries in logistical support (involving technology and technical services) will create a palpable need for a social organization that would exist (1) above and beyond the informal network of cooperation and (2) without association with the name and funds of The Andrew W. Mellon Foundation (its original reason for existence)? I have heard it said that "nothing more is needed," because the fundamentals of CASLIN are now embedded in the library process itself (reference here, I gather, was to cataloging) and in the existing agreements between individual libraries on the importing and exporting of records into and from the CASLIN Union Catalogue that is to be serviced by the two national libraries. In fact, as the most recent meeting (June 1997) of the Union Catalogue group made clear, such processes are indeed where the seed of an association of CASLIN libraries lies. The import and export of records and the beginning of the Union Catalogue database have yet to materialize, but the work did bring together the individuals representing the member libraries. If these people figure out a way to run their own show and stick to it, then there is a fair chance that an organization of CASLIN libraries will take off after all.[22]

Concluding Remarks

The above discussion raises three very important points. The first point regards cultural misunderstanding. The problem with the "misbehaving consortium" may lie to some extent with our (e.g., U.S.) expectations of what cooperation looks like and what basic fundamentals an on-line library consortium must embrace in order to do its job well. In the Czech and Slovak case, not only were the conditions not in place, they were counterindicative. While our naïveté caused no harm (the opposite is the case, I am repeatedly told!), it remains to be seen what the final result will look like. And in the final result resides the really intriguing lesson: maybe it is not so much that we should have or even could have thought differently and therefore ended up doing "this" rather than "that." Perhaps it is in the (information) technology itself-in its very organization-that the source of our (mis)understanding lies. After all, these technologies were developed in one place and not another. Our library automation systems obviously embody a particular understanding of technical and public services and an organization of work that share the same culture as a whole tradition of other technologies that emphasize speed, volume (just think of the history of railroads or the development of the "American system" of manufacturing), and finally, access. Every single paper in this volume exemplifies and assumes this world. In transferring a technology from one place to another, an implied set of attitudes and habits is being marketed as well. The intriguing question is whether the latter emerges logically from the former in the


219

new location. To this possibility, my second point lends some support: technology transfer involves a time lag, the duration of which is impossible to predict and which is accounted for by a complex series of micropolitical adjustments. It is this human factor that transforms the logical progression in the projected implementation process into a much less logical but essentially social act. Thanks to this human factor, the whole effort may fail. Without it, the effort will not exist. Only after certain problems and not others arise will certain solutions and not others seem logical. It is no secret that much social change is technology driven. It is less clear, ethnographically speaking, what exactly this process means, and even less is known about it when technology travels across cultural boundaries. There is much to be gained from looking carefully at the different points in the difficult process of implementing projects such as CASLIN. Apparently the ripple effect reaches far deeper (inside the institutions) and far wider (other libraries, the government, the market, and the users) than anyone would have anticipated. Before it is even delivering fully on its promise, the original Mellon project is demanding changes in library organization and management. Such changes are disruptive, even counterproductive, long before they "settle in." Nevertheless, and this is my third point, internal organizational change involves a gradual but, in consequence, quite radical realignment of ties with the outside, that is, with the Ministry of Culture (which-at least on the Czech side-has taken a keen interest in supporting library automation throughout the country; on the Slovak side, unfortunately, the situation is rather different), with other libraries (there has been a slow but palpable increase in interlibrary cooperation on specific projects that involve the use of information technologies, e.g., retrospective conversion, newspaper preservation, and, I hope, the CASLIN Union Catalogue), and most important, with the public. How far-reaching and permanent these shifts are is difficult to say, especially when any accomplishments have been accompanied by a nagging feeling that they were done on a shoestring and against all odds. The persistent inability of the governments to pass legislation and appropriate funding that would support the newly emerging democracies' entrance into the global information age in a sustainable manner highlights a serious lack of vision as well as of political savvy.

At the beginning of this paper I argued that in discussing the introduction of new technologies, specifically information technologies, it is important to pay attention to the point of transition, to see all that is involved in this change of habit and why it is not just a "matter of time." The body of this paper, I hope, provided at least a glimpse of some of the friction points involved. For the time being, the last word, like the first, belongs to an economist, in this case to Václav Klaus, the prime minister of the Czech Republic (1993-1997), whose opinions expressed in a recent op-ed piece on "Science and our Economic Future" make him sound like someone who has just bitten into a blue tomato only to find that it tastes like a peach.

Science is not about information, but about knowing, about thinking, about the ability to generalize thoughts, make models of them and then testable hypotheses that


220

are to be tested. Science is not about the Internet and certainly not about its compulsory introduction. (Klaus 1997)


223

Chapter 14—
Consortial Access versus Ownership

Richard W. Meyer

Introduction

This chapter reports on a consortial attempt, made possible by the advent of high-speed telecommunication networks throughout the world, to overcome the high costs of scholarly journals and to study the roots of the cost problem. The literature on the problem of journal costs includes both proposals for new ways of communicating research results and many studies of pricing.

Prominent members of the library profession have written proposals on how to disengage from print publishers.[1] Others have suggested that electronic publications soon will emerge and bring an end to print-based scholarship.[2] Another scientist proposes that libraries solve the problem by publishing journals themselves.[3] These proposals, however, tend not to accommodate the argument that loosely coupled systems cannot be easily restructured.[4] While access rather than ownership promises cost savings to libraries, the inflation problem requires further analysis of the factors that establish journal prices before it can be solved.

Many efforts to explain the problem occupy the literature of the library profession and elsewhere. The most exhaustive effort to explain journal price inflation, published by the Association of Research Libraries for The Andrew W. Mellon Foundation, provides ample data, but no solution.[5] Examples of the problem appear frequently in the Newsletter on Serials Pricing Issues, which was developed expressly to focus discussion of the issue.[6] Searches for answers appear to have started seriously with Hamaker and Astle, who provided an explanation based on currency exchange rates.[7] Other analyses propose means to escape inflation by securing federal subsidies, complaining to publishers, raising photocopying charges, and convincing institutional administrators to increase budgets.[8]

Many analyses attempt to isolate factors that determine prices and the difference in prices between libraries and individuals. Some studies look at the statistical


224

relevance of sundry variables, but especially publisher type.[9] They confirm the belief that certain publishers, notably in Europe, practice price discrimination.[10] They also show that prices are driven by cost of production, which is related to frequency of issue, number of pages, and presence of graphics. Alternative revenue from advertising and exchange rate risk for foreign publishers also affect price.[11] Quality measures of the content, such as the number of times a periodical is cited, affect demand, which in turn influences price. Economies of scale that are available to some journals with large circulation also affect price.[12] These articles help explain differentials between individual and library prices.[13] Revenues lost to photocopying also account for some difference.[14] Finally, differences in the way electronic journals may be produced compared to print provide a point on which some cost savings could be based.

The costs of production and the speed of communication may be driving forces that determine whether new publications emerge in the electronic domain to replace print. However, in a framework shaped by copyright law, the broader interaction of demand and supply more likely determines the price of any given journal. Periodical prices remain quite low over time when magazine publishers sell advertising as the principal generator of revenue. When, for political or similar reasons, publication costs are borne by organizations, usually not scholarly societies, periodical prices tend to be lower. Prices inflate in markets with high demand, such as the sciences, where multiple users include practicing physicians, pharmaceutical firms, national laboratories, and so forth.

Unfortunately for libraries, the demand from users for any given journal is usually inelastic. Libraries tend to retain subscriptions regardless of price increases, because the demand originates with nonpaying users. In turn, price increases charged to individual subscribers of scholarly journals push user demand toward libraries. Therefore, it might be expected that as publishers offer currently existing print publications in an electronic form, they will retain both their prices and the inelastic demand for them. Commercial publishers, who are profit maximizers, will seek to retain or improve their profits when expanding into the electronic market. However, there are some properties associated with electronic journals that could relax this inelasticity of demand.

This chapter describes a multidisciplinary study of the impact of electronic publishing on the pricing of scholarly periodicals. A brief overview of the pricing issue comparing print and electronic publishing is followed by a summary of the access approach to cost containment. A preliminary report on an attempt at this technique by a consortium and on an associated econometric study also is included.[15]

Overview of Pricing Relevant to Electronic Journals

The industry of scholarly print publishing falls into the category of monopolistic competition, which is characterized by the presence of many firms with differentiated products and by no barriers to entry of new firms.[16] As a result of product


225

differentiation, scholarly publishers do not encounter elastic aggregate demand typically associated with competitive markets. Rather, each publisher perceives a negatively sloped individual demand curve. Therefore, each supplier has the opportunity to partially control the price of its product, even though barriers to entry of new, competing periodical titles may be quite low. Given this control, publishers have raised their prices to libraries with some loss of sales but with consequent increases in profits that overwhelm those losses. They segment their market between individuals and libraries and charge higher prices to the latter in order to extract consumer surplus.
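
The segmentation argument can be restated as a small numerical sketch. Everything in the example below is invented for illustration and is not drawn from the chapter's data: two linear demand curves are assumed, a relatively elastic one for individual subscribers and a relatively inelastic one for libraries, and a profit-maximizing publisher prices each segment separately.

```python
# Stylized two-segment price discrimination, assuming linear demand in each
# segment: Q = a - b * P.  All parameter values below are illustrative only.

def optimal_price(a: float, b: float, marginal_cost: float) -> float:
    """Profit-maximizing price for demand Q = a - b*P and constant marginal cost."""
    # Profit = (P - c) * (a - b*P); setting d(profit)/dP = 0 gives P = (a + b*c) / (2*b).
    return (a + b * marginal_cost) / (2 * b)

marginal_cost = 40.0                      # hypothetical cost of serving one subscription
individuals = {"a": 2000.0, "b": 10.0}    # relatively elastic: subscribers cancel as price rises
libraries   = {"a": 300.0,  "b": 0.5}     # relatively inelastic: libraries retain subscriptions

p_ind = optimal_price(individuals["a"], individuals["b"], marginal_cost)
p_lib = optimal_price(libraries["a"], libraries["b"], marginal_cost)

print(f"Profit-maximizing individual price: ${p_ind:,.2f}")
print(f"Profit-maximizing library price:    ${p_lib:,.2f}")
print(f"Library/individual price ratio:     {p_lib / p_ind:.1f}x")
```

The particular ratio depends entirely on the assumed demand parameters; the point is only that the less elastic segment ends up bearing the higher price.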

As publishers lose sales to individuals, scholars increase their dependency on libraries, which then increases interlibrary borrowing to secure the needed articles. Photocopies supplied via library collections constitute revenue lost to publishers, but that loss is recaptured in the price differential. Additional revenue might accrue if publishers could offer their products in electronic databases where they could monitor all duplication. This potential may rest on the ability of publishers to retain control in the electronic domain of the values they have traditionally added to scholarship.

Scholars need two services from scholarly literature: (1) input in the form of documentation of the latest knowledge and/or information on scholarly subjects and (2) outlets for their contributions to this pool of scholarship. Partly in exchange for trading away copyright, scholars receive value in four areas. First, scholars secure value in communication when every individual's contribution to knowledge is conveyed to others, thus impacting the reputation of each author. Second, although not provided by publishers directly, archiving provides value by preserving historically relevant scholarship and fixing it in time. Third, value accrues from filtering of articles into levels of quality, which improves the allocation of search costs and establishes or enhances reputation. Fourth, segmenting of scholarship into disciplines reduces input search costs to scholars. The exchange of copyright ownership for value could be affected by the emergence of electronic journals.

Electronic journals emerge as either new titles exclusively in electronic form or existing print titles transformed to electronic counterparts. Some new journals have begun exclusively as electronic publications with mixed success. The directory published by the Association of Research Libraries listed approximately 27 new electronic journals in 1991. By 1995 that figure had risen to over 300, of which some 200 claim to be peer reviewed.[17] Since then hundreds more electronic journals have been added, but the bulk of these additions appear to be electronic counterparts of previously existing print journals.[18] In fact, empirical work indicates that exclusively electronic publications have had little impact on scholarship.[19]

The infrastructure of scholarly print publishing evolved over a long time. In order for a parallel structure to emerge in the electronic domain, publishers have to add as much value to electronic journals as they do to print. Value must be added


226

in archiving, filtering, and segmenting in addition to communication. Establishing brand quality requires tremendous energy and commitment. Some electronic titles are sponsored by individuals who are fervent in their efforts to demonstrate that the scholarly community can control the process of communicating scholarship. However, it is unrealistic to expect an instantaneous, successful emergence of a full-blown infrastructure in the electronic domain that overcomes the obstacles to providing the values required by scholars. The advantage of higher communication speed is insufficient to drive the transformation of scholarship; thus traditional publishing retains an edge in the electronic domain.

A transformation is being achieved effectively by duplicating existing print journals in the electronic sphere, where publishers face less imposing investments to provide electronic counterparts to their product lines. For example, the Adonis collection on CD-ROM contains over 600 long-standing journals in medicine, biology, and related areas covering about seven years.[20] Furthermore, EBSCO, University Microfilms (UMI), Information Access Company (IAC), Johns Hopkins University Press, OCLC, and other companies are implementing similar products. OCLC now offers libraries access to the full text of journal collections of more than 24 publishers. Johns Hopkins University Press has made all 46 plus titles that it publishes available on-line through Project MUSE.

During the past 15 years, libraries have experienced a remarkable shift from acquiring secondary sources in print to accessing them through a variety of electronic venues, which suggests that many scholarly periodicals will become available electronically as an automatic response to the economies available there. However, some monopoly power of publishers could be lost if barriers to the entry of new journals are lower in the electronic domain than in the print domain. With full text on-line, libraries may take advantage of the economies of sharing access when a group of libraries contracts for shared access to a core collection. Sharing a given number of access ports allows economies of scale to take effect. Were one access port provided to each member of a consortium of 15 libraries, the vendor would tie up a total of 15 ports, but any given library in the group would have difficulty servicing a user population with one port. By combining access, however, the 15 libraries together might get by with as few as 10 ports. The statistical likelihood is small that all 10 ports would be needed collectively by the consortium at any given moment. This arrangement saves the vendor some computer resources, which can then lead to a discount for the consortium that nets out to a lower cost for the libraries.
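
The port-sharing logic can be illustrated with a simple probabilistic sketch. Assuming, purely for illustration, that each library independently needs a connection at any given moment with some probability (the 40 percent figure below is hypothetical, not a measured usage rate), the chance that more than 10 of the 15 libraries want a port at the same time is small.

```python
# Illustrative sketch of the port-sharing argument.  Assume each of 15 libraries
# independently needs a connection at any given moment with probability p; the
# parameter values are hypothetical, not taken from the consortium's usage data.
from math import comb

def prob_demand_exceeds(ports: int, libraries: int, p: float) -> float:
    """Probability that more than `ports` of `libraries` want a connection at once
    under a simple binomial model of simultaneous demand."""
    return sum(comb(libraries, k) * p**k * (1 - p)**(libraries - k)
               for k in range(ports + 1, libraries + 1))

num_libraries, shared_ports, p_busy = 15, 10, 0.4
overflow = prob_demand_exceeds(shared_ports, num_libraries, p_busy)
print(f"Chance that {num_libraries} libraries need more than {shared_ports} "
      f"shared ports at once: {overflow:.3%}")
```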

Numerous models for marketing exist, but publishers can price their products in the electronic domain in fundamentally only two ways. Either they will offer their products on subscription to each title or group of titles, or they will price the content on an article-by-article transaction basis. Vendor collections of journals for one flat fee based on the size of the user population represent a variant on the subscription fee approach. Commercial publishers, who are profit maximizers, will choose the method with the higher potential to increase their profit. Transaction-based


227

pricing offers the possibility of capturing revenue lost to interlibrary lending. Also, demand for content could increase because of the ease of access afforded on-line. On the risk side, print subscription losses would occur where the cumulative expenditure for transactions from a given title is less than its subscription price.

Potentially, two mechanisms could flatten demand functions in the electronic domain. First, by making articles available individually to consumers, the separation of items of specific interest to given scholars creates quality competition that increases the elasticity of demand, because quality varies from article to article. Presumably, as with individual grocery items, the demand for particular articles is more elastic than the demand for whole periodical titles. Economists argue that the demand for tortillas is more elastic than for groceries in general because other bakery goods can be substituted, whereas there is no substitute for groceries in general except higher priced restaurant eating. Similarly, when consumers must buy individual articles, price increases will dampen demand more quickly than would be the case for a bundle of articles that is of interest to a group of consumers.

Second, by offering articles in an environment where the consuming scholar is required to pay directly (or at least observe the cost to the library), the effect of the separation of payer and demander that is common with library collections, and that results in high inelasticity, will be diminished. This mechanism will increase elasticity because scholars will no longer be faced with a zero price. Even if the scholar is not required to pay directly for the article, increased awareness of price will have a dampening effect on inelasticity. However, publishers may find it possible to price individual articles so that the sum of individual article fees paid by consumers exceeds the bundled subscription price experienced by libraries formerly forced to purchase a whole title to get articles in print.

For a product like Adonis, which is a sizable collection of periodicals in the narrow area of biomedicine, transaction-based pricing works out in favor of the consumer versus the provider, since there will likely be only a small number of articles of interest to consumers from each periodical title. This result makes purchasing one article at a time more attractive than buying a subscription, because less total expenditure will normally result. In the case of a product composed of a cross section of general purpose periodicals, such as the UMI Periodical Abstracts, the opposite will be true. The probability is higher that a user population at a college may collectively be interested in every single article in general purpose journals. This probability makes subscription-based pricing more favorable for libraries, since the cumulative cost of numerous transactions could easily exceed the subscription price. Publishers will seek to offer journals in accordance with whichever of these two scenarios results in the higher profit. Scientific publishers will tend to bundle their articles together and make products available as subscriptions to either individual journals or groups. Scholarly publishers with titles of general interest will be drawn toward article-by-article marketing.
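
The choice between the two scenarios reduces to a break-even comparison for each title: per-article purchasing is cheaper only while the articles a library's users actually want cost less, in total, than the subscription. The prices and article counts in the sketch below are hypothetical.

```python
# Hypothetical break-even comparison between subscription and per-article pricing
# for a single journal title; none of these numbers come from the chapter's data.

def cheaper_option(subscription_price: float, article_fee: float,
                   expected_articles_wanted: int):
    """Return the cheaper pricing model for one title and the cumulative per-article cost."""
    transaction_total = article_fee * expected_articles_wanted
    model = "per-article" if transaction_total < subscription_price else "subscription"
    return model, transaction_total

# A narrow specialty title: few articles of local interest, so transactions win.
print(cheaper_option(subscription_price=900.0, article_fee=15.0, expected_articles_wanted=12))
# A general purpose title at an undergraduate college: broad interest, so the
# cumulative transaction cost overtakes the subscription.
print(cheaper_option(subscription_price=120.0, article_fee=15.0, expected_articles_wanted=40))
```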

An Elsevier effort to make 1,100 scientific titles available electronically will be


228

priced on a title-by-title subscription basis and at prices higher than the print version when only the electronic version is purchased.[21] On the other hand, the general purpose titles included in UMI's Periodical Abstracts full text (or in the similar products of EBSCO and IAC), as an alternative interface to their periodicals, are available on a transaction basis by article. These two approaches seek to maximize profit in accordance with the nature of the products.

Currently, UMI, EBSCO, and IAC, which function as the aggregators, have negotiated arrangements that allow site licenses for unlimited purchasing. These companies are operating as vendors who make collections of general purpose titles available under arrangements that pay the publishers royalties for each copy of their articles printed by library users. UMI, IAC, and EBSCO have established license arrangements with libraries for unlimited printing with license fees based on expected printing activity, thus offering some libraries a solution to the fundamental pricing problem created by the monopoly power of publishers.

New research could test whether publishers are able to retain monopoly power with electronic counterparts to their journals. Theory predicts that in a competitive market, even one characterized as monopolistic competition, demand at the price offered to individuals will tend to remain elastic. Faced with a change in the price of subscriptions purchased from their own pockets, scholars will act discriminately. Raise the price to individuals and some will cancel their subscriptions in favor of access to a library. In other words, the price of periodicals to individuals is a determinant of demand for library access. By exercising a measure of monopoly power rather than competing on price, publishers have some ability to influence their earnings through price discrimination.[22]

In contrast, publishers can set prices to libraries higher than the price to individuals as a means to extract consumer surplus. The difference in prices provides a reasonable measure of the extent of monopoly power, assuming that the individual subscription price is an acceptable proxy for the marginal cost of production.[23] Even if not perfect, the difference in prices represents some measure of monopoly power. Extending this line of research may show that monopoly power is affected by the medium.
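
Under the stated assumption that the individual rate approximates marginal cost, this price-differential measure can be written down directly. The sketch below computes a simple normalized version of it for two invented titles; the prices are hypothetical and serve only to show the calculation.

```python
# Price-differential proxy for monopoly power, under the chapter's assumption that
# the individual subscription rate approximates marginal cost.  Sample prices invented.

def power_index(library_price: float, individual_price: float) -> float:
    """(P_library - P_individual) / P_library: 0 means no differential,
    values near 1 mean the library rate far exceeds the individual rate."""
    return (library_price - individual_price) / library_price

sample_titles = {
    "hypothetical chemistry title": (4200.0, 350.0),
    "hypothetical history title":   (95.0, 55.0),
}
for title, (lib_price, ind_price) in sample_titles.items():
    print(f"{title}: {power_index(lib_price, ind_price):.2f}")
```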

In monopolistic competition, anything that differentiates a product may increase monopoly power. Historically, tremendous amounts of advertising money are expended to create the impression that one product is qualitatively distinguishable from others. It may be that electronic availability of specific titles will create an impression of superior quality that could lead to higher prices. However, the prices of journals across disciplines also may be driven by different factors. In general, prices are higher in the sciences and technical areas and lower in the humanities. This price differential is understandable considering that there is essentially no market for scholarly publications in the humanities outside of academe, whereas scientific publications are used heavily in corporate research. As a result, monopoly power will likely be stronger in the sciences than in other areas. This


229

power would reflect additional price discrimination in the electronic environment by publishers who are able to capture revenue lost to photocopying.

Access Versus Ownership Strategy

Clearly, if commercial publishers continue to retain or enhance their monopoly power with electronic counterparts of their journals, the academic marketplace must adjust or react more effectively than it has in the past. If an appropriate strategy is followed, the reaction of universities could erode the success publishers have previously achieved with price discrimination. Instead of owning the periodicals needed by their patrons, some libraries have experimented with replacing subscriptions with document delivery services. Louisiana State University reports canceling a major portion of its print journals.[24] It replaced these cancellations by offering faculty and students unlimited subsidized use of a document delivery service. The first-year cost for all the articles delivered through this service was much less than the total cost to the library for the former subscriptions. Major savings for the library budget via this approach would appeal to library directors and university administrators as a fruitful solution. However, it may turn out to be a short-term solution at best.

Carried to its logical conclusion, this approach produces a world in which each journal is reduced to one subscription shared by all libraries. This situation is equivalent to every existing journal having migrated to single copies in on-line files accessible to all interested libraries. Some libraries will pay a license fee in advance to allow users unlimited printing access to the on-line title, while others will require users to pay for each article individually. Individual article payment requires the entire fixed-cost-plus-profit components of a publisher's revenue to be distributed over article prints only, whereas with print publications, the purchase of subscriptions of physical artifacts that included many articles not needed immediately brought with it a bonus. The library acquired and retained many articles with future potential use. Transaction-based purchasing sacrifices this bonus and increases the marginal cost of articles in the long run. In sum, the marginal cost of a journal article in the print domain was suppressed by the spread of expenditure over many items never read. In the electronic domain under transaction-based pricing, users face a higher, more direct price and therefore are more likely to forego access. While the marginal benefit to the user may be equivalent, the higher marginal cost makes it less likely that users will ask for any given article. The result may show up in diminished scholarly output or notably higher prices per article.

In the long term, should a majority of libraries take this approach, it carries a benefit for publishers. There has been no means available in the past for publishers to count the actual number of photocopies made in libraries and thus to set their price accordingly. The electronic domain could make all those hidden transactions readily apparent. As a result, publishers could effectively maintain their


230

corporate control of prices and do so with more accurate information with which to calculate license fees. Given this attempted solution, publishers would be able to regain and strengthen their monopoly position.

A more promising approach lies in consortial projects such as that conducted by the Associated Colleges of the South (ACS).[25] Accompanying the Periodical Abstracts and ABI/Inform indexes of UMI that are made available on-line from the vendor or through OCLC are collections in full text of over 1,000 existing journals with backfiles. The ACS contracted an annual license for these two products (ABI/Inform and Periodical Abstracts) for the 13 schools represented. Trinity University pays $11,000 per year for the electronic periodicals in the UMI databases, a cost that is similar to that paid by each ACS library. Coincidentally, Trinity subscribes to the print version of about 375 titles covered by these products. Trinity could cancel its subscriptions to the print counterparts of the journals provided and save $25,000. Even considering that Trinity's library will subsidize user printing for paper, toner, and so forth at an expected cost of several thousand dollars per year to service its 230 faculty and 2,500 students, it appears likely that favorable economies accrue from switching to these electronic products. Of course, these savings will be accompanied by a significant decrease in nondollar user cost to patrons, so previously unmet demand will emerge and offset some of the savings. Moreover, there is a substantial bonus for Trinity users inherent in this arrangement.
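
Using only the figures quoted in this paragraph, the first-order arithmetic for Trinity looks roughly like the following sketch; the printing subsidy is the "several thousand dollars" estimate, set here, as an assumption, at $4,000.

```python
# Back-of-the-envelope version of the Trinity example using the figures quoted
# above; the printing subsidy is only a rough assumed value for the "several
# thousand dollars" estimate.
umi_license_share = 11_000   # Trinity's annual cost for the UMI electronic periodicals
cancellable_print = 25_000   # print subscriptions duplicated by the UMI full text
printing_subsidy  =  4_000   # assumed annual cost of paper, toner, and so forth

net_change = cancellable_print - umi_license_share - printing_subsidy
print(f"Estimated annual net saving if all duplicated print titles were canceled: ${net_change:,}")
```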

There are a number of titles made available in the UMI product for which subscriptions would be desirable at Trinity but were not purchased in the past because of budget limitations. From some of these titles, users would have acquired articles through the normal channels of interlibrary loan. However, the interlibrary loan process imposes costs in the form of staff time and user labor and is sufficiently cumbersome that many users avoid it for marginally relevant articles. If marginal articles could be easily viewed on screen as a result of electronic access, users would consider the labor cost of acquiring them to have been sufficiently reduced to encourage printing the articles from the system. Therefore, the net number of article copies delivered to users will be significantly increased, simultaneously with a substantial net decrease in the cost of subscriptions delivered to libraries.

Included in this equation are savings that accrue to the consortial libraries by sharing access to electronic subscriptions. Shared access will result in a specific number of print cancellations, which will decrease publisher profit from subscriptions. Publishers offering their journals in the electronic domain will be confronted by a change in the economic infrastructure that will flatten the scholar's demand functions for their titles while simultaneously increasing the availability of articles to the direct consumers. By lowering the user's nondollar cost of accessing individual articles, demand will increase for those items. Scholars, therefore, will be more likely to print an article from an electronic library than they would be to request it through interlibrary loan. However, depending on library policy, those scholars may be confronted with a pay-per-print fee, which will affect their demand function. If publishers raise the price to scholars for an article, they are more likely to lose a sale. Users will be more cautious with their own money than with a library's. That is, in the electronic domain, where scholars may be paying directly for their consumption, demand functions will be more elastic. This elasticity will occur to some extent even when users do not pay for articles but merely note the article price paid by their subsidizing library. Therefore, price discrimination may be more difficult to apply and monopoly power will be temporarily lost.

The loss might be temporary because this strategy is functionally the same as merging several libraries into one large library and providing transaction-based access versus ownership. This super library could ultimately face price discrimination similar to that currently existing in the print domain. This discrimination will lead, in turn, to the same kind of inflation that has been suffered for many years.

Preliminary Analysis of Financial Impact

This paper reports on the early stages of a three-year study funded by The Andrew W. Mellon Foundation. The study includes analysis directed at testing the viability of consortial access versus ownership for cost savings as well as the potential long-term solution that would derive from emergence of a new core of electronic titles. A complete financial analysis of the impact of consortial, electronic access to a core collection of general purpose periodicals and an econometric analysis of over 2,000 titles on the impact of electronic availability on pricing policy will issue from the study conducted under this grant. Some interesting issues have emerged with preliminary results of the study.

Financial Analysis

The Palladian Alliance is a project of the Associated Colleges of the South funded by The Andrew W. Mellon Foundation. This consortium of 13 liberal arts colleges-not just libraries-has a full-time staff and an organizational structure. The Palladian Alliance came about as a result of discussions among the library directors, who were concerned about the problem described in this paper. As the project emerged, it combined the goals of several entities, which are shown in Table 14.1 along with the specific objectives of the project.

The Andrew W. Mellon Foundation awarded a grant of $1.2 million in December 1995 to the ACS. During the first half of 1996, the librarians upgraded hardware, selected a vendor to provide a core collection of electronic full-text titles, and conducted appropriate training sessions. Public and Ariel workstations were installed in libraries by July 1996 and necessary improvements were made to the campus networks to provide access for using World Wide Web technology. Training workshops were developed under contract with Amigos and SOLINET on technical aspects and were conducted in May 1996. During that same time, an analysis was conducted to isolate an appropriate full-text vendor.


232
 

TABLE 14.1. Goals and Objectives of the ACS Consortial Access Project

Goals of the ACS libraries:

   • Improve the quality of access to current information

   • Make the most efficient use of resources

Goals of the ACS deans:

   • Cost containment

Goals of The Andrew W. Mellon Foundation:

   • Relieve the economic pressure from periodical price inflation

   • Evaluate the impact of electronic access on publisher pricing practices

Objectives of the project:

   • Improve the hardware available within the libraries for electronic access

   • Provide on-line access to important undergraduate periodical indexes

   • Provide on-line access to core undergraduate periodicals in full text

   • Provide campuswide access through Internet browsers

   • Determine the financial impact on the ACS libraries

   • Test the pricing practices of publishers and their monopoly power

After comparison of the merged print subscription list of all institutions with three products-IAC's InfoTrac, EBSCO's EBSCOHOST, and UMI's Periodical Abstracts and ABI/Inform-the project team selected UMI with access through OCLC. A contract with OCLC was signed in June for July 1, 1996, start-up of FirstSearch for the nine core databases: WorldCat, FastDoc, ERIC, Medline, GPO Catalog, ArticleFirst, PapersFirst, ContentsFirst, and ProceedingsFirst; and for UMI's two core indexes, Periodical Abstracts and ABI/Inform, along with their associated full-text databases. This arrangement for the UMI products provides a general core collection with indexing for 2,600 titles, of which approximately 1,000 are full-text titles.

The UMI via OCLC FirstSearch subscription was chosen because it offered several advantages including the potential for a reliable, proprietary backup to the Internet, additional valuable databases at little cost, and easy means to add other databases. The UMI databases offered the best combination of cost and match with existing holdings. UMI also offered the future potential of full-image as well as ASCII text. After the first academic year, the project switched to access via ProQuest Direct in order to provide full image when available.

Students have had access to the core electronic titles since the fall semester in 1996. As experience builds, it is apparent that the libraries do have some opportunity to cancel print subscriptions with financial advantages. The potential costs, savings, and added value are revealed in Tables 14.2 through 14.4. Specific financial impact on the institutions during the first year is shown in Tables 14.5 and 14.6. It should be noted that the financial impact is based on preliminary data that has been extremely difficult to gather. Publisher and vendor invoices vary considerably


233

between schools on both descriptive information and prices. Therefore, these results will be updated continually throughout the project.

At the outset, the project benefits the libraries in a significant way because of the power of consortial purchasing. Only a few of the larger libraries might be able to afford to purchase access to both full-text databases were they constrained to individual purchases. Added together, individual subscriptions to ABI/Inform and Periodical Abstracts accompanied by the full text would collectively cost the 13 libraries $413,590 for 1997/98. By arranging consortial purchase, the total cost to the ACS is $129,645 for this second year. Because the libraries can then afford their share of the collective purchase, the vendor benefits from added sales otherwise not available and the libraries add numerous articles to the resources provided to their students. A more detailed accounting of the benefits is provided in the accompanying tables.
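
The consortial discount can be restated with the two figures just quoted; the percentage is simply the implied reduction relative to separate purchase.

```python
# Consortium-wide comparison using the two figures quoted in the paragraph above.
individual_purchases_total = 413_590   # sum of separate 1997/98 subscriptions for the 13 libraries
consortial_license_total   = 129_645   # actual ACS consortial cost for the same year

saving = individual_purchases_total - consortial_license_total
print(f"Collective saving: ${saving:,} "
      f"({saving / individual_purchases_total:.0%} below separate purchase)")
```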

These tables are based on actual financial information for the consortium. Table 14.2 summarizes the project costs. These calculations will be corrected to reflect revised enrollment figures immediately prior to renewal for the third year. The project was designed to use grant funds exclusively the first year, then gradually shift to full support on the library accounts by the fourth year.

As the project started, the ACS libraries collectively subscribed through EBSCO, FAXON, and Readmore to approximately 14,600 subscriptions as shown in Table 14.3. Of these subscriptions, 6,117 are unique titles; the rest are duplicates of these unique titles. Were the ACS libraries collectively merged into one collection, it would therefore be possible to cancel more than 8,000 duplications and save over $1,000,000. Since this merger was not possible, the libraries contracted for electronic access to nearly 1,000 full-text titles from UMI. Over 600 of these UMI titles match the print subscriptions collectively held by the libraries. As Table 14.3 indicates, canceling all but one subscription to the print counterparts of the UMI titles could save the libraries about $137,000 for calendar year 1996. Canceling all the print counterparts to the electronic versions would save nearly $185,000, which is about equal to the licensing costs for the first year per Table 14.2.

For calendar year 1996, the libraries canceled very few titles. In part, this came about because of reluctance to depend upon an untested product. There was no existing evidence that UMI (or any other aggregator) could maintain a consistent list of offerings. To date, cancellations for 1997 have also been fewer than expected at the outset. Furthermore, the project has begun to show that products such as ProQuest Direct come closer to offering a large pool of journal articles than they do to offering full electronic counterparts to print subscriptions. However, these products do provide significant benefits to the libraries.

The project adds considerable value to the institutional resources. The schools had not previously subscribed to many of the titles available through UMI. As an


234
 

TABLE 14.2. Cost Sharing between the Grant and the Institutions for License to UMI Products Plus OCLC Base Package

Institution     First Year Enrollment   % of Total Enrollment   First Year   Second Year   Third Year
Mellon Grant                                                      $184,295      $120,705      $45,000
Atlanta               13,174                  38.70%                             $19,423      $52,231
Birmingham             1,406                   4.13%                              $2,073       $5,574
Centenary                821                   2.41%                              $1,210       $3,255
Centre                   968                   2.84%                              $1,427       $3,838
Furman                 2,673                   7.85%                              $3,941      $10,598
Hendrix                  978                   2.87%                              $1,442       $3,877
Millsaps               1,278                   3.75%                              $1,884       $5,067
Rhodes                 1,407                   4.13%                              $2,074       $5,578
Richmond               3,820                  11.22%                              $5,632      $15,145
Rollins                2,632                   7.73%                              $3,880      $10,435
Southwestern           1,199                   3.52%                              $1,768       $4,754
Trinity                2,430                   7.14%                              $3,583       $9,634
Sewanee                1,257                   3.69%                              $1,853       $4,984
Totals                34,043                                      $184,295      $170,895     $179,970

 

TABLE 14.3. 1996 Potential Savings from Substitution of UMI Full-Text for Print Subscriptions

                                                    No. Titles   Costs/Savings
Cost total for all ACS print subscriptions              14,613      $2,716,480
Number of unique titles                                  6,117      $1,466,862
Number of duplicate titles                               8,496      $1,249,618
Canceling of all overlapping duplicates                  2,680        $184,862
Number of unique titles overlapping UMI                    606         $47,579
Canceling of all but one overlapping duplicates          2,074        $137,283

illustration, Table 14.4 lists the number of print subscriptions carried by each institution and indicates how many of those are available in the UMI databases electronically. The fourth column reveals the potential savings available to each school were the print counterparts of all these electronic journals to be canceled. The column labeled Added E-Titles shows the number of new journals made available to each institution through the grant. The final column indicates the total titles now available at each institution as a result of the consortial arrangement. Comparison of the final column with the first reveals that the electronic project nearly doubles the journal resources available to students.

Table 14.5 details the preliminary financial impact on the ACS institutions for the first and second calendar year of the project. While the opening premise of


235
 

TABLE 14.4. 1996 Savings Potential for Each Institution and Value Added by Electronic Subscriptions

Institution     No. Print Subscriptions in 1996   Overlap with UMI   Potential Cancellation Savings   Added E-Titles   Total Subscriptions
Atlanta                    1,085                        112                    $8,689                      1,004              2,089
Birmingham                   659                        181                   $18,141                        935              1,594
Centenary                    558                        163                   $12,974                        953              1,511
Centre                       701                        152                    $4,913                        964              1,665
Furman                     1,685                        229                   $13,856                        887              2,572
Hendrix                      599                        145                    $7,976                        971              1,570
Millsaps                     686                        167                   $10,485                        949              1,635
Rhodes                       964                        187                    $9,617                        929              1,893
Richmond                   1,827                        358                   $28,640                        758              2,585
Rollins                    1,017                        210                   $14,932                        906              1,923
Southwestern               1,309                        272                   $16,648                        844              2,153
Trinity                    2,739                        384                   $27,373                        732              3,471
Sewanee                      784                        120                   $10,618                        996              1,780
Total                     14,613                      2,680                  $184,862                     11,828             26,441

the project suggests that canceling print subscriptions would pay for consortial access to UMI's aggregated collections, actual practice shows otherwise. The data is still being collected in the form of invoices, but preliminary summaries of cancellations show meager savings. Total savings across the 13 campuses are little more than $50,000 per year. This is not enough to pay for the first two years of the project, whose cost is over $350,000. However, the added value to the combined collections exceeds $2,000,000 per year as measured by the cost of print counterparts to the UMI titles. Furthermore, additional action by some institutions, as shown in Table 14.6, reveals better outcomes.

Comparing the savings shown in Table 14.6 with the subsidized cost reveals that in the cases of Trinity and Millsaps, even without Mellon support, the consortial provision of the OCLC/UMI databases could be paid for by canceling indexes along with a few print subscriptions. In Trinity's case, two indexes previously purchased as CD-ROMs or direct links to another on-line source were canceled for savings of over $5,000 in the first year. Trinity canceled a CD-ROM subscription to a site license of ABI/Inform, which saved expenditures totaling over $6,000 per year, and an on-line general purpose index that previously cost over $12,000. Trinity's share of the Palladian Alliance project would have been just over $13,000 per year for the first three years. Similarly, Millsaps canceled one index and 74 periodical titles that overlapped the UMI content for net first-year savings of nearly $9,000. On this basis, the project more than pays for itself.

Additional interesting outcomes of the project at this point include a couple of new


236
 

TABLE 14.5. Preliminary Financial Impact on Each Institution by the Palladian Project

1996 Savings and Added Value

                                                 Atlanta     Birmingham   Centenary    Centre      Furman      Hendrix
Number of print subscriptions                      1,085         659          558         701        1,685         599
Number of subscriptions canceled                       0           1            0           0            5           0
Canceled titles overlapping Palladian                  0           0            0           0            0           0
Total invoices for periodical titles            $297,717     $94,416      $72,567    $117,947     $327,910     $82,112
Total Palladian expenditure                           $0          $0           $0          $0           $0          $0
Total expenditures                              $297,717     $94,416      $72,567    $117,947     $327,910     $82,112
Total invoice for canceled periodical titles          $0          $0           $0          $0      $11,022          $0
Total invoice for canceled Palladian overlap          $0          $0           $0          $0           $0          $0
Net cost                                        $297,717     $94,416      $72,567    $117,947     $316,888     $82,112

Total expenditures                              $297,717     $94,416      $72,567    $117,947     $327,910     $82,112
Net cost                                        $297,717     $94,416      $72,567    $117,947     $316,888     $82,112
Savings                                               $0          $0           $0          $0      $11,022          $0

Total value of Palladian titles                 $160,796    $160,796     $160,796    $160,796     $160,796    $160,796
Overlap value of Palladian titles                 $8,689     $18,141      $12,974      $4,913      $13,856      $7,976
Value of added net benefit from Palladian       $152,107    $142,655     $147,822    $155,883     $146,940    $152,820

Savings                                               $0          $0           $0          $0      $11,022          $0
Value of added net benefit from Palladian       $152,107    $142,655     $147,822    $155,883     $146,940    $152,820
Total savings and added value                   $152,107    $142,655     $147,822    $155,883     $157,962    $152,820

                                                 Millsaps    Rhodes      Richmond    Rollins     Southwestern  Trinity     Sewanee     TOTAL
Number of print subscriptions                         686        964        1,827       1,017        1,309        2,739         784      14,613
Number of subscriptions canceled                        3          1            7           0            0           79           0          96
Canceled titles overlapping Palladian                   0          1            3           0            0           11           0          15
Total invoices for periodical titles              $87,358   $140,431     $327,270    $134,659     $191,291     $659,080    $183,722  $2,716,480
Total Palladian expenditure                            $0         $0           $0          $0           $0           $0          $0          $0
Total expenditures                                $87,358   $140,431     $327,270    $134,656     $191,291     $659,080    $183,722  $2,716,480
Total invoice for canceled periodical titles         $350         $0         $744          $0           $0      $41,640          $0     $53,756
Total invoice for canceled Palladian overlap           $0        $60         $138          $0           $0          $30          $0        $228
Net cost                                          $87,308   $140,371     $326,387    $134,659     $191,291     $617,410    $183,722  $2,662,795

Total expenditures                                $87,358   $140,431     $327,270    $134,659     $191,291     $659,080    $183,722  $2,716,480
Net cost                                          $87,308   $140,431     $327,270    $134,659     $191,291     $617,410    $183,722  $2,662,795
Savings                                               $50        $60         $883          $0           $0      $41,670          $0     $53,685

Total value of Palladian titles                  $160,796   $160,796     $160,796    $160,796     $160,796     $160,796    $160,796  $2,090,348
Overlap value of Palladian titles                 $10,485     $9,617      $28,640     $14,932      $16,648      $27,373     $10,618    $184,862
Value of added net benefit from Palladian        $150,311   $151,179     $132,156    $145,864     $144,148     $133,423    $150,178  $1,905,486

Savings                                               $50        $60         $882          $0           $0      $41,670          $0     $53,984
Value of added net benefit from Palladian        $150,311   $151,239     $133,039    $145,864     $144,148     $133,423    $150,178  $1,906,429
Total savings and added value                    $150,361   $151,299     $133,921    $145,864     $144,148     $175,093    $150,178  $1,960,113

1997 Savings and Added Value

                                                 Atlanta     Birmingham   Centenary    Centre      Furman      Hendrix
Number of print subscriptions                      1,086         655          566         706        1,685         608
Number of subscriptions canceled                       0           1            0           1            1           1
Canceled titles overlapping Palladian                  0           1            0           1            1           0
Number of subscriptions overlapping UMI              107         135          119         120          174         115
Total invoices for periodical titles            $328,049     $74,170          n/a    $140,851     $350,580     $89,613
Total Palladian expenditure                      $19,423      $2,073       $1,210      $1,427       $3,941      $1,442
Total expenditures                              $347,472     $76,243          n/a    $142,278     $354,521     $91,055
Total invoice for canceled periodical titles          $0          $0           $0          $0         $270          $0
Total invoice for canceled Palladian overlap          $0         $24           $0        $120           $0          $0
Net cost                                        $347,472     $76,219          n/a    $142,158     $354,251     $91,055

Total expenditures                              $347,472     $76,243          n/a    $142,278     $354,521     $91,055
Net cost                                        $347,472     $76,219          n/a    $142,158     $354,251     $91,055
Savings                                               $0         $24           $0        $120         $270          $0

Total value of Palladian titles                 $183,772    $183,772     $183,772    $183,772     $183,772    $183,772
Overlap value of Palladian titles                 $9,890      $9,373       $4,396      $7,927      $12,787      $7,505
Value of added net benefit from Palladian       $173,882    $174,399     $179,376    $175,845     $170,985    $176,267

Savings                                               $0         $24           $0        $120         $270          $0
Value of added net benefit from Palladian       $173,882    $174,399     $179,376    $175,845     $170,985    $176,267
Total savings and added value                   $173,882    $174,423     $179,376    $175,965     $171,255    $176,267

                                                 Millsaps    Rhodes      Richmond    Rollins     Southwestern  Trinity     U of South  TOTAL
Number of print subscriptions                         645        959        1,186       1,017        1,349        2,849         819      14,760
Number of subscriptions canceled                       14          0            1           0            1           42           0          62
Canceled titles overlapping Palladian                  73          0            1           0            1            0           0          78
Number of subscriptions overlapping UMI               215        151          285         161          237          327         105       2,251
Total invoices for periodical titles              $74,095   $154,258     $358,417    $148,769     $195,075     $670,749    $202,079  $2,786,705
Total Palladian expenditure                        $1,884     $2,074       $5,632      $3,880       $1,768       $3,583      $1,853     $50,190
Total expenditures                                $75,979   $156,332     $364,049    $152,649     $196,843     $674,332    $203,932  $2,836,895
Total invoice for canceled periodical titles       $4,216         $0           $0          $0           $0      $46,345          $0     $50,831
Total invoice for canceled Palladian overlap       $4,591         $0           $0          $0           $0           $0          $0      $4,735
Net cost                                          $67,171   $156,332     $364,049    $152,649     $196,843     $627,987    $203,932  $2,780,118

Total expenditures                                $75,979   $156,332     $364,049    $152,649     $196,843     $674,332    $203,932  $2,835,686
Net cost                                          $67,171   $156,332     $364,049    $152,649     $196,843     $627,987    $203,932  $2,780,119
Savings                                            $8,808         $0           $0          $0           $0      $46,345          $0     $55,567

Total value of Palladian titles                  $183,772   $183,772     $183,772    $183,772     $183,772     $183,772    $183,772  $2,389,039
Overlap value of Palladian titles                 $11,907     $9,741      $25,570     $14,303      $17,456      $28,145      $8,846    $167,846
Value of added net benefit from Palladian        $171,865   $174,031     $158,202    $169,469     $166,316     $155,627    $174,926  $2,221,193

Savings                                            $8,808         $0           $0          $0           $0      $46,345          $0     $55,567
Value of added net benefit from Palladian        $171,865   $174,031     $158,202    $169,469     $166,316     $155,627    $174,926  $2,221,193
Total savings and added value                    $180,673   $174,031     $158,202    $169,469     $166,316     $201,972    $174,926  $2,276,760

pieces of important information. First, canceling individual subscriptions to indexes provides a viable means for consortial pricing to relieve campus budgets, at least in the short run. Even if it had been necessary for Trinity to pay its full share of the cost, canceling indexes alone would have provided sufficient savings to pay for the project. Considering trade-offs with indexes alone, Trinity's net savings over the project life span total nearly $18,000.

Second, on the down side, canceling journals and replacing them with an aggregator's collection of electronic subscriptions may not be very reliable, because aggregators are subject to the vagaries of publishers. In just the first few months of the project, UMI dropped and added a number of titles in both full-text databases. As a result of this readjustment, the databases often contain only partial runs of each title rather than full runs. Furthermore, in some cases the publisher provides only selected significant articles, not the full journal. The substitution of UMI therefore provides the libraries with what is essentially a collection of articles, not a collection of electronic subscription substitutes. This result diminishes reliability and prevents libraries from securing truly significant cost savings.

It should be noted, however, that several of the libraries independently subscribed to electronic access through the Johns Hopkins Project MUSE. In contrast to an aggregated collection, Project MUSE provides full-image access to every page of the print counterparts and guarantees indefinite access to any subscription year once it has been paid for. This guarantee substantially improves the reliability of the product and gives the libraries a reasonable incentive to substitute access for


239
 

                                                  Millsaps     Rhodes   Richmond    Rollins  Southwestern    Trinity  U of South       TOTAL
Total invoice for canceled periodical titles        $4,216         $0         $0         $0            $0    $46,345          $0     $50,831
Total invoice for canceled Palladian overlap        $4,591         $0         $0         $0            $0         $0          $0      $4,735
Net cost                                           $67,171   $156,332   $364,049   $152,649      $196,843   $627,987    $203,932  $2,780,118
Total expenditures                                 $75,979   $156,332   $364,049   $152,649      $196,843   $674,332    $203,932  $2,835,686
Net cost                                           $67,171   $156,332   $364,049   $152,649      $196,843   $627,987    $203,932  $2,780,119
Savings                                             $8,808         $0         $0         $0            $0    $46,345          $0     $55,567
Total value of Palladian titles                   $183,772   $183,772   $183,772   $183,772      $183,772   $183,772    $183,772  $2,389,039
Overlap value of Palladian titles                  $11,907     $9,741    $25,570    $14,303       $17,456    $28,145      $8,846    $167,846
Value of added net benefit from Palladian         $171,865   $174,031   $158,202   $169,469      $166,316   $155,627    $174,926  $2,221,193
Savings                                             $8,808         $0         $0         $0            $0    $46,345          $0     $55,567
Value of added net benefit from Palladian         $171,865   $174,031   $158,202   $169,469      $166,316   $155,627    $174,926  $2,221,193
Total savings and added value                     $180,673   $174,031   $158,202   $169,469      $166,316   $201,972    $174,926  $2,276,760

 

TABLE 14.6. Total First Year Financial Impact on Selected Institutions

                              Birmingham     Centre    Hendrix   Millsaps    Trinity
Periodical subscriptions
  Total 1996                         659        701        599        686      2,739
  Total 1997                         655        706        608        645      2,849
Cancellations
  Total 1997                           2          2          1         87         42
  Overlap of UMI                       1          1          0         73          0
  Indexes                              1          0          1          1          9
Savings
  Other periodicals                  $24       $120         $0       $256    $20,049
  Overlap of UMI                      $0         $0         $0     $4,591         $0
  Print indexes                   $4,650         $0       $604     $3,960     $7,806
  Electronic indexes              $4,650         $0         $0         $0    $18,491
  Total savings                   $4,674       $120       $604     $8,807    $46,346
Subsidized cost of project        $7,612     $5,240     $5,294     $6,919    $13,155
NET SAVINGS                      ($2,938)   ($5,120)   ($4,690)    $1,888    $33,191


240

collecting. While it may be acceptable to substitute access to a large file of general purpose articles for undergraduate students, Project MUSE promises better results than the initial project for scholarly journal collections. The final report of this project will include information on the impact of the Project MUSE approach as well as on the original concept.

Third, on-line full-text content may or may not have an impact on interlibrary loan activity. Table 14.7 summarizes the searching and article delivery statistics for the first six months of the project compared with total interlibrary borrowing as well as nonreturn photocopies ordered through the campus interlibrary loan offices. The change in interlibrary loan statistics for the first six months of the project compared with the previous year shows that in some cases interlibrary borrowing increased and in other cases it decreased. Several variables besides the availability of full text seem to affect use of interlibrary loan services. For instance, some of the institutions had full-text databases available before the project started, and some made more successful efforts than others to promote the project services. It seems likely that improved access to citations from on-line indexes made users more aware of items that could be borrowed, an effect that probably offset the decrease in interlibrary loans that the availability of full text would otherwise lead us to predict. Regardless, statistics on this issue yield inconclusive results this early in the project.

Fourth, it is curious that secondary journals in many fields are published by commercial firms rather than by professional organizations and that their publications are sold at higher prices. Libraries typically pay more for Haworth publications than they do for ALA publications. Haworth sells largely to libraries, responding not to demand for content but to demand for publication outlets, and libraries are willing to pay for the Haworth publications. This fact helps explain why secondary titles cost more than primary ones: the demand may be more for exposure of the contributor than for reading of content by subscribers. The econometric analysis included in the project may confirm this unintended hypothesis.

Econometric Analysis

At this point, a meaningful econometric analysis is many months away. A model based on Lerner's definition of monopoly power will be used to examine pricing as journals shift into the electronic sphere. The model calls for regressing the price of individual titles on a variety of independent variables such as number of pages, advertising content, circulation, and publisher type, and for including a dummy variable for whether a journal is available electronically. Data is being collected on over 2,000 of the subscriptions held by Trinity for the calendar years 1995 through 1997. Difficulties with financial data coupled with the time-consuming nature of data gathering have delayed progress on the econometric analysis.

It would be desirable to conduct an analysis on time series data to observe the


241
 

TABLE 14.7. UMI Articles Delivered to Users Compared to Change in Interlibrary Loans from 1995 to 1996

Institution     Enrollment   Searches of   Total      Searches      ABI Documents   PA Documents   Total Documents
                             Base Files    Searches   per Student     Delivered      Delivered        Delivered
Birmingham           1,406        8,734      11,869       8.44             646            842            1,488
Centenary              821          948       1,819       2.22               -              8                8
Centre                 968        6,713      15,003      15.50             309          4,134            4,443
Furman               2,673        8,666      18,068       6.76             998            863            1,861
Hendrix                978        1,481       5,117       5.23             301          1,600            1,901
Millsaps             1,278        5,994      22,455      17.57           4,583          4,424            9,007
Morehouse           13,174        3,642      12,305       0.93               *              *                *
Rhodes               1,407        2,244       3,691       2.62             252            418              670
Richmond             3,820       33,490      83,477      21.85           6,104         11,900           18,004
Rollins              2,632        5,464      16,471       6.26           1,198          3,298            4,496
Southwestern         1,119       14,763      36,018      32.19           1,752          8,153            9,905
Sewanee              1,257       12,140      40,317      32.07             403          1,534            1,937
Trinity              2,430       30,601     134,693      55.43          16,317         37,838           54,155

Institution     Total Documents   Nonreturns   Nonreturns   Change in     Total Borrows   Total Borrows   Change in Total
                  per Student        1995         1996      Nonreturns        1995            1996           Borrowing
Birmingham             1.06            662          668         0.91%           928             380          -59.05%
Centenary              0.01            583          441       -24.36%           911           1,137           24.81%
Centre                 4.59            409          351       -14.18%           872             758          -13.07%
Furman                 0.70            246          246         0.00%           833             923           10.80%
Hendrix                1.94            146          192        31.51%           251             353           40.64%
Millsaps               7.05            568          352       -38.03%           710             887           24.93%
Morehouse                 *              *            *             *             *               *                *
Rhodes                 0.48            255          198       -22.35%           601             471          -21.63%
Richmond               4.71          1,034        1,044         0.97%         1,892           1,831           -3.22%
Rollins                1.71            394          365        -7.36%           656             652           -0.61%
Southwestern           8.85            412          308       -25.24%           695             571          -17.84%
Sewanee                1.54            626          434       -30.67%         1,083           1,038           -4.16%
Trinity               22.29            706          711         0.71%         1,172           1,257            7.25%

*Data not available.

consequences for journal prices as the shift is made to electronic products. Such an analysis would provide a forecast of how publishers react. Lacking the opportunity at the outset to examine prices over time, a straightforward model applying ordinary least squares (OLS) regression to cross-sectional data, similar to the analyses reported by others, will form the basis of the analysis. Earlier models have


242

typically regressed price on a number of variables to distinguish the statistical relevance of publisher type in determining price. By modifying the earlier models, this analysis seeks to determine whether monopoly power may be eroded in the electronic market. The methodology uses two specifications of an ordinary least squares regression model. The first regresses price on the characteristics of a set of journal titles held by the ACS libraries. This data set is considerably larger than those used in previous studies, so we propose to test whether the findings of earlier works, which concentrate on economics journals, hold across a larger set of disciplines. This specification includes the variables established earlier: frequency of publication, circulation, pages per year, and several dummy variables to control for whether the journals contain advertising and for country of publication. Four dummy variables are included for type of publisher, with commercial publishers as the residual category. The second specification regresses the difference between the price charged to libraries and the price charged to individuals on the same set of variables, with an additional dummy added to indicate whether a given journal is available electronically.[26]

The ACS libraries collectively subscribe to approximately 14,000 titles. Where subscriptions duplicate one another, an electronic set has been substituted for shared access. We anticipate that, at the margin, the impact on publishers would be minimal if ACS canceled subscriptions to the print counterparts of this set. However, the national availability of the electronic versions will precipitate cancellations at many institutions in favor of electronic access, and prices will be adjusted accordingly. Since most publishers will offer some products in print only and others within the described electronic set, we expect the prices of the electronic versions to reflect an erosion of monopoly power. Thus the cross-sectional data will capture the effect of electronic availability on monopoly power.

Since the data set comprises several thousand periodical titles, including general and more popular items, several concerns experienced by other investigators will be mitigated. The only study found in the literature so far that examines publishers from the standpoint of the exercise of monopoly power investigated price discrimination.[27] This project intends to extend that analysis in two ways. First, we will use a much broader database, since most of the previous work was completed on limited data sets of fewer than 100 titles narrowly focused on a single academic discipline. Second, we will extend the analysis by assuming the existence of price discrimination, given the difference in price to individuals versus libraries for most scholarly journals. With controls in the model for previous findings regarding price discrimination, we will attempt to test the null hypothesis that monopoly power does not decrease in the electronic domain.

In the data set available, we were unable to distinguish the specific price of each journal for the electronic replacement, because UMI priced the entire set for a flat fee. This pricing scheme may reflect an attempt by publishers to capture revenue lost to interlibrary lending. Alternatively, it may reflect publisher expectations that


243

article demand will increase when users' nondollar costs decrease. Thus, monopoly power will be reflected back onto the subscription price of print versions. As a result, we will use the price of print copies as a proxy for the specific electronic price of each title.

An alternative result could emerge. In monopolistic competition, anything that differentiates a product may increase its monopoly power. For example, firms expend tremendous amounts of money on advertising to create the impression that their products are qualitatively distinguishable from others. Analogously, the electronic availability of specific titles may create an impression of superior quality.

The general model of the first specification is written:

\[
y_j = b_0 + b_1 X_{1j} + b_2 X_{2j} + \cdots + b_{17} X_{17j} + e_j
\]

where y equals the library price (LPRICE) for journal j = 1, 2, 3, ..., n. The definitions of the independent variables appear in Table 14.8, along with the expected signs of the parameters b1 through b17 and the calculation of each variable; the parameters will be estimated by traditional single-equation regression techniques.

The general model of the second specification is written:

\[
y_{ij} = b_0 + b_1 X_{1j} + b_2 X_{2j} + \cdots + b_{17} X_{17j} + e_{ij}, \qquad i = 1, 2
\]

where y equals two different forms of monopoly power (MPOWER1 and MPOWER2), defined as measures i = 1 and 2, for journal j = 1, 2, 3, ..., n. Again, the definitions of the independent variables appear in Table 14.8, along with the expected signs of the parameters b1 through b17 and the calculation of each variable; the parameters will be estimated by traditional single-equation regression techniques.
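To make the two specifications concrete, the sketch below shows one way they might be estimated by ordinary least squares in Python with pandas and statsmodels. The file name and data layout are hypothetical, the variable names follow Table 14.8, and the sketch is illustrative rather than the project's actual code.

```python
# Illustrative only: estimates the two OLS specifications described above.
# The CSV file and its layout are hypothetical; column names follow Table 14.8.
import pandas as pd
import statsmodels.formula.api as smf

journals = pd.read_csv("acs_journal_prices.csv")

# The two monopoly-power measures defined in Table 14.8.
journals["MPOWER1"] = journals["LPRICE"] - journals["IPRICE"]
journals["MPOWER2"] = (journals["LPRICE"] - journals["IPRICE"]) / journals["LPRICE"]

rhs = (
    "IPRICE + GBRITAIN + EUROPE + OTHER + RISK + ASSOC + GOVERN + FOUNDTN"
    " + UNIVPR + FREQ + PAGES + PEERREV + SUBMISSFEE + CCCREG + ILLUS"
    " + CIRC + ADV + AGE + QUALITY + HUMAN + SOCSCI"
)

# First specification: library price regressed on journal characteristics.
spec1 = smf.ols("LPRICE ~ " + rhs, data=journals).fit()

# Second specification: each monopoly-power measure regressed on the same
# variables, with the ELECTRONIC dummy added.
spec2_m1 = smf.ols("MPOWER1 ~ " + rhs + " + ELECTRONIC", data=journals).fit()
spec2_m2 = smf.ols("MPOWER2 ~ " + rhs + " + ELECTRONIC", data=journals).fit()

for label, model in [("LPRICE", spec1), ("MPOWER1", spec2_m1), ("MPOWER2", spec2_m2)]:
    print(label)
    print(model.params.round(3))  # estimated coefficients b0, b1, ...
```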

The variables listed in Table 14.8 are suggested at this point on the basis of previous studies that have demonstrated their appropriateness. Testing with the regression model is required to determine which variables are ultimately useful to this study, and additional variables will be introduced should experiments suggest them. A very brief rationale for the expected sign and importance of each variable is in order. If the difference between what publishers charge libraries and what they charge individuals represents price discrimination, then a variable for the individual price (IPRICE) will be a significant predictor of the price to institutions (LPRICE). Higher individual prices will shift users toward the library, thus raising demand for library subscriptions, which will pull institutional prices higher. The sign on this variable is expected to be positive.

One group of variables deals with the issue of price discrimination based on


244
 

TABLE 14.8. List of Variables

Dependent variables
LPRICE      The price for library subscriptions
MPOWER1     Monopoly power as represented by LPRICE - IPRICE
MPOWER2     Monopoly power as represented by the index (LPRICE - IPRICE)/LPRICE

Independent variables (expected sign, type)
IPRICE      Price for individuals (+, number)
GBRITAIN    1 if the journal is published in Great Britain, 0 otherwise (-, dummy variable)
EUROPE      1 if the journal is published in Europe, 0 otherwise (-, dummy variable)
OTHER       1 if the journal is published outside the United States, Canada, Europe, or Great Britain, 0 otherwise (-, dummy variable)
RISK        Standard deviation of the annual free-market exchange rate between the currency of a foreign publisher's home country and the U.S. dollar
ASSOC       1 if the journal is published by an association, 0 otherwise (-, dummy variable)
GOVERN      1 if the journal is published by a government agency, 0 otherwise (-, dummy variable)
FOUNDTN     1 if the journal is published by a foundation, 0 otherwise (-, dummy variable)
UNIVPR      1 if the journal is published by a university press, 0 otherwise (-, dummy variable)
FREQ        The number of issues per year (+, number)
PAGES       Number of pages printed per year (+, number)
PEERREV     1 if article submissions are peer reviewed, 0 otherwise (+, dummy variable)
SUBMISSFEE  1 if the contributor is required to pay a fee with submission, 0 otherwise (-, dummy variable)
CCCREG      1 if the journal is registered with the CCC, 0 otherwise (+, dummy variable)
ILLUS       1 if the journal contains graphics or illustrations, 0 otherwise (+, dummy variable)
CIRC        The reported number of subscriptions to the journal (-, number)
ADV         1 if there is commercial advertising in the journal, 0 otherwise (-, dummy variable)
AGE         Current year minus the year the journal was first published (-, number)
QUALITY     Sum of the Institute for Scientific Information citation measures (+, number)
HUMAN       1 if the journal is in the humanities, 0 otherwise (-, dummy variable)
SOCSCI      1 if the journal is in the social sciences, 0 otherwise (-, dummy variable)
ELECTRONIC  1 if the journal is available in electronic form, 0 otherwise (+, dummy variable)


245

the monopoly power that can be exercised by foreign publishers. Publishers in Great Britain (GBRITAIN), western Europe (EUROPE), and other countries outside the United States (OTHER) may have enough market power to influence price, so these variables will carry a positive sign if a sizable market influence is exerted. Some of these publishers will also be concerned with currency exchange risk (RISK), which they will adjust for in prices. However, since they offer discounts through vendors to libraries that prepay subscriptions, this variable will carry a negative sign if the price to individuals captures most of the financial burden of risk adjustment.

It is expected that commercial publishers discriminate by price more than their nonprofit counterparts do. Therefore, in comparison to the commercial residual, associations (ASSOC), government agencies (GOVERN), university presses (UNIVPR) and foundations (FOUNDTN) will capture generally lower prices of these nonprofit publishers. Negative signs are expected on these.

All the publishers will experience production costs, which can be exposed through variables that control for frequency (FREQ), total pages printed per year (PAGES), peer review (PEERREV), submission fees (SUBMISSFEE), processing/communication and copyright clearance registration expenses (CCCREG), and the presence of graphics, maps, and illustrations (ILLUS), all of which will positively affect price to the extent they are passed along through price discrimination. Circulation (CIRC) will capture the economies of scale experienced by publications that are distributed in larger quantities, so this variable is expected to be negative. Similarly, the inclusion of advertising (ADV) provides revenue in addition to sales, so this variable is expected to be negative, since journals that include ads have less incentive to extract revenue through sales. New entrants into the publishing arena are expected to incur advertising costs to increase awareness of their products, costs that will be partially passed on to consumers. Therefore, age (AGE), the difference between the current date and the date the journal started, will be a negative predictor of price and monopoly power.

Previous studies have developed measures of quality based on rankings of publications compared to each other within a given discipline. Most of these comparisons work from information available from the Institute for Scientific Information. Data acquired from this source that shows the impact factor, immediacy index, half-life, total cites, and cites per year will be summarized in one variable to capture quality (QUALITY) of journals, which is expected to be positive with regard to both price and monopoly power.

The prices of journals across disciplines may be driven by different factors. In general, prices are higher in the sciences and technical areas and lower in the humanities. This discrepancy is understandable when we consider the market for science versus humanities. As stated earlier, there is essentially no market for scholarly publications in the humanities outside of academe, whereas scientific


246

publications are used heavily in corporate research by pharmaceutical firms and other industries highly dependent on research. As a result, two additional dummies are included in the model to segment the specification along discipline lines. HUMAN and SOCSCI will control for differences in price among the humanities and social sciences as compared to the residual category of science. These variables are expected to be negative and strong predictors of price.

Finally, a dummy variable is included to determine whether the electronic availability of each journal (ELECTRONIC) has a positive impact on the ability to discriminate by price. Since we have predicted that monopoly power will erode in the electronic arena, ELECTRONIC should be statistically significant and a negative predictor of monopoly power. However, to the extent that electronic availability distinguishes a journal from its print counterparts, there is some expectation that this variable could be positive, which would indicate additional price discrimination by publishers able to recapture lost revenue in the electronic environment.

The data set will be assembled by enhancing the subscription data gathered during the planning project. Most of the additional data elements, including prices, will come from examination of the journals and invoices received by the libraries. Impact and related factors will be acquired from the Institute for Scientific Information. The number of subscriptions supplied in print by two major journal vendors, FAXON and EBSCO, will be used as a proxy for circulation, and an alternative measure of circulation will be compiled from a serials bibliography. The remaining variables will be obtained by examining the print subscriptions retained by the libraries or a serials bibliography.

Conclusion

There may be other ways to attack the problem of price inflation of scholarly periodicals. Some hope arises from the production cost differences between print and electronic periodicals. The marginal cost of each added print copy diminishes steadily from the second to the nth copy, whereas for electronic publications the marginal cost of the second and subsequent copies is approximately zero. Although the cost of distributing each additional electronic copy is not quite zero, since computer resources can be strained by the volume of access, the marginal cost is so close to zero that technical solutions to the problem of free, unauthorized redistribution of pirated copies might give publishers in the electronic domain an incentive to distribute the cost of the first copy equitably across all consumers. If the total cost of producing electronic publications is lower than it would be for printed publications, some publishers may share the savings with consumers. However, there is no certainty that they will, because profit maximizers will continue to be profit maximizers. Therefore, it is appropriate to look for a decoupled solution lying in the hands of consumers.
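The point can be made explicit with a simple average-cost comparison; the notation below is illustrative rather than taken from the chapter. If F denotes the first-copy cost and c_k the marginal cost of the kth copy, then for N copies

\[
AC(N) = \frac{F + \sum_{k=2}^{N} c_k}{N},
\qquad
c_k^{\text{electronic}} \approx 0
\;\Longrightarrow\;
AC_{\text{electronic}}(N) \approx \frac{F}{N},
\]

so for electronic publications the pricing question reduces to how the first-copy cost F is spread across all consumers.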

In the meantime, the outcomes of this research project will include a test of the


247

benefits of consortial access versus ownership. In addition, earlier work on price discrimination will be extended with this cross-discipline study to determine whether electronic telecommunications offers hope of relief from monopoly power of publishers.

The author wishes to acknowledge with thanks the financial support of The Andrew W. Mellon Foundation and the participation of several colleagues from libraries of the Associated Colleges of the South. Thanks also to my associate Tanya Pinedo for data gathering and analysis. All errors remain the responsibility of the author.


250

Chapter 15—
The Use of Electronic Scholarly Journals
Models of Analysis and Data Drawn from the Project MUSE Experience at Johns Hopkins University

James G. Neal

Project MUSE is a collaborative initiative between the Johns Hopkins University Press and the Milton S. Eisenhower Library at Johns Hopkins University to provide network-based access to scholarly journals including titles in the humanities, social sciences, and mathematics. Launched with electronic versions of 40 titles still published in print, Project MUSE coverage has now been expanded to include electronic-only publications. Funded initially by grants from The Mellon Foundation and the National Endowment for the Humanities, Project MUSE seeks to create a successful model for electronic scholarly publishing characterized by affordability and wide availability. It has been designed to take advantage of new technical capabilities in the creation and storage of electronic documents. It has been developed to provide a range of subscription options for individual libraries and consortia. It is based on a very liberal use and reuse approach that encourages any noncommercial activity within the bounds of the subscribing organization.

Project MUSE has been produced from the outset for usability, with a focus on user-centered features. This focus has evolved as a participative and interactive process, soliciting input and feedback from users and integrating user guidance components into the system. An on-line survey is available to all users, and libraries are providing information about the local implementation and the results of campus and community focus group discussions on Project MUSE. As the number of subscribing libraries expands and the activity grows, a valuable database of user experiences, attitudes, and behaviors will accumulate. A new feature will be the ability to track and analyze individual search sessions and to observe closely user activities. This feature will monitor the impact of new capabilities and the efficiency of searching practices.

Six models of use analysis are discussed in this paper that cover both the macro, or library-level, activity and the micro, or individual user-level, activity:


251

1. subscribing organizations-which libraries are subscribing to Project MUSE and how do they compare with the base of print journal customers?

2. subscriber behaviors-how do libraries respond as access to electronic journals is introduced and expanded, and in particular, how are acquisitions like Project MUSE accommodated in service and collection development programs and budgets?

3. user demography-what are the characteristics of the individual user population in such areas as status, background/experience, motivation, attitudes, and expectations?

4. user behaviors-how do individuals respond to the availability of scholarly materials in electronic format as they explore the capabilities of the system and execute requests for information?

5. user satisfaction-what objectives do users bring to network-based access to scholarly information, and how do users evaluate system design and performance and the quality of search results?

6. user impact-how are user research and information-seeking activities being shaped by access to full-text journal databases like Project MUSE?

One of the objectives of Project MUSE is to achieve full cost recovery status by the completion of the grant funding period in 1998. Therefore, it is important to monitor the growth in the base of subscribing libraries and to evaluate the impact on the print journal business of the Hopkins Press. An analysis of those libraries subscribing to the full Project MUSE database as of June 1997 (approximately 400 libraries) demonstrates a very significant expansion in the college, community college, and now public library settings with very low or no history of subscriptions to the print journals (see Table 15.1). The result is a noteworthy expansion in access to Hopkins Press titles, with 70% of the subscribing libraries currently purchasing less than 50% of the titles in print and over one-fourth acquiring no print journals from the Hopkins Press.

One explanation for these patterns of subscription activity is the purchase arrangement for Project MUSE. Over 90% of the libraries are subscribing to the full Project MUSE database of 43 titles. And due to very favorable group purchase rates, nearly 80% of Project MUSE subscribers are part of consortial contracts. The cooperative approach to providing access to electronic databases by libraries in a state or region is widely documented, and the Project MUSE experience further evidences this phenomenon.

Another objective of Project MUSE is to enable libraries to understand the use of collections and thus to make informed acquisitions and retention decisions. The impact on collection development behaviors will be critical, as libraries do indicate intentions to cancel print duplicates of MUSE titles and to monitor carefully the information provided on individual electronic title and article activity. Use information is beginning to flow to subscribing libraries, but there is no evidence yet of journal cancellations for Hopkins Press titles.


252
 

TABLE 15.1. Project MUSE Subscribing Libraries and Customer Print Subscriptions as of June 1997

Subscribing Libraries
ARL universities          65 libraries
Other universities       128 libraries
Liberal arts colleges    101 libraries
Community colleges        53 libraries
Public libraries           3 library systems

Customer Print Subscriptions
No. of Print Subscriptions    Percentage of Libraries
0                                      27.8
1-4                                     7.6
5-9                                     6.8
10-14                                  15.8
15-19                                  13.0
20-24                                  11.8
25-29                                   9.0
30-34                                   6.2
35-40                                   2.0

An important area of analysis is user demography, that is, the characteristics of the individuals searching the Project MUSE database. An on-line user survey and focus group discussions are beginning to provide some insights:

• The status of the user, that is, undergraduate student, graduate student, faculty, staff, community member, or library employee. As Project MUSE is introduced, library staff are typically the heaviest users, followed by a growth in student use as campus awareness and understanding expands.

• Type of institution, that is, research university, comprehensive university, liberal arts college, community college, or public library setting. As Project MUSE subscriptions have increased and access has extended into new campus settings, heavier use has initially been in the research universities and liberal arts colleges where there is either traditional awareness of Project MUSE titles or organized and successful programs to promote availability.

• The computer experience of users, that is, familiarity with searching fulltext electronic databases through a Web interface. Project MUSE users tend to be knowledgeable Internet searchers who have significant comfort with Web browsers, graphical presentations of information, and construction of searches in textual files.


253

• The location of use, that is, in-library, on-campus in faculty office and student residence hall, or off-campus. Preliminary data indicates that the searching of Project MUSE is taking place predominantly on library-based equipment. This finding can be explained by the inadequate network infrastructure that persists at many campuses or by the general lack of awareness of Project MUSE until a user is informed by library staff about its availability during a reference exchange.

• The browsers used to search the Project MUSE database. An analysis of searches over an 18-month period confirms that Netscape browsers are used now in over 98% of the database activity, with a declining percentage of Lynx and other nongraphical options.

Project MUSE enables searching by author, title, or keyword, in the table of contents or the full text of the journals, and across all the journals or just selected titles. All articles are indexed with Library of Congress subject headings. Hypertext links in tables of contents, articles, citations, endnotes, author bibliographies, and illustrations allow efficient navigation of the database. User searching behavior is an important area for investigation, and some preliminary trends can be identified:

• The predominant search strategy is by keyword, with author and title inquiries occurring much less frequently. This strategy can be partially explained by the heavy undergraduate student use of the database and the rich results enabled by keyword strategies.

• Use of the database is equally distributed across the primary content elements: tables of contents, article abstracts, images linked to text, and the articles. An issue for future analysis is the movement of users among these files.

• Given the substantial investment in the creation of LC subject headings and the maintenance of a structured thesaurus to enhance access to articles, their value to search results and user success is being monitored carefully.

• With the expansion of both internal and external hypertext links, the power of the Web searching environment is being observed, the user productivity gains are being monitored, and the willingness to navigate in an electronic journal database is being tested.

• Users are directed to the Project MUSE database through several channels. Libraries are providing links from the bibliographic record for titles in the on-line catalog. Library Web sites highlight Project MUSE or collections of electronic journals. Subject pages list the Project MUSE titles that cluster in a particular discipline.

• Users are made aware of Project MUSE through a variety of promotional and educational strategies. Brochures and point-of-use information are


254

being prepared. In some cases, campus media have included descriptive articles. Library instructional efforts have focused on Project MUSE and its structure and searching capabilities.

• Printing and downloading to disk are important services linked to the effective use of Project MUSE, given the general unwillingness of users to read articles on-line. Libraries have an interest in maximizing turnover on limited computer equipment and are focused on implementing cost-recovery printing programs.

• Project MUSE is increasingly enabling users to communicate with publishers, journal editors, and the authors of articles through e-mail links embedded in the database. Correspondence has been at a very low level but is projected to expand as graduate student and faculty use increases and as familiarity and comfort with this feature grows.

With over 400 subscribing libraries and over three million potential users of Project MUSE in the communities served, it is possible to document global use trends and the changing intensity of searching activity (see Table 15.2). The progression of use over time as a library introduces access to Project MUSE is being monitored. Early analysis suggests that the first two quarters of availability produce low levels of use, while third quarter use expands significantly.

Data is also being collected on the number of requests for individual journal titles. During the 12-month period ending August 1, 1997, the total number of requests to the MUSE database was just over nine million, for an average of just under 25,000 hits per day. For data on the average number of requests per month for individual journal titles, see Table 15.3.

In addition, data is now being collected on the number of requests for individual journal articles. During the 12-month period ending August 1, 1997, 100 articles represented 16.5% of the total articles requested. The article receiving the largest number of requests was hit 3,944 times. Two journals, Postmodern Culture (33 articles) and Configurations (22 articles), included 55% of the most frequently requested articles.

User satisfaction with the quality and effectiveness of Project MUSE will be the central factor in its long-term success. Interactions with users seek to understand expectations, response to system design and performance, and satisfaction with results. The degree to which individuals and libraries are taking advantage of expansive fair use capabilities should also be gauged.

Project MUSE has focused on various technical considerations to maximize the dependability and efficiency of user searching. Detailed information on platforms and browsers is collected, for example, and access denials and other server responses that might indicate errors are automatically logged and routed for staff investigation.

Expectations for technology are generally consistent: more content, expanded access, greater convenience, new capabilities, cost reduction, and enhanced pro-


255
 

TABLE 15.2. Project MUSE Global Use Trends

                          4th Quarter 1996    1st Quarter 1997
Total requests                 1,833,692      2,618,069 (+42.8%)
Requests per day                  19,922         29,090 (+46.0%)
Total subscribers                    199            322 (+61.8%)
Requests per subscriber            9,214          8,131 (-12.0%)

ductivity. It will be important to monitor the impact of Project MUSE in the subscribing communities and to assess whether it is delivering a positive and effective experience for users.

It is also important to maximize the core advantages of using information in digital formats:

• accessibility, that is, delivery to locations wherever users can obtain network connections

• searchability, that is, the range of strategies that can be used to draw relevant information out of the database

• currency, that is, the ability to make publications available much earlier than is possible for print versions

• researchability, that is, the posing of questions in the digital environment that could not even be conceived with print materials

• interdisciplinarity, that is, the ability to conduct inquiries across publications in a range of diverse disciplines and discover new-but-related information

• multimedia, that is, access to text, sound, images, video in an integrated presentation

• linkability, that is, the hypertext connections that can be established among diverse and remote information sources

• interactivity, that is, the enhancement of user control and influence over the flow of information and the communication that can be integrated into the searching activity

Project MUSE will be evaluated against these quantitative and qualitative models. Its success will ultimately be determined by its support for the electronic scholarly publishing objectives outlined in the work of the Association of American Universities and the Association of Research Libraries:

• foster a competitive market for scholarly publishing by providing realistic alternatives to prevailing commercial publishing options

• develop policies for intellectual property management emphasizing broad and easy distribution and reuse of material


256
 

TABLE 15.3. Average Requests for Individual Journal Titles per Month as of August 1997

Journal Title                              No. of Issues On-line   Average No. of Requests per Month
American Imago                                      10                       12,328
American Jewish History                              5                        3,718
American Journal of Mathematics                      6                        4,596
American Journal of Philology                        6                        4,126
American Quarterly                                   6                        8,162
Arethusa                                             5                        6,195
Bulletin of the History of Medicine                  6                       10,392
Callaloo                                             8                       23,649
Configurations                                      13                       36,917
Diacritics                                           5                        7,083
ELH                                                 15                       25,796
Eighteenth-Century Life                              4                        2,113
Eighteenth-Century Studies                           8                        7,605
Human Rights Quarterly                              11                       10,718
Imagine                                              2                        1,656
Journal of Democracy                                 7                       22,032
Journal of Early Christian Studies                   6                        6,454
Journal of Modern Greek Studies                      3                        5,316
Journal of the History of Ideas                      8                        7,550
Kennedy Institute of Ethics Journal                  6                        3,803
Late Imperial China                                  2                        1,742
Literature and Medicine                              5                        6,667
MLN                                                 14                       13,139
Milton Quarterly                                     1                          253
Modern Fiction Studies                              11                       20,381
Modern Judaism                                       5                        3,155
Modernism/Modernity                                  8                       14,488
New Literary History                                 7                        7,253
Performing Arts Journal                              5                        3,363
Philosophy and Literature                            3                       12,037
Philosophy, Psychiatry, and Psychology               6                        6,601
Postmodern Culture                                  21                       74,564
Review of Higher Education                           4                        5,621
Reviews of American History                         10                       18,509
SAIS Review                                          5                        6,390
The Henry James Review                               8                       10,610
The Lion and the Unicorn                             6                       10,764
The Yale Journal of Criticism                        3                        6,108
Theatre Journal                                      6                        9,322
Theatre Topics                                       3                        3,199
Theory and Event                                     3                           39
Wide Angle                                           6                        4,324
World Politics                                       6                        6,454


257

• encourage innovative applications of information technology to enrich and expand the means for distributing research and scholarship

• ensure that new channels of scholarly communication sustain quality requirements and contribute to promotion and tenure processes

• enable the permanent archiving of research publications and scholarly communication in digital formats


258

Chapter 16—
A New Consortial Model for Building Digital Libraries

Raymond K. Neff

Libraries in America's research universities are being systematically depopulated of current subscriptions to scholarly journals. Annual increases in subscription costs are consistently outpacing the growth in library budgets. This problem has become chronic for academic libraries that collect in the fields of science, engineering, and medicine, and by now the problem is well recognized (Cummings et al. 1992). At Case Western Reserve University, we have built a novel digital library distribution system and focused on our collections in the chemical sciences to investigate a new approach to solving a significant portion of this problem. By collaborating with another research library that has a strong chemical sciences collection, we have developed a methodology to control costs of scholarly journals and have planted the seeds of a new consortial model for building digital libraries. This paper summarizes our progress to date and indicates areas in which we are continuing our research and development.

For research libraries in academia, providing sufficient scholarly information resources in the chemical sciences represents a large budgetary item. For our purposes, the task of providing high-quality library services to scholars in the chemical sciences is similar to providing library services in other sciences, engineering, and medicine; if we solve the problem in the limited domain of the chemical sciences, we can reasonably extrapolate our results to these other fields. Thus, research libraries whose mission it is to provide a high level of coverage for scholarly publications in the chemical sciences are the focus of this project, although we believe that the principles and practices employed in this project are extensible to the serial collections of other disciplines.

A consortium depends on having its members operating with common missions, visions, strategies, and implementations. We adopted the tactics of developing a consortial model by having two neighboring libraries collaborate in the initial project. The University of Akron (UA) and Case Western Reserve University


259

(CWRU) both have academic programs in the chemical sciences that are nationally ranked, and the two universities are fewer than 30 miles apart. It was no surprise to find that both universities have library collections in the chemical sciences that are of high quality and nearly exhaustive in their coverage of scholarly journals. To quantify the overlap between these two collections, we counted the journals that both libraries collected and found the common set to represent 76% of the titles and 92% of the cost. The implication of this overlap in collecting patterns is plain: if the two libraries collected only one copy of each journal, with the exception of the most used journals, approximately half of the cost of these subscriptions could be saved. For these two libraries, the potential cost savings is approximately $400,000 per year. This savings seemed like a goal worth pursuing, but to pursue it would require building a new type of information distribution system.

The reason scholarly libraries collect duplicative journals is that students and faculty want to be able to use these materials by going to the library and looking up a particular volume or by browsing the current issues of journals in their field. Eliminating a complete set of the journals at all but one of our consortial libraries would deprive local users of this walk-up-and-read service. We asked ourselves whether it would be possible to construct a virtual version of the paper-based journal collection that would be simultaneously present at each consortium member institution, allowing any scholar to consult the collection at will even though only one copy of the paper journal was on the shelf. The approach we adopted was to build a digital delivery system that would provide to a scholar on the campus of a consortial member institution, on a demand basis, either a soft or hard copy of any article for which a subscription to the journal was held by a consortial member library. Thus, according to this vision, the use of information technology would make it possible to collect one set of journals among the consortium members and to have them simultaneously available at all institutions. Although the cost of building the new digital distribution system is substantial, it was considered an experiment worth undertaking. The generous support of The Andrew W. Mellon Foundation is being used to cover approximately one-half of the costs for the construction and operation of the digital distribution system, with Case Western Reserve University covering the remainder. The University of Akron Library has contributed its expertise and the use of its chemical sciences collections to the project.

It also seemed necessary to invite the cooperation of journal publishers in a project of this kind. Making a digital delivery system practical would require having the rights to store the intellectual property in a computer system, and when we started this project, no consortium member had such rights. Further, we needed both the ongoing publications and the backfiles so that complete runs of each serial could be constructed in digital form. Publishers would need to work out agreements with the consortium to provide their scholarly publications for inclusion in a digital storage system connected to our network-based transmission system; their cooperation was thus essential. The chemical sciences are disciplines in which previous work with electronic libraries


260

had been started. The TULIP Project of Elsevier Science (Borghuis et al. 1996) and the CORE Project undertaken by a consortium of Cornell University, the American Chemical Society, Bellcore, Chemical Abstracts, and OCLC were known to us, and we certainly wanted to benefit from their experiences. Publications of the American Chemical Society, Elsevier Science, Springer-Verlag, Academic Press, John Wiley & Sons, and many other publishers were central to our proposed project because of the importance of their journal titles to the chemical sciences disciplines.

We understood from the beginning of this effort that we would want to monitor the performance of the digital delivery system under realistic usage scenarios. The implementation of our delivery system has built into it extensive data collection facilities for monitoring what users actually do. The system is also sensitive to concerns of privacy in that it collects no items of performance information that may be used to identify unambiguously any particular user.

Given the existence of extensive campus networks at both CWRU and UA and substantial internetworking among the academic institutions in northeastern Ohio, there was sufficient infrastructure already in place to allow the construction and operation of an intra- and intercampus digital delivery system. Such a digital delivery system has now been built and made operational. The essential aspects of the digital delivery system will now be described.

A Digital Delivery System

The roots of the electronic library are found in landmark papers by Bush (1945) and Kemeny (1962). Most interestingly, Kemeny foreshadowed what prospective scholarly users of our digital library told us was their essential requirement, which was that they be able to see each page of a scholarly article preserved in its graphical integrity. That is, the electronic image of each page layout needed to look like it did when originally published on paper. The system we have developed uses the ACROBAT® page description language to accomplish this objective.

Because finding aids and indices for specialized publications are too limiting, users also have the requirement that the article's text be searchable with limited or unlimited discipline-specific thesauri. Our system complements the page images with an optical character recognition (OCR) scanning of the complete text of each article. In this way, the user may enter words and phrases the presence of which in an article constitutes a "hit" for the scholar.
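The chapter does not describe how the OCR output is indexed for searching; the sketch below, with hypothetical article identifiers and text, illustrates one simple way word searches over scanned article text could be supported.

```python
# Minimal sketch of word searching over OCR'd article text.
# Article identifiers and text below are hypothetical.
import re
from collections import defaultdict

def build_index(articles: dict) -> dict:
    """Map each lowercased word to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(article_id)
    return index

def search(index: dict, query: str) -> set:
    """Return the articles containing every word of the query (a 'hit')."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

articles = {
    "jacs-1996-118-0001": "synthesis of a chiral palladium catalyst ...",
    "jpc-1996-100-0042": "ab initio study of hydrogen bonding in water clusters ...",
}
index = build_index(articles)
print(search(index, "palladium catalyst"))  # {'jacs-1996-118-0001'}
```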

One of the most critical design goals for our project was the development of a scanning subsystem that would be easily reproducible and cost efficient to set up and operate in each consortium member library. Not only did the equipment need to be readily available, but it had to be adaptable to a variety of work flow and staff work patterns in many different libraries. Our initial design has been successfully tailored to the needs of both the CWRU libraries and the Library at the Univer-


261

sity of Akron. Our approach to the sharing of paper-based collections is to use a scanning device to copy the pages of the original into a digital image format that may be readily transmitted across our existing telecommunications infrastructure. In addition, the digital version of the paper is stored for subsequent retrieval, so repeated viewing of the same work requires only a one-time transformation of format. This procedure is an advantage in achieving faster response times for scholars, and it promotes the development and use of quality control methods. The scanning equipment we have used in this project and its operation are described in Appendix E. The principal advantage of this scanner is that bound serials may be scanned without damaging the volume and without compromising the resulting page images; in fact, the original journal collection remains intact and accessible to scholars throughout the project. This device is also sufficiently fast that a trained operator, including a student worker, may scan over 800 pages per average workday. For a student worker making $7.00 per hour, the personnel cost of scanning is under $0.07 per page; the cost of conversion to searchable text adds $0.01 per page. Appendix E also gives more details regarding the scanning processes and work flow. Appendix F gives a technical justification for a digitization standard for the consortium. Thus, each consortium member is expected to make a reasonable investment in equipment, training, and personnel.
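As a rough check on these figures (the eight-hour workday is an assumption; the chapter states only the hourly wage and the daily page count):

\[
\frac{\$7.00/\text{hour} \times 8\ \text{hours}}{800\ \text{pages}} = \frac{\$56.00}{800\ \text{pages}} = \$0.07\ \text{per page}.
\]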

The target equipment for viewing an electronic journal was taken to be a common PC-compatible computer workstation, hereafter referred to as a client. This client is also the user platform for the on-line library catalog systems found on our campuses as well as for the growing collections of CD-ROM-based information products. Appendix B gives the specification of the workstation standards for the project. The implication of using readily available equipment is that the client platform for our project also works outside of the library-in fact, wherever a user wants to work. Therefore, by selecting the platform we did, we extended the project to encompass a full campuswide delivery system. Because our consortium involves multiple campuses (two at the outset), the delivery system is general purpose in its availability as an access facility.

Just as we needed a place to store paper-based journals within the classical research library, we needed to specify a place to store the digital copies. In technical parlance, this storage facility is called a server. Appendixes B and C give some details regarding the server hardware and software configurations used in this project.

Appendix C also gives some information regarding the campuswide networks on both our campuses and the statewide network that connects them. It is important to note that any connected client workstation that follows our minimum standards will be able to use the digital delivery system being constructed.

Because the key to minimizing the operating costs within a consortium is interoperability and standardization, we have adopted a series of data and equipment standards for this project; they are given in Appendixes A and B.


262

Rights Management System

One of the most significant problems in placing intellectual property in a networked environment is that, with a few clicks of a mouse, thousands of copies of the original work can be distributed at virtually zero marginal cost, and as a result, the owner may be deprived of expected royalty revenue. Since we recognized this problem some years ago and realized that solutions outside of the network itself were unlikely to be either permanent or satisfactory to all parties (e.g., author, owner, publisher, distributor, user), we embarked on the creation of a software subsystem now known as Rights Manager™. With our Rights Manager system, we can now control the distribution of digitally formatted intellectual property in a networked environment subject to each stakeholder receiving its proper due.

In this project, we use the Rights Manager system with our client server-based content delivery system to manage and control intellectual property distribution for digitally formatted content (e.g., text, images, audio, video, and animations).

Rights Manager is a working system that encodes license agreement information for intellectual property at a server and distributes the intellectual property to authorized users over the Internet or a campuswide intranet along with a Rights Manager-compliant browser. Rights Manager handles a variety of license agreement types, including public domain, site licensing, controlled simultaneous accesses, and pay-per-use. Rights Manager also manages the functionality available to a client according to the terms of the license agreement; this is accomplished by use of a special browser that enforces the license's terms and that permits or denies client actions such as save, print, display, copy, and excerpt. Access to a particular item of intellectual property, with or without additional functionality, may be made available to the client at no charge, with an overhead charge, or at a royalty-plus-overhead charge. Rights Manager has been designed with enough flexibility to capture widely varying charging rules and policies.

The Rights Manager is intended for use by individuals and organizations who function as purveyors of information (publishers, on-line service providers, campus libraries, etc.). The system is capable of managing a wide variety of agreements from an unlimited number of content providers. Rights Manager also permits customization of licensing terms so that individual users or user classes may be defined and given unique access privileges to restricted sets of materials. A relatively common example of this customization for CWRU would be an agreement to provide (1) view-only capabilities to an electronic journal accessed by an anonymous user located in the library, (2) display/print/copy access to all on-campus students enrolled in a course for which a digital textbook has been adopted, and (3) full access to faculty for both student and instructor versions of digital editions of supplementary textbook materials.
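As an illustration of how such a customized agreement might be represented, the sketch below encodes the three CWRU access tiers described above as data. The field names and values are hypothetical and are not the actual Rights Manager schema.

```python
# Hypothetical encoding of the three CWRU access tiers described in the text;
# field names are illustrative, not the actual Rights Manager schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AccessRule:
    user_class: str                # e.g., "anonymous", "enrolled-student", "faculty"
    location: Optional[str]        # e.g., "library", "on-campus", or None for anywhere
    permissions: set = field(default_factory=set)

cwru_agreement = [
    # (1) view-only access for anonymous users at library workstations
    AccessRule("anonymous", "library", {"display"}),
    # (2) display/print/copy for on-campus students enrolled in the adopting course
    AccessRule("enrolled-student", "on-campus", {"display", "print", "copy"}),
    # (3) full access for faculty to student and instructor editions
    AccessRule("faculty", None, {"display", "print", "copy", "excerpt", "save"}),
]
```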

Fundamental to the implementation of Rights Manager are the creation and maintenance of distribution rights, permissions, and license agreement databases.


263

These databases express the terms and conditions under which the content purveyor distributes materials to its end users. Relevant features of Rights Manager include:

• a high degree of granularity, which may be below the level of a paragraph, for publisher-defined content

• central or distributed management of rights, permissions, and licensing databases

• multiple agreement types (e.g., site licensing, limited site licensing, and pay-per-use)

• content packaging where rights and permission data are combined with digital format content elements for managed presentation by Web browser plug-in modules or helper applications

Rights Manager maintains a comprehensive set of distribution rights, permissions, and charging information. The premise of Rights Manager is that each publication may be viewed as a compound document. A publication under this definition consists of one or more content elements and media types; each element may be individually managed, as may be required, for instance, in an anthology.

Individual content elements may be defined as broadly or narrowly as required (i.e., the granularity of the elements is defined by the publisher and may go below the level of a paragraph of content for text); however, for overall efficiency, each content element should represent a significant and measurable unit of material. Figures, tables, illustrations, and text sections may reasonably be defined as individual content elements and be treated uniquely according to each license agreement.

To manage the distribution of complete publications or individual content elements, two additional licensing metaphors are implemented. The first of these, a Collection Agreement, is used to specify an agreement between a purveyor and its supplier (e.g., a primary or secondary publisher); this agreement takes the form of a list of publications distributed by the purveyor and the terms and conditions under which these publications may be issued to end users (one or more Collection Agreements may be defined and simultaneously managed between the purveyor and a customer).

The second abstraction, a Master Agreement, is used to broadly define the rules and conditions that apply to all Collection Agreements between the purveyor and its content supplier. Only one Master Agreement may be defined between the supplier and the institutional customer. In practice, Rights Manager assumes that the purveyor will enter into licensing agreements with its suppliers for the delivery of digitally formatted content. At the time the first license agreement is executed between a supplier and a purveyor, one or more entries are made into the purveyor's Rights Manager databases to define the Master and Collection Agreements. Optionally, Publication and/or Content-Element usage rules may also be defined. Licensed materials may be distributed from the purveyor's site (or perhaps by an authorized service provider); both the content and associated licensing rules are transferred by the supplier to the purveyor for distributed license and content management.

Depending on the selected delivery option, individual end users (e.g., faculty members, students, or library patrons) may access either a remote server or a local institutional repository to search and request delivery of licensed publications. Depending on the agreement(s) between the owner and the purveyor, individual users are assigned access rights and permissions that may be based on individual user IDs, network addresses, or both.

Network or Internet Protocol addresses are used to limit distribution by physical location (e.g., to users accessing the materials from a library, a computer lab, or a local workstation). User identification may be exploited to create limited site-licensing models or individual user agreements (e.g., distributing publications only to students enrolled in Chemistry 432 or, perhaps, to a specific faculty member).

At each of the four levels (Master Agreement, Collection Agreement, Publication, and Content-Element), access rules and usage privileges may be defined. In general, the access and usage permission rules are broadly defined at the Master and Collection Agreement levels and are refined or restricted at the Publication and Content-Element levels. For example, a Master or Collection Agreement rule could specify that by default all licensed text elements may be printed at some fixed cost, say 10¢ per page; however, high-value or core text sections may be individually identified using Publication or Content-Element override rules and assessed higher charges, say 20¢ per page.

When a request for delivery of materials is received, the content rules are evaluated in a bottom-up manner (i.e., Content-Element rules are evaluated before Publication rules, which are, in turn, evaluated before license agreement rules). Access and usage privileges are resolved when the system first recognizes a match between the requester's user ID (or user category) and/or the network address and the permission rules governing the content. Access to the content is granted only when an applicable set of rules specifically granting access permission to the end user is found; in the case where two or more rules permit access, the rules most favorable to the end user are selected. Under this approach, site licenses, limited site licenses, individual licensing, and pay-per-use may be simultaneously specified and managed.
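One possible reading of this resolution scheme is sketched below in Python; the data structures, rule fields, and prices are illustrative assumptions, not the Rights Manager implementation. Levels are evaluated bottom-up, a lower-level rule that names the requested item overrides the broader defaults, and among equally specific permitting rules the terms most favorable to the end user are chosen.

    # Levels listed bottom-up, as in the text: Content-Element rules are evaluated
    # before Publication rules, which are evaluated before the agreement-level rules.
    LEVELS = ["content-element", "publication", "collection-agreement", "master-agreement"]

    RULES = [
        # Master Agreement default: printing any licensed text costs 10 cents per page.
        {"level": "master-agreement", "action": "print", "cents_per_page": 10},
        # Content-Element override: a high-value core section is assessed 20 cents per page.
        {"level": "content-element", "item": "core-section", "action": "print", "cents_per_page": 20},
    ]

    def resolve(item: str, action: str):
        """Return the applicable per-page charge in cents, or None if no rule permits the action."""
        for level in LEVELS:  # bottom-up: the most specific matching level wins
            matches = [r for r in RULES
                       if r["level"] == level
                       and r["action"] == action
                       and r.get("item") in (None, item)]
            if matches:
                # Among permitting rules at the same level, choose the terms
                # most favorable to the end user (here, the lowest charge).
                return min(r["cents_per_page"] for r in matches)
        return None  # no applicable rule: access is denied

    print(resolve("core-section", "print"))    # 20 -- the Content-Element override applies
    print(resolve("other-section", "print"))   # 10 -- the Master Agreement default applies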

The following use of the Rights Manager rules databases is recommended as an initial guideline for Rights Manager implementation:

1. Use Master Agreement rules to define the publishing holding company or imprint, the agreement's term (beginning and ending dates), and the general "fair use" guidelines negotiated between a supplier and the purveyor. Because of the current controversy over the definition of "fair use," Rights Manager does not rely on preprogrammed definitions; rather, the supplier and purveyor may negotiate this definition and create rules as needed. This approach permits fair use definitions to be redefined in response to new standards or regulatory definitions without requiring modifications to Rights Manager itself.

2. Use Collection Agreement rules to define the term (beginning and ending dates) for specific licensing agreements between the supplier and the purveyor. General access and permission rules by user ID, user category, network address, and media type would be assigned at this level.

3. Use Publication rules to impose any user ID- or user category-specific rules (e.g., permissions for students enrolled in a course for which this publication has been selected as the adopted textbook) or to impose exceptions based on the publication's value.

4. Use Content-Element rules to grant specific end users or user categories access to materials (e.g., define content elements that are supplementary teaching aids for the instructor) or to impose exceptions based on media type or the value of content elements.

The Rights Manager system does not mandate that licensing agreements exploit user IDs; however, maximum content protection and flexibility in license agreement specification are achieved when this feature is used. Given that many institutions or consortium members may not have implemented a robust user authentication system, alternative approaches to uniquely identifying individual users must be considered. While there are a variety of ways to address this issue, it is suggested that personal identification numbers (PINs), assigned by the supplier and distributed by trusted institutional agents at the purveyor's site (e.g., instructors, librarians, bookstore employees, or departmental assistants) or embedded within the content, be used as the basis for establishing user IDs and passwords. Using this approach, valid users may enter into registration dialogues to automatically assign user IDs and passwords in response to a valid PIN "challenge."
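Such a registration dialogue might look like the minimal sketch below; the function, data structures, and PIN values are hypothetical, and a real deployment would add password hashing, PIN expiration, and transport security.

    # Minimal sketch of a PIN-challenge registration dialogue. Assumes the supplier
    # has issued PINs that trusted institutional agents hand out to valid users.
    ISSUED_PINS = {"493217", "880041"}   # distributed but not yet redeemed
    ACCOUNTS = {}                        # user_id -> password (illustration only)

    def register(pin: str, requested_user_id: str, password: str) -> bool:
        """Redeem a valid PIN exactly once and create a user ID/password pair."""
        if pin not in ISSUED_PINS or requested_user_id in ACCOUNTS:
            return False
        ISSUED_PINS.remove(pin)          # each PIN may be redeemed only once
        ACCOUNTS[requested_user_id] = password
        return True

    print(register("493217", "chem432-student-07", "s3cret"))   # True: PIN accepted
    print(register("493217", "another-user", "pw"))             # False: PIN already used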

While Rights Manager is designed to address all types of multimedia rights, permissions, and licensing issues, the current implementation has focused on distribution of traditional print publication media (text and images). Extensions to Rights Manager to address the distribution of full multimedia, including streaming audio and video, are being developed at CWRU.

The key to understanding our approach to intellectual property management is that we expect that each scholarly work will be disseminated according to a comprehensive contractual agreement. Publishers may use master agreements to cover a set of titles. Further, we do not expect that there will be only one interpretation of concepts such as fair use, and our Rights Manager system makes provision for arbitrarily different operational definitions of fair use, so that specific contractual agreements can be "enforced" within the delivery system.


A New Consortial Model

The library world has productively used various consortial models for over 30 years, but until now, there has not been a successful model for building a digital library. One of the missing pieces in the consortial jigsaw puzzle has been a technical model that is both comprehensive and reproducible in a variety of library contexts. To begin our approach to a new consortial model, we developed a complete technical system for building and operating a digital library. Building such a system is no small achievement. Similar efforts have been undertaken with the Elsevier Science TULIP Project and the JSTOR project.

The primary desiderata for a new consortial model are as follows:

• Any research library can participate using agreed upon and accepted standards.

• Many research libraries each contribute relatively small amounts of labor by scanning a small, controlled number of journal issues. Scanning is both systematic and driven by requests for individual articles.

• Readily available off-the-shelf equipment is used.

• Intellectual property is made available through licensing and controlled by the Rights Manager software system.

• Publishers grant rights to libraries to scan and store intellectual property retrospectively (i.e., already purchased materials) in exchange for the right to license use of the digital formats to other users. Libraries provide publishers with digital copies of scholarly journals for their own use, thus enabling publishers to enrich their own electronic libraries.

A Payments System For The Consortium

It is unrealistic to assume that all use of a future digital library will be free of charging mechanisms, even though the research library of today charges for little beyond photocopying and user fines. This does not mean that the library user must be charged for each use, although that would be possible. A more likely scenario is that the library pays on behalf of the members of the scholarly community (students, professors, researchers) that it supports. Under our proposed consortial model, libraries would be charged for use of the digital library according to the total pages "read" in any given user session. It could easily be arranged that users who consult the digital library on the premises of the campus library are not charged themselves, but that users who reach it from another campus location or from off campus through a network pay a per-page charge analogous to the cost of photocopying. A charging system could also categorize users by type, and the Rights Manager system provides for a wide variety of charging models, including distinctions among soft-copy use, hard-copy use, and downloading of a work in whole or in part. Protecting the rights of the owner is an especially interesting problem when an entire work is downloaded in digital format; both visible and invisible watermarking are techniques with which we have experience for protecting rights in that case.

We also have in mind that libraries that provide input via scanning to the decentralized digital library would receive a credit for each page scanned. It is clear that the value of the digital library to the end user will increase as the digitized holdings become more complete. Therefore, the credit system should recognize this value and reward originating libraries according to a formula with a credit-to-charge ratio of perhaps ten to one; that is, an originating library might receive a credit for scanning one page equal to the charge for reading ten soft-copy pages.
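A hedged sketch of such a charge-and-credit ledger follows; the per-page rate is an illustrative figure, and only the ten-to-one ratio comes from the formula suggested above.

    # Per-page charge for soft-copy reading outside the library (analogous to photocopying)
    # and a scanning credit worth ten read pages, per the suggested ten-to-one ratio.
    CHARGE_PER_PAGE_READ = 0.10                           # dollars; illustrative rate
    CREDIT_PER_PAGE_SCANNED = 10 * CHARGE_PER_PAGE_READ

    def monthly_balance(pages_read: int, pages_scanned: int) -> float:
        """Net amount a member library owes (positive) or is owed (negative) for the month."""
        return pages_read * CHARGE_PER_PAGE_READ - pages_scanned * CREDIT_PER_PAGE_SCANNED

    # A library whose users read 5,000 soft-copy pages and that scanned 400 pages
    # would owe 5,000 x 0.10 - 400 x 1.00 = 100 dollars for the month.
    print(monthly_balance(5000, 400))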

The charge-and-credit system for our new consortial model is analogous to that used for the highly successful cataloging system of the Online Computer Library Center (OCLC). Member libraries within OCLC contribute original cataloging entries in the form of MARC records for the OCLC database as well as draw down copies of holdings data to fill in entries for their own catalog systems. The system of charging for "downloads" and crediting for "uploads" is repeated in our consortial model for retrospective scanning and processing of full-text journal articles. Just as original cataloging is at the heart of OCLC, original scanning is at the heart of our new consortial model for building the library of the future.

Data Collection

One of the most important aspects of this project is that the underlying software system has been instrumented with many data collection points. In this way we can learn, through actual usage by faculty, students, and research staff, which aspects of the system work well and which need more work and thought. Over the past decade many people have speculated about how the digital library might be made to work for the betterment of scholarly communications. Our system as described in this paper is one of the most comprehensive attempts yet to build up a base of usage experience data.

To appreciate the detailed data being collected by the project, we will describe the various types of data that the Rights Manager system captures. Many types of transactions occur between the Rights Manager client and the Rights Manager server software throughout a user session. The server software records these transactions, which will permit detailed analysis of usage patterns. Appendix D gives some details regarding the data collected during a user session.

Publishers And Digital Libraries

The effects of the new consortial model for building digital libraries are not confined to the domain of technology. During the period when the new digital distribution system was being constructed, Ohio LINK, an agency of the Ohio Board of Regents, commenced an overlapping relationship with Academic Press to offer its collection of approximately 175 electronic journals, many of which were in our chemical sciences collections. Significantly, the Ohio LINK contract with Academic Press facilitated the development of our digital library because it included a provision covering the scanning and storage of retrospective collections (i.e., backfiles) of their journals that we had originally acquired by subscription. In 1997, Ohio LINK extended the model of the Academic Press contract to an offering from Elsevier Science. According to this later agreement, subscriptions to current volumes of Elsevier Science's 1,153 electronic journals would be available for access and use on all of the 57 campuses of Ohio LINK member institutions, including CWRU and the University of Akron. The cost of the entire collection of electronic journals for each university for 1998 was set by the Ohio LINK-Elsevier contract to be approximately 5.5% greater than the institution's Elsevier Science expenditure level for 1997 subscriptions regardless of the particular subset these subscriptions represented; there is a further 5.5% price increase set to take effect in 1999. Further, the agreement between Ohio LINK and Elsevier constrains the member institutions to pay for this comprehensive access even if they cancel a journal subscription. Notably, there is an optional payment discount of 10% when an existing journal subscription (in a paper format) is limited to electronic delivery only (eliminating the delivery of a paper version). Thus, electronic versions of the Elsevier journals that are part of our chemical sciences digital library will be available at both institutions regardless of the existence of our consortium; pooling collections according to our consortial model would be a useless exercise from a financial point of view.

Other publishers are also working with our consortium of institutions to offer digital products. During spring 1997, CWRU and the University of Akron entered into an agreement with Springer-Verlag to evaluate their offering of 50 or so electronic journals, some of which overlapped with our chemical sciences collection. A similar agreement covering backfiles of Elsevier journals was considered and rejected for budgetary reasons. During the development of this project, we had numerous contacts with the American Chemical Society with the objective of including their publications in our digital library. Indeed, the outline of an agreement with them was discussed. As the time came to render the agreement in writing, they withdrew and later disavowed any interest in a contract with the consortium. At the present time, discussions are being held with other significant chemical science publishers about being included in our consortial library. This is clearly a dynamic period in journal publishing, and each of the society and commercial publishers sees much at stake. While we in universities try to make sense of both technology and information service to our scholarly communities, the publishers are each trying to chart their own course competitively and strategically as improvements in information technology continually raise the ante for staying in the game.

The underlying goal of this project has been to see if information technology could control the costs of chemical sciences serial publications. In the most extreme case, it could lower costs by half in our two libraries and even more if departmental copies were eliminated. As an aside, we estimated that departmentally paid chemical sciences journal subscriptions represented an institutional expenditure of about 40% of the libraries' own costs, so each institution paid in total 1.4 times each library's costs. For both institutions, the total was about 2.8 times the cost of one copy of each holding. Thus, if duplication were eliminated completely, the resulting expenditures for the consortium for subscriptions alone would be reduced by almost two-thirds from what we have been spending. Clearly, the journal publishers understood the implications of our project. But the implications of the status quo were also clear: libraries and individuals were cutting subscriptions each year because budgets could not keep up with price increases. We believed that to let nature take its course was irresponsible when a well-designed experiment using state-of-the-art information technology could show a way to make progress. Thus, the spirit of our initial conversations with chemical sciences publishers was oriented to a positive scenario: libraries and the scholars they represented would be able to maintain or gain access to the full range of chemical sciences literature, and journals would be distributed in digital formats. We made a crucial assumption that information technology would permit the publishers to lower their internal production costs. This assumption is not unreasonable in that information technology has accomplished cost reductions in many other businesses.
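The arithmetic behind the "almost two-thirds" figure can be made explicit with a small calculation; the 40% departmental share is the estimate given above, and the remaining numbers follow from it.

    # Normalize one library's subscription cost to 1.0 and work out the consortium totals.
    library_cost = 1.0
    departmental_cost = 0.4 * library_cost               # estimated departmental subscriptions
    per_institution = library_cost + departmental_cost   # 1.4 times one library's cost
    consortium_now = 2 * per_institution                 # two institutions: 2.8 times one copy of each holding
    consortium_shared = 1.0                              # a single shared copy of each holding

    savings = (consortium_now - consortium_shared) / consortium_now
    print(f"{savings:.0%}")   # about 64%, i.e., "almost two-thirds"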

In our preliminary discussions with the publishers, we expressed the long-term objective that we were seeking (controlling and even lowering our costs through the elimination of duplication as our approach to solving the "cancellation conundrum") as well as our short-term objective: to receive the rights to scan, store, and display electronic versions of both current and back files of their publications, which we would create from materials we had already paid for (several times over, in fact). Current and future subscriptions would be purchased in only one copy, however, to create the desired financial savings. In exchange, we offered the publishers a complete copy of our PDF-formatted current issues and backfiles for their use, from which they could derive new revenue through licensing to others. Since these once-sold backfiles were being considered on the publishers' corporate balance sheets as a depleted resource, we thought that the prospect of deriving additional revenue from selling them again as a digital resource would be attractive. In the end, however, not one publisher was willing to take us up on this exchange. To them, the backfiles that we would create were not worth what we were asking. One chemical sciences journal publisher was willing to grant the rights to backfiles for additional revenue from our consortium. But this offer made no sense unless the exchange could be justified on the basis of savings in costs of library storage space and the additional convenience of electronic access (the digital library is never closed, network access from remote locations would likely increase marginal usage, etc.). When we saw the proposed charge, we rejected this offer as too expensive. Another publisher did grant us the rights we sought as part of the overall statewide Ohio LINK electronic and print subscription contract, but this arrangement locked in the current costs (and annual increments) for several years, so the libraries could not benefit directly in terms of cost savings. With that particular limited agreement, however, there still is the definite possibility of savings on departmentally paid, personal subscriptions.

When we began to plan this project, it was not obvious what stance the publishing community would take to it. Our contacts in some of the leading publishing houses and in the Association of American Publishers (AAP) led us to believe that we were on the right track. Clearly, our goal was to reduce our libraries' costs, and that goal meant that publishers would receive less revenue. However, we also believed that the publishers would value receipt of the scanned backfiles that we would accumulate. Thus, the question was whether the backfiles have significant economic value. Clearly, libraries paid for the original publications in paper formats and have been extremely reluctant to pay a second time for the convenience of having access to digital versions of the backfiles. In our discussions, the publishers and AAP also seemed interested in doing experiments in learning whether a screen-based digital format could be made useful to our chemical sciences scholars. Thus, there was a variety of positive incentives favoring experimentation, and a benign attitude toward the project was evinced by these initial contacts with publishers. Their substantial interest in the CWRU Rights Management system seemed genuine and sincere, and their willingness to help us with an experiment of this type was repeatedly averred. After many months of discussion with one publisher, it became clear that they were unwilling to participate at all. In the end, they revealed that they were developing their own commercial digital journal service and that they did not want to have another venue that might compete with this. A second publisher expressed repeated interest in the project and, in the end, proposed that our consortium purchase a license to use the backfiles at a cost of 15% more than the price of the paper-based subscription; this meant that we would have to pay more for the rights to store backfiles of these journals in our system. A third publisher provided the rights to scan, store, display, and use the backfiles as part of the overall statewide Ohio LINK contract; thus this publisher provided all the rights we needed without extra cost to the consortium. We are continuing to have discussions with other chemical sciences journal publishers regarding our consortium and Ohio LINK, and these conversations are not uncomplicated by the overlap in our dual memberships.

It is interesting to see that the idea that digital distribution could control publisher costs is being challenged with statements such as "the costs of preparing journals for World Wide Web access through the Internet are substantially greater than the costs of distributing print." Questions regarding such statements abound: for example, are the one-time development costs front-loaded in these calculations, or are they amortized over the product's life cycle? If these claims are true, then they reflect on the way chemical sciences publishers are using information technology, because other societies and several commercial publishers have been able to realize cost savings in moving from print to nonprint distribution. Although we do not have the detailed data at this time (this study is presently under way in our libraries), we expect to show that there are significant cost savings in terms of library staff productivity when we distribute journals in nonprint rather than print versions.

As a result of these experiences, some of these publishers are giving us the impression that their narrowly circumscribed economic interests are dominating the evolution of digital libraries, that they are not fundamentally interested in controlling their internal costs through digital distribution, and that they are still pursuing tactical advantages over our libraries at the expense of a different set of strategic relationships with our scholarly communities. As is true of many generalizations, these are not universally applicable within the publishing community, but the overwhelming message seems clear nonetheless.

Conclusions

A digital distribution system for storing and accessing scholarly communications has been constructed and installed on the campuses of Case Western Reserve University and the University of Akron. This low-cost system can be extended to other institutions with similar requirements because the system components, together with the way they have been integrated, were chosen to facilitate the diffusion of these technologies. This distribution system successfully separates ownership of library materials from access to them.

The most interesting aspect of the new digital distribution system is that libraries can form consortia that share specialized materials rather than duplicate them in parallel, redundant collections. When a consortium can share a single subscription to a highly specialized journal, we have the basis for controlling and possibly reducing the total cost of library materials, because we can eliminate duplicative subscriptions. We believe that the future of academic libraries points to the maintenance of a basic core collection, the selective acquisition of specialty materials, and the sharing of standard scholarly works across telecommunications networks. The consortial model that we have built and tested is one way to accomplish this goal. Our approach contrasts with the common practice of building up ever larger collections of standard works, so that over time academic libraries begin to look alike in their collecting habits, offer almost duplicative services, and require larger budgets. This project is attempting to find another path.

Over the past decade, several interesting experiments have been conducted to test different ideas for developing digital libraries, and more are under way. With many differing ideas and visions, an empirical approach is a sound way to make progress from this point forward. Our consortium model, with its many explicit standards and integrated technologies, seems to us to be an experiment worth continuing. During the next few years it will surely develop a base of performance data that should provide insights for the future. In this way, experience will inform future visions.

References

Borghuis, M., Brinckman, H., Fischer, A., Hunter, K., van der Loo, E., Mors, R., Mostert, P., and Zilstra, J.: TULIP Final Report: The University LIcensing Program. New York: Elsevier Science, 1996.

Bush, V.: "As We May Think," The Atlantic Monthly, 176, 101-108, 1945.

Cummings, A. M., Witte, M. L., Bowen, W. G., Lazarus, L. O., Ekman, R. H.: University Libraries and Scholarly Communication: A Study Prepared for The Andrew W. Mellon Foundation. The Association of Research Libraries, 1992.

Fleischhauer, C., and Erway, R. L.: Reproduction-Quality Issues in a Digital-Library System: Observations on the Reproduction of Various Library and Archival Material Formats for Access and Preservation. An American Memory White Paper, Washington, D.C.: Library of Congress, 1992.

Kemeny, J. G.: "A Library for 2000 A.D." in Greenberger, M. (Ed.), Computers and the World of the Future. Cambridge, Mass.: The M.I.T. Press, 1962.

Appendix A Consortial Standards

Marc

• Enumeration and chronology standards from the serials holding standards of the 853 and 863 fields of MARC

- Specifies up to 6 levels of enumeration and 4 levels of chronology, for example

853¦aVolume¦bIssue¦i(year)¦j(month)

853¦aVolume¦bIssue¦cPart¦i(year)¦j(month)

• Linking from bibliographic records in library catalog via an 856 field

- URL information appears in subfield u, anchor text appears in subfield z, for example (a small parsing sketch follows this list)

856 7¦uhttp://beavis.cwru.edu/chemvl¦zRetrieve articles from the Chemical Sciences Digital Library

Would appear as

Retrieve articles from the Chemical Sciences Digital Library
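The sketch below shows how such a record could be split into its subfields to produce the link above; it assumes, as rendered here, that "¦" marks the subfield delimiter, and the code is an illustration rather than part of the consortium's software.

    # Split an 856 field into {subfield code: value} and assemble the catalog link.
    FIELD_856 = ("856 7¦uhttp://beavis.cwru.edu/chemvl"
                 "¦zRetrieve articles from the Chemical Sciences Digital Library")

    def parse_subfields(field: str) -> dict:
        """Return a mapping of subfield code to value for a delimiter-separated field."""
        parts = field.split("¦")[1:]        # drop the tag and indicators ("856 7")
        return {p[0]: p[1:] for p in parts if p}

    subfields = parse_subfields(FIELD_856)
    print(subfields["z"], "->", subfields["u"])
    # Retrieve articles from the Chemical Sciences Digital Library -> http://beavis.cwru.edu/chemvl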


TIFF

• The most widely used multipage graphic format

• Support for tagged information ("Copyright", etc.)

• Format is extensible by creating new tags (such as RM rule information, authentication hints, encryption parameters)

• Standard supports multiple kinds of compression

Adobe PDF

• Container for article images

• Page description language (PDF)

• PDF files are searchable by the Adobe Acrobat browser

• Encryption and security are defined in the standard

SICI (Serial Item and Contribution Identifier)

• SICI definition (standards progress, overview, etc.)

• Originally a key part of the indexing structure

• All the components of the SICI code are stored, so it could be used as a linking mechanism between an article database and the Chemical Sciences Digital Library (a segment-parsing sketch follows the example below)

• Ohio LINK is also very interested in this standard and is urging database creators and search engine providers to add SICI number retrieval to citation database and journal article repository systems

• Future retrieval interfaces into the database: SICI number search form, SICI number search API, for example

0022-2364(199607)121:1<83:TROTCI>2.0.TX;2-I
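To suggest how a SICI could serve as such a linking key, the fragment below splits the example above into its major segments with a regular expression; the segment names reflect our reading of the code's layout and are offered as an assumption rather than a normative parse of the SICI standard.

    import re

    # Example SICI from above: ISSN, chronology in parentheses, enumeration,
    # a contribution segment in angle brackets, and a trailing control segment.
    SICI = "0022-2364(199607)121:1<83:TROTCI>2.0.TX;2-I"

    PATTERN = re.compile(
        r"^(?P<issn>\d{4}-\d{3}[\dX])"       # ISSN of the serial
        r"\((?P<chronology>[^)]*)\)"         # chronology, e.g., year and month
        r"(?P<enumeration>[^<]*)"            # volume and issue
        r"<(?P<contribution>[^>]*)>"         # contribution (article) segment
        r"(?P<control>.*)$"                  # control segment and check character
    )

    match = PATTERN.match(SICI)
    print(match.groupdict())
    # {'issn': '0022-2364', 'chronology': '199607', 'enumeration': '121:1',
    #  'contribution': '83:TROTCI', 'control': '2.0.TX;2-I'}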

Appendix B Equipment Standards for End Users

Minimum Equipment Required

Hardware: An IBM PC or compatible computer with the following components:

• 80386 processor

• 16 MB RAM

• 20 MB free disk space

• A video card and monitor with a resolution of 640 × 480 and the capability of displaying 16 colors or shades of gray


Software:

• Windows 3.1

• Win32s 1.25

• TCP/IP software suite including a version of Winsock

• Netscape Navigator 2.02

• Adobe Acrobat Exchange 2.1

Win32s is a software package for Windows 3.1 that is distributed without charge and is available from Microsoft.

Adobe Acrobat Exchange is a commercial product that is not distributed free of charge; this requirement is expected to be relaxed in favor of a requirement for Adobe Acrobat Reader, which Adobe distributes at no charge.

The system will also run on newer versions of compatible hardware and software.

Recommended Configuration of Equipment

This configuration is recommended for users who will be using the system extensively. Hardware: A computer with the following components:

• Intel Pentium processor

• 32 MB RAM

• 50 MB free disk space

• A video card and monitor with a resolution of 1280 × 1024 and the capability of displaying 256 colors or shades of gray

Software:

• Windows NT 4.0 Workstation

• TCP/IP suite that has been configured for a network connection (included in Windows NT)

• Netscape Navigator 2.02

• Adobe Acrobat Exchange 2.1

As noted above, the requirement for Adobe Acrobat Exchange is expected to be relaxed in favor of the freely distributed Adobe Acrobat Reader.

Other software options that the system has been tested on include:

• IBM OS/2 3.0 Warp Connect with Win-OS/2

• IBM TCP/IP for Windows 3.1, version 2.1.1

• Windows NT 3.51


Appendix C
Additional Hardware Specifications

Storage for Digital Copies

To give us the greatest possible flexibility in developing the project, we decided to form the server out of two interlinked computer systems, a standard IBM System 390 with the OS/390 Open Edition version as the operating system and a standard IBM RS/6000 System with the AIX version of the UNIX operating system. Both these components may be incrementally grown as the project's server requirements increase. Both systems are relatively commonplace at academic sites. Although only one system pair is needed in this project, it is likely that eventually two pairs of systems would be needed for an effort on the national scale. Such redundancy is useful for providing both reliability and load leveling.

Campuswide Networks

Both campuses' networks and the statewide network that connects them use the standards-based TCP/IP protocols. Any networked client workstation that follows our minimum standards will be able to use the digital delivery system being constructed. The minimum transmission speed on the CWRU campus is 10 megabits per second (Mbps) to each client workstation and at least 155 Mbps on each backbone link. The principal document repository is on the IBM System 390, which uses a 155 Mbps ATM (asynchronous transfer mode) connection to the campus backbone. The linkage to the University of Akron is by way of the statewide network, in which the principal backbone connection from CWRU also operates at 155 Mbps; the linkage from UA to the statewide network is at 3 Mbps. The on-campus linkage for UA is also a minimum of 10 Mbps to each client workstation within the chemical sciences scholarly community and to client workstations in the UA university library.

Appendix D System Transactions as Initiated by an End User

A typical user session generates the following transactions between client and server.


1. User requests an article (usually from a Web browser). If the user is starting a new session, the RM system downloads and launches the appropriate viewer, which will process only encrypted transactions. In the case of Adobe Acrobat, the system downloads a plug-in. The following transactions take place with the server:

a. Authenticate the viewer (i.e., ensure we are using a secure viewer).

b. Get permissions (i.e., obtain a set of user permissions, if any. If it is a new session, the user is set by default to be the general purpose category of PUBLIC).

c. Get Article (download the requested article. If step b returns no permissions, this transaction does not occur. The user must sign on and request the article again).

2. User signs on. If the general user has no permissions, he or she must log on. Following a successful logon, transactions 1b and 1c must be repeated. Transactions during sign-on include:

a. Sign On

3. Article is displayed on-screen. Before an article is displayed on the screen, the viewer enters the RM protocol, a step-by-step process wherein a single Report command is sent to the server several times with different state flags and use types. RM events are processed similarly for all supported functions, including display, print, excerpt, and download. The transactions include:

a. Report Use BEGIN (just before the article is displayed).

b. Report Use ABORT (sent in the event that a technical problem, such as "out of memory," prevents display of the article).

c. Report Use DECLINE (sent if the user declines display of the article after seeing the cost).

d. Report Use COMMIT (just after the article is displayed).

e. Report Use END (sent when the user dismisses the article from the screen by closing the article window).

4. User closes viewer. When a user closes a viewer, an end-of-session process occurs, which sends transaction 3e for all open articles. Also sent is a close viewer transaction, which immediately expires the viewer so it may not be used again.

a. Close Viewer

The basic data being collected for every command (with the exception of 1a) and being sent to the server for later analysis includes the following:

• Date/time

• Viewer ID

• User ID (even if it is PUBLIC)

• IP address of request

These primary data may be used to derive additional data: Transaction 1b is effectively used to log unsuccessful access attempts, including failure reasons. The time interval between transactions 3a and 3e is used to measure the duration that an article is on the screen. The basic data collection module in the RM system is quite general and may be used to collect other information and derive other measures of system usage.

Appendix E
Scanning and Work Flow

Article Scanning, PDF Conversion, and Image Quality Control

The goal of the scan-and-store portion of the project is to develop a complete and tested system of hardware, software, and procedures that can be adopted by other members of the consortium with a reasonable investment in equipment, training, and personnel. If a system is beyond a consortium member's financial means, it will not be adopted. If a system cannot perform as required, it is a waste of resources.

Our original proposal stressed that all existing scholarly resources, particularly research tools, would remain available to scholars throughout this project. To that end, the scan-and-store process is designed to leave the consortium's existing journal collection intact and accessible.

Scan-and-Store Process Resources

• Scanning workstation, including a computer with sufficient processing and storage capacity, a scanner, and a network connection. Optionally, a second workstation can be used by the scanning supervisor to process the scanned images. The workstation used in this phase of the project includes:

-Minolta PS-3000 Digital Planetary Scanner

-Two computers with Pentium 200 MHz CPU, 64 MB RAM, 4 GB HD, 21" monitor

-Windows 3.11 OS (required by other software)

-Minolta Epic 3000 scanner software

-Adobe Acrobat Capture, Exchange, and Distiller software

-Image Alchemy software

-Network interface cards and TCP/IP software for campus network access

• Scanner operator(s), typically student assistants, with training roughly equivalent to that required for interlibrary loan photocopying. Approximately 8 hours of operator labor are required to process the roughly 800 pages per day that a single scanning workstation can handle.


• Scanning supervisor, typically a librarian or full-time staff, with training in image quality control, indexing, and cataloging, and in operation of image processing software. Approximately 3 hours of supervisor labor will be required to process 800 scanned pages per day.

Scan-and-Store Process: Scanner Operator

• Retrieve scan request from system

• Retrieve materials from shelves (enough for two hours of scanning)

• Scan materials and enter basic data into system

-Evaluate size of pages

-Evaluate grayscale/black and white scan mode

-Align material

-Test scan and adjust settings and alignment as necessary

-Scan article

-Log changes and additions to author, title, journal, issue, and item data on request form

-Repeat for remaining requested articles

• Transfer scanned image files to Acrobat conversion workstation

• Retrieve next batch of scan requests from system

• Reshelve scanned materials and retrieve next batch of materials

Scan-and-Store Process: Acrobat Conversion Workstation

• Run Adobe Acrobat Capture to automatically convert sequential scanned image files from single-page TIFF to multi-page Acrobat PDF documents as they are received from the scanner operator

• Retain original TIFF files

Scan-and-Store Process: Scanning Supervisor

• Retrieve request forms for scanned materials

• Open converted PDF files

• Evaluate image quality of converted PDF files

-Scanned article matches request form citation

-Completeness, no clipped margins

-Legibility, especially footnotes and references

-Minimal skewing

-Clarity of grayscale or halftone images

-Appropriate margins, no excessive white space

• Crop fingertips, margin lines, and so on, missed by Epic 3000 scanner software

-Retrieve TIFF image file

-Mask unwanted areas

-Resave TIFF image file


-Repeat PDF conversion

-Evaluate image quality of revised PDF file

• Return unacceptable scans to scanner operator for rescan or correction

• Evaluate, correct, and expand entries in request forms

• Forward corrected PDF files to the database

• Delete TIFF image files from conversion workstation

Notification to and Viewing by User of Availability of Scanned Article

Insertion of the article into the database

• The scanning technician types the scan request number into a Web form.

• The system returns a Web form with most of the fields filled in. The technician has an opportunity to correct information from the paging slip before inserting the article into the database.

• The Web form contains a "file upload" button that when selected allows the technician to browse the local hard drive for the article PDF file. This file is automatically uploaded to the server when the form is submitted.

• The system inserts the table of contents information into the database and transfers the PDF file to the Rights Manager system.

Notification/delivery of article to requester

• E-mail to requester with URL of requested article (in first release)

• No notification (in first release)

• Fax to requester an announcement page with the article URL (proposed future enhancement)

• Fax to requester a copy of the article (proposed future enhancement)

Appendix F
Technical Justification for a Digitization Standard for the Consortium

A major premise in the technical underpinnings of the new consortial model is that a relatively inexpensive scanner can be located in the academic libraries of consortium members. After evaluating virtually every scanning device on the market, including some in laboratories under development, we concluded that the 400 dot-per-inch (dpi) scanner from Minolta was fully adequate for the purpose of scanning all the hundreds of chemical sciences journals in which we were interested. Thus, for our consortium, the Minolta 400 dpi scanner was taken to be the digitization standard. The standard that was adopted preserves 100% of the informational content required by our end users.

More formally, the standard for digitization in the consortium is defined as follows:

The scanner captures 256 levels of gray in a single pass with a density of 400 dots per inch and converts the grayscale image to black and white using threshold and edge-detection algorithms.
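A minimal sketch of the grayscale-to-bilevel step described in this standard follows; it uses plain global thresholding only and does not reproduce the edge-detection refinement or the scanner's own algorithms.

    import numpy as np

    def to_black_and_white(gray_page: np.ndarray, threshold: int = 128) -> np.ndarray:
        """Convert an 8-bit grayscale page (values 0-255) to a bilevel image.

        Pixels darker than the threshold become black (0); the rest become white (255).
        A production converter would also apply edge detection to preserve fine strokes
        such as superscript and subscript characters in footnotes.
        """
        return np.where(gray_page < threshold, 0, 255).astype(np.uint8)

    # Example: a tiny synthetic "page" with one dark stroke on a light background.
    page = np.full((4, 4), 220, dtype=np.uint8)
    page[1:3, 2] = 40                  # dark pixels standing in for a character stroke
    print(to_black_and_white(page))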

We arrived at this standard by considering our fundamental requirements:

• Handle the smallest significant information presented in the source documents of the chemical sciences literature, which is the lowercase e in superscript or subscript as occurs in footnotes

• Satisfy both legibility and fidelity to the source document

• Minimize scanning artifacts or "noise" from background

• Operate in the range of preservation scanning

• Be affordable by academic and research libraries

The scanning standard adopted by this project was subjected to tests of footnoted information, and 100% of the occurrences of these characters were captured in both image and character modes and recognized for displaying and searching.

At 400 dpi, the Minolta scanner works in the range of preservation quality scanning as defined by researchers at the Library of Congress (Fleischhauer and Erway 1992).

We were also cautioned about the problems unique to very high resolution scanning in which the scanner produces artifacts or "noise" from imperfections in the paper used. We happily note that we did not encounter this problem in this project because the paper used by publishers of chemical sciences journals is coated.

When more is less: images scanned at 600 dpi produce larger files than those scanned at 400 dpi, so 600 dpi is less efficient than 400 dpi. Further, in a series of tests that we conducted, a 600 dpi scanner actually produced an image of effectively lower resolution than the 400 dpi scanner. It appears that this loss of information occurs when the scanned image is viewed on a computer screen where there is relatively heavy use of anti-aliasing in the display. When viewed with software that permits zooming in to examine details of the scanned image (which is supported by both PDF and TIFF viewers), the 600 dpi anti-aliased image actually had lower resolution than an image produced from the same source document by the 400 dpi Minolta scanner according to our consortium's digitization standard. With the 600 dpi scanner, the only way for the end user to see the full resolution was to download the image and then print it out. When a side-by-side comparison was made of the soft-copy displayed images, the presentation quality of the 600 dpi image was deemed unacceptable by our end users; the 400 dpi image was just right. Thus, our delivery approach is more useful to the scholar who needs to examine fine details on-screen.

We conducted some tests on reconstructing the journal page from the scanned image by printing it out on a Xerox DocuTech 6135 (600 dpi). We found that the smallest fonts and fine details of the articles were uniformly excellent. Interestingly, in many of the tests we performed, our faculty colleagues judged the end result by their own "acid test": how the scanned image, when printed out, compared with the image produced by a photocopier. For the consortium standard, they were satisfied with the result and pleased with the improvement in quality that the 400 dpi scanner provided in comparison with conventional photocopying of the journal page.


Chapter 17—
On-line Books at Columbia
Early Findings on Use, Satisfaction, and Effect

Mary Summerfield and Carol A. Mandel
with Paul Kantor, Consultant

Introduction

The Online Books Evaluation Project at Columbia University explores the potential for on-line books to become significant resources in academic libraries by analyzing (1) the Columbia community's adoption of and reaction to various on-line books and delivery system features provided by the libraries over the period of the project; (2) the relative life-cycle costs of producing, owning, and using on-line books and their print counterparts; and (3) the implications of intellectual property regulations and traditions of scholarly communications and publishing for the on-line format.

On-line books might enhance the scholarly processes of research, dissemination of findings, teaching, and learning. Alternatively, or in addition, they might enable publishers, libraries, and scholars to reduce the costs of disseminating and using scholarship. For example:

• If the scholarly community were prepared to use some or all categories of books for some or all purposes in an on-line format instead of a print format, publishers, libraries, and bookstores might be able to trim costs as well as enhance access to these books.[1]

• If on-line books made scholars more efficient or effective in their work of research, teaching, and learning so as to enhance revenues or reduce operating costs for their institutions, on-line books might be worth adopting even if they were no less costly than print books.

• If an on-line format became standard, publishers could offer low-cost on-line access to institutions that would not normally have purchased print copies, thus expanding both convenient access to scholarship to faculty and students at those institutions and publishers' revenues from these books.[2]


This paper focuses on user response to on-line books and reports on:[3]

1. the conceptual framework for the project

2. background information on the status of the collection and other relevant project elements, particularly design considerations

3. the methodology for measuring adoption of on-line books by the Columbia community

4. early findings on use of on-line books and other on-line resources

5. early findings on attitudes toward on-line books

Conceptual Framework

The variables representing usage of a system of scholarly communication and research are both effects and causes. Since scholars, the users of the system, are highly intelligent and adaptive, the effect of the system will influence their behavior, establishing a kind of feedback loop. As the diagram in Figure 17.1 shows, there are two key loops. The upper one, shown by the dark arrows, reflects an idealized picture of university administration. In this picture, the features of any system are adjusted so that, when used by faculty and students, they improve institutional effectiveness. This adjustment occurs in the context of continual adaptation on the part of the users of the system, as shown by the lighter colored arrows in the lower feedback loop.

These feedback loops are constrained by the continual change of the environment, which affects the expectations and activities of the users, affects the kind of features that can be built into the system, and affects the very management that is bringing the system into existence. The dotted arrows show this interaction.

Our primary research goal, in relation to users, uses, and impacts, is to understand these relationships, using data gathered by library circulation systems, Internet servers, and surveys and interviews of users themselves.

The On-line Books Collection

The project began formal activity in January 1995. However, discussions with publishers began in 1993, if not earlier. As noted in the project's Analytical Principles and Design document, "The Online Books Evaluation Project is a component of the developing digital library at Columbia University. As part of its digital library effort, the Columbia University Libraries is acquiring a variety of reference and monographic books in electronic format to be included on the campus network; in most cases, those books will be available only to members of the Columbia community. Some of the books are being purchased; others are being provided on a pilot project basis by publishers who are seeking to understand how the academic community will use online books if they become more widely available in the future."



Figure 17.1.
Interrelation of Factors Involved in the Use and Impact of On-line Books

Design of the On-line Books Collection

When this project was proposed, the World Wide Web was just emerging, and we expected to develop custom SGML browsers, just as other on-line projects were doing at the time. However, by the time the project was ready to mount books on-line,[4] the Web seemed the best delivery system for maximizing availability of the books to scholars.

Many other on-line projects are providing users with materials in PDF, scanned, or bitmapped format. These formats are effective for journal articles, which are finely indexed through existing sources and which are short and easily printed. However, the greatest potential for added value from on-line books comes with truly digital books. Only this on-line format allows the development of interactive books that take advantage of the current and anticipated capabilities of Web technology, such as the inclusion of sound and video, data files and software for manipulating data, and links to other on-line resources. Perhaps only such enhanced on-line books will offer sufficient advantages over traditional print format that scholars will be willing to substitute them for the print format for some or all of their modes of use and for some or all classes of books.

As of June 1997, the project included 96 on-line texts. The libraries have each book in print form (circulating from the regular collection or from reserves, or noncirculating in reference) as well as in one or more on-line formats. Appendix A summarizes the print access modes for all the modern books in the collection.

Methodology for Studying Use of and Reactions to Various Book Formats

The project's Analytical Principles and Design document lays out the evaluation methodology.[5] Formulated in the first year of the project, this methodology remains the working plan. Here are some of the key measures for documenting use of the on-line books:

• The records of the Columbia computing system provide, for the most part, the use data for the on-line books. For books accessed via the World Wide Web, information on date, time, and duration of session involving an on-line book, user's cohort, location of computer, number of requests, amount of the book requested, and means of accessing the book will be available. These data became available in summer 1997 with the full implementation of the authentication system and related databases.

• Circulation data for each print book in the regular collection provides information on number of times a book circulates, circulation by cohort, duration of circulation, number of holds, and recalls. For most libraries, the data available for reserve books is the same as that for books in the regular collection as the CLIO circulation system is used for both.

• The records of the Columbia computing system provide, for the most part, the use data for the books accessed via CNet, Columbia's original, gopher-based Campus Wide Information System, including the number of sessions and their date and time. These records do not include the duration of the session, the activity during the session (e.g., printing or saving), or anything about the user. Thus, all we can analyze are the patterns of use by time of day, day of week, and over time.

• Until March 15, 1997, for books accessed via CWeb, we knew the use immediately preceding the hit on the book and the day and time of the hit. For data collected through that point, our analysis is constrained to patterns of use by time of day, day of the week, and over time (a tabulation sketch follows this list). By manual examination of server data, we counted how many hits a user made on our collection during one session and determined the nature of those hits.

• Since March 15, 1997, we have been able to link user and usage information and conduct a series of analyses involving titles used, number of hits, number of books used, and so on by individual and to group those individuals by department, position, and age. These data do not yet include sessions of use, just the magnitude of overall use during the period. Session-specific data are available starting in fall 1997.
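The fragment below sketches the kind of tabulation these server records allow; the log format and sample entries are hypothetical stand-ins for the fields described in the bullets above.

    from collections import Counter
    from datetime import datetime

    # Hypothetical hit records: timestamp plus the on-line book requested.
    HITS = [
        ("1997-02-03T09:15:00", "The Oxford English Dictionary"),
        ("1997-02-03T22:40:00", "Columbia Granger's World of Poetry"),
        ("1997-02-04T10:05:00", "The Oxford English Dictionary"),
    ]

    hits_by_hour = Counter()
    hits_by_weekday = Counter()
    for stamp, title in HITS:
        when = datetime.fromisoformat(stamp)
        hits_by_hour[when.hour] += 1                  # pattern of use by time of day
        hits_by_weekday[when.strftime("%A")] += 1     # pattern of use by day of week

    print(hits_by_hour)
    print(hits_by_weekday)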

We are using a wide range of tools in trying to understand the factors that influence use of on-line books. Table 17.1 summarizes our complex array of surveys and interviews.

Use of Books in On-line Collection

At this point we will report on (1) trends in use of the on-line books; (2) user location and cohort; and (3) use of the on-line books by individuals.



TABLE 17.1. Types of Surveys

Population | Method | Contact | Response Rate | Remarks
Users of on-line books | On-line instrument | Passive | Low |
Users of on-line books | On-line post-use survey | Passive | Very low |
Users of paper alternatives | Response slips in books | Passive | Unknown | Levels of use not known
Users of course materials in either form | Interviews distributed in class | Active | High |
Users and nonusers | Library & campuswide surveys | Active | Moderate | No full active survey of the campus has been done
Discipline-specific potential users | Surveys & interviews | Active | High | Thus far only conducted before books were on-line

NOTE: Passive instruments are ones which the user must elect to encounter. Active instruments are distributed in some way, to the attention of the user. High response rates are in the range of 80-90% completion, with better than 60% usable.

Summarized separately below are findings for reference works and for nonreference books, e.g., monographs and collections.

Reference Books

Three reference works have been available on-line long enough to have generated substantial usage data.[6] These are The Concise Columbia Electronic Encyclopedia, Columbia Granger's World of Poetry, and The Oxford English Dictionary. Three other titles (Chaucer Name Dictionary, African American Women, Native American Women) have been on-line only since early 1997, so usage data are very short-term for these titles. All three are accessible both through CNet and CWeb.

Most available reference books are used more heavily on-line than in print.

Of the six reference works in the collection, only The Oxford English Dictionary receives sizable use in its print form in the Columbia libraries. At most a handful of scholars use the library copies of the others each month. As the accompanying tables and figure show, each of these books receives much more use on-line. On-line availability seems to increase awareness of these resources as well as make access more convenient.


Early on-line reference books have experienced falling usage over time, substitution of a new delivery system for an old one, or slower growth in use than might be expected given the explosion in access to and use of on-line resources in general.

In the early to mid-1990s, novelty may have brought curious scholars to the on-line format with little concern for design, the utility of the delivery system, or the qualities of the books. With enhancement in delivery systems and expansion in the number of on-line books, being on-line is no longer a guarantee that a book will attract users. As access to the Web spreads, new graphical Web delivery systems are offering superior performance that is increasingly likely to draw scholars away from these early, text-based systems. In addition, as more competing resources come on-line and provide information that serves the immediate needs of a user better or offer a more attractive, user-friendly format, scholars are less likely to find or to choose to use any single resource.

The Oxford English Dictionary is the most heavily used reference work in the collection. Its CNet format offers good analytic functionality but it is difficult to use. The CWeb format is attractive and easy to use, but its functionality is limited to looking up a definition or browsing the contents.[7]

OED CNet usage dropped 59% from fourth quarter 1994 (2,856 sessions) to first quarter 1997 (1,167 sessions). OED CWeb use increased by 27% from fall semester 1996 (1,825 hits) to spring semester 1997 (2,326 hits). The OED had 173 unique users in the period from March 15 to May 31, 1997, with an average of 2.8 hits per user.

The Concise Columbia Electronic Encyclopedia remains on the text-based platform CNet. As Figure 17.2 shows, usage declined 84% over the past three years, from 1,551 sessions in April 1994 to 250 sessions in April 1997. Usage declined most in the 1996-97 academic year; 7,861 sessions were registered from September 1995 to May 1996 and 2,941 sessions (63% fewer) from September 1996 to May 1997.
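
The percentage changes reported in the last two paragraphs follow directly from the session counts quoted there. A minimal sketch of the arithmetic in Python, using only figures taken from the text above:

    def pct_change(old, new):
        """Percentage change from old to new, rounded to the nearest whole percent."""
        return round(100 * (new - old) / old)

    # Session/hit counts quoted in the text above.
    print(pct_change(2856, 1167))   # OED CNet, 4Q 1994 -> 1Q 1997: -59
    print(pct_change(1825, 2326))   # OED CWeb, fall 1996 -> spring 1997: +27
    print(pct_change(1551, 250))    # Concise Encyclopedia, Apr. 1994 -> Apr. 1997: -84
    print(pct_change(7861, 2941))   # Concise Encyclopedia, Sept.-May 1995-96 -> 1996-97: -63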

Columbia now provides CWeb access to the Encyclopedia Britannica (directly from the publisher's server); many scholars may be using this resource instead of the Concise Encyclopedia. Recently the Columbia community has registered about 5,000 textual hits a month on the Encyclopedia Britannica.

Columbia Granger's World of Poetry is available on both CNet and CWeb. The CNet version is a Lynx-based, nongraphical rendering of the CWeb version. This resource, which became available to the community in on-line form in October 1994, locates a poem in an anthology by author, subject, title, first line, or keywords in its title or first line. In addition, it provides easy access to the 10,000 most often anthologized poems. In first quarter 1995, CNet sessions totaled 718; in first quarter 1997, they totaled 90 (or about one a day). CWeb hits totaled about 700 in the first quarter of 1997. Thus, even though it has declined, total usage of Granger's is still considerable.

Garland's Chaucer Name Dictionary was added to the CWeb collection at the end of 1996. Native American Women was added in January 1997, and African American Women went on-line in February 1997. Their early usage on CWeb is shown in Table 17.2.


288

Figure 17.2. Concise Columbia Electronic Encyclopedia Sessions, 1994-1997: CNet

Nonreference Books

The Online Books Evaluation Project includes two types of nonreference books: Past Masters, 54 classical texts in social thought; and modern monographs and collections from Columbia University Press, Oxford University Press, and Simon and Schuster Higher Education. Most of these books came on-line during the 1996-97 academic year.


289
 

TABLE 17.2. Use of Garland Reference Books on CWeb, January-May 1997

                                 Jan.-May 1997    3/15-5/31/1997
Title                            CWeb Hits        Unique Users    Mean Hits/Unique User
Chaucer Name Dictionary          269              9               3.8
Native American Women            124              9               4.3
African American Women           230              6               5.5

On-line scholarly monographs are available to and used by more people than their print counterparts in the library collection.

Once a print book is in circulation, it can be unavailable to other scholars for hours (the reserve collection) or weeks or months (the regular collection). An online book is always available to any authorized user who has access to a computer with an Internet connection and a graphical Web browser.

Table 17.3 tracks usage of the contemporary nonreference books in the on-line collection for the last part of the spring semester 1997 and the print circulation of these titles for the first six months of 1997. Fourteen of these books had no on-line use during this 2.5-month measurement period; 12 had no print circulations during their 6-month measurement period. In total, the on-line versions had 122 users while the print versions had 75 circulations. Looking only at the on-line books that also circulated in print form, we find 122 on-line users and 45 print circulations, or nearly three times as many on-line users as circulations. These data suggest that, compared over an equal period, these books will have many more users in on-line form than in print form.[8]
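
The user-to-circulation comparison reduces to a simple ratio over the per-title figures in Table 17.3. A sketch of the computation, using only a few illustrative rows from the table rather than the full list of titles:

    # (title, on-line users 3/15-5/31/97, print circulations Jan.-June 1997);
    # a few illustrative rows from Table 17.3, not the full table.
    records = [
        ("Task Strategies", 30, 4),
        ("Mutual Aid Groups", 18, 12),
        ("Supervision in Social Work", 8, 12),
        ("Philosophical Foundations of Social Work", 8, 0),
    ]

    # Restrict to titles with at least one print circulation, as in the text,
    # then compare total on-line users with total print circulations.
    circulated = [(t, u, c) for t, u, c in records if c > 0]
    online_users = sum(u for _, u, _ in circulated)
    print_circs = sum(c for _, _, c in circulated)
    print(f"{online_users} on-line users vs. {print_circs} print circulations "
          f"(ratio {online_users / print_circs:.1f})")

Run over the full table, the same calculation gives the 122-to-45 comparison (a ratio of about 2.7) cited above.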

An on-line book may attract scholars who would not have seen it otherwise.

Once a group within the community becomes aware of the on-line books, they are likely to review books in the collection that seem related to their interests-at least while the collection is small. For example, half of the use of Autonomous Agents: From Self Control to Autonomy was from social work host computers. This title might seem related to social work issues even though it is not a social work book or part of the collection of the Social Work Library.

The fifth and sixth most used books-Self Expressions: Mind, Morals, and the Meaning of Life and Bangs, Crunches, Whimpers, and Shrieks -are both philosophy titles.

Self Expressions is listed in the Current Social Science Web page along with the social work titles. Five of its seven users were from the School of Social Work, one from the Center for Neurobiology and Behavior, and one from Electrical Engineering.


290
 

TABLE 17.3. On-line and Print Circulation for Contemporary On-line Nonreference Books

                                                                        Online Activity: 3/15-5/31/97
Title                                                                   Users   Hits   Mean Hits/User   Print Circulation Jan.-June/97   On-line Users/Circulation
Task Strategies: An Empirical Approach ...                              30      288    9.6              4                                7.5
Mutual Aid Groups, Vulnerable Populations, & the Life Cycle             18      138    7.7              12                               1.5
Supervision in Social Work                                              8       33     4.1              12                               1.5
Philosophical Foundations of Social Work                                8       21     2.6              0                                NC
Self Expressions: Mind, Morals ...                                      7       21     3.0              3                                2.3
Bangs, Crunches, Whimpers, & Shrieks                                    7       21     3.0              0                                NC
Handbook of Gerontological Services                                     6       31     5.2              1                                6.0
Turning Promises into Performance                                       6       31     5.2              1                                6.0
Qualitative Research in Social Work                                     6       10     1.7              1                                6.0
Gender in International Relations                                       4       6      1.5              3                                1.3
Other Minds                                                             4       34     8.5              2                                2.0
Seismosaurus                                                            4       8      2.0              0                                NC
Nietzsche's System                                                      3       6      2.0              5                                0.6
The Logic of Reliable Inquiry                                           3       7      2.3              0                                NC
Sedimentographica                                                       2       6      3.0              1                                2.0
Free Public Reason                                                      2       6      3.0              0                                NC
Philosophy of Mathematics & Mathematical Practice in the 17th Century   2       3      1.5              0                                NC
Real Rights                                                             2       3      1.5              0                                NC
Hemmed In                                                               0       0      0                13                               0
Jordan's Inter-Arab Relations                                           0       0      0                8                                0
Ozone Discourses                                                        0       0      0                3                                0
Children's Literature & Critical Theory                                 0       0      0                1                                0
Freedom and Moral Sentiment                                             0       0      0                1                                0
Autonomous Agents                                                       0       0      0                1                                0
Novel & Globalization of Culture                                        0       0      0                1                                0
Mortality, Normativity, & Society                                       0       0      0                1                                0
Managing Indonesia                                                      0       0      0                1                                0
Littery Man                                                             0       0      0                0                                NC
Law & Truth                                                             0       0      0                0                                NC
Poetics of Fascism                                                      0       0      0                0                                NC
Majestic Indolence                                                      0       0      0                0                                NC
International Politics                                                  0       0      0                0                                NC
Total                                                                   122     673    5.5              75                               1.6

NOTE: Titles in bold were on reserve for one or more courses in spring 1997. One book has been omitted from the list as it only went on-line in April 1997. NC: total or change is not calculable.


292

Bangs, Crunches, Whimpers, and Shrieks is listed under Physics in the Current Science Web page. Two of its seven users were from the Physics department, another two from unidentified departments, and one each from Electrical Engineering, Engineering, and General Studies.

It is not clear whether scholars' productivity or work quality will be enhanced by such serendipity. The important concept of collocation is transformed, in the networked environment, into a diversity of finding and navigational systems. As the on-line collection expands, browsing will require the focused use of on-line search tools rather than use of project-oriented Web pages. However, the Web's search systems may uncover a book's relevance to a scholar's work when the library catalog or browsing the library or bookstore shelves would not have done so. This new ability to identify relevant books should improve scholars' research and teaching.

Scholars use some on-line books relatively little.

As Table 17.3 shows, use of on-line books is not evenly balanced among titles. Instead it is driven by course demands or other interest in a book. The initial use of the 54 on-line Past Masters classic texts in social thought confirms this finding. In academic year 1996-97, these texts registered a total of about 2,460 hits from the Columbia scholarly community. However, 1,692 (69%) of these hits were on only eight (15%) of the titles, for an average total of 212 hits each, or 24 hits each per month. The other 46 texts averaged about 17 hits each over this period, or about two hits each per month.[9]
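
The concentration of use reported for the Past Masters texts is a straightforward aggregation over per-title hit counts. A hedged sketch, assuming a hypothetical hits_by_title mapping built from the server logs (which are not reproduced here):

    # hits_by_title: {title: hits over the 1996-97 academic year}; hypothetical input.
    def concentration(hits_by_title, top_n=8, months=9):
        """Share of hits on the most used titles, plus mean monthly hits per title."""
        ranked = sorted(hits_by_title.values(), reverse=True)
        top, rest = ranked[:top_n], ranked[top_n:]
        total = sum(ranked)
        return {
            "share_of_hits_on_top_titles": sum(top) / total,
            "mean_hits_per_top_title_per_month": (sum(top) / len(top)) / months,
            "mean_hits_per_other_title_per_month": (sum(rest) / len(rest)) / months if rest else 0.0,
        }

    # With the figures quoted above (1,692 of roughly 2,460 hits falling on 8 of 54
    # titles over a nine-month academic year), this yields roughly 0.69, 24, and 2.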

Patterns of usage may be expected to change over time as various texts are used in courses or by researchers and as the Columbia community becomes more aware of the on-line collections. In general, it is likely that nonreference books that are being used in courses, but which the students need not own, will be in greater demand on-line than will books that students must own or books that are of interest to only a small set of scholars.

The data to date suggest that, to the extent that there are meaningful costs to creating on-line books and to maintaining them as part of a collection, publishing and library planners must select items for the on-line collection carefully. The decision rules will vary depending on what type of organization takes on the risks of providing access to the on-line books.

Some scholars, especially students with a reading assignment that is in the on-line collection, are looking at on-line books in some depth, suggesting that they find value in this means of access.

As Table 17.3 shows, the on-line books averaged 5.5 hits per unique user, suggesting that some users are looking at several elements of the book or at some elements repeatedly.[10] In fall 1996, three social work books were most intensively used because they were assigned reading for courses. We analyzed the server statistics through the end of 1996 for these books in an effort to learn how deeply the books


293

were used-to what extent use sessions included book chapters, the search engine, the pagination feature, and so on.

Table 17.4 shows that relatively few sessions (7%-24%) involved someone going only to the Title Page/Table of Contents file for a book. Many sessions (28%-59%) involved use of more than one chapter of the book; sessions averaged 1.4 to 3.5 hits on chapters, depending on the book used. Because some sessions (9%-17%) did not include a hit on the Table of Contents/Title Page at all, some users appear to be repeat users who had bookmarked a chapter or made a note of its URL.
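
Session-level measures of the kind shown in Table 17.4 can be derived from raw server hits grouped by user and time gap. A minimal sketch under assumed conditions: each log record carries a user identifier, a timestamp, and the part of the book requested; the field names and the 30-minute idle cutoff are illustrative, not the project's actual log format or session definition.

    from collections import defaultdict
    from datetime import timedelta

    SESSION_GAP = timedelta(minutes=30)  # assumed idle time that ends a session

    def sessions_for_book(hits):
        """Group hits (dicts with 'user', 'time', 'part') into per-user sessions."""
        by_user = defaultdict(list)
        for h in hits:
            by_user[h["user"]].append(h)
        sessions = []
        for user_hits in by_user.values():
            user_hits.sort(key=lambda h: h["time"])
            current = [user_hits[0]]
            for h in user_hits[1:]:
                if h["time"] - current[-1]["time"] > SESSION_GAP:
                    sessions.append(current)
                    current = []
                current.append(h)
            sessions.append(current)
        return sessions

    def session_measures(sessions):
        """Compute the kinds of measures shown in Table 17.4."""
        n = len(sessions)
        hits = sum(len(s) for s in sessions)
        toc_only = sum(1 for s in sessions if all(h["part"] == "toc" for h in s))
        no_toc = sum(1 for s in sessions if all(h["part"] != "toc" for h in s))
        multi_chapter = sum(
            1 for s in sessions
            if len({h["part"] for h in s if h["part"].startswith("chapter")}) > 1
        )
        return {
            "sessions": n,
            "mean_hits_per_session": hits / n,
            "toc_only_share": toc_only / n,
            "no_toc_share": no_toc / n,
            "multi_chapter_share": multi_chapter / n,
        }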

Table 17.5, illustrating the distribution of hits on the on-line books collection per unique user over the last part of the spring 1997 semester, indicates that while many users are making quite cursory use of the on-line books, more are looking at multiple files (e.g., reference entry, chapter) in the collection. As Table 17.6 shows, the distribution of unique titles viewed by these users over this period indicates that most users come to the collection to look at a single book. The greatest number of books used by a single person was seven (by two persons).

Not surprisingly, there is a certain correlation between number of hits and number of titles used. Those users with only one hit could only have looked at one title (42% of those using one book). The range of hits among those who used only one book is wide-20 (9%) had more than 10 hits. Six users had more than 25 hits; two of them looked at only one book, one each at two and three books, and two at seven books. These statistics indicate some significant use of the collection as measured by average number of hits per title used.

However, hits on several titles need not indicate heavy use of the on-line books collection. The individual who looked at five books had a total of only six to ten hits, as did four of the seven people who looked at four books (one to two hits each). The person who looked at six books had 11 to 15 hits in total (an average of about two hits per book).
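
The per-user distributions in Tables 17.5 and 17.6, and the relationship between hits and titles discussed here, come from aggregating hits by user. A sketch under the same assumption of simple (user, title) log records, with the hit-count buckets taken from Table 17.5:

    from collections import Counter, defaultdict

    def per_user_summary(hits):
        """hits: iterable of (user, title) pairs, one pair per logged hit."""
        hit_counts = Counter(user for user, _ in hits)
        titles = defaultdict(set)
        for user, title in hits:
            titles[user].add(title)

        def bucket(n):
            if n <= 5:
                return str(n)
            for lo, hi in ((6, 10), (11, 15), (16, 20), (21, 25)):
                if lo <= n <= hi:
                    return f"{lo}-{hi}"
            return ">25"

        users = len(hit_counts)
        hit_distribution = Counter(bucket(n) for n in hit_counts.values())
        title_distribution = Counter(len(s) for s in titles.values())
        return {
            "users": users,
            "hits_per_user_pct": {b: 100 * c / users for b, c in hit_distribution.items()},
            "titles_per_user_pct": {k: 100 * c / users for k, c in title_distribution.items()},
        }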

Table 17.7 shows that graduate students tended to have more hits, and undergraduates and faculty, fewer hits.

The preceding discussion highlights the current data on usage by individuals. Using newer data on sessions, we will be able to derive valuable information on user behavior-not only number of books used and hits on those books but parts of the book used and repeat usership. We will begin to see revealed preference in user behavior and will be less reliant on responses to questionnaires.[11]

Data for the last half of the spring 1997 semester suggest that when a social work book available in both print and on-line formats was used in a course, the share of students using the on-line version was at most one-quarter.

Table 17.3 shows that the four most used nonreference books were all in the field of social work. Almost 91% of the users of these books were from the School of Social Work; they accounted for 98% of the hits on those books. The vast majority of these users (56 of 64) were graduate students. With the exception of the


294
 

TABLE 17.4. Session Analysis for Social Work Books, Fall 1996

                                Handbook of Gerontological Services   Supervision in Social Work   Task Strategies
No. chapters in book            10                                    10                           11
Sessions                        41                                    46                           58
Hits                            128                                   128                          284
Mean hits/session               3.1                                   2.8                          4.9
Sessions w/TOC hit only         4 (10%)                               11 (24%)                     4 (7%)
Sessions w/no TOC hits          7 (17%)                               7 (15%)                      5 (9%)
Sessions w/ >1 chapter hits     15 (37%)                              13 (28%)                     34 (59%)

Total Hits On                   Number    Average/Session             Number    Average/Session    Number    Average/Session
Table of contents               38        .9                          47        1.0                60        1.0
Chapters                        74        1.8                         63        1.4                202       3.5
Page locator                    5         .1                          3         .1                 7         .1
Search                          5         .1                          10        .2                 4         .1
Bibliographic page              5         .1                          4         .1                 9         .2
Author biography                1         *                           1         *                  2         *

*Less than .05.

 

TABLE 17.5. Distribution of Hits per Unique User, March 15-May 31, 1997

Total Number of Hits    % of Total Users
1                       34%
2                       16%
3                       8%
4                       8%
5                       4%
6-10                    16%
11-15                   5%
16-20                   5%
21-25                   2%
>25                     2%

NOTE: Detail may not sum to 100% due to rounding.


295
 

TABLE 17.6. Distribution of Unique Titles Viewed per User, March 15-May 31, 1997

Number of Titles Viewed    Number of Users    % of Total Users
1                          225                80%
2                          32                 11%
3                          11                 4%
4                          8                  3%
5                          1                  *
6                          1                  *
7                          2                  1%
Total                      280                100%

NOTE: Detail may not sum to total due to rounding.
*Less than 0.5%.

 

TABLE 17.7. Hits per Unique User by Academic Cohort, March 15-May 31, 1997

Academic Cohort         N=     1 Hit    2-3 Hits    4-5 Hits    6-10 Hits    11-20 Hits    >20 Hits
Undergraduate           114    40%      28%         13%         14%          4%            1%
Graduate student        66     18%      14%         9%          20%          27%           12%
Professional student    9      33%      22%         22%         11%          0%            11%
Faculty                 12     42%      25%         17%         8%           8%            0%

most used book, Task Strategies, these texts were on reserve for social work courses during the spring 1997 semester.

• Three sections, with a total of about 70 students, used Supervision in Social Work as a key text. Thus, potentially, if all seven graduate students who used this book were participants in these courses, about 10% of the most likely student user group actually used this book on-line during this period.[12]

• Three other course sections, again with about 70 students in total, used Mutual Aid Groups. This book was a major reading; in fact, one of its authors taught two of the sections in which it was used. Sixteen graduate students used this title for a potential penetration of about 23%.

• Philosophical Foundations of Social Work (as well as Qualitative Research in Social Work ) was on reserve for a doctoral seminar that had an enrollment of 11 students. The instructor reported that this book was a major text in the course that students would have bought traditionally. She did not know how many of her students used the on-line version. If all eight users-seven graduate


296

students and the one professional student-were class members, that suggests a substantial penetration for that small class. However, it is likely that some of these users were not enrolled in that course.

Location of Use of On-line Books

Scholars are not using on-line books from off-campus locations to the extent expected.

One of the key potential advantages to on-line books is their instant availability to scholars at any location at which they have access to a computer with a modem and a graphical Web browser. This benefit might well lead to substantial use of the on-line books from locations other than the Columbia campus. So far we are seeing only modest use of the books from off-campus.

From May 1996 to March 1997, 11% of the hits on the Columbia University Press nonreference books were dial-up connections from off-campus. Looking at the use of the social work titles, we find that computers in the School of Social Work were responsible for the following shares of hits on the social work titles:

 

Handbook of Gerontological Services          53%
Mutual Aid Groups, Vulnerable Populations    76%
Philosophical Foundations of Social Work     39%
Qualitative Research in Social Work          69%
Supervision in Social Work                   48%
Task Strategies: An Empirical Approach       68%

Closer analysis of the usage data finds substantial use from the computer lab in the School of Social Work as well as from faculty computers. This finding suggests that many of the graduate students, most of whom do not live on or near campus, may not have Web access in their homes and, hence, are not equipped at this point in time to take full advantage of the on-line books.[13] Students who use the on-line books at the School of Social Work, however, avoid walking the several blocks to the social work library, worrying about the library's hours, or encountering nonavailability of the book in its print form. In our interviews, scholars report that key constraining factors to using the on-line books and other Web resources from home are the expense of dialing in to campus or maintaining an Internet account, the lack of sufficiently powerful home computers and Web software, the frequency of busy signals on the dial-up lines, and the slowness of standard modems.
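
Attribution of hits to campus locations of the kind reported above can be approximated by matching the requesting host name against known building or departmental subdomains. A hedged sketch with hypothetical host-name patterns; the project's actual network map and log format are not reproduced here:

    from collections import Counter

    # Hypothetical host-name suffixes; a real campus network map would be more detailed.
    LOCATION_PATTERNS = {
        "School of Social Work": (".ssw.columbia.edu",),
        "Residence halls": (".resnet.columbia.edu",),
        "Dial-up (off campus)": (".dialup.columbia.edu",),
        "Libraries": (".lib.columbia.edu",),
    }

    def classify_host(hostname):
        """Map a requesting host name to an inferred campus location."""
        for location, suffixes in LOCATION_PATTERNS.items():
            if hostname.endswith(suffixes):
                return location
        return "Other / unknown"

    def location_shares(hostnames):
        """Share of hits by inferred location, given one host name per hit."""
        counts = Counter(classify_host(h) for h in hostnames)
        total = sum(counts.values())
        return {loc: n / total for loc, n in counts.items()}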

Students residing on campus may have Ethernet connections to the campus network-providing both speedy and virtually free access to the on-line collection.

At the end of the 1996-97 academic year, approximately 2,300 students were registered for residence hall network connections.[14] With the exception of the three Garland reference books, a very small share of reference collection use occurs on


297

computers in the libraries; the Columbia community is taking advantage of the out-of-library access to these resources. For example, 42% of the hits on The Oxford English Dictionary in the ten months following May 1996 were from residence hall network connections.

However, these undergraduates have shown little interest in the nonreference books on-line. Residence hall connections accounted for only 1% of the use of the Columbia University Press titles in social work, earth and environmental science, and international relations and 3% of the use of the Oxford University Press titles in literary criticism and philosophy from May 1996 to May 1997. These small shares are not surprising given that few of these books are aimed at the undergraduate audience. The undergraduates' use of the Past Masters classical texts in social thought from Ethernet connections in their rooms is somewhat higher-654 hits, or almost 13% of the total use of those texts from May 1996 to March 1997.

Scholars' Access to On-line Resources

We theorize that scholars with greater perceived access to networked computers and with greater familiarity with on-line resources are more likely first to sample on-line books and later to adopt them for regular use (assuming that books of interest are available on-line). All project questionnaires ask about both these factors. The access question is, "Is there a computer (in the library or elsewhere) attached to the campus network (directly or by modem) that you can use whenever you want?" The question about use of on-line resources asks, "On average this semester, how many hours per week do you spend in on-line activities (Email, Listservs & Newsgroups, CLIO Plus, Text, Image or Numeric Data Sources, Other WWWeb Uses)?" In some cases, the question asks for a single value; in others, it has five spaces in which respondents are asked to enter their hours for each of these activities.

Over 80% of Columbia library users report adequate access to a networked computer.

In the Columbia Libraries annual survey of on-site users in March 1997, 2,367 individuals responded to this question on access to networked computers. Almost 81% answered "Yes." Masters students were least likely to respond positively (67%) while the other scholarly cohorts-faculty, doctoral students, and undergraduate students-ranged from 85% to 87%. Users of science libraries were generally more likely to respond affirmatively.

Columbia library users report an average of about six hours a week in on-line activities with no significant difference across scholarly cohorts.

Even many of the survey respondents who did not claim easy access to a networked computer reported spending considerable time in on-line activities-22% spent four to six hours a week and 23% spent more than six hours a week.


298

Scholars' Choice among Book Formats

Scholars' patterns of using books in their various formats and their reactions to on-line books are being tracked through a variety of surveys, individual interviews, and focus groups (see Table 17.1).

One survey involves visiting a class session for which an assigned reading was in an on-line book. A question asks which format(s) of the book the student used for this assignment. Responses were distributed as shown in Table 17.8.

In 70% of the responses for fall 1996, as seen in Table 17.8, the student had used his or her own copy of the text. The next most common method was to use a friend's copy (14%). The shares for those two modes in spring 1997 do not differ significantly. We are obtaining course syllabi from instructors so that, in the future, we can analyze these responses based on what portion of the book is being used in a course and whether students are expected to purchase their own copies.

Preferences for Studying Class Reading

We obtained far fewer responses (119 in fall 1996 and 88 in spring 1997) as to the preferred mode of studying. Table 17.9 shows that in both semesters, about two-thirds of respondents reported a preference for reading their own copy.

Scholars' Reactions to Book Formats and Characteristics

Scholars reporting easy access to a networked computer spend more time on-line and are more likely to prefer to use one of the forms of the on-line book.

In our in-class surveys in spring 1997, students claiming easy access to a networked computer (74% of the 209 respondents) were greater users of on-line resources overall. Only 27% of students claiming easy access reported as few as one to two hours on-line a week, while 53% of those lacking easy access had this low level of on-line activity. About 31% of the former group spent six or more hours a week on-line while 18% of the latter group did.

About 26% of the easy access group gave some form of on-line book (reading directly on-line, printout of text, or download of text and reading away from the Web) as their preferred method of reading an assignment for which an on-line version was available, while only 13% of the students lacking easy access did so.

This combination of responses suggests that, over time as members of the scholarly community obtain greater access to computers linked to the Web, on-line books will achieve greater acceptance.

Students report that they particularly value easy access to the texts that are assigned for class and an ability to underline and annotate those texts.

Students seek the ability to print out all or parts of the on-line texts that they use for their courses, again indicating their desire to have the paper copy to use in their


299
 

TABLE 17.8. Methods of Reading This Assignment

                                                          Fall 1996          Spring 1997
                                                          Count    %         Count    %
Used own copy                                             269      70%       141      73%
Used friend's copy                                        54       14%       20       10%
Used library copy                                         33       8%        17       9%
Used photocopy                                            11       3%        17       9%
Read it directly from CWeb                                0        0%        0        0%
Obtained JAKE printout of text (a)                        10       3%        16       8%
Obtained printout using non-JAKE printer                  4        1%        4        2%
Downloaded on-line text to disk & read away from CWeb     5        1%        1        *
Total                                                     386      100%      216      111%

NOTE: % is share of responses, not cases, for individual methods of reading this assignment.

*Less than 0.5%.

(a) JAKE is the networked laser printer system maintained by AcIS. Undergraduates and social work students can print 100 pages a week at no charge.

 

TABLE 17.9. Preferred Method of Reading This Assignment

                                                              Fall 1996                Spring 1997
                                                              Count    % of Cases      Count    % of Cases
Own copy                                                      83       67%             56       64%
Friend's copy                                                 9        8%              6        7%
Library copy                                                  10       8%              6        7%
Photocopy                                                     7        6%              8        9%
Directly from CWeb                                            2        2%              7        8%
JAKE printout of text                                         7        6%              6        7%
Printout using non-JAKE printer                               3        2%              5        6%
Download of on-line text to disk to be read away from CWeb    3        2%              1        1%
Total responses                                               124      101%            95       109%
Total cases responding                                        119                      88

 

300

studying. Computer access to a needed text is not equivalent to having a paper copy (whole book or assigned portion) in one's backpack, available at any time and at any place (see Table 17.10).

The cross-tabulation of preferred method of use and reasons for that preference produces logically consistent results. For example, all the respondents who gave "Printout using non-JAKE printer" or "Download of on-line text to disk to be read away from CWeb" as their preferred method gave "Less costly" as one of their reasons, while few of those students who preferred their own copy gave that reason.

If the effective choice for completing a required reading is between borrowing a book from the library, probably on a very short-term basis from reserves, and accessing the book on-line, the student faces a parallel need to photocopy or print out the reading to obtain a portable copy that can be annotated.[15] However, the on-line book's advantages are that it will never be checked out when the student wants to use it and that it will be accessible from a computer anywhere in the world at any time (as long as that computer has an Internet connection and a graphical Web browser).

In surveys and interviews, scholars indicate that they value the ability to do searches, to browse, and to quickly look up information in an on-line book.

They also like the ability to clip bits of the text and put them in an electronic research notes file. Willingness to browse and to read on-line for extended periods varies from person to person, but it does not seem to be widespread at this time.

Some scholars perceive gains in the productivity and quality of their work in using on-line books, particularly reference books.

Two key questions asked on all our questionnaires, other than those distributed in class, seek to determine the effect of on-line books on scholarly work:

1. In doing the type of work for which you used this book, do paper books or on-line books help you be more productive?

2. Do you find that you are able to do work of higher quality when you use paper books or on-line books?

The questionnaire offers a range of seven responses from "Much greater productivity (quality) with paper" through "No difference" to "Much greater productivity (quality) with on-line" plus "Cannot say."

As Table 17.11 shows, 52% of OED users felt that they were as productive or more productive using the on-line OED, while 39% of the users of the other on-line books felt that they are as productive or more productive using the on-line format. These responses are somewhat puzzling because the reference book most used on-line is The OED, suggesting that scholars do value it, and the CWeb version of the on-line OED provides as much if not more utility than does the print version (with the exception of being able to view neighboring entries at a


301
 

TABLE 17.10. Reasons for Preferred Method

                            Fall 1996          Spring 1997
Reasons for Preference      Count    %         Count    %
Always available            199      72%       108      74%
Easy to annotate            135      49%       57       39%
Easy to read                104      38%       70       48%
Less costly                 60       22%       33       23%
Easy to search for words    30       11%       15       10%
Other reasons               25       9%        16       11%
Easy to copy                21       8%        20       14%
Easy to get to              0        0%        0        0%
Total responses             574      209%      319      219%
Total cases responding      276                146

SOURCE: In-Class Survey.

NOTE: Respondents could give more than one reason for their preference.

 

TABLE 17.11. CWeb On-line Survey: Productivity & Work Quality by Book & Format, September 1996-June 1997

Response                      OED (N = 64)    All Other Books (N = 21)

Productivity by Book Type
Cannot say                    12%             10%
Paper much greater            16%             24%
Paper greater                 8%              14%
Paper somewhat greater        12%             14%
No difference                 2%              19%
On-line somewhat greater      17%             5%
On-line greater               17%             5%
On-line much greater          16%             10%

Work Quality by Book Type
Cannot say                    16%             5%
Paper much greater            16%             24%
Paper greater                 6%              14%
Paper somewhat greater        16%             14%
No difference                 31%             29%
On-line somewhat greater      2%              0%
On-line greater               8%              0%
On-line much greater          6%              14%


302

glance). Thus, one might expect the productivity rating for the on-line OED to be higher.

The distribution of responses to the quality of work question supports the print format in general, although 47% of OED users and 43% of the users of all the other books felt that quality was as good or better with on-line books.

Table 17.12 shows considerable correlation in the responses to these two questions-those who supported the paper version for productivity tended to support it for quality as well.
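
The correlation visible in Table 17.12 can be reproduced from the paired responses as a simple contingency table. A sketch, assuming each respondent's answers have been collapsed into the four categories used in the table; the pairing of responses into tuples is an assumption about data preparation, not the project's actual instrument:

    from collections import Counter

    CATEGORIES = ("Cannot say", "Better paper", "No difference", "Better on-line")

    def crosstab(responses):
        """responses: iterable of (productivity, quality) pairs drawn from CATEGORIES."""
        counts = Counter(responses)
        header = "Productivity \\ Quality".ljust(24) + "".join(c.rjust(16) for c in CATEGORIES)
        rows = [header]
        for p in CATEGORIES:
            cells = "".join(str(counts.get((p, q), 0)).rjust(16) for q in CATEGORIES)
            rows.append(p.ljust(24) + cells)
        return "\n".join(rows)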

In the last part of the spring 1997 semester, 52% of the on-line book users who went to the on-line survey responded to it, but only 15% of users chose to click on the button taking them to the survey.

Designing an on-line survey that is available to the reader without overt action might enhance the response rate significantly. We are working on doing that using HTML frames on the on-line books. We are also experimenting with other methods of reaching the users of the on-line books, e.g., registration of users that will bring e-mail messages about new books in their field while also enabling us to query them about their reactions to on-line books.

Conclusions

These preliminary results of the Online Book Evaluation Project suggest that, at this early point in its development, the on-line format is finding a place in the work patterns of scholars who have had an opportunity to try it.

Interviews and focus groups substantiate the findings from the server data and surveys. Together they suggest the following about scholars' reactions to the online format:

• It is a convenient way to access information in reference books and potentially to do textual analyses in individual books or whole databases like the OED.

• Using a search function, one can quickly determine if a book or set of books addresses a topic of interest and warrants further investigation.

• It is an easy way to browse through a book to determine whether it is worth deeper exploration or whether only a small section is pertinent to one's work. If the latter is the case, it is as easy to print out that small section of the on-line book as it is to take the typical next step of photocopying that section of the paper book.

• A scholar who wants to read and annotate only a modest section of a book, say a chapter or an essay for a course assignment, will find that accessing and printing out the section from the on-line book can be quicker than doing the equivalent with a library copy of the paper book.


303
 

TABLE 17.12. CWeb On-line Survey: Quality & Productivity, September 1996-June 1997

                      Quality of Work
Productivity          Cannot say    Better paper    No difference    Better on-line
Cannot say            8             0               2                0
Better paper          3             27              5                1
No difference         0             1               4                0
Better on-line        0             9               15               12

• Ready access from any location at any hour and not worrying about whether the book sought is on the library shelf are valued features of the on-line format.

On the other hand, if scholars want to read much or all of a book, they are likely to prefer the traditional format. If the book is core to their research or to a course, scholars are likely to prefer to own a copy. If they cannot afford such a copy, if the book is of more passing interest, or if they cannot obtain a print copy, scholars would typically prefer to retain a library copy for the duration of their interest in the book. If they cannot do so, say because the book is on reserve, scholars must decide among their options, e.g., buying their own copy or using an on-line copy, and decide which option is next preferred.

Over the duration of this project, we will continue to add books to the on-line collection and to pursue our explorations of scholars' reactions to this format. We will look for trends in the perceived accessibility of on-line books and in the desirability of this format for various uses. We will seek to measure the frequency with which scholars read such substantial portions of books borrowed from libraries that they will continue to seek library access to paper copies. In a related effort, we will assess the extent to which libraries now satisfy scholars' desires for access to such copies. If a library did not have a book in its collection in print format but did offer on-line access, a scholar would face a different trade-off between the two formats.

At the same time we will pursue our analyses of the cost and intellectual property issues involved in scholarly communication in an effort to determine whether the on-line book format can contribute to the perpetuation of research and learning and to the dissemination and preservation of knowledge.


305
 

APPENDIX A. Contemporary Books in the On-line Collection

Publisher/Title                                                 Author         Subject                 Print Status            Month Public
Columbia University Press
Great Paleozoic Crisis                                          Erwin          Earth Science           Circulating             6/97
Seismosaurus: The Earth Shaker                                  Gillette       Earth Science           Circulating             10/96
Invasions of the Land                                           Gordon         Earth Science           Circulating             6/97
Folding of Viscous Layers (a)                                   Johnson        Earth Science           Circulating
Dinosaur Tracks & Other Fossil Footprints (a)                   Lockley        Earth Science           Circulating
Sedimentographica: Photographic Atlas                           Ricci-Lucchi   Earth Science           Circulating             1/97
Development of Biological Systematics (a)                       Stevens        Earth Science           Circulating
Consuming Subjects (a)                                          Kowaleski      Economic History        Circulating
Jordan's Inter-Arab Relations                                   Brand          Internat'l Relations    Reserves                3/97
Managing Indonesia                                              Bresnan        Internat'l Relations    Reserves                3/97
Logic of Anarchy (a)                                            Buzan          Internat'l Relations    Reserves
Hemmed In: Responses to Africa's ...                            Callaghy       Internat'l Relations    Reserves                3/97
China's Road to the Korean War (a)                              Chen           Internat'l Relations    Circulating
Culture of National Security (a)                                Katzenstein    Internat'l Relations    Reserves
International Relations Theory & the End of the Cold War (a)    Lebow          Internat'l Relations    Circulating
The Cold War on the Periphery (a)                               McMahon        Internat'l Relations    Circulating
Losing Control: Sovereignty ... (a)                             Sassen         Internat'l Relations    Circulating
Gender In International Relations                               Tickner        Internat'l Relations    Reserves Circulating    11/96
The Inhuman Race (a)                                            Cassuto        Literary Criticism      Circulating
Rethinking Class: Literary Studies ... (a)                      Dimock         Literary Criticism      Circulating
The Blue-Eyed Tarokaja (a)                                      Keene          Literary Criticism      Circulating
Ecological Literary Criticism (a)                               Kroeber        Literary Criticism      Circulating
Parables of Possibility (a)                                     Martin         Literary Criticism      Circulating
The Text and the Voice (a)                                      Portelli       Literary Criticism      Circulating
At Emerson's Tomb (a)                                           Rowe           Literary Criticism      Circulating
Extraordinary Bodies: Figuring Physical ... (a)                 Thomson        Literary Criticism      Circulating
What Else But Love? The Ordeal of Race ... (a)                  Weinstein      Literary Criticism      Circulating
Columbia Granger's Index to Poetry                              Granger        Poetry                  Ref. Desk               10/94
Ozone Discourses                                                Liftin         Political Science       Reserves                1/97
Concise Columbia Electronic Encyclopedia                                       Reference               Ref. Desk               3/91
Hierarchy Theory (a)                                            Ahl            Science                 Circulating