Scientific Journals of the Future
by Steven M. Bachrach, Trinity University
San Antonio, TX
The 20th century has witnessed a great many technological advances, but perhaps none has had as great an impact upon everyday life as the changes in our means of communication. We have seen the development of true mass communication. Telephones are ubiquitous, even in airplanes. Cellular phones, beepers, and messaging services make us accessible anywhere on the globe. Offset printing and high-speed copiers allow books to be manufactured only days after the events they describe have occurred. Film has evolved from the silents to the talkies to color to Technicolor to video, and virtual reality entertainment centers are just around the corner. Television is perhaps the dominant communications medium, and it, too, has progressed from the vacuum tube-filled console with its round black-and-white screen to the portable color set with a video cassette unit, and flat-panel, high-definition models are in the offing. Over-the-air analog broadcasting is being supplanted by cable systems and by digital satellite systems, each offering over 100 channels. Since the beginning of the '90s, the Internet has fostered a new realm of communications, the globally connected computer network where millions chat in specially created virtual environments and send email to children at school overseas.
Yet, while all of these changes have radically altered our society, the primary means by which scientists communicate with each other has remained frozen in time, unchanged in well over 100 years. We scientists still produce the written article, published in the specialized scientific journal, that appears as ink applied to paper.
Our libraries are jammed from floor to ceiling with printed journal after printed journal. The trend for growth in scientific publishing is truly staggering. Using the field of chemistry as an example of what is typical in all scientific disciplines, Figure 1 shows the number of abstracts indexed by Chemical Abstracts for the past 90 years. Similarly, the number of scientific journals has grown, with every new subdiscipline launching its own journal or, more typically, multiple journals.
This is not to say that scientists are doing science in the same fashion as their predecessors at the turn of this century. Technology has completely changed how scientists collect data and, more pertinent here, how we process data and information with state-of-the-art tools. For example, visualization of data has been revolutionized by the appearance of relatively low-cost workstations. Large data sets now can be represented using high-resolution color images and animations. Climatologists simulate weather over succeeding weeks and watch the storms progress across the face of the simulated earth. Seismologists visualize the innards of the earth, searching for undiscovered pockets of oil. Physicians use MRI (magnetic resonance imaging) to visualize internal organs non-invasively. Physicists simulate the capture of the moon by the earth, and chemists watch the progression, in molecular form, from small carbon clusters to buckyballs to the formation of soot. Molecular biologists use virtual reality to manipulate enzymes and DNA strands. These images and animations are necessary pieces of the working scientist's arsenal, and it is an extremely rare laboratory desk that does not have a PC (personal computer) or workstation glowing with some fancy color graphic.
And yet, even though color graphics, animations, sound, and extremely large data sets are now routine and essential components of the scientific method and process, all of these are omitted when the time comes to distribute the knowledge among our colleagues. Because we write journal articles destined to appear in print on paper, we can't include a movie or sound. Color images are still (for most journals) too expensive to print. Large data sets consume a significant number of the limited pages available in journals and are at best relegated to supplements, if not left out entirely.
A New Paradigm
As already mentioned, the Internet has become a major avenue for mass communication, spurred by a number of developments. First, the Internet has penetrated most countries around the world; it is now available to many people at home, to most academic centers, and to many businesses around the globe. Second, the cost of using the Internet is quite low. Last, the power of desktop PCs has become so formidable that multimedia can now be handled reasonably in many environments, from the home to the office, the university, and government research institutions.
At the same time, continued printing of scientific articles on paper is becoming less effective, and not only because of the limitations of print in a multimedia world. Libraries are hamstrung by ever-tightening budgets, so they can no longer acquire the myriad journals available. Space for continued storage of printed materials is limited. The result is that published papers will reach fewer readers, since access to the printed journals will become more difficult.
The Internet offers solutions to these problems. With this new medium come new opportunities. The time is ripe for a dramatic, profound shift in how scientists should (and will) communicate in the near future.
Instead of publishing articles on paper, publication will occur electronically, with an article appearing as a collection of files on the Internet. With the global reach of the Internet, access to information stored on any computer is potentially available on any desktop. There is no need to walk to the library or to order a photocopy via interlibrary loan.
As discussed in more detail elsewhere in this book, electronic publication may prove to be significantly less costly than print publication, especially if authors take a larger role in the publication process. Lower costs can lead to broader access to materials for all scientists.
The growth rate of computer storage capacity is even faster than the growth rate of publication (Figure 1). This provides hope that our increasing production of articles can be stored and managed using the distributed storage capabilities of the Internet. In fact, with storage becoming so cheap and available, page limits may become a thing of the past. There will be no pressure to remove data, spectra, etc. simply to meet a length limit established by a publisher.
Regardless of the economics of electronic publishing, there is no doubt that e-publishing will allow fully enabled scientific discourse. There is no added cost to producing a color image rather than a black-and-white one. Videos can be created and saved in a digital file that can then be viewed on any computer with the proper software. The same is true for sound.
Perhaps more important and revolutionary is that e-publishing brings data into the hands of readers in a way that facilitates interaction and discovery. The best way to describe this is with a concrete example drawn from chemistry. Chemists are very interested in the structure of a molecule, and one way of obtaining it is via x-ray crystallography. Once a crystallographer has determined the structure, she must select a single view of the molecule for presentation in the printed article. This is a static ("dead") image. A reader of this work may be interested in a view of the back side of the molecule. Unfortunately, the reader is forced into some very time-consuming manipulations to obtain that view, either by entering the structure by hand into her own molecular viewing program or by obtaining the structure from a database and feeding it into the viewer. With electronic publishing, what is published is not the "dead" image but the coordinates of the molecule itself. The reader's browser automatically directs the coordinates into the appropriate viewer, and the reader is free to manipulate the structure to her heart's content. This concept of interactive manipulation of data is possible in all disciplines, be it a matrix of wind speeds in a hurricane, the scattering tracks made by particles in a collider, the three-dimensional MRI image of a brain tumor, or the DNA sequences of a dinosaur. This interactivity goes far beyond what is possible in traditional print and is perhaps the most compelling reason for the shift to electronic publication of scientific articles.
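To make the difference between a "dead" image and published coordinates concrete, here is a minimal sketch, assuming the coordinates are distributed in the simple XYZ text format (an atom count, a comment line, then one "element x y z" line per atom). The function names are hypothetical, and the approximate water geometry is for illustration only. Once the reader has numbers rather than a picture, viewing the back side of the molecule is a single rotation.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, float, float, float]

# Illustrative XYZ record: approximate water geometry in angstroms
# (not data from any published article).
XYZ_TEXT = """3
water, approximate geometry, for illustration only
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
"""


def parse_xyz(text: str) -> List[Atom]:
    """Parse XYZ text: atom count, comment line, then 'element x y z' lines."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def rotate_y(atoms: List[Atom], degrees: float) -> List[Atom]:
    """Rotate every atom about the y axis; 180 degrees shows the back side."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(el, x * c + z * s, y, -x * s + z * c) for el, x, y, z in atoms]


if __name__ == "__main__":
    for el, x, y, z in rotate_y(parse_xyz(XYZ_TEXT), 180):
        print(f"{el} {x:8.3f} {y:8.3f} {z:8.3f}")
```

A real molecular viewer would redraw the structure interactively as the reader drags it; the point of the sketch is only that every such view is computable from the published coordinates.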
How an Article Is Published on the Internet
Publishing an article on the Internet can be very simple. An author writes the manuscript (creates a file) and places it on a computer that is accessible by others. The question then becomes, what format should be used for storing the article?
As of early 1998, there are two readily available and simple formats for saving documents, and they differ quite dramatically in their intent. Portable Document Format (PDF) attempts to maintain the look and feel of the printed document. It provides a means of storing a document in digital form so that it appears identical on any screen and when printed on any output device, such as a laser printer or offset press. PDF provides the complete typesetting and layout control that publishers have been using for decades: text and graphics can be combined on a single page, font and point size are dictated and preserved, and so on. Publishers can therefore deliver a PDF file with complete assurance of how the document will appear. A PDF file can be created from a wide variety of original sources, such as a word processor or graphics program, and then viewed using (as of this writing) Adobe Acrobat Reader, a freely distributed program. PDF can incorporate hyperlinks to other documents or to other sections of the same document. However, the reader is not really working with text: PDF files are not meant to be edited, and extracting text from them for use elsewhere is awkward at best, so one is dealing with something much closer to an image of the page.
The alternative format is HyperText Markup Language (HTML), an application of SGML (Standard Generalized Markup Language), a very powerful language for defining markup languages. Text is marked up to identify structure within the document. For example, tags are inserted around text to indicate that it is a header, a footnote, an author's name, or boldface. A program then interprets these tags and formats the text appropriately. Currently, HTML has a limited set of layout features, but as it continues to evolve, authors will be able to produce ever more specialized layouts. Files written in HTML can be viewed easily with a browser such as Netscape Navigator or Microsoft Internet Explorer.
The markup approach has a number of distinct advantages over the PDF approach. One is that the reader can dictate some of the presentation features, such as the font, the point size, and the size of the window. Of greater significance to scientists, however, is that markup languages are inherently extensible. Currently, HTML deals primarily with how text should be displayed. For a scientist, some portions of the text have deeper meaning than the words themselves. The name of a compound is more than just a string of text; it contains chemical information. The name "benzene" implies a molecular formula of C6H6, a molecular weight of 78.11, and the structure shown below. A chemical markup language would allow this type of information to be contained within the document itself. A chemically intelligent browser could then render a 2-D or 3-D image of the molecule instead of just the text. Units are another example. While most scientists use SI (the International System of Units), it is neither universal nor always convenient. A bond energy might be given as 100 kcal/mol within the text. Chemical markup would allow this number to be identified explicitly as an energy, together with its units, and a browser could then convert it instantly to an alternative unit, such as kJ/mol or ergs per molecule. The beauty here is that data can be represented not only as text but also as manipulable information.
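The idea that "a program then interprets these tags and formats the text" can be illustrated with a toy renderer built on Python's standard `html.parser` module. This is a minimal sketch, not how any real browser works: the class and function names are hypothetical, and only three tags are handled (a header becomes upper case, boldface is wrapped in asterisks, and headers and paragraphs end with a line break).

```python
from html.parser import HTMLParser


class PlainTextFormatter(HTMLParser):
    """Toy renderer that turns a few HTML tags into plain-text formatting."""

    def __init__(self):
        super().__init__()
        self.out = []    # pieces of formatted output
        self.stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "b":
            self.out.append("*")  # mark the start of boldface

    def handle_endtag(self, tag):
        if tag == "b":
            self.out.append("*")  # mark the end of boldface
        elif tag in ("h1", "p"):
            self.out.append("\n")  # block-level elements end the line
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text inside a header is shown in upper case.
        self.out.append(data.upper() if "h1" in self.stack else data)


def render(html_text):
    formatter = PlainTextFormatter()
    formatter.feed(html_text)
    formatter.close()
    return "".join(formatter.out)


if __name__ == "__main__":
    print(render("<h1>Scientific Journals</h1><p>Text with <b>bold</b> words.</p>"))
```

Because the markup only says what each piece of text *is*, a different renderer could present the same file in a different font, size, or style, which is why the reader rather than the publisher can control presentation.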
Benzene
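The following sketch shows what a "chemically intelligent" reader program might do with markup that carries meaning. The tag names `compound` and `energy` and their attributes are invented for this example and are not actual CML syntax. The code parses the markup with Python's `xml.etree.ElementTree`, derives a molecular weight from the tagged formula (6 × 12.011 + 6 × 1.008 = 78.114, matching the 78.11 quoted above), and converts a tagged energy from kcal/mol to kJ/mol and to ergs per molecule, assuming the thermochemical calorie (4.184 J).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical chemistry-aware markup; the tag and attribute names are invented.
DOC = """<p>The name <compound formula="C6H6">benzene</compound> implies a formula.
A bond energy might be reported as <energy units="kcal/mol">100</energy> kcal/mol.</p>"""

# Standard atomic weights (rounded); only the elements this example needs.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
KJ_PER_KCAL = 4.184        # thermochemical calorie
AVOGADRO = 6.02214076e23   # particles per mole
ERG_PER_JOULE = 1e7


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C6H6' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def convert_energy(value, units):
    """Offer alternative units for an energy tagged in kcal/mol."""
    if units != "kcal/mol":
        raise ValueError(f"unsupported units: {units}")
    kj_per_mol = value * KJ_PER_KCAL
    erg_per_molecule = kj_per_mol * 1000 / AVOGADRO * ERG_PER_JOULE
    return {"kJ/mol": kj_per_mol, "erg/molecule": erg_per_molecule}


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for el in root.iter("compound"):
        formula = el.get("formula")
        print(el.text, formula, round(molecular_weight(formula), 2))
    for el in root.iter("energy"):
        print(convert_energy(float(el.text), el.get("units")))
```

The text a human reads is unchanged; the extra attributes simply give software something to compute with.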
Initiatives supported by both Netscape and Microsoft are extending the web beyond HTML's fixed set of tags through XML (Extensible Markup Language), a simplified form of SGML that lets a community define its own tags. The first example of XML was actually developed in the field of chemistry: CML, the Chemical Markup Language. A proof of concept of CML already exists, with a definition of the tags and a working browser.
Once the article is written and saved in an appropriate format, the author simply mounts the files (either HTML or PDF), and any associated graphics or interactive media, on a web server, which makes the document available to the public.
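As a rough illustration of how little "mounting the files on a web server" requires, here is a minimal sketch using the static-file server in Python's standard library (Python 3.7 or later). The function name and the example directory name are hypothetical. Every file in the chosen directory, whether HTML, PDF, images, or data, becomes retrievable by anyone who can reach the machine.

```python
import functools
import http.server


def make_article_server(directory, port=8000):
    """Return an HTTP server that publishes every file under `directory`.

    Files placed there become available at http://<this-host>:<port>/<filename>.
    """
    # SimpleHTTPRequestHandler serves static files; the `directory` keyword
    # selects which folder is exposed instead of the current working directory.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    # ThreadingHTTPServer handles each request in its own thread.
    return http.server.ThreadingHTTPServer(("", port), handler)
```

Calling `make_article_server("my_article").serve_forever()` publishes a folder named `my_article`; the shell equivalent is `python -m http.server 8000 --directory my_article`. Python's documentation cautions that this server is meant for simple use rather than production, so a journal or institution would normally run a full web server instead.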
The web therefore can be viewed, in a sense, as everyone's personal vanity press. For most scientists, this perspective is anathema. Scientists demand quality control and peer review. If the web ends up as a realm where everyone places articles without peer review, there is fear that too much garbage will be available, that it will be difficult, if not impossible, to locate the "good" science, and that a giant step backwards will have been taken. One of the strong arguments in favor of the journal system is that it provides a framework for peer review, thereby assuring readers that the contents have had at least some scrutiny before publication.
There is no reason to believe that the peer review system must perish in an electronic environment. Furthermore, even with the ability to easily and readily "self-publish" via the web, it is not a given that the journals will cease, especially if there is a demand for the value-added services a journal can provide. The key then is to recognize what the functions of a journal are and how these may be provided in an electronic medium.
An Aside — The Present State of Electronic Journals in Chemistry
Many electronic chemistry journals are available, but the vast majority take little advantage of the opportunities the medium provides. As of early 1998, most electronic journals are simply electronic editions of their print counterparts. In fact, virtually all are offered in both print and electronic form. Because the publishers treat the print version as the authoritative source, the electronic version is simply a duplicate of the print. Articles are usually made available as PDF files, though some journals offer HTML versions as well.
The key point here is that there is almost no enhancement of the publication to take advantage of the electronic medium. The American Chemical Society journals do include hyperlinks to other articles published by the ACS, so one can quickly obtain an article referenced in the original (assuming, of course, that the reader subscribes to both journals). Articles generally appear electronically before the print versions do, so access to the information is improved. But in general, interactive tools, 3-D structures, fully interactive spectra, color graphics, and animations are not included in any of these electronic versions. For most journals, there is no procedure for authors to include electronic enhancements. Some electronic journals that are counterparts of print journals do allow these enhancements to be deposited with the supplementary materials, which are made available through the web. However, the enhancements are still not directly incorporated within the articles.
Only a handful of electronic chemistry journals take advantage of the enhanced presentation possible in electronic publication. The earliest was the Journal of Molecular Modeling, which includes color graphics and manipulable structures. The Chemical Educator has published articles that include interactive Java applets to demonstrate new educational techniques. The recently launched Internet Journal of Chemistry offers authors extensive opportunities to publish enhanced articles and has begun to incorporate some aspects of chemical markup within its articles.
A Model for Electronic Journals
We must recognize that in an electronic publishing regime, it is not necessary for the journal to be the distributor of the information. In the print world, this is essential; the journal collects the articles, prints them on paper, binds them, and ships them to subscribers.
On the Internet, it is irrelevant where a document resides and how it gets delivered to a reader. As long as the network is active and the server is functional, documents will be delivered. A journal publisher may find it useful to provide a central repository of documents and to function as a long-term archive, an issue addressed in other chapters of this book. But physical transfer of files from an author to a central computing facility is not necessary.
In fact, a central repository may be inefficient. If the central system goes down, the entire journal is gone. Mirror sites can be arranged to provide redundancy, but a distributed collection of documents can additionally take advantage of local resources, e.g., large storage capacity or specialized computing needs for certain applications embedded within an article.
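The redundancy that mirror sites provide amounts to a simple rule on the reader's side: if one copy is unreachable, try the next. This is a minimal sketch, assuming a list of sites that each hold the same files. The function name and the `example.org`/`example.edu` addresses are hypothetical, and the actual download is passed in as a function so the logic can be exercised without a network.

```python
from typing import Callable, List, Sequence


def fetch_with_mirrors(path: str, sites: Sequence[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Try each site in order and return the first successful response."""
    errors: List[str] = []
    for base in sites:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all sites failed:\n" + "\n".join(errors))
```

With real downloads, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=10).read()`, and `sites` might be `["https://journal.example.org", "https://mirror.example.edu"]`. A journal spread across many servers degrades piece by piece rather than disappearing all at once.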
So what benefits should a journal provide in electronic media? First, peer review and quality control remain the central functions. Journals will provide the "stamp of approval" indicating that the article has been judged by experts. Second, the journal provides a collection of these accepted articles that together provide a continuously evolving picture of the state of a discipline. A single, isolated scientific article has little context and little value. A collection of articles, like this book for example, fleshes out a topic and provides a broader context for evaluating the impact and importance of the work. Thus, even though the documents may be geographically distributed over many sites, the journal offers a collective index that maps onto the Internet. In other words, journals become overlays to the collective content of the Internet, a road map of approved content.
Thus the journal can provide a table of contents, author and subject indices, classification schemes, hyperlinks through the citations, and search engines across the contents. Searches could be conducted within an article, across a range of articles, or across the entire journal. Search tools can be quite complex and specific to each discipline. In a sense, the journal becomes a database, with articles as its content. It is the organizational structure that the journal provides that is the key here. If the Internet becomes a complete world of "self-publication" with no journals, then navigation and location of relevant materials will become horrifically time- and resource-consuming. But the journal home page can be the stepping-off point towards locating information.
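The notion of a journal as an overlay, an index that points at articles wherever they live, can be sketched as a small data structure. In this minimal sketch all class, field, and method names are hypothetical: accepting an article records only its metadata and URL, and from that record the journal maintains a table of contents, author and subject indices, and a naive keyword search. A real journal would use discipline-specific search tools, as noted above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    title: str
    authors: List[str]
    subjects: List[str]
    url: str            # wherever the accepted article actually lives
    abstract: str = ""


class OverlayJournal:
    """A journal as an index over articles hosted anywhere on the Internet."""

    def __init__(self) -> None:
        self.contents: List[Entry] = []  # table of contents, in order of acceptance
        self.by_author: Dict[str, List[Entry]] = defaultdict(list)
        self.by_subject: Dict[str, List[Entry]] = defaultdict(list)

    def accept(self, entry: Entry) -> None:
        """Record a peer-reviewed article; its files stay on the author's server."""
        self.contents.append(entry)
        for name in entry.authors:
            self.by_author[name].append(entry)
        for subject in entry.subjects:
            self.by_subject[subject.lower()].append(entry)

    def search(self, *words: str) -> List[Entry]:
        """Return entries whose title or abstract contains every query word."""
        terms = [w.lower() for w in words]
        return [e for e in self.contents
                if all(t in f"{e.title} {e.abstract}".lower() for t in terms)]
```

Nothing in the index requires the articles to be on the journal's own computer; the `url` field is the only link between the road map and the content.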
Third, the journal can provide a mechanism for subscribers to pose questions and comments on the article to the author and to the other readers. These questions and answers can be attached to the article, enabling readers to move quickly from the text to comments and back. In a sense, a discussion group or mini-conference can be created for each article, providing a "living document" environment for every work.
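A per-article discussion of this kind needs little more than comments that know which article, and which part of it, they refer to. The following is a minimal sketch with hypothetical names throughout; the `section` field stands in for whatever anchor a journal might use to tie a comment to a specific place in the text, so a reader can jump from a passage to its discussion and back.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    author: str
    text: str
    section: Optional[str] = None  # hypothetical anchor id within the article
    replies: List["Comment"] = field(default_factory=list)

    def reply(self, author: str, text: str) -> "Comment":
        """Attach a reply that refers to the same section as its parent."""
        r = Comment(author, text, self.section)
        self.replies.append(r)
        return r


@dataclass
class LivingArticle:
    url: str
    comments: List[Comment] = field(default_factory=list)

    def comment(self, author: str, text: str,
                section: Optional[str] = None) -> Comment:
        c = Comment(author, text, section)
        self.comments.append(c)
        return c

    def discussion_of(self, section: str) -> List[Comment]:
        """Top-level comments attached to one section of the article."""
        return [c for c in self.comments if c.section == section]

    def count(self) -> int:
        """Total number of comments and replies in the discussion."""
        def walk(cs: List[Comment]) -> int:
            return sum(1 + walk(c.replies) for c in cs)
        return walk(self.comments)
```

Threaded replies let the author answer a question in place, so the article and its discussion grow together.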
The distributed medium also suggests some novel methods for peer review. Articles can be deposited in a preprint archive and made available to the community. This archive can collect comments and reviews from the community at large instead of the one or two referees typically employed in the print review process. Allowing community review may provide a broader stamp of approval and increase access to less traditional work that may have a difficult time in the typical review process. Perhaps, once an article generates a critical mass of favorable comments from qualified reviewers or subscribers, it is accepted by a journal.
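One way to make "a critical mass of favorable comments from qualified reviewers" precise is a simple counting rule. The sketch below assumes a hypothetical threshold of five, counts only each qualified reviewer's most recent opinion, and additionally requires favorable opinions to outnumber unfavorable ones. How reviewers are judged qualified, and what threshold is appropriate, are policy questions the sketch does not answer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Review:
    reviewer: str
    qualified: bool  # hypothetical flag, e.g. a recognized expert or verified subscriber
    favorable: bool


def ready_for_acceptance(reviews: Iterable[Review], threshold: int = 5) -> bool:
    """True once enough distinct qualified reviewers currently favor the article."""
    latest: Dict[str, bool] = {}
    for r in reviews:
        if r.qualified:
            latest[r.reviewer] = r.favorable  # a later review replaces an earlier one
    favorable = sum(1 for ok in latest.values() if ok)
    unfavorable = len(latest) - favorable
    return favorable >= threshold and favorable > unfavorable
```

Counting distinct reviewers rather than raw comments keeps a single enthusiastic reader from pushing an article over the line alone.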
It is clear that electronic publication can serve the scientific community in new ways. The key to publication is information distribution, and electronic publication facilitates this in many ways, some quite novel. Electronic distribution is likely to be less expensive than print. Access by scientists around the world is likely to be much greater and easier. Documents can be made available in less time. Information content is boosted by the electronic medium, allowing publication of audio, video, large data sets, and interactive tools. The tyranny of page limits comes to an end, and while conciseness will remain next to godliness, the advantage of allowing all quality work to be accepted surely outweighs the disadvantage of some verbose contributions. Peer review can be made more inclusive and can empower the community as a whole.
The future of electronic publication holds out hope for a true information revolution.