Books, camera, action
Toward a universal electronic library

Scanning a 1910 collection of Jesuit documents at the Boston Public Library. Photograph: Gary Wayne Gilbert
On November 14, 2007, a 160-year-old volume, The History of the Jesuits, by Andrew Steinmetz, left Boston College’s O’Neill Library in an interlibrary loan truck headed for the Boston Public Library. It arrived in Copley Square later that day and was transferred, along with books from other area libraries, into a freight elevator and carried up to Room 212, a stifling hot warren full of the sounds of fans whirring, pages turning, and electronic equipment clicking and shifting.
Room 212 is the home of the Northeast digital scanning center run by the nonprofit Internet Archive. Steinmetz’s History was placed on a glass-covered, V-shaped scanning bed, and a young Bostonian spent more than an hour flipping through all 512 of its pages as two high-end digital cameras captured them from above. A technician in San Francisco next formatted the raw images, and a book that 72 hours earlier could not be checked out from some of the country’s largest libraries suddenly became available for viewing and downloading by anyone with an Internet connection.
Steinmetz’s History (the author, who lived in England, also wrote a novel, The Jesuit in the Family) was among the first batch of books from Boston College to sit for the archive’s cameras. Since that day, the University’s library system has sent 186 items from its collections to Room 212. It has done so as a member of the Open Content Alliance (OCA), a consortium of some 100 academic and cultural institutions, whose modest aim is to create, in digital form, a universal public library—multilingual and multimedia—that is nonprofit and open to all.
Ever since words were put to papyrus, humans have dreamt of, and tried to build, a universal library. The library at Alexandria is reputed to have held between 30 and 70 percent of the world’s books in the first few centuries A.D. In the 16th century, the Swiss physician Conrad Gesner, whose catalogue of animals earned him the honorific “father of zoology,” compiled the Bibliotheca universalis, which he claimed listed every work written since antiquity in Hebrew, Greek, or Latin (he included some 1,800 authors).
Fast forward to the digital age, when, in 1971, Project Gutenberg was launched by a student at the University of Illinois. Manned by volunteers, it marked the first attempt at a large-scale digital library, scanning books and documents in the public domain (beginning with the Declaration of Independence). Project Gutenberg has made available more than 25,000 books, from Manual of Surgery (1915) to Aesop’s Fables, but funding is limited and its volunteers produce only about 340 books a month, working in their spare time. Perhaps the largest nonprofit digitization effort to date has been the Universal Library, established by Carnegie Mellon University. Partnering with institutions overseas, notably in China, India, and Egypt, the Universal Library has digitized more than 1.5 million volumes since 2002, in more than 20 languages (“a little more than 1 percent of all of the world’s books,” according to its website). Many of its volumes, however, are protected by copyright, which means that less than 10 percent of their content can be tapped for free.
On the dot-com side, Amazon began in 2003 to digitize certain books in its catalogue so that potential buyers could browse them in snippets, and more than 170,000 books are available in full through the company electronically, at a price. But it is Google, in partnership with 20 of the world’s largest libraries (from the New York Public Library to the collections at Oxford University and the University of Michigan), that has undertaken the most ambitious assault on the daunting task of creating a universal digital library. Since 2004, Google has been scanning books at no charge to their institutional owners, and the company believes it can digitize the world’s 32 million books (Google’s estimate, derived from the online database WorldCat) in just 10 years. The Google Books Library Project, as it is called, has drawn criticism from librarians, however, on several grounds. The first, and probably foremost, is its maverick approach to copyright law. Google scans some copyrighted books that are out of print and makes them freely available for searching, and for fragmentary reading, until an author requests that the scan be removed. Second, the company requires that all scanned materials be indexed exclusively on the Google search engine. A search for The Great Gatsby on Yahoo, Ask.com, or any other search engine won’t turn up Google’s digitized editions. On October 28, 2008, a settlement was tentatively reached in lawsuits against Google brought by a group of U.S. publishers and the Authors Guild. It provides for compensation to authors and publishers when their books are viewed online, and it now awaits court approval.
“Google was a sexy idea,” says Paul Nguyen, who is director of the Northeast scanning center. But “what’s the best way to make the product robust? Make it free and open,” he says.
The OCA, to which Boston College belongs, is the creation of Brewster Kahle (pronounced “kale”), a dot-com entrepreneur who sold his text-searching system to AOL in 1995 and his web traffic–analysis tool to Amazon in 1999. Kahle founded the Internet Archive in 1996 with the public-spirited goal of providing “Universal Access to All Knowledge” (the phrase is the title of a 2004 speech he gave at the Library of Congress, an institution with which the Internet Archive has ongoing projects). The archive now operates 14 scanning centers spread across the United States and four more abroad, in Canada, England, Scotland, and Guatemala. It also maintains a “digital time capsule” made up of screen captures of websites (for instance, the New York Times on September 11, 2001) and one of the Internet’s largest free collections of recorded live music (5,873 Grateful Dead listings alone).
Initially supported in 2005 by several technology giants, including Yahoo and Microsoft (Microsoft left the project in May 2008), and by the Alfred P. Sloan Foundation, the OCA represents an attempt to provide a nonprofit, open-access alternative to Google that avoids issues of commercialization and corporate control. In addition to major university libraries (Johns Hopkins, Columbia, Chicago, Texas, North Carolina, to name a few) and special collections (the Bancroft Library, the Smithsonian Institution) in the United States, the OCA counts among its international partners the British Library and the National Library of Australia. Regional members affiliated through the Boston Library Consortium include Boston University, Brown, Brandeis, MIT, Tufts, UMass, Wellesley, and Williams. The OCA charges participating libraries a scanning fee of 10 cents per page, still far less than it would cost Boston College to do the work on its own. Even so, librarians involved in the alliance say it has thus far trumped Google’s effort in several respects. It scans only out-of-copyright books (mainly, books published at least 85 years before the current year), and it allows any search engine, including Google, to link to its digital volumes.
In Room 212, which sits just off a hallway that connects the Boston Public Library’s 19th-century McKim Building and its 1970s expansion, the scanning facility with its 10 digitizing stations has been in operation since August 2007, 16 hours a day, five days a week. Turnover among employees is high, as the production rate—500 pages per hour, 4,000 per day, 50,000 per month—leaves no time for pleasure reading.
Collectively, the membership of the OCA has contributed more than 400,000 items to the Internet Archive, ranging from the personal library of John Adams (courtesy of the Boston Public Library) to contemporary accounts of the Gold Rush provided by the University of California at Berkeley. “This project is truly revolutionary,” says Boston College librarian Christine Conroy. “This is Reading 2.0.”
At its campus in Chestnut Hill, in a back room on the second floor of O’Neill Library, Boston College has been running a small in-house digitization effort for eight years (its first undertaking was the Ratio Studiorum, the founding document of Jesuit education). Manpower and equipment costs have limited the University’s digitization to student theses, dissertations, and a handful of books on Boston College history. Through the OCA, however, the University expects to digitize 1,000 volumes a year—with the ultimate goal of having its entire 2.4 million-volume collection accounted for.
The goal is more readily attainable than one might expect. Because nearly every library has multiple copies of, say, Moby Dick, Boston College does not need to scan its own 27 volumes: The OCA has three copies online (Google has close to 5,000). As for copyrighted materials, BC’s librarians expect a solution will emerge—perhaps ushered in by Google’s recent tentative legal settlement—that will result in even new books being made available at little or no cost. Boston College is acting preemptively on one front: It now acquires many freshly published books in digital form.
The University continues to focus on digitizing materials in its holdings that are rare, or at least scarce, starting with its collection of more than 2,500 volumes of Jesuitana (the most downloaded Boston College book on the OCA website recently was a Jesuit document from 1925 whose Latin title translates as “Letters sent to Rome four times a year from all the places—except India and Brazil—where any men of the Society of Jesus are engaged”). Though some of these materials are available in other libraries, many, including original letters from St. Francis Xavier, have previously been inaccessible to all but visitors to BC’s special collections.
The University librarians estimate that Boston College’s total collection could be online in 10 to 15 years, preserved from time’s ravages, cut loose from the disintegration of paper and the fraying of pages. “This is a way of buying the future of our own collection,” said Jerome Yavarkovsky, former head of the Boston College Library system, in an interview last summer. Yavarkovsky served the University from 1995 to 2008, and the digitization process began on his watch. “What we are doing,” he said then, “is not just preserving, but ensuring the material is freely and openly available to anyone.”
Reeves Wiedeman is a reporting intern at the Chronicle of Higher Education.

