Digitizing the Leffingwell Scrapbooks:
A Study in Complex and Fragile Materials

Where do you start when you want to digitize a collection of complex and fragile materials?

The Lawrence and Lee Theatre Research Institute at The Ohio State University Libraries holds, in its Leffingwell Collection, scrapbooks that document American theatre performances from the late 19th and early 20th centuries. These scrapbooks include loose items, folded sections, and acidic paper mounted on base pages of nearly transparent tissue.

The project of digitizing these scrapbooks presented numerous challenges, from choosing which volumes should be converted to deciding how to handle long newspaper clippings, loose bindings, and detached covers, as well as brittle, creased, and dog-eared tissue pages.

In this webinar, recorded April 26, 2017, Nena Couch and Emily Shaw, of The Ohio State University Libraries, and Courtney LoPresti, of Backstage Library Works, describe the process of selecting and preparing these materials for digitization and discuss decisions that were made before and during the digital imaging process.



How did you deal with multiple layers of information on a single scrapbook page?

OSU: You have to make a decision early regarding the level of detail that you want to capture. For the Leffingwell collection, we captured everything, but the project was more expensive as a result. Getting 100% capture is what led the project to exceed the initial page estimate. For future projects, we might choose to just have a straight 2-up capture. That would cost less, but clippings and inserts that stick out past the edge of the page might be cut off. If you do want to capture everything—and also want to have accurate pricing—make sure to turn every page when counting and work closely with your vendor.


Are the scrapbooks presented as a single volume (e.g., a PDF with several pages)? Or is each page a single image file?

OSU: In the Leffingwell collection, the images are displayed as PDFs. Some volumes are large enough that they've been divided into sections, with each section represented by a PDF file.

Backstage: PDF is one way to virtually bind multiple pages together as a single volume. The Flash-based PDF FlipBook that one attendee mentioned in the webinar chat is another way. Your digital asset management system may have other ways to keep a collection of images together in a set.

As Backstage organizes the scans, we group the master TIFF image files from a single volume together in a folder and name the files with a sequential counter. That facilitates sorting and conversion into PDF volumes or whatever other grouping tool you decide to use, now or in the future.
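
As an illustration of the folder-per-volume, sequential-counter arrangement described above, here is a minimal sketch. The naming pattern, the `masters` root folder, and the `scrapbook_v01` identifier are assumptions made for the example, not the file names actually used in the project.

```python
from pathlib import Path
from typing import List


def master_filename(volume_id: str, sequence: int, digits: int = 4) -> str:
    """Build a name such as 'scrapbook_v01_0007.tif' (hypothetical pattern)."""
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    # Zero-padding keeps alphabetical order identical to page order.
    return f"{volume_id}_{sequence:0{digits}d}.tif"


def plan_volume(root: Path, volume_id: str, image_count: int) -> List[Path]:
    """Return the ordered master-file paths for one volume's folder."""
    folder = root / volume_id
    return [folder / master_filename(volume_id, n) for n in range(1, image_count + 1)]


if __name__ == "__main__":
    for path in plan_volume(Path("masters"), "scrapbook_v01", 3):
        print(path.as_posix())
```

The zero-padded counter is the design choice that matters here: a plain alphabetical sort then returns the files in page order, which is what most PDF-assembly or ingest tools rely on.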

OSU: We typically prefer platforms that our IT team is more likely to be able to support into the future. Flipbooks offer rich access features but may stop working properly as systems are upgraded and changed. That kind of access also requires very robust and detailed metadata, which we do not currently have for the Leffingwell collection.

Do you save the master image file?

OSU: At Ohio State, we preserve all of the master TIFF files and all of the PDFs. We may not preserve the PDFs forever, depending on available file types and OCR options in the future.

Backstage: Our cameras output the original images as TIFF files. We consider these TIFF images the master files and recommend preserving the TIFFs, regardless of what derivative formats you decide to produce for access and display.

For preservation purposes, the Federal Agencies Digital Guidelines Initiative (FADGI) recommends the TIFF format for still images from all types of source materials. Under FADGI, JPEG 2000 and PDF/A are also acceptable master file formats for some material types.


Some of our scrapbooks have newspaper clippings glued in overlapping rows. To even look at the clippings risks breaking the fragile newsprint. How would you recommend dealing with those?

Backstage: Providing access to special collections can sometimes put the originals at risk. As you note, that's true whether you're trying to digitize something or simply read it. You have to weigh the value of making that content available against the value of preserving the original. Those aren't easy decisions.

You might decide to dismount the glued-in pieces. How you handle that depends on the condition of all the paper types involved and what adhesives were used. At that point, you're really looking at conservation questions, and someone would have to examine the materials to make a recommendation.

You could decide to digitize the pages without folding or removing any of the overlapped clippings. That's a simpler route, but the trade-off is that the overlapped content will remain hidden. On the other hand, the originals are still intact, if you find there's a reason to take a closer look in the future.


What is the make and manufacturer of the scanner?

Is the capture device proprietary or what make/model was used?

Backstage: We don't prescribe a one-size-fits-all method for our clients, so the equipment used could change, depending on the materials being scanned. We'd be happy to discuss equipment options in detail for a specific project.

Generally speaking, our studio uses planetary-mounted (overhead) medium-format digital cameras in the 60 to 100 megapixel range. We use media cradles that are custom-built to our specifications. For the Leffingwell project, based on the condition the material was in and the fact that the books were bound and needed to remain bound, we used a cradle that only opens the materials up to 120 degrees.

We do not use flatbed scanners. We find that we get better results and faster workflows from a camera's shutter capture.

We wouldn't recommend a flatbed for this type of material, in any case. Gravity would be your enemy in repeatedly turning a scrapbook over. Loose elements could fall out. Managing fold-outs would be problematic. Dimensional objects such as pins and flowers would be difficult to capture, given a scanner's limited depth-of-field. And the book's spine would almost certainly be damaged by the repeated flattening motion required to get a good scan.


What's the maximum size of the item for this process? We have a few scrapbooks that are the size of coffee tables.

Backstage: Accommodating larger objects raises two issues: physical structural support and digital resolution.

First, you need to physically support the materials. We use an oversized cradle to hold very large objects for digital imaging. We can make further adjustments with additional foam and boards to properly support whatever size is needed.

Second, you need to capture the entire image at the desired resolution. Looking at the size of the capture area, we can determine whether to use a camera with a larger sensor array. In bound materials, we can capture one page at a time, instead of taking in a two-page spread.
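
The trade-off between capture area and resolution can be worked through with simple arithmetic. The sketch below is illustrative only: the sensor pixel dimensions, page size, and target resolution are hypothetical values, and it assumes the frame can be oriented either way over the item.

```python
def effective_ppi(sensor_px_w, sensor_px_h, area_w_in, area_h_in):
    """Pixels per inch achieved when one frame covers the whole capture area.

    Both frame orientations are tried; the better one is returned.
    """
    landscape = min(sensor_px_w / area_w_in, sensor_px_h / area_h_in)
    rotated = min(sensor_px_h / area_w_in, sensor_px_w / area_h_in)
    return max(landscape, rotated)


def choose_capture(sensor_px, page_in, target_ppi):
    """Pick '2-up', 'single page', or 'panels' for a bound volume."""
    px_w, px_h = sensor_px
    page_w, page_h = page_in
    # A two-page spread is twice the page width.
    if effective_ppi(px_w, px_h, 2 * page_w, page_h) >= target_ppi:
        return "2-up"
    if effective_ppi(px_w, px_h, page_w, page_h) >= target_ppi:
        return "single page"
    # Neither fits in one frame at the target resolution.
    return "panels"


if __name__ == "__main__":
    # Hypothetical sensor of roughly 100 megapixels and a 20 x 30 inch page.
    print(choose_capture((11600, 8700), (20, 30), target_ppi=300))
```

With these assumed numbers, a full spread reaches only 290 ppi, so the sketch falls back to one page per frame, which matches the approach described above for very large bound materials.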

Would you break a folded document at the creases in order to be able to capture the entire thing?

Backstage: With maps and similar, very large documents, we sometimes capture an image in panels, then digitally stitch the images together in post-processing. We don't necessarily break the images at the creases for that stitching process. You need to have some overlap between one image and the next to optimize the alignment.
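
To show why the overlap matters when planning panels, here is a minimal sketch that works out how many frames are needed along one dimension of a large document. The document length, frame length, and minimum overlap are hypothetical values; this is planning arithmetic only and does not perform any image stitching.

```python
import math


def panel_count(doc_len, frame_len, min_overlap):
    """Frames needed to cover doc_len when neighbours share at least min_overlap."""
    if min_overlap >= frame_len:
        raise ValueError("overlap must be smaller than the frame")
    if doc_len <= frame_len:
        return 1
    step = frame_len - min_overlap  # new ground covered by each extra frame
    return 1 + math.ceil((doc_len - frame_len) / step)


def panel_starts(doc_len, frame_len, min_overlap):
    """Start positions spaced evenly so the last frame ends at the document edge."""
    n = panel_count(doc_len, frame_len, min_overlap)
    if n == 1:
        return [0.0]
    step = (doc_len - frame_len) / (n - 1)
    return [round(i * step, 2) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical 60-inch map, 24-inch frame, at least 4 inches of overlap.
    print(panel_count(60, 24, 4), panel_starts(60, 24, 4))
```

Because the panel positions come from the frame size and the overlap, they generally will not coincide with the document's creases.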

You revisited the specifications mid-project and opted to change the resolution. Does that mean that half the output comprises very large files and the other half are lower-res, or smaller size?

Backstage: The original TIFF images are all at full resolution. The decision to reduce resolution was for the PDF derivative files. This makes those files smaller and faster for a researcher to load for viewing.
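
The effect of a lower derivative resolution on file size can be estimated from pixel counts. In this sketch the page size and both resolutions are hypothetical, since the webinar answers do not state the actual values, and the sizes are for uncompressed 24-bit color; real PDF derivatives are compressed and will be smaller.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height of a page captured at the given resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_megabytes(width_in, height_in, ppi, bytes_per_pixel=3):
    """Approximate uncompressed size in megabytes (3 bytes = 24-bit RGB)."""
    w, h = pixel_dimensions(width_in, height_in, ppi)
    return w * h * bytes_per_pixel / 1_000_000


if __name__ == "__main__":
    # Hypothetical 12 x 16 inch page: master at 400 ppi, derivative at 150 ppi.
    master = uncompressed_megabytes(12, 16, 400)
    derivative = uncompressed_megabytes(12, 16, 150)
    print(f"master {master:.2f} MB, derivative {derivative:.2f} MB")
```

Size scales with the square of the resolution, so even a modest reduction in ppi makes the access copy much lighter while the full-resolution master stays untouched.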


What about shipping/transfer of items to Backstage? How was that done, and what was the timeframe of that process?

OSU: This was mostly handled by Emily's predecessor. We have a conservator in house who does all of the packing to carefully transport the material.

When loaning materials to another institution, we'll use a fine art shipping company. When shipping to vendors, we often use regular shipping companies and opt for next-day service. Overnight shipping offers tracking and higher-value insurance options.

Backstage: We ask our clients to fill out a shipping manifest, for which we provide a template, and we recommend an inventory of all materials. It's easier to keep track if everyone knows what's supposed to be in the package.

In packing fragile collections, pick a good, sturdy box, and include enough packing material to hold the items securely in place. If you can hear or feel things jostling in the box, then it's too loose.

Backstage can recommend a shipping vendor to fit your collection's needs. We also offer our own van service for material pickup and return. And we can set up equipment at your location to scan your materials on-site, if the collection simply can't travel.


What are your plans for transcribing the handwritten text?

OSU: We have no plans for transcribing this set of scrapbooks, as they contain little to no handwriting. This collection is mostly made up of clippings of printed text, so OCR is sufficient. We would look at transcription for more manuscript-style materials, and we might crowdsource the work in that case.

Backstage: When you have content that can't be OCR'd, transcription is one option. We've also done projects where we used TEI metadata to flag key information like names and dates in collections with handwritten materials.


Is the metadata per page?

Backstage: Metadata can range from item-level records, to page-level descriptions, to describing individual clippings and photographs.

In this case, the digitization equipment records the technical metadata for each image. Our file naming and directory arrangement provide the structural metadata: page sequencing and volume identification.
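
As a rough illustration of how file names and folders can carry structural metadata, the sketch below reads a volume identifier and page sequence back out of a file path. The `volume/volume_NNNN.tif` layout is an assumed convention for the example, not the documented naming scheme of this project.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class PageRef(NamedTuple):
    volume: str    # volume identifier taken from the file name
    sequence: int  # position of the image within the volume


# Hypothetical pattern: '<volume>_<counter>.tif' or '.tiff'
_NAME = re.compile(r"^(?P<volume>.+)_(?P<seq>\d+)\.tiff?$", re.IGNORECASE)


def parse_master_path(path: str) -> Optional[PageRef]:
    """Return the volume and sequence encoded in a path, or None if it does not fit."""
    p = PurePosixPath(path)
    m = _NAME.match(p.name)
    # The folder name must agree with the volume prefix in the file name.
    if m is None or m.group("volume") != p.parent.name:
        return None
    return PageRef(m.group("volume"), int(m.group("seq")))


if __name__ == "__main__":
    print(parse_master_path("masters/scrapbook_v01/scrapbook_v01_0002.tif"))
```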

The existing catalog records offer description at the volume level. And the OCR text then provides searchable metadata for the content within each volume. That's not the same thing as creating detailed descriptive metadata for each page or each clipping, but it serves a similar function for discovery at a much lower cost.


I'm confused about how the estimate came in so low.

Backstage: In the case of the Leffingwell scrapbooks, it was Ohio State's decision to capture every fold-out possible—a change from the initial bid specifications—that pushed the project so far over the planned image count. The work was priced at a per-image rate, so with their decision to extend the funding, it was easy to add the extra fold-out page views to the project.

Because your institution is in possession of your materials when estimates are being made, we rely on you for the page count. If an accurate count is crucial to your budget, take time to verify the number of pages and to consider how many pages will be captured more than once to display additional layers, like the folded clippings.
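
A page-turning count of this kind can be tallied with a few lines of code. The sketch below is only an illustration: the volume names, counts, and per-image rate are invented, and the rate is kept in whole cents to avoid rounding surprises.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeCount:
    name: str
    pages: int        # one capture per page (or spread) in the base count
    extra_views: int  # additional captures for fold-outs and folded clippings


def total_images(volumes: List[VolumeCount]) -> int:
    """Total captures: base pages plus every extra view."""
    return sum(v.pages + v.extra_views for v in volumes)


def estimated_cost_cents(volumes: List[VolumeCount], rate_cents: int) -> int:
    """Cost at a flat per-image rate, in cents."""
    return total_images(volumes) * rate_cents


def overage(volumes: List[VolumeCount], estimated_images: int) -> int:
    """Images beyond the original estimate (zero if within it)."""
    return max(0, total_images(volumes) - estimated_images)


if __name__ == "__main__":
    # Hypothetical counts and a hypothetical rate of 100 cents per image.
    counted = [VolumeCount("vol. 1", 200, 35), VolumeCount("vol. 2", 150, 60)]
    print(total_images(counted), estimated_cost_cents(counted, 100), overage(counted, 350))
```

Counting the extra views separately makes it easy to see how much of the total comes from fold-outs and layered clippings, and to compare a capture-everything plan against a straight 2-up capture.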

Tough question, but if BSLW went 5,000 images over estimate, what happens when a smaller institution can't find the money to complete the project, as OSU did?

Backstage: When we see that a project is in danger of going over the estimated page count, we immediately let you know. We never force a client's hand on overages.

Would Backstage put the project on hold and be willing to resume later?

Backstage: Yes, we can break a project into phases, if needed, to accommodate the availability of funding.


