School of Information - The University of Texas
INF 392K Problems in the Permanent Retention of Electronic Records - Schedule, Spring 2010

NOTE: This syllabus is preliminary until the first class meets and may change slightly through the semester if new issues come up.

January 20: Course overview: overall discussion of course, assignments, student skillsets, and some major preservation issues

Discuss student backgrounds and skills; at roll call, have students describe their skillsets. Provide resources for students who need to bring skills up to speed.

Outline the history of the iSchool repository, together with a list of the possible projects for this semester. Discuss overall schedule of work to accomplish semester projects. Student teams will be assigned to projects next week and will meet to come up with their own plan for the project and the distribution of tasks.

Lecture Topic: Overview of preservation issues in general: needs, practices, and politics.
Students will log into the iSchool DSpace repository and become e-people (if not already).

January 27: Overview of the digital preservation problem and field. Basis: Cornell tutorial

There will be an in-class quiz on the Cornell tutorial below. It will be graded.

Topic: Basic digital preservation management, as addressed in the Cornell tutorial, will be discussed and questions answered. Several issues will be discussed critically, where the tutorial does not represent the full range of information.


To prepare for class today you should "take" the Cornell tutorial on digital records preservation. This means spending at least three hours going through it, taking notes for the class discussion and the quiz, paying attention to the sidebar issues, and following the major links:
"Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems." Available at:

For some slightly corny fun, have a look at the three Team Digital cartoon videos on the Digital Preservation Europe YouTube Channel (click "see all" for the other two) here:

February 3: Archival institutional repositories and the OAIS model: distinctive characteristics of the digital archive

Students will be assigned to project teams and an outline protocol for project work will be discussed, including the steps that will be undertaken through the project and how they will coincide with class lectures.

Topic: Fortunately for the digital archiving community, there is now a widely-accepted model for the functions that a digital archives should provide: the Open Archival Information System (OAIS) model. We will look at the original OAIS specifications in some detail, and then examine MIT/HP's (now open-source) DSpace as an implementation of that model, and discuss the term "institutional repository" and what it means. Additional repository implementations will also be discussed. We will discuss a general protocol for capture and preprocessing of archival materials and identify a range of tools available for use. Ground rules for group work will be discussed.


Richard Jones, Theo Andrew, and John MacColl, The Institutional Repository (Oxford: Chandos, 2006), Chapter 3, Technologies and Technicalities. Available on e-reserves.

OAIS model: this is the most recent (2003) version of the OAIS specification (read sections 1-3; for concrete examples to make it less abstract, look at "Annex A: Examples of Existing Archives")

DSpace system documentation; read especially the section "Functional Overview." It is especially important that you become familiar with the DSpace documentation so that we can discuss how DSpace instantiates the OAIS model (or doesn't). You can download a nice printable Word version of this file here.

DSpace roadmap document, including emergent relationship with Fedora.

February 10: Reliability, authenticity, custodianship

Students will report on the progress of their projects, including the first meeting with collection creators (or custodians). We will discuss the inventory instrument(s) you will be using, the basic SIP agreement included as Appendix A to the Proposal for the Establishment of a DSpace Digital Repository (on e-reserves), and the methods you will use to review your digital materials safely so as to preserve authenticity.

Topic: Discussion of major issues related to the nature of digital objects and the nature of archives. What are we trying to preserve? What does "preservation" mean for digital objects? By this time you should be thinking of some specific problems of this kind raised by the materials you are dealing with. We will discuss the kinds of replications that are parts of the digital capture and preservation task: disk images, forensic copies, non-forensic copies, use copies, etc. Emphasis is on bitstream preservation and contextualization/documentation of the capture process.
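One way to make the bitstream-preservation point concrete is a fixity check: record a checksum when a file is captured, then compare any later copy against it. The following is a minimal sketch and not part of the course materials; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bitstream(original_path, copy_path):
    """True if both files hold exactly the same bytes (compared by SHA-256)."""
    return sha256_of(original_path) == sha256_of(copy_path)


def demo():
    """Create a file, an exact copy, and an altered copy, then compare them."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.bin")
        exact_copy = os.path.join(workdir, "copy.bin")
        altered = os.path.join(workdir, "altered.bin")
        with open(original, "wb") as handle:
            handle.write(b"captured record")
        with open(exact_copy, "wb") as handle:
            handle.write(b"captured record")
        with open(altered, "wb") as handle:
            handle.write(b"captured recorD")  # one byte differs
        return (same_bitstream(original, exact_copy),
                same_bitstream(original, altered))
```

Recording the digest alongside the date, tool, and operator is one simple way to document the capture process for a given copy.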


Luciana Duranti, "Reliability and Authenticity: The Concepts and their Implications," Archivaria 39:1-10. This is the canonical definition of the two concepts as used by archivists of the diplomatic persuasion (and us), and you need to be clear on the two concepts and the difference between them as terms of art. Available on e-reserves.

Findings on the Preservation of Authentic Electronic Records. This is the final report of the "authenticity track" of the first part of the InterPARES project. **Prepare a precis of the first 48 pages of this report.** Available at:

The State of Digital Preservation: An International Perspective, Washington, D.C.: CLIR. This is the most up-to-date (spring 2002) and generally useful discussion of what is likely to become the direction of U.S. (and maybe other people's) policy for the near term. You should read at least to page 53. Available at:

Extra issues to discuss/investigate:

LOCKSS, CLOCKSS, and peer-to-peer error-checking technology: (here's the website: )
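As a starting point for investigating peer-to-peer error checking, the sketch below shows the general idea of comparing replicas by digest and repairing dissenting copies from the majority. It is a deliberately simplified illustration and not the LOCKSS protocol itself; the peer names, the strict-majority rule, and the function names are all assumptions.

```python
import hashlib
from collections import Counter


def digest(content):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(replicas):
    """Compare replicas by digest and overwrite dissenters with the majority copy.

    `replicas` maps a (hypothetical) peer name to the bytes that peer holds.
    Returns the list of peers that were repaired. If no digest is held by
    more than half of the peers, nothing is changed.
    """
    if not replicas:
        return []
    votes = Counter(digest(content) for content in replicas.values())
    winning_digest, count = votes.most_common(1)[0]
    if count * 2 <= len(replicas):
        return []  # no strict majority: leave every copy alone
    good_copy = next(c for c in replicas.values() if digest(c) == winning_digest)
    repaired = []
    for peer, content in list(replicas.items()):
        if digest(content) != winning_digest:
            replicas[peer] = good_copy
            repaired.append(peer)
    return repaired
```

A question worth raising in discussion is what should happen when no majority exists, since this sketch simply declines to repair anything.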


February 17: Metadata and access: resource discovery and preservation

Topic: Without descriptive metadata digital objects are literally lost, and without preservation metadata they might as well be. There has been an enormous amount of attention devoted to the metadata requirements for archival digital objects: what metadata are needed, when they are generated, how they are generated. Metadata is the crucial "wrapper" that facilitates all digital archival activities and is central to the structure of DSpace. We will discuss the DSpace Dublin Core registry and the addition of METS in DSpace 1.2 as well as the emergent PREMIS standard for preservation metadata. Finally, we will look at available metadata harvesting tools and discuss a handout on metadata standards for the course, including biog/hist, scope/content, controlled vocabularies, and special format subsets.
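To give a feel for what a qualified Dublin Core record looks like as data, here is a small sketch that serializes element/qualifier/value triples as XML. The `dcvalue` layout is an assumption modeled on the form DSpace uses in its import files, and every value in the record is made up; check the DSpace Dublin Core registry for the elements and qualifiers actually available.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(fields):
    """Serialize (element, qualifier, value) triples as a <dublin_core> document."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        attributes = {"element": element, "qualifier": qualifier or "none"}
        node = ET.SubElement(root, "dcvalue", attributes)
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record: every value below is invented for illustration.
record = [
    ("title", None, "Departmental newsletter, Spring 2009"),
    ("date", "created", "2009-03-01"),
    ("description", "abstract", "Scope and content note goes here."),
    ("subject", None, "Newsletters"),
]
xml_text = dublin_core_xml(record)
```

Unqualified elements are written with the qualifier "none" here, which is a convention of this sketch that should be confirmed against the repository's own registry.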

Metadata readings if you need background:

Introduction to Metadata: Pathways to Digital Information, including Anne Gilliland-Swetland, “Setting the Stage,” Tony Gill, “Metadata and the World Wide Web,” and Mary Woodley, “Crosswalks: the Path to Universal Access?”:

Qualified Dublin Core is what DSpace supports out of the box. There is now a repository of all the papers from Dublin Core international conferences, 2001-2009. The 2009 papers include a metadata framework for manga and the 2008 papers include an article on collection-item relationships.


OAIS model, sections 4-6 and annexes

OCLC/RLG Working Group on Preservation Metadata, “Preservation Metadata and the OAIS Information Model” (2002) available at:

PREMIS preservation metadata documents:
Understanding PREMIS (2009):
PREMIS data dictionary 2.0 (2008; use for reference):

METS Primer and Reference Manual:

Metadata harvesters (TBA)


February 24: Digital archaeology and preprocessing steps; levels of service

Topic: What are the details of preprocessing steps beginning with capture and ending with ingest? What does "digital archaeology" mean and what are the techniques used for identifying and recovering digital objects that can no longer be accessed using current technology? Finally, can/should we distinguish degrees of care/effort that we expend with reference to digital records? What is the relation between cost-benefit and levels of service?


Digital forensics and bitstream copies (see slideshow "What to do with the bits?" under Handouts on the Resources page).

Resource on digital forensics and archives (TBA)

William LeFurgy, "Levels of Service for Digital Repositories," D-Lib Magazine (May 2002) **this is a central concept that needs to be addressed in order to define what preservation steps will be taken; write your precis of this article**:


March 3: Format: conversion, migration, emulation, reauthentication

Topic: What form should a digital object take for preservation purposes or during the process of delivery to a user? We'll discuss a range of options for overcoming hardware/software obsolescence and when each is appropriate. We'll also look at the emergence of file format registries and how to use them and discuss the DSpace file format registry and how DSpace detects file types. Drawing on our earlier discussion of significant properties and levels of service, we will discuss any specific needs for creation of use copies of materials in your collections before ingest begins.
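Two common ways of guessing a file's format are looking up its extension and matching the leading bytes against known signatures. The sketch below compares the two on the same file; it is illustrative only. The tiny tables and function names are assumptions, and how a given DSpace version or registry tool actually identifies formats should be checked against its own documentation.

```python
import os

# A tiny, hand-picked signature table (an assumption for this sketch; real
# registries hold far more detailed signature information).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]

EXTENSIONS = {".pdf": "PDF", ".png": "PNG", ".gif": "GIF", ".zip": "ZIP container"}


def format_by_extension(filename):
    """Guess the format from the filename extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(extension, "unknown")


def format_by_signature(leading_bytes):
    """Guess the format from the first bytes of the file."""
    for magic, name in SIGNATURES:
        if leading_bytes.startswith(magic):
            return name
    return "unknown"


def check(filename, leading_bytes):
    """Return (extension guess, signature guess, whether the two agree)."""
    by_extension = format_by_extension(filename)
    by_signature = format_by_signature(leading_bytes)
    return by_extension, by_signature, by_extension == by_signature
```

A disagreement between the two guesses, such as a PNG image saved with a .pdf extension, is the kind of case worth flagging before ingest.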


Dutch Digital Preservation Testbed White Paper, "Emulation: Context and Current Status," available at:

Dutch Digital Preservation Testbed White Paper, "Migration: Context and Current Status." **write a precis of this article** Available at:

Phil Mellor, Paul Wheatley, and Derek Sergeant, "Migration on Request, a Practical Technique for Preservation," CaMiLEON report, available at:

Investigate the GDFR and PRONOM file format registries and find out how they differ; also investigate the JHOVE and DROID file format validation tools.

March 10: Digital genres and their significant properties

Topic: Different "genres" of electronic records (email, webpages, databases, etc.) represent different bundles of affordances, necessitating different strategies for preservation and different "significant properties" to be considered in devising those strategies. Should all properties be preserved? Should only "significant properties" be provided for access? Discuss strategies: bitstreams as authenticity guarantors and starting place for serious study; use copies as digital library fodder; making readers and other tools available.


Gregory Lawrence, William Kehoe, Oya Rieger, William Walters, and Anne Kenney, Risk Management of Digital Information: A File Format Investigation, CLIR Report # 93, 2000. **precis this reading; pay attention to just what "risk management" means in this context** Available at:

Margaret Hedstrom, Christopher Lee, "Significant properties of digital objects: definitions, applications, implications," (in Proceedings of 2002 DLM-Forum):

Review the eleven presentations given at the Digital Preservation Coalition's workshop, "What to Preserve: Significant Properties of Digital Objects," in April 2008, and be prepared to discuss what significant properties you think you will be concerned with in your projects and how you might approach them:

SPRING BREAK March 15-20

March 24: Logical models: how to structure digital collections ("arrangement")

Topic: Discuss the OAIS and other logical models for the sake of features that they might add to OAIS/DSpace. Discuss the structure of collections in DSpace and how the DSpace object model can be used to advantage in creating virtual collections. Discuss student progress with research on areas of expertise. Discuss order as received (creator order) vs virtual orderings (interpreted order[s]).


CEDARS/OAIS model (two documents):

Kelly Russell, "Digital Preservation and the Cedars Project Experience," available at:

CEDARS final report: **write a precis of sections 3-5 of the full report**--go to this page:
click on "Homepage archived 11 Jan 2005," which will show you the archived CEDARS site; click on Publications and Conferences, then scroll down to "The Cedars Project Report, April 1998 to March 2001", click on it, and download the full pdf. You will note that this is a digital archiving website archive archived for the Brits by the Internet Archive--and most of the site works.

DSpace as real and virtual model

Review the DSpace data model in the DSpace documentation

Patricia Galloway, "Representing Archival Descriptive Metadata in a DSpace Environment," linked here; also "Order as Received: Constructing an Initial Virtual Order for Digital Objects," linked here

March 31: Producer-Archive interface: set up DSpace communities and collections

Topic: We will set up and discuss appropriate collection structures in DSpace for your materials and review the details of "ingest," both as envisioned in OAIS and as implemented in DSpace as a manual process. Deconstructing DSpace and using it to instantiate/represent archival materials.


ERPANET, ERPA Guidance: Ingest Strategies (ERPANET, September 2004). This document will provide guidance to each team in deciding on the overall strategies/templates for their collection creation. **precis this whole document, including the appendix** Available at:

Producer-Archive Interface Methodology Abstract Standard, CCSDS 651.0-R-1 (this is the OAIS Ingest document from the CCSDS, in the final "Blue Book" format from May 2004). Available at:
You need to read this document carefully for a broader picture of the process than is represented in the ERPANET document, to be sure you can contextualize the DSpace version of the process adequately.

DSpace 1.2 system documentation: Ingest Process and Workflow (under Functional Overview)

April 7: Management/authentication structure: communities, groups, e-people, collections (lab)

Each team will describe briefly and present schematically its management policy for its designated community. Teams will have consulted the proposed policy document on pacer, which covers overall policy for the iSchool repository, as a context for their policy development.

Topic: Presentation of DSpace management interface and elements of the DSpace authentication system. Setting up groups and levels of access. Discussion of issues of closed vs open collections and how to assure the desired outcome.


See DSpace system documentation under "Functional Overview" if you have not already done so (and even if you have).

Also review Chapter 4 in The Institutional Repository.

Finally, there is a (slightly old) policy document from MIT that is worth review:

April 14: Ingest test set of documents (lab)

Students will have a test set of files and appropriate metadata ready to ingest into DSpace and will describe to the class how they chose the files and what specific problems the files raise.

Topic: Step through the manual ingest process.


DSpace system documentation: Ingest workflow (this is part of the document mentioned for last week).

April 21: Ingest remaining collection, test (lab)

Topic: Additional manual ingest and/or preparation of batch ingest directory trees.
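For teams preparing batch ingest directory trees, the following sketch writes one item directory. It assumes the layout of DSpace's Simple Archive Format as commonly described (one directory per item holding a dublin_core.xml file, a contents file listing the bitstreams, and the files themselves); confirm the details against the documentation for the DSpace version in use. The function name, item name, and sample values are hypothetical.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def write_item(archive_dir, item_name, title, bitstreams):
    """Write one item directory: dublin_core.xml, contents, and the files.

    `bitstreams` maps a filename to the bytes to store under that name.
    """
    item_dir = os.path.join(archive_dir, item_name)
    os.makedirs(item_dir)
    metadata_path = os.path.join(item_dir, "dublin_core.xml")
    with open(metadata_path, "w", encoding="utf-8") as out:
        out.write("<dublin_core>\n")
        out.write('  <dcvalue element="title" qualifier="none">%s</dcvalue>\n'
                  % escape(title))
        out.write("</dublin_core>\n")
    # The contents file lists one bitstream filename per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as out:
        for filename in sorted(bitstreams):
            out.write(filename + "\n")
    for filename, data in bitstreams.items():
        with open(os.path.join(item_dir, filename), "wb") as out:
            out.write(data)
    return item_dir


def demo():
    """Build a one-item archive in a temporary directory and list its files."""
    with tempfile.TemporaryDirectory() as archive_dir:
        item_dir = write_item(archive_dir, "item_000", "Test memo",
                              {"memo.txt": b"hello"})
        return sorted(os.listdir(item_dir))
```

Generating the tree from an inventory spreadsheet rather than by hand keeps the metadata and the file list in step with each other.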

April 28: Complete any remaining tasks

Topic: Whatever comes up

April 30: Society of Southwest Archivists session on digital archiving, Santa Fe

May 5: Summative discussion

Class evaluation to be done after this class (online).

Project team formal presentations, 10:00-12:00 (i.e., 15 minutes allotted to each project; all team members should participate in some way).

Note: We will invite all the collection custodians to attend your presentations; treat them as you would a capstone presentation.

Informal discussion of lessons learned, what else/more/different should be done, etc.

Final project report due (ingested to the pacer repository).