Preservation Standards and Digital Policy Enforcement


Preservation Standards and Audit and Certification Mechanisms Question

What types of policies would you expect to be enforced on a digital repository, based on the emerging Trustworthiness assessment criteria? What types of additional policies would you expect to find related to administrative or management functions?

Citation

Ward, J.H. (2012). Doctoral Comprehensive Exam No.4, Managing Data: Preservation Standards and Audit and Certification Mechanisms (e.g., “policies”). Unpublished, University of North Carolina at Chapel Hill. (pdf)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Note: All errors are mine. I have posted the question and the result “as-is”. The comprehensive exams are held as follows. You have five closed-book examinations five days in a row; one exam is given each day. You are mailed a question at a set time. Four hours later, you return your answer. If you pass, you pass. If not…well, then it depends. The student will need to have a very long talk with his or her advisor. I passed all of mine. — Jewel H. Ward, 24 December 2015

Preservation Standards and Audit and Certification Mechanisms Response

The CCSDS’ “Audit and Certification of Trustworthy Digital Repositories” (2011) describes policies in terms of (1) the technical framework, (2) the organizational framework, and (3) the digital object itself. These policies may be applied and enforced manually (by humans) or at the machine level (by computers using computer code). Some of the policies required for a repository to be considered a Trusted Digital Repository (TDR) are also required for the day-to-day management of the repository generally. Other types of policies fall completely outside of the requirements for a TDR, yet they are important for its day-to-day management. This essay will address both types of policies.

Some examples of the types of technical policies this author would expect to be enforced on a digital repository in practice, based on the TDR assessment criteria, are as follows. In some of the examples below, the policy of the repository administrators may be, for example, to save the original file format/SIP (Submission Information Package), or not to save it. The enforced policy will depend on the mission of the repository and on the implicit and explicit policies that are developed and applied by its human managers.

  1. The hardware, software, and file formats must/must not be migrated.
  2. A copy of the original file format, and of the original software version needed to render it, must be/must not be retained for provenance purposes.
  3. At least two off-site backups must be implemented, and the backups must be tested periodically to ensure they are actually backing up the data as required and expected.
  4. The contents of the repository must be catalogued; i.e., the administrators of the repository have logged what objects are in the repository.
  5. The administrator of the repository must be able to audit all actions performed on an object, including what, by whom, and when.
  6. Upon ingest, the digital object is scanned for viruses and a checksum is computed and recorded (see the sketch following this list).
  7. The administrator must be able to access, retrieve, and render all digital objects in the repository, either for his or her own erudition or, if appropriate, for users.
  8. Any software required to render the digital object will be maintained and migrated (if possible; some software may not have newer versions).
  9. If a digital object is to be deleted on X date, then it must be deleted, and a follow-up audit run to ensure the object was actually deleted.
  10. If the content rendered via a digital object requires any cleanup, then the cleanup of the data/content will be documented. The original (un-cleaned) file must be saved for provenance purposes, although some organizations may decide not to save the original (un-cleaned) digital object.
  11. The administrator of the repository must enforce appropriate restrictions on access to the data. For example, some digital objects may be available to users only via a certain IP (Internet Protocol) range.
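To make the machine-level enforcement of policies 5 and 6 above concrete, the following is a minimal sketch using only Python's standard library. The file name and function names are hypothetical, and a production repository would use a tamper-evident audit store rather than a flat file.

```python
import hashlib
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")  # hypothetical append-only audit trail

def sha256_checksum(path: Path) -> str:
    """Compute a SHA-256 fixity value for a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_action(object_id: str, action: str, agent: str, detail: str = "") -> None:
    """Record what was done, by whom, and when (policy no. 5 above)."""
    entry = {
        "object_id": object_id,
        "action": action,
        "agent": agent,
        "detail": detail,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")

def verify_fixity(object_id: str, path: Path, expected: str, agent: str) -> bool:
    """Re-compute a stored checksum and log the outcome (policy no. 6 above)."""
    actual = sha256_checksum(path)
    ok = (actual == expected)
    log_action(object_id, "fixity-check", agent, "pass" if ok else "FAIL")
    return ok
```

Whether a failed check triggers repair from a backup copy or merely an alert is itself a policy decision of the kind described above.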

Some examples of the types of organizational policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows.

  1. The organization maintaining the digital repository commits to employing an appropriate level of staff with an appropriate level of training in order to maintain the archive based on Information and Library Science (ILS) best practices and standards.
  2. The organization maintaining the digital repository commits to providing an appropriate level of funding for the (preservation) maintenance of the repository and its content.
  3. The organization commits to finding an appropriate organization to take over the repository in the event the original managing organization can no longer do so.
  4. The staff of the organization commit to documenting the policies, procedures, workflows, and system design of the preservation repository.
  5. The management and staff maintaining the repository agree to periodically audit the policies and procedures of the repository in order to ensure that they are doing what they say they are doing. This may be a self-assessment using a standard self-audit such as DRAMBORA, or via an outside auditor who will certify that the repository meets Trusted Digital Repository (TDR) criteria.
  6. Barring any extenuating circumstances, the organization commits to honoring all contracts signed and agreed to at the time the content was acquired or created in-house. This includes the spirit and intent of the agreement, especially if the originating party no longer exists (either a person or an institution).
  7. The management and organization maintaining the repository agree to honor and enforce all copyright, intellectual property rights, and other legal obligations related to the digital object and repository. These agreements may be separate from any agreements entered into in order to acquire or create the content.

Some examples of the types of digital object management policies this author would expect to be enforced on a digital repository in practice, based on the TDR assessment criteria, are as follows. These example policies are related to ingest: the files sit in a staging area as SIPs, awaiting upload into the preservation repository as AIPs (Archival Information Packages). These policies supplement the policy examples provided above.

  1. If the digital object does not have a unique ID, or the current unique ID will not be used, then a new unique identifier will be assigned. A record of the changed ID or new ID assignment will be logged.
  2. A virus scan and a checksum will be run and the fact that these actions were taken on the digital object will be logged. In the event of a virus, the object will be quarantined until the virus is eliminated.
  3. Any metadata associated with the digital object will be checked for quality and appropriateness. If necessary, the metadata may be supplemented by additional information. If there is no associated metadata, then some metadata will be created.
  4. Storage and presentation methods will be applied, if appropriate. For example, if the policy is to store the original .tiff file but create .jpeg files for Web rendering and storage via a database, then the .jpeg files may be created in the staging area and stored. Another possible policy may be to create .jpeg files on the fly from the .tiff as needed, once the collection is live and online. This type of policy would save on storage space.
  5. If the SIP, AIP, and DIP (Dissemination Information Package) are different, then the final version of the file must be created prior to upload into the repository from the staging area. The original SIP may be stored or deleted, per the policy of the repository. Deletion is not recommended for files that have been cleaned up, as the original “dirty” file may need to be viewed later for provenance and data-accuracy purposes.
  6. Set access privileges, both for internal staff accessing the digital object, and for any external users, assuming the content of the repository is publicly accessible.
  7. Upload the digital object to the repository, log that the object has been uploaded, and test that the files are retrievable and “renderable”. (A minimal ingest sketch follows this list.)
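As a concrete illustration of policies 1, 2, and 4 above, here is a minimal ingest sketch. It assumes the Pillow imaging package for the TIFF-to-JPEG derivative; the directory layout and function name are hypothetical, and virus scanning is omitted because it would typically shell out to an external scanner.

```python
import hashlib
import shutil
import uuid
from pathlib import Path
from typing import Optional

from PIL import Image  # assumes the Pillow package is installed

def ingest_sip(sip_path: Path, repository: Path,
               existing_id: Optional[str] = None) -> dict:
    """Stage one SIP: assign an ID, record fixity, store the original,
    and create a JPEG derivative for Web rendering."""
    # Policy no. 1: assign a new unique identifier if none is usable.
    object_id = existing_id or str(uuid.uuid4())

    # Policy no. 2: compute a checksum before anything else touches the file.
    checksum = hashlib.sha256(sip_path.read_bytes()).hexdigest()

    # Policy no. 4: store the original .tiff and a .jpeg derivative.
    aip_dir = repository / object_id
    aip_dir.mkdir(parents=True, exist_ok=True)
    stored_original = aip_dir / sip_path.name
    shutil.copy2(sip_path, stored_original)

    derivative = None
    if sip_path.suffix.lower() in {".tif", ".tiff"}:
        derivative = aip_dir / (sip_path.stem + ".jpg")
        with Image.open(stored_original) as img:
            img.convert("RGB").save(derivative, "JPEG", quality=85)

    return {"object_id": object_id, "sha256": checksum,
            "original": str(stored_original),
            "derivative": str(derivative) if derivative else None}
```

The returned dictionary is the sort of record that would be written to the ingest log described in policy no. 2.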

In terms of what types of additional policies this author would expect to find related to administrative or management functions that are not part of the TDR assessment criteria, the following types of policies might be applied to a preservation repository. These are not preservation policies per se, but they may (or may not) affect the policies enforced for preservation.

  1. Collection policies. For example, what types of collections are included or not included in the archive? Images? Documents? Data sets? Only peer-reviewed articles related to Physics? Only Social Science data sets?
  2. File format policies. Are there any limitations on the type of file formats the repository will or will not store and make available to users? For example, the policy may be to store a .tiff file but only make .jpegs available to users.
  3. Type of archive policies. Is the repository a dark archive only? A public archive? An archive with limited public access?
  4. “This is not a preservation repository” policy. The policy may be not to plan to preserve any of the material in the repository, because that is neither the mission nor the concern of the repository managers or the reason for the existence of the repository itself.
  5. WYSIWYG content and metadata policies. The policy of the repository may be not to invest in quality control on the content or metadata. Therefore, there is no clean up of the digital object or any vetting of the metadata. If and when a user accesses the material, it is What-You-See-Is-What-You-Get (WYSIWYG). This is sometimes related to the limitations of personnel time and funding. For example, in the early 2000s the developers of the National Science Digital Library had to accept what the content owners and creators could provide regarding metadata quality, which was “non-existent” or “terrible”, and rarely “good” or “excellent” (Hillmann & Dushay, 2003).
  6. Legal, financial, ethical, and collection policies. What types of material will the repository accept and acquire, even when the material falls within the collection policy purview? For example, at the University of Southern California, the focus of the digital archive was “Southern California”, and L.A. specifically. The archive primarily consisted of images. In the mid-2000s, the staff discussed acquiring photographic images related to L.A. gangs with the idea of building a gang archive, but the legal issues were deemed to be extremely challenging by all involved. The only way to acquire the material and work around the legal issues would be to require that no access to the photos be allowed until 100 years had passed. The staff could not justify the costs of acquiring the collection for the purposes of embargoing it for that long a period; this includes the costs associated with maintaining the collection as a dark archive. All digital archive staff agreed, however, that such a collection would be very valuable to historians.
     
    More recently, an archive in the Northeastern United States faced legal action by the British government over oral histories of living former IRA members. The historian who recorded the oral histories had promised the former IRA members that the recordings would be private and would not subject them to legal action. The courts are saying otherwise. Thus, a repository manager may have to take into account multiple types of policies with regards to content.
  7. Software, hardware, and repository design policies. Will the repository use off-the-shelf or one-off/home-grown software? What hardware will the repository run on? Whether home-grown or off-the-shelf, will the software comply with preservation repository recommendations, per the OAIS Reference Model (CCSDS, 2002)? Is compliance with the OAIS Reference Model part of the policies guiding the repository design?
  8. Policies regarding conflicts between international standards, domain standards, and local rules and regulations. Which policies, standards, rules, and/or regulations will take priority over others? For example, if your national standard (Beedham et al., 2004?) requires providing access to handicapped citizens, but fulfilling this requirement means that the repository is not compliant with international standards or with the standards of the domain represented by the archive, and therefore will not be considered a TDR, whose rules do you follow? (In this case, Beedham et al. (2004?) followed their national laws, but criticized the authors of the OAIS Reference Model for not taking local laws into account.)
  9. Federation policy. Will the repository federate with other repositories? This excludes reciprocal back-up agreements. The federation may include providing metadata for metadata harvesting, or the sharing of the content and metadata itself. For example, the Odum Institute data archive provides metadata via an OAI-PMH Data Provider, and also provides users of the archive with access to ICPSR metadata. A user may or may not be able to access the actual non-Odum Institute ICPSR data sets, however. Therefore, the policy applied by the managers of the Odum Institute data archive is to provide access to the metadata of non-Odum Institute data sets, but not to the data sets themselves. (A minimal harvesting sketch follows this list.)
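For readers unfamiliar with how metadata federation works in practice, the following is a minimal sketch of one OAI-PMH ListRecords request using only Python's standard library. The endpoint URL is hypothetical, and resumption-token paging (needed for real harvests) is omitted.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai"  # hypothetical OAI-PMH data provider

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url: str = BASE_URL):
    """Issue one ListRecords request and yield Dublin Core titles."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        title = record.find(".//" + DC + "title")
        if title is not None and title.text:
            yield title.text

if __name__ == "__main__":
    for t in harvest_titles():
        print(t)
```

A data provider that exposes only metadata, as in the Odum Institute example, shares exactly this kind of record while keeping the underlying data sets behind its own access controls.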

In conclusion, the CCSDS’ recommendation, “Audit and Certification of Trustworthy Digital Repositories” (2011), divides policies into three main types: technical, organizational, and digital object management. The policies required to be a Trustworthy Digital Repository encompass many of the policies required to manage a digital archive generally. This means that, even if the policy of a repository administrator is not to preserve the content, many of the policies required for a Trusted Digital Repository will still be implemented, as many of them are required for general repository management anyway.

Repository managers and administrators must also implement managerial and administrative policies that are not part of preserving the content, yet reflect important decisions that must be made with regards to the repository and the content it contains. This essay has outlined a sample of policy types related both to a Trusted Digital Repository and to a non-Trusted Digital Repository.

Enforce Digital Preservation Standards and Policies

If you would like to work with us on a digital preservation and curation or data governance project, please review our services page.


Repository Design: Understand the Value of the OAIS Preservation Model


The OAIS Reference Model Repository Design Question

In your literature review #3 you state that “the conclusion from a variety of experienced repository managers is that the authors of the OAIS Reference Model created flexible concepts and common terminology that any repository administrator or manager may use and apply, regardless of content, size, or domain.”

  1. Does this one-size-fits-all model really work for repositories large and small? Please discuss.
  2. You also note that Higgins and Boyle (2008), in their critique of OAIS for the DCC, talk about the need for an OAIS Lite. Please discuss what that might look like, who would be its primary audience, and how useful it could be.
  3. Finally, how can repositories such as the US National Archives work with the concept of designated community, as their mission is to serve all citizens? Is the notion of designated audience generally useful? Why or why not, and under which conditions is it most valuable?

Citation

Ward, J.H. (2012). Doctoral Comprehensive Exam No.3, Managing Data: Preservation Repository Design (the OAIS Reference Model). Unpublished, University of North Carolina at Chapel Hill. (pdf)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Note: All errors are mine. I have posted the question and the result “as-is”. The comprehensive exams are held as follows. You have five closed-book examinations five days in a row; one exam is given each day. You are mailed a question at a set time. Four hours later, you return your answer. If you pass, you pass. If not…well, then it depends. The student will need to have a very long talk with his or her advisor. I passed all of mine. — Jewel H. Ward, 24 December 2015

The OAIS Reference Model Repository Design Response

Based on the feedback this author has received from participants attending the DigCCurr Professional Institute in May 2011, no, the one-size-fits-all OAIS Reference Model recommendation does not work for repositories both large and small. The repository administrators in question were discussing digital curation concepts in general, but their comments may also be applied to the OAIS Reference Model, as it is one part of digital curation. The repository administrators wanted to know which parts of what they had learned at the Institute they should apply, and which parts they could safely leave out. The attendees thought the information presented to them was useful, but that it would be “overkill” for their particular repositories.

Beedham et al. (2004?) noted that the OAIS Reference Model is designed for a repository within a large bureaucracy. The authors wrote that the Reference Model is not designed for a small archival collection with a limited audience and limited funding and personnel to build, maintain, and preserve the collections in the repository. It is designed for an institution with a team of personnel working on the repository, not one or two people responsible for all aspects of creating, maintaining, and preserving it. This author would add that the OAIS Reference Model is designed for an archive whose collections consist of tens of thousands of objects or more. It is not designed for an archive of a few hundred or a few thousand objects with one person to administer it, who may or may not be trained in digital library/digital archive Information and Library Science (ILS) best practices.

The Reference Model has been designed such that it may federate with other OAIS archives, presumably to create access to one Really Large Preservation Repository. It has also been designed so that the object may have three different “versions”: the Submission Information Package (SIP); the Archival Information Package (AIP); and the Dissemination Information Package (DIP). As a concept, these are three different things, but in practice, a SIP may equal an AIP, which may equal a DIP. For a large repository with different audiences, the DIP may need to be different from the AIP. For a small archive with a homogeneous audience, the AIP and DIP may be exactly the same.

Therefore, with regards to my statement, “…any repository administrator or manager may use and apply, regardless of content, size, or domain”, the key is in the use of the word “may”. They may use it. It is not that they must use it or that they will use it; it is simply that the repository administrator may use it. A repository administrator must take into account the rules and regulations that apply to their repository when applying the OAIS Reference Model. These rules and regulations may be domain best practices that differ from ILS practices, or federal, state, institutional, or other local policies that differ from what the OAIS recommends. The OAIS Reference Model is a recommendation, not a requirement or a law. As Thibodeau wrote (2010?), any evaluation of a repository must be taken on a case-by-case basis. In other words, one size does not fit all.

The primary responsibility of a repository manager is to ensure the near-term availability of the objects in the repository, and the long-term availability as well, if that is part of the mission of the digital archive. This author has two views of what an “OAIS Lite” might look like. The first is to determine what is actually required to preserve content for the long term, regardless of the model used. The second is how the documentation of the recommendation could be adapted to create an “OAIS Lite”. The primary audience for an OAIS Lite would be the managers of small- to medium-sized repositories who do not operate within large bureaucracies and who, perhaps, have some kind of computer science knowledge, but who will generally not have an ILS background.

Jon Crabtree of the Odum Institute at the University of North Carolina at Chapel Hill supports the use of standards, but he has noted on several occasions that the Odum Institute “preserved” their digital data for decades without explicit preservation standards or policies. They did this because they hired competent people who did their job, and because it was understood that the data itself must be migrated, and the software and hardware must be migrated, replaced, upgraded, etc. This author’s own work experience seconds Crabtree’s comments.

At the time of this writing, the following must occur in order for data to be preserved, without following any particular preservation recommendation. Although this section is designed to be illustrative of “bare bones” preservation requirements, the “[ ]” designates the OAIS Reference Model section in which each item would fit; i.e., either the “Information Model” or the “Functional Model”.

  1. [Functional Model] Document the holdings of the archive and its system design. Update the documentation if and when there are any changes to numbers 2-4 below.
  2. [Information Model] Ensure the appropriate metadata for the digital objects.
  3. [Functional Model] Migrate and refresh the hardware and software periodically, as well as any software required to render the objects in the repository (for example, CAD files). Upon ingest, run integrity checks and virus scans, and periodically re-run these scans on the data. Set up at least two off-site backups, and check that the backups are actually capturing the data (see the fixity-audit sketch after this list). Ensure all of the objects in the repository may actually be found and accessed, assuming access is permitted and desired.
  4. [Functional Model] Find someone to take the data if the organization in charge of the data goes out of existence. Keep (1) above updated in order to facilitate a takeover of the archive’s contents.
  5. [Functional Model] Hire competent people who ensure that numbers 1-4 above occur.
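As a concrete illustration of the checking described in item 3, here is a minimal manifest-driven fixity audit. It assumes a CSV manifest with object_id, path, and sha256 columns; the file and directory names are hypothetical. The same routine can be pointed at a backup mount to confirm that the backups are actually capturing the data.

```python
import csv
import hashlib
from pathlib import Path

MANIFEST = Path("manifest.csv")  # hypothetical: columns object_id,path,sha256

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 value for a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(manifest: Path = MANIFEST, root: Path = Path(".")) -> list:
    """Return the IDs of objects that are missing or fail their recorded checksum."""
    failures = []
    with manifest.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            target = root / row["path"]
            if not target.exists() or sha256_of(target) != row["sha256"]:
                failures.append(row["object_id"])
    return failures

if __name__ == "__main__":
    # Audit the live store, then a (hypothetical) off-site backup mount.
    for location in (Path("."), Path("/mnt/backup1")):
        bad = audit(root=location)
        print(f"{location}: {len(bad)} object(s) failed fixity")
```

Scheduling this periodically and writing the results into the documentation from item 1 covers the requirement to check that the backups are actually backing up the data.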

An additional step a repository administrator may take is to map the documentation from (1) above to the OAIS Reference Model and identify gaps. Then, as time and resources permit, address any existing gaps between the current system design and content and the OAIS Reference Model. At the least, identify that the gaps exist and document this in (1) above.

This author’s vision of an “OAIS Lite”, therefore, would be very general guidelines for the type of administration and management required to maintain a digital repository over time. This may not be what Higgins and Boyle (2008) had in mind.

However, if this author were to create an “OAIS Lite” based purely on the OAIS Reference Model recommendation itself, then it would be the current recommendation, but with each subsection designated as:

  1. “Must have”/required.
  2. “Nice to have”/recommended.
  3. “Optional”.

The assumption is that if some part of the recommendation is not necessary, then it won’t be in the OAIS Reference Model recommendation at all. Thus, “not needed” is not provided as an option. This also assumes the same audience as outlined above for the “bare bones” preservation guidelines. This approach would have the advantage of breaking down the Reference Model into manageable chunks. A repository manager of any size could begin by implementing the “must haves”; as time permits, add in the “nice to haves”; and, again, as time permits, add in any “optional” sections. (A minimal sketch of such a tiered checklist follows.)
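Purely as an illustration of the tiered structure, the following sketch models the breakdown as a machine-checkable checklist. The items, tier assignments, and section labels are hypothetical; the actual assignments would have to come from a committee of experienced repository administrators, as discussed below.

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    section: str       # OAIS Reference Model area the item maps to (illustrative)
    description: str
    tier: str          # "required" | "recommended" | "optional"
    done: bool = False

CHECKLIST = [
    ChecklistItem("Ingest", "Run integrity check and virus scan on ingest", "required"),
    ChecklistItem("Archival Storage", "Maintain two tested off-site backups", "required"),
    ChecklistItem("Preservation Planning", "Document a format-migration plan", "recommended"),
    ChecklistItem("Access", "Expose metadata for harvesting", "optional"),
]

def progress(items, tier: str) -> float:
    """Fraction of a tier's items completed so far."""
    scoped = [i for i in items if i.tier == tier]
    return sum(i.done for i in scoped) / len(scoped) if scoped else 1.0

if __name__ == "__main__":
    CHECKLIST[0].done = True
    for tier in ("required", "recommended", "optional"):
        print(f"{tier:12s} {progress(CHECKLIST, tier):.0%} complete")
```

A manager could work through the “required” tier first and report progress on the other tiers as time permits.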

Another possibility is to divide the recommendations in the Reference Model by repository size, and then break those down by “required”, “recommended”, and “optional”. A committee of experienced repository administrators working with small repository owners could set up the Reference Model in this way. Either of these formats would be a useful version of the recommendation.

Thus, an “OAIS Lite” could consist of two types of recommendations. The first is a description of the bare-bones functions required to maintain a repository and its contents over the long term, mapped to the general OAIS models. The second would be to take the recommendation itself and break it down into required, recommended, and optional sections. Breaking down the recommendations would be useful to the managers of both large and small repositories. The challenge would be to get a committee of repository experts to agree on what constitutes “required”, “recommended”, and “optional” within the OAIS Reference Model.

The concept of a Designated Community is useful within the OAIS Reference Model, as it reminds repository managers that the goal of the repository is to serve a set of users. The goal is not necessarily to serve the needs of the repository managers! The concept is most useful when the users of a repository are homogeneous, and it is least useful when the users are heterogeneous. This is because the more heterogeneous the population using a repository, the less “one size fits all” fits all users. It is easier to serve a specific set of users (“scholars”) than all users (“hobbyists” and “scholars”).

Having said that, an organization like the National Archives may work around this limitation by aiming collections at specific users, once a baseline standard has been met. So, for example, the Southern Historical Collection at UNC was initially put online for scholars and, to some extent, “to serve the people of North Carolina (NC)” (as that is also the stated mission of the University of North Carolina at Chapel Hill), but the administrators of the collection soon realized that K-12 educators were using the resource. Thus, the administrators of the digital library still serve their “generic” audience (“the people of NC”) and scholars of Southern history, but they have developed K-12 educational materials for teachers to use as part of the state curriculum.

This author believes it is possible for the National Archives to serve “the people of the United States” by breaking down the digital collections by themes, collections, etc., and determining who uses which collections, and how. They can thus better serve specific audiences and tailor the site as needed. The administrators of an archive must still determine who their “general” Designated Community is, and set standards for that community, but they can, as needed, serve targeted communities.

In conclusion, the “one size fits all” model of the OAIS Reference Model does not fit all. It is important to have standards for preservation repository design, but when the preservation repository design is more suited to a large bureaucratic institution than to a small repository with fewer resources, then not all of those standards may be useful. If not all of the standards are applicable, or they seem like “overkill”, then the repository manager will need to decide which of the standards to use, and how. One way to ease this “cherry picking” of preservation repository standards is to determine the processes required to ensure preservation, regardless of repository design. A second way is for ILS and Computer Science experts to break down the OAIS Reference Model recommendations into “required”, “recommended”, and “optional”, possibly also by a repository’s size. This would be useful to managers of repositories of all sizes, as it would help them figure out what they have right so far, or where they need to start, and allow them to figure out what gaps remain.

A downside to this idea is that if a repository only implements the “required” recommendations of the OAIS Reference Model, then they may be only partially OAIS-compliant, and it might encourage laziness among repository administrators.

Regardless, “content is king”, so the important issue is that the content and its metadata, along with any required software to run it, are preserved. The model used to preserve it is secondary. Finally, while the concept of a Designated Community is important, it is a more valuable term when the users of an archive are more homogeneous, and less useful when the user base is heterogeneous. Large archives at the national level may work around this limitation by setting a baseline standard of quality for all users, and then targeting the archive’s collections to particular audiences who use those collections.

Repository Design: Understand the Value of the OAIS Preservation Model

If you would like to work with us on a data governance or digital preservation project, please review our services page.


Learn the Priorities of the Digital Preservation Community


Digital Preservation Question

Since 1996, the digital preservation community has been emerging as evidenced by the increasing number of formalized standards, conferences, publishing options, and discussion venues.

What are the most significant community developments and why? What gaps remain in terms of community standards and practice? What roles should/could academic programs, professional associations, curatorial organizations, and individual researchers and practitioners play in those developments? What priorities and desired outcomes should there be for building the community’s literature?

Citation

Ward, J.H. (2012). Doctoral Comprehensive Exam No.2, Managing Data: the Emergence & Development of Digital Curation & Digital Preservation Standards. Unpublished, University of North Carolina at Chapel Hill. (pdf)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Note: All errors are mine. I have posted the question and the result “as-is”. The comprehensive exams are held as follows. You have five closed-book examinations five days in a row; one exam is given each day. You are mailed a question at a set time. Four hours later, you return your answer. If you pass, you pass. If not…well, then it depends. The student will need to have a very long talk with his or her advisor. I passed all of mine. — Jewel H. Ward, 24 December 2015

Digital Preservation Response

Ascertaining the most significant community developments is a task likely to cause a few religious wars amongst long-time digital preservationists. However, the following are the most significant events in this author’s humble opinion.

  1. The realization by the Computer Science (CS) and Information & Library Science (ILS) communities, among other domains and industries, that there is a digital preservation problem in the first place. This realization took place among individuals and organizations over the course of several decades, from the 1960s to the early 1990s. It is important because you cannot fix a problem if you don’t know you have one.
  2. The 1996 Waters and Garrett report on the digital preservation problem. This report was a significant event because it outlined the problem(s) and what steps needed to be taken to ameliorate them.
  3. The development of the OAIS Reference Model (RM) by the Consultative Committee for Space Data Systems (CCSDS) and the standardization of the model in 2002. The development of the OAIS RM is important because the committee that created it consisted of, and was informed by, practitioners and users of data beyond the space data community. It defined common preservation terms that would mean the same thing to all who used them (or at least should). And, finally, the OAIS RM defined a common preservation repository standard against which digital repository managers could compare their own systems to determine, at least subjectively, their own preservation-worthiness.
  4. The creation of the Digital Curation Centre (DCC) in the UK in the 2000s. Although the DCC is designed to serve UK Higher Education Institutions (HEIs), it has provided a central location for digital preservation practitioners to go to for information related to digital preservation. The centre has also provided a platform from which further research and standardization in digital curation and preservation may continue.
  5. The creation of the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the United States in the 2000s. Much like the DCC in the UK, NDIIPP has provided a central location in the USA from which digital preservation research and development, and the application of it, is promoted. As well, the program has provided an avenue through which private industry, government, research, and academia may come together to address the common problem of digital preservation. The Science and Technology Council of the Academy of Motion Picture Arts & Sciences (AMPAS) mentions NDIIPP in its report, “The Digital Dilemma” (2007), as an important program with which private industry should be involved in order to coordinate resources to solve a problem (digital preservation) that all industries are facing.
  6. The publication of the AMPAS report (2007) on the digital preservation problem within the movie industry. The movie industry’s products and libraries represent a large source of profit for the industry, as well as the cultural heritage of the respective countries that produce movies. The AMPAS report about the digital preservation problem is important because:
    1. It meant that a major, high-dollar industry was also seeking solutions to the digital preservation problem. This made it “not just a library issue”, and provided additional clout (financial and political) to the task of finding solutions.
    2. The authors of the AMPAS report clearly stated that the digital preservation problem was not just a movie industry problem; it was a problem for everyone who uses digital data, and thus the solutions must be found by working together across private industry, government, academia, and other research institutions.
    3. Reflecting the work done in ILS, the industry stated that digital preservation costs were far higher (1100% more) than non-digital preservation. This is the only non-research, non-academic report this author has read that shows the costs as determined by private industry. The authors of the report stated that standards will reduce costs, and that the movie industry should resist implementing one-off solutions. This promotes the use of standards as an integral part of the solution to the digital preservation problem, even within a high-profit commercial industry.
  7. The development of the concept of a “Trusted Digital Repository”, as well as the mechanisms to audit and certify that a repository is actually “trustworthy”. This includes the development of TRAC (“Trustworthy Repositories Audit & Certification”), DRAMBORA (a risk-based self-assessment of a repository’s trustworthiness), other assessment criteria developed in Europe, and the development of TRAC into an ISO standard via the CCSDS called “Audit and Certification of Trustworthy Digital Repositories”. This also includes the development of standards with which to certify the certifiers. The significance of an ISO standard for a “trusted digital repository” is that it gives practitioners and other repository managers a base set of policies from which they can build or assess their repository’s ability to survive over the indefinite long term, especially when used in conjunction with the OAIS RM.
  8. The development of outlets for publication, forums for discussion, Web sites with information on preservation, etc., has given practitioners and researchers avenues for work that can be used for their own professional advancement. Providing incentives for researchers and practitioners to do preservation work is one way to ensure the necessary preservation is done. It also provides an iterative feedback loop, such that researchers and practitioners can adapt policies and standards as new information and research become available.

Some gaps do remain in terms of community standards and practice. Some of the gaps are managerial; others are more technical.

  1. The standards for preservation policies and repository design, such as “Audit and Certification of Trustworthy Digital Repositories” and the OAIS RM, are designed with large organizations in mind. What if you aren’t NASA or the Library of Congress? For example, what if you are the lone digital archivist for the Harley-Davidson archive? Or a digital library on quilts? The standards outlined for preservation policies and repositories are stacked in favor of large organizations with large bureaucracies. The ILS community ought to develop a “lite” version aimed at small “mom and pop” repositories whose administrators curate important material but don’t need all of the overhead presented in the ISO standards for trusted digital repositories and the OAIS RM.
  2. The same idea applies to data management training for researchers and other administrators of data archives in non-ILS domains. These researchers don’t want to spend their time on the full curation of the data, but neither are many of them likely to turn the data sets over to libraries and archives for stewardship in the near term. (The long term is another issue.) Yet, in order to support science and the requirements of funding agencies such as the NSF and the NIH, the data must be preserved and shareable. The development of a “lite” curriculum for data management, aimed at non-ILS data managers, would be useful to them, and it would strengthen librarians’ and archivists’ roles as information managers by providing a consulting and outreach function to scientists and researchers.
  3. The designation of certification as “trustworthy” does not seem to take into account “local” rules and regulations that a repository may have to observe when designing the preservation system and the policies that must be applied to it. Some repositories may have to forgo international preservation standards in order to follow national, state, county, or other regulations. Does that mean the repository is not “trustworthy”? Will the repository now be considered “2nd class” because it isn’t certified as “trustworthy”? Thibodeau has discussed looking at each repository on a case-by-case basis.
  4. Preservation policy and repository standards have been designed from the top down. Granted, the people designing the standards have (usually) worked with repositories themselves, and so based the standards on their own experiences with running a repository. However, it is one thing to define standards; it is another to ensure their implementation. For example, this author’s master’s paper (2002) involved studying 100 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) data providers to determine which Dublin Core (DC) elements were used or not used. At the time, practitioners were arguing for and against qualifying DC to make it more detailed, per other metadata standards (Lagoze, c. 2001). Practitioners have certain standards for metadata quality, yet no one had actually examined how people were using DC. This author found that out of 15 DC elements, only 3 (title, author, and date) were being used the majority of the time. A separate but related study by Dushay and Hillmann around 2003 with the National Science Digital Library found the quality of the metadata content was abysmal. Follow-up studies by Shreeves et al. in the mid-2000s examined metadata quality in the OAI-PMH and also found it to be abysmal. Combined, these studies made the religious war over qualifying or not qualifying DC moot. Why qualify DC if only 3 elements out of 15 are being used the majority of the time, and the quality of the metadata content in those elements is abysmal? Perhaps the quality of the metadata content should be improved first, then more elements used, and then practitioners can worry about qualifying DC. The same discrepancy may not exist for preservation policies, but it would be interesting to find out what people are actually doing, as opposed to what they say or think they are doing with regard to compliance with standards. (A sketch of this kind of element-usage analysis follows this list.) Then again, that is the purpose of auditing and certifying trusted digital repositories, so one may consider this argument circular!
  5. Further examination of what digital preservation is going to cost. If material must be curated from its birth, then it is also true that decisions will have to be made early on as to whether or not material should be preserved at all. Even AMPAS noted that the movie industry must change its mindset from “save everything”, which worked fine with film, and must now curate its digital movie data.
  6. A large gap in digital preservation is the transfer of data from one system to another, whether external or internal. The development of standard ingest tools would help reduce the costs of preservation. The Producer-Archive Interface Methodology Abstract Standard (PAIMAS) by the CCSDS (c. 2003-2004) has been one step in this direction, but it only outlines a method. A technically simple way to transfer data between repositories remains a problem in need of a solution.
  7. Metadata is one large gap still in need of a solution. The problem relates to both metadata quality (mentioned earlier) and tools with which to create metadata. Scientists and researchers who work with data have repeatedly stated, both in readings by others and in this author’s work on the DataNet project, that the “killer app” is a tool which helps them to appropriately and simply annotate their data. Like ingest, metadata is a challenging problem in search of an answer.
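To illustrate the kind of element-usage analysis mentioned in gap 4, here is a minimal sketch that counts how often each of the 15 simple Dublin Core elements appears across a directory of harvested oai_dc records. The directory name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# The 15 simple Dublin Core elements.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
]

def element_usage(record_files):
    """Count how many harvested records use each DC element at least once."""
    counts = Counter()
    total = 0
    for path in record_files:
        root = ET.parse(path).getroot()
        total += 1
        present = {elem.tag[len(DC_NS):]
                   for elem in root.iter() if elem.tag.startswith(DC_NS)}
        counts.update(present & set(DC_ELEMENTS))
    return total, counts

if __name__ == "__main__":
    total, counts = element_usage(Path("records").glob("*.xml"))  # hypothetical dump
    for name in DC_ELEMENTS:
        share = counts[name] / total if total else 0.0
        print(f"{name:12s} {share:6.1%}")
```

Run against a real harvest, a table like this is exactly the evidence base the studies above used to show that only a handful of elements see regular use.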

The role that academic programs, professional associations, curatorial organizations, and individual researchers and practitioners should play in these developments begins with individuals identifying the gaps. As a community, individuals must agree on those gaps, and then apply to their organizations for the time to work on the problems, or else obtain grants so that they may work on the solutions. Eventually, the solutions may be taught as part of the ILS and CS curricula.

In terms of the gaps identified above, researchers and practitioners should continue to work on the metadata and ingest problems, realizing that these are two huge gaps to preserving materials that cross all domains.

Some other roles organizations and individuals may play involve deciding what the penalties are if a repository is not an OAIS RM-conformant TDR. What are the rewards for being “trustworthy”? Should there be rewards or penalties? If so, what? For example, Charity Navigator provides certain criteria against which a possible donor may determine whether or not they wish to give money. It helps prevent people from giving money to organizations that, say, waste a lot of money on administration. But it is also true that some smaller organizations may not have the money to re-fit themselves to meet Charity Navigator’s criteria. Does this mean that they are less worthy of donations, or that the money will go to waste? Not necessarily. It may mean that a charity receives less money than it would otherwise, because Charity Navigator gives it a lower rating than an organization with more funding. This in turn gives more funding to the charity that already receives more funding, and less funding to a charity that already has less. If an organization does not have the time and resources to self-assess or receive certification as “trustworthy”, will it encounter any penalties, whether implicit or explicit?

This implies that standards should be a guide, and viewed as one part of a whole package.

One role individuals and organizations in the preservation field might play is that of consultant to small organizations that manage data, and to individual researchers. One output of this could be an OAIS RM “lite” and an “Audit and Certification of Trustworthy Digital Repositories” “lite”, aimed at repository managers who do not work for large bureaucratic organizations. This could be a document, a standard, or online training that a practitioner could complete on his or her own time. Currently, even the DRAMBORA self-assessment requires a large time commitment from at least one, if not more, repository administrators. Part of this consultant role would involve educating graduate students and researchers on data management. One output of this could be an online certification program that scientists and researchers could complete on their own time, as they have time, on how to manage their data. This is slightly different from, but related to, personal information management. The data in this sense would be data gathered in the course of one’s work, not personal data such as a digital photo album. This would include learning how to tag metadata, and thus begin to fix this problem where it starts: with the data creator.

Some possible outputs for the above problems include librarians and archivists continuing to provide consulting and outreach to scientists and researchers regarding preservation standards. The creation of “lite” versions of preservation policy standards and repository designs would be helpful for small repository administrators and for those whose local standards might supersede international preservation standards. Technology is not an ILS strength; information management is. ILS practitioners and researchers must continue to work with CS and other technical folks on developing ingest and metadata tools, especially with preservation in mind. These tools may also need to be designed with individual researchers and small repository administrators in mind. ILS practitioners and researchers must build upon their strength in information management, and not cede ground to CS, if they wish to remain relevant.

One outcome of a consulting role within other domains regarding preservation, and of providing tools that aid in preservation, is to raise standards for data preservation within other communities. This will make the long-term preservation of that data and information easier for those who must eventually manage it, and thus make it more likely that the data will be preserved at all. This will also strengthen ILS as a field.

Learn the Priorities of Digital Preservation Standards & Practices

If you would like to work with us on a data governance or digital preservation and curation project, please see our consulting services.
