Preservation Standards, and Audit and Certification Mechanisms Question
What types of policies would you expect to be enforced on a digital repository, based on the emerging Trustworthiness assessment criteria? What types of additional policies would you expect to find related to administrative or management functions?
Ward, J.H. (2012). Doctoral Comprehensive Exam No.4, Managing Data: Preservation Standards and Audit and Certification Mechanisms (e.g., “policies”). Unpublished, University of North Carolina at Chapel Hill. (pdf)
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Note: All errors are mine. I have posted the question and the result “as-is”. The comprehensive exams are held as follows. You have five closed-book examinations five days in a row, with one exam given each day. You are mailed a question at a set time. Four hours later, you return your answer. If you pass, you pass. If not…well, then it depends. The student will need to have a very long talk with his or her advisor. I passed all of mine. — Jewel H. Ward, 24 December 2015
Preservation Standards, and Audit and Certification Mechanisms Response
The CCSDS’ “Audit and Certification of a Trusted Digital Repository” (2011) describes policies in terms of the (1) technical framework, (2) the organizational framework, and, (3) the digital object itself. These policies may be applied and enforced manually (by humans) or at the machine level (by computers using computer code). Some of the policies required for a repository to be considered a Trusted Digital Repository (TDR) are also required for day-to-day management of the repository generally. Other types of policies are completely outside of the requirements for a TDR, yet they are important for the day-to-day management of it. This essay will address both types of policies.
Some examples of the types of technical policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows. In some instances in the examples below, the policy of the repository administrators may be, for example, to save the original file format/SIP…or not save it. The enforced policy will depend on the mission of the repository and the implicit and explicit policies that are developed and applied by the human managers of the repository.
- The hardware, software, and file formats must/must not be migrated.
- A copy of the original file format and the original software version to render the original version must be/must not be retained for provenance purposes.
- At least two off-site backups must be implemented, and the backups must be tested periodically to ensure they are actually backing up the data as required and expected.
- The contents of the repository must be catalogued; i.e., the administrators of the repository have logged what objects are in the repository.
- The administrator of the repository must be able to audit all actions performed on an object, including what, by whom, and when.
- Upon ingest, the digital object is scanned for viruses and a checksum is performed.
- The administrator must be able to access, retrieve, and render all digital objects in the repository, either for his or her own erudition or, if appropriate, for users.
- Any software required to render the digital object will be maintained and migrated (if possible; some software may not have newer versions).
- If a digital object is to be deleted on X date, then it must be deleted, and a follow-up audit must be run to ensure the object was actually deleted.
- If the content rendered via a digital object requires any cleanup, then the cleanup of the data/content will be documented. The original (un-cleaned) file should be saved for provenance purposes, although some organizations may decide not to save it.
- The administrator of the repository must enforce appropriate restrictions to the data. For example, some digital objects may be only available to users via a certain IP (Internet Protocol) range.
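Several of the technical policies above lend themselves to enforcement at the machine level. The following is a minimal sketch, not a definitive implementation, of four of them: a checksum for fixity, an audit trail recording what was done, by whom, and when, an IP-range access restriction, and a follow-up audit after deletion. All function names, the in-memory log, and the example network range are assumptions made for illustration; a production repository would use durable storage and its own access rules.

```python
import hashlib
import ipaddress
from datetime import datetime, timezone

# Hypothetical in-memory audit trail; a real repository would persist this.
AUDIT_LOG = []


def log_action(object_id: str, action: str, actor: str) -> dict:
    """Record what was done, by whom, and when."""
    entry = {
        "object_id": object_id,
        "action": action,
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    AUDIT_LOG.append(entry)
    return entry


def sha256_checksum(data: bytes) -> str:
    """Compute a SHA-256 checksum of the object's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, expected: str) -> bool:
    """Return True if the object still matches the checksum taken at ingest."""
    return sha256_checksum(data) == expected


# Assumed example range (a reserved documentation block), not a real policy.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def access_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def audit_deletion(store: dict, object_id: str) -> bool:
    """Follow-up audit: True only if the object is really gone from the store."""
    return object_id not in store
```

Keeping each policy in its own small function mirrors the way the policies are stated individually, so an administrator can enable, disable, or replace one rule without touching the others.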
Some examples of the types of organizational policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows.
- The organization maintaining the digital repository commits to employing an appropriate level of staff with an appropriate level of training in order to maintain the archive based on Information and Library Science (ILS) best practices and standards.
- The organization maintaining the digital repository commits to providing an appropriate level of funding for the (preservation) maintenance of the repository and its content.
- The organization commits to finding an appropriate organization to take over the repository in the event the original managing organization can no longer do so.
- The staff of the organization commit to documenting the policies, procedures, workflows, and system design of the preservation repository.
- The management and staff maintaining the repository agree to periodically audit the policies and procedures of the repository in order to ensure that they are doing what they say they are doing. This may be a self-assessment using a standard self-audit such as DRAMBORA, or via an outside auditor who will certify that the repository meets Trusted Digital Repository (TDR) criteria.
- Barring any extenuating circumstances, the organization commits to honoring all contracts signed and agreed to at the time the content was acquired or created in-house. This includes the spirit and intent of the agreement, especially if the originating party no longer exists (either a person or an institution).
- The management and organization maintaining the repository agree to honor and enforce all copyright, intellectual property rights, and other legal obligations related to the digital object and repository. These agreements may be separate from any agreements entered into in order to acquire or create the content.
Some examples of the types of digital object management policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows. These example policies are related to ingest. The files are in a staging area (SIPs), awaiting upload into the preservation repository as AIPs. These policies are in addition to or supplement the policy examples provided above.
- If the digital object does not have a unique ID, or the current unique ID will not be used, then a new unique identifier will be assigned. A record of the changed ID or new ID assignment will be logged.
- A virus scan and a checksum will be run and the fact that these actions were taken on the digital object will be logged. In the event of a virus, the object will be quarantined until the virus is eliminated.
- Any metadata associated with the digital object will be checked for quality and appropriateness. If necessary, the metadata may be supplemented by additional information. If there is no associated metadata, then some metadata will be created.
- Storage and presentation methods will be applied, if appropriate. For example, if the policy is to store the original .tiff file but create .jpeg files for Web rendering and storage via a database, then the .jpeg files may be created in the staging area and stored. Another possible policy may be to create .jpeg files on the fly from the .tiff as needed, once the collection is live and online. This type of policy would save on storage space.
- If the SIP, AIP, and DIP are different, then the final version of the file must be created prior to upload into the repository from the staging area. The original SIP may be stored or deleted, per the policy of the repository. Deleting the original is not recommended for files that have been cleaned up, as the original “dirty” file may need to be viewed later for provenance and data accuracy purposes.
- Set access privileges, both for internal staff accessing the digital object, and for any external users, assuming the content of the repository is publicly accessible.
- Upload the digital object to the repository, log that the object has been uploaded, and test that the files are retrievable and “renderable”.
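The ingest policies above can be read as a sequence of steps applied to each object in the staging area. The sketch below illustrates one possible ordering under stated assumptions: the repository is modeled as a plain dictionary, the identifier scheme is a UUID, and `virus_scan` is a hypothetical placeholder callable that returns True when the object is clean. Metadata checks, derivative creation, and access privileges are omitted, and only retrievability is tested, not rendering.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class IngestRecord:
    object_id: str
    checksum: str
    quarantined: bool
    log: list = field(default_factory=list)


def ingest(
    data: bytes,
    repository: dict,
    existing_id: Optional[str] = None,
    virus_scan: Callable[[bytes], bool] = lambda d: True,  # placeholder scanner
) -> IngestRecord:
    """Move one object from staging into the repository, logging each step."""
    log = []

    # Assign a new unique identifier if the object does not already have one.
    object_id = existing_id or f"urn:uuid:{uuid.uuid4()}"
    if existing_id is None:
        log.append(f"assigned new identifier {object_id}")

    # Run the checksum and record that it was taken.
    checksum = hashlib.sha256(data).hexdigest()
    log.append(f"sha256 checksum recorded: {checksum}")

    # Run the virus scan; quarantine instead of uploading on failure.
    clean = virus_scan(data)
    log.append("virus scan passed" if clean else "virus scan failed; object quarantined")
    if not clean:
        return IngestRecord(object_id, checksum, True, log)

    # Upload, log the upload, and confirm the object can be retrieved intact.
    repository[object_id] = data
    log.append("uploaded to repository")
    retrieved = repository.get(object_id)
    if retrieved is None or hashlib.sha256(retrieved).hexdigest() != checksum:
        raise RuntimeError(f"retrieval check failed for {object_id}")
    log.append("retrieval verified against checksum")

    return IngestRecord(object_id, checksum, False, log)
```

For example, `ingest(b"...", repo)` returns a record whose `log` lists the actions taken, which is the kind of evidence an auditor would ask to see.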
In terms of what types of additional policies this author would expect to find related to administrative or management functions that are not part of the TDR assessment criteria, the following types of policies might be applied to a preservation repository. These are not preservation policies per se, but they may (or may not) affect the policies enforced for preservation.
- Collection policies. For example, what types of collections are included or not included in the archive? Images? Documents? Data sets? Only peer-reviewed articles related to Physics? Only Social Science data sets?
- File format policies. Are there any limitations on the type of file formats the repository will or will not store and make available to users? For example, the policy may be to store a .tiff file but only make .jpegs available to users.
- Type of archive policies. Is the repository a dark archive only? A public archive? An archive with limited public access?
- “This is not a preservation repository” policy. The policy may be not to plan to preserve any of the material in the repository, because that is neither the mission nor the concern of the repository managers or the reason for the existence of the repository itself.
- WYSIWYG content and metadata policies. The policy of the repository may be not to invest in quality control on the content or metadata. Therefore, there is no clean up of the digital object or any vetting of the metadata. If and when a user accesses the material, it is What-You-See-Is-What-You-Get (WYSIWYG). This is sometimes related to the limitations of personnel time and funding. For example, in the early 2000s the developers of the National Science Digital Library had to accept what the content owners and creators could provide regarding metadata quality, which was “non-existent” or “terrible”, and rarely “good” or “excellent” (Hillmann & Dushay, 2003).
- Legal, financial, ethical, and collection policies. What types of material will the repository accept and acquire, even when the material falls within the collection policy purview? For example, at the University of Southern California, the focus of the digital archive was “Southern California”, and L.A., specifically. The archive primarily consisted of images. In the mid-2000s, the staff discussed acquiring photographic images related to L.A. gangs with the idea of building a gang archive, but the legal issues were deemed to be extremely challenging by all involved. The only way to acquire the material and work around the legal issues would be to require that no access to the photos be allowed until 100 years had passed. The staff could not justify the costs of acquiring the collection for the purposes of embargoing it for that long a period; this includes the costs associated with maintaining the collection as a dark archive. All digital archive staff agreed, however, that such a collection would be very valuable to historians.
More recently, an archive in the Northeastern United States faced legal action by the British government over oral histories of living former IRA members. The historian who recorded the oral histories had promised the former IRA members that the recordings would be private and would not subject them to legal action. The courts are saying otherwise. Thus, a repository manager may have to take into account multiple types of policies with regard to content.
- Software, hardware, and repository design policies. Will the repository use off-the-shelf or one-off/home-grown software? What hardware will the repository run on? Whether home-grown or off-the-shelf, will the software comply with preservation repository recommendations, per the OAIS Reference Model (CCSDS, 2002)? Is compliance with the OAIS Reference Model part of the policies guiding the repository design?
- Policies regarding conflicts between international standards, domain standards, and local rules and regulations. Which policies, standards, rules, and/or regulations will take priority over others? For example, if your national standard (Beedham, et al., 2004 (?)) requires providing access to handicapped citizens, but fulfilling this requirement means that the repository is not compliant with international standards or the standards of the domain represented by the archive and, therefore, will not be considered a TDR, whose rules do you follow? (In this case, Beedham, et al., (2004?) followed their national laws, but criticized the authors of the OAIS Reference Model for not taking into account local laws.)
- Federation policy. Will the repository federate with other repositories? This excludes reciprocal back-up agreements. The federation may include providing metadata for metadata harvesting, or the sharing of the content and metadata itself. For example, the Odum Data Archive provides metadata via an OAI-PMH Data Provider, and also provides users of their data archive with access to ICPSR metadata. A user may or may not be able to access the actual non-Odum Institute ICPSR data sets, however. Therefore, the policy applied by the managers of the Odum Institute data archive is to provide access to the metadata of non-Odum Institute data sets, but not to the data sets themselves.
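As an illustration of the metadata-harvesting side of a federation policy, the sketch below builds an OAI-PMH `ListRecords` request URL. The verb, the `metadataPrefix` argument, the optional `set` argument, and the `oai_dc` format come from the OAI-PMH protocol; the function name and the base URL are hypothetical and do not refer to any particular archive's endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Selective harvesting: restrict the request to one set, if given.
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL and parse the returned XML. Whether the harvested records link to accessible data sets is a separate policy decision, as the Odum Institute example shows.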
In conclusion, the CCSDS’ recommendation, “The Audit and Certification of a Trusted Digital Repository” (2011), divides policies into three main types: Technical, Organizational, and Digital Object Management. The policies required to be a Trustworthy Digital Repository encompass many of the policies required to manage a digital archive generally. This means that even if the policy of a repository administrator is not to preserve the content, many of the policies required for a Trusted Digital Repository will still be implemented, because they are required for general repository management anyway.
Repository managers and administrators must also implement managerial and administrative policies that are not part of preserving the content, yet reflect important decisions that must be made with regard to the repository and the content it contains. This essay has outlined a sample of policy types related both to a Trusted Digital Repository and to a non-Trusted Digital Repository.