SEO and digital stewardship for creative industry websites
Category: Digital Stewardship
This is a collection of published and unpublished articles, papers, and presentations from Jewel H. Ward’s academic career. The posts cover the following areas of her work:
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
ISO 16363, the Audit and Certification of Trusted Digital Repositories (TDR)
ISO 14721, the Open Archival Information System (OAIS) Reference Model
Data governance provides a method for the proper care and feeding of data and information. When applied correctly, it provides the appropriate checks and balances: it ensures that those who need access to the data have it.
Conversely, the policies deny access to the data by those people, internal or external, friend or foe, who do not need to know anything related to it.
What Kinds of Data Governance Policies Do We Mean?
Any kind of policy, actually. The policies might prevent access to research data by staff and outside persons who do not need to see it. They also ensure that unauthorized people cannot alter the data.
When we protect research data and store it properly, it allows future scientists to study past research. They can note changes in the environment, climate, or simply in people’s attitudes and actions over time.
In addition, when we preserve data according to standard policies, it provides a window into the past. More importantly, it means that research is repeatable. This is one of the basic criteria for any applied research method.
The drawback to data governance is that, when applied improperly, it becomes a bottleneck that impedes work or research. People then work around it, and the policies become paper rules and regulations that are enforced neither within the organization’s culture nor in machine code.
When people circumvent data governance policies, it may push day-to-day work forward. Over the long term, however, it leaves companies and organizations open to litigation and/or expensive fines. With regard to cultural heritage or research data, once the data is lost, damaged, or altered in some way, it usually cannot be recovered.
What is the Best Method for the Care of Data and Information?
The best course of action with any data governance policy is to balance the needs of employees and researchers to get their work done against rules and regulations.
Large fines for GDPR violations could wipe out any gains from employees “cheating” to bypass data restrictions, and the loss of research and cultural heritage data has no price.
When applying data governance to your enterprise, research project, or organization, the best course of action is to be brutally pragmatic.
Figure out which policies you must enforce. Next, determine which ones you should enforce. Last, list the data governance policies that would be nice to enforce. Then proceed to set up your systems and processes accordingly.
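This must/should/nice-to-have triage can be sketched as a simple policy register. The policy names, tiers, and the `enforced_policies` helper below are all hypothetical, a minimal illustration rather than a real governance tool:

```python
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    tier: str  # "must", "should", or "nice-to-have"

# Hypothetical policies, triaged by enforcement priority.
POLICIES = [
    Policy("encrypt-personal-data", "must"),         # e.g., a GDPR obligation
    Policy("quarterly-access-review", "should"),
    Policy("tag-records-with-keywords", "nice-to-have"),
]

def enforced_policies(policies, tiers=("must", "should")):
    """Return the names of the policies the systems will actually enforce."""
    return [p.name for p in policies if p.tier in tiers]
```

Enforcing only the “must” tier in code first, then adding “should” policies as resources allow, helps keep governance from becoming the bottleneck described above.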
If you would like to work with us on a digital stewardship project, please see our consulting services page.
I thought I would help others by posting the two examples that helped me create my mixed-methods content analysis proposal. Included in this post is my mixed-methods content analysis dissertation literature review.
Preparing to write a literature review is challenging, and I found myself frustrated at the scarcity of articles. I hope these two citations and studies help you.
The first citation is to a dissertation proposal; the second is to the follow-up dissertation. Both are actual papers that N.V. Ivankova submitted to faculty to fulfill the requirements for graduation.
Example Studies Using Mixed-Methods Content Analysis
Ivankova, N.V. (2002). Students’ Persistence in the University of Nebraska – Lincoln Distributed Doctoral Program in Educational Administration: A Mixed Methods Study (doctoral proposal). New York: Sage Publications. Retrieved from http://studysites.sagepub.com/creswellstudy/Sample%20Student%20Proposals/Proposal-MM-Ivankova.pdf. (pdf)
Ivankova, N.V. (2004). Students’ persistence in the University of Nebraska -Lincoln Distributed Doctoral Program in Educational Leadership in Higher Education: A mixed methods study (doctoral dissertation). Retrieved from the ETD collection for University of Nebraska – Lincoln. Paper AAI3131545. http://digitalcommons.unl.edu/dissertations/AAI3131545. (pdf)
Ward, J.H. (2012). Managing Data: Content Analysis Methodology. Unpublished manuscript, University of North Carolina at Chapel Hill. (pdf)
Please leave any mixed-methods content analysis literature advice you would like to contribute in the comments section.
If you would like to work with Impact Zone on a content analysis or data analysis and analytics project, please see our services page.
Jewel Ward. A Quantitative Analysis of Dublin Core Metadata Element Set (DCMES) Usage in Data Providers Registered with the Open Archives Initiative (OAI). A Master’s paper for the M.S. in I.S. degree. November, 2002. 68 pages. Advisor: Gregory B. Newby
This research describes an empirical study of how the Dublin Core Metadata Element Set (DCMES) is used by 100 Data Providers (DPs) registered with the Open Archives Initiative (OAI). The research was conducted to determine whether or not the DCMES is used to its full capabilities.
Eighty-two of 100 DPs have metadata records available for analysis. DCMES usage varies by type of DP. The average number of Dublin Core elements per record is eight, with an average of 91,785 Dublin Core elements used per DP. Five of the 15 elements of the DCMES are used 71% of the time. The results show the DCMES is not used to its fullest extent within DPs registered with OAI.
Electronic data archives – Standards.
Science and Technology – Databases.
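As a rough illustration of the kind of tally behind these figures, the sketch below counts DCMES element occurrences in a single `oai_dc` record. The sample record is fabricated; the actual study harvested real records from Data Providers over OAI-PMH:

```python
from collections import Counter
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A fabricated oai_dc metadata fragment of the kind an OAI-PMH
# ListRecords response contains; real records come from a Data Provider.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Title</dc:title>
  <dc:creator>A. Author</dc:creator>
  <dc:date>2002-11-01</dc:date>
  <dc:title>A Second Title</dc:title>
</oai_dc:dc>"""

def element_usage(record_xml: str) -> Counter:
    """Tally how often each DCMES element appears in one record."""
    root = ET.fromstring(record_xml)
    counts = Counter()
    for child in root:
        # ElementTree tags carry the namespace as "{uri}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            counts[child.tag.split("}", 1)[1]] += 1
    return counts
```

Summing such counters across every record of every Data Provider would yield per-element and per-DP usage figures like those reported above.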
Dublin Core and OAI-PMH Research Questions
As a result of reading about this debate, I decided to analyze DCMES usage by registered OAI-PMH-compliant DPs. My hypothesis is that DCMES is not used to as full an extent as possible by the DPs. My research aims to answer the following questions.
Which individual elements of the DCMES are used or not used?
Which individual elements of the DCMES are used the most? Which are used the least?
Are there different “types” of DPs? If so, does usage of individual elements of the DCMES vary by type?
The answers to these questions are applicable to the debate over whether or not the unqualified DCMES is an appropriate metadata schema for the OAI-PMH.
Preservation Standards and Audit and Certification Mechanisms: Question
What types of policies would you expect to be enforced on a digital repository, based on the emerging Trustworthiness assessment criteria? What types of additional policies would you expect to find related to administrative or management functions?
Ward, J.H. (2012). Doctoral Comprehensive Exam No.4, Managing Data: Preservation Standards and Audit and Certification Mechanisms (e.g., “policies”). Unpublished, University of North Carolina at Chapel Hill. (pdf)
Note: All errors are mine. I have posted the question and the result “as-is”. The comprehensive exams are held as follows: you have five closed-book examinations five days in a row, with one exam given each day. You are mailed a question at a set time; four hours later, you return your answer. If you pass, you pass. If not…well, then it depends, and the student will need to have a very long talk with his or her advisor. I passed all of mine. — Jewel H. Ward, 24 December 2015
Preservation Standards and Audit and Certification Mechanisms: Response
The CCSDS’ “Audit and Certification of Trustworthy Digital Repositories” (2011) describes policies in terms of (1) the technical framework, (2) the organizational framework, and (3) the digital object itself. These policies may be applied and enforced manually (by humans) or at the machine level (by computers using computer code). Some of the policies required for a repository to be considered a Trusted Digital Repository (TDR) are also required for the day-to-day management of the repository generally. Other types of policies fall completely outside of the requirements for a TDR, yet they are important for its day-to-day management. This essay will address both types of policies.
Some examples of the types of technical policies this author would expect to be enforced on a digital repository in practice, based on the TDR assessment criteria, are as follows. In some of the examples below, the repository administrators’ policy may be, for example, either to save the original file format/SIP or not to save it. The enforced policy will depend on the mission of the repository and on the implicit and explicit policies developed and applied by its human managers.
The hardware, software, and file formats must/must not be migrated.
A copy of the original file format and the original software version to render the original version must be/must not be retained for provenance purposes.
At least two off-site backups must be implemented, and the backups must be tested periodically to ensure they are actually backing up the data as required and expected.
The contents of the repository must be catalogued; i.e., the administrators of the repository have logged what objects are in the repository.
The administrator of the repository must be able to audit all actions performed on an object, including what, by whom, and when.
Upon ingest, the digital object must be scanned for viruses and a checksum performed.
The administrator must be able to access, retrieve, and render all digital objects in the repository, either for his or her own erudition, or, if appropriate for users.
Any software required to render the digital object will be maintained and migrated (if possible; some software may not have newer versions).
If a digital object is to be deleted on X date, then it must be deleted, and a follow up audit run to ensure the object was actually deleted.
If the content rendered via a digital object requires any cleanup, then the cleanup of the data/content will be documented. The original (un-cleaned) file should be saved for provenance purposes, although some organizations may decide not to save it.
The administrator of the repository must enforce appropriate restrictions to the data. For example, some digital objects may be only available to users via a certain IP (Internet Protocol) range.
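The ingest checksum-and-audit items above can be sketched in a few lines of Python. The function name, the agent label, and the log-entry layout are hypothetical; SHA-256 is assumed as the fixity algorithm:

```python
import hashlib
from datetime import datetime, timezone

def ingest_checksum(object_id: str, data: bytes, log: list) -> str:
    """Compute a fixity checksum at ingest and record the action
    (what, by whom, when) in an auditable log."""
    digest = hashlib.sha256(data).hexdigest()
    log.append({
        "object": object_id,
        "action": "checksum-sha256",
        "value": digest,
        "agent": "ingest-service",  # hypothetical agent name
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return digest
```

Because every action is appended to the log with an object ID, agent, and timestamp, an administrator can later audit what was done to each object, by whom, and when.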
Some examples of the types of organizational policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows.
The organization maintaining the digital repository commits to employing an appropriate level of staff with an appropriate level of training in order to maintain the archive based on Information and Library Science (ILS) best practices and standards.
The organization maintaining the digital repository commits to providing an appropriate level of funding for the (preservation) maintenance of the repository and its content.
The organization commits to finding an appropriate organization to take over the repository in the event the original managing organization can no longer do so.
The staff of the organization commit to documenting the policies, procedures, workflows, and system design of the preservation repository.
The management and staff maintaining the repository agree to periodically audit the policies and procedures of the repository in order to ensure that they are doing what they say they are doing. This may be a self-assessment using a standard self-audit such as DRAMBORA, or via an outside auditor who will certify that the repository meets Trusted Digital Repository (TDR) criteria.
Barring any extenuating circumstances, the organization commits to honoring all contracts signed and agreed to at the time the content was acquired or created in-house. This includes the spirit and intent of the agreement, especially if the originating party no longer exists (either a person or an institution).
The management and organization maintaining the repository agree to honor and enforce all copyright, intellectual property rights, and other legal obligations related to the digital object and repository. These agreements may be separate from any agreements entered into in order to acquire or create the content.
Some examples of the types of digital object management policies this author would expect to be enforced on a digital repository in practice based on the TDR assessment criteria are as follows. These example policies are related to ingest. The files are in a staging area (SIPs), awaiting upload into the preservation repository as AIPs. These policies are in addition to or supplement the policy examples provided above.
If the digital object does not have a unique ID, or the current unique ID will not be used, then a new unique identifier will be assigned. A record of the changed ID or new ID assignment will be logged.
A virus scan and a checksum will be run and the fact that these actions were taken on the digital object will be logged. In the event of a virus, the object will be quarantined until the virus is eliminated.
Any metadata associated with the digital object will be checked for quality and appropriateness. If necessary, the metadata may be supplemented by additional information. If there is no associated metadata, then some metadata will be created.
Storage and presentation methods will be applied, if appropriate. For example, if the policy is to store the original .tiff file but create .jpeg files for Web rendering and storage via a database, then the .jpeg files may be created in the staging area and stored. Another possible policy may be to create .jpeg files on the fly from the .tiff as needed, once the collection is live and online. This type of policy would save on storage space.
If the SIP, AIP, and DIP are different, then the final version of the file must be created prior to upload into the repository from the staging area. The original SIP may be stored or deleted, per the policy of the repository. Deletion is not recommended for files that have been cleaned up, as the original “dirty” file may need to be viewed later for provenance and data-accuracy purposes.
Access privileges will be set, both for internal staff accessing the digital object and for any external users, assuming the content of the repository is publicly accessible.
The digital object will be uploaded to the repository, the upload will be logged, and the files will be tested to confirm they are retrievable and “renderable”.
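Two of the staging-area steps above, assigning a unique identifier and creating minimal metadata, might look like the following sketch. The `prepare_sip` function and its record layout are hypothetical:

```python
import uuid

def prepare_sip(sip: dict, log: list) -> dict:
    """Apply two staging-area ingest policies to one SIP: assign a
    unique ID if it lacks one, and ensure minimal descriptive metadata
    exists. Both actions are logged so the ingest can be audited."""
    sip = dict(sip)  # do not mutate the staged record
    if not sip.get("id"):
        sip["id"] = str(uuid.uuid4())
        log.append({"object": sip["id"], "action": "assign-id"})
    if not sip.get("metadata"):
        # Minimal metadata created when none was supplied with the object.
        sip["metadata"] = {"title": "[untitled]"}
        log.append({"object": sip["id"], "action": "create-metadata"})
    return sip
```

A real ingest pipeline would chain further logged steps (virus scan, checksum, derivative creation) before uploading the AIP from the staging area.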
In terms of the additional policies this author would expect to find related to administrative or management functions that are not part of the TDR assessment criteria, the following types might be applied to a preservation repository. These are not preservation policies per se, but they may (or may not) affect the policies enforced for preservation.
Collection policies. For example, what types of collections are included or not included in the archive? Images? Documents? Data sets? Only peer-reviewed articles related to Physics? Only Social Science data sets?
File format policies. Are there any limitations on the type of file formats the repository will or will not store and make available to users? For example, the policy may be to store a .tiff file but only make .jpegs available to users.
Type of archive policies. Is the repository a dark archive only? A public archive? An archive with limited public access?
“This is not a preservation repository” policy. The policy may be not to plan to preserve any of the material in the repository, because that is neither the mission nor the concern of the repository managers or the reason for the existence of the repository itself.
WYSIWYG content and metadata policies. The policy of the repository may be not to invest in quality control on the content or metadata. Therefore, there is no clean up of the digital object or any vetting of the metadata. If and when a user accesses the material, it is What-You-See-Is-What-You-Get (WYSIWYG). This is sometimes related to the limitations of personnel time and funding. For example, in the early 2000s the developers of the National Science Digital Library had to accept what the content owners and creators could provide regarding metadata quality, which was “non-existent” or “terrible”, and rarely “good” or “excellent” (Hillmann & Dushay, 2003).
Legal, financial, ethical, and collection policies. What types of material will the repository accept and acquire, even when the material falls within the collection policy purview? For example, at the University of Southern California, the focus of the digital archive was “Southern California”, and L.A. specifically. The archive primarily consisted of images. In the mid-2000s, the staff discussed acquiring photographic images related to L.A. gangs with the idea of building a gang archive, but the legal issues were deemed extremely challenging by all involved. The only way to acquire the material and work around the legal issues would have been to require that no access to the photos be allowed until 100 years had passed. The staff could not justify the costs of acquiring the collection for the purposes of embargoing it for that long a period, including the costs of maintaining the collection as a dark archive. All digital archive staff agreed, however, that such a collection would be very valuable to historians.
More recently, an archive in the Northeastern United States faced legal action by the British government over oral histories of living former IRA members. The historian who recorded the oral histories had promised the former IRA members that the recordings would remain private and would not subject them to legal action. The courts have said otherwise. Thus, a repository manager may have to take multiple types of policies into account with regard to content.
Software, hardware, and repository design policies. Will the repository use off-the-shelf or one-off/home-grown software? What hardware will the repository run on? Whether home-grown or off-the-shelf, will the software comply with preservation repository recommendations, per the OAIS Reference Model (CCSDS, 2002)? Is compliance with the OAIS Reference Model part of the policies guiding the repository design?
Policies regarding conflicts between international standards, domain standards, and local rules and regulations. Which policies, standards, rules, and/or regulations will take priority over others? For example, if your national standard (Beedham, et al., 2004 (?)) requires providing access to handicapped citizens, but fulfilling this requirements means that the repository is not compliant with international standards or the standards of the domain represented by the archive and, therefore, will not be considered a TDR, whose rules do you follow? (In this case, Beedham, et al., (2004?) followed their national laws, but criticized the authors of the OAIS Reference Model for not taking into account local laws.)
Federation policy. Will the repository federate with other repositories? This excludes reciprocal back-up agreements. The federation may include providing metadata for metadata harvesting, or the sharing of the content and metadata itself. For example, the Odum Data Archive provides metadata via an OAI-PMH Data Provider, and also provides users of their data archive with access to ICPSR metadata. A user may or may not be able to access the actual non-Odum Institute ICPSR data sets, however. Therefore, the policy applied by the managers of the Odum Institute data archive is to provide access to the metadata of non-Odum Institute data sets, but not to the data sets themselves.
In conclusion, the CCSDS’ recommendation, “Audit and Certification of Trustworthy Digital Repositories” (2011), divides policies into three main types: Technical, Organizational, and Digital Object Management. The policies required to be a Trustworthy Digital Repository encompass many of the policies required to manage a digital archive generally. This means that even if a repository administrator’s policy is not to preserve the content, many of the policies required for a Trusted Digital Repository will still be implemented, as they are required for general repository management anyway.
Repository managers and administrators must also implement managerial and administrative policies that are not part of preserving the content, yet reflect important decisions that must be made with regard to the repository and the content it contains. This essay has outlined a sample of policy types related both to a Trusted Digital Repository and to a non-Trusted Digital Repository.
If you would like to work with us on a digital preservation and curation or data governance project, please review our services page.
Computer scientists who work with digital data that has long-term preservation value, archivists and librarians whose responsibilities include preserving digital materials, and other stakeholders in digital preservation have long called for the development and adoption of open standards in support of long-term digital preservation. Over the past fifteen years, preservation experts have defined “trust” and a “trustworthy” digital repository; defined the attributes and responsibilities of a trustworthy digital repository; defined the criteria and created a checklist for the audit and certification of a trustworthy digital repository; evolved these criteria into a standard; and defined a standard for bodies that wish to provide audit and certification to candidate trustworthy digital repositories. This literature review discusses the development of standards for the audit and certification of a trustworthy digital repository.
Ward, J.H. (2012). Managing Data: Preservation Standards & Audit & Certification Mechanisms (i.e., “policies”). Unpublished Manuscript, University of North Carolina at Chapel Hill. (pdf)
Computer scientists who work with digital data that has long-term preservation value, archivists and librarians whose responsibilities include preserving digital materials, and other stakeholders in digital preservation have long called for the development and adoption of open standards in support of long-term digital preservation (Lee, 2010; Science and Technology Council, 2007; Waters & Garrett, 1996). However, Hedstrom (1995) cautions that standards will provide a high-level solution to some of the obstacles that may prevent the preservation of digital materials only if they provide the conditions for the archive to conform to standard archival practices, software and hardware designers comply with the standards, and producers and users select and use them. The development of standards for the audit and certification of digital repositories as “trustworthy” is a major step towards ensuring that digital data will be curated and preserved for the indefinite long-term, as these standards provide the conditions under which all three of Hedstrom’s criteria may be met.
In 1996, the Commission on Preservation and Access and the Research Libraries Group released the now-seminal report, “Preserving Digital Information” (Waters & Garrett, 1996). The Research Libraries Group (RLG) (2002) noted three key points that led to the interest in developing standards for the “attributes and responsibilities” of a “trusted digital repository”: the requirement for ‘a deep infrastructure capable of supporting a distributed system of digital archives’; ‘the existence of a sufficient number of trusted organizations capable of storing, migrating, and providing access to digital collections’; and, ‘a process of certification is needed to create an overall climate of trust about the prospects of preserving digital information’. A few years later, the Consultative Committee for Space Data Systems (CCSDS) released the “Reference Model for an Open Archival Information System (OAIS)” (CCSDS, 2002). This document defined a set of common terms, components, and concepts for a digital archive. It provided not just a technical reference, but outlined the organization of people and systems required to preserve information for the indefinite long-term and make it accessible (RLG, 2002).
However, experts and other stakeholders with an interest in preserving information for the long-term recognized that as part of defining an archival system, they also needed to form a consensus on the responsibilities and characteristics of a sustainable digital repository. In other words, they needed a method to “prove” (i.e., “trust”) that an organization’s systems were, in-fact, OAIS-compliant. First, they would have to define the attributes and responsibilities of a “trusted” digital repository. Next, they would have to develop a method to audit and certify that a repository may be “trusted”. And, finally, they would have to create an infrastructure to certify and train the auditors.
This essay on “preservation standards and audit and certification mechanisms” is an overview of “trust”; the types of audit and certification available generally; the development of standards for the audit and certification of a repository as “trustworthy”; a brief overview of the standards themselves; and, a very brief overview of the requirements for the certification of bodies that certify the auditors of said trusted digital repositories. Thus, the scope of this particular literature review is deliberately narrow to avoid the duplication of previously discussed topics.
Jøsang and Knapskog (1998) discussed “trust” as a “subjective belief” when they described a metric for a “trusted system”, while Lynch (2000) described “trust” as an elusive and subjective probability. Both wrote that a user trusts the evaluation of the certifier, not the actual system component. Jøsang and Knapskog drew attention to the fact that an evaluator only certifies that a system has been checked against a particular set of criteria; whether or not a user should or will trust those criteria is another matter. The two researchers pointed out that most end users of a certified system do not have the necessary expertise to evaluate the appropriateness and quality of the criteria used to audit the system. They must trust that the people who established the criteria chose relevant components, and that the evaluator had the skill and knowledge to assess the system.
This is similar to Lynch (2001), who wrote that users tend to assume digital system designers and content creators have users’ best interests at heart, which is not always the case; yet the idea of creating a formal system of trust “is complex and alien to most people”. Ross & McHugh (2006) posit that “trust” may be established with the various stakeholders affiliated with a repository by providing quantifiable “evidence” such as annual financial reports, business plans, policy documents, procedure manuals, mission statements, etc., so that a system’s “trustworthiness” is believable. Jøsang & Knapskog (1998) and Ross & McHugh’s (2006) research goal was to provide a methodical evaluation of system components to define “trust” in a system that in and of itself was trustworthy (RLG, 2002).
Finally, Merriam-Webster (Trust, 2011) defines “trust” as “one in which confidence is placed”; “a charge or duty imposed in faith or confidence or as a condition of some relationship”; and, “something committed or entrusted to one to be used or cared for in the interest of another”.
The Types of Audit and Certification
Jøsang and Knapskog (1998) described four types of roles generally assigned to “government driven evaluation schemes”: accreditor, certifier, evaluator, and sponsor. They defined the accreditor as the body that accredits the evaluator, the certifier, and, sometimes, evaluates the system itself. They noted that the certifier is accredited based on “documented competence level, skill, and resources”. They stipulated that the certifier might also be a “government body issuing…certificates based on the evaluation reports from the evaluators”. They defined the evaluator as “yet another government agency” that is “accredited by the accreditor”, and “the quality of the evaluator’s work will be supervised by the certifier”. They described the sponsor as the party interested in having their system evaluated (Jøsang & Knapskog, 1998). In other words, the authors wrote that a party who would like their system audited and certified against a particular set of evaluation criteria (“the sponsor”) hires an auditor (“the evaluator”), whose work is supervised and certified by a body (“the certifier”) that has been accredited by an agency (“the accreditor”).
RLG (2002) defined four approaches to certification: individual, program, process, and data. They described “individual” as personnel certification. This is also called professional certification or accreditation, and it is often given to an individual when they meet some combination of work experience, education, and professional competencies. RLG noted that at the time of writing, there were no professional certifications for digital repository management or electronic archiving. They cited “program” as a type of certification for an institution or a program achieved through a combination of site visits and “self-evaluation using standardized checklists and criteria”.
RLG explained that the assessment areas included access, outreach, collection preservation and development, staff, facilities, governing and legal authority, and financial resources. They provided examples of this type of certification that included museums, schools and programs within a university, etc. They defined “process” as “quantitative or qualitative guidelines…to internal and external requirements” that use various methods and procedures, such as the ISO 9000 family of standards (RLG, 2002).
Finally, the authors designated the “data” approach to certification as addressing “the persistence or reliability of data over time and data security”. They wrote that this certification requires adherence to procedures manuals and international standards, such as ISO, that ensure both external and internal quality control. They noted that certification will require the managers of a repository to document migration processes, to create and maintain metadata, to authenticate new copies, and to update the data or files (RLG, 2002).
Trusted Digital Repositories: Attributes and Responsibilities
RLG (2002) defined a “trusted digital repository” as “one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future”. They described the “critical component” as “the ability to prove reliability and trustworthiness over time”. The authors’ stated goal for the report was to create a framework for large and small institutions that could cover different responsibilities, architectures, materials, and situations yet still provide a foundation with which to build a sustainable “trusted repository” (RLG, 2002).
Trusted Digital Repositories
The authors of the RLG document noted that repositories may be contracted to a third party or locally designed and maintained; regardless, the expectations for trust require that a digital repository must:
Accept responsibility for the long-term maintenance of digital resources on behalf of its depositors and for the benefit of current and future users;
Have an organizational system that supports not only long-term viability of the repository, but also the digital information for which it has responsibility;
Demonstrate fiscal responsibility and sustainability;
Design its system(s) in accordance with commonly accepted conventions and standards to ensure the ongoing management, access, and security of materials deposited within it;
Establish methodologies for system evaluation that meet community expectations of trustworthiness;
Be depended upon to carry out its long-term responsibilities to depositors and users openly and explicitly;
Have policies, practices, and performance that can be audited and measured; and
Meet the responsibilities detailed in Section 3 [sic] of this paper” (RLG, 2002).
Per the OAIS Reference Model (CCSDS, 2002), they noted that the repository’s “designated community” will be the primary determining factor in how the content is accessed and disseminated; managed and preserved; and what, including content and format, is deposited. The authors of the report discussed and defined “trust”, noting, “most cultural institutions are already trusted”. Regardless, they outlined three levels of trust that administrators of a repository must consider in order to be a “trusted repository”: the trust a cultural institution must earn from their designated community; the trust cultural institutions must have in third-party providers; and the trust users of the repository must have in the digital objects provided to them by the repository owner via the repository software.
The report authors wrote that archives, libraries, and museums must simply keep doing what they have been doing for centuries in order to maintain the trust of their user community; they do not need to develop that trust because, as institutions, they have already earned it. RLG (2002) explained that while librarians, archivists, and other curators are loath to use third-party providers who have not proven their reliability, the establishment of a certification program with periodic re-audits may overcome their reluctance. Finally, the authors stated that users must be able to trust that the digital items they receive from a repository are both authentic and reliable. In other words, the objects the users access must be unaltered and must be what they purport to be (Bearman & Trant, 1998).
They established that this can be accomplished by the use of checksums and other forms of validation that are common in the Computer Science and digital security communities, although security does not equal integrity (Lynch, 1994). Waters & Garrett (1996) put forth that the “central goal” of an archival repository must be “to preserve information integrity”; this includes content, fixity, reference, provenance, and context.
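The checksum-based validation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and a production repository would record the algorithm and digest alongside each object as preservation metadata.

```python
import hashlib

def compute_checksum(path, algorithm="sha256", chunk_size=65536):
    """Compute a file's checksum by streaming it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file's current checksum matches the recorded one."""
    return compute_checksum(path, algorithm) == recorded_checksum
```

As Lynch (1994) cautions, a matching checksum demonstrates integrity, not security: it shows the bits are unchanged, not that they were trustworthy to begin with.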
RLG (2002) identified seven primary attributes of a trusted digital repository. They were and are: compliance with the OAIS Reference Model; administrative responsibility; organizational viability; financial sustainability; technological and procedural suitability; system security; and procedural accountability.
The authors defined “compliance with the OAIS” as the repository owners/administrators ensuring that the “overall repository system conforms” to the OAIS Reference Model. They described “administrative responsibility” as the repository administrators adhering to “community-agreed” best practices and standards, particularly with regards to sustainability and long-term viability. RLG (2002) explained “organizational viability” as creating and maintaining an organization and structure that is capable of curating the objects in the repository and providing access to them for the indefinite long-term. They included as part of this maintaining trained staff, legal status, transparent business practices, succession plans, and maintaining relevant policies and procedures.
RLG (2002) designated “financial sustainability” as maintaining financial fitness, engaging in financial planning, etc., with an ongoing commitment to remain financially viable over the long-term. The authors outlined “technological and procedural suitability” as the repository owners/administrators keeping the archives software and hardware up to date, as well as complying with applicable best practices and standards for technical digital preservation. They traced an outline for “system security” by describing the minimal requirements a repository must follow regarding best practices for risk management, including written policies and procedures for disaster preparedness, redundancy, firewalls, back up, authentication, data loss and corruption, etc.
Finally, RLG (2002) defined “procedural accountability” as the repository owners/administrators being accountable for all of the above. That is, the authors wrote that maintaining a trusted digital repository is a complex set of “interrelated tasks and functions”; the maintainer of the repository is responsible for ensuring that all required functions, tasks, and components are carried out (RLG, 2002).
Responsibilities of a Trusted Digital Repository
RLG (2002) described two primary responsibilities for the owners and administrators of a trusted digital repository: high-level organizational and curatorial responsibilities, and operational responsibilities. They subdivided organizational and curatorial responsibilities into three levels. The authors noted that organizations must understand their local requirements, know which other organizations may have similar requirements, and understand how these responsibilities may be shared.
The authors of the report summarized five primary areas in support of those three levels: the scope of the collections, preservation and lifecycle management, the wide range of stakeholders, the ownership of material and other legal issues, and, cost implications (RLG, 2002).
The scope of the collections: the repository owners and administrators must know exactly what they have in their digital collection, and how to adequately preserve the integrity and authenticity of the properties and characteristics of the individual items.
Preservation and lifecycle management: the repository owners and administrators must commit to proactive planning with regards to preserving and curating the items in the repository.
The wide range of stakeholders: the repository owners and administrators must take into account the interests of all stakeholders when planning for long-term access to the materials. In some instances, they will have to act in spite of their stakeholders’ wishes: some stakeholders take short-term views and will not care about the long-term preservation of, and access to, the materials, while others will want the material preserved for the long term. The repository owners and administrators will have to balance these competing interests.
The ownership of material and other legal issues: digital librarians and archivists will have to take a proactive role with content producers. They must seek to preserve materials by curating the data early in its life cycle, while remaining cognizant of the copyright and intellectual property concerns of the content producers and owners.
Cost implications: repository owners and administrators must commit financial resources to maintaining the content over the indefinite long-term, while bearing in mind that the true costs of doing so are variable.
In sum, RLG (2002) recommended incorporating preservation planning into the everyday management of the preservation repository.
Next, the authors of this RLG report defined operational responsibilities in more detail than the organizational and curatorial responsibilities, above. They wrote the operational responsibilities based on the OAIS Reference Model, and added to that the “critical role” of a repository in the “promotion of standards” (RLG, 2002). They defined these areas as:
Negotiates for and accepts appropriate information from information producers and rights holders: this responsibility covers the submission agreement between a content Producer and the OAIS Archive. These responsibilities include preservation metadata, record keeping, authenticity checks, and legal issues. As part of fulfilling this role, a repository will have policies and procedures in place to cover collection development, copyright and intellectual property rights concerns, metadata standards, provenance and authenticity, appropriate archival assessment, and, records of all transactions with the Producer.
Obtains sufficient control of the information provided to support long-term preservation: this responsibility refers to the “staging” process, where submitted content is stored after submission from a Producer and before the material is ingested into the archive. The responsibilities of a repository administrator at this point encompass best practices for the ingest of materials, which include an analysis of the digital content itself, including its “significant properties”; the requirements that must be fulfilled to provide continuous access to the material; a metadata check against the repository’s standards (including adding metadata to bring the current metadata up to par); the assignment of a persistent and unique identifier; integrity/fixity/authentication checks; the creation of an OAIS Archival Information Package (AIP); and storage into the OAIS Archive.
Determines, either by itself of [sic] with others, the users that make up its designated community, which should be able to understand the information provided: the repository administrators and owners must determine who their user base is so that they may understand how best to serve their Designated Community.
Ensures that the information to be preserved is “independently understandable” to the designated community; that is, the community can understand the information without needing the assistance of experts: the repository owner and administrator must make the information available using generic tools that are available to the Designated Community. For example, documents might be made available via .pdf or .rtf because the software to render these documents is available for free to most users. A repository owner and/or administrator may not wish to preserve documents in the .pages file format, as this Apple file format is not commonly used and the software to render it is not free beyond a limited trial period.
Follows documented policies and procedures that ensure the information is preserved against all reasonable contingencies and enables the information to be disseminated as authenticated copies of the original or as traceable to the original: the repository owners and administrators will document any unwritten policies and procedures, and follow best practice recommendations and standards where possible. These policies must include policies to define the Designated Community and its knowledge base; policies for material storage, including service-level agreements; policies for authentication and access control; a collection development policy, including preservation planning; a policy to keep policies updated with current recommendations, standards, and best practices; and, finally, links between procedures and policies, to ensure compliance across all collections in the repository.
Makes the preserved information available to the designated community: the repository owners and administrators must comply with legal responsibilities such as licensing, copyright, and intellectual property regarding access to the content in the repository. Within that framework, however, they should plan to provide user support, record keeping, pricing (where applicable), authentication, and, most importantly, a method for resource discovery.
Works closely with the repository’s designated community to advocate the use of good and (where possible) standard practice in the creation of digital resources; this may include an outreach program for potential depositors: the repository owners and administrators should work with all stakeholders to advocate the use of standards and recommended best practices (RLG, 2002). As the Science and Technology Council (2007) noted, using standards will reduce costs for all parties involved and better ensure the longevity of the material.
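The ingest-related responsibilities above can be illustrated with a deliberately simplified Python sketch. Every structure and field name here is hypothetical; a real repository would implement these checks according to the OAIS Reference Model and its own documented policies.

```python
import hashlib
import uuid
from datetime import datetime, timezone

# Hypothetical local minimum metadata; a real repository would define
# this in policy, in consultation with its Designated Community.
REQUIRED_METADATA = {"title", "creator", "date"}

def ingest(content: bytes, metadata: dict) -> dict:
    """Build a minimal Archival Information Package (AIP) record:
    check the submission's metadata against the repository's minimum,
    assign a persistent unique identifier, and record a fixity value."""
    missing = REQUIRED_METADATA - metadata.keys()
    if missing:
        raise ValueError(f"submission rejected, missing metadata: {sorted(missing)}")
    return {
        "identifier": str(uuid.uuid4()),                  # persistent unique identifier
        "fixity": hashlib.sha256(content).hexdigest(),    # integrity check value
        "metadata": dict(metadata),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "audit_trail": ["ingested"],                      # record of actions taken
    }
```

A rejected submission models the formal acceptance process: material that fails the repository’s standards is returned to the Producer rather than silently stored.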
In conclusion, the OAIS Reference Model has provided a useful framework “for identifying the responsibilities of a trusted digital repository” (RLG, 2002).
Certification of a Trusted Digital Repository
As part of the certification framework, the authors of the RLG report intended to support Waters & Garrett’s (1996) assertion that archival repositories “must be able to prove that they are who they say they are by meeting or exceeding the standards and criteria of an independently-administered program for archival certification”.
RLG (2002) described two types of certification then in use within the libraries and archives community: the standards model and the audit model. The “standards” model is an informal process. They stated that standards are created when best practices and guidelines are established by the consensus of the expert community and then “certified” by other practitioners’ acceptance and/or use of the “standard”. In other words, librarians, archivists, and computer scientists who work with libraries decide what constitutes a “standard”; only rarely does a standard become formalized via ISO or another international organization. The authors described the audit model as an output of legislation or policies and procedures established by national agencies, such as the U.S. Department of Defense. That is, a governing body passes laws or policies, and the information repository’s policies must conform to the governing body’s requirements (RLG, 2002).
For a discussion of other approaches to certification, please see an earlier section, “Types of Audit and Certifications”.
RLG (2002) described a framework for a trusted digital repository’s responsibilities and attributes. They noted that these apply to repositories both large and small that hold a wide variety of content. The authors summarized their work above with several recommendations.
Recommendation 1: Develop a framework and process to support the certification of digital repositories.
Recommendation 2: Research and create tools to identify the attributes of digital materials that must be preserved.
Recommendation 3: Research and develop models for cooperative repository networks and services.
Recommendation 4: Design and develop systems for the unique, persistent identification of digital objects that expressly support long-term preservation.
Recommendation 5: Investigate and disseminate information about the complex relationship between digital preservation and intellectual property rights.
Recommendation 6: Investigate and determine which technical strategies best provide for continuing access to digital resources.
Recommendation 7: Investigate and define the minimal-level metadata required to manage digital information for the long term. Develop tools to automatically generate and/or extract as much of the required metadata as possible (RLG, 2002).
The remainder of this essay focuses on the results of Recommendation 1, above, regarding the development of certification standards for digital repositories.
Trusted Digital Repositories: Audit and Certification
Several researchers have addressed the problem of audit and certification. For example, Ross & McHugh (2006) created the Digital Repository Audit Method Based On Risk Assessment (DRAMBORA) to provide a self-audit method for repository administrators that provided quantifiable results (Digital Curation Centre, 2011). Dobratz, Schoger, and Strathmann (2006) created nestor, the Network of Expertise in Long-Term Storage of Digital Resources. Other lesser-known researchers such as Becker, et al. (2009) described a decision-making procedure for preservation planning that provides a means for repository administrators to consider various alternatives.
This section will examine the audit and certification method known as the “Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist” and its follow-up document, the “Audit and Certification of Trustworthy Digital Repositories Recommended Practice”. Researchers and practitioners across the globe, including Ross, McHugh, Dobratz, and others, combined their efforts and contributed their expertise to developing TRAC from a draft into a final version (Research Libraries Group, 2005; Dale, 2007). Their efforts led to the development and refinement of TRAC into a CCSDS “Recommended Practice”; this may eventually become an ISO standard.
Trustworthy Repositories Audit & Certification: Criteria and Checklist
The authors of TRAC created it as part of a larger international effort to define an audit and certification process to ensure the longevity of digital objects. They defined a checklist that any repository manager could use to assess the trustworthiness of the repository. The checklist provided examples of the required evidence, but the list is illustrative rather than prescriptive; the authors did not try to list every possible type of example. It contained three sections: “organizational infrastructure”, “digital object management”, and “technologies, technical infrastructure, and security”.
The authors provided a spreadsheet-style “audit checklist” called “Criteria for Measuring Trustworthiness of Digital Repositories and Archives”. They noted that the criteria measured are applicable to any kind of repository, using documentation (evidence), transparency (both internal and external), adequacy (individual context), and measurability (i.e., objective controls). The authors stated that a full certification process must include not just an external audit, but also tools to allow for self-examination and planning prior to an audit (OCLC & CRL, 2007). The terminology in the audit checklist conformed to the OAIS Reference Model.
A typical policy in TRAC followed the model of statement, explanation, and evidence (see Figure 1, below).
I. Organizational Infrastructure
The authors of TRAC considered the organizational infrastructure to be as critical a component as the technical infrastructure (OCLC & CRL, 2007). This reflected the view of the authors of the OAIS Reference Model, who consider an OAIS to be “an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community” (CCSDS, 2002). OCLC & CRL (2007) considered “organizational attributes” to be a characteristic of a trusted digital repository, and these characteristics are reflected in RLG’s (2002) grouping of financial sustainability, organizational viability, procedural accountability, and administrative responsibility as four of the seven attributes of a trusted digital repository.
The authors of TRAC considered elements such as the following to be part of organizational infrastructure, but did not limit it to only these elements.
Mandate or purpose
Roles and responsibilities
Financial issues, including assets
Contracts, licenses, and liabilities
Transparency (OCLC & CRL, 2007).
In addition, they grouped the above elements into five areas:
Governance and organizational viability: the owners and managers of a repository must commit to established best practices and standards for the long term. This includes mission statements, and succession/contingency plans.
Organizational structure and staffing: the repository owners and managers must commit to hiring an appropriate number of qualified staff that receives regular ongoing professional development.
Procedural accountability and policy framework: the repository owners and managers must provide transparency with regards to documentation related to the long-term preservation of, and access to, the archival data. This requirement provides evidence to stakeholders of the repository’s trustworthiness. This documentation may define the Designated Community, the policies and procedures in place, legal requirements and obligations, reviews, feedback, self-assessment, provenance and integrity, and operations and management.
Financial sustainability: the repository owners and administrators must follow solid business practices that provide for the long-term sustainability of the organization and the digital archive. This includes business plans, annual reviews, financial audits, risk management, and possible funding gaps.
Contracts, licenses, and liabilities: the repository owners and administrators must make contracts and licenses “available for audits so that liabilities and risks may be evaluated”. This requirement includes deposit agreements, licenses, preservation rights, collection maintenance agreements, intellectual property and copyright, and, ingest (OCLC & CRL, 2007).
II. Digital Object Management
The authors described this section as a combination of technical and organizational aspects. They organized the requirements for this section to align with six of the seven OAIS Functional Entities: Ingest, Archival Storage, Preservation Planning, Data Management, Administration, and Access (OCLC & CRL, 2007; CCSDS, 2002). The authors of the TRAC audit & checklist defined these six sections as follows.
The initial phase of ingest that addresses acquisition of digital content.
The final phase of ingest that places the acquired digital content into the forms, often referred to as Archival Information Packages (AIPs), used by the repository for long-term preservation.
Current, sound, and documented preservation strategies along with mechanisms to keep them up to date in the face of changing technical environments.
Minimal conditions for performing long-term preservation of AIPs.
Minimal-level metadata to allow digital objects to be located and managed within the system.
The repository’s ability to produce and disseminate accurate, authentic versions of the digital objects (OCLC & CRL, 2007).
The authors further elucidated the above areas as follows.
Ingest: acquisition of content
This section covered the process required to acquire content; this generally falls under the realm of a Submission Agreement between the Producer and the repository. The Producer may be external or internal to the repository’s governing organization. The authors recommended considering the object’s properties; any information that needs to be associated with the submitted object(s); mechanisms to authenticate the materials; verification of each ingested object for integrity; maintaining control of the bits so that none may be altered at any time; regular contact with the Producer as appropriate; a formal acceptance process with the Producer for all content; and an audit trail of the Ingest process.
Ingest: creation of the archival package
The actions in this section covered the creation of an AIP. These actions involved documentation: of each AIP preserved by the repository; that each AIP created is actually adequate for preservation purposes; of the process of constructing an AIP from a SIP; of the actions performed on each SIP (deletion or creation as an AIP); of the use of persistent and unique naming schemas/identifiers, else, of the preservation of the existing unique naming schema; of the context for each AIP; of an audit trail of the metadata records ingested; of associated preservation metadata; of testing the ability of current tools to render the information content; of the verification of completeness of each AIP; of an integrity audit mechanism for the content; and, of any actions and process related to AIP creation.
Preservation planning
The authors recommended four simple actions a repository administrator may take to keep the archive current. The administrator must document the current preservation strategies; monitor format and other forms of obsolescence; adjust the preservation plan if or when conditions change; and provide evidence that the preservation plan used is actually effective.
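As a toy illustration of the obsolescence-monitoring step, the following Python sketch scans hypothetical AIP records for file formats that an assumed preservation plan has flagged as at risk. The data shapes and the at-risk set are invented for this example; a real implementation might consult a format registry such as PRONOM.

```python
def monitor_obsolescence(aips, at_risk_formats):
    """For each AIP (mapping of AIP id -> list of (filename, format) pairs),
    list the files whose formats the preservation plan flags as at risk.
    Returns a report mapping AIP ids to the flagged filenames."""
    report = {}
    for aip_id, files in aips.items():
        flagged = [name for name, fmt in files if fmt in at_risk_formats]
        if flagged:
            report[aip_id] = flagged
    return report
```

The report produced by a scan like this is the kind of evidence an auditor could accept that monitoring is actually happening, rather than merely being documented.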
Archival storage & preservation/maintenance of AIPs
The actions in this section covered what is required to ensure that an AIP is actually being preserved. This involved examining multiple aspects of object maintenance, including, but not limited to, storage, tracking, checksums, migration, transformations, and copies/replicas. The repository administrator must be able to demonstrate the use of standard preservation strategies; that the repository actually implements these strategies; that the Content Information is preserved; that the integrity of the AIP is audited; and that there is an audit trail of any actions performed on an AIP.
Information management
This section addressed the requirements related to descriptive metadata. The repository owner must identify the minimal metadata required for retrieval by the Designated Community; create that minimal descriptive metadata and attach it to the described object; and prove there is referential integrity between each AIP and its associated metadata, for both creation and maintenance.
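The referential-integrity requirement can be illustrated with a small Python sketch; the data shapes and function name here are hypothetical.

```python
def check_referential_integrity(aips, metadata_records):
    """Verify that every AIP has a metadata record and every metadata
    record points at an existing AIP. `aips` maps AIP ids to objects;
    each metadata record is a dict with an 'aip_id' key. Returns a pair
    (orphan_aips, dangling_records) of sorted id lists."""
    aip_ids = set(aips)
    described = {record["aip_id"] for record in metadata_records}
    orphan_aips = sorted(aip_ids - described)        # AIPs with no metadata
    dangling = sorted(described - aip_ids)           # metadata with no AIP
    return orphan_aips, dangling
```

An empty result in both lists is the evidence an auditor would look for: every stored object is described, and every description resolves to a stored object.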
Access management
The authors designed this section to address methods for providing access to the content (i.e., DIPs) in the repository to the Designated Community; they wrote that the degree of sophistication would vary based on the context of the repository itself and the requirements of the Designated Community. They further subdivided this section into four areas: access conditions and actions, access security, access functionality, and provenance. In order to fulfill the requirements presented in this section, a repository owner must: provide information to the Designated Community as to what access and delivery options are actually available; require an audit of all access actions; only provide access to particular Designated Community members as agreed with the Producer; ensure access policies are documented and comply with deposit agreements; fully implement the stated access policy; log all access failures; demonstrate that the DIP generated is what the user requested; prove that access success or failure is made known to the user within a reasonable length of time; and demonstrate that all DIPs generated can be traced to an authentic original and are themselves authentic (OCLC & CRL, 2007).
In summary, OCLC & CRL (2007) designed this section to make it mandatory for a trustworthy digital repository to be able to produce a DIP, “however primitive”.
III. Technologies, Technical Infrastructure, and Security
The authors of TRAC did not want to impose specific software and hardware requirements, as many of these fall under standard computer science best practices and are covered by other standards. Therefore, they addressed general information technology areas as related to digital preservation. These areas fall under one of three categories: system infrastructure, appropriate technologies, and security (OCLC & CRL, 2007).
System infrastructure
This section addressed the basic infrastructure required to ensure the trustworthiness of any actions performed on an AIP. This meant that the repository administrator must be able to demonstrate that the operating systems and other core software are maintained and updated; the software and hardware are adequate to provide backups; the number and location of all digital objects, including duplicates, are managed; all known copies are synchronized; audit mechanisms are in place to discover bit-level changes; any such bit-level changes are reported to management, including the steps taken to prevent further loss and to repair or replace the corrupted data; processes are in place for hardware and software changes (e.g., migration); a change management process is in place to mitigate changes to critical processes; there is a process for testing the effect of critical changes prior to an actual implementation; and software security updates are implemented with an awareness of the risks versus benefits of doing so.
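The copy-synchronization and bit-level audit requirements above might be sketched as follows. This toy example simply takes the majority checksum among replicas as the reference, which is an assumption made for illustration, not a TRAC requirement; a real repository would compare against the fixity value recorded at ingest.

```python
import hashlib

def audit_replicas(replicas):
    """Given {location: bytes} copies of the same object, report the
    locations whose contents disagree with the majority checksum,
    i.e. a simple bit-level audit across known copies."""
    digests = {loc: hashlib.sha256(data).hexdigest()
               for loc, data in replicas.items()}
    # Tally digests and take the most common one as the reference copy.
    counts = {}
    for d in digests.values():
        counts[d] = counts.get(d, 0) + 1
    reference = max(counts, key=counts.get)
    return sorted(loc for loc, d in digests.items() if d != reference)
```

A non-empty result would then trigger the reporting and repair steps the checklist requires: notify management, replace the corrupted copy from a good replica, and log the action.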
Appropriate technologies
The authors recommended that a repository administrator look to the Designated Community for relevant standards and strategies. They proposed that the hardware and software technologies in place be appropriate for the Designated Community, and that monitoring be in place to update hardware and software as needed.
Security
This section addressed non-IT security as well as IT security. The authors recommended that a repository administrator conduct a regular risk assessment of internal and external threats; ensure controls are in place to address any assessed threats; decide which staff members are authorized to do what and when; and have an appropriate disaster preparedness plan in place, including off-site copies of the recovery plan (OCLC & CRL, 2007).
In conclusion, the archivists, librarians, computer scientists, and other experts who contributed to the development of TRAC created a document that encompassed the minimum requirements for an OAIS Archive to be considered “trustworthy”.
Audit and Certification of Trustworthy Digital Repositories Recommended Practice
The CCSDS released the “Audit and Certification of Trustworthy Digital Repositories Recommended Practice” (v. CCSDS 652.0-M-1, the “Magenta Book”) in September 2011 (CCSDS, 2011). This section will discuss the Recommended Practice only with regards to its major differences from TRAC (OCLC & CRL, 2007), above, because the two documents are similar enough that repeating a description of each section would be redundant.
The CCSDS described the purpose of the Recommended Practice as that of providing the documentation “on which to base an audit and certification process for assessing the trustworthiness of digital repositories” (CCSDS, 2011). The essay “Managing Data: the Emergence & Development of Digital Curation & Digital Preservation Standards” contains an overview of this Recommended Practice. This section will cover areas not covered by the overview in that essay or earlier in this document.
The three major sections of the Recommended Practice are the same as for TRAC, except that the last section has been renamed: instead of “organizational infrastructure”, “digital object management”, and “technologies, technical infrastructure, & security”, the authors of the Recommended Practice renamed the last section “infrastructure and security risk management”. Within that technology section, the sub-sections were reduced from three to two: instead of “system infrastructure”, “appropriate technologies”, and “security”, the Recommended Practice contains sub-sections on “technical infrastructure risk management” and “security risk management”. The sub-sections for “organizational infrastructure” and “digital object management” remained the same. The CCSDS re-worded, re-organized, and expanded the content of the sub-sections, but the general ideas behind each section stayed in place. For example, Figure 2, below, is the Recommended Practice version of the same content shown for TRAC in Figure 1, above.
In short, the members of the CCSDS evolved and expanded the original TRAC checklist to create the Recommended Practice, but overall, the ideas in the original version have held up well during the four-year transition to a Recommended Standard.
Trusted Digital Repositories: Requirements for Certifiers
Both Waters & Garrett (1996) and RLG (2002) recommended the creation of a certification program for trusted digital repositories. As a result, librarians, archivists, computer scientists, and other experts and stakeholders in digital preservation developed the “Trustworthy Repositories Audit & Certification: Criteria and Checklist” to establish a common set of standards and terminology by which a repository may be certified. These experts and others then took TRAC, via the CCSDS, and created the “Audit and Certification of Trustworthy Digital Repositories (CCSDS 652.0-M-1) Recommended Practice”. As part of the process of creating this Recommended Practice, these experts also determined the requirements for bodies that will provide the audit and certification of “candidate” trustworthy digital repositories.
They created a second Recommended Practice, “Requirements for bodies providing audit and certification of candidate trustworthy digital repositories CCSDS 652.1-M-1”. This Recommended Practice for bodies providing audit and certification is a supplement to an existing ISO Standard that outlines the requirements for a body performing audit and certification, “Conformity assessment — Requirements for bodies providing audit and certification of management systems” (ISO/IEC 17021, 2011).
ISO/IEC 17021 Conformity Assessment
The authors of this standard covered seven primary areas: principles, general requirements, structural requirements, resource requirements, information requirements, process requirements, and management system requirements for certification bodies. They defined “principles” as covering impartiality, competence, responsibility, openness, confidentiality, and responsiveness to complaints. They described “general requirements” as covering legal and contractual matters, management of impartiality, and liability and financing. They kept “structural requirements” simple: this area covers the organizational structure and top management, and a committee for safeguarding impartiality.
The authors detailed “resource requirements” as covering the competence of management and personnel, the personnel involved in the certification activities, the use of individual auditors and external technical experts, personnel records, and outsourcing. They outlined “information requirements” as covering publicly accessible information, certification documents, the directory of certified clients, references to certification and the use of marks, confidentiality, and the information exchange between a certification body and its clients. The authors delineated “process requirements” as covering general requirements, audit and certification, surveillance activities, recertification, special audits, the suspension, withdrawal, or reduction of the scope of certification, appeals, complaints, and the records of applicants and clients.
Finally, the authors provided options for “management system requirements for certification bodies”: general management system requirements, and management system requirements in accordance with ISO 9001. In the document’s appendices, the authors covered the knowledge and skills required of an auditor, possible evaluation methods, an example process flow for determining and maintaining competence, desired personal behaviors, the requirements for a third-party audit and certification process, and considerations for the audit programme, scope, or plan (ISO/IEC 17021, 2011).
Requirements for Bodies Providing Audit and Certification of Candidate Trustworthy Digital Repositories Recommended Practice
This section of this essay will address the areas in which the Recommended Practice for bodies providing audit and certification differs from “ISO/IEC 17021 Conformity Assessment”.
The CCSDS created the Recommended Practice, “Requirements for bodies providing audit and certification of candidate trustworthy digital repositories”, as a supplement to “Conformity assessment — Requirements for bodies providing audit and certification of management systems” (ISO/IEC 17021, 2011). They created the document to provide the additional criteria on which an organization assessing a digital repository for certification as trustworthy may base its operations when issuing such a certification (CCSDS, 2011). In other words, the CCSDS (2011) created the document to support the accreditation of bodies providing certification. A secondary purpose was to provide repository owners with documentation by which they may understand the processes involved in achieving certification. They wrote the document using terminology from the OAIS Reference Model.
The authors defined a “Primary Trustworthy Digital Repository Authorisation Body” (PTAB) as an organization that accredits training courses for auditors, accredits other certification bodies, and provides audit and certification of candidate trustworthy digital repositories. The membership consists of “internationally recognized experts in digital preservation” (CCSDS, 2011). They defined the primary tasks of the organization as: accrediting other trustworthy digital repository certification bodies; certifying auditors; making certification decisions; accrediting auditor qualifications; undertaking audits; and, last, having a mechanism to add new experts to PTAB as needed. They noted that PTAB will also be accredited by ISO and will become a member of the International Accreditation Forum (IAF). To address possible conflicts of interest, the authors designated two activities that are not considered conflicts for members who serve as certifiers: lecturing, including in training courses, and identifying areas for improvement during the course of an audit (CCSDS, 2011).
The CCSDS outlined the criteria for the training of audit team members. This training must include: an understanding of digital preservation, including the technical aspects related to the audited activity; an understanding of knowledge management systems; a general knowledge of the regulatory requirements related to trustworthy digital repositories; an understanding of the basic principles of auditing, per ISO standards; an understanding of risk management and risk assessment with regard to digitally encoded information; and, finally, an understanding of the Recommended Practice, “Audit and Certification of Trustworthy Digital Repositories (CCSDS 652.0-M-1)”.
Furthermore, the authors specified that the audit team should have, or find, members with the technical knowledge appropriate to the scope of the digital repository certification, the necessary comprehension of any applicable regulatory requirements for that repository, and enough knowledge of the repository owner’s organization that an appropriate audit may be conducted. The CCSDS wrote that the audit team may be supplemented with additional technical expertise as needed. The authors also charged PTAB with selecting experts and auditors based on appropriate experience, competence, training, and qualifications, and with assessing their conduct and monitoring their performance (CCSDS, 2011).
The CCSDS outlined the required levels of work experience for a trusted digital repository auditor. They required these auditors to have completed five days of training via PTAB or an accredited agency; to have prior experience assessing trustworthiness, including participation in two certification audits for a total of 20 days; to have four years of workplace experience focused on digital preservation; to have remained current with digital preservation best practices and standards; and to have received certification from PTAB. The authors stipulated three additional requirements for audit team leaders. They must be able to communicate effectively in writing and orally; have previously served as an auditor on two completed trustworthy digital repository audits; and have the capability and knowledge to manage an audit certification process (CCSDS, 2011).
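The auditor requirements above amount to a small checklist of thresholds, which can be sketched as a minimal eligibility check. The numeric thresholds are the ones stated in the Recommended Practice as summarized here; the record type, field names, and function are hypothetical illustrations, not part of the CCSDS document:

```python
from dataclasses import dataclass

@dataclass
class AuditorRecord:
    """Hypothetical record of a candidate auditor's qualifications.
    Thresholds follow CCSDS (2011) as summarized in the essay;
    the field names are invented for this sketch."""
    training_days: int        # days of PTAB (or accredited) training completed
    audits_participated: int  # prior trustworthiness certification audits
    audit_days: int           # total days spent on those audits
    preservation_years: int   # workplace experience in digital preservation
    current_in_field: bool    # up to date with best practices and standards
    ptab_certified: bool      # holds certification from PTAB

def meets_auditor_requirements(r: AuditorRecord) -> bool:
    """Check each stated threshold: 5 training days, 2 audits over
    20 days, 4 years of experience, currency, and PTAB certification."""
    return (r.training_days >= 5
            and r.audits_participated >= 2
            and r.audit_days >= 20
            and r.preservation_years >= 4
            and r.current_in_field
            and r.ptab_certified)
```

For example, a candidate with five training days, two audits over twenty days, four years of experience, currency in the field, and PTAB certification would pass; dropping any one criterion would fail the check.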
The authors outlined additional recommendations, including a requirement that the auditor must have access to the client organization’s records; if these records cannot be accessed, the audit may not be possible. The CCSDS defined the criteria against which an audit is performed as those in the Recommended Practice, “Audit and Certification of Trustworthy Digital Repositories (CCSDS 652.0-M-1)”. They required two auditors to be present on site; other auditors may work remotely. In an appendix on security, the authors noted that all auditors must maintain confidentiality with respect to an organization’s systems, content, structure, data, etc., as required (CCSDS, 2011).
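The on-site staffing rule above can likewise be expressed as a one-line check. Only the two-auditors-on-site requirement comes from CCSDS (2011); the data model is a hypothetical sketch:

```python
from dataclasses import dataclass

@dataclass
class Auditor:
    """Illustrative model of an audit team member."""
    name: str
    on_site: bool  # True if physically present at the repository

def audit_team_meets_site_rule(team: list[Auditor]) -> bool:
    """CCSDS (2011) requires at least two auditors on site;
    the remaining auditors may work remotely."""
    return sum(1 for a in team if a.on_site) >= 2
```

A team of two on-site auditors plus any number of remote colleagues satisfies the rule; a team with only one person on site does not, however large it is.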
In conclusion, the CCSDS has created a method for a larger umbrella organization — PTAB — to certify the certifiers of a trusted digital repository by creating a “Recommended Practice for bodies providing audit and certification” as a supplement to the existing ISO/IEC standard for “Conformity assessment — Requirements for bodies providing audit and certification of management systems”. By creating both a certification program and the criteria for certification of trustworthiness, these experts believe they have ensured the availability of digital information over the indefinite long-term.
Trusted Digital Repositories: Criticisms
Gladney (2005; 2004) has been a vocal critic of the repository-centric approach to digital preservation, which he considers “unworkable”. He has proposed, instead, the creation of durable digital objects that encode all required preservation information within the digital object itself. R. Moore has reservations about the “top-down” approach, in which standards are handed down from a body of experts to practitioners. He would like to know what policies preservation data grid administrators are actually implementing at the machine level (Ward, 2011).
Similar to R. Moore’s concerns, Thibodeau (2007) supports the development of standards for digital preservation, but he believes these standards should be supplemented by empirical data regarding the purpose of each repository. For example, practitioners should not assess a repository based solely on whether or not the repository is OAIS-compliant. He writes that practitioners should consider the purpose of the repository, its mission, and its user base, and whether or not the repository’s owners are fulfilling those requirements. Thibodeau (2007) defined a five-point framework for repository evaluation that considers service, collaboration, “state”, orientation, and coverage. He believes that this broader context, along with the OAIS Reference Model and the Recommended Practice for the Audit and Certification of Trustworthy Repositories, provides a more realistic measure of a repository’s “success” or “failure”.
Archivists, librarians, computer scientists and other stakeholders and experts in digital preservation wanted to create certification standards for trustworthy digital repositories, and they voiced this desire in a 1996 report, “Preserving Digital Information” (Waters & Garrett, 1996). As one part of this enthusiasm for standards, the CCSDS released the OAIS Reference Model (CCSDS, 2002). Experts recognized that a technical framework was only part of a preservation repository, and so they worked to define the attributes and responsibilities of a trusted digital repository (RLG, 2002). They created an audit and certification checklist based on these attributes and responsibilities, called TRAC (OCLC & CRL, 2007). After receiving feedback from the preservation community, the CCSDS evolved TRAC into the Recommended Practice for the Audit and Certification of Trustworthy Digital Repositories (2011), and released the Recommended Practice for Requirements for Bodies Providing Audit and Certification of Candidate Trustworthy Digital Repositories (2011).
Thus, after many years of work, stakeholders with an interest in the preservation of digital material now have criteria against which to judge whether or not a repository and its contents are likely to last for the indefinite long-term, as well as an umbrella organization that will provide certified and trained auditors. To reiterate these accomplishments: over the past fifteen years, preservation experts have defined “trust” and a “trustworthy” digital repository; defined the attributes and responsibilities of a trustworthy digital repository; defined the criteria and created a checklist for the audit and certification of a trustworthy digital repository; evolved these criteria into a standard; and defined a standard for bodies who wish to provide audit and certification to candidate trustworthy digital repositories.
The significance of these accomplishments cannot be overstated — at stake in the concerns over the preservation of digital objects and information are the cultural and scientific heritage, and personal information, of humanity.
Bearman, D. & Trant, J. (1998). Authenticity of digital resources: towards a statement of requirements in the research process. D-Lib Magazine. Retrieved April 14, 2009, from http://www.dlib.org/dlib/june98/06bearman.html
Becker, C., Kulovits, H., Guttenbrunner, M., Strodl, S., Rauber, A., & Hofman, H. (2009). Systematic planning for digital preservation: evaluating potential strategies and building preservation plans. International Journal of Digital Libraries, 10(4), 133-157.
CCSDS. (2011). Requirements for bodies providing audit and certification of candidate trustworthy digital repositories recommended practice (CCSDS 652.1-M-1). Magenta Book, November 2011. Washington, DC: National Aeronautics and Space Administration (NASA).
CCSDS. (2011). Audit and certification of trustworthy digital repositories recommended practice (CCSDS 652.0-M-1). Magenta Book, September 2011. Washington, DC: National Aeronautics and Space Administration (NASA).
CCSDS. (2002). Reference model for an Open Archival Information System (OAIS) (CCSDS 650.0-B-1). Washington, DC: National Aeronautics and Space Administration (NASA). Retrieved April 3, 2007, from http://nost.gsfc.nasa.gov/isoas/
Dale, R. (2007). Mapping of audit & certification criteria for CRL meeting (15-16 January 2007). Retrieved September 11, 2007, from http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/TRAC-Nestor-DCC-criteria_mapping.doc
Digital Curation Centre. (2011). DRAMBORA. Retrieved December 9, 2011, from http://www.dcc.ac.uk/resources/tools-and-applications/drambora
Dobratz, S., Schoger, A., & Strathmann, S. (2006). The nestor Catalogue of Criteria for Trusted Digital Repository Evaluation and Certification. Paper presented at the workshop on “digital curation & trusted repositories: seeking success”, held in conjunction with the ACM/IEEE Joint Conference on Digital Libraries, June 11-15, 2006, Chapel Hill, NC, USA. Retrieved December 1, 2011, from http://www.ils.unc.edu/tibbo/JCDL2006/Dobratz-JCDLWorkshop2006.pdf
Gladney, H.M. & Lorie, R.A. (2005). Trustworthy 100-Year digital objects: durable encoding for when it is too late to ask. ACM Transactions on Information Systems, 23(3), 229-324. Retrieved December 29, 2011, from http://eprints.erpanet.org/7/
Gladney, H.M. (2004). Trustworthy 100-Year digital objects: evidence after every witness is dead. ACM Transactions on Information Systems, 22(3), 406-436. Retrieved July 12, 2008, from http://doi.acm.org/10.1145/1010614.1010617
Hedstrom, M. (1995). Electronic archives: integrity and access in the network environment. American Archivist, 58(3), 312-324.
ISO/IEC 17021. (2011). Conformity assessment — Requirements for bodies providing audit and certification of management systems. Retrieved December 30, 2011, from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=56676
Jøsang, A. & Knapskog, S.J. (1998). A metric for trusted systems. In Proceedings of the 21st National Information Systems Security Conference (NISSC), October 6-9, 1998, Crystal City, Virginia. Retrieved December 27, 2011, from http://csrc.nist.gov/nissc/1998/proceedings/paperA2.pdf
Lee, C. (2010). Open archival information system (OAIS) reference model. In Encyclopedia of Library and Information Sciences, Third Edition. London: Taylor & Francis.
Lynch, C. (2001). When documents deceive: trust and provenance as new factors for information retrieval in a tangled web. Journal of the American Society for Information Science and Technology, 52(1), 12-17.
Lynch, C. (2000). Authenticity and integrity in the digital environment: an exploratory analysis of the central role of trust. Authenticity in a digital environment. Washington, DC: Council on Library and Information Resources. Retrieved April 14, 2009, from http://www.clir.org/pubs/reports/pub92/pub92.pdf
Lynch, C. A. (1994). The integrity of digital information: mechanics and definitional issues. Journal of the American Society for Information Science, 45(10), 737-744.
OCLC & CRL. (2007). Trustworthy repositories audit & certification: criteria and checklist version 1.0. Dublin, OH & Chicago, IL: OCLC & CRL. Retrieved September 11, 2007, from http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf
Research Libraries Group. (2005). An audit checklist for the certification of trusted digital repositories, draft for public comment. Mountain View, CA: Research Libraries Group. Retrieved April 14, 2009, from http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070511/viewer/file2416.pdf
Research Libraries Group. (2002). Trusted digital repositories: attributes and responsibilities an RLG-OCLC report. Mountain View, CA: Research Libraries Group. Retrieved September 11, 2007, from http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf
Ross, S. & McHugh, A. (2006). The role of evidence in establishing trust in repositories. D-Lib Magazine 12(7/8). Retrieved May 6, 2007, from http://www.dlib.org/dlib/july06/ross/07ross.html
Science and Technology Council. (2007). The digital dilemma strategic issues in archiving and accessing digital motion picture materials. The Science and Technology Council of the Academy of Motion Picture Arts and Sciences. Hollywood, CA: Academy of Motion Picture Arts and Sciences.
Thibodeau, K. (2007). If you build it, will it fly? Criteria for success in a digital repository. Journal of Digital Information, 8(2). Retrieved December 27, 2011, from http://journals.tdl.org/jodi/article/view/197/174
Trust. (2011). Merriam-Webster.com. Encyclopaedia Britannica Company. Retrieved December 30, 2011, from http://www.merriam-webster.com/dictionary/trust
Ward, J.H. (2011). Classifying Implemented Policies and Identifying Factors in Machine-Level Policy Sharing within the integrated Rule-Oriented Data System (iRODS). In Proceedings of the iRODS User Group Meeting 2011, February 17-18, 2011, Chapel Hill, NC.
Waters, D. & Garrett, J. (1996). Preserving Digital Information. Report of the Task Force on Archiving of Digital Information. Washington, DC: CLIR, May 1996.