Case Study

GenAI supports archivists in preserving document accessibility over time

Eng supported the Archival Hub of the Emilia-Romagna Region in evolving the process for assessing the interoperability of digital file formats, ensuring that preserved documents remain consistently readable over time.


Format interoperability refers to the ability of different systems and applications to exchange, read, and correctly interpret data, regardless of the format in which it was created. This concept is essential to ensure compatibility between different software solutions and to facilitate the long-term preservation and management of digital documents.


Where: Italy

The project successfully leveraged AI and Machine Learning to automate the evaluation of file formats. We developed a prototype format catalog and a workflow for calculating the interoperability index, both of which were presented at various conferences and generated significant interest due to the methodology and the future potential for creating more efficient, reproducible, and scalable digital preservation solutions. From understanding the complexities of file formats and existing registries to developing and refining the AI-based classification model, the project highlighted the importance of a multidisciplinary approach in tackling the innovative challenges of digital archiving.

Marianna Tascone, ParER, Polo archivistico dell'Emilia-Romagna
Challenge
The Archival Hub of the Emilia-Romagna Region identifies file formats when documents are uploaded to the archive and periodically reassesses their interoperability so that formats at risk of obsolescence can be addressed promptly. Domain experts carry out this process through the Format Registry, in accordance with current regulations from AgID (Agenzia per l'Italia Digitale, the Agency for Digital Italy), and it requires a significant investment of time and resources.
Approach
To effectively support archivists, the solution had to handle complex legal and technical language and incorporate domain-specific knowledge. EngGPT, our private generative AI large language model (LLM), classifies textual evaluations against AgID criteria and generates labels that combine evaluation data with classification parameters. Each analyzed text is assigned an individual score, and these scores contribute to the overall interoperability score.
Digital Ecosystem
Solution
We developed a quantitative evaluation process and applied it to the entire file format database of one of the most authoritative international sources in the field, the United States Library of Congress. The result is an extended Format Catalog, enriched with assessments of interoperability levels and based on an objective, reproducible evaluation process that complies with national AgID regulations. The solution supports archivists by providing a concise natural-language justification for each evaluation. The reliability and quality of the results are ensured by enriching the model with contextual information. In a second project phase, a consultation application was released alongside the Catalog to help clients navigate the evaluated formats.
Results

Quantitative evaluation with scores based on a reference value scale

Evaluations and explanations available in two languages (Italian and English)

Reliable, high-quality results

Full transparency of the model's decision-making process
