Reimagining Harvard Collections: AI-enhanced Discovery
In the spring of 2024, the Harvard Library launched the Reimagining Discovery project, an ambitious initiative to find new ways to surface and enable discovery of Harvard’s vast and distinctive special collections. The first phase focuses on developing a platform called Collections Explorer, which uses semantic search and generative AI to let the Harvard community and researchers worldwide explore Harvard’s collections through natural language queries. Key technical highlights of Collections Explorer are retrieval-augmented text generation using Amazon Bedrock and semantic search built on open-source components. The presentation will give an overview of the project's goals and objectives, followed by a technical deep-dive into the architecture with diagrams and demonstrations. It will showcase the technical components, including the index builder for generating semantic embeddings, the semantic search API, the LLM API, and the frontend user interface. It will also describe the data pipelines designed to collect data from Harvard’s extensive collections, which are spread across multiple data sources and come in a wide variety of formats with varying levels of consistency. Finally, the presentation will describe how technical design decisions were evaluated and will highlight the most significant challenges encountered and the solutions implemented.
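
For readers unfamiliar with the retrieve-then-generate pattern named above, the following is a minimal sketch of how semantic retrieval can feed a generation prompt. It is not the Collections Explorer implementation. The records, the function names, and the prompt wording are all hypothetical, and the `embed` function is a toy word-count stand-in for a real embedding model. The generation call itself (for example, to a model hosted on Amazon Bedrock) is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy records standing in for collection metadata.
RECORDS = [
    {"id": "rec-1", "title": "Letters on whaling voyages",
     "text": "Correspondence from sailors describing whaling voyages in the Pacific."},
    {"id": "rec-2", "title": "Botanical illustrations",
     "text": "Watercolor drawings of orchids and ferns collected on expeditions."},
    {"id": "rec-3", "title": "Ship logbooks",
     "text": "Daily logbooks kept by captains of whaling ships."},
]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(records):
    """Index builder step: pair each record with its precomputed vector."""
    return [(r, embed(r["title"] + " " + r["text"])) for r in records]


def search(index, query, k=2):
    """Search step: rank records by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [record for record, _ in ranked[:k]]


def build_prompt(query, hits):
    """Generation step input: ground the model in the retrieved records."""
    context = "\n".join(
        f"[{r['id']}] {r['title']}: {r['text']}" for r in hits
    )
    return (
        "Answer using only the records below and cite record ids.\n\n"
        f"Records:\n{context}\n\nQuestion: {query}\n"
    )


if __name__ == "__main__":
    index = build_index(RECORDS)
    hits = search(index, "whaling ships logbooks")
    print(build_prompt("What sources describe whaling ships?", hits))
```

In a real system the word-count vectors would be replaced by dense embeddings from a trained model, and the assembled prompt would be sent to a hosted LLM. The overall shape (index, retrieve, assemble context, generate) stays the same.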