Text Embeddings for Entity Resolution and Name Disambiguation in the Library Catalog
Library catalogs have long struggled with entity resolution because authority control still relies on legacy string-based workflows. Standards that suited printed card catalogs have become a significant source of technical debt in the digital era. For instance, current systems for automated authority control fail to disambiguate a common name shared by several distinct individuals. Traditional full-text search platforms may partially mask this problem for users, but it becomes more pronounced in linked data discovery environments, where accurate entity resolution is crucial. To address these persistent challenges, this work proposes a novel approach that uses text embeddings, vector databases, and machine learning. The method clusters identities by weighted semantic similarity rather than by simple string matching. It offers a scalable solution that can reduce technical debt and support effective identity management as libraries shift toward more dynamic, interconnected data ecosystems.
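As a rough illustration of clustering by weighted semantic similarity, the following minimal Python sketch compares records field by field and groups those whose combined score passes a threshold. It is not the implementation described in this work. The field names (`name`, `context`), the weights, the threshold, and the toy vectors are all assumptions made for the example; in practice the vectors would come from an embedding model and be stored in a vector database.

```python
import math
from itertools import combinations

# Hypothetical per-field weights: contextual metadata (titles, subjects)
# counts for more than the name string itself. Values are illustrative only.
WEIGHTS = {"name": 0.3, "context": 0.7}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_similarity(a, b, weights=WEIGHTS):
    """Weighted average of per-field cosine similarities."""
    total = sum(weights.values())
    return sum(w * cosine(a[f], b[f]) for f, w in weights.items()) / total


def cluster(records, threshold=0.8):
    """Group record indices whose weighted similarity meets the threshold.

    Uses union-find, so clusters are the connected components of the
    "similar enough" graph (single-linkage behavior).
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if weighted_similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for embeddings: three records share an identical name
# vector, but only the first two have similar contextual vectors.
records = [
    {"name": [1.0, 0.0], "context": [0.9, 0.1, 0.0]},
    {"name": [1.0, 0.0], "context": [0.8, 0.2, 0.0]},
    {"name": [1.0, 0.0], "context": [0.0, 0.1, 0.9]},
]

clusters = cluster(records)
```

With these toy vectors, the first two records fall into one cluster and the third stands alone, even though all three names are identical. Exact string matching on the name would have merged all three.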