Text Embeddings for Entity Resolution and Name Disambiguation in the Library Catalog

Library catalogs have long grappled with entity resolution, hindered by legacy string-based workflows for authority control. Standards that suited printed card catalogs have become a significant source of technical debt in the digital era. For instance, current systems for automated authority control fail to disambiguate a common name that is shared by several distinct people. Traditional full-text search platforms may partially mask this problem for users, but it becomes pronounced in linked data discovery environments, where accurate entity resolution is crucial. To address these persistent challenges, this talk proposes a novel approach that uses text embeddings, vector databases, and machine learning to cluster identities by weighted semantic similarity rather than by simple string matching. The approach offers a scalable way to reduce technical debt and support effective identity management as libraries shift toward more dynamic, interconnected data ecosystems.
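The idea of clustering by weighted semantic similarity rather than string matching can be illustrated with a minimal sketch. It is not the speakers' implementation. It assumes that each catalog record has already been turned into one embedding per field (here a hypothetical "name" field and a hypothetical "context" field, such as title or subject text). It also assumes that the per-field weights and the merge threshold are tuning choices. The toy 3-dimensional vectors stand in for real model output, and a simple threshold-plus-union-find grouping stands in for whatever clustering or vector-database lookup the actual method uses.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def weighted_similarity(rec_a, rec_b, weights):
    """Weighted average of per-field cosine similarities (weights are assumptions)."""
    total = sum(weights.values())
    return sum(w * cosine(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def cluster_identities(records, weights, threshold):
    """Group record indices whose pairwise weighted similarity meets the threshold.

    Uses union-find, so clusters are transitive: if A~B and B~C, then A, B, C merge.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if weighted_similarity(records[i], records[j], weights) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy records: three headings with the identical string "Smith, John".
# Name embeddings match exactly; context embeddings (toy vectors) separate two people.
RECORDS = [
    {"name": [1.0, 0.0, 0.0], "context": [1.0, 0.0, 0.0]},  # chemistry work
    {"name": [1.0, 0.0, 0.0], "context": [0.9, 0.1, 0.0]},  # chemistry work
    {"name": [1.0, 0.0, 0.0], "context": [0.0, 0.0, 1.0]},  # poetry work
]
WEIGHTS = {"name": 0.3, "context": 0.7}  # assumed weighting favoring context

print(cluster_identities(RECORDS, WEIGHTS, threshold=0.9))
```

With these toy inputs, the two chemistry records score about 0.996 and merge, while each pairing with the poetry record scores only 0.3 (name agreement alone). The result is two identities where exact string matching would have produced one. Lowering the threshold to 0.2 collapses all three records into one cluster, which shows how much the weights and threshold drive the outcome.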

Speaker(s)

Tim Thompson

Gavin Mendel-Gleason

March 10th

11:35 AM

15 minutes