KGTK: Knowledge Graph ToolKit
Abstract
KGTK is a comprehensive framework for the creation and exploitation of large knowledge graphs, designed for simplicity, scalability, and interoperability. Its key quality attributes are: 1) native support for reading and writing knowledge graphs in many formats; 2) extensibility, through seamless import and export to popular data science tools like Pandas and ElasticSearch; 3) modularity, i.e., a pipeline-friendly design for creating workflows with multiple components; 4) speed, i.e., performance comparable to that of SQL databases; and 5) scale, i.e., the ability to handle billions of statements, e.g., all of Wikidata on a laptop. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. KGTK has dozens of commands, covering a wide range of imports and exports to popular formats, highly scalable querying and storage functionality, a rich and diverse suite of transformation functions, and modern analytics powered by machine learning and graph algorithms. KGTK provides several services, namely a text search interface, a user-friendly browser, a similarity interface, and a SPARQL endpoint. All KGTK services can be readily adapted to arbitrary graphs, thus closing the loop and enabling users to apply KGTK seamlessly to their own data in a variety of formats.
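The idea of representing a graph as a table can be illustrated with a minimal sketch in plain Python. It does not use KGTK itself: the column names `node1`, `label`, and `node2` are assumed here to mirror a tab-separated edge table, and the sample edges and the helper functions `load_edges` and `objects_of` are hypothetical, introduced only for illustration.

```python
import csv
import io

# Hypothetical three-edge graph stored as a tab-separated edge table.
# The column names (node1, label, node2) are an assumption for this sketch.
EDGES_TSV = (
    "node1\tlabel\tnode2\n"
    "alice\tknows\tbob\n"
    "alice\tworks_at\tacme\n"
    "bob\tknows\tcarol\n"
)


def load_edges(text):
    """Parse a tab-separated edge table into a list of dicts, one per edge."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def objects_of(edges, subject, predicate):
    """Return the node2 values of edges whose node1 and label match."""
    return [
        edge["node2"]
        for edge in edges
        if edge["node1"] == subject and edge["label"] == predicate
    ]


edges = load_edges(EDGES_TSV)
print(objects_of(edges, "alice", "knows"))
```

Because each edge is one row, ordinary tabular tooling (filters, joins, sorts) can serve as graph operations, which is the kind of design the abstract describes.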
Document Details
- Document Type: Technical Report
- Publication Date: Jun 26, 2023
- Accession Number: AD1204361
Entities
People
- Filip Ilievski
Organizations
- University of Southern California