Protecting Databases from Malicious Discovery through Automated Similarity Queries
Abstract
Companies, hospitals, and research laboratories in certain domains have developed extensive databases, such as clinical databases, as part of their research or daily activities. The entities that have developed these databases may wish to lease parts of a database to external users or otherwise allow them to use it. Because of the significant time and money invested in developing these databases, and the proprietary or private nature of the data itself, they may not want to sell or grant access to the entire database. However, we show that such databases are vulnerable to reverse engineering through popularly employed similarity-based queries. We identify important security issues related to k-nearest-neighbor (k-NN) search and investigate the vulnerability of such databases to users who try to copy them by sending automated queries. We analyze two models of similarity search, the reply model and the score model. The reply model responds with the k tuples that most closely match the query according to some metric, whereas the score model responds with only the score of the similarity search, which gives it more power to preserve privacy. For each model we analyze possible attack methodologies and develop strategies for detecting potential attacks. We state the limits of the protection provided by each query response model and also provide techniques to guard the database against malicious discovery.
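To make the two query response models concrete, the following is a minimal sketch under assumptions that are not taken from the report: tuples are 2-D points, similarity is Euclidean distance, the "score" returned by the score model is taken to be the list of distances to the k nearest tuples, and the attacker simply issues uniformly random automated queries. The names `SimilarityServer`, `reply_model`, `score_model`, and `crawl_reply_model` are hypothetical and illustrate the idea only; this is not the authors' method or their attack analysis.

```python
import math
import random


class SimilarityServer:
    """Hypothetical k-NN query interface over a toy database of 2-D points."""

    def __init__(self, tuples, k):
        self.tuples = list(tuples)
        self.k = k

    def _knn(self, query):
        # Assumption: similarity is Euclidean distance (smaller = more similar).
        return sorted(self.tuples, key=lambda t: math.dist(t, query))[: self.k]

    def reply_model(self, query):
        """Reply model: return the k tuples that most closely match the query."""
        return self._knn(query)

    def score_model(self, query):
        """Score model (assumed form): return only the distances of the k
        nearest tuples, never the tuples themselves."""
        return [math.dist(t, query) for t in self._knn(query)]


def crawl_reply_model(server, n_queries, seed=0):
    """Naive automated attack: send random queries to the reply model and
    accumulate every distinct tuple it returns."""
    rng = random.Random(seed)
    copied = set()
    for _ in range(n_queries):
        query = (rng.random(), rng.random())
        copied.update(server.reply_model(query))
    return copied


# Toy database of 200 random points in the unit square (illustrative data only).
data_rng = random.Random(42)
database = [(data_rng.random(), data_rng.random()) for _ in range(200)]
server = SimilarityServer(database, k=5)

copied = crawl_reply_model(server, n_queries=2000)
print(f"copied {len(copied)} of {len(database)} tuples via the reply model")
print("score model answer:", server.score_model((0.5, 0.5)))
```

Under the reply model every answer hands the attacker real tuples, so the copied set can only grow with the number of automated queries; the score model withholds the tuples themselves, which is consistent with the report's claim that it preserves privacy better. The report also analyzes attacks against the score model and ways to detect them, which this sketch does not attempt.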
Document Details
- Document Type: Technical Report
- Publication Date: Feb 01, 2006
- Accession Number: AD1001208
Entities
People
- Ali S. Tosun
- Fatih Altiparmak
- Hakan Ferhatosmanoglu
Organizations
- Ohio State University