Statistical Method and Theory for Privacy and Fairness in Trustworthy Artificial Intelligence
Abstract
Project Summary
Approved for Public Release

Research problem and objective: Trustworthy AI problems now arise in a wide range of industries, mission critical or not. Examples include the financial and healthcare industries (medical institutions want to collaborate without concerns about data privacy), AI hiring (female job applicants are unfairly treated in video interviews), and autonomous driving (insufficient data from corner cases in training environments). It is therefore not surprising that governments have announced increasingly strict regulations on AI, such as the General Data Protection Regulation in the EU. The next generation of artificial intelligence should be driven by trustworthiness, beyond performance. This will lead to a paradigm shift in methodological and theoretical studies of AI.

Technical approaches: The theme of this proposal is to build trustworthy AI systems using either data-centric or algorithm-centric solutions. Our proposal consists of three projects exploring the privacy and fairness aspects of trustworthy AI. The first two projects deal with differential privacy (DP), achieved by either traditional algorithmic approaches or modern data-centric approaches, while the last project develops a theoretical benchmark for fair classification, together with user-friendly algorithms. Several statistical models are considered in this proposal, including linear regression, nonparametric regression/classification, and deep neural networks.

With the rapid development of machine learning, plentiful information can be predicted from massive data. Meanwhile, data privacy has drawn [...] framework of DP, we focus on an important but much less studied scenario in which datasets need to be partially privatized. In this scenario, conventional privacy-preserving approaches, such as noise injection and shuffling, no longer work. To this end, we propose a series of algorithmic solutions in Project 1.

To complement the algorithmic solutions, Project 2 considers protecting privacy with a data-centric approach, namely synthetic data generation. We will produce artificially created data sets that remove individual information but still retain statistical information similar to that of the raw data sets. Despite numerous synthesis algorithms, we still lack a theoretical understanding of how the generation of synthetic data affects the utility of downstream machine learning tasks. This motivates us to develop a statistical learning framework for the analysis of synthetic data.

Machine learning algorithms are widely integrated into high-stakes decision-making processes, such as job applications and criminal prediction. However, empirical studies have shown that most existing algorithms focus on performance, retaining or even amplifying the implicit unfairness in historical data. There are growing ethical concerns about machine learning algorithms, and official institutions and organizations advocate considering fairness in AI practice. The last project is devoted to establishing a theoretical benchmark for fair classification algorithms, based on which user-friendly, large-scale algorithms are developed with guaranteed statistical optimality.

Anticipated outcomes: We will develop user-friendly, publicly available software as a TensorFlow or PyTorch library. Its performance will be thoroughly evaluated on different real-world data sets. We will summarize our findings in publications.

Impact on DoD capacities: The U.S. Dept. o[...] privacy and civil liberties. The proposed research into the theoretical foundation of privacy and fairness in AI ensures a trusted AI ecosystem in ONR.
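
For reference, the noise injection that the abstract names as a conventional privacy-preserving approach can be sketched as follows. This is a minimal illustration of the standard Laplace mechanism applied to the mean of bounded values, not the project's own method. The function name `private_mean` and its parameters are hypothetical and chosen only for this example, and the sensitivity bound assumes neighboring data sets differ by replacing one record.

```python
import numpy as np


def private_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-differentially-private estimate of a bounded mean.

    Hypothetical helper for illustration only (not from the proposal).
    Assumes neighboring data sets differ by replacing a single record.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng

    # Clip every record into [lower, upper] so that one record's
    # influence on the mean is bounded.
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = clipped.size

    # Replacing one record changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n

    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 47, 38, 31]
    # Smaller epsilon means stronger privacy and a noisier released value.
    print(private_mean(ages, lower=18, upper=90, epsilon=1.0))
```

The mechanism perturbs every released statistic in the same way, which is the setting the abstract contrasts with the partially privatized scenario studied in Project 1.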
Document Details
- Document Type: DoD Grant Award
- Publication Date: Sep 08, 2022
- Source ID: N000142212680
Entities
People
- Guang Cheng
Organizations
- Office of Naval Research
- United States Navy
- University of California, Los Angeles