"Big Data Privacy for Machine Learning Just Got 100 Times Cheaper"
Computer scientists at Rice University have discovered a method that cuts the cost for companies to implement a rigorous form of data privacy, called differential privacy, when using or sharing large databases for machine learning (ML). ML could benefit society in many ways if data privacy is ensured. If ML systems are trained to search for patterns in large databases of medical or financial records, there is significant potential for improving medical care or identifying patterns of discrimination. However, that is not possible today because current data privacy methods do not scale. To address this, the Rice University researchers proposed using a technique called locality-sensitive hashing, which they found could create a small summary of a large database of sensitive records. They dubbed the method RACE, a name drawn from the summaries themselves, which are "repeated array of count estimators" sketches. According to the researchers, RACE sketches are safe to make publicly available. The sketches are also useful for algorithms that rely on kernel sums, which are fundamental to ML, and for ML programs that perform classification, ranking, and other common tasks. Companies can use RACE to benefit from large-scale, distributed ML while maintaining differential privacy. This article continues to discuss the new method that slashes the cost of implementing differential privacy.
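To make the idea of a "repeated array of count estimators" more concrete, the following is a minimal toy sketch in Python. It is not the researchers' code. The class name RaceSketch, its parameters (rows, bits, seed), the use of signed random projections as the locality-sensitive hash, and the Laplace noise scale of rows/epsilon are all assumptions made for illustration. The noise scale follows the standard Laplace mechanism, on the reasoning that one record changes one counter in each row.

```python
import random


class RaceSketch:
    """Toy RACE-style sketch (hypothetical names): `rows` independent LSH
    functions, each indexing into its own array of 2**bits counters."""

    def __init__(self, dim, rows=50, bits=3, seed=0):
        self.rng = random.Random(seed)
        self.rows = rows
        # One set of `bits` random hyperplanes per row (signed random projections).
        self.planes = [
            [[self.rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        self.counts = [[0.0] * (2 ** bits) for _ in range(rows)]

    def _bucket(self, row, x):
        # Concatenate the sign bits of the projections into a bucket index.
        index = 0
        for plane in self.planes[row]:
            dot = sum(p * v for p, v in zip(plane, x))
            index = (index << 1) | (1 if dot >= 0 else 0)
        return index

    def add(self, x):
        # Each record increments exactly one counter per row.
        for r in range(self.rows):
            self.counts[r][self._bucket(r, x)] += 1.0

    def privatize(self, epsilon):
        # Assumption: one record changes `rows` counters by 1 each, so the
        # L1 sensitivity is `rows` and the Laplace scale is rows / epsilon.
        scale = self.rows / epsilon
        for row in self.counts:
            for j in range(len(row)):
                # The difference of two exponentials is Laplace-distributed.
                row[j] += (self.rng.expovariate(1 / scale)
                           - self.rng.expovariate(1 / scale))

    def query(self, q):
        # Average the counters that q hashes to: an estimate of the kernel
        # sum over all records, where the kernel is the LSH collision probability.
        total = sum(self.counts[r][self._bucket(r, q)] for r in range(self.rows))
        return total / self.rows


# Usage: summarize 100 identical records, then release a noisy copy.
sketch = RaceSketch(dim=2)
for _ in range(100):
    sketch.add([1.0, 0.5])
exact_estimate = sketch.query([1.0, 0.5])   # before any noise is added
sketch.privatize(epsilon=1.0)
noisy_estimate = sketch.query([1.0, 0.5])   # what a public sketch would return
```

In this toy version the sketch size depends only on rows and bits, not on the number of records, which is the property that makes such a summary small. Only the noisy counters would be shared. A real deployment would need a careful privacy analysis and tuned parameters.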
Rice University reports "Big Data Privacy for Machine Learning Just Got 100 Times Cheaper"