A recent study from Berkeley shows that machine learning can be used to connect anonymized health data from different sources, potentially identifying the people in that data and violating federal HIPAA regulations. This specific study took physical activity data from health monitors and smartwatches, along with basic information employers might have access to and readily available data from health organizations, fed it all to a machine learning algorithm, and got out new data that was more than the sum of its parts. Unfortunately, the study looks like the tip of an iceberg: the researchers say there are other dangerous scenarios in which a machine learning algorithm could be used, and they suggest that healthcare privacy laws need reform soon. Thanks to TechXplore for spotting the study.
Using large national physical activity data sets, we found that machine learning successfully reidentified the physical activity data of most children and adults when using 20-minute data with several pieces of demographic information. Partial aggregation of the data over time (eg, reidentifying daily-level physical activity data) did not significantly reduce the accuracy of the reidentification. These results suggest that current practices for deidentification of PAM data might be insufficient to ensure privacy and that there is a need for deidentification that aggregates the physical activity data of multiple individuals to ensure privacy for single individuals.
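For anyone wondering how "anonymized" activity data gets linked back to a person, here's a toy sketch of the general idea. It is not the study's method: it uses made-up synthetic data and plain nearest-neighbour matching within demographic groups instead of the researchers' actual machine learning models. Every function name and parameter here (`make_person`, `observe`, `coarsen`, `reidentify`, `accuracy`, `block`) is hypothetical.

```python
import math
import random

def make_person(rng, n_intervals):
    """Hypothetical person: a habitual activity rhythm plus coarse demographics."""
    rhythm = [rng.uniform(0, 100) for _ in range(n_intervals)]
    demo = (rng.choice(["child", "adult"]), rng.choice(["F", "M"]))
    return rhythm, demo

def observe(rng, rhythm, noise=10.0):
    """One week of observed activity: the habitual rhythm plus random variation."""
    return [max(0.0, x + rng.gauss(0, noise)) for x in rhythm]

def coarsen(series, block):
    """Aggregate over time by summing consecutive intervals (e.g. 20-minute -> daily)."""
    return [sum(series[i:i + block]) for i in range(0, len(series), block)]

def reidentify(labeled, anonymous):
    """Guess an identity for each anonymous record: the closest labeled record
    (Euclidean distance) among people with the same demographics."""
    guesses = []
    for activity, demo in anonymous:
        candidates = [(pid, act) for pid, (act, d) in labeled.items() if d == demo]
        best_pid, _ = min(candidates, key=lambda c: math.dist(c[1], activity))
        guesses.append(best_pid)
    return guesses

def accuracy(n_people=200, n_intervals=72, block=1, seed=0):
    """Fraction of anonymous records matched to the right person."""
    rng = random.Random(seed)
    people = [make_person(rng, n_intervals) for _ in range(n_people)]
    # Week 1: records with known identities (e.g. what an employer might hold).
    labeled = {pid: (coarsen(observe(rng, r), block), d)
               for pid, (r, d) in enumerate(people)}
    # Week 2: "deidentified" release; list position is the hidden true identity.
    anonymous = [(coarsen(observe(rng, r), block), d) for r, d in people]
    guesses = reidentify(labeled, anonymous)
    return sum(g == pid for pid, g in enumerate(guesses)) / n_people

print(accuracy(block=1), accuracy(block=24))
```

Changing `block` lets you play with aggregation over time on this toy data, but don't read its numbers as a stand-in for the study's results. Note that summing within one person's record never mixes different people's data, which is why the authors point to aggregating across multiple individuals instead.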