IBM and a number of other researchers and institutions have made extensive use of a Yahoo!-curated Flickr database for their facial recognition development efforts. According to a recent report from NBC, this is raising concerns among privacy experts and the people who appear in those photos. IBM says it will remove photos from the database upon request, but it doesn't provide an easy way to tell whether a particular user's photos are in the database, so NBC has built a tool to do exactly that.

While the report largely focuses on the privacy and social issues surrounding IBM's database, it also touches on another big issue in AI training: licensing. Machine learning algorithms can require huge datasets to train effectively, and many of the images in datasets I've seen are scraped from the web with little thought given to their license restrictions. That's already a legal and ethical issue for researchers. It becomes even more problematic when those neural networks start showing up in commercial software, which happens more often every day. From the report:

Academics often appeal to the noncommercial nature of their work to bypass questions of copyright. Flickr became an appealing resource for facial recognition researchers because many users published their images under "Creative Commons" licenses, which means that others can reuse their pictures without paying license fees...

Experts note that the distinction between the research wings and commercial operations of corporations such as IBM and Facebook is a blurry one. Ultimately, IBM owns any intellectual property developed by its research unit...

Holzer was concerned that a company like IBM - even its research division - had used photos he published under a noncommercial license. "Since I assume that IBM is not a charitable organization and at the end of the day wants to make money with this technology, this is clearly a commercial use," he said.