Flickr Photos Were Used to Train IBM Facial Recognition

AlphaAtlas

IBM, and a number of other researchers and institutions, have made extensive use of a Yahoo!-curated Flickr database for their facial recognition development efforts, and according to a recent report from NBC, this is raising concerns among privacy experts and the subjects within those photos. While IBM says they'll remove photos from the database upon receiving a request, they don't provide an easy way to tell if a particular user's photos are contained within the database, hence NBC has set up a tool to do exactly that.

While the report largely focuses on the privacy and social issues surrounding IBM's database, it also touches on another big issue in the world of AI training: licensing. Machine learning algorithms can require huge datasets to effectively train, and many of the images in datasets I've seen are scraped from the web without much thought about their associated restrictions. That's already a legal and ethical issue for researchers, but it becomes even more problematic when those neural networks start showing up in commercial software, which happens more and more every day.

Academics often appeal to the noncommercial nature of their work to bypass questions of copyright. Flickr became an appealing resource for facial recognition researchers because many users published their images under "Creative Commons" licenses, which means that others can reuse their pictures without paying license fees... Experts note that the distinction between the research wings and commercial operations of corporations such as IBM and Facebook is a blurry one. Ultimately, IBM owns any intellectual property developed by its research unit... Holzer was concerned that a company like IBM - even its research division - had used photos he published under a noncommercial license. "Since I assume that IBM is not a charitable organization and at the end of the day wants to make money with this technology, this is clearly a commercial use," he said.
 
At this point it should be no surprise that anything you put online can and will be used by others in ways you didn't intend.

Many of my pictures are still being used in Chinese apps against my will.

That said, I think it takes a lot of gall for a company to say "here is something someone posted for reason x; I'm going to go ahead and use it for my own gain." There ought to be a law allowing online content to be used only for the purpose for which it was originally posted.

End all data mining.
 
If the images are public, then whatever; that's the user's fault for making them public. If it also includes private photos, that's a whole different story in my opinion.
 
At this point it should be no surprise that anything you put online can and will be used by others in ways you didn't intend.

Yeah that's my impression. If you post selfies to Flickr under a CC license, and you don't like other people using them non-commercially, tough.

However, the question is where does non-commercial licensing end and commercial use begin? If a neural network gets trained on an image of you, and gets sold as commercial software that doesn't actually contain your image, is that fair use? What if it's only used for developing the network, even if they use a different database for the commercial product? What about "research purposes?"

I don't think our laws are set up to handle usage rights for "imprints" of images (or other data) in trained neural networks. As many AI researchers have said before, the training process is basically a black box.
 
These are public photos, BUT they are used for a commercial purpose.

So the people in them should be compensated appropriately.

But like the holders of the database would do that... not even a free month of premium subscription.
 
Well, if IBM used your image and then used the resulting product for commercial purposes... class action suit.

Database holders will just trade your pics for IBM shares: US$0.001 for one pic of a face. Extra charges get marked as an administrative charge (so they don't need to increase the valuation but still benefit).

Then they charge trading fees to split and sell the shares (with a fee structure that suits them). In the end, your class action settlement would be US$0.0001 per pic.

Then they go about saying that out of 1,000 pics of a face, only 10 are eligible. Then your payment won't even be worth the paper it's printed on.

Too bad lawmakers are the only ones who can correct this...
 
Not surprised. If you don't want others using/seeing your pictures/data, then don't post it on the web.
 
they don't provide an easy way to tell if a particular user's photos are contained within the database,

Hmm, maybe they should develop some form of algorithm that searches images for facial features to try to match another image?

"We've likely used your Flickr photo to train our facial recognition software."

"Remove my image!"

"Sorry, we don't have the technology to find your image. But that'd be a cool feature."
 
Well, if IBM used your image and then used the resulting product for commercial purposes... class action suit.

Problem is, your photos are only used in the training portion of the algorithm. Once training is done and the model has evolved, it's just weights for feature extraction. The pictures themselves can be discarded or removed. That's why many researchers just scrape images from Google search (as mentioned).

This is like people posting pictures and messages to a public (physical) message board, viewable by everyone, only to have the janitor come along and take them all. Except this time, the janitor shredded them (making them unrecognizable) and made a new art piece, which he then sold for tons of money.

How do you put a valuation on something like that? Sure, a single post-it note with your doodle went into the material of the artwork, but was it significant enough to be considered a share of it?

Food for thought.
 