Flickr Photos Were Used to Train IBM Facial Recognition

Discussion in 'HardForum Tech News' started by AlphaAtlas, Mar 13, 2019.

  1. AlphaAtlas

    AlphaAtlas [H]ard|Gawd Staff Member

    Messages:
    1,713
    Joined:
    Mar 3, 2018
    IBM, and a number of other researchers and institutions, have made extensive use of a Yahoo!-curated Flickr database for their facial recognition development efforts, and according to a recent report from NBC, this is raising concerns among privacy experts and the subjects within those photos. While IBM says they'll remove photos from the database upon receiving a request, they don't provide an easy way to tell if a particular user's photos are contained within the database, hence NBC has set up a tool to do exactly that.

    While the report largely focuses on the privacy and social issues surrounding IBM's database, it also touches on another big issue in the world of AI training: licensing. Machine learning algorithms can require huge datasets to effectively train, and many of the images in datasets I've seen are scraped from the web without much thought about their associated restrictions. That's already a legal and ethical issue for researchers, but it becomes even more problematic when those neural networks start showing up in commercial software, which happens more and more every day.

    Academics often appeal to the noncommercial nature of their work to bypass questions of copyright. Flickr became an appealing resource for facial recognition researchers because many users published their images under "Creative Commons" licenses, which means that others can reuse their pictures without paying license fees... Experts note that the distinction between the research wings and commercial operations of corporations such as IBM and Facebook is a blurry one. Ultimately, IBM owns any intellectual property developed by its research unit... Holzer was concerned that a company like IBM - even its research division - had used photos he published under a noncommercial license. "Since I assume that IBM is not a charitable organization and at the end of the day wants to make money with this technology, this is clearly a commercial use," he said.
     
  2. bobdabilder

    bobdabilder Limp Gawd

    Messages:
    292
    Joined:
    Oct 7, 2009
    Mwuhahahaha. We're not gonna share your secret info. Promise.
     
  3. Zarathustra[H]

    Zarathustra[H] Official Forum Curmudgeon

    Messages:
    28,332
    Joined:
    Oct 29, 2000
    At this point it should be no surprise that anything you put online can and will be used by others in ways you didn't intend.

    Many of my pictures are still being used in Chinese apps against my will.

    That said, I think it takes a lot of gall for a company to say "here is something someone posted for reason x, I'm going to go ahead and use it for my own gain". There ought to be a law only allowing use of online content for the purpose it was originally posted.

    End all data mining.
     
  4. sirmonkey1985

    sirmonkey1985 [H]ard|DCer of the Month - July 2010

    Messages:
    21,455
    Joined:
    Sep 13, 2008
    if the images are public then it's whatever, that's the user's fault for making them public. if it's also including private photos, that's a whole different story in my opinion.
     
  5. AlphaAtlas

    AlphaAtlas [H]ard|Gawd Staff Member

    Messages:
    1,713
    Joined:
    Mar 3, 2018
    Yeah that's my impression. If you post selfies to Flickr under a CC license, and you don't like other people using them non-commercially, tough.

    However, the question is where does non-commercial licensing end and commercial use begin? If a neural network gets trained on an image of you, and gets sold as commercial software that doesn't actually contain your image, is that fair use? What if it's only used for developing the network, even if they use a different database for the commercial product? What about "research purposes?"

    I don't think our laws are set up to handle usage rights for "imprints" of images (or other data) in trained neural networks. As many AI researchers have said before, the training process is basically a black box.
     
  6. theBrownLlama

    theBrownLlama Gawd

    Messages:
    795
    Joined:
    Aug 3, 2017
    these are public photos BUT are used for a commercial purpose.

    so they should be compensated appropriately.

    but like the holders of the database would do that.......not even a free month of premium subscription
     
  7. TheOne&OnlyZeke

    TheOne&OnlyZeke 100% Irish

    Messages:
    10,289
    Joined:
    Jul 21, 2000
    Well, if IBM used your image and then uses the resulting product for commercial purposes...class action suit
     
  8. theBrownLlama

    theBrownLlama Gawd

    Messages:
    795
    Joined:
    Aug 3, 2017
    database holders will just trade your pics for IBM shares. US$0.001 for 1 pic of a face. Extra charges marked as an administrative charge (so they do not need to increase valuation but still benefit).

    Then charge trading fees to split and sell the share (with a fee structure that suits them). Your class action settlement ends up being US$0.0001 for 1 pic.

    then go about and say out of 1000 pics of a face, only 10 are eligible.
    Then your payment won't even be worth the paper it is printed on.

    Too bad lawmakers are the only ones who can correct this...
     
  9. nutzo

    nutzo [H]ardness Supreme

    Messages:
    7,379
    Joined:
    Feb 15, 2004
    Not surprised. If you don't want others using/seeing your pictures/data, then don't post it on the web.
     
  10. Spidey329

    Spidey329 [H]ardForum Junkie

    Messages:
    8,677
    Joined:
    Dec 15, 2003
    Hmm, maybe they should develop some form of algorithm that searches images for facial features to try to match another image?

    "We've likely used your Flickr photo to train our facial recognition software."

    "Remove my image!"

    "Sorry, we don't have the technology to find your image. But that'd be a cool feature."
     
  11. Spidey329

    Spidey329 [H]ardForum Junkie

    Messages:
    8,677
    Joined:
    Dec 15, 2003
    Problem is, your photos are only being used in the training portion of the algorithm. Once that's done and the model evolves, it's just weights for feature take off. The picture can be discarded or removed. That's why many researchers just scrape images from Google search (as mentioned).
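
    To make the "it's just weights" point concrete, here is a minimal sketch. It is not IBM's pipeline or any real face recognition system: the two-number "images", the labels, and the train/predict helpers are all made up for illustration, and a tiny logistic regression stands in for a neural network. It only shows that once training is done, the training data can be deleted and the model still works from its weights alone.

```python
import math


def train(samples, labels, epochs=200, lr=0.5):
    """Fit a tiny logistic-regression model with per-sample gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid output in (0, 1)
            err = pred - y                     # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Classify a feature vector using only the stored weights and bias."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Hypothetical stand-ins for photos: each "image" is reduced to two made-up features.
training_images = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
training_labels = [1, 1, 0, 0]

model = train(training_images, training_labels)

# The "photos" are thrown away; only the learned numbers remain.
del training_images, training_labels

print(model)                         # a few floats, no images
print(predict(model, [0.85, 0.9]))   # still classifies new inputs
print(predict(model, [0.15, 0.1]))
```

    Whether those leftover numbers count as a "use" of the original photos is exactly the open question here; the sketch takes no position on that.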

    This is like people posting pictures and messages to a public (physical) message board, viewable by everyone, only to have the janitor come along and take them all. Except this time, the janitor shredded them down (making them unrecognizable) and made a new art piece which he then sold for tons of money.

    How do you put a valuation on something like that? Sure, a single post-it note with your doodle went into the material of the artwork, but was it significant enough to be considered a share of it?

    Food for thought.
     
  12. ButtonPuncher

    ButtonPuncher Limp Gawd

    Messages:
    142
    Joined:
    Jun 7, 2004
    When you don't pay for a product, you are the product.