• user28282912@piefed.social · 4 days ago

    The content produced by humans was scraped en masse for the explicit purpose of training models, which were then monetized into business products.

    I struggle to reconcile that with Fair Use.

    I could see it if the source had an EULA stripping your rights to what you post, as with Reddit or Stack Overflow, and if those entities had somehow been contacted ahead of time to negotiate usage. You, I, and the web server logs know that this was almost never the case.

  • Postimo@lemmy.zip · 4 days ago

      I think the core of the fair use argument is that the AI models that are being trained are transformative products of the original works.

      Might be a hot take here, but I basically agree. I still believe it was theft, and that the legal framework we have doesn’t really stand up to these evolving problems, but under current law there is no real justification for saying that taking a bunch of images as input and producing a set of statistical correlations of pixels based on descriptions isn’t transformation.
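      A toy sketch of what “a set of statistical correlations of pixels based on descriptions” means (a hypothetical illustration, not any real training pipeline): the only thing this “model” retains after training is per-word average colors, not the images themselves.

```python
def train_toy_model(dataset):
    """Reduce (caption, pixels) pairs to per-word average-color statistics.

    Hypothetical toy example: no image survives training; only
    aggregate word-to-color correlations do.
    """
    sums, counts = {}, {}
    for caption, pixels in dataset:
        n = len(pixels)
        # average RGB color of this image
        avg = [sum(p[c] for p in pixels) / n for c in range(3)]
        for word in caption.lower().split():
            s = sums.setdefault(word, [0.0, 0.0, 0.0])
            for c in range(3):
                s[c] += avg[c]
            counts[word] = counts.get(word, 0) + 1
    # the "model": each word mapped to its mean color across training images
    return {w: tuple(s[c] / counts[w] for c in range(3))
            for w, s in sums.items()}

# two tiny "copyrighted works": a solid red and a solid blue image
dataset = [
    ("a red square", [(255, 0, 0)] * 4),
    ("a blue square", [(0, 0, 255)] * 4),
]
model = train_toy_model(dataset)
```

      After training, model["red"] is (255.0, 0.0, 0.0) and model["square"] is the blend (127.5, 0.0, 127.5); the original pixels are gone. Whether that kind of reduction legally counts as transformative is exactly what’s being argued here.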

    • fafferlicious@lemmy.world · 23 hours ago

        Legitimate question.

        How is the act of an AI company downloading a copyrighted work and adding it into their “dataset” to generate a summary different from an individual downloading a copyrighted work and adding it to their “dataset” and writing a summary?

        Both instances require consuming the material in some way. Both instances generate something new, transformative of the original work.

        Why do AI companies get to torrent the entirety of human knowledge, but if any single person does it… Well we know what happened with Napster. Limewire. Kazaa. Megaupload.

        Because to me, that hints at a flaw in your logic. AI companies are violating copyright. They had no permission to consume the copyrighted works.

      • Postimo@lemmy.zip · 12 hours ago

          I am not a lawyer, but my understanding is that the truth of legal systems is that they are a gray area of interpretation and precedent. Under a strict reading of copyright, copying a file from a server into your RAM cache could theoretically be ruled an unauthorized copy. That is obviously a ridiculous idea, but I believe that if you only streamed content from a website, it is the only ground on which they could cite you for copyright violation. With the networks you mentioned, it is generally the distribution that is prohibited, and since most were peer-to-peer systems, that is what they cite you for.

          The more charitable take on why they are not the same is this: anything on the open web is assumed to grant the right to copy it to a closed, ‘local’ system for your own use, as that is fundamental to web browsing. Further, in the same way you can make a local copy of a movie to cut it up and use in a review, which is fair use as a transformative work, you can make a copy of open web content and make a transformative work from it.

          At least that’s how I would argue it if I were paid to be OpenAI’s legal puppet. All of that tiptoes around the reality that the legal system is a tool of the rich: the law is swift and firm when the matter is piracy of a movie, but long and difficult when the matter is a large company’s profits.

    • red_bull_of_juarez@lemmy.dbzer0.com · 3 days ago

        I agree. I think that all AI companies need to crash and burn. But it’d be disingenuous to claim that what these models are doing is completely different from what humans are doing. Humans don’t pull stuff out of thin air. We are products of our upbringing and schooling. I say that because I hate our current copyright laws with a burning passion, and have done so long before LLMs showed up. It’s possible to hate both copyright and AI companies.

      • neclimdul@lemmy.world · 2 days ago

          Two points to consider.

          1. Humans can’t digest the entirety of the internet.
          2. Humans transform through lived experience. At worst, current AI is just a statistical correlation of existing information; at best, you could say a model is trained by a human’s experience. Neither is the same.

          I don’t think LLMs are without value, but treating them like they think or create new things is the problem, imho.