Blog 5: Data & Society Databites Talk
Watch the video here:
Databite No. 156: Generative AI’s Labor Impacts
Who And What Is Involved?
The video I chose to explore in this post is a panel led by Aiha Nguyen from Data & Society, in which she discusses the ethics behind the training of many recent models with regard to the works used and the people affected. The panel hosted people of surprisingly different backgrounds to weigh in with their perspectives on these models, featuring data scientist Milagros Miceli from the Weizenbaum-Institut, journalist Russell Brandom from TechCrunch, and screenwriter John Lopez with the WGA. Going in, I fully expected it to consist only of people from the tech industry, but the mix made the conversation far more in-depth than it would have been otherwise. The panelists weigh in on why the ways these tools have been trained are unethical, especially the unauthorized use of copyrighted works, the mistreatment of data workers, and the threats AI poses to the labor market.
Why Is This Important?
John Lopez is especially concerned with AI models’ role as a creative tool: while AI isn’t actually capable of replicating the creative processes unique to humans, higher-ups don’t realize this and instead view it as a tool to expedite those processes. I definitely agree with him on this; reducing an artist’s role to simply outputting art does a great injustice to the nuances in people’s work and the thought that goes into every piece created. It’s also a slap in the face to people who enjoy such works, assuming they only enjoy the end result and not the shared human experience poured into art. As Lopez argues, “Why would I be bothered to read something someone couldn’t be bothered to write?… What’s the point of art if there’s no human voice behind it?” The group also criticizes the use of the copyrighted works of millions of people to train models designed to replace their work, specifically citing ChatGPT’s ability to mimic certain authors’ styles, like those of Stephen King and George R. R. Martin. In this sense, it’s clear that AI isn’t just a supplementary tool, as some companies claim; its specific purpose is to replace human creativity.
Russell Brandom takes a somewhat different stance on the topic, though, acknowledging AI’s ability to “replace things people already didn’t care about.” He brings up GitHub’s implementation of Copilot to assist in the coding process as a prime example of AI assisting rather than replacing, the justification being that the site’s already-massive database of user code can be leveraged to improve programmers’ efficiency. I’m admittedly a little mixed on this perspective, since I believe writing the initial code is the most enjoyable part of the programming process (and some of the critical thinking is lost when people don’t have to do it all themselves), but I do agree that people who just want to produce an end product could find these tools useful.
One thing neither of them brought up, and something I had never even considered, was the treatment of data workers in this entire process. Because not all data is exactly ‘clean’ (given that most of it is scraped from the internet anyway), there are people tasked as intermediaries with ensuring that data is tagged correctly and that models don’t turn out extremely toxic. That kind of work is monotonous and burdensome, though, so most of it is outsourced to countries with far lower wages and worse working conditions.
What Can Be Done?
This was actually the crux of Milagros Miceli’s argument: that data workers deserve to be held in the same regard as engineers in this field. Not only would these models in their current state never exist without them, but the recognition would also bring them much higher working standards. Lopez likewise urges companies to be transparent about the data they use for training, especially when they’re profiting from it, though most don’t want to admit to using copyrighted materials. I believe companies should be forced to disclose when intellectual property has been used, and that regulations need to be put in place immediately to prevent it from happening any further. But as Lopez notes, it’s far too late to undo the damage already done by the millions of works used to train what we currently have.
Both Brandom and Miceli agree, however, that transparency in this sense is very vague, noting that it would mean something very different to Miceli as a data scientist than it would to Lopez as a screenwriter. Brandom also states that there’s nothing users can do themselves to oppose these models beyond petitioning congressmen to begin taking the issue seriously before the problem grows any larger.
What Do I Think?
I’m really inclined to believe that AI is simply in a stage of early adoption, where companies are racing to adopt it without fully understanding its actual capabilities. As it stands, these models and their functionality are largely incomprehensible to the majority of people, who essentially see them as a black box. Until people really understand the appropriate applications for these tools, AI will keep being forced into every single aspect of our lives. Regulations definitely need to be enacted to protect copyrighted works, and data workers must be given the conditions and respect they deserve, but unfortunately these goals may be at odds with the rapid development companies have been pushing so hard for in recent years. I do believe Brandom was onto something in saying AI is best used to replace already-mundane things (another of his examples was quick meeting summaries), though it raises the question: is the shared mundanity experienced throughout life not another aspect of the human experience that deserves to be preserved? Or must we try to min-max every single possible moment of our lives?
What Would I Ask Them?
If this technology proved genuinely useful in helping people who truly need it, would the ends justify the means of how it was trained?
This question sat in my head for a while after listening to the talk, since throughout it Brandom seemed much more supportive of this technology than the others, yet I could still tell that even he was firmly against the training methods. I’m curious what their moral quandaries would be in a situation like this, since there really isn’t a right or wrong answer, and I think the discussion could become even more nuanced if the panelists disagreed in any way, since more perspectives could be explored.
Reflection
I got a lot more out of this assignment than I thought I was going to. I really just picked a random video out of the playlist based on the title, but I ended up learning a lot about useful and non-useful applications of this technology, and about the outsourced data workers who make it possible in the first place. I loved the inclusion of people from different fields, since it gave me perspectives I wouldn’t initially consider in a discussion like this, and it’s very fitting given that the video was about generative AI’s impact on labor. All in all, I wrote a lot about this since I felt I had a lot to say (I hope it’s alright that it’s not formatted exactly like Task 2 in the assignment), and I really enjoyed getting to explore this subject a little deeper.
