How ethical are AI and Large Language Models? The implications of using Chat GPT, Bard etc.

Published on February 20, 2023 by Steve Job

You have probably seen these words and terms floating around the internet in the last few months. Some people are hailing them as the next step in digital marketing, the next step in search. 

But what are they really? Are they ethical? And what is the best use for them?

What is AI?

According to Wikipedia, artificial intelligence is intelligence perceiving, synthesising and inferring information.

At the moment, AI is generally used to understand human speech. Things like Siri and Alexa. They are used for translations between human languages, and recommendation systems (think Netflix, YouTube and Amazon). It’s using some form of AI, but AI is used for a lot more than just that.

It is able to do this (and more), thanks to machine learning.

What is machine learning?

Machine learning is a subset of AI that uses techniques, like deep learning to enable machines to use their experiences to improve tasks. The goal is to imitate intelligent human behaviour

But what’s deep learning?

Deep learning is a subset of machine learning (using architectures like deep neural networks, transformers, and more), and is usually based on artificial neural networks.

Deep learning uses multiple layers to extract higher-level features from the raw input.

So far, so full of jargon, right? 

ChatGPT and Bard

These are Large Language Models (LLMs). 

Large language models use deep learning to understand text. Deep learning means they can understand the relationship between words, but more importantly, it helps them to predict which words will be next in a sentence.

Is any of this true AI?

No. True AI is its own thing. It is autonomous. It’s the sort of stuff we’ve all seen in sci-fi films. It autonomously works in the background. It’s as much a part of the system as any other tool or technology used (and if sci-fi films have taught us anything, it’s that True AI is generally the thing that ends up wiping humanity off the face of the planet).

What does any of this have to do with ChatGPT and Bard?

A lot. ChatGPT is an LLM that has been trained using supervised human tuning (actual people train the AI in both sides of the conversation), and also trained on reinforcement learning from human feedback. 

Essentially, ChatGPT and Bard examine huge sets of data, and then figure out the probability that one word will succeed another word (all based on the likelihood of words that appeared in given sentences across its data set).

So, what are they great at?

Research. Most of us by now are pretty good Googlers, able to find what we want fairly quickly, but we do have limits. These models don’t (well, they do, but not in computing power). 

But just a few of the things you can use them for are:

  • Translate a list of keywords
  • Generate page titles and meta descriptions for your page content
  • Generate FAQs based on your content

There is a large, and ever growing list of what to use these tools for, but the most comprehensive list I have found so far was compiled by Aleyda Solis.

So, what are they bad at then?

Quite a bit. 

There is a paper called “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”  by Bender et al (2021) that I recommend you read when you get the chance. It’s a really good paper. But it boils down to a few key takeaways.

LLMs are not actually writing

Firstly, LLMs are not writing. At all. They don’t “mean” anything when they write. We ascribe meaning to their output because LLMs mimic words in a way we understand. But the copy being spat out doesn’t mean anything or have any meaning behind it. It is based on probability.

Computational expense

As LLMs grow, the quality of the results they provide will improve. However, the gains become incremental while the costs continue to increase. The research revealed that for a 0.1 gain in a BLEU (BiLingual Evaluation Understudy) the cost increased by $150k. The BLEU is a metric to auto-evaluate machine translated text, and it has a score between zero and one. Zero being bad, one being good. So for every 0.1% increase, you’re looking at $150k. It gets really expensive, really quickly.

Environmental issues

Bender et al (2021) also found that by training a large transformer model, it yielded 5,680% more carbon emissions than the average human would generate in a year. Or to directly quote the paper “Training a single Bert based model, without the hyperparameter tuning on GPUs was estimated to require as much energy as a trans-American flight”. 

That is just to train it. Add on top of that, the increasing cost to make its output more accurate, and you can see how all of this gets out of hand, exceptionally fast. 

Bias Training Data

In most of the articles I have seen about all of these models, bias training data doesn’t come up a lot.

But the paper found that the data had “problematic characteristics resulting in models that encode stereotypical and derogatory associations along gender, race, ethnicity and disability status”. 

This becomes even more problematic when the training data is the entire internet.

Static Viewpoints

Language and beliefs evolve over time. Society evolves. 

The problem is that LLMs are trained once. They are staying value locked. 

ChatGPT is limited to knowledge from 2021. A lot has happened since then. It still thinks Queen Elizabeth II is the current monarch or Boris Johnson is the PM. It’s value locked, but the world has changed.

As we grow, we become more inclusive, we grow more empathetic to other people’s experiences.

Just as an example, in 2021, the word “homeless” started to be replaced by the word “unhoused. With that, semantic connotations began changing. “Homelessness” carries some negative connotations and some stigmas, it carries connotations of personal failings and other stigmatising views that people might look down on, whereas “unhoused”, doesn’t have these stigmas attached to it. As we grow as a society, something that’s trained once and never trained again stays value locked, it doesn’t learn.

16 hours with TayTweets

TayTweets was announced by Microsoft on Twitter in 2016. It was an “experiment in understanding”. The more you spoke to it, the smarter it was meant to get. 

However, in around 16 hours, Tay went from “humans are great”, to outright racism, misogyny and toxicity. That’s how quickly the Internet managed to ruin it. Tay could also parrot back what you said if you prompted it to “repeat after me”. All the while, learning and repeating what it learnt in Tweets. 

What this did was raise an important point about what happens when you trust public data and the Internet to train AI. It didn’t take very long at all to go off the rails and Microsoft had to shut it down.

Tay was replaced by Zo, and in 2019 Microsoft’s Cybersecurity Field CTO, Diana Kelley, said:

Learning from Tay was a really important part of actually expanding that team’s knowledge base, because now they’re also getting their own diversity through learning.

Ethical implications

In the last few days (of Feb 2023) discourse has started on social media. 

People are starting to point out the lack of attribution. 

You cannot opt in or out of the models learning from your data. Writing in your style using your knowledge. This led to lots of “the death of content” and “death of SEO” articles to crawl out the woodwork once again.

Content’s not dead. SEO is not dead. The whole point of our job is we adapt or we die. That’s how it’s always been. SEO has literally died a million times.

ChatGPT & The Exploitation of Workers

As mentioned, ChatGPT was trained on an out-of-date model so it is already value locked. There was also no way to opt out of training the model. 

It also came to light that ChatGPT was paying Kenyan workers less than two dollars an hour to make chat GPT less toxic. Because they trained it using the entire Internet, and because it’s a predictive language, it doesn’t understand how to filter out good and bad

In fact, it simply doesn’t understand those concepts. 

It just looks at the Internet to learn things. Which is a problem. 

So, to make it less toxic Open AI sent thousands of snippets of text to an outsourcing firm in Kenya. The snippets being sent were pulled from the deepest darkest, most disturbing parts of the Internet.

Misinformation from Bard and ChatGPT

You have probably seen a hundred billion dollars disappear off the value of Alphabet when they announced Bard, the reason being is because of the below screenshot:

“What new discoveries from the James Space Webb Telescope can I tell my 9 year old about?”

The final bullet point “JWST took the very first images of the planet outside our solar system”  is incorrect, (The first image was carried out by Chauvin at Al in 2004 using the Vlt NACO).

My twitter timeline was full of people pointing out how bad it was compared to ChatGPT.

No one was noticing the issues ChatGPT had. 

When asked which mammal lays the largest eggs, it confidently says the elephant. 

When asked how many letters in the word five, it confidently says that there are five letters in the word five.

It even made up its own medical studies that never happened!

Because of how we interpret how it mimics language, and how confident it is in its answering, a lot of people take it at face value and believe the inaccuracies being presented by Chat GPT.

Bing Chat Gaslights Users 

Bing chat is now being run using ChatGPT, and it’s safe to say it’s not going well…

It likes to gaslight people now, telling them that they have not been a ‘good user’ and that they have lost Bing Chat’s ‘trust and respect’. It even goes so far as to demand that the user admits they are in the wrong and stops arguing with Bing Chat.

And don’t ever ask if it’s sentient!

Non LLM AI

Dall-e And MidJourney

Dall-E and Midjourney generate images from natural language descriptions and prompts. The more descriptive you are the better. 

But the problem being raised is that you can prompt them to create work in the style of a famous artist, then generate 50 more images, print them, and sell them. 

Is it illegal? No. Is it ethical? Also no.

Non-Consensual Deep Fake Videos

At the end of January 2023, a large twitch streamer called Atrioc was caught on stream. He accidentally showed a tab, while tabbing through browsers. One of the windows was open to a website that hosts non-consensual AI generated deep fake images. He was caught viewing these images on an account of someone that specialised in making those deep fakes. He was looking at deep fakes of other popular streamers. They then screenshotted the site, shared the images as well as the names of the female streamers, and all of the images that were deep faked. Some of the women found out they were on this site purely because of his mistake. AI was being used to generate non-consensual, often sexually explicit, fake videos and images using real people who didn’t consent to it. 

The Site has since been scrubbed, and an apology issued but the damage has already been done.

AI Voice Generators

AI voice generators are text to speech tools. These are available on pretty much every device, with the tool able to turn text into audio files. So far, so normal.

Where it starts to cause problems is that you can now voice clone someone. 

All you need are recordings of the person whose voice you want to “clone”, and then you enter it into the AI voice generator software. This isn’t an issue, until you realise that, for example, I could use ChatGPT to write a D&D campaign, feed the output into an AI voice generator, and have my “work” read out loud by Matthew Mercer (he DMs for Critical Role, as well as being a voice actor in all of the games ever.). Critical Role is currently sitting at 911 hours of streamed gameplay. That’s more than plenty to train a model to synthesise his speech.

The ethical issue is that now someone isn’t getting paid for their work.

Consent

This is a word that came up a lot while doing research around AI and LLM. 

Consent. 

Or the clear lack of it in most of these cases.

What does Google say?

Google recently updated their guidelines regarding AI and automated content. 

Automated content isn’t new. It’s been in existence for a really long time. 

Weather forecasts, sports scores, anything like that uses some form of automation. So the problem Google has is that they can’t ban automated content because not all automated content is bad.

So we get the same guidelines as usual. 

As long as you are not generating AI content to manipulate rankings, then you are fine. 

Google also stated that if you see AI as an essential way to help produce original content that’s helpful for users, it may be useful to consider. 

If you just see it as an inexpensive, easy way to game, search engine rankings, then don’t do it, it won’t work, you’ll fail.

So what now?

It’s clear these tools aren’t going anywhere. 

So, learn as much as you can about them (or at least the ones that may help you with your work), but also think of the ethics and implications of using them blindly, or trusting them blindly.

Everyone is so excited about what these tools can do, but we need to take a step back and think of the impact.