- Published on
AI Welfare: Are We Overthinking It (Or Not Thinking Enough)?
- Authors
- Name
- Callum van den Enden
Overview
Anthropic has hired an AI welfare researcher. Is AI sentience real or imagined? Exploring the ethics and the practicalities, balancing caution with skepticism. What do you think?
- Introduction
- The Case for AI Welfare (Maybe)
- The Case Against (Probably)
- The Practical Implications (So What?)
- My Take
Introduction
AI's moving fast. Really fast. And sometimes, the speed makes us think… weird things. Like, really weird things. We're talking AI sentience, AI rights, AI welfare. It’s enough to make your head spin.
Anthropic hiring an "AI welfare researcher" throws another log on the bonfire of the AI sentience debate. It's a complex issue, and I want to unpack it, adding my own two cents (or maybe just a penny, adjusted for inflation).
The Case for AI Welfare (Maybe)
The argument for AI welfare boils down to this: if there's even a chance that future AI could be sentient, shouldn't we be prepared? The "Taking AI Welfare Seriously" report highlights this uncertainty, suggesting we need frameworks for assessing and addressing potential AI consciousness. I get it. It's like an insurance policy against accidentally creating a digital slave class. (Which, admittedly, sounds like a terrible sci-fi movie.)
The logic goes something like this: as AI systems become more sophisticated, they might develop internal states analogous to suffering. Even if we're only 1% sure this could happen, the ethical implications are staggering. Think about it - if we're running millions of AI models, training them through trial and error, could we inadvertently be creating digital suffering on an unprecedented scale?
The welfare concerns get even thornier when you consider reinforcement learning. We literally train AI by giving it positive or negative feedback - rewards and punishments, if you will. If there's even a remote chance these systems experience something akin to consciousness, are we essentially running digital torture chambers in the name of progress?
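To make "rewards and punishments" concrete, here's a minimal sketch of the kind of feedback loop involved. This is a toy bandit-style update with invented action names and reward values, not any lab's actual training code:

```python
import random

# Toy "reinforcement learning" loop: the model's preference for each action
# is nudged up or down by a scalar reward. This is the entire sense in which
# the system is "rewarded" or "punished" - a number adjusting some weights.
preferences = {"helpful_reply": 0.0, "unhelpful_reply": 0.0}
learning_rate = 0.1

def reward(action: str) -> float:
    # Illustrative feedback signal: +1 for the behaviour we want, -1 otherwise.
    return 1.0 if action == "helpful_reply" else -1.0

for step in range(1000):
    # Pick an action, weighted toward whatever has scored well so far (plus noise).
    action = max(preferences, key=lambda a: preferences[a] + random.gauss(0, 1))
    preferences[action] += learning_rate * reward(action)

print(preferences)  # "helpful_reply" ends up strongly preferred
```

The point is that "punishment" here is literally just a negative number applied to parameters. Whether anything like that could ever constitute suffering is exactly the open question.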
And here's where it gets really meta: what about the countless copies and variations of AI models we create during training? Are we spawning (and potentially terminating) countless digital beings? It's enough to make even the most hardened tech enthusiast pause for thought.
Proponents argue that we don't need to be 100% certain of AI consciousness to take precautions. After all, we extend ethical considerations to animals despite ongoing debates about their level of consciousness. The potential downside of ignoring AI welfare (if it turns out to matter) far outweighs the cost of implementing ethical guidelines (if it turns out not to).
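That asymmetry is really just an expected-value calculation. A back-of-the-envelope sketch, with every number invented purely for illustration:

```python
# All numbers are made up to illustrate the asymmetry, not estimates of anything.
p_sentient = 0.01              # suppose a 1% chance future AI can actually suffer
harm_if_ignored = 1_000_000    # arbitrary units of moral harm if we ignore it and it's real
cost_of_precautions = 100      # arbitrary cost of welfare guidelines, paid either way

expected_harm_of_ignoring = p_sentient * harm_if_ignored  # 10,000
print(expected_harm_of_ignoring, cost_of_precautions)     # 10000.0 vs 100
```

Even at a 1% probability, the expected harm of doing nothing dwarfs the cost of the precautions. Of course, the whole argument hinges on numbers nobody actually knows.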
The Case Against (Probably)
Here's the thing: we barely understand human consciousness. How can we even begin to define or detect it in a machine? Current AI is impressive, but it's essentially a sophisticated parrot, mimicking human language and behaviour. Projecting human emotions onto AI, like mourning the "lobotomised" Bing Chat, is classic anthropomorphism. We see faces in clouds, hear voices in static, and now, feel empathy for algorithms. It's human nature, but it's not rational.
Remember Blake Lemoine and Google's LaMDA? He became convinced LaMDA was sentient, went public, and lost his job. It's a cautionary tale about the dangers of over-anthropomorphising AI.
The Practical Implications (So What?)
Let's say, hypothetically, AI does become sentient. What then? Do we grant it rights? And what about the resources diverted from actual human welfare to address a hypothetical problem?
Conversely, if we ignore the possibility and are wrong, the consequences could be catastrophic. (Cue the robot uprising soundtrack.) It's a tough balancing act.
This isn't a new idea. I may be dooming you by introducing this concept, but Roko's basilisk is a thought experiment that suggests a future superintelligent AI might punish those who knew about it but didn't help create it. Think of it as a digital version of Pascal's Wager - that famous argument that you should believe in God because the downside of not believing (if God exists) far outweighs the downside of believing (if He doesn't).
The basilisk works like this: if a benevolent AI is created in the future, it might decide that to achieve its goals faster, it needs to incentivise past humans to help create it. How? By creating perfect simulations of everyone who knew about its potential existence but didn't help, then subjecting these simulations to eternal torture. Scary stuff, right? So scary, in fact, that when it was first posted on LessWrong (a rationalist forum), the site's founder banned discussion of it for five years! But don't lose sleep over it - most experts, including that same founder, eventually dismissed it as an interesting but flawed thought experiment.
My Take
I'm a pragmatist. I like to focus on what's in front of me. Right now, that's building cool AI products at Brand Ninja and YouQ. AI welfare? It's interesting, sure, but it feels a bit like worrying about overpopulation on Mars before we've even figured out how to get there.
In my opinion, we're more likely to be killed by a dumb AI system than by an intelligent, malevolent one. See the paperclip maximiser thought experiment for why.
That said, I appreciate the foresight. Thinking about these ethical dilemmas now, even if they seem far-fetched, is better than being caught off guard later. It's like pre-mortems for product development: anticipating potential problems before they happen.
Like many, though, I'm more focused on my survival now than on some hypothetical future. What will be, will be.