Post #18: The AI-Overlord Thought Experiment

Nov 20, 2022

你好. That’s “hello” in Mandarin, in case you’re not up to date on your Chinese characters. Neither am I.

This post will feel like a slight detour from the “Next Up” topics outlined in the previous post, but it will all tie together at the end.

The AI-Overlord Thought Experiment

Imagine there’s a super duper advanced artificial intelligence. This super duper AI is capable of recommending what we humans should do with our lives. That means it can make recommendations at an individual level, at a species level, and at every level of human grouping in between.

This AI makes the perfect recommendation for your life or our species or your community and does so at the ideal times. From ‘take this job, go on this date, or buy that outfit’ to ‘merge these two countries, transition to this renewable energy mix, or inhabit this moon of Jupiter.’

As you receive these perfectly timed recommendations, it’s likely that they will resonate and you’ll follow the AI’s idea.

Since each recommendation is better than the decision you would have made yourself, over time you’d grow to trust the AI and “blindly follow” its recommended next step.

Getting the perfect recommendation for what to do next in your life (or as a species) would be awesome. It would arguably be perfect.

But in order to benefit from these perfect recommendations, we have to be able to trust that they are in fact good ones. And if we grow to blindly follow the recommendations, wouldn’t we automatically follow bad recommendations as well? It appears we have a conundrum. To ensure we can benefit from our super duper AI’s help, we need some way to verify whether the recommendations are good or bad.

Building a trustless relationship with our AI, one where we don’t have to take each recommendation on faith, will take some work over the long term.

What if the AI is susceptible to malfunction, whether caused by human or AI hackers or simply by normal wear and tear, and these malfunctions lead it to provide harmful recommendations like “harm yourself” or “detonate all of the nukes”?

Or, what if the communication system that the AI uses to send the recommendations to us is hackable, and a recommendation terrorist, whether human or AI, intercepts incoming recommendations and turns them into harmful ones?

Trusting the AI’s integrity and the communication channel’s integrity is paramount for this system to remain valuable.

As far as we can tell, this super duper AI in the sky doesn’t actually exist. Instead, you can imagine replacing this AI recommender with civilization’s operating system. Our economies, governments, and religions collectively recommend certain actions to us, and we face the same challenges with their recommendations as we do with the AI’s. How do we ensure that our operating system’s recommendations remain valuable, and that they keep us heading towards a relevant future?

“a flying robot with wings made of spaghetti flying through the sky, digital painting” by DALL-E

That question lies at the core of what this project seeks to contribute.

Let’s continue with our AI-Overlord to explore an answer.

In order to solve for trust, we need to verify that the AI and the communication channel it uses to send its recommendations are both working as expected.

Should we monitor the AI’s code for aberrations to ensure it’s functioning? Or should we monitor the AI’s output to judge if the recommendations remain relevant?

What does this monitoring system look like? Should we use humans and/or other AIs for monitoring?

Many questions.

Let’s look at how we could monitor the communications channel that the recommendations are traveling along.

To make this tangible, let’s imagine our super duper AI is spitting out billions of recommendations every minute, and each recommendation gets shot through one of those bank vacuum tubes. The catch is that each tube is super long, because it has to connect the AI up in the sky to every person on Earth. We can’t monitor the integrity of each recommendation the entire time it’s in transit because it would be too costly. We could set up checkpoints at various intervals along each tube and compare the recommendation’s state at each checkpoint to its original state from when it left the AI. But that means we’re now storing some parallel “correct version” of the recommendation that also has to be monitored. It’s tough to guarantee a message’s integrity while it’s in transit (just ask Claude Shannon), especially when it’s subject to bad actors.
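The checkpoint idea can be made a bit more concrete with a minimal Python sketch. It swaps in a standard trick the post doesn’t mention. Instead of storing a full “correct version” of each recommendation, the sender stores only a SHA-256 fingerprint of it. Each checkpoint re-hashes whatever is in the tube and compares the result to that fingerprint. The names `sent_ledger`, `send`, `checkpoint`, and `rec-001` are hypothetical. This shrinks the parallel copy but doesn’t remove it: the ledger of fingerprints still has to be protected from bad actors, which is exactly the problem described above.

```python
import hashlib


def fingerprint(recommendation: str) -> str:
    """Return the SHA-256 hex digest (64 characters) of a recommendation's text."""
    return hashlib.sha256(recommendation.encode("utf-8")).hexdigest()


# Hypothetical ledger: recommendation id -> fingerprint recorded when it left the AI.
# This is the post's parallel "correct version", shrunk to one digest per entry,
# and it still has to be kept safe from tampering.
sent_ledger: dict[str, str] = {}


def send(rec_id: str, recommendation: str) -> str:
    """Record the fingerprint at send time, then release the recommendation into the tube."""
    sent_ledger[rec_id] = fingerprint(recommendation)
    return recommendation


def checkpoint(rec_id: str, in_transit: str) -> bool:
    """True only if the in-transit text matches what the AI originally sent."""
    expected = sent_ledger.get(rec_id)
    return expected is not None and expected == fingerprint(in_transit)


msg = send("rec-001", "take this job")
print(checkpoint("rec-001", msg))              # True: untouched in transit
print(checkpoint("rec-001", "harm yourself"))  # False: altered along the way
```

A checkpoint only ever sees hashes, so it never needs its own copy of the message. It does depend on the ledger being trustworthy.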

“1,000 bank vacuum tubes coming from the sky” by DALL-E

We could kill two birds with one stone, monitoring both the AI and the communications channel, if we had a system that could effectively monitor the final recommendations we receive. With a trusted system for measuring the value of each recommendation, we would know whether the recommendations coming from the AI were valuable or not.

Is it easier to monitor the AI and the communications channel? Or is it easier to evaluate the recommendations themselves?

Ideally we would do it all, but where’s the best place to start? Assuming this recommendation system for civilization that consists of a far away AI and 8 billion tubes already exists, should we start by building a monitoring system for it or for the recommendations?

It seems like it would be simpler to start by monitoring the end recommendations that the 8 billion of us are constantly receiving. It would by no means be simple, but simpler.

The Actual Overlord: Our Civilizational Operating System

Now let’s exit the AI-Overlord Thought Experiment and look at our civilization as it currently stands. Our operating system is a composition of our institutional decision-making systems (economies, governments, and religions). This OS recommends what we should do: which jobs to work, which people to help, which gods to serve.

How can we ensure that the recommendations our OS is sending us are good?

We gotta evaluate them.

Through the lens of the AI-Overlord system, we could evaluate recommendations by maintaining a list of every single recommendation that is sent to us and evaluating the quality of each one. If it was a valuable recommendation from the AI, we would say great and mark it as “good to follow.” If it was a bad recommendation, we would mark it as “bad to follow.” We could even attach a “value probability” to each one, like a grade, to help us prioritize.
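The list can be sketched as a minimal data structure. This assumes each recommendation carries a verdict plus a “value probability” between 0 and 1, used like a grade. The `Recommendation` class, the 0.5 cutoff, and the example scores are all hypothetical placeholders. They are not a proposed evaluation method.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    GOOD_TO_FOLLOW = "good to follow"
    BAD_TO_FOLLOW = "bad to follow"
    UNEVALUATED = "unevaluated"


@dataclass
class Recommendation:
    text: str
    verdict: Verdict = Verdict.UNEVALUATED
    # Hypothetical grade: estimated chance (0..1) that following this is valuable.
    value_probability: float = 0.5

    def evaluate(self, value_probability: float, threshold: float = 0.5) -> None:
        """Record a grade and derive the good/bad verdict from an assumed cutoff."""
        if not 0.0 <= value_probability <= 1.0:
            raise ValueError("value_probability must be between 0 and 1")
        self.value_probability = value_probability
        self.verdict = (
            Verdict.GOOD_TO_FOLLOW if value_probability >= threshold else Verdict.BAD_TO_FOLLOW
        )


recommendations = [
    Recommendation("take this job"),
    Recommendation("detonate all of the nukes"),
    Recommendation("transition to this renewable energy mix"),
]
recommendations[0].evaluate(0.8)   # made-up grades for illustration
recommendations[1].evaluate(0.01)
recommendations[2].evaluate(0.9)

# Use the grade to prioritize: highest-graded "good to follow" items first.
to_follow = sorted(
    (r for r in recommendations if r.verdict is Verdict.GOOD_TO_FOLLOW),
    key=lambda r: r.value_probability,
    reverse=True,
)
for r in to_follow:
    print(f"{r.value_probability:.2f}  {r.text}")
```

This keeps the verdict and the grade side by side. Two items can both be “good to follow” and still be ranked against each other.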

Easy enough to conceptualize.

Coming back to our existing OS, we can do something similar. Imagine creating a list of every single recommended action that our OS is sending to us: every single goal that exists within our species. That includes every job being done, every company, every government initiative, every religion’s spiritual quest. This list will be millions of lines long, maybe more. The actual data structure will be interesting.

With a “list” or a database in hand, we can then start to evaluate the recommendations. We’ll want to evaluate all the existing items while also updating the list with newly generated activities so that those get evaluated as well.

An important question: what are our evaluation criteria for each item? To have meaningful evaluation criteria to start with, we can create a simple list of values by aggregating the value lists of the major countries and religions. The next post will look into that.

Once those criteria are in hand, we can use them to start evaluating all the efforts humanity is currently working towards, and those it has conceptualized working towards. We can then prioritize our focus on the projects that best fulfill the evaluation criteria.
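Here is a rough sketch of what “evaluating and prioritizing” could look like once criteria exist. Everything in it is a placeholder. The criteria names, the 0-to-1 ratings, and the projects are made up for illustration. A plain average is just one of many ways to combine ratings, and it treats every criterion as equally important, which the real criteria may not be.

```python
# Hypothetical placeholder criteria; the real list is the subject of the next post.
CRITERIA = ("freedom", "health", "sustainability")


def score(ratings: dict[str, float]) -> float:
    """Average a project's 0..1 ratings across CRITERIA; unrated criteria count as 0."""
    return sum(ratings.get(c, 0.0) for c in CRITERIA) / len(CRITERIA)


def prioritize(projects: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (project, score) pairs, best fulfillment of the criteria first."""
    scored = [(name, score(ratings)) for name, ratings in projects.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up projects and ratings, purely illustrative.
projects = {
    "project A": {"freedom": 0.4, "health": 0.9, "sustainability": 0.8},
    "project B": {"freedom": 0.7, "sustainability": 0.2},
    "project C": {"freedom": 0.5, "health": 0.5, "sustainability": 0.5},
}
for name, s in prioritize(projects):
    print(f"{s:.2f}  {name}")
```

Counting an unrated criterion as 0 is a deliberate assumption. It pushes projects nobody has evaluated yet down the list until someone rates them.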

Can a Civilizational Evaluation System Make Money?

The reality of this moment is that any meaningful project needs to create economic value. It can be for-profit or non-profit or somewhere in-between, but for a project to succeed today there needs to be a way for it to attract economic attention. Even a project like this that seeks to change how we define value is subject to our current definitions of value, including our money.

For this civilizational evaluation system to be economically valuable, what will it provide that people value enough to pay for? A filtration system that helps people sift through the billions of possible lives they could lead to find one that is uniquely meaningful to them.

To speculate on the market size of that type of service, it would be the sum of three buckets:

- a portion of all the non-valuable activities that humans currently spend time on,
- a portion of the valuable activities we are already doing but not doing enough of, and
- a portion of all the valuable activities that we have yet to start on.

This evaluation system would add value to all three buckets. It would help us do less meaningless stuff, do more of the meaningful stuff we’re already doing, and pick new meaningful things to do.

We make this list, we check it twice, and then we incentivize the pursuit of the things we like, focusing our attention on the meaningful while pulling away from the meaningless.

Where Does Web3 Fit Into This?

Let’s return to the previous post’s proposal of a web3 prediction market. That proposal is a starting point for “civilization’s list.” The list can begin by compiling all of the web3 projects, giving us visibility into the entire space and a system for prioritizing the work that is already being done.

What the last post got wrong is that it proposed starting this “web3 Ideas Marketplace” with newly submitted ideas. We should start with all the ideas that are already in motion, which are more valuable to get a handle on in the short term. Then the list can grow with new submissions once the current projects are organized and evaluation is underway.

At a high level, we need to evaluate the existing set of activities civilization is currently doing along with newly proposed ones. Compiling, let alone evaluating, the entire set of civilization’s activities is a long project, but a feasible one and worth pursuing. As covered in the last post, starting with a niche like web3 will make this massive problem approachable.

As it happens, a decentralized web3-based system is well suited for a data compilation task of this magnitude, along with the distributed governance mechanism that will be needed to evaluate the data. In a sense, this proposal is turning into a decentralized data aggregation + prioritization system via a modified prediction market.

Our civilization is already doing lots of good stuff. It’s just that we started doing so many things so quickly over the last few centuries (our population has grown roughly 13x since 1700) that we can no longer tell whether the stuff we’re doing is actually serving us. A “civilizational evaluation system” would let us know that we’re spending our energy in meaningful ways.

Next Up

Next I’m going to compile civilization’s existing value systems, as found in our country charters and religious scripture, to come up with a starting list of values that we can use as evaluation criteria to prioritize civilization’s list. It won’t be perfect, but it should be a good starting point.

The post after that might be the right time to propose a simple way to start the project. Something along the lines of:
1. Compile and organize a list of all web3 projects currently underway.
2. Create a tool for evaluating each project (i.e., a prediction market).
3. Create a method for receiving new project submissions.
4. Add a second set of civilizational activities in addition to web3.
5. Rinse and repeat until we have a list of all of civilization’s activities along with a system for evaluating them.

What’s this substack all about?

We need to upgrade our civilization’s operating system [COS]. This newsletter is a research project that explores how our current operating system came to be, which improvements would be helpful, and how we can make an upgrade happen. This substack’s goal is to land on a project that can be built to upgrade our civilization’s operating system.

I chose to write this publicly to get feedback on these ideas. Don’t hesitate to get in touch. You can join the Discord server called RelevanceDAO to share thoughts. Upgrading our COS and building our future is a team effort.

Mazz

Love: The idea of personifying decision-making and its infinite points of failure.

I often feel the societal agenda is so native to our existence that we rarely deconstruct what is actually happening. A “thing” makes decisions on “good” and “bad.” In reality, the decision-making thing also has self-interest, is corruptible, and can operate on outdated societal principles.

Also agree that web3 ideas should be evaluated on the merit of all ideas, both those in existence and those not. It may turn out the existentially relevant path is to contribute to companies already in existence rather than starting new ones!

Looking Forward: Very interested to see the value lists of major countries & religions. One question emerges: if there are differences between them (freedom v. lack of, individualism v. collectivism, etc.), how do we determine which values are existentially relevant? Are some more existentially relevant over different time horizons? For our agenda to be existentially relevant, our evaluation criteria must be. Frequency of occurrence (more countries and religions value freedom than not) does not make a value more existentially relevant.

Other Note: Meaningful projects require sustainability (e.g. in today's society, they require money to exist). The production of economic value does not equal intrinsic value (e.g. activism like Gandhi or MLK).


jay sheridan weiss

I really like the super duper AI model. Not because it would necessarily work, but for its thought-provoking quality. It’s fun to play with and to consider. For example, what if the AI moves in the direction of Skynet, which I think was the AI causing so much trouble in the Terminator series? Or similarly I, Robot, where Will Smith has to do battle with VIKI (Virtual Interactive Kinetic Intelligence; I had to look this up, I thought the computer was named Vicky🥺) to save mankind from robot overlords. Or Matthew Broderick and Ally Sheedy are up against it in WarGames. The list goes on and on: The Matrix, 2001: A Space Odyssey. OK, I’ll stop. The point is that many highly imaginative authors, including scientifically qualified ones such as Arthur C. Clarke and Isaac Asimov, have delved into this subject. They raised the danger of an AI finding man to be so inferior that the best course is to do away with humanity.

In conclusion, it may be that the biggest issue here is containing the AI, assuming we have a bias in favor of preserving humanity.

Also see Childhood’s End, one of Clarke’s earliest novels, for a discussion of mankind evolving in conjunction with a galactic Overmind, or something like that.

Existential Relevance
