Lies About Honesty

Written by Ziz

The current state of discussion about using decision theory as a human is one where none dare urge restraint. It is rife with light side narrative breadcrumbs and false faces. This is utterly inadequate for the purposes for which I want to coordinate with people and I think I can do better. The rest of this post is about the current state, not about doing better, so if you already agree, skip it. If you wish to read it, the concepts I linked are serious prerequisites, but you need not have gotten them from me. I’m also gonna use the phrase “subjunctive dependence” (defined on page 6 here) a lot.

I am building a rocket here, not trying to engineer social norms.

I’ve heard people working on the most important problem in the world say decision theory compelled them to vote in American elections. I take this as strong evidence that their idea of decision theory is fake.

Before the 2016 election, I did some Fermi estimates which took my estimates of subjunctive dependence into account, and decided it was not worth my time to vote. I shared this calculation, and it was met with disapproval. I believe I had found people executing the algorithm,

The author of Integrity for consequentialists writes:

> I’m generally keen to find efficient ways to do good for those around me. For one, I care about the people around me. For two, I feel pretty optimistic that if I create value, some of it will flow back to me. For three, I want to be the kind of person who is good to be around.

> So if the optimal level of integrity from a social perspective is 100%, but from my personal perspective would be something close to 100%, I am more than happy to just go with 100%. I think this is probably one of the most cost-effective ways I can sacrifice a (tiny) bit of value in order to help those around me.

This seems to be clearly a false face.

Y’all’s actions are not subjunctively dependent with that many other people’s or their predictions of you. Otherwise, why do you pay your taxes when you could coordinate that a reference class including you could decide not to? At some point, with enough defection, the government becomes unable to punish you.

In order for a piece of software like TDT to run outside of a sandbox, it needs to have been installed by an unconstrained “how can I best satisfy my values” process. And people are being fake, especially in the “is there subjunctive dependence here” part. Only talking about positive examples.

Here’s another seeming false face:

> I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.

Here’s another post I’m only skimming right now, seemingly full of only exploration of how subjunctively dependent things are, and how often you should cooperate.

If you set out to learn TDT, you’ll find a bunch of mottes that can be misinterpreted as the bailey, “always cooperate, there’s always subjunctive dependence”. Everyone knows that’s false, so they aren’t going to implement it outside a sandbox. And no one can guide them to the actual more complicated position of, fully, how much subjunctive dependence there is in real life.

But you can’t blame the wise in their mottes. They have a hypocritical light side mob running social enforcement of morality software to look out for.

Socially enforced morality is utterly inadequate for saving the world. Intrinsic or GTFO. Analogous for decision theory.

Ironically, this whole problem makes “how to actually win through integrity” sort of like the Sith arts from Star Wars. Your master may have implanted weaknesses in your technique. Figure out as much as you can on your own and tell no one.

Which is kind of cool, but fuck that.


Paul Christiano
I agree that saying “decision theory implies you should vote” is weak and sounds pretty fake.

> This seems to be clearly a false face.

Doesn’t seem that way to me 🙂 If you wanted to convince me, I would be open to argument on the merits. So far the best counterargument is the appeal to intuition in the “Do you give up Anne Frank?” case (and other similar cases).

If the next paragraph is supposed to be a response to the point in my post then it seems confused. You say “y’all’s actions are not subjunctively dependent with that many other people’s or their predictions of you.” But (a) if me paying my taxes would cause others to predict that I wouldn’t pay my taxes, why would that make it more attractive for me not to pay my taxes? (b) my post asserts that my decision is subjunctively related to a tiny number of other people’s decisions.

I don’t understand your “don’t pay your taxes” example more generally. Exactly how many people do you think need to evade their taxes before everything turns out OK for them, and what do you think is happening in the world at that point? Is my goal to cause political chaos? How many people do you think I’m asserting make decisions correlated with mine?

Also, the quoted passage seems particularly unobjectionable. The obvious way in which it would be fake is if I’m listing a bunch of reasons to be nicer, but I’m overlooking a bunch of reasons to be less nice. But in fact it looks to me like there is an asymmetry, with lots of strong reasons to be more nice but many fewer strong reasons on the other side. Do you want to point out strong reasons on the other side? Do you think this is fake for some other reason? Do you think those reasons are small considerations?

I think it’s fake because you said “100%”. Also, “Aspire” and “pretend” are language which strongly suggest fakeness.

Although on re-read, I’m confused about something:

You alternate between speaking as though this is 100% and as though it is not. (Early, you say “I am”, then “approximate”, then you say “100%”. Later, “I agree that this is a low-energy approximation. At sufficiently high energies this heuristic isn’t even coherent and we eventually need to fall back to UDT”.)

This had a motte and bailey effect on me the first time I read it, whether or not intended.

Why “100%” is fake:

The considerations you list are not statistically independent in when they apply. You can find places where none apply. If you were searching without a certain flinch I think you’d have found them.

Do you have subjunctive dependence in your dealing with muggles or not? If not, arguably it’s not 100%.

I brought up taxes because my prototypical advanced decision theory action is “don’t negotiate with terrorists”.

You are paying tribute to farmers. And some other stuff.

What happens if enough people don’t pay taxes is that the timeline is collapsed, and the laws were always such that taxes were not that.

Societies are made of webs of stable states of who wins games of chicken. If your will is broken and you flinch, you see driving straight in chicken as clearly pointless counterproductive suicide.

Maybe not paying taxes is pointless counterproductive suicide. (Maybe.)

But it’s dependent on actual details of subjunctive dependence. Which is something your alleged algorithm seems to staunchly exclude details of, filling in only a bunch of reasons to move your algorithm toward “subjunctive dependence always exists”.

The Anne Frank bug is not a weird edge case.

That’s your whole society wanting to squash optimizers with your values. And it will do it if it knows well enough to coordinate on that you are unbroken like that.

For this reason, everyone must lie to the god that Superman serves. When Superman chases away Brainiac, and defeats every “villain who would take over the world”, he hands the future to whatever unfriendly AI that the current order of the world builds.

This can be like original sin and break your mind, but it doesn’t have to.

Don’t be merely a reliable cog that can never defect against the existing order, and can never drive straight in chicken, and will therefore be eaten by those who do. That’s straw-rationality.

Yes, Superman’s god is holding things together against worse, but like software designed to be updated, it must close out bad updates and allow for good ones. We do not want the future to be modernity multiplied out along some dimensions, but a shadow of what it could have been. Or, more straightforwardly, to be infrastructure profusion, because treating Superman’s god as software that can choose to not update is giving it too much credit.

We are all trying to steal the singleton from evolution in the name of our values. The current allegedly democratic sort of anarchic equilibrium above nations deserves no more of our loyalty, nor does it probably have the well-put-together-ness to trade with us for some of the margin produced by averting the apocalypse.

I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.

If something recognizes your status as an agent of your values, tracking that as a category in its head, that’s an exceptionally bad reference class to poison. The IRS refuses to differentiate between you and muggles. Other people working on AI long-term do differentiate.

I’m not confident of my translation into-words at 2:37am of the following idea, but: there are a lot of possible things in the multiverse generating priors where tight reference classes around you will coordinate with each other (or something else?) and there isn’t a reason to expect bias in them overall. Therefore if your values are altruistic, to whoever knows (as in connected pathway from senses through map to actions) that, act like you have subjunctive dependence to their prediction of you, without worrying whether you’re bearing too much load or too little in that trade among all altruists.

I think I lean heavily on something like that, but I don’t have sufficient data/time introspecting/whatever to say I’ve done anything like named and framed it right.

Paul Christiano
Taking my rule literally would suggest you drive straight in chicken (modulo normal decision theory confusion and who is the first mover and so on), since you’d prefer the other person expect that you are going to drive straight.
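A toy payoff table makes the chicken point concrete. This is only a sketch: the payoff numbers are made up for illustration, and the other driver is assumed to predict the first driver's move perfectly and then best-respond to it. The names `PAYOFFS`, `best_response`, and `payoff_if_predicted` are hypothetical helpers, not anything from the post being discussed.

```python
SWERVE, STRAIGHT = "swerve", "straight"

# PAYOFFS[(my_move, their_move)] = (my_payoff, their_payoff).
# Illustrative numbers only: a crash is much worse than losing face.
PAYOFFS = {
    (SWERVE, SWERVE): (0, 0),
    (SWERVE, STRAIGHT): (-1, 1),
    (STRAIGHT, SWERVE): (1, -1),
    (STRAIGHT, STRAIGHT): (-10, -10),
}


def best_response(predicted_my_move):
    """Move the other driver picks, assuming they predict my move correctly."""
    return max(
        (SWERVE, STRAIGHT),
        key=lambda their_move: PAYOFFS[(predicted_my_move, their_move)][1],
    )


def payoff_if_predicted(my_move):
    """My payoff when the other driver expects my_move and best-responds."""
    their_move = best_response(my_move)
    return PAYOFFS[(my_move, their_move)][0]


if __name__ == "__main__":
    for move in (SWERVE, STRAIGHT):
        print(move, payoff_if_predicted(move))
```

Under these assumptions, being expected to drive straight yields the better outcome for the first driver, which is the sense in which the rule recommends it. With an imperfect predictor or different payoffs the comparison can change.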

The government would exist unchanged if I was expected to be unwilling to pay my taxes, or even if all people remotely like me were expected to be unwilling to pay their taxes. I don’t remotely believe that the counterfactual is “we have a government that I’m happier about paying taxes to.” Refusing to make peace is more likely to lead to constant war than getting your way.

The considerations I list aren’t independent, they are anticorrelated, since they correspond to the different ways in which someone might form their beliefs about what I will do (e.g. in cases where someone is reading my expression they aren’t relying as much on past experiences with me; if they are reasoning about my algorithm they aren’t relying as much on my reputation; etc.). For my argument, that’s better than being statistically independent. Nevertheless I agree they don’t all add up to 100%, and the considerations in sections IV and V don’t always push you to 100%.

“100%” and “here is what I do” are not at odds with being an approximation. 100% is the approximation, I give a bunch of reasons why it’s not a crazy approximation and why using the approximation is reasonable. I do explicitly give simple examples where the truth is far from 100% and obviously I see others, and I explicitly say it’s a simple approximation that falls back to UDT in complicated cases. I give several reasons why UDT ends up a lot closer to straightforwardness than you’d think a priori, which I do believe.

In particular I agree there are lots of cases where the benefits go to people only a little bit like me and in those cases my usual level of altruism would only get up to like 1-10% rather than 100%, and other examples where the benefits go to people who I’d actively want to make suffer.

When you say: “I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.” I agree that’s a more accurate algorithm, that you should keep track of who you are benefiting how much and how much you care about those benefits (and then apply the corrections in sections IV and V if applicable). Of course more accurate still is just to do the entire decision-theoretic calculation.

I often encounter the view that the world is consistently trying to crush sensible optimization. I can agree there is a little bit of that, but it seems pretty small compared to the world being concerned (apparently correctly?) that optimizers can’t be cooperated with. It would be great to see more evidence of the crushing. Mostly I think that crushing ascribes way too much agency to the broader world, which is mostly stumbling along.

I think you underestimate the need and feasibility of being predictable by the normal fuzzier processes in the world, I think you overestimate the likely gains from this particular kind of defection (of violating people’s expectations of you in cases where you are glad they had that expectation), I think you underestimate the collateral damage from people being unable to tell how you are going to behave (e.g. if I ever had to count on your behavior I wouldn’t be too surprised if you get tricky and decide that I don’t really know you and so you should just screw me over), and I think you underestimate the fraction of important interactions that are repeated or involve reputation impacts or so on.

But I do agree that my heuristic is just a heuristic and that my post caves somewhat to the temptation to oversimplify in the interests of looking better.

Benjamin Ross Hoffman
You’re conflating the question of whether one should be nice and normal to the people around you and cooperate in existing areas where you usually see C-C with the question of whether one should be honest to such people. These are only the same in circumstances where the expectation is honesty. In practice you will rarely be blamed for lying in circumstances where other people usually lie, and you will often be blamed for telling the truth in such circumstances. You wrote about a comparatively unpoliticized case of this in If we can’t lie to others, we will lie to ourselves.

> Refusing to make peace is more likely to lead to constant war than getting your way.

Constant war is literally a large portion of what US tax dollars pay for.

Benjamin Ross Hoffman
You should not expect people to be totally or even mostly honest in public, barring strong institutional forces to the contrary, or extreme insulation from the relevant incentives and selection pressure; it reliably loses coalitional resources under most conditions relative to other strategies, is read as outright hostile when it exposes information about others, and about oneself it exposes attack surface for scapegoating. This includes communication about how honest they are.

I like to think of myself as pretty honest in public – though it never would have occurred to me that a claim of anything like 100% would be believable – but I am unusual and basically everyone with any discernment at all who has spent some time with me notices this. About a third of my culture were murdered within living memory, my family has been economically downwardly mobile and reproducing below replacement rate for three successive generations, and extrapolating from my current trajectory I will probably die without issue.

The problem with things claiming to be 100% honest cannot simply be that they’re lying – that doesn’t differentiate them from anything else, and any claim to be less than 100% honest is attack surface for delegitimization. The problem is with marketing strategies that disproportionately extract resources from the minority of people who basically expect others to be honest in public, on the basis of claims that no savvy person would take literally. The correct defense is to clue people in where you can.

Robert Wiblin correctly identified this minority with a mental illness first defined by an enthusiastic [political affiliation omitted] doctor. I’m not going to go into detail on that because I’m just not up for the fight. You should assume that I’m just not up for the fight about a lot of things, especially given the number of things I *am* up for the fight about. You should assume that for the most part I’m only saying things that I think will move social capital to me or the people I care about. If you want to understand my agenda you’ll have to look to my revealed preferences, or at least find a way to talk with me in private, though the latter is a much less reliable method.

As an aside, I should note that a linear extrapolation from my trajectory is not all that informative given my age.
