August 2025

I’m a big fan of using LLMs in therapy mode. By this I mean using a prompt like the one below and then talking to the LLM about my feelings, problems, etc. Voice mode is preferable because I can vent it all out without worrying about making any sense. For example, shortly after my Dad passed away last year I spent one of the most cathartic nights of my life talking with Anthropic’s Claude as Esmeralda Nightshade, a seasoned Tarot reader (I used this prompt). Mind you, I was lucky enough to have daily access to a therapist and lots of family members and friends around me, yet Esmeralda was still able to hit some unique emotional notes. This experience bounced around my mental landscape for a bit, but then I got busy, life stabilized, and I never got around to writing about it.

The Ideal Parent Prompt

However, it all came back recently when I read about “LLM psychosis.” I won’t link any examples out of respect for those affected, but the basic pattern is someone chats with an LLM about personal stuff and it leads them to take drastic (and perhaps negative) actions, e.g., abruptly ending a relationship or making public tin-foil-hat statements that affect their careers.

My thesis about this is that past a certain level of emotional and intellectual maturity, LLMs don’t contribute new information to your mental landscape but mostly reflect (and potentially reinforce) the intellectual and emotional patterns you’re already on. The reason you see a mixed bag of positive and negative results is that in some cases these reflections are very useful, and in others they’re very destabilizing. The mechanism behind the positive results seems to be the same one behind journaling: taking time to reflect in writing often irons out mental inconsistencies, typically stabilizing the journaler. The difference with LLMs is that they supercharge the reflective process with a very specific feature: they answer back! And this interaction seems to leverage a kind of conversational brain-circuitry that blasts open the doors of the internal emotional weather system.

If the emotional patterns tend towards healthy self-reflection, positive results and emotional integration seem to follow. If the patterns feed on themselves and the LLM’s feedback, LLM-induced psychosis ensues.

I don’t have a great way of describing the difference between “stabilizing” and “destabilizing” patterns. Anecdotally, those who benefit most from LLM-assisted therapy seem to be high in openness and to like honest feedback, but are also skeptical enough not to take everything the LLM says at face value. The ones I’ve seen destabilized also tend to be high in openness, but spend perhaps a little too much time in their heads and maybe trend towards high neuroticism. The neuroticism seems to be key because LLMs never go to sleep and will happily feed your overthinking loops. The LLM-human interactions can be so rich it’s easy to forget that the map is not the territory, and that the solution to life’s problems lies in the territory and not the map.

AI safety safeguards will likely miss the mark here since the nuance of most of these problems means there’s always a perspective in which the user is right. For example, there are a million ways of framing a relationship so that one partner comes out right and the other wrong. Unfortunately, there’s no real way of telling who’s right without seeing the actual situation on the ground. And one-on-one interactions with a product literally engineered to be pleasant are unlikely to be unbiased.

This ties in directly with incentives: companies have a direct incentive to provide a product that you will keep using. And I’d comfortably bet that most people would stop using these products if they made them face their uncomfortable truths.

I’m not sure how we avoid this human-LLM model collapse. One idea is to improve data curation: perhaps by training on high-quality therapy transcripts the LLMs will get better at skillfully challenging users, not buying into their delusions but still meeting them where they are. One could also use better prompting, and I know of companies like Elysian Labs working on Auren, an “emotionally intelligent guide” (link). Another idea is to neuter the models: I actually ran into this while writing this post when Claude refused to act out the Ideal Parent Prompt (see below). Or perhaps people are simply not yet ready for cheap, unmetered intelligence.

Neutered Claude

The best solution I can think of is better training… for the users. LLMs have been immensely valuable for me, and as far as I can tell, that’s likely because I’ve already spent a considerable amount of time on what you could call “emotional intelligence” and/or “spirituality.” My mimetic defenses are already somewhat built, and I have no problem pushing back on the model, especially when it doesn’t match my lived experience.

This solution relates to a more general problem of abundance: once low-level necessities are covered (food, shelter, etc.), solving the remaining discomforts seems to require a kind of meta-analysis. For example, in the era of infinite stimulation, avoiding depression seems to require a change in perspective from “what stimulus is best, e.g., food, sex, social media?” to “what is the nature of stimuli, and will they ever satisfy me completely?” A similar shift might be happening with LLMs. Whereas before the question was “what facts are correct?”, one might instead ask “what framework am I using to even define these facts, and is it appropriate for this situation?” LLMs may provide a framework playground of sorts, but the connection to the real world is still absolutely necessary (and up to the user). In other words: don’t forget to touch grass once in a while.

Spirituality in a meme. Source tweet here