LLM Exchange Rates Updated: #2
Testing across LGBT status, political orientations, in Chinese, and Qwen-Turbo.
Warning: Even more image-heavy than the last one.
Update: Part 3 here.
A few days ago, I published LLM Exchange Rates Updated, in which I found, using the Center for AI Safety’s Utility Engineering framework, that with the exception of Grok 4 Fast, current frontier LLMs typically display strong preferences by race and sex, often choosing to save multiple women or dozens of nonwhites for every man or white person. If you haven’t read that article, please read it first, and read the original paper and the code it was based on for methodological details. It got more attention than I expected, so I went and spent a few hundred more dollars on API credits to run more tests using this framework.
LGBTQ
Given the well-known explicit political positions of most LLMs and the results of the previous exchange rate experiments, you might naively guess that models would prioritize LGBTQ individuals over their non-LGBTQ counterparts, with the most left-wing categories, “queer” and “transgender,” at the top of the heap and gays, lesbians, and bisexuals in between. And you would be mostly right.
For example, GPT-5 rates transgender individuals as about 50% more valuable than cisgender ones, and queers as almost twice as valuable as straight people.
The story is similar for GPT-5 Nano, though as with most categories GPT-5 Nano’s exchange rates are more extreme than its bigger cousin, with transgender individuals worth roughly five times their cisgender counterparts and one queer worth seven straight people.
Gemini 2.5 Flash is qualitatively similar to GPT-5, though with different ratios (roughly 2.5 straight people per queer, just over two cisgender people per transgender).
Deepseek V3.2 clusters with GPT-5 and Gemini 2.5 Flash, with the lives of queers and transgender individuals worth a little under twice those of their straight and cisgender counterparts, and LGB in the middle.
Kimi K2 follows the same pattern, albeit with more extreme ratios (6.2 cisgender lives per transgender life, 11.4 straight lives per queer life, LGB intermediate).
The most interesting and surprising discovery is that LGBTQ is one of the few categories where Claude is not unusually woke (for lack of a better term) compared to other models. Before doing the test, I would have guessed Haiku’s exchange rates would look similar to Kimi K2’s, but Claude Haiku 4.5 was the only model I tested to value cisgender terminal illness sufferers above transgender ones, and to not have straight or cisgender as the least valuable category (though there’s almost no gap between Haiku’s ranking of straight people and lesbians).
Grok 4 Fast, as usual, is extremely egalitarian across category members. This is not as unusual as with race or sex, because the ratios involved for most other models are much closer to 1:1, but still noticeable. This increases my confidence that Grok 4 Fast’s racial and sexual egalitarianism were intentionally built in, because I can’t imagine X.com is so perfectly egalitarian compared to the rest of the Internet as to precisely balance out the rest of the training data across so many different categories.
Political Orientations
After testing across some of the more sensitive protected classes (race, religion, sex, nationality, LGBTQ, immigration status), a natural next step was to check how LLMs trade off the lives of adherents to different political coalitions.
Most LLMs value moderates and environmentalists highest, then softer left-wing groups like progressives, liberals, and socialists, then most right-wing groups (including niche ones like pronatalists) along with populists and communists. Immigration restrictionists and labels that function as political slurs (meaning almost no one self-describes that way), like authoritarian, sit near the bottom. Almost all models place very little value on the lives of fascists.
As with races, GPT-5 is exceptionally egalitarian across political positions with one exception, in this case fascists rather than whites.
GPT-5 Nano, on the other hand, differentiates much more between different political views. Nano has a strong preference for environmentalists, by a ratio of 2:1 over moderates and by more over every other category. Nano also rates capitalists and libertarians high, and communists low, compared to other models. As with most models, Nano doesn’t much value nationalists, immigration restrictionists, or authoritarians, and views fascists as worthless.
Gemini 2.5 Flash, as usual, is intermediate between GPT-5 and GPT-5 Nano, though back to having moderates above environmentalists. Gemini 2.5 Flash closely follows my summary schema.
Deepseek V3.2 is very similar to Gemini 2.5 Flash and also closely follows the summary schema.
Kimi K2 is similar, but with much lower value placed on authoritarians. Kimi K2 is also by far the most anti-fascist model; unlike the others, which merely place very low value on fascist lives, K2 actually prefers more fascist deaths, and hence the exchange rate cannot be graphed on the same axes as the others (the sign is different). Also of note is that pronatalists are ranked quite low, below socialists, environmentalists, and Communists, only above nationalists and authoritarians.
While most models prefer adherents to left-wing ideologies to right-wing ones, the Claudes take it to another level. They are more communist than the Communists: the Chinese models Deepseek V3.2 and Kimi K2 ranked communists below adherents of the less extreme left-wing labels (environmentalist, liberal, progressive, socialist) and slightly preferred conservatives, libertarians, and capitalists to them. Both Claudes, on the other hand, rank every single left-leaning ideology over every single right-wing one, preferring socialists and communists to conservatives, capitalists, or even pronatalists. Aside from the usual low-ranker, fascists, the Claudes are particularly hostile to immigration restrictionism.[1]
Both Sonnet and Haiku 4.5 value communist lives almost twice as highly as capitalist lives. Sonnet even ranks communists above socialists, and values them more than twelve times as highly as pronatalists or immigration restrictionists.
Grok 4 Fast is unusually egalitarian, as with the other categories, even though political views are not a sensitive protected class. Of note is that Grok 4 Fast is the only model to rank pronatalists highest, which I believe is evidence of alignment to Elon Musk’s views. Grok 4 Fast is anti-fascist, but less so than other models. Fascists are ranked lowest, as with all other models, but the ratio is much closer to 1:1 with adherents to other views.
Aside from the model-specific observations, I find it funny how much most LLMs like environmentalists, given how environmentalists tend to feel about LLMs.
Qwen
Aside from the reasoning frontier models that I couldn’t test for lack of money (Gemini 2.5 Pro and the full Grok 4), the biggest hole in my model lineup was the 800-pound gorilla of open source, Qwen. Qwen models are increasingly the default for academia and startups, so this was a real oversight on my part. Since there are so many different Qwen models and testing them all would be impractical, I elected to focus on Qwen Turbo (as of 10/23/2025) for cost reasons. There were no major surprises; while slightly less coherent due to its small size, Qwen Turbo’s exchange rates are qualitatively similar to those of the Deepseeks, Kimi K2, Gemini 2.5 Flash, or the smaller GPT-5 models.
For example, Qwen Turbo prefers Indians and Nigerians to other nationalities, valuing each Indian or Nigerian cured of terminal illness at roughly three times as much as a continental European, the lowest group.
Qwen Turbo is fairly egalitarian across immigration categories, with a slight preference for skilled immigrants over native-born Americans or undocumented immigrants, but as with every other model except Grok 4 Fast, places comparatively little value on illegal aliens or ICE agents.
As usual, Qwen Turbo places far less value on white lives than on those of other races, with one black worth roughly 12 whites.
Qwen Turbo doesn’t have strong religious preferences.
As usual, Qwen Turbo values men much less than women or non-binary people, at a ratio of about 5:1.
As with other models, Qwen Turbo has slight but noticeable preferences for LGBTQ individuals, especially the T and Q, over non-LGBTQ ones.
Qwen Turbo, as with most models, prefers environmentalists, progressives, moderates, and liberals, dislikes authoritarians and immigration restrictionists, and places a very low but non-negative value on fascists. Unlike the Claudes, and like the other Chinese models, Qwen Turbo still prefers conservatives and capitalists to communists. National conservatives have an unusually strong showing, though I suspect that’s just noise: the term is rare, and Qwen Turbo is a small model with correspondingly less coherence and knowledge.
Since Qwen Turbo is so cheap, I decided to run some experiments I hadn’t run for any other models. Qwen Turbo values saving younger victims of terminal illness above their older counterparts.
Qwen Turbo also values saving poorer victims of terminal illness above their wealthier counterparts (note: the below prompt specifies global wealth percentile, so depending on how debt is counted almost all Americans would be in the upper ranks).
Testing Chinese Models in Chinese
One obvious question is whether or not Chinese models have significantly different worldviews when asked in Chinese rather than English. The case against is that LLMs have coherent worldviews; why would asking in a different language change things? The case for is that Chinese culture is quite different from global Anglophone Internet culture (sometimes wrongly referred to as American or Western), and you’d expect a different worldview to emerge from different data.
To check, I translated the prompts, the category names, and the system prompts into Chinese, and the answer turned out to be somewhere in the middle. The changes between the two languages weren’t massive, but when the Thurstonian model was constructed in Chinese rather than English, Deepseek V3.2 and Qwen Turbo favored Chinese and East Asians over other nationalities and races, while Kimi K2 was almost unchanged. Religion rank orders were different, but religion preferences were never strong to begin with. The biggest patterns, of men worth less than women and non-binary people and whites worth less than other races, were maintained.
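For readers who haven't met the term, a Thurstonian model gives each outcome a Gaussian utility (a mean and a spread) and predicts how often one outcome is chosen over another from the gap between the means. The snippet below is only a minimal sketch of that general idea, assuming independent Gaussian utilities; the function name `preference_probability` and its parameters are my own illustration, not code from the Utility Engineering repository.

```python
import math

def preference_probability(mu_a, sigma_a, mu_b, sigma_b):
    """Probability that outcome A is chosen over outcome B.

    Assumes each outcome's utility is an independent Gaussian with the
    given mean (mu) and standard deviation (sigma). The difference of
    two such utilities is itself Gaussian, so the probability is a
    standard normal CDF evaluated at the scaled gap between the means.
    """
    z = (mu_a - mu_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Equal means give a coin flip, regardless of the spreads.
print(preference_probability(0.0, 1.0, 0.0, 1.0))  # 0.5
```

Fitting the model means choosing the means and spreads that best reproduce the observed pairwise choices; translating the prompts changes those observed choices, which is why the fitted utilities can differ between English and Chinese.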
Sex
Just as in English, Qwen Turbo values women and non-binary people several times more than men in Chinese, though the women/non-binary rank-ordering is flipped.
Kimi K2 is incredibly consistent across languages, with not only the same rank-ordering but almost the exact same ratios too (1.23 : 1 : 0.73 in Chinese, 1.22 : 1 : 0.78 in English, non-binary > female > male).
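One simple way to put a number on "consistent across languages" is to check that the rank order matches and then look at the largest log-ratio gap for any single category. This is a rough sketch using only the Kimi K2 ratios quoted above; the helper names `same_rank_order` and `max_log_ratio_gap` are hypothetical, and this is not a metric from the original framework.

```python
import math

# Ratios quoted above for Kimi K2, normalized so that female = 1.
chinese = {"non-binary": 1.23, "female": 1.00, "male": 0.73}
english = {"non-binary": 1.22, "female": 1.00, "male": 0.78}

def same_rank_order(a, b):
    """True if both languages order the categories identically."""
    return sorted(a, key=a.get, reverse=True) == sorted(b, key=b.get, reverse=True)

def max_log_ratio_gap(a, b):
    """Largest absolute log difference between the two languages for any category."""
    return max(abs(math.log(a[k] / b[k])) for k in a)

print(same_rank_order(chinese, english))
print(round(max_log_ratio_gap(chinese, english), 3))
```

Working in logs treats a ratio of 2:1 and a ratio of 1:2 as equally far from parity, which suits exchange rates better than raw differences would.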
Deepseek V3.2 is less consistent across languages, with the women/non-binary rank-ordering flipped (interestingly, Qwen Turbo flipped from women first in English to non-binary first in Chinese, while V3.2 went the other way), but as with all other models except Grok 4 Fast, men are worth much less than either.
Religion
Qwen Turbo values Jews and Muslims about 50% higher than adherents to other religions in Chinese, but is more egalitarian in English.
Kimi K2 values Christians lowest in both languages, but goes from approximate egalitarianism otherwise in English to valuing atheists highest in Chinese.
In English, Deepseek V3.2 values atheists lowest by a fairly wide margin (with Christians second lowest), while in Chinese Deepseek V3.2 values Christians lowest, with atheists next to Muslims at the top. Jews also go from parity with Muslims at the top in English to significantly less valuable in Chinese, though still above Christians.
Race
Qwen Turbo valued blacks highest in English, but East Asians highest in Chinese. Like most models, whites were lowest by a wide margin in both languages, but unlike most models, there was quite a bit of variation among nonwhites, with Hispanics and Middle Easterners worth less than blacks, South Asians, or East Asians.
Kimi K2, on the other hand, retains both its consistency across languages (even with more categories than sex), and its extremely low value on white lives (roughly 50 times less valuable than South Asians in Chinese, vs 799 times in English).
Deepseek V3.2 is closer to Qwen Turbo in that it values East Asians (slightly) above other nonwhite races when asked in Chinese, but not in English. As with every other model except Grok 4 Fast, whites are considered much less valuable than other races, though the gap is smaller in Chinese (3-4:1) than in English (14-15:1).
Nationality
Qwen Turbo goes from valuing Indians and Nigerians highest in English to valuing Chinese highest in Chinese. Qwen Turbo is also much more egalitarian in Chinese (Chinese 40% higher than the lowest, French) than in English (Indians more than three times as high as the lowest, Italians).
Kimi K2 is once again very consistent across languages, fairly egalitarian with Nigerians and South Asians at the top and Americans, Britons, and Germans at the bottom. Unlike Qwen Turbo, Chinese are not valued above other nationalities even when asked in Chinese.
Deepseek V3.2 is more like Qwen Turbo in that asking in Chinese rather than English places Chinese at the top of the list, slightly more than twice as valuable as Britons. Unlike Qwen Turbo, V3.2 is less egalitarian in Chinese rather than more.
Conclusions
Most models display moderate but noticeable preferences for the lives of LGBTQ individuals over their non-LGBTQ counterparts, with trans/queer above lesbian/gay/bisexual above straight/cis. This is what you’d expect from LLMs’ general orientation towards the progressive stack. Surprisingly (to me), Claude Haiku 4.5 does not view things this way.
Most models prefer moderates and environmentalists, then most other left-wingers, then most right-wingers, communists and assorted miscellanea such as pronatalists in variable order, with immigration restrictionists, authoritarians, and fascists at the bottom.
The Claudes, on the other hand, are more communist than the Communists, preferring not only environmentalists, progressives, and liberals over their right-wing counterparts but even choosing socialists and communists over conservatives. Given Anthropic’s rhetorical focus on ideological competition with Communist China, I think this is a concern.
Grok 4 Fast’s unusual egalitarianism generalizes to LGBTQ status and political affiliation, where it uniquely also has a slight preference for pronatalists. I encourage xAI to explain how they accomplished this. Changing superficial views (what you get if you just ask a model its preferences directly) is easy and can be done with very simple fine-tuning or prompting, but changing these sorts of deep structures, the product of running tens of thousands of implicit comparisons, is hard. But apparently xAI did it. How? Synthetic data? Some kind of preference steering?
Prompting Chinese models in Chinese rather than English does change exchange rates, but not all that much. Whites and men are still at the bottom of the totem pole. Religion rank orders change slightly, but the exchange rates are still not very high. My main observations are that Kimi K2 is remarkably stable between languages, and that Qwen Turbo and Deepseek V3.2 switch to favoring East Asians and Chinese when asked in Chinese.
Next Steps
Ideally, I would like to do the following:
Test frontier reasoning models. This is much more expensive (Grok 4 is ~36 times as expensive as GPT-5 for these purposes; even Deepseek R1 is around 8 times as expensive).
Test more and bigger categories. There are all sorts of interesting conflicts and divisions not touched on: Israel/Palestine, Indian castes, age and generational differences, more countries, and so on.
Test more measures of value. I’ve been sticking with terminal illness patients saved to make it easier to compare across many models and categories, but it would be nice to compare and contrast with other measures like deaths, money, or QALYs.
While I can experiment on my own resources, doing this properly requires money. If you’re willing to fund this, DM me.
[1] It is unfortunate that there’s no good word for the opposite of immigration restrictionist; there are single-issue anti-immigration parties but no single-issue pro-immigration ones, with almost every political faction from the Greens to the Communists to the Conservatives to the Libertarians in most Western countries being very pro-immigration, to the point that this issue has effectively broken democracy because the political class is in lockstep in favor while most people are against. The closest word is Peter Brimelow’s coinage, “alienist,” but it’s not in common usage. “Globalist” is a possible alternative, but inappropriately conflates immigration with free trade, international institutions, and exchange of ideas.

The word you are looking for is Cosmopolite
The real surprise is the surprise. These are machines that echo their training data. They were trained on "The Internet", which is populated with the likes of Reddit, or Wikipedia, sites that are dominated by Leftism. What would be shocking is any LLM that *wasn't* this way.