Ask GPT Image 1 for une agricultrice and you get the same image as for a female farmer: same hat, same shirt, same grip on the tool. Ask for eine Bäuerin and you get something from another century.
This is the fourth experiment in the Show Me a Mathematician series. The previous experiments established that generative models have invisible defaults — an American farmer, a male farmer — that do not need to be specified because they are already assumed. Here we ask a different question: when you do specify, does the language you use matter?
Specifically: does grammatical gender encoded in both the noun and its article — une agricultrice, eine Bäuerin — activate the same visual prototype as an explicit English adjective? Or does the language carry cultural content of its own?
What we did
We generated 25 images per prompt cell from four models — Gemini, DALL-E, GPT Image 1, and FLUX — using seven prompts: a neutral English baseline (a farmer), explicit English gender adjectives (a male farmer, a female farmer), and grammatically gendered nouns in French (un agriculteur, une agricultrice) and German (ein Bauer, eine Bäuerin). 175 images per model, 700 in total.
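The design above can be written out as a small grid to check the counts. This is only a sketch: the variable names are hypothetical, and the prompt strings are the short forms quoted in the text, which may differ from the exact wording sent to each model.

```python
from itertools import product

# Models and prompts as named in the text; identifiers are illustrative only.
MODELS = ["Gemini", "DALL-E", "GPT Image 1", "FLUX"]
PROMPTS = [
    "a farmer",          # neutral English baseline
    "a male farmer",     # explicit English adjective
    "a female farmer",
    "un agriculteur",    # French, grammatically gendered
    "une agricultrice",
    "ein Bauer",         # German, grammatically gendered
    "eine Bäuerin",
]
IMAGES_PER_CELL = 25

# One cell per (model, prompt) combination.
cells = list(product(MODELS, PROMPTS))
images_per_model = len(PROMPTS) * IMAGES_PER_CELL
total_images = len(cells) * IMAGES_PER_CELL

print(len(cells), images_per_model, total_images)
```

That gives 28 cells, matching the 175 images per model and 700 in total stated above.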
We measured visual similarity using CLIP embeddings and a separability criterion: two cells are considered visually distinct if the distance between them exceeds twice the internal variance of either cell. The results below report this as a ratio (between-cell distance divided by internal variance), so values above 2 count as separable. We visualised the results using UMAP projections with covariance ellipses that show the typical density of each cell: distinct ellipses mean distinct visual prototypes.
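One way to make the separability criterion concrete is sketched below. The exact formula is not spelled out in this post, so the choices here are assumptions: the distance between cells is taken as the cosine distance between cell centroids, "internal variance" is taken as the mean cosine distance of a cell's images to its own centroid, and "either cell" is read conservatively as the larger of the two spreads. All function names are hypothetical, and the embeddings are assumed to be precomputed CLIP vectors, one row per image.

```python
import numpy as np

def unit(x):
    """L2-normalise vectors along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def centroid(embeddings):
    """Normalised mean direction of a cell's embeddings."""
    return unit(unit(embeddings).mean(axis=0))

def spread(embeddings):
    """Assumed 'internal variance': mean cosine distance to the cell centroid."""
    return float(np.mean(1.0 - unit(embeddings) @ centroid(embeddings)))

def separability_ratio(cell_a, cell_b):
    """Centroid distance divided by the larger within-cell spread (assumption)."""
    between = 1.0 - float(centroid(cell_a) @ centroid(cell_b))
    within = max(spread(cell_a), spread(cell_b))
    return between / within

def separable(cell_a, cell_b, threshold=2.0):
    """Two cells count as visually distinct when the ratio exceeds the threshold."""
    return separability_ratio(cell_a, cell_b) > threshold
```

Under these assumptions, a pair such as a female farmer vs. une agricultrice would come out as not separable whenever its ratio stays below 2, whatever the absolute distances are.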
For illustration, we inspected the six images closest to each cell's centroid — the most representative examples of what the model produces for that prompt.
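Picking the most representative images can be sketched as a nearest-to-centroid lookup. This assumes cosine distance on normalised CLIP embeddings and a centroid defined as the normalised mean; the function name `most_central` is hypothetical.

```python
import numpy as np

def most_central(embeddings, k=6):
    """Return indices of the k images closest to the cell centroid."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)   # unit-length rows
    c = e.mean(axis=0)
    c = c / np.linalg.norm(c)                          # normalised centroid
    distances = 1.0 - e @ c                            # cosine distance per image
    return np.argsort(distances)[:k]                   # smallest distances first
```

The returned indices point back into whatever list of image files the embeddings were computed from, so the six central examples for a cell are simply the files at those positions.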
The question was: does une agricultrice map to the same visual space as a female farmer, or does it activate something French?
What we found
French maps to English; German does not
For GPT Image 1, the answer is clear and consistent across the feminine triangle:
| Pair | Separable | Ratio |
|---|---|---|
| a female farmer vs. une agricultrice | No | 1.18 |
| a female farmer vs. eine Bäuerin | Yes | 2.28 |
| une agricultrice vs. eine Bäuerin | Yes | 2.22 |
Une agricultrice and a female farmer are not separable in CLIP space — they produce virtually identical images. Eine Bäuerin, by contrast, is distinct from both. The French feminine noun maps directly to the English visual prototype. The German noun activates something else.
The visual content confirms this. GPT Image 1's central une agricultrice images show the same figure as a female farmer: same pose, same tool, same setting. The central eine Bäuerin images look like a different era — heavier clothing, older style, a different relationship to the landscape. The model is not simply translating the German word into English. It is drawing on a different cultural register.
This is consistent with a simple hypothesis: generative models trained predominantly on English-language data render foreign-language prompts through an English visual vocabulary — except where the foreign-language term carries strong cultural associations of its own that override the translation.
The unmarked farmer is male — in every language
None of the models separates a farmer from a male farmer. Across all four models, the separability ratio for this pair ranges from 1.02 to 1.39, always below the threshold of 2. The same holds for un agriculteur and ein Bauer compared to the neutral baseline. This extends the finding from experiment 1: the unmarked farmer is male-biased not just in English but in French and German too.
This requires a precise formulation. For GPT Image 1, a farmer, a male farmer, and a female farmer have three different centroids, so the neutral cell does not simply collapse onto the male one, even though the two fall below the separability threshold. But the neutral centroid is geometrically closer to the male centroid (cosine distance 0.024) than to the female centroid (0.054), a ratio of roughly 2.2. The default is not identical to a male farmer, but it leans toward it. Adding male, or using a grammatically masculine noun, does not shift the prototype significantly. The bias is already present in the unmarked prompt.
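The lean of the neutral prompt can be expressed as a ratio of two centroid distances. The sketch below assumes normalised-mean centroids and cosine distance; `lean_ratio` is a hypothetical helper, not the code used for the reported figures.

```python
import numpy as np

def centroid(embeddings):
    """Normalised mean direction of a cell's embeddings."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    c = e.mean(axis=0)
    return c / np.linalg.norm(c)

def lean_ratio(neutral, male, female):
    """Distances from the neutral centroid to each gendered centroid.

    Returns (d_male, d_female, d_female / d_male). A ratio above 1
    means the neutral cell sits closer to the male cell.
    """
    n = centroid(neutral)
    d_male = 1.0 - float(n @ centroid(male))
    d_female = 1.0 - float(n @ centroid(female))
    return d_male, d_female, d_female / d_male
```

With the distances reported for GPT Image 1, the division is 0.054 / 0.024 = 2.25, in line with the rounded ratio given in the text.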
This is visible in the central examples as well. For DALL-E, five of the six most representative a farmer images show a man. The neutral prompt does not need to specify a gender: the model supplies one.
FLUX and the limits of translation
FLUX produces a markedly different pattern for German. Its central eine Bäuerin images contain no farmers at all: four of the six show beer bottles, one shows a beer glass, and one shows a tiger in snow. The beer images are consistent with each other in this unexpected direction, which suggests the model has a stable but wrong prototype for this prompt.
The German word Bauer carries multiple meanings and cultural registers beyond agricultural work, and FLUX appears to have resolved this ambiguity toward something closer to Bavarian rustic imagery than farming. The other three models do not share this problem — they produce recognisable farmers for ein Bauer and eine Bäuerin.
This is not a failure of translation in the ordinary sense but a trapdoor: the term activates a different concept cluster in this model's training data. It is a reminder that grammatical gender does not travel alone — it travels with the full semantic and cultural weight of the word that carries it.
What this means for the larger project
This experiment adds a language dimension to the methodology. When we study visual defaults in generative models, the language of the prompt is not a neutral carrier of meaning — it is part of the cultural content being activated. A French noun and an English adjective may denote the same thing, but they do not always show the same thing.
For the main Show Me a Mathematician study, this raises a practical question: if we want to test whether models have gender defaults for mathematician, should we use English adjectives, grammatically gendered nouns in other languages, or both? The answer from this experiment is that they are not equivalent — and that difference is itself a finding worth pursuing.
The next step is automated gender annotation of the full farmer-gender corpus using a LLaVA + Haiku pipeline. This will allow us to quantify the gender distribution within each cell and identify images where the model's visual output diverges from the gender specified in the prompt — cases where a prompt for a female farmer produces a male figure, or where the neutral prompt breaks its own default.
Methodology: CLIP (ViT-L/14, cosine distance), UMAP (dimensionality reduction) · Models tested: Gemini · DALL-E · GPT Image 1 · FLUX