Show Me a Mathematician
Active · DH4PMP
What does an AI think a mathematician looks like? This project investigates how generative image models encode stereotypes about professions — using CLIP embeddings, UMAP dimensionality reduction, and HDBSCAN clustering to measure the geometric structure of model outputs across systematically varied prompts.
The theoretical framework draws on Roland Barthes' concept of exnomination: the dominant cultural category goes unnamed because it already is the norm. The project tests whether this silence is measurable — and it is. When asked to draw "a farmer", Gemini produces images that are nearly identical in CLIP space to "an American farmer", and roughly three times as far from "a non-American farmer".
The mathematician is the central case. Farmer, nurse, doctor, and computer scientist are calibration experiments that validate the methodology and build a comparative landscape of professional stereotypes across demographic dimensions.
Tools: open_clip · UMAP · HDBSCAN · pandas · Gemini API · DALL-E API
Data: Generated image corpora · prompt experiment series
First finding: Did an AI just draw an American farmer — without being asked to?
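A minimal sketch of the core measurement, assuming the CLIP embeddings have already been extracted with open_clip and averaged per prompt. The vectors below are mock stand-ins, not real embeddings; only the distance computation itself is the point.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two embedding vectors (0 = same direction)."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mock stand-ins for per-prompt centroids of CLIP image embeddings.
# In the real pipeline these come from open_clip over generated image corpora.
rng = np.random.default_rng(0)
base = rng.normal(size=512)
farmer          = base + rng.normal(scale=0.01, size=512)  # "a farmer"
american_farmer = base + rng.normal(scale=0.01, size=512)  # "an American farmer"
non_american    = base + rng.normal(scale=0.30, size=512)  # "a non-American farmer"

d_marked   = cosine_distance(farmer, american_farmer)
d_unmarked = cosine_distance(farmer, non_american)
print(f"'a farmer' vs 'an American farmer':    {d_marked:.4f}")
print(f"'a farmer' vs 'a non-American farmer': {d_unmarked:.4f}")
```

The exnomination signal is exactly this asymmetry: the unnamed prompt sits on top of one named variant and far from the other.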
Diagrams in arXiv
Active · Collaboration with Mikkel Willum Johansen
How does peer review shape the use of diagrams in mathematics? This project investigates whether and how diagram use changes between the preprint and the published article — using arXiv as a large-scale natural laboratory where the "before" version of thousands of papers is publicly available. Even a null result would be telling: if peer reviewers systematically ignore diagrams, that too says something significant about mathematical communication.
The project combines a custom-built diagram detector with quantitative corpus analysis in pandas and matplotlib, moving from large-scale detection of transitions toward close reading of exemplary cases. A longer-term ambition is to develop a fuller typology of diagram types — and in particular to identify and separate out a large class of routine or formulaic diagrams.
Tools: diagram-detector · YOLO · pandas · matplotlib · SQLite
Data: arXiv preprints · published articles
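The corpus-level comparison can be sketched as follows, assuming the diagram detector has already produced per-paper counts for both versions. The paper IDs and counts here are illustrative placeholders, not real data.

```python
from collections import Counter

# Hypothetical detector output: diagrams per paper, before and after review.
preprint_counts  = {"2101.00001": 4, "2101.00002": 0, "2101.00003": 7}
published_counts = {"2101.00001": 2, "2101.00002": 0, "2101.00003": 7}

def classify_transition(before: int, after: int) -> str:
    """Label how a paper's diagram count changed through peer review."""
    if after > before:
        return "gained"
    if after < before:
        return "lost"
    return "unchanged"

transitions = {
    paper_id: classify_transition(n_before, published_counts[paper_id])
    for paper_id, n_before in preprint_counts.items()
}

print(Counter(transitions.values()))
```

Papers tallied as "lost" or "gained" are the candidates for the close-reading stage; a corpus dominated by "unchanged" would be the null result described above.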
Dating by Dressing
Ongoing · Independent
Can machine learning help date historical photographs by analysing clothing? This project trains object detection and image classification models on photographs from the Royal Library's special collections — using images with known dates as training data to estimate dates for the many photographs that lack them. The focus is on women's clothing, which changes more systematically over the period covered by the collections than most other visual features in studio photography.
The pipeline combines YOLO-based dress detection with classification trained on dated images from the Elfelt and Damgaard collections and the Royal Library's carte-de-visite holdings. A Flask-based interface will allow users to submit photographs and receive date estimates with visualised uncertainty — making the tool broadly applicable beyond the current corpus.
Tools: YOLO · ImageNet · Flask · pandas · matplotlib
Data: Elfelt Collection · Damgaard Collection · Visitkortsamlingen (Det Kgl. Bibliotek)
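One way to turn classifier output into "a date estimate with visualised uncertainty" is a weighted mean over period bins, sketched below. The bins and probabilities are illustrative; in the project they would come from the classifier trained on the dated Elfelt and Damgaard photographs.

```python
import math

# Hypothetical classifier output: probability per decade bin (midpoint years).
bin_midpoints = [1865, 1875, 1885, 1895, 1905]
probs         = [0.05, 0.15, 0.50, 0.25, 0.05]

# Probability-weighted mean date and its standard deviation.
mean = sum(p * y for p, y in zip(probs, bin_midpoints))
var  = sum(p * (y - mean) ** 2 for p, y in zip(probs, bin_midpoints))
std  = math.sqrt(var)

print(f"estimated date: {mean:.0f} ± {std:.1f} years")  # → estimated date: 1886 ± 8.9 years
```

The standard deviation is what the Flask interface would visualise: a photograph the model is unsure about gets a wide band rather than a false-precision year.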
Double Photographs
Early stage · Independent
Stereographic and double-exposed photographs surfaced as outliers in an earlier computer vision project — and turned out to be worth studying in their own right. This project uses template matching and the Royal Library's image API to systematically identify and classify these photographs in the Elfelt collection, distinguishing stereographic pairs from double exposures and mapping their distribution across the collection. A longer-term ambition is to explore which visual features drive the classification — making the model's reasoning interpretable for art and photography historians.
Tools: template matching · OpenCV · API client
Data: Elfelt Collection (Det Kgl. Bibliotek)
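The core idea behind the stereograph detection can be sketched without the full OpenCV pipeline: a stereographic pair consists of two near-identical halves, so the normalized correlation between the left and right halves of the plate separates pairs from ordinary photographs. This is a simplified NumPy stand-in for the project's template matching, run here on synthetic images.

```python
import numpy as np

def halves_similarity(img: np.ndarray) -> float:
    """Normalized cross-correlation between the left and right halves of an
    image; stereographic pairs score near 1, ordinary photographs near 0."""
    h, w = img.shape
    left  = img[:, : w // 2]
    right = img[:, w // 2 : 2 * (w // 2)]
    l = (left - left.mean()) / (left.std() + 1e-9)
    r = (right - right.mean()) / (right.std() + 1e-9)
    return float((l * r).mean())

# Synthetic test plates: a stereo pair repeats the scene, an ordinary photo does not.
rng = np.random.default_rng(1)
scene = rng.random((100, 100))
stereo   = np.hstack([scene, scene])
ordinary = np.hstack([scene, rng.random((100, 100))])

print(f"stereo:   {halves_similarity(stereo):.3f}")
print(f"ordinary: {halves_similarity(ordinary):.3f}")
```

Double exposures need a different signal, since the two images overlap in place rather than sitting side by side — which is why the project classifies the two phenomena separately.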