LLM Model Comparison for Gender Classification
Gender was classified using multiple large language models. This section compares agreement rates and differences between models, highlighting where classifications diverge.
Agreement rate between models
Methodology & formula
What it shows: agreement rate between each pair of LLM models for gender classification.
For each author-article pair, the two models' classifications are compared: if both return the same label (M or F) the pair counts as an agreement; if both return a label but the labels differ, it counts as a disagreement.
Cases where only one of the two models produced a classification are excluded from the rate and reported separately as "A only" or "B only".
Formula:
agreement % = agree / (agree + disagree) × 100
SELECT "[col_a]", "[col_b]"
FROM article_authors
-- For each row: normalize both labels to lowercase m/f, then compare:
--   agree++    if both labels are valid and equal
--   disagree++ if both labels are valid and different
-- The table is scanned in chunks of 5M rows by id range
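The chunked scan described in the comments can be sketched in Python over SQLite. This is a minimal sketch, not the actual implementation: the table name and quoted-column query mirror the snippet above, while the `id` column, the `gender_a`/`gender_b` column names, and the normalization details are assumptions.

```python
import sqlite3

def agreement_stats(conn, col_a, col_b, chunk_size=5_000_000):
    """Scan article_authors in id-range chunks and tally agreement
    between two classification columns. Schema details (id column,
    column names) are illustrative assumptions."""
    agree = disagree = a_only = b_only = 0
    max_id = conn.execute("SELECT MAX(id) FROM article_authors").fetchone()[0] or 0
    for start in range(0, max_id + 1, chunk_size):
        rows = conn.execute(
            f'SELECT "{col_a}", "{col_b}" FROM article_authors '
            "WHERE id >= ? AND id < ?",
            (start, start + chunk_size),
        )
        for a, b in rows:
            # Normalize to lowercase m/f; anything else is "not classified"
            a = a.lower() if isinstance(a, str) else None
            b = b.lower() if isinstance(b, str) else None
            a_ok = a in ("m", "f")
            b_ok = b in ("m", "f")
            if a_ok and b_ok:
                if a == b:
                    agree += 1
                else:
                    disagree += 1
            elif a_ok:
                a_only += 1
            elif b_ok:
                b_only += 1
    total = agree + disagree
    pct = 100.0 * agree / total if total else 0.0
    return {"agree": agree, "disagree": disagree,
            "a_only": a_only, "b_only": b_only, "agreement_pct": pct}
```

Chunking by id range keeps memory flat on large tables while still visiting every row exactly once.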
Agreement rate
| Model A | Model B | Agreement rate | Agree | Disagree | A only | B only |
|---|---|---|---|---|---|---|
Confusion matrix
Methodology & formula
What it shows: confusion matrix between two selected LLM models. Diagonal cells (green) show where models agree, off-diagonal cells (red) show disagreements.
Example: cell "M × F" indicates how many times model A said M while model B said F.
SELECT "[col_a]" AS label_a, "[col_b]" AS label_b, COUNT(*) AS n
FROM article_authors
WHERE "[col_a]" IN ('m','f') AND "[col_b]" IN ('m','f')
GROUP BY label_a, label_b
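Pivoting the grouped counts into the 2×2 matrix shown above can be sketched as follows. Again a hedged sketch: the table name and quoted-column filter mirror the query above, while the `gender_a`/`gender_b` column names and the `lower()` normalization are assumptions.

```python
import sqlite3

def confusion_matrix(conn, col_a, col_b):
    """Run the grouping query and pivot it into a 2x2 dict:
    matrix[label_a][label_b] -> count. Column names are illustrative."""
    matrix = {"m": {"m": 0, "f": 0}, "f": {"m": 0, "f": 0}}
    query = (
        f'SELECT lower("{col_a}") AS label_a, lower("{col_b}") AS label_b, COUNT(*) '
        f'FROM article_authors '
        f'WHERE lower("{col_a}") IN (\'m\', \'f\') AND lower("{col_b}") IN (\'m\', \'f\') '
        f'GROUP BY label_a, label_b'
    )
    for a, b, n in conn.execute(query):
        matrix[a][b] = n
    return matrix
```

Cells on the diagonal (`matrix["m"]["m"]`, `matrix["f"]["f"]`) are the agreements; the off-diagonal cells are the two kinds of disagreement.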