LLM Model Comparison for Gender Classification
Gender was classified using multiple large language models. This section compares agreement rates and differences between models, highlighting where classifications diverge.
Agreement rate between models
Methodology & formula
What it shows: agreement rate between each pair of LLM models for gender classification.
For each author-article pair, the two models' classifications are compared: if both return the same label (M or F) the pair counts as an agreement; if both return a label but the labels differ, it counts as a disagreement.
Cases where only one of the two models produced a classification are excluded from the rate and reported separately as "A only" or "B only".
Formula:
agreement % = agree / (agree + disagree) × 100
SELECT "[col_a]", "[col_b]"
FROM article_authors
-- For each row: normalize both labels to lowercase m/f, then compare:
--   agree++    if both labels are valid and equal
--   disagree++ if both labels are valid and different
-- The table is scanned in chunks of 5M rows by id range
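The chunked scan described in the comments can be sketched in Python over SQLite. This is a minimal sketch, not the actual implementation: the table name and quoted-column query mirror the snippet above, while the `id` column, the `gender_a`/`gender_b` column names, and the normalization details are assumptions.

```python
import sqlite3

def agreement_stats(conn, col_a, col_b, chunk_size=5_000_000):
    """Scan article_authors in id-range chunks and tally agreement
    between two classification columns. Schema details (id column,
    column names) are illustrative assumptions."""
    agree = disagree = a_only = b_only = 0
    max_id = conn.execute("SELECT MAX(id) FROM article_authors").fetchone()[0] or 0
    for start in range(0, max_id + 1, chunk_size):
        rows = conn.execute(
            f'SELECT "{col_a}", "{col_b}" FROM article_authors '
            "WHERE id >= ? AND id < ?",
            (start, start + chunk_size),
        )
        for a, b in rows:
            # Normalize to lowercase m/f; anything else is "not classified"
            a = a.lower() if isinstance(a, str) else None
            b = b.lower() if isinstance(b, str) else None
            a_ok = a in ("m", "f")
            b_ok = b in ("m", "f")
            if a_ok and b_ok:
                if a == b:
                    agree += 1
                else:
                    disagree += 1
            elif a_ok:
                a_only += 1
            elif b_ok:
                b_only += 1
    total = agree + disagree
    pct = 100.0 * agree / total if total else 0.0
    return {"agree": agree, "disagree": disagree,
            "a_only": a_only, "b_only": b_only, "agreement_pct": pct}
```

Chunking by id range keeps memory flat on large tables while still visiting every row exactly once.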
Agreement rate
| Model A | Model B | Agreement rate | Agree | Disagree | A only | B only |
|---|---|---|---|---|---|---|
Confusion matrix
Methodology & formula
What it shows: confusion matrix between two selected LLM models. Diagonal cells (green) show where models agree, off-diagonal cells (red) show disagreements.
Example: cell "M × F" indicates how many times model A said M while model B said F.
SELECT "[col_a]" AS label_a, "[col_b]" AS label_b, COUNT(*) AS n
FROM article_authors
WHERE "[col_a]" IN ('m','f') AND "[col_b]" IN ('m','f')
GROUP BY label_a, label_b
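Pivoting the grouped counts into the 2×2 matrix shown above can be sketched as follows. Again a hedged sketch: the table name and quoted-column filter mirror the query above, while the `gender_a`/`gender_b` column names and the `lower()` normalization are assumptions.

```python
import sqlite3

def confusion_matrix(conn, col_a, col_b):
    """Run the grouping query and pivot it into a 2x2 dict:
    matrix[label_a][label_b] -> count. Column names are illustrative."""
    matrix = {"m": {"m": 0, "f": 0}, "f": {"m": 0, "f": 0}}
    query = (
        f'SELECT lower("{col_a}") AS label_a, lower("{col_b}") AS label_b, COUNT(*) '
        f'FROM article_authors '
        f'WHERE lower("{col_a}") IN (\'m\', \'f\') AND lower("{col_b}") IN (\'m\', \'f\') '
        f'GROUP BY label_a, label_b'
    )
    for a, b, n in conn.execute(query):
        matrix[a][b] = n
    return matrix
```

Cells on the diagonal (`matrix["m"]["m"]`, `matrix["f"]["f"]`) are the agreements; the off-diagonal cells are the two kinds of disagreement.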