Historical Trends in Gender Representation
Female representation has quadrupled over eight decades, rising from approximately 10% in 1945 to 40.7% in 2024. However, the gap between first and last authorship (the 'leaky pipeline') remains constant at approximately 10 percentage points.
% Female authors per year
Methodology & formula
What it shows: female author percentage per year (1945-2024), calculated for each selectable LLM model.
Each author-article pair from article_authors is linked to publication year via the deduplicated pmid_year table.
Formula:
% F = COUNT(gender='f') / COUNT(gender IN ('m','f','other')) × 100
SELECT py.year, aa."[llm_column]", COUNT(*)
FROM article_authors aa
JOIN pmid_year py ON aa.pmid = py.pmid
WHERE aa."[llm_column]" IS NOT NULL
GROUP BY py.year, aa."[llm_column]"
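The query above returns one row per (year, gender label) with a count, so the percentage still has to be derived from those rows. A minimal sketch of that step follows, assuming the rows arrive as (year, gender, count) tuples. The function name percent_female_by_year and the sample rows are hypothetical and only illustrate the formula; they are not taken from the dashboard's code or data.

```python
from collections import defaultdict


def percent_female_by_year(rows):
    """Apply % F = COUNT('f') / COUNT('m','f','other') x 100 per year.

    rows: iterable of (year, gender, count) tuples, shaped like the
    output of the GROUP BY query (assumed shape, not confirmed).
    """
    female = defaultdict(int)
    total = defaultdict(int)
    for year, gender, count in rows:
        # Labels outside the formula's denominator are ignored.
        if gender not in ("m", "f", "other"):
            continue
        total[year] += count
        if gender == "f":
            female[year] += count
    # Years with no usable labels never enter `total`, so no division by zero.
    return {year: 100.0 * female[year] / total[year] for year in sorted(total)}


# Made-up rows for illustration only.
sample_rows = [
    (2000, "m", 90),
    (2000, "f", 10),
    (2001, "f", 2),
    (2001, "m", 2),
    (2001, "other", 1),
    (2001, "unknown", 5),
]
result = percent_female_by_year(sample_rows)
```

Filtering on the label set in Python rather than in SQL keeps the query generic across LLM columns, at the cost of transferring rows that are then discarded.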
Authorship positions: first vs last author
Methodology & formula
What it shows: female percentage by authorship position (first, last, all) over time.
For each article, author_order = 1 is the first author, author_order = MAX is the last author.
Single-author articles are classified as "solo".
Leaky pipeline: the gap between first and last author quantifies female attrition in senior positions.
SELECT aa.pmid, py.year, aa.author_order, aa."gender"
FROM article_authors aa
JOIN pmid_year py ON aa.pmid = py.pmid
-- Grouped by pmid in Python to determine max_order per article
-- Position = 'first' if order=1, 'last' if order=max, else 'middle'
-- % female = COUNT(gender='f') / COUNT(gender IN ('m','f')) × 100
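The grouping described in the comments above could look roughly like the following. This is a sketch under stated assumptions, not the dashboard's actual implementation: rows are taken to be (pmid, year, author_order, gender) tuples, an article is treated as "solo" when it has exactly one author row, and the names classify_positions, percent_female_by_position and first_last_gap are hypothetical.

```python
from collections import defaultdict


def classify_positions(rows):
    """Turn (pmid, year, author_order, gender) rows into (year, position, gender)."""
    by_article = defaultdict(list)
    for pmid, year, order, gender in rows:
        by_article[pmid].append((year, order, gender))

    classified = []
    for authors in by_article.values():
        max_order = max(order for _, order, _ in authors)
        for year, order, gender in authors:
            if len(authors) == 1:
                position = "solo"  # assumption: one row means single author
            elif order == 1:
                position = "first"
            elif order == max_order:
                position = "last"
            else:
                position = "middle"
            classified.append((year, position, gender))
    return classified


def percent_female_by_position(classified):
    """% female = COUNT('f') / COUNT('m','f') x 100 per (year, position)."""
    female = defaultdict(int)
    total = defaultdict(int)
    for year, position, gender in classified:
        if gender not in ("m", "f"):
            continue
        total[(year, position)] += 1
        if gender == "f":
            female[(year, position)] += 1
    return {key: 100.0 * female[key] / total[key] for key in total}


def first_last_gap(pct, year):
    """First-author % minus last-author %, in percentage points."""
    return pct[(year, "first")] - pct[(year, "last")]


# Made-up rows for illustration only.
sample_rows = [
    (1, 2000, 1, "f"), (1, 2000, 2, "m"), (1, 2000, 3, "m"),
    (2, 2000, 1, "f"), (2, 2000, 2, "f"),
    (3, 2000, 1, "f"),
    (4, 2000, 1, "m"), (4, 2000, 2, "m"),
]
pct = percent_female_by_position(classify_positions(sample_rows))
gap = first_last_gap(pct, 2000)
```

A positive gap means women are better represented among first authors than among last authors in that year, which is the quantity the leaky-pipeline comparison tracks.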