Growth of Biomedical Publishing

Annual publication volume increased from approximately 18,600 articles in 1945 to over 1.6 million in 2024, a nearly 90-fold increase. Mean authors per article increased from 1.43 in the 1940s to 6.60 in the 2020s.
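The fold increases follow directly from the figures quoted above. The short check below is only arithmetic on those rounded numbers; since the 2024 volume is given as "over 1.6 million", the first ratio is a lower bound.

```python
# Figures quoted in the summary above (rounded as stated there).
articles_1945 = 18_600
articles_2024 = 1_600_000  # "over 1.6 million", so treat as a lower bound
authors_1940s = 1.43
authors_2020s = 6.60

fold_articles = articles_2024 / articles_1945
fold_authors = authors_2020s / authors_1940s

print(f"publication volume: {fold_articles:.1f}-fold")        # 86.0-fold
print(f"mean authors per article: {fold_authors:.1f}-fold")   # 4.6-fold
```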

Articles / Year
Methodology & formula
What it shows: articles published per year, deduplicated by PMID. The articles table contains duplicate rows (one per author), so the count comes from the pmid_year table which has one record per PMID.
Formula: articles(Y) = COUNT(*) FROM pmid_year WHERE year = Y
SELECT year, COUNT(*) FROM pmid_year GROUP BY year
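A minimal sketch of why the count is taken from pmid_year rather than articles, using an in-memory SQLite database. The toy schema (in particular the author column) and the sample rows are assumptions for illustration, not the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: articles holds one row per author.
    CREATE TABLE articles (pmid INTEGER, pub_date_year TEXT, author TEXT);
    CREATE TABLE pmid_year (pmid INTEGER PRIMARY KEY, year INTEGER);

    -- PMID 1 has two authors, so it appears twice in articles.
    INSERT INTO articles VALUES
        (1, '1945', 'A'), (1, '1945', 'B'), (2, '1945', 'C'), (3, '1946', 'D');
    INSERT INTO pmid_year VALUES (1, 1945), (2, 1945), (3, 1946);
""")

# Counting rows in articles overcounts multi-author papers.
naive = dict(conn.execute(
    "SELECT pub_date_year, COUNT(*) FROM articles GROUP BY pub_date_year"
))

# pmid_year has one record per PMID, so COUNT(*) is already deduplicated.
deduplicated = dict(conn.execute(
    "SELECT year, COUNT(*) FROM pmid_year GROUP BY year"
))

print(naive)         # {'1945': 3, '1946': 1}
print(deduplicated)  # {1945: 2, 1946: 1}
```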
Authors / Article
Methodology & formula
What it shows: mean number of authors per article for each year.
Formula: mean authors(year) = SUM(authors per article) / COUNT(articles). Authors per article are obtained by counting article_authors rows per PMID, with the year taken from pmid_year.
SELECT py.year, aa.pmid, COUNT(*) AS n
FROM article_authors aa
JOIN pmid_year py ON aa.pmid = py.pmid
GROUP BY py.year, aa.pmid
-- Python: mean_authors(year) = sum(n) / count(pmids)
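The Python step named in the comment could look roughly like the sketch below. The function name mean_authors_by_year and the sample rows are hypothetical; the input is assumed to be the (year, pmid, n) tuples returned by the query.

```python
from collections import defaultdict


def mean_authors_by_year(rows):
    """Return {year: mean authors per article} from (year, pmid, n) rows.

    Each row is assumed to describe one article: its year, its PMID,
    and n, the number of author rows found for that PMID.
    """
    total_authors = defaultdict(int)
    article_count = defaultdict(int)
    for year, _pmid, n in rows:
        total_authors[year] += n
        article_count[year] += 1
    return {
        year: total_authors[year] / article_count[year]
        for year in total_authors
    }


# Made-up sample rows, for illustration only.
rows = [(1945, 1, 1), (1945, 2, 2), (2024, 3, 6), (2024, 4, 7)]
result = mean_authors_by_year(rows)
print(result)  # {1945: 1.5, 2024: 6.5}
```

Because the query uses an inner join, an article with no rows in article_authors never reaches this step and so does not enter the denominator.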
Journals / Year
Methodology & formula
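What it shows: number of distinct journals that published at least one article in each year. Because the count is over DISTINCT journal_title, the per-author duplicate rows in the articles table do not inflate it.
Formula: journals(year) = COUNT(DISTINCT journal_title) from the articles table, per year.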
SELECT pub_date_year::integer AS yr, COUNT(DISTINCT journal_title)
FROM articles
WHERE pub_date_year BETWEEN '1945' AND '2024'
GROUP BY yr
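For readers who prefer to see the distinct count spelled out, here is a rough Python equivalent. The function name journals_by_year and the sample rows are hypothetical, and the year filter compares integers rather than strings as the SQL does, so treat it as a sketch of the idea rather than a line-for-line port.

```python
from collections import defaultdict


def journals_by_year(rows, first_year=1945, last_year=2024):
    """Return {year: number of distinct journal titles}.

    rows: iterable of (pub_date_year, journal_title) with the year as text,
    possibly repeated once per author, as in the articles table.
    """
    titles = defaultdict(set)
    for pub_date_year, journal_title in rows:
        # NULL values are skipped; COUNT(DISTINCT ...) ignores NULL titles too.
        if pub_date_year is None or journal_title is None:
            continue
        if not pub_date_year.isdecimal():
            continue
        year = int(pub_date_year)
        if first_year <= year <= last_year:
            titles[year].add(journal_title)  # a set absorbs duplicate rows
    return {year: len(found) for year, found in sorted(titles.items())}


# Made-up sample rows; the repeated Lancet row mimics a two-author article.
rows = [
    ("1945", "Lancet"), ("1945", "Lancet"), ("1945", "JAMA"),
    ("2024", "Nature"), ("1930", "BMJ"), (None, "Unknown"),
]
journal_counts = journals_by_year(rows)
print(journal_counts)  # {1945: 2, 2024: 1}
```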
Exclusion rate
Methodology & formula
What it shows: percentage of authors excluded from gender analysis because their name is missing or contains only initials (e.g. "J.", "A. B.").
Criterion: an author is excluded if fore_name is NULL, empty, ≤2 chars, or matches the initials-only pattern /^[A-Z][\.\s]*([A-Z][\.\s]*)*$/.
Formula: exclusion % = excluded / total × 100
SELECT py.year, aa.fore_name
FROM article_authors aa
JOIN pmid_year py ON aa.pmid = py.pmid
-- Python: for each row, check if fore_name is null/empty/initials
-- exclusion_rate(year) = excluded / total × 100
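The exclusion check and the per-year rate could be sketched as follows. The function names and sample rows are hypothetical, and stripping surrounding whitespace before testing is an assumption the criterion does not state; the regular expression is the one given in the criterion.

```python
import re
from collections import defaultdict

# Initials-only pattern from the criterion above.
INITIALS_ONLY = re.compile(r"^[A-Z][\.\s]*([A-Z][\.\s]*)*$")


def is_excluded(fore_name):
    """True if the fore name is NULL, empty, <=2 chars, or initials only."""
    if fore_name is None:
        return True
    name = fore_name.strip()  # assumption: ignore surrounding whitespace
    return len(name) <= 2 or bool(INITIALS_ONLY.match(name))


def exclusion_rate_by_year(rows):
    """Return {year: excluded / total * 100} from (year, fore_name) rows."""
    excluded = defaultdict(int)
    total = defaultdict(int)
    for year, fore_name in rows:
        total[year] += 1
        if is_excluded(fore_name):
            excluded[year] += 1
    return {year: excluded[year] / total[year] * 100 for year in total}


# Made-up sample rows, for illustration only.
rows = [
    (1945, "J."), (1945, "A. B."), (1945, "John"), (1945, None),
    (2024, "Maria"), (2024, "Wei"), (2024, ""), (2024, "Li"),
]
rates = exclusion_rate_by_year(rows)
print(rates)  # {1945: 75.0, 2024: 50.0}
```

Two consequences of the criterion as written are visible here: a two-letter fore name such as "Li" is excluded by the length rule, and a fully capitalised name such as "JOHN" matches the initials pattern because the separators between letters are optional.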