Pre-computation in progress... First run takes several hours. Check container logs.

Methodology

<h3>Data Source</h3> <p>The complete PubMed baseline was obtained from the National Center for Biotechnology Information (NCBI) FTP servers in February 2025. XML files were parsed using custom Python scripts (lxml library) to extract: article metadata (PMID, title, abstract, publication date), author information (forename, surname, author position), journal data, Medical Subject Headings (MeSH) terms, and citation data.</p>
methods_data_source_text
<h3>Gender Classification</h3> <p>Gender was assigned using multiple large language models (LLMs) via REST API. Each model received complete author names (forename and surname) with a custom prompt requesting binary gender classification (male/female) based on name-based inference and cultural context.</p> <p><strong>Models used:</strong></p> <ul> <li><strong>DeepSeek v3</strong> - Column "gender" (primary classification)</li> <li><strong>Ministral 3B</strong> - Column "mistralai/ministral-3b-2512"</li> <li><strong>LLaMA 3.1 8B</strong> - Column "llama-3.1-8b"</li> <li><strong>Qwen3 VL 8B</strong> - Column "qwen/qwen3-vl-8b"</li> </ul> <p>Prior validation studies have demonstrated approximately 97% accuracy for LLM-based gender classification from names.</p>
methods_gender_text
<h3>Discipline Classification</h3> <p>MeSH terms were mapped to 32 predefined medical specialty categories using DeepSeek v3. Each article's MeSH terms were submitted with a prompt requesting assignment to one or more specialty categories. Each category was counted at most once per article, though articles could contribute to multiple categories.</p>
methods_disciplines_text
LLM models used
Column ID Label
gender DeepSeek v3
mistralai/ministral-3b-2512 Ministral 3B
llama-3.1-8b LLaMA 3.1 8B
qwen/qwen3-vl-8b Qwen3 VL 8B