Analyzed Global Scholarly Publication Statistics

  • Day: 2025-08-15
  • Time: 11:55 to 12:05
  • Project: Media
  • Workspace: WP 2: Operational
  • Status: Completed
  • Priority: MEDIUM
  • Assignee: Matías Nehuen Iglesias
  • Tags: Scholarly Articles, Data Analysis, Python, Data Storage, Publication Statistics

Description

Session Goal

The session aimed to analyze global scholarly publication statistics, drawing on several bibliographic databases and fields, to understand publication trends over time and how output is distributed across disciplines.

Key Activities

  • Conducted search queries to gather statistics on global scholarly article publications for the year 2023, including estimates from databases like Scopus and Web of Science.
  • Executed Python scripts to calculate the average yearly count of Science and Engineering (S&E) articles from the 1950s to the 2020s, using an exponential growth model.
  • Implemented code snippets to calculate non-health totals by decade and allocate these totals by discipline.
  • Created Pandas DataFrames from decade allocations and calculated historical totals for pre-1950 and post-1950 periods.
  • Developed functions to estimate storage requirements for different data types and dimensions, and calculated yearly storage needs.
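The decade calculations listed above can be sketched roughly as follows. This is a minimal illustration, not the session's actual script: the anchor count, the growth rate, the discipline names and their shares are all hypothetical placeholders, and the model is assumed to be a simple continuous exponential anchored at 2023.

```python
import math

# ASSUMPTIONS (illustrative placeholders, not figures from the session):
ANCHOR_YEAR = 2023
ANCHOR_COUNT = 3_000_000   # hypothetical article count in the anchor year
GROWTH_RATE = 0.04         # hypothetical continuous annual growth rate
SHARES = {                 # hypothetical discipline shares
    "physical_sciences": 0.40,
    "life_sciences": 0.35,
    "social_sciences": 0.25,
}


def articles_in_year(year: int) -> float:
    """Exponential growth model: N(t) = N0 * exp(r * (t - t0))."""
    return ANCHOR_COUNT * math.exp(GROWTH_RATE * (year - ANCHOR_YEAR))


def decade_total(decade_start: int, last_year: int = ANCHOR_YEAR) -> tuple[float, int]:
    """Return (modeled total, number of years) for a decade, truncated at last_year."""
    years = range(decade_start, min(decade_start + 10, last_year + 1))
    return sum(articles_in_year(y) for y in years), len(years)


def allocate(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a total across disciplines in proportion to the given shares."""
    norm = sum(shares.values())
    return {name: total * share / norm for name, share in shares.items()}


# One row per decade, 1950s through 2020s.
rows = []
for start in range(1950, 2030, 10):
    total, n_years = decade_total(start)
    rows.append({
        "decade": f"{start}s",
        "total": total,
        "avg_per_year": total / n_years,
        **allocate(total, SHARES),
    })

for row in rows:
    print(f'{row["decade"]}: avg/year = {row["avg_per_year"]:,.0f}')
```

Each entry in `rows` is a plain dictionary, so the list can be passed directly to `pandas.DataFrame(rows)` to obtain a table of decade allocations. The 2020s row covers only 2020 to 2023 because the model is truncated at the anchor year.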

Achievements

  • Gathered and analyzed data on scholarly article publications, covering both historical and current trends.
  • Developed a set of Python functions and scripts for data analysis and storage estimation.
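As a rough sketch of the storage estimation, the snippet below assumes that "data types and dimensions" refers to one fixed-length numeric vector stored per article. That interpretation, the function names, the dimension values and the yearly article count are assumptions made for illustration only; the byte sizes per element are the standard widths of those numeric types.

```python
# Standard element widths in bytes for common numeric types.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "int8": 1}

# ASSUMPTION: hypothetical number of articles indexed per year.
ARTICLES_PER_YEAR = 3_000_000


def vector_bytes(dimensions: int, dtype: str) -> int:
    """Raw size in bytes of one vector with the given length and element type."""
    return dimensions * BYTES_PER_ELEMENT[dtype]


def yearly_storage_gib(items_per_year: int, dimensions: int, dtype: str) -> float:
    """Raw yearly storage in GiB, ignoring index and metadata overhead."""
    return items_per_year * vector_bytes(dimensions, dtype) / 2**30


# ASSUMPTION: the dimension values below are arbitrary examples.
for dtype in BYTES_PER_ELEMENT:
    for dims in (384, 768, 1536):
        gib = yearly_storage_gib(ARTICLES_PER_YEAR, dims, dtype)
        print(f"{dtype:>8} x {dims:>4}: {gib:8.2f} GiB/year")
```

The estimate covers raw vector payload only. Any real deployment would add overhead for indexes, metadata and replication, which this sketch does not model.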

Pending Tasks

  • Refine projections of future scholarly output and the storage they imply.
  • Explore non-health scientific articles and their data indexing requirements in more depth.

Evidence

  • source_file=2025-08-15.sessions.jsonl, line_number=6, event_count=0, session_id=07bfbfb31aaed8e2fe3e690632c5574688017d3cc10433f70d539e9558d6091e
  • event_ids: []