Implemented data loading and manipulation with Pandas
- Day: 2023-10-03
- Time: 16:10 to 16:35
- Project: Dev
- Workspace: WP 2: Operational
- Status: Completed
- Priority: MEDIUM
- Assignee: Matías Nehuen Iglesias
- Tags: Python, Pandas, Data Manipulation, Dataframes, Concatenation
Description
Session Goal
The goal of this session was to implement efficient data loading and manipulation techniques using the Pandas library in Python.
Key Activities
- Loading Datasets: Used Pandas to load the first 5 rows of each dataset identified by a combination of source, unit, and time, and stored the results in a dictionary.
- File Handling: Implemented a Python loop to load the datasets. The loop checks that each file exists before reading it, so missing files do not interrupt processing.
- Data Iteration: Iterated through datasets to print filenames and display grouped data, excluding specific columns.
- Group Analysis: Grouped a DataFrame by its ‘GID’ column and used the size() method to display the number of rows in each group.
- Data Concatenation: Demonstrated horizontal concatenation of DataFrames by unit, collecting them in a dictionary before concatenating.
- Column Naming: Set filenames as column names during DataFrame concatenation for clearer organization.
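The activities above can be sketched as one small pipeline. This is a minimal sketch, not the code from the session. It assumes CSV files named "<source>_<unit>_<time>.csv". The values in SOURCES, UNITS and TIMES, the excluded "notes" column, the sample data, and the helper names (build_filename, load_heads, show_groups, group_sizes, concat_by_unit, write_sample_files) are all hypothetical, because the log does not record the real file names or code. The sketch uses only standard pandas calls: read_csv with nrows, drop, groupby, size, and concat.

```python
import os
import tempfile

import pandas as pd

# Hypothetical identifiers: the session log does not name the real sources, units, or times.
SOURCES = ["sourceA", "sourceB"]
UNITS = ["unit1", "unit2"]
TIMES = ["t0"]
EXCLUDED_COLUMNS = ["notes"]  # hypothetical column hidden when displaying groups


def build_filename(source, unit, time):
    # Assumed naming scheme: "<source>_<unit>_<time>.csv".
    return f"{source}_{unit}_{time}.csv"


def load_heads(data_dir, n_rows=5):
    """Load the first n_rows of each existing file, keyed by (source, unit, time)."""
    datasets = {}
    for source in SOURCES:
        for unit in UNITS:
            for time in TIMES:
                path = os.path.join(data_dir, build_filename(source, unit, time))
                if not os.path.isfile(path):
                    continue  # skip missing combinations instead of raising FileNotFoundError
                # nrows stops parsing after n_rows data rows instead of reading the whole file.
                datasets[(source, unit, time)] = pd.read_csv(path, nrows=n_rows)
    return datasets


def show_groups(datasets):
    """Print each filename, then its rows grouped by GID without the excluded columns."""
    for key, df in datasets.items():
        print(build_filename(*key))
        for gid, group in df.drop(columns=EXCLUDED_COLUMNS).groupby("GID"):
            print(f"GID={gid}")
            print(group)


def group_sizes(df):
    """Return a Series with the number of rows per GID value."""
    return df.groupby("GID").size()


def concat_by_unit(datasets):
    """Concatenate each unit's DataFrames side by side; filenames become the outer column level."""
    by_unit = {}
    for (source, unit, time), df in datasets.items():
        by_unit.setdefault(unit, {})[build_filename(source, unit, time)] = df
    # Passing a dict to pd.concat uses its keys as the outer column labels when axis=1.
    return {unit: pd.concat(frames, axis=1) for unit, frames in by_unit.items()}


def write_sample_files(data_dir):
    """Write hypothetical sample CSVs, leaving one combination missing on purpose."""
    sample = pd.DataFrame(
        {"GID": [1, 1, 2, 2, 2, 3, 3], "value": range(7), "notes": list("abcdefg")}
    )
    for source in SOURCES:
        for unit in UNITS:
            for time in TIMES:
                if (source, unit) == ("sourceB", "unit2"):
                    continue  # simulate a missing file
                path = os.path.join(data_dir, build_filename(source, unit, time))
                sample.to_csv(path, index=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_sample_files(tmp)
        datasets = load_heads(tmp)
        show_groups(datasets)
        print(group_sizes(datasets[("sourceA", "unit1", "t0")]))
        print(concat_by_unit(datasets)["unit1"].columns.tolist())
```

The dictionary is keyed by (source, unit, time) tuples, so the unit can be read directly when grouping for concatenation instead of being parsed back out of the filename. Passing a dict with axis=1 gives each per-unit result a two-level column index: the filename is the outer level and the original column names are the inner level.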
Achievements
- Successfully loaded and manipulated the datasets using Pandas.
- Improved data organization by setting filenames as column names during concatenation.
Pending Tasks
- Further optimization of data loading processes to handle larger datasets efficiently.
- Exploration of additional Pandas functionalities for more complex data manipulations.
Evidence
- source_file=2023-10-03.sessions.jsonl, line_number=2, event_count=0, session_id=b2ea2e48c0cdbbbdf8d8e33fad253b45d2352ab3558fd4902198083e09ee3268
- event_ids: []