2023-10-03 - Session: Implemented data loading and manipulation with Pandas
Time: 16:10–16:35
Labels: Python, Pandas, Data Manipulation, DataFrames, Concatenation
Project: Dev
Priority: MEDIUM
Session Goal
The goal of this session was to implement efficient data loading and manipulation techniques using the Pandas library in Python.
Key Activities
- Loading Datasets: Used Pandas to load the first 5 rows of each dataset identified by a combination of source, unit, and time, and stored the resulting DataFrames in a dictionary.
- File Handling: Wrote a loop that checks whether each file exists before loading it, so a missing file does not interrupt processing.
- Data Iteration: Iterated over the loaded datasets, printing each filename and displaying its grouped data with specific columns excluded.
- Group Analysis: Used the size() method to display the size of each group defined by a DataFrame's "GID" column.
- Data Concatenation: Concatenated DataFrames horizontally by unit, first collecting them in a dictionary keyed by unit.
- Column Naming: Used filenames as column names during concatenation so each column's source file is clear.
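The loading and file-handling steps can be sketched as a single loop. This is a minimal sketch, not the code from the session: the source, unit, and time values and the '<source>_<unit>_<time>.csv' naming scheme are hypothetical placeholders, and CSV input is assumed. `pd.read_csv(..., nrows=5)` reads only the first five rows of each file.

```python
import itertools
from pathlib import Path

import pandas as pd

# Hypothetical identifiers: the session log does not list the real values.
SOURCES = ["sensorA", "sensorB"]
UNITS = ["u1", "u2"]
TIMES = ["2023Q1", "2023Q2"]


def load_previews(data_dir, sources=SOURCES, units=UNITS, times=TIMES, nrows=5):
    """Return {filename: DataFrame} with the first `nrows` rows of each existing file.

    Assumes a hypothetical '<source>_<unit>_<time>.csv' naming scheme.
    """
    data_dir = Path(data_dir)
    previews = {}
    # itertools.product yields every (source, unit, time) combination.
    for source, unit, time in itertools.product(sources, units, times):
        filename = f"{source}_{unit}_{time}.csv"
        path = data_dir / filename
        if not path.exists():
            # Skip missing combinations instead of letting read_csv raise FileNotFoundError.
            continue
        previews[filename] = pd.read_csv(path, nrows=nrows)
    return previews
```

Calling `load_previews("data")` on a hypothetical `data` directory returns entries only for the files that were actually found; files shorter than five rows are returned in full.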
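The iteration and group-size steps might look like the following sketch, assuming the datasets are held in a dictionary mapping filename to DataFrame. The excluded column name `timestamp` is hypothetical; the log does not say which columns were dropped. `groupby("GID").size()` counts the rows in each GID group.

```python
import pandas as pd


def summarize_groups(datasets, group_col="GID", exclude=("timestamp",)):
    """Print each filename, its group sizes, and its grouped data minus excluded columns.

    `exclude` holds hypothetical column names. Returns {filename: group-size Series}.
    """
    sizes = {}
    for filename, df in datasets.items():
        print(filename)
        # errors="ignore" tolerates files that lack one of the excluded columns.
        trimmed = df.drop(columns=list(exclude), errors="ignore")
        grouped = trimmed.groupby(group_col)
        # size() returns a Series indexed by GID value with the row count per group.
        group_sizes = grouped.size()
        print(group_sizes)
        # Show the rows belonging to each group.
        for gid, group in grouped:
            print(f"{group_col}={gid}")
            print(group)
        sizes[filename] = group_sizes
    return sizes
```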
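Horizontal concatenation by unit, with filenames as column labels, could be sketched as below. It assumes the same hypothetical '<source>_<unit>_<time>.csv' naming, so the unit is read from the second underscore-separated field of each filename. Passing a dictionary to `pd.concat` makes its keys (here, the filenames) the outer level of the resulting column index, which is one way to set filenames as column names.

```python
from collections import defaultdict

import pandas as pd


def concat_by_unit(datasets):
    """Return {unit: wide DataFrame} built by concatenating same-unit frames side by side.

    Assumes hypothetical '<source>_<unit>_<time>.csv' filenames.
    """
    # First organize the frames in a dictionary keyed by unit.
    by_unit = defaultdict(dict)
    for filename, df in datasets.items():
        unit = filename.split("_")[1]
        by_unit[unit][filename] = df
    combined = {}
    for unit, frames in by_unit.items():
        # axis=1 concatenates horizontally; dict keys become the outer column level.
        combined[unit] = pd.concat(frames, axis=1)
    return combined
```

Because `axis=1` concatenation aligns rows by index, frames of different lengths are padded with NaN rather than truncated.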
Achievements
- Successfully loaded and manipulated datasets using Pandas, enhancing data processing efficiency.
- Improved data organization by setting filenames as column names during concatenation.
Pending Tasks
- Further optimization of data loading processes to handle larger datasets efficiently.
- Exploration of additional Pandas functionalities for more complex data manipulations.