📅 2023-03-27 — Session: Enhanced Data Processing and Jupyter Notebook Automation
🕒 17:10–18:05
🏷️ Labels: Data Processing, Jupyter Automation, Pandas, Geopandas, Command Line
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal:
The session aimed to improve data processing techniques and automate the conversion of Jupyter notebooks to different formats.
Key Activities:
- Conducted a comparison between old and new versions of data processing code, focusing on improvements like merging with country shapes and creating unique project location identifiers.
- Modified DataFrames using Pandas and GeoPandas to explode geographical columns and include new calculated columns such as `even_split_totalamt`.
- Cleaned DataFrame columns by removing commas and converting strings to floats.
- Automated the conversion of Jupyter notebooks to PDF and HTML formats using command-line tools like `jupyter nbconvert` and `find`, excluding checkpoint files.
- Utilized Unix command history to efficiently locate and reuse `nbconvert` commands.
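A minimal sketch of the cleaning, explode, and even-split steps listed above, using plain Pandas. The sample data and the column names `project_id`, `countries`, `totalamt`, `n_locations`, and `project_location_id` are assumptions for illustration; only `even_split_totalamt` appears in the session notes, and the actual code may compute it differently. The GeoPandas merge with country shapes is not shown.

```python
import pandas as pd

# Hypothetical sample data: one row per project, with a list of countries
# and a total amount stored as a comma-formatted string.
df = pd.DataFrame({
    "project_id": ["P1", "P2"],
    "countries": [["Kenya", "Uganda"], ["Peru"]],
    "totalamt": ["1,000,000", "250,000.50"],
})

# Clean the amount column: strip thousands separators, then cast to float.
df["totalamt"] = df["totalamt"].str.replace(",", "", regex=False).astype(float)

# Record how many locations each project has before exploding.
df["n_locations"] = df["countries"].map(len)

# Explode the list column so each (project, country) pair gets its own row.
exploded = df.explode("countries", ignore_index=True)

# Assumed definition: split the project total evenly across its locations.
exploded["even_split_totalamt"] = exploded["totalamt"] / exploded["n_locations"]

# Assumed scheme for a unique project-location identifier.
exploded["project_location_id"] = (
    exploded["project_id"] + "_" + exploded["countries"].astype(str)
)

print(exploded[["project_location_id", "even_split_totalamt"]])
```

Counting the locations before the explode keeps the split correct even if rows are filtered afterwards.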
Achievements:
- Successfully refactored data processing code to enhance functionality and clarity.
- Implemented automation scripts for converting Jupyter notebooks, improving workflow efficiency.
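The notebook conversion could be sketched in Python roughly as follows. This is not the script used in the session, which relied on `find` and `jupyter nbconvert` directly in the shell; the function names `find_notebooks`, `nbconvert_command`, and `convert_all` are hypothetical. It assumes checkpoint copies live in `.ipynb_checkpoints` directories, which is where Jupyter places them by default.

```python
from pathlib import Path
import subprocess


def find_notebooks(root):
    """Return notebooks under root, skipping .ipynb_checkpoints directories."""
    return sorted(
        p for p in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def nbconvert_command(notebook, fmt):
    """Build the nbconvert command line for one notebook and output format."""
    return ["jupyter", "nbconvert", "--to", fmt, str(notebook)]


def convert_all(root, formats=("html", "pdf"), dry_run=True):
    """Build (and, unless dry_run is set, execute) one command per notebook and format."""
    commands = [
        nbconvert_command(nb, fmt)
        for nb in find_notebooks(root)
        for fmt in formats
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a conversion fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands that would be executed for the current directory.
    for cmd in convert_all("."):
        print(" ".join(cmd))
```

With `dry_run=True` the function only returns the commands, which makes it easy to review them before running. PDF output through nbconvert generally needs a LaTeX installation.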
Pending Tasks:
- Further testing of the new data processing code to ensure robustness.
- Explore additional automation opportunities for other data formats or environments.