📅 2023-02-23 — Session: Enhanced Data Generation and Visualization Functions
🕒 06:45–08:20
🏷️ Labels: Python, Data Generation, Visualization, Refactoring
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal
Make the Python data generation and visualization functions more flexible and cut duplicated code.
Key Activities
- Updated the `generate_data()` function to include a boolean argument `use_formula_2`, allowing selection between two treatment distribution formulas.
- Implemented random sampling for treatment variables using `np.random.choice`, so each value is drawn with equal probability.
- Modified the `generate_data` function to introduce a `use_prob_dist` argument, enabling a choice between binomial sampling and `np.random.choice` sampling.
- Developed a function to generate synthetic data for regression problems, incorporating treatment effect options and filename tagging.
- Refactored code to reduce duplication by moving similar operations into a single function that takes the differing values as parameters.
- Updated Python plotting functions to pass figure and axis objects as arguments, improving maintainability.
- Enhanced the `scatterplot_results` function with a `plot_kwargs` parameter for flexible plotting.
- Demonstrated the use of `plot_kwargs` in the `scatterplot_sts()` function for passing additional arguments to plotting functions.
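For reference, a minimal sketch of how the two flags on `generate_data` could fit together. Only the function name and the `use_formula_2` / `use_prob_dist` arguments come from this session; the two placeholder probabilities, the outcome formula, the `treatment_effect` and `seed` parameters, the returned dictionary, and the tag format are assumptions for illustration. It uses NumPy's `default_rng` generator rather than the legacy `np.random.choice` call so the draws are reproducible.

```python
import numpy as np


def generate_data(n=200, use_formula_2=False, use_prob_dist=False,
                  treatment_effect=1.0, seed=0):
    """Hypothetical sketch of a synthetic regression dataset with a treatment."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    # Placeholder "formulas" for the treatment probability (assumed values).
    p = 0.3 if use_formula_2 else 0.5

    if use_prob_dist:
        # Binomial sampling: one Bernoulli(p) trial per row.
        treatment = rng.binomial(1, p, size=n)
    else:
        # Choice sampling: pick 0 or 1 with weights [1 - p, p].
        treatment = rng.choice([0, 1], size=n, p=[1 - p, p])

    # Assumed linear outcome with an additive treatment effect plus noise.
    y = 2.0 * x + treatment_effect * treatment + rng.normal(scale=0.1, size=n)

    # Tag that could be appended to an output filename (assumed format).
    formula = 2 if use_formula_2 else 1
    sampler = "binomial" if use_prob_dist else "choice"
    tag = f"formula{formula}_{sampler}"

    return {"x": x, "treatment": treatment, "y": y, "tag": tag}


data = generate_data(n=500, use_formula_2=True, use_prob_dist=True)
print(data["tag"], data["y"].shape)
```

With the default `use_formula_2=False`, the choice branch uses weights `[0.5, 0.5]`, which matches the equal-probability sampling noted above.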
Achievements
- Successfully updated and refactored data generation and visualization functions, improving flexibility and reducing code redundancy.
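On the visualization side, the pattern of passing the figure and axis in and forwarding `plot_kwargs` might look roughly like the sketch below. The name `scatterplot_results` and the `plot_kwargs` parameter are from this session; the exact signature, the axis labels, the title, and the sample points are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def scatterplot_results(fig, ax, x, y, plot_kwargs=None):
    """Hypothetical sketch: draw on a caller-supplied figure and axis."""
    # Avoid a mutable default argument; fall back to an empty dict.
    plot_kwargs = {} if plot_kwargs is None else plot_kwargs
    # Extra styling is forwarded untouched to Axes.scatter.
    ax.scatter(x, y, **plot_kwargs)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.suptitle("Results")
    return ax


fig, ax = plt.subplots()
scatterplot_results(
    fig, ax,
    x=[0, 1, 2, 3],
    y=[0.1, 0.9, 2.2, 2.8],
    plot_kwargs={"s": 20, "alpha": 0.6, "color": "tab:blue"},
)
```

Because the caller owns `fig` and `ax`, the same function can be reused across subplots, and styling changes do not require touching the function body.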
Pending Tasks
- Further testing of the updated functions in diverse scenarios to ensure robustness.