📅 2025-01-16 — Session: Developed Scripts for HTML Data Parsing and Extraction
🕒 16:45–17:15
🏷️ Labels: Python, HTML, Data Extraction, Web Scraping, JSON, Automation
📂 Project: Dev
⭐ Priority: MEDIUM
Session Goal:
The primary aim of this session was to develop and run Python scripts that parse HTML data, extract user and chat information, and save it in structured formats such as CSV and JSON.
Key Activities:
- Developed a Python script that parses HTML files, extracts Instagram usernames and profile links, and saves them to a CSV file.
- Used BeautifulSoup to parse the HTML and extract the Instagram profile data.
- Planned and ran a script that extracts chat data from HTML files and structures it as JSON, recording the sender name, message content, and timestamp of each message.
- Updated the scripts to extract links to individual chat files from chats.html in preparation for further processing.
- Implemented a process to read the chat files, extract the dialogues, and save the structured data in JSON format.
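A minimal sketch of the username-to-CSV step, assuming each profile appears in the exported HTML as an anchor like <a href="https://www.instagram.com/NAME">NAME</a> (the real markup should be checked first). It uses the standard library's html.parser instead of BeautifulSoup so it runs without third-party packages; with BeautifulSoup the equivalent loop would iterate over soup.find_all("a", href=True). All class and function names here are illustrative, not taken from the session's scripts.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_instagram_url(href):
    """True for http(s) links whose host is instagram.com or one of its subdomains."""
    parsed = urlparse(href)
    host = parsed.netloc.lower()
    return parsed.scheme in ("http", "https") and (
        host == "instagram.com" or host.endswith(".instagram.com")
    )


class ProfileLinkParser(HTMLParser):
    """Collects (username, profile_link) pairs from <a> tags pointing at Instagram.

    Assumption (hypothetical markup): <a href="https://www.instagram.com/NAME">NAME</a>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None  # href of the profile <a> currently open, else None
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if is_instagram_url(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            username = "".join(self._text).strip()
            if username:  # skip icon-only links with no visible name
                self.rows.append((username, self._href))
            self._href = None


def extract_profiles(html):
    parser = ProfileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_profiles_csv(rows, stream):
    """Write a header plus one row per profile to any text stream."""
    writer = csv.writer(stream)
    writer.writerow(["username", "profile_link"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = '<a href="https://www.instagram.com/alice">alice</a> <a href="/help">Help</a>'
    buf = io.StringIO()
    write_profiles_csv(extract_profiles(sample), buf)
    print(buf.getvalue())
```

To write a real file, open it with open("profiles.csv", "w", newline="", encoding="utf-8") (the file name is a placeholder) and pass that handle as the stream; newline="" is what the csv module expects, and it prevents blank rows on Windows.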
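The chat-extraction steps could look roughly like this sketch, which turns one chat file's HTML into a list of sender/content/timestamp records and serializes them with json.dumps. The class names message, sender, content and timestamp are hypothetical placeholders for whatever the export actually uses, and each field element is assumed to hold plain text without nested tags. It relies on the standard library's html.parser in place of BeautifulSoup to stay dependency-free.

```python
import json
from html.parser import HTMLParser

# Hypothetical class names: inspect the real chat HTML and adjust these.
MESSAGE_CLASS = "message"
FIELD_CLASSES = ("sender", "content", "timestamp")


class ChatParser(HTMLParser):
    """Builds one dict per message element, filling sender/content/timestamp.

    Assumes each field element holds plain text (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.messages = []
        self._field = None  # name of the field whose element is currently open
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if MESSAGE_CLASS in classes:
            # Start a new record; fields that never appear stay None.
            self.messages.append(dict.fromkeys(FIELD_CLASSES))
            return
        if not self.messages:
            return  # field-like element outside any message: ignore it
        for name in FIELD_CLASSES:
            if name in classes:
                self._field = name
                self._buf = []
                break

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._field is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            self.messages[-1][self._field] = " ".join("".join(self._buf).split())
            self._field = None


def parse_chat(html):
    parser = ChatParser()
    parser.feed(html)
    parser.close()
    return parser.messages


def chat_to_json(html):
    return json.dumps(parse_chat(html), ensure_ascii=False, indent=2)
```

ensure_ascii=False keeps non-ASCII names and emoji readable in the output instead of escaping them as \uXXXX sequences.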
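A sketch of collecting the per-chat links from chats.html, assuming each conversation is linked with a relative <a href="..."> to an .html file. External URLs, in-page anchors and duplicate links are skipped, and first-seen order is kept. ChatLinkParser and extract_chat_links are hypothetical names, and the standard library's html.parser stands in for BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ChatLinkParser(HTMLParser):
    """Collects hrefs of <a> tags that point at local .html chat files."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        # Keep relative .html links; skip http(s), mailto, //host and other non-local URLs.
        if parsed.scheme or parsed.netloc or not parsed.path.endswith(".html"):
            return
        if href not in self._seen:  # de-duplicate while keeping first-seen order
            self._seen.add(href)
            self.links.append(href)


def extract_chat_links(html):
    parser = ChatLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```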
Achievements:
- Successfully created and executed scripts for extracting and structuring data from HTML files into CSV and JSON formats.
- Consolidated the chat data into a JSON structure, ready for further processing and analysis.
Pending Tasks:
- Further processing of individual chat files using the extracted links to enhance data completeness and accuracy.
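One possible starting point for the pending step, under the assumption that the hrefs taken from chats.html are paths relative to the directory containing chats.html: resolve each one to a file path, drop duplicates, and separate files that exist from links that point at missing files. This gives a quick completeness check before any parsing. resolve_chat_files is a hypothetical helper name.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_chat_files(chats_html_path, hrefs):
    """Resolve hrefs from chats.html into (found, missing) lists of absolute Paths.

    Assumes each href is relative to the directory that contains chats.html.
    """
    base_dir = Path(chats_html_path).parent
    found, missing, seen = [], [], set()
    for href in hrefs:
        # Drop any ?query or #fragment and decode %20-style escapes.
        rel = unquote(urlparse(href).path)
        path = (base_dir / rel).resolve()
        if path in seen:
            continue  # another link already pointed at this file
        seen.add(path)
        if path.is_file():
            found.append(path)
        else:
            missing.append(path)
    return found, missing
```

Each path in found can then be read with path.read_text(encoding="utf-8") and passed to whatever chat parser is in use; the missing list is worth logging, since it points to gaps in the export.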