M.I. Journal

Developed Techniques for JSON Data Extraction

Oct 22, 2023 · 1 min read

JSON
Web-Scraping
Python
JavaScript
BeautifulSoup

Day: 2023-10-22
Time: 01:00 to 01:25
Project: Dev
Workspace: WP 2: Operational
Status: Completed
Priority: MEDIUM
Assignee: Matías Nehuen Iglesias
Tags: JSON, Web Scraping, Python, JavaScript, BeautifulSoup

Description

Session Goal

The session aimed to explore and develop methods for extracting JSON data from HTML and JavaScript sources using Python and JavaScript.

Key Activities

Explored methods to extract JSON data from HTML <script> tags using Python and BeautifulSoup.
Investigated techniques for parsing JSON data embedded in JavaScript variables, specifically window.__PRELOADED_STATE__.
Addressed error handling strategies for truncated JSON content and corrected library import oversights.
Implemented regular expressions to extract JSON when it is not located in expected script tags.
Developed a Python script to scrape product metadata from HTML using BeautifulSoup, focusing on extracting title, description, image URL, and product URL.
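The activities above can be sketched roughly as follows. This is a minimal illustration, not the script written during the session: the sample HTML, the function names, the choice of application/ld+json script tags, and the use of Open Graph meta tags for product metadata are all assumptions, since the journal does not record the actual pages or selectors.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical sample page; the pages scraped in the session are not recorded.
SAMPLE_HTML = """
<html><head>
<title>Example Widget</title>
<meta property="og:title" content="Example Widget">
<meta property="og:description" content="A sample product.">
<meta property="og:image" content="https://example.com/widget.png">
<meta property="og:url" content="https://example.com/widget">
<script type="application/ld+json">{"@type": "Product", "name": "Example Widget"}</script>
<script>window.__PRELOADED_STATE__ = {"cart": {"items": 2}};</script>
</head><body></body></html>
"""

# Assumed pattern: the state is one object literal followed by a semicolon.
# The greedy match can overshoot if later statements also end in "};".
STATE_RE = re.compile(r"window\.__PRELOADED_STATE__\s*=\s*(\{.*\})\s*;", re.DOTALL)


def json_from_script_tags(soup):
    """Parse the body of every <script type="application/ld+json"> tag."""
    results = []
    for tag in soup.find_all("script", type="application/ld+json"):
        if tag.string:
            results.append(json.loads(tag.string))
    return results


def preloaded_state(soup):
    """Return window.__PRELOADED_STATE__ from any inline script, or None."""
    for tag in soup.find_all("script"):
        match = STATE_RE.search(tag.string or "")
        if match:
            return json.loads(match.group(1))
    return None


def product_metadata(soup):
    """Collect title, description, image URL and product URL (Open Graph assumed)."""
    def og(name):
        tag = soup.find("meta", property=f"og:{name}")
        return tag.get("content") if tag else None

    return {key: og(key) for key in ("title", "description", "image", "url")}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(json_from_script_tags(soup))
print(preloaded_state(soup))
print(product_metadata(soup))
```

The built-in "html.parser" backend is used so that no extra parser needs to be installed. The regular expression only works when the assigned value is valid JSON; a JavaScript object literal with unquoted keys or single quotes would need a different approach.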

Achievements

Successfully extracted JSON data from both HTML and JavaScript sources.
Corrected errors related to library imports and variable loss in JavaScript.
Enhanced understanding of error handling in JSON processing.
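One way the error handling for truncated JSON might look is sketched below, using only the standard library. The function name load_json_safely and the sample strings are hypothetical; the journal does not say which strategy was actually adopted.

```python
import json


def load_json_safely(text):
    """Return (data, error); error is None when parsing succeeded."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where the decoder gave up.
        start = max(exc.pos - 20, 0)
        snippet = text[start:exc.pos + 20]
        return None, f"{exc.msg} at offset {exc.pos} near {snippet!r}"


complete = '{"name": "Example Widget", "tags": ["a", "b"]}'
truncated = complete[:30]  # simulate content cut off mid-document

print(load_json_safely(complete))
print(load_json_safely(truncated))
```

Returning the error message instead of raising lets a scraping loop log the failing page and continue with the next one.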

Pending Tasks

Further refine regular expression patterns for more robust JSON data extraction.
Explore additional error handling techniques for incomplete JSON data.
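A possible direction for the first pending task is to let the regular expression match only the assignment prefix and hand the value itself to json.JSONDecoder.raw_decode, which parses one JSON value and ignores anything after it. This is a sketch under the assumption that the embedded state is valid JSON; the function name and sample script are hypothetical.

```python
import json
import re

# Matches only the assignment prefix; the JSON value is left to the decoder.
ASSIGNMENT_RE = re.compile(r"window\.__PRELOADED_STATE__\s*=\s*")


def extract_preloaded_state(script_text):
    """Return the value assigned to window.__PRELOADED_STATE__, or None."""
    match = ASSIGNMENT_RE.search(script_text)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (";", further statements, and so on).
        value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None  # not valid JSON, or cut off before the value closed
    return value


script = 'window.__PRELOADED_STATE__ = {"user": {"id": 7}}; window.other = {"x": 1};'
print(extract_preloaded_state(script))  # {'user': {'id': 7}}
```

Because the decoder tracks nesting itself, a second object later in the same script does not confuse it, which is a case where a greedy brace-matching pattern would capture too much.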

Evidence

source_file=2023-10-22.sessions.jsonl, line_number=2, event_count=0, session_id=4a04f570cee6f4bcbd49541f8dfef4fc27ae3d2c22756f320f10b061c5c8c218
event_ids: []

