📅 2025-02-24 — Session: Designed Generalized Bill Ingestion System and Data Schema

🕒 21:15–21:50
🏷️ Labels: Bill Ingestion, Data Extraction, Financial Schema, Automation, Credit Card Processing, Property Tax
📂 Project: Dev
⭐ Priority: MEDIUM

Session Goal:

The session aimed to design a generalized system for bill ingestion and data extraction from financial documents, together with a unified data schema for financial reporting.

Key Activities:

  • Bill Ingestion System Design: Planned how to parse bills from PDFs, standardize the extracted data, and store it in a structured format, with a custom parser for each bill type and an automated ingestion process.
  • Directory Structure Analysis: Reviewed the directory layout used to organize raw data, noted its strengths and potential issues, and recommended changes to support an automated ingestion pipeline.
  • Document Understanding: Distinguished between AySA's CuponPago (payment coupon) and Factura (invoice) documents, proposed a data structure for them, and outlined an ingestion plan.
  • Credit Card Data Extraction: Discussed how to extract data from credit card statements, which key fields to capture, and how to integrate the results into the financial data processing pipeline.
  • Unified Parsing Approach: Proposed a unified approach for parsing Visa and Mastercard statements.
  • General Financial Schema Design: Outlined a schema for managing financial documents, including tables for bills, credit card transactions, and payments.
  • Property Tax Bills Analysis: Analyzed property tax bills to inform the design of a dedicated parser for them.
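The schema and parser-per-document-type design outlined above could be sketched roughly as follows. This is a minimal sketch assuming Python with the standard-library sqlite3 module; every table, column, and function name (bills, card_transactions, payments, register, ingest, parse_factura) is a hypothetical illustration rather than the session's actual design, and the Factura parser assumes a prior PDF-to-text step has already reduced the document to `key: value` lines.

```python
import sqlite3
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, Optional

# Hypothetical unified schema: one table per record kind named in the notes.
# Amounts are stored as integer cents to avoid floating-point rounding.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bills (
    id           INTEGER PRIMARY KEY,
    issuer       TEXT NOT NULL,        -- e.g. 'AySA'
    doc_type     TEXT NOT NULL,        -- e.g. 'Factura', 'CuponPago'
    due_date     TEXT,                 -- ISO 8601 date string
    amount_cents INTEGER NOT NULL,
    currency     TEXT NOT NULL,
    source_file  TEXT NOT NULL UNIQUE  -- makes re-ingestion idempotent
);
CREATE TABLE IF NOT EXISTS card_transactions (
    id           INTEGER PRIMARY KEY,
    network      TEXT NOT NULL CHECK (network IN ('Visa', 'Mastercard')),
    posted_date  TEXT NOT NULL,
    description  TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    currency     TEXT NOT NULL,
    source_file  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS payments (
    id           INTEGER PRIMARY KEY,
    bill_id      INTEGER REFERENCES bills(id),
    paid_date    TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
"""


@dataclass
class Bill:
    issuer: str
    doc_type: str
    due_date: Optional[str]
    amount_cents: int
    currency: str
    source_file: str


# Registry: document type -> parser that turns extracted text into a Bill.
PARSERS: Dict[str, Callable[[str, str], Bill]] = {}


def register(doc_type: str):
    def wrap(fn: Callable[[str, str], Bill]) -> Callable[[str, str], Bill]:
        PARSERS[doc_type] = fn
        return fn
    return wrap


@register("Factura")
def parse_factura(text: str, source_file: str) -> Bill:
    # Placeholder parser (hypothetical): assumes "key: value" lines;
    # the real AySA Factura layout is not captured in these notes.
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return Bill(
        issuer="AySA",
        doc_type="Factura",
        due_date=fields.get("due_date"),
        amount_cents=int(Decimal(fields["total"]) * 100),
        currency=fields["currency"],
        source_file=source_file,
    )


def ingest(conn: sqlite3.Connection, doc_type: str, text: str, source_file: str) -> bool:
    """Parse one document and store it; returns False if it was already ingested."""
    parser = PARSERS.get(doc_type)
    if parser is None:
        raise ValueError(f"no parser registered for {doc_type!r}")
    bill = parser(text, source_file)
    cur = conn.execute(
        "INSERT OR IGNORE INTO bills "
        "(issuer, doc_type, due_date, amount_cents, currency, source_file) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (bill.issuer, bill.doc_type, bill.due_date,
         bill.amount_cents, bill.currency, bill.source_file),
    )
    return cur.rowcount == 1
```

Making source_file UNIQUE and inserting with INSERT OR IGNORE keeps ingestion idempotent, so an automated pipeline can rescan the raw-data directory without creating duplicate rows. A single card_transactions table with a network column is one possible way to realize the unified Visa/Mastercard approach; new document types (CuponPago, property tax bills) would each register their own parser.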

Achievements:

  • Developed a comprehensive plan for a generalized bill ingestion system.
  • Created a unified data schema for financial documents.
  • Proposed a unified parsing approach for Visa and Mastercard statements.

Pending Tasks:

  • Implement the designed parsers and ingestion pipeline.
  • Test the unified data schema with sample data.
  • Develop a parser for property tax bills based on the analysis.