Expert Coders

State-Of-The-Art Software Development

"The software you built has made mud logging less stressful, enjoyable and flat out easy!" — Customer

Mike Cunningham

Owner

Court Records Data Extraction System

Overview

A sophisticated web scraping system that extracts court case data from county court portals, including filing dates, case numbers, party names, and addresses. The system navigates complex web interfaces protected by AWS WAF and solves audio CAPTCHAs automatically using AI speech recognition.

The Challenge

Court records are public information, but accessing them at scale is intentionally difficult. The portals use CAPTCHAs, session management, pagination, and anti-bot protections. A legal services firm needed bulk access to filing data across multiple counties for lead generation — something that would take a human researcher weeks to compile manually.

What I Built

  • Selenium automation with Firefox WebDriver navigating the Odyssey court portal system used by Georgia counties
  • Audio CAPTCHA solver — the system requests the audio version of the CAPTCHA, applies frequency filtering to isolate the voice from background noise, then transcribes it using the Whisper speech recognition API
  • Multi-scraper architecture handling different court case types (civil, criminal, domestic) with separate extraction logic
  • Pagination and deduplication that walk every page of a multi-page result set and record each case only once
  • CSV data pipeline outputting clean, structured data ready for import into the client's CRM
  • AWS WAF evasion through realistic browser fingerprinting and request timing
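
As an illustration of the deduplication and CSV steps in the list above, here is a minimal sketch using only the Python standard library. The column names in FIELDS, the choice of the case number as the deduplication key, and the function names dedupe_cases and write_csv are assumptions made for this example; the write-up does not give the actual export schema or code.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Hypothetical column names; the real export schema is not given in the write-up.
FIELDS = ["case_number", "filing_date", "party_name", "address"]


def dedupe_cases(pages: Iterable[List[Dict[str, str]]]) -> Iterator[Dict[str, str]]:
    """Yield each case once, keyed on a normalized (trimmed, upper-cased) case number."""
    seen = set()
    for page in pages:
        for row in page:
            key = row["case_number"].strip().upper()
            if key in seen:
                continue  # already captured on an earlier page
            seen.add(key)
            # Keep only the known columns and trim stray whitespace.
            yield {field: row.get(field, "").strip() for field in FIELDS}


def write_csv(rows: Iterable[Dict[str, str]], handle) -> int:
    """Write a header plus one line per row; return the number of data rows written."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Two result pages that overlap on one case.
    pages = [
        [{"case_number": "23-cv-001 ", "filing_date": "2023-01-05",
          "party_name": "Example Party A", "address": "1 Main St"}],
        [{"case_number": "23-CV-001", "filing_date": "2023-01-05",
          "party_name": "Example Party A", "address": "1 Main St"},
         {"case_number": "23-CV-002", "filing_date": "2023-01-06",
          "party_name": "Example Party B", "address": "2 Oak Ave"}],
    ]
    buffer = io.StringIO()
    written = write_csv(dedupe_cases(pages), buffer)
    print(written, "rows written")
    print(buffer.getvalue())
```

Because dedupe_cases is a generator, rows can be streamed to disk page by page instead of holding a whole county's results in memory.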

Tech Stack

Python, Selenium, BeautifulSoup4, Whisper API (DeepInfra), signal processing (frequency filtering), Firefox WebDriver, CSV
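
The frequency filtering named above can be illustrated with a generic band-pass filter that keeps the voice band and attenuates low-frequency hum. This is a sketch under stated assumptions, not the project's code: the write-up does not say which filter or cutoff frequencies were used, so the 300 to 3400 Hz band (the conventional telephone voice band), the single second-order section, and the names bandpass and rms are all choices made for this example.

```python
import math
from typing import List, Sequence


def bandpass(samples: Sequence[float], sample_rate: float,
             low_hz: float = 300.0, high_hz: float = 3400.0) -> List[float]:
    """Apply one second-order (biquad) band-pass section with unity gain at the band center.

    The cutoffs are assumed values for speech; they do not come from the write-up.
    """
    center = math.sqrt(low_hz * high_hz)   # geometric center of the band
    q = center / (high_hz - low_hz)        # quality factor from the bandwidth
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)

    # Standard biquad band-pass coefficients, normalized by a0 (b1 is zero).
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    out: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(values: Sequence[float]) -> float:
    """Root-mean-square level, used here to compare signal strength."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    rate = 16000
    hum = [math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
    voice_tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
    print("50 Hz hum level after filtering:   ", round(rms(bandpass(hum, rate)), 3))
    print("1000 Hz tone level after filtering:", round(rms(bandpass(voice_tone, rate)), 3))
```

A single section rolls off gently, so a sharper separation would need several sections in cascade or a library filter design routine; the sketch only shows the principle.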