Multi-source market data.
Daily ranked targets.
A Python 3.10+ pipeline that pulls Redfin Data Center CSVs and live ZIP-level pending and active metrics, then corroborates them across multiple providers: agreement within 5% confirms a figure, and disagreement flags it for review. It scores listing descriptions for high-DOM, price-reduced, and relisted signals, and lands a daily ranked call list in Airtable. Playwright handles the sources that fight scraping. Collection runs concurrently on asyncio. Slack pings the moment a new hot market or priority property crosses threshold. Four milestones, two weeks, sign-off at every gate.
Three failure modes in single-source data pipelines.
Every operator running a one-source scraper hits the same three. Different markets, same root cause: confidence without corroboration is just noise dressed up as signal.
Free CSVs alone aren't enough
Redfin Data Center is the right foundation, but it misses live ZIP-level pending and active velocity, plus the listing-description signals that surface real opportunities. One source means false confidence in hot markets that have already cooled.
Single-source confidence is dangerous
One scraper, one provider, one metric. When two providers disagree by 20% on the same ZIP you need to know before the call list ships, not after the call goes cold. Corroboration is what turns a number into a signal.
Manual ZIP triage doesn't scale
Even with the right data, walking ZIP by ZIP through pending percentages, DOM, price reductions, and relisted properties takes hours per day. The scoring layer, not the spreadsheet, is where the system has to compound.
One pipeline. Cross-validated end to end.
A Python 3.10+ ETL pipeline built around Playwright for the sources that fight scraping with anti-bot logic, and REST APIs for the providers with clean endpoints. Collection runs concurrently across all sources on asyncio. Redfin Data Center CSV is the foundation layer. Secondary providers feed live ZIP-level active and pending market metrics with the corroboration rule baked in: when sources agree within 5% the figure is confirmed, otherwise it is flagged for review. A third validation provider resolves disagreements between the first two. A weighted composite scoring model ranks every ZIP code daily. Inside the high-scoring ZIPs, a keyword-based scoring layer reads listing descriptions and flags high-DOM, price-reduced, relisted, and other configurable signals as priority properties. Output lands in your Airtable Target Properties table with a daily call list view sorted by composite score. Slack webhooks alert the moment a new hot market or priority property crosses threshold. Scheduling is automated at daily, weekly, and monthly cadences. Setup documentation lets you maintain it independently after the two-week build.
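To make the weighted composite ranking concrete, here is a minimal sketch. The metric names (`pending_pct`, `median_dom`, `price_reduced_pct`), the weights, the direction of each metric, and the min-max normalization are all assumptions for illustration; the real inputs and weights get agreed and calibrated during the build.

```python
# Hypothetical metric weights; they sum to 1.0 so scores land in [0, 1].
WEIGHTS = {"pending_pct": 0.5, "median_dom": 0.3, "price_reduced_pct": 0.2}
# Assumption: lower values of these metrics indicate a hotter market.
LOWER_IS_HOTTER = {"median_dom", "price_reduced_pct"}


def normalize(values, invert=False):
    """Min-max scale to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def rank_zips(metrics_by_zip):
    """Return (zip_code, composite_score) pairs, highest score first."""
    zip_codes = sorted(metrics_by_zip)
    scores = dict.fromkeys(zip_codes, 0.0)
    for metric, weight in WEIGHTS.items():
        column = [metrics_by_zip[z][metric] for z in zip_codes]
        normalized = normalize(column, invert=metric in LOWER_IS_HOTTER)
        for zip_code, value in zip(zip_codes, normalized):
            scores[zip_code] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_zips` with a dict of per-ZIP metrics returns the list that would feed the daily call list view. Because the scaling is relative to the ZIPs in each run, scores are comparable within a day but not across days.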
Four milestones. Sign-off at every gate.
Your milestone structure mapped to concrete deliverables. Every milestone ships with explicit acceptance criteria before the next begins.
- Python 3.10+ pipeline with asyncio orchestration for concurrent collection
- Playwright scraper for Redfin Data Center CSV (anti-bot resilient with proxy rotation if needed)
- REST integrations for ZIP-level active and pending market metrics from approved sources
- Pending percentage calculated per ZIP code, output to Google Sheets
- Secondary housing market data source integrated for the same ZIP code set
- Corroboration rule: both sources agreeing within 5% confirms the figure, otherwise it is flagged for review
- Additional data source as third validation layer when sources disagree
- Corroboration results written to Airtable with full source attribution attached
- Keyword-based scoring layer on listing descriptions with configurable rule set
- High-DOM, price-reduction, relisted, and other configurable indicators classified
- Configurable scoring rules with rubric versioning so you adjust without code changes
- Scored properties written to Airtable Target Properties table with reasoning attached
- Automated scheduling at daily, weekly, and monthly cadences (cron or scheduled workers)
- Slack webhook alerts for new hot markets and priority properties above threshold
- Daily call list view in Airtable sorted by composite score, ready for the morning standup
- Complete setup documentation so you maintain the system independently
- Code repository, credentials, scheduler config, and Airtable base ownership transferred to you on day one
- 14-day post-launch window for bug fixes, scoring rubric calibration, and minor adjustments
- Loom walkthrough of the codebase, the scheduler config, and how to add a new data source
- AI-assisted maintenance approach documented (Claude and Cursor for future debugging and rubric updates)
Two weeks. Milestone to milestone.
Each milestone ships with explicit acceptance criteria before the next begins.
Deliverables this milestone
- Python 3.10+ pipeline with asyncio for concurrent collection
- Playwright scraper for Redfin Data Center CSV with anti-bot resilience
- REST integrations for ZIP-level active and pending market metrics from approved sources
- Pending percentage calculated per ZIP code, output to Google Sheets
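As a rough sketch of how the first milestone's collection and pending-percentage step could fit together: the two fetchers below are stand-ins that return made-up sample counts for placeholder ZIPs (real ones would wrap Playwright or a REST client), and the formula pending / (active + pending) is an assumed definition to confirm against your own.

```python
import asyncio
from typing import Optional

# Made-up sample counts for placeholder ZIPs; not real market data.
_SAMPLE_ACTIVE = {"00001": 120, "00002": 80}
_SAMPLE_PENDING = {"00001": 60, "00002": 20}


async def fetch_active(zip_code: str) -> int:
    """Stand-in fetcher; a real one would await a network call."""
    await asyncio.sleep(0)
    return _SAMPLE_ACTIVE.get(zip_code, 0)


async def fetch_pending(zip_code: str) -> int:
    """Stand-in fetcher; a real one would await a network call."""
    await asyncio.sleep(0)
    return _SAMPLE_PENDING.get(zip_code, 0)


def pending_pct(active: int, pending: int) -> Optional[float]:
    """Assumed definition: pending share of active plus pending, in percent."""
    total = active + pending
    return round(100 * pending / total, 1) if total else None


async def collect(zip_codes):
    """Fetch both counts for every ZIP concurrently and compute the percentage."""
    async def one(zip_code):
        active, pending = await asyncio.gather(
            fetch_active(zip_code), fetch_pending(zip_code)
        )
        return zip_code, pending_pct(active, pending)

    return dict(await asyncio.gather(*(one(z) for z in zip_codes)))
```

Running `asyncio.run(collect(zip_list))` yields one percentage per ZIP, with `None` where a ZIP has no active or pending listings, so an empty market is never reported as 0%.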
Deliverables this milestone
- Secondary housing market data source integrated for same ZIPs
- Corroboration rule: both sources agreeing within 5% confirms the figure, otherwise it is flagged for review
- Additional data source as third validation layer when sources disagree
- Corroboration results written to Airtable with source attribution
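The corroboration rule above could be sketched roughly as follows. Several details here are assumptions to settle on the scoping call: that "within 5%" means relative difference measured against the larger value (rather than percentage points), that a confirmed figure is the average of the agreeing sources, and that the third provider resolves a disagreement by siding with whichever source it matches most closely. The labels `primary`, `secondary`, and `tiebreaker` are placeholder names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TOLERANCE = 0.05  # the 5% agreement threshold


def within_tolerance(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """Assumed reading of the rule: relative difference against the larger value."""
    baseline = max(abs(a), abs(b))
    return baseline == 0 or abs(a - b) / baseline <= tol


@dataclass
class Corroboration:
    status: str                 # "confirmed" or "review"
    value: Optional[float]      # averaged figure when confirmed
    sources: Tuple[str, ...]    # which sources backed the figure


def corroborate(primary: float, secondary: float,
                tiebreaker: Optional[float] = None) -> Corroboration:
    """Confirm when two sources agree; otherwise try the third, else flag."""
    if within_tolerance(primary, secondary):
        return Corroboration("confirmed", (primary + secondary) / 2,
                             ("primary", "secondary"))
    if tiebreaker is not None:
        # Keep only sources the tiebreaker agrees with, closest first.
        matches = sorted(
            (abs(value - tiebreaker), name, value)
            for name, value in (("primary", primary), ("secondary", secondary))
            if within_tolerance(value, tiebreaker)
        )
        if matches:
            _, name, value = matches[0]
            return Corroboration("confirmed", (value + tiebreaker) / 2,
                                 (name, "tiebreaker"))
    return Corroboration("review", None, ())
```

Carrying the `sources` tuple on every result is what makes source attribution in Airtable a straight field mapping rather than something reconstructed afterwards.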
Deliverables this milestone
- Keyword-based scoring layer on listing descriptions with configurable rule set
- High-DOM, price-reduction, relisted, and other configurable indicators
- Configurable scoring rules with versioning so you adjust without code
- Scored properties written to Airtable Target Properties table with reasoning
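A minimal sketch of what a keyword-based scoring layer with a versioned, configurable rubric might look like. The patterns, point values, and the 90-day high-DOM cutoff are hypothetical placeholders; in practice the rubric would live in a config file or an Airtable table so you can adjust it without code changes.

```python
import re

# Hypothetical rubric for illustration only; every value here is adjustable.
RUBRIC = {
    "version": "example-1",
    "rules": [
        {"signal": "price_reduced",
         "pattern": r"price (reduced|drop|improvement)", "points": 3},
        {"signal": "relisted",
         "pattern": r"back on (the )?market|relisted", "points": 2},
    ],
    "high_dom_days": 90,
    "high_dom_points": 3,
}


def score_listing(description, days_on_market, rubric=RUBRIC):
    """Score one listing and record which signals fired as the reasoning."""
    score, reasons = 0, []
    for rule in rubric["rules"]:
        if re.search(rule["pattern"], description, flags=re.IGNORECASE):
            score += rule["points"]
            reasons.append(rule["signal"])
    if days_on_market >= rubric["high_dom_days"]:
        score += rubric["high_dom_points"]
        reasons.append("high_dom")
    return {"score": score, "reasons": reasons,
            "rubric_version": rubric["version"]}
```

Returning the rubric version alongside the score means every Airtable row records which rule set produced it, which keeps calibration changes auditable.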
Deliverables this milestone
- Automated scheduling at daily, weekly, and monthly cadences
- Slack webhook alerts for new hot markets and priority properties above threshold
- Daily call list view in Airtable sorted by composite score
- Complete setup documentation plus Loom walkthrough of codebase and scheduler config
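For the alerting step, something along these lines would work, assuming a standard Slack incoming webhook (which accepts a JSON body with a `text` field). The 0.8 threshold, the message wording, and the `already_alerted` set used to avoid repeat pings are illustrative assumptions.

```python
import json
import urllib.request

ALERT_THRESHOLD = 0.8  # hypothetical composite-score cutoff


def build_alerts(ranked, already_alerted, threshold=ALERT_THRESHOLD):
    """Return Slack payloads for ZIPs newly at or above the threshold."""
    payloads = []
    for zip_code, score in ranked:
        if score >= threshold and zip_code not in already_alerted:
            payloads.append(
                {"text": f"New hot market: ZIP {zip_code} scored {score:.2f}"}
            )
    return payloads


def post_to_slack(webhook_url, payload):
    """POST one payload to a Slack incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the network call lets the threshold logic be tested without touching Slack, and the webhook URL stays in configuration rather than in code.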
Let's walk this together.
A 30-minute call where I share my screen, walk through the architecture, show a Playwright and proxy scrape running live against a real estate source, and confirm scope against your target ZIP code list and corroboration thresholds. Happy to walk through commercials on the call.