info@aequum.ai DEMOGRAPHIC INTELLIGENCE API  ·  DOCKER DEPLOYMENT  ·  US-WIDE COVERAGE
Published · arXiv:2504.21259

Demographic estimation
that outperforms BISG

aequumAI's LSTM+Geo ensemble reduces minority misclassification by over 20% versus the industry standard — deployed as a Docker container on your own infrastructure.

arXiv 2504.21259  ·  2025 LSTM+Geo with XGBoost Filtering: A Novel Approach for Race and Ethnicity Imputation with Reduced Bias
89.4% model accuracy (vs 82.9% BISG)
17.8% White false positive rate (vs 24.6% BISG)
6 race categories, incl. Native American
US nationwide coverage
Docker / on-premise, no data egress
Aggregate only, no individual prediction

The gap your examiner will ask about

BISG produces point estimates with no uncertainty measure. When your fair lending examiner asks how confident you are in your disparate impact analysis, BISG has no answer. We give you the full distribution — with confidence intervals at the tract level.

The White false positive rate is the specific failure mode regulators care most about: misclassifying non-White individuals as White causes banks to systematically undercount disparate impact on minority borrowers.
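
As a concrete reading of this metric, the sketch below computes a White false positive rate as the share of truly non-White records that a model labels White, then works out the relative reduction implied by the headline figures quoted on this page (24.6% for BISG, 17.8% for the ensemble). The function names and label strings are illustrative assumptions, and the paper's exact definition may differ.

```python
def white_false_positive_rate(true_labels, predicted_labels):
    """Share of non-White records predicted as White (assumed definition)."""
    non_white = [(t, p) for t, p in zip(true_labels, predicted_labels) if t != "White"]
    if not non_white:
        raise ValueError("no non-White records to evaluate")
    false_positives = sum(1 for _, p in non_white if p == "White")
    return false_positives / len(non_white)


def relative_reduction(baseline, improved):
    """Fractional drop from baseline to improved."""
    return (baseline - improved) / baseline


# Toy labels, not real data.
truth = ["Black", "Hispanic", "White", "Asian", "Black"]
preds = ["White", "Hispanic", "White", "Asian", "Black"]
toy_fpr = white_false_positive_rate(truth, preds)  # 1 of 4 non-White records

# Headline rates quoted on this page.
reduction = relative_reduction(0.246, 0.178)
print(f"toy FPR = {toy_fpr:.2f}, relative reduction = {reduction:.1%}")
```

With the quoted rates, the relative reduction works out to about 27.6%, which is consistent with the "over 20%" claim.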

Model                            Accuracy   White FPR
BISG (industry standard)         82.9%
BIFSG                            86.8%
Standalone LSTM                  86.4%      24.6%
LSTM+Geo                         88.7%      19.3%
LSTM+Geo + XGBoost (aequumAI)    89.4%      17.8%

Source: arXiv:2504.21259 · Leitch, Chalavadi, Pastor (2025)

Enterprise-ready.
On your infrastructure.

A containerized demographic estimation API that produces tract and portfolio-level racial composition estimates with confidence intervals from name and address data — deployable on your AWS, Azure, or on-premise environment. No data leaves your perimeter.

Docker Container Deployment

Self-contained image runs on your infrastructure. IT security review is a deployment decision, not a vendor access discussion.

Uncertainty Quantification

Every estimate includes confidence intervals and error bounds — not just point estimates. Defensible under regulatory scrutiny.
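
One simple way to attach an interval to an aggregate estimate, assuming per-record class probabilities and independent errors, is a normal approximation to the Poisson-binomial count. This is a generic textbook sketch, not aequumAI's method; the function name and inputs are hypothetical.

```python
import math


def count_interval(probabilities, z=1.96):
    """Expected count and approximate 95% interval for one class.

    Treats each record as an independent Bernoulli draw with its own
    probability (a Poisson-binomial count), then applies a normal
    approximation.
    """
    expected = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    half_width = z * math.sqrt(variance)
    lower = max(0.0, expected - half_width)
    upper = min(float(len(probabilities)), expected + half_width)
    return expected, lower, upper


# Toy per-record probabilities of belonging to one class.
probs = [0.9, 0.8, 0.1, 0.5, 0.7, 0.2]
expected, lower, upper = count_interval(probs)
print(f"expected count {expected:.2f}, interval [{lower:.2f}, {upper:.2f}]")
```

Independence is the weak point of this approximation: if errors within a geography are correlated, the true interval is wider than this one.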

Spatial Error Correlation

Errors within a geography are modeled as correlated, not independent. Tail events — systematic misclassification — are properly estimated.

6 Race Categories

White, Black, Hispanic, Asian, Native American, and Other. American Indian / Alaska Native is estimated as a distinct category, consistent with HMDA race reporting.

Aggregate Only

Population-level estimates at the tract, ZCTA, or portfolio level. No individual-level prediction. Ethically and legally clean for compliance use.
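
A minimal sketch of aggregate-only output, assuming each record carries a tract identifier and a vector of class probabilities: probabilities are summed per tract and normalized, and only the tract-level shares are returned. The record layout, the `min_records` suppression threshold, and the tract IDs are illustrative assumptions.

```python
from collections import defaultdict

CATEGORIES = ["White", "Black", "Hispanic", "Asian", "Native American", "Other"]


def tract_composition(records, min_records=3):
    """Return {tract: {category: share}} from per-record probabilities.

    Tracts with fewer than min_records records are suppressed to reduce
    the risk that a small cell reveals information about individuals.
    """
    totals = defaultdict(lambda: [0.0] * len(CATEGORIES))
    counts = defaultdict(int)
    for tract, probs in records:
        counts[tract] += 1
        for i, p in enumerate(probs):
            totals[tract][i] += p
    result = {}
    for tract, sums in totals.items():
        if counts[tract] < min_records:
            continue
        total = sum(sums)
        result[tract] = {c: s / total for c, s in zip(CATEGORIES, sums)}
    return result


# Toy input: (tract id, probabilities in CATEGORIES order).
records = [
    ("T1", [0.7, 0.1, 0.1, 0.05, 0.02, 0.03]),
    ("T1", [0.2, 0.6, 0.1, 0.05, 0.02, 0.03]),
    ("T1", [0.3, 0.1, 0.5, 0.05, 0.02, 0.03]),
    ("T2", [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]),
]
shares = tract_composition(records)
print(shares)
```

In this toy input, tract T2 has a single record and is dropped from the output, while T1 is reported only as shares.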

US-Wide Coverage

Name and address inputs work nationwide. Validated against NY State homeowner records; applicable to any US geography.

Deploy in your environment

The aequumAI API runs as a Docker container on your own AWS, Azure, or on-premise server. Your client data never leaves your infrastructure — which is exactly what your IT and legal teams need.
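
Because the container runs inside the client network, a caller would typically reach it over a local address. The sketch below builds (but does not send) such a request; the URL, port, path, and JSON field names are all hypothetical placeholders, since the actual API contract is not described here.

```python
import json
import urllib.request

# Hypothetical local endpoint; the real host, port, and path may differ.
API_URL = "http://localhost:8080/v1/estimate"


def build_request(records, level="tract"):
    """Build a POST request with name and address records (assumed schema)."""
    payload = {"level": level, "records": records}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    [{"name": "Jane Doe", "address": "1 Main St, East Aurora, NY"}]
)
print(request.get_method(), request.full_url)
# To send it once a container is running:
#     with urllib.request.urlopen(request) as response:
#         result = json.load(response)
```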

Docker  ·  AWS EC2 / ECS  ·  Azure ACI  ·  On-Premise  ·  No Data Egress

One engine.
Multiple compliance markets.

The same demographic estimation infrastructure applies wherever population-level race and ethnicity analysis is required for compliance, equity reporting, or disparity detection.

01 · Fair Lending & HMDA Compliance

Disparate impact analysis requires knowing the demographic baseline. HMDA self-reported race is incomplete for a significant portion of applications. BISG gives a point estimate; we give the distribution with confidence intervals that survive regulatory scrutiny from CFPB, OCC, and state examiners.
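
To illustrate why the demographic baseline matters, here is a hedged sketch of a probability-weighted approval-rate comparison, a common way to use imputed race in disparate impact screening. The data, function names, and the choice of White as the reference group are assumptions for illustration, not a description of the product's internals.

```python
def weighted_approval_rate(applications, group):
    """Approval rate for a group, weighting each application by its
    estimated probability of belonging to that group."""
    weight = sum(app["probs"][group] for app in applications)
    approved = sum(app["probs"][group] for app in applications if app["approved"])
    return approved / weight


def adverse_impact_ratio(applications, group, reference="White"):
    """Group approval rate divided by the reference group's rate."""
    return weighted_approval_rate(applications, group) / weighted_approval_rate(
        applications, reference
    )


# Toy applications with two-class probabilities for brevity.
applications = [
    {"approved": True, "probs": {"White": 0.9, "Black": 0.1}},
    {"approved": True, "probs": {"White": 0.8, "Black": 0.2}},
    {"approved": False, "probs": {"White": 0.2, "Black": 0.8}},
    {"approved": True, "probs": {"White": 0.3, "Black": 0.7}},
]
ratio = adverse_impact_ratio(applications, "Black")
print(f"adverse impact ratio: {ratio:.2f}")
```

The ratio here is a single number; putting an interval around it requires modeling the error in the underlying probabilities as well.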

Available Now

02 · Health Equity & CMS Reporting

CMS now requires hospitals and ACOs to report on health equity as part of value-based care programs. The API provides population-level demographic estimates for patient populations where self-reported race is incomplete or unreliable.

Available Now

03 · Pharmaceutical Dispensing Analysis

Analyzes disparities in prescription drug dispensing patterns by race and ethnicity at the population level. Supports PBM compliance, health system equity audits, and emerging CMS dispensing equity requirements.

Available Now

04 · Voter Demographics & Redistricting

A two-step pipeline: race prediction scores serve as features for party preference modeling, combined with income and demographic factors. It supports redistricting analysis, voting rights litigation, and demographic estimation for geographies where self-reported race on voter registration is incomplete.

Coming Soon

Financial infrastructure for equitable homeownership

EQUITYchain is a rule-based financial infrastructure that changes how homeownership equity is built, recorded, and protected — combining smart contracts, tokenized ownership, and auditable compliance logs to eliminate the discretion through which housing inequity is reproduced.

Designed for the communities where four decades of steering, appraisal bias, foreclosure profiteering, and wealth extraction have compounded into the racial wealth gap visible across New York State and beyond.

Trademark published May 5, 2026 in the USPTO registry — Serial No. 99486925. Patent pending. Pilot in development.

Architecture

Vesting-First Equity

Fixed token supply established at origination. Tokens released from a controlled vault per a predefined vesting schedule. No minting on each payment.
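
The vesting rule can be sketched as follows, assuming a linear schedule over a fixed number of periods. The class name, the linear schedule, and the numbers are illustrative assumptions; the actual EQUITYchain schedule and contract logic are not specified here.

```python
class VestingVault:
    """Fixed token supply released on a predefined schedule (no minting)."""

    def __init__(self, total_supply, periods):
        if total_supply <= 0 or periods <= 0:
            raise ValueError("total_supply and periods must be positive")
        self.total_supply = total_supply
        self.periods = periods
        self.released = 0

    def vested_at(self, period):
        """Cumulative tokens vested by the end of a period (linear schedule)."""
        period = max(0, min(period, self.periods))
        return self.total_supply * period // self.periods

    def release(self, period):
        """Release newly vested tokens; the supply itself never changes."""
        amount = max(0, self.vested_at(period) - self.released)
        self.released += amount
        return amount


vault = VestingVault(total_supply=360_000, periods=360)
first = vault.release(period=12)    # tokens vested over the first 12 periods
second = vault.release(period=12)   # nothing new has vested since
final = vault.release(period=400)   # period is capped at the schedule length
print(first, second, final, vault.released)
```

Because releases are computed from a fixed supply, the total released can never exceed the amount set at origination.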

Risk Structure

First-Loss Guarantor

Funded reserve, unfunded declining guarantee, hybrid, or government entity variants. NYDHCR pilot in development under state housing agency authority.

Valuation

Event-Triggered Oracle

Decentralized oracle network (DON) valuation is invoked only at defined economic events: exit, default, refinance, or transfer. Dispute window with automatic transaction pause.
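
A minimal sketch of the event-triggered pattern, assuming that a dispute raised inside the window pauses the transaction. The class, the event names as strings, the 10-day window, and the `fetch_valuation` callback are hypothetical; a real deployment would call an on-chain oracle rather than a Python function.

```python
TRIGGER_EVENTS = {"exit", "default", "refinance", "transfer"}


class ValuationGate:
    """Requests a valuation only on defined events and pauses on disputes."""

    def __init__(self, fetch_valuation, dispute_window_days=10):
        self.fetch_valuation = fetch_valuation  # stand-in for the oracle call
        self.dispute_window_days = dispute_window_days
        self.paused = False
        self.pending = None  # (value, day the valuation was issued)

    def on_event(self, event, day):
        """Return a valuation for trigger events, None for anything else."""
        if event not in TRIGGER_EVENTS:
            return None
        value = self.fetch_valuation()
        self.pending = (value, day)
        return value

    def dispute(self, day):
        """Pause the transaction if a dispute arrives inside the window."""
        if self.pending is None:
            return False
        _, issued = self.pending
        if day - issued <= self.dispute_window_days:
            self.paused = True
        return self.paused


gate = ValuationGate(fetch_valuation=lambda: 250_000)
ignored = gate.on_event("monthly_payment", day=1)   # not a trigger event
valued = gate.on_event("refinance", day=30)         # valuation requested
paused = gate.dispute(day=35)                       # inside the window
print(ignored, valued, paused)
```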

Oversight

Real-Time Regulator Visibility

Permissioned compliance dashboard for NYDFS and state regulators. Replaces lagged periodic reporting with continuous supervision.

Built on four years of SSA research

aequumAI was founded by the team that developed race-modeling methodology at the Social Security Administration, producing multiple peer-reviewed research papers on demographic estimation. That research foundation — combined with comprehensive New York State real estate, voter, and property records — became the basis for our commercial API.

Our published model (arXiv:2504.21259) shows the LSTM+Geo + XGBoost ensemble outperforming BISG on both reported metrics: accuracy and White false positive rate. The error simulation methodology is unpublished and patent-pending, and it extends the published work significantly.

Published LSTM+Geo Model

Neural network combining name-based race probabilities with geolocation. Outperforms BISG and BIFSG on accuracy and White false positive rate. Extended to 6 race categories including Native American.
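
For context, the BISG baseline that the model is compared against combines a surname-based prior with geography through Bayes' rule: P(race | surname, geo) is proportional to P(race | surname) times P(geo | race). The sketch below shows that update with made-up probabilities; it illustrates the baseline only and is not the LSTM+Geo architecture.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Naive Bayes update used by BISG, normalized over race categories."""
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs for a single surname and a single census block group.
surname_prior = {"White": 0.60, "Black": 0.30, "Hispanic": 0.10}
geo_likelihood = {"White": 0.001, "Black": 0.004, "Hispanic": 0.002}
posterior = bisg_posterior(surname_prior, geo_likelihood)
print({race: round(p, 3) for race, p in posterior.items()})
```

In this toy case the geography term moves the most likely category from White (0.60 in the surname prior) to Black (0.60 in the posterior).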

Spatially Correlated Error Simulation

Errors within a geography are modeled as correlated, not independent. Produces fat-tailed uncertainty distributions that correctly estimate the probability of systematic misclassification across a population segment.
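
The effect of correlation on aggregate uncertainty can be shown with a standard result: for n errors with common variance σ² and pairwise correlation ρ, the variance of their mean is σ²(1/n + ρ(n-1)/n), which does not shrink to zero as n grows when ρ > 0. The sketch below evaluates that formula; it is a generic illustration with assumed parameter values, not the unpublished aequumAI simulation.

```python
import math


def sd_of_mean(sigma, n, rho):
    """Standard deviation of the mean of n equally correlated errors."""
    variance = sigma**2 * (1.0 / n + rho * (n - 1) / n)
    return math.sqrt(variance)


sigma = 0.4   # assumed per-record error standard deviation
n = 1000      # assumed number of records in a tract
independent = sd_of_mean(sigma, n, rho=0.0)
correlated = sd_of_mean(sigma, n, rho=0.1)
print(f"independent: {independent:.4f}, correlated: {correlated:.4f}")
```

With these assumed values, the correlated case gives an aggregate error roughly ten times larger than the independent case, which is why treating errors as independent can understate the chance of systematic misclassification.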

Ground Truth Reliability Adjustment

In certain geographies, self-reported race is systematically corrupted by non-response bias — as documented in PPP Hispanic underreporting in AZ, NV, and UT. Our model adjusts for ground truth corruption rather than treating divergence as model error.

Enterprise Docker Deployment

Containerized API deployable on client infrastructure. No data egress. Annual license with methodology updates.

Home Ownership by Race in New York State

Using NY State and City tax records — name, address, and assessed value — we estimated racial homeownership composition across census tracts and compared predicted ownership percentages against Census-reported demographics.

White & Hispanic Ownership — Income Analysis
  • White individuals only: ~85.30% ownership. Overrepresented in most low-income tracts, near parity in higher-income areas.
  • With corporate included: ~77.60%. Overrepresentation decreases but remains.
  • Hispanic individuals only: ~5.29%. Underrepresented across the income spectrum.
  • With corporate included: ~4.81%. Underrepresentation deepens in lower-income areas.
White & African American Ownership — Racial Analysis
  • White individuals only: ~85.29% total. Generally overrepresented.
  • Including corporate: Drops to ~77.55%, still above parity in many tracts.
  • Black / African American individuals only: ~2.58% total. Mostly underrepresented.
  • With corporate: Further decreases to ~2.35%, deepening the gap.
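
The over- and underrepresentation comparison above can be expressed as a ratio of estimated ownership share to Census population share per tract, with 1.0 meaning parity. The function name and the tract figures below are made up for illustration.

```python
def representation_ratio(ownership_share, population_share):
    """Estimated ownership share divided by Census population share.

    Values above 1.0 indicate overrepresentation among owners,
    values below 1.0 indicate underrepresentation.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return ownership_share / population_share


# Made-up tract: (estimated owner share, Census population share).
tract = {"White": (0.80, 0.60), "Hispanic": (0.05, 0.20)}
ratios = {g: representation_ratio(o, p) for g, (o, p) in tract.items()}
print({g: round(r, 2) for g, r in ratios.items()})
```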
[Media: NY State case overview and tool demo (Hispanic); NY State case overview and tool demo (African American); White vs Black ownership charts 1 and 2]

The people behind the model

Founder

Terry Leitch

Former SSA race-modeling lead. Founded aequumAI to bring federal demographic estimation methodology to commercial compliance markets. HBS Foundry 2026 cohort. Consulting partner at Anthropic.

Chief Technology Officer

Andrei Pastor

Machine learning and neural network architecture. Leads integration of large datasets including NY real estate, voter, FEMA, and Census records. Co-author, arXiv:2504.21259.

Data Scientist

Sanjana Chalavadi

Machine learning and AI modeling. Co-author, arXiv:2504.21259. Specialist in demographic estimation methodology and model validation.

System Architect

Mark Frutig

Infrastructure architecture and system design. Responsible for Docker deployment pipeline, API architecture, and enterprise integration.

Ready to talk?

Whether you're a compliance team evaluating BISG alternatives, a health system building equity reporting, or a researcher — we'd like to hear from you.

General Inquiries

info@aequum.ai

Published Research

arXiv:2504.21259

LSTM+Geo with XGBoost Filtering

Location

East Aurora, NY
Buffalo Metro Area