AdminAnalytics/README.md
2026-03-30 20:42:08 -04:00

Admin Analytics

Benchmarks University of Delaware administrative costs using public data (IRS 990, IPEDS, BLS CPI-U). The tool ingests this data into a local DuckDB database and serves an interactive Dash dashboard for analysis.

Prerequisites

  • Python 3.11+
  • uv package manager
  • Playwright browsers (only needed for the scrape command)

Setup

# Clone and install
git clone <repo-url>
cd AdminAnalytics
uv sync

# Install Playwright browsers (optional, only for scraping)
uv run playwright install chromium

Ingesting Data

Load data from public sources into the local DuckDB database (data/admin_analytics.duckdb).

# Ingest everything (IPEDS + IRS 990 + CPI + scraper)
uv run admin-analytics ingest all

# Or ingest individual sources
uv run admin-analytics ingest ipeds --year-range 2005-2024
uv run admin-analytics ingest irs990 --year-range 2019-2025
uv run admin-analytics ingest cpi
uv run admin-analytics ingest scrape

Pass --force to any ingest command to re-download files that already exist locally.

Downloaded files are stored in data/raw/ (gitignored).
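
The --year-range values above use a START-END form. As a minimal sketch of how such a value could be expanded into individual years, the hypothetical helper below (parse_year_range is an illustrative name, not the project's actual parser) assumes the range is inclusive on both ends:

```python
def parse_year_range(value: str) -> list[int]:
    """Expand a 'START-END' string such as '2005-2024' into a list of years.

    Hypothetical helper: the real CLI may parse this option differently.
    The range is assumed to be inclusive on both ends.
    """
    start_text, sep, end_text = value.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {value!r}")
    # int() raises ValueError on empty or non-numeric parts.
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start year {start} is after end year {end}")
    return list(range(start, end + 1))


years = parse_year_range("2019-2025")
print(years[0], years[-1], len(years))  # 2019 2025 7
```

Rejecting a reversed range up front keeps a typo such as 2025-2019 from silently ingesting nothing.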

Launching the Dashboard

uv run admin-analytics dashboard

The dashboard opens at http://localhost:8050. Use --port to change the port, or --host 0.0.0.0 to listen on all network interfaces so other machines can reach it (e.g. over Tailscale).

Restart the dashboard after ingesting new data. It opens the DuckDB database in read-only mode to avoid lock conflicts, so a running instance does not pick up newly ingested rows.

The dashboard has four tabs:

  • Executive Compensation -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023)
  • Admin Cost Overview -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
  • Staffing & Enrollment -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
  • Current Headcount -- scraped UD staff directory data with overhead/non-overhead classification by unit
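
The "indexed" growth comparison on the Staffing & Enrollment tab can be read as rescaling each series so that a base year equals 100. The sketch below shows that arithmetic only. The function name and all numbers are illustrative assumptions, not UD data or the dashboard's actual query code:

```python
def index_to_base(series: dict[int, float], base_year: int) -> dict[int, float]:
    """Rescale a yearly series so that the base year equals 100."""
    base = series[base_year]
    if base == 0:
        raise ValueError("base-year value must be non-zero")
    return {year: value / base * 100 for year, value in sorted(series.items())}


# Illustrative numbers only, not UD data.
management = {2005: 200, 2015: 260, 2024: 320}
enrollment = {2005: 20000, 2015: 22000, 2024: 24000}

management_idx = index_to_base(management, base_year=2005)
enrollment_idx = index_to_base(enrollment, base_year=2005)

for year in management_idx:
    print(year, round(management_idx[year], 1), round(enrollment_idx[year], 1))
```

With these made-up inputs the management index ends at 160 and the enrollment index at 120. Indexing puts series of very different sizes on one chart so that this kind of gap is visible.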

Validating Data

Check row counts, NULL rates, year coverage, and cross-source consistency:

uv run admin-analytics validate
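
To make two of the checks named above concrete, here is a small sketch of a NULL-rate check and a year-coverage check over in-memory rows. It is not the code behind the validate command. The function names, the column names, and the sample rows are assumptions for illustration:

```python
def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows whose value for `column` is None (0.0 when there are no rows)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def missing_years(rows: list[dict], start: int, end: int) -> list[int]:
    """Years in the inclusive range [start, end] that have no row at all."""
    present = {row["year"] for row in rows}
    return [year for year in range(start, end + 1) if year not in present]


# Illustrative rows only; these column names are assumptions, not the real schema.
rows = [
    {"year": 2019, "total_expenses": 1000.0},
    {"year": 2020, "total_expenses": None},
    {"year": 2022, "total_expenses": 1200.0},
]

print("row count:", len(rows))
print("NULL rate:", round(null_rate(rows, "total_expenses"), 3))
print("missing years:", missing_years(rows, 2019, 2022))
```

A year with a NULL value (2020 here) and a year with no row at all (2021 here) are reported separately, since they usually point to different ingest problems.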

Running Tests

uv sync --group dev
uv run pytest

Project Structure

src/admin_analytics/
    cli.py              # CLI entry point (typer)
    config.py           # Constants (UD identifiers, URLs, paths)
    db/                 # DuckDB schema and connection
    ipeds/              # IPEDS download, parsing, loading
    irs990/             # IRS 990 XML download, parsing, title normalization
    bls/                # BLS CPI-U download and loading
    scraper/            # UD staff directory scraper and classifier
    dashboard/          # Dash app, queries, page layouts
    validation.py       # Data validation queries
data/raw/               # Downloaded files (gitignored)
docs/data_dictionary.md # Schema documentation
tests/                  # pytest test suite

Data Sources

| Source | What it provides | Years |
| --- | --- | --- |
| IPEDS | Institutional directory, expenses by function, staffing, enrollment | 2005-2024 |
| IRS 990 e-file | UD Foundation filings, executive compensation (Schedule J) | 2019-2025 index years (tax years 2017-2023) |
| BLS CPI-U | Consumer Price Index for inflation adjustment | Full history |
| UD staff directories | Admin office headcounts and overhead classification | Current snapshot |
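
The CPI-U series is listed as the basis for inflation adjustment. The sketch below shows the standard constant-dollar conversion (nominal amount times base-year CPI divided by current-year CPI) and a comparison of nominal, CPI, and real growth. The function name, the CPI values, and the pay figures are placeholders for illustration, not real BLS or IRS 990 data:

```python
def to_constant_dollars(nominal: float, cpi_year: float, cpi_base: float) -> float:
    """Convert a nominal amount into base-year dollars using a CPI ratio."""
    return nominal * cpi_base / cpi_year


# Placeholder values for illustration; a real run would use the ingested CPI-U series.
cpi = {2017: 250.0, 2023: 300.0}
pay = {2017: 500_000.0, 2023: 660_000.0}

nominal_growth = pay[2023] / pay[2017] - 1
cpi_growth = cpi[2023] / cpi[2017] - 1

# Express 2023 pay in 2017 dollars, then compare it with actual 2017 pay.
pay_2023_in_2017_dollars = to_constant_dollars(pay[2023], cpi[2023], cpi[2017])
real_growth = pay_2023_in_2017_dollars / pay[2017] - 1

print(f"nominal growth: {nominal_growth:.1%}")  # 32.0%
print(f"CPI growth:     {cpi_growth:.1%}")      # 20.0%
print(f"real growth:    {real_growth:.1%}")     # 10.0%
```

Real growth is not nominal growth minus CPI growth (which would give 12% here). It is the ratio of the two growth factors, 1.32 / 1.20, minus one.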