Admin Analytics
University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis.
Prerequisites
- Python 3.11+
- uv package manager
- Playwright browsers (only needed for the scrape command)
Setup
# Clone and install
git clone <repo-url>
cd AdminAnalytics
uv sync
# Install Playwright browsers (optional, only for scraping)
uv run playwright install chromium
Ingesting Data
Load data from public sources into the local DuckDB database (data/admin_analytics.duckdb).
# Ingest everything (IPEDS + IRS 990 + CPI + scraper)
uv run admin-analytics ingest all
# Or ingest individual sources
uv run admin-analytics ingest ipeds --year-range 2005-2024
uv run admin-analytics ingest irs990 --year-range 2019-2025
uv run admin-analytics ingest cpi
uv run admin-analytics ingest scrape
Use --force on any command to re-download files that already exist locally.
Downloaded files are stored in data/raw/ (gitignored).
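As a rough illustration of the --year-range and --force behavior described above, the sketch below expands a range string into individual years and decides which yearly files still need to be downloaded. The helper names (`parse_year_range`, `years_to_download`) and the `ipeds_<year>.zip` file naming are assumptions made for this example; they are not taken from the project's code.

```python
from pathlib import Path


def parse_year_range(text: str) -> list[int]:
    """Expand an inclusive 'START-END' string such as '2005-2024' into a list of years."""
    start_str, sep, end_str = text.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = int(start_str), int(end_str)
    if start > end:
        raise ValueError(f"start year {start} is after end year {end}")
    return list(range(start, end + 1))


def years_to_download(years: list[int], raw_dir: Path, force: bool = False) -> list[int]:
    """Return the years whose files must be fetched.

    Without force, years that already have a local file are skipped.
    The 'ipeds_<year>.zip' naming is hypothetical.
    """
    if force:
        return list(years)
    return [y for y in years if not (raw_dir / f"ipeds_{y}.zip").exists()]


years = parse_year_range("2019-2021")
print(years)  # [2019, 2020, 2021]
```

With this shape, passing `force=True` simply bypasses the existence check, which matches the documented effect of re-downloading files that already exist locally.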
Launching the Dashboard
uv run admin-analytics dashboard
Opens at http://localhost:8050. Use --port to change the port, or --host 0.0.0.0 for network access (e.g. over Tailscale).
Restart the dashboard to pick up newly ingested data: it opens the DuckDB file in read-only mode to avoid lock conflicts.
The dashboard has four tabs:
- Executive Compensation -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023)
- Admin Cost Overview -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
- Staffing & Enrollment -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
- Current Headcount -- scraped UD staff directory data with overhead/non-overhead classification by unit
Validating Data
Check row counts, NULL rates, year coverage, and cross-source consistency:
uv run admin-analytics validate
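To give a sense of what checks like row counts, NULL rates, and year coverage involve, here is a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in so it runs without extra dependencies; the project itself stores data in DuckDB, and the table name, column names, and `table_summary` helper are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in so the sketch runs with the standard library only;
# the project uses DuckDB. Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE expenses (year INTEGER, admin_expense REAL)")
con.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [(2019, 1.0), (2020, None), (2022, 3.0), (2022, 4.0)],  # placeholder rows
)


def table_summary(con, table, column, first_year, last_year):
    """Return row count, NULL rate for one column, and years missing from the range.

    Identifiers are interpolated directly, so they must come from trusted code.
    """
    row_count, null_count = con.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    present = {row[0] for row in con.execute(f"SELECT DISTINCT year FROM {table}")}
    missing = [y for y in range(first_year, last_year + 1) if y not in present]
    null_rate = (null_count or 0) / row_count if row_count else 0.0
    return {"rows": row_count, "null_rate": null_rate, "missing_years": missing}


summary = table_summary(con, "expenses", "admin_expense", 2019, 2022)
print(summary)  # {'rows': 4, 'null_rate': 0.25, 'missing_years': [2021]}
```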
Running Tests
uv sync --group dev
uv run pytest
Project Structure
src/admin_analytics/
    cli.py                # CLI entry point (typer)
    config.py             # Constants (UD identifiers, URLs, paths)
    db/                   # DuckDB schema and connection
    ipeds/                # IPEDS download, parsing, loading
    irs990/               # IRS 990 XML download, parsing, title normalization
    bls/                  # BLS CPI-U download and loading
    scraper/              # UD staff directory scraper and classifier
    dashboard/            # Dash app, queries, page layouts
    validation.py         # Data validation queries
data/raw/                 # Downloaded files (gitignored)
docs/data_dictionary.md   # Schema documentation
tests/                    # pytest test suite
Data Sources
| Source | What it provides | Years |
|---|---|---|
| IPEDS | Institutional directory, expenses by function, staffing, enrollment | 2005-2024 |
| IRS 990 e-file | UD Foundation filings, executive compensation (Schedule J) | 2019-2025 index years (tax years 2017-2023) |
| BLS CPI-U | Consumer Price Index for inflation adjustment | Full history |
| UD staff directories | Admin office headcounts and overhead classification | Current snapshot |
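
Inflation adjustment with CPI-U typically rescales a nominal dollar amount by the ratio of the index in a chosen base year to the index in the year the amount was paid. A minimal sketch, assuming annual index values; the function name and the index figures below are placeholders for illustration, not actual BLS data or the project's implementation.

```python
def to_real_dollars(nominal: float, cpi_year: float, cpi_base: float) -> float:
    """Convert a nominal amount into base-year dollars: nominal * cpi_base / cpi_year."""
    if cpi_year <= 0:
        raise ValueError("CPI value must be positive")
    return nominal * cpi_base / cpi_year


# Placeholder index values, NOT real CPI-U figures.
cpi = {2017: 200.0, 2023: 250.0}

# A 400,000 payment made in 2017, expressed in 2023 dollars.
real_2023 = to_real_dollars(400_000, cpi_year=cpi[2017], cpi_base=cpi[2023])
print(real_2023)  # 500000.0
```

Comparing compensation growth against CPI-U then amounts to checking whether the nominal series grew faster than this rescaling alone would imply.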