# Admin Analytics

University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis.

## Prerequisites

- Python 3.11+
- [uv](https://docs.astral.sh/uv/) package manager
- Playwright browsers (only needed for the `scrape` command)

## Setup

```bash
# Clone and install
git clone
cd AdminAnalytics
uv sync

# Install Playwright browsers (optional, only for scraping)
uv run playwright install chromium
```

## Ingesting Data

Load data from public sources into the local DuckDB database (`data/admin_analytics.duckdb`).

```bash
# Ingest everything (IPEDS + IRS 990 + CPI + scraper)
uv run admin-analytics ingest all

# Or ingest individual sources
uv run admin-analytics ingest ipeds --year-range 2005-2024
uv run admin-analytics ingest irs990 --year-range 2019-2025
uv run admin-analytics ingest cpi
uv run admin-analytics ingest scrape
```

Use `--force` on any command to re-download files that already exist locally. Downloaded files are stored in `data/raw/` (gitignored).

## Launching the Dashboard

```bash
uv run admin-analytics dashboard
```

Opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change the port, or `--host 0.0.0.0` for network access (e.g. over Tailscale). The dashboard must be restarted to pick up newly ingested data (DuckDB opens in read-only mode to avoid lock conflicts).

The dashboard has four tabs:

- **Executive Compensation** -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023)
- **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
- **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
- **Current Headcount** -- scraped UD staff directory data with overhead/non-overhead classification by unit

## Validating Data

Check row counts, NULL rates, year coverage, and cross-source consistency:

```bash
uv run admin-analytics validate
```

## Running Tests

```bash
uv sync --group dev
uv run pytest
```

## Project Structure

```
src/admin_analytics/
  cli.py                 # CLI entry point (typer)
  config.py              # Constants (UD identifiers, URLs, paths)
  db/                    # DuckDB schema and connection
  ipeds/                 # IPEDS download, parsing, loading
  irs990/                # IRS 990 XML download, parsing, title normalization
  bls/                   # BLS CPI-U download and loading
  scraper/               # UD staff directory scraper and classifier
  dashboard/             # Dash app, queries, page layouts
  validation.py          # Data validation queries
data/raw/                # Downloaded files (gitignored)
docs/data_dictionary.md  # Schema documentation
tests/                   # pytest test suite
```

## Data Sources

| Source | What it provides | Years |
|--------|-----------------|-------|
| [IPEDS](https://nces.ed.gov/ipeds/) | Institutional directory, expenses by function, staffing, enrollment | 2005-2024 |
| [IRS 990 e-file](https://www.irs.gov/charities-non-profits/form-990-series-downloads) | UD Foundation filings, executive compensation (Schedule J) | 2019-2025 index years (tax years 2017-2023) |
| [BLS CPI-U](https://www.bls.gov/cpi/) | Consumer Price Index for inflation adjustment | Full history |
| UD staff directories | Admin office headcounts and overhead classification | Current snapshot |
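
The inflation adjustment and the "growth vs CPI-U" comparison come down to simple index arithmetic. The sketch below shows one common way to do it; the function names are hypothetical and the CPI and compensation figures are round placeholder numbers, not BLS or IRS 990 data. The project's own queries may compute this differently.

```python
def to_real_dollars(nominal, cpi_in_year, cpi_in_base_year):
    """Express a nominal amount in base-year dollars using a CPI ratio."""
    return nominal * cpi_in_base_year / cpi_in_year


def index_to_base(series, base_year):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Placeholder inputs for illustration only (not real CPI-U or 990 values).
cpi = {2017: 200.0, 2023: 250.0}
compensation = {2017: 400_000.0, 2023: 600_000.0}

# 2023 compensation expressed in 2017 dollars.
real_2023 = to_real_dollars(compensation[2023], cpi[2023], cpi[2017])

# Both series indexed to 2017 = 100 so their growth can be compared directly.
comp_index = index_to_base(compensation, 2017)
cpi_index = index_to_base(cpi, 2017)
```

With these placeholder inputs, compensation grows 50% in nominal terms while prices rise 25%, so the 2023 figure is worth 480,000 in 2017 dollars. Indexing both series to the same base year lets them be drawn on one axis.
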