# Admin Analytics

University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis.

## Scope

This project is currently scoped to the **University of Delaware** as a single institution. It tracks:

- **Executive compensation** from IRS 990 Schedule J filings by the University of Delaware (EIN 516000297) and UD Research Foundation (EIN 516017306)
- **Administrative cost ratios** from IPEDS finance surveys (expenses by function, staffing levels, enrollment)
- **Endowment performance** and **philanthropic giving** from IPEDS F2 (FASB) financial data
- **Administrative headcount** via web scraping, currently focused on the **College of Engineering line management** (COE Central, department offices) and the Provost's Office

### Changing the target institution

The institution scope is controlled by constants in `src/admin_analytics/config.py`:

- `UD_UNITID = 130943` -- IPEDS institution identifier. Change this to target a different institution. Look up UNITIDs at the [IPEDS Data Center](https://nces.ed.gov/ipeds/use-the-data).
- `UD_EINS = [516000297, 516017306]` -- IRS Employer Identification Numbers for 990 filings. Update these to the EINs of the target institution's nonprofit entities.

All IPEDS loaders accept a `unitid_filter` parameter. The scraper URLs in `src/admin_analytics/scraper/directory.py` are UD-specific and would need to be updated for a different institution.

Multi-institution comparisons (AAU peers, Carnegie peers) are planned for a future phase.

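As a rough illustration of what filtering by `unitid_filter` means, the sketch below keeps only the rows of an IPEDS-style CSV whose `UNITID` column matches the configured identifier. The helper name `filter_rows_by_unitid` and the sample data are hypothetical and are not taken from this project's loaders, which may be structured differently.

```python
import csv
import io

UD_UNITID = 130943  # value documented for config.py


def filter_rows_by_unitid(csv_text, unitid_filter=None):
    """Hypothetical helper: keep rows whose UNITID equals unitid_filter.

    Passing None keeps every row. This is an illustration only, not the
    project's actual loader code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        if unitid_filter is None or int(row["UNITID"]) == unitid_filter:
            rows.append(row)
    return rows


# Made-up two-row sample shaped like an IPEDS directory extract.
SAMPLE_CSV = (
    "UNITID,INSTNM\n"
    "130943,University of Delaware\n"
    "999999,Example College\n"
)

ud_rows = filter_rows_by_unitid(SAMPLE_CSV, unitid_filter=UD_UNITID)
print([row["INSTNM"] for row in ud_rows])
```

Targeting another institution would then amount to passing a different UNITID, assuming the real loaders filter on the same column.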
## Prerequisites

- Python 3.11+
- [uv](https://docs.astral.sh/uv/) package manager
- Playwright browsers (only needed for the `scrape` command)

## Setup

```bash
# Clone and install
git clone <repo-url>
cd AdminAnalytics
uv sync

# Install Playwright browsers (optional, only for scraping)
uv run playwright install chromium
```

## Ingesting Data

Load data from public sources into the local DuckDB database (`data/admin_analytics.duckdb`).

```bash
# Ingest everything (IPEDS + IRS 990 + CPI + scraper)
uv run admin-analytics ingest all

# Or ingest individual sources
uv run admin-analytics ingest ipeds --year-range 2005-2024
uv run admin-analytics ingest irs990 --year-range 2019-2025
uv run admin-analytics ingest cpi
uv run admin-analytics ingest scrape
```

Use `--force` on any command to re-download files that already exist locally.

Downloaded files are stored in `data/raw/` (gitignored).

## Launching the Dashboard

```bash
uv run admin-analytics dashboard
```

The dashboard opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change the port, or `--host 0.0.0.0` for network access (e.g. over Tailscale).

Restart the dashboard to pick up newly ingested data (it opens DuckDB in read-only mode to avoid lock conflicts).

The dashboard has seven tabs:

- **Executive Compensation** -- top earners from IRS 990 Schedule J, President and top-10 CAGR, trends by role, compensation breakdown by component, growth vs CPI-U (2015-2023)
- **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
- **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
- **Endowment** -- endowment value trends, CAGR, investment return rate, CIO compensation vs endowment growth (IPEDS F2)
- **Philanthropy** -- total private gifts and grants, gift allocation, President and VP Development compensation growth vs fundraising (IPEDS F2 and IRS 990)
- **Current Headcount** -- scraped UD staff directory data with overhead/non-overhead classification by unit
- **About** -- data sources, methodology, and limitations

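Several tabs report CAGR, CPI-adjusted growth, and indexed growth. The sketch below shows the standard formulas for these three metrics. It is not the dashboard's own code: the function names and all figures are made up for illustration, and the exact methodology the dashboard uses is described in its About tab.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def to_real(nominal, cpi_in_year, cpi_base):
    """Express a nominal amount in base-year dollars using a CPI ratio."""
    return nominal * cpi_base / cpi_in_year


def index_series(values, base=100.0):
    """Rescale a series so its first element equals `base`."""
    first = values[0]
    return [base * v / first for v in values]


# Illustrative numbers only, not real UD or BLS data.
nominal_growth = cagr(100_000, 121_000, years=2)
real_end = to_real(121_000, cpi_in_year=110.0, cpi_base=100.0)
real_growth = cagr(100_000, real_end, years=2)
indexed = index_series([50, 75, 100])

print(f"nominal CAGR: {nominal_growth:.4f}")
print(f"real CAGR:    {real_growth:.4f}")
print(f"indexed:      {indexed}")
```

Comparing the nominal and real rates side by side separates growth that merely tracks inflation from growth that exceeds it.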
## Validating Data

Check row counts, NULL rates, year coverage, and cross-source consistency:

```bash
uv run admin-analytics validate
```

## Running Tests

```bash
uv sync --group dev
uv run pytest
```

## Project Structure

```
src/admin_analytics/
  cli.py                 # CLI entry point (typer)
  config.py              # Constants (UD identifiers, URLs, paths)
  db/                    # DuckDB schema and connection
  ipeds/                 # IPEDS download, parsing, loading
  irs990/                # IRS 990 XML download, parsing, title normalization
  bls/                   # BLS CPI-U download and loading
  scraper/               # UD staff directory scraper and classifier
  dashboard/             # Dash app, queries, page layouts
  validation.py          # Data validation queries
data/raw/                # Downloaded files (gitignored)
docs/data_dictionary.md  # Schema documentation
tests/                   # pytest test suite
```

## Data Sources

| Source | What it provides | Years |
|--------|------------------|-------|
| [IPEDS](https://nces.ed.gov/ipeds/) | Institutional directory, expenses by function, staffing, enrollment | 2005-2024 |
| [IRS 990 e-file](https://www.irs.gov/charities-non-profits/form-990-series-downloads) | UD Foundation filings, executive compensation (Schedule J) | 2019-2025 index years (tax years 2017-2023) |
| [BLS CPI-U](https://www.bls.gov/cpi/) | Consumer Price Index for inflation adjustment | Full history |
| UD staff directories | Admin office headcounts and overhead classification | Current snapshot |