Phase 1 project prototype
parent 29215e2bd2
commit 2c9ae1c312
29 changed files with 2967 additions and 22 deletions
README.md (normal file, 97 lines added)
@@ -0,0 +1,97 @@
# Admin Analytics

Benchmarks University of Delaware administrative costs using public data (IRS 990, IPEDS, BLS CPI-U). The tool ingests the data into a local DuckDB database and serves an interactive Dash dashboard for analysis.

## Prerequisites

- Python 3.11+
- [uv](https://docs.astral.sh/uv/) package manager
- Playwright browsers (only needed for the `scrape` command)

## Setup

```bash
# Clone and install
git clone <repo-url>
cd AdminAnalytics
uv sync

# Install Playwright browsers (optional, only for scraping)
uv run playwright install chromium
```

## Ingesting Data

Load data from public sources into the local DuckDB database (`data/admin_analytics.duckdb`).

```bash
# Ingest everything (IPEDS + IRS 990 + CPI + scraper)
uv run admin-analytics ingest all

# Or ingest individual sources
uv run admin-analytics ingest ipeds --year-range 2005-2024
uv run admin-analytics ingest irs990 --year-range 2019-2024
uv run admin-analytics ingest cpi
uv run admin-analytics ingest scrape
```
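
The `--year-range` values above use a `START-END` form. As a rough illustration of how such a value could be interpreted, here is a minimal sketch that assumes both endpoints are inclusive. `parse_year_range` is a hypothetical helper written for this example, not the project's actual parser.

```python
def parse_year_range(value: str) -> list[int]:
    """Parse 'START-END' into a list of years (inclusive ends are an assumption)."""
    start_text, sep, end_text = value.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {value!r}")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start year {start} is after end year {end}")
    return list(range(start, end + 1))


years = parse_year_range("2019-2024")
print(years)  # [2019, 2020, 2021, 2022, 2023, 2024]
```

Under the inclusive reading, `2005-2024` covers 20 years of IPEDS data.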

Use `--force` on any command to re-download files that already exist locally.

Downloaded files are stored in `data/raw/` (gitignored).
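
The `--force` behaviour described above amounts to a skip-if-present check before each download. The sketch below shows that idea under the assumption that a file's presence alone decides whether it is reused. `needs_download` and the file name are hypothetical and not taken from the project's code.

```python
from pathlib import Path
import tempfile


def needs_download(target: Path, force: bool = False) -> bool:
    """Return True when the file should be fetched (assumed --force semantics)."""
    return force or not target.exists()


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "example.zip"  # hypothetical file name
    print(needs_download(cached))              # True: nothing cached yet
    cached.write_bytes(b"placeholder")
    print(needs_download(cached))              # False: reuse the local copy
    print(needs_download(cached, force=True))  # True: force overrides the cache
```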

## Launching the Dashboard

```bash
uv run admin-analytics dashboard
```

Opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change the port.

The dashboard has four tabs:

- **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
- **Executive Compensation** -- top earners from IRS 990 Schedule J, compensation trends by role, growth vs CPI-U (2017-2023)
- **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management growth vs enrollment growth
- **Current Headcount** -- scraped UD staff directory data with overhead classification
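
Comparing compensation growth with CPI-U, as the Executive Compensation tab does, comes down to deflating nominal dollars by the price index. The sketch below shows the arithmetic only. The function name and every figure are made up for illustration and are not UD or BLS data, and the dashboard's own calculation may differ.

```python
def to_base_dollars(nominal: float, cpi_then: float, cpi_base: float) -> float:
    """Convert a nominal amount into base-period dollars using a price index."""
    return nominal * cpi_base / cpi_then


# Illustrative numbers only (not real CPI-U or compensation values).
cpi = {2017: 100.0, 2023: 125.0}
pay = {2017: 400_000.0, 2023: 550_000.0}

nominal_growth = pay[2023] / pay[2017] - 1  # 0.375
cpi_growth = cpi[2023] / cpi[2017] - 1      # 0.25

# Express 2017 pay in 2023 dollars, then compare like with like.
pay_2017_in_2023_dollars = to_base_dollars(pay[2017], cpi[2017], cpi[2023])
real_growth = pay[2023] / pay_2017_in_2023_dollars - 1  # about 0.10

print(f"nominal {nominal_growth:.1%}, CPI {cpi_growth:.1%}, real {real_growth:.1%}")
```

In this made-up case pay rose 37.5% nominally while prices rose 25%, leaving roughly 10% real growth.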

## Validating Data

Check row counts, NULL rates, year coverage, and cross-source consistency:

```bash
uv run admin-analytics validate
```
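
To make the kinds of checks listed above concrete, here is a minimal sketch of row-count, NULL-rate, and year-coverage queries. It is not the `validate` command's implementation. The table and column names are hypothetical, and the demo uses the standard-library `sqlite3` module as a stand-in so it runs without extra dependencies. A DuckDB connection exposes the same `execute(...).fetchone()` and `fetchall()` calls.

```python
import sqlite3  # stand-in for DuckDB so the sketch runs with the stdlib only


def table_checks(con, table: str, column: str, year_column: str = "year") -> dict:
    """Row count, NULL rate for one column, and the distinct years present."""
    # Identifiers are interpolated directly, so pass trusted names only.
    total, non_null = con.execute(
        f"SELECT COUNT(*), COUNT({column}) FROM {table}"
    ).fetchone()
    years = [
        row[0]
        for row in con.execute(
            f"SELECT DISTINCT {year_column} FROM {table} ORDER BY {year_column}"
        ).fetchall()
    ]
    null_rate = 0.0 if total == 0 else 1 - non_null / total
    return {"rows": total, "null_rate": null_rate, "years": years}


# Hypothetical table and columns, purely for demonstration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE expenses (year INTEGER, admin_cost REAL)")
con.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [(2021, 10.0), (2022, None), (2023, 12.5), (2023, 13.0)],
)
print(table_checks(con, "expenses", "admin_cost"))
# {'rows': 4, 'null_rate': 0.25, 'years': [2021, 2022, 2023]}
```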

## Running Tests

```bash
uv sync --group dev
uv run pytest
```

## Project Structure

```
src/admin_analytics/
  cli.py                 # CLI entry point (typer)
  config.py              # Constants (UD identifiers, URLs, paths)
  db/                    # DuckDB schema and connection
  ipeds/                 # IPEDS download, parsing, loading
  irs990/                # IRS 990 XML download, parsing, title normalization
  bls/                   # BLS CPI-U download and loading
  scraper/               # UD staff directory scraper and classifier
  dashboard/             # Dash app, queries, page layouts
  validation.py          # Data validation queries
data/raw/                # Downloaded files (gitignored)
docs/data_dictionary.md  # Schema documentation
tests/                   # pytest test suite
```

## Data Sources

| Source | What it provides | Years |
|--------|------------------|-------|
| [IPEDS](https://nces.ed.gov/ipeds/) | Institutional directory, expenses by function, staffing, enrollment | 2005-2024 |
| [IRS 990 e-file](https://www.irs.gov/charities-non-profits/form-990-series-downloads) | UD Foundation filings, executive compensation (Schedule J) | 2017-2023 |
| [BLS CPI-U](https://www.bls.gov/cpi/) | Consumer Price Index for inflation adjustment | Full history |
| UD staff directories | Admin office headcounts and overhead classification | Current snapshot |