Initial project planning docs for UD administrative analytics

- Project scope document (v0.1): objectives, data sources, key metrics, phases
- Phase 1 implementation plan: IPEDS, IRS 990, BLS CPI-U acquisition for UD
- CLAUDE.md: project context and conventions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eric 2026-03-29 18:28:30 -04:00
commit f037c50736
3 changed files with 390 additions and 0 deletions

@ -0,0 +1,118 @@
# Administrative Analytics — Project Scope
**Version:** 0.1 | **Status:** Draft | **Date:** March 2026
---
## Problem statement
University administrative costs have grown significantly over the past two decades, yet institutions lack accessible tools to benchmark their administrative spending against peer institutions or to correlate those costs with performance outcomes. This project aims to close that gap using publicly available data.
---
## Objectives
Build a data pipeline and analytics dashboard that aggregates public data on university administrative costs, benchmarks institutions against peers, and surfaces correlations between administrative spend and key performance indicators such as fundraising revenue.
**First iteration scope:** Data acquisition and analysis will focus exclusively on the **University of Delaware**. Peer institution comparisons (AAU members, Carnegie peers, etc.) will be added in a later iteration.
---
## Data sources
### Primary
- **IRS Form 990** — private universities and non-profits
- **IPEDS** (Integrated Postsecondary Education Data System) — all institutions, including public universities
- **NACUBO endowment study reports**
### Secondary
- **BLS CPI-U data** — Consumer Price Index for All Urban Consumers, for inflation-adjusted compensation analysis
- University philanthropy and fundraising reports (public fact books)
- Chronicle of Higher Education data
- Public institutional fact books
### Stretch
- **Institutional administrative office web pages** — scrape Provost, President, VP unit, and college administration pages for staff directories / headcount tracking
---
## Key metrics
### Cost metrics
- Admin cost per student
- Admin-to-faculty ratio
- Administrative spending as % of total expenses
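Each of the three cost metrics is a simple ratio. The sketch below is illustrative only: the field names (`admin_expenses`, `total_expenses`, `fte_students`, `admin_staff_fte`, `faculty_fte`) are hypothetical and are not IPEDS variable names, and which IPEDS expense functions count as "administrative" still has to be defined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstitutionYear:
    """Inputs for one institution-year. All field names are hypothetical."""
    admin_expenses: float    # dollars classified as administrative
    total_expenses: float    # dollars across all expense functions
    fte_students: float      # full-time-equivalent enrollment
    admin_staff_fte: float   # full-time-equivalent administrative staff
    faculty_fte: float       # full-time-equivalent faculty


def _ratio(numerator: float, denominator: float) -> float | None:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def cost_metrics(row: InstitutionYear) -> dict[str, float | None]:
    """Compute the three cost metrics for a single institution-year."""
    share = _ratio(row.admin_expenses, row.total_expenses)
    return {
        "admin_cost_per_student": _ratio(row.admin_expenses, row.fte_students),
        "admin_to_faculty_ratio": _ratio(row.admin_staff_fte, row.faculty_fte),
        "admin_pct_of_total": None if share is None else share * 100.0,
    }


# Made-up figures, for illustration only.
example = InstitutionYear(
    admin_expenses=50_000_000,
    total_expenses=1_000_000_000,
    fte_students=20_000,
    admin_staff_fte=500,
    faculty_fte=1_000,
)
metrics = cost_metrics(example)
```

Returning `None` for a zero denominator keeps a missing headcount or enrollment figure from raising an error or silently producing a misleading zero.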
### Compensation metrics
- Key employee salaries from IRS 990 Schedule J (President, Provost, VPs, Deans, etc.)
- Year-over-year compensation growth per position
- Compensation growth vs. CPI-U (Bureau of Labor Statistics)
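One way to compare compensation growth with CPI-U is to compute both year-over-year changes and then deflate the nominal change. The sketch below assumes both series are already keyed by the same year; how a 990 fiscal year is aligned to a CPI-U period is an open choice. The function names are hypothetical, and the sample values are made up rather than actual BLS or 990 figures.

```python
from __future__ import annotations


def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Fractional year-over-year change for each year whose prior year is present."""
    return {
        year: series[year] / series[year - 1] - 1.0
        for year in sorted(series)
        if year - 1 in series and series[year - 1] != 0
    }


def real_growth(nominal: dict[int, float], cpi: dict[int, float]) -> dict[int, float]:
    """Inflation-adjusted growth: (1 + nominal change) / (1 + CPI change) - 1."""
    comp = yoy_growth(nominal)
    infl = yoy_growth(cpi)
    return {
        year: (1.0 + comp[year]) / (1.0 + infl[year]) - 1.0
        for year in comp
        if year in infl
    }


# Made-up values: one position's total compensation and an index series.
compensation = {2020: 500_000.0, 2021: 550_000.0}
cpi_index = {2020: 100.0, 2021: 105.0}

nominal_change = yoy_growth(compensation)          # nominal growth per year
real_change = real_growth(compensation, cpi_index)  # growth net of inflation
```

Dividing the growth factors, rather than subtracting the two percentages, stays accurate when inflation is high.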
### Performance metrics
- Philanthropic revenue raised
- Endowment growth year-over-year
- Grant funding secured
### Benchmarking (later iteration)
- Peer institution comparisons by Carnegie classification, size, and public/private status
- AAU institution comparisons
### Trends
- Year-over-year cost and performance trajectories (5-10 year views)
---
## Phases
### Phase 1 — Data acquisition
Build parsers for IRS 990 filings (including Schedule J key employee compensation) and IPEDS data for the **University of Delaware** only. Ingest BLS CPI-U series for inflation benchmarking. Establish a raw data store. **Stretch:** prototype scraper for UD administrative office web pages to track headcount.
### Phase 2 — Data pipeline & normalization
Clean, normalize, and reconcile data across sources. Define a unified schema. Build institution-matching logic to link records across datasets using IPEDS Unit ID as the canonical identifier.
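A minimal sketch of the matching step, assuming IRS 990 records are keyed by EIN and mapped to IPEDS Unit IDs through a manually maintained crosswalk. The crosswalk, the row shapes, and the identifier values are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical crosswalk from IRS Employer Identification Number to IPEDS Unit ID.
# Both identifiers are placeholders, not real values.
EIN_TO_UNITID = {"00-0000001": "999999"}


def link_records(ipeds_rows, irs_rows, crosswalk):
    """Group rows from both sources under their IPEDS Unit ID.

    Returns (linked, unmatched). `linked` maps a Unit ID to
    {"ipeds": [...], "irs990": [...]}; `unmatched` holds IRS rows whose
    EIN is missing from the crosswalk, so they can be reviewed by hand.
    """
    linked = defaultdict(lambda: {"ipeds": [], "irs990": []})
    unmatched = []
    for row in ipeds_rows:
        # Normalize to a string so integer and text Unit IDs compare equal.
        linked[str(row["unitid"])]["ipeds"].append(row)
    for row in irs_rows:
        unit_id = crosswalk.get(row["ein"])
        if unit_id is None:
            unmatched.append(row)
        else:
            linked[unit_id]["irs990"].append(row)
    return dict(linked), unmatched


ipeds = [{"unitid": 999999, "year": 2022}]
irs = [
    {"ein": "00-0000001", "tax_year": 2022},
    {"ein": "00-0000002", "tax_year": 2022},  # not in the crosswalk
]
linked, unmatched = link_records(ipeds, irs, EIN_TO_UNITID)
```

Matching on identifiers rather than institution names sidesteps the name-variation problem, and the unmatched rows become an explicit work queue instead of silent drops.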
### Phase 3 — Internal analytics dashboard
Build an internal tool for our institution to explore cost and performance data. Validate findings with stakeholders.
### Phase 4 — Multi-institution expansion
Extend data acquisition to peer institutions (AAU members, Carnegie peers, etc.). Add benchmarking comparisons, configurable peer groups, and export features.
---
## Technical approach
### Data collection
- Python scraping with Scrapy / BeautifulSoup for supplementary sources
- IRS 990 XML bulk data parser (IRS provides annual bulk downloads)
- IPEDS bulk data files (CSV downloads, no scraping required)
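As a rough illustration of the 990 XML parser, the sketch below pulls Schedule J compensation rows out of a filing using only the standard library. The namespace and element names (`IRS990ScheduleJ`, `RltdOrgOfficerTrstKeyEmplGrp`, `PersonNm`, `TitleTxt`, `TotalCompensationFilingOrgAmt`) are assumptions that should be checked against the IRS e-file schema for each tax year, since names can differ between schema versions. The sample document is fabricated.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; verify against the IRS e-file schema before relying on it.
NS = {"irs": "http://www.irs.gov/efile"}

# Fabricated sample filing; the element names are assumptions (see above).
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990ScheduleJ>
      <RltdOrgOfficerTrstKeyEmplGrp>
        <PersonNm>Jane Example</PersonNm>
        <TitleTxt>President</TitleTxt>
        <TotalCompensationFilingOrgAmt>750000</TotalCompensationFilingOrgAmt>
      </RltdOrgOfficerTrstKeyEmplGrp>
      <RltdOrgOfficerTrstKeyEmplGrp>
        <PersonNm>John Sample</PersonNm>
        <TitleTxt>Provost</TitleTxt>
      </RltdOrgOfficerTrstKeyEmplGrp>
    </IRS990ScheduleJ>
  </ReturnData>
</Return>"""


def parse_schedule_j(xml_text):
    """Return one dict per Schedule J person group; missing fields become None."""
    root = ET.fromstring(xml_text)
    path = ".//irs:IRS990ScheduleJ/irs:RltdOrgOfficerTrstKeyEmplGrp"
    rows = []
    for group in root.iterfind(path, NS):
        amount = group.findtext("irs:TotalCompensationFilingOrgAmt", namespaces=NS)
        rows.append({
            "name": group.findtext("irs:PersonNm", namespaces=NS),
            "title": group.findtext("irs:TitleTxt", namespaces=NS),
            "total_compensation": int(amount) if amount is not None else None,
        })
    return rows


schedule_j_rows = parse_schedule_j(SAMPLE_XML)
```

Keeping absent amounts as `None` rather than zero keeps gaps in a filing distinguishable from genuinely unpaid positions.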
### Data storage & transformation
- PostgreSQL or DuckDB as primary data store
- dbt for data transformation and modeling
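A possible shape for the raw data store is a single long-format table keyed by source, institution, year, and metric, so that re-running a load overwrites rows instead of duplicating them. The sketch uses Python's built-in `sqlite3` purely as a stand-in so it runs without extra dependencies; it is not the PostgreSQL or DuckDB store named above, and the table and column names are hypothetical. The SQL may need adjusting for the engine finally chosen.

```python
import sqlite3

# Hypothetical long-format table: one row per source, institution, year, and metric.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_observation (
    source   TEXT    NOT NULL,
    unit_id  TEXT    NOT NULL,
    year     INTEGER NOT NULL,
    metric   TEXT    NOT NULL,
    value    REAL,
    PRIMARY KEY (source, unit_id, year, metric)
)
"""

# Upsert so that reloading the same file is idempotent.
UPSERT = """
INSERT INTO raw_observation (source, unit_id, year, metric, value)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (source, unit_id, year, metric) DO UPDATE SET value = excluded.value
"""


def load(conn, rows):
    """Create the table if needed, then insert or overwrite the given rows."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
# Placeholder Unit ID and made-up values.
load(conn, [("ipeds", "999999", 2022, "total_expenses", 1.0e9)])
load(conn, [("ipeds", "999999", 2022, "total_expenses", 1.1e9)])  # reload overwrites
```

A long format keeps the raw layer independent of any one source's column layout, leaving reshaping into analysis-ready models to the dbt transformation step.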
### Frontend & API
- React with a charting library (e.g., Recharts or Observable Plot)
- REST or GraphQL API layer
---
## Risks & considerations
| Severity | Risk | Notes |
|----------|------|-------|
| Medium | Data completeness | Most public universities do not file 990s. IPEDS is the primary fallback and provides expense breakdowns by function. |
| Medium | Institution matching | Names vary across datasets. Use IPEDS Unit ID as the canonical identifier from the start. |
| Low | Rate limiting | IRS and IPEDS data are available as bulk downloads; scraping is mostly not required for core datasets. |
| Low | Data licensing | The core sources (IRS 990, IPEDS, BLS) are open government data. Terms of use for NACUBO and Chronicle of Higher Education data should be confirmed before ingestion. |
---
## Out of scope (v1)
- Internal financial systems integration
- Real-time data feeds
- Accreditation or ranking data
- Faculty / HR compensation analysis (key executive compensation from 990s *is* in scope)
---
*Generated by Claude · Administrative Analytics Project*