Administrative Analytics — Project Scope
Version: 0.1 | Status: Draft | Date: March 2026
Problem statement
University administrative costs have grown significantly over the past two decades, yet institutions lack easy tools to benchmark their administrative spending against peer institutions or correlate those costs with performance outcomes. This project aims to close that gap using publicly available data.
Objectives
Build a data pipeline and analytics dashboard that aggregates public data on university administrative costs, benchmarks institutions against peers, and surfaces correlations between administrative spend and key performance indicators such as fundraising revenue.
First iteration scope: Data acquisition and analysis will focus exclusively on the University of Delaware. Peer institution comparisons (AAU members, Carnegie peers, etc.) will be added in a later iteration.
Data sources
Primary
- IRS Form 990 — private universities and non-profits
- IPEDS (Integrated Postsecondary Education Data System) — all institutions, including public universities
- NACUBO endowment study reports
Secondary
- BLS CPI-U data — Consumer Price Index for All Urban Consumers, for inflation-adjusted compensation analysis
- University philanthropy and fundraising reports (public fact books)
- Chronicle of Higher Education data
- Public institutional fact books
Stretch
- Institutional administrative office web pages — scrape Provost, President, VP unit, and college administration pages for staff directories / headcount tracking
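If the stretch scraper is built, it should honor each site's robots.txt and pace its requests. This minimal sketch uses Python's standard urllib.robotparser. The crawler name and the 2-second default delay are illustrative assumptions. Retrieving robots.txt and the page-fetching function (Scrapy, BeautifulSoup plus an HTTP client, etc.) are left to the caller.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Iterator

USER_AGENT = "admin-analytics-bot/0.1"  # hypothetical crawler name


def allowed_urls(urls: Iterable[str], robots_txt: str,
                 user_agent: str = USER_AGENT) -> Iterator[str]:
    """Yield only the URLs that the supplied robots.txt text permits."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for url in urls:
        if parser.can_fetch(user_agent, url):
            yield url


def polite_fetch(urls: Iterable[str], fetch: Callable[[str], object],
                 delay_seconds: float = 2.0) -> dict[str, object]:
    """Call fetch(url) for each URL, pausing between requests to limit load."""
    results: dict[str, object] = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

If the scraper is written with Scrapy instead, its ROBOTSTXT_OBEY and DOWNLOAD_DELAY settings provide the same protections.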
Key metrics
Cost metrics
- Admin cost per student
- Admin-to-faculty ratio
- Administrative spending as % of total expenses
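Each of the three cost metrics is a simple ratio. The sketch below treats the IPEDS "institutional support" expense function as a proxy for administrative cost and assumes all headcounts are FTE. The proxy, the FTE definitions, and the field names are assumptions for illustration, not settled project definitions.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """One institution-year of inputs; field names are hypothetical."""
    admin_expenses: float   # assumed proxy: IPEDS institutional support expenses
    total_expenses: float   # total expenses for the same fiscal year
    student_fte: float      # full-time-equivalent enrollment
    admin_fte: float        # administrative staff FTE (definition to be settled)
    faculty_fte: float      # instructional faculty FTE


def cost_metrics(x: CostInputs) -> dict[str, float]:
    """Compute admin cost per student, admin-to-faculty ratio, and admin share of expenses."""
    if min(x.total_expenses, x.student_fte, x.faculty_fte) <= 0:
        raise ValueError("denominators must be positive")
    return {
        "admin_cost_per_student": x.admin_expenses / x.student_fte,
        "admin_to_faculty_ratio": x.admin_fte / x.faculty_fte,
        "admin_pct_of_total_expenses": 100 * x.admin_expenses / x.total_expenses,
    }
```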
Compensation metrics
- Key employee compensation from IRS 990 Schedule J (President, Provost, VPs, Deans, etc., where reported), including base, bonus, and other components
- Year-over-year compensation growth per position
- Compensation growth vs. CPI-U (Bureau of Labor Statistics)
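To compare year-over-year compensation growth with CPI-U, the sketch below applies the standard deflation identity: real growth = (1 + nominal) / (1 + inflation) - 1. It assumes two inputs. The first maps year to total compensation for one position. The second maps year to the CPI-U annual average. Aligning 990 fiscal years with calendar-year CPI-U is left as a project decision.

```python
def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Fractional change from the prior year, for each year that has a prior-year value."""
    return {year: value / series[year - 1] - 1
            for year, value in sorted(series.items())
            if year - 1 in series and series[year - 1] != 0}


def growth_vs_cpi(comp: dict[int, float],
                  cpi: dict[int, float]) -> dict[int, dict[str, float]]:
    """Nominal compensation growth, CPI-U inflation, and real growth, year by year."""
    nominal = yoy_growth(comp)
    inflation = yoy_growth(cpi)
    out = {}
    for year in sorted(nominal.keys() & inflation.keys()):
        n, i = nominal[year], inflation[year]
        out[year] = {"nominal": n, "inflation": i, "real": (1 + n) / (1 + i) - 1}
    return out
```

A positive "real" value means compensation grew faster than CPI-U that year. A simple difference (nominal minus inflation) is a common approximation, but it drifts from the exact identity as rates get larger.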
Performance metrics
- Philanthropic revenue raised
- Endowment growth year-over-year
- Grant funding secured
Benchmarking (later iteration)
- Peer institution comparisons by Carnegie classification, size, and public/private status
- AAU institution comparisons
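For the later benchmarking iteration, a simple comparison is where one institution falls within a peer group on a single metric. The sketch below assumes a peer group is just a mapping from IPEDS Unit ID to a metric value. How the group itself is chosen (Carnegie class, AAU membership, size) is outside what the sketch covers. The Unit IDs in the test are placeholders.

```python
from statistics import median


def peer_benchmark(target_id: int, values: dict[int, float]) -> dict[str, float]:
    """Place one institution within a peer group on a single metric.

    values maps IPEDS Unit ID to the metric and must include the target.
    percentile is the share of other peers with a strictly lower value.
    """
    target = values[target_id]
    peers = [v for uid, v in values.items() if uid != target_id]
    if not peers:
        raise ValueError("peer group has no other institutions")
    peer_median = median(peers)
    return {
        "percentile": 100 * sum(v < target for v in peers) / len(peers),
        "ratio_to_peer_median": target / peer_median if peer_median else float("nan"),
    }
```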
Trends
- Year-over-year cost and performance trajectories (5–10 year views)
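For 5–10 year views, compound annual growth rate (CAGR) summarizes a trajectory as one rate: CAGR = (end / start)^(1 / years) - 1. The sketch below is minimal and assumes a year-indexed series with positive values at both endpoints.

```python
def cagr(series: dict[int, float], start_year: int, end_year: int) -> float:
    """Compound annual growth rate of a series between two years."""
    if end_year <= start_year:
        raise ValueError("end_year must be after start_year")
    start, end = series[start_year], series[end_year]
    if start <= 0 or end <= 0:
        raise ValueError("CAGR requires positive endpoint values")
    return (end / start) ** (1 / (end_year - start_year)) - 1
```

CAGR hides year-to-year volatility, so it works best as a companion to the year-over-year series rather than a replacement.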
Phases
Phase 1 — Data acquisition
Build parsers for IRS 990 filings (including Schedule J key employee compensation) and IPEDS data for the University of Delaware only. Ingest BLS CPI-U series for inflation benchmarking. Establish a raw data store. Stretch: prototype scraper for UD administrative office web pages to track headcount.
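The following is a minimal sketch of the Schedule J part of the 990 parser, using only the standard library. The namespace http://www.irs.gov/efile and the element names (RltdOrgOfficerTrstKeyEmplGrp, PersonNm, TitleTxt, BaseCompensationFilingOrgAmt, TotalCompensationFilingOrgAmt) reflect our understanding of recent IRS e-file schemas. They are assumptions to check against the schema version of each filing, because element names have changed across schema years.

```python
import xml.etree.ElementTree as ET

NS = {"irs": "http://www.irs.gov/efile"}  # assumed IRS e-file namespace

# Assumed Schedule J Part II element names; verify against each filing's schema version.
GROUP_PATH = ".//irs:RltdOrgOfficerTrstKeyEmplGrp"
TEXT_FIELDS = {"name": "irs:PersonNm", "title": "irs:TitleTxt"}
AMOUNT_FIELDS = {
    "base_comp": "irs:BaseCompensationFilingOrgAmt",
    "total_comp": "irs:TotalCompensationFilingOrgAmt",
}


def parse_schedule_j(xml_text: str) -> list[dict]:
    """Return one record per person listed in Schedule J Part II of a 990 XML return."""
    root = ET.fromstring(xml_text)
    records = []
    for group in root.iterfind(GROUP_PATH, NS):
        rec: dict = {}
        for key, path in TEXT_FIELDS.items():
            text = group.findtext(path, namespaces=NS)
            rec[key] = text.strip() if text else None
        for key, path in AMOUNT_FIELDS.items():
            text = group.findtext(path, namespaces=NS)
            # Amounts are assumed to be whole-dollar integers; missing -> None.
            rec[key] = int(text) if text and text.strip() else None
        records.append(rec)
    return records
```

Keeping the element paths in module-level constants makes it easier to swap in per-schema-year mappings later, without rewriting the parsing loop.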
Phase 2 — Data pipeline & normalization
Clean, normalize, and reconcile data across sources. Define a unified schema. Build institution-matching logic to link records across datasets using IPEDS Unit ID as the canonical identifier.
Phase 3 — Internal analytics dashboard
Build an internal tool for our institution to explore cost and performance data. Validate findings with stakeholders.
Phase 4 — Multi-institution expansion
Extend data acquisition to peer institutions (AAU members, Carnegie peers, etc.). Add benchmarking comparisons, configurable peer groups, and export features.
Technical approach
Data collection
- Python scraping with Scrapy / BeautifulSoup for supplementary sources
- IRS 990 XML bulk data parser (IRS provides annual bulk downloads)
- IPEDS bulk data files (CSV downloads, no scraping required)
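Because IPEDS files are plain CSV, Phase 1 can filter them to a single institution with the standard csv module. The sketch assumes each file has a UNITID column (IPEDS variable names are generally uppercase). It does not hard-code UD's Unit ID; the caller passes it in after looking it up in the IPEDS directory data.

```python
import csv
from typing import Iterable


def rows_for_unitid(lines: Iterable[str], unit_id: int) -> list[dict[str, str]]:
    """Return rows of an IPEDS-style CSV whose UNITID equals unit_id.

    Headers are upper-cased and stripped of a UTF-8 byte-order mark so
    small formatting differences between files do not break the lookup.
    """
    reader = csv.reader(lines)
    header = [h.lstrip("\ufeff").strip().upper() for h in next(reader, [])]
    if "UNITID" not in header:
        raise ValueError("no UNITID column found")
    col = header.index("UNITID")
    return [dict(zip(header, row)) for row in reader
            if len(row) > col and row[col].strip() == str(unit_id)]
```

Because csv.reader accepts any iterable of lines, a file opened with open(path, newline="", encoding=...) can be passed directly. The correct encoding can vary between files, so the caller chooses it.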
Data storage & transformation
- PostgreSQL or DuckDB as primary data store
- dbt for data transformation and modeling
Frontend & API
- React with a charting library (e.g., Recharts or Observable Plot)
- REST or GraphQL API layer
Risks & considerations
| Severity | Risk | Notes |
|---|---|---|
| Medium | Data completeness | Public universities generally do not file 990s, although their affiliated foundations often do. IPEDS is the primary fallback and provides expense breakdowns by function. |
| Medium | Institution matching | Names vary across datasets. Use IPEDS Unit ID as the canonical identifier from the start. |
| Low | Rate limiting | IRS and IPEDS data are available as bulk downloads; scraping is mostly not required for core datasets. |
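| Low | Data licensing | IRS, IPEDS, and BLS data are public U.S. government data. NACUBO endowment studies and Chronicle of Higher Education data may be subscription-based or copyrighted, so confirm their terms of use before ingesting them. |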
Out of scope (v1)
- Internal financial systems integration
- Real-time data feeds
- Accreditation or ranking data
- Faculty / HR compensation analysis (key executive compensation from 990s is in scope)
Generated by Claude · Administrative Analytics Project