# Administrative Analytics: Project Scope

**Version:** 0.1 | **Status:** Draft | **Date:** March 2026

---

## Problem statement

University administrative costs have grown significantly over the past two decades, yet institutions lack accessible tools to benchmark their administrative spending against peer institutions or to correlate those costs with performance outcomes. This project aims to close that gap using publicly available data.

---

## Objectives

Build a data pipeline and analytics dashboard that aggregates public data on university administrative costs, benchmarks institutions against peers, and surfaces correlations between administrative spend and key performance indicators such as fundraising revenue.

**First iteration scope:** Data acquisition and analysis will focus exclusively on the **University of Delaware**. Peer institution comparisons (AAU members, Carnegie peers, etc.) will be added in a later iteration.

---

## Data sources

### Primary

- **IRS Form 990**: private universities and non-profits
- **IPEDS** (Integrated Postsecondary Education Data System): all institutions, including public universities
- **NACUBO endowment study reports**

### Secondary

- **BLS CPI-U data**: Consumer Price Index for All Urban Consumers, for inflation-adjusted compensation analysis
- University philanthropy and fundraising reports (public fact books)
- Chronicle of Higher Education data
- Public institutional fact books

### Stretch

- **Institutional administrative office web pages**: scrape Provost, President, VP unit, and college administration pages for staff directories and headcount tracking

---

## Key metrics

### Cost metrics

- Admin cost per student
- Admin-to-faculty ratio
- Administrative spending as % of total expenses

### Compensation metrics

- Key employee salaries from IRS 990 Schedule J (President, Provost, VPs, Deans, etc.)
- Year-over-year compensation growth per position
- Compensation growth vs. CPI-U (Bureau of Labor Statistics)

### Performance metrics

- Philanthropic revenue raised
- Endowment growth year-over-year
- Grant funding secured

### Benchmarking (later iteration)

- Peer institution comparisons by Carnegie classification, size, and public/private status
- AAU institution comparisons

### Trends

- Year-over-year cost and performance trajectories (5–10 year views)

---

## Phases

### Phase 1: Data acquisition

Build parsers for IRS 990 filings (including Schedule J key employee compensation) and IPEDS data for the **University of Delaware** only. Ingest the BLS CPI-U series for inflation benchmarking. Establish a raw data store.

**Stretch:** prototype a scraper for UD administrative office web pages to track headcount.

### Phase 2: Data pipeline & normalization

Clean, normalize, and reconcile data across sources. Define a unified schema. Build institution-matching logic that links records across datasets using the IPEDS Unit ID as the canonical identifier.

### Phase 3: Internal analytics dashboard

Build an internal tool for our institution to explore cost and performance data. Validate findings with stakeholders.

### Phase 4: Multi-institution expansion

Extend data acquisition to peer institutions (AAU members, Carnegie peers, etc.). Add benchmarking comparisons, configurable peer groups, and export features.

---

## Technical approach

### Data collection

- Python scraping with Scrapy / BeautifulSoup for supplementary sources
- IRS 990 XML bulk data parser (the IRS provides annual bulk downloads)
- IPEDS bulk data files (CSV downloads, no scraping required)

### Data storage & transformation

- PostgreSQL or DuckDB as the primary data store
- dbt for data transformation and modeling

### Frontend & API

- React with a charting library (e.g., Recharts or Observable Plot)
- REST or GraphQL API layer

---

## Risks & considerations

| Severity | Risk | Notes |
|----------|------|-------|
| Medium | Data completeness | Public universities do not file 990s. IPEDS is the primary fallback and provides expense breakdowns by function. |
| Medium | Institution matching | Names vary across datasets. Use IPEDS Unit ID as the canonical identifier from the start. |
| Low | Rate limiting | IRS and IPEDS data are available as bulk downloads, so scraping is mostly not required for core datasets. |
| Low | Data licensing | The core government sources (IRS 990, IPEDS, BLS) are public domain or open government data, so no licensing barriers are anticipated for them. Terms of use for NACUBO and Chronicle of Higher Education data still need to be confirmed. |

---

## Out of scope (v1)

- Internal financial systems integration
- Real-time data feeds
- Accreditation or ranking data
- Faculty / HR compensation analysis (key executive compensation from 990s *is* in scope)

---

*Generated by Claude · Administrative Analytics Project*
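
---

## Appendix: metric calculation sketch

The cost and compensation metrics listed under Key metrics reduce to a few ratios. The block below is a minimal sketch of that arithmetic, not the project's schema: `InstitutionYear`, its field names, `cost_metrics`, and `real_growth` are hypothetical, and every number in the example is a placeholder rather than real University of Delaware data. It assumes administrative expenses, enrollment, and headcounts have already been extracted from IPEDS, and that compensation figures come from Schedule J.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstitutionYear:
    """One institution-year of inputs. All field names are hypothetical."""
    unit_id: int           # IPEDS Unit ID, used as the canonical identifier
    year: int
    admin_expenses: float  # dollars
    total_expenses: float  # dollars
    students: float        # enrollment (headcount or FTE, chosen consistently)
    admin_staff: float     # headcount or FTE
    faculty: float         # headcount or FTE


def cost_metrics(row: InstitutionYear) -> dict[str, float]:
    """Compute the three cost metrics for a single institution-year."""
    return {
        "admin_cost_per_student": row.admin_expenses / row.students,
        "admin_to_faculty_ratio": row.admin_staff / row.faculty,
        "admin_pct_of_total": 100.0 * row.admin_expenses / row.total_expenses,
    }


def real_growth(comp_prev: float, comp_curr: float,
                cpi_prev: float, cpi_curr: float) -> float:
    """Year-over-year compensation growth net of CPI-U inflation (fraction)."""
    nominal_factor = comp_curr / comp_prev
    inflation_factor = cpi_curr / cpi_prev
    return nominal_factor / inflation_factor - 1.0


# Placeholder values only; 999999 is not a real Unit ID.
example = InstitutionYear(
    unit_id=999999, year=2024,
    admin_expenses=50_000_000, total_expenses=1_000_000_000,
    students=20_000, admin_staff=500, faculty=1_000,
)
metrics = cost_metrics(example)
growth = real_growth(400_000, 440_000, cpi_prev=100.0, cpi_curr=105.0)
print(metrics)
print(f"real compensation growth: {growth:.2%}")
```

Dividing the nominal growth factor by the inflation factor, rather than subtracting the two percentage rates, keeps the adjustment exact when multi-year growth is compounded for the 5–10 year trend views.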