AdminAnalytics/administrative_analytics_scope_v0.1.md
Eric f037c50736 Initial project planning docs for UD administrative analytics
- Project scope document (v0.1): objectives, data sources, key metrics, phases
- Phase 1 implementation plan: IPEDS, IRS 990, BLS CPI-U acquisition for UD
- CLAUDE.md: project context and conventions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:28:30 -04:00


Administrative Analytics — Project Scope

Version: 0.1 | Status: Draft | Date: March 2026


Problem statement

University administrative costs have grown significantly over the past two decades, yet institutions lack accessible tools for benchmarking their administrative spending against peer institutions or correlating those costs with performance outcomes. This project aims to close that gap using publicly available data.


Objectives

Build a data pipeline and analytics dashboard that aggregates public data on university administrative costs, benchmarks institutions against peers, and surfaces correlations between administrative spend and key performance indicators such as fundraising revenue.

First iteration scope: Data acquisition and analysis will focus exclusively on the University of Delaware. Peer institution comparisons (AAU members, Carnegie peers, etc.) will be added in a later iteration.


Data sources

Primary

  • IRS Form 990 — private universities and non-profits
  • IPEDS (Integrated Postsecondary Education Data System) — all institutions, including public universities
  • NACUBO endowment study reports

Secondary

  • BLS CPI-U data — Consumer Price Index for All Urban Consumers, for inflation-adjusted compensation analysis
  • University philanthropy and fundraising reports (public fact books)
  • Chronicle of Higher Education data
  • Public institutional fact books

Stretch

  • Institutional administrative office web pages — scrape Provost, President, VP unit, and college administration pages for staff directories / headcount tracking

Key metrics

Cost metrics

  • Admin cost per student
  • Admin-to-faculty ratio
  • Administrative spending as % of total expenses
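The following is a minimal sketch of how the three cost metrics could be computed for one institution-year. It assumes IPEDS "Institutional support" expenses as the proxy for administrative spending and full-time-equivalent (FTE) enrollment as the per-student denominator. The `CostInputs` fields and the `cost_metrics` function are hypothetical names, and the example figures are illustrative rather than UD data.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """One institution-year of inputs (hypothetical field names).

    admin_expense: assumed administrative-spending proxy, e.g. the IPEDS
        "Institutional support" functional expense line.
    total_expense: total operating expenses for the same fiscal year.
    fte_students: full-time-equivalent enrollment.
    admin_headcount / faculty_headcount: staff counts for the same year.
    """
    admin_expense: float
    total_expense: float
    fte_students: float
    admin_headcount: int
    faculty_headcount: int


def cost_metrics(x: CostInputs) -> dict[str, float]:
    """Compute the three cost metrics for one institution-year."""
    if x.fte_students <= 0 or x.total_expense <= 0 or x.faculty_headcount <= 0:
        raise ValueError("denominators must be positive")
    return {
        "admin_cost_per_student": x.admin_expense / x.fte_students,
        "admin_to_faculty_ratio": x.admin_headcount / x.faculty_headcount,
        "admin_share_of_expenses_pct": 100 * x.admin_expense / x.total_expense,
    }
```

For example, `cost_metrics(CostInputs(50_000_000, 1_000_000_000, 20_000, 1_500, 1_200))` yields $2,500 per student, a ratio of 1.25, and 5.0%. All three ratios depend on which expense lines and staff categories count as "administrative," so that definition is worth settling in Phase 2.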

Compensation metrics

  • Key employee compensation (base salary, bonus, and other components) from IRS Form 990 Schedule J (President, Provost, VPs, Deans, etc.)
  • Year-over-year compensation growth per position
  • Compensation growth vs. CPI-U (Bureau of Labor Statistics)
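Compensation growth relative to CPI-U can be expressed as real growth: (1 + nominal growth) / (1 + CPI-U growth) - 1. This value is positive when pay outpaces inflation. The sketch below assumes both series are keyed by calendar year and that CPI-U is the BLS annual-average index. `yoy_growth` and `real_growth` are hypothetical helpers.

```python
def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Year-over-year growth keyed by the later year: v[y] / v[y - 1] - 1.

    Years without a preceding year (or with a zero prior value) are skipped.
    """
    return {
        year: series[year] / series[year - 1] - 1
        for year in sorted(series)
        if year - 1 in series and series[year - 1] != 0
    }


def real_growth(comp: dict[int, float], cpi: dict[int, float]) -> dict[int, float]:
    """Compensation growth net of CPI-U inflation: (1 + g_comp) / (1 + g_cpi) - 1."""
    comp_g = yoy_growth(comp)
    cpi_g = yoy_growth(cpi)
    # Only years present in both growth series can be compared.
    return {y: (1 + comp_g[y]) / (1 + cpi_g[y]) - 1 for y in comp_g if y in cpi_g}
```

Form 990 reports compensation for the calendar year ending with or within the organization's tax year. Calendar-year CPI-U averages are therefore a natural pairing, though the alignment should still be confirmed for each filing used.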

Performance metrics

  • Philanthropic revenue raised
  • Endowment growth year-over-year
  • Grant funding secured

Benchmarking (later iteration)

  • Peer institution comparisons by Carnegie classification, size, and public/private status
  • AAU institution comparisons
  • Year-over-year cost and performance trajectories (5- to 10-year views)
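Configurable peer groups could begin as a simple attribute filter over the benchmarking dimensions above. This sketch is not a defined methodology. The `Institution` record, its text encodings for Carnegie class and control, and the default 50% enrollment band in `peer_group` are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    unitid: int        # IPEDS UNITID
    name: str
    carnegie: str      # Carnegie classification label (assumed text encoding)
    control: str       # "public" or "private" (assumed encoding)
    fte_students: int
    aau_member: bool


def peer_group(target: Institution, candidates: list[Institution],
               size_tolerance: float = 0.5, same_control: bool = True,
               aau_only: bool = False) -> list[Institution]:
    """Candidates in the target's Carnegie class within +/- size_tolerance of its FTE."""
    lo = target.fte_students * (1 - size_tolerance)
    hi = target.fte_students * (1 + size_tolerance)
    return [
        c for c in candidates
        if c.unitid != target.unitid              # never compare against itself
        and c.carnegie == target.carnegie
        and lo <= c.fte_students <= hi
        and (not same_control or c.control == target.control)
        and (not aau_only or c.aau_member)
    ]
```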

Phases

Phase 1 — Data acquisition

Build parsers for IRS 990 filings (including Schedule J key employee compensation) and IPEDS data for the University of Delaware only. Ingest BLS CPI-U series for inflation benchmarking. Establish a raw data store. Stretch: prototype scraper for UD administrative office web pages to track headcount.
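For the UD-only first iteration, IPEDS files can be downloaded whole into the raw store and filtered to a single institution on read. This sketch assumes each CSV has a `UNITID` header column and UTF-8 encoding, though some files may differ. The target UNITID is passed in rather than hard-coded. `filter_unitid` and `load_ipeds_for_unitid` are hypothetical names.

```python
import csv
from typing import Iterable


def filter_unitid(lines: Iterable[str], unitid: int) -> list[dict[str, str]]:
    """Keep only the rows of an IPEDS data file whose UNITID matches `unitid`."""
    reader = csv.DictReader(lines)
    return [row for row in reader if (row.get("UNITID") or "").strip() == str(unitid)]


def load_ipeds_for_unitid(path: str, unitid: int) -> list[dict[str, str]]:
    """Read a downloaded IPEDS CSV from the raw store and filter it to one institution."""
    # "utf-8-sig" drops a leading byte-order mark if present; the actual
    # encoding of each IPEDS file is an assumption to verify.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return filter_unitid(f, unitid)
```

Keeping the unfiltered download in the raw store lets the Phase 4 peer expansion reuse the same files.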

Phase 2 — Data pipeline & normalization

Clean, normalize, and reconcile data across sources. Define a unified schema. Build institution-matching logic to link records across datasets using IPEDS Unit ID as the canonical identifier.
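Institution matching could resolve each source record to a UNITID by exact EIN first and by normalized name second. This sketch assumes an EIN-to-UNITID crosswalk and a normalized-name lookup have already been built, for example from the IPEDS institutional directory file if it carries EINs for the years in question. `normalize_name` and `match_unitid` are hypothetical helpers.

```python
import re
from typing import Optional


def normalize_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop a leading 'the', collapse spaces."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    name = re.sub(r"^the\s+", "", name.strip())
    return " ".join(name.split())


def match_unitid(record: dict, ein_to_unitid: dict[str, int],
                 name_to_unitid: dict[str, int]) -> Optional[int]:
    """Resolve a source record to an IPEDS UNITID: exact EIN first, then normalized name."""
    ein = re.sub(r"\D", "", record.get("ein", ""))  # "12-3456789" -> "123456789"
    if ein in ein_to_unitid:
        return ein_to_unitid[ein]
    return name_to_unitid.get(normalize_name(record.get("name", "")))
```

Name matching is the weaker path. Records that resolve to `None` are better queued for manual review than matched fuzzily.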

Phase 3 — Internal analytics dashboard

Build an internal tool for our institution to explore cost and performance data. Validate findings with stakeholders.

Phase 4 — Multi-institution expansion

Extend data acquisition to peer institutions (AAU members, Carnegie peers, etc.). Add benchmarking comparisons, configurable peer groups, and export features.


Technical approach

Data collection

  • Python scraping with Scrapy / BeautifulSoup for supplementary sources
  • IRS 990 XML bulk data parser (IRS provides annual bulk downloads)
  • IPEDS bulk data files (CSV downloads, no scraping required)
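The following sketches a Schedule J extractor for one IRS e-file XML return, using only the standard library's `xml.etree.ElementTree`. E-file returns use the `http://www.irs.gov/efile` namespace. The element names below, however, are assumptions based on recent schema versions and should be checked against each filing's schema year. `parse_schedule_j` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"irs": "http://www.irs.gov/efile"}

# ASSUMED element names for Schedule J, Part II; verify against the IRS e-file
# schema version of each return, since names differ across schema years.
GROUP = "RltdOrgOfficerTrstKeyEmplGrp"
AMOUNT_FIELDS = {
    "base": "BaseCompensationFilingOrgAmt",
    "bonus": "BonusFilingOrganizationAmount",
    "total": "TotalCompensationFilingOrgAmt",
}


def _amount(elem: ET.Element, tag: str) -> int:
    """Read a whole-dollar amount; a missing element counts as 0."""
    text = elem.findtext(f"irs:{tag}", default="", namespaces=NS).strip()
    return int(text) if text else 0


def parse_schedule_j(xml_text: str) -> list[dict]:
    """Extract one compensation record per person listed in Schedule J, Part II."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree, so the group is found wherever it is nested.
    for grp in root.iter(f"{{{NS['irs']}}}{GROUP}"):
        record = {
            "name": grp.findtext("irs:PersonNm", default="", namespaces=NS).strip(),
            "title": grp.findtext("irs:TitleTxt", default="", namespaces=NS).strip(),
        }
        record.update({key: _amount(grp, tag) for key, tag in AMOUNT_FIELDS.items()})
        records.append(record)
    return records
```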

Data storage & transformation

  • PostgreSQL or DuckDB as primary data store
  • dbt for data transformation and modeling
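One possible shape for the Phase 2 unified schema is a long table of institution-year-metric facts keyed by IPEDS UNITID, plus a year-keyed CPI-U table. The project targets PostgreSQL or DuckDB. This sketch uses Python's built-in `sqlite3` only so that it runs without extra dependencies, and every table and column name is hypothetical.

```python
import sqlite3

# Hypothetical unified schema. sqlite3 stands in for PostgreSQL/DuckDB here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS institution (
    unitid  INTEGER PRIMARY KEY,       -- IPEDS UNITID, the canonical key
    name    TEXT NOT NULL,
    ein     TEXT                       -- links IRS 990 filings, if known
);
CREATE TABLE IF NOT EXISTS fact_annual (
    unitid      INTEGER NOT NULL REFERENCES institution(unitid),
    fiscal_year INTEGER NOT NULL,
    metric      TEXT NOT NULL,         -- e.g. 'institutional_support_expense'
    value       REAL NOT NULL,
    source      TEXT NOT NULL,         -- e.g. 'ipeds', 'irs990'
    PRIMARY KEY (unitid, fiscal_year, metric, source)
);
CREATE TABLE IF NOT EXISTS cpi_u (
    year        INTEGER PRIMARY KEY,   -- calendar year
    index_value REAL NOT NULL          -- BLS annual-average CPI-U
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A long fact table lets new metrics arrive without schema changes. dbt models can then pivot it into wide, analysis-ready tables.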

Frontend & API

  • React with a charting library (e.g., Recharts or Observable Plot)
  • REST or GraphQL API layer

Risks & considerations

| Severity | Risk | Notes |
|---|---|---|
| Medium | Data completeness | Most public universities are not required to file Form 990. IPEDS is the primary fallback and provides expense breakdowns by function. |
| Medium | Institution matching | Names vary across datasets. Use IPEDS Unit ID as the canonical identifier from the start. |
| Low | Rate limiting | IRS and IPEDS data are available as bulk downloads, so scraping is largely unnecessary for core datasets. |
| Low | Data licensing | IRS, IPEDS, and BLS data are open government data. NACUBO and Chronicle of Higher Education data may carry subscription or usage terms that should be reviewed before use. |

Out of scope (v1)

  • Internal financial systems integration
  • Real-time data feeds
  • Accreditation or ranking data
  • Faculty / HR compensation analysis (key executive compensation from 990s is in scope)

Generated by Claude · Administrative Analytics Project