Compensation, endowment tweaks. Added About.

emfurst 2026-03-31 08:03:58 -04:00
commit 13fb4b8418
13 changed files with 914 additions and 17 deletions


@ -2,6 +2,26 @@
University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis.
## Scope
This project is currently scoped to the **University of Delaware** as a single institution. It tracks:
- **Executive compensation** from IRS 990 Schedule J filings by the University of Delaware (EIN 516000297) and UD Research Foundation (EIN 516017306)
- **Administrative cost ratios** from IPEDS finance surveys (expenses by function, staffing levels, enrollment)
- **Endowment performance** and **philanthropic giving** from IPEDS F2 (FASB) financial data
- **Administrative headcount** via web scraping, currently focused on the **College of Engineering line management** (COE Central, department offices) and the Provost's Office
### Changing the target institution
The institution scope is controlled by constants in `src/admin_analytics/config.py`:
- `UD_UNITID = 130943` -- IPEDS institution identifier. Change this to target a different institution. Look up UNITIDs at the [IPEDS Data Center](https://nces.ed.gov/ipeds/use-the-data).
- `UD_EINS = [516000297, 516017306]` -- IRS Employer Identification Numbers for 990 filings. Update these to the EINs of the target institution's nonprofit entities.
All IPEDS loaders accept a `unitid_filter` parameter. The scraper URLs in `src/admin_analytics/scraper/directory.py` are UD-specific and would need to be updated for a different institution.
Multi-institution comparisons (AAU peers, Carnegie peers) are planned for a future phase.
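
As a rough illustration of retargeting, the sketch below mirrors the two constants named above and shows how a loader might apply a `unitid_filter`. The `filter_rows` helper and the sample rows are hypothetical and exist only for this example; the project's actual loader signatures are not shown here.

```python
# Constants as described for src/admin_analytics/config.py.
UD_UNITID = 130943
UD_EINS = [516000297, 516017306]


def filter_rows(rows, unitid_filter=None):
    """Keep only rows for one institution; None keeps every row.

    Hypothetical stand-in for how a loader could honor unitid_filter.
    """
    if unitid_filter is None:
        return list(rows)
    return [r for r in rows if r["unitid"] == unitid_filter]


# Made-up rows, for illustration only.
sample = [
    {"unitid": 130943, "year": 2023},
    {"unitid": 999999, "year": 2023},
]
ud_rows = filter_rows(sample, unitid_filter=UD_UNITID)
print(len(ud_rows))
```

Pointing the project at another institution would then mean changing the two constants and the scraper URLs, not the filtering logic.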
## Prerequisites
- Python 3.11+
@ -49,12 +69,15 @@ Opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change
The dashboard must be restarted to pick up newly ingested data (DuckDB opens in read-only mode to avoid lock conflicts).
The dashboard has four tabs:
The dashboard has seven tabs:
- **Executive Compensation** -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023)
- **Executive Compensation** -- top earners from IRS 990 Schedule J, President and top-10 CAGR, trends by role, compensation breakdown by component, growth vs CPI-U (2015-2023)
- **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
- **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
- **Endowment** -- endowment value trends, CAGR, investment return rate, CIO compensation vs endowment growth (IPEDS F2)
- **Philanthropy** -- total private gifts and grants, gift allocation, President and VP Development compensation growth vs fundraising (IPEDS F2 and IRS 990)
- **Current Headcount** -- scraped UD staff directory data with overhead/non-overhead classification by unit
- **About** -- data sources, methodology, and limitations
## Validating Data


@ -147,6 +147,26 @@ Raw data layer for University of Delaware administrative analytics. All tables a
| value | DOUBLE | CPI-U index value (base period: 1982-84 = 100) |
| series_id | VARCHAR | BLS series identifier (always CUUR0000SA0) |
### raw_ipeds_endowment
**Source:** IPEDS F2 (FASB) finance survey — endowment and investment sections
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
**Note:** Endowment fields (F2H*) are available for all years 2005-2023.
| Column | Type | Description | Source field |
|--------|------|-------------|-------------|
| unitid | INTEGER | IPEDS institution identifier | UNITID |
| year | INTEGER | Fiscal year | derived from filename |
| endowment_boy | BIGINT | Endowment value, beginning of fiscal year | F2H01 |
| endowment_eoy | BIGINT | Endowment value, end of fiscal year | F2H02 |
| new_gifts | BIGINT | New gifts and additions to endowment | F2H03A |
| net_investment_return | BIGINT | Net investment return on endowment | F2H03B |
| other_changes | BIGINT | Other changes in endowment value | F2H03D |
| total_private_gifts | BIGINT | Total private gifts, grants, and contracts | F2D08 |
| total_investment_return | BIGINT | Total investment return (all funds) | F2D10 |
| long_term_investments | BIGINT | Long-term investments (balance sheet) | F2A01 |
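
As an example of how these columns combine, the dashboard's endowment page derives a return rate as `net_investment_return * 100 / endowment_boy`. The sketch below shows that calculation in plain Python with explicit handling of missing values; the function name and the None handling are assumptions for illustration, not part of the project.

```python
def return_rate_pct(net_investment_return, endowment_boy):
    """Net investment return as a percentage of beginning-of-year value.

    Returns None when either input is missing or the base is zero.
    """
    if net_investment_return is None or not endowment_boy:
        return None
    return net_investment_return * 100 / endowment_boy


# Made-up figures: a 50 return on a 1,000 beginning balance.
print(return_rate_pct(50, 1000))
```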
### raw_admin_headcount
**Source:** Web scraping of UD staff directory pages
@ -170,4 +190,5 @@ Raw data layer for University of Delaware administrative analytics. All tables a
- **IRS 990 tables** are linked by `object_id` (filing) and `ein` (organization)
- **IPEDS → IRS 990:** The `ein` field in `raw_institution` links to `ein` in 990 tables. UD EINs: 516000297 (University of Delaware), 516017306 (UD Research Foundation)
- **CPI-U** is used for inflation adjustment — join on `year` (and optionally `month`) to any table with a year column
- **Endowment** data comes from IPEDS F2 endowment section; 990 `total_assets` provides a cross-check
- **Admin headcount** links to IPEDS via institutional context (UD only in first iteration)
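
The CPI-U join described above is used to restate nominal dollars in the most recent year's dollars: the value is multiplied by the latest annual average index and divided by the index for the value's own year. A minimal sketch of that arithmetic follows; the function name and the index values are placeholders chosen for illustration, not real CPI data.

```python
def to_latest_dollars(nominal, cpi_for_year, cpi_latest):
    """Restate a nominal amount in latest-year dollars using CPI-U averages.

    Returns None when the amount or the year's index is missing.
    """
    if nominal is None or not cpi_for_year:
        return None
    return round(nominal * cpi_latest / cpi_for_year)


# Placeholder index values: prices rose 50% between the two years.
print(to_latest_dollars(100, 200.0, 300.0))
```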


@ -39,7 +39,7 @@ def ipeds(
from admin_analytics.ipeds.download import download_all
from admin_analytics.ipeds.institution import load_institutions
from admin_analytics.ipeds.finance import load_finance
from admin_analytics.ipeds.finance import load_finance, load_endowment
from admin_analytics.ipeds.staff import load_staff
from admin_analytics.ipeds.enrollment import load_enrollment
@ -62,8 +62,10 @@ def ipeds(
load_institutions(conn, years)
if "finance" in components:
typer.echo("Loading finance data (F1A)...")
typer.echo("Loading finance data (F1A/F2)...")
load_finance(conn, years)
typer.echo("Loading endowment data (F2)...")
load_endowment(conn, years)
if "staff" in components:
typer.echo("Loading staff data (S)...")


@ -4,7 +4,9 @@ import dash
from dash import dcc, html, Input, Output
from admin_analytics.db.connection import get_connection
from admin_analytics.dashboard.pages import overview, compensation, staffing, headcount
from admin_analytics.dashboard.pages import (
overview, compensation, staffing, headcount, endowment, philanthropy, about,
)
def create_app() -> dash.Dash:
@ -25,7 +27,10 @@ def create_app() -> dash.Dash:
dcc.Tab(label="Executive Compensation", value="compensation"),
dcc.Tab(label="Admin Cost Overview", value="overview"),
dcc.Tab(label="Staffing & Enrollment", value="staffing"),
dcc.Tab(label="Endowment", value="endowment"),
dcc.Tab(label="Philanthropy", value="philanthropy"),
dcc.Tab(label="Current Headcount", value="headcount"),
dcc.Tab(label="About", value="about"),
],
style={"marginBottom": "20px"},
),
@ -42,8 +47,14 @@ def create_app() -> dash.Dash:
return compensation.layout(conn)
elif tab == "staffing":
return staffing.layout(conn)
elif tab == "endowment":
return endowment.layout(conn)
elif tab == "philanthropy":
return philanthropy.layout(conn)
elif tab == "headcount":
return headcount.layout(conn)
elif tab == "about":
return about.layout()
return html.Div("Unknown tab")
compensation.register_callbacks(app, conn)


@ -0,0 +1,85 @@
"""Page: About — data sources, methodology, and limitations."""
from dash import html, dcc
def layout(_conn=None):
return html.Div([
dcc.Markdown("""
## About This Dashboard
This dashboard provides administrative cost benchmarking analytics for the
**University of Delaware** using exclusively **publicly available data**. No
internal university financial systems, personnel records, or confidential data
were used.
### Data Sources
All data is drawn from public, open-access sources:
| Source | Publisher | What We Use | Coverage |
|--------|-----------|-------------|----------|
| **IPEDS** | U.S. Dept. of Education, NCES | Institutional directory, expenses by function, staffing by occupation, enrollment, endowment, philanthropic gifts | 2005-2024 |
| **IRS Form 990** | Internal Revenue Service | Executive compensation (Schedule J), filing financials for UD and UD Research Foundation | Tax years 2015-2023 |
| **BLS CPI-U** | Bureau of Labor Statistics | Consumer Price Index for inflation adjustment (series CUUR0000SA0) | Full history |
| **UD Staff Directories** | University of Delaware public web pages | Administrative office headcounts (College of Engineering line management, Provost's Office) | Current snapshot |
### Methodology
**Executive Compensation** is extracted from IRS Form 990 Schedule J, which
reports detailed compensation for officers, directors, trustees, and key
employees of tax-exempt organizations. The University of Delaware (EIN
516000297) and UD Research Foundation (EIN 516017306) are the filing entities.
Titles are normalized to canonical roles (President, Provost, VP Finance, etc.)
using pattern matching. CAGR is computed as compound annual growth rate from
first to last available year.
**Administrative Cost Ratios** use IPEDS finance survey data. "Institutional
support" is the IPEDS functional expense category that most closely
corresponds to administrative overhead. The admin-to-faculty ratio uses IPEDS
occupational categories: OCCUPCAT 200 (instructional, research, and public
service staff) for faculty and OCCUPCAT 300 (management) for administration.
**Endowment Performance** uses IPEDS F2 (FASB) survey fields for beginning and
end-of-year endowment values, net investment return, and new gifts. The
endowment CAGR reflects total value growth including investment returns, new
gifts, and spending draws. The CIO compensation comparison uses the Chief
Investment Officer's Schedule J total compensation indexed against endowment
value. Note: the detailed endowment breakdown (investment return, new gifts,
other changes) is only available from IPEDS starting in the 2020 reporting
year. For 2005-2019, only beginning and end-of-year values are reported.
**Philanthropic Giving** uses IPEDS F2 total private gifts, grants, and
contracts. The compensation-vs-giving comparison indexes the President and VP
of Development compensation against total philanthropic revenue.
**Inflation Adjustment** uses the BLS CPI-U annual average (all items, U.S.
city average, not seasonally adjusted). CPI-adjusted values are expressed in
the most recent available year's dollars.
**Staffing** uses IPEDS Fall Staff survey occupational categories for full-time
employees only (FTPT=2).
### Limitations
- **IRS 990 coverage** depends on e-file availability. Not all years may have
filings for all entities, and XML schema variations across years can cause
individual fields to be missing.
- **IPEDS data** has a reporting lag; the most recent fiscal year may not yet
be available.
- **Endowment CAGR** reflects net growth after all inflows and outflows, not
pure investment return. It is not directly comparable to an investment
benchmark.
- **Title normalization** uses pattern matching and may misclassify titles that
don't follow common naming conventions.
- **Admin headcount** from web scraping is a point-in-time snapshot and is
limited to the pages currently targeted (College of Engineering and
Provost's Office).
- **Single institution:** this prototype covers the University of Delaware
only. Peer comparisons are planned for a future phase.
### License
This project is released under the MIT License. Copyright (c) 2026 Eric Furst.
"""),
], style={"maxWidth": "900px", "margin": "0 auto", "lineHeight": "1.6"})
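
The About text describes CAGR as the compound annual growth rate from the first to the last available year. The sketch below restates that formula as a standalone function, following the guards used in the query helpers in this commit (no result when the span is not positive or the starting value is not positive). The function name and the sample figures are illustrative assumptions.

```python
def cagr_pct(start_value, end_value, start_year, end_year):
    """Compound annual growth rate in percent, rounded to one decimal.

    Returns None when the year span or the starting value is not positive.
    """
    n_years = end_year - start_year
    if n_years <= 0 or start_value <= 0:
        return None
    return round(((end_value / start_value) ** (1.0 / n_years) - 1) * 100, 1)


# Made-up values: 100 growing to 121 over two years is 10% per year.
print(cagr_pct(100, 121, 2020, 2022))
```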


@ -10,6 +10,9 @@ from admin_analytics.dashboard.queries import (
query_top_earners,
query_comp_by_role,
query_comp_vs_cpi,
query_comp_cagr,
query_aggregate_comp,
query_aggregate_comp_cagr,
)
_NO_DATA = html.Div(
@ -21,6 +24,24 @@ _NO_DATA = html.Div(
_KEY_ROLES = ["PRESIDENT", "PROVOST", "VP_FINANCE", "VP_RESEARCH", "VP_ADVANCEMENT", "CFO"]
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
return html.Div(
[
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
],
style={
"flex": "1",
"padding": "20px",
"backgroundColor": "#f8f9fa",
"borderRadius": "8px",
"textAlign": "center",
"margin": "0 8px",
},
)
def layout(conn: duckdb.DuckDBPyConnection):
all_earners = query_top_earners(conn)
if all_earners.height == 0:
@ -31,6 +52,34 @@ def layout(conn: duckdb.DuckDBPyConnection):
{"label": str(y), "value": y} for y in years
]
# KPI cards
cagr = query_comp_cagr(conn)
agg_cagr = query_aggregate_comp_cagr(conn)
kpi_cards = []
if cagr:
kpi_cards.append(_kpi_card(
"President Compensation",
f"${cagr['end_comp']:,}",
f"Tax year {cagr['end_year']}",
))
kpi_cards.append(_kpi_card(
"President CAGR",
f"{cagr['cagr_pct']}%",
f"Annualized growth, {cagr['start_year']}-{cagr['end_year']}",
))
if agg_cagr:
kpi_cards.append(_kpi_card(
"Top-10 Total Compensation",
f"${agg_cagr['end_comp']:,}",
f"Tax year {agg_cagr['end_year']}",
))
kpi_cards.append(_kpi_card(
"Top-10 CAGR",
f"{agg_cagr['cagr_pct']}%",
f"Annualized growth, {agg_cagr['start_year']}-{agg_cagr['end_year']}",
))
kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"}) if kpi_cards else html.Div()
# Compensation by role trend
role_df = query_comp_by_role(conn)
role_fig = go.Figure()
@ -61,18 +110,24 @@ def layout(conn: duckdb.DuckDBPyConnection):
mode="lines+markers", name="Top Compensation",
line={"color": "#00539F"},
))
cpi_fig.add_trace(go.Scatter(
x=cpi_pd["year"], y=cpi_pd["agg_index"],
mode="lines+markers", name="Top-10 Aggregate",
line={"color": "#E07A5F"},
))
cpi_fig.add_trace(go.Scatter(
x=cpi_pd["year"], y=cpi_pd["cpi_index"],
mode="lines+markers", name="CPI-U",
line={"color": "#FFD200", "dash": "dash"},
))
cpi_fig.update_layout(
title="Top Compensation vs CPI-U (Indexed, Base Year = 100)",
title="Compensation vs CPI-U (Indexed, Base Year = 100)",
xaxis_title="Year", yaxis_title="Index",
template="plotly_white", height=380,
)
return html.Div([
kpi_row,
html.Div(
[
html.Label("Filter by Tax Year: ", style={"fontWeight": "bold"}),
@ -136,8 +191,17 @@ def register_callbacks(app: dash.Dash, conn: duckdb.DuckDBPyConnection) -> None:
breakdown_fig = go.Figure()
if earners.height > 0:
ep = earners.to_pandas().head(10) # top 10 by total comp
short_names = [n.split(",")[0][:20] if "," in n else n.split()[-1][:20]
for n in ep["person_name"]]
_SUFFIXES = {"JR", "SR", "II", "III", "IV", "JR.", "SR."}
def _short_name(n):
if "," in n:
return n.split(",")[0][:20]
parts = n.split()
while len(parts) > 1 and parts[-1].upper().rstrip(".") in _SUFFIXES:
parts.pop()
return parts[-1][:20] if parts else n[:20]
short_names = [_short_name(n) for n in ep["person_name"]]
for comp_type, label, color in [
("base_compensation", "Base", "#00539F"),
("bonus_compensation", "Bonus", "#FFD200"),
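
The new `_short_name` helper picks a chart label from a person's name: the text before a comma if there is one, otherwise the last word after dropping generational suffixes. A standalone sketch of the same logic is below so its behavior can be checked in isolation. The names are invented placeholders, and the suffix set omits the dotted forms because `rstrip(".")` already normalizes them.

```python
_SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def short_name(name: str) -> str:
    """Return a label of at most 20 characters for a person's name."""
    if "," in name:
        # "Last, First" style: keep the part before the comma.
        return name.split(",")[0][:20]
    parts = name.split()
    # Drop trailing suffixes such as "Jr." but never the only remaining word.
    while len(parts) > 1 and parts[-1].upper().rstrip(".") in _SUFFIXES:
        parts.pop()
    return parts[-1][:20] if parts else name[:20]


for example in ["Smith, John", "John Smith Jr.", "Jane Q Doe III"]:
    print(short_name(example))
```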


@ -0,0 +1,190 @@
"""Page: Endowment Performance."""
import duckdb
from dash import html, dcc
import plotly.graph_objects as go
from admin_analytics.dashboard.queries import (
query_endowment, query_endowment_per_student, query_cio_vs_endowment,
)
_NO_DATA = html.Div(
"No endowment data loaded. Run: admin-analytics ingest ipeds --component finance",
style={"textAlign": "center", "padding": "40px", "color": "#888"},
)
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
return html.Div(
[
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
],
style={
"flex": "1",
"padding": "20px",
"backgroundColor": "#f8f9fa",
"borderRadius": "8px",
"textAlign": "center",
"margin": "0 8px",
},
)
def layout(conn: duckdb.DuckDBPyConnection):
df = query_endowment(conn)
if df.height == 0:
return _NO_DATA
pd = df.to_pandas()
latest = pd.iloc[-1]
# Endowment CAGR from first to last year with data
valid = pd.dropna(subset=["endowment_eoy"])
if len(valid) >= 2:
first = valid.iloc[0]
last = valid.iloc[-1]
start_year = int(first["year"])
end_year = int(last["year"])
n_years = end_year - start_year
start_val = float(first["endowment_eoy"])
end_val = float(last["endowment_eoy"])
endow_cagr = round(((end_val / start_val) ** (1.0 / n_years) - 1) * 100, 1) if n_years > 0 and start_val > 0 else None
else:
start_year = end_year = None
endow_cagr = None
# Endowment per student
eps_df = query_endowment_per_student(conn)
eps_pd = eps_df.to_pandas() if eps_df.height > 0 else None
latest_eps = eps_pd.iloc[-1] if eps_pd is not None and len(eps_pd) > 0 else None
# KPI cards — single row
kpi_cards = [
_kpi_card(
"Endowment Value",
f"${latest['endowment_eoy'] / 1e9:.2f}B" if latest["endowment_eoy"] else "N/A",
f"End of FY {int(latest['year'])}",
),
_kpi_card(
"Endowment CAGR",
f"{endow_cagr}%" if endow_cagr is not None else "N/A",
f"FY {start_year}-{end_year}" if start_year else "",
),
]
if latest_eps is not None and latest_eps["endowment_per_student"]:
kpi_cards.append(_kpi_card(
"Endowment per Student",
f"${int(latest_eps['endowment_per_student']):,}",
f"FY {int(latest_eps['year'])}",
))
kpi_cards.append(_kpi_card(
"New Gifts to Endowment",
f"${latest['new_gifts'] / 1e6:.1f}M" if latest["new_gifts"] else "N/A",
f"FY {int(latest['year'])}",
))
kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"})
# Endowment value trend
value_fig = go.Figure()
value_fig.add_trace(go.Scatter(
x=pd["year"], y=pd["endowment_eoy"] / 1e9,
mode="lines+markers", name="End-of-Year Value",
line={"color": "#00539F"},
fill="tozeroy", fillcolor="rgba(0,83,159,0.1)",
))
value_fig.update_layout(
title="Endowment Value Over Time",
xaxis_title="Year", yaxis_title="Billions $",
template="plotly_white", height=400,
)
# Investment return and new gifts bar chart
components_fig = go.Figure()
components_fig.add_trace(go.Bar(
x=pd["year"], y=pd["net_investment_return"] / 1e6,
name="Net Investment Return",
marker_color="#7FB069",
))
components_fig.add_trace(go.Bar(
x=pd["year"], y=pd["new_gifts"] / 1e6,
name="New Gifts to Endowment",
marker_color="#00539F",
))
if "other_changes" in pd.columns:
components_fig.add_trace(go.Bar(
x=pd["year"], y=pd["other_changes"] / 1e6,
name="Other Changes",
marker_color="#999",
))
components_fig.update_layout(
title="Endowment Changes by Component (Millions $)",
xaxis_title="Year", yaxis_title="Millions $",
barmode="group",
template="plotly_white", height=400,
)
# Investment return rate
rate_fig = go.Figure()
rates = pd.copy()
rates["return_pct"] = rates["net_investment_return"] * 100 / rates["endowment_boy"]
rate_fig.add_trace(go.Scatter(
x=rates["year"], y=rates["return_pct"],
mode="lines+markers", name="Return %",
line={"color": "#00539F"},
))
rate_fig.add_hline(y=0, line_dash="dot", line_color="#ccc")
rate_fig.update_layout(
title="Endowment Net Investment Return Rate (%)",
xaxis_title="Year", yaxis_title="%",
template="plotly_white", height=380,
)
# CIO compensation vs endowment growth
cio_df = query_cio_vs_endowment(conn)
cio_fig = go.Figure()
if cio_df.height > 1:
cio_pd = cio_df.to_pandas()
cio_fig.add_trace(go.Scatter(
x=cio_pd["tax_year"], y=cio_pd["cio_index"],
mode="lines+markers", name="CIO Compensation",
line={"color": "#E07A5F"},
))
cio_fig.add_trace(go.Scatter(
x=cio_pd["tax_year"], y=cio_pd["endowment_index"],
mode="lines+markers", name="Endowment Value",
line={"color": "#00539F"},
))
cio_fig.add_hline(y=100, line_dash="dot", line_color="#ccc")
cio_fig.update_layout(
title="Chief Investment Officer Compensation vs Endowment Growth (Indexed, Base Year = 100)",
xaxis_title="Year", yaxis_title="Index",
template="plotly_white", height=400,
)
# Endowment per student trend
eps_fig = go.Figure()
if eps_pd is not None and len(eps_pd) > 0:
eps_fig.add_trace(go.Scatter(
x=eps_pd["year"], y=eps_pd["endowment_per_student"],
mode="lines+markers", name="Endowment per Student",
line={"color": "#00539F"},
))
eps_fig.update_layout(
title="Endowment per Student ($)",
xaxis_title="Year", yaxis_title="$",
template="plotly_white", height=380,
)
charts = [
kpi_row,
dcc.Graph(figure=value_fig),
dcc.Graph(figure=eps_fig),
dcc.Graph(figure=components_fig),
dcc.Graph(figure=rate_fig),
]
if cio_df.height > 1:
charts.append(dcc.Graph(figure=cio_fig))
return html.Div(charts)
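
One thing to watch in the KPI cards above: checks such as `if latest["endowment_eoy"]` treat a missing value as present when it arrives as a float NaN, because NaN is truthy in Python, so a card could render `$nanB` instead of `N/A`. The sketch below shows one possible guard. The `fmt_billions` name is hypothetical, it assumes missing values arrive as None or float NaN, and unlike the original check it formats a zero value instead of showing `N/A`.

```python
import math


def fmt_billions(value):
    """Format a dollar amount in billions, or "N/A" when it is missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "N/A"
    return f"${value / 1e9:.2f}B"


print(bool(float("nan")))          # NaN is truthy, so a bare truthiness test passes
print(fmt_billions(float("nan")))
print(fmt_billions(1.5e9))
```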


@ -0,0 +1,132 @@
"""Page: Philanthropic Giving."""
import duckdb
from dash import html, dcc
import plotly.graph_objects as go
from admin_analytics.dashboard.queries import query_philanthropy, query_comp_vs_philanthropy
_NO_DATA = html.Div(
"No philanthropy data loaded. Run: admin-analytics ingest ipeds --component finance",
style={"textAlign": "center", "padding": "40px", "color": "#888"},
)
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
return html.Div(
[
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
],
style={
"flex": "1",
"padding": "20px",
"backgroundColor": "#f8f9fa",
"borderRadius": "8px",
"textAlign": "center",
"margin": "0 8px",
},
)
def layout(conn: duckdb.DuckDBPyConnection):
df = query_philanthropy(conn)
if df.height == 0:
return _NO_DATA
pd = df.to_pandas()
latest = pd.iloc[-1]
# KPI cards
kpi_row = html.Div(
[
_kpi_card(
"Total Private Gifts & Grants",
f"${latest['total_private_gifts'] / 1e6:.1f}M" if latest["total_private_gifts"] else "N/A",
f"FY {int(latest['year'])}",
),
_kpi_card(
"Gifts to Endowment",
f"${latest['endowment_gifts'] / 1e6:.1f}M" if latest["endowment_gifts"] else "N/A",
f"FY {int(latest['year'])}",
),
],
style={"display": "flex", "marginBottom": "24px"},
)
# Private gifts trend (nominal and CPI-adjusted)
gifts_fig = go.Figure()
gifts_fig.add_trace(go.Bar(
x=pd["year"], y=pd["total_private_gifts"] / 1e6,
name="Nominal",
marker_color="#00539F",
))
if "gifts_cpi_adjusted" in pd.columns and pd["gifts_cpi_adjusted"].notna().any():
gifts_fig.add_trace(go.Scatter(
x=pd["year"], y=pd["gifts_cpi_adjusted"] / 1e6,
mode="lines+markers", name="CPI-Adjusted",
line={"color": "#FFD200", "dash": "dash"},
))
gifts_fig.update_layout(
title="Total Private Gifts & Grants (Millions $)",
xaxis_title="Year", yaxis_title="Millions $",
template="plotly_white", height=420,
)
# Endowment gifts vs total gifts
split_fig = go.Figure()
pd["non_endowment_gifts"] = pd["total_private_gifts"] - pd["endowment_gifts"].fillna(0)
split_fig.add_trace(go.Bar(
x=pd["year"], y=pd["endowment_gifts"] / 1e6,
name="To Endowment",
marker_color="#00539F",
))
split_fig.add_trace(go.Bar(
x=pd["year"], y=pd["non_endowment_gifts"] / 1e6,
name="Current Use / Other",
marker_color="#7FB069",
))
split_fig.update_layout(
title="Gift Allocation: Endowment vs Current Use (Millions $)",
xaxis_title="Year", yaxis_title="Millions $",
barmode="stack",
template="plotly_white", height=400,
)
# Compensation vs philanthropy indexed chart
cvp_df = query_comp_vs_philanthropy(conn)
cvp_fig = go.Figure()
if cvp_df.height > 1:
cvp_pd = cvp_df.to_pandas()
cvp_fig.add_trace(go.Scatter(
x=cvp_pd["tax_year"], y=cvp_pd["president_index"],
mode="lines+markers", name="President Compensation",
line={"color": "#00539F"},
))
cvp_fig.add_trace(go.Scatter(
x=cvp_pd["tax_year"], y=cvp_pd["vp_adv_index"],
mode="lines+markers", name="VP Development Compensation",
line={"color": "#E07A5F"},
))
cvp_fig.add_trace(go.Scatter(
x=cvp_pd["tax_year"], y=cvp_pd["gifts_index"],
mode="lines+markers", name="Philanthropic Gifts",
line={"color": "#7FB069"},
))
cvp_fig.add_hline(y=100, line_dash="dot", line_color="#ccc")
cvp_fig.update_layout(
title="Compensation Growth vs Philanthropic Giving (Indexed, Base Year = 100)",
xaxis_title="Year", yaxis_title="Index",
template="plotly_white", height=420,
)
charts = [
kpi_row,
dcc.Graph(figure=gifts_fig),
dcc.Graph(figure=split_fig),
]
if cvp_df.height > 1:
charts.append(dcc.Graph(figure=cvp_fig))
return html.Div(charts)


@ -96,6 +96,116 @@ def query_admin_faculty_ratio(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
""", [UD_UNITID]).pl()
def query_aggregate_comp(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Top-10 Schedule J compensation per year — total, count, and average."""
return conn.execute("""
WITH ranked AS (
SELECT j.tax_year, j.total_compensation,
j.base_compensation, j.bonus_compensation,
j.deferred_compensation, j.nontaxable_benefits,
j.other_compensation,
ROW_NUMBER() OVER (PARTITION BY j.tax_year
ORDER BY j.total_compensation DESC) AS rn
FROM raw_990_schedule_j j
WHERE j.total_compensation > 0
)
SELECT tax_year,
COUNT(*) AS headcount,
SUM(total_compensation) AS total_comp,
ROUND(AVG(total_compensation), 0) AS avg_comp,
SUM(base_compensation) AS total_base,
SUM(bonus_compensation) AS total_bonus,
SUM(deferred_compensation) AS total_deferred,
SUM(nontaxable_benefits) AS total_benefits,
SUM(other_compensation) AS total_other
FROM ranked
WHERE rn <= 10
GROUP BY tax_year
ORDER BY tax_year
""").pl()
def query_aggregate_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None:
    """CAGR of aggregate top-10 Schedule J compensation over the last 5 years of data."""
    df = query_aggregate_comp(conn)
    if df.height < 2:
        return None

    # Use the last 5 years of available data
    df = df.tail(min(5, df.height))
    start_year = df["tax_year"][0]
    end_year = df["tax_year"][-1]
    start_comp = float(df["total_comp"][0])
    end_comp = float(df["total_comp"][-1])
    n_years = end_year - start_year
    if n_years <= 0 or start_comp <= 0:
        return None

    cagr = ((end_comp / start_comp) ** (1.0 / n_years) - 1) * 100
    return {
        "cagr_pct": round(cagr, 1),
        "start_year": start_year,
        "end_year": end_year,
        "start_comp": int(start_comp),
        "end_comp": int(end_comp),
    }
def query_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None:
"""Annualized growth rate (CAGR) of President compensation.
Tracks the President role specifically using title normalization.
Returns dict with cagr_pct, start_year, end_year, start_comp, end_comp,
or None if insufficient data.
"""
raw = conn.execute("""
SELECT j.tax_year, j.title, j.total_compensation
FROM raw_990_schedule_j j
WHERE j.total_compensation > 0
ORDER BY j.tax_year
""").pl()
if raw.height == 0:
return None
raw = raw.with_columns(
pl.col("title").map_elements(
normalize_title, return_dtype=pl.Utf8
).alias("role")
)
df = (
raw.filter(pl.col("role") == "PRESIDENT")
.group_by("tax_year")
.agg(pl.col("total_compensation").max().alias("top_comp"))
.sort("tax_year")
)
if df.height < 2:
return None
start_year = df["tax_year"][0]
end_year = df["tax_year"][-1]
start_comp = df["top_comp"][0]
end_comp = df["top_comp"][-1]
n_years = end_year - start_year
if n_years <= 0 or start_comp <= 0:
return None
cagr = ((end_comp / start_comp) ** (1.0 / n_years) - 1) * 100
return {
"cagr_pct": round(cagr, 1),
"start_year": start_year,
"end_year": end_year,
"start_comp": start_comp,
"end_comp": end_comp,
}
def query_top_earners(
conn: duckdb.DuckDBPyConnection, year: int | None = None
) -> pl.DataFrame:
@ -162,11 +272,23 @@ def query_comp_by_role(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Compensation growth vs CPI growth, indexed to first available year = 100."""
"""Compensation growth vs CPI growth, indexed to first available year = 100.
Includes top earner, top-10 aggregate, and CPI-U.
"""
return conn.execute("""
WITH yearly_max_comp AS (
SELECT tax_year, MAX(total_compensation) AS top_comp
WITH ranked AS (
SELECT tax_year, total_compensation,
ROW_NUMBER() OVER (PARTITION BY tax_year
ORDER BY total_compensation DESC) AS rn
FROM raw_990_schedule_j
WHERE total_compensation > 0
),
yearly_comp AS (
SELECT tax_year,
MAX(total_compensation) AS top_comp,
SUM(CASE WHEN rn <= 10 THEN total_compensation END) AS agg_comp
FROM ranked
GROUP BY tax_year
),
annual_cpi AS (
@ -174,20 +296,24 @@ def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
FROM raw_cpi_u GROUP BY year
),
base AS (
SELECT c.top_comp AS base_comp, ac.avg_cpi AS base_cpi
FROM yearly_max_comp c
SELECT c.top_comp AS base_top, c.agg_comp AS base_agg,
ac.avg_cpi AS base_cpi
FROM yearly_comp c
JOIN annual_cpi ac ON ac.year = c.tax_year
ORDER BY c.tax_year LIMIT 1
)
SELECT
c.tax_year AS year,
c.top_comp,
c.agg_comp,
ac.avg_cpi,
ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_comp FROM base), 0), 1)
ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_top FROM base), 0), 1)
AS comp_index,
ROUND(c.agg_comp * 100.0 / NULLIF((SELECT base_agg FROM base), 0), 1)
AS agg_index,
ROUND(ac.avg_cpi * 100.0 / NULLIF((SELECT base_cpi FROM base), 0), 1)
AS cpi_index
FROM yearly_max_comp c
FROM yearly_comp c
JOIN annual_cpi ac ON ac.year = c.tax_year
ORDER BY year
""").pl()
@ -249,6 +375,166 @@ def query_growth_index(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
""", [UD_UNITID, UD_UNITID]).pl()
def query_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Endowment performance over time."""
return conn.execute("""
SELECT year, endowment_boy, endowment_eoy, new_gifts,
net_investment_return, other_changes, long_term_investments
FROM raw_ipeds_endowment
WHERE unitid = ?
ORDER BY year
""", [UD_UNITID]).pl()
def query_endowment_per_student(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Endowment value per student over time."""
return conn.execute("""
SELECT e.year, e.endowment_eoy, en.total_enrollment,
ROUND(e.endowment_eoy * 1.0 / NULLIF(en.total_enrollment, 0), 0)
AS endowment_per_student
FROM raw_ipeds_endowment e
JOIN raw_ipeds_enrollment en ON en.unitid = e.unitid AND en.year = e.year
WHERE e.unitid = ?
ORDER BY e.year
""", [UD_UNITID]).pl()
def query_cio_vs_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Chief Investment Officer compensation vs endowment growth, indexed."""
raw = conn.execute("""
SELECT j.tax_year, j.title, j.total_compensation
FROM raw_990_schedule_j j
WHERE j.total_compensation > 0
""").pl()
if raw.height == 0:
return pl.DataFrame()
raw = raw.with_columns(
pl.col("title").map_elements(
normalize_title, return_dtype=pl.Utf8
).alias("role")
)
cio = (
raw.filter(pl.col("role") == "CHIEF_INVESTMENT_OFFICER")
.group_by("tax_year")
.agg(pl.col("total_compensation").max().alias("cio_comp"))
.sort("tax_year")
)
if cio.height == 0:
return pl.DataFrame()
endow = conn.execute("""
SELECT year, endowment_eoy
FROM raw_ipeds_endowment
WHERE unitid = ?
ORDER BY year
""", [UD_UNITID]).pl()
merged = (
cio.join(endow, left_on="tax_year", right_on="year", how="inner")
.drop_nulls(subset=["cio_comp", "endowment_eoy"])
.sort("tax_year")
)
if merged.height < 2:
return merged
base_comp = float(merged["cio_comp"][0])
base_endow = float(merged["endowment_eoy"][0])
merged = merged.with_columns(
(pl.col("cio_comp").cast(pl.Float64) * 100.0 / base_comp).round(1).alias("cio_index"),
(pl.col("endowment_eoy").cast(pl.Float64) * 100.0 / base_endow).round(1).alias("endowment_index"),
)
return merged
def query_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
"""Philanthropic giving over time: IPEDS private gifts, nominal and CPI-adjusted."""
return conn.execute(f"""
{_CPI_CTE}
SELECT e.year, e.total_private_gifts, e.new_gifts AS endowment_gifts,
ROUND(e.total_private_gifts * (SELECT avg_cpi FROM latest_cpi)
/ ac.avg_cpi, 0) AS gifts_cpi_adjusted
FROM raw_ipeds_endowment e
LEFT JOIN annual_cpi ac ON ac.year = e.year
WHERE e.unitid = ?
ORDER BY e.year
""", [UD_UNITID]).pl()
def query_comp_vs_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
    """VP Advancement and President comp vs philanthropic gifts, indexed."""
    raw = conn.execute("""
        SELECT j.tax_year, j.title, j.total_compensation
        FROM raw_990_schedule_j j
        WHERE j.total_compensation > 0
    """).pl()
    if raw.height == 0:
        return pl.DataFrame()
    raw = raw.with_columns(
        pl.col("title").map_elements(
            normalize_title, return_dtype=pl.Utf8
        ).alias("role")
    )
    # Get max comp per role per year for President and VP Advancement
    roles = raw.filter(pl.col("role").is_in(["PRESIDENT", "VP_ADVANCEMENT"]))
    if roles.height == 0:
        return pl.DataFrame()
    pivoted = (
        roles.group_by(["tax_year", "role"])
        .agg(pl.col("total_compensation").max().alias("comp"))
        .sort("tax_year")
    )
    pres = (
        pivoted.filter(pl.col("role") == "PRESIDENT")
        .select(pl.col("tax_year"), pl.col("comp").alias("president_comp"))
    )
    vp = (
        pivoted.filter(pl.col("role") == "VP_ADVANCEMENT")
        .select(pl.col("tax_year"), pl.col("comp").alias("vp_adv_comp"))
    )
    gifts = conn.execute("""
        SELECT year, total_private_gifts
        FROM raw_ipeds_endowment
        WHERE unitid = ?
        ORDER BY year
    """, [UD_UNITID]).pl()
    # Join all three on year
    merged = (
        pres.join(vp, on="tax_year", how="outer_coalesce")
        .join(gifts, left_on="tax_year", right_on="year", how="inner")
        .drop_nulls(subset=["total_private_gifts"])
        .sort("tax_year")
    )
    if merged.height < 2:
        return merged
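    # Index each series to 100 at its first non-null value. A role with no
    # non-null compensation in the joined years gets a null index rather than
    # raising IndexError on an empty Series.
    def _base(col: str) -> float | None:
        values = merged[col].drop_nulls()
        return float(values[0]) if values.len() > 0 else None

    base_pres = pl.lit(_base("president_comp"), dtype=pl.Float64)
    base_vp = pl.lit(_base("vp_adv_comp"), dtype=pl.Float64)
    base_gifts = float(merged["total_private_gifts"][0])
    merged = merged.with_columns(
        (pl.col("president_comp").cast(pl.Float64) * 100.0 / base_pres).round(1).alias("president_index"),
        (pl.col("vp_adv_comp").cast(pl.Float64) * 100.0 / base_vp).round(1).alias("vp_adv_index"),
        (pl.col("total_private_gifts").cast(pl.Float64) * 100.0 / base_gifts).round(1).alias("gifts_index"),
    )
    return merged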
def query_admin_headcount(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
    """All scraped admin headcount entries."""
    return conn.execute("""

@ -100,6 +100,21 @@ TABLES = {
            other_compensation BIGINT
        )
    """,
    "raw_ipeds_endowment": """
        CREATE TABLE IF NOT EXISTS raw_ipeds_endowment (
            unitid INTEGER NOT NULL,
            year INTEGER NOT NULL,
            endowment_boy BIGINT,
            endowment_eoy BIGINT,
            new_gifts BIGINT,
            net_investment_return BIGINT,
            other_changes BIGINT,
            total_private_gifts BIGINT,
            total_investment_return BIGINT,
            long_term_investments BIGINT,
            PRIMARY KEY (unitid, year)
        )
    """,
    "raw_cpi_u": """
        CREATE TABLE IF NOT EXISTS raw_cpi_u (
            year INTEGER NOT NULL,

@ -40,6 +40,25 @@ F2_COLUMN_VARIANTS = {
    "benefits": ["F2E133"],
}

# F2 endowment / philanthropy fields
F2_ENDOWMENT_VARIANTS = {
    "unitid": ["UNITID"],
    "endowment_boy": ["F2H01"],
    "endowment_eoy": ["F2H02"],
    "new_gifts": ["F2H03A"],
    "net_investment_return": ["F2H03B"],
    "other_changes": ["F2H03D"],
    "total_private_gifts": ["F2D08"],
    "total_investment_return": ["F2D10"],
    "long_term_investments": ["F2A01"],
}

ENDOWMENT_COLUMNS = [
    "unitid", "year", "endowment_boy", "endowment_eoy", "new_gifts",
    "net_investment_return", "other_changes", "total_private_gifts",
    "total_investment_return", "long_term_investments",
]

CANONICAL_COLUMNS = [
    "unitid", "year", "reporting_standard", "total_expenses",
    "instruction_expenses", "research_expenses", "public_service_expenses",
@ -56,7 +75,7 @@ def _find_csv(component_dir: Path) -> Path | None:
def _resolve_columns(df: pl.DataFrame, variants: dict) -> dict[str, str]:
    """For each canonical name, find the first matching column."""
-    upper_cols = {c.upper(): c for c in df.columns}
+    upper_cols = {c.strip().upper(): c for c in df.columns}
    resolved = {}
    for canonical, candidates in variants.items():
        for var in candidates:
@ -140,3 +159,49 @@ def load_finance(
print(f" No finance CSV found for {year}, skipping")
return total
def load_endowment(
    conn: duckdb.DuckDBPyConnection,
    year_range: range,
    unitid_filter: int | None = UD_UNITID,
) -> int:
    """Load IPEDS F2 endowment and philanthropy data into raw_ipeds_endowment."""
    total = 0
    for year in year_range:
        f2_dir = config.IPEDS_DATA_DIR / "finance_f2" / str(year)
        csv_path = _find_csv(f2_dir)
        if csv_path is None:
            continue
        df = pl.read_csv(csv_path, infer_schema_length=0, encoding="utf8-lossy")
        col_map = _resolve_columns(df, F2_ENDOWMENT_VARIANTS)
        if "unitid" not in col_map:
            continue
        result = pl.DataFrame({
            canonical: df[actual] for canonical, actual in col_map.items()
        })
        result = result.with_columns(pl.lit(year).alias("year"))
        for col in ENDOWMENT_COLUMNS:
            if col not in result.columns:
                result = result.with_columns(pl.lit(None).alias(col))
            elif col not in ("year",):
                result = result.with_columns(pl.col(col).cast(pl.Int64, strict=False))
        if unitid_filter is not None:
            result = result.filter(pl.col("unitid") == unitid_filter)
        if result.height == 0:
            continue
        result = result.select(ENDOWMENT_COLUMNS)
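        # Replace only the rows being reloaded; an unscoped DELETE would also
        # drop other institutions' rows for this year.
        if unitid_filter is None:
            conn.execute("DELETE FROM raw_ipeds_endowment WHERE year = ?", [year])
        else:
            conn.execute(
                "DELETE FROM raw_ipeds_endowment WHERE year = ? AND unitid = ?",
                [year, unitid_filter],
            )
        conn.register("_tmp_endow", result.to_arrow())
        conn.execute("INSERT INTO raw_ipeds_endowment SELECT * FROM _tmp_endow")
        conn.unregister("_tmp_endow")
        total += result.height
    return total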
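The loader follows a delete-then-insert pattern so that re-running an ingest for the same year does not violate the `(unitid, year)` primary key. The sketch below illustrates that pattern with the delete scoped to one institution. It uses `sqlite3` from the standard library as a stand-in for DuckDB so that it runs without extra dependencies; the `endowment` table, the `reload_year` helper, and the second unitid are hypothetical.

```python
import sqlite3


def reload_year(conn, rows, year, unitid=None):
    """Replace one year's rows (optionally one institution's) in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        if unitid is None:
            conn.execute("DELETE FROM endowment WHERE year = ?", (year,))
        else:
            conn.execute(
                "DELETE FROM endowment WHERE year = ? AND unitid = ?", (year, unitid)
            )
        conn.executemany(
            "INSERT INTO endowment (unitid, year, endowment_eoy) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE endowment (unitid INTEGER, year INTEGER, endowment_eoy INTEGER, "
    "PRIMARY KEY (unitid, year))"
)
# Illustrative values: 130943 is the UD UNITID from config; 999999 is made up.
conn.execute("INSERT INTO endowment VALUES (999999, 2022, 5)")
reload_year(conn, [(130943, 2022, 10)], 2022, unitid=130943)
reload_year(conn, [(130943, 2022, 12)], 2022, unitid=130943)  # rerun, no duplicate
remaining = conn.execute(
    "SELECT unitid, endowment_eoy FROM endowment ORDER BY unitid"
).fetchall()
```

Wrapping the delete and insert in one transaction means a failed insert does not leave the year empty. Whether the DuckDB loader needs the same guarantee depends on how ingest failures are handled elsewhere.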

@ -10,9 +10,10 @@ TITLE_PATTERNS: list[tuple[str, re.Pattern]] = [
    ("VP_FINANCE", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:financ|budget|business|admin)|\b(?:financ|budget|business|admin).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
    ("VP_RESEARCH", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bresearch|\bresearch.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
    ("VP_STUDENT_AFFAIRS", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bstudent|\bstudent.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
-    ("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|development|giving|fundrais)|\b(?:advancement|development|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
+    ("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|develop|alumni|giving|fundrais)|\b(?:advancement|develop|alumni|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
    ("VP_OTHER", re.compile(r"\bv\.?p\.?\b|\bvice\s+president\b", re.I)),
    ("CFO", re.compile(r"\b(cfo|chief\s+financial)\b", re.I)),
+    ("CHIEF_INVESTMENT_OFFICER", re.compile(r"\bchief\s+investment\b", re.I)),
    ("CIO", re.compile(r"\b(cio|chief\s+information)\b", re.I)),
    ("COO", re.compile(r"\b(coo|chief\s+operating)\b", re.I)),
    ("GENERAL_COUNSEL", re.compile(r"\b(general\s+counsel|chief\s+legal)\b", re.I)),
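Since `TITLE_PATTERNS` is an ordered list, the position of the new `CHIEF_INVESTMENT_OFFICER` entry may matter. The sketch below assumes `normalize_title` returns the label of the first matching pattern; that function is not shown in this diff, so `classify` is a hypothetical stand-in. The three patterns are copied from the list above in the same relative order.

```python
import re

# Subset of TITLE_PATTERNS, same relative order as in the diff.
PATTERNS = [
    ("VP_OTHER", re.compile(r"\bv\.?p\.?\b|\bvice\s+president\b", re.I)),
    ("CHIEF_INVESTMENT_OFFICER", re.compile(r"\bchief\s+investment\b", re.I)),
    ("CIO", re.compile(r"\b(cio|chief\s+information)\b", re.I)),
]


def classify(title: str) -> str:
    """Return the label of the first matching pattern (assumed behavior)."""
    for label, pattern in PATTERNS:
        if pattern.search(title):
            return label
    return "OTHER"


plain = classify("Chief Investment Officer")
combined = classify("Vice President and Chief Investment Officer")
info = classify("Chief Information Officer")
```

Under that first-match assumption, a combined title such as "Vice President and Chief Investment Officer" resolves to `VP_OTHER`, so it would not appear in a query that filters on `CHIEF_INVESTMENT_OFFICER`. If such titles occur in the Schedule J data, moving the new pattern above the VP entries would be one way to handle them.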

@ -16,6 +16,7 @@ KEY_COLUMNS: dict[str, list[str]] = {
    "raw_990_filing": ["ein", "tax_year", "total_revenue", "total_expenses"],
    "raw_990_part_vii": ["ein", "tax_year", "person_name", "reportable_comp_from_org"],
    "raw_990_schedule_j": ["ein", "tax_year", "person_name", "total_compensation"],
    "raw_ipeds_endowment": ["unitid", "year", "endowment_eoy"],
    "raw_cpi_u": ["year", "month", "value"],
    "raw_admin_headcount": ["unit", "person_name", "category"],
}
@ -29,6 +30,7 @@ YEAR_COLUMN: dict[str, str] = {
    "raw_990_filing": "tax_year",
    "raw_990_part_vii": "tax_year",
    "raw_990_schedule_j": "tax_year",
    "raw_ipeds_endowment": "year",
    "raw_cpi_u": "year",
}