diff --git a/README.md b/README.md index c7734fb..7ccc269 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,26 @@ University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis. +## Scope + +This project is currently scoped to the **University of Delaware** as a single institution. It tracks: + +- **Executive compensation** from IRS 990 Schedule J filings by the University of Delaware (EIN 516000297) and UD Research Foundation (EIN 516017306) +- **Administrative cost ratios** from IPEDS finance surveys (expenses by function, staffing levels, enrollment) +- **Endowment performance** and **philanthropic giving** from IPEDS F2 (FASB) financial data +- **Administrative headcount** via web scraping, currently focused on the **College of Engineering line management** (COE Central, department offices) and the Provost's Office + +### Changing the target institution + +The institution scope is controlled by constants in `src/admin_analytics/config.py`: + +- `UD_UNITID = 130943` -- IPEDS institution identifier. Change this to target a different institution. Look up UNITIDs at the [IPEDS Data Center](https://nces.ed.gov/ipeds/use-the-data). +- `UD_EINS = [516000297, 516017306]` -- IRS Employer Identification Numbers for 990 filings. Update these to the EINs of the target institution's nonprofit entities. + +All IPEDS loaders accept a `unitid_filter` parameter. The scraper URLs in `src/admin_analytics/scraper/directory.py` are UD-specific and would need to be updated for a different institution. + +Multi-institution comparisons (AAU peers, Carnegie peers) are planned for a future phase. + ## Prerequisites - Python 3.11+ @@ -49,12 +69,15 @@ Opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change The dashboard must be restarted to pick up newly ingested data (DuckDB opens in read-only mode to avoid lock conflicts). 
-The dashboard has four tabs: +The dashboard has seven tabs: -- **Executive Compensation** -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023) +- **Executive Compensation** -- top earners from IRS 990 Schedule J, President and top-10 CAGR, trends by role, compensation breakdown by component, growth vs CPI-U (2015-2023) - **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024) - **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed) +- **Endowment** -- endowment value trends, CAGR, investment return rate, CIO compensation vs endowment growth (IPEDS F2) +- **Philanthropy** -- total private gifts and grants, gift allocation, President and VP Development compensation growth vs fundraising (IPEDS F2 and IRS 990) - **Current Headcount** -- scraped UD staff directory data with overhead/non-overhead classification by unit +- **About** -- data sources, methodology, and limitations ## Validating Data diff --git a/docs/data_dictionary.md b/docs/data_dictionary.md index 4aae435..e5216bf 100644 --- a/docs/data_dictionary.md +++ b/docs/data_dictionary.md @@ -147,6 +147,26 @@ Raw data layer for University of Delaware administrative analytics. All tables a | value | DOUBLE | CPI-U index value (base period: 1982-84 = 100) | | series_id | VARCHAR | BLS series identifier (always CUUR0000SA0) | +### raw_ipeds_endowment + +**Source:** IPEDS F2 (FASB) finance survey — endowment and investment sections +**Granularity:** One row per institution per year +**Primary Key:** (unitid, year) +**Note:** Endowment fields (F2H*) are available for all years 2005-2023. 
+ +| Column | Type | Description | Source field | +|--------|------|-------------|-------------| +| unitid | INTEGER | IPEDS institution identifier | UNITID | +| year | INTEGER | Fiscal year | derived from filename | +| endowment_boy | BIGINT | Endowment value, beginning of fiscal year | F2H01 | +| endowment_eoy | BIGINT | Endowment value, end of fiscal year | F2H02 | +| new_gifts | BIGINT | New gifts and additions to endowment | F2H03A | +| net_investment_return | BIGINT | Net investment return on endowment | F2H03B | +| other_changes | BIGINT | Other changes in endowment value | F2H03D | +| total_private_gifts | BIGINT | Total private gifts, grants, and contracts | F2D08 | +| total_investment_return | BIGINT | Total investment return (all funds) | F2D10 | +| long_term_investments | BIGINT | Long-term investments (balance sheet) | F2A01 | + ### raw_admin_headcount **Source:** Web scraping of UD staff directory pages @@ -170,4 +190,5 @@ Raw data layer for University of Delaware administrative analytics. All tables a - **IRS 990 tables** are linked by `object_id` (filing) and `ein` (organization) - **IPEDS → IRS 990:** The `ein` field in `raw_institution` links to `ein` in 990 tables. 
UD Foundation EINs: 516000297, 516017306 - **CPI-U** is used for inflation adjustment — join on `year` (and optionally `month`) to any table with a year column +- **Endowment** data comes from IPEDS F2 endowment section; 990 `total_assets` provides a cross-check - **Admin headcount** links to IPEDS via institutional context (UD only in first iteration) diff --git a/src/admin_analytics/cli.py b/src/admin_analytics/cli.py index 6b7342c..8d550c8 100644 --- a/src/admin_analytics/cli.py +++ b/src/admin_analytics/cli.py @@ -39,7 +39,7 @@ def ipeds( from admin_analytics.ipeds.download import download_all from admin_analytics.ipeds.institution import load_institutions - from admin_analytics.ipeds.finance import load_finance + from admin_analytics.ipeds.finance import load_finance, load_endowment from admin_analytics.ipeds.staff import load_staff from admin_analytics.ipeds.enrollment import load_enrollment @@ -62,8 +62,10 @@ def ipeds( load_institutions(conn, years) if "finance" in components: - typer.echo("Loading finance data (F1A)...") + typer.echo("Loading finance data (F1A/F2)...") load_finance(conn, years) + typer.echo("Loading endowment data (F2)...") + load_endowment(conn, years) if "staff" in components: typer.echo("Loading staff data (S)...") diff --git a/src/admin_analytics/dashboard/app.py b/src/admin_analytics/dashboard/app.py index 487f321..55e6909 100644 --- a/src/admin_analytics/dashboard/app.py +++ b/src/admin_analytics/dashboard/app.py @@ -4,7 +4,9 @@ import dash from dash import dcc, html, Input, Output from admin_analytics.db.connection import get_connection -from admin_analytics.dashboard.pages import overview, compensation, staffing, headcount +from admin_analytics.dashboard.pages import ( + overview, compensation, staffing, headcount, endowment, philanthropy, about, +) def create_app() -> dash.Dash: @@ -25,7 +27,10 @@ def create_app() -> dash.Dash: dcc.Tab(label="Executive Compensation", value="compensation"), dcc.Tab(label="Admin Cost Overview", 
value="overview"), dcc.Tab(label="Staffing & Enrollment", value="staffing"), + dcc.Tab(label="Endowment", value="endowment"), + dcc.Tab(label="Philanthropy", value="philanthropy"), dcc.Tab(label="Current Headcount", value="headcount"), + dcc.Tab(label="About", value="about"), ], style={"marginBottom": "20px"}, ), @@ -42,8 +47,14 @@ def create_app() -> dash.Dash: return compensation.layout(conn) elif tab == "staffing": return staffing.layout(conn) + elif tab == "endowment": + return endowment.layout(conn) + elif tab == "philanthropy": + return philanthropy.layout(conn) elif tab == "headcount": return headcount.layout(conn) + elif tab == "about": + return about.layout() return html.Div("Unknown tab") compensation.register_callbacks(app, conn) diff --git a/src/admin_analytics/dashboard/pages/about.py b/src/admin_analytics/dashboard/pages/about.py new file mode 100644 index 0000000..e3ade3f --- /dev/null +++ b/src/admin_analytics/dashboard/pages/about.py @@ -0,0 +1,85 @@ +"""Page: About — data sources, methodology, and limitations.""" + +from dash import html, dcc + + +def layout(_conn=None): + return html.Div([ + dcc.Markdown(""" +## About This Dashboard + +This dashboard provides administrative cost benchmarking analytics for the +**University of Delaware** using exclusively **publicly available data**. No +internal university financial systems, personnel records, or confidential data +were used. + +### Data Sources + +All data is drawn from public, open-access sources: + +| Source | Publisher | What We Use | Coverage | +|--------|-----------|-------------|----------| +| **IPEDS** | U.S. Dept. 
of Education, NCES | Institutional directory, expenses by function, staffing by occupation, enrollment, endowment, philanthropic gifts | 2005-2024 | +| **IRS Form 990** | Internal Revenue Service | Executive compensation (Schedule J), filing financials for UD and UD Research Foundation | Tax years 2015-2023 | +| **BLS CPI-U** | Bureau of Labor Statistics | Consumer Price Index for inflation adjustment (series CUUR0000SA0) | Full history | +| **UD Staff Directories** | University of Delaware public web pages | Administrative office headcounts (College of Engineering line management, Provost's Office) | Current snapshot | + +### Methodology + +**Executive Compensation** is extracted from IRS Form 990 Schedule J, which +reports detailed compensation for officers, directors, trustees, and key +employees of tax-exempt organizations. The University of Delaware (EIN +516000297) and UD Research Foundation (EIN 516017306) are the filing entities. +Titles are normalized to canonical roles (President, Provost, VP Finance, etc.) +using pattern matching. CAGR is computed as compound annual growth rate from +first to last available year. + +**Administrative Cost Ratios** use IPEDS finance survey data. "Institutional +support" is the IPEDS functional expense category that most closely +corresponds to administrative overhead. The admin-to-faculty ratio uses IPEDS +occupational categories: OCCUPCAT 200 (instructional, research, and public +service staff) for faculty and OCCUPCAT 300 (management) for administration. + +**Endowment Performance** uses IPEDS F2 (FASB) survey fields for beginning and +end-of-year endowment values, net investment return, and new gifts. The +endowment CAGR reflects total value growth including investment returns, new +gifts, and spending draws. The CIO compensation comparison uses the Chief +Investment Officer's Schedule J total compensation indexed against endowment +value. 
Note: the detailed endowment breakdown (investment return, new gifts, +other changes) is only available from IPEDS starting in the 2020 reporting +year. For 2005-2019, only beginning and end-of-year values are reported. + +**Philanthropic Giving** uses IPEDS F2 total private gifts, grants, and +contracts. The compensation-vs-giving comparison indexes the President and VP +of Development compensation against total philanthropic revenue. + +**Inflation Adjustment** uses the BLS CPI-U annual average (all items, U.S. +city average, not seasonally adjusted). CPI-adjusted values are expressed in +the most recent available year's dollars. + +**Staffing** uses IPEDS Fall Staff survey occupational categories for full-time +employees only (FTPT=2). + +### Limitations + +- **IRS 990 coverage** depends on e-file availability. Not all years may have + filings for all entities, and XML schema variations across years can cause + individual fields to be missing. +- **IPEDS data** has a reporting lag; the most recent fiscal year may not yet + be available. +- **Endowment CAGR** reflects net growth after all inflows and outflows, not + pure investment return. It is not directly comparable to an investment + benchmark. +- **Title normalization** uses pattern matching and may misclassify titles that + don't follow common naming conventions. +- **Admin headcount** from web scraping is a point-in-time snapshot and is + limited to the pages currently targeted (College of Engineering and + Provost's Office). +- **Single institution** — this prototype covers the University of Delaware + only. Peer comparisons are planned for a future phase. + +### License + +This project is released under the MIT License. Copyright (c) 2026 Eric Furst. 
+ """), + ], style={"maxWidth": "900px", "margin": "0 auto", "lineHeight": "1.6"}) diff --git a/src/admin_analytics/dashboard/pages/compensation.py b/src/admin_analytics/dashboard/pages/compensation.py index 45506dc..922fc51 100644 --- a/src/admin_analytics/dashboard/pages/compensation.py +++ b/src/admin_analytics/dashboard/pages/compensation.py @@ -10,6 +10,9 @@ from admin_analytics.dashboard.queries import ( query_top_earners, query_comp_by_role, query_comp_vs_cpi, + query_comp_cagr, + query_aggregate_comp, + query_aggregate_comp_cagr, ) _NO_DATA = html.Div( @@ -21,6 +24,24 @@ _NO_DATA = html.Div( _KEY_ROLES = ["PRESIDENT", "PROVOST", "VP_FINANCE", "VP_RESEARCH", "VP_ADVANCEMENT", "CFO"] +def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div: + return html.Div( + [ + html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}), + html.H2(value, style={"margin": "5px 0", "color": "#00539F"}), + html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}), + ], + style={ + "flex": "1", + "padding": "20px", + "backgroundColor": "#f8f9fa", + "borderRadius": "8px", + "textAlign": "center", + "margin": "0 8px", + }, + ) + + def layout(conn: duckdb.DuckDBPyConnection): all_earners = query_top_earners(conn) if all_earners.height == 0: @@ -31,6 +52,34 @@ def layout(conn: duckdb.DuckDBPyConnection): {"label": str(y), "value": y} for y in years ] + # KPI cards + cagr = query_comp_cagr(conn) + agg_cagr = query_aggregate_comp_cagr(conn) + kpi_cards = [] + if cagr: + kpi_cards.append(_kpi_card( + "President Compensation", + f"${cagr['end_comp']:,}", + f"Tax year {cagr['end_year']}", + )) + kpi_cards.append(_kpi_card( + "President CAGR", + f"{cagr['cagr_pct']}%", + f"Annualized growth, {cagr['start_year']}-{cagr['end_year']}", + )) + if agg_cagr: + kpi_cards.append(_kpi_card( + "Top-10 Total Compensation", + f"${agg_cagr['end_comp']:,}", + f"Tax year {agg_cagr['end_year']}", + )) + kpi_cards.append(_kpi_card( + "Top-10 CAGR", + 
f"{agg_cagr['cagr_pct']}%", + f"Annualized growth, {agg_cagr['start_year']}-{agg_cagr['end_year']}", + )) + kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"}) if kpi_cards else html.Div() + # Compensation by role trend role_df = query_comp_by_role(conn) role_fig = go.Figure() @@ -61,18 +110,24 @@ def layout(conn: duckdb.DuckDBPyConnection): mode="lines+markers", name="Top Compensation", line={"color": "#00539F"}, )) + cpi_fig.add_trace(go.Scatter( + x=cpi_pd["year"], y=cpi_pd["agg_index"], + mode="lines+markers", name="Top-10 Aggregate", + line={"color": "#E07A5F"}, + )) cpi_fig.add_trace(go.Scatter( x=cpi_pd["year"], y=cpi_pd["cpi_index"], mode="lines+markers", name="CPI-U", line={"color": "#FFD200", "dash": "dash"}, )) cpi_fig.update_layout( - title="Top Compensation vs CPI-U (Indexed, Base Year = 100)", + title="Compensation vs CPI-U (Indexed, Base Year = 100)", xaxis_title="Year", yaxis_title="Index", template="plotly_white", height=380, ) return html.Div([ + kpi_row, html.Div( [ html.Label("Filter by Tax Year: ", style={"fontWeight": "bold"}), @@ -136,8 +191,17 @@ def register_callbacks(app: dash.Dash, conn: duckdb.DuckDBPyConnection) -> None: breakdown_fig = go.Figure() if earners.height > 0: ep = earners.to_pandas().head(10) # top 10 by total comp - short_names = [n.split(",")[0][:20] if "," in n else n.split()[-1][:20] - for n in ep["person_name"]] + _SUFFIXES = {"JR", "SR", "II", "III", "IV", "JR.", "SR."} + + def _short_name(n): + if "," in n: + return n.split(",")[0][:20] + parts = n.split() + while len(parts) > 1 and parts[-1].upper().rstrip(".") in _SUFFIXES: + parts.pop() + return parts[-1][:20] if parts else n[:20] + + short_names = [_short_name(n) for n in ep["person_name"]] for comp_type, label, color in [ ("base_compensation", "Base", "#00539F"), ("bonus_compensation", "Bonus", "#FFD200"), diff --git a/src/admin_analytics/dashboard/pages/endowment.py b/src/admin_analytics/dashboard/pages/endowment.py new file mode 
100644 index 0000000..1331fca --- /dev/null +++ b/src/admin_analytics/dashboard/pages/endowment.py @@ -0,0 +1,190 @@ +"""Page: Endowment Performance.""" + +import duckdb +from dash import html, dcc +import plotly.graph_objects as go + +from admin_analytics.dashboard.queries import ( + query_endowment, query_endowment_per_student, query_cio_vs_endowment, +) + +_NO_DATA = html.Div( + "No endowment data loaded. Run: admin-analytics ingest ipeds --component finance", + style={"textAlign": "center", "padding": "40px", "color": "#888"}, +) + + +def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div: + return html.Div( + [ + html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}), + html.H2(value, style={"margin": "5px 0", "color": "#00539F"}), + html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}), + ], + style={ + "flex": "1", + "padding": "20px", + "backgroundColor": "#f8f9fa", + "borderRadius": "8px", + "textAlign": "center", + "margin": "0 8px", + }, + ) + + +def layout(conn: duckdb.DuckDBPyConnection): + df = query_endowment(conn) + if df.height == 0: + return _NO_DATA + + pd = df.to_pandas() + latest = pd.iloc[-1] + + # Endowment CAGR from first to last year with data + valid = pd.dropna(subset=["endowment_eoy"]) + if len(valid) >= 2: + first = valid.iloc[0] + last = valid.iloc[-1] + start_year = int(first["year"]) + end_year = int(last["year"]) + n_years = end_year - start_year + start_val = float(first["endowment_eoy"]) + end_val = float(last["endowment_eoy"]) + endow_cagr = round(((end_val / start_val) ** (1.0 / n_years) - 1) * 100, 1) if n_years > 0 and start_val > 0 else None + else: + start_year = end_year = None + endow_cagr = None + + # Endowment per student + eps_df = query_endowment_per_student(conn) + eps_pd = eps_df.to_pandas() if eps_df.height > 0 else None + latest_eps = eps_pd.iloc[-1] if eps_pd is not None and len(eps_pd) > 0 else None + + # KPI cards — single row + kpi_cards = [ + _kpi_card( 
+ "Endowment Value", + f"${latest['endowment_eoy'] / 1e9:.2f}B" if latest["endowment_eoy"] else "N/A", + f"End of FY {int(latest['year'])}", + ), + _kpi_card( + "Endowment CAGR", + f"{endow_cagr}%" if endow_cagr is not None else "N/A", + f"FY {start_year}-{end_year}" if start_year else "", + ), + ] + if latest_eps is not None and latest_eps["endowment_per_student"]: + kpi_cards.append(_kpi_card( + "Endowment per Student", + f"${int(latest_eps['endowment_per_student']):,}", + f"FY {int(latest_eps['year'])}", + )) + kpi_cards.append(_kpi_card( + "New Gifts to Endowment", + f"${latest['new_gifts'] / 1e6:.1f}M" if latest["new_gifts"] else "N/A", + f"FY {int(latest['year'])}", + )) + kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"}) + + # Endowment value trend + value_fig = go.Figure() + value_fig.add_trace(go.Scatter( + x=pd["year"], y=pd["endowment_eoy"] / 1e9, + mode="lines+markers", name="End-of-Year Value", + line={"color": "#00539F"}, + fill="tozeroy", fillcolor="rgba(0,83,159,0.1)", + )) + value_fig.update_layout( + title="Endowment Value Over Time", + xaxis_title="Year", yaxis_title="Billions $", + template="plotly_white", height=400, + ) + + # Investment return and new gifts bar chart + components_fig = go.Figure() + components_fig.add_trace(go.Bar( + x=pd["year"], y=pd["net_investment_return"] / 1e6, + name="Net Investment Return", + marker_color="#7FB069", + )) + components_fig.add_trace(go.Bar( + x=pd["year"], y=pd["new_gifts"] / 1e6, + name="New Gifts to Endowment", + marker_color="#00539F", + )) + if "other_changes" in pd.columns: + components_fig.add_trace(go.Bar( + x=pd["year"], y=pd["other_changes"] / 1e6, + name="Other Changes", + marker_color="#999", + )) + components_fig.update_layout( + title="Endowment Changes by Component (Millions $)", + xaxis_title="Year", yaxis_title="Millions $", + barmode="group", + template="plotly_white", height=400, + ) + + # Investment return rate + rate_fig = go.Figure() + rates = pd.copy() 
+ rates["return_pct"] = rates["net_investment_return"] * 100 / rates["endowment_boy"] + rate_fig.add_trace(go.Scatter( + x=rates["year"], y=rates["return_pct"], + mode="lines+markers", name="Return %", + line={"color": "#00539F"}, + )) + rate_fig.add_hline(y=0, line_dash="dot", line_color="#ccc") + rate_fig.update_layout( + title="Endowment Net Investment Return Rate (%)", + xaxis_title="Year", yaxis_title="%", + template="plotly_white", height=380, + ) + + # CIO compensation vs endowment growth + cio_df = query_cio_vs_endowment(conn) + cio_fig = go.Figure() + if cio_df.height > 1: + cio_pd = cio_df.to_pandas() + cio_fig.add_trace(go.Scatter( + x=cio_pd["tax_year"], y=cio_pd["cio_index"], + mode="lines+markers", name="CIO Compensation", + line={"color": "#E07A5F"}, + )) + cio_fig.add_trace(go.Scatter( + x=cio_pd["tax_year"], y=cio_pd["endowment_index"], + mode="lines+markers", name="Endowment Value", + line={"color": "#00539F"}, + )) + cio_fig.add_hline(y=100, line_dash="dot", line_color="#ccc") + cio_fig.update_layout( + title="Chief Investment Officer Compensation vs Endowment Growth (Indexed, Base Year = 100)", + xaxis_title="Year", yaxis_title="Index", + template="plotly_white", height=400, + ) + + # Endowment per student trend + eps_fig = go.Figure() + if eps_pd is not None and len(eps_pd) > 0: + eps_fig.add_trace(go.Scatter( + x=eps_pd["year"], y=eps_pd["endowment_per_student"], + mode="lines+markers", name="Endowment per Student", + line={"color": "#00539F"}, + )) + eps_fig.update_layout( + title="Endowment per Student ($)", + xaxis_title="Year", yaxis_title="$", + template="plotly_white", height=380, + ) + + charts = [ + kpi_row, + dcc.Graph(figure=value_fig), + dcc.Graph(figure=eps_fig), + dcc.Graph(figure=components_fig), + dcc.Graph(figure=rate_fig), + ] + if cio_df.height > 1: + charts.append(dcc.Graph(figure=cio_fig)) + + return html.Div(charts) diff --git a/src/admin_analytics/dashboard/pages/philanthropy.py 
b/src/admin_analytics/dashboard/pages/philanthropy.py new file mode 100644 index 0000000..55236e4 --- /dev/null +++ b/src/admin_analytics/dashboard/pages/philanthropy.py @@ -0,0 +1,132 @@ +"""Page: Philanthropic Giving.""" + +import duckdb +from dash import html, dcc +import plotly.graph_objects as go + +from admin_analytics.dashboard.queries import query_philanthropy, query_comp_vs_philanthropy + +_NO_DATA = html.Div( + "No philanthropy data loaded. Run: admin-analytics ingest ipeds --component finance", + style={"textAlign": "center", "padding": "40px", "color": "#888"}, +) + + +def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div: + return html.Div( + [ + html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}), + html.H2(value, style={"margin": "5px 0", "color": "#00539F"}), + html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}), + ], + style={ + "flex": "1", + "padding": "20px", + "backgroundColor": "#f8f9fa", + "borderRadius": "8px", + "textAlign": "center", + "margin": "0 8px", + }, + ) + + +def layout(conn: duckdb.DuckDBPyConnection): + df = query_philanthropy(conn) + if df.height == 0: + return _NO_DATA + + pd = df.to_pandas() + latest = pd.iloc[-1] + + # KPI cards + kpi_row = html.Div( + [ + _kpi_card( + "Total Private Gifts & Grants", + f"${latest['total_private_gifts'] / 1e6:.1f}M" if latest["total_private_gifts"] else "N/A", + f"FY {int(latest['year'])}", + ), + _kpi_card( + "Gifts to Endowment", + f"${latest['endowment_gifts'] / 1e6:.1f}M" if latest["endowment_gifts"] else "N/A", + f"FY {int(latest['year'])}", + ), + ], + style={"display": "flex", "marginBottom": "24px"}, + ) + + # Private gifts trend (nominal and CPI-adjusted) + gifts_fig = go.Figure() + gifts_fig.add_trace(go.Bar( + x=pd["year"], y=pd["total_private_gifts"] / 1e6, + name="Nominal", + marker_color="#00539F", + )) + if "gifts_cpi_adjusted" in pd.columns and pd["gifts_cpi_adjusted"].notna().any(): + 
gifts_fig.add_trace(go.Scatter( + x=pd["year"], y=pd["gifts_cpi_adjusted"] / 1e6, + mode="lines+markers", name="CPI-Adjusted", + line={"color": "#FFD200", "dash": "dash"}, + )) + gifts_fig.update_layout( + title="Total Private Gifts & Grants (Millions $)", + xaxis_title="Year", yaxis_title="Millions $", + template="plotly_white", height=420, + ) + + # Endowment gifts vs total gifts + split_fig = go.Figure() + pd["non_endowment_gifts"] = pd["total_private_gifts"] - pd["endowment_gifts"].fillna(0) + split_fig.add_trace(go.Bar( + x=pd["year"], y=pd["endowment_gifts"] / 1e6, + name="To Endowment", + marker_color="#00539F", + )) + split_fig.add_trace(go.Bar( + x=pd["year"], y=pd["non_endowment_gifts"] / 1e6, + name="Current Use / Other", + marker_color="#7FB069", + )) + split_fig.update_layout( + title="Gift Allocation: Endowment vs Current Use (Millions $)", + xaxis_title="Year", yaxis_title="Millions $", + barmode="stack", + template="plotly_white", height=400, + ) + + # Compensation vs philanthropy indexed chart + cvp_df = query_comp_vs_philanthropy(conn) + cvp_fig = go.Figure() + if cvp_df.height > 1: + cvp_pd = cvp_df.to_pandas() + cvp_fig.add_trace(go.Scatter( + x=cvp_pd["tax_year"], y=cvp_pd["president_index"], + mode="lines+markers", name="President Compensation", + line={"color": "#00539F"}, + )) + cvp_fig.add_trace(go.Scatter( + x=cvp_pd["tax_year"], y=cvp_pd["vp_adv_index"], + mode="lines+markers", name="VP Development Compensation", + line={"color": "#E07A5F"}, + )) + cvp_fig.add_trace(go.Scatter( + x=cvp_pd["tax_year"], y=cvp_pd["gifts_index"], + mode="lines+markers", name="Philanthropic Gifts", + line={"color": "#7FB069"}, + )) + cvp_fig.add_hline(y=100, line_dash="dot", line_color="#ccc") + cvp_fig.update_layout( + title="Compensation Growth vs Philanthropic Giving (Indexed, Base Year = 100)", + xaxis_title="Year", yaxis_title="Index", + template="plotly_white", height=420, + ) + + charts = [ + kpi_row, + dcc.Graph(figure=gifts_fig), + 
dcc.Graph(figure=split_fig), + ] + if cvp_df.height > 1: + charts.append(dcc.Graph(figure=cvp_fig)) + + return html.Div(charts) diff --git a/src/admin_analytics/dashboard/queries.py b/src/admin_analytics/dashboard/queries.py index 8206415..9d81ecc 100644 --- a/src/admin_analytics/dashboard/queries.py +++ b/src/admin_analytics/dashboard/queries.py @@ -96,6 +96,116 @@ def query_admin_faculty_ratio(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: """, [UD_UNITID]).pl() +def query_aggregate_comp(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: + """Top-10 Schedule J compensation per year — total, count, and average.""" + return conn.execute(""" + WITH ranked AS ( + SELECT j.tax_year, j.total_compensation, + j.base_compensation, j.bonus_compensation, + j.deferred_compensation, j.nontaxable_benefits, + j.other_compensation, + ROW_NUMBER() OVER (PARTITION BY j.tax_year + ORDER BY j.total_compensation DESC) AS rn + FROM raw_990_schedule_j j + WHERE j.total_compensation > 0 + ) + SELECT tax_year, + COUNT(*) AS headcount, + SUM(total_compensation) AS total_comp, + ROUND(AVG(total_compensation), 0) AS avg_comp, + SUM(base_compensation) AS total_base, + SUM(bonus_compensation) AS total_bonus, + SUM(deferred_compensation) AS total_deferred, + SUM(nontaxable_benefits) AS total_benefits, + SUM(other_compensation) AS total_other + FROM ranked + WHERE rn <= 10 + GROUP BY tax_year + ORDER BY tax_year + """).pl() + + +def query_aggregate_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None: + """CAGR of aggregate Schedule J compensation over the last 5 years of data.""" + df = query_aggregate_comp(conn) + if df.height < 2: + return None + + # Use last 5 years of available data + df = df.tail(min(5, df.height)) + + start_year = df["tax_year"][0] + end_year = df["tax_year"][-1] + start_comp = float(df["total_comp"][0]) + end_comp = float(df["total_comp"][-1]) + n_years = end_year - start_year + + if n_years <= 0 or start_comp <= 0: + return None + + cagr = ((end_comp / 
start_comp) ** (1.0 / n_years) - 1) * 100 + return { + "cagr_pct": round(cagr, 1), + "start_year": start_year, + "end_year": end_year, + "start_comp": int(start_comp), + "end_comp": int(end_comp), + } + + +def query_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None: + """Annualized growth rate (CAGR) of President compensation. + + Tracks the President role specifically using title normalization. + Returns dict with cagr_pct, start_year, end_year, start_comp, end_comp, + or None if insufficient data. + """ + raw = conn.execute(""" + SELECT j.tax_year, j.title, j.total_compensation + FROM raw_990_schedule_j j + WHERE j.total_compensation > 0 + ORDER BY j.tax_year + """).pl() + + if raw.height == 0: + return None + + raw = raw.with_columns( + pl.col("title").map_elements( + normalize_title, return_dtype=pl.Utf8 + ).alias("role") + ) + + df = ( + raw.filter(pl.col("role") == "PRESIDENT") + .group_by("tax_year") + .agg(pl.col("total_compensation").max().alias("top_comp")) + .sort("tax_year") + ) + + if df.height < 2: + return None + + start_year = df["tax_year"][0] + end_year = df["tax_year"][-1] + start_comp = df["top_comp"][0] + end_comp = df["top_comp"][-1] + n_years = end_year - start_year + + if n_years <= 0 or start_comp <= 0: + return None + + cagr = ((end_comp / start_comp) ** (1.0 / n_years) - 1) * 100 + + return { + "cagr_pct": round(cagr, 1), + "start_year": start_year, + "end_year": end_year, + "start_comp": start_comp, + "end_comp": end_comp, + } + + def query_top_earners( conn: duckdb.DuckDBPyConnection, year: int | None = None ) -> pl.DataFrame: @@ -162,11 +272,23 @@ def query_comp_by_role(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: - """Compensation growth vs CPI growth, indexed to first available year = 100.""" + """Compensation growth vs CPI growth, indexed to first available year = 100. + + Includes top earner, top-10 aggregate, and CPI-U. 
+ """ return conn.execute(""" - WITH yearly_max_comp AS ( - SELECT tax_year, MAX(total_compensation) AS top_comp + WITH ranked AS ( + SELECT tax_year, total_compensation, + ROW_NUMBER() OVER (PARTITION BY tax_year + ORDER BY total_compensation DESC) AS rn FROM raw_990_schedule_j + WHERE total_compensation > 0 + ), + yearly_comp AS ( + SELECT tax_year, + MAX(total_compensation) AS top_comp, + SUM(CASE WHEN rn <= 10 THEN total_compensation END) AS agg_comp + FROM ranked GROUP BY tax_year ), annual_cpi AS ( @@ -174,20 +296,24 @@ def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: FROM raw_cpi_u GROUP BY year ), base AS ( - SELECT c.top_comp AS base_comp, ac.avg_cpi AS base_cpi - FROM yearly_max_comp c + SELECT c.top_comp AS base_top, c.agg_comp AS base_agg, + ac.avg_cpi AS base_cpi + FROM yearly_comp c JOIN annual_cpi ac ON ac.year = c.tax_year ORDER BY c.tax_year LIMIT 1 ) SELECT c.tax_year AS year, c.top_comp, + c.agg_comp, ac.avg_cpi, - ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_comp FROM base), 0), 1) + ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_top FROM base), 0), 1) AS comp_index, + ROUND(c.agg_comp * 100.0 / NULLIF((SELECT base_agg FROM base), 0), 1) + AS agg_index, ROUND(ac.avg_cpi * 100.0 / NULLIF((SELECT base_cpi FROM base), 0), 1) AS cpi_index - FROM yearly_max_comp c + FROM yearly_comp c JOIN annual_cpi ac ON ac.year = c.tax_year ORDER BY year """).pl() @@ -249,6 +375,166 @@ def query_growth_index(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: """, [UD_UNITID, UD_UNITID]).pl() +def query_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: + """Endowment performance over time.""" + return conn.execute(""" + SELECT year, endowment_boy, endowment_eoy, new_gifts, + net_investment_return, other_changes, long_term_investments + FROM raw_ipeds_endowment + WHERE unitid = ? 
+ ORDER BY year + """, [UD_UNITID]).pl() + + +def query_endowment_per_student(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: + """Endowment value per student over time.""" + return conn.execute(""" + SELECT e.year, e.endowment_eoy, en.total_enrollment, + ROUND(e.endowment_eoy * 1.0 / NULLIF(en.total_enrollment, 0), 0) + AS endowment_per_student + FROM raw_ipeds_endowment e + JOIN raw_ipeds_enrollment en ON en.unitid = e.unitid AND en.year = e.year + WHERE e.unitid = ? + ORDER BY e.year + """, [UD_UNITID]).pl() + + +def query_cio_vs_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame: + """Chief Investment Officer compensation vs endowment growth, indexed.""" + raw = conn.execute(""" + SELECT j.tax_year, j.title, j.total_compensation + FROM raw_990_schedule_j j + WHERE j.total_compensation > 0 + """).pl() + + if raw.height == 0: + return pl.DataFrame() + + raw = raw.with_columns( + pl.col("title").map_elements( + normalize_title, return_dtype=pl.Utf8 + ).alias("role") + ) + + cio = ( + raw.filter(pl.col("role") == "CHIEF_INVESTMENT_OFFICER") + .group_by("tax_year") + .agg(pl.col("total_compensation").max().alias("cio_comp")) + .sort("tax_year") + ) + + if cio.height == 0: + return pl.DataFrame() + + endow = conn.execute(""" + SELECT year, endowment_eoy + FROM raw_ipeds_endowment + WHERE unitid = ? 
+        ORDER BY year
+    """, [UD_UNITID]).pl()
+
+    merged = (
+        cio.join(endow, left_on="tax_year", right_on="year", how="inner")
+        .drop_nulls(subset=["cio_comp", "endowment_eoy"])
+        .sort("tax_year")
+    )
+
+    if merged.height < 2:
+        return merged
+
+    base_comp = float(merged["cio_comp"][0])
+    base_endow = float(merged["endowment_eoy"][0])
+
+    merged = merged.with_columns(
+        (pl.col("cio_comp").cast(pl.Float64) * 100.0 / base_comp).round(1).alias("cio_index"),
+        (pl.col("endowment_eoy").cast(pl.Float64) * 100.0 / base_endow).round(1).alias("endowment_index"),
+    )
+
+    return merged
+
+
+def query_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
+    """Philanthropic giving over time — IPEDS private gifts + 990 revenue."""
+    return conn.execute(f"""
+        {_CPI_CTE}
+        SELECT e.year, e.total_private_gifts, e.new_gifts AS endowment_gifts,
+               ROUND(e.total_private_gifts * (SELECT avg_cpi FROM latest_cpi)
+                     / ac.avg_cpi, 0) AS gifts_cpi_adjusted
+        FROM raw_ipeds_endowment e
+        LEFT JOIN annual_cpi ac ON ac.year = e.year
+        WHERE e.unitid = ?
+        ORDER BY e.year
+    """, [UD_UNITID]).pl()
+
+
+def query_comp_vs_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
+    """VP Advancement and President comp vs philanthropic gifts, indexed."""
+    raw = conn.execute("""
+        SELECT j.tax_year, j.title, j.total_compensation
+        FROM raw_990_schedule_j j
+        WHERE j.total_compensation > 0
+    """).pl()
+
+    if raw.height == 0:
+        return pl.DataFrame()
+
+    raw = raw.with_columns(
+        pl.col("title").map_elements(
+            normalize_title, return_dtype=pl.Utf8
+        ).alias("role")
+    )
+
+    # Get max comp per role per year for President and VP Advancement
+    roles = raw.filter(pl.col("role").is_in(["PRESIDENT", "VP_ADVANCEMENT"]))
+    if roles.height == 0:
+        return pl.DataFrame()
+
+    pivoted = (
+        roles.group_by(["tax_year", "role"])
+        .agg(pl.col("total_compensation").max().alias("comp"))
+        .sort("tax_year")
+    )
+
+    pres = (
+        pivoted.filter(pl.col("role") == "PRESIDENT")
+        .select(pl.col("tax_year"), pl.col("comp").alias("president_comp"))
+    )
+    vp = (
+        pivoted.filter(pl.col("role") == "VP_ADVANCEMENT")
+        .select(pl.col("tax_year"), pl.col("comp").alias("vp_adv_comp"))
+    )
+
+    gifts = conn.execute("""
+        SELECT year, total_private_gifts
+        FROM raw_ipeds_endowment
+        WHERE unitid = ?
+        ORDER BY year
+    """, [UD_UNITID]).pl()
+
+    # Join all three on year
+    merged = (
+        pres.join(vp, on="tax_year", how="outer_coalesce")
+        .join(gifts, left_on="tax_year", right_on="year", how="inner")
+        .drop_nulls(subset=["total_private_gifts"])
+        .sort("tax_year")
+    )
+
+    if merged.height < 2:
+        return merged
+
+    base_pres = float(merged.drop_nulls("president_comp")["president_comp"][0])
+    base_vp = float(merged.drop_nulls("vp_adv_comp")["vp_adv_comp"][0])
+    base_gifts = float(merged["total_private_gifts"][0])
+
+    merged = merged.with_columns(
+        (pl.col("president_comp").cast(pl.Float64) * 100.0 / base_pres).round(1).alias("president_index"),
+        (pl.col("vp_adv_comp").cast(pl.Float64) * 100.0 / base_vp).round(1).alias("vp_adv_index"),
+        (pl.col("total_private_gifts").cast(pl.Float64) * 100.0 / base_gifts).round(1).alias("gifts_index"),
+    )
+
+    return merged
+
+
 def query_admin_headcount(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
     """All scraped admin headcount entries."""
     return conn.execute("""
diff --git a/src/admin_analytics/db/schema.py b/src/admin_analytics/db/schema.py
index 4fbea4e..01c9ae3 100644
--- a/src/admin_analytics/db/schema.py
+++ b/src/admin_analytics/db/schema.py
@@ -100,6 +100,21 @@ TABLES = {
             other_compensation BIGINT
         )
     """,
+    "raw_ipeds_endowment": """
+        CREATE TABLE IF NOT EXISTS raw_ipeds_endowment (
+            unitid INTEGER NOT NULL,
+            year INTEGER NOT NULL,
+            endowment_boy BIGINT,
+            endowment_eoy BIGINT,
+            new_gifts BIGINT,
+            net_investment_return BIGINT,
+            other_changes BIGINT,
+            total_private_gifts BIGINT,
+            total_investment_return BIGINT,
+            long_term_investments BIGINT,
+            PRIMARY KEY (unitid, year)
+        )
+    """,
     "raw_cpi_u": """
         CREATE TABLE IF NOT EXISTS raw_cpi_u (
             year INTEGER NOT NULL,
diff --git a/src/admin_analytics/ipeds/finance.py b/src/admin_analytics/ipeds/finance.py
index 3ec718c..dd1bdc2 100644
--- a/src/admin_analytics/ipeds/finance.py
+++ b/src/admin_analytics/ipeds/finance.py
@@ -40,6 +40,25 @@ F2_COLUMN_VARIANTS = {
     "benefits": ["F2E133"],
 }

+# F2 endowment / philanthropy fields
+F2_ENDOWMENT_VARIANTS = {
+    "unitid": ["UNITID"],
+    "endowment_boy": ["F2H01"],
+    "endowment_eoy": ["F2H02"],
+    "new_gifts": ["F2H03A"],
+    "net_investment_return": ["F2H03B"],
+    "other_changes": ["F2H03D"],
+    "total_private_gifts": ["F2D08"],
+    "total_investment_return": ["F2D10"],
+    "long_term_investments": ["F2A01"],
+}
+
+ENDOWMENT_COLUMNS = [
+    "unitid", "year", "endowment_boy", "endowment_eoy", "new_gifts",
+    "net_investment_return", "other_changes", "total_private_gifts",
+    "total_investment_return", "long_term_investments",
+]
+
 CANONICAL_COLUMNS = [
     "unitid", "year", "reporting_standard", "total_expenses",
     "instruction_expenses", "research_expenses", "public_service_expenses",
@@ -56,7 +75,7 @@ def _find_csv(component_dir: Path) -> Path | None:

 def _resolve_columns(df: pl.DataFrame, variants: dict) -> dict[str, str]:
     """For each canonical name, find the first matching column."""
-    upper_cols = {c.upper(): c for c in df.columns}
+    upper_cols = {c.strip().upper(): c for c in df.columns}
     resolved = {}
     for canonical, candidates in variants.items():
         for var in candidates:
@@ -140,3 +159,49 @@ def load_finance(
             print(f" No finance CSV found for {year}, skipping")

     return total
+
+
+def load_endowment(
+    conn: duckdb.DuckDBPyConnection,
+    year_range: range,
+    unitid_filter: int | None = UD_UNITID,
+) -> int:
+    """Load IPEDS F2 endowment and philanthropy data into raw_ipeds_endowment."""
+    total = 0
+    for year in year_range:
+        f2_dir = config.IPEDS_DATA_DIR / "finance_f2" / str(year)
+        csv_path = _find_csv(f2_dir)
+        if csv_path is None:
+            continue
+
+        df = pl.read_csv(csv_path, infer_schema_length=0, encoding="utf8-lossy")
+        col_map = _resolve_columns(df, F2_ENDOWMENT_VARIANTS)
+
+        if "unitid" not in col_map:
+            continue
+
+        result = pl.DataFrame({
+            canonical: df[actual] for canonical, actual in col_map.items()
+        })
+        result = result.with_columns(pl.lit(year).alias("year"))
+
+        for col in ENDOWMENT_COLUMNS:
+            if col not in result.columns:
+                result = result.with_columns(pl.lit(None).alias(col))
+            elif col not in ("year",):
+                result = result.with_columns(pl.col(col).cast(pl.Int64, strict=False))
+
+        if unitid_filter is not None:
+            result = result.filter(pl.col("unitid") == unitid_filter)
+
+        if result.height == 0:
+            continue
+
+        result = result.select(ENDOWMENT_COLUMNS)
+        conn.execute("DELETE FROM raw_ipeds_endowment WHERE year = ?", [year])
+        conn.register("_tmp_endow", result.to_arrow())
+        conn.execute("INSERT INTO raw_ipeds_endowment SELECT * FROM _tmp_endow")
+        conn.unregister("_tmp_endow")
+        total += result.height
+
+    return total
diff --git a/src/admin_analytics/irs990/titles.py b/src/admin_analytics/irs990/titles.py
index 416eced..7adaf55 100644
--- a/src/admin_analytics/irs990/titles.py
+++ b/src/admin_analytics/irs990/titles.py
@@ -10,9 +10,10 @@ TITLE_PATTERNS: list[tuple[str, re.Pattern]] = [
     ("VP_FINANCE", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:financ|budget|business|admin)|\b(?:financ|budget|business|admin).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
     ("VP_RESEARCH", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bresearch|\bresearch.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
     ("VP_STUDENT_AFFAIRS", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bstudent|\bstudent.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
-    ("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|development|giving|fundrais)|\b(?:advancement|development|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
+    ("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|develop|alumni|giving|fundrais)|\b(?:advancement|develop|alumni|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
     ("VP_OTHER", re.compile(r"\bv\.?p\.?\b|\bvice\s+president\b", re.I)),
     ("CFO", re.compile(r"\b(cfo|chief\s+financial)\b", re.I)),
+    ("CHIEF_INVESTMENT_OFFICER", re.compile(r"\bchief\s+investment\b", re.I)),
     ("CIO", re.compile(r"\b(cio|chief\s+information)\b", re.I)),
     ("COO", re.compile(r"\b(coo|chief\s+operating)\b", re.I)),
     ("GENERAL_COUNSEL", re.compile(r"\b(general\s+counsel|chief\s+legal)\b", re.I)),
diff --git a/src/admin_analytics/validation.py b/src/admin_analytics/validation.py
index a68027d..eff1b64 100644
--- a/src/admin_analytics/validation.py
+++ b/src/admin_analytics/validation.py
@@ -16,6 +16,7 @@ KEY_COLUMNS: dict[str, list[str]] = {
     "raw_990_filing": ["ein", "tax_year", "total_revenue", "total_expenses"],
     "raw_990_part_vii": ["ein", "tax_year", "person_name", "reportable_comp_from_org"],
     "raw_990_schedule_j": ["ein", "tax_year", "person_name", "total_compensation"],
+    "raw_ipeds_endowment": ["unitid", "year", "endowment_eoy"],
     "raw_cpi_u": ["year", "month", "value"],
     "raw_admin_headcount": ["unit", "person_name", "category"],
 }
@@ -29,6 +30,7 @@ YEAR_COLUMN: dict[str, str] = {
     "raw_990_filing": "tax_year",
     "raw_990_part_vii": "tax_year",
     "raw_990_schedule_j": "tax_year",
+    "raw_ipeds_endowment": "year",
     "raw_cpi_u": "year",
 }