Compensation, endowment tweaks. Added About.
parent
a41f78545b
commit
13fb4b8418
13 changed files with 914 additions and 17 deletions
27
README.md
|
|
@ -2,6 +2,26 @@
|
|||
|
||||
University of Delaware administrative cost benchmarking using public data (IRS 990, IPEDS, BLS CPI-U). Ingests data into a local DuckDB database and serves an interactive Dash dashboard for analysis.
|
||||
|
||||
## Scope
|
||||
|
||||
This project is currently scoped to the **University of Delaware** as a single institution. It tracks:
|
||||
|
||||
- **Executive compensation** from IRS 990 Schedule J filings by the University of Delaware (EIN 516000297) and UD Research Foundation (EIN 516017306)
|
||||
- **Administrative cost ratios** from IPEDS finance surveys (expenses by function, staffing levels, enrollment)
|
||||
- **Endowment performance** and **philanthropic giving** from IPEDS F2 (FASB) financial data
|
||||
- **Administrative headcount** via web scraping, currently focused on the **College of Engineering line management** (COE Central, department offices) and the Provost's Office
|
||||
|
||||
### Changing the target institution
|
||||
|
||||
The institution scope is controlled by constants in `src/admin_analytics/config.py`:
|
||||
|
||||
- `UD_UNITID = 130943` -- IPEDS institution identifier. Change this to target a different institution. Look up UNITIDs at the [IPEDS Data Center](https://nces.ed.gov/ipeds/use-the-data).
|
||||
- `UD_EINS = [516000297, 516017306]` -- IRS Employer Identification Numbers for 990 filings. Update these to the EINs of the target institution's nonprofit entities.
|
||||
|
||||
All IPEDS loaders accept a `unitid_filter` parameter. The scraper URLs in `src/admin_analytics/scraper/directory.py` are UD-specific and would need to be updated for a different institution.
|
||||
|
||||
Multi-institution comparisons (AAU peers, Carnegie peers) are planned for a future phase.
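As a rough illustration of the `unitid_filter` idea described in this section, the sketch below narrows already-parsed survey rows to one institution. The function name `filter_rows_by_unitid` and the row layout are assumptions made for this example; the project's actual loaders may apply the filter differently.

```python
from __future__ import annotations

from typing import Iterable

UD_UNITID = 130943  # IPEDS identifier quoted in the README


def filter_rows_by_unitid(
    rows: Iterable[dict], unitid_filter: int | None = UD_UNITID
) -> list[dict]:
    """Keep rows whose UNITID matches; pass None to keep every institution."""
    if unitid_filter is None:
        return list(rows)
    return [row for row in rows if int(row["UNITID"]) == unitid_filter]


# Hypothetical rows: 999999 is a made-up UNITID used only for contrast.
sample = [
    {"UNITID": "130943", "F2H02": "1"},
    {"UNITID": "999999", "F2H02": "2"},
]
print(filter_rows_by_unitid(sample))
```

Keeping the filter as a parameter with a default means a future multi-institution phase could pass `None` or another UNITID without touching the loader body.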
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.11+
|
||||
|
|
@ -49,12 +69,15 @@ Opens at [http://localhost:8050](http://localhost:8050). Use `--port` to change
|
|||
|
||||
The dashboard must be restarted to pick up newly ingested data (DuckDB opens in read-only mode to avoid lock conflicts).
|
||||
|
||||
The dashboard has four tabs:
|
||||
The dashboard has seven tabs:
|
||||
|
||||
- **Executive Compensation** -- top earners from IRS 990 Schedule J, compensation trends by role, compensation breakdown by component, growth vs CPI-U (2017-2023)
|
||||
- **Executive Compensation** -- top earners from IRS 990 Schedule J, President and top-10 CAGR, trends by role, compensation breakdown by component, growth vs CPI-U (2015-2023)
|
||||
- **Admin Cost Overview** -- admin cost ratios, expense breakdown by function, cost per student, admin-to-faculty ratio (IPEDS data, 2005-2024)
|
||||
- **Staffing & Enrollment** -- staff composition, student-to-staff ratios, management vs faculty vs enrollment growth (indexed)
|
||||
- **Endowment** -- endowment value trends, CAGR, investment return rate, CIO compensation vs endowment growth (IPEDS F2)
|
||||
- **Philanthropy** -- total private gifts and grants, gift allocation, President and VP Development compensation growth vs fundraising (IPEDS F2 and IRS 990)
|
||||
- **Current Headcount** -- scraped UD staff directory data with overhead/non-overhead classification by unit
|
||||
- **About** -- data sources, methodology, and limitations
|
||||
|
||||
## Validating Data
|
||||
|
||||
|
|
|
|||
|
|
@ -147,6 +147,26 @@ Raw data layer for University of Delaware administrative analytics. All tables a
|
|||
| value | DOUBLE | CPI-U index value (base period: 1982-84 = 100) |
|
||||
| series_id | VARCHAR | BLS series identifier (always CUUR0000SA0) |
|
||||
|
||||
### raw_ipeds_endowment
|
||||
|
||||
**Source:** IPEDS F2 (FASB) finance survey — endowment and investment sections
|
||||
**Granularity:** One row per institution per year
|
||||
**Primary Key:** (unitid, year)
|
||||
**Note:** Endowment fields (F2H*) are available for all years 2005-2023.
|
||||
|
||||
| Column | Type | Description | Source field |
|--------|------|-------------|-------------|
| unitid | INTEGER | IPEDS institution identifier | UNITID |
| year | INTEGER | Fiscal year | derived from filename |
| endowment_boy | BIGINT | Endowment value, beginning of fiscal year | F2H01 |
| endowment_eoy | BIGINT | Endowment value, end of fiscal year | F2H02 |
| new_gifts | BIGINT | New gifts and additions to endowment | F2H03A |
| net_investment_return | BIGINT | Net investment return on endowment | F2H03B |
| other_changes | BIGINT | Other changes in endowment value | F2H03D |
| total_private_gifts | BIGINT | Total private gifts, grants, and contracts | F2D08 |
| total_investment_return | BIGINT | Total investment return (all funds) | F2D10 |
| long_term_investments | BIGINT | Long-term investments (balance sheet) | F2A01 |
|
||||
|
||||
### raw_admin_headcount
|
||||
|
||||
**Source:** Web scraping of UD staff directory pages
|
||||
|
|
@ -170,4 +190,5 @@ Raw data layer for University of Delaware administrative analytics. All tables a
|
|||
- **IRS 990 tables** are linked by `object_id` (filing) and `ein` (organization)
|
||||
- **IPEDS → IRS 990:** The `ein` field in `raw_institution` links to `ein` in 990 tables. EINs: 516000297 (University of Delaware), 516017306 (UD Research Foundation)
|
||||
- **CPI-U** is used for inflation adjustment — join on `year` (and optionally `month`) to any table with a year column
|
||||
- **Endowment** data comes from IPEDS F2 endowment section; 990 `total_assets` provides a cross-check
|
||||
- **Admin headcount** links to IPEDS via institutional context (UD only in first iteration)
|
||||
|
|
|
|||
|
|
@ -39,7 +39,7 @@ def ipeds(
|
|||
|
||||
from admin_analytics.ipeds.download import download_all
|
||||
from admin_analytics.ipeds.institution import load_institutions
|
||||
from admin_analytics.ipeds.finance import load_finance
|
||||
from admin_analytics.ipeds.finance import load_finance, load_endowment
|
||||
from admin_analytics.ipeds.staff import load_staff
|
||||
from admin_analytics.ipeds.enrollment import load_enrollment
|
||||
|
||||
|
|
@ -62,8 +62,10 @@ def ipeds(
|
|||
load_institutions(conn, years)
|
||||
|
||||
if "finance" in components:
|
||||
typer.echo("Loading finance data (F1A)...")
|
||||
typer.echo("Loading finance data (F1A/F2)...")
|
||||
load_finance(conn, years)
|
||||
typer.echo("Loading endowment data (F2)...")
|
||||
load_endowment(conn, years)
|
||||
|
||||
if "staff" in components:
|
||||
typer.echo("Loading staff data (S)...")
|
||||
|
|
|
|||
|
|
@ -4,7 +4,9 @@ import dash
|
|||
from dash import dcc, html, Input, Output
|
||||
|
||||
from admin_analytics.db.connection import get_connection
|
||||
from admin_analytics.dashboard.pages import overview, compensation, staffing, headcount
|
||||
from admin_analytics.dashboard.pages import (
|
||||
overview, compensation, staffing, headcount, endowment, philanthropy, about,
|
||||
)
|
||||
|
||||
|
||||
def create_app() -> dash.Dash:
|
||||
|
|
@ -25,7 +27,10 @@ def create_app() -> dash.Dash:
|
|||
dcc.Tab(label="Executive Compensation", value="compensation"),
|
||||
dcc.Tab(label="Admin Cost Overview", value="overview"),
|
||||
dcc.Tab(label="Staffing & Enrollment", value="staffing"),
|
||||
dcc.Tab(label="Endowment", value="endowment"),
|
||||
dcc.Tab(label="Philanthropy", value="philanthropy"),
|
||||
dcc.Tab(label="Current Headcount", value="headcount"),
|
||||
dcc.Tab(label="About", value="about"),
|
||||
],
|
||||
style={"marginBottom": "20px"},
|
||||
),
|
||||
|
|
@ -42,8 +47,14 @@ def create_app() -> dash.Dash:
|
|||
return compensation.layout(conn)
|
||||
elif tab == "staffing":
|
||||
return staffing.layout(conn)
|
||||
elif tab == "endowment":
|
||||
return endowment.layout(conn)
|
||||
elif tab == "philanthropy":
|
||||
return philanthropy.layout(conn)
|
||||
elif tab == "headcount":
|
||||
return headcount.layout(conn)
|
||||
elif tab == "about":
|
||||
return about.layout()
|
||||
return html.Div("Unknown tab")
|
||||
|
||||
compensation.register_callbacks(app, conn)
|
||||
|
|
|
|||
85
src/admin_analytics/dashboard/pages/about.py
Normal file
|
|
@ -0,0 +1,85 @@
|
|||
"""Page: About — data sources, methodology, and limitations."""
|
||||
|
||||
from dash import html, dcc
|
||||
|
||||
|
||||
def layout(_conn=None):
|
||||
return html.Div([
|
||||
dcc.Markdown("""
|
||||
## About This Dashboard
|
||||
|
||||
This dashboard provides administrative cost benchmarking analytics for the
|
||||
**University of Delaware** using exclusively **publicly available data**. No
|
||||
internal university financial systems, personnel records, or confidential data
|
||||
were used.
|
||||
|
||||
### Data Sources
|
||||
|
||||
All data is drawn from public, open-access sources:
|
||||
|
||||
| Source | Publisher | What We Use | Coverage |
|--------|-----------|-------------|----------|
| **IPEDS** | U.S. Dept. of Education, NCES | Institutional directory, expenses by function, staffing by occupation, enrollment, endowment, philanthropic gifts | 2005-2024 |
| **IRS Form 990** | Internal Revenue Service | Executive compensation (Schedule J), filing financials for UD and UD Research Foundation | Tax years 2015-2023 |
| **BLS CPI-U** | Bureau of Labor Statistics | Consumer Price Index for inflation adjustment (series CUUR0000SA0) | Full history |
| **UD Staff Directories** | University of Delaware public web pages | Administrative office headcounts (College of Engineering line management, Provost's Office) | Current snapshot |
|
||||
|
||||
### Methodology
|
||||
|
||||
**Executive Compensation** is extracted from IRS Form 990 Schedule J, which
|
||||
reports detailed compensation for officers, directors, trustees, and key
|
||||
employees of tax-exempt organizations. The University of Delaware (EIN
|
||||
516000297) and UD Research Foundation (EIN 516017306) are the filing entities.
|
||||
Titles are normalized to canonical roles (President, Provost, VP Finance, etc.)
|
||||
using pattern matching. CAGR is computed as compound annual growth rate from
|
||||
first to last available year.
|
||||
|
||||
**Administrative Cost Ratios** use IPEDS finance survey data. "Institutional
|
||||
support" is the IPEDS functional expense category that most closely
|
||||
corresponds to administrative overhead. The admin-to-faculty ratio uses IPEDS
|
||||
occupational categories: OCCUPCAT 200 (instructional, research, and public
|
||||
service staff) for faculty and OCCUPCAT 300 (management) for administration.
|
||||
|
||||
**Endowment Performance** uses IPEDS F2 (FASB) survey fields for beginning and
|
||||
end-of-year endowment values, net investment return, and new gifts. The
|
||||
endowment CAGR reflects total value growth including investment returns, new
|
||||
gifts, and spending draws. The CIO compensation comparison uses the Chief
|
||||
Investment Officer's Schedule J total compensation indexed against endowment
|
||||
value. Note: the detailed endowment breakdown (investment return, new gifts,
|
||||
other changes) is only available from IPEDS starting in the 2020 reporting
|
||||
year. For 2005-2019, only beginning and end-of-year values are reported.
|
||||
|
||||
**Philanthropic Giving** uses IPEDS F2 total private gifts, grants, and
|
||||
contracts. The compensation-vs-giving comparison indexes the President and VP
|
||||
of Development compensation against total philanthropic revenue.
|
||||
|
||||
**Inflation Adjustment** uses the BLS CPI-U annual average (all items, U.S.
|
||||
city average, not seasonally adjusted). CPI-adjusted values are expressed in
|
||||
the most recent available year's dollars.
|
||||
|
||||
**Staffing** uses IPEDS Fall Staff survey occupational categories for full-time
|
||||
employees only (FTPT=2).
|
||||
|
||||
### Limitations
|
||||
|
||||
- **IRS 990 coverage** depends on e-file availability. Not all years may have
|
||||
filings for all entities, and XML schema variations across years can cause
|
||||
individual fields to be missing.
|
||||
- **IPEDS data** has a reporting lag; the most recent fiscal year may not yet
|
||||
be available.
|
||||
- **Endowment CAGR** reflects net growth after all inflows and outflows, not
|
||||
pure investment return. It is not directly comparable to an investment
|
||||
benchmark.
|
||||
- **Title normalization** uses pattern matching and may misclassify titles that
|
||||
don't follow common naming conventions.
|
||||
- **Admin headcount** from web scraping is a point-in-time snapshot and is
|
||||
limited to the pages currently targeted (College of Engineering and
|
||||
Provost's Office).
|
||||
- **Single institution** — this prototype covers the University of Delaware
|
||||
only. Peer comparisons are planned for a future phase.
|
||||
|
||||
### License
|
||||
|
||||
This project is released under the MIT License. Copyright (c) 2026 Eric Furst.
|
||||
"""),
|
||||
], style={"maxWidth": "900px", "margin": "0 auto", "lineHeight": "1.6"})
|
||||
|
|
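The Methodology text in `about.py` describes CAGR as the compound annual growth rate from the first to the last available year, which matches the expression used elsewhere in this commit: `((end / start) ** (1 / n_years) - 1) * 100`. A minimal standalone sketch follows. The helper name `cagr_pct` is an assumption for illustration and does not exist in the repository.

```python
from __future__ import annotations


def cagr_pct(start_value: float, end_value: float, n_years: int) -> float | None:
    """Compound annual growth rate in percent, or None when it is undefined."""
    # Same guards the commit applies: a positive span and a positive start value.
    if n_years <= 0 or start_value <= 0:
        return None
    return round(((end_value / start_value) ** (1.0 / n_years) - 1) * 100, 1)


# A value growing from 100 to 121 over two years.
print(cagr_pct(100.0, 121.0, 2))
```

Because the rate depends only on the two endpoint values, it says nothing about the path in between, which is one reason the About page cautions against reading endowment CAGR as an investment return.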
@ -10,6 +10,9 @@ from admin_analytics.dashboard.queries import (
|
|||
query_top_earners,
|
||||
query_comp_by_role,
|
||||
query_comp_vs_cpi,
|
||||
query_comp_cagr,
|
||||
query_aggregate_comp,
|
||||
query_aggregate_comp_cagr,
|
||||
)
|
||||
|
||||
_NO_DATA = html.Div(
|
||||
|
|
@ -21,6 +24,24 @@ _NO_DATA = html.Div(
|
|||
_KEY_ROLES = ["PRESIDENT", "PROVOST", "VP_FINANCE", "VP_RESEARCH", "VP_ADVANCEMENT", "CFO"]
|
||||
|
||||
|
||||
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
|
||||
return html.Div(
|
||||
[
|
||||
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
|
||||
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
|
||||
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
|
||||
],
|
||||
style={
|
||||
"flex": "1",
|
||||
"padding": "20px",
|
||||
"backgroundColor": "#f8f9fa",
|
||||
"borderRadius": "8px",
|
||||
"textAlign": "center",
|
||||
"margin": "0 8px",
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def layout(conn: duckdb.DuckDBPyConnection):
|
||||
all_earners = query_top_earners(conn)
|
||||
if all_earners.height == 0:
|
||||
|
|
@ -31,6 +52,34 @@ def layout(conn: duckdb.DuckDBPyConnection):
|
|||
{"label": str(y), "value": y} for y in years
|
||||
]
|
||||
|
||||
# KPI cards
|
||||
cagr = query_comp_cagr(conn)
|
||||
agg_cagr = query_aggregate_comp_cagr(conn)
|
||||
kpi_cards = []
|
||||
if cagr:
|
||||
kpi_cards.append(_kpi_card(
|
||||
"President Compensation",
|
||||
f"${cagr['end_comp']:,}",
|
||||
f"Tax year {cagr['end_year']}",
|
||||
))
|
||||
kpi_cards.append(_kpi_card(
|
||||
"President CAGR",
|
||||
f"{cagr['cagr_pct']}%",
|
||||
f"Annualized growth, {cagr['start_year']}-{cagr['end_year']}",
|
||||
))
|
||||
if agg_cagr:
|
||||
kpi_cards.append(_kpi_card(
|
||||
"Top-10 Total Compensation",
|
||||
f"${agg_cagr['end_comp']:,}",
|
||||
f"Tax year {agg_cagr['end_year']}",
|
||||
))
|
||||
kpi_cards.append(_kpi_card(
|
||||
"Top-10 CAGR",
|
||||
f"{agg_cagr['cagr_pct']}%",
|
||||
f"Annualized growth, {agg_cagr['start_year']}-{agg_cagr['end_year']}",
|
||||
))
|
||||
kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"}) if kpi_cards else html.Div()
|
||||
|
||||
# Compensation by role trend
|
||||
role_df = query_comp_by_role(conn)
|
||||
role_fig = go.Figure()
|
||||
|
|
@ -61,18 +110,24 @@ def layout(conn: duckdb.DuckDBPyConnection):
|
|||
mode="lines+markers", name="Top Compensation",
|
||||
line={"color": "#00539F"},
|
||||
))
|
||||
cpi_fig.add_trace(go.Scatter(
|
||||
x=cpi_pd["year"], y=cpi_pd["agg_index"],
|
||||
mode="lines+markers", name="Top-10 Aggregate",
|
||||
line={"color": "#E07A5F"},
|
||||
))
|
||||
cpi_fig.add_trace(go.Scatter(
|
||||
x=cpi_pd["year"], y=cpi_pd["cpi_index"],
|
||||
mode="lines+markers", name="CPI-U",
|
||||
line={"color": "#FFD200", "dash": "dash"},
|
||||
))
|
||||
cpi_fig.update_layout(
|
||||
title="Top Compensation vs CPI-U (Indexed, Base Year = 100)",
|
||||
title="Compensation vs CPI-U (Indexed, Base Year = 100)",
|
||||
xaxis_title="Year", yaxis_title="Index",
|
||||
template="plotly_white", height=380,
|
||||
)
|
||||
|
||||
return html.Div([
|
||||
kpi_row,
|
||||
html.Div(
|
||||
[
|
||||
html.Label("Filter by Tax Year: ", style={"fontWeight": "bold"}),
|
||||
|
|
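The chart above plots every series as an index with the base year set to 100, so compensation and CPI-U can share one axis. The indexing step itself can be sketched as below. In this commit the indexing appears to happen inside the SQL of `query_comp_vs_cpi`, so the Python helper `index_to_base` is purely illustrative.

```python
def index_to_base(values: list[float]) -> list[float]:
    """Rescale a series so that its first element equals 100."""
    if not values or values[0] == 0:
        raise ValueError("need a non-empty series with a non-zero base value")
    base = values[0]
    return [v / base * 100 for v in values]


# Made-up series: the last value is double the first, so the index ends at 200.
print(index_to_base([50.0, 75.0, 100.0]))
```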
@ -136,8 +191,17 @@ def register_callbacks(app: dash.Dash, conn: duckdb.DuckDBPyConnection) -> None:
|
|||
breakdown_fig = go.Figure()
|
||||
if earners.height > 0:
|
||||
ep = earners.to_pandas().head(10) # top 10 by total comp
|
||||
short_names = [n.split(",")[0][:20] if "," in n else n.split()[-1][:20]
|
||||
for n in ep["person_name"]]
|
||||
_SUFFIXES = {"JR", "SR", "II", "III", "IV", "JR.", "SR."}
|
||||
|
||||
def _short_name(n):
|
||||
if "," in n:
|
||||
return n.split(",")[0][:20]
|
||||
parts = n.split()
|
||||
while len(parts) > 1 and parts[-1].upper().rstrip(".") in _SUFFIXES:
|
||||
parts.pop()
|
||||
return parts[-1][:20] if parts else n[:20]
|
||||
|
||||
short_names = [_short_name(n) for n in ep["person_name"]]
|
||||
for comp_type, label, color in [
|
||||
("base_compensation", "Base", "#00539F"),
|
||||
("bonus_compensation", "Bonus", "#FFD200"),
|
||||
|
|
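The new `_short_name` helper replaces the one-line comprehension so that generational suffixes are no longer mistaken for surnames. A standalone copy with a few checks is shown below. The names are hypothetical and the function is lifted out of the callback only for demonstration.

```python
_SUFFIXES = {"JR", "SR", "II", "III", "IV", "JR.", "SR."}


def short_name(n: str) -> str:
    """Return a short chart label: the surname, with suffixes such as Jr. skipped."""
    if "," in n:
        # "LAST, FIRST" style: keep the part before the comma.
        return n.split(",")[0][:20]
    parts = n.split()
    # Drop trailing suffixes, but never the only remaining token.
    while len(parts) > 1 and parts[-1].upper().rstrip(".") in _SUFFIXES:
        parts.pop()
    return parts[-1][:20] if parts else n[:20]


for name in ["John Smith Jr.", "DOE, JANE", "Alex Example III", "Cher"]:
    print(name, "->", short_name(name))
```

Since the comparison strips a trailing period before the lookup, the `"JR."` and `"SR."` entries in the set are redundant but harmless.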
|
|||
190
src/admin_analytics/dashboard/pages/endowment.py
Normal file
|
|
@ -0,0 +1,190 @@
|
|||
"""Page: Endowment Performance."""
|
||||
|
||||
import duckdb
|
||||
from dash import html, dcc
|
||||
import plotly.graph_objects as go
|
||||
|
||||
from admin_analytics.dashboard.queries import (
|
||||
query_endowment, query_endowment_per_student, query_cio_vs_endowment,
|
||||
)
|
||||
|
||||
_NO_DATA = html.Div(
|
||||
"No endowment data loaded. Run: admin-analytics ingest ipeds --component finance",
|
||||
style={"textAlign": "center", "padding": "40px", "color": "#888"},
|
||||
)
|
||||
|
||||
|
||||
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
|
||||
return html.Div(
|
||||
[
|
||||
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
|
||||
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
|
||||
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
|
||||
],
|
||||
style={
|
||||
"flex": "1",
|
||||
"padding": "20px",
|
||||
"backgroundColor": "#f8f9fa",
|
||||
"borderRadius": "8px",
|
||||
"textAlign": "center",
|
||||
"margin": "0 8px",
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def layout(conn: duckdb.DuckDBPyConnection):
|
||||
df = query_endowment(conn)
|
||||
if df.height == 0:
|
||||
return _NO_DATA
|
||||
|
||||
pd = df.to_pandas()
|
||||
latest = pd.iloc[-1]
|
||||
|
||||
# Endowment CAGR from first to last year with data
|
||||
valid = pd.dropna(subset=["endowment_eoy"])
|
||||
if len(valid) >= 2:
|
||||
first = valid.iloc[0]
|
||||
last = valid.iloc[-1]
|
||||
start_year = int(first["year"])
|
||||
end_year = int(last["year"])
|
||||
n_years = end_year - start_year
|
||||
start_val = float(first["endowment_eoy"])
|
||||
end_val = float(last["endowment_eoy"])
|
||||
endow_cagr = round(((end_val / start_val) ** (1.0 / n_years) - 1) * 100, 1) if n_years > 0 and start_val > 0 else None
|
||||
else:
|
||||
start_year = end_year = None
|
||||
endow_cagr = None
|
||||
|
||||
# Endowment per student
|
||||
eps_df = query_endowment_per_student(conn)
|
||||
eps_pd = eps_df.to_pandas() if eps_df.height > 0 else None
|
||||
latest_eps = eps_pd.iloc[-1] if eps_pd is not None and len(eps_pd) > 0 else None
|
||||
|
||||
# KPI cards — single row
|
||||
kpi_cards = [
|
||||
_kpi_card(
|
||||
"Endowment Value",
|
||||
f"${latest['endowment_eoy'] / 1e9:.2f}B" if latest["endowment_eoy"] else "N/A",
|
||||
f"End of FY {int(latest['year'])}",
|
||||
),
|
||||
_kpi_card(
|
||||
"Endowment CAGR",
|
||||
f"{endow_cagr}%" if endow_cagr is not None else "N/A",
|
||||
f"FY {start_year}-{end_year}" if start_year else "",
|
||||
),
|
||||
]
|
||||
if latest_eps is not None and latest_eps["endowment_per_student"]:
|
||||
kpi_cards.append(_kpi_card(
|
||||
"Endowment per Student",
|
||||
f"${int(latest_eps['endowment_per_student']):,}",
|
||||
f"FY {int(latest_eps['year'])}",
|
||||
))
|
||||
kpi_cards.append(_kpi_card(
|
||||
"New Gifts to Endowment",
|
||||
f"${latest['new_gifts'] / 1e6:.1f}M" if latest["new_gifts"] else "N/A",
|
||||
f"FY {int(latest['year'])}",
|
||||
))
|
||||
kpi_row = html.Div(kpi_cards, style={"display": "flex", "marginBottom": "24px"})
|
||||
|
||||
# Endowment value trend
|
||||
value_fig = go.Figure()
|
||||
value_fig.add_trace(go.Scatter(
|
||||
x=pd["year"], y=pd["endowment_eoy"] / 1e9,
|
||||
mode="lines+markers", name="End-of-Year Value",
|
||||
line={"color": "#00539F"},
|
||||
fill="tozeroy", fillcolor="rgba(0,83,159,0.1)",
|
||||
))
|
||||
value_fig.update_layout(
|
||||
title="Endowment Value Over Time",
|
||||
xaxis_title="Year", yaxis_title="Billions $",
|
||||
template="plotly_white", height=400,
|
||||
)
|
||||
|
||||
# Investment return and new gifts bar chart
|
||||
components_fig = go.Figure()
|
||||
components_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["net_investment_return"] / 1e6,
|
||||
name="Net Investment Return",
|
||||
marker_color="#7FB069",
|
||||
))
|
||||
components_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["new_gifts"] / 1e6,
|
||||
name="New Gifts to Endowment",
|
||||
marker_color="#00539F",
|
||||
))
|
||||
if "other_changes" in pd.columns:
|
||||
components_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["other_changes"] / 1e6,
|
||||
name="Other Changes",
|
||||
marker_color="#999",
|
||||
))
|
||||
components_fig.update_layout(
|
||||
title="Endowment Changes by Component (Millions $)",
|
||||
xaxis_title="Year", yaxis_title="Millions $",
|
||||
barmode="group",
|
||||
template="plotly_white", height=400,
|
||||
)
|
||||
|
||||
# Investment return rate
|
||||
rate_fig = go.Figure()
|
||||
rates = pd.copy()
|
||||
rates["return_pct"] = rates["net_investment_return"] * 100 / rates["endowment_boy"]
|
||||
rate_fig.add_trace(go.Scatter(
|
||||
x=rates["year"], y=rates["return_pct"],
|
||||
mode="lines+markers", name="Return %",
|
||||
line={"color": "#00539F"},
|
||||
))
|
||||
rate_fig.add_hline(y=0, line_dash="dot", line_color="#ccc")
|
||||
rate_fig.update_layout(
|
||||
title="Endowment Net Investment Return Rate (%)",
|
||||
xaxis_title="Year", yaxis_title="%",
|
||||
template="plotly_white", height=380,
|
||||
)
|
||||
|
||||
# CIO compensation vs endowment growth
|
||||
cio_df = query_cio_vs_endowment(conn)
|
||||
cio_fig = go.Figure()
|
||||
if cio_df.height > 1:
|
||||
cio_pd = cio_df.to_pandas()
|
||||
cio_fig.add_trace(go.Scatter(
|
||||
x=cio_pd["tax_year"], y=cio_pd["cio_index"],
|
||||
mode="lines+markers", name="CIO Compensation",
|
||||
line={"color": "#E07A5F"},
|
||||
))
|
||||
cio_fig.add_trace(go.Scatter(
|
||||
x=cio_pd["tax_year"], y=cio_pd["endowment_index"],
|
||||
mode="lines+markers", name="Endowment Value",
|
||||
line={"color": "#00539F"},
|
||||
))
|
||||
cio_fig.add_hline(y=100, line_dash="dot", line_color="#ccc")
|
||||
cio_fig.update_layout(
|
||||
title="Chief Investment Officer Compensation vs Endowment Growth (Indexed, Base Year = 100)",
|
||||
xaxis_title="Year", yaxis_title="Index",
|
||||
template="plotly_white", height=400,
|
||||
)
|
||||
|
||||
# Endowment per student trend
|
||||
eps_fig = go.Figure()
|
||||
if eps_pd is not None and len(eps_pd) > 0:
|
||||
eps_fig.add_trace(go.Scatter(
|
||||
x=eps_pd["year"], y=eps_pd["endowment_per_student"],
|
||||
mode="lines+markers", name="Endowment per Student",
|
||||
line={"color": "#00539F"},
|
||||
))
|
||||
eps_fig.update_layout(
|
||||
title="Endowment per Student ($)",
|
||||
xaxis_title="Year", yaxis_title="$",
|
||||
template="plotly_white", height=380,
|
||||
)
|
||||
|
||||
charts = [
|
||||
kpi_row,
|
||||
dcc.Graph(figure=value_fig),
|
||||
dcc.Graph(figure=eps_fig),
|
||||
dcc.Graph(figure=components_fig),
|
||||
dcc.Graph(figure=rate_fig),
|
||||
]
|
||||
if cio_df.height > 1:
|
||||
charts.append(dcc.Graph(figure=cio_fig))
|
||||
|
||||
return html.Div(charts)
|
||||
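One thing to watch in the KPI cards of `endowment.py`: checks such as `if latest["endowment_eoy"]` treat zero as missing, but a pandas missing value is usually `NaN`, and `NaN` is truthy in Python. If the converted frame can contain `NaN` (an assumption, since it depends on how the DuckDB NULLs come through the conversion), a formatter along these lines would avoid rendering `$nanB`. The name `format_billions` is hypothetical.

```python
from __future__ import annotations

import math


def format_billions(value: float | None) -> str:
    """Format a dollar amount in billions, or 'N/A' when missing or zero."""
    if value is None or math.isnan(value) or value == 0:
        return "N/A"
    return f"${value / 1e9:.2f}B"


print(format_billions(float("nan")))
print(format_billions(1_850_000_000))
```

Treating zero as "N/A" mirrors the truthiness check already used in the page. Whether a reported zero should display differently is a design choice left open here.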
132
src/admin_analytics/dashboard/pages/philanthropy.py
Normal file
|
|
@ -0,0 +1,132 @@
|
|||
"""Page: Philanthropic Giving."""
|
||||
|
||||
import duckdb
|
||||
from dash import html, dcc
|
||||
import plotly.graph_objects as go
|
||||
|
||||
from admin_analytics.dashboard.queries import query_philanthropy, query_comp_vs_philanthropy
|
||||
|
||||
_NO_DATA = html.Div(
|
||||
"No philanthropy data loaded. Run: admin-analytics ingest ipeds --component finance",
|
||||
style={"textAlign": "center", "padding": "40px", "color": "#888"},
|
||||
)
|
||||
|
||||
|
||||
def _kpi_card(title: str, value: str, subtitle: str = "") -> html.Div:
|
||||
return html.Div(
|
||||
[
|
||||
html.H4(title, style={"margin": "0", "color": "#666", "fontSize": "14px"}),
|
||||
html.H2(value, style={"margin": "5px 0", "color": "#00539F"}),
|
||||
html.P(subtitle, style={"margin": "0", "color": "#999", "fontSize": "12px"}),
|
||||
],
|
||||
style={
|
||||
"flex": "1",
|
||||
"padding": "20px",
|
||||
"backgroundColor": "#f8f9fa",
|
||||
"borderRadius": "8px",
|
||||
"textAlign": "center",
|
||||
"margin": "0 8px",
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def layout(conn: duckdb.DuckDBPyConnection):
|
||||
df = query_philanthropy(conn)
|
||||
if df.height == 0:
|
||||
return _NO_DATA
|
||||
|
||||
pd = df.to_pandas()
|
||||
latest = pd.iloc[-1]
|
||||
|
||||
# KPI cards
|
||||
kpi_row = html.Div(
|
||||
[
|
||||
_kpi_card(
|
||||
"Total Private Gifts & Grants",
|
||||
f"${latest['total_private_gifts'] / 1e6:.1f}M" if latest["total_private_gifts"] else "N/A",
|
||||
f"FY {int(latest['year'])}",
|
||||
),
|
||||
_kpi_card(
|
||||
"Gifts to Endowment",
|
||||
f"${latest['endowment_gifts'] / 1e6:.1f}M" if latest["endowment_gifts"] else "N/A",
|
||||
f"FY {int(latest['year'])}",
|
||||
),
|
||||
],
|
||||
style={"display": "flex", "marginBottom": "24px"},
|
||||
)
|
||||
|
||||
# Private gifts trend (nominal and CPI-adjusted)
|
||||
gifts_fig = go.Figure()
|
||||
gifts_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["total_private_gifts"] / 1e6,
|
||||
name="Nominal",
|
||||
marker_color="#00539F",
|
||||
))
|
||||
if "gifts_cpi_adjusted" in pd.columns and pd["gifts_cpi_adjusted"].notna().any():
|
||||
gifts_fig.add_trace(go.Scatter(
|
||||
x=pd["year"], y=pd["gifts_cpi_adjusted"] / 1e6,
|
||||
mode="lines+markers", name="CPI-Adjusted",
|
||||
line={"color": "#FFD200", "dash": "dash"},
|
||||
))
|
||||
gifts_fig.update_layout(
|
||||
title="Total Private Gifts & Grants (Millions $)",
|
||||
xaxis_title="Year", yaxis_title="Millions $",
|
||||
template="plotly_white", height=420,
|
||||
)
|
||||
|
||||
# Endowment gifts vs total gifts
|
||||
split_fig = go.Figure()
|
||||
pd["non_endowment_gifts"] = pd["total_private_gifts"] - pd["endowment_gifts"].fillna(0)
|
||||
split_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["endowment_gifts"] / 1e6,
|
||||
name="To Endowment",
|
||||
marker_color="#00539F",
|
||||
))
|
||||
split_fig.add_trace(go.Bar(
|
||||
x=pd["year"], y=pd["non_endowment_gifts"] / 1e6,
|
||||
name="Current Use / Other",
|
||||
marker_color="#7FB069",
|
||||
))
|
||||
split_fig.update_layout(
|
||||
title="Gift Allocation: Endowment vs Current Use (Millions $)",
|
||||
xaxis_title="Year", yaxis_title="Millions $",
|
||||
barmode="stack",
|
||||
template="plotly_white", height=400,
|
||||
)
|
||||
|
||||
# Compensation vs philanthropy indexed chart
|
||||
cvp_df = query_comp_vs_philanthropy(conn)
|
||||
cvp_fig = go.Figure()
|
||||
if cvp_df.height > 1:
|
||||
cvp_pd = cvp_df.to_pandas()
|
||||
cvp_fig.add_trace(go.Scatter(
|
||||
x=cvp_pd["tax_year"], y=cvp_pd["president_index"],
|
||||
mode="lines+markers", name="President Compensation",
|
||||
line={"color": "#00539F"},
|
||||
))
|
||||
cvp_fig.add_trace(go.Scatter(
|
||||
x=cvp_pd["tax_year"], y=cvp_pd["vp_adv_index"],
|
||||
mode="lines+markers", name="VP Development Compensation",
|
||||
line={"color": "#E07A5F"},
|
||||
))
|
||||
cvp_fig.add_trace(go.Scatter(
|
||||
x=cvp_pd["tax_year"], y=cvp_pd["gifts_index"],
|
||||
mode="lines+markers", name="Philanthropic Gifts",
|
||||
line={"color": "#7FB069"},
|
||||
))
|
||||
cvp_fig.add_hline(y=100, line_dash="dot", line_color="#ccc")
|
||||
cvp_fig.update_layout(
|
||||
title="Compensation Growth vs Philanthropic Giving (Indexed, Base Year = 100)",
|
||||
xaxis_title="Year", yaxis_title="Index",
|
||||
template="plotly_white", height=420,
|
||||
)
|
||||
|
||||
charts = [
|
||||
kpi_row,
|
||||
dcc.Graph(figure=gifts_fig),
|
||||
dcc.Graph(figure=split_fig),
|
||||
]
|
||||
if cvp_df.height > 1:
|
||||
charts.append(dcc.Graph(figure=cvp_fig))
|
||||
|
||||
return html.Div(charts)
|
||||
|
|
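The gifts chart overlays a CPI-adjusted series, and the About page states that adjusted values are expressed in the most recent available year's dollars. A minimal sketch of that conversion is below. The function name and the CPI figures are made up for illustration, and the project's own query may compute `gifts_cpi_adjusted` differently.

```python
def to_latest_dollars(nominal: float, year: int, cpi_by_year: dict[int, float]) -> float:
    """Convert a nominal amount into dollars of the latest year in the CPI table."""
    latest_year = max(cpi_by_year)
    return nominal * cpi_by_year[latest_year] / cpi_by_year[year]


# Made-up annual-average index values, not real CPI-U data.
cpi = {2020: 200.0, 2023: 250.0}
print(to_latest_dollars(100.0, 2020, cpi))
```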
@ -96,6 +96,116 @@ def query_admin_faculty_ratio(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
|||
""", [UD_UNITID]).pl()
|
||||
|
||||
|
||||
def query_aggregate_comp(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Top-10 Schedule J compensation per year — total, count, and average."""
|
||||
return conn.execute("""
|
||||
WITH ranked AS (
|
||||
SELECT j.tax_year, j.total_compensation,
|
||||
j.base_compensation, j.bonus_compensation,
|
||||
j.deferred_compensation, j.nontaxable_benefits,
|
||||
j.other_compensation,
|
||||
ROW_NUMBER() OVER (PARTITION BY j.tax_year
|
||||
ORDER BY j.total_compensation DESC) AS rn
|
||||
FROM raw_990_schedule_j j
|
||||
WHERE j.total_compensation > 0
|
||||
)
|
||||
SELECT tax_year,
|
||||
COUNT(*) AS headcount,
|
||||
SUM(total_compensation) AS total_comp,
|
||||
ROUND(AVG(total_compensation), 0) AS avg_comp,
|
||||
SUM(base_compensation) AS total_base,
|
||||
SUM(bonus_compensation) AS total_bonus,
|
||||
SUM(deferred_compensation) AS total_deferred,
|
||||
SUM(nontaxable_benefits) AS total_benefits,
|
||||
SUM(other_compensation) AS total_other
|
||||
FROM ranked
|
||||
WHERE rn <= 10
|
||||
GROUP BY tax_year
|
||||
ORDER BY tax_year
|
||||
""").pl()
|
||||
|
||||
|
||||
def query_aggregate_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None:
|
||||
"""CAGR of aggregate Schedule J compensation over the last 5 years of data."""
|
||||
df = query_aggregate_comp(conn)
|
||||
if df.height < 2:
|
||||
return None
|
||||
|
||||
# Use last 5 years of available data
|
||||
df = df.tail(min(5, df.height))
|
||||
|
||||
start_year = df["tax_year"][0]
|
||||
end_year = df["tax_year"][-1]
|
||||
start_comp = float(df["total_comp"][0])
|
||||
end_comp = float(df["total_comp"][-1])
|
||||
n_years = end_year - start_year
|
||||
|
||||
if n_years <= 0 or start_comp <= 0:
|
||||
return None
|
||||
|
||||
cagr = ((end_comp / start_comp) ** (1.0 / n_years) - 1) * 100
|
||||
return {
"cagr_pct": round(cagr, 1),
"start_year": start_year,
"end_year": end_year,
"start_comp": int(start_comp),
"end_comp": int(end_comp),
}
|
||||
|
||||
|
||||
def query_comp_cagr(conn: duckdb.DuckDBPyConnection) -> dict | None:
|
||||
"""Annualized growth rate (CAGR) of President compensation.
|
||||
|
||||
Tracks the President role specifically using title normalization.
|
||||
Returns dict with cagr_pct, start_year, end_year, start_comp, end_comp,
|
||||
or None if insufficient data.
|
||||
"""
|
||||
raw = conn.execute("""
|
||||
SELECT j.tax_year, j.title, j.total_compensation
|
||||
FROM raw_990_schedule_j j
|
||||
WHERE j.total_compensation > 0
|
||||
ORDER BY j.tax_year
|
||||
""").pl()
|
||||
|
||||
if raw.height == 0:
|
||||
return None
|
||||
|
||||
raw = raw.with_columns(
|
||||
pl.col("title").map_elements(
|
||||
normalize_title, return_dtype=pl.Utf8
|
||||
).alias("role")
|
||||
)
|
||||
|
||||
df = (
|
||||
raw.filter(pl.col("role") == "PRESIDENT")
|
||||
.group_by("tax_year")
|
||||
.agg(pl.col("total_compensation").max().alias("top_comp"))
|
||||
.sort("tax_year")
|
||||
)
|
||||
|
||||
if df.height < 2:
|
||||
return None
|
||||
|
||||
start_year = df["tax_year"][0]
|
||||
end_year = df["tax_year"][-1]
|
||||
start_comp = df["top_comp"][0]
|
||||
end_comp = df["top_comp"][-1]
|
||||
n_years = end_year - start_year
|
||||
|
||||
if n_years <= 0 or start_comp <= 0:
|
||||
return None
|
||||
|
||||
cagr = ((end_comp / start_comp) ** (1.0 / n_years) - 1) * 100
|
||||
|
||||
return {
|
||||
"cagr_pct": round(cagr, 1),
|
||||
"start_year": start_year,
|
||||
"end_year": end_year,
|
||||
"start_comp": start_comp,
|
||||
"end_comp": end_comp,
|
||||
}
|
||||
|
||||
|
||||
def query_top_earners(
|
||||
conn: duckdb.DuckDBPyConnection, year: int | None = None
|
||||
) -> pl.DataFrame:
|
||||
|
|
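For reference, the compound annual growth rate computed by `query_aggregate_comp_cagr` and `query_comp_cagr` above can be sketched in isolation. The helper name `cagr_pct` is hypothetical and not part of this commit; it only assumes the same guards as the code above (a positive starting value and at least one elapsed year).

```python
from __future__ import annotations


def cagr_pct(start_value: float, end_value: float, n_years: int) -> float | None:
    """Compound annual growth rate in percent, or None if it is undefined.

    Hypothetical helper for illustration; mirrors the guard
    `n_years <= 0 or start_comp <= 0` used in the query functions.
    """
    if n_years <= 0 or start_value <= 0:
        return None
    # (end / start) ** (1 / years) - 1, expressed as a percentage
    return round(((end_value / start_value) ** (1.0 / n_years) - 1) * 100, 1)


# Example: compensation growing from 100 to 121 over two years
growth = cagr_pct(100.0, 121.0, 2)
```

Note that `n_years` is the difference between the last and first tax years, not the number of rows, so gaps in the filing history do not inflate the rate.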
@ -162,11 +272,23 @@ def query_comp_by_role(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
|||
|
||||
|
||||
def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Compensation growth vs CPI growth, indexed to first available year = 100."""
|
||||
"""Compensation growth vs CPI growth, indexed to first available year = 100.
|
||||
|
||||
Includes top earner, top-10 aggregate, and CPI-U.
|
||||
"""
|
||||
return conn.execute("""
|
||||
WITH yearly_max_comp AS (
|
||||
SELECT tax_year, MAX(total_compensation) AS top_comp
|
||||
WITH ranked AS (
|
||||
SELECT tax_year, total_compensation,
|
||||
ROW_NUMBER() OVER (PARTITION BY tax_year
|
||||
ORDER BY total_compensation DESC) AS rn
|
||||
FROM raw_990_schedule_j
|
||||
WHERE total_compensation > 0
|
||||
),
|
||||
yearly_comp AS (
|
||||
SELECT tax_year,
|
||||
MAX(total_compensation) AS top_comp,
|
||||
SUM(CASE WHEN rn <= 10 THEN total_compensation END) AS agg_comp
|
||||
FROM ranked
|
||||
GROUP BY tax_year
|
||||
),
|
||||
annual_cpi AS (
|
||||
|
|
@ -174,20 +296,24 @@ def query_comp_vs_cpi(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
|||
FROM raw_cpi_u GROUP BY year
|
||||
),
|
||||
base AS (
|
||||
SELECT c.top_comp AS base_comp, ac.avg_cpi AS base_cpi
|
||||
FROM yearly_max_comp c
|
||||
SELECT c.top_comp AS base_top, c.agg_comp AS base_agg,
|
||||
ac.avg_cpi AS base_cpi
|
||||
FROM yearly_comp c
|
||||
JOIN annual_cpi ac ON ac.year = c.tax_year
|
||||
ORDER BY c.tax_year LIMIT 1
|
||||
)
|
||||
SELECT
|
||||
c.tax_year AS year,
|
||||
c.top_comp,
|
||||
c.agg_comp,
|
||||
ac.avg_cpi,
|
||||
ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_comp FROM base), 0), 1)
|
||||
ROUND(c.top_comp * 100.0 / NULLIF((SELECT base_top FROM base), 0), 1)
|
||||
AS comp_index,
|
||||
ROUND(c.agg_comp * 100.0 / NULLIF((SELECT base_agg FROM base), 0), 1)
|
||||
AS agg_index,
|
||||
ROUND(ac.avg_cpi * 100.0 / NULLIF((SELECT base_cpi FROM base), 0), 1)
|
||||
AS cpi_index
|
||||
FROM yearly_max_comp c
|
||||
FROM yearly_comp c
|
||||
JOIN annual_cpi ac ON ac.year = c.tax_year
|
||||
ORDER BY year
|
||||
""").pl()
|
||||
|
|
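The indexing in `query_comp_vs_cpi` divides every value by the value in the base year and multiplies by 100, with `NULLIF(..., 0)` preventing division by zero. A minimal plain-Python sketch of that rescaling follows; `index_to_base` is a hypothetical name, and it simplifies the SQL by treating the first element as the base year rather than the first year that has both compensation and CPI data.

```python
from __future__ import annotations


def index_to_base(values: list[float | None]) -> list[float | None]:
    """Rescale a series so that its first value equals 100.0.

    Hypothetical helper for illustration. A missing or zero base yields
    all-None output, analogous to NULLIF(base, 0) in the SQL query.
    """
    if not values or values[0] is None or values[0] == 0:
        return [None] * len(values)
    base = values[0]
    return [None if v is None else round(v * 100.0 / base, 1) for v in values]


# Example: three years of top compensation, indexed to the first year
indexed = index_to_base([200.0, 250.0, 300.0])
```

Because compensation and CPI are each indexed to the same base year, the resulting series can be plotted on one axis and compared directly.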
@ -249,6 +375,166 @@ def query_growth_index(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
|||
""", [UD_UNITID, UD_UNITID]).pl()
|
||||
|
||||
|
||||
def query_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Endowment performance over time."""
|
||||
return conn.execute("""
|
||||
SELECT year, endowment_boy, endowment_eoy, new_gifts,
|
||||
net_investment_return, other_changes, long_term_investments
|
||||
FROM raw_ipeds_endowment
|
||||
WHERE unitid = ?
|
||||
ORDER BY year
|
||||
""", [UD_UNITID]).pl()
|
||||
|
||||
|
||||
def query_endowment_per_student(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Endowment value per student over time."""
|
||||
return conn.execute("""
|
||||
SELECT e.year, e.endowment_eoy, en.total_enrollment,
|
||||
ROUND(e.endowment_eoy * 1.0 / NULLIF(en.total_enrollment, 0), 0)
|
||||
AS endowment_per_student
|
||||
FROM raw_ipeds_endowment e
|
||||
JOIN raw_ipeds_enrollment en ON en.unitid = e.unitid AND en.year = e.year
|
||||
WHERE e.unitid = ?
|
||||
ORDER BY e.year
|
||||
""", [UD_UNITID]).pl()
|
||||
|
||||
|
||||
def query_cio_vs_endowment(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Chief Investment Officer compensation vs endowment growth, indexed."""
|
||||
raw = conn.execute("""
|
||||
SELECT j.tax_year, j.title, j.total_compensation
|
||||
FROM raw_990_schedule_j j
|
||||
WHERE j.total_compensation > 0
|
||||
""").pl()
|
||||
|
||||
if raw.height == 0:
|
||||
return pl.DataFrame()
|
||||
|
||||
raw = raw.with_columns(
|
||||
pl.col("title").map_elements(
|
||||
normalize_title, return_dtype=pl.Utf8
|
||||
).alias("role")
|
||||
)
|
||||
|
||||
cio = (
|
||||
raw.filter(pl.col("role") == "CHIEF_INVESTMENT_OFFICER")
|
||||
.group_by("tax_year")
|
||||
.agg(pl.col("total_compensation").max().alias("cio_comp"))
|
||||
.sort("tax_year")
|
||||
)
|
||||
|
||||
if cio.height == 0:
|
||||
return pl.DataFrame()
|
||||
|
||||
endow = conn.execute("""
|
||||
SELECT year, endowment_eoy
|
||||
FROM raw_ipeds_endowment
|
||||
WHERE unitid = ?
|
||||
ORDER BY year
|
||||
""", [UD_UNITID]).pl()
|
||||
|
||||
merged = (
|
||||
cio.join(endow, left_on="tax_year", right_on="year", how="inner")
|
||||
.drop_nulls(subset=["cio_comp", "endowment_eoy"])
|
||||
.sort("tax_year")
|
||||
)
|
||||
|
||||
if merged.height < 2:
|
||||
return merged
|
||||
|
||||
base_comp = float(merged["cio_comp"][0])
|
||||
base_endow = float(merged["endowment_eoy"][0])
|
||||
|
||||
merged = merged.with_columns(
|
||||
(pl.col("cio_comp").cast(pl.Float64) * 100.0 / base_comp).round(1).alias("cio_index"),
|
||||
(pl.col("endowment_eoy").cast(pl.Float64) * 100.0 / base_endow).round(1).alias("endowment_index"),
|
||||
)
|
||||
|
||||
return merged
|
||||
|
||||
|
||||
def query_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""Philanthropic giving over time — IPEDS private gifts + 990 revenue."""
|
||||
return conn.execute(f"""
|
||||
{_CPI_CTE}
|
||||
SELECT e.year, e.total_private_gifts, e.new_gifts AS endowment_gifts,
|
||||
ROUND(e.total_private_gifts * (SELECT avg_cpi FROM latest_cpi)
|
||||
/ ac.avg_cpi, 0) AS gifts_cpi_adjusted
|
||||
FROM raw_ipeds_endowment e
|
||||
LEFT JOIN annual_cpi ac ON ac.year = e.year
|
||||
WHERE e.unitid = ?
|
||||
ORDER BY e.year
|
||||
""", [UD_UNITID]).pl()
|
||||
|
||||
|
||||
def query_comp_vs_philanthropy(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""VP Advancement and President comp vs philanthropic gifts, indexed."""
|
||||
raw = conn.execute("""
|
||||
SELECT j.tax_year, j.title, j.total_compensation
|
||||
FROM raw_990_schedule_j j
|
||||
WHERE j.total_compensation > 0
|
||||
""").pl()
|
||||
|
||||
if raw.height == 0:
|
||||
return pl.DataFrame()
|
||||
|
||||
raw = raw.with_columns(
|
||||
pl.col("title").map_elements(
|
||||
normalize_title, return_dtype=pl.Utf8
|
||||
).alias("role")
|
||||
)
|
||||
|
||||
# Get max comp per role per year for President and VP Advancement
|
||||
roles = raw.filter(pl.col("role").is_in(["PRESIDENT", "VP_ADVANCEMENT"]))
|
||||
if roles.height == 0:
|
||||
return pl.DataFrame()
|
||||
|
||||
pivoted = (
|
||||
roles.group_by(["tax_year", "role"])
|
||||
.agg(pl.col("total_compensation").max().alias("comp"))
|
||||
.sort("tax_year")
|
||||
)
|
||||
|
||||
pres = (
|
||||
pivoted.filter(pl.col("role") == "PRESIDENT")
|
||||
.select(pl.col("tax_year"), pl.col("comp").alias("president_comp"))
|
||||
)
|
||||
vp = (
|
||||
pivoted.filter(pl.col("role") == "VP_ADVANCEMENT")
|
||||
.select(pl.col("tax_year"), pl.col("comp").alias("vp_adv_comp"))
|
||||
)
|
||||
|
||||
gifts = conn.execute("""
|
||||
SELECT year, total_private_gifts
|
||||
FROM raw_ipeds_endowment
|
||||
WHERE unitid = ?
|
||||
ORDER BY year
|
||||
""", [UD_UNITID]).pl()
|
||||
|
||||
# Join all three on year
|
||||
merged = (
|
||||
pres.join(vp, on="tax_year", how="outer_coalesce")
|
||||
.join(gifts, left_on="tax_year", right_on="year", how="inner")
|
||||
.drop_nulls(subset=["total_private_gifts"])
|
||||
.sort("tax_year")
|
||||
)
|
||||
|
||||
if merged.height < 2:
|
||||
return merged
|
||||
|
||||
base_pres = float(merged.drop_nulls("president_comp")["president_comp"][0])
|
||||
base_vp = float(merged.drop_nulls("vp_adv_comp")["vp_adv_comp"][0])
|
||||
base_gifts = float(merged["total_private_gifts"][0])
|
||||
|
||||
merged = merged.with_columns(
|
||||
(pl.col("president_comp").cast(pl.Float64) * 100.0 / base_pres).round(1).alias("president_index"),
|
||||
(pl.col("vp_adv_comp").cast(pl.Float64) * 100.0 / base_vp).round(1).alias("vp_adv_index"),
|
||||
(pl.col("total_private_gifts").cast(pl.Float64) * 100.0 / base_gifts).round(1).alias("gifts_index"),
|
||||
)
|
||||
|
||||
return merged
|
||||
|
||||
|
||||
def query_admin_headcount(conn: duckdb.DuckDBPyConnection) -> pl.DataFrame:
|
||||
"""All scraped admin headcount entries."""
|
||||
return conn.execute("""
|
||||
|
|
|
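The `gifts_cpi_adjusted` column in `query_philanthropy` restates each year's gifts in the dollars of the most recent CPI year: nominal amount times latest CPI, divided by that year's average CPI. The sketch below shows the same arithmetic outside SQL. The function name `to_latest_dollars` is an assumption for illustration, and the definition of `latest_cpi` inside `_CPI_CTE` is not shown in this diff, so it is passed in as a plain argument here.

```python
from __future__ import annotations


def to_latest_dollars(
    nominal: float | None, year_cpi: float | None, latest_cpi: float
) -> float | None:
    """Convert a nominal amount into latest-year dollars using CPI-U.

    Hypothetical helper. Returns None when the amount or that year's CPI
    is missing, similar to how a LEFT JOIN with no CPI row yields NULL.
    """
    if nominal is None or not year_cpi:
        return None
    return round(nominal * latest_cpi / year_cpi, 0)


# Example: 1,000 in a year with CPI 200, restated at a latest CPI of 300
adjusted = to_latest_dollars(1000, 200.0, 300.0)
```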
|||
|
|
@ -100,6 +100,21 @@ TABLES = {
|
|||
other_compensation BIGINT
|
||||
)
|
||||
""",
|
||||
"raw_ipeds_endowment": """
|
||||
CREATE TABLE IF NOT EXISTS raw_ipeds_endowment (
|
||||
unitid INTEGER NOT NULL,
|
||||
year INTEGER NOT NULL,
|
||||
endowment_boy BIGINT,
|
||||
endowment_eoy BIGINT,
|
||||
new_gifts BIGINT,
|
||||
net_investment_return BIGINT,
|
||||
other_changes BIGINT,
|
||||
total_private_gifts BIGINT,
|
||||
total_investment_return BIGINT,
|
||||
long_term_investments BIGINT,
|
||||
PRIMARY KEY (unitid, year)
|
||||
)
|
||||
""",
|
||||
"raw_cpi_u": """
|
||||
CREATE TABLE IF NOT EXISTS raw_cpi_u (
|
||||
year INTEGER NOT NULL,
|
||||
|
|
|
|||
|
|
@ -40,6 +40,25 @@ F2_COLUMN_VARIANTS = {
|
|||
"benefits": ["F2E133"],
|
||||
}
|
||||
|
||||
# F2 endowment / philanthropy fields
|
||||
F2_ENDOWMENT_VARIANTS = {
|
||||
"unitid": ["UNITID"],
|
||||
"endowment_boy": ["F2H01"],
|
||||
"endowment_eoy": ["F2H02"],
|
||||
"new_gifts": ["F2H03A"],
|
||||
"net_investment_return": ["F2H03B"],
|
||||
"other_changes": ["F2H03D"],
|
||||
"total_private_gifts": ["F2D08"],
|
||||
"total_investment_return": ["F2D10"],
|
||||
"long_term_investments": ["F2A01"],
|
||||
}
|
||||
|
||||
ENDOWMENT_COLUMNS = [
|
||||
"unitid", "year", "endowment_boy", "endowment_eoy", "new_gifts",
|
||||
"net_investment_return", "other_changes", "total_private_gifts",
|
||||
"total_investment_return", "long_term_investments",
|
||||
]
|
||||
|
||||
CANONICAL_COLUMNS = [
|
||||
"unitid", "year", "reporting_standard", "total_expenses",
|
||||
"instruction_expenses", "research_expenses", "public_service_expenses",
|
||||
|
|
@ -56,7 +75,7 @@ def _find_csv(component_dir: Path) -> Path | None:
|
|||
|
||||
def _resolve_columns(df: pl.DataFrame, variants: dict) -> dict[str, str]:
|
||||
"""For each canonical name, find the first matching column."""
|
||||
upper_cols = {c.upper(): c for c in df.columns}
|
||||
upper_cols = {c.strip().upper(): c for c in df.columns}
|
||||
resolved = {}
|
||||
for canonical, candidates in variants.items():
|
||||
for var in candidates:
|
||||
|
|
@ -140,3 +159,49 @@ def load_finance(
|
|||
print(f" No finance CSV found for {year}, skipping")
|
||||
|
||||
return total
|
||||
|
||||
|
||||
def load_endowment(
|
||||
conn: duckdb.DuckDBPyConnection,
|
||||
year_range: range,
|
||||
unitid_filter: int | None = UD_UNITID,
|
||||
) -> int:
|
||||
"""Load IPEDS F2 endowment and philanthropy data into raw_ipeds_endowment."""
|
||||
total = 0
|
||||
for year in year_range:
|
||||
f2_dir = config.IPEDS_DATA_DIR / "finance_f2" / str(year)
|
||||
csv_path = _find_csv(f2_dir)
|
||||
if csv_path is None:
|
||||
continue
|
||||
|
||||
df = pl.read_csv(csv_path, infer_schema_length=0, encoding="utf8-lossy")
|
||||
col_map = _resolve_columns(df, F2_ENDOWMENT_VARIANTS)
|
||||
|
||||
if "unitid" not in col_map:
|
||||
continue
|
||||
|
||||
result = pl.DataFrame({
|
||||
canonical: df[actual] for canonical, actual in col_map.items()
|
||||
})
|
||||
result = result.with_columns(pl.lit(year).alias("year"))
|
||||
|
||||
for col in ENDOWMENT_COLUMNS:
|
||||
if col not in result.columns:
|
||||
result = result.with_columns(pl.lit(None).alias(col))
|
||||
elif col not in ("year",):
|
||||
result = result.with_columns(pl.col(col).cast(pl.Int64, strict=False))
|
||||
|
||||
if unitid_filter is not None:
|
||||
result = result.filter(pl.col("unitid") == unitid_filter)
|
||||
|
||||
if result.height == 0:
|
||||
continue
|
||||
|
||||
result = result.select(ENDOWMENT_COLUMNS)
|
||||
conn.execute("DELETE FROM raw_ipeds_endowment WHERE year = ?", [year])
|
||||
conn.register("_tmp_endow", result.to_arrow())
|
||||
conn.execute("INSERT INTO raw_ipeds_endowment SELECT * FROM _tmp_endow")
|
||||
conn.unregister("_tmp_endow")
|
||||
total += result.height
|
||||
|
||||
return total
|
||||
|
|
|
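The change to `_resolve_columns` strips whitespace from CSV headers before the case-insensitive lookup, so a header such as `"UNITID "` still resolves. The diff only shows the first lines of that function, so the sketch below is an assumption about how the lookup plausibly completes; `resolve_columns` is a hypothetical stand-in that takes a list of header names instead of a DataFrame.

```python
def resolve_columns(
    columns: list[str], variants: dict[str, list[str]]
) -> dict[str, str]:
    """Map each canonical name to the first matching actual header.

    Illustrative sketch only; the loop body is assumed, not copied
    from the project.
    """
    # Normalize headers once: trimmed and upper-cased key, original as value
    upper_cols = {c.strip().upper(): c for c in columns}
    resolved: dict[str, str] = {}
    for canonical, candidates in variants.items():
        for var in candidates:
            if var.upper() in upper_cols:
                resolved[canonical] = upper_cols[var.upper()]
                break  # first matching variant wins
    return resolved


# Example with a padded header and a lower-case header
example_variants = {
    "unitid": ["UNITID"],
    "endowment_eoy": ["F2H02"],
    "new_gifts": ["F2H03A"],
}
mapping = resolve_columns(["UNITID ", "f2h02"], example_variants)
```

Canonical names with no matching header are simply absent from the result, which is why `load_endowment` fills missing columns with nulls afterwards.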
|||
|
|
@ -10,9 +10,10 @@ TITLE_PATTERNS: list[tuple[str, re.Pattern]] = [
|
|||
("VP_FINANCE", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:financ|budget|business|admin)|\b(?:financ|budget|business|admin).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
|
||||
("VP_RESEARCH", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bresearch|\bresearch.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
|
||||
("VP_STUDENT_AFFAIRS", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\bstudent|\bstudent.*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
|
||||
("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|development|giving|fundrais)|\b(?:advancement|development|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
|
||||
("VP_ADVANCEMENT", re.compile(r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|develop|alumni|giving|fundrais)|\b(?:advancement|develop|alumni|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)", re.I)),
|
||||
("VP_OTHER", re.compile(r"\bv\.?p\.?\b|\bvice\s+president\b", re.I)),
|
||||
("CFO", re.compile(r"\b(cfo|chief\s+financial)\b", re.I)),
|
||||
("CHIEF_INVESTMENT_OFFICER", re.compile(r"\bchief\s+investment\b", re.I)),
|
||||
("CIO", re.compile(r"\b(cio|chief\s+information)\b", re.I)),
|
||||
("COO", re.compile(r"\b(coo|chief\s+operating)\b", re.I)),
|
||||
("GENERAL_COUNSEL", re.compile(r"\b(general\s+counsel|chief\s+legal)\b", re.I)),
|
||||
|
|
|
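Because `TITLE_PATTERNS` is an ordered list, the order of entries matters: the specific `VP_ADVANCEMENT` pattern has to come before the catch-all `VP_OTHER`. The sketch below copies four patterns from the diff and applies them first-match-wins. The function `normalize_title_sketch` and the `"OTHER"` fallback are assumptions for illustration; the project's actual `normalize_title` is not shown here and may behave differently.

```python
import re

# Subset of patterns copied from the diff; order is significant.
PATTERNS_SKETCH: list[tuple[str, re.Pattern]] = [
    ("VP_ADVANCEMENT", re.compile(
        r"(?:\bv\.?p\.?\b|\bvice\s+president\b).*\b(?:advancement|develop|alumni|giving|fundrais)"
        r"|\b(?:advancement|develop|alumni|giving|fundrais).*(?:\bv\.?p\.?\b|\bvice\s+president\b)",
        re.I)),
    ("VP_OTHER", re.compile(r"\bv\.?p\.?\b|\bvice\s+president\b", re.I)),
    ("CHIEF_INVESTMENT_OFFICER", re.compile(r"\bchief\s+investment\b", re.I)),
    ("CIO", re.compile(r"\b(cio|chief\s+information)\b", re.I)),
]


def normalize_title_sketch(title: str) -> str:
    """Return the first role whose pattern matches, else "OTHER" (assumed)."""
    for role, pattern in PATTERNS_SKETCH:
        if pattern.search(title):
            return role
    return "OTHER"


roles = [
    normalize_title_sketch("Vice President for Development and Alumni Relations"),
    normalize_title_sketch("VP Facilities"),
    normalize_title_sketch("Chief Investment Officer"),
    normalize_title_sketch("Chief Information Officer"),
]
```

The widened `develop` and `alumni` alternatives in this commit let titles such as "Development and Alumni Relations" resolve to `VP_ADVANCEMENT` instead of falling through to `VP_OTHER`.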
|||
|
|
@ -16,6 +16,7 @@ KEY_COLUMNS: dict[str, list[str]] = {
|
|||
"raw_990_filing": ["ein", "tax_year", "total_revenue", "total_expenses"],
|
||||
"raw_990_part_vii": ["ein", "tax_year", "person_name", "reportable_comp_from_org"],
|
||||
"raw_990_schedule_j": ["ein", "tax_year", "person_name", "total_compensation"],
|
||||
"raw_ipeds_endowment": ["unitid", "year", "endowment_eoy"],
|
||||
"raw_cpi_u": ["year", "month", "value"],
|
||||
"raw_admin_headcount": ["unit", "person_name", "category"],
|
||||
}
|
||||
|
|
@ -29,6 +30,7 @@ YEAR_COLUMN: dict[str, str] = {
|
|||
"raw_990_filing": "tax_year",
|
||||
"raw_990_part_vii": "tax_year",
|
||||
"raw_990_schedule_j": "tax_year",
|
||||
"raw_ipeds_endowment": "year",
|
||||
"raw_cpi_u": "year",
|
||||
}
|
||||
|
||||
|
|
|
|||