AdminAnalytics/docs/data_dictionary.md
emfurst a766f6ff0d Add endowment spending distribution, move planning docs to private
- Add IPEDS F2H03C (spending distribution for current use) to endowment schema, loader, queries, and dashboard
- Endowment tab now shows spend rate alongside investment return rate
- Move planning docs to planning/ directory (gitignored)
- Update data dictionary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 07:27:57 -04:00

195 lines
8.4 KiB
Markdown

# Data Dictionary
Raw data layer for University of Delaware administrative analytics. All tables are prefixed `raw_` and loaded into DuckDB.
## Tables
### raw_institution
**Source:** IPEDS HD (Institutional Characteristics) survey
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| ein | VARCHAR | IRS Employer Identification Number |
| institution_name | VARCHAR | Institution name |
| city | VARCHAR | City |
| state | VARCHAR | State abbreviation |
| sector | INTEGER | IPEDS sector code (public/private/for-profit) |
| control | INTEGER | IPEDS control code (1=public, 2=private nonprofit, 3=private for-profit) |
| carnegie_class | INTEGER | Carnegie Classification code |
| enrollment_total | INTEGER | Total enrollment from HD survey |
### raw_ipeds_finance
**Source:** IPEDS F1A (GASB public) and F2 (FASB private) finance surveys
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
**Note:** UD reports under FASB (F2) despite being public. The loader tries both F1A and F2.
| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Fiscal year |
| reporting_standard | VARCHAR | "GASB" or "FASB" |
| total_expenses | BIGINT | Total expenses |
| instruction_expenses | BIGINT | Instruction function expenses |
| research_expenses | BIGINT | Research function expenses |
| public_service_expenses | BIGINT | Public service function expenses |
| academic_support_expenses | BIGINT | Academic support function expenses |
| student_services_expenses | BIGINT | Student services function expenses |
| institutional_support_expenses | BIGINT | Institutional support (admin) expenses |
| auxiliary_expenses | BIGINT | Auxiliary enterprises expenses |
| hospital_expenses | BIGINT | Hospital services expenses |
| other_expenses | BIGINT | Other expenses |
| salaries_wages | BIGINT | Total salaries and wages |
| benefits | BIGINT | Total employee benefits |
### raw_ipeds_staff
**Source:** IPEDS S (Fall Staff) survey, occupational categories
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| total_staff | INTEGER | Total staff headcount (OCCUPCAT 100) |
| faculty_total | INTEGER | Faculty headcount (OCCUPCAT 250) |
| management_total | INTEGER | Management headcount (OCCUPCAT 200) |
### raw_ipeds_enrollment
**Source:** IPEDS EF (Fall Enrollment) survey
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| total_enrollment | INTEGER | Total student headcount (EFALEVEL=1 grand total) |
### raw_990_filing
**Source:** IRS 990 e-file XML — filing header
**Granularity:** One row per filing
**Primary Key:** object_id
| Column | Type | Description |
|--------|------|-------------|
| object_id | VARCHAR | IRS e-file object ID (unique filing identifier) |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year of the filing |
| organization_name | VARCHAR | Organization name from the filing |
| return_type | VARCHAR | Return type (990, 990PF) |
| filing_date | DATE | Date the return was filed |
| total_revenue | BIGINT | Total revenue (Part I) |
| total_expenses | BIGINT | Total functional expenses (Part IX) |
| total_assets | BIGINT | Total assets (Balance Sheet) |
### raw_990_part_vii
**Source:** IRS 990 Part VII Section A — Officers, Directors, Key Employees
**Granularity:** One row per person per filing
**Primary Key:** id (auto-increment)
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| object_id | VARCHAR | FK to raw_990_filing |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year |
| person_name | VARCHAR | Name of officer/director/key employee |
| title | VARCHAR | Title or position |
| avg_hours_per_week | DOUBLE | Average hours per week devoted to position |
| reportable_comp_from_org | BIGINT | Reportable compensation from the organization |
| reportable_comp_from_related | BIGINT | Reportable compensation from related organizations |
| other_compensation | BIGINT | Other compensation |
### raw_990_schedule_j
**Source:** IRS 990 Schedule J — Compensation Information for Officers, Directors, etc.
**Granularity:** One row per person per filing
**Primary Key:** id (auto-increment)
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| object_id | VARCHAR | FK to raw_990_filing |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year |
| person_name | VARCHAR | Name of individual |
| title | VARCHAR | Title or position |
| base_compensation | BIGINT | Base compensation |
| bonus_compensation | BIGINT | Bonus and incentive compensation |
| other_compensation | BIGINT | Other reportable compensation |
| deferred_compensation | BIGINT | Deferred compensation |
| nontaxable_benefits | BIGINT | Nontaxable benefits |
| total_compensation | BIGINT | Total (sum of all compensation components) |
| compensation_from_related | BIGINT | Compensation from related organizations |
### raw_cpi_u
**Source:** BLS CPI-U flat file (cu.data.0.Current)
**Granularity:** One row per month
**Primary Key:** (year, month)
**Filter:** Series CUUR0000SA0 — All Urban Consumers, U.S. City Average, All Items, Not Seasonally Adjusted
| Column | Type | Description |
|--------|------|-------------|
| year | INTEGER | Calendar year |
| month | INTEGER | Month (1-12) |
| value | DOUBLE | CPI-U index value (base period: 1982-84 = 100) |
| series_id | VARCHAR | BLS series identifier (always CUUR0000SA0) |
### raw_ipeds_endowment
**Source:** IPEDS F2 (FASB) finance survey — endowment and investment sections
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
**Note:** Endowment fields (F2H*) are available for all years 2005-2023.
| Column | Type | Description | Source field |
|--------|------|-------------|-------------|
| unitid | INTEGER | IPEDS institution identifier | UNITID |
| year | INTEGER | Fiscal year | derived from filename |
| endowment_boy | BIGINT | Endowment value, beginning of fiscal year | F2H01 |
| endowment_eoy | BIGINT | Endowment value, end of fiscal year | F2H02 |
| new_gifts | BIGINT | New gifts and additions to endowment | F2H03A |
| net_investment_return | BIGINT | Net investment return on endowment | F2H03B |
| spending_distribution | BIGINT | Spending distribution for current use (negative) | F2H03C |
| other_changes | BIGINT | Other changes in endowment value | F2H03D |
| total_private_gifts | BIGINT | Total private gifts, grants, and contracts | F2D08 |
| total_investment_return | BIGINT | Total investment return (all funds) | F2D10 |
| long_term_investments | BIGINT | Long-term investments (balance sheet) | F2A01 |
### raw_admin_headcount
**Source:** Web scraping of UD staff directory pages
**Granularity:** One row per staff member per scrape
**Primary Key:** id (auto-increment)
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| scrape_date | DATE | Date the page was scraped |
| unit | VARCHAR | Administrative unit (e.g., "Office of the President") |
| person_name | VARCHAR | Staff member name |
| title | VARCHAR | Job title |
| email | VARCHAR | Email address |
| category | VARCHAR | Classified category (LEADERSHIP, FINANCE, IT, etc.) |
| is_overhead | BOOLEAN | True = overhead, False = mission-aligned, NULL = debatable |
## Cross-Source Relationships
- **IPEDS tables** are linked by `unitid` (UD = 130943)
- **IRS 990 tables** are linked by `object_id` (filing) and `ein` (organization)
- **IPEDS → IRS 990:** The `ein` field in `raw_institution` links to `ein` in 990 tables. UD Foundation EINs: 516000297, 516017306
- **CPI-U** is used for inflation adjustment — join on `year` (and optionally `month`) to any table with a year column
- **Endowment** data comes from IPEDS F2 endowment section; 990 `total_assets` provides a cross-check
- **Admin headcount** links to IPEDS via institutional context (UD only in first iteration)