Phase 1 project prototype
parent 29215e2bd2
commit 2c9ae1c312
29 changed files with 2967 additions and 22 deletions

docs/data_dictionary.md (normal file, 173 lines)
@@ -0,0 +1,173 @@

# Data Dictionary

Raw data layer for University of Delaware administrative analytics. All tables are prefixed `raw_` and loaded into DuckDB.

## Tables

### raw_institution

**Source:** IPEDS HD (Institutional Characteristics) survey
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)

| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| ein | VARCHAR | IRS Employer Identification Number |
| institution_name | VARCHAR | Institution name |
| city | VARCHAR | City |
| state | VARCHAR | State abbreviation |
| sector | INTEGER | IPEDS sector code (combines control and level, e.g., 1 = public, 4-year or above) |
| control | INTEGER | IPEDS control code (1=public, 2=private nonprofit, 3=private for-profit) |
| carnegie_class | INTEGER | Carnegie Classification code |
| enrollment_total | INTEGER | Total enrollment from HD survey |

### raw_ipeds_finance

**Source:** IPEDS F1A (GASB public) and F2 (FASB private) finance surveys
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)
**Note:** UD reports under FASB (F2) even though it is a public institution, so the loader tries both F1A and F2.

| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Fiscal year |
| reporting_standard | VARCHAR | "GASB" or "FASB" |
| total_expenses | BIGINT | Total expenses |
| instruction_expenses | BIGINT | Instruction function expenses |
| research_expenses | BIGINT | Research function expenses |
| public_service_expenses | BIGINT | Public service function expenses |
| academic_support_expenses | BIGINT | Academic support function expenses |
| student_services_expenses | BIGINT | Student services function expenses |
| institutional_support_expenses | BIGINT | Institutional support (admin) expenses |
| auxiliary_expenses | BIGINT | Auxiliary enterprises expenses |
| hospital_expenses | BIGINT | Hospital services expenses |
| other_expenses | BIGINT | Other expenses |
| salaries_wages | BIGINT | Total salaries and wages |
| benefits | BIGINT | Total employee benefits |

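As one illustration of how these columns might be combined, the sketch below computes institutional support as a fraction of total expenses for a single row. The function name `institutional_support_share` is hypothetical and the figures are made up; this is not necessarily the metric the project itself uses.

```python
from typing import Optional


def institutional_support_share(
    institutional_support_expenses: Optional[int],
    total_expenses: Optional[int],
) -> Optional[float]:
    """Return institutional support as a fraction of total expenses.

    Returns None when either value is missing or the total is zero,
    so a NULL in the raw table does not turn into a misleading 0.0.
    """
    if institutional_support_expenses is None or not total_expenses:
        return None
    return institutional_support_expenses / total_expenses


# Made-up figures for illustration only, not actual UD data.
example_share = institutional_support_share(120_000_000, 1_000_000_000)
missing_share = institutional_support_share(None, 1_000_000_000)
```

Returning `None` rather than zero keeps missing finance rows distinguishable from institutions that genuinely report no institutional support spending.
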
### raw_ipeds_staff

**Source:** IPEDS S (Fall Staff) survey, occupational categories
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)

| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| total_staff | INTEGER | Total staff headcount (OCCUPCAT 100) |
| faculty_total | INTEGER | Faculty headcount (OCCUPCAT 250) |
| management_total | INTEGER | Management headcount (OCCUPCAT 200) |

### raw_ipeds_enrollment

**Source:** IPEDS EF (Fall Enrollment) survey
**Granularity:** One row per institution per year
**Primary Key:** (unitid, year)

| Column | Type | Description |
|--------|------|-------------|
| unitid | INTEGER | IPEDS institution identifier |
| year | INTEGER | Survey year |
| total_enrollment | INTEGER | Total student headcount (EFALEVEL=1 grand total) |

### raw_990_filing

**Source:** IRS 990 e-file XML, filing header
**Granularity:** One row per filing
**Primary Key:** object_id

| Column | Type | Description |
|--------|------|-------------|
| object_id | VARCHAR | IRS e-file object ID (unique filing identifier) |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year of the filing |
| organization_name | VARCHAR | Organization name from the filing |
| return_type | VARCHAR | Return type (990, 990PF) |
| filing_date | DATE | Date the return was filed |
| total_revenue | BIGINT | Total revenue (Part I) |
| total_expenses | BIGINT | Total functional expenses (Part IX) |
| total_assets | BIGINT | Total assets (Balance Sheet) |

### raw_990_part_vii

**Source:** IRS 990 Part VII Section A: Officers, Directors, Key Employees
**Granularity:** One row per person per filing
**Primary Key:** id (auto-increment)

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| object_id | VARCHAR | FK to raw_990_filing |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year |
| person_name | VARCHAR | Name of officer/director/key employee |
| title | VARCHAR | Title or position |
| avg_hours_per_week | DOUBLE | Average hours per week devoted to position |
| reportable_comp_from_org | BIGINT | Reportable compensation from the organization |
| reportable_comp_from_related | BIGINT | Reportable compensation from related organizations |
| other_compensation | BIGINT | Other compensation |

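The `object_id` foreign key means each Part VII row can be joined back to its filing header. The sketch below shows one way that join might look, summing the three compensation columns per person. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs without extra dependencies; the project loads into DuckDB, and although the SQL is kept generic, it has not been checked against the actual loader. Table definitions are trimmed to the columns used, and all rows (including the placeholder EIN) are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the DuckDB raw layer (assumption).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE raw_990_filing (
    object_id TEXT PRIMARY KEY,
    ein TEXT,
    tax_year INTEGER
);
CREATE TABLE raw_990_part_vii (
    id INTEGER PRIMARY KEY,
    object_id TEXT REFERENCES raw_990_filing(object_id),
    person_name TEXT,
    reportable_comp_from_org INTEGER,
    reportable_comp_from_related INTEGER,
    other_compensation INTEGER
);
-- Made-up sample rows; the EIN is a placeholder, not a real organization.
INSERT INTO raw_990_filing VALUES ('OBJ-1', '000000000', 2022);
INSERT INTO raw_990_part_vii
    (object_id, person_name, reportable_comp_from_org,
     reportable_comp_from_related, other_compensation)
VALUES ('OBJ-1', 'Person A', 100000, 0, 15000),
       ('OBJ-1', 'Person B', 80000, 20000, NULL);
""")

# Join each person to the filing header and total the three compensation
# columns, treating NULL as zero.
part_vii_totals = con.execute("""
SELECT f.tax_year,
       p.person_name,
       COALESCE(p.reportable_comp_from_org, 0)
     + COALESCE(p.reportable_comp_from_related, 0)
     + COALESCE(p.other_compensation, 0) AS total_reported
FROM raw_990_part_vii AS p
JOIN raw_990_filing AS f USING (object_id)
ORDER BY p.person_name
""").fetchall()
```

Joining on `object_id` rather than on `(ein, tax_year)` keeps the match tied to one specific filing, which matters if an organization has more than one return for the same tax year.
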
### raw_990_schedule_j

**Source:** IRS 990 Schedule J: Compensation Information for Officers, Directors, etc.
**Granularity:** One row per person per filing
**Primary Key:** id (auto-increment)

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| object_id | VARCHAR | FK to raw_990_filing |
| ein | VARCHAR | Employer Identification Number |
| tax_year | INTEGER | Tax year |
| person_name | VARCHAR | Name of individual |
| title | VARCHAR | Title or position |
| base_compensation | BIGINT | Base compensation |
| bonus_compensation | BIGINT | Bonus and incentive compensation |
| other_compensation | BIGINT | Other reportable compensation |
| deferred_compensation | BIGINT | Deferred compensation |
| nontaxable_benefits | BIGINT | Nontaxable benefits |
| total_compensation | BIGINT | Total (sum of all compensation components) |
| compensation_from_related | BIGINT | Compensation from related organizations |

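Because `total_compensation` is described as the sum of the compensation components, a loaded row can be sanity-checked against that definition. The sketch below is a minimal version of such a check. It assumes the components are the five columns from `base_compensation` through `nontaxable_benefits` and that `compensation_from_related` is not part of the sum; both are assumptions about the loader, and the helper name and sample rows are hypothetical.

```python
from typing import Mapping, Optional

# Assumed component columns; compensation_from_related is deliberately excluded.
COMPONENT_COLUMNS = (
    "base_compensation",
    "bonus_compensation",
    "other_compensation",
    "deferred_compensation",
    "nontaxable_benefits",
)


def total_matches_components(row: Mapping[str, Optional[int]]) -> bool:
    """Return True when total_compensation equals the sum of the components.

    Missing or NULL components are treated as zero.
    """
    component_sum = sum(row.get(column) or 0 for column in COMPONENT_COLUMNS)
    return row.get("total_compensation") == component_sum


# Made-up rows for illustration only.
consistent_row = {
    "base_compensation": 500_000,
    "bonus_compensation": 50_000,
    "other_compensation": 10_000,
    "deferred_compensation": 25_000,
    "nontaxable_benefits": None,
    "total_compensation": 585_000,
}
inconsistent_row = dict(consistent_row, total_compensation=600_000)
```

Rows that fail the check are worth inspecting by hand before use, since a mismatch could come from the filing itself as easily as from the parser.
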
### raw_cpi_u

**Source:** BLS CPI-U flat file (cu.data.0.Current)
**Granularity:** One row per month
**Primary Key:** (year, month)
**Filter:** Series CUUR0000SA0: All Urban Consumers, U.S. City Average, All Items, Not Seasonally Adjusted

| Column | Type | Description |
|--------|------|-------------|
| year | INTEGER | Calendar year |
| month | INTEGER | Month (1-12) |
| value | DOUBLE | CPI-U index value (base period: 1982-84 = 100) |
| series_id | VARCHAR | BLS series identifier (always CUUR0000SA0) |

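The series filter and the one-row-per-month granularity can be sketched as a small line parser. This assumes the flat file is tab-delimited with whitespace-padded fields in the order `series_id`, `year`, `period`, `value`, and that monthly periods are coded `M01` through `M12`; those layout details are assumptions to verify against the downloaded file. The function name `parse_cpi_line` is hypothetical and the sample value is made up.

```python
from typing import Optional, Tuple

TARGET_SERIES = "CUUR0000SA0"


def parse_cpi_line(line: str) -> Optional[Tuple[int, int, float, str]]:
    """Return (year, month, value, series_id) for a monthly row of the
    target series, or None for headers, other series, and non-monthly periods.
    """
    fields = [field.strip() for field in line.split("\t")]
    if len(fields) < 4 or fields[0] != TARGET_SERIES:
        return None
    series_id, year, period, value = fields[:4]
    # Keep only monthly periods of the form M01..M12; anything else
    # (for example annual or semiannual summary periods) is skipped.
    if len(period) != 3 or period[0] != "M" or not period[1:].isdigit():
        return None
    month = int(period[1:])
    if not 1 <= month <= 12:
        return None
    return int(year), month, float(value), series_id


# Made-up sample line for illustration only.
sample_row = parse_cpi_line("CUUR0000SA0      \t2020\tM01\t     250.000\t")
```

Dropping non-monthly periods at parse time keeps the `(year, month)` primary key valid, since a summary period would otherwise need a thirteenth month.
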
### raw_admin_headcount

**Source:** Web scraping of UD staff directory pages
**Granularity:** One row per staff member per scrape
**Primary Key:** id (auto-increment)

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Auto-increment surrogate key |
| scrape_date | DATE | Date the page was scraped |
| unit | VARCHAR | Administrative unit (e.g., "Office of the President") |
| person_name | VARCHAR | Staff member name |
| title | VARCHAR | Job title |
| email | VARCHAR | Email address |
| category | VARCHAR | Classified category (LEADERSHIP, FINANCE, IT, etc.) |
| is_overhead | BOOLEAN | True = overhead, False = mission-aligned, NULL = debatable |

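Since `is_overhead` has three states rather than two, any summary needs to count NULL separately instead of folding it into False. The sketch below tallies a list of flag values under that reading; the helper name `overhead_breakdown` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Labels for the three documented states of is_overhead (label text is assumed).
STATE_LABELS = {True: "overhead", False: "mission_aligned", None: "debatable"}


def overhead_breakdown(flags: Iterable[Optional[bool]]) -> Counter:
    """Count staff rows per is_overhead state, keeping NULL as its own bucket."""
    return Counter(STATE_LABELS[flag] for flag in flags)


# Made-up flags for illustration only.
example_breakdown = overhead_breakdown([True, False, None, True])
```

A plain `SUM(is_overhead)` or truthiness test would silently treat the debatable rows as mission-aligned, which is the case this tally avoids.
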
## Cross-Source Relationships

- **IPEDS tables** are linked by `unitid` (UD = 130943)
- **IRS 990 tables** are linked by `object_id` (filing) and `ein` (organization)
- **IPEDS → IRS 990:** The `ein` field in `raw_institution` links to `ein` in 990 tables. UD Foundation EINs: 516000297, 516017306
- **CPI-U** is used for inflation adjustment. `raw_cpi_u` is monthly, so average `value` by `year` before joining to a table keyed by year (a direct join on `year` alone returns twelve rows per source row), or join on both `year` and `month` where the other table has a month
- **Admin headcount** has no `unitid` column; it relates to the IPEDS tables only through institutional context, since the first iteration covers UD alone

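The CPI-U relationship can be sketched as a query that first collapses the monthly index to one value per year and then rescales a dollar column to a chosen base year. As an assumption for portability, this uses Python's built-in `sqlite3` as a stand-in for DuckDB; the SQL is kept generic but has not been run against the project's database. `BASE_YEAR`, the trimmed table definitions, and all values are made up, and averaging by calendar year is a simplification given that the finance table is keyed by fiscal year.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the DuckDB raw layer (assumption).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE raw_cpi_u (
    year INTEGER, month INTEGER, value REAL, series_id TEXT,
    PRIMARY KEY (year, month)
);
CREATE TABLE raw_ipeds_finance (
    unitid INTEGER, year INTEGER, total_expenses INTEGER,
    PRIMARY KEY (unitid, year)
);
""")

# Made-up index values: flat at 200.0 through 2010 and 250.0 through 2020.
cpi_rows = [
    (year, month, value, "CUUR0000SA0")
    for year, value in ((2010, 200.0), (2020, 250.0))
    for month in range(1, 13)
]
con.executemany("INSERT INTO raw_cpi_u VALUES (?, ?, ?, ?)", cpi_rows)
con.executemany(
    "INSERT INTO raw_ipeds_finance VALUES (?, ?, ?)",
    [(130943, 2010, 1000), (130943, 2020, 1500)],  # made-up expense figures
)

BASE_YEAR = 2020  # hypothetical choice of constant-dollar year

# Average the monthly index per year first, so the join stays one-to-one.
real_expenses = con.execute("""
WITH cpi_annual AS (
    SELECT year, AVG(value) AS cpi
    FROM raw_cpi_u
    GROUP BY year
)
SELECT f.year,
       f.total_expenses,
       f.total_expenses * b.cpi / c.cpi AS real_total_expenses
FROM raw_ipeds_finance AS f
JOIN cpi_annual AS c ON c.year = f.year
JOIN cpi_annual AS b ON b.year = ?
ORDER BY f.year
""", (BASE_YEAR,)).fetchall()
```

Aggregating in the `cpi_annual` step is what keeps the result at one row per institution per year; joining the monthly table directly on `year` would multiply every finance row by twelve.
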