AdminAnalytics/docs/data_dictionary.md
2026-03-30 19:29:33 -04:00

7.2 KiB

Data Dictionary

Raw data layer for University of Delaware administrative analytics. All tables are prefixed raw_ and loaded into DuckDB.

Tables

raw_institution

Source: IPEDS HD (Institutional Characteristics) survey Granularity: One row per institution per year Primary Key: (unitid, year)

Column Type Description
unitid INTEGER IPEDS institution identifier
year INTEGER Survey year
ein VARCHAR IRS Employer Identification Number
institution_name VARCHAR Institution name
city VARCHAR City
state VARCHAR State abbreviation
sector INTEGER IPEDS sector code (public/private/for-profit)
control INTEGER IPEDS control code (1=public, 2=private nonprofit, 3=private for-profit)
carnegie_class INTEGER Carnegie Classification code
enrollment_total INTEGER Total enrollment from HD survey

raw_ipeds_finance

Source: IPEDS F1A (GASB public) and F2 (FASB private) finance surveys Granularity: One row per institution per year Primary Key: (unitid, year) Note: UD reports under FASB (F2) despite being public. The loader tries both F1A and F2.

Column Type Description
unitid INTEGER IPEDS institution identifier
year INTEGER Fiscal year
reporting_standard VARCHAR "GASB" or "FASB"
total_expenses BIGINT Total expenses
instruction_expenses BIGINT Instruction function expenses
research_expenses BIGINT Research function expenses
public_service_expenses BIGINT Public service function expenses
academic_support_expenses BIGINT Academic support function expenses
student_services_expenses BIGINT Student services function expenses
institutional_support_expenses BIGINT Institutional support (admin) expenses
auxiliary_expenses BIGINT Auxiliary enterprises expenses
hospital_expenses BIGINT Hospital services expenses
other_expenses BIGINT Other expenses
salaries_wages BIGINT Total salaries and wages
benefits BIGINT Total employee benefits

raw_ipeds_staff

Source: IPEDS S (Fall Staff) survey, occupational categories Granularity: One row per institution per year Primary Key: (unitid, year)

Column Type Description
unitid INTEGER IPEDS institution identifier
year INTEGER Survey year
total_staff INTEGER Total staff headcount (OCCUPCAT 100)
faculty_total INTEGER Faculty headcount (OCCUPCAT 250)
management_total INTEGER Management headcount (OCCUPCAT 200)

raw_ipeds_enrollment

Source: IPEDS EF (Fall Enrollment) survey Granularity: One row per institution per year Primary Key: (unitid, year)

Column Type Description
unitid INTEGER IPEDS institution identifier
year INTEGER Survey year
total_enrollment INTEGER Total student headcount (EFALEVEL=1 grand total)

raw_990_filing

Source: IRS 990 e-file XML — filing header Granularity: One row per filing Primary Key: object_id

Column Type Description
object_id VARCHAR IRS e-file object ID (unique filing identifier)
ein VARCHAR Employer Identification Number
tax_year INTEGER Tax year of the filing
organization_name VARCHAR Organization name from the filing
return_type VARCHAR Return type (990, 990PF)
filing_date DATE Date the return was filed
total_revenue BIGINT Total revenue (Part I)
total_expenses BIGINT Total functional expenses (Part IX)
total_assets BIGINT Total assets (Balance Sheet)

raw_990_part_vii

Source: IRS 990 Part VII Section A — Officers, Directors, Key Employees Granularity: One row per person per filing Primary Key: id (auto-increment)

Column Type Description
id INTEGER Auto-increment surrogate key
object_id VARCHAR FK to raw_990_filing
ein VARCHAR Employer Identification Number
tax_year INTEGER Tax year
person_name VARCHAR Name of officer/director/key employee
title VARCHAR Title or position
avg_hours_per_week DOUBLE Average hours per week devoted to position
reportable_comp_from_org BIGINT Reportable compensation from the organization
reportable_comp_from_related BIGINT Reportable compensation from related organizations
other_compensation BIGINT Other compensation

raw_990_schedule_j

Source: IRS 990 Schedule J — Compensation Information for Officers, Directors, etc. Granularity: One row per person per filing Primary Key: id (auto-increment)

Column Type Description
id INTEGER Auto-increment surrogate key
object_id VARCHAR FK to raw_990_filing
ein VARCHAR Employer Identification Number
tax_year INTEGER Tax year
person_name VARCHAR Name of individual
title VARCHAR Title or position
base_compensation BIGINT Base compensation
bonus_compensation BIGINT Bonus and incentive compensation
other_compensation BIGINT Other reportable compensation
deferred_compensation BIGINT Deferred compensation
nontaxable_benefits BIGINT Nontaxable benefits
total_compensation BIGINT Total (sum of all compensation components)
compensation_from_related BIGINT Compensation from related organizations

raw_cpi_u

Source: BLS CPI-U flat file (cu.data.0.Current) Granularity: One row per month Primary Key: (year, month) Filter: Series CUUR0000SA0 — All Urban Consumers, U.S. City Average, All Items, Not Seasonally Adjusted

Column Type Description
year INTEGER Calendar year
month INTEGER Month (1-12)
value DOUBLE CPI-U index value (base period: 1982-84 = 100)
series_id VARCHAR BLS series identifier (always CUUR0000SA0)

raw_admin_headcount

Source: Web scraping of UD staff directory pages Granularity: One row per staff member per scrape Primary Key: id (auto-increment)

Column Type Description
id INTEGER Auto-increment surrogate key
scrape_date DATE Date the page was scraped
unit VARCHAR Administrative unit (e.g., "Office of the President")
person_name VARCHAR Staff member name
title VARCHAR Job title
email VARCHAR Email address
category VARCHAR Classified category (LEADERSHIP, FINANCE, IT, etc.)
is_overhead BOOLEAN True = overhead, False = mission-aligned, NULL = debatable

Cross-Source Relationships

  • IPEDS tables are linked by unitid (UD = 130943)
  • IRS 990 tables are linked by object_id (filing) and ein (organization)
  • IPEDS → IRS 990: The ein field in raw_institution links to ein in 990 tables. UD Foundation EINs: 516000297, 516017306
  • CPI-U is used for inflation adjustment — join on year (and optionally month) to any table with a year column
  • Admin headcount links to IPEDS via institutional context (UD only in first iteration)