Longitudinal Inference Without Longitudinal Data

A Copula Based Framework for Modeling Dependence and Growth in Educational Assessment

Damian W. Betebenner

National Center for the Improvement of Educational Assessment

NCIEA Board Meeting · April 2026 · Boston, Massachusetts

AI Contribution & Authorship Statement

This manuscript represents the collaborative intellectual work of the authors with substantive and continuous assistance from AI-based systems (OpenAI GPT, Anthropic Claude), integrated natively into all phases of the research.

The use of AI in this paper is not incidental or peripheral — it is essential, embedded, and reflective of a methodological shift toward AI-native scholarship. This statement is offered in the spirit of methodological clarity and epistemic honesty.

Theoretical development: The foundational idea to extend TAMP using copulas and Sklar’s Theorem emerged in human–AI collaboration.
Mathematical formalization: AI supported the expression of key relationships: the Fréchet–Hoeffding bounds, which reinterpret TAMP as a degenerate copula, and the clarification that SGPc is a scaled evaluation of the transition kernel $\partial C / \partial u$ associated with the copula (the conditional CDF $F_{V\mid U=u}(v)$ that carries prior-year ranks to current-year ranks).
Computational modules: AI contributed to the design of R code for copula estimation, simulation, and visualization, forming part of the SGPc R package.
Authorship & responsibility: Authors retain full responsibility for all claims (ICMJE/AERA norms). AI is not listed as an author, but its instrumental role is explicitly acknowledged.

A Generally Accepted Fact

In order to perform longitudinal data analysis, one needs longitudinal data.

Growth is change over time, so measuring it seems to require repeated measures of the same units at two points in time.
Traditional growth methods — gain scores, value-tables, value-added models, Student Growth Percentiles — all depend on student-level linkage across years.
Without linked data, we have cross-sectional snapshots of student status, not growth.
Most flagship national and international assessments — NAEP, TIMSS, PISA — produce exactly that: rich snapshots, no student-level longitudinal records.
A whole class of policy-relevant growth questions appears off-limits to these programs.

… But Not Always

Under certain circumstances — precisely those that hold for many large-scale assessments in education — cross-sectional data can support credible growth inference.

What those circumstances look like:

Large, representative cohorts measured on comparable constructs across grades/years.
A stable dependence structure between adjacent time points across content areas, grades, and subgroups — an empirical regularity in educational assessment data.
Marginals that are observable even when linkage is not — the time 1 and time 2 score distributions ( $F_X$ and $F_Y$ ).
A principled way to model the dependence separately from the marginals, so that knowledge learned where linkage exists can be transported to settings where it does not.

If we can model how ranks move together, we can reconstruct a valid joint distribution — and therefore growth — from cross-sectional data alone. Copulas are the machinery that makes this possible.

A Counter-Intuitive Claim

The central claim of this work:

Group-level growth inference does not require student-level longitudinal data.

On first impression, this seems impossible. The intuition that longitudinal questions require longitudinal data is deeply ingrained. Supporting the claim requires two complementary arguments:

A theoretical argument. Sklar’s theorem and the copula decomposition establish that group-level growth is mathematically recoverable from marginals combined with an appropriate dependence structure. The claim is at least possible.
An empirical argument. Across 966 longitudinal conditions, four datasets, and four content areas, the dependence structure in educational data turns out to be remarkably stable and well approximated by a small family of copulas, and it yields growth inferences that are accurate enough to trust. The claim is also practical.

Neither argument stands alone. Theory without evidence is speculation; evidence without theory is coincidence. Together, they make a counter-intuitive claim defensible — which is what the remainder of this presentation will demonstrate.

Four Connections to NCIEA’s Work

Extends Student Growth Percentiles (SGP). SGP is an NCIEA-developed methodology (Betebenner, 2009) now used in dozens of state accountability systems. This copula based framework retains longitudinal SGP when linkage exists while extending SGPc-style growth inference to settings where student-level linkage does not exist.
Unlocks growth inference for NAEP, TIMSS, PISA, and cross-state comparisons. These are programs NCIEA advises on where longitudinal linkage is often impossible by design; the copula framework opens population- and subpopulation-level growth questions that were previously out of reach.
Strengthens the measurement foundation for fair score use. Copulas provide an ordinally defensible basis for growth comparisons, aligning with NCIEA’s long-standing commitment to valid, equitable, and defensible assessment interpretation.
Improves practical monitoring and decision support through a Statistical Process Control (SPC) lens. Combining SGPc with SPC frameworks for process monitoring and signal detection helps distinguish real process change from other forms of change, such as scale drift — supporting clearer, more actionable signals for policy and practice.

A single methodological shift — separating dependence from measurement — strengthens SGP practice, expands to support NAEP-class growth inference, reinforces measurement validity, and sharpens SPC-style monitoring signals.

The Methodology

How Does It Work?

A brief look under the hood.

When We Have Longitudinal Data

With longitudinal data, we observe linked pairs $(x_i, y_i)$ — the same student measured at two time points.

Each dot is one student: prior score $X$ and current score $Y$
The bivariate scatter is the joint distribution $h(x,y)$
From this, we can compute growth directly — conditional distributions, gain scores/trajectories (with a suitable scale), $\ldots$

The joint density $h(x,y)$ encodes everything — the marginals and the dependence between them.

Decomposing the Bivariate Relationship

The joint density $h(x,y)$ can be decomposed into three mathematical pieces: the marginal density $f_X$ (left), the marginal density $f_Y$ (right), and the bivariate scatter encoding dependence between them (center).

The Cross-Sectional Reality

With cross-sectional data like TIMSS or NAEP, the marginals $f_X$ and $f_Y$ are observed — but the joint density $h(x,y)$ connecting them is unknown. The center panel is missing. We need a mathematical bridge to reconstruct the dependence.

The copula is that bridge. It models the dependence structure independently of the marginals, allowing us to reconstruct a valid joint distribution from cross-sectional data alone.

What Is a Copula?

Sklar’s Theorem (1959)

Any joint distribution $H$ with continuous marginals can be decomposed uniquely as:

$H(x,y) = C\big(F_X(x),\, F_Y(y)\big)$

where $C: [0,1]^2 \to [0,1]$ is a copula — a function that couples the marginals into a valid joint distribution.

Separate dependence from measurement. A copula describes how two distributions move together, independently of the scale on which either is measured.
Its inputs are ranks (percentiles), not scale scores — which is why a copula is remarkably stable across content areas, grades, and years, even when score scales differ.
This separation lets us borrow strength: learn the dependence from data where linkage exists, and apply it where only marginals are available.
Sklar’s theorem is the mathematical license for that separation — any valid joint distribution decomposes this way.
The same idea extends naturally to three or more variables (e.g., time points).

Sklar’s decomposition turns the “missing data” problem into a “missing dependence” problem — and copulas provide a principled way to fill that gap.
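
A minimal sketch of the coupling step, assuming a Frank copula with an illustrative parameter `theta = 8.0` and normal marginals whose means and standard deviations are made up (none of these values come from the study). It shows that the joint CDF is the copula evaluated at the two marginal CDFs, and that linearly rescaling a score changes the marginal but not the copula inputs.

```python
import math
from statistics import NormalDist


def frank_copula(u: float, v: float, theta: float) -> float:
    """Frank copula CDF C(u, v), valid for theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_cdf(x, y, F_X, F_Y, theta):
    """Sklar's representation: H(x, y) = C(F_X(x), F_Y(y))."""
    return frank_copula(F_X(x), F_Y(y), theta)


# Hypothetical marginals for prior-year and current-year scale scores.
prior = NormalDist(mu=500, sigma=50)
current = NormalDist(mu=520, sigma=55)
theta = 8.0  # illustrative dependence parameter, not an estimate

H = joint_cdf(480, 530, prior.cdf, current.cdf, theta)

# The same prior-year scores on a rescaled metric (x -> 2x + 10):
# the marginal changes, but the ranks fed to the copula do not.
prior_rescaled = NormalDist(mu=2 * 500 + 10, sigma=2 * 50)
H_rescaled = joint_cdf(2 * 480 + 10, 530, prior_rescaled.cdf, current.cdf, theta)
```

Swapping `frank_copula` for another family changes the joint distribution while leaving both marginals untouched, which is the modularity the decomposition provides.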

A Rich Framework for Dependence

Copulas provide a flexible toolkit for modeling dependence that goes far beyond correlation:

Parametric families — Gaussian, Student’s $t$ , Clayton, Gumbel, Frank — each encode different dependence shapes
Tail dependence — the $t$ and Clayton copulas capture the tendency for extreme scores to co-occur, a feature present in education data but absent from Gaussian models
Asymmetric dependence: Archimedean copulas such as Clayton (lower tail) and Gumbel (upper tail) allow lower-tail and upper-tail dependence to differ

The copula is not just a decomposition tool — it is a modular design choice. Swap in a different copula, and you get a different joint distribution with the same marginals.

Figure 1: Copula surfaces from $W$ (lower bound) through $\Pi$ (independence) and Gaussian to $M$ (upper bound).
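
The bounding surfaces in Figure 1 have simple closed forms. The sketch below checks the Fréchet–Hoeffding ordering $W \le C \le M$ on a grid, using a Clayton copula with an illustrative `theta = 2.0` as the interior member (the figure itself shows a Gaussian copula; Clayton is used here only because its CDF is closed-form).

```python
def W(u, v):
    """Lower Frechet-Hoeffding bound (countermonotonic)."""
    return max(u + v - 1.0, 0.0)


def Pi(u, v):
    """Independence copula."""
    return u * v


def M(u, v):
    """Upper Frechet-Hoeffding bound (comonotonic)."""
    return min(u, v)


def clayton(u, v, theta=2.0):
    """Clayton copula CDF for theta > 0 (illustrative parameter)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


grid = [i / 20 for i in range(1, 20)]
tol = 1e-12  # guard against floating-point noise at the bounds

ordered = all(
    W(u, v) - tol <= clayton(u, v) <= M(u, v) + tol
    for u in grid
    for v in grid
)
# Clayton with theta > 0 encodes positive dependence, so it also sits above Pi.
above_independence = all(
    Pi(u, v) <= clayton(u, v) + tol for u in grid for v in grid
)
```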

TAMP: Braun’s Original Insight

Braun (1988) introduced Trajectory Analysis of Matched Percentiles (TAMP) to investigate how growth might be measured on assessments like NAEP where no longitudinal student data exist.

Let $F_0$ and $F_1$ be the CDFs of scores at times 1 and 2
The equipercentile mapping links the two distributions:

$e(x) = F_1^{-1}\!\bigl[F_0(x)\bigr]$

$e(x)$ is the score at time 2 whose percentile rank matches that of $x$ at time 1
The resulting curve emulates a cohort’s growth trajectory from cross-sectional data alone
TAMP is invariant to monotone rescaling: if one cohort shows slower progress than another, that relation holds under every monotone transformation of the score scale
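
A minimal sketch of the equipercentile mapping built from two unlinked samples. The score lists are hypothetical, and the plain empirical CDF and order-statistic quantile are assumptions for illustration, not Braun's estimation procedure.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF F(x) = (# observations <= x) / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def quantile(sample):
    """Empirical quantile: smallest order statistic with ECDF >= p."""
    xs = sorted(sample)
    n = len(xs)

    def q(p):
        k = max(1, math.ceil(p * n))
        return xs[min(k, n) - 1]

    return q


def equipercentile(time1, time2):
    """e(x) = F1^{-1}[F0(x)] from two cross-sectional samples."""
    F0, F1_inv = ecdf(time1), quantile(time2)
    return lambda x: F1_inv(F0(x))


# Hypothetical, unlinked score samples at time 1 and time 2.
time1 = [410, 455, 480, 500, 515, 540, 560, 600]
time2 = [430, 470, 505, 520, 545, 565, 590, 640]

e = equipercentile(time1, time2)
matched = e(500)  # time-2 score at the same percentile rank as 500

# Monotone rescaling of the time-1 scale leaves the matched score unchanged.
e_squared = equipercentile([x * x for x in time1], time2)
matched_rescaled = e_squared(500 * 500)
```

With equal-size samples the mapping simply pairs the i-th order statistic at time 1 with the i-th order statistic at time 2.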

AI Breakthrough: TAMP = Comonotonic Copula

TAMP implicitly assumes perfect rank preservation: a student at the $p$-th percentile at time 1 is assumed to remain at the $p$-th percentile at time 2. The equipercentile rule is an identity mapping in percentile space.

Once you see TAMP as a copula, you see how to generalize it. Any copula $C$ supports a pseudo-longitudinal model. The comonotonic case is just one — the most extreme/degenerate — choice.
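
For reference, the identification can be written in one line using the Fréchet–Hoeffding upper bound $M$ and the TAMP notation $F_0$, $F_1$. This restates a standard result for continuous marginals:

$M(u,v) = \min(u,v) \;\Longrightarrow\; H(x,y) = \min\bigl\{F_0(x),\, F_1(y)\bigr\} \;\Longleftrightarrow\; Y = F_1^{-1}\!\bigl[F_0(X)\bigr] = e(X) \ \text{almost surely}$

Under $M$ the conditional CDF is a step function, $F_{V\mid U=u}(v) = \mathbf{1}\{v \ge u\}$, so every conditional percentile is either 0 or 100.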

TAMP equipercentile mapping for two groups, reproduced with enhancements from Braun (1988). The diagonal line $H$ is the identity mapping in percentile space.

From Copula to Student Growth Percentile

Copula CDF surface. The $t$-copula $C_{\nu,\rho}^{(t)}(u,v)$ is the joint CDF of $(U,V)$. The green slice fixes $v_0=0.3$. The tangent slope in the $u$-direction at $(u_0,v_0)=(0.5,0.3)$ equals

$\left.\frac{\partial C}{\partial u}\right|_{(u_0,v_0)} = F_{V\mid U=u_0}(v_0)$

Conditional CDF. For fixed $u_0$, the partial derivative $\partial C/\partial u$ is a one-dimensional CDF in $v$. At $v_0=0.3$ its height is $\approx 0.27$, so

$\mathrm{SGPc}(u_0,v_0) \approx 27$

the 27th conditional percentile.

Conditional PDF. Differentiating once more in $v$ yields the conditional density $f_{V\mid U=u_0}(v)$. The shaded area satisfies

$\int_0^{v_0}\! f_{V\mid U=u_0}(v)\,dv = P(V \le v_0 \mid U\!=\!u_0)$

matching the conditional CDF value and $\mathrm{SGPc}$ .

The copula encodes the Student Growth Percentile of any pair of pseudo-observations as 100 times the tangent-line slope of the copula surface in the $u$-direction.
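
A sketch of the tangent-slope identity, using a Frank copula because its CDF and partial derivative have closed forms that need only the standard library. The slides use a $t$-copula; the Frank family, the value `theta = 8.0`, and the function names are assumptions for illustration, so the resulting number is not expected to match the 27 above.

```python
import math


def frank_cdf(u, v, theta):
    """Frank copula CDF C(u, v), theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def frank_h(u, v, theta):
    """dC/du at (u, v): the conditional CDF P(V <= v | U = u)."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return math.exp(-theta * u) * b / (d + a * b)


def sgpc(u, v, theta):
    """Conditional percentile on the 0-100 scale."""
    return 100.0 * frank_h(u, v, theta)


theta = 8.0          # illustrative dependence parameter
u0, v0 = 0.5, 0.3    # the point used on the slide

# Numerical tangent slope of the copula surface in the u-direction.
eps = 1e-6
slope = (frank_cdf(u0 + eps, v0, theta) - frank_cdf(u0 - eps, v0, theta)) / (2 * eps)

growth_percentile = sgpc(u0, v0, theta)
```

The finite-difference `slope` and the closed-form `frank_h` agree to numerical precision, which is the identity the three panels illustrate.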

The Key Question

Two cases arise in practice:

Longitudinal data exist. Fit an empirical copula directly from paired observations.
Only cross-sectional data. Can we substitute an “off-the-shelf” parametric copula and still recover meaningful growth inferences?

The probability-integral transform (PIT) $U\!=\!F_X(x),\; V\!=\!F_Y(y)$ strips all metric structure, mapping every score distribution onto $[0,1]$. In this uniform space, educational data exhibit remarkably similar dependence structures regardless of content area, grade span, or subgroup.

Our bet: Once the PIT removes metric structure from the scales, a simple parametric copula is a good-enough proxy for the unknown empirical copula — especially for group-level summaries with $N \geq 2{,}500$ .

From scores to pseudo-observations to copula: the marginal transformation via the probability-integral transform with empirical and parametric copula contours.
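
A minimal sketch of the empirical probability-integral transform. The `rank / (n + 1)` scaling is a common convention for pseudo-observations and is an assumption here; the scores are made up. Because only ranks enter, any strictly increasing rescaling of the scores yields identical pseudo-observations.

```python
import math


def pseudo_obs(scores):
    """Map scores into (0, 1) via rank / (n + 1); assumes no ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


# Hypothetical scale scores for seven students.
scale_scores = [512, 430, 601, 555, 478, 530, 467]

u = pseudo_obs(scale_scores)
# A strictly increasing transformation of the scale (here, the log)
# leaves the ranks, and therefore the pseudo-observations, unchanged.
u_log = pseudo_obs([math.log(s) for s in scale_scores])
```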

Which Copula? A Sensitivity Analysis

966 longitudinal conditions across 4 datasets, 4 content areas, and year spans 1–4. Six copula families fitted to each; model selected by AIC with bootstrap goodness-of-fit.

Family	Best (AIC)	Share
t	614	63.6%
frank	297	30.7%
gumbel	35	3.6%
gaussian	20	2.1%
comonotonic	0	0.0%

The $t$-copula dominates every year span (58–65%) and content area. It beats the Gaussian in 98% of head-to-head comparisons (mean $\Delta$AIC = 169–241).

Compute: 5,796 fits $\times$ bootstrap $\approx$ 39 core-hours, parallelised across 192 vCPUs (AWS m8g.48xlarge)

Copula family selection by content area and year span for 966 longitudinal conditions. Each dot is one condition; colour indicates AIC-best family.

The $t$-copula is AIC-best in about 64% of conditions and, together with the Frank copula, in more than 94%. The canonical copula is therefore an empirically justified default, not an arbitrary choice.
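
The selection rule and the share arithmetic can be sketched as follows. The win counts are taken from the table above; the log-likelihoods in `fits` are hypothetical placeholders for a single condition, and the parameter counts assume bivariate copulas (two parameters for the $t$, one for each of the others).

```python
def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


# Hypothetical maximised log-likelihoods for one condition:
# family -> (log-likelihood, number of copula parameters).
fits = {
    "t": (1520.4, 2),
    "gaussian": (1431.0, 1),
    "frank": (1498.7, 1),
    "gumbel": (1402.3, 1),
}
best = min(fits, key=lambda fam: aic(*fits[fam]))
delta_aic = aic(*fits["gaussian"]) - aic(*fits["t"])  # t vs Gaussian

# Win counts from the table (966 conditions in total).
wins = {"t": 614, "frank": 297, "gumbel": 35, "gaussian": 20, "comonotonic": 0}
total = sum(wins.values())
share = {fam: round(100 * k / total, 1) for fam, k in wins.items()}
t_plus_frank = round(100 * (wins["t"] + wins["frank"]) / total, 1)

# Six families per condition gives the number of fits quoted under "Compute".
n_fits = total * 6
```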

Longitudinal Inference Without Longitudinal Data

Observe unlinked marginals. We have the prior-year PDF $f_U(u)$ and current-year PDF $f_V(v)$ — but no student-level pairs $(u_i, v_i)$ . The PIT yields marginal pseudo-observations $U$ and $V$ whose joint dependence is unknown.

Estimate the growth regime. Search the Beta family $\mathcal{H} = \{H_{m,\kappa}\}$ for the regime $\widehat{H}_S$ that minimises the Wasserstein distance between the observed $F_V^{\mathrm{obs}}$ and the predicted current-year CDF induced by the canonical copula:

$\widehat{H}_S = \arg\min_{H \in \mathcal{H}} W_1\!\bigl(F_V^{\,\mathrm{obs}},\; F_H\bigr)$

Verify the fit. The forward CDF check compares the observed current-year CDF (solid) against the CDF predicted by the estimated regime passed through the canonical copula kernel (dashed). Close agreement confirms the regime recovers the marginal.

From marginals to growth: the canonical copula provides the dependence kernel; the minimum-distance regime $\widehat{H}_S$ recovers mean SGPc — all without student-level linkage.
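
A toy sketch of the minimum-distance step, under loudly simplified assumptions that are not the SGPc package implementation: a Frank copula with `THETA = 8.0` stands in for the canonical $t$-copula, the one-parameter subfamily Beta($a$, 1) stands in for the two-parameter family $H_{m,\kappa}$, prior-year ranks are taken as uniform on a midpoint grid, and the "observed" marginal is synthetic. Because the observed marginal is generated from the same model, recovering `a = 1.5` is only a self-consistency check.

```python
import math

THETA = 8.0  # illustrative Frank parameter (assumption)


def frank_h_inv(p, u, theta=THETA):
    """Conditional quantile: the v with P(V <= v | U = u) = p."""
    d = math.expm1(-theta)
    b = p * d / (p + (1.0 - p) * math.exp(-theta * u))
    return -math.log1p(b) / theta


def predicted_v(a, n=40):
    """Sorted current-year ranks implied by the regime P ~ Beta(a, 1)."""
    mid = [(i + 0.5) / n for i in range(n)]
    # w ** (1 / a) is the Beta(a, 1) quantile function evaluated at w.
    return sorted(frank_h_inv(w ** (1.0 / a), u) for u in mid for w in mid)


def w1(xs, ys):
    """1-Wasserstein distance between two equal-size sorted samples."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


observed = predicted_v(1.5)  # synthetic "observed" current-year marginal

candidates = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
distances = {a: w1(observed, predicted_v(a)) for a in candidates}
a_hat = min(distances, key=distances.get)

# The mean of Beta(a, 1) is a / (a + 1); scale to the 0-100 SGPc metric.
mean_sgpc_hat = 100 * a_hat / (a_hat + 1)
```

Here `a = 1.0` corresponds to a uniform regime (typical growth, mean SGPc of 50), and larger `a` shifts the conditional percentiles upward.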

Accuracy: Cross-Condition Validation

239 subgroups across multiple conditions, datasets, content areas, year spans, and pool types.

$r = 0.995$ between inferred and true mean SGPc
MAE = 0.40 SGPc points
Dashed = identity; dotted = $\pm 2$ SGPc envelope

The canonical $t$ copula approach recovers subgroup growth with median absolute error under 0.5 percentile points across the full validation battery.

At the individual student level (37.4M observations):

Best-fit vs canonical: $r = 0.993$ , MAD = 2.2 pts
Empirical vs canonical: $r = 0.979$ , MAD = 3.8 pts
Comonotonic (TAMP): $r = 0.811$ , MAD = 26.1 pts

Cross-condition validation: inferred vs true mean *SGPc* across 239 subgroups. Point size reflects subgroup $N$ ; shape reflects pool type (district vs cluster); color reflects content area.

Precision: The Linkage Premium

What do we lose without student-level linkage?

For subgroups of $N \approx 2{,}500$ (typical of NAEP/TIMSS state samples), with all values in SGPc units (0–100):

Metric	Paired (linked)	Independent (unlinked)	Ratio
SE (mean SGPc)	0.60	1.48	2.5 $\times$
CI width (mean)	2.16	5.65	2.6 $\times$
SE (median SGPc)	0.79	1.96	2.5 $\times$
CI width (median)	2.91	7.48	2.6 $\times$

Linkage buys a $\sim 2.5 \times$ precision gain. But even the unlinked mean SGPc CI width ( $\sim5.7$ pts) is actionable for policy comparisons at scale. The price of cross-sectionality is quantifiable and manageable.
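
The ratios in the table can be reproduced directly from its two data columns. The sample-size calculation at the end rests on an added assumption that is not stated in the slides: that the standard error shrinks like $1/\sqrt{N}$.

```python
# (paired, unlinked) values copied from the table, in SGPc units.
table = {
    "SE (mean SGPc)": (0.60, 1.48),
    "CI width (mean)": (2.16, 5.65),
    "SE (median SGPc)": (0.79, 1.96),
    "CI width (median)": (2.91, 7.48),
}
ratios = {
    metric: round(unlinked / paired, 1)
    for metric, (paired, unlinked) in table.items()
}

# Assumption: SE is proportional to 1 / sqrt(N). Matching the paired SE
# without linkage then needs roughly ratio ** 2 times as many students.
se_ratio = 1.48 / 0.60
n_multiplier = se_ratio ** 2
n_needed = round(2500 * n_multiplier)
```

Under that assumption, an unlinked sample would need to be about six times larger to match the precision of a linked sample of 2,500.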

Copula parameter stability:

The $t$ copula is recommended across all year spans:

Year span	$\tau$ (median)	$\rho$ (median)	$df$ (median)
1	0.638	0.842	25.6
2	0.601	0.810	28.7
3	0.574	0.784	28.2
4	0.553	0.764	27.4

Dependence attenuates with time span as expected, but the structure remains remarkably stable.
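
For elliptical copulas such as the Gaussian and the $t$, Kendall's $\tau$ and the correlation parameter are linked by $\rho = \sin(\pi\tau/2)$. The sketch below applies that standard relation to the tabled medians. Medians of $\tau$ and $\rho$ need not satisfy it exactly, so small gaps are expected.

```python
import math


def rho_from_tau(tau):
    """Elliptical copulas (Gaussian, t): rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


# year span -> (median tau, median rho), copied from the table.
table = {
    1: (0.638, 0.842),
    2: (0.601, 0.810),
    3: (0.574, 0.784),
    4: (0.553, 0.764),
}
implied = {span: round(rho_from_tau(tau), 3) for span, (tau, _) in table.items()}
max_gap = max(abs(implied[span] - rho) for span, (_, rho) in table.items())
```

The implied values track the tabled $\rho$ column closely, which is consistent with a $t$-copula describing the dependence at every year span.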

Summary

Cross-sectional data are sufficient for group-level growth inference. The canonical $t$ -copula supplies the dependence kernel; the minimum-distance regime recovers mean SGPc from marginals alone — no student-level linkage required.
The canonical copula is empirically grounded. Across 966 conditions, 4 datasets, and 4 content areas, the $t$-copula is selected in 64% of cases, and together with the Frank copula it covers >94%. Dependence parameters attenuate smoothly with year span ($\tau$: 0.64 $\to$ 0.55).
Accuracy is high; the precision cost is quantifiable. Inferred vs true mean SGPc: $r = 0.995$ , MAE $= 0.40$ pts. Without linkage, confidence intervals widen $\sim\!2.5\times$ — a manageable price for programs (NAEP, TIMSS, PISA) where paired data simply do not exist.

Growth measurement no longer requires longitudinal data — it requires marginals, a canonical copula, and a minimum-distance fit.

Next Steps

In the next few weeks.

Run SGPc alongside existing SGP analyses on state longitudinal datasets to surface what copulas reveal that standard SGP cannot:
1. Characterize the full dependence structure — not just rank correlation, but the shape of how ranks move together across grades/years & content areas (e.g., mathematics vs ELA).
2. Diagnose tail behavior — does the system retain its high performers? does it release students from the bottom?
3. Read the result as a policy signal:
  - Desirable: not sticky at the bottom, sticky at the top — low-scoring students have upward mobility; high-scoring students sustain their performance.
  - Undesirable: sticky at the bottom, porous at the top — entrenched low performance paired with instability among high performers.

Over the coming year.

Apply the framework to NAEP and TIMSS. Produce the first population- and subpopulation-level growth inferences for flagship assessments that were designed without student-level longitudinal linkage — the natural proving ground for this methodology.

These steps walk the Four NCIEA Connections in sequence: extending SGP (1), unlocking growth inference for NAEP- and TIMSS-class assessments (2), exercising the ordinally defensible measurement foundation (3), and opening the door to SPC-style monitoring of tail behavior (4). The work sits squarely within the Center’s mission, within both its near- and long-term agendas.

Damian W. Betebenner
National Center for the Improvement of Educational Assessment
dbetebenner@nciea.org

Thank you

Appendix

Longitudinal Inference Without Longitudinal Data

Technical Background & Supplemental Material

Mathematical Conventions

The development is distribution-function first — CDFs are the primitive objects, not densities.

Key notation:

Symbol	Meaning
$F_X(x)$	CDF of $X$
$f_X(x)$	Density of $X$
$F_X^{-1}(u)$	Quantile function
$C(u,v)$	Copula
$U, V$	Pseudo-observations on $[0,1]$
$P$	Latent conditional percentile
$H_S(p)$	Subgroup growth regime CDF

Why CDF-first?

Every real-valued random variable has a CDF; densities require additional regularity
The probability-integral transform: if $X$ is continuous with CDF $F_X$ , then $U = F_X(X) \sim \text{Uniform(0,1)}$
Sklar’s representation is itself CDF-based: $H(x_1,\dots,x_d) = C(F_1(x_1),\dots,F_d(x_d))$
The resulting mathematical development is cleaner and easier to follow

Extension: Implications for Psychometrics

In Order We Trust

If a score is defensible only up to strictly increasing transformations (ordinal meaning), then:

The copula is the maximal invariant of dependence available under that measurement regime.

Two joint distributions are related by coordinatewise strictly increasing transformations if and only if they share the same copula.

This places the classical “permissible statistics” debate (Stevens, Luce & Suppes) into direct contact with copula theory:

The measurement level determines the transformation group
The transformation group determines what can be meaningfully inferred
Under ordinal measurement, dependence reduces canonically to the copula

What this means

Copulas are not merely compatible with rank-based analysis — they are the dependence-theoretic core of it.

— Schweizer & Wolff (1981)

For educational measurement:

Growth claims using scale score differences smuggle in interval assumptions the data don’t support
Copula based growth (SGPc) is the natural coordinate system for ordinally defensible longitudinal inference
This isn’t a technical refinement — it’s a foundational measurement statement

Copula based growth is the measurement-theoretic consequence of taking ordinal meaning seriously. That is exactly the separation-of-concerns the Sklar factorization is trying to teach: dependence lives in $C$ ; measurement choices live in the margins.

Extension: Causal Modeling with Copulas

Copulas as a Tool for Causal Inference

Omitted variable bias (OVB) is the core problem of causal inference in non-experimental research. Standard remedies — covariate adjustment, instrumental variables — require external information that is often unavailable or unreliable.

Park & Gupta (2012) introduced a copula correction that requires neither covariates nor instruments:

Specify the dependence between an endogenous regressor $X$ and the structural error $\varepsilon_Y$ via a Gaussian copula
Add the copula correction term $X^{*} = \Phi^{-1}(F_X(X))$ as an additional regressor
This “absorbs” the confounding, allowing unbiased estimation of the causal effect $\beta$

The key identification assumption is mild: $X$ must be non-normally distributed — a condition routinely satisfied by educational assessment data.
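
A simulation sketch of the added-regressor form of the correction with a single endogenous regressor. Everything here is an illustrative assumption: the exponential marginal for $X$, the confounding strength `R`, the sample size, the seed, and the rank-based estimate of $F_X$. It is not a reproduction of Park & Gupta's estimator or results.

```python
import math
import random
from statistics import NormalDist

random.seed(1)
N, BETA, R = 5000, 1.0, 0.5  # sample size, true effect, confounding strength
std = NormalDist()

x, y = [], []
for _ in range(N):
    z1 = random.gauss(0, 1)
    # Structural error correlated with z1 (and hence with x) at strength R.
    eps = R * z1 + math.sqrt(1 - R * R) * random.gauss(0, 1)
    xi = -math.log(1.0 - std.cdf(z1))  # exponential marginal: non-normal X
    x.append(xi)
    y.append(BETA * xi + eps)

# Correction term: Phi^{-1}(F_X(X)), with F_X estimated by rank / (N + 1).
order = sorted(range(N), key=lambda i: x[i])
x_star = [0.0] * N
for rank, i in enumerate(order, start=1):
    x_star[i] = std.inv_cdf(rank / (N + 1))


def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


xc, zc, yc = centered(x), centered(x_star), centered(y)

# Naive OLS slope of y on x (biased by the omitted confounding).
b_naive = dot(xc, yc) / dot(xc, xc)

# OLS slope on x when x_star is added as a second regressor (Cramer's rule).
det = dot(xc, xc) * dot(zc, zc) - dot(xc, zc) ** 2
b_corrected = (dot(xc, yc) * dot(zc, zc) - dot(xc, zc) * dot(zc, yc)) / det
```

The correction works only because $X$ is non-normal: if $X$ were Gaussian, `x` and `x_star` would be nearly collinear and `det` would approach zero.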

Relevance to Educational Measurement

The copula decomposition that separates marginals from dependence is the same machinery used here for growth inference — now applied to the endogeneity problem.

Why this matters for us:

Assessment scores are rarely Gaussian, so the identification condition is naturally met
The copula correction can be applied to any observational regression in education — program effects, teacher value-added, policy impacts
Falkenström, Park & McIntosh (2022) confirmed via simulation that the method is robust across sample sizes and effect sizes when skewness is moderate

The copula is not just a tool for modeling dependence — it is a tool for protecting causal claims from unobserved confounding.

Extension: Statistical Process Control

In Signals We Trust

SPC and copula based growth are complements, not competitors:

SPC provides the logic of regime detection (common-cause vs special-cause variation)
SGPc provides an ordinally meaningful coordinate system for monitoring
The copula decomposition clarifies what kind of signal occurred

A regime departure can arise through:

Shift: a change in conditional location
Spread: a change in conditional dispersion
Coupling: a change in the dependence structure

Scale drift as a special-cause problem

If scale score diagnostics move while copula/SGPc diagnostics remain stable, the alarm is living in the units, not the order.

If the copula itself moves, the problem is no longer mere relabeling of the scale.

The copula–SPC lens:

SPC tells us whether the regime moved
The copula decomposition tells us how
SGPc provides the measurement-invariant coordinate in which those questions can be asked
The contrast between margin-sensitive and copula-sensitive diagnostics treats scale drift as a special-cause problem
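
A minimal sketch of the regime-detection step, using a Shewhart individuals chart on a series of mean SGPc values. The series is hypothetical, and the choice of an individuals chart with 3-sigma limits (sigma estimated from the mean moving range divided by 1.128) is an assumption about how such monitoring might be set up.

```python
def individuals_limits(baseline):
    """Individuals chart limits: centre +/- 3 * (mean moving range / 1.128)."""
    centre = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return centre - 3 * sigma_hat, centre + 3 * sigma_hat


# Hypothetical yearly mean SGPc values for one subgroup (baseline period).
baseline = [50.2, 49.6, 50.5, 49.9, 50.1, 49.7, 50.3, 49.8]
# Hypothetical new observations to be judged against the baseline.
new_points = [50.4, 53.9]

lcl, ucl = individuals_limits(baseline)
# True marks a special-cause signal (point outside the control limits).
signals = [not (lcl <= value <= ucl) for value in new_points]
```

Running the same chart on a scale-score statistic alongside this one gives the margin-sensitive versus copula-sensitive contrast described above.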

Selected References

Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51.

Braun, H. I. (1988). A new approach to avoiding problems of scale in interpreting trends in mental measurement data. Journal of Educational Measurement, 25(3), 171–191.

Falkenström, F., Park, S., & McIntosh, C. N. (2022). Using copulas to enable causal inference from non-experimental data: Tutorial and simulation studies. Psychological Methods, 27(6), 1009–1039.

Nelsen, R. B. (2006). An Introduction to Copulas (2nd ed.). Springer.

Park, S., & Gupta, S. (2012). Handling endogenous regressors by joint estimation using copulas. Marketing Science, 31(4), 567–586.

Schweizer, B., & Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. Annals of Statistics, 9(4), 879–885.

Sklar, A. (1959). Fonctions de répartition à $n$ dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8, 229–231.