Introduction
AI alignment, the task of ensuring that artificial intelligence systems act in accordance with human values, faces a fundamental challenge: human values are not static. What societies consider morally acceptable changes across generations, sometimes dramatically. Slavery was widely accepted for millennia before abolition movements succeeded in the 19th century. Women’s suffrage, interracial marriage, and LGBTQ+ rights have all undergone rapid shifts in public opinion within living memory.
Current approaches to AI alignment typically treat values as fixed targets. Reinforcement Learning from Human Feedback (RLHF) trains systems on contemporary human preferences (Christiano et al., 2017). Constitutional AI specifies in advance a set of principles the system should follow (Bai et al., 2022). These approaches implicitly assume that we know which values to align with: either our current values or a set of principles we can articulate in advance.
But if values evolve, which values should we use? Our current preferences may be biased or incomplete, and they are likely to change. Philosophers have proposed aligning AI to “idealized” values: what fully rational, fully informed agents would want (MacAskill, 2014). This approach is philosophically attractive but lacks empirical grounding: how would we compute what idealized agents would choose?
We propose a different framing: treat value alignment partly as a forecasting problem. Rather than asking “what are the correct values?”, we ask “where are human values heading?” This recasts part of an intractable philosophical question as an empirical one. If AI systems can predict how values evolve by learning patterns from historical data, those predictions could inform alignment targets.
This approach offers several advantages:
- Empirically testable: Unlike philosophical idealization, forecasting accuracy can be measured against actual value changes.
- Uncertainty-aware: Forecasts can be issued with prediction intervals, acknowledging that we do not know future values with certainty.
- Heterogeneity-preserving: Rather than assuming that humanity converges on a single value system, we can forecast the distribution of values across populations.
- Temporally grounded: We align to projected post-reflection values rather than to current, possibly transient preferences.
In this paper, we test the core empirical premise: can language models forecast human value evolution? We use the General Social Survey (GSS), which has tracked American public opinion since 1972, as ground truth. The GSS provides more than 50 years of data on attitudes toward controversial topics such as same-sex relationships, marijuana legalization, and abortion, enabling rigorous historical validation.
Our contributions are:
- We establish a methodology for testing value forecasting using temporal holdout: train on data before a cutoff year, predict values after it, and validate against actual survey results.
- We show that LLMs outperform time-series baselines (linear extrapolation, ARIMA, exponential smoothing) by 2.2× on historical value prediction.
- Using a methodologically clean test, in which GPT-4o predicts GSS 2024 data collected after its training cutoff, we demonstrate a critical failure mode: the model predicted continued liberalization while actual values reversed.
- We analyze the reversal, finding that it occurred across demographic groups but was concentrated among Republicans and young adults, consistent with backlash dynamics.
- We discuss implications for AI alignment: value forecasting may require predicting not just trends but inflection points where progress triggers counter-mobilization.
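The temporal-holdout protocol listed among the contributions can be sketched in Python. This is a minimal, dependency-free illustration under stated assumptions, not the evaluation code used in this paper: the toy series, the 2018 cutoff, mean absolute error as the metric, the smoothing weight alpha = 0.5, and all function names are hypothetical, and ARIMA is omitted to avoid external libraries. The direction_correct check illustrates the failure mode described above: a trend-following forecast predicts continued increase while the held-out values reverse.

```python
"""Temporal-holdout evaluation of simple trend baselines (illustrative sketch)."""
from typing import Dict, List, Tuple

# Hypothetical representation: survey year -> share of respondents holding an attitude.
Series = Dict[int, float]


def split_at_cutoff(series: Series, cutoff: int) -> Tuple[Series, Series]:
    """Years up to and including the cutoff are visible; later years are held out."""
    train = {y: v for y, v in series.items() if y <= cutoff}
    test = {y: v for y, v in series.items() if y > cutoff}
    return train, test


def linear_extrapolation(train: Series, years: List[int]) -> Series:
    """Fit an ordinary least-squares line to the training years and extend it."""
    xs = sorted(train)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(train[x] for x in xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (train[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Clamp to [0, 1] because the target is a proportion.
    return {y: min(1.0, max(0.0, intercept + slope * y)) for y in years}


def simple_exponential_smoothing(train: Series, years: List[int], alpha: float = 0.5) -> Series:
    """Smooth the training values, then forecast the final level for every held-out year."""
    xs = sorted(train)
    level = train[xs[0]]
    for x in xs[1:]:
        level = alpha * train[x] + (1 - alpha) * level
    return {y: level for y in years}


def mean_absolute_error(pred: Series, actual: Series) -> float:
    """Average absolute gap between forecast and observed values over held-out years."""
    return sum(abs(pred[y] - actual[y]) for y in actual) / len(actual)


def direction_correct(train: Series, pred: Series, actual: Series) -> bool:
    """True if the forecast gets the sign of change since the last training year right."""
    last = train[max(train)]
    final = max(actual)
    return (pred[final] - last) * (actual[final] - last) > 0


if __name__ == "__main__":
    # Toy values for illustration only; they are NOT GSS estimates.
    # The series rises steadily through 2018, then reverses.
    toy = {2000: 0.30, 2002: 0.32, 2004: 0.34, 2006: 0.36, 2008: 0.38,
           2010: 0.40, 2012: 0.42, 2014: 0.44, 2016: 0.46, 2018: 0.48,
           2021: 0.47, 2022: 0.45, 2024: 0.43}
    train, test = split_at_cutoff(toy, cutoff=2018)
    years = sorted(test)
    linear = linear_extrapolation(train, years)
    smoothed = simple_exponential_smoothing(train, years)
    print("linear MAE:", round(mean_absolute_error(linear, test), 4))  # 0.0733
    print("SES MAE:", round(mean_absolute_error(smoothed, test), 4))  # 0.0167
    print("linear direction correct:", direction_correct(train, linear, test))  # False
```

Because simple exponential smoothing issues a flat forecast, its lower error on this toy series comes from the lag of its smoothed level, not from any anticipation of the reversal; the linear baseline, which follows the pre-cutoff trend, gets the direction wrong. A real evaluation would substitute held-out survey waves for the toy test set and could score ARIMA or language-model forecasts as additional prediction dictionaries.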
The remainder of this paper is organized as follows. Section 2 reviews related work on moral change, axiological futurism, and LLM-based survey prediction. Section 3 describes our data and methods. Section 4 presents results. Section 5 discusses implications and limitations. Section 6 concludes.
- Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.
- Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
- MacAskill, W. (2014). Normative uncertainty [Doctoral thesis]. University of Oxford.