Article 2026-04-22 posted v1

Mechanism Design for Incentivizing User Feedback for Large Language Models

Zhonglin Liu The University of Hong Kong
Jussi Keppo National University of Singapore
Murari Mandal Kalinga Institute of Industrial Technology
Hong Ming Tan National University of Singapore

Abstract

Large language models require high-quality human feedback for alignment and fine-tuning, yet platforms face the fundamental challenge of incentivizing valuable contributions while screening out potentially harmful feedback from non-experts. We develop a comprehensive mechanism design framework for this quality control problem, modeling a platform that interacts with heterogeneous users who differ in their ability to provide helpful feedback. High-type users (experts) generate valuable training data with high probability, while low-type users (non-experts) are more likely to provide feedback that degrades model performance. Our theoretical analysis characterizes optimal reward-and-penalty mechanisms coupled with costly verification across different equilibrium regimes. We identify a critical boundary condition that partitions the parameter space into normal separation (where high-quality users dominate), mixed strategy, and reverse screening (where low-quality users dominate) regions. Verification serves as the primary strategic instrument, with optimal mechanisms featuring full verification under moderate costs and selective verification as expenses escalate. Importantly, optimal mechanisms exhibit a penalty-constrained structure where deterrent effects outweigh reward incentives. Counterintuitively, our analysis reveals an inverted-U relationship between population quality and platform profitability, with peak profits at moderate rather than maximal creator quality levels. This emerges because high-quality populations trigger optimally reduced verification intensity, where cost savings from selective screening are insufficient to offset foregone verification benefits. We validate our theoretical predictions through simulations using a bigram language model, confirming substantial quality differentiation between user types. The framework provides actionable insights suggesting that creator diversity can be more profitable than pursuing exclusively high-quality participants, and that verification capabilities critically determine optimal screening mechanisms in AI training environments.
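
The screening idea in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: the payment rule (reward or penalty applied only when a submission is verified, nothing paid otherwise), the names `Mechanism`, `expected_payoff`, and `break_even_quality`, and all numeric values are assumptions chosen for illustration. It shows how a penalty that is large relative to the reward can make participation attractive to a high-type user and unattractive to a low-type user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mechanism:
    """Hypothetical reward-and-penalty scheme with probabilistic verification."""
    reward: float       # paid when a verified submission turns out helpful
    penalty: float      # charged when a verified submission turns out harmful
    verify_prob: float  # probability that the platform verifies a submission


def expected_payoff(p_helpful: float, m: Mechanism, effort_cost: float = 0.0) -> float:
    """Expected payoff to a user whose feedback is helpful with probability p_helpful.

    Assumption: unverified submissions earn nothing and cost nothing beyond effort.
    """
    payoff_if_verified = m.reward * p_helpful - m.penalty * (1.0 - p_helpful)
    return m.verify_prob * payoff_if_verified - effort_cost


def break_even_quality(m: Mechanism) -> float:
    """Helpfulness probability at which the expected payoff is zero (zero effort cost)."""
    return m.penalty / (m.reward + m.penalty)


# Illustrative numbers only; none of these values come from the paper.
mechanism = Mechanism(reward=1.0, penalty=3.0, verify_prob=0.5)
high_type_payoff = expected_payoff(0.9, mechanism)  # expert: helpful 90% of the time
low_type_payoff = expected_payoff(0.4, mechanism)   # non-expert: helpful 40% of the time

print(f"break-even quality: {break_even_quality(mechanism):.2f}")
print(f"high type payoff:   {high_type_payoff:+.2f}")
print(f"low type payoff:    {low_type_payoff:+.2f}")
```

Under these assumed numbers, only users who are helpful more than 75% of the time expect a non-negative payoff, so the high type participates and the low type stays out. The verification cost borne by the platform, which drives the regimes described in the abstract, is deliberately left out of this sketch.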

Citation Information

@article{zhonglinliu2026,
  title={Mechanism Design for Incentivizing User Feedback for Large Language Models},
  author={Zhonglin Liu and Jussi Keppo and Murari Mandal and Hong Ming Tan},
  journal={Research Square},
  year={2026},
  doi={https://doi.org/10.21203/rs.3.rs-8771074/v1}
}