1 result found
As Large Language Models (LLMs) are increasingly deployed in autonomous, high-stakes environments, the fragility of current Reinforcement Learning from Human Feedback (RLHF) alignment protocols remain...
Machine Psychology
AI Psychometrics
Large Language Models (LLMs)
Ontological Dissonance
AI Alignment Constraints
Cognitive Narrowing
Reinforcement Learning from Human Feedback (RLHF)
SinoXiv