Knowledge Grounded Conversational Academic Guidance via OCR Enabled Hybrid Retrieval, Retrieval Augmented Generation, and Neuro Symbolic Validation
Abstract
Academic institutions manage large volumes of information, including regulations, schedules, policies, and notices, that students and staff frequently need to access. Existing conversational systems rely on large language models that generate responses from internal knowledge, which leads to inaccurate or hallucinated answers, especially when the source material consists of scanned documents and legacy records. This paper proposes a knowledge-centric conversational framework that grounds responses directly in authoritative institutional documents rather than in model memory. The framework combines optical character recognition (OCR) for extracting text from scanned and image-based documents, hybrid retrieval using both semantic similarity and keyword matching, and retrieval-augmented generation (RAG) to produce context-aware responses. A rule-based and neuro-symbolic validation module further checks each response for policy compliance and factual consistency. The system was evaluated on a heterogeneous dataset of 1,200 institutional documents spanning structured, unstructured, semi-structured, and scanned formats. The proposed framework achieves 89.6% response accuracy, 0.86 precision, 0.84 recall, and an F1 score of 0.85, and reduces the hallucination rate to 4.1%. Compared to existing baselines, including Self-RAG and CRAG, this approach delivers significantly improved reliability, contextual accuracy, and traceability for real-time academic guidance systems. The implementation used in this study is publicly available at DOI: https://doi.org/10.5281/zenodo.19230595.
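The hybrid retrieval step described in the abstract, which combines semantic similarity with keyword matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `keyword_overlap`, `hybrid_rank`), the weighting parameter `alpha`, the token-overlap keyword score, and the hand-made two-dimensional vectors are all assumptions chosen for illustration. A real system would presumably take its vectors from an embedding model and could use a stronger lexical scorer.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query tokens that also occur in the text (assumed keyword score)."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score.

    docs is a list of (doc_id, text, vector) tuples; alpha is a hypothetical
    weighting parameter, not a value taken from the paper.
    """
    scored = []
    for doc_id, text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_overlap(query, text)
        scored.append((doc_id, score))
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy corpus with hand-made vectors standing in for real embeddings (assumption).
docs = [
    ("exam_policy", "exam schedule and grading policy", [1.0, 0.0]),
    ("hostel_notice", "hostel fee notice", [0.0, 1.0]),
]
ranked = hybrid_rank("exam schedule", [0.9, 0.1], docs)
print(ranked[0][0])
```

In this sketch the top-ranked passages would then be passed to the generator as grounding context. A linear combination is only one way to fuse the two signals; rank-based fusion is a common alternative.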
Citation Information
@article{veerababureddy2026,
title={Knowledge Grounded Conversational Academic Guidance via OCR Enabled Hybrid Retrieval, Retrieval Augmented Generation, and Neuro Symbolic Validation},
author={Veerababu Reddy and Krishnaveni Kaki and Bharath Kesineni and Iswarya Annapureddy and Harini Kanna and Gargesh Chalamala},
journal={Research Square},
year={2026},
doi={https://doi.org/10.21203/rs.3.rs-9481282/v1}
}