Research Article 2026-04-22 posted v1

NCERTQABench: A Large-Scale Bilingual Question Answering Dataset Grounded in Indian School Curriculum with Fine-tuned Language Model Evaluation

Abhinav Saxena, Motilal Nehru National Institute of Technology
Sarsij Tripathi, Motilal Nehru National Institute of Technology

Abstract

India’s school education system revolves around the National Council of Educational Research and Training (NCERT) textbooks, yet the research community has largely overlooked them as a source for structured question-answering datasets. We address this gap with NCERTQABench, a collection of 222,880 question-answer pairs drawn from NCERT textbooks spanning Grades 6 to 12. The dataset covers Mathematics, Science, Social Science, Commerce, English literature, and Hindi literature, making it both curriculum-broad and bilingual (English: 78.7%, Hindi: 21.3%). To probe how much domain-specific training actually matters, we fine-tune Qwen2.5-3B-Instruct via Quantized Low-Rank Adaptation (QLoRA) and compare it against the same model without fine-tuning (the zero-shot baseline) on a held-out evaluation set of 6,042 samples (4,712 English, 1,330 Hindi). The fine-tuned model reaches a ROUGE-L score of 0.4373 on English questions against 0.2017 for the zero-shot baseline, a 2.17× improvement. On Hindi questions, evaluated with character-level ROUGE-L (which handles Devanagari script correctly), the fine-tuned model scores 0.5303 versus 0.4266 for the baseline. The zero-shot model fails to produce a single verbatim match (0% Exact Match), while the fine-tuned model reaches 0.93% on English and 0.30% on Hindi. We release the dataset, trained model weights, and evaluation scripts publicly.
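
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' released evaluation script: the function names are hypothetical, the F1 form of ROUGE-L (equal weight on precision and recall) is an assumption, and "character-level" is taken here to mean splitting on Unicode code points rather than grapheme clusters.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, prediction, char_level=False):
    """ROUGE-L F1 over whitespace tokens, or over characters if char_level.

    Assumption: F1 weighting; the paper does not state which variant it uses.
    """
    ref = list(reference) if char_level else reference.split()
    pred = list(prediction) if char_level else prediction.split()
    if not ref or not pred:
        return 0.0
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(reference, prediction):
    """Verbatim match after trimming surrounding whitespace (an assumption)."""
    return reference.strip() == prediction.strip()


# Word-level scoring for English, character-level for Hindi.
english_score = rouge_l_f1("a b c d", "a c d")
hindi_score = rouge_l_f1("भारत", "भारत", char_level=True)

# The reported English improvement is the ratio of the two ROUGE-L scores.
improvement = round(0.4373 / 0.2017, 2)
```

Character-level scoring avoids depending on a word tokenizer for Devanagari text, at the cost of rewarding partial overlaps inside words, so Hindi and English scores computed this way are not directly comparable.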

Citation Information

@article{abhinavsaxena2026,
  title={NCERTQABench: A Large-Scale Bilingual Question Answering Dataset Grounded in Indian School Curriculum with Fine-tuned Language Model Evaluation},
  author={Abhinav Saxena and Sarsij Tripathi},
  journal={Research Square},
  year={2026},
  doi={https://doi.org/10.21203/rs.3.rs-9334872/v1}
}