Research Article 2026-04-21 posted v1

SuperCluster: Learning Prediction Prototypes via Target-Guided Cross-Attention Clustering

Aaron Danielson The University of Texas at Austin

Abstract

Clustering is widely used to discover interpretable structure in data, but conventional unsupervised methods group examples by \emph{feature similarity} alone, ignoring any available prediction target. When the goal is to explain \emph{why} certain examples are predicted differently, feature-similar clusters can be misleading: they may lump together examples with very different outcomes. We propose \textbf{SuperCluster}, a method that learns $K$ prediction prototypes jointly with a supervised objective. Examples are routed to prototypes via a cross-attention mechanism, and the final prediction is a weighted average of the prototype logits. This forces the model to discover $K$ predictive archetypes: groups that are both feature-coherent and share a common prediction regime. We evaluate SuperCluster on four tabular binary classification benchmarks (NBA shot prediction, bank telemarketing subscription, adult income, and credit card default) and a 7-class benchmark (UCI Covertype, $n=581$k), showing that target-guided clustering produces clusters qualitatively different from feature-only k-means clusters, matches or exceeds a deep MLP in accuracy on all five datasets, and consistently outperforms an unsupervised clustering baseline by 1.6--4.3 percentage points in AUC on the binary tasks. A key design choice separates prototype geometry (used for routing) from prototype prediction scores (used for output), so that each prototype's predicted probability $\sigma(s_k)$ is directly readable without going through a downstream linear head. Beyond prediction and interpretation, SuperCluster provides an operational tool for estimating the effective number of latent prediction regimes: over-specify the prototype budget $K$ and examine which prototypes are stably occupied across seeds and values of $K$. On Covertype, the model activates 6 prototypes that map closely onto distinct forest cover types, suggesting that $K_{\text{regime}}$ may grow with prediction dimensionality $C$.
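
The routing and readout described in the abstract can be illustrated with a minimal sketch for the binary case. It is not the paper's implementation. It assumes scaled dot-product attention between the raw feature vector and each prototype key, a softmax over prototypes, and a sigmoid applied to the weighted average of prototype logits $s_k$. The names `route`, `predict`, `occupancy`, `keys`, and `proto_logits` are hypothetical, and the prototype values are hand-picked rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def route(x, keys):
    """Attention weights of example x over the K prototype keys.

    Assumption: scaled dot-product scores with no learned query projection.
    """
    scale = math.sqrt(len(x))
    scores = [sum(xi * ki for xi, ki in zip(x, key)) / scale for key in keys]
    return softmax(scores)


def predict(x, keys, proto_logits):
    """Return (probability, routing weights) for one example.

    The output logit is the attention-weighted average of the prototype
    logits s_k; applying a sigmoid to it is an assumption for the binary case.
    """
    weights = route(x, keys)
    logit = sum(w * s for w, s in zip(weights, proto_logits))
    return sigmoid(logit), weights


def occupancy(X, keys):
    """Count hard assignments (argmax routing weight) per prototype."""
    counts = [0] * len(keys)
    for x in X:
        weights = route(x, keys)
        counts[weights.index(max(weights))] += 1
    return counts


# Hand-picked toy values: K = 3 prototypes in a 2-dimensional feature space.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # prototype geometry (routing)
proto_logits = [2.0, -2.0, 0.0]                  # prototype scores s_k (output)
X = [[3.0, 0.0], [0.0, 3.0], [2.5, 0.5]]

prob, weights = predict(X[0], keys, proto_logits)
per_prototype_prob = [sigmoid(s) for s in proto_logits]  # sigma(s_k), read directly
counts = occupancy(X, keys)  # [2, 1, 0]: the third prototype is unoccupied
```

Because `keys` and `proto_logits` are separate parameters, each prototype's probability can be read off as `sigmoid(s_k)` without passing through a further layer. Under the same assumptions, repeating the `occupancy` count across seeds and prototype budgets $K$ would be one way to see which prototypes stay occupied.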

Citation Information

@article{aarondanielson2026,
  title={SuperCluster: Learning Prediction Prototypes via Target-Guided Cross-Attention Clustering},
  author={Aaron Danielson},
  journal={Research Square},
  year={2026},
  doi={https://doi.org/10.21203/rs.3.rs-9163443/v1}
}