Cross-modal image-text retrieval aims to precisely match visual content with natural language descriptions, a task pivotal in multimodal understanding. Despite advancements in feature extraction and a...
Cross-Modal Image-Text Retrieval
Multi-Scale Feature Extraction
Adaptive Similarity Fusion
Semantic Complexity
Dynamic Sparse Aggregation
SinoXiv