Abstract: Human pose detection requires precise localization of anatomical keypoints on the human body and underpins a wide range of visual perception tasks. Despite notable advances, high-resolution networks still suffer from insufficient interaction between cross-dimensional information and limited keypoint localization precision. To address these limitations, we introduce the Adaptive Cross-Dimensional Weighting High-Resolution Network (ACW-HRNet). First, to improve cross-dimensional interaction, we propose cross-dimensional split convolution, which enables efficient exchange of information between the spatial and channel dimensions. Second, to improve keypoint localization precision, we integrate Adaptive Context Modeling (ACM), which strengthens the network's capacity to capture complex spatial relationships through adaptive transformations and spatial weighting of the input features. Together, these components let the network extract rich multi-scale contextual information while establishing cross-dimensional dependencies, improving accuracy without additional computational overhead. Third, we incorporate a coordinate attention mechanism to fuse multi-branch, multi-scale features, which further improves detection accuracy. Experiments on the COCO and MPII datasets show that ACW-HRNet significantly outperforms leading lightweight networks and achieves a favorable balance between computational efficiency and detection accuracy.
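To make the idea of coordinate attention concrete, the following is a minimal sketch of direction-aware gating on a single-channel feature map. It is not the ACW-HRNet implementation: the function name `coordinate_gate` is hypothetical, and the learned 1x1 convolutions that a coordinate attention block normally applies to the pooled descriptors are replaced here by the identity, which is an assumption made purely for brevity.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_gate(feature_map):
    """Reweight one channel with separate row-wise and column-wise gates.

    feature_map: list of H rows, each a list of W floats.
    Assumption: the learned transforms between pooling and the sigmoid
    are omitted (identity), so this only illustrates the data flow.
    """
    height = len(feature_map)
    width = len(feature_map[0])

    # Pool along the width: one descriptor per row (keeps vertical position).
    row_desc = [sum(row) / width for row in feature_map]
    # Pool along the height: one descriptor per column (keeps horizontal position).
    col_desc = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]

    # Turn each descriptor into a gate in (0, 1).
    row_gate = [sigmoid(v) for v in row_desc]
    col_gate = [sigmoid(v) for v in col_desc]

    # Each position is scaled by the gate of its row and the gate of its column.
    return [
        [feature_map[i][j] * row_gate[i] * col_gate[j] for j in range(width)]
        for i in range(height)
    ]


example = [[1.0, -1.0], [-1.0, 1.0]]
gated = coordinate_gate(example)
```

Because pooling is done separately along each axis, the gates retain positional information along the other axis, which is the property that makes this style of attention attractive for keypoint localization. In a full network the same gating would be applied per channel, with learned layers shaping the gates.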