Ensuring spatial scalability with temporal-wise spatial attentive pooling for temporal action detection

    Research output: Contribution to journal › Article › peer-review

    4 Citations (Scopus)

    Abstract

    Recent temporal action detection models have focused on end-to-end trainable approaches to exploit the representational power of backbone networks. Despite the advantages of end-to-end training, these models still use a small spatial resolution (e.g., 96 × 96) because of the inefficient trade-off between computational cost and spatial resolution. In this study, we argue that a simple pooling method (e.g., adaptive average pooling) acts as a bottleneck in the spatial aggregation step, restricting representational power. To address this issue, we propose temporal-wise spatial attentive pooling (TSAP), which alleviates the bottleneck between the backbone and the detection head using a temporal-wise attention mechanism. Our approach mitigates the inefficient trade-off between spatial resolution and computational cost, thereby enhancing spatial scalability in temporal action detection. Moreover, TSAP can be adopted by previous end-to-end approaches simply by replacing their spatial pooling component. Our experiments demonstrate the essential role of spatial aggregation and show consistent improvements when TSAP is incorporated into previous end-to-end methods.
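    The abstract contrasts a fixed spatial pooling (adaptive average pooling to a single spatial cell) with attention-based spatial pooling computed separately for each time step. The following is a minimal pure-Python sketch of that contrast only, not the TSAP module from the paper: the abstract does not specify TSAP's architecture, so the single learned query vector, the scaled dot-product scoring, the [T][H*W][C] feature layout, and the names `average_pool`, `attentive_pool`, and `query` are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]
# Hypothetical layout: clip[t][p][c] = feature at time step t, spatial position p (of H*W), channel c.
Clip = List[List[Vector]]


def average_pool(clip: Clip) -> List[Vector]:
    """Spatial average pooling to 1x1: every position gets the same weight 1/(H*W)."""
    pooled = []
    for frame in clip:
        n = len(frame)
        channels = len(frame[0])
        pooled.append([sum(pos[c] for pos in frame) / n for c in range(channels)])
    return pooled


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(clip: Clip, query: Vector) -> List[Vector]:
    """Attention-weighted spatial pooling, normalized independently for each time step.

    `query` stands in for a hypothetical learned parameter; scores are dot products
    between the query and each position's feature, scaled by 1/sqrt(C).
    """
    pooled = []
    scale = 1.0 / math.sqrt(len(query))
    for frame in clip:
        scores = [scale * sum(q * x for q, x in zip(query, pos)) for pos in frame]
        weights = softmax(scores)  # one distribution over the H*W positions of this frame
        channels = len(frame[0])
        pooled.append(
            [sum(w * pos[c] for w, pos in zip(weights, frame)) for c in range(channels)]
        )
    return pooled


# Usage: two time steps, a 2x2 spatial grid (4 positions), two channels.
clip = [
    [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # t=0: one informative position
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 4.0]],  # t=1: a different informative position
]
print(average_pool(clip))  # [[0.25, 0.0], [0.0, 1.0]]
print(attentive_pool(clip, query=[1.0, 1.0]))
```

    With an all-zero query the softmax weights become uniform and `attentive_pool` reduces to `average_pool`; a nonzero query instead up-weights the informative position in each frame, so less of its signal is diluted by the background. This illustrates the kind of flexibility the abstract argues a fixed average pooling lacks, under the stated assumptions.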

    Original language: English
    Article number: 106321
    Journal: Neural Networks
    Volume: 176
    Publication status: Published - 2024 Aug

    Bibliographical note

    Publisher Copyright:
    © 2024 Elsevier Ltd

    Keywords

    • End-to-end training
    • Temporal action detection

    ASJC Scopus subject areas

    • Cognitive Neuroscience
    • Artificial Intelligence

