Seminar: Graduate Seminar
XBF: Optimizing Bounded Future Mechanism for Surgical Workflow Recognition
Purpose: Advances in surgical workflow recognition have demonstrated the effectiveness of temporal convolutional networks in modeling sequential patterns, particularly the acausal MS-TCN++ and the semi-causal BF-MS-TCN++. However, in the surgical context these architectures rely on 2D feature extractors, limiting their ability to exploit motion information from video clips. This study integrates X3D into the BF-MS-TCN++ framework to enhance recognition performance while preserving real-time compatibility.
Methods: Using a 3D backbone, the models exploit motion-aware spatiotemporal features, reducing latency and computational cost compared to prior 2D approaches. The models are evaluated on four surgical datasets (JIGSAWS, MultiBypass140, SAR-RARP50, and VTS) that cover gesture, step, and phase recognition tasks. Metrics include accuracy, F1-Macro, Edit distance, and segmental F1 scores, reported across varying future window sizes.
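As an illustration of the Edit metric named above, the following is a minimal sketch of the segmental edit score as it is commonly defined in the action segmentation literature: per-frame labels are collapsed into an ordered list of segments, and the Levenshtein distance between the predicted and ground-truth segment lists is normalized by the longer list. The function names (`to_segments`, `levenshtein`, `edit_score`) and the 0-100 scaling are assumptions made for this example, not necessarily the evaluation code used in the study.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_score(predicted, ground_truth):
    """Segmental edit score in [0, 100]; higher means better segment ordering."""
    pred_segments = to_segments(predicted)
    true_segments = to_segments(ground_truth)
    longest = max(len(pred_segments), len(true_segments))
    if longest == 0:
        return 100.0  # assumption: two empty sequences count as a perfect match
    distance = levenshtein(pred_segments, true_segments)
    return (1.0 - distance / longest) * 100.0


if __name__ == "__main__":
    truth = ["A", "A", "B", "B", "C", "C"]
    shifted = ["A", "A", "B", "C", "C", "C"]     # boundary shifted, same order
    fragmented = ["A", "B", "A", "B", "C", "C"]  # over-segmented prediction
    print(edit_score(shifted, truth))
    print(edit_score(fragmented, truth))
```

Because the score depends only on the order of segments, a shifted boundary is not penalized while over-segmentation is, which is why it is typically reported alongside frame-wise accuracy.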
Results: X3D-based models consistently improve performance, particularly in high-frame-rate and temporally dense tasks. On JIGSAWS, a standard benchmark, the X3D model surpasses the EfficientNetV2 baseline. Notable gains in F1-Macro, Edit, and segmental F1 scores were observed on SAR-RARP50 and MultiBypass140, with diminishing gains on VTS.
Conclusion: Integrating 3D backbones into MS-TCN++ enhances performance across surgical datasets, maintaining low latency and supporting real-time deployment with lightweight 3D features.
M.Sc. student under the supervision of Prof. Shlomi Laufer.