:zap: Video Object Segmentation Paper List
A list of video object segmentation (VOS) papers.
Suggestions and requests are always welcome :)
[STMA] Spatial-Temporal Multi-level Association for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[OneVOS] OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework, ECCV [Paper] [arXiv] [Code]
[RMem] RMem: Restricted Memory Banks Improve Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
[Point-VOS] Point-VOS: Pointing Up Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
[Cutie] Putting the Object Back into Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[DeVOS] DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation, WACV [Paper]
[TTT] Test-time Training for Matching-based Video Object Segmentation, NeurIPS [Paper] [Code]
[READMem] READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation, BMVC [Paper] [arXiv] [Code]
[XMem++] XMem++: Production-level Video Segmentation From Few Annotated Frames, ICCV [Paper] [arXiv] [Code]
[SimVOS] Scalable Video Object Segmentation with Simplified Framework, ICCV [Paper] [arXiv] [Code]
[TMRN] Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation, ICCV [Paper]
[ISVOS] Look Before You Match: Instance Understanding Matters in Video Object Segmentation, CVPR [Paper] [arXiv]
[CorrLearn] Boosting Video Object Segmentation via Space-time Correspondence Learning, CVPR [Paper] [arXiv] [Code]
[MobileVOS] MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation, CVPR [Paper] [arXiv]
[TSVOS] Two-shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[LLB] Learning to Learn Better for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
[DeAOT] Decoupling Features in Hierarchical Propagation for Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
[AOC] Towards Robust Video Object Segmentation with Adaptive Object Calibration, ACMMM [Paper] [arXiv] [Code]
[BATMAN] BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation, ECCV [Paper] [arXiv]
[XMem] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model, ECCV [Paper] [arXiv] [Code]
[QDMN] Learning Quality-aware Dynamic Memory for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[TBD] Tackling Background Distraction in Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[GSFM] Global Spectral Filter Memory Network for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[RDE-VOS] Recurrent Dynamic Embedding for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[PCVOS] Per-Clip Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[CoVOS] Accelerating Video Object Segmentation with Compressed Video, CVPR [Paper] [arXiv] [Code]
[SWEM] SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization, CVPR [Paper] [arXiv] [Code]
[RPCMVOS] Reliable Propagation-Correction Modulation for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
[SITVOS] Siamese Network with Interactive Transformer for Video Object Segmentation, AAAI [Paper] [arXiv]
[BMVOS] Pixel-Level Bijective Matching for Video Object Segmentation, WACV [Paper] [arXiv] [Code]
[AOT] Associating Objects with Transformers for Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
[STCN] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
[JOINT] Joint Inductive and Transductive Learning for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[HMMN] Hierarchical Memory Matching Network for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[DMN-AOA] Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment, ICCV [Paper] [Code]
[RMNet] Efficient Regional Memory Network for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[LCM] Learning Position and Target Consistency for Memory-Based Video Object Segmentation, CVPR [Paper] [arXiv]
[GIEL] Video Object Segmentation Using Global and Instance Embedding Learning, CVPR [Paper]
[SwiftNet] SwiftNet: Real-time Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[SSTVOS] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[Reuse-VOS] Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[STG-Net] Spatiotemporal Graph Neural Network Based Mask Reconstruction for Video Object Segmentation, AAAI [Paper] [arXiv]
[QMRA] Query-Memory Re-Aggregation for Weakly-Supervised Video Object Segmentation, AAAI [Paper]
[STM-cycle] Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
[AFB-URR] Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement, NeurIPS [Paper] [arXiv] [Code]
[e-OSVOS] Make One-Shot Video Object Segmentation Efficient Again, NeurIPS [Paper] [arXiv] [Code]
[LWL] Learning What to Learn for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[EGMN] Video Object Segmentation with Episodic Graph Memory Networks, ECCV [Paper] [arXiv] [Code]
[CFBI] Collaborative Video Object Segmentation by Foreground-Background Integration, ECCV [Paper] [arXiv] [Code]
[GC] Fast Video Object Segmentation using the Global Context Module, ECCV [Paper] [arXiv]
[KMN] Kernelized Memory Network for Video Object Segmentation, ECCV [Paper] [arXiv]
[SAT] State-Aware Tracker for Real-Time Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[FRTM] Learning Fast and Robust Target Models for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[TVOS] A Transductive Approach for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[TAN-DTTM] Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching, CVPR [Paper] [arXiv]
[FTMU] Fast Template Matching and Update for Video Object Tracking and Segmentation, CVPR [Paper] [arXiv] [Code]
[DIPNet] DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation, WACV [Paper]
[DMM-Net] DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[AGSS-VOS] AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation, ICCV [Paper] [Code]
[RANet] RANet: Ranking Attention Network for Fast Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[DTN] Fast Video Object Segmentation via Dynamic Targeting Network, ICCV [Paper]
[CapsuleVOS] CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing, ICCV [Paper] [arXiv] [Code]
[STM] Video Object Segmentation Using Space-Time Memory Networks, ICCV [Paper] [arXiv] [Code]
[MHP-VOS] MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[STCNN] Spatiotemporal CNN for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[RVOS] RVOS: End-To-End Recurrent Network for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[A-GAME] A Generative Appearance Model for End-To-End Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[FEELVOS] FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[SiamMask] Fast Online Object Tracking and Segmentation: A Unifying Approach, CVPR [Paper] [arXiv] [Code]
[TIS] Tukey-Inspired Video Object Segmentation, WACV [Paper] [arXiv] [Code]
[S2S] YouTube-VOS: Sequence-to-Sequence Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[PReMVOS] PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation, ACCV [arXiv] [Code]
[OSMN] Efficient Video Object Segmentation via Network Modulation, CVPR [Paper] [arXiv] [Code]
[RGMP] Fast Video Object Segmentation by Reference-Guided Mask Propagation, CVPR [Paper] [Code]
[FAVOS] Fast and Accurate Online Video Object Segmentation via Tracking Parts, CVPR [Paper] [arXiv] [Code]
[SegFlow] SegFlow: Joint Learning for Video Object Segmentation and Optical Flow, ICCV [Paper] [arXiv] [Code]
[OSVOS] One-Shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[MaskTrack] Learning Video Object Segmentation from Static Images, CVPR [Paper] [arXiv] [Code]
[DPA] Dual Prototype Attention for Unsupervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[GSA-Net] Guided Slot Attention for Unsupervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[DATTT] Depth-aware Test-Time Training for Zero-shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[GFA] Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation, AAAI [Paper]
[SimulFlow] SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation, ACMMM [Paper] [arXiv]
[TGFormer] Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation, ACMMM [Paper]
[Isomer] Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[OAST] Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning, ICCV [Paper]
[PMN] Unsupervised Video Object Segmentation via Prototype Memory Network, WACV [Paper] [arXiv] [Code]
[TMO] Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation, WACV [Paper] [arXiv] [Code]
[IMP] Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier, AAAI [Paper] [arXiv]
[D2Conv3D] D2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos, WACV [Paper] [arXiv] [Code]
[CFAM] Video Salient Object Detection via Contrastive Features and Attention Modules, WACV [Paper] [arXiv]
[FSNet] Full-Duplex Strategy for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[TransportNet] Deep Transport Network for Unsupervised Video Object Segmentation, ICCV [Paper]
[AMC-Net] Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation, ICCV [Paper] [Code]
[RTNet] Reciprocal Transformations for Unsupervised Video Object Segmentation, CVPR [Paper] [Code]
[F2Net] F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
[FrameSelect] Mask Selection and Propagation for Unsupervised Video Object Segmentation, WACV [Paper] [Code]
[3DC-Seg] Making a Case for 3D Convolutions for Object Segmentation in Videos, BMVC [Paper] [arXiv] [Code]
[WCS-Net] Unsupervised Video Object Segmentation with Joint Hotspot Tracking, ECCV [Paper] [Code]
[DFNet] Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation, ECCV [Paper] [arXiv]
[MATNet] Motion-Attentive Transition for Zero-Shot Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
[UnOVOST] UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking, WACV [Paper] [arXiv] [Code]
[EpO-Net] EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency, WACV [Paper] [arXiv] [Code]
[AD-Net] Anchor Diffusion for Unsupervised Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[AGNN] Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks, ICCV [Paper] [arXiv] [Code]
[AGS] Learning Unsupervised Video Object Segmentation Through Visual Attention, CVPR [Paper] [Code]
[COSNet] See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks, CVPR [Paper] [arXiv] [Code]
[SSAV] Shifting More Attention to Video Salient Object Detection, CVPR [Paper] [Code]
[MOTAdapt] Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting, ICRA [Paper] [arXiv] [Code]
[VISA] VISA: Reasoning Video Object Segmentation via Large Language Models, ECCV [Paper] [arXiv] [Code]
[VD-IT] Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[ActionVOS] ActionVOS: Actions as Prompts for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[DsHmp] Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation, CVPR [Paper] [arXiv] [Code]
[LoSh] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[MUTR] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
[TCE-RVOS] Temporal Context Enhanced Referring Video Object Segmentation, WACV [Paper] [Code]
[SOC] SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation, NeurIPS [Paper] [arXiv] [Page]
[LMPM] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions, ICCV [Paper] [arXiv] [Code]
[HTML] HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation, ICCV [Paper] [Page]
[OnlineRefer] OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[CMA] Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples, ICCV [Paper] [arXiv] [Code]
[R2VOS] Robust Referring Video Object Segmentation with Cyclic Structural Consensus, ICCV [Paper] [arXiv] [Code]
[SgMg] Spectrum-guided Multi-granularity Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[TempCD] Temporal Collection and Distribution for Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
[MANet] Multi-Attention Network for Compressed Video Referring Object Segmentation, ACMMM [Paper] [arXiv] [Code]
[MTTR] End-to-End Referring Video Object Segmentation with Multimodal Transformers, CVPR [Paper] [arXiv] [Code]
[ReferFormer] Language as Queries for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[LBDT] Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[MLRL] Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation, CVPR [Paper]
[YOFO] You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation, AAAI [Paper]
[BA] Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[LLE-VOS] Event-assisted Low-Light Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[EVA-VOS] Learning the What and How of Annotation in Video Object Segmentation, WACV [Paper] [arXiv] [Code]
[Training-Free-VOS] From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models, NeurIPS [Paper] [Code]
[DVSOD] DVSOD: RGB-D Video Salient Object Detection, NeurIPS [Paper] [arXiv] [Page]
[VOSPGD] Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks, ACMMM [Paper]
[DEVA] Tracking Anything with Decoupled Video Segmentation, ICCV [Paper] [arXiv] [Code]
[Timetuning] Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations, ICCV [Paper] [arXiv] [Code]
[VOS-VFI] Video Object Segmentation-aware Video Frame Interpolation, ICCV [Paper] [Code]
[LVOS] LVOS: A Benchmark for Long-term Video Object Segmentation, ICCV [Paper] [arXiv] [Page]
[MOSE] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes, ICCV [Paper] [arXiv] [Page]
[RCF] Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping, CVPR [Paper] [arXiv] [Code]
[VOST] Breaking the “Object” in Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
[InstMove] InstMove: Instance Motion for Object-centric Video Segmentation, CVPR [Paper] [arXiv] [Code]
[SSL-VOS] A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation, WACV [Paper] [arXiv] [Code]
[BURST] BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video, WACV [Paper] [arXiv] [Code]
[EPIC-KITCHENS] EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations, NeurIPS [Paper] [arXiv] [Page]
[SaVos] Self-supervised Amodal Video Object Segmentation, NeurIPS [Paper] [arXiv]
[YouMVOS] YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset, CVPR [Paper] [Page]
[Wnet] Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks, CVPR [Paper] [Code]
[DUL] Dense Unsupervised Learning for Video Segmentation, NeurIPS [Paper] [arXiv] [Code]
[AMD] The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos, NeurIPS [Paper] [arXiv] [Code]
[MotionGroup] Self-supervised Video Object Segmentation by Motion Grouping, ICCV [Paper] [arXiv] [Code]
[GMB] Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos, ICCV [Paper] [arXiv] [Code]
[DANet] Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation, CVPR [Paper] [Code]
[IVOS-W] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild, CVPR [Paper] [arXiv] [Code]
[GIS] Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps, CVPR [Paper] [arXiv] [Code]
[MiVOS] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion, CVPR [Paper] [arXiv] [Code]
[ContrastCorr] Contrastive Transformation for Self-supervised Correspondence Learning, AAAI [Paper] [arXiv] [Code]
[TAO-VOS] Reducing the Annotation Effort for Video Object Segmentation Datasets, WACV [Paper] [arXiv] [Page]
[CRW] Space-Time Correspondence as a Contrastive Random Walk, NeurIPS [Paper] [arXiv] [Code]
[ODMS] Learning Object Depth from Camera Motion and Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
[ScribbleBox] ScribbleBox: Interactive Annotation Framework for Video Object Segmentation, ECCV [Paper] [arXiv] [Page]
[ATNet] Interactive Video Object Segmentation Using Global and Local Transfer Modules, ECCV [Paper] [arXiv] [Code]
[MAST] MAST: A Memory-Augmented Self-Supervised Tracker, CVPR [Paper] [arXiv] [Code]
[MuG] Learning Video Object Segmentation From Unlabeled Videos, CVPR [Paper] [arXiv] [Code]
[MA-Net] Memory Aggregation Networks for Efficient Interactive Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
[TimeCycle] Learning Correspondence from the Cycle-Consistency of Time, CVPR [Paper] [arXiv] [Code]
[BubbleNets] BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames, CVPR [Paper] [arXiv] [Code]
[IPNet] Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks, CVPR [Paper] [arXiv] [Code]