Enhancing Video Retrieval with Robust CLIP-Based Multimodal System