Complete collection of CLIP text prompts used by SWIFT for two-stage weather classification, and text-to-image generation prompts used by diffusion models for weather-adverse augmentation.
SWIFT uses a two-stage classification pipeline with an additional fog counter-prompt system.
Stage 1 filters out indoor images using 5 outdoor and 5 indoor prompts. The maximum cosine similarity per category determines the classification.
Stage 2 classifies outdoor images into 6 weather categories using 5 prompts each. A temperature-scaled softmax produces calibrated probabilities.
The fog counter-prompt system separates true fog from confounders (cloudy, blurred, and clear scenes) using margin scoring: fog_score − max(confounder_scores) ≥ 0.15.
Images scoring higher on outdoor prompts proceed to weather classification. Indoor images are filtered out.
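The indoor/outdoor decision can be sketched as follows. This is a minimal illustration, not SWIFT's actual code: it assumes the image and prompt embeddings have already been computed by CLIP and are passed in as plain lists of floats, and the function names (`cosine`, `max_cosine_score`, `is_outdoor`) are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def max_cosine_score(image_emb: Vector, prompt_embs: Sequence[Vector]) -> float:
    """Score a category by its best-matching prompt (max cosine similarity)."""
    return max(cosine(image_emb, p) for p in prompt_embs)


def is_outdoor(
    image_emb: Vector,
    outdoor_embs: Sequence[Vector],
    indoor_embs: Sequence[Vector],
) -> bool:
    """True when the outdoor prompts score strictly higher than the indoor ones."""
    outdoor = max_cosine_score(image_emb, outdoor_embs)
    indoor = max_cosine_score(image_emb, indoor_embs)
    return outdoor > indoor


# Toy 3-dimensional embeddings, for illustration only.
image = [1.0, 0.0, 0.0]
outdoor_prompts = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
indoor_prompts = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
keep_image = is_outdoor(image, outdoor_prompts, indoor_prompts)
```

Taking the maximum rather than the mean lets a single well-matching prompt decide the category, which is the behavior the scoring rule above describes. How ties are handled is not stated in the source; the sketch treats a tie as indoor.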
Outdoor images are classified into 6 standard weather categories. Each category uses 5 descriptive CLIP prompts.
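The step from per-category scores to probabilities can be sketched as a temperature-scaled softmax. This is an illustrative sketch under two assumptions that the source does not confirm: that each category's score is the maximum cosine similarity over its prompts, and that τ acts as a multiplier on those scores (a logit scale) rather than a divisor. The function name and the category labels in the usage lines are placeholders.

```python
import math
from typing import Dict, Mapping


def weather_probabilities(
    category_scores: Mapping[str, float], tau: float = 10.0
) -> Dict[str, float]:
    """Softmax over per-category scores.

    ASSUMPTION: tau multiplies the scores; the source only says
    "softmax with temperature scaling" and does not give the formula.
    """
    logits = {name: tau * score for name, score in category_scores.items()}
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Placeholder labels and made-up similarity scores, for illustration only.
scores = {"category_a": 0.30, "category_b": 0.25, "category_c": 0.20}
probs = weather_probabilities(scores)
best = max(probs, key=probs.get)
```

CLIP cosine similarities for competing prompts tend to sit close together, so some scaling is needed before the softmax to spread the probabilities apart.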
Fog is distinguished from visual confounders using a margin-based scoring system.
An image is classified as foggy when: fog_score − max(cloudy, blurred, clear) ≥ 0.15
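The margin rule can be sketched as below. This is a minimal illustration that assumes precomputed CLIP embeddings passed as plain lists. It averages the prompt embeddings of each class and compares the image against that mean; whether prompt embeddings are normalized before averaging is not stated, so the sketch averages them as given. All function names are hypothetical.

```python
import math
from typing import List, Mapping, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_embedding(embs: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a class's prompt embeddings."""
    count = len(embs)
    return [sum(column) / count for column in zip(*embs)]


def is_foggy(
    image_emb: Vector,
    fog_embs: Sequence[Vector],
    confounder_embs: Mapping[str, Sequence[Vector]],
    margin: float = 0.15,
) -> bool:
    """Apply fog_score - max(confounder_scores) >= margin."""
    fog_score = cosine(image_emb, mean_embedding(fog_embs))
    strongest_confounder = max(
        cosine(image_emb, mean_embedding(embs))
        for embs in confounder_embs.values()
    )
    return fog_score - strongest_confounder >= margin


# Toy 3-dimensional embeddings, for illustration only.
fog = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
confounders = {
    "cloudy": [[0.0, 1.0, 0.0]],
    "blurred": [[0.0, 0.0, 1.0]],
    "clear": [[0.5, 0.5, 0.0]],
}
foggy = is_foggy([1.0, 0.0, 0.0], fog, confounders)
```

Requiring a margin over the strongest confounder, instead of simply the highest score, means an image that looks almost as cloudy or blurred as it looks foggy is not labeled as fog.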
Alternative prompt set for detecting extreme weather conditions. Used when fine-grained severity distinction is needed.
Text prompts and negative prompts used by diffusion-based augmentation strategies to generate weather-adverse training images. Three detail levels are used depending on the generation model.
"Transform the scene to dense fog with volumetric haze, reducing visibility to 200-400 meters. Keep all traffic signs, road markings, buildings, and vehicles clearly visible with cool blue-gray lighting and soft, diffused shadows."
"Clear sky, harsh shadows, oversaturation, blurred signs, missing markings, cartoon style, daytime lighting."
"Make the scene snowy with snow covering roads, vehicles, and buildings. Preserve traffic sign legibility with slight snow dusting on edges and maintain road lane geometry under overcast gray sky and soft lighting."
"Visible green foliage, dry surfaces, sharp sunlight, vibrant colors, buried signs, unrealistic snow placement, summer atmosphere."
"Apply heavy rain with wet asphalt reflections and visible rain streaks. Keep all traffic signs and road markings sharp and readable with subtle water droplets and low ambient light under overcast clouds."
"Dry ground, clear sky, no reflections, fog, snow, blurred signs, cartoon rain, unrealistic puddles."
"Change lighting to dawn or dusk with long soft shadows and colored sky gradients from orange to purple. Preserve illuminated traffic signs and building outlines with warm or cool low-angle sunlight."
"Midday sun, harsh shadows, overcast, night darkness, neon colors, unreadable signs."
"Convert to night with artificial warm street lighting and cool headlights. Ensure traffic signs glow true to color and road markings remain visible under high contrast with deep shadows."
"Daylight, uniform brightness, dark or missing signs, overexposed lights, cartoon style."
"Transform to overcast conditions with dense cloud cover and soft diffused light. Keep traffic sign colors vibrant and visible, preserve road markings and building textures with a subtle blue-gray color cast."
"Sunny sky, sharp shadows, vibrant colors, fog, rain, snow, faded signs, cartoon style."
Default configuration values for the SWIFT classification pipeline.
| Parameter | Value | Description |
|---|---|---|
| CLIP Model | ViT-B/32 | Vision Transformer backbone for text-image similarity |
| Temperature (τ) | 10 | Softmax temperature scaling for probability calibration |
| Confidence Threshold | 0.3 | Minimum classification confidence to accept a prediction |
| Margin Threshold | 0.1 | Minimum margin between top-1 and top-2 categories |
| Fog Margin Threshold | 0.15 | Minimum fog vs. confounder margin for fog classification |
| Indoor/Outdoor Scoring | max cosine sim | Maximum cosine similarity over the prompts in each category (outdoor, indoor) |
| Weather Scoring | max cosine sim | Maximum cosine similarity over the prompts in each weather category |
| Fog Counter-Prompt Scoring | mean embedding | Mean prompt embedding per class, then cosine similarity with the image embedding |
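The defaults in the table could be collected in a small configuration object, with the two acceptance thresholds applied to the classifier's output. This is a sketch only: the `SwiftConfig` class and `accept_prediction` function are hypothetical names, and it assumes that both the confidence threshold and the top-1/top-2 margin are measured on the softmax probabilities, which the table does not specify.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SwiftConfig:
    """Default values copied from the configuration table."""

    clip_model: str = "ViT-B/32"
    temperature: float = 10.0
    confidence_threshold: float = 0.3
    margin_threshold: float = 0.1
    fog_margin_threshold: float = 0.15


def accept_prediction(
    probs: Mapping[str, float], cfg: SwiftConfig = SwiftConfig()
) -> Optional[str]:
    """Return the top label if it clears both thresholds, else None.

    ASSUMPTION: both thresholds apply to probabilities, and `probs`
    contains at least two categories.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    confident = top_p >= cfg.confidence_threshold
    separated = top_p - second_p >= cfg.margin_threshold
    return top_label if confident and separated else None


# Made-up probabilities with placeholder labels, for illustration only.
label = accept_prediction({"category_a": 0.5, "category_b": 0.3, "category_c": 0.2})
```

Returning `None` for rejected predictions leaves it to the caller to decide whether to discard the image or route it to manual review.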