Interactive Analysis

Explore the results interactively. Filter by method family and switch between domain and model views.

Supplementary Material

Additional figures, detailed breakdowns, and technical appendix content referenced in the paper.

📊 Supplementary Figures
Category Distribution
Fig. S0. Weather condition distribution across all evaluation datasets, showing the severe class imbalance that AWACS addresses.
Domain Heatmap
Fig. S1. Domain gap heatmap across model–dataset combinations.
Cross-Stage Comparison
Fig. S2. Three-stage comparison: Baseline vs. S1 vs. S2 performance.
S1 Per Dataset Violin
Fig. S3. Stage 1 mIoU distributions per evaluation dataset.
S1 Per Model Violin
Fig. S4. Stage 1 mIoU distributions per model architecture.
S2 Per Dataset Violin
Fig. S5. Stage 2 mIoU distributions per evaluation dataset.
S2 Per Model Violin
Fig. S6. Stage 2 mIoU distributions per model architecture.
Balance Diversity Heatmap
Fig. S7. SWIFT metric correlation heatmap.
Balance Diversity Profile
Fig. S8. SWIFT balance and diversity profiles per dataset.
Size vs Balance
Fig. S9. Dataset size vs. weather balance scatter.
Quality Performance S2
Fig. S10. Quality–performance relationship in Stage 2.
Model Capacity vs Gain
Fig. S11. Model capacity vs. augmentation gain.
Model Capacity Threshold
Fig. S12. Model capacity threshold analysis.
Radar Domain S1
Fig. S13. Radar chart: per-domain performance (S1).
Radar Domain S2
Fig. S14. Radar chart: per-domain performance (S2).
S1 S2 Bars
Fig. S15. S1 vs. S2 bar comparison across strategies.
S1 vs S2 Scatter
Fig. S16. S1 vs. S2 gain scatter plot.
Training Curves
Fig. S17. Extended training convergence curves.
Diminishing Returns
Fig. S18. Diminishing returns analysis.
Quality Violin
Fig. S19. Generative quality metric distributions (violin).
Generative Examples
Fig. S20. Visual comparison of all generative augmentation methods.
Segmentation Examples
Fig. S21. Segmentation prediction examples across conditions.
📁 Downloadable Data (CSV)
🔧 Method Descriptions (6 Families, 21 Strategies)

Family 1: 2D Rendering

Parametric weather synthesis through classic image processing — no neural networks.

  • Automold — Road-specific augmentation (rain, fog, sun flare, shadow)
  • Albumentations — Efficient weather-specific augmentation library
  • Augmenters — Comprehensive augmentation pipeline framework (imgaug)
  • Weather Effect Generator — Physics-inspired fog, rain, snow particle effects

Family 2: CNN/GAN

CNN and GAN architectures for unpaired image-to-image translation.

  • CycleGAN — Unpaired image-to-image translation via adversarial learning with cycle consistency loss
  • StarGAN v2 — Multi-domain image translation with diverse style synthesis
  • CUT — Contrastive Unpaired Translation using patchwise contrastive learning
  • SUSTechGAN — Foggy scene synthesis specialized for driving scenarios

Family 3: Style Transfer

Neural style transfer models for domain-specific appearance manipulation.

  • LANIT — Language-guided multi-domain translation
  • TSIT — Two-Stream Image-to-image Translation, with separate content and style streams
  • Attribute Hallucination — Attribute-based hallucination for weather effects

Family 4: Diffusion

Diffusion-based image-to-image models, including ControlNet-conditioned approaches.

  • CycleDiffusion — Extends cycle consistency to diffusion models for flexible content preservation
  • Img2Img — Stable Diffusion image-to-image pipeline with weather prompts
  • InstructPix2Pix — Instruction-following image editing model
  • ControlNet-Seg — Segmentation-conditioned generation via ControlNet
  • UniControl — Unified multi-condition controllable generation

Family 5: Multimodal Diffusion

VLM and multimodal diffusion models for text-guided weather synthesis.

  • Step1X / Step1X v1.2 — Progressive diffusion for weather editing
  • Flux Kontext — Next-generation flow-matching with in-context transfer
  • VisualCloze — Visual in-context learning for image transformation
  • Qwen Image Edit — Large-scale multimodal editing model

Family 6: Standard Augmentation

Standard augmentation pipelines not specific to weather.

  • RandAugment — Random augmentation policy sampled from a set of transforms
  • AutoAugment — Learned augmentation policy via reinforcement learning
  • CutMix — Replaces a rectangular region of one training sample with the corresponding patch from another
  • MixUp — Convex combination of training examples
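As a rough illustration of the last two entries, the sketch below shows MixUp as a convex combination and CutMix as a rectangular paste. It assumes NumPy arrays; the function names, the `box` layout, and the choice to paste the label mask together with the pixels (a common adaptation for segmentation) are illustrative assumptions, not the configuration used in the benchmark.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and their one-hot or soft labels."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

def cutmix_segmentation(img_a, mask_a, img_b, mask_b, box):
    """Paste the rectangle box = (top, left, height, width) from sample B onto A.

    Assumption: for segmentation, the label mask is pasted with the pixels.
    """
    top, left, h, w = box
    img, mask = img_a.copy(), mask_a.copy()
    img[top:top + h, left:left + w] = img_b[top:top + h, left:left + w]
    mask[top:top + h, left:left + w] = mask_b[top:top + h, left:left + w]
    return img, mask
```

In practice the mixing coefficient `lam` and the box are usually sampled at random per batch; they are passed in explicitly here to keep the sketch deterministic.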
📐 Technical Appendix

Evaluation Metrics

  • mIoU — Mean Intersection over Union across all semantic classes
  • FID — Fréchet Inception Distance (lower = more realistic)
  • LPIPS — Learned Perceptual Image Patch Similarity (lower = more similar)
  • SSIM — Structural Similarity Index (higher = more similar)
  • CQS — Composite Quality Score combining FID, LPIPS, SSIM, mIoU, Pixel Accuracy
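For reference, mIoU and pixel accuracy can both be derived from a class confusion matrix. The sketch below is a minimal NumPy version; the function names and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions, and the benchmark's actual evaluation code may differ, for example in how absent classes are handled.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = target != ignore_index  # assumption: 255 marks unlabeled pixels
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_pixel_accuracy(cm):
    """Return (mIoU, pixel accuracy); classes absent from both pred and target are skipped."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    miou = float((tp[present] / union[present]).mean())
    pixel_acc = float(tp.sum() / cm.sum())
    return miou, pixel_acc
```

For a dataset-level score, the confusion matrices of all images are typically summed before the metrics are computed, rather than averaging per-image scores.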

Training Pipeline

  • Stage 1 (S1): Cityscapes (fine) training only → evaluated on 4 diverse test sets
  • Stage 2 (S2): Multi-dataset training (Cityscapes + ACDC + BDD10k + IDD-AW + Mapillary Vistas + OUTSIDE15k) → evaluated on the same 4 test sets
  • Architectures: PSPNet (R50), SegFormer (MiT-B3), SegNeXt (MSCAN-B), Mask2Former (Swin-B), HRNet (HR48)
  • Training: 40k iterations, AdamW optimizer, poly learning rate schedule
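A poly schedule decays the learning rate polynomially from its base value toward a floor over the training run. The sketch below uses the 40k-iteration budget stated above; `base_lr`, `power`, and `min_lr` are not given in this section, so the defaults shown are placeholders rather than the values used in the experiments.

```python
def poly_lr(iteration, base_lr, max_iters=40_000, power=1.0, min_lr=0.0):
    """Polynomial decay: lr = (base_lr - min_lr) * (1 - t / T) ** power + min_lr.

    power and min_lr are hypothetical defaults, not taken from the paper.
    """
    t = min(max(iteration, 0), max_iters)  # clamp to the training range
    frac = 1.0 - t / max_iters
    return (base_lr - min_lr) * frac ** power + min_lr
```

With `power=1.0` the decay is linear; larger values keep the rate higher early on and drop it faster near the end of training.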

PRISM — Pipeline for Robust Image Similarity Metrics

Standardized quality assessment framework computing FID, LPIPS, SSIM, PSNR, pixel accuracy, mIoU, and frequency-weighted IoU for each generative method against original images.
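Of the metrics listed, PSNR is the simplest to state in code. The following is a minimal sketch that assumes 8-bit images (peak value 255) held in NumPy arrays of equal shape; the function name is illustrative, and this is not the PRISM implementation.

```python
import numpy as np

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between an original and a generated image."""
    ref = np.asarray(reference, dtype=np.float64)  # cast first to avoid uint8 wrap-around
    gen = np.asarray(generated, dtype=np.float64)
    if ref.shape != gen.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate that the generated image is numerically closer to the original, which for weather translation is not necessarily the goal, so PSNR is best read alongside the other metrics.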

SWIFT — Structured Weather Identification and Feature Taxonomy

Condition-aware dataset splitting strategy using CLIP-based weather classification. Two-stage process: (1) indoor/outdoor filtering, (2) 7-class weather classification with fog counter-prompts.

Shannon Entropy — Dataset Balance Metric

Measures how close a weather domain distribution is to uniform on a 0–1 scale. Used to quantify class imbalance across evaluation datasets.

Normalized Shannon Entropy:

$$H_{\text{norm}} = \frac{H}{H_{\max}} = \frac{-\sum_{i=1}^{K} p_i \ln p_i}{\ln K}$$

  • $K = 7$ — Number of weather categories (clear_day, foggy, snowy, night, rainy, dawn_dusk, cloudy)
  • $p_i$ — Proportion of images in category $i$
  • $H_{\text{norm}} = 1$ → Perfectly uniform distribution
  • $H_{\text{norm}} \to 0$ → Highly skewed/imbalanced distribution

Companion metric — Imbalance Ratio: $\mathrm{IR} = N_{\max} / N_{\min}$ (ratio of largest to smallest category count).
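Both quantities can be computed directly from per-category image counts. The sketch below is a minimal standard-library version; the function names are illustrative, empty categories are treated as contributing zero to the entropy (the usual convention), and returning infinity for the imbalance ratio when a category is empty is an assumption made here.

```python
import math

def normalized_entropy(counts):
    """Normalized Shannon entropy of a count vector over K = len(counts) categories."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        return 0.0
    # Terms with zero count are skipped: p * ln(p) -> 0 as p -> 0.
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)

def imbalance_ratio(counts):
    """Ratio of the largest to the smallest category count."""
    smallest = min(counts)
    if smallest == 0:
        return math.inf  # assumption: an empty category is reported as infinite imbalance
    return max(counts) / smallest
```

A dataset with equal counts in all seven weather categories scores 1, while one whose images all fall into a single category scores 0.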

Pixel-level variant: The same formula is applied to the distribution of pixels over segmentation classes. Here the maximum entropy is computed from the number of classes that actually occur (non-zero pixel count) rather than from all possible classes.

Quality thresholds (pixel-level $H_{\text{norm}}$):

  • > 0.8 — Very balanced class distribution (high diversity)
  • > 0.6 and ≤ 0.8 — Reasonably balanced
  • > 0.3 and ≤ 0.6 — Imbalanced
  • ≤ 0.3 — Highly imbalanced
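The pixel-level variant and the threshold bands can be sketched as follows. The input is assumed to be a vector of per-class pixel counts; the function names and the label strings are illustrative, and a distribution with fewer than two occurring classes is mapped to 0 here as an assumption.

```python
import numpy as np

def pixel_level_normalized_entropy(class_pixel_counts):
    """Normalized entropy where H_max = ln(number of classes with non-zero pixels)."""
    counts = np.asarray(class_pixel_counts, dtype=np.float64)
    nonzero = counts[counts > 0]
    if nonzero.size < 2:
        return 0.0  # assumption: a single occurring class counts as fully imbalanced
    p = nonzero / nonzero.sum()
    return float(-(p * np.log(p)).sum() / np.log(nonzero.size))

def balance_label(h_norm):
    """Map a pixel-level normalized entropy to the bands listed above."""
    if h_norm > 0.8:
        return "very balanced"
    if h_norm > 0.6:
        return "reasonably balanced"
    if h_norm > 0.3:
        return "imbalanced"
    return "highly imbalanced"
```

Because absent classes are excluded from the normalizer, a mask that splits its pixels evenly between two classes scores 1 even if the label set defines many more classes.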

Layout Diversity — Spatial Pyramid Matching

Measures structural diversity of segmentation layouts across a dataset using Spatial Pyramid Matching (SPM) with Histogram Intersection similarity.

Step 1 — Spatial Pyramid Histograms:

Each segmentation mask is divided into a grid of $2^l \times 2^l$ cells at level $l$. Per-cell class histograms are L1-normalized and weighted by level.

  • Level 0 (1×1 grid) — weight = 0.0625
  • Level 1 (2×2 grid) — weight = 0.125
  • Level 2 (4×4 grid) — weight = 0.25
  • Level 3 (8×8 grid) — weight = 0.5

$$\text{Weight}_l = 0.5^{\,(L_{\max} - l + 1)}, \quad \text{where } L_{\max} = 3$$
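The pyramid construction can be sketched in a few lines of NumPy. The code below computes the level weights and the weighted, L1-normalized per-cell class histograms, then concatenates them into a single vector, which is one plausible reading of how the levels are combined. Function names, the grid-splitting via `np.linspace`, and the handling of out-of-range labels are assumptions.

```python
import numpy as np

def level_weight(level, max_level=3):
    """Weight for a pyramid level: 0.5 ** (max_level - level + 1)."""
    return 0.5 ** (max_level - level + 1)

def spatial_pyramid_descriptor(mask, num_classes, max_level=3):
    """Concatenate weighted per-cell class histograms over levels 0..max_level.

    mask: 2D array of non-negative integer class ids.
    """
    parts = []
    for level in range(max_level + 1):
        cells = 2 ** level
        rows = np.linspace(0, mask.shape[0], cells + 1).astype(int)
        cols = np.linspace(0, mask.shape[1], cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                cell = mask[rows[r]:rows[r + 1], cols[c]:cols[c + 1]]
                # Labels >= num_classes (e.g. an ignore id) are dropped by the slice.
                hist = np.bincount(cell.ravel(), minlength=num_classes)[:num_classes]
                hist = hist.astype(np.float64)
                if hist.sum() > 0:
                    hist /= hist.sum()  # L1 normalization per cell
                parts.append(level_weight(level, max_level) * hist)
    return np.concatenate(parts)
```

With four levels the descriptor has (1 + 4 + 16 + 64) × `num_classes` entries, and finer levels dominate because they have both more cells and larger weights.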

Step 2 — Descriptor & Similarity:

$$\text{Descriptor}_k = \bigoplus_{l \in \text{levels}} w_l \cdot \text{SpatialHistogram}(M_k, l)$$

$$\text{Similarity}(i, j) = \sum_{d} \min\big(\text{Descriptor}_i[d],\ \text{Descriptor}_j[d]\big)$$

Step 3 — Diversity Score:

$$\text{Diversity} = 1 - \operatorname{mean}\big(\text{Similarity}_{\text{off-diagonal}}\big)$$

  • Similarity matrix is normalized by mean self-similarity to [0, 1] range
  • Diversity → 1 means highly diverse layouts
  • Diversity → 0 means very similar/repetitive layouts
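Given one descriptor per sampled mask, the similarity matrix and diversity score can be sketched as below. The input is assumed to be an array of shape (num_masks, descriptor_length) with non-negative entries; the function names are illustrative, and dividing by the mean of the diagonal is this sketch's reading of "normalized by mean self-similarity".

```python
import numpy as np

def histogram_intersection_matrix(descriptors):
    """Pairwise histogram intersection: S[i, j] = sum_d min(D_i[d], D_j[d])."""
    d = np.asarray(descriptors, dtype=np.float64)
    return np.minimum(d[:, None, :], d[None, :, :]).sum(axis=2)

def layout_diversity(descriptors):
    """1 minus the mean off-diagonal similarity, after normalizing by mean self-similarity."""
    sim = histogram_intersection_matrix(descriptors)
    n = sim.shape[0]
    if n < 2:
        raise ValueError("need at least two descriptors")
    sim = sim / np.diag(sim).mean()            # normalize by mean self-similarity
    off_diagonal = sim[~np.eye(n, dtype=bool)]  # drop the i == j entries
    return float(1.0 - off_diagonal.mean())
```

The broadcasted intermediate has num_masks × num_masks × descriptor_length entries, which is manageable at the 100 samples per dataset used here but would need chunking for much larger samples.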

Benchmark parameters:

  • num_samples = 100 — Images sampled per dataset
  • min_domain_samples = 10 — Minimum samples for per-domain analysis
  • Datasets: ACDC, BDD10k, Cityscapes, Mapillary, OUTSIDE15k, IDD

PROVE — Progressive Real-data Organization for Validation of Effects

Systematic downstream evaluation framework. Tests each augmentation strategy across all model × dataset combinations, computing per-domain and aggregate performance metrics.