# Challenge Results
## Overall Rankings
### Objective Ranking
1. xlancelab
2. CUPAudioGroup
3. AC_DC
4. Hachimi
5. cp-jku
### Subjective Ranking
1. xlancelab
2. CUPAudioGroup
3. Hachimi
4. AC_DC
5. cp-jku
## Detailed Results
### Objective Results
| Team | Overall MMSNR | Overall Zimt | Overall FAD | MMSNR Rank | Zimt Rank | FAD Rank | Macro Rank |
|---|---|---|---|---|---|---|---|
| xlancelab | 4.4623 | 0.0137 | 0.1988 | 1 | 1 | 1 | 1.00 |
| CUPAudioGroup | 2.3405 | 0.0164 | 0.2253 | 2 | 2 | 2 | 2.00 |
| AC_DC | 1.4520 | 0.0182 | 0.2907 | 4 | 3 | 3 | 3.33 |
| Hachimi | 2.0016 | 0.0183 | 0.2939 | 3 | 4 | 4 | 3.67 |
| cp-jku | 0.8329 | 0.0189 | 0.3814 | 5 | 5 | 5 | 5.00 |
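The Macro Rank column appears to be the unweighted mean of the per-metric ranks, with teams ordered by ascending macro rank (e.g. AC_DC: (4 + 3 + 3) / 3 = 3.33). A minimal sketch of that aggregation, assuming a simple arithmetic mean is the rule used:

```python
# Per-metric ranks from the objective results table (MMSNR, Zimt, FAD).
per_metric_ranks = {
    "xlancelab": [1, 1, 1],
    "CUPAudioGroup": [2, 2, 2],
    "AC_DC": [4, 3, 3],
    "Hachimi": [3, 4, 4],
    "cp-jku": [5, 5, 5],
}

# Macro rank = mean of the per-metric ranks.
macro_rank = {team: sum(r) / len(r) for team, r in per_metric_ranks.items()}

# Final ordering: ascending macro rank.
ranking = sorted(macro_rank, key=macro_rank.get)
```

This reproduces the objective ranking above; ties, if any occurred, would need an explicit tie-breaking rule not stated in the source.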
### Subjective Results (MOS)
| System | MOS Sep | MOS Rest | MOS Overall | Sep Rank | Rest Rank | Overall Rank | Macro Rank |
|---|---|---|---|---|---|---|---|
| xlancelab | 4.2358 | 3.3892 | 3.4665 | 1 | 1 | 1 | 1.00 |
| CUPAudioGroup | 3.8360 | 2.9173 | 2.9253 | 2 | 2 | 2 | 2.00 |
| Hachimi | 3.5814 | 2.6331 | 2.7235 | 3 | 3 | 3 | 3.00 |
| AC_DC | 3.5425 | 2.4768 | 2.5412 | 5 | 4 | 4 | 4.33 |
| cp-jku | 3.5510 | 2.0838 | 2.1414 | 4 | 5 | 5 | 4.67 |
## System Descriptions
### xlancelab
xlancelab employs sequential BS-Roformer models, using pretrained checkpoints from ZFTurbo's MSS-Training repository (Roformer-SW, dereverb, denoise). Training combines an L1 loss with a multi-resolution STFT loss. Data: MoisesDB and manually cleaned RawStems.
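The combination of a time-domain L1 term with a multi-resolution STFT term is a standard recipe; a minimal NumPy/SciPy sketch of what such a loss could look like (FFT sizes and weighting are illustrative assumptions, not xlancelab's actual hyperparameters):

```python
import numpy as np
from scipy.signal import stft

def multi_res_stft_l1(pred, target, fft_sizes=(512, 1024, 2048)):
    """L1 distance between STFT magnitudes at several resolutions, averaged.
    Hypothetical sketch; the team's exact formulation is not specified."""
    loss = 0.0
    for n_fft in fft_sizes:
        _, _, P = stft(pred, nperseg=n_fft, noverlap=n_fft // 2)
        _, _, T = stft(target, nperseg=n_fft, noverlap=n_fft // 2)
        loss += np.mean(np.abs(np.abs(P) - np.abs(T)))
    return loss / len(fft_sizes)

def total_loss(pred, target, stft_weight=1.0):
    # Time-domain L1 plus the multi-resolution spectral term.
    return np.mean(np.abs(pred - target)) + stft_weight * multi_res_stft_l1(pred, target)
```

Multiple FFT sizes trade off time and frequency resolution, which is why this loss is popular for waveform-generation tasks.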
### CUPAudioGroup
CUPAudioGroup uses an ensemble of BSRNN, BS-Roformer, and MDX23 models. Pretrained parameters are sourced from ZFTurbo's open-source "Music-Source-Separation-Training" project. Data: RawStems, MUSDB18-HQ, MoisesDB.
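The source does not say how the three models' outputs are combined; a common choice is a (weighted) average of the per-stem waveform estimates. A hypothetical sketch under that assumption:

```python
import numpy as np

def ensemble_separate(mixture, models, weights=None):
    """Output-level ensembling: run each separator on the mixture and
    average the stem estimates. Assumes each model maps a mixture
    waveform to an estimate of the same shape; this is an illustrative
    guess, not CUPAudioGroup's confirmed combination rule."""
    outs = [m(mixture) for m in models]
    w = np.ones(len(outs)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the ensemble preserves overall scale
    return sum(wi * o for wi, o in zip(w, outs))
```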
### AC_DC
AC_DC submits DTT-BSR, a generator based on DTTNet (a dual-path TFC-TDF U-Net). The architecture incorporates Band-Sequence Modeling from BandSplitRNN to model subband and temporal correlations, and uses a RoPE Transformer bottleneck for long-sequence handling and phase preservation. Training uses a Multi-Frequency Discriminator with a multi-scale mel reconstruction loss, an LSGAN adversarial loss, and a feature-matching loss. Data: RawStems at 48 kHz.
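LSGAN and feature-matching losses have standard forms; a minimal sketch of those two terms (the mel reconstruction term and all weightings are omitted, and nothing here is AC_DC's actual code):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # LSGAN discriminator loss: push scores on real audio toward 1,
    # scores on generated audio toward 0.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # LSGAN generator loss: push discriminator scores on generated audio toward 1.
    return np.mean((d_fake - 1.0) ** 2)

def feature_matching_loss(feats_real, feats_fake):
    # L1 distance between discriminator intermediate feature maps,
    # averaged over layers; stabilizes adversarial training.
    return np.mean([np.mean(np.abs(r - f)) for r, f in zip(feats_real, feats_fake)])
```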
### Hachimi
Hachimi uses the "Max version of the backbone" from their recent work. Training combines a reconstruction loss with a GAN loss. Data: MUSDB25, MUSDB18-HQ, MoisesDB, MedleyDB, RawStems, URMP, MAESTRO.
### cp-jku
cp-jku proposes a two-stage pipeline: separation followed by restoration. Separation uses a BandSplit-RoFormer to extract eight stems (including "other"). Restoration uses the HiFi++ GAN bundle (SpectralUNet, Upsampler, WaveUNet, SpectralMaskNet). The separator is trained in three stages with LoRA fine-tuning; the restorer is trained in five stages, producing eight source-specific expert models. Data: MUSDB18, DSD100, MoisesDB, Slakh2100, MedleyDB v2, RawStems, and others.
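At inference time, a separate-then-restore pipeline with per-source experts can be sketched as follows (function and stem names are illustrative assumptions, not cp-jku's actual interface):

```python
def separate_then_restore(mixture, separator, restorers):
    """Two-stage inference sketch: the separator yields a dict of stem
    estimates, and each stem is passed through its source-specific
    restoration expert. cp-jku's real pipeline uses eight stems."""
    stems = separator(mixture)  # e.g. {"vocals": ..., "drums": ..., "other": ...}
    return {name: restorers[name](audio) for name, audio in stems.items()}
```

Routing each stem to its own restorer lets every expert specialize in the artifacts typical of one source, at the cost of maintaining one model per stem.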