Examples of MIDI-to-audio generation on the Slakh dataset. For each MIDI file, we present results for reconstruction (using the original audio associated with the MIDI file) and for transfer to a different recording's timbre. For the baseline SpecDiff ("Multi-instrument music synthesis with spectrogram diffusion" [1]), we swap the MIDI instrument program to that of the target timbre sample.
Scroll to see all the results if necessary.
| MIDI | Target | SpecDiff | Ours with encoder | Ours |
|---|---|---|---|---|
| Piano | reconstruction | (audio) | (audio) | (audio) |
| Piano | transfer | (audio) | (audio) | (audio) |
| Guitar | reconstruction | (audio) | (audio) | (audio) |
| Guitar | transfer | (audio) | (audio) | (audio) |
| Strings | reconstruction | (audio) | (audio) | (audio) |
| Strings | transfer | (audio) | (audio) | (audio) |
| Voice | reconstruction | (audio) | (audio) | (audio) |
| Voice | transfer | (audio) | (audio) | (audio) |
| Synth | reconstruction | (audio) | (audio) | (audio) |
| Synth | transfer | (audio) | (audio) | (audio) |
| Bass | reconstruction | (audio) | (audio) | (audio) |
| Bass | transfer | (audio) | (audio) | (audio) |
| Flute | reconstruction | (audio) | (audio) | (audio) |
| Flute | transfer | (audio) | (audio) | (audio) |
Examples of timbre transfer on the Slakh dataset. We compare our method with two baselines, SS-VAE [2] and Music Style Transfer [3].
| Source | Target | SS-VAE | Music Style Transfer | Ours no adv. | Ours |
|---|---|---|---|---|---|
| Piano to guitar | (audio) | (audio) | (audio) | (audio) | (audio) |
| Guitar to voice | (audio) | (audio) | (audio) | (audio) | (audio) |
| Synth to strings | (audio) | (audio) | (audio) | (audio) | (audio) |
| Guitar to flute | (audio) | (audio) | (audio) | (audio) | (audio) |
| Bass to keys | (audio) | (audio) | (audio) | (audio) | (audio) |
| Guitar to guitar | (audio) | (audio) | (audio) | (audio) | (audio) |
Examples of timbre transfer on three datasets of real instrumental recordings.
| Source | Target | SS-VAE | Music Style Transfer | Ours no adv. | Ours |
|---|---|---|---|---|---|
| Piano to guitar | (audio) | (audio) | (audio) | (audio) | (audio) |
| Guitar to piano | (audio) | (audio) | (audio) | (audio) | (audio) |
| Flute to piano | (audio) | (audio) | (audio) | (audio) | (audio) |
| Guitar to flute | (audio) | (audio) | (audio) | (audio) | (audio) |
| Piano to flute | (audio) | (audio) | (audio) | (audio) | (audio) |
| Violin to guitar | (audio) | (audio) | (audio) | (audio) | (audio) |
| Violin to piano | (audio) | (audio) | (audio) | (audio) | (audio) |
| Piano to piano | (audio) | (audio) | (audio) | (audio) | (audio) |
Examples of musical style transfer between recordings of rock, jazz, dub, and lofi hip-hop. For MusicGen, we use the source audio as melody input together with the following prompts:
| Source | Target | MusicGen | Ours no adv. | Ours |
|---|---|---|---|---|
[1] C. Hawthorne, I. Simon, A. Roberts, N. Zeghidour, J. Gardner, E. Manilow, and J. Engel, "Multi-instrument music synthesis with spectrogram diffusion," arXiv preprint arXiv:2206.05408, 2022.
[2] O. Cífka, A. Ozerov, U. Şimşekli, and G. Richard, "Self-supervised vq-vae for one-shot music style transfer," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 96–100.
[3] S. Li, Y. Zhang, F. Tang, C. Ma, W. Dong, and C. Xu, "Music style transfer with time-varying inversion of diffusion models," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 1, 2024, pp. 547–555.