Examples of MIDI-to-audio generation on the Slakh dataset. For each MIDI file, we present results for reconstruction (using the original audio associated with the MIDI file) and for transfer to a different recording timbre. For the baseline SpecDiff (multi-instrument music synthesis with spectrogram diffusion [1]), we swap the MIDI instrument program to that of the target timbre sample.
Scroll horizontally if necessary to see all the results.
| MIDI | | Target | SpecDiff | Ours with encoder | Ours |
|---|---|---|---|---|---|
| Piano | reconstruction | | | | |
| | transfer | | | | |
| Guitar | reconstruction | | | | |
| | transfer | | | | |
| Strings | reconstruction | | | | |
| | transfer | | | | |
| Voice | reconstruction | | | | |
| | transfer | | | | |
| Synth | reconstruction | | | | |
| | transfer | | | | |
| Bass | reconstruction | | | | |
| | transfer | | | | |
| Flute | reconstruction | | | | |
| | transfer | | | | |
Examples of timbre transfer on the Slakh dataset. We compare our method with two baselines, SS-VAE [2] and Music Style Transfer [3].
| Source | Target | SS-VAE | Music Style Transfer | Ours no adv. | Ours |
|---|---|---|---|---|---|
| piano to guitar | | | | | |
| guitar to voice | | | | | |
| synth to strings | | | | | |
| guitar to flute | | | | | |
| bass to keys | | | | | |
| guitar to guitar | | | | | |
Examples of timbre transfer on three datasets of real instrumental recordings.
| Source | Target | SS-VAE | Music Style Transfer | Ours no adv. | Ours |
|---|---|---|---|---|---|
| piano to guitar | | | | | |
| guitar to piano | | | | | |
| flute to piano | | | | | |
| guitar to flute | | | | | |
| piano to flute | | | | | |
| violin to guitar | | | | | |
| violin to piano | | | | | |
| piano to piano | | | | | |
Examples of musical style transfer between recordings of rock, jazz, dub, and lofi hip-hop. For MusicGen, we use the source audio as the melody input together with the following prompts:
| Source | Target | MusicGen | Ours no adv. | Ours |
|---|---|---|---|---|
[1] C. Hawthorne, I. Simon, A. Roberts, N. Zeghidour, J. Gardner, E. Manilow, and J. Engel, "Multi-instrument music synthesis with spectrogram diffusion," arXiv preprint arXiv:2206.05408, 2022.
[2] O. Cífka, A. Ozerov, U. Şimşekli, and G. Richard, "Self-supervised VQ-VAE for one-shot music style transfer," in ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 96–100.
[3] S. Li, Y. Zhang, F. Tang, C. Ma, W. Dong, and C. Xu, "Music style transfer with time-varying inversion of diffusion models," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 1, 2024, pp. 547–555.