The Struggle is Real: Sampling Issues in Music VAEs

As I dive into the world of music generation using Variational Autoencoders (VAEs), I’m faced with a frustrating reality: my VAE is producing mediocre samples at best. Reconstructions are spot on, but sampling from the prior is a different story altogether.
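To make the failure mode concrete, here’s a minimal sketch (placeholder dimensions, not my actual architecture) of the two code paths involved. Reconstruction decodes a latent drawn from the posterior q(z|x); sampling decodes a latent drawn from the prior N(0, I). If the aggregate posterior drifts away from the prior, the first path can look great while the second falls apart.

```python
import torch
import torch.nn as nn

# A toy VAE purely for illustration -- the layer sizes are made up.
class TinyVAE(nn.Module):
    def __init__(self, dim=16, latent=4):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)  # outputs mu and log-variance
        self.dec = nn.Linear(latent, dim)
        self.latent = latent

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

vae = TinyVAE()
x = torch.randn(8, 16)

# Path 1 -- reconstruction: z comes from the posterior q(z|x).
recon, mu, logvar = vae(x)

# Path 2 -- sampling: z comes from the prior N(0, I). This is the path
# that produces garbage when the latent space doesn't match the prior.
z_prior = torch.randn(8, vae.latent)
sample = vae.dec(z_prior)
```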

I’m not alone in this struggle. Many of us in the machine learning community have been there, done that, and got the t-shirt. But what’s going on? Why can’t we get our VAEs to produce something remotely useful when sampling?

In my case, I’ve tried fiddling with the KL weight, but to no avail. I’ve even experimented with different schedules for ramping up beta, but the results are still underwhelming. It’s as if my VAE is stuck in a rut, refusing to budge.
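For reference, the kinds of beta schedules I mean look roughly like this (a sketch with made-up step counts, not my actual hyperparameters): a linear warmup that ramps beta once and holds it, and a cyclical schedule that repeatedly ramps and holds within each cycle.

```python
def linear_warmup(step, warmup_steps=10_000, beta_max=1.0):
    """Ramp beta linearly from 0 to beta_max, then hold."""
    return min(1.0, step / warmup_steps) * beta_max

def cyclical(step, cycle_len=10_000, beta_max=1.0, ramp_frac=0.5):
    """Cyclical annealing: within each cycle, ramp for the first
    ramp_frac of the steps, then hold at beta_max."""
    pos = (step % cycle_len) / cycle_len
    return beta_max if pos >= ramp_frac else (pos / ramp_frac) * beta_max
```

The intuition behind ramping at all is that a full-strength KL term early in training can collapse the posterior onto the prior before the decoder learns to use the latent, while too little KL pressure lets the posterior drift so far from the prior that sampling breaks, which is exactly the symptom here.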

The architecture of my VAE is pretty standard: 3.8M params, compression of 4x, and a loss function that’s a combination of reconstruction loss, KL divergence, and perceptual loss. You can check out the exact architecture and training scripts on my GitHub repo.
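The three-term loss is the usual weighted sum. A sketch of what I mean, with hypothetical weights `beta` and `gamma`, and `feats_*` standing in for features extracted by some fixed pretrained network (the perceptual term):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, x, mu, logvar, feats_recon, feats_x,
             beta=1.0, gamma=0.1):
    """Reconstruction + beta * KL + gamma * perceptual (illustrative weights)."""
    rec = F.mse_loss(recon, x)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    perc = F.mse_loss(feats_recon, feats_x)
    return rec + beta * kl + gamma * perc
```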

So, what’s the deal? Is it something to do with the way I’m defining my loss function? Am I missing something fundamental in my approach? Or is it simply a matter of tweaking the hyperparameters until I get it just right?

If you’ve faced similar struggles with sampling in VAEs, I’d love to hear your thoughts. Let’s commiserate and maybe, just maybe, we can figure out what’s going on together.

*Further reading: [Variational Autoencoder](https://arxiv.org/abs/1312.6114)*
