Data Science Asked on November 22, 2021
I want to try different methods of conditioning the decoding process of the Variational Autoencoder models in the Google Magenta project for my own research. As far as I can tell, MusicVAE has already been conditioned by its authors on chords (e.g., the ‘hier-multiperf_vel_1bar_med_chords’ model). I would also like to try other conditioning signals, such as style tags or diatonicity.
However, I am having a hard time figuring out where the respective conditioning tensors (one-hot-encoded chords, I believe) are used during training in the hierarchical decoder. Is the same conditioning tensor concatenated to the input at every decoding step? Only at the first step? Or is something else done? Since this is difficult to work out from the code alone and the paper (https://arxiv.org/pdf/1803.05428.pdf) does not address this architectural detail, I am hoping someone involved could clear it up for me. Below is the MusicVAE architecture figure from that paper; a small sketch of what I mean follows as well.
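To make the question concrete, here is a minimal sketch (plain NumPy, not actual Magenta code; all tensor names and sizes are made up) of the variant I am asking about, i.e., broadcasting the same one-hot chord vector across time and concatenating it to the decoder input at every step:

```python
import numpy as np

# Hypothetical sketch, NOT Magenta's implementation: concatenate one
# conditioning vector (e.g. a one-hot chord label) to the decoder input
# at EVERY timestep. Shapes are arbitrary placeholders.
batch, steps, input_dim, cond_dim = 2, 16, 8, 12

decoder_inputs = np.random.randn(batch, steps, input_dim).astype(np.float32)

condition = np.zeros((batch, cond_dim), dtype=np.float32)
condition[:, 0] = 1.0  # e.g. a one-hot chord class

# Repeat the condition along the time axis, then concatenate feature-wise.
condition_per_step = np.repeat(condition[:, None, :], steps, axis=1)
conditioned_inputs = np.concatenate([decoder_inputs, condition_per_step], axis=-1)

print(conditioned_inputs.shape)  # (2, 16, 20)
```

The alternative I have in mind would be feeding the condition only once, e.g. by concatenating it to the latent code or to the initial decoder state rather than to every step's input. My question is which of these (or something else entirely) the hierarchical decoder actually does with the chord tensors.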
P.S.: I also posted this same question on Stack Overflow (https://stackoverflow.com/questions/63029684/magenta-musicvae-groovae-conditioning); I can delete it if that is against the rules.