Have you ever wondered how to disentangle attributes from an embedding? I’ve been exploring the idea of using flow matching models to learn smooth and invertible mappings. The concept is fascinating, but I have some concerns.
Let’s say we have a pre-trained embedding E and two disentangled features T1 and T2 that we want to recover. Can we learn a flow matching model to map E to T1 and T2 (and vice versa)? The challenge is that the distribution of E is known, but the distributions of T1 and T2 are not. How can the model learn when the target is unknown or keeps moving?
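For concreteness, here is a minimal sketch of what the training step would look like *if* we did have samples from T1, in the rectified-flow style of flow matching (straight-line interpolation between source and target, regress the constant velocity). All names here (`VelocityNet`, `flow_matching_step`) are placeholders of mine, and I assume E and T1 share the same dimensionality; the `t1_batch` argument is exactly what's missing when T1 is unknown.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the flow velocity v(x_t, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def flow_matching_step(model, optimizer, e_batch, t1_batch):
    """One rectified-flow step: interpolate E -> T1 and regress the velocity."""
    t = torch.rand(e_batch.size(0), 1)           # random time in [0, 1]
    x_t = (1 - t) * e_batch + t * t1_batch       # straight-line interpolant
    v_target = t1_batch - e_batch                # constant velocity along the line
    loss = ((model(x_t, t) - v_target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```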
Another question is whether clustering losses can enable this learning. And what about using priors? I’m unsure what would be a good prior in this case.
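One way I could imagine the prior working (purely my assumption, not something from a paper) is to pick a simple factorized prior as a stand-in target, so the flow pushes E onto a latent whose two blocks are independent by construction; a clustering-style loss could then act on one block of the pushed-forward samples. A toy sketch, reusing `flow_matching_step` from above:

```python
import torch

def sample_prior(batch_size, dim_t1, dim_t2):
    """Factorized Gaussian prior standing in for the unknown (T1, T2) targets."""
    t1 = torch.randn(batch_size, dim_t1)   # block intended to capture attribute 1
    t2 = torch.randn(batch_size, dim_t2)   # block intended to capture attribute 2
    return torch.cat([t1, t2], dim=-1)

# Usage (hypothetical): replace the missing target batch with prior samples.
# target = sample_prior(e_batch.size(0), dim_t1, dim_t2)
# loss = flow_matching_step(model, optimizer, e_batch, target)
```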
Interestingly, a paper from ICCV 25 (SCFlow) uses flow matching for disentanglement, but they know the disentangled representations (ground truth is available). They alternately provide the T1 or T2 distribution to the model and ask it to learn the other. But what if we don’t have this luxury?
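Roughly, the alternating scheme as I understand it looks something like the sketch below when ground truth is available (this is not SCFlow's actual code; the conditional velocity net is hypothetical, and I assume E, T1, and T2 share the same dimensionality for simplicity):

```python
import torch
import torch.nn as nn

class ConditionalVelocityNet(nn.Module):
    """Hypothetical velocity net that also sees the known factor as conditioning."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + cond_dim, 512), nn.SiLU(),
            nn.Linear(512, dim),
        )

    def forward(self, x, t, cond):
        return self.net(torch.cat([x, t, cond], dim=-1))

def alternating_step(model, optimizer, e_batch, t1_batch, t2_batch, step):
    """Even steps: condition on T1 and transport E toward T2; odd steps: the reverse."""
    if step % 2 == 0:
        cond, target = t1_batch, t2_batch
    else:
        cond, target = t2_batch, t1_batch
    t = torch.rand(e_batch.size(0), 1)
    x_t = (1 - t) * e_batch + t * target       # straight-line interpolant
    v_target = target - e_batch                # constant velocity target
    loss = ((model(x_t, t, cond) - v_target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```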
I’d love to hear your thoughts on this. Can we make flow matching models work for disentanglement without knowing the target distributions? Are there any advancements or alternative approaches that could help?