Have you heard about the new open-weight models from OpenAI that ship with low-precision weights, specifically MXFP4? It’s an intriguing development, and it raises some interesting questions about how these models were trained. Were they actually trained in MXFP4, or only quantized to it after training? And if the former, what does that mean for the training process?
I’m curious about what it takes to train models at such low precision. Can traditional methods like SGD still work when FP4 can represent only 16 distinct values per weight? Are there papers or research that explore this? I’d love to dive deeper into the world of low-precision training.
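To make the "only 16 values" point concrete, here is a small sketch of what MXFP4 fake-quantization looks like. This is my own illustrative code, not OpenAI's implementation: it assumes the MX convention of an FP4 (E2M1) element format, whose non-negative representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}, paired with one power-of-two scale shared across a block of 32 elements.

```python
import numpy as np

# The 8 non-negative magnitudes representable in FP4 E2M1 (1 sign bit,
# 2 exponent bits, 1 mantissa bit) -- with signs, 16 values total.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(x, block=32):
    """Fake-quantize a 1-D array: each block of `block` elements shares
    one power-of-two scale, and each element is rounded to the nearest
    signed FP4 (E2M1) value. A sketch, not a bit-exact reference."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        amax = np.max(np.abs(chunk))
        # Shared scale chosen so the block's largest magnitude lands in
        # range (6.0 is the largest E2M1 value, exponent 2).
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2) if amax > 0 else 1.0
        scaled = chunk / scale
        # Round each magnitude to the nearest grid point, keep the sign.
        idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID), axis=1)
        out[i:i + block] = np.sign(scaled) * FP4_GRID[idx] * scale
    return out
```

In quantization-aware training schemes, a function like this is typically applied in the forward pass while gradients flow through as if it were the identity (the straight-through estimator), with a master copy of the weights kept in higher precision for the SGD update. Whether that, or something more exotic, is how these models were produced is exactly what I’d like to find out.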
But that’s not all – I’m also wondering whether it’s possible to go even lower. Can models be trained with FP3 (8 values) or even FP2 (4 values) weights? What are the limitations, and what would the potential benefits be?
The possibilities are endless, and I’m excited to explore this topic further. If you have any insights or recommendations for papers on low-precision training, I’d love to hear them.