Hadronization, a fundamental part of event generation, is currently simulated with fine-tuned empirical models. Motivated by the limitations of these models, in this talk I'll present MLHAD, a proposed alternative in which the empirical model is replaced by a surrogate machine-learning-based model that can ultimately be trained on data. I'll describe the current stage of development and discuss open challenges and possible ways forward.