Abstract: Existing image generation models cannot meet the professional requirements of the textile field for generating diverse transparent printing materials. To address this problem, a transparent printing material generation method based on large multi-modal models is proposed. First, an aesthetic score predictor is used to construct a high-quality printing material dataset. Second, the large multi-modal language model BLIP3 is employed to generate semantic labels for the dataset. Third, the Stable Diffusion (SD) model is fine-tuned through multi-scale bucket training, and the variational autoencoder (VAE) is modified to introduce image transparency information into the image generation space, enabling the direct generation of high-quality transparent printing materials. Experimental results show that the proposed method can generate transparent printing materials with diverse contents and styles in both text-to-image and image-to-image modes, and that the edge details of the generated materials are significantly better than those obtained with a deep learning image segmentation model.
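
The multi-scale bucket training step mentioned above can be illustrated with a minimal sketch of aspect-ratio bucketing, in which each training image is assigned to the resolution bucket closest to its own aspect ratio so that a batch can share one shape. This is an illustrative assumption, not the configuration used in this work: the function names `make_buckets` and `assign_bucket`, the pixel budget, the step size, and the side limits are all hypothetical.

```python
import math


def make_buckets(max_area=512 * 512, step=64, min_side=256, max_side=1024):
    """Enumerate (width, height) buckets (hypothetical parameters).

    Every side is a multiple of `step`, and no bucket exceeds `max_area` pixels.
    """
    buckets = set()
    for w in range(min_side, max_side + 1, step):
        # Largest height (a multiple of `step`) that keeps the area within budget.
        h = min(max_side, (max_area // w) // step * step)
        if h >= min_side:
            buckets.add((w, h))
            buckets.add((h, w))  # also keep the transposed orientation
    return sorted(buckets)


def assign_bucket(width, height, buckets):
    """Return the bucket whose aspect ratio is closest, in log space, to the image's."""
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


buckets = make_buckets()
square = assign_bucket(1000, 1000, buckets)   # square image
banner = assign_bucket(2048, 512, buckets)    # wide 4:1 image
```

Comparing aspect ratios in log space treats a 2:1 image and a 1:2 image symmetrically, so portrait and landscape materials are matched to their buckets with equal tolerance.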