shreenithi20/fmnist-8x8-latents
A transformer-based diffusion model trained on Fashion MNIST latent representations for text-to-image generation.
model-1000.safetensors: Early training (1k steps)
model-3000.safetensors: Mid training (3k steps)
model-5000.safetensors: Advanced training (5k steps)
model-8500.safetensors: Final model (8.5k steps)

from transformers import AutoConfig, AutoModel
import torch
# Load model
model = AutoModel.from_pretrained("shreenithi20/fmnist-t2i-diffusion")
model.eval()
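# Generate images
# Note: class_labels is not defined in this snippet; the caller must supply
# the conditioning input for the desired Fashion MNIST classes.
with torch.no_grad():
    generated_latents = model.generate(
        text_embeddings=class_labels,
        num_inference_steps=25,
        guidance_scale=7.5,
    )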
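The `guidance_scale` argument suggests classifier-free guidance, although the card does not confirm how the model applies it internally. As a minimal sketch under that assumption, the standard combination rule blends an unconditional and a conditional noise prediction. The function name `cfg_combine` and the plain-list inputs are hypothetical and used only for illustration.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance rule: uncond + scale * (cond - uncond).

    Assumption: this mirrors what guidance_scale controls in the model;
    the model card does not state the exact formula it uses.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions for two latent positions.
uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 3.0]

# A scale of 1.0 reproduces the conditional prediction;
# larger values push further in the conditional direction.
plain = cfg_combine(uncond_pred, cond_pred, 1.0)
guided = cfg_combine(uncond_pred, cond_pred, 7.5)
```

Under this rule a scale of 1.0 disables guidance, and a value such as 7.5 trades sample diversity for stronger adherence to the conditioning signal.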
The model generates high-quality Fashion MNIST images conditioned on class labels. It works at an 8×8 latent resolution, and the generated latents can be decoded to 64×64 pixel images.
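Since generation is conditioned on class labels, it can help to map class names to the standard Fashion MNIST label indices. The sketch below covers only that lookup. The card does not say how the model expects labels to be encoded (indices, one-hot vectors, or embeddings), so the helper `class_index` is a hypothetical convenience and not part of the model's API.

```python
# Standard Fashion MNIST label indices (0-9) and their class names.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def class_index(name: str) -> int:
    """Return the label index for a class name (case-insensitive)."""
    lowered = [c.lower() for c in FASHION_MNIST_CLASSES]
    try:
        return lowered.index(name.lower())
    except ValueError:
        raise ValueError(f"unknown Fashion MNIST class: {name!r}") from None


# Hypothetical usage: label indices for a small batch of requested classes.
labels = [class_index(n) for n in ("Sneaker", "bag", "Dress")]
print(labels)  # [7, 8, 3]
```

Converting these indices into whatever input `model.generate` expects is left to the caller, because that encoding is not documented here.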