ffurfaro/PixelBytes-PokemonAll
Welcome to the PixelBytes repository! This project features models that generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. The weights provided here are for testing only.
The PixelBytes model generates mixed sequences of text and images, using line breaks to mark transitions and keeping image dimensions consistent.
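To make the idea of a mixed sequence concrete, here is a minimal sketch of how text bytes and image pixels might be interleaved with line-break tokens. It is an illustration under assumptions, not the PixelBytes implementation: the token kinds (`"txt"`, `"pix"`, `"nl"`), the helper names, and the use of small integer palette indices for pixels are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical token format: (kind, value). Not taken from PixelBytes.
Token = Tuple[str, int]
NEWLINE: Token = ("nl", 0)  # assumed separator for rows and text/image transitions


def text_tokens(text: str) -> List[Token]:
    """Encode text as one token per UTF-8 byte."""
    return [("txt", b) for b in text.encode("utf-8")]


def image_tokens(pixels: List[List[int]]) -> List[Token]:
    """Flatten a 2D grid of pixel values row by row, ending each row with NEWLINE."""
    if not pixels:
        return []
    width = len(pixels[0])
    out: List[Token] = []
    for row in pixels:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        out.extend(("pix", p) for p in row)
        out.append(NEWLINE)
    return out


def build_mixed_sequence(caption: str, pixels: List[List[int]]) -> List[Token]:
    """Caption, then a line break marking the transition, then the image rows."""
    return text_tokens(caption) + [NEWLINE] + image_tokens(pixels)


def recover_image(seq: List[Token]) -> List[List[int]]:
    """Rebuild the pixel grid from a sequence and check that row widths agree."""
    rows: List[List[int]] = []
    current: List[int] = []
    for kind, value in seq:
        if kind == "pix":
            current.append(value)
        elif kind == "nl" and current:
            rows.append(current)
            current = []
    if current:
        rows.append(current)
    if len({len(r) for r in rows}) > 1:
        raise ValueError("inconsistent image row widths")
    return rows
```

For example, `build_mixed_sequence("hi", [[1, 2, 3], [4, 5, 6]])` yields two text tokens, one separator, and two image rows of three pixels each followed by a separator; `recover_image` on that sequence returns the original grid. The width check stands in for the dimension consistency a generated sequence would need to satisfy.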
We use the PixelBytes-PokemonAll dataset, available on Hugging Face as ffurfaro/PixelBytes-PokemonAll. It contains text and image sequences of Pokémon for training our model.
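As a rough sketch, the dataset can probably be fetched with the Hugging Face `datasets` library. This assumes the library is installed, that network access is available, and that the repository loads with its default configuration; the function name `load_pokemon_dataset` is hypothetical, and the splits and column names are not assumed here, so inspect the returned object before relying on them.

```python
# Repository ID as listed on Hugging Face.
DATASET_ID = "ffurfaro/PixelBytes-PokemonAll"


def load_pokemon_dataset():
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `ds = load_pokemon_dataset()` followed by `print(ds)` shows the available splits and features.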
Furfaro, F. (2024). PixelBytes: A Unified Multimodal Representation Learning Project. (https://github.com/fabienfrfr/PixelBytes)
Thank you for exploring PixelBytes! We hope this model aids your multimodal generation projects.