---
title: ACE-Step 1.5 XL Music Generation (CPU)
emoji: 🎵
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: mit
tags:
  - music-generation
  - ace-step
  - gguf
  - lora
  - training
  - cpu
  - mcp-server
short_description: ACE-Step 1.5 XL - CPU music generation + LoRA training
models:
  - ACE-Step/Ace-Step1.5
startup_duration_timeout: 2h
---
# ACE-Step 1.5 XL Music Generation (CPU)

**GGUF inference + LoRA training** on free CPU Spaces. Powered by [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp).

## Features

- **Music Generation** -- text/lyrics to stereo 48 kHz MP3 via GGUF-quantized models
- **LoRA Training** -- fine-tune on your own audio (~11 s/epoch on CPU, ~1.4 s/epoch on GPU)
- **Auto-Captioning** -- librosa BPM/key/time-signature analysis plus LM understand mode (caption + lyrics extraction)
- **Multiple LM Sizes** -- 0.6B / 1.7B / 4B language models (downloaded on demand)
- **Cancel + Download** -- cancel training mid-epoch, download the trained LoRA adapter
## Music Generation

1. Enter a music description
2. Enter lyrics or check **Instrumental**
3. Adjust BPM, duration, steps, and seed
4. Select a LoRA adapter if you have trained one
5. Click **Generate Music**

**Timing:** ~270 s for 10 s of audio with the 1.7B LM at 8 steps on CPU.
## LoRA Training

1. Upload audio files (any length; the VAE auto-tiles them into 30 s chunks)
2. Set the LoRA name, epochs, learning rate, and rank
3. Click **Train** -- ace-server stops during training and restarts afterwards
4. Use **Cancel** to stop early (saves a checkpoint)
5. **Download** the trained adapter file
6. The trained adapter appears in the LoRA dropdown

**Timing:** ~170 s preprocessing + ~11 s/epoch on CPU. GPU: ~1.4 s/epoch.

**Limits:** 30 min of total audio across all files; files exceeding the cap are truncated with a warning. 50 files max. 8 h training timeout.
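The upload limits and 30 s tiling above can be sketched as a pair of small helpers. This is an illustrative model of the behavior described, not the Space's actual code; the function names and return shapes are assumptions.

```python
import math

CHUNK_SECONDS = 30            # VAE tile length
TOTAL_CAP_SECONDS = 30 * 60   # 30 min cap across all uploaded files

def apply_cap(durations_s, cap=TOTAL_CAP_SECONDS):
    """Truncate per-file durations so the running total stays under the cap.

    Returns (kept_durations, warnings): files past the budget are shortened
    (possibly to zero) and a warning is recorded for each truncation.
    """
    kept, warnings, budget = [], [], cap
    for i, d in enumerate(durations_s):
        take = min(d, budget)
        if take < d:
            warnings.append(f"file {i}: truncated from {d:.0f}s to {take:.0f}s")
        kept.append(take)
        budget -= take
    return kept, warnings

def num_tiles(duration_s):
    """Number of 30 s VAE tiles a file of this duration produces."""
    return math.ceil(duration_s / CHUNK_SECONDS)
```

For example, uploading files of 1000 s, 900 s, and 100 s against the 1800 s cap keeps 1000 s and 800 s and drops the rest, with two truncation warnings.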
**Settings (per the Side-Step author's recommendations):**

- LR: 3e-4
- Rank: 32, Alpha: 64
- Epochs: 200-500 for 3-10 files
- Optimizer: Adafactor (minimal memory)
- Variant: standard turbo (not XL -- XL hits swap on 18 GB RAM)
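As a minimal sketch of what the rank/alpha settings above mean: a LoRA adapter leaves the base weight `W` frozen and learns a low-rank update scaled by `alpha / rank`. The layer sizes here are illustrative, not taken from ACE-Step.

```python
import numpy as np

d_out, d_in = 1024, 1024
rank, alpha = 32, 64                      # recommended settings above

W = np.zeros((d_out, d_in))               # frozen base weight (zeros for demo)
A = np.random.randn(rank, d_in) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, init to 0

def lora_forward(x):
    # Effective weight: W + (alpha / rank) * B @ A
    return x @ (W + (alpha / rank) * (B @ A)).T

# Trainable parameters per adapted layer: rank * (d_in + d_out),
# versus d_in * d_out for full fine-tuning of W.
trainable = rank * (d_in + d_out)
```

Because `B` starts at zero, the adapter initially leaves the layer unchanged; training only ever updates the small `A` and `B` matrices (65,536 parameters here, versus 1,048,576 for the full weight).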
## Captioning Pipeline

Training audio is auto-captioned before preprocessing:

| Method | What it extracts | Speed |
|--------|------------------|-------|
| **librosa** | BPM, key, time signature | ~3 s/file |
| **LM understand** (GPU) | Rich caption + lyrics + metadata | ~52 s/file |
| **ace-server /understand** (Space) | Same as LM understand, via GGUF | ~30 s/file |
| **.txt/.json sidecar** | User-provided caption (if present) | instant |

On the Space, ace-server /understand runs before training; locally, the PyTorch LM understand mode is used.
## Models

| Component | GGUF | Size | Purpose |
|-----------|------|------|---------|
| DiT XL turbo | acestep-v15-xl-turbo-Q4_K_M | 2.8 GB | Music generation (no LoRA) |
| DiT standard turbo | acestep-v15-turbo-Q4_K_M | 1.1 GB | Music generation (with LoRA) |
| LM 1.7B | acestep-5Hz-lm-1.7B-Q8_0 | 1.7 GB | Caption understanding |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0 | 0.75 GB | Text encoding |
| VAE | vae-BF16 | 0.32 GB | Audio encode/decode |
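As a rough guide to disk needs, the per-path totals from the table (the shared LM, text encoder, and VAE are needed either way):

```python
# Sizes in GB, copied from the table above.
sizes = {
    "dit_xl": 2.8,
    "dit_standard": 1.1,
    "lm_1_7b": 1.7,
    "text_encoder": 0.75,
    "vae": 0.32,
}

shared = sizes["lm_1_7b"] + sizes["text_encoder"] + sizes["vae"]
lora_path = sizes["dit_standard"] + shared   # LoRA-capable generation: ~3.87 GB
xl_path = sizes["dit_xl"] + shared           # no-LoRA XL generation:  ~5.57 GB
```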
## API

### Generate Music

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True, bpm=120, duration=10, seed=-1, steps=8,
    lora_select="None (no LoRA)",
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate",
)
```
### Train LoRA

```python
from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style", epochs=200, lr=0.0003, rank=32,
    api_name="/train_lora",
)
```
### MCP (Model Context Protocol)

```json
{
  "mcpServers": {
    "ace-step": {"url": "https://werecooking-ace-step-cpu.hf.space/gradio_api/mcp/"}
  }
}
```
## CLI

```bash
python app.py "upbeat electronic dance music" --duration 10 --steps 8
python app.py "jazz piano" --adapter my-style --seed 42
```
## Architecture

- **Inference:** GGUF via [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- **Training:** PyTorch, ported from [Side-Step](https://github.com/koda-dernet/Side-Step) (commit ecd13bd)
- **Captioning:** librosa + LM understand (PyTorch or ace-server /understand)
- Training stops ace-server to free RAM and restarts it afterwards with any new adapters
- Inference is blocked during training with a clear message
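The last two points describe a simple mutual-exclusion lifecycle: training owns the ace-server slot, and generation requests get a clear refusal rather than competing for RAM. A sketch of that guard, with illustrative class and message names (not the Space's actual code):

```python
class ServerGuard:
    """Toggles between serving and training; only one owns the RAM at a time."""

    def __init__(self):
        self.training = False

    def start_training(self):
        # In the real Space this is where ace-server would be stopped
        # to free RAM before PyTorch training begins.
        self.training = True

    def finish_training(self):
        # ...and where ace-server would be restarted, picking up any
        # newly trained adapters for the LoRA dropdown.
        self.training = False

    def generate(self, caption):
        if self.training:
            return "Training in progress -- generation is disabled until it finishes."
        return f"generated audio for: {caption}"
```

The same flag also explains the restart step: because the server is fully stopped during training, new adapter files are discovered naturally when it comes back up.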
## Credits

- [ACE-Step 1.5](https://github.com/ace-step/ACE-Step-1.5)
- [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- [Side-Step](https://github.com/koda-dernet/Side-Step)
- [Serveurperso/ACE-Step-1.5-GGUF](https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF)