Text Generation
Transformers
PyTorch
English
custom-architecture
rope
rmsnorm
swiglu
flash-attention
16k-context
# extend_context.py - Extend the MAP-NEO Mini context window (default: 16384 tokens)
from model_neo import NeoMiniConfig, NeoMini
import torch


def extend_model_context(checkpoint_path="checkpoints/checkpoint_step_149999.pt",
                         new_max_len=16384):
    """Extend the model's context window to new_max_len tokens."""
    print(f"Extending context window to {new_max_len} tokens...")

    # Start from the default config and override the context length
    config = NeoMiniConfig()
    config.max_seq_len = new_max_len

    # Create a new model with the extended context
    extended_model = NeoMini(config)

    # Load the original weights on CPU
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    original_state = checkpoint['model_state_dict']

    # Transfer weights; position tensors whose shape changed are interpolated
    extended_state = extended_model.state_dict()
    for key in original_state:
        if key in extended_state:
            if 'pos' in key and extended_state[key].shape != original_state[key].shape:
                # Assumes a 2D [old_len, dim] position table
                print(f"Interpolating position embeddings: {key}")
                old_pos_emb = original_state[key]
                # The input is 4D ([1, 1, old_len, dim]), which requires
                # 'bilinear'; mode='linear' only accepts 3D input and raises here.
                new_pos_emb = torch.nn.functional.interpolate(
                    old_pos_emb.unsqueeze(0).unsqueeze(0),
                    size=(new_max_len, old_pos_emb.shape[-1]),
                    mode='bilinear',
                    align_corners=False
                ).squeeze(0).squeeze(0)
                extended_state[key] = new_pos_emb
            else:
                extended_state[key] = original_state[key]
    extended_model.load_state_dict(extended_state)

    # Save the extended model together with its config
    extended_checkpoint = {
        'model_state_dict': extended_model.state_dict(),
        'config': config.to_dict()
    }
    output_path = "checkpoints/extended_context_model.pt"
    torch.save(extended_checkpoint, output_path)
    print(f"Extended model saved to {output_path}")
    return extended_model, config


if __name__ == "__main__":
    extend_model_context()
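The position-embedding branch of `extend_model_context` stretches a learned `[old_len, dim]` table to `new_max_len` rows by interpolating along the position axis. The sketch below shows that resampling arithmetic in plain Python, without PyTorch, so the effect on individual rows is easy to inspect. `resample_positions` is a hypothetical helper and not part of the repository. It assumes the half-pixel coordinate convention that, to my understanding, `torch.nn.functional.interpolate` uses with `align_corners=False`, so treat it as an illustration rather than a drop-in replacement.

```python
def resample_positions(table, new_len):
    """Linearly resample a [old_len][dim] list-of-lists to [new_len][dim].

    Hypothetical helper for illustration only; assumes the half-pixel
    (align_corners=False style) source-coordinate convention.
    """
    old_len = len(table)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        # Map the centre of output row i back into input coordinates,
        # clamping at 0 so the first rows do not extrapolate.
        src = max((i + 0.5) * scale - 0.5, 0.0)
        lo = min(int(src), old_len - 1)   # row at or below src
        hi = min(lo + 1, old_len - 1)     # next row, clamped at the end
        w = src - lo                      # blend weight toward `hi`
        out.append([(1 - w) * a + w * b for a, b in zip(table[lo], table[hi])])
    return out


# Two positions with 2-dimensional embeddings, stretched to four positions.
table = [[0.0, 10.0], [1.0, 20.0]]
stretched = resample_positions(table, 4)
print(stretched)
# [[0.0, 10.0], [0.25, 12.5], [0.75, 17.5], [1.0, 20.0]]
```

The new rows are blends of neighbouring original rows, so the extended model starts from positions it has effectively already seen at a coarser spacing. Interpolated embeddings are only an initialisation; whether quality holds at the longer context without further fine-tuning is not something the script itself establishes.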