| language: | |
| - en | |
| license: apache-2.0 | |
| library_name: transformers | |
| tags: | |
| - medical | |
| - spinal-cord-injury | |
| - healthcare | |
| - disability | |
| - accessibility | |
| - fine-tuned | |
| - lora | |
| - mistral | |
| base_model: teknium/OpenHermes-2.5-Mistral-7B | |
| pipeline_tag: text-generation | |
| widget: | |
| - text: "What is autonomic dysreflexia?" | |
| example_title: "Medical Question" | |
| - text: "How can I transfer from my wheelchair to a car?" | |
| example_title: "Daily Living" | |
| - text: "What exercises are good for someone with paraplegia?" | |
| example_title: "Exercise & Rehabilitation" | |
| model-index: | |
| - name: sci-assistant | |
| results: [] | |
| # SCI Assistant - Spinal Cord Injury Specialized AI Assistant | |
| A specialized AI assistant fine-tuned for people with spinal cord injuries (SCI). This model is based on OpenHermes-2.5-Mistral-7B and was trained in two phases with LoRA (Low-Rank Adaptation) to provide contextually appropriate, medically informed responses for the SCI community. | |
| ## Model Description | |
| This model was fine-tuned using a two-phase training approach: | |
| 1. **Phase 1**: Domain pretraining on SCI-related medical texts and resources | |
| 2. **Phase 2**: Instruction tuning on conversational SCI-focused Q&A pairs | |
| The model understands the unique challenges, medical realities, and daily life considerations of individuals living with spinal cord injuries. | |
| ## Training Details | |
| - **Base Model**: teknium/OpenHermes-2.5-Mistral-7B | |
| - **Training Method**: QLoRA (4-bit quantization with LoRA adapters) | |
| - **Training Data**: 119,116 total entries (35,779 domain text + 83,337 instruction pairs) | |
| - **Hardware**: RTX 4070 Super (12GB VRAM) | |
| - **Training Time**: ~20 hours total (Phase 1 + Phase 2) | |
| ## Usage | |
| This repository contains both the LoRA adapter and the full merged model. Choose the option that works best for you: | |
| ### Option 1: Use the Full Merged Model (Recommended) | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model = AutoModelForCausalLM.from_pretrained("basiphobe/sci-assistant") | |
| tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant") | |
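| # Example usage: tokenize a question, generate, and decode the full sequence | |
| prompt = "What are the signs of autonomic dysreflexia?" | |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) | |
| # max_new_tokens bounds the generated text only (max_length would also count the prompt) | |
| outputs = model.generate(**inputs, max_new_tokens=200) | |
| response = tokenizer.decode(outputs[0], skip_special_tokens=True) | |
| print(response) | |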
| ``` | |
| ### Option 2: Use the LoRA Adapter (Smaller Download) | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig | |
| from peft import PeftModel | |
| import torch | |
| # Load model | |
| bnb_config = BitsAndBytesConfig( | |
| load_in_4bit=True, | |
| bnb_4bit_compute_dtype=torch.float16, | |
| ) | |
| base_model = AutoModelForCausalLM.from_pretrained( | |
| "teknium/OpenHermes-2.5-Mistral-7B", | |
| quantization_config=bnb_config, | |
| device_map="auto" | |
| ) | |
| model = PeftModel.from_pretrained(base_model, "basiphobe/sci-assistant") | |
| tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant") | |
| # Format prompt with SCI context | |
| system_context = "You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI." | |
| your_question = "What are the signs of autonomic dysreflexia?" | |
| prompt = f"{system_context}\n\n### Instruction:\n{your_question}\n\n### Response:\n" | |
| # Generate response (inputs must sit on the model's device; temperature only applies when sampling) | |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) | |
| outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7) | |
| # Decode only the newly generated tokens, skipping the prompt | |
| response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True) | |
| print(response) | |
| ``` | |
| ## Files in this Repository | |
| - **Full Merged Model**: Ready-to-use model files (`model-*.safetensors`, `config.json`, etc.) | |
| - **LoRA Adapter**: Smaller adapter files (`adapter_model.safetensors`, `adapter_config.json`) | |
| - **Tokenizer**: Shared tokenizer files for both options | |
| ## GGUF Format Models | |
| This repository also includes GGUF format models for use with **llama.cpp**, **Ollama**, and other GGUF-compatible inference engines. GGUF files run on CPU or GPU without a Python environment, and the quantized variants trade some quality for a smaller download and lower memory use. | |
| ### Available GGUF Models | |
| | File | Size | Format | Use Case | RAM Required | | |
| |------|------|--------|----------|--------------| | |
| | `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB | | |
| | `merged-sci-model-q6_k.gguf` | 5.6GB | Q6_K | High quality with good compression | ~8GB | | |
| | `merged-sci-model-q5_k_m.gguf` | 4.8GB | Q5_K_M | Excellent quality/size balance | ~7GB | | |
| | `merged-sci-model-q5_k_s.gguf` | 4.7GB | Q5_K_S | Good quality, slightly smaller | ~7GB | | |
| | `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB | | |
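Choosing a file from the table above can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the table, while the `pick_gguf` helper and its "largest file that fits" rule are assumptions of this sketch, not part of the repository.

```python
# (file name, file size in GB, approximate RAM in GB), copied from the table above,
# ordered from highest to lowest quality.
GGUF_FILES = [
    ("merged-sci-model.gguf", 14.0, 16),
    ("merged-sci-model-q6_k.gguf", 5.6, 8),
    ("merged-sci-model-q5_k_m.gguf", 4.8, 7),
    ("merged-sci-model-q5_k_s.gguf", 4.7, 7),
    ("merged-sci-model-q4_k_m.gguf", 4.1, 6),
]


def pick_gguf(available_ram_gb):
    """Return the first (highest-quality) file whose RAM estimate fits, or None.

    Hypothetical helper: the selection rule is an assumption of this sketch.
    """
    for name, _size_gb, ram_gb in GGUF_FILES:
        if ram_gb <= available_ram_gb:
            return name
    return None


print(pick_gguf(8))
```

The RAM figures are the approximate values from the table, so leaving some headroom for the context window and the rest of the system is a sensible precaution.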
| ### Usage with Ollama | |
| **1. Download and create Modelfile:** | |
| ```bash | |
| # Download the Q5_K_M model (recommended balance of quality/size) | |
| wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf | |
| # Create Modelfile | |
| cat > Modelfile << 'EOF' | |
| FROM ./merged-sci-model-q5_k_m.gguf | |
| TEMPLATE """<|im_start|>system | |
| You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|> | |
| <|im_start|>user | |
| {{ .Prompt }}<|im_end|> | |
| <|im_start|>assistant | |
| """ | |
| PARAMETER stop "<|im_start|>" | |
| PARAMETER stop "<|im_end|>" | |
| PARAMETER temperature 0.7 | |
| PARAMETER top_p 0.9 | |
| EOF | |
| ``` | |
| **2. Create and run the model:** | |
| ```bash | |
| ollama create sci-assistant -f Modelfile | |
| ollama run sci-assistant "What are the signs of autonomic dysreflexia?" | |
| ``` | |
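Once the model has been created, it can also be queried from Python through Ollama's local HTTP API. The sketch below assumes Ollama is running on its default port (11434) and that the model was created under the name `sci-assistant` as in step 2; the helper names `build_payload` and `ask` are illustrative.

```python
import json
import urllib.request

# Default local endpoint of a running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(question, model="sci-assistant"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": question, "stream": False}


def ask(question, url=OLLAMA_URL, timeout=120):
    """POST the question to Ollama and return the generated text."""
    data = json.dumps(build_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, `ask("What are the signs of autonomic dysreflexia?")` should return the model's answer as a string. The system prompt from the Modelfile is applied by Ollama, so it does not need to be repeated in the request.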
| ### Usage with llama.cpp | |
| **1. Install and setup:** | |
| ```bash | |
| # Clone and build llama.cpp (recent versions build with CMake; older checkouts used `make`) | |
| git clone https://github.com/ggerganov/llama.cpp | |
| cd llama.cpp | |
| cmake -B build | |
| cmake --build build --config Release | |
| # Download model | |
| wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf | |
| ``` | |
| **2. Interactive chat:** | |
| ```bash | |
| # The CLI binary is `llama-cli` in recent builds (older builds named it `./main`) | |
| ./build/bin/llama-cli -m merged-sci-model-q5_k_m.gguf \ | |
| --temp 0.7 \ | |
| --repeat-penalty 1.1 \ | |
| -c 4096 \ | |
| --interactive \ | |
| --in-prefix "<|im_start|>user\n" \ | |
| --in-suffix "<|im_end|>\n<|im_start|>assistant\n" | |
| ``` | |
| **3. Single prompt:** | |
| ```bash | |
| ./build/bin/llama-cli -m merged-sci-model-q5_k_m.gguf \ | |
| --temp 0.7 \ | |
| -c 2048 \ | |
| -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n" | |
| ``` | |
| ### Performance Comparison | |
| - **F16 Model** (`merged-sci-model.gguf`): Maximum quality, largest memory footprint | |
| - **Q6_K Model** (`merged-sci-model-q6_k.gguf`): Near-maximum quality with 60% size reduction | |
| - **Q5_K_M Model** (`merged-sci-model-q5_k_m.gguf`): Excellent quality retention, good balance | |
| - **Q5_K_S Model** (`merged-sci-model-q5_k_s.gguf`): Very good quality, slightly more compressed | |
| - **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): Good quality, smallest size, recommended for resource-constrained environments | |
| All models use the **ChatML** template format and support up to **32K context length**. | |
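For callers that assemble prompts by hand, the ChatML layout used in the Modelfile and llama.cpp examples can be built with a small helper. This is a minimal sketch: the function name `build_chatml_prompt` is illustrative, and the default system message is the short one from the single-prompt llama.cpp example.

```python
DEFAULT_SYSTEM = (
    "You are a specialized medical assistant for people with spinal cord injuries."
)


def build_chatml_prompt(user_message, system_message=DEFAULT_SYSTEM):
    """Wrap a system and user message in ChatML tags, ending with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("What exercises are good for someone with paraplegia?"))
```

Generation should stop at `<|im_end|>`, which is why the Modelfile registers it as a stop sequence.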
| ## Intended Use | |
| This model is designed to: | |
| - Provide SCI-specific information and guidance | |
| - Answer questions about daily life with spinal cord injuries | |
| - Offer practical advice for common SCI challenges | |
| - Support the SCI community with contextually appropriate responses | |
| ## Limitations | |
| - This model is for informational purposes only and should not replace professional medical advice | |
| - Always consult with healthcare providers for medical decisions | |
| - The model may not have information about the latest medical developments | |
| - Responses should be verified with medical professionals when making health-related decisions | |
| ## Direct Use | |
| This model can be used directly for: | |
| - Educational purposes about spinal cord injuries | |
| - Providing general information and support to the SCI community | |
| - Research into specialized medical AI assistants | |
| - Personal use by individuals seeking SCI-related information | |
| The model is designed to provide contextually appropriate responses that consider the unique challenges and medical realities of spinal cord injuries. | |
| ### Downstream Use | |
| This model can be fine-tuned further for: | |
| - Integration into healthcare applications | |
| - Specialized medical chatbots for rehabilitation centers | |
| - Educational platforms for SCI awareness and training | |
| - Research applications in medical AI | |
| - Custom applications for SCI support organizations | |
| When used in downstream applications, implementers should: | |
| - Maintain the medical disclaimer requirements | |
| - Ensure proper supervision by medical professionals | |
| - Implement appropriate safety measures and content filtering | |
| - Validate outputs for medical accuracy in their specific use case | |
| ### Out-of-Scope Use | |
| This model should NOT be used for: | |
| - **Medical diagnosis or treatment decisions** - Always consult healthcare professionals | |
| - **Emergency medical situations** - Seek immediate professional medical help | |
| - **Legal or financial advice** related to SCI cases | |
| - **Replacement for professional medical consultation** | |
| - **Clinical decision-making** without physician oversight | |
| - **Applications targeting vulnerable populations** without proper safeguards | |
| - **Commercial medical applications** without appropriate medical validation and oversight | |
| ## Bias, Risks, and Limitations | |
| ### Medical Limitations | |
| - **Not a substitute for medical professionals**: All medical advice should be verified with qualified healthcare providers | |
| - **Training data limitations**: May not include the most recent medical research or treatments | |
| - **Individual variation**: SCI affects individuals differently; responses may not apply to all cases | |
| - **Geographic bias**: Training data may be biased toward certain healthcare systems or regions | |
| ### Technical Limitations | |
| - **Hallucination risk**: Like all language models, may generate plausible-sounding but incorrect information | |
| - **Context limitations**: Limited by input context window and may not retain information across long conversations | |
| - **Language limitations**: Primarily trained on English content | |
| - **Update lag**: Cannot access real-time medical research or current events | |
| ### Bias Considerations | |
| - **Training data bias**: Reflects biases present in source medical literature and online content | |
| - **Demographic representation**: May not equally represent all demographics within the SCI community | |
| - **Healthcare access bias**: May reflect biases toward certain types of healthcare systems | |
| - **Severity bias**: May be more informed about certain types or severities of SCI | |
| ### Risk Mitigation | |
| - Always include medical disclaimers when using this model | |
| - Implement content filtering for harmful or dangerous advice | |
| - Regular evaluation by medical professionals is recommended | |
| - Monitor outputs for accuracy and appropriateness | |
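Two of the mitigations above (disclaimers and routing urgent questions to professional help) could be applied as a thin wrapper around model output. The sketch below is illustrative only: the disclaimer wording, the `URGENT_TERMS` list, and both function names are hypothetical placeholders, and a real deployment would need terms and wording reviewed by medical professionals. It does not attempt to filter the model's own output for harmful advice.

```python
# Hypothetical wording and keyword list, for illustration only.
DISCLAIMER = (
    "This information is educational and does not replace professional medical advice."
)
URGENT_NOTICE = "If this is an emergency, seek immediate professional medical help."
URGENT_TERMS = ("emergency", "can't breathe", "chest pain")


def needs_urgent_notice(question):
    """Return True if the question contains any placeholder urgent term."""
    text = question.lower()
    return any(term in text for term in URGENT_TERMS)


def wrap_response(question, model_response):
    """Prefix an urgent-care notice when triggered and always append the disclaimer."""
    parts = []
    if needs_urgent_notice(question):
        parts.append(URGENT_NOTICE)
    parts.append(model_response.strip())
    parts.append(DISCLAIMER)
    return "\n\n".join(parts)


print(wrap_response("Is this an emergency?", "Example model output."))
```

Keyword matching is deliberately simple here; it stands in for whatever safety layer an implementer chooses.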
| ## Recommendations | |
| Users should be aware of the following recommendations: | |
| **For Direct Users:** | |
| - Always verify medical information with qualified healthcare professionals | |
| - Use responses as educational/informational starting points, not definitive advice | |
| - Be aware that individual SCI experiences vary significantly | |
| - Seek immediate professional help for urgent medical concerns | |
| **For Developers/Implementers:** | |
| - Implement clear medical disclaimers in any application using this model | |
| - Provide easy access to professional medical resources alongside model responses | |
| - Consider implementing content filtering for potentially harmful advice | |
| - Regular review by medical professionals is strongly recommended | |
| - Ensure compliance with relevant healthcare regulations (HIPAA, etc.) | |
| **For Healthcare Organizations:** | |
| - Professional medical oversight is essential when implementing in clinical settings | |
| - Regular validation of model outputs against current medical standards | |
| - Integration should complement, not replace, professional medical consultation | |
| - Staff training on AI limitations and appropriate use cases | |
| ## Training Details | |
| ### Training Data | |
| The training dataset consisted of 119,116 carefully curated entries focused on spinal cord injury information: | |
| **Domain Pretraining Data (35,779 entries):** | |
| - Medical literature and research papers on SCI | |
| - Educational materials from reputable SCI organizations | |
| - Clinical guidelines and treatment protocols | |
| - Rehabilitation and therapy documentation | |
| - Patient education resources | |
| **Instruction Tuning Data (83,337 entries):** | |
| - SCI-focused question-answer pairs | |
| - Conversational examples with appropriate medical context | |
| - Real-world scenarios and practical advice situations | |
| - Educational Q&A formatted for instruction following | |
| All training data was filtered and curated to ensure: | |
| - Sources from reputable medical organizations and healthcare professionals | |
| - Content originally created or reviewed by medical professionals in the SCI field | |
| - Appropriate tone and sensitivity for SCI community | |
| - Removal of potentially harmful or dangerous advice | |
| - Proper medical disclaimers and context | |
| **Note**: While the source materials were created by medical professionals, this model itself has not undergone independent medical validation. | |
| ### Training Procedure | |
| The model was trained using a two-phase approach with QLoRA (Quantized Low-Rank Adaptation): | |
| **Phase 1 - Domain Pretraining:** | |
| - Focus: Medical terminology and SCI-specific knowledge | |
| - Duration: 2 epochs (~8 hours) | |
| - Data: 35,779 domain text entries | |
| - Objective: Adapt base model to SCI medical domain | |
| **Phase 2 - Instruction Tuning:** | |
| - Focus: Conversational abilities and response formatting | |
| - Duration: 2 epochs (~12 hours) | |
| - Data: 83,337 instruction-response pairs | |
| - Objective: Teach appropriate response patterns and tone | |
| #### Preprocessing | |
| Training data underwent extensive preprocessing: | |
| - Content sourced from materials created by healthcare professionals | |
| - Sensitive content filtering and safety checks | |
| - Standardized formatting for instruction-following | |
| - Quality filtering to remove low-quality or inappropriate content | |
| - Tokenization optimization for efficient training | |
| #### Training Hyperparameters | |
| - **Training regime:** 4-bit quantization with LoRA adapters (QLoRA) | |
| - **Learning rate:** 2e-4 with cosine scheduling | |
| - **LoRA rank:** 16 | |
| - **LoRA alpha:** 32 | |
| - **LoRA dropout:** 0.05 | |
| - **Target modules:** q_proj, v_proj | |
| - **Batch size:** 4 with gradient accumulation | |
| - **Max sequence length:** 512 tokens | |
| - **Optimizer:** AdamW with weight decay | |
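To get a feel for what these LoRA settings imply, the number of trainable adapter parameters can be estimated by hand. The sketch below uses the rank and target modules listed above; the Mistral 7B layer shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) are taken from the public base-model configuration and are an assumption of this sketch, not something stated in this card.

```python
# LoRA settings from the hyperparameter list above.
LORA_RANK = 16

# Mistral 7B shapes (assumed from the public base-model config).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# q_proj maps hidden -> hidden; v_proj maps hidden -> num_kv_heads * head_dim.
q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, LORA_RANK)

total_params = NUM_LAYERS * (q_proj + v_proj)
size_mb_fp32 = total_params * 4 / 1e6  # 4 bytes per 32-bit float

print(total_params, round(size_mb_fp32, 1))
```

Under these assumptions the adapter has about 6.8 million trainable parameters, or roughly 27 MB when stored as 32-bit floats, which is in line with the adapter size of about 30MB reported for this model.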
| #### Speeds, Sizes, Times | |
| - **Total training time:** ~20 hours (8h Phase 1 + 12h Phase 2) | |
| - **Hardware:** RTX 4070 Super (12GB VRAM) | |
| - **Final model size:** 30MB (LoRA adapter only) | |
| - **Base model size:** 7B parameters (not included in adapter) | |
| - **Training throughput:** ~3.5 samples/second average | |
| - **Memory usage:** 6-7GB VRAM during training | |
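The throughput figure can be cross-checked against the entry counts, epochs, and durations reported for each phase. This is a rough sketch that assumes every entry is one training sample seen once per epoch and that the stated hours are pure training time.

```python
# Entry counts, epochs, and durations as reported for each training phase.
PHASES = {
    "phase_1": {"entries": 35_779, "epochs": 2, "hours": 8},
    "phase_2": {"entries": 83_337, "epochs": 2, "hours": 12},
}


def samples_per_second(entries, epochs, hours):
    """Average samples processed per second over a phase."""
    return entries * epochs / (hours * 3600)


for name, phase in PHASES.items():
    print(name, round(samples_per_second(**phase), 2))

total_samples = sum(p["entries"] * p["epochs"] for p in PHASES.values())
total_hours = sum(p["hours"] for p in PHASES.values())
overall = total_samples / (total_hours * 3600)
print("overall", round(overall, 2))
```

Under these assumptions the estimate comes out at roughly 2.5 samples/second for Phase 1, 3.9 for Phase 2, and 3.3 overall, the same range as the reported average.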
| ## Evaluation | |
| ### Testing Data, Factors & Metrics | |
| #### Testing Data | |
| The model was evaluated using: | |
| - Held-out test set of SCI-related questions (500 samples) | |
| - Manual review of response quality and appropriateness | |
| - Comparative analysis against general-purpose models on SCI topics | |
| - Assessment of domain-specific knowledge retention | |
| **Note**: Evaluation was conducted by the model developer, not independent medical professionals. | |
| #### Factors | |
| Evaluation considered multiple factors: | |
| - **Medical accuracy**: Correctness of SCI-related information | |
| - **Appropriateness**: Sensitivity and tone for SCI community | |
| - **Contextual relevance**: Understanding of SCI-specific challenges | |
| - **Safety**: Avoidance of harmful or dangerous advice | |
| - **Completeness**: Comprehensive responses to complex questions | |
| #### Metrics | |
| - **Medical accuracy score**: Based on consistency with source medical literature (not independently validated) | |
| - **Appropriateness rating**: Developer assessment of tone and sensitivity (4.2/5.0 subjective rating) | |
| - **Response relevance**: SCI-specific context understanding (82% relevance score) | |
| - **Safety compliance**: No obviously harmful medical advice detected in test samples | |
| - **Response quality**: Perplexity improvements over base model for SCI domain | |
| ### Results | |
| **Quantitative Results:** | |
| - 40% improvement in SCI domain perplexity over base model | |
| - Responses demonstrate consistency with source medical literature | |
| - 95% safety compliance (no obviously harmful medical advice detected) | |
| - 82% average relevance score for SCI-specific contexts | |
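For readers unfamiliar with the perplexity figure, the sketch below shows how perplexity is computed from per-token log-probabilities and how a relative improvement can be expressed. The card does not state exactly how the 40% figure was derived, so the relative-reduction formula is an assumption, and all numbers in the example are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_improvement(base_ppl, tuned_ppl):
    """Fractional reduction in perplexity relative to the base model."""
    return (base_ppl - tuned_ppl) / base_ppl


# Hypothetical example: a model that assigns probability 0.5 to every token
# has a perplexity of about 2.
print(perplexity([math.log(0.5)] * 4))

# Hypothetical perplexities of 10.0 (base) and 6.0 (fine-tuned) would
# correspond to a 40% relative improvement.
print(relative_improvement(10.0, 6.0))
```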
| **Qualitative Results:** | |
| - Responses demonstrate clear understanding of SCI terminology and concepts | |
| - Appropriate tone and sensitivity for disability community | |
| - Consistent inclusion of medical disclaimers | |
| - Good balance between being helpful and cautious about medical advice | |
| **Limitations of Evaluation:** | |
| - Evaluation conducted by model developer, not independent medical experts | |
| - No formal clinical validation or testing with SCI patients | |
| - Results based on consistency with training sources, not independent medical verification | |
| ## Environmental Impact | |
| Training carbon emissions estimated using energy consumption data: | |
| - **Hardware Type:** RTX 4070 Super (12GB VRAM) | |
| - **Hours used:** ~20 hours total training time | |
| - **Cloud Provider:** Local training (personal hardware) | |
| - **Compute Region:** North America | |
| - **Carbon Emitted:** Approximately 2.1 kg CO2eq (estimated based on local energy grid) | |
| The use of QLoRA significantly reduced training time and energy consumption compared to full fine-tuning methods, making this a relatively efficient training approach. | |
| ## Technical Specifications | |
| ### Model Architecture and Objective | |
| - **Base Architecture:** Mistral 7B transformer model | |
| - **Adaptation Method:** QLoRA (Quantized Low-Rank Adaptation) | |
| - **Objective:** Causal language modeling with SCI domain specialization | |
| - **Quantization:** 4-bit precision for memory efficiency | |
| - **LoRA Configuration:** Rank-16 adapters on attention projection layers | |
| ### Compute Infrastructure | |
| #### Hardware | |
| - **GPU:** NVIDIA RTX 4070 Super (12GB VRAM) | |
| - **CPU:** Modern multi-core processor | |
| - **RAM:** 32GB system memory | |
| - **Storage:** NVMe SSD for fast data loading | |
| #### Software | |
| - **Framework:** Transformers 4.36+, PEFT 0.16.0 | |
| - **Training:** QLoRA with bitsandbytes quantization | |
| - **Environment:** Python 3.10+, PyTorch 2.0+, CUDA 12.1 | |
| ## Citation | |
| If you use this model in your research or applications, please cite: | |
| **BibTeX:** | |
| ```bibtex | |
| @misc{sci_assistant_2025, | |
| title={SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support}, | |
| author={basiphobe}, | |
| year={2025}, | |
| howpublished={Hugging Face Model Repository}, | |
| url={https://huggingface.co/basiphobe/sci-assistant} | |
| } | |
| ``` | |
| **APA:** | |
| basiphobe. (2025). *SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support*. Hugging Face. https://huggingface.co/basiphobe/sci-assistant | |
| ## Glossary | |
| **SCI**: Spinal Cord Injury - damage to the spinal cord that results in temporary or permanent changes in function | |
| **QLoRA**: Quantized Low-Rank Adaptation - an efficient fine-tuning method that reduces memory requirements | |
| **Domain Pretraining**: Training phase focused on learning domain-specific terminology and knowledge | |
| **Instruction Tuning**: Training phase focused on learning conversational patterns and response formatting | |
| **Perplexity**: A metric measuring how well a language model predicts text (lower is better) | |
| **LoRA**: Low-Rank Adaptation - parameter-efficient fine-tuning technique | |
| ## Model Card Authors | |
| **Primary Author:** basiphobe | |
| **Model Development:** Individual research project for SCI community support | |
| **Data Sources:** Curated from medical literature and educational materials created by healthcare professionals | |
| **Validation Status:** Model has not undergone independent medical professional validation | |
| ## Model Card Contact | |
| For questions, issues, or feedback regarding this model: | |
| - **Hugging Face:** https://huggingface.co/basiphobe/sci-assistant | |
| - **Issues:** Please report issues through Hugging Face model repository | |
| - **Medical Concerns:** Always consult qualified healthcare professionals | |
| **Important Note:** This model is provided for educational and informational purposes. Always seek professional medical advice for health-related questions and decisions. | |
| ### Framework versions | |
| - PEFT 0.16.0 | |