Instructions for using basiphobe/sci-assistant with libraries, inference servers, and local apps.

Libraries

How to use basiphobe/sci-assistant with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="basiphobe/sci-assistant")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant")
model = AutoModelForCausalLM.from_pretrained("basiphobe/sci-assistant")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use basiphobe/sci-assistant with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "basiphobe/sci-assistant"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "basiphobe/sci-assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
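
The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("http://localhost:8000", "basiphobe/sci-assistant", "What is the capital of France?")` should return the reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.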

Use Docker

docker model run hf.co/basiphobe/sci-assistant

SGLang

How to use basiphobe/sci-assistant with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "basiphobe/sci-assistant" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "basiphobe/sci-assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "basiphobe/sci-assistant" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "basiphobe/sci-assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use basiphobe/sci-assistant with Docker Model Runner:
```
docker model run hf.co/basiphobe/sci-assistant
```

	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- medical
	- spinal-cord-injury
	- healthcare
	- disability
	- accessibility
	- fine-tuned
	- lora
	- mistral
	base_model: teknium/OpenHermes-2.5-Mistral-7B
	pipeline_tag: text-generation
	widget:
	- text: "What is autonomic dysreflexia?"
	  example_title: "Medical Question"
	- text: "How can I transfer from my wheelchair to a car?"
	  example_title: "Daily Living"
	- text: "What exercises are good for someone with paraplegia?"
	  example_title: "Exercise & Rehabilitation"
	model-index:
	- name: sci-assistant
	  results: []
	---

	# SCI Assistant - Spinal Cord Injury Specialized AI Assistant

	An AI assistant fine-tuned for people with spinal cord injuries (SCI). The model is based on OpenHermes-2.5-Mistral-7B and was trained in two phases with LoRA (Low-Rank Adaptation) to give contextually appropriate, medically informed responses for the SCI community.

	## Model Description

	This model was fine-tuned using a two-phase training approach:
	1. Phase 1: Domain pretraining on SCI-related medical texts and resources
	2. Phase 2: Instruction tuning on conversational SCI-focused Q&A pairs

	The model is trained to account for the challenges, medical realities, and daily-life considerations of people living with spinal cord injuries.

	## Training Details

	- Base Model: teknium/OpenHermes-2.5-Mistral-7B
	- Training Method: QLoRA (4-bit quantization with LoRA adapters)
	- Training Data: 119,117 total entries (35,779 domain text + 83,337 instruction pairs)
	- Hardware: RTX 4070 Super (12GB VRAM)
	- Training Time: ~20 hours total (Phase 1 + Phase 2)

	## Usage

	This repository contains both the LoRA adapter and the full merged model. Choose the option that works best for you:

	### Option 1: Use the Full Merged Model (Recommended)
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load the merged model and its tokenizer
	model = AutoModelForCausalLM.from_pretrained("basiphobe/sci-assistant")
	tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant")

	# Example usage
	prompt = "What are the signs of autonomic dysreflexia?"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	# max_new_tokens limits only the generated text, not prompt plus output
	outputs = model.generate(**inputs, max_new_tokens=200)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	### Option 2: Use the LoRA Adapter (Smaller Download)
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from peft import PeftModel
	import torch

	# Load the base model in 4-bit to reduce memory use
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_compute_dtype=torch.float16,
	)

	base_model = AutoModelForCausalLM.from_pretrained(
	    "teknium/OpenHermes-2.5-Mistral-7B",
	    quantization_config=bnb_config,
	    device_map="auto",
	)

	# Attach the LoRA adapter from this repository
	model = PeftModel.from_pretrained(base_model, "basiphobe/sci-assistant")
	tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant")

	# Format prompt with SCI context
	system_context = "You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI."

	your_question = "What are the signs of autonomic dysreflexia?"
	prompt = f"{system_context}\n\n### Instruction:\n{your_question}\n\n### Response:\n"

	# Generate response (do_sample=True is needed for temperature to take effect)
	inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
	outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
	# Decode only the newly generated tokens, skipping the prompt
	response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
	print(response)
	```
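
The prompt layout used in the adapter example above can be wrapped in a small helper so that every question is formatted the same way. This is a minimal sketch; `build_prompt` and `SYSTEM_CONTEXT` are illustrative names, and the layout is copied from the example rather than from any documented template.

```python
SYSTEM_CONTEXT = (
    "You are a specialized medical assistant for people with spinal cord injuries. "
    "Your responses should always consider the unique needs, challenges, and medical "
    "realities of individuals living with SCI."
)


def build_prompt(question, system_context=SYSTEM_CONTEXT):
    """Return the instruction-style prompt shown in the adapter example."""
    # Strip stray whitespace so the section markers stay on their own lines.
    return f"{system_context}\n\n### Instruction:\n{question.strip()}\n\n### Response:\n"
```

The returned string can be passed straight to the tokenizer in place of `prompt`.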

	## Files in this Repository

	- Full Merged Model: Ready-to-use model files (`model-*.safetensors`, `config.json`, etc.)
	- LoRA Adapter: Smaller adapter files (`adapter_model.safetensors`, `adapter_config.json`)
	- Tokenizer: Shared tokenizer files for both options

	## GGUF Format Models

	This repository also includes GGUF format models optimized for use with llama.cpp, Ollama, and other GGUF-compatible inference engines. These formats offer excellent performance and compatibility across different platforms.

	### Available GGUF Models

	| File | Size | Format | Use Case | RAM Required |
	|------|------|--------|----------|--------------|
	| `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB |
	| `merged-sci-model-q6_k.gguf` | 5.6GB | Q6_K | High quality with good compression | ~8GB |
	| `merged-sci-model-q5_k_m.gguf` | 4.8GB | Q5_K_M | Excellent quality/size balance | ~7GB |
	| `merged-sci-model-q5_k_s.gguf` | 4.7GB | Q5_K_S | Good quality, slightly smaller | ~7GB |
	| `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB |
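
One way to read the table is as a lookup from available memory to file. The sketch below encodes the approximate RAM figures from the table and returns the highest-quality file that fits; `pick_gguf` is a hypothetical helper, and the figures are the card's estimates, not measured values.

```python
# (file name, approximate RAM required in GB), copied from the table above
# and ordered from highest to lowest quality.
GGUF_VARIANTS = [
    ("merged-sci-model.gguf", 16),
    ("merged-sci-model-q6_k.gguf", 8),
    ("merged-sci-model-q5_k_m.gguf", 7),
    ("merged-sci-model-q5_k_s.gguf", 7),
    ("merged-sci-model-q4_k_m.gguf", 6),
]


def pick_gguf(available_ram_gb):
    """Return the highest-quality file whose RAM estimate fits, or None."""
    for filename, ram_gb in GGUF_VARIANTS:
        if ram_gb <= available_ram_gb:
            return filename
    return None
```

For example, a machine with 8 GB free would get the Q6_K file, while anything under 6 GB returns `None`.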

	### Usage with Ollama

	1. Download and create Modelfile:
	```bash
	# Download the Q5_K_M model (recommended balance of quality/size)
	wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf

	# Create Modelfile
	cat > Modelfile << 'EOF'
	FROM ./merged-sci-model-q5_k_m.gguf
	TEMPLATE """<|im_start|>system
	You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
	<|im_start|>user
	{{ .Prompt }}<|im_end|>
	<|im_start|>assistant
	"""
	PARAMETER stop "<|im_start|>"
	PARAMETER stop "<|im_end|>"
	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	EOF
	```

	2. Create and run the model:
	```bash
	ollama create sci-assistant -f Modelfile
	ollama run sci-assistant "What are the signs of autonomic dysreflexia?"
	```

	### Usage with llama.cpp

	1. Install and setup:
	```bash
	# Clone and build llama.cpp
	git clone https://github.com/ggerganov/llama.cpp
	cd llama.cpp
	make

	# Download model
	wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf
	```

	2. Interactive chat:
	```bash
	./main -m merged-sci-model-q5_k_m.gguf \
	    --temp 0.7 \
	    --repeat_penalty 1.1 \
	    -c 4096 \
	    --interactive \
	    --in-prefix "<|im_start|>user\n" \
	    --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
	```

	3. Single prompt:
	```bash
	./main -m merged-sci-model-q5_k_m.gguf \
	    --temp 0.7 \
	    -c 2048 \
	    -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
	```

	### Performance Comparison

	- F16 Model (`merged-sci-model.gguf`): Maximum quality, largest memory footprint
	- Q6_K Model (`merged-sci-model-q6_k.gguf`): Near-maximum quality with 60% size reduction
	- Q5_K_M Model (`merged-sci-model-q5_k_m.gguf`): Excellent quality retention, good balance
	- Q5_K_S Model (`merged-sci-model-q5_k_s.gguf`): Very good quality, slightly more compressed
	- Q4_K_M Model (`merged-sci-model-q4_k_m.gguf`): Good quality, smallest size, recommended for resource-constrained environments
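
The size reductions can be checked directly from the file sizes in the GGUF table. This short calculation assumes the listed sizes are exact to one decimal place and uses the 14GB F16 file as the baseline.

```python
F16_SIZE_GB = 14.0  # size of merged-sci-model.gguf from the GGUF table

QUANT_SIZES_GB = {"Q6_K": 5.6, "Q5_K_M": 4.8, "Q5_K_S": 4.7, "Q4_K_M": 4.1}


def size_reduction_percent(quant_size_gb, baseline_gb=F16_SIZE_GB):
    """Percentage by which a quantized file is smaller than the F16 baseline."""
    return round((1 - quant_size_gb / baseline_gb) * 100, 1)


for name, size in QUANT_SIZES_GB.items():
    print(f"{name}: {size_reduction_percent(size)}% smaller than F16")
```

The Q6_K result is 60.0%, which matches the figure quoted in the list; the smaller quantizations come out between roughly 66% and 71%.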

	All models use the ChatML template format and support up to 32K context length.
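
For callers that assemble prompts by hand, the ChatML layout used in the llama.cpp single-prompt example can be generated from a list of messages. This is a minimal sketch; `to_chatml` is an illustrative name, not a function from llama.cpp or Ollama.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Passing a system message followed by a user message reproduces the string given to `-p` in the single-prompt command, with real newlines in place of the `\n` escapes.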

	## Intended Use

	This model is designed to:
	- Provide SCI-specific information and guidance
	- Answer questions about daily life with spinal cord injuries
	- Offer practical advice for common SCI challenges
	- Support the SCI community with contextually appropriate responses

	## Limitations

	- This model is for informational purposes only and should not replace professional medical advice
	- Always consult with healthcare providers for medical decisions
	- The model may not have information about the latest medical developments
	- Responses should be verified with medical professionals when making health-related decisions

	## Direct Use

	This model can be used directly for:
	- Educational purposes about spinal cord injuries
	- Providing general information and support to the SCI community
	- Research into specialized medical AI assistants
	- Personal use by individuals seeking SCI-related information

	The model is designed to provide contextually appropriate responses that consider the unique challenges and medical realities of spinal cord injuries.

	### Downstream Use

	This model can be fine-tuned further for:
	- Integration into healthcare applications
	- Specialized medical chatbots for rehabilitation centers
	- Educational platforms for SCI awareness and training
	- Research applications in medical AI
	- Custom applications for SCI support organizations

	When used in downstream applications, implementers should:
	- Maintain the medical disclaimer requirements
	- Ensure proper supervision by medical professionals
	- Implement appropriate safety measures and content filtering
	- Validate outputs for medical accuracy in their specific use case
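
As one possible starting point for the disclaimer and safety items above, a thin wrapper can sit between the model and the user. The sketch below is an assumption about how such a wrapper might look: the phrase list is hypothetical and not clinically reviewed, and simple keyword matching is far weaker than a real safety layer.

```python
DISCLAIMER = (
    "This information is for educational purposes only and does not replace "
    "professional medical advice."
)

# Hypothetical, non-exhaustive phrases; a real deployment needs a reviewed list.
URGENT_PHRASES = ("emergency", "can't breathe", "chest pain")


def looks_urgent(user_message):
    """Return True if the message contains any of the urgent phrases."""
    lowered = user_message.lower()
    return any(phrase in lowered for phrase in URGENT_PHRASES)


def wrap_response(user_message, model_response):
    """Redirect urgent messages; otherwise append the disclaimer to the reply."""
    if looks_urgent(user_message):
        return "This may be a medical emergency. Seek immediate professional medical help."
    return f"{model_response.strip()}\n\n{DISCLAIMER}"
```

Keeping the check on the user's message rather than the model's output means urgent questions are redirected before any generated advice is shown.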

	### Out-of-Scope Use

	This model should NOT be used for:
	- Medical diagnosis or treatment decisions - Always consult healthcare professionals
	- Emergency medical situations - Seek immediate professional medical help
	- Legal or financial advice related to SCI cases
	- Replacement for professional medical consultation
	- Clinical decision-making without physician oversight
	- Applications targeting vulnerable populations without proper safeguards
	- Commercial medical applications without appropriate medical validation and oversight

	## Bias, Risks, and Limitations

	### Medical Limitations
	- Not a substitute for medical professionals: All medical advice should be verified with qualified healthcare providers
	- Training data limitations: May not include the most recent medical research or treatments
	- Individual variation: SCI affects individuals differently; responses may not apply to all cases
	- Geographic bias: Training data may be biased toward certain healthcare systems or regions

	### Technical Limitations
	- Hallucination risk: Like all language models, may generate plausible-sounding but incorrect information
	- Context limitations: Limited by input context window and may not retain information across long conversations
	- Language limitations: Primarily trained on English content
	- Update lag: Cannot access real-time medical research or current events

	### Bias Considerations
	- Training data bias: Reflects biases present in source medical literature and online content
	- Demographic representation: May not equally represent all demographics within the SCI community
	- Healthcare access bias: May reflect biases toward certain types of healthcare systems
	- Severity bias: May be more informed about certain types or severities of SCI

	### Risk Mitigation
	- Always include medical disclaimers when using this model
	- Implement content filtering for harmful or dangerous advice
	- Regular evaluation by medical professionals is recommended
	- Monitor outputs for accuracy and appropriateness

	## Recommendations

	Users should be aware of the following recommendations:

	For Direct Users:
	- Always verify medical information with qualified healthcare professionals
	- Use responses as educational/informational starting points, not definitive advice
	- Be aware that individual SCI experiences vary significantly
	- Seek immediate professional help for urgent medical concerns

	For Developers/Implementers:
	- Implement clear medical disclaimers in any application using this model
	- Provide easy access to professional medical resources alongside model responses
	- Consider implementing content filtering for potentially harmful advice
	- Regular review by medical professionals is strongly recommended
	- Ensure compliance with relevant healthcare regulations (HIPAA, etc.)

	For Healthcare Organizations:
	- Professional medical oversight is essential when implementing in clinical settings
	- Regular validation of model outputs against current medical standards
	- Integration should complement, not replace, professional medical consultation
	- Staff training on AI limitations and appropriate use cases

	## Training Details

	### Training Data

	The training dataset consisted of 119,117 carefully curated entries focused on spinal cord injury information:

	Domain Pretraining Data (35,779 entries):
	- Medical literature and research papers on SCI
	- Educational materials from reputable SCI organizations
	- Clinical guidelines and treatment protocols
	- Rehabilitation and therapy documentation
	- Patient education resources

	Instruction Tuning Data (83,337 entries):
	- SCI-focused question-answer pairs
	- Conversational examples with appropriate medical context
	- Real-world scenarios and practical advice situations
	- Educational Q&A formatted for instruction following

	All training data was filtered and curated to ensure:
	- Sources from reputable medical organizations and healthcare professionals
	- Content originally created or reviewed by medical professionals in the SCI field
	- Appropriate tone and sensitivity for the SCI community
	- Removal of potentially harmful or dangerous advice
	- Proper medical disclaimers and context

	Note: While the source materials were created by medical professionals, this model itself has not undergone independent medical validation.

	### Training Procedure

	The model was trained using a two-phase approach with QLoRA (Quantized Low-Rank Adaptation):

	Phase 1 - Domain Pretraining:
	- Focus: Medical terminology and SCI-specific knowledge
	- Duration: 2 epochs (~8 hours)
	- Data: 35,779 domain text entries
	- Objective: Adapt base model to SCI medical domain

	Phase 2 - Instruction Tuning:
	- Focus: Conversational abilities and response formatting
	- Duration: 2 epochs (~12 hours)
	- Data: 83,337 instruction-response pairs
	- Objective: Teach appropriate response patterns and tone
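
The per-phase figures above imply an average training throughput, which can be worked out as a sanity check. The calculation assumes each entry is seen once per epoch and that the quoted durations are wall-clock training time; neither is stated explicitly in this card.

```python
PHASES = {
    # name: (entries, epochs, approximate hours), as reported above
    "Phase 1 - Domain Pretraining": (35_779, 2, 8),
    "Phase 2 - Instruction Tuning": (83_337, 2, 12),
}


def samples_per_second(entries, epochs, hours):
    """Average number of training samples processed per second."""
    return entries * epochs / (hours * 3600)


for name, (entries, epochs, hours) in PHASES.items():
    print(f"{name}: {samples_per_second(entries, epochs, hours):.2f} samples/s")

total_samples = sum(entries * epochs for entries, epochs, _ in PHASES.values())
total_hours = sum(hours for _, _, hours in PHASES.values())
print(f"Overall: {total_samples / (total_hours * 3600):.2f} samples/s")
```

This gives about 2.48 samples/s for Phase 1, 3.86 for Phase 2, and 3.31 overall, which is in the same range as the ~3.5 samples/second average reported under Speeds, Sizes, Times.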

	#### Preprocessing

	Training data underwent extensive preprocessing:
	- Content sourced from materials created by healthcare professionals
	- Sensitive content filtering and safety checks
	- Standardized formatting for instruction-following
	- Quality filtering to remove low-quality or inappropriate content
	- Tokenization optimization for efficient training

	#### Training Hyperparameters

	- Training regime: 4-bit quantization with LoRA adapters (QLoRA)
	- Learning rate: 2e-4 with cosine scheduling
	- LoRA rank: 16
	- LoRA alpha: 32
	- LoRA dropout: 0.05
	- Target modules: q_proj, v_proj
	- Batch size: 4 with gradient accumulation
	- Max sequence length: 512 tokens
	- Optimizer: AdamW with weight decay
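
The hyperparameters above can be collected in one place, together with two quantities they imply. This is a sketch only: the dictionary keys are illustrative names rather than arguments of a specific library, the scaling factor assumes standard LoRA (update scaled by alpha / r), and the schedule assumes plain cosine decay to zero with no warmup, which the card does not specify.

```python
import math

# Values from the list above; a plain dict, not a library config object.
HYPERPARAMS = {
    "learning_rate": 2e-4,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
    "batch_size": 4,
    "max_seq_length": 512,
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = HYPERPARAMS["lora_alpha"] / HYPERPARAMS["lora_r"]


def cosine_lr(step, total_steps, peak_lr=HYPERPARAMS["learning_rate"]):
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))
```

With these values the scaling factor is 2.0, and the learning rate falls to half its peak (1e-4) at the midpoint of training.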

	#### Speeds, Sizes, Times

	- Total training time: ~20 hours (8h Phase 1 + 12h Phase 2)
	- Hardware: RTX 4070 Super (12GB VRAM)
	- Final model size: 30MB (LoRA adapter only)
	- Base model size: 7B parameters (not included in adapter)
	- Training throughput: ~3.5 samples/second average
	- Memory usage: 6-7GB VRAM during training
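
The adapter size can be roughly cross-checked from the LoRA settings. The calculation below assumes the published Mistral-7B dimensions (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), which come from the base model's configuration and not from this card, and assumes the adapter weights are stored at 4 bytes per parameter.

```python
# Assumed Mistral-7B dimensions (from the base-model config, not this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
LORA_RANK = 16

# (input features, output features) of each adapted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_params(in_features, out_features, rank=LORA_RANK):
    """Parameters in A (rank x in) plus B (out x rank) for one adapted layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"trainable LoRA parameters: {total:,}")
print(f"size at 4 bytes/param: {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the adapter has 6,815,744 trainable parameters, or about 27.3 MB, which is in line with the roughly 30MB adapter size listed above.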

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Testing Data

	The model was evaluated using:
	- Held-out test set of SCI-related questions (500 samples)
	- Manual review of response quality and appropriateness
	- Comparative analysis against general-purpose models on SCI topics
	- Assessment of domain-specific knowledge retention
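
The card does not say how the 500-question test set was separated from the training data. A reproducible hold-out split could look like the following sketch, where `split_holdout` and the fixed seed are assumptions made for illustration.

```python
import random


def split_holdout(examples, test_size=500, seed=0):
    """Shuffle deterministically and reserve test_size examples for evaluation."""
    if test_size > len(examples):
        raise ValueError("test_size exceeds the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # The first test_size items become the test set; the rest stay for training.
    return shuffled[test_size:], shuffled[:test_size]
```

Using a seeded `random.Random` instance keeps the split stable across runs without touching the global random state.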

	Note: Evaluation was conducted by the model developer, not independent medical professionals.

	#### Factors

	Evaluation considered multiple factors:
	- Medical accuracy: Correctness of SCI-related information
	- Appropriateness: Sensitivity and tone for the SCI community
	- Contextual relevance: Understanding of SCI-specific challenges
	- Safety: Avoidance of harmful or dangerous advice
	- Completeness: Comprehensive responses to complex questions

	#### Metrics

	- Medical accuracy score: Based on consistency with source medical literature (not independently validated)
	- Appropriateness rating: Developer assessment of tone and sensitivity (4.2/5.0 subjective rating)
	- Response relevance: SCI-specific context understanding (82% relevance score)
	- Safety compliance: No obviously harmful medical advice detected in test samples
	- Response quality: Perplexity improvements over base model for SCI domain

	### Results

	Quantitative Results:
	- 40% improvement in SCI domain perplexity over base model
	- Responses demonstrate consistency with source medical literature
	- 95% safety compliance (no obviously harmful medical advice detected)
	- 82% average relevance score for SCI-specific contexts
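
For readers unfamiliar with the perplexity figure, the sketch below shows how perplexity is computed from per-token negative log-likelihoods and how a relative improvement is derived. It assumes the 40% figure is a relative reduction against the base model; the numbers in the example are hypothetical, since the card does not report raw perplexities.

```python
import math


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_improvement(base_ppl, tuned_ppl):
    """Fractional reduction in perplexity relative to the base model."""
    return (base_ppl - tuned_ppl) / base_ppl


# Hypothetical values for illustration only.
print(relative_improvement(base_ppl=10.0, tuned_ppl=6.0))
```

Under this reading, a drop from a perplexity of 10.0 to 6.0 would be reported as a 40% improvement.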

	Qualitative Results:
	- Responses demonstrate clear understanding of SCI terminology and concepts
	- Appropriate tone and sensitivity for the disability community
	- Consistent inclusion of medical disclaimers
	- Good balance between being helpful and cautious about medical advice

	Limitations of Evaluation:
	- Evaluation conducted by model developer, not independent medical experts
	- No formal clinical validation or testing with SCI patients
	- Results based on consistency with training sources, not independent medical verification

	## Environmental Impact

	Training carbon emissions estimated using energy consumption data:

	- Hardware Type: RTX 4070 Super (12GB VRAM)
	- Hours used: ~20 hours total training time
	- Cloud Provider: Local training (personal hardware)
	- Compute Region: North America
	- Carbon Emitted: Approximately 2.1 kg CO2eq (estimated based on local energy grid)

	The use of QLoRA significantly reduced training time and energy consumption compared to full fine-tuning methods, making this a relatively efficient training approach.
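
Estimates of this kind usually multiply training hours by average power draw and by the carbon intensity of the local grid. The card reports only the hours and the final figure, so the power and grid values in the sketch below are hypothetical placeholders and not the inputs used for the 2.1 kg estimate.

```python
def estimate_emissions_kg(hours, avg_power_kw, grid_kg_per_kwh):
    """Energy (kWh) = hours * average power; emissions = energy * grid intensity."""
    return hours * avg_power_kw * grid_kg_per_kwh


REPORTED_HOURS = 20
REPORTED_KG = 2.1

# Average emission rate implied by the two reported figures.
print(f"{REPORTED_KG / REPORTED_HOURS:.3f} kg CO2eq per training hour")

# Hypothetical inputs (not from the card): 0.3 kW average draw, 0.4 kg CO2eq/kWh.
print(f"{estimate_emissions_kg(20, 0.3, 0.4):.1f} kg CO2eq")
```

The reported figures imply about 0.105 kg CO2eq per training hour; substituting measured power draw and a regional grid factor into `estimate_emissions_kg` would give a comparable estimate for another setup.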

	## Technical Specifications

	### Model Architecture and Objective

	- Base Architecture: Mistral 7B transformer model
	- Adaptation Method: QLoRA (Quantized Low-Rank Adaptation)
	- Objective: Causal language modeling with SCI domain specialization
	- Quantization: 4-bit precision for memory efficiency
	- LoRA Configuration: Rank-16 adapters on attention projection layers

	### Compute Infrastructure

	#### Hardware

	- GPU: NVIDIA RTX 4070 Super (12GB VRAM)
	- CPU: Modern multi-core processor
	- RAM: 32GB system memory
	- Storage: NVMe SSD for fast data loading

	#### Software

	- Framework: Transformers 4.36+, PEFT 0.16.0
	- Training: QLoRA with bitsandbytes quantization
	- Environment: Python 3.10+, PyTorch 2.0+, CUDA 12.1

	## Citation

	If you use this model in your research or applications, please cite:

	BibTeX:
	```bibtex
	@misc{sci_assistant_2025,
	title={SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support},
	author={basiphobe},
	year={2025},
	howpublished={Hugging Face Model Repository},
	url={https://huggingface.co/basiphobe/sci-assistant}
	}
	```

	APA:
	basiphobe. (2025). SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support. Hugging Face. https://huggingface.co/basiphobe/sci-assistant

	## Glossary

	SCI: Spinal Cord Injury - damage to the spinal cord that results in temporary or permanent changes in function

	QLoRA: Quantized Low-Rank Adaptation - an efficient fine-tuning method that reduces memory requirements

	Domain Pretraining: Training phase focused on learning domain-specific terminology and knowledge

	Instruction Tuning: Training phase focused on learning conversational patterns and response formatting

	Perplexity: A metric measuring how well a language model predicts text (lower is better)

	LoRA: Low-Rank Adaptation - parameter-efficient fine-tuning technique

	## Model Card Authors

	- Primary Author: basiphobe
	- Model Development: Individual research project for SCI community support
	- Data Sources: Curated from medical literature and educational materials created by healthcare professionals
	- Validation Status: Model has not undergone independent medical professional validation

	## Model Card Contact

	For questions, issues, or feedback regarding this model:
	- Hugging Face: https://huggingface.co/basiphobe/sci-assistant
	- Issues: Please report issues through the Hugging Face model repository
	- Medical Concerns: Always consult qualified healthcare professionals

	Important Note: This model is provided for educational and informational purposes. Always seek professional medical advice for health-related questions and decisions.

	### Framework versions

	- PEFT 0.16.0