How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="aab20abdullah/qwen_OSINT",
	filename="",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
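
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. The sketch below shows one way to pull the reply text out of that structure. `extract_reply` is a hypothetical helper, and `sample_completion` is a hand-written stand-in that mirrors the response shape, not real model output.

```python
def extract_reply(completion: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["message"]["content"].strip()


# Illustrative stand-in for the dict returned by llm.create_chat_completion(...)
sample_completion = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Paris. "},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_completion))
```

In practice you would pass the return value of `llm.create_chat_completion(...)` to the helper instead of the sample dictionary.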

Qwen-OSINT

Overview

Qwen-OSINT is a large language model fine-tuned from Qwen2.5-7B for Open Source Intelligence (OSINT) work. It is intended to help security researchers, analysts, and investigators gather, analyze, and synthesize information from publicly available sources.

What is OSINT?

Open Source Intelligence (OSINT) refers to the practice of collecting and analyzing information from publicly available sources to support decision-making processes. This includes data from:

  • Social media platforms
  • News articles and publications
  • Search engines and databases
  • Professional networks
  • Public records and government databases

Features

| Feature | Description |
| --- | --- |
| Advanced Search Analysis | Efficiently analyzes search queries and identifies relevant intelligence sources |
| Data Synthesis | Consolidates information from multiple sources into coherent summaries |
| Security Analysis | Supports threat analysis and vulnerability assessment tasks |
| Report Generation | Generates structured intelligence reports in various formats |
| Multi-language Support | Processes and analyzes content in multiple languages |
| Ethical Compliance | Built with safety guidelines to ensure responsible use |

Model Details

| Attribute | Value |
| --- | --- |
| Base Model | Qwen2.5-7B-Instruct |
| Framework | Transformers (Hugging Face) |
| Training Method | Supervised Fine-tuning (SFT) |
| Vocabulary Size | 151,669 tokens |
| Architecture | Transformer-based Decoder |
| Precision | FP16 / INT8 compatible |

Training Configuration

- Learning Rate: 2e-5
- Batch Size: 8
- Epochs: 3
- Warmup Steps: 100
- Max Sequence Length: 8192
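
To make these hyperparameters more concrete, here is a sketch of how the learning rate might evolve during training. It assumes linear warmup followed by linear decay to zero; the card states only the peak rate and the warmup steps, so the scheduler shape and the total step count (`TOTAL_STEPS`) are assumptions for illustration.

```python
PEAK_LR = 2e-5       # learning rate from the card
WARMUP_STEPS = 100   # warmup steps from the card
TOTAL_STEPS = 1000   # hypothetical; depends on dataset size, batch size, and epochs


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Print the rate at a few points: start, mid-warmup, peak, mid-decay, end
for step in (0, 50, 100, 550, 1000):
    print(step, f"{learning_rate(step):.2e}")
```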

Installation

Prerequisites

Python >= 3.8
PyTorch >= 2.0
transformers >= 4.35.0
accelerate >= 0.20.0
bitsandbytes >= 0.40.0 (for quantization)
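
One way to check an environment against these minimums is sketched below, using only the standard library. `parse_version` and `check_requirements` are hypothetical helpers, and the version comparison is deliberately simplified (it looks only at the leading numeric components), so treat the result as a quick sanity check rather than a full dependency resolver.

```python
import re
from importlib import metadata

# Minimum versions listed in the prerequisites above
REQUIREMENTS = {
    "torch": "2.0",
    "transformers": "4.35.0",
    "accelerate": "0.20.0",
    "bitsandbytes": "0.40.0",
}


def parse_version(version: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements(requirements: dict) -> dict:
    """Map each package name to a short status string."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[package] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


for package, status in check_requirements(REQUIREMENTS).items():
    print(f"{package}: {status}")
```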

Install Dependencies

pip install transformers torch accelerate bitsandbytes

Download the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aab20abdullah/qwen_OSINT"

# Download tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Download model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

Quick Start

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aab20abdullah/qwen_OSINT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# device_map="auto" places the weights on the available GPU(s), or on CPU if none is found
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

def generate_intelligence(prompt, max_new_tokens=512):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Move inputs to wherever the model was loaded instead of assuming "cuda"
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # required for temperature and top_p to take effect
            temperature=0.7,
            top_p=0.9
        )

    # Decode only the newly generated tokens, skipping the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example
result = generate_intelligence("Analyze the key elements of a threat intelligence report.")
print(result)
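
The `temperature` and `top_p` arguments shape how each next token is sampled. As a rough illustration, the plain-Python sketch below applies temperature scaling and nucleus (top-p) filtering to a toy list of scores. It is a conceptual sketch, not the Transformers implementation; `top_p_filter` and `toy_logits` are made up for this example.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    # Lower temperature sharpens the distribution; higher flattens it
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(top_p_filter(toy_logits))
```

With these toy scores, the two least likely tokens fall outside the nucleus and can never be sampled, which is how `top_p` trims the unlikely tail.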

Quantized Version (Lower Memory Usage)

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "aab20abdullah/qwen_OSINT"

# 8-bit loading requires the bitsandbytes package and a CUDA-capable GPU
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)
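
To get a feel for the memory savings, the back-of-the-envelope sketch below estimates the memory needed for the weights alone at different precisions. It assumes roughly 7 billion parameters (per the 7B base model named in this card) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: about 7 billion parameters

for label, bits in (("FP16", 16), ("INT8", 8)):
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```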

Usage Examples

Example 1: Search Query Analysis

prompt = """Analyze the following search query and suggest improvements for OSINT research:
Query: "site:linkedin.com cybersecurity analyst" """

result = generate_intelligence(prompt)
print(result)
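
When testing several query variants, it can help to assemble the search string programmatically before placing it in the prompt. The sketch below is one way to do that; `build_query` is a hypothetical helper, and the excluded term is an arbitrary example. It relies only on the common `site:`, quoted-phrase, and `-term` search operators.

```python
def build_query(site=None, phrases=(), exclude=()):
    """Assemble a search string from a site filter, exact phrases, and excluded terms."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    parts.extend(f'"{phrase}"' for phrase in phrases)
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


query = build_query(
    site="linkedin.com",
    phrases=["cybersecurity analyst"],
    exclude=["recruiter"],  # arbitrary example of a term to filter out
)

prompt = (
    "Analyze the following search query and suggest improvements for OSINT research:\n"
    f"Query: {query}"
)
print(prompt)
```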

Example 2: Data Source Evaluation

prompt = """Evaluate the reliability and credibility of the following OSINT sources:
1. Government statistical databases
2. Academic research papers
3. Social media platforms
4. Open-source code repositories"""

result = generate_intelligence(prompt)
print(result)

Example 3: Threat Analysis Framework

prompt = """Using the MITRE ATT&CK framework, analyze potential threat vectors for:
- Phishing attacks
- Network intrusion
- Data exfiltration

Provide recommendations for detection and prevention.""" 

result = generate_intelligence(prompt)
print(result)

Ethical Guidelines

IMPORTANT: This model is designed for legitimate OSINT research only.

Acceptable Use Cases

  • Security research and vulnerability assessment
  • Threat intelligence analysis
  • Organizational security posture evaluation
  • Academic research in cybersecurity
  • Corporate due diligence

Prohibited Use Cases

  • Unauthorized surveillance
  • Invasion of privacy
  • Harassment or stalking
  • Illegal activities
  • Content generation for malicious purposes

Responsible Use Principles

  1. Transparency: Clearly identify yourself when conducting OSINT operations
  2. Legality: Ensure compliance with applicable laws and regulations
  3. Proportionality: Collect only information necessary for your objectives
  4. Security: Protect collected data appropriately
  5. Accountability: Maintain records of your OSINT activities
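
The accountability principle can be supported with a simple audit trail. The sketch below builds one JSON record per query; the field names, the `make_audit_record` and `format_audit_line` helpers, and the sample values are all assumptions chosen for illustration, not a format this model or any standard prescribes.

```python
import json
from datetime import datetime, timezone


def make_audit_record(analyst: str, purpose: str, prompt: str) -> dict:
    """Build a record describing who ran which query, when, and why."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "analyst": analyst,
        "purpose": purpose,
        "prompt": prompt,
    }


def format_audit_line(record: dict) -> str:
    """Serialize a record as one line of JSON, suitable for a JSON Lines log."""
    return json.dumps(record, ensure_ascii=False)


record = make_audit_record(
    analyst="analyst-01",                       # hypothetical identifier
    purpose="Authorized security assessment",   # hypothetical justification
    prompt="Analyze the key elements of a threat intelligence report.",
)
print(format_audit_line(record))
```

Appending each line to a file with restricted access keeps the log append-only in practice and also serves the Security principle above.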

Limitations

| Limitation | Description |
| --- | --- |
| Computational Resources | Requires GPU with sufficient VRAM for optimal performance |
| Accuracy | May generate plausible but incorrect information; always verify |
| Language Coverage | Best performance in English; other languages may vary |
| Knowledge Cutoff | Training data has a knowledge cutoff date |
| Sensitive Data | Not designed to handle highly classified or sensitive information |

License

This model is released under the Apache 2.0 License.

Copyright 2024 aab20abdullah

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Base Model License

The base model, Qwen2.5-7B-Instruct, is released under the Apache 2.0 License.


Acknowledgments

  • Alibaba Cloud - For developing the Qwen2.5 model architecture
  • Hugging Face - For providing the model hosting infrastructure
  • Open Source Community - For continuous contributions to AI safety and ethics

Citation

If you use this model in your research or project, please cite:

@misc{qwen_osint,
  author = {aab20abdullah},
  title = {Qwen-OSINT: A Specialized Model for Open Source Intelligence},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/aab20abdullah/qwen_OSINT}
}

If you find this model useful, please consider giving it a star!

Made with ❤️ for the OSINT community

GGUF

  • Model size: 4B params
  • Architecture: qwen3
  • Quantizations: 4-bit, 5-bit, 8-bit


Model tree for aab20abdullah/qwen_OSINT

  • Finetuned: Qwen/Qwen3-4B
  • Quantized (219): this model