FrameByFrame/programming-language-identification-100plus


vijaym committed 4 days ago

Commit dd5f010 · verified · 1 Parent(s): 23bcac5

Add inference notebook
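
The notebook's intro cell says input is truncated to the first 512 characters (the training-time `head` strategy). As a rough illustration only, that step could be sketched as below. The helper names `truncate_head` and `predict_language` are hypothetical and do not appear in the notebook; `predict_language` assumes `tokenizer` and `model` were loaded as in the setup cell of this diff.

```python
MAX_CHARS = 512  # taken from the notebook intro: "first 512 characters"


def truncate_head(source: str, max_chars: int = MAX_CHARS) -> str:
    """Keep only the leading `max_chars` characters (hypothetical helper)."""
    return source[:max_chars]


def predict_language(source, tokenizer, model, device="cpu"):
    """Return (label, probability) for one snippet (hypothetical helper).

    Assumes `tokenizer` and `model` come from AutoTokenizer /
    AutoModelForSequenceClassification as in the notebook's setup cell.
    """
    import torch  # imported lazily so truncate_head works without torch

    inputs = tokenizer(
        truncate_head(source), return_tensors="pt", truncation=True
    ).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Upcast before softmax, since the setup cell loads the weights in bf16.
    probs = logits.float().softmax(dim=-1)[0]
    idx = int(probs.argmax())
    return model.config.id2label[idx], float(probs[idx])
```

With the setup cell already run, a call might look like `predict_language("fn main() {}", tokenizer, model, DEVICE)`. The returned label depends on the checkpoint's `id2label` mapping.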

Files changed (1)

inference_examples.ipynb +2 -10

inference_examples.ipynb CHANGED

@@ -4,15 +4,7 @@
    "cell_type": "markdown",
    "id": "intro",
    "metadata": {},
-   "source": [
-    "# programming-language-identification-100plus\n",
-    "\n",
-    "Runnable examples for the ModernBERT programming-language identifier.\n",
-    "Covers 107 languages. Input is truncated to the first 512 characters\n",
-    "(matches the training-time `head` strategy).\n",
-    "\n",
-    "Point `MODEL_ID` at the local checkpoint directory or the HF repo id."
-   ]
   },
   {
    "cell_type": "code",
@@ -20,7 +12,7 @@
    "id": "setup",
    "metadata": {},
    "outputs": [],
-   "source": "import torch\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\n\nMODEL_ID = \"/home/vijay/llm_models/guardrail_code_models/programming-language-identification-100plus\"\nDEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_ID)\nmodel = AutoModelForSequenceClassification.from_pretrained(\n    MODEL_ID,\n    attn_implementation=\"eager\",\n    torch_dtype=torch.bfloat16,  # weights are published in bf16\n).to(DEVICE).eval()\n\nprint(f\"device={DEVICE}  num_labels={model.config.num_labels}  dtype={model.dtype}\")\n"
   },
   {
    "cell_type": "markdown",

    "cell_type": "markdown",
    "id": "intro",
    "metadata": {},
+   "source": "# programming-language-identification-100plus\n\nRunnable examples for the ModernBERT programming-language identifier.\nCovers 107 languages. Input is truncated to the first 512 characters\n(matches the training-time `head` strategy).\n"
   },
   {
    "cell_type": "code",
    "id": "setup",
    "metadata": {},
    "outputs": [],
+   "source": "import torch\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\n\nMODEL_ID = \"FrameByFrame/programming-language-identification-100plus\"\nDEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_ID)\nmodel = AutoModelForSequenceClassification.from_pretrained(\n    MODEL_ID,\n    attn_implementation=\"eager\",\n    torch_dtype=torch.bfloat16,\n).to(DEVICE).eval()\n\nprint(f\"device={DEVICE}  num_labels={model.config.num_labels}  dtype={model.dtype}\")\n"
   },
   {
    "cell_type": "markdown",