Loitering Detection

Property              Value
Category              Object Detection + Tracking + Zone Analytics
Source Framework      PyTorch (Ultralytics)
Supported Precisions  FP32, FP16, INT8 (mixed-precision)
Inference Engine      OpenVINO
Hardware              CPU, GPU, NPU
Detected Class        person (COCO class 0)

Overview

Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest (ROI) for longer than a dwell-time threshold. It is built on YOLO26 for person detection, paired with a multi-object tracker that assigns persistent IDs across frames. A rectangular zone defines the area to monitor; for each tracked person detected inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.

Typical Metro deployments include:

  • Restricted-Area Monitoring -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
  • Platform Edge Safety -- detect prolonged presence inside a yellow-line buffer.
  • ATM and Ticketing Security -- identify suspicious dwell at unattended kiosks.
  • Crowd-Free Zone Enforcement -- monitor emergency exits and corridors that must remain clear.

Available variants: yolo26n, yolo26s, yolo26m, yolo26l, yolo26x. Smaller variants (yolo26n, yolo26s) are recommended for high-FPS edge deployment.


Prerequisites

Create and activate a Python virtual environment before running the scripts:

python3 -m venv .venv --system-site-packages
source .venv/bin/activate

Note: The --system-site-packages flag is required so the virtual environment can access the system-installed OpenVINO and DLStreamer Python packages.


Getting Started

Download and Quantize Model

Run the provided script to download, export to OpenVINO IR, and optionally quantize:

chmod +x export_and_quantize.sh
./export_and_quantize.sh

This exports the default yolo26n model in FP16 precision.

Optional: Select a Different Variant or Precision

./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16

Replace yolo26n with any variant (yolo26s, yolo26m, yolo26l, yolo26x). The second argument selects the precision (FP32, FP16, INT8); the default is FP16.

The script performs the following steps:

  1. Installs dependencies (openvino, ultralytics; adds nncf for INT8).
  2. Downloads the sample surveillance video (VIRAT_S_000101.mp4) from the Intel Metro AI Suite project into the current directory.
  3. Downloads the PyTorch weights and exports to OpenVINO IR.
  4. (INT8 only) Quantizes the model using NNCF post-training quantization.

Output files:

  • yolo26n_openvino_model/ -- FP32 or FP16 OpenVINO IR model directory.
  • yolo26n_loitering_int8.xml / yolo26n_loitering_int8.bin -- INT8 quantized model (only when INT8 is selected).
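
The naming pattern of the output files above can be captured in a small helper, for example to check that an export produced what a pipeline expects. This is a minimal sketch and not part of export_and_quantize.sh: the function name expected_outputs is hypothetical, and it lists only the paths named above (whether an INT8 run also keeps the intermediate IR directory is not stated here).

```python
VARIANTS = ("yolo26n", "yolo26s", "yolo26m", "yolo26l", "yolo26x")
PRECISIONS = ("FP32", "FP16", "INT8")


def expected_outputs(variant="yolo26n", precision="FP16"):
    """Return the output paths listed above for a variant/precision pair.

    Hypothetical helper: the defaults mirror the script defaults described
    in this README (yolo26n, FP16).
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    precision = precision.upper()
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")
    if precision == "INT8":
        # INT8 models are written as a separate .xml/.bin pair.
        stem = f"{variant}_loitering_int8"
        return [f"{stem}.xml", f"{stem}.bin"]
    # FP32 and FP16 exports share the same IR directory name.
    return [f"{variant}_openvino_model/"]


print(expected_outputs())
print(expected_outputs("yolo26s", "INT8"))
```

Passing the returned paths to os.path.exists is one way to fail early before launching the pipeline.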

Precision / Device Compatibility

Precision  CPU  GPU  NPU
FP32       Yes  Yes  No
FP16       Yes  Yes  Yes
INT8       Yes  Yes  Yes

Note: The INT8 calibration uses frames from the bundled sample video. For production accuracy, replace it with a representative set of frames from the target deployment site.
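
The precision/device compatibility table above can be encoded as a lookup so that an unsupported combination (FP32 on NPU) is rejected before a pipeline is built. This is an illustrative sketch; the names SUPPORTED and is_supported are hypothetical and do not come from the sample code.

```python
# Compatibility matrix transcribed from the table above.
SUPPORTED = {
    "FP32": {"CPU", "GPU"},
    "FP16": {"CPU", "GPU", "NPU"},
    "INT8": {"CPU", "GPU", "NPU"},
}


def is_supported(precision, device):
    """Return True if the table marks this precision/device pair as 'Yes'."""
    return device.upper() in SUPPORTED.get(precision.upper(), set())


for precision, device in [("FP16", "NPU"), ("FP32", "NPU")]:
    status = "ok" if is_supported(precision, device) else "not supported"
    print(f"{precision} on {device}: {status}")
```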

Defining the Region of Interest

The zone is a rectangular ROI expressed as x_min,y_min,x_max,y_max in the original input frame coordinates (not the 640x640 model input). DLStreamer's gvaattachroi element attaches the ROI to every buffer, and gvadetect inference-region=1 (roi-list) restricts inference to that ROI only -- no Python polygon math required. A typical surveillance-zone configuration on a 1280x720 source might be:

ROI = "0,200,300,400"         # passed to gvaattachroi as roi=x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0       # dwell threshold, in seconds (demo value)

Note: The sample uses a 5-second threshold so that loitering events are triggered quickly on the short demo video. For production deployments, increase this to 10--30 seconds depending on the site's operational requirements.
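
Because the ROI is given in source-frame coordinates, it can help to validate the string against the frame size before handing it to the pipeline. The following is a minimal sketch; parse_roi is a hypothetical helper, and the 1280x720 frame size is the example resolution mentioned above, not a value read from the video.

```python
def parse_roi(text, frame_width, frame_height):
    """Parse 'x_min,y_min,x_max,y_max' and check it lies inside the frame."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    # int() raises ValueError on non-numeric input.
    x_min, y_min, x_max, y_max = (int(p) for p in parts)
    if not (0 <= x_min < x_max <= frame_width):
        raise ValueError(f"x range {x_min}..{x_max} outside 0..{frame_width}")
    if not (0 <= y_min < y_max <= frame_height):
        raise ValueError(f"y range {y_min}..{y_max} outside 0..{frame_height}")
    return x_min, y_min, x_max, y_max


roi = parse_roi("0,200,300,400", frame_width=1280, frame_height=720)
print(roi)
```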

Each person's position is taken at the bottom-center of the bounding box (the foot anchor), which most closely approximates the person's ground position. This is the point reported in each loitering event.
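
The foot anchor is simple to compute from a bounding box given as (x, y, w, h). The sample itself relies on gvaattachroi and inference-region=1 rather than a Python containment test, so the sketch below only illustrates what an explicit check could look like; foot_anchor and inside_roi are hypothetical names.

```python
def foot_anchor(x, y, w, h):
    """Bottom-center of a box whose top-left corner is (x, y)."""
    return x + w / 2.0, y + h


def inside_roi(point, roi):
    """True if point lies within roi = (x_min, y_min, x_max, y_max), edges included."""
    px, py = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= px <= x_max and y_min <= py <= y_max


roi = (0, 200, 300, 400)
for box in [(100, 250, 60, 120), (100, 300, 60, 120)]:
    anchor = foot_anchor(*box)
    print(box, anchor, inside_roi(anchor, roi))
```

In the second box the body overlaps the ROI but the feet fall below it, which is the case a foot-anchor test is meant to separate.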

DLStreamer Sample

Set up the environment:

source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}

Run loitering detection:

from collections import defaultdict
import ctypes
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
from gstgva import Tensor, VideoFrame

Gst.init(None)

libgst = ctypes.CDLL("libgstreamer-1.0.so.0")
libgst.gst_structure_new_empty.argtypes = [ctypes.c_char_p]
libgst.gst_structure_new_empty.restype = ctypes.c_void_p

MODEL = "yolo26n_openvino_model/yolo26n.xml"
VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400"
LOITERING_SECONDS = 5.0

pipeline = Gst.parse_launch(
    f"filesrc location={VIDEO} ! decodebin3 ! videoconvert ! "
    f"gvaattachroi roi={ROI} ! "
    f"gvadetect inference-region=1 model={MODEL} device=GPU threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvafpscounter ! identity name=probe ! gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! mp4mux ! filesink location=output_dlstreamer.mp4"
)

dwell = defaultdict(float)
last_seen = {}
flagged = set()

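def on_buffer(pad, info):
    # Pad probe: runs once per buffer, after tracking and before gvawatermark renders.
    buf = info.get_buffer()
    frame = VideoFrame(buf, caps=pad.get_current_caps())
    # Presentation timestamp in seconds; falls back to 0.0 when the buffer has no PTS.
    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0

    for region in frame.regions():
        if region.label() != "person":
            continue
        oid = region.object_id()
        if oid <= 0:  # skip regions without a tracker-assigned ID
            continue

        # Add the time elapsed since this ID was last seen (0 on first sighting).
        dwell[oid] += now - last_seen.get(oid, now)
        last_seen[oid] = now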

        # Show dwell time on the bounding box (rendered by gvawatermark)
        t = Tensor(libgst.gst_structure_new_empty(b"dwell"))
        t.set_label(f" {dwell[oid]:.1f}s")
        region.add_tensor(t)

        if dwell[oid] >= LOITERING_SECONDS and oid not in flagged:
            flagged.add(oid)
            rect = region.rect()
            print(f"LOITERING id={oid} dwell={dwell[oid]:.1f}s pos=({int(rect.x + rect.w/2)},{int(rect.y + rect.h)})")

    return Gst.PadProbeReturn.OK

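# Attach the probe, run until end-of-stream or error, then release the pipeline.
pipeline.get_by_name("probe").get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, on_buffer)
pipeline.set_state(Gst.State.PLAYING)
msg = pipeline.get_bus().timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
if msg is not None and msg.type == Gst.MessageType.ERROR:
    err, debug = msg.parse_error()
    print(f"Pipeline error: {err.message} ({debug})")
pipeline.set_state(Gst.State.NULL)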

Expected output:

LOITERING id=26 dwell=5.0s pos=(147,341)
LOITERING id=27 dwell=5.0s pos=(122,337)
...

The annotated video is saved to output_dlstreamer.mp4.
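
The probe in the sample adds the full time since an ID was last seen, so any interval in which a track was briefly missing also counts toward dwell. One possible variation, shown here as a GStreamer-free sketch, ignores gaps longer than a cutoff. The DwellTracker class and its max_gap_s parameter are assumptions made for illustration and are not part of the sample.

```python
class DwellTracker:
    """Accumulate per-ID dwell time, ignoring gaps longer than max_gap_s."""

    def __init__(self, threshold_s=5.0, max_gap_s=1.0):
        self.threshold_s = threshold_s
        self.max_gap_s = max_gap_s  # hypothetical cutoff, tune per deployment
        self.dwell = {}
        self.last_seen = {}
        self.flagged = set()

    def update(self, oid, now):
        """Record a sighting at time `now` (seconds).

        Returns True exactly once per ID, on the update where its dwell
        first reaches the threshold.
        """
        prev = self.last_seen.get(oid)
        self.dwell.setdefault(oid, 0.0)
        if prev is not None:
            gap = now - prev
            # Count only short, forward-in-time gaps as continuous presence.
            if 0.0 <= gap <= self.max_gap_s:
                self.dwell[oid] += gap
        self.last_seen[oid] = now
        if self.dwell[oid] >= self.threshold_s and oid not in self.flagged:
            self.flagged.add(oid)
            return True
        return False


tracker = DwellTracker(threshold_s=5.0, max_gap_s=1.0)
for t in range(7):  # one sighting per second for ID 26
    if tracker.update(26, float(t)):
        print(f"LOITERING id=26 dwell={tracker.dwell[26]:.1f}s")
```

Inside the probe, tracker.update(oid, now) would take the place of the dwell, last_seen, and flagged bookkeeping.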

Expected Output

[Figure: DLStreamer expected output]

Device targets:

  • device=GPU -- default in the sample code.
  • device=CPU -- change device=GPU to device=CPU.
  • device=NPU -- change device=GPU to device=NPU; use batch-size=1 and nireq=4 for best NPU utilization.
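
Switching devices only changes the gvadetect element of the pipeline string. The sketch below builds that element for each target and appends the NPU settings listed above; gvadetect_element is a hypothetical helper, and the remaining properties are copied from the sample pipeline.

```python
def gvadetect_element(model, device="GPU", threshold=0.5):
    """Build the gvadetect portion of the pipeline string for a device."""
    device = device.upper()
    if device not in ("CPU", "GPU", "NPU"):
        raise ValueError(f"unsupported device: {device!r}")
    props = [
        "inference-region=1",  # roi-list: infer only inside attached ROIs
        f"model={model}",
        f"device={device}",
        f"threshold={threshold}",
    ]
    if device == "NPU":
        # NPU settings suggested in the device list above.
        props += ["batch-size=1", "nireq=4"]
    return "gvadetect " + " ".join(props)


MODEL = "yolo26n_openvino_model/yolo26n.xml"
for dev in ("GPU", "CPU", "NPU"):
    print(gvadetect_element(MODEL, dev))
```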

License

Copyright (C) Intel Corporation. All rights reserved. Licensed under the MIT License. See LICENSE for details.
