Selective Frame Analysis for Efficient Object Tracking

Balancing Speed and Accuracy in Real-Time Vision Systems

12/1/2024 · 2 min read

Introduction

In real-time applications like autonomous driving and video surveillance, precise multi-object tracking (MOT) is essential, but it is also computationally demanding. Traditional MOT pipelines run detection on every video frame, which drives up computational cost and latency, especially in crowded scenes or on resource-constrained hardware.

My research explores a simple yet powerful idea:
What if we didn’t need to process every frame?

The Concept: Selective Frame Skipping

Rather than analyzing every video frame, our proposed method introduces frame skipping: the detector runs only on selected frames, and motion prediction bridges the gaps between them. The goal is to:

  • Reduce computational overhead, and

  • Preserve tracking accuracy.

We combine:

  • YOLOv8 for high-performance object detection

  • ByteTrack for robust multi-object tracking

  • Kalman Filter for motion prediction across skipped frames

This setup enables the system to track objects reliably, even during skipped intervals, by predicting their movement and then correcting their position when the next detection occurs.
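The predict-then-correct loop described above can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a single object moving along one axis, a constant-velocity Kalman filter written out by hand, and a hypothetical `skip` parameter meaning "number of frames skipped after each detection". Real trackers use a higher-dimensional bounding-box state, and the noise values `q` and `r` here are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter (illustrative assumption)."""
    pos: float
    vel: float = 0.0
    # Covariance entries of the 2x2 matrix P = [[p00, p01], [p10, p11]].
    p00: float = 10.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 10.0
    q: float = 0.01  # process noise added to the diagonal (assumed value)
    r: float = 1.0   # measurement noise variance (assumed value)

    def predict(self) -> None:
        """Advance the state by one frame: P <- F P F^T + Q, F = [[1, 1], [0, 1]]."""
        self.pos += self.vel
        p00 = self.p00 + self.p01 + self.p10 + self.p11 + self.q
        p01 = self.p01 + self.p11
        p10 = self.p10 + self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, z: float) -> None:
        """Correct the state with a position measurement z (H = [1, 0])."""
        residual = z - self.pos
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p10 / s
        self.pos += k0 * residual
        self.vel += k1 * residual
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


def track_with_skipping(measurements: list[float], skip: int) -> list[float]:
    """Predict on every frame, but only correct on frames where the detector runs."""
    kf = ConstantVelocityKF(pos=measurements[0])
    estimates = []
    for frame_idx, z in enumerate(measurements):
        if frame_idx > 0:
            kf.predict()
        if frame_idx % (skip + 1) == 0:  # detector runs on this frame
            kf.update(z)
        estimates.append(kf.pos)
    return estimates


# An object moving 2 units per frame, with the detector run on every third frame.
truth = [2.0 * i for i in range(30)]
estimates = track_with_skipping(truth, skip=2)
```

On skipped frames the estimate comes from `predict()` alone, so its error grows until the next detection pulls it back. This is why longer skips are riskier when objects change direction or are occluded.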

Experimental Design

We conducted extensive experiments on four benchmark MOT datasets:

  • MOT16

  • MOT17

  • MOT20

  • KITTI

The experiments tested varying levels of frame skipping (e.g., skipping every 1, 2, or 3 frames) to assess the trade-offs between tracking accuracy and speed.
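To make the skip levels concrete, here is a small sketch of one possible schedule. It assumes that a skip level of `k` means the detector runs on one frame and then skips the next `k` frames. The exact convention used in the experiments may differ, and the function names are hypothetical.

```python
def detection_frames(n_frames: int, skip: int) -> list[int]:
    """Indices of frames on which the detector runs, assuming `skip` frames
    are dropped after every detected frame."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return list(range(0, n_frames, skip + 1))


def detector_load(n_frames: int, skip: int) -> float:
    """Fraction of frames that still require a detector call."""
    return len(detection_frames(n_frames, skip)) / n_frames


schedule = detection_frames(10, skip=2)  # frames 0, 3, 6, 9
load = detector_load(12, skip=1)         # half of the frames
```

Under this convention the detector handles roughly 1/(k+1) of the frames. If detection dominated the runtime, the speedup would approach k+1, but tracking and prediction still run on every frame, so measured gains are typically lower.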

Results

Our findings revealed a significant boost in tracking speed with only a modest drop in accuracy:

  • MOT16: speed doubled from 11 FPS to 22 FPS (2.0×), with only a ~5% drop in HOTA.

  • MOT17: speed rose from 15 FPS to 27 FPS (1.8×), with a slight increase in identity switches.

  • KITTI: in these relatively sparse scenes, speed rose from 25 FPS to 45 FPS (1.8×) with minimal impact on accuracy, which makes the approach well suited to automotive applications.

  • MOT20: in these extremely crowded urban scenes, speed rose from 22 FPS to 38 FPS (about 1.7×), at the cost of a moderate drop in tracking accuracy, largely due to higher occlusion rates.
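The speedup factors follow directly from the reported frame rates. The short sketch below recomputes them from the FPS pairs quoted above; the dictionary and function names are illustrative and not taken from the paper.

```python
# (baseline FPS, FPS with frame skipping), as reported in this post.
REPORTED_FPS = {
    "MOT16": (11, 22),
    "MOT17": (15, 27),
    "KITTI": (25, 45),
    "MOT20": (22, 38),
}


def speedup(baseline_fps: float, skipped_fps: float) -> float:
    """Ratio of the frame rate with skipping to the baseline frame rate."""
    return skipped_fps / baseline_fps


def summarize(fps_table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Speedup factor per dataset, rounded to two decimals."""
    return {name: round(speedup(base, fast), 2) for name, (base, fast) in fps_table.items()}


speedups = summarize(REPORTED_FPS)
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x ({(factor - 1) * 100:.0f}% faster)")
```

Expressing all four results as the same ratio makes the datasets directly comparable. The gain is largest on MOT16 and smallest on the crowded MOT20 scenes.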

Key Takeaway: Frame skipping is highly effective in sparse scenes (e.g., highways), while more caution is needed in dense urban environments.

Why It Matters

This research highlights the real-world potential of efficient MOT systems that:

  • Run faster on resource-limited hardware

  • Maintain acceptable accuracy in safety-critical tasks

  • Adapt to different scene complexities

It’s especially promising for applications like:

  • Autonomous vehicles

  • Mobile robotics

  • Smart surveillance systems

Conclusion

Our selective frame analysis framework demonstrates that real-time multi-object tracking doesn’t require analyzing every frame. By running detection on selected frames and predicting motion in between, we achieve speedups of roughly 1.7× to 2× with a modest loss of accuracy, which is smallest in sparse scenes.

This work represents a step toward developing lightweight, deployable tracking systems that operate effectively in the real world, where speed, efficiency, and robustness are all crucial.

📄 Want to learn more?
👉 Read the full paper (SpringerLink)
📧 Contact me if you’re interested in collaborating or deploying this system.