This project tracks a drone in real time using a learning-free approach. The tracker receives a bounding box in the first frame and must predict a box for every subsequent frame using only visual information from previous frames.
General-purpose pre-trained models are allowed, but no drone-specific training or smoothing methods should be used. Any programming language may be used.
The script takes as input:
- a video file
- a bounding box for the first frame
and outputs a new video with predicted bounding boxes for all frames.
Performance goal: roughly 100 ms per frame on a laptop, and above 50 FPS on a Jetson Thor.
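The per-frame budget can be verified with a small timing harness. This is a sketch; the lambda workload below is just a placeholder standing in for one tracker update:

```python
import time

def ms_per_call(fn, n=100):
    """Average wall-clock milliseconds per call of fn over n runs."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) * 1000.0 / n

if __name__ == "__main__":
    # Placeholder workload; swap in a single tracker.update() call.
    avg = ms_per_call(lambda: sum(range(10_000)))
    print(f"{avg:.3f} ms/frame")
```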
Example usage:
uv run track.py --video test_track_dron1.mp4 --bbox 120,80,60,40

The sample video: https://drive.google.com/file/d/1HJCNYOLodnfECc_sIJw9kIgjwGkzGrRi/view?usp=sharing
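One learning-free baseline that fits the interface above is OpenCV's CSRT correlation-filter tracker. This is a sketch of one possible approach, not a prescribed solution: it assumes `opencv-contrib-python` is installed, and the `run`/`parse_bbox` names and the `tracked.mp4` output path are illustrative choices; only the `--video`/`--bbox` flags come from the example usage.

```python
import argparse

def parse_bbox(s):
    """Parse 'x,y,w,h' into a tuple of ints."""
    x, y, w, h = (int(v) for v in s.split(","))
    return (x, y, w, h)

def run(video_path, bbox, out_path="tracked.mp4"):
    import cv2  # CSRT lives in opencv-contrib-python

    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError(f"cannot read first frame of {video_path}")

    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    h, w = frame.shape[:2]
    writer = cv2.VideoWriter(
        out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h)
    )

    # Learning-free tracker: no drone-specific training involved.
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, bbox)

    # First frame keeps the user-supplied box.
    x, y, bw, bh = bbox
    cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
    writer.write(frame)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, box = tracker.update(frame)
        if found:
            x, y, bw, bh = (int(v) for v in box)
            cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        writer.write(frame)

    cap.release()
    writer.release()

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--video", required=True)
    ap.add_argument("--bbox", required=True, help="x,y,w,h of the first-frame box")
    args = ap.parse_args()
    run(args.video, parse_bbox(args.bbox))
```

CSRT is typically the most accurate of OpenCV's classical trackers but also the slowest; KCF or MOSSE (same API) trade accuracy for speed if the 100 ms budget is tight.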
- It's totally fine to double-check with Roman after doing some research and choosing an approach (e.g., "I'm leaning toward X because Y - does that sound reasonable?").
- If anything is unclear (constraints, allowed tools/models, smoothing assumptions, metrics, etc.), please ask.