CVAT Setup and Usage Guide

This guide provides comprehensive instructions for setting up and using CVAT (Computer Vision Annotation Tool) for video annotation tasks in the KABR tools pipeline.

Overview

We set up a self-hosted instance of CVAT on a remote server. Self-hosting made our data easier to manage and let every member of our team access the server remotely to complete their annotations.

We set up our instance of CVAT on a server with an AMD EPYC 7513 32-Core Processor running Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-190-generic x86_64).

Alternative Installation Options

CVAT can also be used online through CVAT Cloud, or installed locally on Windows, macOS, and Linux computers. See the CVAT Getting Started documentation for instructions.

Setting up CVAT

See the CVAT Installation Guide for instructions on setting up CVAT with Docker on a computer running Ubuntu.

See CVAT's Manuals for further information.

Detailed Installation Instructions

Step 1: Retrieve the CVAT source code

At the time of writing, the latest release of CVAT is v2.17.0. CVAT is updated frequently with new features and bug fixes. You can retrieve this (or any other) specific version with wget by setting CVAT_VERSION to its release tag:

CVAT_VERSION="v2.17.0" && CVAT_V="${CVAT_VERSION#v}"
wget "https://github.com/cvat-ai/cvat/archive/refs/tags/${CVAT_VERSION}.zip" \
  && unzip "${CVAT_VERSION}.zip" && mv "cvat-${CVAT_V}" cvat \
  && rm "${CVAT_VERSION}.zip" && cd cvat
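The chained commands above can also be wrapped in small shell functions. This is a sketch, not part of CVAT: the names `cvat_archive_dir` and `fetch_cvat` are hypothetical, and the code assumes, as the original commands do, that GitHub's archive for a tag such as `v2.17.0` unpacks into a directory named `cvat-2.17.0` (the leading "v" is dropped, which is what `${CVAT_VERSION#v}` accounts for).

```shell
# Hypothetical helpers (not part of CVAT) wrapping the Step 1 commands.
# Assumes GitHub's archive for tag "vX.Y.Z" unpacks into "cvat-X.Y.Z".

# Print the directory name the release archive unpacks into.
cvat_archive_dir() {
  # "${1#v}" removes one leading "v", like "${CVAT_VERSION#v}" above
  printf 'cvat-%s\n' "${1#v}"
}

# Download and unpack a release into ./cvat; stops at the first failing step.
fetch_cvat() {
  local tag="${1:-v2.17.0}"
  local dir
  dir="$(cvat_archive_dir "$tag")"

  # mv would move the release *into* an existing ./cvat, so refuse instead
  if [ -e cvat ]; then
    echo "fetch_cvat: ./cvat already exists; move it aside first" >&2
    return 1
  fi

  wget "https://github.com/cvat-ai/cvat/archive/refs/tags/${tag}.zip" &&
    unzip -q "${tag}.zip" &&
    mv "$dir" cvat &&
    rm "${tag}.zip"
}
```

For example, `fetch_cvat v2.17.0 && cd cvat`. Refusing to run when `./cvat` already exists keeps an earlier checkout from being silently nested inside the new one during an upgrade.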

Step 2: Set the CVAT_HOST environment variable

One-time Setup

This should only be done once, because each run appends another copy of the line to ~/.bashrc. Skip this if upgrading.

echo "export CVAT_HOST=localhost" >> ~/.bashrc
source ~/.bashrc
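Because `>>` appends, running the command above a second time adds a duplicate line. A sketch of an idempotent variant follows; `set_cvat_host` is a hypothetical helper name, and its optional second argument (the startup file, defaulting to `~/.bashrc`) exists only so it can be tried on a scratch file.

```shell
# Hypothetical helper: add "export CVAT_HOST=<host>" to a startup file only if
# that exact line is not already present, so running it twice is harmless.
set_cvat_host() {
  local host="${1:-localhost}"
  local rc="${2:-$HOME/.bashrc}"
  local line="export CVAT_HOST=${host}"

  touch "$rc"
  # -q: no output; -x: whole-line match; -F: fixed string, not a regex
  if ! grep -qxF "$line" "$rc"; then
    printf '%s\n' "$line" >> "$rc"
  fi
}
```

Run `set_cvat_host localhost`, then `source ~/.bashrc`. The check looks for an exact match only, so an existing line with a different host is left in place and whichever `export` comes last wins.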

Step 3: Optionally mount a host volume

To make data on the host machine available inside CVAT (rather than uploading it through the browser), create the file docker-compose.override.yml in the same directory as docker-compose.yml and add the following to it:

services:
  cvat_server:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_import:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_export:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_annotation:
    volumes:
      - cvat_share:/home/django/share:ro

volumes:
  cvat_share:
    driver_opts:
      type: none
      device: /abs/path/to/host/data/directory # Edit this line
      o: bind
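If you prefer not to edit the path by hand, the override file can be generated. This sketch uses a hypothetical helper, `write_cvat_override`, that writes the same YAML as above with `device:` set to the directory you pass. It refuses relative paths (the YAML comment asks for an absolute path) and directories that do not exist, since a bind mount of a missing directory will typically fail when the containers start.

```shell
# Hypothetical helper: write docker-compose.override.yml (same content as the
# YAML above) with the host data directory filled in.
write_cvat_override() {
  local share_dir="$1"
  local out="${2:-docker-compose.override.yml}"

  case "$share_dir" in
    /*) ;;  # absolute path: OK
    *) echo "write_cvat_override: need an absolute path, got '$share_dir'" >&2
       return 1 ;;
  esac
  if [ ! -d "$share_dir" ]; then
    echo "write_cvat_override: '$share_dir' is not a directory" >&2
    return 1
  fi

  # Unquoted EOF so that ${share_dir} is expanded inside the here-document
  cat > "$out" <<EOF
services:
  cvat_server:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_import:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_export:
    volumes:
      - cvat_share:/home/django/share:ro
  cvat_worker_annotation:
    volumes:
      - cvat_share:/home/django/share:ro

volumes:
  cvat_share:
    driver_opts:
      type: none
      device: ${share_dir}
      o: bind
EOF
}
```

Run it from the `cvat/` directory, for example `write_cvat_override /abs/path/to/host/data/directory`. Afterwards, `docker compose -f docker-compose.yml -f docker-compose.override.yml config` prints the merged configuration, which is a quick way to check the volume before starting the containers.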

Step 4: Build and start CVAT

Host Volume

Omit -f docker-compose.override.yml from the command below if you are not mounting a host volume.

docker compose -f docker-compose.yml -f docker-compose.override.yml up --build -d
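To use one command whether or not you created the override file, the `-f` arguments can be assembled conditionally. A sketch, with a hypothetical `cvat_up` function and an illustrative `DRY_RUN=1` switch that prints the command instead of running it:

```shell
# Hypothetical helper: start CVAT, adding the override file only if it exists.
# Run from the cvat/ directory. DRY_RUN=1 only prints the command.
cvat_up() {
  set -- -f docker-compose.yml
  if [ -f docker-compose.override.yml ]; then
    set -- "$@" -f docker-compose.override.yml
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo docker compose "$@" up --build -d
  else
    docker compose "$@" up --build -d
  fi
}
```

As a design note, Compose normally also merges `docker-compose.override.yml` on its own when no `-f` flags are given; listing the files explicitly just makes the choice visible. `docker compose ps` then shows whether the containers came up.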

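Accessing CVAT

Once CVAT was running on our remote server, users opened an SSH tunnel from their own computers to reach the CVAT web interface by running the following command in a terminal:

ssh username@servername -N -L 8080:localhost:8080

Next, users opened Chrome and entered http://localhost:8080/ in the address bar to open the CVAT GUI. The terminal running the ssh command must stay open while CVAT is in use, since stopping the command closes the tunnel.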
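The tunnel command can be kept in a small wrapper so the options do not have to be retyped. In this sketch, `cvat_tunnel` and the `DRY_RUN` switch are hypothetical, and `username@servername` remains a placeholder for your own account and host. Note that `localhost` in `-L 8080:localhost:8080` is resolved on the server, so local port 8080 is forwarded to port 8080 on the server itself.

```shell
# Hypothetical wrapper around the SSH tunnel command above.
# DRY_RUN=1 prints the command instead of running it.
cvat_tunnel() {
  if [ -z "${1:-}" ]; then
    echo "usage: cvat_tunnel username@servername" >&2
    return 2
  fi
  # -N: run no remote command, only forward the port
  # -L: local 8080 -> "localhost:8080" as seen from the server
  set -- ssh -N -L 8080:localhost:8080 "$1"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `cvat_tunnel username@servername`; leave it running while you use CVAT and press Ctrl+C to close the tunnel.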

Creating Tasks in CVAT

Open the CVAT web GUI by navigating to http://localhost:8080/ in Chrome.
Log in to CVAT.
Navigate to Projects and click on your project.
Click 'Create multi tasks'.
Select files - Navigate to 'Connected file share' and select the files you want to upload to create tasks.

Processing Time

This may take some time if the files are large.
Clean up - Once a task is created for the video, you may delete the original video file from the server since CVAT will save the data in a different location.
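Because large files take longer to turn into tasks, it can help to see what is in the shared directory before selecting files. A sketch using GNU `find` (the default on Ubuntu); the helper name `list_share_videos` and the set of video extensions matched are assumptions to adjust to your footage:

```shell
# Hypothetical helper: list video files under the shared directory with their
# size in bytes, largest first. Uses GNU find's -printf.
list_share_videos() {
  if [ -z "${1:-}" ]; then
    echo "usage: list_share_videos /abs/path/to/share" >&2
    return 2
  fi
  # -iname: case-insensitive match, so .MP4 and .mp4 are both found
  find "$1" -type f \( -iname '*.mp4' -o -iname '*.mov' -o -iname '*.avi' \) \
    -printf '%s\t%p\n' | sort -rn
}
```

`list_share_videos /abs/path/to/host/data/directory` prints one `bytes<TAB>path` line per video.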

Detections in CVAT

Manual Bounding Box Detection

Access your task - Once you log in to CVAT, navigate to the Projects tab, where you should see your assigned tasks. Click on the task to open it.
Assign and open task - Choose a task to work on and assign it to yourself under 'Assignee'. Click on 'Job #' to open the task.
Optional: Select region of interest - Select a region of interest to zoom in on the scene.
Set up detection tool - Click on the rectangle tool and select the correct species. Check the filename if you aren't sure. Make sure to select "Track" (not "Shape") so that each box follows its animal across frames.
Draw bounding boxes - Draw a box around each animal in view.
Track through frames - Use the >> button, or press 'V' on your keyboard, to advance 10 frames. Update the bounding boxes by dragging them to the correct positions as required; CVAT interpolates each track's box on the frames between the ones you adjust.

Save Frequently

Make sure to save frequently! You can stop, save, and start working on the video later at any time.
Complete annotation - Continue until all the frames have been annotated. Save your results.
Mark as complete - Once you are done, set the job stage to "Validation" so the project lead knows the task is complete.

Behavior Labeling in CVAT

Access your task - Once you log in to CVAT, navigate to the Projects tab, where you should see your assigned tasks. Click on the task to open it.
Set up point annotation - Click on "Draw new points", then select the appropriate animal. Next, select "Track".
Annotate behavior - Place your dot anywhere on the screen and select the appropriate behavior (for example, "Head Up").
Continue annotation - Continue annotating the video, updating the object label as the animal changes behavior. See the Updated Ethogram for explanations of the different behavior categories. Pay particular attention to the caveats on any "out of sight" sub-categories.

Save Frequently

Make sure to save current changes frequently. You can annotate part of a video and come back to complete it later.
Complete the job - Once you are done annotating, save the annotations one final time, change the job status to "completed", and then select "Finish the job".

Downloading Annotations

You may download annotations for individual tasks or for the entire project.

Access export options - Click on the three vertical dots in the lower-right corner of the project or task.
Select export type - Select "Export dataset" to export a project, or "Export task dataset" to export a task.
Configure export - Export in "CVAT for video 1.1" format, deselect the option to "Save images", and click "Ok".
Download dataset - Once the export request is complete, navigate to the "Requests" tab and download the dataset.
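Before passing an export on to other tools, a quick count of tracks per label can catch missing or mislabeled tracks. This sketch assumes you have unzipped the downloaded archive and that its `annotations.xml` writes each `<track ...>` opening tag on a single line with a `label="..."` attribute; `count_tracks_by_label` is a hypothetical name.

```shell
# Hypothetical helper: count <track> elements per label in an annotations.xml
# from a "CVAT for video 1.1" export. Output: "<count> <label>", most first.
count_tracks_by_label() {
  local xml="$1"
  # Keep only <track ...> opening tags, then pull out the label attribute
  grep -o '<track [^>]*>' "$xml" |
    sed -n 's/.* label="\([^"]*\)".*/\1/p' |
    sort | uniq -c | sort -rn
}
```

For example, after `unzip` of the downloaded archive, `count_tracks_by_label annotations.xml` should show roughly one track per animal that appears in the video.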

Using Your Annotations

For detections: You may use your detections to create mini-scenes using tracks_extractor. These mini-scenes can be automatically annotated with behavior labels using the KABR model.

For behaviors: You may use your annotated behaviors to fine-tune a behavior recognition model, such as X3D, or create time-budgets with the time budget notebook.
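A time budget expresses how observation time is split across behaviors. Assuming a constant frame rate, every frame covers the same duration, so the share of frames labeled with a behavior equals the share of time spent in it and the frame rate cancels out. The sketch below is not the time budget notebook: it assumes a hypothetical CSV with a header row followed by `frame,behavior` rows (one row per frame, no commas inside behavior names), and `time_budget` is an illustrative name.

```shell
# Hypothetical helper: percent of annotated frames per behavior, largest first.
# Input (an assumption): header row, then "frame,behavior" rows, one per frame.
time_budget() {
  awk -F, 'NR > 1 && $2 != "" { n[$2]++; total++ }
           END { for (b in n) printf "%.1f\t%s\n", 100 * n[b] / total, b }' "$1" |
    sort -rn
}
```

Each output line is `percent<TAB>behavior`. With four annotated frames, three labeled "Graze" and one "Head Up", the result is 75.0 and 25.0 percent respectively.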