Trackformer Annotations

Trackformer uses a modified COCO-style JSON file format to annotate ground truth for multi-object tracking. This article explains that annotation format.

Annotation format

Sample annotation file

{
    "type": "instances",
    "categories": [
        {
            "supercategory": "spine",
            "name": "spine",
            "id": 1
        },
        ...
    ],
    "images": [
        {
            "file_name": "aid052N1D1_tp1_stack2_layer001.png",
            "height": 512,
            "width": 512,
            "id": 0,
            "first_frame_image_id": 0,
            "seq_length": 20,
            "frame_id": 0
        },
        ...
    ],
    "annotations": [
        {
            "id": 0,
            "category_id": 1,
            "image_id": 3,
            "seq": "aid052N1D1_tp1_stack2",
            "track_id": 0,
            "bbox": [
                337,
                473,
                23,
                21
            ],
            "area": 483,
            "segmentation": [],
            "ignore": 0,
            "visibility": 1.0,
            "iscrowd": 0
        },
        ...
    ],
    "sequences": [
        "aid052N1D1_tp1_stack2",
        ...
    ],
    "frame_range": {
        "start": 0.0,
        "end": 1.0
    }
}
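
To illustrate how the fields in the sample relate to each other, the following Python sketch groups bounding boxes by track. The helper name `group_boxes_by_track` and the two-frame dataset are hypothetical and only serve this example; they are not part of Trackformer.

```python
from collections import defaultdict


def group_boxes_by_track(dataset):
    """Map (sequence name, track_id) to a frame-ordered list of (frame_id, bbox)."""
    # Look up the frame index of every image by its image ID.
    frame_of_image = {img["id"]: img["frame_id"] for img in dataset["images"]}
    tracks = defaultdict(list)
    for ann in dataset["annotations"]:
        frame_id = frame_of_image[ann["image_id"]]
        tracks[(ann["seq"], ann["track_id"])].append((frame_id, ann["bbox"]))
    # Sort each track by frame index so its boxes are in temporal order.
    return {key: sorted(boxes, key=lambda item: item[0]) for key, boxes in tracks.items()}


# Hypothetical two-frame dataset that only contains the fields used here.
example = {
    "images": [{"id": 0, "frame_id": 0}, {"id": 1, "frame_id": 1}],
    "annotations": [
        {"image_id": 1, "seq": "demo_seq", "track_id": 0, "bbox": [11, 20, 5, 5]},
        {"image_id": 0, "seq": "demo_seq", "track_id": 0, "bbox": [10, 20, 5, 5]},
    ],
}
tracks = group_boxes_by_track(example)
print(tracks[("demo_seq", 0)])
```

For a real annotation file, `dataset` would be the dictionary returned by `json.load`.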

JSON annotation file structure

erDiagram
    DATASET {
        string type "Dataset type ('instances' for tracking)"
        list[string] sequences "List of sequence names"
        object frame_range "Object describing frame range"
    }
    CATEGORY }o--|| DATASET : has
    IMAGE }o--|| DATASET : has
    ANNOTATION }o--|| DATASET : has
    ANNOTATION }|--|| IMAGE : has
    ANNOTATION }|--|| CATEGORY : has

Category entities

erDiagram
    CATEGORY {
        int id PK "The category ID starting from 1"
        string supercategory "Name of the supercategory (use the same name as for 'name')"
        string name "The category name"
    }

Image entities

erDiagram
    IMAGE {
        int id PK "Image ID (starting from 0, as in the sample file)"
        string file_name "Image filename (relative to train directory)"
        int height "Image height"
        int width "Image width"
        int frame_id "Frame ID (starting from 0)"
        int first_frame_image_id "ID of the first frame in the sequence"
        int seq_length "Number of frames in the corresponding sequence"
    }
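
The image fields can be cross-checked against each other. The sketch below assumes that the file lists every frame of each sequence and that all images of a sequence share the same `first_frame_image_id`; under those assumptions, the number of images per sequence should equal `seq_length`. The function name `check_sequence_lengths` and the sample data are made up for illustration.

```python
from collections import Counter


def check_sequence_lengths(images):
    """Return IDs of images whose seq_length differs from their sequence's image count."""
    # Assumption: images of the same sequence share one first_frame_image_id.
    frames_per_sequence = Counter(img["first_frame_image_id"] for img in images)
    return [
        img["id"]
        for img in images
        if frames_per_sequence[img["first_frame_image_id"]] != img["seq_length"]
    ]


# Hypothetical image entries: one complete two-frame sequence and one
# sequence that claims three frames but lists only one.
images = [
    {"id": 0, "first_frame_image_id": 0, "seq_length": 2, "frame_id": 0},
    {"id": 1, "first_frame_image_id": 0, "seq_length": 2, "frame_id": 1},
    {"id": 2, "first_frame_image_id": 2, "seq_length": 3, "frame_id": 0},
]
mismatched = check_sequence_lengths(images)
print(mismatched)
```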

Annotation entities

erDiagram
    ANNOTATION {
        int id PK "Annotation ID (starting from 0, as in the sample file)"
        int category_id FK "ID of the category of this bounding box"
        int image_id FK "ID of the image"
        string seq "Sequence name"
        int track_id "ID of track in the sequence (starting from 0)"
        list[int] bbox "Bounding box in XYWH format [x, y, width, height]"
        list[int] segmentation "Segmentation mask polygon (may be empty)"
        int area "Bounding box area (width * height)"
        int iscrowd "Always 0"
        int ignore "Always 0"
        float visibility "Object visibility"
    }
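
The constraints described by the diagrams can be checked with a few lines of Python. This is a minimal sketch, not a Trackformer utility: the function name `find_annotation_problems` is hypothetical, and the rule that a track appears at most once per image is an assumption about well-formed tracking data.

```python
def find_annotation_problems(dataset):
    """Return (annotation ID, message) pairs for annotations that break a rule."""
    problems = []
    image_ids = {img["id"] for img in dataset["images"]}
    category_ids = {cat["id"] for cat in dataset["categories"]}
    seen_tracks = set()
    for ann in dataset["annotations"]:
        # Foreign keys must point at existing images and categories.
        if ann["image_id"] not in image_ids:
            problems.append((ann["id"], "unknown image_id"))
        if ann["category_id"] not in category_ids:
            problems.append((ann["id"], "unknown category_id"))
        # bbox is [x, y, width, height]; area must equal width * height.
        _, _, width, height = ann["bbox"]
        if ann["area"] != width * height:
            problems.append((ann["id"], "area is not width * height"))
        # Assumption: a track may appear at most once in a given image.
        track_in_image = (ann["image_id"], ann["track_id"])
        if track_in_image in seen_tracks:
            problems.append((ann["id"], "track_id repeated in one image"))
        seen_tracks.add(track_in_image)
    return problems


# The first annotation mirrors the sample file; the second is deliberately wrong.
dataset = {
    "categories": [{"id": 1, "name": "spine", "supercategory": "spine"}],
    "images": [{"id": 3}],
    "annotations": [
        {"id": 0, "category_id": 1, "image_id": 3, "track_id": 0,
         "bbox": [337, 473, 23, 21], "area": 483},
        {"id": 1, "category_id": 1, "image_id": 3, "track_id": 0,
         "bbox": [100, 100, 10, 10], "area": 90},
    ],
}
problems = find_annotation_problems(dataset)
print(problems)
```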

Convert CSV annotations to Trackformer annotations

We have created a script that converts tracking annotations from the CSV format described below into the Trackformer JSON annotation format. First, create the following directory structure in your trackformer directory:

data
|- spine_detection
  |- annotations
    |- test.csv
    |- train.csv
    |- val.csv
  |- test
  |- train
  |- val
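
If you prefer to create the directories above programmatically, something like the following should work. The function name `create_layout` is hypothetical; only the directory names come from the structure shown above, and the CSV files still have to be added by hand.

```python
from pathlib import Path


def create_layout(trackformer_dir):
    """Create data/spine_detection with its four subdirectories; return the base path."""
    base = Path(trackformer_dir) / "data" / "spine_detection"
    for name in ("annotations", "test", "train", "val"):
        # exist_ok=True leaves existing directories and their contents untouched.
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```

Calling `create_layout(".")` from the trackformer directory creates the structure in place.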

Place your images in the train, val and test directories. Then add your annotations to the corresponding CSV files in the following format:

id,filename,width,height,class,score,xmin,ymin,xmax,ymax
1,aid052N1D2_tp1_stack1_layer006.png,512,512,spine,1.0,445.0,243.0,458.0,263.0
0,aid052N1D2_tp1_stack1_layer006.png,512,512,spine,1.0,339.0,232.0,353.0,245.0
2,aid052N1D2_tp1_stack1_layer006.png,512,512,spine,1.0,326.0,254.0,344.0,270.0
2,aid052N1D2_tp1_stack1_layer007.png,512,512,spine,1.0,327.0,254.0,345.0,270.0
3,aid052N1D2_tp1_stack1_layer007.png,512,512,spine,1.0,464.0,278.0,488.0,306.0
4,aid052N1D2_tp1_stack1_layer007.png,512,512,spine,1.0,496.0,54.0,512.0,76.0
0,aid052N1D2_tp1_stack1_layer007.png,512,512,spine,1.0,338.0,230.0,356.0,245.0
1,aid052N1D2_tp1_stack1_layer007.png,512,512,spine,1.0,443.0,241.0,460.0,261.0
5,aid052N1D2_tp1_stack1_layer008.png,512,512,spine,1.0,217.0,208.0,236.0,248.0
3,aid052N1D2_tp1_stack1_layer008.png,512,512,spine,1.0,463.0,278.0,485.0,304.0
6,aid052N1D2_tp1_stack1_layer008.png,512,512,spine,1.0,259.0,233.0,270.0,248.0

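The id column is the object identity. The same id can occur in multiple frames of a sequence, but at most once per frame. The columns xmin, ymin, xmax and ymax are the corner coordinates of the bounding box.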
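
The CSV format stores boxes as corner coordinates, while the JSON format expects XYWH. The sketch below shows one way such a row could be converted; it is not the code of the conversion script. Two things are assumptions inferred from the samples in this article: that the CSV id becomes the `track_id`, and that the sequence name is the filename without its `_layerNNN.png` suffix. The function name `row_to_annotation` is hypothetical.

```python
import csv
import io
import re


def row_to_annotation(row):
    """Convert one CSV row (as a dict) into a partial annotation dict."""
    xmin, ymin, xmax, ymax = (float(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
    width, height = xmax - xmin, ymax - ymin
    return {
        # Assumption: the sequence name is the filename without "_layerNNN.png".
        "seq": re.sub(r"_layer\d+\.png$", "", row["filename"]),
        # Assumption: the CSV id column becomes the track_id.
        "track_id": int(row["id"]),
        # Corner coordinates converted to [x, y, width, height].
        "bbox": [int(xmin), int(ymin), int(width), int(height)],
        "area": int(width * height),
    }


# Header and first two rows of the CSV sample above.
csv_text = """id,filename,width,height,class,score,xmin,ymin,xmax,ymax
1,aid052N1D2_tp1_stack1_layer006.png,512,512,spine,1.0,445.0,243.0,458.0,263.0
0,aid052N1D2_tp1_stack1_layer006.png,512,512,spine,1.0,339.0,232.0,353.0,245.0
"""
annotations = [row_to_annotation(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(annotations[0])
```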

Now run the following Python script to create the JSON annotation files in the annotations directory:

$ python src/generate_coco_from_spine.py

After running this command, you will find the annotation files train.json, val.json and test.json in the annotations directory.
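
As a quick sanity check of the generated files, you could count their entries. This is a minimal sketch that relies only on the top-level keys described in this article; the function names `summarize_split` and `summarize_all` are hypothetical.

```python
import json
from pathlib import Path


def summarize_split(annotation_file):
    """Return the number of images, annotations and sequences in one JSON file."""
    with open(annotation_file) as handle:
        dataset = json.load(handle)
    return {key: len(dataset[key]) for key in ("images", "annotations", "sequences")}


def summarize_all(annotation_dir):
    """Summarize whichever of train.json, val.json and test.json exist."""
    summaries = {}
    for split in ("train", "val", "test"):
        path = Path(annotation_dir) / f"{split}.json"
        if path.exists():
            summaries[split] = summarize_split(path)
    return summaries
```

With the directory structure from this article, `summarize_all("data/spine_detection/annotations")` returns one entry per split that was generated.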