The available datasets contain two different collections of ground truth videos.
The first one includes 17 videos (10 minutes long each) with a resolution of 320x240, a 24-bit color depth and a frame rate of 5 fps. These videos were selected by taking into account the presence of specific features that depict both standard and non-standard conditions in the observed environment (e.g. dynamic background, illumination variations, high water turbidity, very low contrast, crowded scenes and camouflage). For each video, the ground truth is available only for a set of specific keyFrames, identified as those containing the highest number of objects of interest in that video. The ground truth is recorded in XML format, where the set of keyFrames is listed at the beginning of the file together with the video class (e.g. "Blurred", "DynamicBkg", "Crowded", etc.). For each frame, a list of objects is available, each with its trackingId. Finally, for each object in the current frame, the contour information is recorded as a sequence of x, y coordinate pairs. An example of this XML file is shown in the images below.
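Since the exact XML schema is only shown in the images, the following Python sketch assumes a hypothetical layout purely for illustration: the `video`, `keyframes`, `keyframe`, `frame`, `object` and `contour` tag names, the `class`, `id` and `trackingId` attributes, and contours stored as space-separated `x,y` pairs are all assumptions. It shows how keyFrame-based ground truth of this kind could be loaded with the standard `xml.etree.ElementTree` module; the names would need adjusting to the real files.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL layout: tag and attribute names are assumptions, not the real schema.
SAMPLE = """
<video class="DynamicBkg">
  <keyframes>
    <keyframe id="120"/>
    <keyframe id="845"/>
  </keyframes>
  <frame id="120">
    <object trackingId="1"><contour>10,20 30,20 30,40 10,40</contour></object>
    <object trackingId="2"><contour>50,60 70,60 60,80</contour></object>
  </frame>
  <frame id="845">
    <object trackingId="1"><contour>12,22 32,22 32,42</contour></object>
  </frame>
</video>
"""


def parse_contour(text):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    points = []
    for pair in text.split():
        x, y = pair.split(",")
        points.append((int(x), int(y)))
    return points


def parse_keyframe_gt(xml_text):
    """Return (video_class, keyframe_ids, {frame_id: {trackingId: contour}})."""
    root = ET.fromstring(xml_text.strip())
    video_class = root.get("class")
    keyframes = [int(k.get("id")) for k in root.iter("keyframe")]
    frames = {}
    for frame in root.iter("frame"):
        objects = {}
        for obj in frame.iter("object"):
            tracking_id = int(obj.get("trackingId"))
            objects[tracking_id] = parse_contour(obj.findtext("contour", default=""))
        frames[int(frame.get("id"))] = objects
    return video_class, keyframes, frames


if __name__ == "__main__":
    cls, keys, frames = parse_keyframe_gt(SAMPLE)
    print(cls, keys)
    for frame_id, objects in frames.items():
        # Number of contour points per tracked object in this keyFrame.
        print(frame_id, {tid: len(pts) for tid, pts in objects.items()})
```

Keying objects by trackingId makes it straightforward to follow the same object across the listed keyFrames.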
The second dataset contains 10 videos (10 minutes long each) with a resolution of either 320x240 or 640x480 at a frame rate of 5 fps. Unlike the first dataset, these videos mainly depict scenes under standard conditions. In this case, the ground truth is available in XML format for all frames in each video (3000 on average). More precisely, for each frame, a list of both bounding boxes and contours is recorded, each with its BBox ID, as shown in the images below.
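Because this second dataset stores both a bounding box and a contour for every object in every frame, a simple consistency check is to recompute the box from the contour. The Python sketch below again assumes a hypothetical layout (the `frame` and `object` tags, a `bboxId` attribute, a box stored as `x`, `y`, `w`, `h` attributes with width and height measured as max minus min, and space-separated `x,y` contour pairs); none of these names come from the real files. It also spells out the frame-count arithmetic: 10 minutes at 5 fps gives 3000 frames.

```python
import xml.etree.ElementTree as ET

# 10 minutes * 60 s/min * 5 frames/s = 3000 frames per video.
FRAMES_PER_VIDEO = 10 * 60 * 5

# HYPOTHETICAL layout: tag/attribute names and the (x, y, w, h) box convention are assumptions.
SAMPLE = """
<video>
  <frame id="0">
    <object bboxId="1" x="10" y="20" w="20" h="20">
      <contour>10,20 30,20 30,40 10,40</contour>
    </object>
  </frame>
  <frame id="1">
    <object bboxId="1" x="12" y="21" w="20" h="19">
      <contour>12,21 32,21 32,40</contour>
    </object>
  </frame>
</video>
"""


def bbox_from_contour(points):
    """Axis-aligned (x, y, w, h) box enclosing the contour points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def parse_full_gt(xml_text):
    """Return {frame_id: {bbox_id: (bbox, contour)}} for every frame."""
    root = ET.fromstring(xml_text.strip())
    frames = {}
    for frame in root.iter("frame"):
        objects = {}
        for obj in frame.iter("object"):
            bbox = tuple(int(obj.get(k)) for k in ("x", "y", "w", "h"))
            contour = [tuple(int(v) for v in pair.split(","))
                       for pair in obj.findtext("contour", default="").split()]
            objects[int(obj.get("bboxId"))] = (bbox, contour)
        frames[int(frame.get("id"))] = objects
    return frames


if __name__ == "__main__":
    frames = parse_full_gt(SAMPLE)
    for frame_id, objects in sorted(frames.items()):
        for bbox_id, (bbox, contour) in objects.items():
            # True when the stored box matches the box recomputed from the contour.
            print(frame_id, bbox_id, bbox, bbox == bbox_from_contour(contour))
```

Keying objects by BBox ID within each frame mirrors how the ground truth pairs each bounding box with its contour.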