Monday, May 18, 2015

ARGOS visual analytics extraction

The processed outputs of many sensors need to be combined in an early warning system for critical infrastructure. Video-based sensors are usually expensive to process but can bring unique information into the system. For example, a car passing by can be identified by both audio and video sensors, but a car that arrives at the scene, switches off its engine and remains there can only be observed by video.


Video sensors cover different bands of the spectrum. Thermal and IR video sensors provide excellent background suppression, but usually at low resolution, and they always suffer in high-temperature environments, where humans can be cooler than, or comparable to, the background. For this reason, visual video sensors are still very important.

The key challenge in visual video analytics is the extraction of the foreground, i.e. the objects of interest, from the cluttered background. This extraction is based on two assumptions:

• Any given image patch is background most of the time, i.e. the presence of foreground is a rather rare event. This assumption is easily fulfilled in use cases that involve monitoring of restricted areas, like those found in ARGOS.

• The background changes slowly, i.e. the camera is stationary and the visual appearance of the background changes with the light cycle. This assumption is weaker, since adverse weather conditions can cause fast changes in the background.
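
To make the two assumptions concrete, here is a minimal sketch of the simplest background model that relies on them: a per-pixel running average. It is not the ARGOS algorithm, only an illustration. The function name `update_background` and the values of `alpha` and `threshold` are assumptions chosen for the example.

```python
def update_background(background, frame, alpha=0.02, threshold=30.0):
    """One step of running-average background subtraction.

    background, frame: flat lists of grayscale intensities, same length.
    Returns (new_background, foreground_mask).
    alpha and threshold are illustrative values, not tuned settings.
    """
    # A pixel is foreground when it differs strongly from the model.
    mask = [abs(p - b) > threshold for p, b in zip(frame, background)]
    # Blend every pixel slowly into the model. Because foreground is
    # rare (assumption 1), it barely contaminates the average, and the
    # small alpha lets the model follow slow changes (assumption 2).
    new_background = [(1.0 - alpha) * b + alpha * p
                      for p, b in zip(frame, background)]
    return new_background, mask


if __name__ == "__main__":
    model = [100.0, 100.0, 100.0, 100.0]
    model, mask = update_background(model, [100, 102, 180, 99])
    print(mask)
```

A model like this follows a gradual brightening of the scene, but it marks a swaying branch as foreground on every swing, which is why the second assumption is the weaker one.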

At the heart of the AIT-ARGOS visual analytics system lies an adaptive foreground segmentation algorithm [1] that can handle slow, but also fast periodic, changes in background appearance, like those caused by swaying branches. The algorithm is a modification of the Stauffer-Grimson algorithm. Colour models with weak spatial information [2] are built for the identified foreground objects, which are subsequently tracked [2] through the sequence of frames. The tracker utilises two visual measurement modalities: foreground and colour information. The tracked objects are analysed for shape and motion and classified as either humans or vehicles. As a result, metadata about the objects of interest are collected and transmitted to the rest of the system.
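
For readers unfamiliar with the Stauffer-Grimson approach, the sketch below shows a simplified version of the baseline idea for a single grayscale pixel: the pixel history is modelled by a small mixture of Gaussians, and the heaviest, tightest components are taken as background. It does not reproduce the modifications of [1]. The class name `PixelMixture`, all parameter values, and the use of one learning rate for weights, means and variances are assumptions made to keep the example short.

```python
import math


class PixelMixture:
    """Simplified Stauffer-Grimson style mixture for one grayscale pixel."""

    def __init__(self, k=3, alpha=0.05, bg_ratio=0.8,
                 init_var=225.0, min_var=4.0, match_sigmas=2.5):
        self.k = k                    # maximum number of Gaussians
        self.alpha = alpha            # learning rate (illustrative)
        self.bg_ratio = bg_ratio      # weight mass treated as background
        self.init_var = init_var      # variance of a new component
        self.min_var = min_var        # floor that keeps components usable
        self.match_sigmas = match_sigmas
        self.components = []          # each entry: [weight, mean, variance]

    def _background_set(self):
        # Rank by weight / std-dev, then keep components until their
        # cumulative weight exceeds bg_ratio.
        ranked = sorted(self.components,
                        key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)
        total, background = 0.0, []
        for comp in ranked:
            background.append(comp)
            total += comp[0]
            if total > self.bg_ratio:
                break
        return background

    def update(self, x):
        """Absorb intensity x and return True if x is foreground."""
        # Take the first component within match_sigmas std-devs of x.
        matched = None
        for comp in self.components:
            if abs(x - comp[1]) <= self.match_sigmas * math.sqrt(comp[2]):
                matched = comp
                break
        for comp in self.components:
            comp[0] *= 1.0 - self.alpha      # decay all weights
        if matched is not None:
            matched[0] += self.alpha
            matched[1] += self.alpha * (x - matched[1])
            diff = x - matched[1]
            matched[2] = max(self.min_var,
                             (1.0 - self.alpha) * matched[2]
                             + self.alpha * diff * diff)
        else:
            # No match: add a new component or replace the weakest one.
            matched = [self.alpha, float(x), self.init_var]
            if len(self.components) < self.k:
                self.components.append(matched)
            else:
                weakest = min(range(len(self.components)),
                              key=lambda i: self.components[i][0])
                self.components[weakest] = matched
        total = sum(c[0] for c in self.components)
        for comp in self.components:
            comp[0] /= total                 # renormalise weights
        return not any(c is matched for c in self._background_set())


if __name__ == "__main__":
    pixel = PixelMixture()
    # A pixel that flips between sky (200) and a swaying branch (50).
    for t in range(200):
        pixel.update(50 if t % 2 == 0 else 200)
    print(pixel.update(120))  # an intensity never seen before
```

Because both recurring intensities end up with their own high-weight component, a pixel that alternates between them is absorbed into the background, while an intensity that matches neither is reported as foreground. This is the property that lets mixture models cope with fast periodic changes such as swaying branches.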

By Dr. Aristodemos Pnevmatikakis, Associate Professor
Signal processing team, IRIS Laboratory, Athens Information Technology


References

1. N. Katsarakis, A. Pnevmatikakis, Z.-H. Tan and R. Prasad, “Improved Gaussian Mixture Models for Adaptive Foreground Segmentation,” Wireless Personal Communications, special issue on “Current trends in information and communication technology,” Springer, May 2015.

2. N. Katsarakis, A. Pnevmatikakis, Z.-H. Tan and R. Prasad, “Combination of Multiple Measurement Cues for Visual Face Tracking,” Wireless Personal Communications, special issue on “Intelligent Infrastructures & Beyond,” vol. 78, issue 3, pp. 1789-1810, Springer, October 2014.
