6th International Conference on Computer Vision Systems, Vision for Cognitive Systems

Welcome to ICVS 2008



12th May
13th May
14th May
15th May
Session 7
Registration - Welcome
Session 5
Cross Modal Systems
Welcome Event


The ICVS08 final program is also available in PDF format.


ICVS Poster Sessions

  • Poster Session 1 / Chair: Rita Cucchiara

    A Segmentation Approach In Novel Real Time 3D Plant Recognition System
    Dejan Seatovic
    A Tale of Two Object Recognition Methods for Mobile Robots
    Arnau Ramisa, Shrihari Vasudevan, Ramon López de Mántaras, Roland Siegwart
    Automatic Object Detection On Aerial Images Using Local Descriptors and Image Synthesis
    Xavier Perrotton, Marc Sturzel, Michel Roux
    Bottom-Up and Top-Down Object Matching Using Asynchronous Agents and a contrario Principles
    Nicolas Burrus, Thierry Bernard, Jean-Michel Jolion
    CEDD: Color and Edge Directivity Descriptor. A Compact Descriptor for Image Indexing and Retrieval
    Savvas Chatzichristofis, Yiannis Boutalis
    Face Recognition Using a Color PCA Framework
    Mani Thomas, Senthil Kumar, Chandra Kambhamettu
    Multiscale Laplacian Operators for Feature Extraction on Irregularly Distributed 3-D Range Data
    Shanmugalingam Suganthan, Sonya Coleman, Bryan Scotney
    Online Learning for Bootstrapping of Object Recognition and Localization in a Biologically Motivated Architecture
    Heiko Wersing, Stephan Kirstein, Bernd Schneiders, Ute Bauer-Wersing, Edgar Koerner
    Ranking Corner Points by the Angular Difference between Dominant Edges
    Rafael Lemuz, Miguel Arias
    Scene Classification Based on Multi-resolution Orientation Histogram of Gabor Features
    Kazuhiro Hotta
    Skeletonization Based on Metrical Neighborhood Sequences
    Attila Fazekas, Kálmán Palágyi, Gábor Németh, György Kovács
    Vein Segmentation in Infrared Images Using Compound Enhancing and Crisp Clustering
    Marios Vlachos, Evangelos Dermatas
    Weighted Dissociated Dipoles: An Extended Visual Feature Set
    Xavier Baró, Jordi Vitrià
    Rk-means: A k-means Based Clustering Algorithm
    Domenico Daniele Bloisi, Luca Iocchi
    Smoke Detection in Video Surveillance: a MoG Model in the Wavelet Domain
    Simone Calderara, Paolo Piccinini, Rita Cucchiara

  • Poster Session 2 / Chair: Petia Radeva

    Covert Attention with a Spiking Neural Network
    Sylvain Chevallier, Philippe Tarroux
    Enhancing Robustness of a Saliency-based Attention System for Driver Assistance
    Thomas Michalke, Jannik Fritsch, Christian Goerick
    Salient Region Detection and Segmentation
    Radhakrishna Achanta, Francisco Estrada, Patricia Wils, Sabine Süsstrunk
    A Novel Feature Selection based Semi-Supervised method for Image Classification
    Muhammad Tahir, Edward Smith, Praminda Caleb-Solly
    Increasing Classification Robustness with Adaptive Features
    Christian Eitzinger, Manfred Gmainer, Wolfgang Heidl, Edwin Lughofer
    Learning Contextual Variations for Video Segmentation
    Vincent Martin, Monique Thonnat
    Learning to Detect Aircrafts at Low Resolutions
    Stavros Petridis, Christopher Geyer, Sanjiv Singh
    Learning Visual Quality Inspection from Multiple Humans using Ensembles of Classifiers
    Davy Sannen, Hendrik Van Brussel, Marnix Nuttin
    Sub-class Error-Correcting Output Codes
    Sergio Escalera, Oriol Pujol, Petia Radeva
    Automatic Initialization for Facial Analysis in Interactive Robotics
    Ahmad Rabie, Christian Lang, Marc Hanheide, Modesto Castrillon Santana, Gerhard Sagerer
    Face Recognition across Pose Using View Based Active Appearance Models (VBAAMs) on CMU Multi-PIE Dataset
    Jingu Heo, Marios Savvides
    Spatio-temporal 3D Pose Estimation of Objects in Stereo Images
    Bjoern Barrois, Christian Woehler
    An On-Line Interactive Self-Adaptive Image Classification Framework
    Davy Sannen, Marnix Nuttin, Praminda Caleb-Solly, Jim Smith, Muhammad Tahir
    Communication-Aware Face Detection Using NoC Architecture
    Hung-Chih Lai, Marios Savvides, Tsuhan Chen
    Open Source Multiview Reconstruction and Evaluation
    Keir Mierle, James MacLean

Session 1 - Cognitive Vision / Chair: Bärbel Mertsching
Tuesday, 13th May, 17:30-19:10

  • Visual search in static and dynamic scenes using fine-grain top-down visual attention

    Artificial visual attention is one of the key methodologies inspired by nature that can lead to robust and efficient visual search by machine vision systems. A novel approach is proposed for modeling top-down visual attention in which separate saliency maps are suggested for the two attention pathways. The maps for the bottom-up pathway are built using unbiased rarity criteria, while the top-down maps are created using fine-grain feature similarity with the search target, as suggested by the literature on natural vision. The model has shown robustness and efficiency in visual search experiments using natural and artificial visual input under both static and dynamic scenarios.

    Muhammad Zaheer Aziz, Bärbel Mertsching
  • Integration of Visual and Shape Attributes for Object Action Complexes

    Our work is oriented towards developing cognitive capabilities in artificial systems through Object Action Complexes (OACs) [10]. The theory claims that objects and actions are inseparably intertwined. Categories of objects are not built from visual appearance alone, as is common in computer vision, but from the actions an agent can perform and the attributes it can perceive. The core of the OAC concept is to relate objects, constituted from a set of attributes of many types (e.g. color, shape, mass, material), to actions. This pairing of attributes and actions provides the basis for categories. The work presented here is part of the development of an extensible system for providing and evolving attributes, beginning with attributes extractable from visual data.

    Kai Huebner, Mårten Björkman, Babak Rasolzadeh, Martina Schmidt, Danica Kragic
  • 3D Action Recognition and Long-term Prediction of Human Motion

    In this contribution we introduce a novel method for 3D trajectory-based recognition of working actions and long-term motion prediction. The 3D pose of the human hand-forearm limb is tracked over time with a multi-hypothesis Kalman Filter framework using the Multiocular Contracting Curve Density algorithm (MOCCD) as a 3D pose estimation method. A novel trajectory classification approach is introduced which relies on the Levenshtein Distance on Trajectories (LDT) as a measure of the similarity between trajectories. Experimental investigations are performed on 10 real-world test sequences acquired from different viewpoints in a working environment. The system performs the simultaneous recognition of a working action and a cognitive long-term motion prediction. Trajectory recognition rates of around 90% are achieved, requiring only a small number of training sequences. The proposed prediction approach yields significantly more reliable results than a Kalman Filter-based reference approach.

    Markus Hahn, Christian Woehler, Lars Krueger
  • Tracking of Human Hands and Faces through Probabilistic Fusion of Multiple Visual Cues

    This paper describes a novel approach for real-time segmentation and tracking of human hands and faces in image sequences. The proposed method builds on our previous research on color-based skin-color tracking, eliminating inherent limitations of the existing approach such as its inability to distinguish between human hands, faces and other skin-colored regions in the background. To overcome these limitations, the proposed approach allows the use of additional information cues, including motion information given by a background subtraction algorithm, and top-down information about the formed image segments, such as their spatial location, velocity and shape. All information cues are combined under a probabilistic framework, which gives the proposed approach the ability to cope with uncertainty due to noise. The proposed approach runs in real time on a standard personal computer. Experimental results presented in this paper confirm the effectiveness of the proposed methodology and its advantages over previous approaches.

    Haris Baltzakis, Antonis Argyros, Manolis Lourakis, Panos Trahanias
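
The abstract of "3D Action Recognition and Long-term Prediction of Human Motion" compares trajectories using the Levenshtein Distance on Trajectories (LDT). The following is only a generic sketch of a Levenshtein-style edit distance over 3D point sequences, not the authors' implementation: the match tolerance `eps`, the unit insertion/deletion/substitution costs, and the nearest-neighbor `classify` helper are illustrative assumptions, and the published LDT definition may differ.

```python
import math


def trajectory_edit_distance(a, b, eps):
    """Levenshtein-style distance between two trajectories.

    a, b: lists of (x, y, z) tuples.
    Assumption (not from the paper): two points "match" at cost 0 when their
    Euclidean distance is at most eps, otherwise substitution costs 1;
    insertions and deletions cost 1, as in the classic string edit distance.
    """
    n, m = len(a), len(b)
    # dp[i][j] = distance between the first i points of a and the first j of b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if math.dist(a[i - 1], b[j - 1]) <= eps else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete a point of a
                dp[i][j - 1] + 1,         # insert a point of b
                dp[i - 1][j - 1] + cost,  # match or substitute
            )
    return dp[n][m]


def classify(query, references, eps):
    """Hypothetical helper: label of the nearest reference trajectory.

    references: list of (label, trajectory) pairs.
    """
    return min(references, key=lambda r: trajectory_edit_distance(query, r[1], eps))[0]
```

For example, with `eps = 0.1` a three-point trajectory and a four-point version that adds one extra sample at the end differ by a single insertion. Thresholding point matches keeps the distance tolerant of small tracking noise, which is one plausible reason to prefer an edit distance over a point-wise Euclidean comparison.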

Session 2 - Monitoring and Surveillance / Chair: James Ferryman
Wednesday, 14th May, 09:00-10:15

  • The SAFEE On-Board Threat Detection System

    Under the framework of the European Union-funded SAFEE project, this paper gives an overview of a novel monitoring and scene analysis system developed for use on board aircraft in spatially constrained environments. The techniques discussed herein aim to warn on-board crew about predetermined indicators of threat intent (such as running or shouting in the cabin), as elicited from industry and security experts. The subject matter experts believe that activities such as these are strong indicators of the beginnings of undesirable chains of events, which should not be allowed to develop aboard aircraft. These events may lead to situations involving unruly passengers or be indicative of the precursors to terrorist threats. With a state-of-the-art tracking system using homography intersections of motion images, and probability-based Petri nets for scene understanding, the SAFEE behavioural analysis system automatically assesses the output from multiple intelligent sensors and creates recommendations that are presented to the crew using an integrated airborne user interface. The system is evaluated within a full-size aircraft mock-up, and experimental results are presented showing that the SAFEE system is well suited to monitoring people in confined environments, and that meaningful and instructive output regarding human intentions can be derived from the sensor network within the cabin.

    Nicholas Carter, James Ferryman
  • Region of Interest Generation in Dynamic Environments Using Local Entropy Fields

    This paper presents a novel technique to generate regions of interest in image sequences containing independent motions. The technique uses a novel motion segmentation method to detect independent relative motions using a local entropy field. Local entropy values are computed for each vector of the optical flow with respect to its neighborhood. These values are used as input to a two-state Markov Random Field that is used to discriminate the boundaries of the clusters. The idea is to exploit the local entropy values as highly informative cues about the amount of information contained in each vector's neighborhood: high values represent significant motion differences, while low values express uniform movement. After graph-cut labeling, the cluster motion information is used to create a multiple-hypothesis prediction for the following frame, based on a greedy nearest-neighbor data association technique. The algorithm can be used as a hypothesis generator for areas of interest because it already gives results within two frames. To show the validity of the proposed algorithm, experiments have been performed on standard datasets and in a real-world outdoor environment, showing promising results.

    Luciano Spinello, Roland Siegwart
  • Real-time Face Tracking for Attention Aware Adaptive Games

    This paper presents a real-time face tracking and head pose estimation system that is included in an attention-aware game framework. This fast tracking system enables the detection of the player’s attentional state using a simple attention model. This state is then used to adapt the unfolding of the game in order to enhance the user’s experience (in the case of an adventure game) and improve the game’s attentional attractiveness (in the case of a pedagogical game).

    Matthieu Perreira Da Silva, Vincent Courboulay, Armelle Prigent, Pascal Estraillier
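
The abstract of "Region of Interest Generation in Dynamic Environments Using Local Entropy Fields" computes a local entropy value for each optical-flow vector with respect to its neighborhood. The sketch below is one plausible reading of that step only, assuming entropy is measured over a histogram of quantized flow directions in a square window; the bin count `n_bins`, the window `radius`, and the choice to ignore flow magnitude are assumptions, not details from the abstract. The Markov Random Field, graph-cut, and data association stages are not shown.

```python
import math
from collections import Counter


def direction_bin(u, v, n_bins):
    """Quantize a flow vector's direction into one of n_bins angular sectors.

    Simplification: zero-length vectors fall into bin 0 (atan2(0, 0) == 0).
    """
    angle = math.atan2(v, u) % (2 * math.pi)
    # The final modulo guards against angle rounding up to exactly 2*pi.
    return int(angle / (2 * math.pi) * n_bins) % n_bins


def local_entropy_field(flow, radius=1, n_bins=8):
    """Shannon entropy (in bits) of flow directions around each pixel.

    flow: 2D list (rows x cols) of (u, v) tuples.
    Assumption: square (2*radius+1)^2 window; border pixels only use the
    neighbors that fall inside the image.
    """
    rows, cols = len(flow), len(flow[0])
    bins = [[direction_bin(u, v, n_bins) for (u, v) in row] for row in flow]
    field = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            counts = Counter(
                bins[yy][xx]
                for yy in range(max(0, y - radius), min(rows, y + radius + 1))
                for xx in range(max(0, x - radius), min(cols, x + radius + 1))
            )
            total = sum(counts.values())
            # H = sum p * log2(1/p): 0 for uniform motion, larger at motion boundaries
            field[y][x] = sum((c / total) * math.log2(total / c) for c in counts.values())
    return field
```

On a field whose left half moves right and right half moves left, pixels away from the boundary get entropy 0, while a pixel whose window sees six right-moving and three left-moving vectors gets about 0.918 bits, matching the abstract's description of high values at motion differences and low values for uniform movement.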

Session 3 - Computer Vision Architectures / Chairs: Marios Savvides, Markus Vincze
Wednesday, 14th May, 11:00-12:40

  • Feature Extraction and Classification by Genetic Programming

    This paper explores the use of genetic programming for constructing vision systems. A two-stage approach is used, with separate evolution of the feature extraction and classification stages. The strategy taken for the classifier is to evolve a set of partial solutions, each of which works for a single class. It is found that this approach is significantly faster than conventional genetic programming, and frequently results in a better classifier. The effectiveness of the approach is explored on three image classification problems.

    Olly Oechsle, Adrian Clark
  • GPU-based Multigrid: Real-Time Performance in High Resolution Nonlinear Image Processing

    Multigrid methods provide fast solvers for a wide variety of problems encountered in computer vision. Recent graphics hardware is ideally suited for the implementation of such methods, but this potential has not yet been fully realized. Typically, work in that area focuses on linear systems only, or on implementations of numerical solvers that are not as efficient as multigrid methods. We demonstrate that nonlinear multigrid methods can be used to great effect on modern graphics hardware. Specifically, we implement two applications: a nonlinear denoising filter and a solver for variational optical flow. We show that performing these computations on graphics hardware is between one and two orders of magnitude faster than comparable CPU-based implementations.

    Harald Grossauer, Peter Thoman
  • Attention Modulation using Short- and Long-term Knowledge

    A fast and reliable visual search is crucial for representing visual scenes. Here, the modulation of bottom-up attention plays an important role. Often, knowledge about target features is used to bias the bottom-up pathway. In this paper we propose a system that not only makes use of knowledge about the target features, but also uses already acquired knowledge about objects in the current scene to speed up the visual search. The main ingredients are a relational short-term memory in connection with a semantic relational long-term memory, and an adjustable bottom-up saliency. The focus of this work is to investigate mechanisms for using the memory of the system efficiently. We show a proof-of-concept implementation that works in a real-world environment and performs visual search tasks. It becomes clear that using the relational semantic memory in combination with spatial and feature modulation of the bottom-up path is beneficial for speeding up such search tasks.

    Sven Rebhan, Florian Roehrbein, Julian Eggert, Edgar Koerner
  • PCA Based 3D shape Reconstruction of Human Foot Using Multiple Viewpoint Cameras

    This article describes a multiple-camera-based method to reconstruct the 3D shape of a human foot. From a foot database, an initial 3D model of the foot, represented by a cloud of points, is built. In addition, some shape parameters, which characterize any foot at more than 92%, are defined using Principal Component Analysis. Then, the 3D model is adapted to the foot of interest captured in multiple images, based on "active shape models" methods, by applying some constraints (for example, edge point distance and color variance). We focus here on the experimental part, where we demonstrate the efficiency of the proposed method on a plastic foot model and on real human feet with various shapes. We compare different ways to texture the foot and conclude that using projectors can drastically improve the reconstruction accuracy. Based on the experimental results, we finally propose some improvements regarding system integration.

    Edmee Amstutz, Tomoaki Teshima, Makoto Kimura, Masaaki Mochimaru, Hideo Saito

Session 4 - Calibration and Registration / Chair: Fiora Pirri
Wednesday, 14th May, 14:30-15:45

  • A System for Geometrically Constrained Single View Reconstruction

    This paper presents an overview of a system for recovering 3D models corresponding to scenes for which only a single perspective image is available. The system encompasses a versatile set of semi-automatic single-view reconstruction techniques and couples them with limited interactive user input in order to reconstruct textured 3D graphical models corresponding to the imaged input scenes. Such 3D models can serve as the digital content for supporting interactive multimedia and virtual reality applications. Furthermore, they can support novel applications in areas such as video games, 3D photography, visual metrology, computer-assisted study of art, and forensics.

    Manolis Lourakis
  • Monocular Omnidirectional Visual Odometry for Outdoor Ground Vehicles

    In this paper, we describe a real-time algorithm for computing the ego-motion of a vehicle relative to the road. The algorithm uses as its only input the images provided by a single omnidirectional camera mounted on the roof of the vehicle. The front end of the system consists of two different trackers. The first is a homography-based tracker that detects and matches robust scale-invariant features that most likely belong to the ground plane. The second uses an appearance-based approach and gives high-resolution estimates of the rotation of the vehicle. This 2D pose estimation method has been successfully applied to videos from an automotive platform. We give an example of a camera trajectory estimated purely from omnidirectional images over a distance of 450 meters. For performance evaluation, the estimated path is superimposed onto a Google Earth image of the same test environment. Finally, we use image mosaicing to obtain a textured 2D reconstruction of the estimated path.

    Davide Scaramuzza, Roland Siegwart
  • Eyes and Cameras Calibration for 3D World Gaze Detection

    Gaze tracking is a promising research area with applications ranging from advanced human-machine interaction systems to the study, modeling, and use of human attention processes in cognitive vision. In this paper we propose a novel approach for the calibration and use of a head-mounted dual-eye gaze tracker. Key aspects are a robust pupil tracking algorithm based on prediction from the position of the infrared LED Purkinje image, and a new gaze localization method based on the computation of 3D lines of sight and on trifocal geometry considerations.

    Stefano Marra, Fiora Pirri

Session 5 - Cross Modal Systems / Chair: Monique Thonnat
Wednesday, 14th May, 18:30-19:45

  • Object Category Detection using Audio-visual Cues

    Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection.

    Jie Luo, Barbara Caputo, Alon Zweig, Joerg-Hendrik Bach, Joern Anemueller
  • Multimodal Interaction Abilities for a Robot Companion

    Among the cognitive abilities a robot companion must be endowed with, human perception and speech understanding are both fundamental in the context of multimodal human-robot interaction. This paper presents the two components we have developed and integrated on a mobile robot. First, we detail an interactively distributed multiple-object tracker dedicated to two-handed gestures and head location in 3D. The on-board speech understanding system is then described. For both components, on-line and off-line evaluations on data acquired by the robot highlight their relevance. Implementation and preliminary experiments on a household robot companion are then demonstrated. These illustrate how vision can assist speech by specifying location references and object/person IDs in verbal statements, in order to interpret natural deictic commands given by human beings. Extensions of our work are finally discussed.

    Brice Burger, Isabelle Ferrané, Frederic Lerasle

Session 6 - Object Recognition and Tracking / Chair: Bernd Neumann
Thursday, 15th May, 09:00-10:40

  • Diagnostic System for Intestinal Motility Dysfunctions Using Video Capsule Endoscopy

    Wireless Video Capsule Endoscopy is a clinical technique consisting of the analysis of images of the intestine provided by an ingestible device with an attached camera. In this paper we propose an automatic system to diagnose severe intestinal motility dysfunctions using video endoscopy data. The system applies computer vision techniques within a machine learning framework in order to characterize diverse motility events from video sequences. We present experimental results that demonstrate the effectiveness of the proposed system and compare them with the ground truth provided by gastroenterologists.

    Santi Seguí, Laura Igual, Fernando Vilariño, Petia Radeva, Carolina Malagelada
  • Detecting and Recognizing Abandoned Objects in Crowded Environments

    In this paper we present a framework for detecting and recognizing abandoned objects in crowded environments. The two main components of the framework are background change detection and object recognition. Moving blocks are detected using dynamic thresholding of spatiotemporal texture changes. The background change detection is based on analyzing wavelet transform coefficients of non-overlapping and non-moving 3D texture blocks. The detected background change becomes the region of interest, which is scanned to recognize various objects under surveillance, such as abandoned luggage. The object recognition is based on model histogram ratios of image gradient magnitude patches. Supervised learning of the objects is performed by a support vector machine. Experimental results are demonstrated on various benchmark video sequences (PETS, CAVIAR, i-Lids) and an object category dataset (CalTech256).

    Roland Miezianko, Dragoljub Pokrajac
  • An Approach for Tracking the 3D Object Pose Using Two Object Points

    In this paper, a novel and simple approach is presented for tracking the object pose (position and orientation) using two object points when the object is rotated about one of the axes of the reference coordinate system. The object rotation angle can be tracked over a range of up to 180° for rotations around each axis of the reference coordinate system from an initial object situation. The two object points considered are arbitrary points of the object that can be uniquely identified in stereo images. Since the approach requires only two object points, it is advantageous for robotic applications where very few feature points can be obtained because of a lack of pattern information on the objects. The paper also presents results for the pose estimation of a meal tray in a rehabilitation robotics environment.

    Sai Krishna Vuppala, Axel Gräser
  • Adaptive Motion-based Gesture Recognition Interface for Mobile Phones

    In this paper, we introduce a new vision-based interaction technique for mobile phones. The user operates the interface by simply moving a finger in front of a camera. During these movements, the finger is tracked using a method that embeds the Kalman filter and Expectation Maximization (EM) algorithms. Finger movements are interpreted as gestures using Hidden Markov Models (HMMs). This involves first creating a generic model of each gesture and then using unsupervised Maximum a Posteriori (MAP) adaptation to improve the recognition rate for a specific user. Experiments conducted on a recognition task involving simple control commands clearly demonstrate the performance of our approach.

    Jari Hannuksela, Mark Barnard, Pekka Sangi, Janne Heikkilä
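
The abstract of "Adaptive Motion-based Gesture Recognition Interface for Mobile Phones" mentions a finger tracker that embeds the Kalman filter. As a rough illustration of only the Kalman part (the EM component and the HMM gesture models are not shown), here is a textbook constant-velocity Kalman filter applied independently to the x and y image coordinates. The unit time step, the white-acceleration noise `q`, the measurement noise `r`, the initial covariance, and the names `ConstantVelocityKalman1D` and `track` are assumptions for this sketch, not details from the paper.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] for one image axis.

    Assumptions: time step dt = 1 frame, white-acceleration process noise q,
    position-only measurements with noise variance r.
    """

    def __init__(self, x0, q=1.0, r=4.0):
        self.p, self.v = float(x0), 0.0           # state estimate
        self.P = [[r, 0.0], [0.0, 100.0]]         # initial covariance (velocity unknown)
        self.q, self.r = q, r

    def predict(self):
        # State transition F = [[1, 1], [0, 1]]: position += velocity
        self.p += self.v
        P, q = self.P, self.q
        # P' = F P F^T + Q, with Q = q * [[1/4, 1/2], [1/2, 1]] for dt = 1
        self.P = [
            [P[0][0] + P[0][1] + P[1][0] + P[1][1] + q / 4, P[0][1] + P[1][1] + q / 2],
            [P[1][0] + P[1][1] + q / 2, P[1][1] + q],
        ]
        return self.p

    def update(self, z):
        # Measurement model H = [1, 0]: only the position is observed
        P = self.P
        s = P[0][0] + self.r                      # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s         # Kalman gain
        y = z - self.p                            # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]
        return self.p


def track(points, q=1.0, r=4.0):
    """Filter a sequence of (x, y) detections; returns filtered positions."""
    fx = ConstantVelocityKalman1D(points[0][0], q, r)
    fy = ConstantVelocityKalman1D(points[0][1], q, r)
    out = [(fx.p, fy.p)]
    for x, y in points[1:]:
        fx.predict()
        fy.predict()
        out.append((fx.update(x), fy.update(y)))
    return out
```

Treating the two axes as independent filters keeps the matrices 2x2 so the update can be written out by hand; a joint 4D state would be needed if the axes were coupled. In practice the predicted position from `predict()` is what a tracker would use to choose the search window in the next frame.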

Session 7 - Learning / Chair: Antonios Argyros
Thursday, 15th May, 11:20-12:35

  • A System that Learns to Tag Videos by Watching Youtube

    We present a system that automatically tags videos, i.e. detects high-level semantic concepts like objects or actions in them. To do so, and in contrast to previous work, our system does not rely on datasets manually annotated for research purposes. Instead, we propose to use videos from online portals like youtube.com as a novel source of training data, where tags provided by users during upload serve as ground truth annotations. This allows our system to learn autonomously by automatically downloading its training set. The key contribution of this work is a number of large-scale quantitative experiments on real-world online videos, in which we investigate the influence of the individual system components and how well our tagger generalizes to novel content. Our key results are: (1) fair tagging results can be obtained by a late fusion of several kinds of visual features; (2) using more than one keyframe per shot is helpful; (3) to generalize to different video content (e.g., another video portal), the system can be adapted by expanding its training set.

    Adrian Ulges, Christian Schulze, Daniel Keysers, Thomas M. Breuel
  • Geo-located Image Grouping Using Latent Descriptions

    Image categorization is undoubtedly one of the most challenging problems faced in Computer Vision. The scientific literature is full of methods dedicated to specific classes of images; further, commercial systems are also beginning to appear on the market. Nowadays, additional data can also be associated with images, enriching their semantic interpretation beyond pure appearance. This is the case of geo-location data, which contain information about the geographical place where an image has been captured. These data allow, if not require, a different management of the images, for instance for the purpose of easy retrieval and visualization from a geo-referenced image repository. This paper constitutes a first step in this direction, presenting a method for geo-referenced image categorization. The solution presented here belongs to the wide literature on statistical latent descriptions, of which probabilistic Latent Semantic Analysis (pLSA) is one of the best-known representatives. In particular, we extend the pLSA paradigm, introducing a latent variable modelling the geographical area in which an image has been captured. In this way, we are able to describe the entire image dataset, effectively grouping proximal images with similar appearance. Experiments on categorization have been carried out employing a well-known geographical image repository: results are very promising, opening new interesting challenges and applications in this research field.

    Marco Cristani, Vittorio Murino
  • Functional Object Class Detection Based on Learned Affordance Cues

    Current approaches to visual object class detection mainly focus on the recognition of abstract object categories, such as cars, motorbikes, mugs and bottles. Although these approaches have demonstrated impressive recognition performance, their restriction to abstract categories seems artificial and inadequate in the context of embodied, cognitive agents. In that context, distinguishing objects according to functional aspects based on object affordances is vital for meaningful human-machine interaction. In this paper, we propose a complete system for the detection of functional object classes, based on a representation of visually distinct hints on object affordances (affordance cues). It spans the complete cycle from tutor-driven acquisition of affordance cues, through one-shot learning of corresponding object models, to detecting novel instances of functional object classes in real images.

    Michael Stark, Philipp Lies, Michael Zillich, Bernt Schiele