Postdoctoral research in the field of Autonomous Vision from UAVs
Democritus University of Thrace (DUTH), School of Engineering
Hello and welcome to my site. My name is Loukas Bampis and I am currently a Postdoctoral Fellow in the field of Robotics Vision at the Democritus University of Thrace. Guided by my advisor, Professor Antonios Gasteratos, I focus on real-time Place Recognition and Localization tasks for mobile robotic systems.
Nowadays, one of the main directions of the Computer Vision research community is the deployment of existing, computationally expensive algorithms on mobile devices, such as mobile phones, tablets and low-power processing boards. These devices are well established in our lives and provide cheap computational power and sensors. Lately, in order to extend computational capabilities, manufacturers prefer to produce devices with parallel processing units rather than to increase processing speeds. Two of the most popular hardware solutions capable of parallelization are SIMD-CPUs and GP-GPUs.
Furthermore, alongside the growing parallelization capabilities of the available processing units, Estimation Theory can be applied to a plethora of Computer Vision algorithms, such as Simultaneous Localization and Mapping (SLAM) and image description, to enhance their accuracy while reducing their computational cost.
These are the two main axes of my research, which I have pursued both in my scientific publications and during my research scholarship.
The Laboratory of Robotics and Automation performs and promotes research on application problems that arise in the areas of robotics, computer vision, navigation, multimodal integration, image analysis and understanding, visual surveillance, intelligent sensory networks, sensory fusion, and other related topics.
The lab utilizes state-of-the-art tools to expand the scientific and technological front in the respective research areas, namely: artificial vision (including cognitive and robot vision); intelligent systems (such as artificial neural networks); pattern recognition.
During my scholarship with Professor Stergios Roumeliotis (member of my Ph.D. advisory committee) at his lab at the University of Minnesota in Minneapolis (MarsLab), I contributed to his research activities by implementing a variety of accelerated versions of Computer Vision and Estimation Theory algorithms on GP-GPU and SIMD-CPU systems. Additionally, I had the chance to study some of the most interesting subjects of my research area, such as Robotics, Probability, Linear Algebra and Estimation Theory.
Moreover, I participated in Google’s Project Tango (currently incorporated into ARCore), which created a mobile device capable of localizing itself and estimating the structure of its surroundings in real time in indoor environments where GPS measurements are not available. This was achieved using RGB/RGBD cameras, accelerometers, gyroscopes and many other low-cost sensors, all included in one device.
I have worked on the HCUAV project, run by the research committee of the Democritus University of Thrace, which is funded by the European Commission. This project developed a surveillance Medium-Altitude Long-Endurance UAV (MALE UAV) capable of flying autonomously for long distances while creating an orthorectified and georeferenced map of the world underneath.
My task in this project included the creation of the aforementioned map and the detection of ambiguous events. Meeting those requirements calls for SLAM and Deep Learning techniques designed with a real-time implementation in mind.
11SYNERGASIA_9_629, National Strategic Reference Framework (NSRF) – Competitiveness & Entrepreneurship – SYNERGASIA 2011
I have also worked on the AVERT project. This project developed a security system capable of removing any blocking or suspect vehicle from vulnerable positions such as enclosed infrastructure spaces, tunnels, low bridges as well as under-building and underground car parks. Vehicles can be removed from confined spaces with delicate handling, swiftly, and in any direction to a safer disposal point to reduce or eliminate collateral damage to infrastructure and personnel. Remote operation, self-powered and onboard sensors provide a new capability that can operate alongside existing technologies, thereby enhancing bomb disposal response, speed and safety.
My task in this project included the development of place recognition algorithms in order for the system to be able to recover its position in localization failure scenarios.
FP7-SEC-2011-1-285092, European Commission – Information & Communication Technologies (ICT).
I worked on the NOAH project for the European Space Agency (ESA). The NOAH activity advanced the state of the art in novelty detection techniques, specifically for Mars-like environments.
My work on this project involved investigating novel CNN-based techniques for segmenting and classifying regions of interest.
T713-503MM, European Space Agency
I am also working on the development of the YPOPSEI project, run by the research committee of the Athena Research and Innovation Centre, which is funded by the European Union and Greece. YPOPSEI refers to a user application with which a museum visitor can obtain information about an exhibit of interest simply by photographing it with a mobile device.
My task in this project includes the development of a recognition architecture that combines the cognitive capabilities of Convolutional Neural Networks with the Bags of Visual Words model.
MIS code:5006383, Human Resources Development, Education and Lifelong Learning
I am currently working on the MPU project, run by the research committee of the Democritus University of Thrace, which is funded by the European Union and Greece. The goal of MPU is to develop a commercial multipurpose UAV with a small form factor, which will be able to perform long-range autonomous missions.
My work on this project is to research methods for autonomous obstacle avoidance, as well as to coordinate their development and implementation on the UAV's low-power processing unit.
You can find out more about my research by referring to my scientific papers.
In this paper, a novel pipeline for loop-closure detection is proposed. We base our work on a bag of binary feature words and we produce a description vector capable of characterizing a physical scene as a whole. Instead of relying on single camera measurements, the robot’s trajectory is dynamically segmented into image sequences according to its content. The visual word occurrences from each sequence are then combined to create sequence-visual-word-vectors and provide additional information to the matching functionality. In this way, scenes with considerable visual differences are firstly discarded, while the respective image-to-image associations are provided subsequently. With the purpose of further enhancing the system’s performance, a novel temporal consistency filter (trained offline) is also introduced to advance matches that persist over time. Evaluation results prove that the presented method compares favorably with other state-of-the-art techniques, while our algorithm is tested on a tablet device, verifying the computational efficiency of the approach.
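The idea of combining the visual words of a whole image sequence into one vector can be illustrated with a minimal sketch. The function names, the toy word IDs and the L1-based score below are assumptions made for illustration only; they are not taken from the paper, which additionally uses binary features, dynamic sequence segmentation and a temporal consistency filter.

```python
from collections import Counter


def sequence_vector(image_words):
    """Sum the visual-word counts of every image in a sequence, then L1-normalize."""
    counts = Counter()
    for words in image_words:
        counts.update(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def l1_similarity(u, v):
    """Score in [0, 1] for L1-normalized vectors: 1 - 0.5 * ||u - v||_1."""
    keys = set(u) | set(v)
    return 1.0 - 0.5 * sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)


def best_sequence(query, database):
    """Return (index, score) of the database sequence most similar to the query."""
    q = sequence_vector(query)
    scores = [l1_similarity(q, sequence_vector(seq)) for seq in database]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data (hypothetical): each image is a list of visual-word IDs.
corridor = [[1, 2, 3], [2, 3, 4]]
lobby = [[7, 8, 9], [8, 9, 9]]
revisit = [[2, 3, 4], [1, 2, 3]]  # same words as `corridor`, images in another order

index, score = best_sequence(revisit, [corridor, lobby])
print(index, round(score, 3))
```

In such a scheme, image-to-image matching would only be attempted inside the best-scoring sequence, which is what keeps the search cheap on long trajectories.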
In the field of loop closure detection, the most conventional approach is based on the Bag-of-Visual-Words (BoVW) image representation. Although well-established, this model rejects the spatial information regarding the local feature points' layout and performs the associations based only on their similarities. In this paper we propose a novel BoVW-based technique which additionally incorporates the operational environment's structure into the description, treating bunches of visual words with similar optical flow measurements as single similarity votes. The presented experimental results prove that our method offers superior loop closure detection accuracy while still ensuring real-time performance, even in the case of a low power consuming mobile device.
In this paper we propose a novel technique for detecting loop closures on a trajectory by matching sequences of images instead of single instances. We build upon well-established techniques for creating a bag of visual words with a tree structure and we introduce a significant novelty by extending these notions to describe the visual information of entire regions using Visual-Word-Vectors. The fact that the proposed approach does not rely on a single image to recognize a site allows for more robust place recognition, and consequently loop closure detection, while reducing the computational complexity for long trajectory cases. We present evaluation results for multiple publicly available indoor and outdoor datasets using Precision-Recall curves, which reveal that our method outperforms other state-of-the-art algorithms.
In this paper, a novel visual Place Recognition (vPR) approach is evaluated based on a visual vocabulary of the Color and Edge Directivity Descriptor (CEDD) in order to address the loop closure detection task. Even though CEDD was initially designed so as to globally describe the color and texture information of an input image addressing Image Indexing and Retrieval tasks, its scalability on characterizing single feature points has already been proven. Thus, instead of using CEDD as a global descriptor, we adopt a bottom-up approach and utilize its localized version, Local Color And Texture dEscriptor (LoCATe), as an input to a state-of-the-art visual Place Recognition technique based on Visual Word Vectors. Also, we employ a parallel execution pipeline based on a previous work of ours using the well-established GPGPU computing. Our experiments show that the usage of CEDD as a local descriptor produces high accuracy vPR results, while the parallelization employed allows for a real-time implementation even in the case of a low-cost mobile device.
This paper introduces a new super-resolution algorithm based on machine learning along with a novel hybrid implementation for next generation mobile devices. The proposed super-resolution algorithm entails a two-dimensional polynomial regression method using only the input image properties for the learning task. Model selection with regularization is applied to define the optimal polynomial degree and avoid overfitting. Although it is widely believed that machine learning algorithms are not appropriate for real-time implementation, this paper shows that there are indeed specific hypothesis representations that can be integrated into real-time mobile applications. To achieve this goal, the increasing GPU employment in modern mobile devices is exploited. More precisely, by utilizing the mobile GPU as a co-processor in a hybrid pipelined implementation, significant performance speedup along with superior quantitative results can be achieved.
The detection of ambiguous objects, although challenging, is of great importance for any surveillance system and especially for an Unmanned Aerial Vehicle (UAV), where the measurements are affected by the great observing distance. Wildfire outbursts and illegal migration are only some examples of events that such a system should distinguish and report to the appropriate authorities. More specifically, Southern European countries commonly suffer from those problems due to the mountainous terrain and thick forests that they contain. UAVs like the one developed in the "Hellenic Civil Unmanned Air Vehicle - HCUAV" project have been designed to address high-altitude detection tasks and patrol the borders and woodlands for any ambiguous activity. In this paper, a moment-based blob detection approach is proposed that utilizes the thermal footprint obtained from single infrared (IR) images and distinguishes human or fire sized and shaped figures. Our method is specifically designed to be integrated into hardware acceleration devices, such as GPGPUs and FPGAs, and takes full advantage of their respective parallelization capabilities, achieving real-time performance and energy efficiency. The timing evaluation of the proposed hardware-accelerated adaptations of the algorithm shows a speedup of up to 7 times compared to a highly optimized CPU-only version.
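As a rough illustration of what a moment-based blob description looks like, the sketch below thresholds a toy thermal image and computes the area, centroid and second-order central moments of the hot region. The threshold value, the helper names and the elongation measure are assumptions for illustration; the sketch also treats all hot pixels as one blob and is plain CPU code, not the accelerated method of the paper.

```python
def threshold(image, level):
    """Binary mask of pixels at or above `level` (the level is a hypothetical value)."""
    return [[1 if v >= level else 0 for v in row] for row in image]


def blob_moments(mask):
    """Area, centroid and normalized central moments of all set pixels in a mask."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pixels)
    if m00 == 0:
        return None
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / m00
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / m00
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / m00
    return {"area": m00, "centroid": (cx, cy), "mu20": mu20, "mu02": mu02, "mu11": mu11}


def elongation(m):
    """Ratio of the principal second-order moments (1 means an isotropic blob)."""
    common = (m["mu20"] + m["mu02"]) / 2
    delta = (((m["mu20"] - m["mu02"]) / 2) ** 2 + m["mu11"] ** 2) ** 0.5
    minor = common - delta
    return float("inf") if minor <= 0 else (common + delta) / minor


# Toy thermal frame: a warm 4x2 region on a cool background.
thermal = [
    [20, 21, 20, 22, 20, 21, 20],
    [20, 90, 95, 93, 91, 21, 20],
    [21, 92, 96, 94, 90, 20, 21],
    [20, 21, 20, 22, 20, 21, 20],
]
m = blob_moments(threshold(thermal, 60))
print(m["area"], m["centroid"], elongation(m))
```

Size (area) and shape (elongation) features of this kind could then be compared against ranges expected for a person or a fire at the current flight altitude. Because each moment is a sum over pixels, the computation maps naturally onto parallel reductions.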
A visual anomaly detection algorithm typically incorporates two modules, namely the Region Of Interest (ROI) detection and the ROI characterization. During the first module, salient image regions are identified containing semantically distinctive entities, while the second one refers to the characterization of the detected ROIs as "novel" (unable to be classified) or "known" (capable of being classified into one of the pre-trained classes). In this paper, we are interested in investigating a unified solution that treats the ROI detection and characterization functionalities as a single task using the classification properties of Convolutional Neural Networks (CNNs).
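To make the "novel" versus "known" distinction concrete, here is a minimal sketch of one common baseline: label a region as novel when the classifier's highest softmax probability falls below a confidence threshold. This rule, the threshold of 0.8 and the class names are assumptions for illustration and are not claimed to be the approach investigated in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def characterize(logits, class_names, min_confidence=0.8):
    """Return (label, confidence); the label is "novel" when confidence is low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "novel", probs[best]
    return class_names[best], probs[best]


# Hypothetical pre-trained classes and CNN outputs for two regions of interest.
classes = ["rock", "sand", "sky"]
print(characterize([8.0, 0.5, 0.2], classes))  # one class clearly dominates
print(characterize([1.0, 1.1, 0.9], classes))  # no class stands out
```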
This paper presents a georeferenced map extraction method, for Medium-Altitude Long-Endurance UAVs. The adopted technique of projecting world points to an image plane is a perfect candidate for a GPU implementation. The achieved high frame rate leads to a plethora of measurements even in the case of a low-power mobile processing unit. These measurements can later be combined in order to refine the output and create a more accurate result.
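Projecting world points onto an image plane suits a GPU because every point is handled independently with the same few arithmetic operations. The sketch below shows that per-point computation for a standard pinhole camera model; the camera pose, the intrinsic parameters and the function name are assumed values for illustration, not the configuration used in the paper.

```python
def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    rotation is a 3x3 world-to-camera matrix (list of rows) and translation a
    3-vector, so that p_camera = rotation * p_world + translation.
    Returns None when the point is not in front of the camera.
    """
    xc, yc, zc = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Assumed setup: camera 1000 m above the origin, looking straight down.
# World axes: x east, y north, z up. Camera axes: x east, y south, z down.
R = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
t = [0, 0, 1000]  # equals -R * camera_position for position (0, 0, 1000)

pixel = project_point((100, 50, 0), R, t, fx=1000, fy=1000, cx=640, cy=360)
print(pixel)
```

On a GPU, one thread per world point (or per output pixel) would evaluate exactly this function, which is why the frame rate scales well even on low-power hardware.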
The continuous increase of illegal migration flows to southern European countries has recently been in the spotlight of the European Union due to numerous deadly incidents. Another common issue that these countries share is Mediterranean wildfires, which are becoming more frequent due to the warming climate and the increasing magnitude of droughts. Different ground-based early warning systems have been funded and developed separately across these countries for these incidents; however, they have proved insufficient, mainly because of the limited surveyed areas and the challenging Mediterranean shoreline and landscape. In 2011, the Greek Government, along with the European Commission, decided to support the development of the first Hellenic Civil Unmanned Aerial Vehicle (HCUAV), which will provide solutions to both illegal migration and wildfires. This paper presents the challenges in the electronics and software design, and especially the solutions under development for the detection of human and fire activity, image mosaicking and orthorectification using commercial off-the-shelf sensors. Preliminary experimental results of the HCUAV medium-altitude remote sensing algorithms show accurate and adequate results using low-cost sensors and electronic devices.
In this paper, we focus on implementing the extraction of a well-known low-level image descriptor using the multicore power provided by general-purpose graphic processing units (GPGPUs). The color and edge directivity descriptor, which incorporates both color and texture information achieving a successful trade-off between effectiveness and efficiency, is employed and reassessed for parallel execution. We are motivated by the fact that image/frame indexing should be achieved in real time, which in our case means that a system should be capable of indexing a frame or an image as it becomes part of a database (ideally, calculating the descriptor as the images are captured). Two strategies are explored to accelerate the method and bypass resource limitations and architectural constraints: an approach that exclusively uses the GPU, and a hybrid implementation that distributes the computations to both available GPU and CPU resources. The first approach is strongly based on the compute unified device architecture and excels compared to all other solutions when the GPU resources are abundant. The second implementation suggests a hybrid scheme where the extraction process is split into two sequential stages, allowing the input data (images or video frames) to be pipelined through the central and the graphic processing units. Experiments were conducted on four different combinations of GPU–CPU technologies in order to highlight the strengths and the weaknesses of all implementations. Real-time indexing is obtained over all computational setups for both GPU-only and Hybrid techniques. An impressive 22 times acceleration is recorded for the GPU-only method. The proposed Hybrid implementation outperforms the GPU-only implementation and becomes the preferred solution when a low-cost setup (i.e., more advanced CPU combined with a relatively weak GPU) is employed.
This paper describes a visual multimedia content encryption approach based on cellular automata (CA), expanding the work proposed in . The presented algorithm relies on an attribute of the eXclusive-OR (XOR) filter, according to which the original content of a cellular neighborhood can be reconstructed following a predefined number of applications of the filter. During the interim time marks, the cellular neighborhood is greatly distorted, making it impossible to recognize. The application of this attribute to the field of image processing results in a strong visual multimedia content encryption approach. Additionally, this paper proposes a new approach for accelerating the application of the XOR filter, taking advantage of the Summed Area Tables (SAT) approach.
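The reconstruction property of the XOR filter can be checked on a small example. The sketch below assumes a 3x3 neighborhood (center included), wrap-around borders and an 8x8 binary grid; these choices are assumptions for illustration and may differ from the configuration in the paper. Under them, the filter is linear over GF(2) and returns the input after 4 passes.

```python
def xor_filter(grid):
    """One pass of a 3x3 XOR filter with wrap-around borders on a grid of 0/1."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            bit = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    bit ^= grid[(r + dr) % n_rows][(c + dc) % n_cols]
            row.append(bit)
        out.append(row)
    return out


def apply_times(grid, steps):
    """Apply the XOR filter `steps` times and return the resulting grid."""
    for _ in range(steps):
        grid = xor_filter(grid)
    return grid


# Toy 8x8 binary "image" (one bit plane of a picture would be handled the same way).
pattern = [[1 if (3 * r + 5 * c) % 7 < 2 else 0 for c in range(8)] for r in range(8)]

interim = apply_times(pattern, 1)   # intermediate, distorted state
restored = apply_times(pattern, 4)  # full cycle for this grid size
print(restored == pattern)
```

In an encryption setting, the number of passes applied before transmission would act as part of the key, and the receiver would apply the remaining passes of the cycle to recover the content.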
This paper presents the application of different spectral methods, like Fourier series and polynomial-based expansions, to Digital Elevation Models (DEMs) in order to fuse their content. Two different fusion techniques: 1) a filter-based one and 2) a weighted average of expansion coefficients, are examined. Their performance is evaluated by using both ground-truth lidar data as well as fusion quality measures. The results point out that polynomial-based spectral expansions perform better than the traditional Fourier approach.
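The second fusion technique, a weighted average of expansion coefficients, can be sketched in one dimension with a plain discrete Fourier transform. Everything below (the 1D elevation profiles, the weight values and the function names) is an assumption for illustration; the paper works on 2D DEMs and also considers polynomial-based expansions.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform of a real or complex sequence (O(n^2))."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT, returning the real part of each sample."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def fuse_profiles(dem_a, dem_b, weight_a):
    """Fuse two elevation profiles by averaging their Fourier coefficients.

    weight_a[k] in [0, 1] is the weight of profile A at frequency index k and
    profile B receives 1 - weight_a[k]. For a real-valued result the weights
    must be symmetric: weight_a[k] == weight_a[n - k].
    """
    fa, fb = dft(dem_a), dft(dem_b)
    fused = [w * a + (1 - w) * b for w, a, b in zip(weight_a, fa, fb)]
    return idft(fused)


# Hypothetical elevation profiles (metres) of the same terrain from two sources.
dem_a = [100.0, 102.0, 105.0, 109.0, 108.0, 104.0, 101.0, 100.0]
dem_b = [101.0, 101.5, 106.0, 108.0, 109.0, 103.0, 102.0, 99.0]

n = len(dem_a)
# Trust source A more at low frequencies and source B more at high frequencies.
weights = [0.8 if min(k, n - k) <= 1 else 0.3 for k in range(n)]
fused = fuse_profiles(dem_a, dem_b, weights)
print([round(v, 2) for v in fused])
```

With the same weight at every frequency this reduces to an ordinary per-sample weighted average; the spectral formulation only pays off when the weights vary with frequency, for example when one source is more reliable for large-scale terrain shape and the other for fine detail.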
This paper introduces a new super-resolution algorithm based on machine learning along with a novel hybrid implementation for next generation mobile devices. The proposed super-resolution algorithm entails a multivariate polynomial regression method using only the input image properties for the learning task. Although it is widely believed that machine learning algorithms are not appropriate for real-time implementation, this paper shows that there are indeed specific hypothesis representations that can be integrated into real-time mobile applications. To achieve this goal, we take advantage of the increasing GPU employment in modern mobile devices. More precisely, we utilize the mobile GPU as a co-processor in a hybrid pipelined implementation, achieving significant performance speedup along with superior quantitative interpolation results.
Image indexing refers to describing the visual multimedia content of a medium, using high-level textual information and/or low-level descriptors. In most cases, images and videos are associated with noisy and incomplete user-supplied textual annotations, possibly due to omission or the excessive cost associated with the metadata creation. In such cases, Content Based Image Retrieval (CBIR) approaches are adopted and low-level image features are employed for indexing and retrieval. We employ the Colour and Edge Directivity Descriptor (CEDD), which incorporates both colour and texture information in a compact representation, and reassess it for parallel execution, utilizing the multicore power provided by General Purpose Graphic Processing Units (GPGPUs). Experiments conducted on four different combinations of GPU-CPU technologies revealed an impressive acceleration when using a GPU, which was up to 22 times faster than the respective CPU implementation, while real-time indexing was achieved for all tested GPU models.
Together with my ongoing research, I have also contributed to the scientific community as a reviewer of conference and journal papers for IROS, ICRA, Electronics Letters and more.
Along with my Ph.D. studies, I have enjoyed several aspects of teaching.
Vehicle passengers detection using CNNs.
Landmark recognition using visual vocabularies.
Visual Inertial SLAM for multiple quadcopters.
A comparative study between local descriptors for place recognition tasks.
An 8-point algorithm SLAM for freely moving cameras.
Loop closure detection for indoor/outdoor applications.
I have taught a class of primary school students the basic principles of Robotics using the educational toolkit provided by Lego Mindstorms. That was probably the most challenging and interesting course that I have experienced so far. It is really exciting to witness such young minds understand and use concepts from robotics theory, like a frame of reference, and I also learned a lot about the overall teaching procedure.
I would be happy to talk to you if you wish to learn more about me and my work.
You can also visit my GitHub repository here.