Model and method of key frame extraction for video data summarization
Electronic Archive of Kharkiv National University of Radio Electronics (Open Access Repository of KHNURE)
Field | Value | |
Creator |
Михнова, Е. Д.
|
|
Date |
2015-03-13T09:43:52Z
2015-03-13T09:43:52Z 2014 |
|
Identifier |
Михнова, Е. Д. Модель и метод поиска ключевых кадров для реферирования видеоданных : автореф. дис. ... канд. техн. наук : 05.13.23 "Системы и средства искусственного интеллекта" / Е. Д. Михнова ; М-во образования и науки Украины, Харьк. нац. ун-т радиоэлектроники. – Харьков, 2014. – 147 с.
http://hdl.handle.net/123456789/1909 |
|
Description |
The dissertation is devoted to a topical area of video data processing: key frame extraction from video sequences of various genres using spatio-temporal segmentation methods. With regard to temporal segmentation of video, the advantages and disadvantages of existing scene boundary detection methods are examined. With regard to spatial segmentation, the field of view of each frame is partitioned by means of Voronoi diagrams, taking into account the color, texture and shape of regions. The method for locating the salient points from which the Voronoi diagrams are built has been improved. The possibilities and specifics of constructing Voronoi diagrams of various orders to represent video content are analyzed. Specialized metrics have been developed that make it possible to compare Voronoi diagrams and to detect significant content changes across the entire video. On the basis of the proposed mathematical model for representing and analyzing video content, a key frame extraction method has been built that not only extracts frames with significant content but also eliminates frames with similar content. The experimental studies confirmed the competitiveness of the new model and method, which have been successfully deployed at an industrial enterprise.
Integration of video endoscopes and cameras into industrial, medical, security and other tracking systems has generated great interest in intelligent video processing. As a rule, monitoring any object requires a huge amount of time and human resources, not only for making decisions that influence the state of the objects under control, but also for long-lasting tracking and identification of interesting or unusual conditions. Video summarization is an effective tool for eliminating redundancy in video, which is achieved by key frame extraction. These static images with significant content provide a brief overview of what happened in hours of video. Key frames can also facilitate indexing, archiving, searching and cataloging of video information. 
Despite the variety of existing key frame extraction methods and content representation models, the main problem they face is the gap between the low-level information retrieved from video and the high-level semantic interpretation required for efficient summarization. Another challenge is the variation in lighting conditions and camera characteristics under which a video is shot. That is why video summarization attracts more and more research and development effort. To overcome these problems, a novel model and method have been developed. Their mathematical basis is the Voronoi diagram, applied to spatial segmentation of frames. The diagram is a decomposition of a plane (a frame, in our case) into Voronoi regions (cells). Each region of a first-order Voronoi diagram is associated with its own salient point and is built under the following rule: the distance from any point of the region to its salient point is less than or equal to its distance to any other salient point. Voronoi diagrams of higher order (or generalized Voronoi diagrams) are built according to the following rule: every point of a region is at least as close to each of the generator points associated with that region as to any other generator point. An arbitrary Voronoi region of order k may contain from 0 to k generator points. A simple first-order Voronoi diagram is a particular case of the higher-order Voronoi diagrams. Voronoi diagrams have several advantages over traditional segmentation into objects. First of all, a Voronoi diagram takes far fewer computational resources to build and, unlike object tracing, it is strictly defined. Higher-order Voronoi diagrams are also more stable under dynamic changes in video content than the salient points themselves. 
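The two construction rules above can be illustrated with a minimal brute-force sketch. It is not the thesis implementation: the function names `order_k_labels` and `regions`, the pixel-grid discretization and the tie-breaking by generator index are all assumptions made for illustration. Each pixel is labeled with the set of its k nearest generator points, and pixels sharing a label form one order-k region; k = 1 gives the ordinary first-order diagram.

```python
from math import hypot


def order_k_labels(width, height, generators, k=1):
    """Label every pixel with the set of indices of its k nearest generators.

    `generators` is a list of (x, y) salient points (hypothetical input).
    Ties in distance are broken by the lower generator index (an assumption).
    """
    labels = {}
    for y in range(height):
        for x in range(width):
            ranked = sorted(
                range(len(generators)),
                key=lambda i: (hypot(x - generators[i][0], y - generators[i][1]), i),
            )
            labels[(x, y)] = frozenset(ranked[:k])
    return labels


def regions(labels):
    """Group pixels by label: each group is one region of the order-k diagram."""
    grouped = {}
    for pixel, label in labels.items():
        grouped.setdefault(label, []).append(pixel)
    return grouped


if __name__ == "__main__":
    points = [(0, 0), (3, 0), (0, 3)]
    first_order = regions(order_k_labels(4, 4, points, k=1))
    second_order = regions(order_k_labels(4, 4, points, k=2))
    print(len(first_order), "first-order regions")
    print(len(second_order), "second-order regions")
```

The brute-force scan costs O(W·H·n log n) for n generators and is meant only to make the definition concrete; a practical implementation would use a dedicated computational-geometry routine.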
Salient points alone carry less information, and their changes from frame to frame are larger and harder to identify. In addition, Voronoi diagrams have never before been used for content representation and key frame extraction, which makes them of great interest in terms of competitiveness with other methods and models of this kind. Throughout the thesis, the constructed model is defined in terms of several metrics for comparing Voronoi diagrams, which represent the content of each frame from different points of view, namely color, texture and shape. Other frame characteristics, such as motion and area of segments, have not been taken into consideration, for reasons given in the thesis. A resulting metric has also been proposed that unites the frame characteristics for machine-level understanding. Each of the metrics is expressed using Voronoi regions and salient points only. The model aims to provide a full representation and description of frame content. A new method based on the model is described in detail. Several improvements have been designed for the method. The first enhances the Harris method (for salient point selection) with k-means clustering, which makes the result invariant to the initial placement of salient points (in terms of quality only, not the time needed for the clustering iterations). The second stabilizes key frame descriptions with higher-order Voronoi diagrams. The third integrates a shot boundary detection method into the proposed key frame extraction procedure. Voronoi diagrams of the highest order turned out to be the most stable under changes in content. Highest-order Voronoi diagrams constructed for frames from the same shot may look almost the same and can therefore also be used for shot boundary detection, along with some of the other methods analyzed. 
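How a combined metric over color, texture and shape could drive key frame selection can be sketched as follows. This is an illustrative assumption, not the metric or procedure defined in the thesis: the descriptor layout (a dict of three numeric vectors), the L1 distance, the equal weights and the names `combined_distance` and `select_key_frames` are all hypothetical. A frame is kept only if it differs from every key frame kept so far by more than a threshold, which removes frames with similar content.

```python
FEATURES = ("color", "texture", "shape")


def combined_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-feature L1 distances between two frame descriptors.

    Each descriptor is a dict mapping a feature name to a list of numbers
    (a hypothetical stand-in for the Voronoi-based descriptions).
    """
    total = 0.0
    for weight, name in zip(weights, FEATURES):
        total += weight * sum(abs(x - y) for x, y in zip(a[name], b[name]))
    return total


def select_key_frames(descriptors, threshold):
    """Return indices of frames that differ from all previously kept key frames.

    The first frame is always kept because there is nothing to compare it with.
    """
    keys = []
    for index, descriptor in enumerate(descriptors):
        if all(
            combined_distance(descriptor, descriptors[k]) > threshold
            for k in keys
        ):
            keys.append(index)
    return keys


if __name__ == "__main__":
    frames = [
        {"color": [0.0], "texture": [0.0], "shape": [0.0]},
        {"color": [0.1], "texture": [0.0], "shape": [0.0]},  # near-duplicate of frame 0
        {"color": [5.0], "texture": [0.0], "shape": [0.0]},  # significant change
        {"color": [0.0], "texture": [0.0], "shape": [0.0]},  # repeats frame 0
    ]
    print(select_key_frames(frames, threshold=1.0))
```

Comparing against all kept key frames, rather than only the previous frame, is one simple way to suppress content that recurs later in the video; the threshold would have to be tuned per collection.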
Frame size and resolution optimization and centroid-based clustering techniques have been studied in order to select the most reasonable solution for key frame extraction. The results have been verified on test samples taken from the open TRECVid collection, the Internet Archive, the Movie Content Analysis Project and the Open Video Project. The test samples also include several commercials and self-made high-definition videos (shot in the city centre). Different estimators have been applied to check the validity, performance and quality of the proposed model and method. The method has also been compared with some of the existing key frame extraction methods. The scientific results have been adopted at the industrial enterprise ELTA, where they reduced the time experts need to process endoscopic material and made it possible to add key frames as images to reports. The scientific results have also been attested by UkrSEPRO and adopted in academic activity at Kharkiv National University of Radio Electronics. |
|
Language |
uk
|
|
Subject |
video summarization; key frame; spatio-temporal segmentation; Voronoi diagram; salient point |
|
Title |
Model and method of key frame extraction for video data summarization
|
|
Type |
Abstract
|
|