This paper centers on the problem of automated visual content classification. To enable classification-based image or visual object retrieval, we propose a new image representation scheme called the visual context descriptor (VCD), a multidimensional vector in which each element represents the frequency of a unique visual property of an image or region. VCD utilizes predetermined quality dimensions (i.e., types of features and quantization levels) and semantic model templates mined a priori. Not only observed visual cues but also contextually relevant visual features are proportionally incorporated in the VCD. The contextual relevance of a visual cue to a semantic class is determined by correlation analysis of ground-truth samples. Such co-occurrence analysis of visual cues requires transforming a real-valued visual feature vector (e.g., a color histogram or Gabor texture) into a discrete event (e.g., a term in text). Good-features-to-track, the rule of thirds, iterative k-means clustering, and TSVQ are used to transform feature vectors into unified symbolic representations called visual terms. Similarity-based visual cue frequency estimation is also proposed and used to ensure the correctness of model learning and matching, since the sparseness of sample data makes frequency estimates of visual cues unstable. The proposed method naturally allows integration of heterogeneous visual, temporal, and spatial cues in a single classification or matching framework, and can easily be integrated with a semantic knowledge base such as a thesaurus or ontology. Robust semantic visual model template creation and object-based image retrieval are demonstrated based on the proposed content description scheme.
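As an illustration of the idea, the following is a minimal sketch of how a VCD might be assembled: observed features are quantized to visual terms against a k-means codebook, and contextually relevant terms are added in proportion to their mined co-occurrence. The codebook, co-occurrence matrix, and blending weight `alpha` are hypothetical stand-ins, not the paper's actual model templates.

```python
# Minimal VCD sketch: quantize features to visual terms, then blend in
# contextually relevant terms via a mined co-occurrence matrix.
import numpy as np

def quantize_to_visual_terms(features, codebook):
    """Map real-valued feature vectors to discrete visual term indices
    by nearest-neighbor lookup in a k-means codebook."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def build_vcd(features, codebook, cooccurrence, alpha=0.3):
    """Count observed visual terms, then add contextually relevant
    terms in proportion to their co-occurrence with observed ones."""
    n_terms = codebook.shape[0]
    observed = np.bincount(quantize_to_visual_terms(features, codebook),
                           minlength=n_terms).astype(float)
    # Each observed term votes for correlated terms through the
    # (row-normalized) co-occurrence matrix mined from ground truth.
    contextual = cooccurrence.T @ observed
    return observed + alpha * contextual
```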
Indexing, retrieval, and delivery of the visual and spatio-temporal properties of video objects require efficient data models and sound operations on those models. However, most object-based video data models address only a single aspect of these properties. In this paper, we present an efficient video object representation method that captures the visual, spatial, and temporal properties of objects in a video in the form of a unified abstract data type. The proposed data type is a polygon mesh, named the video object mesh, defined in a spatio-temporal domain. Based on application needs, the contour of an object is modeled as a polygonal contour. With the contour and color information of the object, content-based triangulation is performed, and a video object in a frame is modeled as a two-dimensional polygon mesh. Color information is embedded in each vertex of the mesh for further use. Using motion analysis, the corresponding vertex in the adjacent frame is identified and connected to the vertex being analyzed. These processes continue until the video object disappears. The result is a three-dimensional polygon mesh that models both location-variant and location-invariant motion, which cannot be captured by traditional trajectory-based motion models. The proposed model is also useful for camera motion analysis, since the surface shape of a video object mesh carries partial information about camera motion.
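To make the data type concrete, here is a minimal sketch of a video object mesh as a linked structure of per-frame triangulations; the field names and the forward-link encoding of frame-to-frame correspondence are our assumptions, not the paper's definition.

```python
# Sketch of the video object mesh: per-frame 2-D triangulations whose
# vertices are linked across frames by motion analysis.
from dataclasses import dataclass, field

@dataclass
class MeshVertex:
    x: float
    y: float
    frame: int
    color: tuple                              # (r, g, b) sampled at the vertex
    next_vertex: "MeshVertex | None" = None   # corresponding vertex in the next frame

@dataclass
class VideoObjectMesh:
    triangles: list = field(default_factory=list)  # (v0, v1, v2) tuples per frame

    def trajectory(self, vertex: MeshVertex):
        """Follow cross-frame links to recover the motion of one vertex;
        the set of all such vertex paths captures the location-variant
        motion that a single object trajectory cannot represent."""
        path = []
        v = vertex
        while v is not None:
            path.append((v.frame, v.x, v.y))
            v = v.next_vertex
        return path
```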
Techniques for content-based image or video retrieval are not yet mature enough to recognize visual semantics completely. Retrieval based on color, size, texture, and shape is within the state of the art. Our experiments on human factors in visual information query and retrieval show that retrieval based on the semantic understanding of visual objects and content is more in demand than retrieval based on visual appearance. Therefore, it is necessary to use captions or text annotations of photos or videos when accessing visual data by content. In this paper, human factors in text and image searching are carefully investigated. Based on the resulting human factors, a framework for integrated querying of visual information and textual concepts is presented. The framework includes ontology-based semantic query expansion through query term rewriting, and database navigation within a conceptual hierarchy in a multimodal querying environment. To allow similarity-based concept retrieval, a new conceptual similarity distance measure between two conceptual entities in a given conceptual space is proposed. The dissimilarity metric is the minimum weighted path length in the corresponding conceptual tree.
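Since the dissimilarity metric is stated as a minimum weighted path length, a sketch follows; it assumes the conceptual tree is encoded as a child-to-(parent, weight) map, and the edge weights themselves are hypothetical.

```python
# Conceptual dissimilarity as the minimum weighted path length between
# two concepts in a tree: sum of the weighted paths from each concept
# up to their lowest common ancestor.
def path_to_root(concept, parent):
    """Return {ancestor: accumulated edge weight} for all ancestors."""
    path, cost = {}, 0.0
    while concept is not None:
        path[concept] = cost
        up = parent.get(concept)       # (parent_concept, edge_weight) or None
        if up is None:
            break
        concept, w = up
        cost += w
    return path

def conceptual_distance(a, b, parent):
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    shared = [c for c in pa if c in pb]
    return min(pa[c] + pb[c] for c in shared)

# Hypothetical conceptual tree: "dog" and "cat" meet at "mammal".
parent = {"dog": ("mammal", 1.0), "cat": ("mammal", 1.0), "mammal": ("animal", 0.5)}
assert conceptual_distance("dog", "cat", parent) == 2.0
```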
KEYWORDS: Video, Visualization, Multimedia, Databases, Information visualization, Image retrieval, Data modeling, Semantic video, Video processing, Visual process modeling
There has been significant progress in the area of content-based still image retrieval systems. However, most existing visual information management systems use static feature analysis models decided by database implementers based on their heuristics, and use indexing-oriented data modeling techniques. In other words, such systems have limitations in areas including scalability, extensibility, and adaptability. In this paper, we attempt to resolve the problems that surface in content modeling, description, and sharing of distributed heterogeneous multimedia information. A language named UCDL, for heterogeneous multimedia content description, is presented to resolve these problems. The resulting UCDL facilitates a formal content modeling and description method for complex multimedia content and the exchange of heterogeneous content information. The proposed language has several advantages. For instance, an individual user can easily create audio-visual descriptions by using a library of automated tools. Users can perform automated testing of content descriptions, and content description becomes implementation independent, thus offering portability across a range of applications from authoring tools to database management systems. Users can have a personalized retrieval view through content filtering, and can easily share the heterogeneous content descriptions of various information sources. In addition, the proposed language can be a part of MPEG-7 DDL.
KEYWORDS: Video, Visualization, Databases, Information visualization, Multimedia, Data modeling, Image retrieval, Video processing, Feature extraction, Systems modeling
There has been significant progress in the area of content-based still image retrieval systems. However, most existing visual information systems use static feature analysis models decided by database implementers based on heuristics, and adopt indexing-oriented data modeling. In other words, such systems fall short in a number of areas, including scalability, extensibility, and adaptability. In this paper, we attempt to resolve the problems that surface in content modeling, description, and sharing of distributed heterogeneous multimedia information. A language named UCDL, for heterogeneous multimedia content description, is presented to resolve these problems. The resulting UCDL facilitates a formal modeling method for complex multimedia content, a unified content description scheme, and the exchange of heterogeneous content information. The proposed language has several advantages. For instance, an individual user can easily create audio-visual descriptions by using a library of automated tools. Users can perform automated testing of content descriptions for correctness and completeness before populating the database. Note that with UCDL, content description becomes implementation independent, thus offering portability across a number of applications from authoring tools to database management systems. Users can have a personalized retrieval view through content filtering, and can easily share the heterogeneous content descriptions of various information sources.
KEYWORDS: Video, Visualization, Feature extraction, Data modeling, Statistical analysis, Principal component analysis, Databases, Systems modeling, Multimedia, Video processing
The large-scale proliferation of multimedia data necessitates sophisticated techniques for accessing information based on content. VideoRoadMap is a new content-based video indexing system for retrieving video clips and images from multimedia databases. The system indexes audio-visual information using spatio-temporal features and information modeling methods. The proposed system employs adaptive similarity measurements based on the contents of media objects, resulting in more accurate retrievals. Principal component analysis and second-order statistical analysis are employed to determine the appropriate combination of weight values in similarity search. In addition, VideoRoadMap includes a powerful multi-faceted querying mechanism that allows queries to be formulated and presented in a variety of modes, including query by example (image and/or video), query by sketch, and query by object motion trajectory.
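One plausible reading of the PCA-based weighting is sketched below: distances are computed in the principal-component space with each axis weighted by the inverse of its variance. This Mahalanobis-style metric is an assumption on our part; the exact weight combination is not specified in the abstract.

```python
# Sketch of adaptive similarity weighting from second-order statistics:
# project onto principal components, weight each axis by 1/variance.
import numpy as np

def pca_projection(samples, n_components):
    """Fit PCA on sample feature vectors (rows) and return the mean,
    top components, and their variances (eigenvalues)."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

def adaptive_distance(q, x, mean, components, eigvals):
    """Distance in decorrelated space; low-variance axes weigh more,
    adapting the metric to the content of the media objects."""
    dq = (q - mean) @ components
    dx = (x - mean) @ components
    return np.sqrt(np.sum((dq - dx) ** 2 / (eigvals + 1e-9)))
```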
Recently, the information resources available in various media have grown rapidly. Many retrieval systems for multimedia information resources have been developed with a focus only on efficiency and performance; consequently, they cannot handle users' preferences and interests well. In this paper, we present the framework design of a personalized image retrieval system (PIRS) that reflects the user's preferences and interests incrementally. The PIRS prototype consists of two major parts: the user preference model (UPM) and the retrieval module (RM). The UPM refines the user's query to meet the user's needs. The RM retrieves appropriate images for the refined query by computing the similarity between each image and the refined query, and the retrieved images are ranked by these similarities. In this paper, we mainly discuss the UPM. Incremental machine learning techniques are employed to give the system user-adaptive and intelligent capabilities. The UPM is implemented with a decision tree based on incremental tree induction and an adaptive resonance theory (ART) network. The user's feedback is returned to the UPM and modifies its internal structure, so the user's iterative retrieval activities with PIRS cause the UPM to be revised toward the user's preferences and interests. Therefore, PIRS adapts to the user's preferences and interests. We have achieved encouraging results in experiments.
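A rough sketch of the feedback loop between the RM and the UPM follows; the simple term-boosting learner stands in for the paper's incremental tree induction and ART network, and all names are illustrative.

```python
# Sketch of the UPM feedback loop: each relevance judgment updates the
# model, which in turn re-weights future queries.
class UserPreferenceModel:
    def __init__(self):
        self.examples = []          # (query_terms, image_id, liked)

    def refine_query(self, query):
        """Re-rank and expand query terms using past positive feedback;
        a real UPM would consult the induced tree / ART clusters instead."""
        boosts = {}
        for terms, _, liked in self.examples:
            if liked:
                for t in terms:
                    boosts[t] = boosts.get(t, 0) + 1
        return sorted(set(query) | set(boosts), key=lambda t: -boosts.get(t, 0))

    def update(self, query, image_id, liked):
        # Each feedback event incrementally revises the model's state.
        self.examples.append((tuple(query), image_id, liked))
```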
This paper addresses a technique for recognizing head gestures. The proposed system is composed of eye tracking and head motion decision. The eye tracking step is divided into face detection and eye location. Face detection obtains the face region using a neural network and a mosaic image representation; eye location then extracts the positions of the eyes from the detected face region. For real-time eye tracking, eye location is performed in the region close to the previous pair of eyes, and if a pair of eyes is not located, face detection is performed again. After eye tracking, the detected eye coordinates are transformed into a normalized vector of x and y coordinates. Three methods are tested for head motion decision: head gesture recognition by direct observation, head gesture recognition using two HMMs, and head gesture recognition using three HMMs. Head gestures can be recognized by directly observing the variation of the vector, but observing that variation with two HMMs yields more accurate recognition. However, because this method does not recognize the neutral head gesture, a three-HMM model trained on a directional vector is adopted. The directional vector represents the direction of head movement and is fed into the HMMs to determine the neutral gesture as well as positive and negative gestures. Combined head gesture recognition using the above three methods is also discussed, and experimental results are reported.
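A minimal sketch of deriving the directional input fed to the HMMs: successive eye-midpoint positions are converted into discrete movement-direction symbols. The eight-way quantization into an HMM observation alphabet is our assumption, not the paper's stated encoding.

```python
# Convert a sequence of tracked eye-midpoint positions into discrete
# movement-direction symbols suitable as HMM observations.
import math

def directional_symbols(eye_centers, n_directions=8):
    """Symbol 0 means no movement; 1..n_directions index the quantized
    angle of head movement between consecutive frames."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(eye_centers, eye_centers[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) < 1e-6 and abs(dy) < 1e-6:
            symbols.append(0)
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        symbols.append(1 + int(angle / (2 * math.pi) * n_directions) % n_directions)
    return symbols
```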
ImageRoadMap is a new content-based retrieval system for retrieving images by visual information. The system provides full capabilities for indexing and retrieval of images, their visual features, and many other diverse data types. We introduce a combination of effective indexing methods based on a novel spatial color distribution model. By utilizing a Self-Organizing Feature Map and other indexing methods, spatial color distribution, the dominant color set, the number of objects, and other visual features can be computed. The system also provides capabilities for similarity measurement and similarity-based indexing. ImageRoadMap includes a powerful multi-faceted querying mechanism that allows queries to be formulated and presented in several different ways. Depending on the characteristics and nature of the query, the user may choose Query by Example, Query by Spatial Color Distribution, Query by Color Contents, Query by Sketch, Query by Concept, or a combination of any of the above. The current interface supports iterative multimodal query formulation in which the user presents whatever relevant information is available through the appropriate windows.
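For illustration, here is a compact Self-Organizing Feature Map training loop of the kind that could underlie such a color distribution index: each map unit converges to a dominant-color prototype. The grid size and learning schedule are illustrative choices, not the system's actual configuration.

```python
# Sketch of fitting a SOM to pixel colors; unit weights become
# dominant-color prototypes and unit activations summarize the
# color distribution of an image.
import numpy as np

def train_som(colors, grid=(8, 8), iters=1000, lr=0.5, seed=0):
    """colors: (n, 3) array of pixel colors. Returns unit weights."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid[0] * grid[1], colors.shape[1]))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(iters):
        x = colors[rng.integers(len(colors))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        sigma = max(grid[0] / 2 * (1 - t / iters), 0.5)     # shrinking neighborhood
        h = np.exp(-d2 / (2 * sigma ** 2))
        weights += lr * (1 - t / iters) * h[:, None] * (x - weights)
    return weights
```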