
Region-Based Image Retrieval with High Level Semantics

Ying Liu, Dengsheng Zhang and Guojun Lu
Gippsland School of Info Tech, Monash University, Churchill, Victoria, 3842
{dengsheng.zhang,
Outline
The Problem
Content-based Image Retrieval (CBIR)
Semantic Gap
Low Level Image Features
Learning Image Semantics Using Decision Tree
Performance Test
Integrate with Google: SIEVE
Experiments and Results
Conclusions

The Problem
We are in a digital world, and we are inundated with digital images.
How should a large image database be organized to facilitate convenient search?
Finding the required images has become a headache for Internet users.
It is a gold mining issue.
Find similar images from the database. Example: Tiger.

Challenges
Images are not as structured as text documents.
Metadata description of an image is not enough.
Human description of image content is subjective.

Content-based Image Retrieval (CBIR)
Represent images with content features.
Color: histogram, dominant color.
Shape: moments, Fourier descriptors, scale space method.
Texture: statistical method, fractal method, spectral method.
Region: blob, arbitrary, block.

CBIR State-of-the-Art
Limited success in a number of specific domains:
Industrial object recognition.
CAD and other design database management.
Museum visual document management.
Trademark retrieval.
No commercial CBIR system for the WWW.

Challenges: Semantic Gap
Conventional content-based image retrieval (CBIR) systems put visual features ahead of textual information.
However, there is a gap between visual features and semantic features (textual information) which cannot be closed easily.
(Courtesy of Md. Monirul Islam)

Cause of the Semantic Gap
Low level features are usually used individually.
A single type of feature cannot describe an image completely.
Spatial information is usually ignored.
Images are usually treated globally, while users are usually interested in objects rather than the whole image.
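The point about single, global low-level features can be illustrated with a small sketch. It builds a global HSV color histogram for an image and compares two images by histogram intersection. The bin layout (8 x 4 x 4), the function names and the toy pixel lists are assumptions made for illustration; they are not taken from the slides.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalised global HSV histogram.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    bins:   number of hue, saturation and value bins (an assumed layout).
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "images" given as flat pixel lists (hypothetical data).
red_image = [(255, 0, 0)] * 100
mixed_image = [(255, 0, 0)] * 50 + [(0, 0, 255)] * 50
shuffled_mixed = list(reversed(mixed_image))  # same colors, different layout

print(histogram_intersection(hsv_histogram(red_image),
                             hsv_histogram(mixed_image)))
print(histogram_intersection(hsv_histogram(mixed_image),
                             hsv_histogram(shuffled_mixed)))
```

The second comparison scores the two layouts as identical because a global histogram discards spatial information, which is one of the causes of the semantic gap listed above.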
Narrow Down the Semantic Gap
Divide an image into objects/regions.
Describe objects/regions using multiple types of features.
Learn semantic concepts from a large number of region samples.
Use the learned semantic concepts to describe images.

Image Segmentation
Segment images into regions using the JSEG technique (Y. Deng and B. S. Manjunath, IEEE PAMI, 2001).
JSEG segments images using a combination of color and texture features.

Region Representation: Color Features
Represent regions using their dominant color in HSV space.
[Figure: segmentation, HSV histogram, and regions with their dominant colors.]

Gabor Filters
\[ G_{mn}(x, y) = \sum_{s} \sum_{t} I(x - s, y - t)\, \psi_{mn}^{*}(s, t) \]
\[ \psi_{mn}(x, y) = a^{-m}\, \psi(\tilde{x}, \tilde{y}) \]
\[ \psi(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[ -\frac{1}{2} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) \right] \exp(j 2\pi W x) \]
\[ \tilde{x} = a^{-m} (x \cos\theta + y \sin\theta), \qquad \tilde{y} = a^{-m} (-x \sin\theta + y \cos\theta) \]
Here \(\psi_{mn}^{*}\) denotes the complex conjugate of \(\psi_{mn}\).

Region Representation: Gabor Texture Features
Gabor texture features are obtained by computing the mean and standard deviation of each filtered image.
\[ E(m, n) = \sum_{x} \sum_{y} |G_{mn}(x, y)|, \qquad m = 1, \ldots, M;\; n = 1, \ldots, N \]
\[ \mu_{mn} = \frac{E(m, n)}{P \times Q} \]
\[ \sigma_{mn} = \frac{\sqrt{\sum_{x} \sum_{y} \left( |G_{mn}(x, y)| - \mu_{mn} \right)^2}}{P \times Q} \]
where \(P \times Q\) is the size of the filtered image.
(B. S. Manjunath and W. Y. Ma, IEEE PAMI, 1996)

Region Similarity Measurement: Earth Mover Distance (EMD)
EMD is a distance modelled on the traditional transportation problem, which is solved using linear programming optimisation (Y. Rubner, ICCV98).
Given a query image \(I_Q = \{(R_Q^i, w_Q^i) \mid i = 1, \ldots, m\}\) and a target image \(I_T = \{(R_T^j, w_T^j) \mid j = 1, \ldots, n\}\), where \(R_Q^i\) and \(R_T^j\) are regions of the query and the target image, and \(w_Q^i\) and \(w_T^j\) are the weights of the regions:
Minimize
\[ \mathrm{EMD}(I_Q, I_T) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij}\, d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij}} \]
Subject to
\[ v_{ij} \ge 0; \qquad \sum_{j=1}^{n} v_{ij} \le w_Q^i,\; 1 \le i \le m; \qquad \sum_{i=1}^{m} v_{ij} \le w_T^j,\; 1 \le j \le n; \]
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij} = \min\left( \sum_{i=1}^{m} w_Q^i,\; \sum_{j=1}^{n} w_T^j \right) \]
\(d_{ij}\) is the Euclidean distance between regions \(R_Q^i\) and \(R_T^j\).

Learning Image Semantics Using Decision Tree
Given a set of training samples described by a set of input attributes, a decision tree classifies the samples based on the values of the given attributes.
A decision tree (DT) is obtained by recursively splitting the training data into different subsets according to the possible values of the selected attribute, until the data samples in each subset belong to the same class.
DT is very close to human reasoning, and is among the few machine learning tools that produce human-comprehensible rules.

Decision Tree
A decision tree consists of leaf nodes and non-leaf nodes.
Each leaf node of the decision tree represents a decision (outcome), whereas each non-leaf node corresponds to an input attribute, with each branch being a possible value of the attribute.
Given a decision tree generated from the training samples, a new data instance can be classified by starting at the root node of the decision tree, testing the attribute specified by this node and moving down the tree branch corresponding to the value of the attribute. This process is then repeated until a leaf node (a decision) is reached.
Once trained, a set of decision rules in if-then format can be derived for decision making.
DT is an intuitive top-down approach.
DT is used by human beings for day-to-day decision making.

Learning Mechanism

Semantic Templates
19 concepts are selected for training, and 30 training sample regions are collected for every concept.
Semantic templates (ST) are generated as the centroid of the low-level features of the 30 sample regions.
Using the STs, the continuous-valued color and texture features are converted to color and texture labels, which are discrete in value.
The labels are used as discrete attribute values for input to the decision tree.
The decision tree so derived is called DT-ST.

Decision Criteria
Start with 19 × 30 = 570 sample regions.
Every region is described by a set of three discrete attributes: color label, texture label and color-texture (CT) label.
Calculate the gain of each attribute \(A_i\):
\[ \mathrm{Gain}(A_i) = H(C) - E(A_i) \]
where \(H(C)\) is the entropy of the training dataset and \(E(A_i)\) is the expected information of \(A_i\).

Learning Image Semantics Using DT-ST
Gain(Color) = 2.46
Gain(Texture) = 2.01
Gain(CT) = 2.90
CT has the highest information gain, so it requires the least amount of information to split the dataset for decision making. Therefore CT is selected as the root attribute of the decision tree.

The dataset is then divided into 19 subsets corresponding to each of the 19 possible values of CT.
The 19 subsets so formed are then pre-pruned to remove the data samples whose class probability is less than a threshold k (10%).

Subsets corresponding to CT values 2, 4, 5, 7, 9, 10 and 11 are homogeneous. Therefore, branches with CT labels 2, 4, 5, 7, 9, 10 and 11 end up as leaf nodes with the corresponding outcomes: Blue sky, Flower, Sunset, Firework, Ape fur, Eagle and Building, respectively.

[Figure: first half of the tree.]

The subsets corresponding to the rest of the CT labels are declared as non-leaf nodes. These subsets require further splitting by using other attributes.
Attribute CT is removed from the attribute set A since it has been used.
The tree induction process is recursively applied to each of the non-leaf node subsets.
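The attribute selection step can be sketched as follows. This is a minimal, generic implementation of Gain(A) = H(C) - E(A) over discrete attribute labels, not the authors' DT-ST code; the function names, the dictionary layout of a sample and the four toy regions are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(samples, attribute, target="concept"):
    """Gain(A) = H(C) - E(A) for one discrete attribute.

    samples: list of dicts mapping attribute names and `target` to labels.
    """
    base = entropy([s[target] for s in samples])
    # Group the class labels by the value the attribute takes.
    subsets = {}
    for s in samples:
        subsets.setdefault(s[attribute], []).append(s[target])
    n = len(samples)
    # E(A): entropy of each subset, weighted by the subset's share of samples.
    expected = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return base - expected


def best_attribute(samples, attributes, target="concept"):
    """Attribute with the highest gain; it becomes the node to split on."""
    return max(attributes, key=lambda a: information_gain(samples, a, target))


# Four hypothetical regions, already converted to discrete labels.
regions = [
    {"color": 1, "texture": 1, "ct": 1, "concept": "sky"},
    {"color": 1, "texture": 2, "ct": 2, "concept": "sea"},
    {"color": 2, "texture": 1, "ct": 3, "concept": "grass"},
    {"color": 2, "texture": 2, "ct": 4, "concept": "tiger"},
]

for name in ("color", "texture", "ct"):
    print(name, information_gain(regions, name))
print("root attribute:", best_attribute(regions, ["color", "texture", "ct"]))
```

In this toy set the CT label separates every concept on its own, so it has the highest gain and is picked as the root, mirroring the choice reported for the 570 training regions. After a split, the used attribute would be dropped from the attribute list and the same selection repeated on each non-leaf subset.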
Pre-pruning
During each iteration, the tree is pre-pruned to remove the data samples whose class probability is less than a threshold k.

Post-pruning
The tree so formed has leaf nodes labelled "unknown".
However, a tree with too many unknown outcomes fails to classify many data instances.
It is necessary to post-prune the unknown leaves by replacing every unknown leaf node with the class label that has the highest probability in its immediate parent node.

The unknown leaf for the branch Color = Not 8 is replaced with the concept Tiger, as it has the highest probability.
Similarly, for Color = 8, Texture = Not (7, 8), the subset comprises concepts for which the probability of Tiger is the highest. Therefore the unknown for the branch Texture = Not (7, 8) is changed to Tiger.

Final Decision Tree

Derived Decision Rules
This set of decision rules is the actual classifier.

Experiments and Results
Three different datasets are selected to test the performance of the decision tree.
TestSet 1: 19 × 20 = 380 regions.
TestSet 2: 19 × 25 = 475 regions.
TestSet 3: 19 × 40 = 760 regions.

Test on Pre-pruning Threshold
In all the test sets, the decision tree gives the best performance when k = 0.1.
Pre-pruning effectively protects the tree from fragmentation and noise.

Performance Test with Post-pruning
The unknown outcomes are replaced with the highest-probability class label.
The average classification accuracy is improved by about 24.7% by using post-pruning.

Test of False Positive Error
50 regions irrelevant to any of the 19 concepts are selected to test false positive classification.
Overall, 82% of the 50 irrelevant regions are categorized as irrelevant to the training concepts. The false positive error is 18%.

Comparison of DT-ST with ID3 and C4.5
We compare the classification accuracy of DT-ST with ID3 and C4.5.
ID3 and C4.5 are implemented using the WEKA machine learning package.
( /ml/weka/)
[Table: classification accuracies for the three datasets (TestSet1, TestSet2, TestSet3), by tree induction method: DT-ST, C4.5 (continuous attribute values), C4.5 (discrete attribute values), ID3 (discrete attribute values).]

Retrieval Performance of DT-ST
The commonly used Corel image database is selected for the image retrieval test.
The Corel database consists of 5,000 images in 50 categories.
JSEG segmentation produces regions from these images, with an average of 5.84 regions per image.
The average precision (P) of 40 queries is obtained at each level of recall (R = 10, 20, ..., 100%).
The 40 query images are selected from the 50 categories, excluding those with very abstract category labels such as Australia.

Performance of DT-ST on Image Retrieval
The proposed RBIR system supports query by regions. The user specifies the dominant regions in the query image.
Using DT-ST, the system first finds a list of images that contain regions with the same concept as that of the query.
Then, based on their low-level color and texture features, these images are ranked according to their EMD distances to the query image.
This is called retrieval with concepts.

Retrieval Examples

Integration of DT-ST with Google: SIEVE
Currently, text-based image search engines are not based on CBIR.
The textual description in existing search engines may not capture image content.
We propose to integrate the existing text-based image search engine with visual features.
A post-filtering algorithm is proposed, called SIEVE: Search Images Effectively through Visual Elimination.
Practical fusion methods are also proposed to integrate SIEVE with contemporary text-based search engines.

SIEVE: The Idea
The idea of using SIEVE is very similar to object classification done by a human being.
First, objects of interest are roughly distinguished from other very different objects, either manually or with certain hand tools (Google in this case).
Then, the collected objects are subject to visual inspection (SIEVE in this case) to separate each object of interest from unwanted objects.

SIEVE: The Approach
In our approach, text-based image search results for a given query are obtained in the first step.
SIEVE is then used to filter out those images which are semantically irrelevant to the query.

SIEVE: The System

Experiment: Image Collections
To test the retrieval performance of SIEVE, 10 queries are selected: mountain, beach, building, firework, flower, forest, snow, sunset, tiger and sea.
Google image search can return up to thousands of images for a query; however, users are usually only interested in the first few pages.
Therefore, for each query, the top 100 images are downloaded from the first 5 pages.

Experiment: Measurement
In the Web image search scenario, it is not known how many relevant images there are in the database for a given query.
The bull's eye measurement is used.
The bull's eye measures the retrieval precision among the top K retrieved images.

Results: Retrieval Accuracy
[Figure: average retrieval precision for 10 image concepts, SIEVE versus Google; precision plotted against K.]
As more images are retrieved, SIEVE shows significant gain over Google.

Results: Retrieval Examples
[Figure: above, search result by Google using the query Tiger; left, result by SIEVE using the same query.]
[Figure: above, search result by Google using the query Snow; left, result by SIEVE using the same query.]
[Figure: above, search result by Google using the query Firework; right, result by SIEVE using the same query.]

Method of Integration
Scenario 1: SIEVE is installed on the server. The user sends an image search query from a Web browser. The search engine returns the SIEVED images to the user.
Scenario 2: SIEVE is integrated with the Web browser as a plug-in. A user query is directed by SIEVE to the search engine. The returned list is subject to SIEVE.
Scenario 3: SIEVE is used as application software. SIEVE directs the user query to various Web image search engines. The returned lists from the search engines are further SIEVED.

Issues
Significant time is spent on image segmentation and on computing image semantics. This can be solved by indexing image semantics upfront in image search engines.
Although a limited concept set is used to test its performance, the decision tree can accommodate more semantic concepts.
SIEVE can be applied more effectively if images in the database are first classified into categories.

Conclusions
A semantic image retrieval method using decision tree learning has been proposed.
The key characteristics of DT-ST are discrete attribute inputs, pre-pruning and post-pruning.
DT produces human-comprehensible rules, which few other machine learning tools can do.
Test results on concept learning show that the proposed DT-ST outperforms existing DT techniques.
Experimental results on image retrieval show that DT-ST is promising and gives better results than the conventional CBIR technique.
Application of DT-ST on the WWW has also been tested and shows promising results.

Future Work
The system will be extended to learn a much larger number of concepts for practical semantic image retrieval.
The system will be improved to accept keyword search, to create a prototype Image Google.
Various query interfaces will be investigated, including relevance feedback, spatial query, etc.

Publications from This Research
1. Y. Liu, D. S. Zhang and G. Lu, "Region-Based Image Retrieval with High-Level Semantics using Decision Tree Learning", accepted in Pattern Recognition, Dec.
2. Y. Liu, D. S. Zhang and G. Lu, "Narrowing Down the Semantic Gap in Content-Based Image Retrieval: A Survey", Pattern Recognition, 40(1).
3. Y. Liu, D. S. Zhang and G. Lu, "SIEVE: Search Images Effectively through Visual Elimination", Lecture Notes in Computer Science, Springer, Vol. 4577, Intl. Workshop on Multimedia Content Analysis and Mining (MCAM07), Weihai, China, 30 June-1 July.
4. Y. Liu, D. S. Zhang, G. Lu and A. Tan, "Integrating Semantic Templates with Decision Tree for Image Semantic Learning", Lecture Notes in Computer Science, Springer Berlin/Heidelberg, (MMM07), Vol. 4352.
5. Y. Liu, D. S. Zhang, G. Lu and W.-Y. Ma, "Study on Texture Feature Extraction in Region-based Image Retrieval System", in Proc. of IEEE International Conf. on Multimedia Modeling (MMM06), Beijing, Jan. 4-6.
6. Y. Liu, D. S. Zhang and G. Lu, "Deriving High-Level Concepts Using Fuzzy-ID3 Decision Tree for Image Retrieval", in Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP05), Philadelphia, PA, USA, March 18-23.
7. Y. Liu, D. S. Zhang, G. Lu and W.-Y. Ma, "Region-Based Image Retrieval with High-Level Semantic Color Names", in Proc. of IEEE 11th International Multi-Media Modelling Conference (MMM05), Melbourne, Australia, January 12-14.
8. Y. Liu, D. S. Zhang, G. Lu and W.-Y. Ma, "Region-based Image Retrieval with Perceptual Colors", in Proc. of 5th Pacific-Rim Conference on Multimedia (PCM04), Tokyo, Japan, Nov. 30-Dec. 3.
9. Y. Liu, W. Ma, D. S. Zhang and G. Lu, "An Efficient Texture Feature Extraction Algorithm for Arbitrary-Shaped Regions", in Proc. of IEEE 7th International Conference on Signal Processing (ICSP04), Beijing, China, Aug. 31-Sept. 4, Vol. 2.

References
Y. Deng and B. S. Manjunath, "Unsupervised Segmentation of Color-Texture Regions in Images and Video", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 23(8).
B. S. Manjunath and W. Y. Ma, "Texture Features for Browsing and Retrieval of Large Image Data", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 18(8).
Y. Rubner, C. Tomasi and L. J. Guibas, "A Metric for Distributions with Applications to Image Databases", in Proc. of IEEE Inter. Conf. on Computer Vision (ICCV'98), Jan.
accessed in Feb.,