Fast multiclass vehicle detection on aerial images

Kang Liu and Gellert Mattyus

Abstract: Detecting vehicles in aerial images provides important information for traffic management and urban planning. Detecting the cars in the images is challenging due to the relatively small size of the target objects and the complex background in man-made areas. It is particularly challenging if the goal is near real-time detection (within a few seconds) on large images without any additional information, e.g. a road database or an accurate target size. We present a method which can detect the vehicles on a 21 MPixel original frame image without accurate scale information within seconds on a laptop, single threaded. Besides the bounding box of the vehicles, we also extract orientation and type (car/truck) information. First we apply a fast binary detector using Integral Channel Features in a Soft Cascade structure. In the next step we apply a multiclass classifier on the output of the binary detector, which gives the orientation and type of the vehicles. We evaluate our method on a challenging dataset of original aerial images over Munich and a dataset captured from a UAV.

Index Terms: vehicle detection, classification, near real-time

I. INTRODUCTION

The detection of vehicles in aerial images is important for various applications, e.g. traffic management, parking lot utilization, and urban planning. Collecting traffic and parking data from an airborne platform gives fast coverage over a larger area. Getting the same coverage with terrestrial sensors would require deploying more sensors and more manual work, and thus higher costs.

A good example of an airborne road traffic measuring system is the one in the project Vabene [1] of the German Aerospace Center (DLR). In this real-time system aerial images are captured over roads and the vehicles are detected and tracked across multiple consecutive frames.
This gives a fast and comprehensive overview of the traffic situation by providing the number of vehicles and their position and speed.

Fig. 1 provides an overview of our workflow and an illustration of the output. The detection is a challenging problem due to the small size of the vehicles (a car might be only 30 × 12 pixels) and the complex background of man-made objects which appear visually similar to the cars. Providing both the position and the orientation of the detected objects supports the tracking by giving constraints on the motion of the vehicles. This is particularly important in dense traffic scenes where the object assignment is more challenging. The utilization of roads and parking lots also depends on the type of the vehicle (e.g. a truck impacts the traffic flow differently than a passenger car). A system having access to this richer information can manage the infrastructure better. In a real-time system such as the one in [1], the processing time (and computing power) is limited. Therefore the processing method should be as fast as possible.

K. Liu and G. Mattyus are with the Remote Sensing Technology Institute of the German Aerospace Center.

Fig. 1. Proposed vehicle detection framework. The input image is first evaluated by the multi-direction vehicle detector. A sliding window moves along the x- and y-axes. Features are extracted from the detection window and sent to the trained binary classifier. The binary classifier decides whether the current detection window contains a positive object or not. Detected vehicles are then processed to estimate their orientations and categories.

Our vehicle detection method provides robust performance, high speed, and vehicle orientation and type information, fully automatically and based only on the input image.

We detect the bounding box of the vehicles with a very fast binary sliding window detector using Integral Channel Features and an AdaBoost classifier in a Soft Cascade structure.
The bounding boxes are further classified into different orientations and vehicle types based on HOG features [2].

We test and evaluate our method on a challenging dataset over the city of Munich, Germany, and another dataset collected by a UAV. These datasets contain original, non-orthorectified frame images, which makes the problem more challenging since the exact GSD¹ is unknown (we have only an approximate prior). To make our results more comparable to other methods, we release the Munich images with the ground truth². To show the robustness of the method we also present qualitative results on images downloaded from Google Earth around the world in the supplementary material.

Our main contributions are: (i) The presented method uses features which can be calculated rapidly in a Soft Cascade structure. This makes the detection very fast: it takes only a few seconds on a 21 MPixel image on a laptop, single threaded. (ii) Our method also works on a single original frame image without any georeferencing, exact GSD, street database or 3D image information. (iii) Besides the location we also estimate the orientation and type of the vehicles.

¹ Ground Sampling Distance
² read-42467/

II. RELATED WORK

Vehicle detection in aerial images has a large literature; here we mention only a few important recent papers.

Moranduzzo and Melgani [3], [4] process very high resolution (2 cm GSD) UAV images for car detection. In [3] a feature point detector and SVM classification of SIFT descriptors is applied, while the method in [4] uses a catalog of HOG descriptors and later an orientation estimation.

In [5] the cars are detected by a deep neural network running on the GPU in a sliding window approach on a known constant scale. In [6] the vehicles are detected with online boosting on Haar-like features, local binary patterns and orientation histograms.
They train the detector for cars in one direction, and during testing they rotate the image in steps of 15 degrees. This detector is trained for a known object size of 35 × 70 pixels and tested on images with the same scale.

Leitloff et al. [1] use a two-stage approach for the detection of cars: first an AdaBoost classifier with Haar-like features and then an SVM on various geometric and radiometric features. They use the road database as a prior to detect only along the roads in a certain direction. The method achieves good results and runs fast on a CPU; however, it is limited to orthorectified images and areas covered by the road database.

Tuermer et al. [7] utilize the road map and stereo matching to limit the search area to roads and exclude buildings. HOG features with an AdaBoost classifier are applied to detect the cars in the selected region. This method is limited to georeferenced image pairs and areas covered by the road database.

III. MULTI-DIRECTION VEHICLE DETECTION

We handle the vehicle detection problem in two stages. The first stage is a very fast binary sliding window object detector which delivers axis-aligned bounding boxes of the vehicles without type or orientation information. The second stage is a multiclass classifier applied to the bounding boxes, estimating the orientation and the type of the vehicles. The processing steps are shown in Fig. 1.

A. Binary sliding window detector

For fast detection both the feature calculation and the classification have to be efficient.

1) Fast image features: Viola and Jones [8] introduced the integral image concept with Haar-like features for fast and robust face detection. By using the integral image $I_\Sigma$, the sum of the pixel intensities $I$ within a Haar-like feature is calculated in a few operations, independent of the area of the feature.
The value $I_\Sigma(x,y)$ at location $(x,y)$ in an integral image is the sum of the pixels above and to the left of $(x,y)$:

$$I_\Sigma(x,y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i,j) \qquad (1)$$

The integral $f_I$ within an axis-aligned rectangle defined by its upper left corner $(x_0, y_0)$, width $w$ and height $h$ is calculated as

$$f_I = I_\Sigma(x_0 + w, y_0 + h) + I_\Sigma(x_0, y_0) - I_\Sigma(x_0 + w, y_0) - I_\Sigma(x_0, y_0 + h).$$

This idea is generalized by the Integral Channel Features (ICF) in the work of Dollár et al. [9]. Instead of working on pixel intensity values as in [8], an ICF can be constructed on top of an arbitrary feature channel (i.e. a transformation of the original image). Features are defined as linear combinations of sums over local rectangular regions in the channels. By using the concept of integral images, an integral channel can be pre-computed for each feature channel so that the computation of the sum over a rectangle is very fast. The most commonly used channels are the color intensities, the gradient magnitude and the gradient histogram. The gradient histogram is a weighted histogram where the bin is determined by the gradient orientation. It is given by $Q_\theta(x,y) = G(x,y) \cdot \mathbf{1}[\Theta(x,y) = \theta]$, where $G(x,y)$ is the gradient magnitude and $\Theta(x,y)$ is the quantized gradient orientation at image location $(x,y)$. The gradient histogram can approximate the powerful and widely used HOG features [2].

If the rectangles are defined as squares, the sum can be aggregated to a single pixel in a downsampled image. In this case the integral is calculated even faster, as a single pixel lookup. This method is also called Aggregated Channel Features (ACF) [10]. For rapid speed we apply this method with the fast feature pyramid calculation described in [10].
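The integral image of Eq. (1) and the four-lookup rectangle sum can be illustrated with a short NumPy sketch. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and `aggregate_channel` simplifies the aggregation step of ACF to a plain block sum, omitting the smoothing and feature pyramid of [10].

```python
import numpy as np

def integral_image(channel):
    """Inclusive integral image: ii[y, x] = sum of channel[0..y, 0..x] (Eq. 1)."""
    return channel.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x0, y0, w, h):
    """Rectangle sum from four lookups, using the corner formula in the text.

    With an inclusive integral image this covers the pixels
    x0 < x <= x0 + w and y0 < y <= y0 + h.
    """
    return (ii[y0 + h, x0 + w] + ii[y0, x0]
            - ii[y0, x0 + w] - ii[y0 + h, x0])

def aggregate_channel(channel, block):
    """Simplified ACF-style aggregation: sum each block x block square into one pixel."""
    rows, cols = channel.shape
    rows, cols = rows - rows % block, cols - cols % block  # drop incomplete border blocks
    cropped = channel[:rows, :cols]
    return cropped.reshape(rows // block, block, cols // block, block).sum(axis=(1, 3))

# Hypothetical test channel: random integers standing in for a feature channel.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(12, 16)).astype(np.int64)
ii = integral_image(img)

x0, y0, w, h = 2, 3, 5, 4
fast = rect_sum(ii, x0, y0, w, h)
direct = img[y0 + 1:y0 + h + 1, x0 + 1:x0 + w + 1].sum()  # brute-force reference
agg = aggregate_channel(img, 4)
```

The cost of `rect_sum` does not depend on `w` or `h`, which is the property that makes sliding-window evaluation of many rectangular features affordable.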
2) AdaBoost classifier in Soft Cascade structure: The number of ICFs is very large (larger than the number of pixels in the image window) since they are linear combinations of local rectangular regions in the image window. We select only the relevant features with the Discrete AdaBoost algorithm [11] for $N$ weak classifiers $h_t(x)$. Each $h_t(x)$ is a simple classifier, e.g. a threshold or a shallow decision tree on a few features from the input feature vector $x$. AdaBoost is an iterative algorithm: in each step it reweights the samples in the training set according to the classification result of the previous weak classifier. The final strong classifier $H$ is composed of the weak classifiers $h_t(x)$ weighted by $\alpha_t$:

$$H(x) = \mathrm{sgn}\left(\sum_{t=1}^{N} \alpha_t h_t(x)\right) \qquad (2)$$

At numerous sliding window positions (e.g. in homogeneous regions) not all the weak classifiers have to be evaluated to classify the window as non-vehicle. To leverage this property for speed, we form a Soft Cascade [12] from the weak classifiers. During training a threshold $r_t$ is set for each weighted weak classifier $c_t(x) = \alpha_t h_t(x)$. If the cumulative sum $H_t(x) = \sum_{i=1}^{t} c_i(x)$ of the first $t$ output functions satisfies $H_t(x) \geq r_t$, the input sample is passed on to the subsequent evaluation; otherwise it is classified as negative and rejected immediately.

B. Multi-direction detection

The orientation of the vehicles in aerial images can be arbitrary. This increases the intra-class variation of the appearance in the axis-aligned sliding windows. A straightforward but computationally expensive solution, used in [6], is to train the detector for one specific direction, rotate the input image, and run the detection for each rotation.
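The early-rejection rule of the Soft Cascade in Section III-A2 can be sketched as follows. This is an illustrative sketch only: the decision stumps, weights $\alpha_t$ and rejection thresholds $r_t$ are made-up values, and accepting a surviving window when its final score is positive is an assumption that combines the cascade with Eq. (2).

```python
def make_stump(index, threshold):
    """Hypothetical weak classifier h_t: +1 if feature `index` exceeds `threshold`, else -1."""
    return lambda x: 1.0 if x[index] > threshold else -1.0

def soft_cascade(x, weak, alphas, rejects):
    """Evaluate one window; return (accepted, weak_classifiers_evaluated, score)."""
    score = 0.0
    for t, (h, alpha, r) in enumerate(zip(weak, alphas, rejects), start=1):
        score += alpha * h(x)        # c_t(x) = alpha_t * h_t(x), accumulated into H_t(x)
        if score < r:                # H_t(x) < r_t: reject immediately
            return False, t, score
    return score > 0.0, len(weak), score  # survived every stage; sign as in Eq. (2)

# Made-up cascade of three stumps over a three-element feature vector.
weak = [make_stump(0, 0.5), make_stump(1, 0.2), make_stump(2, 0.7)]
alphas = [1.0, 0.8, 0.6]
rejects = [-0.5, -0.5, 0.0]

result_pos = soft_cascade([0.9, 0.4, 0.8], weak, alphas, rejects)  # vehicle-like window
result_neg = soft_cascade([0.1, 0.1, 0.1], weak, alphas, rejects)  # flat background window
```

In this toy example the background window is rejected after a single weak classifier, while the vehicle-like window is evaluated by all three. With 2048 weak classifiers, as used in the experiments, this early exit is where most of the saving comes from.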
Rotating the image in this way would require computing the integral images separately for each direction and would result in slow processing. To overcome this we propose two methods: one is to train a single classifier which is able to detect differently oriented vehicles; the other is to aggregate several simple classifiers, each of which is sensitive only to specific directions.

1) Single classifier method: A single binary classifier is trained with samples covering all the directions. The training process has to deal with the high intra-class variety and find the common part of all the positive samples. When the detector is applied to the input image, vehicles in any direction can be classified as positive samples.

[Figure 2: four plots. Panels (a) feature contribution, (b) classifier configurations and (c) different scales show the precision rate; panel (d) orientation estimation shows the prediction number against the prediction error in degrees. Legend of (a): all features, without color, without gradient histogram, without gradient magnitude. Legend of (b): single detector with 45° and 22.5° rotation step, 2 and 4 detectors with 90° rotation step, 4 and 8 detectors with 180° rotation step. Legend of (c): DLR3K scales 0.6 to 2.0 in steps of 0.2.]

Fig. 2. (a) Evaluation of the Integral Channel Features. The gradient histogram channels play the most important role while the gradient magnitude channel has the least effect on the final result. (b) Detection results of the aggregated detectors. (c) Performance after rescaling the image with different factors. (d) Orientation estimation error histogram using an artificial neural network with 16 output classes.
2) Aggregated classifier method: Alternatively, the intra-class variety is reduced by splitting the training into different orientations. Multiple binary classifiers are trained, each for specific vehicle orientations. These classifiers are applied in sequence during the detection phase, and the results from each classifier are aggregated using non-maximal suppression. The integral image does not need to be calculated multiple times; only the classification is repeated.

The performance of these two methods is examined in Section V.

IV. MULTICLASS VEHICLE CLASSIFICATION

The detector provides the axis-aligned bounding boxes of the vehicles. In this next step we refine the extracted information by classifying the orientation and the type of the vehicle. We propose a two-step approach consisting of an orientation estimator and a type classifier. A sample is sent to the orientation estimator first, then rotated to the horizontal direction according to the orientation estimate, and finally processed by the type classifier to identify which type category the vehicle belongs to.

A. Orientation estimation

We consider the orientation estimation as a multiclass classification problem. The directions are clustered, and each cluster is considered a class. The ICF features can be calculated fast, but there are very many of them, so they are not suitable for multiclass classifiers working on fixed-length feature vectors. Therefore we apply the powerful Histogram of Oriented Gradients (HOG) feature [2], which has a fixed feature vector length. We use a neural network with one hidden layer as the multiclass classifier [13].

B. Type classification

The type classifier needs to classify the input image into the corresponding category. We have defined two type classes, car and truck, but the presented method could be extended to more classes. The object bounding box is rotated to the horizontal direction based on the orientation estimate.
Unrelated context is cropped out, and HOG features are again extracted and classified by the type classifier.

V. EXPERIMENTAL RESULTS

We test the multi-direction detection and the multiclass classification parts of our method separately and give quantitative results for the different processing stages. The binary detector is trained with 2048 weak classifiers in each test. We use depth-two decision trees as weak classifiers.

A. Results on Munich images

The quantitative evaluation is performed on 20 aerial images captured by the DLR 3K camera system [1] over the area of Munich, Germany. We use the original nadir images with a resolution of 5616 × 3744 pixels. They are taken at a height of 1000 meters above the ground; the approximate ground sampling distance is 13 cm. The first 10 images are used for training and the other 10 for testing. Positive training samples come from 3418 cars and 54 trucks annotated in the training images, while the negatives are randomly picked from the background, i.e. areas without vehicles. To overcome the low number of truck samples, we additionally transformed them randomly 30 times. Fig. 3 shows detection results on the test images. We set the detection window to 48 × 48 pixels. For the ground truth the vehicles in the images are annotated manually as oriented bounding boxes.

1) Multi-direction vehicle detection: Integral Channel Features contain rich information and can be computed rapidly. They are selected as the features for training and detection. Experiments are performed to evaluate the importance of each feature channel type and the performance of different classifier configurations.

a) Feature channel: We use three types of feature channels: Luv color, gradient magnitude and gradient histogram. We have evaluated the contribution of each feature channel; the Precision-Recall (PR) curves are plotted in Fig. 2(a). These curves indicate that the gradient histogram channels play the most important role in representing the vehicles, while the gradient magnitude channel affects the final result the least. For the later tests we use all the feature channels.

[Figure 3: six image panels. (a) Main road. (b) Buildings along main road. (c) Residential area. (d) Failure cases. (e), (f) Detection on the dataset in [3], [4].]

Fig. 3. Detection results from the DLR test images. Green and cyan bounding boxes are the correctly detected samples, representing cars and trucks, respectively. Black bounding boxes are the missed ones and red are the false positives. The results show that our method works well in most scenarios (a)(b)(c); however, complicated rooftops or outdoor swimming pools may lead to false positive detections (d). We also evaluated our method on the dataset presented in [3], [4]; the detection results are shown in (e)(f).

b) Multi-direction detection methods: We proposed two methods, single and aggregated classifiers, to detect vehicles in different directions (Section III-B). The performances are depicted in Fig. 2(b). The PR curve shows that the optimal solution is the classifier aggregation method with each classifier trained using samples in opposite directions (8 detectors with a sample rotation step of 180°). This means 8 detectors and thus a longer computation time: 2.7 s is needed for a single detector, while the detection with 8 classifiers takes 4.1 s. The increase is sublinear in the number of classifiers because the integral images do not have to be calculated again. We use the 8-classifier configuration for the later tests.

c) Detection on images with different scales: To show the ability of our method to detect cars in images with different scales, we resized the images for the test but not for the training. These results are shown in Fig. 2(c). The detector performs best on the scale it was trained on; if the resolution is increased, the performance remains comparable. If we decrease the resolution, however, we lose information, which leads to lower performance.
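The aggregation of the orientation-specific detectors (Section III-B2) relies on non-maximal suppression, which the text does not specify further. A minimal greedy sketch is given below, assuming axis-aligned boxes as `(x1, y1, x2, y2)`, a per-box detector score, and an intersection-over-union (IoU) overlap criterion with a threshold of 0.5. All three are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box against an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Pooled 48 x 48 detections from two hypothetical orientation-specific detectors:
# the first two boxes cover the same vehicle, the third is a separate one.
boxes = np.array([[10, 10, 58, 58],
                  [12, 11, 60, 59],
                  [100, 100, 148, 148]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

Because all detectors share the same integral channels, only the classification and this comparatively cheap merging step are repeated per detector.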
2) Multi-class vehicle classification: After the axis-aligned bounding box detection we classify the orientation and type of the vehicles. We convert all the bounding boxes to 48 × 48 pixel gray images and calculate HOG features for each image. We get the best performance with a HOG configuration of 4 × 4 cell size, 1 × 1 block size and 1 × 1 block stride, and use this for the later tests. The comparison of different HOG configurations can be found in the supplementary material.

a) Orientation estimation: Orientation classification is performed according to Section IV-A with 16 classes (22.5° rotation difference between adjacent sample groups). The orientation estimation error histogram is depicted in Fig. 2(d). In the supplementary material we provide results with different numbers of classes and an additional random forest classifier [14]. The most common error is that samples are classified in the opposite direction. This is because the front part of a vehicle might look similar to the rear part when seen from above in aerial images.

b) Type classification: The detected bounding box is rotated to the horizontal direction according to the orientation estimate. We trim the input image by cropping the upper and lower parts, from 48 × 48 to 48 × 28. In our dataset the number of trucks is much lower than the number of cars, so we generate new truck samples from the existing ones using random transformations. The performances with different cropping configurations are compared in Table III and the supplementary material. The best type classification reaches 98.2% accuracy with a one-hidden-layer neural network.

TABLE III
CONFUSION MATRICES OF TYPE CLASSIFICATION USING DIFFERENT CROPPING CONFIGURATIONS

Cropped size 48 × 48 (accuracy 95.1%)
A/P (a)      Car     Truck
Car          2843    60
Truck (b)    123     685

Cropped size 48 × 28 (accuracy 98.2%)
A/P (a)      Car     Truck
Car          2838    65
Truck (b)    0       808

(a) Actual class / Predicted class
(b) The number of truck samples is increased by random transformation of the existing samples.

3) Baseline comparison: As a baseline we use the OpenCV implementation of the Viola-Jones detector [8]. We have trained it on one vehicle direction, while at detection time we rotate the image similarly to [6] and apply the detector to each rotated image. Table I contains the numerical comparison with this method on the Munich dataset.

TABLE I
PERFORMANCE COMPARISON BETWEEN DIFFERENT METHODS

Method         Ground Truth   True Positive   False Positive   Recall Rate   Precision Rate
Munich dataset
Viola-Jones    5892           3237            1467             54.9%         68.8%
Ours           5892           4085            619              69.3%         86.8%
UAV dataset from [3], [4]
[3]            119            88              143              73.95%        38.1%
[4]            119            87              111              73.1%         43.4%
Ours           119            94              6                79.0%         94.0%

4) Computation time: Since the processing time is also important for the detector, we compare our method with other methods for which the computation time is provided in the paper. Table II contains the computation times. Our experiments are performed on a laptop with an Intel® Core™ i5 processor and 8 GB memory; our program runs single threaded and is written in Matlab and C++. The comparison shows that our method is considerably faster. This makes it more suitable for real-time systems where the computation time is a serious issue. The method of [5] achieves comparable detection performance but on a different dataset; therefore we show only the processing time of that method.

TABLE II
COMPARISON OF COMPUTATION TIMES

Method         Image Resolution   Detection Time Per Image [s]   Detection Time Per MPixel [s]
Proposed       5616 × 3744        4.4                            0.2
Viola-Jones    5616 × 3744        1160                           55.2
[4]            5184 × 3456        14400                          803.8
[5] (a)        1368 × 972         8                              6.0

(a) Running on the GPU.

B. Baseline comparison on UAV images

We also evaluated our method on the dataset presented in [3], [4] and compared it to the results provided without screening. The results can be found in Table I. The precision rate of our method outperforms the other methods significantly.
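The rates in Tables I to III can be related to the reported counts with a few lines of arithmetic. The sketch below assumes the usual definitions, which the text does not spell out: recall = TP / ground truth, precision = TP / (TP + FP), and accuracy = correctly classified samples / all samples. The function names are illustrative.

```python
def recall(true_pos, ground_truth):
    """Fraction of annotated vehicles that were detected."""
    return true_pos / ground_truth

def precision(true_pos, false_pos):
    """Fraction of detections that are real vehicles."""
    return true_pos / (true_pos + false_pos)

def accuracy(confusion):
    """Diagonal of the confusion matrix (rows: actual class) over the total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def seconds_per_mpixel(seconds, width, height):
    """Detection time normalized by the image size in megapixels."""
    return seconds / (width * height / 1e6)

# Table I, Munich dataset, row "Ours".
munich_recall = round(100 * recall(4085, 5892), 1)
munich_precision = round(100 * precision(4085, 619), 1)

# Table III, both cropping configurations.
acc_48x48 = round(100 * accuracy([[2843, 60], [123, 685]]), 1)
acc_48x28 = round(100 * accuracy([[2838, 65], [0, 808]]), 1)

# Table II, row "Proposed".
proposed_rate = round(seconds_per_mpixel(4.4, 5616, 3744), 1)
```

Under these assumed definitions the rows checked here reproduce the tabulated values of 69.3% recall, 86.8% precision, 95.1% and 98.2% accuracy, and 0.2 s per MPixel.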
Due to the higher resolution of the UAV images, we set the detection window to 96 × 96 pixels for this dataset and use only the car vehicle type (no truck).

C. Qualitative results from around the world

To show the robustness of our detector we also run it on images downloaded from Google Earth. These results can be found in the supplementary material.

VI. CONCLUSION

We have presented a method which can detect vehicles with orientation and type information in large aerial images in a few seconds. The application of Integral Channel Features in a Soft Cascade structure results in both good detection performance and high speed. The detector works on original images where no georeference and resolution information is available. As future work, the performance could be further improved by using a deep neural network after the binary detector, like the R-CNN in [15]. Since this would have to be applied only to a fraction of the image, the detector would still be fast.

VII. ACKNOWLEDGEMENT

This work was funded by the DLR project Vabene++. We thank the authors of [3], [4], who generously provided their image dataset with the ground truth.

REFERENCES

[1] J. Leitloff, D. Rosenbaum, F. Kurz, O. Meynberg, and P. Reinartz, “An operational system for estimating road traffic information from aerial images,” Remote Sensing, vol. 6, no. 11, pp. 11315–11341, 2014.
[2] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.
[3] T. Moranduzzo and F. Melgani, “Automatic car counting method for unmanned aerial vehicle images,” Geoscience and Remote Sensing, IEEE Transactions on, vol. 52, no. 3, pp. 1635–1647, March 2014.
[4] T. Moranduzzo and F. Melgani, “Detecting cars in UAV images with a catalog-based approach,” Geoscience and Remote Sensing, IEEE Transactions on, vol. 52, no. 10, pp. 6356–6367, Oct 2014.
[5] X. Chen, S. Xiang, C. Liu, and C. Pan, “Vehicle detection in satellite images by hybrid deep convolutional neural networks,” Geoscience and Remote Sensing Letters, IEEE, vol. 11, no. 10, pp. 1797–1801, Oct 2014.
[6] S. Kluckner, G. Pacher, H. Grabner, H. Bischof, and J. Bauer, “A 3D teacher for car detection in aerial images,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1–8.
[7] S. Tuermer, F. Kurz, P. Reinartz, and U. Stilla, “Airborne vehicle detection in dense urban areas using HOG features and disparity maps,” Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 6, no. 6, pp. 2327–2337, Dec 2013.
[8] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–511–I–518 vol.1.
[9] P. Dollár, Z. Tu, P. Perona, and S. Belongie, “Integral channel features,” in BMVC, vol. 2, no. 3, 2009, p. 5.
[10] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, no. 8, pp. 1532–1545, Aug 2014.
[11] J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” Annals of Statistics, vol. 28, p. 2000, 1998.
[12] L. Bourdev and J. Brandt, “Robust object detection via soft cascade,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 236–243.
[13] Y. LeCun, L. Bottou, G. B. Orr, and K. R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade, ser. Lecture Notes in Computer Science, G. Orr and K. Müller, Eds. Springer Verlag, 1998, vol. 1524, pp. 5–50.
[14] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.