Special Section on Video Surveillance and Transportation Imaging Applications

Efficient processing of transportation surveillance videos in the compressed domain

Author Affiliations
Orhan Bulan

Xerox Research Center, Webster, New York 14580

Edgar A. Bernal

Xerox Research Center, Webster, New York 14580

Robert P. Loce

Xerox Research Center, Webster, New York 14580

J. Electron. Imaging. 22(4), 041116 (Sep 24, 2013). doi:10.1117/1.JEI.22.4.041116
History: Received April 9, 2013; Revised August 1, 2013; Accepted August 12, 2013

Open Access

Abstract.  Video surveillance is used extensively in intelligent transportation systems to enforce laws, collect tolls, and regulate traffic flow. Benefits to society include reduced fuel consumption and emissions, improved safety, and reduced traffic congestion. Video cameras installed at traffic lights, on highways, at toll booths, etc., continuously capture video and hence generate a vast amount of data that are stored in large databases. The captured video is typically compressed before being transmitted and/or stored. While all the archived information is present in the compressed video, most current applications operate on uncompressed video. The aim of this work is to improve the efficiency of processing by utilizing features of the compression process and of the compressed video stream. Key methods that are employed involve intelligent selection of reference frames (I-frames) and exploitation of the compression motion vectors. Although specific applications in the transportation imaging domain are presented, the methods proposed here can generally impact the ability to mine vast amounts of video data for usable information in many diverse settings. Applications presented include rapid search for target vehicles (Amber Alert, Silver Alert, stolen car, etc.), vehicle counting, stop sign/light enforcement, and vehicle speed estimation.


Development of intelligent transportation systems (ITS) is an active area of research due to potential benefits to society that include reduced fuel consumption and emissions, improved safety, reduced cost of law enforcement, and reduced traffic congestion.1 One of the most common uses of ITS has been in several applications within traffic law enforcement, such as vehicle speed detection and stop light enforcement. These applications have played an important role in decreasing the number of serious accidents. For instance, in a study performed on rural roads in the Netherlands, speed limit enforcement with mobile radar was shown to result in 21% and 14% reductions in accidents involving severe collisions and injuries, respectively.2 Another aspect of public safety served by ITS concerns Amber Alert and Silver Alert. Amber Alert is an emergency alert system to promptly inform the public when a child has been abducted. Much more common, but not as widely known, is the Silver Alert, which is a public notification system in the United States to broadcast information about missing persons, especially seniors with Alzheimer’s disease, dementia, or other mental disabilities, in order to aid in their return. Consider a statement from the West Virginia code on Amber Alert (15-3A-7):

“the use of traffic video recording and monitoring devices for the purpose of surveillance of a suspect vehicle adds yet another set of eyes to assist law enforcement and aid in the safe recovery of the child.”

Other areas of concern are urban and highway planning, traffic management, and reduction of congestion, which often involve vehicle counting, classification, and timing. In one study in England, it was estimated that congestion on the roadway network costs industry and commerce 3 billion dollars a year.3

Data collection from transportation networks is an important element of an ITS. Different types of sensors are used for data collection depending on the application requirements. For example, conventional roadway sensors such as pressure hoses, piezoelectric sensors, curtain sensors, and induction coils are used in automated vehicle counting for traffic flow studies. Similarly, in-ground and ultrasonic sensors are typically used in real-time parking occupancy detection systems and RADAR/LIDAR are used in vehicle speed estimation systems. Several of these sensors, however, can be difficult and expensive to deploy and maintain, or provide a very limited type of data. Video cameras, on the other hand, capture visual data that can provide a wider range of useful information, including vehicle color, license plate, vehicle type, speed, etc. These cameras have recently started to replace traditional sensors devoted to traffic monitoring, speed, red light, stop sign, and other law enforcement activities, as well as safety and security tasks.4

The continuous operation of traffic and surveillance cameras on highways, roads, toll booths, etc., generates a vast amount of data. Given the volume of data, video is typically compressed before it is transmitted and/or stored. In fact, according to Cisco projections, video was predicted to constitute 90% of the world’s generated digital data by 2012,5 most of it in compressed format. Efficient ways to mine compressed video data for useful information, or to exploit properties of compression, will be of great value in unlocking this vast source of data. In the case of searching for specific vehicles or incidents in large video databases, rapid mining of the data can be a matter critical to life, as in an abduction or lost person scenario.

This paper presents example applications where data can be mined and processed directly in the compressed domain, making the process more efficient than standard approaches that rely on uncompressed data. More specifically, we consider several applications, including efficient vehicle search, vehicle counting, stop sign/red light enforcement, and vehicular speed estimation. In these applications, the algorithms use motion vectors associated with video compression, which are typically calculated as one of the compression steps prior to archiving or transmission, or are readily extracted from the compressed data stream. The magnitude, stability, and coherence of the extracted motion vectors are analyzed to realize the applications. In data mining applications, the algorithms described here yield a mining process that is fast and efficient compared to existing video processing methods6–17 due to the use of motion vectors that are part of the compression data stream. The methods eliminate the need to fully decompress the video data to extract useful information. Our experimental results show the effectiveness and versatility of the proposed methods on videos captured for multiple applications in various settings.

We also introduce an alternative processing architecture where the applications are built directly into the compression step. This approach can be particularly useful in multifunction cameras tasked to perform several functions simultaneously. For example, a speed estimation or a red light enforcement camera can be used concurrently for vehicle counting or vehicle classification. The real-time operation of these multifunctional cameras can be challenging using standard video processing techniques as they typically involve the execution of complex motion detection (e.g., optical flow, background estimation and subtraction) and tracking algorithms (e.g., particle filtering, Kalman filtering, mean shift tracking, hidden Markov modeling, etc.). Building the applications into the compression step, as in the proposed architecture, adds only a small amount of computation and thereby allows real-time performance in the multifunctional cameras. The outcome of processing (e.g., violators in the case of a law enforcement application) can be provided to the appropriate application or embedded in the compressed video stream for future use. This embedded embodiment of the proposed method can negate the need for further processing at a server (full decompression, processing, and recompression).

The remainder of this paper is organized as follows. A brief background on video compression is presented in Sec. 2. Section 3 provides a description of the proposed algorithms and their experimental validations for rapid mining of compressed video streams captured by traffic surveillance cameras for several applications. Section 4 concludes the paper by summarizing key aspects of the proposed approach. A preliminary version of part of the work in this paper has been previously presented in Ref. 18.

Video compression is essential in applications where high-quality video transmission and/or archival is required. A typical traffic surveillance system is composed of a set of video cameras that relay video data to a central processing and archival facility. While the communication network used to transport the video stream between the cameras and the central facility may be built on top of proprietary technology, traffic management centers have recently started to migrate to Internet Protocol (IP)-compliant networks.

In either case, the underlying communication network typically has bandwidth constraints that dictate the use of video compression techniques on the camera end, prior to transmission. In the case of legacy analog cameras, compression is performed at an external encoder attached to the camera, whereas digital cameras typically integrate the encoder within the camera itself. Typical transmission rates over IP networks require the frame rate of multimegapixel video streams to be limited to fewer than 5 frames per second (fps). The latest video compression standards enable utilization of the camera’s full frame-rate capabilities for transmitting high-definition (HD) video at the same network bandwidth.19 For example, transmission of 1080p HD uncompressed video requires a bandwidth of 1.5 Gbps, while its compressed counterpart typically requires only 250 Mbps; consequently, transmission of compressed video with up to six times the frame rate of the uncompressed version would be possible over the same network infrastructure.

Video compression is achieved by exploiting two types of redundancies within the video stream: spatial redundancies among neighboring pixels within a frame and temporal redundancies between adjacent frames. This modus operandi gives rise to two different types of prediction, namely intraframe and interframe predictions, which in turn result in two different types of encoded frames, reference and nonreference frames. Reference frames, or I-frames, are encoded in a standalone manner (intraframe) using compression methods similar to those used to compress digital images. Compression of nonreference frames (P- and B-frames) entails using interframe or motion-compensated prediction methods where the target frame is estimated or predicted from previously encoded frames in a process that typically entails three steps:20 (1) motion estimation, where motion vectors are estimated using previously encoded frames. The target frame is segmented into pixel blocks called target blocks, and an estimated or predicted frame is built by stitching together the blocks from previously encoded frames that best match the target blocks. Motion vectors describe the relative displacement between the location of the original blocks in the reference frames and their location in the predicted frame. While motion compensation of P-frames relies only on previous frames, previous and future frames are typically used to predict B-frames; (2) residual calculation, where the error between the predicted and target frames is calculated; and (3) compression, where the error residual and the extracted motion vectors are compressed and stored. A detailed tutorial on state-of-the-art video compression techniques can be found in Ref. 21.

State-of-the-art video compression techniques such as MPEG-4, H.264/AVC, etc., typically include a small header in the encoded bit stream for each video frame. The header includes information such as the mode of the next frame and a bit string indicating the start of the next frame. Depending on the compression standard, the mode can be encoded as a single bit (for video compression that encodes only I- and P-frames but not B-frames) or two bits (e.g., the MPEG-4 and H.264 standards, which can utilize I-, P-, and B-frames).

In the MPEG-4 standard, for example, the video object plane header starts with the hexadecimal bit string 000001B6 to indicate the start of a new frame. It is followed by a bit string indicating the frame mode, as shown in Table 1.22 The header for each frame can be read to identify the reference frames without decoding the whole video sequence.

Table 1. Bits corresponding to different frame modes in the video object plane header in the MPEG-4 compression standard.
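As an illustration of how the frame mode can be recovered without decoding any frame data, the following is a minimal Python sketch that scans a raw MPEG-4 Part 2 elementary stream for video object plane (VOP) start codes and reads the frame mode bits. The assumption that the 2-bit coding-type field immediately follows the 32-bit start code 0x000001B6, as well as the file name in the comment, are illustrative; this is not a complete parser for every profile of the standard (see Table 1 and ISO/IEC 14496-2 for the exact layout).

```python
# Sketch: identify reference frames (I-frames) from MPEG-4 Part 2 VOP headers
# without decoding frame data.  Assumes the 2-bit vop_coding_type directly
# follows the start code 0x000001B6 (00 = I, 01 = P, 10 = B, 11 = S).

VOP_START_CODE = b"\x00\x00\x01\xb6"
MODE_NAMES = {0b00: "I", 0b01: "P", 0b10: "B", 0b11: "S"}

def vop_frame_modes(stream: bytes):
    """Yield (byte_offset, frame_mode) for every VOP header found in the stream."""
    pos = stream.find(VOP_START_CODE)
    while pos != -1 and pos + 4 < len(stream):
        mode_bits = (stream[pos + 4] >> 6) & 0b11   # two MSBs of the next byte
        yield pos, MODE_NAMES.get(mode_bits, "?")
        pos = stream.find(VOP_START_CODE, pos + 4)

# Example use: list offsets of reference frames so that only those are decoded.
# with open("traffic.m4v", "rb") as f:
#     i_frame_offsets = [off for off, m in vop_frame_modes(f.read()) if m == "I"]
```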

For video captured with a stationary camera, typical for current traffic cameras, the main cause of change between adjacent frames corresponds to object motion. In this setting, the output from the motion compensation stage is the vector field approximately describing the way pixel blocks move between adjacent frames. As such, the encoded set of motion vectors is a good descriptor of apparent motion of objects within the field of view of the camera.

Motion Vector Calculation

Motion vectors are extracted as part of the motion estimation stage in the compression process as described above. Compression algorithms such as H.264 and MPEG4 typically employ block-based approaches23 to calculate motion vectors between two adjacent frames, as opposed to pixel-level optical flow methods,24,25 which calculate motion vectors for each pixel in each nonreference frame (and are thus highly computationally expensive). Figure 1 depicts a graphical description of a block-matching algorithm.

The block-matching algorithm breaks up the frame to be compressed, or target frame, into pixel blocks of a predetermined size. For a motion block of m×n pixels, typically m=n=16, a search is performed in the reference frame for the block most similar to the current m×n target frame pixel block. Since searching and calculating similarity metrics is a computationally expensive process, a search window is typically defined around the location of the target motion block as shown in Fig. 1. Example similarity criteria between blocks are the mean squared error (MSE) and the mean absolute difference (MAD):

$$\mathrm{MSE}(d_1,d_2)=\frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left[B(x,y,j-1)-B(x+d_1,y+d_2,j)\right]^2,\qquad(1)$$

$$\mathrm{MAD}(d_1,d_2)=\frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left|B(x,y,j-1)-B(x+d_1,y+d_2,j)\right|,\qquad(2)$$
where B(x,y,j) is the pixel located in the x’th row and y’th column of the m×n block of pixels in the j’th frame, and (d1,d2) is the displacement vector between the target and candidate blocks. In this case, the (j−1)’th frame is the already encoded frame being used as a reference frame, and the j’th frame is the target frame. A block similarity measure can be defined as the reciprocal or negative MSE or MAD. The advantage of MSE over MAD is that it typically results in better matches when used in a block-matching algorithm.26 MAD, on the other hand, is more commonly used in video compression because it is computationally less expensive than MSE. For our experiments in this paper, we utilized MAD as the cost function in the calculation of motion vectors, although it should be noted that the use of motion vector fields computed with other standard techniques will yield results similar to those presented herein. The motion vector for the target pixel block is the vector (d1,d2) that maximizes similarity between the target and reference blocks. The search for the best matching block in the search window can be conducted using full exhaustive search, binary search, three-step search, spiral search algorithms, etc.27 Figure 2(c) illustrates the motion field resulting from the application of an 8×8 pixel block-based motion estimation algorithm with a 16×16 pixel search window to the reference frame depicted in Fig. 2(a) and the target frame of Fig. 2(b). Figure 2(d) shows the predicted image that results from stitching together the best-matching reference blocks. In this scenario, the camera is fixed and the car is moving from right to left. As a consequence, all apparent movement is within the region where the car is located on the image plane.

Figure 2. Block-based motion estimation algorithm.
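For concreteness, the following is a minimal Python/NumPy sketch of the exhaustive block-matching step described above, using the MAD criterion of Eq. (2). The function and parameter names are illustrative, and practical encoders use faster search strategies (three-step, spiral, etc.) rather than the brute-force scan shown here.

```python
import numpy as np

def block_match(ref, tgt, block=16, search=8):
    """Exhaustive block matching between two grayscale frames.

    For each block x block region of the target frame, find the displacement
    (dy, dx) into the reference frame that minimizes MAD, as in Eq. (2).
    Returns an array of motion vectors, one (dy, dx) pair per block.
    """
    H, W = tgt.shape
    rows, cols = H // block, W // block
    mv = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * block, c * block
            target = tgt[y0:y0 + block, x0:x0 + block].astype(float)
            best, best_d = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue  # candidate block falls outside the frame
                    cand = ref[yy:yy + block, xx:xx + block].astype(float)
                    mad = np.mean(np.abs(target - cand))
                    if mad < best:
                        best, best_d = mad, (dy, dx)
            mv[r, c] = best_d
    return mv
```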

We propose a new approach to enable fast and efficient mining of compressed video data as well as to provide real-time processing when performed in multifunction cameras. Our method operates in the compressed domain by using video compression motion vectors. Figure 3 illustrates an overview of an architecture in which processing is performed inside the camera. In an alternative architecture, the processing can be performed on already compressed videos stored in large databases. In the proposed architecture, a traffic surveillance camera captures video of the region of interest. Motion vectors are calculated from the captured video as the first step of the video compression performed at the camera. The applications are enabled by processing motion vectors in parallel with the remaining compression operations. The outcome of the processing can be embedded in the compressed video stream as metadata, which can be used in the future after the video is saved in a database. For example, in law enforcement applications, metadata can be information regarding violators (e.g., violation date, license plate, vehicle type, speed), which facilitates future searching within the video. Building applications directly into the compression step adds only a small amount of computation, making the proposed method computationally efficient. In the alternative architecture, motion vectors are extracted from the already compressed video data. Extraction of motion vectors from compressed data does not necessitate full decompression of the video stream as the vectors are typically compressed separately from the rest of the video data. This, in turn, enables fast and efficient mining of the compressed video data from large databases.

Figure 3. Analysis of traffic surveillance videos using compression motion vectors.

We next present several example applications where the processing is performed in the compressed domain, as shown in Fig. 3.

Efficient Vehicle Search in Large Video Databases

With the prevalence of digital technologies, the Internet, and high-quality, low-cost sensors, the amount of video generated each day is increasing tremendously. Searching for a specific vehicle in a video sequence stored in a large database is often required in several applications. In an Amber/Silver Alert incident, for example, a search is conducted across large databases of video acquired from highway, local road, traffic light, and stop sign monitoring to track and find the missing child or mentally impaired person. Statistics indicate that such a search must be conducted in a very fast and efficient manner, as, in abduction cases that end in murder, 75% of the victims are killed within the first three hours.28

Algorithm

Figure 4 shows an overview of our method for fast and efficient vehicle searching in large video databases using compression motion vectors. The processing in this scenario largely consists of frame-mode selection in order to adaptively determine reference and nonreference frames. The proposed algorithm selects reference frames (I-frames) based on the position of a vehicle in the scene (i.e., the algorithm selects one reference frame per vehicle, the frame capturing the instant when the vehicle is optimally positioned in the scene) while compressing the video. A search for a specific vehicle in the compressed video can then be conducted across only the reference frames; this does not require decompression of additional frames in the sequence and therefore reduces the search space significantly.

Figure 4. Efficient vehicle search in large compressed video databases.

We detect a vehicle at a specific position in the scene by detecting motion vectors on a virtual sensor. Motion is considered detected when the magnitude of the extracted motion vector is larger than a threshold T1 for a given motion block. These motion blocks are called active blocks. When the active motion blocks form a cluster larger than a certain size, they indicate the existence of a moving object in the scene. The threshold for the cluster size can be set based on the camera configuration, resolution, frame rate, average speed on the road, and motion block size. Figure 5(a) shows a video frame where the vehicle in the frame is moving, and Fig. 5(b) shows the corresponding active motion blocks forming a cluster.

Figure 5. Active motion blocks indicating the existence of a moving vehicle in these blocks.
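A small sketch of the active-block test is given below, assuming the motion vectors are available as an array with one (dy, dx) pair per block (e.g., the output of the block-matching sketch above). The threshold values and the use of scipy.ndimage for connected-component labeling are illustrative choices, not part of the original system.

```python
import numpy as np
from scipy import ndimage

def active_blocks(mv, t1=8):
    """Binary map of active motion blocks: vector magnitude greater than T1."""
    mag = np.hypot(mv[..., 0], mv[..., 1])
    return mag > t1

def has_vehicle(active, min_cluster=6):
    """Declare a moving vehicle when the active blocks contain at least one
    connected cluster (8-connectivity) of min_cluster or more blocks."""
    labels, n = ndimage.label(active, structure=np.ones((3, 3)))
    if n == 0:
        return False
    sizes = ndimage.sum(active, labels, index=range(1, n + 1))
    return bool(np.max(sizes) >= min_cluster)
```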

The frame mode is determined based on the presence and the position of the active motion blocks in the frame. For this purpose, a virtual sensor operates on the subsampled motion block domain. A frame is labeled as a reference frame either when a motion block on the virtual sensor becomes active for the first time in a given number of frames (i.e., a vehicle entering the virtual sensor region) or when a motion block on the virtual sensor is active for the last time after a succession of frames with active motion blocks on the sensor (i.e., a vehicle exiting the virtual sensor). The orientation of the virtual sensor should be chosen judiciously and typically depends on the road geometry within the camera view.

Figure 6 illustrates cases where a vehicle is entering and exiting the virtual sensor. In the figure, a horizontal virtual sensor [i.e., the mid-gray line in Figs. 6(b) and 6(d)] is drawn crossing the road approximately perpendicular to the traffic flow direction. A frame is labeled as a reference frame when a vehicle in the left lane enters the virtual sensor [Figs. 6(a) and 6(b)] or a vehicle in the right lane exits it [Figs. 6(c) and 6(d)]. Frames that do not satisfy these conditions are labeled as nonreference frames.

Figure 6. A vehicle detected entering [(a) and (b)] and exiting [(c) and (d)] the virtual sensor.
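Putting the two previous sketches together, the frame-mode decision can be reduced to tracking whether the virtual sensor is occupied from one frame to the next. The simplified sketch below labels a frame as a reference frame on any sensor entry or exit; the per-lane variant described above (entry in the left lane, exit in the right lane) would apply the same test to each lane's sensor segment separately.

```python
def frame_mode(active, sensor_rows, prev_on_sensor):
    """Return ('I' or 'P', on_sensor) for the current frame.

    active         : 2-D boolean array of active motion blocks (see above)
    sensor_rows    : index (or indices) of the motion-block rows covered by
                     the virtual line
    prev_on_sensor : whether the sensor was occupied in the previous frame
    """
    on_sensor = bool(active[sensor_rows, :].any())
    entering = on_sensor and not prev_on_sensor   # vehicle reaches the sensor
    exiting = (not on_sensor) and prev_on_sensor  # vehicle leaves the sensor
    mode = "I" if (entering or exiting) else "P"
    return mode, on_sensor
```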

In a search for a specific vehicle in the compressed video database, such as in an Amber/Silver Alert or evidence search for law enforcement, only reference frames are decompressed. For this purpose, the frame mode for each frame in the compressed video is first identified without decoding the entire compressed video sequence.

Note that, with the proposed method, when the event of interest occurs frequently in a video sequence, the number of reference frames increases. This can impact the overall compression ratio because reference frames are not compressed as efficiently as nonreference frames. In these cases, frame mode selection can revert to the conventional approach, where reference frames are selected at a fixed rate. A threshold for the reference frame rate can be defined, which, when exceeded, triggers a switch back to a fixed-rate scheme. Specifically, if the rate of occurrence of the event is higher than a predefined threshold in a given time interval, reference frames can be selected in a fixed-rate mode for subsequent time intervals or until the detected event rate falls below the threshold. A data field indicating the frame-mode selection scheme used can be encoded at the start of the interval. This ensures that the vehicle-driven video compression algorithm described in this section does not deteriorate the compression ratio.
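This safeguard can be expressed as a small state machine over fixed evaluation windows, sketched below; the window length, event threshold, and fixed GOP size are illustrative values rather than ones prescribed by the paper.

```python
def select_modes(event_frames, n_frames, interval=900, max_events=10, fixed_gop=60):
    """Choose a frame mode for every frame index.

    event_frames : set of frame indices flagged as reference frames by the
                   vehicle-driven rule (entering/exiting the virtual sensor)
    interval     : evaluation window in frames (900 frames = 30 s at 30 fps)
    max_events   : if more events than this occur in a window, the next window
                   falls back to one I-frame every fixed_gop frames
    """
    modes, fallback = [], False
    for start in range(0, n_frames, interval):
        frames = range(start, min(start + interval, n_frames))
        for f in frames:
            if fallback:
                modes.append("I" if (f - start) % fixed_gop == 0 else "P")
            else:
                modes.append("I" if f in event_frames else "P")
        # decide the policy for the next interval from this interval's event rate
        fallback = sum(1 for f in frames if f in event_frames) > max_events
    return modes
```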

Experimental validation

The method for vehicle search in large video databases was tested on three video sequences taken with a commercially available surveillance camera. The videos have a resolution of 720×1280 pixels and a frame rate of 30 fps. The videos were captured on a local road with two-way traffic and an average speed of 35 mph. Each video was captured over 20 to 30 min and included over 50 vehicles (including motorcycles, trucks, cars, bicycles, buses, etc.) as reported in Table 2. This setting corresponds to a light traffic volume condition. Note that the traffic volume (i.e., the number of vehicles passing through) influences the number of reference frames but should not affect the accuracy of the proposed method. The accuracy is most likely determined by robustness against spurious motion in the scene. We, therefore, captured the videos on a partly cloudy and windy day with numerous sources of spurious motion due to swaying trees, shadows, and clouds from the three different vantage points shown in Fig. 7.

Table 2. Number of frames and vehicles passing through the scene for each test video.

Figure 7. The viewing angles for the three test videos: (a) test video 1, (b) test video 2, and (c) test video 3.

The test videos were first converted from color to gray scale, as near-infrared cameras commonly used in transportation applications can only generate grayscale videos. The videos were then scaled down to 176×144 pixels for faster frame mode selection. The full-resolution frame can be used for compression once a mode is selected for the frame. The video was then compressed with MPEG-4 and decompressed using the proposed decompression algorithm, which decodes only the reference frames in the compressed video sequence.

Ideally, the method should assign only one I-frame per vehicle traversing the scene. In order to evaluate the performance of the proposed method, we consider two error events associated with its execution. One type of error occurs when no I-frame is assigned for a vehicle passing through the scene, which we label a “miss.” A second error type occurs either if multiple I-frames are assigned for the same vehicle or if an I-frame is assigned to a frame when there is no vehicle in the scene, which we label a “false alarm.” Probabilities associated with these errors are defined as follows:

$$\Pr(\text{miss})=\frac{\text{Number of vehicles missed}}{\text{Total number of vehicles}},\qquad(3)$$

$$\Pr(\text{false alarm})=\frac{\text{Number of false alarms}}{\text{Total number of frames}}.\qquad(4)$$

In our experiments, we defined a virtual line in the middle of the scene image as shown in Figs. 6(b) and 6(d) and calculated the motion vectors for each frame. The performance was tested for three different motion block configurations: 32×32 pixel overlapping blocks with an overlap of 16 pixels; 32×32 pixel nonoverlapping blocks; and 16×16 pixel overlapping blocks with an overlap of 8 pixels. The number of error events for each video sequence and motion block configuration is reported in Table 3. The corresponding error probabilities are calculated and listed in Table 4. Note that using smaller motion blocks leads to a smaller Pr(miss), but it also increases the false alarm rate Pr(false alarm), as shown in the table.

Table 3. Number of occurrences of the two error events (i.e., misses and false alarms) for three different motion block configurations.

Table 4. Frequencies of occurrence of the two error events, namely misses and false alarms, for three different motion block configurations.

We next compared the performance of the proposed method with that of the conventional fixed rate I-frame selection. Conservatively, we counted the presence of a vehicle as accurately detected by the conventional approach when the corresponding decoded frame includes any portion of the vehicle in the scene. We evaluated the performance of traditional compression algorithms across three different reference frame selection rates: once every 30, 60, and 90 frames. For a typical 30-fps video, these correspond to selecting an I-frame at every 1, 2, and 3 s, respectively. The results are listed in Table 5. The classical approach with a fixed rate of one reference frame every 30 frames includes all the vehicles in the decoded frames, but the number of frames decoded in this case is 4855, which is almost 17 times larger than the number of decoded frames with the proposed algorithm. Decreasing the reference frame rate in the conventional approach to one in 60 or 90 not only decreases the number of decoded frames but also increases the number of vehicles missed in the decoded frames. For example, for the rate of one reference frame in 90 frames, the total number of vehicles missed is 120, which is 60% of the total number of vehicles traversing the scene across the full length of the videos. Our algorithm narrows down the search space significantly relative to the conventional approach while minimizing the number of vehicles missed in the decoded frames. Also, note that the image of a vehicle in a reference frame with the conventional approach will typically not be as ideal a view as those selected via the proposed method, due to the reference frame selection being performed when a vehicle is optimally positioned.

Table 5. Performance of the classical approach, where the reference frames are selected at a fixed rate.

Vehicle Counting

Automated vehicle counting is an important tool for traffic volume studies.29 Data derived from these studies can help local governments estimate road usage, volume trends, critical flow time periods, and optimal maintenance schedules, as well as optimal traffic enforcement time periods.6–10 Real-time traffic flow data can also enable efficient incident management, which consists of incident detection, verification, and response. The past decade or so has also seen increasing interest from retailers in understanding, managing, and capitalizing on the traffic trends of their customers.

Algorithm

The number of vehicles can be easily determined by using the algorithm described in the preceding section, which identifies the frames when a vehicle enters or exits a virtual sensor. In this section, we describe two alternative methods for vehicle counting by analyzing the stability and coherence of motion vectors on a virtual sensor. Figure 8 illustrates how knowledge of motion vectors for a given target frame can be used in conjunction with the location of the target virtual region in order to trigger a vehicle counter. Figures 8(a) and 8(b) show two adjacent frames within one of the test video sequences (with a spatial resolution of 720×1280 pixels). Figure 8(c) shows the corresponding active 32×32 pixel motion vectors obtained using T1=8. Superimposed on all three figures are two sample target virtual regions: one virtual line that traverses the road, depicted in green, and one virtual polygon for the lane on which the car is traveling, depicted in red. The existence of a vehicle passing through can be detected from the motion vectors traversing the virtual line or from the motion vectors detected within the virtual polygon.

Figure 8. Two sample adjacent (a) reference and (b) target frames along with the resulting (c) motion vector field. A virtual line and polygon are also depicted.

Counting vehicles using a virtual line

As a vehicle moves across the virtual line, a number of active motion vectors will overlap or intersect the virtual line. In order to avoid false positives due to active motion vectors produced by apparent motion of objects different than vehicles, two thresholds are set: a threshold N1 that defines the smallest number of active motion vectors that overlap a virtual line before a vehicle count can be triggered and a threshold N2 that defines the smallest number of consecutive frames on which at least N1 active motion vectors overlap a virtual line before a vehicle count can be triggered. The value of N1 will typically depend on the geometry of the camera setup, the resolution of the video sequence, the size of the vehicle class to be counted, as well as on the size of the blocks used in the motion estimation algorithm. For example, for a 720×1280 pixel video sequence and 32×32 pixel motion vectors, a reasonable threshold to use is N1=4 to count passenger cars. The value of N2 will depend on the value of N1, the geometry of the camera setup, the frame rate, and the average speed of the road being monitored. For a frame rate of 30 fps and the specifications given above with a target vehicle speed of 35 mph, a reasonable threshold to use is N2=6. A vehicle count will be triggered on the first frame in which N1 active motion vectors intersect the virtual line after at least N2 consecutive frames of at least N1 active motion vectors intersecting the virtual line.
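A compact sketch of the counting rule follows, assuming the number of active motion vectors intersecting the virtual line has already been computed for each frame; it triggers one count per sustained run of at least N2 frames, which is one reasonable reading of the trigger condition described above.

```python
def count_vehicles_on_line(per_frame_line_hits, n1=4, n2=6):
    """Count vehicles from the number of active motion vectors intersecting
    the virtual line in each frame.  A count is triggered once per sustained
    run: the first frame at which at least n1 active vectors have intersected
    the line for n2 consecutive frames."""
    count, run = 0, 0
    for hits in per_frame_line_hits:      # one integer per frame
        if hits >= n1:
            run += 1
            if run == n2:                 # the run just became long enough
                count += 1
        else:
            run = 0
    return count
```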

Counting vehicles using a virtual polygon

In the case where a moving vehicle is detected using a virtual polygon, a number of active motion vectors are located inside the polygon as a vehicle moves across the virtual polygon. Similarly, two thresholds are set to avoid false positives due to active motion vectors produced by apparent motion of objects other than vehicles. A threshold N3 defines the smallest number of active motion vectors inside the virtual polygon before a vehicle count can be triggered, and a threshold N4 defines the smallest number of consecutive frames on which at least N3 active motion vectors are inside the virtual polygon before a vehicle count can be triggered. The value of N3 depends on camera/viewing geometry, polygon size, video resolution, and block size. For a video with specifications described above, and the virtual polygon of Fig. 8, a reasonable threshold is N3=16. The value of N4 depends on N3, frame rate, and average speed of the road being monitored. For the parameters described above, a reasonable threshold to use is N4=2. A vehicle count is triggered on the first frame in which N3 active motion vectors are located inside the virtual polygon after at least N4 consecutive frames of at least N3 active motion vectors are located inside the virtual polygon.
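The polygon variant differs from the line variant only in the membership test. A sketch using matplotlib's point-in-polygon routine (an implementation convenience, not part of the method) is shown below, assuming the centers of the active motion blocks are available in pixel coordinates for each frame.

```python
import numpy as np
from matplotlib.path import Path

def count_vehicles_in_polygon(frames_active_centers, polygon, n3=16, n4=2):
    """frames_active_centers: per frame, an (k, 2) array of (x, y) centers of
    active motion blocks.  polygon: list of (x, y) vertices of the virtual
    polygon.  A count is triggered once per sustained run of n4 frames with
    at least n3 active blocks inside the polygon."""
    poly = Path(polygon)
    count, run = 0, 0
    for centers in frames_active_centers:
        inside = poly.contains_points(centers).sum() if len(centers) else 0
        if inside >= n3:
            run += 1
            if run == n4:
                count += 1
        else:
            run = 0
    return count
```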

Experimental validation
Robustness against camera configuration and spurious sources of motion

We first tested the robustness of the proposed method against variations of camera configuration and spurious motion in the scene. We used the video sequences shown in Fig. 7, which were captured on a partly cloudy and windy day from three different vantage points. The captured videos had numerous sources of spurious motion due to swaying trees, shadows, and clouds.

We calculated the compression motion vectors using the block-matching algorithm described in Sec. 2.1. In order to stray as little as possible from the typical implementations of MPEG-4 and H.264, the motion estimation block size was set at 16×16 pixels, which is the recommended basic unit size for motion-compensated prediction in a number of important coding standards. The block size choice determines, among other performance parameters, the smallest vehicle-to-vehicle distance the algorithm is able to resolve: a block size of m×n pixels in an implementation with a horizontal virtual line will render the algorithm incapable of distinguishing between vehicles that are fewer than m+1 pixels apart as they cross the virtual line. For each video, two virtual lines were drawn for each lane as shown in Fig. 7. We set the threshold T1=7 for all videos and set N1 and N2 between 4 and 6 depending on the camera configuration.

The performance of the proposed method is shown in Table 6. (Note that in our experiment set we exclude bicycles, jogging people, and motorcycles since they may not be desired in a traffic flow study.) The false alarms were mainly due to the camera capturing a spurious source of motion such as the shadow of a large cloud. The robustness of the algorithm to these types of unwanted motion can be improved by using a combination of multiple virtual areas. Most misses occur for vehicles driving close to the middle of the road instead of driving in the right or left lane.

Table 6. Performance of the proposed method on videos captured on a local road from three different vantage points with various spurious sources of motion.

It is also worth noting that the algorithm was tested in a scenario of light to medium traffic volume in which the smallest distance between adjacent vehicles was long enough to be resolved for the given block size. Since motion blocks are typically no smaller than 8×8 pixels, the algorithm will have limited applicability in cases where the geometry of the camera setup and the traffic conditions of the road being supervised result in vehicle-to-vehicle distances smaller than 8 pixels. This situation will lead to missed detections.

Extensibility to other settings

We next tested the proposed method on publicly available videos to evaluate its versatility and extensibility to other settings. For this purpose, we downloaded three test videos from a publicly available video sharing website. (The videos were downloaded from the video sharing website YouTube.) The videos illustrate vehicle counting in highway, garage, and one-way road settings as shown in Fig. 9. The spatial resolutions of the highway, garage, and one-way road videos are 480×360, 550×360, and 450×360 pixels, respectively.

Figure 9. Publicly available test videos captured in different settings.

From the downloaded videos, we first calculated the compression motion vectors. In order to capture the smallest vehicle-to-vehicle distance possible, we set the block size to 8×8 pixels. From the calculated motion vectors, vehicles are detected using a virtual line. The threshold T1 is set to 3, and the parameters N1 and N2 of the algorithm are set to values between 3 and 6 depending on the video and the camera configuration.

The performance of the method on publicly available videos is shown in Table 7. In all cases, the performance of the method is better than 90% in terms of Pr(miss) and Pr(false alarm). Note that these public videos are typically subsampled before being shared on the website. Even in these subsampled videos, the proposed method achieves acceptable performance in terms of vehicle counting.

Table 7. Performance of the proposed method on publicly available videos captured in three different settings.

Red Light (Stop Sign) Law Enforcement

Red light cameras are one of the more widely deployed transportation imaging technologies across the United States and internationally. Beyond what their name indicates, existing red light cameras are actually multimodular systems, typically composed of three modules: (1) a vehicle detector module, (2) a traffic surveillance camera, and (3) a controller or processing unit. In a single-pole solution, on the other hand, an IP-enabled traffic surveillance camera may serve as a video capture and transmission device in addition to performing intersection monitoring. When coupled with an intelligent algorithm, these cameras can perform red light/stop sign enforcement without requiring the other modules.

Algorithm

We next describe a method for single-pole, video-based stop sign and red light enforcement that can operate within the compressed video stream. The method uses motion vectors associated with video compression, following the general framework of Fig. 3. The method first detects a vehicle in the region of interest using the motion vectors. The vehicle detection is performed by defining a virtual polygon on the image plane and analyzing the motion vectors within the virtual polygon as described in the preceding section.

Monitoring for vehicle stoppage begins once a vehicle is detected. In the context of this application, vehicle stoppage is defined as a sustained absence of motion (where the terms sustained and absence of motion are defined below in terms of numbers of video frames and active motion blocks) in the target virtual area across a predetermined number of frames F0 (or, equivalently, for a predetermined length of time equal to F0/frame rate) after a vehicle has been detected. The value of F0 depends on video frame rate and typically varies across jurisdictions depending on local traffic law. Since absence of motion vectors can also mean the vehicle has left the target virtual area, resumption of motion has to follow vehicle stoppage.

In the case of a red-light monitoring camera, this monitoring stage is only activated when the traffic light is red. This requires a means of communication between the traffic light controller and the camera performing the monitoring. In cases where multiple lanes are controlled by different traffic lights (e.g., there is a left-turn-only lane controlled by a separate light), monitoring across each virtual target area would be independently activated depending on the state of the respective traffic light. Also, while some countries allow lawful turns on red, the law typically stipulates that prior to turning when the traffic light is red, the vehicle must come to a complete stop, so no special treatment for these situations is required.

Stoppage detection with virtual polygons relies on the assumption that the motion of a real object is smooth. Once a vehicle in motion has been detected, the connected cluster of associated motion vectors is tracked across time as it moves along the polygon. If the overlap between active motion clusters in adjacent frames is high, it is assumed that the motion clusters in both frames correspond to the same object. In this manner, multiple vehicles can be reliably monitored at the same time. Vehicle stoppage is detected when an active motion cluster that is being tracked and that is not touching the boundaries of the virtual polygon becomes inactive for a number of frames F0. Specifically, if a cluster that is being tracked goes from having at least N1 active motion blocks to having at most N1′ active motion blocks (where N1′≤N1) and the inactivity status persists for over F0 frames, a stoppage event is detected. Motion resumption is verified by comparing the overlap between the boundaries of the last active motion cluster and the new motion cluster produced by the motion of the vehicle. Tracking of an active cluster is discontinued when the cluster touches the boundary of the polygon that is closest to the stop line. If a vehicle traverses the virtual polygon without triggering a stoppage event, a violation event is triggered. The shape of the virtual polygon may be defined such that it enables enforcement of specific exit zones; for example, while it is permitted for a turning vehicle to resume motion and exit the virtual area when the light is red, the same exception does not apply to vehicles that are not turning. A violation event can be triggered if a vehicle exits the virtual area through an unauthorized exit zone, for example, by turning left or continuing straight when the traffic light is red. Conversely, no violation event is triggered if a vehicle that stopped exits the virtual area through an authorized exit zone.
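The core of the stoppage test, reduced to a single tracked vehicle, can be sketched as follows; the cluster tracking, boundary conditions, and exit-zone handling described above are omitted, and the parameter names mirror the thresholds used in this section.

```python
def stopped_in_polygon(active_blocks_per_frame, f0=15, n_stopped=2):
    """Simplified single-vehicle sketch of the stop enforcement logic.

    active_blocks_per_frame : number of active motion blocks of the tracked
                              cluster inside the virtual polygon, one value per
                              frame from vehicle detection to polygon exit.
    Returns True if the cluster remained (nearly) inactive for at least f0
    consecutive frames before leaving the polygon, False otherwise.
    """
    still_frames = 0
    for n in active_blocks_per_frame:
        if n <= n_stopped:              # cluster has gone (nearly) inactive
            still_frames += 1
            if still_frames >= f0:      # sustained absence of motion
                return True             # stop detected
        else:                           # cluster moving again
            still_frames = 0
    return False                        # polygon traversed without stopping

# Usage: a violation is flagged when no stop was detected.
# violation = not stopped_in_polygon(cluster_history, f0=15)
```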

Experimental validation

We tested the stop sign/red light law enforcement algorithm on a video sequence acquired at a local road, with a total length of 10 min (18.5 k frames at 30 fps). Figure 10 shows the scene and the virtual polygon indicated with a red box. We set the motion estimation block size at 16×16 pixels as recommended by a number of coding standards. The algorithm parameters were set as follows: N1=2, N1′=2, N2=1, and F0=15 frames (that is, the required stoppage time is 0.5 s).

Figure 10. Two vehicles traversing the virtual target area simultaneously.

The video contained a total of six cars in transit through the monitored traffic lane, which has a stop sign. The algorithm correctly identified all six vehicles that traversed the target area and correctly detected the violations incurred by four of the six vehicles. While it should be noted that the target area could capture vehicles moving in a different direction, detection and monitoring of those vehicles were automatically avoided by using information contained in the orientation of the motion vectors. Another source of error is confusion between multiple vehicles when those vehicles are transiting the target area simultaneously, as illustrated in Fig. 10. This situation was handled satisfactorily by the algorithm.

Note that while the camera setup is adequate for stoppage monitoring, a preferred embodiment would entail the surveillance camera having a frontal, rear, or oblique view of incoming traffic, so that tasks such as automatic license plate recognition can be performed. The vehicle detection and stoppage monitoring modules would require little modification to function adequately in that configuration.

Vehicle Speed Estimation

Excessive vehicle speed is considered a major contributing factor to vehicle crashes,30,31 which in turn result in significant bodily injury or death and monetary loss. It is estimated that 22% and 34% of passenger car and motorcycle fatalities, respectively, were caused by speeding in the United States in 2005 alone; the economic impact of excessive speed-related crashes is estimated at $40.4 billion annually.31 In addition to accidental injury to life and property, high vehicle speeds have a negative impact on the environment. Hydrocarbon and nitrogen oxide emissions increase with speed. Carbon monoxide and particulate matter have the lowest emission levels at medium speeds.32 Vehicle speed measurement is a core requirement for speed limit enforcement and is also useful in optimizing speed management strategies, real-time traffic control, and traveler information systems.33

Traditional monocular vision-based speed estimation algorithms rely on vehicle detection and feature tracking34,35 to measure distance traveled across image frames and estimate speed based on the knowledge of the video frame rate, typically accompanied by camera calibration data36,37 that map image pixel positions to real-world coordinates. The processing associated with these systems is typically computationally expensive and presents a challenge to achieving real-time performance.

Algorithm

The process of estimating vehicular speed from compression motion vectors involves four steps, namely vehicle detection, vehicle feature tracking, frame-to-frame speed estimation, and average speed estimation. Vehicle detection using motion vectors is described in Secs. 3.1 and 3.2. In the context of speed estimation, Fig. 11 illustrates the use of motion vectors for a given target frame in conjunction with the location of the target virtual area in order to trigger a vehicle detection event. Figures 11(a) and 11(b) show two 1728×2304 pixel adjacent frames within one of the test video sequences. Figure 11(c) shows the corresponding 16×16 pixel active motion vector field after thresholding the resulting motion vectors with T1=9.

Figure 11. Two sample adjacent (a) reference and (b) target frames along with the resulting (c) motion vector field with active motion vectors only.

Once a vehicle has been detected, a salient feature from its associated active vector cluster is tracked across frames. Tracking the spatial location of the selected feature across time will enable speed estimation. The choice of the feature being tracked may have a significant impact on the accuracy of the speed estimates. Tracking a feature that is as close to the ground as possible is beneficial, otherwise the effect of feature height has to be taken into account.38 While specific knowledge about the characteristics of the vehicle is not available without performing full frame decompression, the spatial characteristics of the active motion clusters associated with a car in motion provide cues regarding the appearance of the vehicle. For example, approximate information regarding the location of the roofline and tires of the car can be extracted from the active motion vector field from Fig. 11(c) (albeit at a lower resolution than the native resolution of the video frames due to the subsampling effects of the motion vector computation from block matching). A binary template with the approximate appearance of a tire in the active motion vector field space was created and correlated with the active motion blob. The location at which the correlation reaches its largest value is assumed to be the approximate location of the tire. This operation is extremely fast because it is performed in a subsampled binary space. However, its accuracy is limited by the size of the motion blocks. The smaller the motion blocks, the (potentially) better the localization accuracy. At each motion field, the output of this stage is the estimated location of the rear tire of the vehicle.
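One way to realize the binary template correlation is sketched below; the template shape is a hypothetical example, since the actual template depends on the camera geometry and the motion block size, and scipy is used purely for convenience.

```python
import numpy as np
from scipy.signal import correlate2d

def locate_tire(active, template):
    """Correlate a small binary tire template with the binary active-block map
    and return the (row, col) of the best match in motion-block coordinates."""
    # work in +/-1 so that inactive blocks also contribute evidence
    score = correlate2d(np.where(active, 1.0, -1.0),
                        np.where(template, 1.0, -1.0), mode="same")
    return np.unravel_index(np.argmax(score), score.shape)

# Hypothetical 2x3-block template approximating a rear tire at the bottom of
# the active blob; the real shape would be chosen for the specific setup.
tire_template = np.array([[1, 1, 1],
                          [1, 1, 0]], dtype=bool)
```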

Once the approximate location of the rear tire is determined, the temporal progress of the feature is tracked to determine instantaneous (frame-to-frame) and average speed estimates. Note, however, that the spatial location of the feature being tracked is typically computed in terms of pixels. In order to convert pixel locations to real-world coordinates, a calibration procedure is employed.39 This calibration process is implemented in the form of a projective transformation that converts pixel coordinates to real-world coordinates given an assumed feature height above the road. The average speed of the vehicle can be estimated from the instantaneous speed estimates through nonlinear filtering.
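A sketch of the pixel-to-road conversion and the instantaneous speed computation is given below; the 3×3 homography H is assumed to come from the calibration procedure cited above, and the median is used here only as one example of a robust nonlinear filter for the average speed.

```python
import numpy as np

def to_road_coords(pt_px, H):
    """Map a pixel location to road-plane coordinates (e.g., meters) using a
    3x3 projective transformation H obtained from camera calibration."""
    x, y, w = H @ np.array([pt_px[0], pt_px[1], 1.0])
    return np.array([x / w, y / w])

def instantaneous_speeds(track_px, H, fps=30.0):
    """track_px: per-frame pixel locations of the tracked feature.
    Returns frame-to-frame speed estimates in road units per second."""
    pts = np.array([to_road_coords(p, H) for p in track_px])
    dists = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return dists * fps

# Average speed via a simple robust (nonlinear) filter, here the median:
# avg_speed = float(np.median(instantaneous_speeds(track, H)))
```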

Note that the proposed speed estimation algorithm can be directly used in several of the enforcement applications. For applications such as high-accuracy law enforcement, the present method can be used as a prefilter for selection of candidate vehicles for more accurate and computationally intensive processing.

Experimental validation

The proposed algorithm was tested on video sequences of 140 vehicles traveling at speeds between 30 and 60 mph. The video was acquired at the Xerox Services test facilities in Crofton, Maryland, and had a spatial resolution of 1728×2304 pixels (4 Mpixel) and a frame rate of 30 fps. In the estimation of motion vectors, the motion estimation block size was set at 16×16 pixels with a search range of ±8 pixels along each direction. The proposed algorithm for vehicle speed estimation was used to estimate the speed of the test vehicles, and its accuracy was determined by comparing the algorithm output to ground truth speed estimates acquired with a Vitronic LIDAR PoliScan Speed Enforcement System with an advertised 1% speed measurement accuracy.

Three target speeds representative of typical U.S. road speed limits were considered: 30, 45, and 60 mph. The test vehicles were instructed to travel at approximately their assigned target speed. Fifty vehicles were assigned to each target speed. Speed estimation error was measured as

$$\text{Speed estimation error}=\frac{|V_E-V_T|}{V_T},\qquad(5)$$
where |·| denotes the absolute value operator, VT is the ground truth speed as measured with the LIDAR device, and VE is the speed estimated by the proposed algorithm. It can be seen that, as expected, the accuracy of the algorithm declines with the speed of the vehicle. This is because as the vehicle travels faster, it traverses the scene in fewer frames, which results in fewer instantaneous speed estimates. This makes the average speed estimation process more susceptible to noise and to outliers. Note that one way to achieve improved speed estimation accuracy is to increase the frame rate of the video camera, as that would result in a larger number of instantaneous measurements for a given camera configuration and vehicle speed. Figure 12 shows a scatter plot illustrating the relationship between the estimated and the ground truth data, and highlights the negative effect of the outliers in the statistical data.

Figure 12. Scatter plot of estimated versus ground truth speed.

Since outlier vehicles are easily identified from the small number of available instantaneous speed estimation data points, it is reasonable to expect the performance of the algorithm to increase as outlier data are discarded. The amount of discarded data is measured by the yield of the process: the yield is the percentage of vehicles for which speed data are reported. For a typical algorithm, the lower the yield, the higher (lower) the accuracy (estimation error), as long as the process by which the outlier data points are discarded is reasonable.

Figure 13 shows a receiver operating characteristic curve that describes the relationship between the accuracy and yield of the algorithm. The yield was controlled by discarding data from vehicles with the fewest instantaneous speed measurements. To this end, the vehicles were rank-ordered according to the number of instantaneous speed estimates available. To obtain a yield of q, the data corresponding to the fraction of 1−q vehicles with the fewest instantaneous speed observations were discarded. As the yield decreases from 100%, both the mean and the 95th percentile of the estimation error decrease.

Figure 13. Receiver operating characteristic curve of the proposed vehicle speed estimation algorithm.

The framework we presented is an effective solution for efficient mining of compressed video data stored in large databases. Our method effectively utilizes compression motion vectors to enable different surveillance applications for vehicle/incident search in the compressed domain without fully decompressing the video data. The framework also allows building the applications directly into the compression step inside the camera to achieve real-time performance in multifunction surveillance cameras. The proposed method is compatible with the state-of-the-art compression techniques such as MPEG4 and H.264 in the sense that the motion vectors used by the algorithms are the ones calculated in the standard video compression techniques.

Several video processing algorithms have been previously proposed for the applications presented in this paper.6–8,34,35 These algorithms can typically be implemented within a surveillance camera to process the acquired videos. Our algorithms share processing steps similar to those of these methods. For example, considering the vehicular speed estimation algorithm, the steps involved are standard, namely vehicle detection, vehicle feature tracking, frame-to-frame speed estimation, and average speed estimation. The main advantage of our algorithms over previously proposed methods is in terms of computational complexity when integrated within the compression unit of a surveillance camera. Starting from the premise that motion-compensated video compression can be performed in real time, and is easily embedded in the hardware of commercially available IP cameras, the implementation of our algorithms described in this paper would require a negligible amount of additional computational software and hardware on top of what is required for video compression. All operations involved in our algorithms are performed in a spatially subsampled image space, with a subsampling rate equal to the dimensions of the motion blocks, that is also binary because only motion blocks whose motion vector lengths are greater than prespecified thresholds are taken into account. The operations involved are computationally efficient morphological operations (e.g., erosions, dilations, closings, etc.) and binary template correlations. In terms of storage and memory requirements, and as opposed to traditional tracking algorithms, no multiframe buffering is required in our case since the tracking can be performed from an array of integer numbers containing the history of the locations of the vehicle feature being tracked. Also note that our methods are easily adaptable to surveillance cameras because compression is inherent to video cameras due to bandwidth and storage limitations. A drawback of performing video processing from the compression motion vectors is that detailed information (e.g., license plate, vehicle color, etc.) cannot be extracted from the motion vectors. This, however, does not represent a major drawback because detailed information can be extracted in an offline process from the frames of interest, which can be labeled by processing the compression motion vectors in the online phase.

Wikipedia, “Intelligent transportation systems,” http://en.wikipedia.org/wiki/Intelligent_transportation_system (March 2013).
Goldenbeld  C., Van Schagen  I., “The effects of speed enforcement with mobile radar on speed and accidents: an evaluation study on rural roads in the Dutch province Friesland,” Accid. Anal. Prev.. 37, (6 ), 1135 –1144 (2005). 0001-4575 CrossRef
Bourn  J., “Tackling congestion by making better use of England’s motorways and trunk roads,” National Audit Office, the United Kingdom, 2006, http://www.nao.org.uk/wp-content/uploads/2004/11/040515.pdf (March 2013).
U.S. Department of Transportation Federal Highway Administration, “Surveillance cameras in transportation systems,” http://www.fhwa.dot.gov/policyinformation/pubs/vdstits2007/05.cfm (March 2013).
Tseng  B., Lin  C., Smith  J., “Real-time video surveillance for traffic monitoring using virtual line analysis,”  IEEE Int. Conf. on Multimedia, and Expo , Vol. 2, pp. 541 –544,  IEEE ,  Lausanne, Switzerland  (2002).
Pang  C., Lam  W., Yung  N. H. C., “A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images,” IEEE Trans. Intell. Transp. Syst.. 8, (3 ), 441 –459 (2007). 1524-9050 CrossRef
Haag  M., Nagel  H.-H., “Incremental recognition of traffic situations from video image sequences,” Image Vis. Comput.. 18, (2 ), 137 –153 (2000). 0262-8856 CrossRef
de Oliveira  A., Scharcanski  J., “Vehicle counting and trajectory detection based on particle filtering,” in  23rd SIBGRAPI Conf. on Graphics, Patterns and Images , pp. 376 –383,  IEEE ,  Gramado, Brazil  (2010).
Bas  E., Tekalp  A., Salman  F., “Automatic vehicle counting from video for traffic flow analysis,” in  IEEE Intelligent Vehicles Symp. , pp. 392 –397,  IEEE ,  Istanbul, Turkey  (2007).
Fishbain  B. et al., “Real-time vision-based traffic flow measurements and incident detection,” Proc. SPIE. 7244, , 72440I  (2009). 0277-786X CrossRef
Chao  T.-H., Lau  B., Park  Y., “Vehicle detection and classification in shadowy traffic images using wavelets and neural networks,” Proc. SPIE. 2902, , 136 –147 (1997). 0277-786X CrossRef
Shi  P., Jones  E. G., Zhu  Q., “Median model for background subtraction in intelligent transportation system,” Proc. SPIE. 5298, , 168 –176 (2004). 0277-786X CrossRef
Tan  X., Li  J., Liu  W., “Approach for counting vehicles in congested traffic flow,” Proc. SPIE. 5671, , 228 –236 (2005). 0277-786X CrossRef
Kan  W. Y., Krogmeier  J. V., Doerschuk  P. C., “Hidden Markov model for the detection and tracking of highway vehicles in image sequences,” Proc. SPIE. 2847, , 234 –242 (1996). 0277-786X CrossRef
Bowen  F. R. et al., “Dynamic content based vehicle tracking and traffic monitoring system,” Proc. SPIE. 6497, , 64970I  (2007). 0277-786X CrossRef
Bergendahl  J., Masaki  I., Horn  B. K., “Three-camera stereo vision for intelligent transportation systems,” Proc. SPIE. 2902, , 42 –51 (1997). 0277-786X CrossRef
Bulan  O. et al., “Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases,” Proc. SPIE. 8663, , 86630Q  (2013). 0277-786X CrossRef
Gharai  L. et al., “Experiences with high definition interactive video conferencing,” in  Proceedings of IEEE Int. Conf. on Multimedia and Expo , pp. 433 –436,  IEEE ,  Toronto, Canada  (2006).
Richardson  I., H. 264 and MPEG-4 Video Compression. ,  Wiley Online Library  (2003).
Puri  A., Chen  X., Luthra  A., “Video coding using the H. 264/MPEG-4 AVC compression standard,” Signal Process. Image Commun.. 19, (9 ), 793 –849 (2004). 0923-5965 CrossRef
“ISO/IEC 14496-2:2004—information technology—coding of audio-visual objects—part 2: Visual,” 2004, http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=61988 ( September 2013).
Huang  Y., Zhuang  X., “Motion-partitioned adaptive block matching for video compression,” in  Proc. of IEEE Int. Conf. on Image Processing , Vol. 1, pp. 554 –557,  IEEE ,  Washington, DC  (1995).
Horn  B., Schunck  B., “Determining optical flow,” Artif. Intell.. 17, (1–3 ), 185 –203 (1981). 0004-3702 CrossRef
Lucas  B. et al., “An iterative image registration technique with an application to stereo vision,” in  Proc. of the Int. Joint Conf. on Artificial Intelligence , Vol. 3, pp. 674 –679,  IJCAI ,  Vancouver, British Columbia  (1981).
Fadzil  M. A., Dennis  T., “A hierarchical motion estimator for interframe coding,” in  IEEE Colloquium on Applications of Motion Compensation , pp. 3 –11,  IEEE ,  London, UK  (1990).
Liu  B., Zaccarin  A., “New fast algorithms for the estimation of block motion vectors,” IEEE Trans. Circuits Syst. Video Technol.. 3, (2 ), 148 –157 (1993). 1051-8215 CrossRef
Wikipedia, “Amber-Alert,” http://en.wikipedia.org/wiki/AMBER_Alert (January 2013).
Tchrakian  T. T., Basu  B., Mahony  M. O., “Real-time traffic flow forecasting using spectral analysis,” IEEE Trans. Intell. Transp. Syst.. 13, (2 ), 519 –526 (2012). 1524-9050 CrossRef
Rodier  C. J., Shaheen  S. A., Cavanagh  E., “Automated speed enforcement in the US: a review of the literature on benefits and barriers to implementation,” Research Report UCD-ITS-RR-07-17 in Institute of Transportation Studies, University of California, Davis (2007).
NHTSA, “Traffic safety facts, 2005 speeding data,” DOT HS810629, NHTSA’s National Center for Statistical Analysis. ,  Washington, DC , http://www-nrd.nhtsa.dot.gov/Pubs/810623.pdf (2006).
Kallberg  V.-P. et al., “Recommendations for speed management on European roads,” in  Proc. of 78th Annual Meeting of the Transportation Research Board , pp. 1 –12,  National Research Council ,  Washington, DC  (1999).
Wang  Y., Nihan  N. L., “Freeway traffic speed estimation with single-loop outputs,” Transp. Res. Rec.. 1727, (-1 ), 120 –126 (2000). 0361-1981 CrossRef
Wu  J. et al., “An algorithm for automatic vehicle speed detection using video camera,” in  Proc. 4th Int. Conf. on Computer Science & Education , pp. 193 –196,  IEEE ,  Nanning, China  (2009).
Wimalaratna  L., Sonnadara  D., “Estimation of the speeds of moving vehicles from video sequences,” in  Proc. of the Technical Sessions, Institute of Physics, Sri Lanka , Vol. 24, pp. 6 –12 (2008).
Pumrin  S., Dailey  D., “Roadside camera motion detection for automated speed measurement,” in  Proc. of the IEEE 5th Int. Conf. on Intelligent Transportation Systems , pp. 147 –151,  IEEE ,  Singapore  (2002).
Schoepflin  T. N., Dailey  D. J., “Dynamic camera calibration of roadside traffic management cameras for vehicle speed estimation,” IEEE Trans. Intell. Transp. Syst.. 4, (2 ), 90 –98 (2003). 1524-9050 CrossRef
Rad  A. G., Dehghani  A., Karim  M. R., “Vehicle speed detection in video image sequences using CVS method,” Int. J. Phys. Sci.. 5, (17 ), 2555 –2563 (2010). 1992-1950 
Kanhere  N. K., Birchfield  S. T., “A taxonomy and analysis of camera calibration methods for traffic monitoring applications,” IEEE Trans. Intell. Transp. Syst.. 11, (2 ), 441 –452 (2010). 1524-9050 CrossRef

Grahic Jump LocationImage not available.

Orhan Bulan received his BS degree with high honors in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2006 and his MS and PhD degrees in electrical and computer engineering from the University of Rochester, New York, in 2007 and 2012, respectively. He is currently a postdoctoral fellow at the Xerox Research Center in Webster, New York, where he was a research intern during the summers of 2009, 2010, and 2011. He is the recipient of the best student paper award at the 2008 Western New York Image Processing Workshop organized by the Rochester Chapter of the IEEE Signal Processing Society. His recent research interests include signal/image processing, video processing, computer vision, and machine learning. He has four issued patents and over 10 pending patent applications in these areas.


Edgar A. Bernal is a senior research scientist at the Xerox Research Center in Webster. He joined Xerox in 2006 with MSc and PhD degrees in electrical engineering from Purdue University, West Lafayette, Indiana. His earlier career was focused on the areas of image processing, halftoning, image perception, watermarking, and color theory. His current research activities include computer vision, video compression, video-based object tracking, machine learning for financial data analytics, and the application of novel sensing technologies to healthcare and transportation. He has multiple papers and patents in areas related to his current and past research interests. He is a senior member of IEEE and serves as the vice-chair of the Rochester chapter of the IEEE Signal Processing Society. He also serves as an adjunct faculty member at the Rochester Institute of Technology, Center for Imaging Science and is a frequent reviewer for IEEE Transactions on Image Processing, the Journal of Electronic Imaging, and the Journal of Imaging Science and Technology.


Robert P. Loce is a research fellow and technical manager in the Xerox Research Center, Webster. He joined Xerox in 1981 with an associate degree in optical engineering technology from Monroe Community College. While working in optical and imaging technology and research departments at Xerox, he received a BS in photographic science (RIT 1985), an MS in optical engineering (UR 1987), and a PhD in imaging science (RIT 1993), and passed the U.S. patent bar in 2002. A significant portion of his earlier career was devoted to development of image processing methods for color electronic printing. His current research activities involve leading an organization and projects into new video processing and computer vision technologies that are relevant to transportation and healthcare. He has publications and many patents in the areas of digital image processing, image enhancement, imaging systems, and optics. He is a fellow of SPIE and a senior member of IEEE. His publications include a book on enhancement and restoration of digital documents, and book chapters on digital halftoning and digital document processing. He is currently an associate editor for Journal of Electronic Imaging, and has been an associate editor for Real-Time Imaging and IEEE Transactions on Image Processing.

© The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.

Figures

Fig. 2: Block-based motion estimation algorithm.
Fig. 3: Analysis of traffic surveillance videos using compression motion vectors.
Fig. 4: Efficient vehicle search in large compressed video databases.
Fig. 5: Active motion blocks indicating the existence of a moving vehicle in these blocks.
Fig. 6: A vehicle detected entering [(a) and (b)] and exiting [(c) and (d)] the virtual sensor.
Fig. 7: The viewing angles for the three test videos: (a) test video 1, (b) test video 2, and (c) test video 3.
Fig. 8: Two sample adjacent (a) reference and (b) target frames along with the resulting (c) motion vector field. A virtual line and polygon are also depicted.
Fig. 9: Publicly available test videos captured in different settings.
Fig. 10: Two vehicles traversing the virtual target area simultaneously.
Fig. 11: Two sample adjacent (a) reference and (b) target frames along with the resulting (c) motion vector field with active motion vectors only.
Fig. 12: Scatter plot of estimated versus ground truth speed.
Fig. 13: Receiver operating characteristic curve of the proposed vehicle speed estimation algorithm.

Tables

Table 1: Bits corresponding to different frame modes in the video object plane header in the MPEG-4 compression standard.
Table 2: Number of frames and vehicles passing through the scene for each test video.
Table 3: The numbers of occurrence of the two error events (i.e., misses and false alarms) for three different motion blocks.
Table 4: Frequencies of occurrence of the two error events, namely misses and false alarms, for three different motion blocks.
Table 5: The performance of the classical approach where the reference frames are selected at a fixed rate.
Table 6: Performance of the proposed method on videos captured on a local road from three different vantage points with various spurious sources of motion.
Table 7: Performance of the proposed method on publicly available videos captured in three different settings.

References

1. Wikipedia, “Intelligent transportation systems,” http://en.wikipedia.org/wiki/Intelligent_transportation_system (March 2013).
2. Goldenbeld C. and Van Schagen I., “The effects of speed enforcement with mobile radar on speed and accidents: an evaluation study on rural roads in the Dutch province Friesland,” Accid. Anal. Prev. 37(6), 1135–1144 (2005).
3. Bourn J., “Tackling congestion by making better use of England’s motorways and trunk roads,” National Audit Office, United Kingdom, 2006, http://www.nao.org.uk/wp-content/uploads/2004/11/040515.pdf (March 2013).
4. U.S. Department of Transportation Federal Highway Administration, “Surveillance cameras in transportation systems,” http://www.fhwa.dot.gov/policyinformation/pubs/vdstits2007/05.cfm (March 2013).
5. Tseng B., Lin C., and Smith J., “Real-time video surveillance for traffic monitoring using virtual line analysis,” in IEEE Int. Conf. on Multimedia and Expo, Vol. 2, pp. 541–544, IEEE, Lausanne, Switzerland (2002).
6. Pang C., Lam W., and Yung N. H. C., “A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images,” IEEE Trans. Intell. Transp. Syst. 8(3), 441–459 (2007).
7. Haag M. and Nagel H.-H., “Incremental recognition of traffic situations from video image sequences,” Image Vis. Comput. 18(2), 137–153 (2000).
8. de Oliveira A. and Scharcanski J., “Vehicle counting and trajectory detection based on particle filtering,” in 23rd SIBGRAPI Conf. on Graphics, Patterns and Images, pp. 376–383, IEEE, Gramado, Brazil (2010).
9. Bas E., Tekalp A., and Salman F., “Automatic vehicle counting from video for traffic flow analysis,” in IEEE Intelligent Vehicles Symp., pp. 392–397, IEEE, Istanbul, Turkey (2007).
10. Fishbain B. et al., “Real-time vision-based traffic flow measurements and incident detection,” Proc. SPIE 7244, 72440I (2009).
11. Chao T.-H., Lau B., and Park Y., “Vehicle detection and classification in shadowy traffic images using wavelets and neural networks,” Proc. SPIE 2902, 136–147 (1997).
12. Shi P., Jones E. G., and Zhu Q., “Median model for background subtraction in intelligent transportation system,” Proc. SPIE 5298, 168–176 (2004).
13. Tan X., Li J., and Liu W., “Approach for counting vehicles in congested traffic flow,” Proc. SPIE 5671, 228–236 (2005).
14. Kan W. Y., Krogmeier J. V., and Doerschuk P. C., “Hidden Markov model for the detection and tracking of highway vehicles in image sequences,” Proc. SPIE 2847, 234–242 (1996).
15. Bowen F. R. et al., “Dynamic content based vehicle tracking and traffic monitoring system,” Proc. SPIE 6497, 64970I (2007).
16. Bergendahl J., Masaki I., and Horn B. K., “Three-camera stereo vision for intelligent transportation systems,” Proc. SPIE 2902, 42–51 (1997).
17. Bulan O. et al., “Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases,” Proc. SPIE 8663, 86630Q (2013).
18. Gharai L. et al., “Experiences with high definition interactive video conferencing,” in Proc. of IEEE Int. Conf. on Multimedia and Expo, pp. 433–436, IEEE, Toronto, Canada (2006).
19. Richardson I., H.264 and MPEG-4 Video Compression, Wiley Online Library (2003).
20. Puri A., Chen X., and Luthra A., “Video coding using the H.264/MPEG-4 AVC compression standard,” Signal Process. Image Commun. 19(9), 793–849 (2004).
21. “ISO/IEC 14496-2:2004, Information technology – Coding of audio-visual objects – Part 2: Visual,” 2004, http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=61988 (September 2013).
22. Huang Y. and Zhuang X., “Motion-partitioned adaptive block matching for video compression,” in Proc. of IEEE Int. Conf. on Image Processing, Vol. 1, pp. 554–557, IEEE, Washington, DC (1995).
23. Horn B. and Schunck B., “Determining optical flow,” Artif. Intell. 17(1–3), 185–203 (1981).
24. Lucas B. et al., “An iterative image registration technique with an application to stereo vision,” in Proc. of the Int. Joint Conf. on Artificial Intelligence, Vol. 3, pp. 674–679, IJCAI, Vancouver, British Columbia (1981).
25. Fadzil M. A. and Dennis T., “A hierarchical motion estimator for interframe coding,” in IEEE Colloquium on Applications of Motion Compensation, pp. 3–11, IEEE, London, UK (1990).
26. Liu B. and Zaccarin A., “New fast algorithms for the estimation of block motion vectors,” IEEE Trans. Circuits Syst. Video Technol. 3(2), 148–157 (1993).
27. Wikipedia, “AMBER Alert,” http://en.wikipedia.org/wiki/AMBER_Alert (January 2013).
28. Tchrakian T. T., Basu B., and Mahony M. O., “Real-time traffic flow forecasting using spectral analysis,” IEEE Trans. Intell. Transp. Syst. 13(2), 519–526 (2012).
29. Rodier C. J., Shaheen S. A., and Cavanagh E., “Automated speed enforcement in the US: a review of the literature on benefits and barriers to implementation,” Research Report UCD-ITS-RR-07-17, Institute of Transportation Studies, University of California, Davis (2007).
30. NHTSA, “Traffic safety facts, 2005 speeding data,” DOT HS810629, NHTSA’s National Center for Statistical Analysis, Washington, DC, http://www-nrd.nhtsa.dot.gov/Pubs/810623.pdf (2006).
31. Kallberg V.-P. et al., “Recommendations for speed management on European roads,” in Proc. of 78th Annual Meeting of the Transportation Research Board, pp. 1–12, National Research Council, Washington, DC (1999).
32. Wang Y. and Nihan N. L., “Freeway traffic speed estimation with single-loop outputs,” Transp. Res. Rec. 1727, 120–126 (2000).
33. Wu J. et al., “An algorithm for automatic vehicle speed detection using video camera,” in Proc. 4th Int. Conf. on Computer Science & Education, pp. 193–196, IEEE, Nanning, China (2009).
34. Wimalaratna L. and Sonnadara D., “Estimation of the speeds of moving vehicles from video sequences,” in Proc. of the Technical Sessions, Institute of Physics, Sri Lanka, Vol. 24, pp. 6–12 (2008).
35. Pumrin S. and Dailey D., “Roadside camera motion detection for automated speed measurement,” in Proc. of the IEEE 5th Int. Conf. on Intelligent Transportation Systems, pp. 147–151, IEEE, Singapore (2002).
36. Schoepflin T. N. and Dailey D. J., “Dynamic camera calibration of roadside traffic management cameras for vehicle speed estimation,” IEEE Trans. Intell. Transp. Syst. 4(2), 90–98 (2003).
37. Rad A. G., Dehghani A., and Karim M. R., “Vehicle speed detection in video image sequences using CVS method,” Int. J. Phys. Sci. 5(17), 2555–2563 (2010).
38. Kanhere N. K. and Birchfield S. T., “A taxonomy and analysis of camera calibration methods for traffic monitoring applications,” IEEE Trans. Intell. Transp. Syst. 11(2), 441–452 (2010).
