“Profiling Across Spatio-temporal Data” Technology to Enable Detection of Suspicious Unregistered Individuals among Multiple Surveillance Camera Images
The past few years have seen the introduction of huge numbers of surveillance cameras in urban locations, both private and public. To date, the images recorded by these cameras have largely been used in the investigation of crimes after the crimes have occurred. However, with the growing threat of crimes that pose a significant risk of major damage or loss of life such as terrorist attacks, governments and security institutions are seeking ways to detect and prevent such incidents before they occur. This paper discusses new image search technology that makes it possible to detect not only those individuals already registered in a security database such as conventional wanted suspects, but also individuals who may not be registered but exhibit suspicious behavior. Using NEC’s face recognition technology and analyzing pedestrian appearance patterns, this technology is expected to help prevent crimes by facilitating early detection of suspicious individuals.
We have recently seen the introduction of vast numbers of surveillance cameras in city streets and commercial facilities. Until now, these cameras have been used mainly to support criminal investigations and to help find wanted suspects. At the same time, many countries find themselves facing an increased threat from terrorist attacks that could wreak tremendous havoc. Stopping these crimes before they happen is of critical importance and new technologies are being developed to enable surveillance cameras to help detect and prevent crimes before they occur.
In the past, surveillance images were typically viewed and analyzed by humans. Today, however, the vast amounts of image data being generated by surveillance cameras make it all but impossible for humans to handle these tasks.
Instead of relying on human eyes, modern surveillance systems are increasingly turning to image recognition technology - which is a type of artificial intelligence (AI). Progress in the field of image recognition technology has been remarkable, and today it is possible to recognize objects with high precision. NeoFace, a face recognition engine incorporating NEC’s face recognition technology, enables the identification of individual faces with astonishing accuracy.
This image recognition technology has already been used to develop systems capable of retrieving specific individuals from large amounts of video data1). These systems search for the faces of specific individuals based on the photographs provided beforehand.
However, these search technologies are restricted to searching for individuals who have already been registered, whether they are known criminals or have been flagged by the security establishment. The downside to this is that not all potential criminals will be registered, so preventing crimes requires the ability to detect suspicious individuals not already registered. In other words, in order to prevent crimes, the system has to be able to search the data for any suspicious individuals regardless of whether or not they are already known to authorities.
2.Individual Search Based on Appearance Patterns
Now, how can we retrieve unknown suspicious individuals from a massive amount of video images? What we have focused on is the patterns produced by certain individuals when they appear in front of the camera.
What do we mean by this? Think about someone who is planning a crime, or a pickpocket stalking commuters and tourists. Such people tend to visit the planned scene of the crime repeatedly in advance, or to loiter for extended periods while they wait for their targets (Fig. 1). In other words, anyone who appears frequently and regularly in surveillance video images may well be engaged in suspicious activity. Consequently, we developed a search system capable of retrieving unregistered individuals who are considered suspicious based on their appearance patterns - that is, individuals who appear in the images frequently2)–4).
2.1 Issues When This Technology is Put into Practical Use
Putting this system into practice is extremely difficult, because searching for suspicious individuals by appearance pattern requires a massive amount of processing. Let’s examine this using Fig. 2. To search for specific individuals based on their appearance patterns, the system must first extract, for every person recorded in the surveillance images, a pattern describing when and where that person appeared. A person’s appearance pattern can be extracted by finding every other scene in which the person recorded in a given scene also appears.
For example, determining whether the male in the leftmost scene in Fig. 2 is the same person as the individuals recorded in the other scenes makes it possible to extract his appearance pattern. Whether or not he is the same person can be determined by comparing the face in the leftmost scene with the faces in the other scenes (that is, by checking whether or not the faces resemble one another to a sufficient degree). The same procedure is then performed for the female recorded in the subsequent scene. When this procedure has been performed for all individuals recorded in the video images, their appearance patterns can be extracted.
Determining whether or not an extracted appearance pattern matches the targeted appearance pattern, such as unnaturally frequent appearances, then makes it possible to search for a suspicious person.
Currently, however, all the people who appear in the video must be compared with one another in order to identify which ones are the same individuals. With so many surveillance cameras already installed and more being added all the time, together with longer recording times, the number of comparisons that have to be made grows explosively, roughly with the square of the number of recorded faces. As a result, it is simply impractical to search for individuals in this way.
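To give a sense of scale, the short sketch below counts the comparisons needed when every recorded face is compared with every other face. The function name `pairwise_comparisons` is hypothetical, and the calculation assumes only that each unordered pair of faces is compared exactly once. The last input, one million faces, is the volume reported for the demonstration experiment later in this paper.

```python
def pairwise_comparisons(num_faces: int) -> int:
    """Number of unordered face pairs when every face is compared with every other."""
    return num_faces * (num_faces - 1) // 2


if __name__ == "__main__":
    for n in (1_000, 10_000, 1_000_000):
        print(f"{n:>9,} faces -> {pairwise_comparisons(n):,} comparisons")
    # One million faces already require 499,999,500,000 comparisons.
```

Multiplying the number of faces by ten multiplies the number of comparisons by roughly one hundred, which is why exhaustive comparison stops being practical as cameras and recording hours are added.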
2.2 A Solution That Uses an Index Structure Based on Similarities
In order to deal with the enormous increase in the number of comparisons, we have developed a system that dramatically reduces the number of comparisons that need to be made4). The basic idea is to focus on comparisons of faces with greater similarities, rather than comparing all faces in the images. We refer to Fig. 3 to explain how this works.
The index structure is a tree organized so that the lower the level in the tree, the greater the similarity between the data grouped there. With face data, for example, faces that bear little resemblance to one another are separated at the higher levels of the tree, while faces that closely resemble one another, including faces that belong to the same person, are grouped together at the lower levels. As a result, groups of similar data can be retrieved at high speed by targeting the lower levels of the tree.
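The idea of a tree whose lower levels hold more similar data can be sketched as follows. This is an illustrative stand-in, not the index structure of reference 4): the class name `SimilarityTree`, the per-level distance radii, the two-dimensional feature vectors, and the greedy first-match insertion rule are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    rep: object                                    # representative feature vector (None for the root)
    level: int                                     # depth in the tree; the root is level 0
    children: list = field(default_factory=list)
    items: list = field(default_factory=list)      # item ids, stored only at the deepest level


class SimilarityTree:
    """Toy tree index: each deeper level groups data under a tighter distance radius."""

    def __init__(self, radii):
        # radii[k] is the largest allowed distance to a node's representative
        # at depth k + 1; the values are expected to shrink with depth.
        self.radii = list(radii)
        self.root = Node(rep=None, level=0)

    def insert(self, item_id, vec):
        node = self.root
        for radius in self.radii:
            # Greedy rule (an assumption): descend into the first child that is close enough.
            match = next(
                (c for c in node.children if math.dist(c.rep, vec) <= radius), None
            )
            if match is None:
                match = Node(rep=vec, level=node.level + 1)
                node.children.append(match)
            node = match
        node.items.append(item_id)

    def groups(self, depth):
        """Return the item ids found under each node at the given depth."""
        def collect(node):
            found = list(node.items)
            for child in node.children:
                found.extend(collect(child))
            return found

        result = []

        def walk(node):
            if node.level == depth:
                result.append(collect(node))
            else:
                for child in node.children:
                    walk(child)

        walk(self.root)
        return result


if __name__ == "__main__":
    tree = SimilarityTree(radii=[1.0, 0.1])
    for item_id, vec in [("a1", (0.0, 0.0)), ("a2", (0.05, 0.0)),
                         ("b", (0.5, 0.0)), ("c", (5.0, 5.0))]:
        tree.insert(item_id, vec)
    print(tree.groups(1))  # coarse groups: [['a1', 'a2', 'b'], ['c']]
    print(tree.groups(2))  # fine groups:   [['a1', 'a2'], ['b'], ['c']]
```

In this toy example, reading the groups at the lower level returns sets of near-identical vectors without comparing every item against every other item; each insertion only compares the new vector with the representatives along one path of the tree.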
3.Profiling Across Spatio-temporal Data Technology
“Profiling Across Spatio-temporal Data” technology is a method of searching for specific individuals according to their appearance patterns in video data shot with multiple cameras in multiple locations. This system utilizes the index of similarities described above2)–4).
Described below is a case in which frequently appearing individuals are searched for by determining whether or not faces belong to the same individual. First, face information is extracted from the video images to construct an index structure based on the above-mentioned similarities. Next, a threshold is set for the degree of facial similarity required for two faces to qualify as the same individual. Groups of faces whose similarities exceed this threshold are then extracted from the index structure. Each group is thus a set of data for what appears to be one person, and the number of items in the group is that person’s appearance frequency. Finally, the groups with the highest frequencies are extracted; these can be assumed to contain the individuals who appear frequently.
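The steps above (set a similarity threshold, group faces that exceed it, count each group, and rank by count) can be sketched as follows. This is a simplified illustration under stated assumptions: face features are taken to be small numeric vectors that have already been extracted, cosine similarity stands in for the face matching score, and a brute-force greedy grouping replaces the index structure. The function names `group_faces` and `frequent_groups`, the threshold value, and the sample vectors are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_faces(features, threshold):
    """Give each face the id of the first group whose representative it resembles."""
    representatives = []   # one representative vector per group
    labels = []            # group id assigned to each face, in input order
    for vec in features:
        for group_id, rep in enumerate(representatives):
            if cosine_similarity(vec, rep) > threshold:
                labels.append(group_id)
                break
        else:
            # No existing group is similar enough, so this face starts a new group.
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def frequent_groups(labels, top_k):
    """Return (group id, appearance count) pairs, most frequent first."""
    return Counter(labels).most_common(top_k)


if __name__ == "__main__":
    # Hypothetical feature vectors: four sightings of one face, two of another.
    features = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0),
                (1.0, 0.05), (0.1, 1.0), (0.98, 0.02)]
    labels = group_faces(features, threshold=0.95)
    print(labels)                       # [0, 0, 1, 0, 1, 0]
    print(frequent_groups(labels, 2))   # [(0, 4), (1, 2)]
```

Raising the threshold splits the data into more, smaller groups, and lowering it merges different people into one group, so the threshold directly controls what counts as a frequent appearance.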
Fig. 4 shows the use of the “Profiling Across Spatio-temporal Data” technology to search for individuals who have frequently appeared in the images captured by multiple cameras. The demonstration images used here repeatedly record people acting as suspicious individuals. “Profiling Across Spatio-temporal Data” technology was applied to these images to search for individuals who appeared frequently.
The results of the search are shown to the left of the video images. The individuals are listed in order of appearance frequency in the video images. Each row shows the facial images from representative scenes in which what appears to be the same individual appeared. In these search results, the individual ranked No. 1 was the person who was acting as a suspicious individual and who appeared in the video most frequently. The scenes in which he appeared (the circled person) are shown in the screens on the right.
To verify the effectiveness of this technology, we conducted a demonstration experiment using facial images obtained from surveillance cameras in cooperation with local authorities overseas. In this experiment, we used a total of 24 hours of video collected from multiple cameras installed in busy locations. Analyzing this video yielded a total of about one million facial images. When we applied our “Profiling Across Spatio-temporal Data” technology, it took only 10 seconds to obtain a list of individuals who were frequently loitering in the surveilled locations. This is more than 100 times faster than the conventional method in which all faces are compared.
When conducting the experiment, we requested that actors portraying suspicious individuals also be recorded in the surveillance images. Neither the number of actors nor their identities were disclosed to us. When we presented the top 20 candidates for suspicious behavior, we found that all 7 actors were included in the candidate list. We believe that this experiment verified the effectiveness of our technology in retrieving individuals exhibiting suspicious behavior, such as loitering, from video data in which a large number of unspecified people come and go.
Our description so far has assumed that people who frequently appear in the streets might be suspicious individuals. We are confident that this technology can also be applied in the marketing and tourism industries, for example, to find customers who frequent a store or tourists who are lost in the street.
In this paper, we have introduced an example of our new image search technology - “Profiling Across Spatio-temporal Data” technology, a method of searching for individuals based on appearance patterns. This makes it possible to retrieve suspicious unregistered individuals - in addition to individuals who have already been registered (as is the case with conventional systems) - from vast amounts of image data. This will enable surveillance cameras to be utilized not only in the investigation of crimes, but in the prevention of crimes as well.
We plan to turn this technology into a product in FY 2016.
1) Jianquan Liu, Shoji Nishimura, Takuya Araki: Wally: A Scalable Distributed Automated Video Surveillance System with Rich Search Functionalities, ACM Multimedia 2014
2) Jianquan Liu, Shoji Nishimura, Takuya Araki: AntiCrime: A System for Suspect Retrieval and Loitering Discovery in Large Scale Surveillance Videos, IEEE CVPR (Demo) 2016
3) Jianquan Liu, Shoji Nishimura, Takuya Araki: VisLoiter: A System to Visualize Loiterers Discovered from Surveillance Videos, SIGGRAPH Posters 2016
4) Jianquan Liu, Shoji Nishimura, Takuya Araki: AntiLoiter: A Loitering Discovery System for Longtime Videos across Multiple Surveillance Cameras, ACM Multimedia 2016
System Platform Research Laboratories