In-situ Monitoring Technologies

Modern information systems are inherently distributed. In many cases, such as telecommunication systems, content management systems, and financial systems, data is processed in physically distributed locations. Even when data is processed at a central location, modern data centres comprise thousands (or even tens of thousands) of computation and storage units, which in practice process data in a distributed manner.

Consequently, monitoring the data processed by these systems to detect global events of interest in real time is becoming increasingly challenging. This challenge is clearly evident in the FERARI use cases. The first use case deals with telecom fraud: fraudsters exploit the distributed nature of telecom systems to devise ever more sophisticated methods of fraud. The second use case is system health monitoring, that is, the real-time detection of abnormal performance of system components so that they can be serviced before they fail, thereby enhancing the availability and robustness of the system while reducing costs.

The central goal of the FERARI project is to perform most of the processing of distributed data locally, and thus to detect complex patterns of interest (e.g. fraudulent activity or abnormal performance) without collecting the data at a central location. This approach is referred to as in-situ processing. The FERARI project has introduced several innovations in applying in-situ methods to real-world problems.

The innovations introduced by FERARI to in-situ processing lie within the geometric monitoring framework [1], which allows a complex event to be defined by applying an arbitrary function to data from distributed sources. Geometric monitoring breaks such a monitoring task into individual constraints on the data generated at each local site. As long as none of the local constraints is violated, it is guaranteed that the global event of interest has not occurred, so the sites need to communicate only when a local constraint is violated.
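To make this guarantee concrete, the following is a minimal sketch of the ball-based local constraint from the original geometric monitoring paper [1], under assumptions chosen purely for illustration: the global vector is the average of the sites' local vectors, the monitored function is the Euclidean norm with threshold T, and the names Site, synchronize and ball_max_norm are hypothetical. Each site keeps a drift vector (the last synchronized global vector e plus the site's own change since that synchronization). The average of the drift vectors equals the current global vector, and the convex hull of the drift vectors is covered by the balls whose diameters run from e to each drift vector. Hence, if no ball contains a point whose norm exceeds T, the global norm cannot exceed T either. For the Euclidean norm the maximum over a ball has the closed form ||centre|| + radius; for a general function this step is itself an optimization problem, which is where the computational cost of such constraints comes from.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ball_max_norm(e, u):
    """Largest Euclidean norm attained in the ball whose diameter runs from e to u."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = norm([b - a for a, b in zip(e, u)]) / 2
    return norm(center) + radius


class Site:
    """A hypothetical monitoring site holding one local vector."""

    def __init__(self, vector):
        self.v = list(vector)   # current local vector
        self.v0 = list(vector)  # local vector at the last synchronization

    def drift(self, e):
        # e + (v - v0): the average of all sites' drift vectors equals the global vector
        return [ei + vi - v0i for ei, vi, v0i in zip(e, self.v, self.v0)]

    def constraint_ok(self, e, threshold):
        # Local constraint: the ball spanned by e and the drift vector lies inside ||x|| <= T
        return ball_max_norm(e, self.drift(e)) <= threshold


def synchronize(sites):
    """Collect the local vectors, compute the new global vector e, and reset the drifts."""
    e = mean([s.v for s in sites])
    for s in sites:
        s.v0 = list(s.v)
    return e
```

In a simulation, a site whose constraint_ok check fails would trigger synchronize(sites). While every site's check returns True, the norm of the global average is guaranteed to be at most T without any communication.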

Distributing the detection among the local sites avoids the delays and bottlenecks associated with collecting data at a central location, and thus enables such events to be detected in real time. A major drawback of geometric monitoring, however, is that checking the types of local constraints proposed so far can be very demanding in terms of computational resources at the local sites.

The first innovation introduced by FERARI is a novel type of local constraint for geometric monitoring, known as convex bounds. While existing methods for determining local constraints rely on solving complex optimization problems, convex bounds use functional analysis methods to select a convex function that bounds the function used to define the event of interest. Although specifying such a bound for a given function may require some mathematical expertise, once it has been specified, applying it to the data at run time is very efficient. In addition, although this was not the original motivation, empirical evidence has shown that these constraints are also very effective, in the sense that they produce a very small number of false positives (i.e. instances where the constraints were violated but the global event had not occurred) in comparison to existing methods.
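The following sketch illustrates the convex-bound idea on a hypothetical example, not on a function taken from the FERARI use cases: the event is assumed to be f(x, y) = x*y exceeding a threshold T, where (x, y) is the average of the sites' local vectors, and the function names are illustrative. Writing x*y = ((x + y)**2 - (x - y)**2) / 4 and replacing the concave term -(x - y)**2 by its tangent at x - y = a0 yields a convex function g with g >= f that is tight on the line x - y = a0; a0 would typically be taken from the last synchronized global vector. Each site evaluates g on its own vector only, which costs a few arithmetic operations. By convexity (Jensen's inequality), g at the average is at most the average of the local values of g, so if every site satisfies g <= T, then f at the global vector is at most T as well.

```python
def f(x, y):
    """Hypothetical monitored function: the product of two aggregated statistics."""
    return x * y


def convex_bound(x, y, a0):
    """Convex upper bound on x*y, tight wherever x - y == a0.

    x*y = ((x + y)**2 - (x - y)**2) / 4. Since (d - a0)**2 >= 0 implies
    -d**2 <= -2*a0*d + a0**2, replacing -(x - y)**2 by its tangent at a0
    can only increase the value, and the result is a convex quadratic.
    """
    return ((x + y) ** 2 - 2 * a0 * (x - y) + a0 ** 2) / 4


def local_constraint_ok(local_vector, a0, threshold):
    """Run-time check at a single site: evaluate the bound on local data only."""
    x, y = local_vector
    return convex_bound(x, y, a0) <= threshold


if __name__ == "__main__":
    sites = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
    # Every site satisfies its local constraint, so f at the average (1.5, 1.5) is <= 3.0
    print(all(local_constraint_ok(v, 0.0, 3.0) for v in sites))  # prints True
```

The design trade-off is visible in the code: deriving convex_bound takes some algebra, but the run-time check is a closed-form expression rather than an optimization problem.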

A second innovation introduced by FERARI is the adaptation of geometric monitoring methods to dynamic, large-scale industrial settings. Until now, geometric monitoring has been applied in laboratory settings as part of academic research. Using it in an industrial production environment requires extending the method with several non-trivial adaptations.

Laboratory setups assume that the number of sites participating in the monitoring task is fixed and known in advance, and the algorithms are typically run on at most several hundred sites. By contrast, applying the method to a distributed counting task over the call detail records (CDRs) produced by the cell towers of a telecom provider, as described in Deliverable D3.2, involves running the monitoring task on over 18,000 cell towers. Additionally, the set of active subscribers is not known in advance, and each subscriber interacts with only a small subset of these cell towers.

Applying geometric monitoring efficiently in this case required extending the monitoring framework to run concurrent monitoring tasks simultaneously for an unknown and dynamic set of subscribers. Furthermore, the local constraints were modified to support a different, dynamic set of sites for each subscriber.
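The following is a heavily simplified sketch of how concurrent per-subscriber tasks with dynamic site sets might be organized. It is not the construction from Deliverable D3.2: it assumes a plain counting task in which the event is a subscriber's total number of CDRs across all towers exceeding an integer threshold, it simulates the coordinator and the towers in a single process, and the names Monitor, SubscriberTask and process_cdr are hypothetical. A task is created lazily the first time a subscriber is seen, and a tower joins a subscriber's task the first time it handles that subscriber's records. Each tower may absorb its share of the remaining slack (the threshold minus the last synchronized total) before it must report; because the shares sum to at most the slack, the event cannot occur while every tower stays within its share.

```python
class SubscriberTask:
    """Monitors one subscriber's total CDR count against an integer threshold.

    Hypothetical sketch: the global count is the sum of per-tower counts, and
    each tower may absorb up to slack[tower] new records before it reports.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.synced_total = 0  # global count at the last synchronization
        self.local = {}        # tower -> records seen since the last synchronization
        self.slack = {}        # tower -> records allowed before a local violation
        self.alarm = False

    def observe(self, tower):
        if tower not in self.local:
            # A tower handling this subscriber for the first time joins the task
            # with zero slack, so its first record forces a synchronization.
            self.local[tower] = 0
            self.slack[tower] = 0
        self.local[tower] += 1
        if self.local[tower] > self.slack[tower]:
            self._synchronize()

    def _synchronize(self):
        # Coordinator step: collect local counts and redistribute the slack
        # equally among the towers currently in this subscriber's site set.
        self.synced_total += sum(self.local.values())
        for tower in self.local:
            self.local[tower] = 0
        remaining = self.threshold - self.synced_total
        if remaining < 0:
            self.alarm = True  # the global event: total count exceeds the threshold
            remaining = 0
        share = remaining // len(self.local)
        for tower in self.slack:
            self.slack[tower] = share


class Monitor:
    """Runs one SubscriberTask per subscriber, created the first time it is seen."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.tasks = {}  # subscriber -> SubscriberTask

    def process_cdr(self, subscriber, tower):
        task = self.tasks.get(subscriber)
        if task is None:
            task = self.tasks[subscriber] = SubscriberTask(self.threshold)
        task.observe(tower)
        return task.alarm
```

In this sketch a tower that never handles a given subscriber holds no state for that subscriber, so the per-tower cost grows with the subscribers it actually serves rather than with the total number of subscribers or towers.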

[1] Sharfman, Izchak, Assaf Schuster, and Daniel Keren. “A geometric approach to monitoring threshold functions over distributed data streams.” ACM Transactions on Database Systems (TODS) 32.4 (2007): 23.