FERARI will pursue its overall vision of building large-scale distributed systems by decomposing it into five specific objectives:
Objective 1: Provide support for large-scale services by making the sensor layer a first-class citizen in Big Data architectures. FERARI advocates in situ processing (processing right where the data is generated) as the first choice for scalable processing in future large-scale services. This is the most principled and, in the long run, the only realistic approach to keeping up with the exponentially growing volume of information at the sensors, because it is the only approach that avoids unnecessary data shuffling between nodes. Achieving this requires rethinking current assumptions in state-of-the-art Big Data architectures, e.g., by adding control flow mechanisms to their data flow capabilities.
Objective 2: Provide support for Complex Event Processing technology for business users in Big Data architectures. The goal is to bring stream processing much closer to the business world by extending simple stream processing of numeric or textual data to the much more powerful realm of Complex Event Processing (CEP). One of the open challenges for Big Data is to transfer the approach from its original application area (large-scale web data processing) to other areas of business and industry. Only if this is achieved will Big Data live up to its economic promise. Providing a seamless model that applies CEP within Big Data applications, in a form easily consumed by business users, will position CEP as a major step towards bridging this gap.
Objective 3: Provide support for integrating machine learning tasks into the architecture. For implementing automated observe-decide-act cycles, it is not enough to analyse incoming information piece by piece with simple data transformations, aggregations and statistics. Instead, it is necessary to learn sophisticated models using machine learning techniques, which then form the basis for decisions and actions. Current Big Data architectures are mostly data-flow-driven and lack some of the functionality required to support efficient learning. The goal is to use the added control flow capabilities (Objective 1) to support such learning algorithms.
Objective 4: Provide support for flexible and adaptive analytics workflows. Current data flows are difficult to set up and maintain. One goal, building on Objectives 2 and 3, is to support adaptive, event-driven workflows that use machine learning techniques to adjust to changing data distributions and to changes in either the environment or the requirements. A second goal is to provide tools that simplify setting up such processes.
Objective 5: Exemplify the potential of the new architecture in the telecommunication and cloud domains. To show the potential of the approach, FERARI has selected two scenarios in challenging, high-impact areas of industry where communication bottlenecks are currently a severe limiting factor. These scenarios are (1) the analysis of mobile phone fraud in telecommunication networks and (2) real-time health monitoring in clouds and large data centers, where the high volume of data already severely limits the optimization and monitoring of IT systems.