The Internet of Things (IoT) is steadily generating inconceivable amounts of data. According to Gartner, 8.4 billion things will be in use in 2017, up 31 percent from 2016. And analysts expect this number to reach 20.4 billion by 2020. By 2025, the IoT is projected to generate more than 2 zettabytes of data, says Machina Research.
What does all that data mean?
With streaming analytics, it can mean real-time reactions to events that can be lifesaving. For example, a truck receives data about an ice patch on the road. The truck then not only adjusts for the driver, but also alerts other vehicles to the exact location of the ice.
To make this kind of real-time information available given the high volume and velocity of data constantly streaming in from IoT sensors and network operations, you need a different type of data management solution than the one required for traditional, stationary transactional data.
Take the truck example. Imagine that you’re driving a truck in Colorado in the middle of winter. Your company has fitted the vehicle with IoT sensors that continually monitor wheel slip, air temperature, speed and RPMs. Suddenly, the wheel slip measurement spikes as the air temperature falls below freezing. If the truck or driver can react in milliseconds, an accident can be prevented. If not, the sensor data is meaningless.
Event stream processing
Event stream processing systems enable you to act on this information in a timely fashion through real-time data cleansing and analytics.
Let’s define event stream processing: An “event” is any occurrence that happens at a clearly defined time and is recorded in a collection of data fields; “stream” is a constant flow of data events, or a steady rush of data that flows into and around an enterprise from thousands of connected devices; and “processing” is the act of analyzing data.
When event stream processing systems manage data from IoT sensors, they perform processes that turn raw data into information to be acted upon in real time. As large amounts of data rapidly stream into the system, event stream processing systems cleanse, normalize and aggregate data immediately in memory. Simultaneously, real-time analytics models encoded in these data streams perform analysis to determine whether a particular event is relevant and generate instant alerts when urgent action is needed.
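The cleanse, normalize, aggregate and alert steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular event stream processing product works: the field names (`wheel_slip`, `air_temp_f`), the thresholds and the window size are all assumptions chosen for the truck example.

```python
from collections import deque
from statistics import mean

# Hypothetical values, chosen only for illustration.
FREEZING_C = 0.0   # alert only when the average temperature is below this
SLIP_ALERT = 0.2   # alert only when the average wheel slip is above this
WINDOW = 3         # number of recent events held in memory


def cleanse(event):
    """Return the event, or None if a required field is missing."""
    if event.get("wheel_slip") is None or event.get("air_temp_f") is None:
        return None
    return event


def normalize(event):
    """Convert Fahrenheit to Celsius so downstream logic uses one unit."""
    return {
        "wheel_slip": float(event["wheel_slip"]),
        "air_temp_c": (float(event["air_temp_f"]) - 32.0) * 5.0 / 9.0,
    }


def process(stream):
    """Yield an alert whenever the windowed averages suggest ice."""
    window = deque(maxlen=WINDOW)  # bounded buffer: old events fall out
    for raw in stream:
        event = cleanse(raw)
        if event is None:
            continue  # discard bad readings immediately
        window.append(normalize(event))
        avg_slip = mean(e["wheel_slip"] for e in window)
        avg_temp = mean(e["air_temp_c"] for e in window)
        if avg_slip > SLIP_ALERT and avg_temp < FREEZING_C:
            yield {"alert": "possible ice", "avg_slip": round(avg_slip, 3)}
```

Because `process` is a generator, each alert is produced as soon as the event that triggers it arrives, rather than after the whole stream has been collected.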
Real-time analytics versus analytics after the fact
Event stream processing systems filter data in real time. Because the memory in which these systems initially store data is limited, the event stream processing system decides what data to discard and what to keep long-term. Data that is kept may be stored in aggregated form, because multiple events taken together are often more informative than single events.
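One way to picture keeping data in aggregated form is a tumbling window that emits a small summary for every few events and never stores the raw readings. The sketch below is an assumption about how such aggregation could look; the function name `summarize` and the window size are hypothetical.

```python
def summarize(stream, size=3):
    """Yield one summary per `size` events; raw events are not retained.

    A trailing partial window (fewer than `size` events) is dropped.
    """
    count, total, low, high = 0, 0.0, None, None
    for value in stream:
        count += 1
        total += value
        low = value if low is None else min(low, value)
        high = value if high is None else max(high, value)
        if count == size:
            yield {"count": count, "min": low, "max": high, "mean": total / count}
            # Reset the running state for the next window.
            count, total, low, high = 0, 0.0, None, None
```

Only four running values are held in memory at any time, regardless of how many events flow through.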
For example, when that truck is in danger of slipping on the icy road, real-time analytics at the edge of the network immediately alert the driver to slow down, or even automate the slowdown.
In contrast, traditional relational database management systems (RDBMS) store all data and perform cleansing and analysis after the fact. RDBMSs collect data from predefined sources and store it in a persistent storage system, such as a data mart. Once in storage, data is cleansed, normalized and consolidated into a data warehouse or Hadoop. Only then can users derive meaning from the data through reporting, historical analysis – and even predictive analysis and machine learning.
For instance, with event stream processing, if a sensor is tracking temperature and the temperature stays steady, the system doesn’t store ongoing readings. Instead, it might retain only the readings that indicate a change.
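The temperature example can be sketched as a simple change filter. The `tolerance` parameter and the function name are assumptions for illustration; a real system would choose its own rule for what counts as a change.

```python
def changes_only(readings, tolerance=0.5):
    """Yield a reading only if it differs from the last kept one by more than tolerance."""
    last_kept = None
    for value in readings:
        if last_kept is None or abs(value - last_kept) > tolerance:
            last_kept = value
            yield value


temps = [20.0, 20.1, 20.2, 21.0, 21.1, 19.0]
kept = list(changes_only(temps))  # [20.0, 21.0, 19.0]
```

Each reading is compared with the last reading that was kept, not the previous reading, so a slow drift is still recorded once it accumulates past the tolerance.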
Multiphase analytics offers advantages
Event stream processing gives you multiple opportunities to extract value from your data. With traditional data management, data is historical and doesn’t change. It may be analyzed once or twice after the fact, not more.
Event stream processing systems first analyze data in real time, enabling immediate response to events. Then, in real time or near-real time, you can bring a subset of the data from multiple sensors back to the cloud or on-site for cross-sensor analysis.
Let’s say you want to perform analysis across your entire fleet of trucks to determine fault conditions occurring at a certain elevation. If the system detects a problem, it could trigger a mass repair of all the trucks in the fleet.
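As a rough sketch of this kind of cross-sensor analysis, the function below groups fault events from many trucks into elevation bands and reports the bands where several distinct trucks reported faults. The event fields (`truck_id`, `elevation_m`, `fault`), the band width and the truck threshold are hypothetical.

```python
from collections import defaultdict


def faults_by_elevation(events, band_m=500, min_trucks=2):
    """Map each elevation band to the trucks that faulted there.

    Only bands with at least `min_trucks` distinct trucks are returned.
    """
    trucks_per_band = defaultdict(set)
    for event in events:
        if event["fault"]:
            # Round elevation down to the start of its band, e.g. 2900 -> 2500.
            band = (event["elevation_m"] // band_m) * band_m
            trucks_per_band[band].add(event["truck_id"])
    return {
        band: sorted(trucks)
        for band, trucks in trucks_per_band.items()
        if len(trucks) >= min_trucks
    }
```

A band that shows up in the result points to a condition shared across the fleet rather than a problem with a single vehicle.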
Finally, the event stream processing system also stores specified data in a data warehouse or Hadoop. There you can perform visual analytics or visual statistics on the now-historical data.
With historical data in a data warehouse, you could use machine learning algorithms for predictive maintenance. Over time, machine learning algorithms can learn patterns that indicate when trucks will require maintenance and catch failures in advance.
In all steps of multiphase analytics, machine learning can train the system to better predict outcomes. As the model changes, the stream processing solution can update the models at the edge, on-premises or in the cloud as needed.
Streaming data allows you to assemble meaning from IoT data when you need it, both in real time and historically, to identify trends through cross-sensor analysis. By processing data on the edge, organizations, individuals and communities are benefiting from the insights offered by real-time data. This real-time data promises to save lives, improve traffic and communicate crises.
Let’s live on the edge and see where it takes us.
About the Author
Jerry Baulier is the Vice President of IOT R&D at SAS. He joined SAS in 2010 to lead and champion the build-out of event stream processing and now also has R&D responsibility for the Internet of Things. Prior to joining SAS, Baulier was the CTO and VP of R&D at Aleri. His work on event stream processing has resulted in 12 patents and half a dozen pending patents.