IoT Tutorial: Chapter 7 – IoT Data and IoT-BigData Convergence

Introduction to IoT Data and Their Characteristics
To date, most IoT applications involve the collection and processing of IoT data, i.e., data stemming from IoT sources such as sensors, wearables and other internet-connected devices. In the majority of cases, the business benefits of an IoT application stem from the processing of IoT data. Typical examples include:

  • Security applications that process information from multiple cameras deployed in urban areas in order to identify security events in a timely manner.
  • Urban mobility applications relying on the processing of data from traffic sensors in order to identify and alleviate traffic congestion.
  • Healthcare applications involving the collection and processing of behavioral information of a subject (based on streams from cameras, accelerometers and wearables) towards identifying lifestyle patterns.
  • Sports and fitness applications processing information from wearables in order to track statistics and training parameters for athletes.
  • Smart city applications entailing collection and processing of information from smart meters towards energy management at various timescales.

As evident from the above examples, data-intensive IoT applications involve the processing of data from various sensors and devices. In several cases, these applications also combine IoT data with data from other sources, such as open data sources and social media. Furthermore, these applications process IoT data at various timescales, ranging from real-time processing for operational applications (e.g., traffic rerouting in case of congestion) to weekly, monthly or yearly processing as part of strategic-level applications (e.g., transport planning).

Apart from applications such as those listed above, whose business logic is the processing of IoT data itself, there are also IoT applications that focus on actuation and real-time control rather than on providing data to their end users. Typical examples include cyber-physical systems (CPS) that control robots in manufacturing plants or actuators in connected cars. Despite their emphasis on control rather than data provision, these applications are in most cases also driven by IoT data processing, since their decisions are usually based on the collection and analysis of IoT data from different sources.

IoT data feature certain characteristics that distinguish them radically from other types of data and their respective applications (e.g., classical transactional applications). These characteristics include their streaming and real-time nature, their spatial and temporal dimensions, as well as their special security and privacy requirements (e.g., in cases where the collection and processing of personal data are involved). The special characteristics of IoT data, and the related challenges for IoT data processing applications, can be listed as follows:

  • Heterogeneity of IoT data streams: IoT data streams tend to be multi-modal and heterogeneous in terms of their formats, semantics and velocities. Hence, IoT analytics applications typically exhibit variety and veracity. BigData technologies provide the means for dealing with this heterogeneity in operational applications.
  • Varying data quality: Several IoT streams are noisy and incomplete, which creates uncertainty for IoT analytics applications. Statistical and probabilistic approaches must therefore be employed in order to take the noisy nature of IoT data streams into account, especially when they stem from unreliable sensors.
  • Real-time nature of IoT datasets: IoT streams feature high velocities and, in several applications, must be processed in near real time. Hence, IoT analytics can greatly benefit from data streaming platforms, which are part of the BigData ecosystem.
  • Time and location dependencies of IoT streams: IoT data come with temporal and spatial information, which is directly associated with their business value in a given application context. Hence, IoT analytics applications must in several cases process data in a timely fashion and at the appropriate locations (e.g., close to where the data are produced). Cloud computing techniques (including edge computing architectures) can greatly facilitate the timely processing of information from given locations in large-scale deployments.
  • Privacy and security sensitivity: IoT data are typically associated with stringent security requirements and privacy sensitivities, especially in the case of IoT applications that involve the collection and processing of personal data.
  • Data bias: As in the majority of data mining problems, IoT datasets can lead to biased processing; hence, a thorough understanding and scrutiny of both training and test datasets is required prior to their operational deployment. To this end, classical data mining techniques can also be applied to IoT applications.
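
To make the varying-data-quality point concrete, the following is a minimal sketch of one standard probabilistic approach: a one-dimensional Kalman filter that smooths noisy readings and tolerates missing samples (marked as None). It assumes a random-walk model for the measured quantity; the function name and the default variances are illustrative assumptions that would have to be tuned for a real sensor.

```python
from typing import Iterable, List, Optional


def kalman_smooth(
    readings: Iterable[Optional[float]],
    process_var: float = 1e-3,      # assumed drift of the true value per step
    measurement_var: float = 0.25,  # assumed sensor noise variance
    initial_estimate: float = 0.0,
    initial_var: float = 1.0,
) -> List[float]:
    """1-D Kalman filter with a random-walk state model (illustrative sketch).

    A reading of None marks a missing sample: the filter only propagates its
    prediction (uncertainty grows) and skips the measurement update.
    """
    estimate, var = initial_estimate, initial_var
    smoothed: List[float] = []
    for z in readings:
        # Predict: under a random walk the estimate is unchanged,
        # but its uncertainty grows by the process variance.
        var += process_var
        if z is not None:
            # Update: blend prediction and measurement by their uncertainties.
            gain = var / (var + measurement_var)
            estimate += gain * (z - estimate)
            var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

For example, `kalman_smooth([21.3, 20.8, None, 35.0, 21.1, 21.0], initial_estimate=21.0)` carries the previous estimate across the missing sample and moves only part of the way toward the suspicious 35.0 spike, rather than reproducing it.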

IoT Data-Intensive Applications Lifecycle
The development of IoT applications entails the following activities, which are usually combined in order to develop and deploy non-trivial IoT data applications:

  • IoT Data Collection, which includes interfacing to IoT sources (i.e., internet-connected devices) and enriching their data with appropriate contextual metadata, such as location information and timestamps. As already outlined, the collection process typically needs to deal with the heterogeneity of IoT data sources and their data streams, including heterogeneity in interfaces and data formats.
  • IoT Data Validation, which includes validating the format and the source of origin of the data, as well as their integrity, accuracy and consistency.
  • IoT Data Semantic Unification and Interoperability, which deal with the unification (homogenization) of the semantics of IoT streams stemming from different sources, as a prerequisite for their unified processing.
  • IoT Data Structuring and Storage, which involves persisting validated and interoperable data in an appropriate database, such as a streaming database, an object database or even a graph database.
  • IoT Data Analysis, which deals with the application of data mining and machine learning techniques (e.g., regression, neural networks, decision trees, clustering) in order to transform IoT data streams into actionable knowledge.
  • Deployment of IoT Analytics Algorithms, which involves the actual deployment and operationalization of machine learning and data mining schemes for data analytics.
  • IoT Data Visualization, which emphasizes the presentation of IoT data in graphical form, including browsing them across the temporal and spatial dimensions of IoT datasets.
  • IoT Data Repurposing and Reuse, which entails accessing IoT datasets in order to reuse them across different applications.
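
The first three activities (collection with contextual enrichment, validation, and semantic unification) can be illustrated with two hypothetical temperature sources that report in different formats and units. In the sketch below, the source formats, the `SENSOR_LOCATIONS` registry, the `Reading` schema and the plausibility bounds are all illustrative assumptions, not part of any particular IoT platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical registry used to enrich readings with location metadata.
SENSOR_LOCATIONS = {"a-17": "building-1/floor-2", "b-03": "building-1/roof"}


@dataclass(frozen=True)
class Reading:
    """Unified representation shared by all sources (illustrative schema)."""
    sensor_id: str
    temperature_c: float
    timestamp: datetime  # always timezone-aware UTC
    location: str


def from_source_a(msg: dict) -> Reading:
    # Hypothetical source A: dict payload, Fahrenheit, Unix epoch seconds.
    return Reading(
        sensor_id=msg["id"],
        temperature_c=(float(msg["tempF"]) - 32.0) * 5.0 / 9.0,
        timestamp=datetime.fromtimestamp(int(msg["ts"]), tz=timezone.utc),
        location=SENSOR_LOCATIONS.get(msg["id"], "unknown"),
    )


def from_source_b(line: str) -> Reading:
    # Hypothetical source B: "id;celsius;ISO-8601 UTC timestamp" text lines.
    sensor_id, celsius, ts = line.strip().split(";")
    return Reading(
        sensor_id=sensor_id,
        temperature_c=float(celsius),
        timestamp=datetime.fromisoformat(ts.replace("Z", "+00:00")),
        location=SENSOR_LOCATIONS.get(sensor_id, "unknown"),
    )


def validate(r: Reading) -> Optional[Reading]:
    # Origin check: only registered sensors are accepted.
    if r.sensor_id not in SENSOR_LOCATIONS:
        return None
    # Accuracy check: assumed physically plausible range for this deployment.
    if not -50.0 <= r.temperature_c <= 60.0:
        return None
    # Consistency check: reject timestamps from the future.
    if r.timestamp > datetime.now(timezone.utc):
        return None
    return r
```

Once both sources are mapped onto the same `Reading` schema, a payload such as `{"id": "a-17", "tempF": 70.7, "ts": 1700000000}` and the line `"b-03;21.5;2023-11-14T22:13:20Z"` become directly comparable: same unit, same time base, same location metadata.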

IoT and BigData Convergence
The above-listed IoT data processing challenges and activities are very closely related to the wave of BigData technologies. Indeed, IoT data are characterized by the Vs that are commonly associated with BigData technologies. In particular, BigData systems refer to data processing and management systems, which feature one or more of the following characteristics (Vs):

  • Volume: Very high data volumes, beyond those that can be handled by conventional data management systems.
  • Velocity: Data streams with very high ingestion rates, which cannot be handled by conventional systems and databases.
  • Variety: Data featuring extreme heterogeneity in terms of velocities, formats and semantics.
  • Veracity: Data that are characterized by uncertainty and unreliability.

IoT analytics applications are typically characterized by:

  • High data volumes, since in several cases they have to collect and process streaming information from thousands of sensors.
  • High-velocity streams, since they usually involve streaming data that are collected and, in several cases, processed in real time.
  • High variety, since they commonly interface with and leverage data from heterogeneous sensors and internet-connected devices.
  • Veracity challenges, since sensor data are typically noisy, prone to errors and affected by the unreliability of devices.
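
A common way to cope with high-volume, high-velocity streams is to reduce them to per-window summaries as they arrive. The sketch below computes tumbling-window (fixed, non-overlapping) averages per sensor in a single pass; the event tuple layout and the function name are assumptions made for illustration. Stream-processing frameworks such as Apache Flink or Kafka Streams provide windowing like this as a built-in operator, together with handling of late and out-of-order events, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, float, float]  # (sensor_id, unix_time_seconds, value)


def tumbling_window_means(
    events: Iterable[Event], window_s: float
) -> Dict[Tuple[str, int], float]:
    """Average each sensor's values over fixed, non-overlapping time windows.

    Keys are (sensor_id, window_index), with window_index = floor(t / window_s).
    Only a running sum and count are kept per key, so memory grows with the
    number of (sensor, window) pairs rather than with the number of events.
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for sensor_id, t, value in events:
        key = (sensor_id, int(t // window_s))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With a 10-second window, events from sensor `s1` at t = 0.5 s and t = 4.0 s fall into window 0 and are averaged together, while an event at t = 12.0 s starts window 1.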

Nevertheless, IoT data analytics also has several differences from conventional BigData analytics, in particular:

  • IoT data collection consumes bandwidth, energy and other network and device resources. Furthermore, data collection depends on multiple layers of the network.
  • IoT data analytics should be optimized with respect to the available resources and should consider cross-layer optimizations (i.e., the so-called deep IoT analytics).
  • Contrary to conventional BigData systems, IoT analytics solutions should work across multiple systems and platforms.
  • IoT analytics applications in several cases integrate physical, cyber and social datasets.
  • IoT analytics and IoT processing are in several cases part of real-time control systems, to which they provide actionable information.
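
One simple way to make data collection resource-aware, in the spirit of the first two points above, is send-on-delta (deadband) reporting on the device or edge gateway: a sample is transmitted only when it has changed noticeably since the last transmitted value. The sketch below is a minimal illustration; the function name and the threshold are assumptions, and in practice the threshold would be chosen per sensor from its noise level and the application's accuracy requirements.

```python
from typing import Iterable, List, Optional, Tuple


def send_on_delta(
    samples: Iterable[Tuple[float, float]], threshold: float
) -> List[Tuple[float, float]]:
    """Return only the (timestamp, value) samples worth transmitting.

    A sample is forwarded when it differs from the last *transmitted* value
    by at least `threshold`; the first sample is always forwarded.
    """
    sent: List[Tuple[float, float]] = []
    last_sent: Optional[float] = None
    for t, value in samples:
        if last_sent is None or abs(value - last_sent) >= threshold:
            sent.append((t, value))
            last_sent = value
    return sent
```

Comparing against the last transmitted value (rather than the previous sample) prevents a slow drift from going unreported: small steps accumulate until they cross the threshold. The trade-off is that the receiver sees a value that may be off by up to `threshold` between transmissions.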

Note that IoT analytics systems are commonly deep IoT analytics systems, which span multiple platforms (e.g., IoT/cloud platforms and publish/subscribe platforms), networks and IoT data sources, i.e., the whole ecosystem of IoT platforms and technologies. Such systems combine data from multiple sources, (near) real-time analytics, visualisation and semantic representations in order to transform raw IoT data into insights and actionable knowledge. The development and deployment of deep IoT analytics systems is challenging, given that they integrate and/or transcend multiple networks, clouds and IoT platforms, and therefore require optimization across multiple levels.
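
Publish/subscribe platforms are one of the mechanisms that let such systems combine streams from many sources without point-to-point integration. The sketch below is a minimal in-process broker with simplified MQTT-style topic wildcards (`+` matches exactly one level, a trailing `#` matches the remaining levels). The `Broker` class, topic names and payloads are hypothetical; a real broker, such as an MQTT broker, adds networking, persistence and delivery guarantees that this sketch omits.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT-style matching of a subscription pattern against a topic."""
    p_levels, t_levels = pattern.split("/"), topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True  # multi-level wildcard: matches everything from here on
        if i >= len(t_levels):
            return False  # pattern is longer than the topic
        if p != "+" and p != t_levels[i]:
            return False  # literal level mismatch
    return len(p_levels) == len(t_levels)


class Broker:
    """Tiny in-process broker standing in for a real publish/subscribe platform."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs[pattern].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Deliver to every handler whose subscription pattern matches the topic.
        for pattern, handlers in self._subs.items():
            if topic_matches(pattern, topic):
                for handler in handlers:
                    handler(topic, payload)
```

For instance, a traffic-analytics component could subscribe to `city/+/traffic` while an archiving component subscribes to `city/#`; a single publication on `city/district-4/traffic` then reaches both, without the publishing sensor knowing about either consumer.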

Beyond the systemic aspects of IoT-based data-intensive applications, the development of IoT analytics applications requires the blending and integration of machine learning schemes and data science with IoT platforms. This is discussed in one of the next chapters of the tutorial.
