Plenario Sensor Data

Comment thread here. Feedback welcome!

I dumped my first thoughts on this topic here. Here are my current thoughts. I'm writing this out so I have more to go on in conversations with stakeholders. None of this is final. I'm trying to converge on something concrete so we can move past proofs of concept.

Scope: Just the Web Application API

This document is about the web developer-focused sensor data API. The needs of this community are different enough from the needs of the scientific community that they'll need different access methods. I'll assume here that the Array of Things data producer will have a publication module that can export the data in different formats to different consumers. One such consumer will be an object store to distribute static data dumps with DOIs for scientific citation. Another consumer might be the NOAA Meteorological Assimilation Data Ingest System, with its own metadata requirements.

Overview

Plenario's sensor API will have three main components:

RESTful resources with metadata about sensor networks.
Set-based queries over sensor data with traditional Plenario spatiotemporal filters (e.g. all observations in Pilsen in the month of July where temperature > 78 degrees Fahrenheit).
Streaming feeds of near real-time data where you can subscribe to individual nodes.

Metadata Resources

There will be at least three types of RESTful resources:

sensorNetworks
featuresOfInterest
nodes

Sensor Networks

sensorNetworks are the top-level resources. Although we're designing with AoT in mind, we want this API to be flexible enough to accommodate other networks (NOAA weather sensors, for example). In addition to some descriptive metadata, each sensorNetwork owns a list of nodes and featuresOfInterest.

GET v2/api/sensor-networks/arrayofthings

{
  "name": "ArrayOfThings",
  "maintainerEmail": "[email protected]",
  "description": "Brief description of this sensor network.",
  "infoLink": "https://arrayofthings.github.io/developers",
  "nodeMetadata": {
    "height": "meters",
    "direction": "Cardinal directions. One of N, NE, E, SE, S, SW, W, NW"
  },
  "nodes": ["ArrayOfThings1", "ArrayOfThings2", ... ],
  "featuresOfInterest": ["temperature", "numPeople", ... ]
}

Features of Interest

These are features of the physical world that the sensor network is observing. featuresOfInterest are defined at the network level rather than the node level. That lets the network maintainer enable queries against heterogeneous sensors that are measuring the same physical thing. Let's say one feature is temperature. Even if some nodes have temperature sensor A and some have temperature sensor B, the network maintainer can say these both apply to the temperature feature as long as they both fulfill that feature's requirements. That is, they report in the correct units with the correct precision.

GET /sensor-networks/arrayofthings/features-of-interest/temperature

{
  "name": "temperature",
  "observedProperties": [
    {
      "name": "temperature",
      "type": "numeric",
      "unit": "degrees Fahrenheit",
      "description": "accurate to within ±0.5 degrees Fahrenheit"
    }
  ]
}

Why go to the trouble of nesting observedProperties within featuresOfInterest? Some features won't be scalar, so they'll need multiple properties. For example, vibration will be a three-dimensional numerical vector. Others may be a tuple of heterogeneous types. A feature derived from computer vision will probably have a reading and a confidence value. Imagine a boolean feature for whether a traffic accident occurred in a node's field of vision, along with a decimal confidence value. Let's use pedestrian counting as an example of a tuple-valued feature:

GET /sensor-networks/arrayofthings/features-of-interest/numPedestrians

{  
  "name":"numPedestrians",
  "observedProperties":[  
    {  
      "name": "numPedestrians",
      "type": "numeric",
      "description": "Number of pedestrians that passed through node's field of vision in sampling window",
      "unit": "Whole number of pedestrians"
    },
    {  
      "name": "marginOfError",
      "type": "numeric",
      "description": "The range in which the algorithm is 95% certain it has the correct count. If numPedestrians is 7 and marginOfError is 1, then the algorithm is 95% certain that between 6 and 8 pedestrians passed through the node's field of vision",
      "unit": "Whole number of pedestrians"
    }
  ]
}

I don't know what uncertainty metrics we'll actually be using, but that should illustrate how we can represent them.
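
To make the nesting concrete, here is a minimal sketch of how a client might check one feature's reading against its observedProperties. The function name validate_reading and the TYPE_CHECKS table are hypothetical; only the "numeric" type appears in this document, and "boolean" is an assumed extension based on the traffic accident example.

```python
from numbers import Real

# Feature definition abbreviated from the numPedestrians example above.
NUM_PEDESTRIANS = {
    "name": "numPedestrians",
    "observedProperties": [
        {"name": "numPedestrians", "type": "numeric"},
        {"name": "marginOfError", "type": "numeric"},
    ],
}

# Hypothetical mapping from "type" strings to Python checks.
# bool is excluded from "numeric" because Python treats bool as an int.
TYPE_CHECKS = {
    "numeric": lambda v: isinstance(v, Real) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),  # assumed, not defined in this doc
}


def validate_reading(feature, reading):
    """Return a list of problems; an empty list means the reading conforms."""
    if reading is None:
        # A null reading means the node could not report this feature.
        return []
    expected = {p["name"]: p["type"] for p in feature["observedProperties"]}
    problems = []
    for name, type_name in expected.items():
        if name not in reading:
            problems.append(f"missing property: {name}")
        elif not TYPE_CHECKS[type_name](reading[name]):
            problems.append(f"{name} is not {type_name}")
    for name in reading:
        if name not in expected:
            problems.append(f"unexpected property: {name}")
    return problems
```

A reading such as {"numPedestrians": 7, "marginOfError": 1} passes, while a reading that omits marginOfError is flagged, which is the point of declaring every property of a tuple-valued feature up front.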

Nodes

Immutable Metadata

Certain attributes of a node are so central that if they change, we need to consider it a different node. If a physical node is taken down from one street corner and moved to another, we'll need to consider it a new logical node with a new identifier. It's measuring a different location in the world, so we must consider it a distinct stream of data.

These immutable attributes will include:

GET /nodes/arrayofthings1

{
    "id": "ArrayOfThings1",
    "sensorNetwork": "ArrayOfThings",
    "height": 6.5,
    "direction": "NE",
    "location": {"lon": -87.91372618, "lat": 41.64625754},
    ...
}

To interpret height and direction, recall the nodeMetadata field in the network object:

GET /sensor-networks/arrayofthings
{
  ...
  "nodeMetadata": {
    "height": "meters",
    "direction": "Cardinal directions. One of N, NE, E, SE, S, SW, W, NW"
  }
  ...
}

Mutable Metadata

Then what are the mutable attributes of a node? Unlike a location change, these don't change the part of the world that is being measured, but they can change how we're measuring it. A node can start reporting on a new featureOfInterest or change the method by which it derives a featureOfInterest. Any such change will trigger a version bump of the node metadata.

The big idea is that a version bump shouldn't change the semantics of the observations. If a new temperature sensor is installed but it still meets the definition of the temperature feature of interest, then we can still consider it the same stream of temperature data. However, each observation will include its node's metadata version so that users can track down specifics.

The mutable attributes are a list of every feature a node is reporting and the procedure it uses to derive each feature:

{  
  "id":"ArrayOfThings1",
  ...
  "version":"7",
  "featuresOfInterest":[  
    "temperature",
    "numPeople"
  ],
  "procedures":{  
    "temperature":{  
      "sensors":[  
        {  
          "sensorType":"temperature sensor DS18B20+",
          "datasheet":"arrayofthings.github.io/datasheets/DS18B20"
        },
        {  
          "sensorType":"temperature sensor TMP36",
          "datasheet":"arrayofthings.github.io/datasheets/TMP36"
        }
      ]
    },
    "numPeople":{
      "sensors":[
        {
          "sensorType":"OV7670 300KP camera",
          "datasheet":"arrayofthings.github.io/datasheets/OV7670"
        }
      ],
      "algorithms":[
        {
          "algorithm":"Szeliski 2.5.46",
          "datasheet":"arrayofthings.github.io/datasheets/Szeliski"
        }
      ]
    }
  }
}

We need to put a lot more thought into how to represent the procedures, but that should get the idea across.
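
As a rough sketch of how the version number lets a user track down specifics, the lookup below resolves the procedure behind one feature of one observation. The NODE_VERSIONS store, the procedure_for function, and the contents of version "6" are all hypothetical illustrations; this document does not define how version history will be stored or exposed.

```python
# Hypothetical in-memory history of node metadata, keyed by (node id, version).
# Version "7" is abbreviated from the example above; version "6" is invented
# purely to show a node before it started reporting numPeople.
NODE_VERSIONS = {
    ("ArrayOfThings1", "6"): {
        "featuresOfInterest": ["temperature"],
        "procedures": {
            "temperature": {
                "sensors": [{"sensorType": "temperature sensor TMP36"}],
            },
        },
    },
    ("ArrayOfThings1", "7"): {
        "featuresOfInterest": ["temperature", "numPeople"],
        "procedures": {
            "temperature": {
                "sensors": [
                    {"sensorType": "temperature sensor DS18B20+"},
                    {"sensorType": "temperature sensor TMP36"},
                ],
            },
            "numPeople": {
                "sensors": [{"sensorType": "OV7670 300KP camera"}],
                "algorithms": [{"algorithm": "Szeliski 2.5.46"}],
            },
        },
    },
}


def procedure_for(observation, feature):
    """Return the procedure that produced `feature` in this observation.

    Returns None when the node did not report that feature at the
    observation's metadata version.
    """
    key = (observation["nodeId"], observation["nodeVersion"])
    metadata = NODE_VERSIONS[key]
    return metadata["procedures"].get(feature)
```

Because the temperature feature keeps the same definition across both versions, a user can treat the readings as one stream and only consult the procedure when the sensor hardware matters.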

Observations

Before we discuss how to query observations, let's tie up the metadata discussion by examining how observations will be formatted. Observations will be uniquely identified by the node that captured them and the time of capture. Each observation's results object must contain a key for every featureOfInterest defined by its node (at the specified metadata version). If a node was unable to report on a given feature at a given time, it will report null for that feature.

// Sample data based on example metadata
{  
  "nodeId":"ArrayOfThings1",
  "time":"2016-07-04T16:34:10",
  "nodeVersion":"9",
  "results":{  
    "temperature":{  
      "temperature":89.7
    },
    "numPeople":{  
      "numPeople":14,
      "marginOfError":2
    },
    "standingWaterHeight":null
  }
}

// ... OR ...
// Sample data based on node's current sensors
{
    "nodeId":"ArrayOfThings1",
    "time":"2016-07-04T16:34:10",
    "nodeVersion":"9",
    "results":{
        "temperature":{
            "temperature":89.7
        },
        "atmosphericPressure":{
            "atmosphericPressure":89.7
        },
        "relativeHumidity":{
            "relativeHumidity":89.7
        },
        "lightIntensity":{
            "lightIntensity":89.7
        },
        "acceleration":{
            "X":89.7,
            "Y":89.7,
            "Z":89.7
        },
        "instantaneousSoundSample":{
            "instantaneousSoundSample":89.7
        },
        "magneticFieldIntensity":{
            "X":89.7,
            "Y":89.7,
            "Z":89.7
        },
        "concentrationOf":{
            "SO2":89.7,
            "H2S":89.7,
            "O3":89.7,
            "NO2":89.7,
            "CO":89.7,
            "reducingGases":89.7,
            "oxidizingGases":89.7
        },
        "particulateMatter":{
            "PM1":89.7,
            "PM2.5":89.7,
            "PM10":89.7
        }
    }
}
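
The rule that results must carry a key for every feature the node defines, with null for anything it could not report, can be sketched as follows. The function name build_results and its arguments are hypothetical; rejecting readings for undeclared features is an assumption about how strict a producer would want to be.

```python
import json


def build_results(node_features, readings):
    """Build a results object with exactly one key per declared feature.

    node_features: feature names from the node's metadata at this version.
    readings: whatever the node managed to measure, keyed by feature name.
    Features with no reading map to None, which serializes as JSON null.
    """
    undeclared = set(readings) - set(node_features)
    if undeclared:
        # Assumed policy: a node may not report features it has not declared.
        raise ValueError(f"undeclared features: {sorted(undeclared)}")
    return {feature: readings.get(feature) for feature in node_features}


results = build_results(
    ["temperature", "numPeople", "standingWaterHeight"],
    {
        "temperature": {"temperature": 89.7},
        "numPeople": {"numPeople": 14, "marginOfError": 2},
    },
)
serialized = json.dumps(results)
```

Here standingWaterHeight has no reading, so it appears in the serialized output as null rather than being dropped, which keeps the shape of every observation predictable for a given node version.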

Spatiotemporal Queries

To return a set of observations from a single node,

/sensor-networks/<id>/nodes/<id>/query?<filters>

or from an entire network

/sensor-networks/<id>/query?<filters>

where there will be optional query parameters to filter on space, time, availability of features, and conditions on features (e.g. temperature > 75).
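
The query parameter names are not settled yet, so rather than guess at them, here is a sketch of what the time, feature-availability, and condition filters would select, written as an in-memory predicate over the observation format above. The function name matches, the (feature, property, operator, threshold) tuple shape, and the half-open time window are all assumptions. Spatial filtering is left out because observations carry only a nodeId and would need a join against node locations.

```python
import operator
from datetime import datetime

# Comparison operators a condition filter might support (assumed set).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Sample observation copied from the Observations section.
SAMPLE = {
    "nodeId": "ArrayOfThings1",
    "time": "2016-07-04T16:34:10",
    "nodeVersion": "9",
    "results": {
        "temperature": {"temperature": 89.7},
        "numPeople": {"numPeople": 14, "marginOfError": 2},
        "standingWaterHeight": None,
    },
}


def matches(observation, start=None, end=None, features=(), conditions=()):
    """Return True if the observation passes every supplied filter.

    start/end: datetime bounds, treated as the half-open window [start, end).
    features: feature names that must have a non-null reading.
    conditions: (feature, property, operator, threshold) tuples.
    """
    when = datetime.fromisoformat(observation["time"])
    if start is not None and when < start:
        return False
    if end is not None and when >= end:
        return False
    results = observation["results"]
    for feature in features:
        if results.get(feature) is None:
            return False
    for feature, prop, op, threshold in conditions:
        reading = results.get(feature)
        if reading is None or prop not in reading:
            return False
        if not OPERATORS[op](reading[prop], threshold):
            return False
    return True
```

For example, matches(SAMPLE, start=datetime(2016, 7, 1), end=datetime(2016, 8, 1), conditions=[("temperature", "temperature", ">", 75)]) selects the sample observation. Note that a condition has to name both the feature and the property within it, since features can be tuple-valued.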

To find the observation nearest to a point in space and time that contains a desired feature of interest:

/sensor-networks/<id>/nearest/<geom>/<timestamp>?featureOfInterest=<feature>

Streaming

We also want to provide a near real-time streaming API so that clients can build dashboards and real-time apps, or simply mirror all the data as it arrives.

That would look like

/sensor-networks/<id>/nodes/<id>/stream

This opens a WebSocket through which every incoming observation from this node will be pushed. To receive every observation in the network, use

/sensor-networks/<id>/stream
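
The message format on the socket is not specified yet. Assuming each text frame carries one observation as JSON in the format from the Observations section, a client-side consumer could be sketched as below. It is transport-agnostic: frames can be any iterable of strings, so no particular WebSocket library is implied. The function name observations_from_frames is hypothetical.

```python
import json


def observations_from_frames(frames, node_id=None):
    """Decode pushed frames into observations, optionally for one node.

    frames: iterable of JSON text frames (assumed: one observation each).
    node_id: if given, only observations from that node are yielded.
    """
    for frame in frames:
        observation = json.loads(frame)
        if node_id is None or observation["nodeId"] == node_id:
            yield observation


# Two frames as they might arrive on the network-wide stream.
frames = [
    json.dumps({"nodeId": "ArrayOfThings1", "time": "2016-07-04T16:34:10",
                "nodeVersion": "9", "results": {"temperature": {"temperature": 89.7}}}),
    json.dumps({"nodeId": "ArrayOfThings2", "time": "2016-07-04T16:34:11",
                "nodeVersion": "3", "results": {"temperature": {"temperature": 88.1}}}),
]
only_node_1 = list(observations_from_frames(frames, node_id="ArrayOfThings1"))
```

Under that assumption, the per-node stream is equivalent to the network-wide stream filtered by nodeId, so a client mirroring the whole network would not need to open one socket per node.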
