Plenario Sensor Data

I dumped my first thoughts on this topic here. Here are my current thoughts. I'm writing this out so I have more to go on in conversations with stakeholders. None of this is final. I'm trying to converge on something concrete so we can move past proofs of concept.

Scope: Just the Web Application API

This document is about the web developer-focused sensor data API. The needs of this community are different enough from the needs of the scientific community that they'll need different access methods. I'll assume here that the Array of Things data producer will have a publication module that can export the data in different formats to different consumers. One such consumer will be an object store to distribute static data dumps with DOIs for scientific citation. Another consumer might be the NOAA Meteorological Assimilation Data Ingest System, with its own metadata requirements.

Overview

Plenario's sensor API will have three main components:

  1. RESTful resources with metadata about sensor networks.
  2. Set-based queries over sensor data with traditional Plenario spatiotemporal filters (e.g. all observations in Pilsen in the month of July where temperature > 78 degrees Fahrenheit); see the sketch after this list.
  3. Streaming feeds of near real-time data where you can subscribe to individual nodes.
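
As a very rough sketch of the second component, here is what a set-based query like the Pilsen example might look like from a client's point of view. Everything in this snippet is hypothetical: the /query path, the filter parameter names, the polygon coordinates, and the response shape are placeholders, not a settled API.

import json
import requests

# Hypothetical endpoint and parameter names -- a sketch of the set-based
# query component, not a finalized API.
BASE = "https://plenar.io/v2/api"

# Rough, illustrative bounding box standing in for Pilsen.
pilsen = {
    "type": "Polygon",
    "coordinates": [[
        [-87.675, 41.845], [-87.675, 41.860],
        [-87.640, 41.860], [-87.640, 41.845],
        [-87.675, 41.845]
    ]]
}

params = {
    "feature_of_interest": "temperature",
    "location_geom__within": json.dumps(pilsen),
    "obs_date__ge": "2016-07-01",
    "obs_date__le": "2016-07-31",
    "temperature__gt": 78,  # degrees Fahrenheit
}

response = requests.get(BASE + "/sensor-networks/arrayofthings/query", params=params)
response.raise_for_status()
for observation in response.json()["observations"]:
    print(observation["id"], observation["time"], observation["results"]["temperature"])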

Metadata Resources

There will be at least three types of RESTful resources:

  1. sensorNetworks
  2. featuresOfInterest
  3. nodes

Sensor Networks

sensorNetworks are the top-level resources. Although we're designing with AoT in mind, we want this API to be flexible enough to accommodate other networks (NOAA weather sensors, for example). In addition to some descriptive metadata, each sensorNetwork owns a list of nodes and featuresOfInterest.

GET v2/api/sensor-networks/arrayofthings

{
  "name": "ArrayOfThings",
  "maintainerEmail": "[email protected]",
  "description": "Brief description of this sensor network.",
  "infoLink": "https://arrayofthings.github.io/developers",
  "nodeMetadata": {
    "height": "meters",
    "direction": "Cardinal directions. One of N, NE, E, SE, S, SW, W, NW"
  },
  "nodes": ["ArrayOfThings1", "ArrayOfThings2", ... ],
  "featuresOfInterest": ["temperature", "numPeople", ... ]
} 
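
To make the hierarchy concrete, here is a hedged sketch of a client walking these metadata resources: fetch the network, then follow the featuresOfInterest and nodes it lists (both are described below). Only the resource paths come from the examples on this page; the base URL and the exact response handling are assumptions.

import requests

# Sketch of walking the metadata resources described on this page.
# The base URL is an assumption; the paths mirror the examples here.
BASE = "https://plenar.io/v2/api"

network = requests.get(BASE + "/sensor-networks/arrayofthings").json()
print(network["name"], "-", network["description"])

# Each feature of interest is its own resource under the network.
for feature_name in network["featuresOfInterest"]:
    feature = requests.get(
        BASE + "/sensor-networks/arrayofthings/features-of-interest/" + feature_name
    ).json()
    property_names = [prop["name"] for prop in feature["observedProperties"]]
    print(feature_name, "->", ", ".join(property_names))

# Nodes carry the network-level nodeMetadata fields (height, direction) plus a location.
node = requests.get(BASE + "/nodes/" + network["nodes"][0]).json()
print(node["id"], node["location"], node["height"], node["direction"])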

Features of Interest

These are features of the physical world that the sensor network is observing. featuresOfInterest are defined at the network level rather than the node level. That lets the network maintainer enable queries against heterogeneous sensors that are measuring the same physical thing. Say one feature is temperature: even if some nodes have temperature sensor A and others have temperature sensor B, the maintainer can declare that both apply to the temperature feature as long as both fulfill that feature's requirements, that is, they report in the correct units with the correct precision.

GET /sensor-networks/arrayofthings/features-of-interest/temperature

{
  "name": "temperature",
  "observedProperties": [
    {
      "name": "temperature",
      "type": "numeric",
      "unit": "degrees Fahrenheit",
      "description": "accurate within +- .5 degrees Fahrenheit"
    }
  ]
} 

Why go to the trouble of nesting observedProperties within featuresOfInterest? Some features won't be scalar, so they'll need multiple properties. For example, vibration will be a three-dimensional numerical vector. Others may be tuples of heterogeneous types. A computer-vision-derived feature will probably have a reading and a confidence value. Imagine a boolean feature for whether a traffic accident occurred in a node's field of vision, along with a decimal confidence value. Let's use pedestrian counting as an example of a tuple-valued feature:

GET /sensor-networks/arrayofthings/features-of-interest/numPedestrians

{  
  "name":"numPedestrians",
  "observedProperties":[  
    {  
      "name": "numPedestrians",
      "type": "numeric",
      "description": "Number of pedestrians that passed through node's field of vision in sampling window",
      "unit": "Whole number of pedestrians"
    },
    {  
      "name": "marginOfError",
      "type": "numeric",
      "description": "The range in which the algorithm is 95% certain it has the correct count. If numPedestrians is 7 and marginOfError is 1, then the algorithm is 95% certain that between 6 and 8 pedestrians passed through the node's field of vision",
      "unit": "Whole number of pedestrians"
    }
  ]
}

I don't know what uncertainty metrics we'll actually be using, but that should illustrate how we can represent them.
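
One payoff of this structure is that a client, or our own ingestion code, can validate a result against a feature's observedProperties without hard-coding anything about temperature or pedestrian counts. A minimal sketch, assuming the feature JSON above has been parsed into a plain dict; the function name and the validation rules are mine, not part of the proposal:

# Minimal sketch: check one result dict against a feature's observedProperties.
def validate_result(feature, result):
    """Return a list of problems; an empty list means the result looks well formed."""
    problems = []
    for prop in feature["observedProperties"]:
        name = prop["name"]
        if name not in result:
            problems.append("missing property: " + name)
        elif prop["type"] == "numeric" and not isinstance(result[name], (int, float)):
            problems.append(name + " should be numeric, got " + type(result[name]).__name__)
    return problems

num_pedestrians = {
    "name": "numPedestrians",
    "observedProperties": [
        {"name": "numPedestrians", "type": "numeric"},
        {"name": "marginOfError", "type": "numeric"}
    ]
}

print(validate_result(num_pedestrians, {"numPedestrians": 7, "marginOfError": 1}))  # []
print(validate_result(num_pedestrians, {"numPedestrians": "seven"}))  # two problems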

Nodes

Facts about nodes:

  • They belong to a single network
  • They can only report on featuresOfInterest defined at the network level
  • They have a fixed location (for now)
  • Each node must explain how it derives its observations for each featureOfInterest it reports on
  • Node metadata is versioned

First, recall the nodeMetadata field in the network object:

GET /sensor-networks/arrayofthings
{
  ...
  "nodeMetadata": {
    "height": "meters",
    "direction": "Cardinal directions. One of N, NE, E, SE, S, SW, W, NW"
  }
  ...
}

That network-level definition will help us interpret individual nodes' metadata. With that in mind, let's look at the first half of a node's metadata:

GET /nodes/arrayofthings1

{
    "id": "ArrayOfThings1",
    "sensorNetwork": "ArrayOfThings",
    "height": 6.5,
    "direction": "NE",
    "location": {"lon": -87.91372618, "lat": 41.64625754},
    ...
}

I'll call these the immutable attributes of a node. If a physical node is taken down from one street corner and moved to another, we'll need to consider it a new logical node with a new identifier. It's measuring a different location in the world, so it must be considered a distinct stream of data.

Then what are the mutable attributes of a node? Unlike a location change, these don't change the part of the world that is being measured, but they can change how we're measuring it. A node can start reporting on a new featureOfInterest or change the method by which it derives its observations.

{
  "id":"ArrayOfThings1",
  "version":"7",
  "featuresOfInterest":[
    "temperature",
    "numPeople"
  ],
  "procedures":{
    "temperature":{
      "isDerived":false,
      "isAggregated":true,
      "aggregationMethod":"average over all onboard temperature sensors",
      "sensors":[
        {
          "sensorType":"temperature sensor DS18B20+",
          "datasheet":"arrayofthings.github.io/datasheets/DS18B20"
        },
        {
          "sensorType":"temperature sensor TMP36",
          "datasheet":"arrayofthings.github.io/datasheets/TMP36"
        }
      ]
    },
    "numPeople":{
      "isDerived":true,
      "isAggregated":false,
      "sensors":[
        {
          "sensorType":"OV7670 300KP camera",
          "datasheet":"arrayofthings.github.io/datasheets/OV7670",
          "algorithm":"Szeliski 2.5.46"
        }
      ],
      "algorithms":[
        {
          "algorithm":"Szeliski 2.5.46",
          "datasheet":"arrayofthings.github.io/datasheets/Szeliski"
        }
      ]
    }
  }
}
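
To show what this metadata buys a consumer, here is a small, hypothetical helper that reads the procedures structure above and summarizes how a node derives a given feature. The function is illustrative only; it assumes the node JSON has been parsed into a dict named node.

# Sketch: summarize the provenance of one feature from a node's procedures metadata.
def provenance(node, feature_name):
    procedure = node["procedures"][feature_name]
    sensors = [s["sensorType"] for s in procedure.get("sensors", [])]
    algorithms = [a["algorithm"] for a in procedure.get("algorithms", [])]
    summary = feature_name + " on " + node["id"] + " (metadata version " + node["version"] + ")"
    summary += ": sensors = " + ", ".join(sensors)
    if procedure.get("isAggregated"):
        summary += "; aggregation = " + procedure.get("aggregationMethod", "unspecified")
    if procedure.get("isDerived") and algorithms:
        summary += "; algorithms = " + ", ".join(algorithms)
    return summary

# provenance(node, "numPeople") would yield something like:
# "numPeople on ArrayOfThings1 (metadata version 7): sensors = OV7670 300KP camera; algorithms = Szeliski 2.5.46"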

Note: The featureOfInterest and observedProperty names (along with procedure, used above, and result, used below) are taken from the Observations and Measurements standard (https://en.wikipedia.org/wiki/Observations_and_Measurements) to allow easier integration with clients already using that format. Whether complying with an existing standard is worth the effort is still an open question.

Every node will report observations that contain a result for each feature of interest it observes, marked with a timestamp. Each result holds values for that feature's observed properties.

Sample Observation JSON

{
    "id": "ArrayOfThings1",
    ...
    "version": "7",
    "time": "2016-07-04T16:34:10",
    "results": {
        "temperature": {
            "temperature": 89.7,
            ...
        },
        "numPeople": {
            "numPeople": 14,
            "certainty": 0.83,
            ...
        },
        ...
    }
}
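
Finally, a consumer-side sketch of how an observation like the one above could be joined back to the network's feature metadata to recover units. The function, variable names, and flattened row format are mine; only the field names come from the examples on this page.

# Sketch: flatten an observation into rows, attaching units from feature metadata.
def flatten_observation(observation, features):
    """`features` maps a feature name to its feature-of-interest JSON, fetched as shown earlier."""
    rows = []
    for feature_name, result in observation["results"].items():
        props = features.get(feature_name, {}).get("observedProperties", [])
        units = {prop["name"]: prop.get("unit", "") for prop in props}
        for prop_name, value in result.items():
            rows.append((observation["id"], observation["time"], observation["version"],
                         feature_name, prop_name, value, units.get(prop_name, "")))
    return rows

temperature_feature = {
    "name": "temperature",
    "observedProperties": [
        {"name": "temperature", "type": "numeric", "unit": "degrees Fahrenheit"}
    ]
}

sample = {
    "id": "ArrayOfThings1",
    "version": "7",
    "time": "2016-07-04T16:34:10",
    "results": {
        "temperature": {"temperature": 89.7},
        "numPeople": {"numPeople": 14, "certainty": 0.83}
    }
}

for row in flatten_observation(sample, {"temperature": temperature_feature}):
    print(row)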