Requirement: Proposal (2) for standardized message envelope for requests, responses and notifications #34

Open
hspaay opened this issue Dec 22, 2024 · 55 comments

Comments

@hspaay
Collaborator

hspaay commented Dec 22, 2024

[Update 2025-01-02: changed requestID to correlationID as per @RobWin feedback]
[Update 2025-01-02: changed operation in ResponseMessage to be optional as per @unit9a feedback]

This is a draft of proposal 2 for standardizing the message envelopes for all protocol bindings, including the websocket binding. Adopting this approach would make the websocket protocol binding a trailblazer that serves as an example for all future protocol bindings.

HiveOT is in the process of implementing this proposal at the application level and currently maps to existing protocol bindings as the transport.

Note, I realize this is a rather different approach to constructing messages, so I don't really expect it to be adopted unless it garners a lot of support. I've documented it here in case there is actual interest, or otherwise for posterity.

Rationale

  1. WoT protocol bindings all solve the same problem: send requests, receive responses and send notifications. Currently each protocol binding defines its own messages and interactions for doing so. This makes life harder for implementors, as implementing and testing multiple different approaches is a lot more work than implementing a single approach.

Protocol bindings in WoT serve two different purposes: they provide a transport and they act as a (partial) application protocol for handling properties, events and actions. These are different concerns that can be separated. Separating the application protocol from the transport enables defining a single application protocol that can be used on any transport. This in turn would simplify adoption and improve interoperability.

  2. The WoT protocol bindings only describe the consumer-server interaction. They only partially address how Hubs and Gateways interact with Thing agents (servients) asynchronously.

The WoT protocol bindings do not describe how Thing agents connect as a client to a hub or gateway. Instead the assumption is that all Things run a server. This is too limited a view. HiveOT is a Hub where Thing agents are clients of the Hub, just like consumers. WoT does not describe the interaction for these agents.

This proposal describes a messaging format and behavior for protocol bindings that address the above issues.

Message Types

There are three message types: Request, Response and Notification with corresponding message envelopes. All WoT interaction that takes place between consumers, Thing agents, hubs, and gateways can be described using just these three messages.

Messages are identified by their type and operation. Operations are those defined in the WoT TD 1.1 specification and are open to extensions using @context. Response messages include the operation of the request they are a response to and thus identify the response payload.

Underlying transport protocols such as HTTP, SSE, WebSocket, MQTT, etc. merely act as pipes to deliver these messages in the most efficient way possible.

Request messages

The purpose of the request message is for a client to send a request for an operation on a thing. Thing agents are required to send a response when a request is received.

The following operations are considered to be requests:

  • invokeaction [WoT]
  • subscribe, unsubscribe [WoT]
  • observe, unobserve [WoT]
  • readproperty, readallproperties [WoT]
  • queryaction, queryallactions [WoT]
  • readevent, readallevents (of a Thing) [HiveOT extension]
  • readtd, readalltds (of a directory or thing) [HiveOT extension]

The request message defines the following fields:

| name | data type | description | required |
|---|---|---|---|
| type | string | "request". Identifies the message as a request message | mandatory |
| operation | string | Describes the request to perform | mandatory |
| thingID | string | ID of the thing the request applies to | optional |
| name | string | Name of the affordance the request applies to, if applicable. The type of affordance (event, action, property) is determined by the operation | optional |
| input | any | Input data of the request as described by the operation. invokeaction refers to the action affordance while other operations define the input as part of the operation | optional |
| correlationID | string | Unique identifier of the request. This must be included in the response. If no correlationID is provided then the request will still be handled but no response is returned. | optional |
| senderID | string | Authenticated sender of the request. | optional |
| messageID | string | Unique identification of the message. | optional |
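
A minimal sketch of how the request envelope could be modelled and serialized, assuming JSON encoding. The class name, the `to_json` helper, the Thing ID and the action name are illustrative assumptions; only the field names come from the table above.

```python
import json
import uuid
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass
class RequestMessage:
    """Request envelope; field names follow the table above."""
    operation: str                        # mandatory, e.g. "invokeaction"
    type: str = "request"                 # mandatory, fixed value
    thingID: Optional[str] = None
    name: Optional[str] = None            # affordance name
    input: Any = None
    correlationID: Optional[str] = None   # leave unset if no response is wanted
    senderID: Optional[str] = None        # normally set by the transport
    messageID: Optional[str] = None

    def to_json(self) -> str:
        # Drop unset optional fields to keep the payload small.
        # Caveat of this sketch: an explicit null input is dropped as well.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})


req = RequestMessage(
    operation="invokeaction",
    thingID="urn:example:lamp1",          # hypothetical Thing ID
    name="toggle",                        # hypothetical action name
    correlationID=str(uuid.uuid4()),
)
wire = req.to_json()
```

Omitting unset fields keeps the envelope compact, which matters on constrained transports.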

Response messages

Responses serve to notify a single client of the result of a request.

Response message payload is determined by the request operation. Therefore the request operation is included in the response:

| name | type | description | required |
|---|---|---|---|
| type | string | "response". Identifies the message as a response message | mandatory |
| operation | string | The request operation this is a response to. Not required. Intended to help with debugging | optional |
| correlationID | string | Identifies the request this is a response to. | mandatory |
| status | string | Status of the request processing: "pending", "running", "completed" or "failed" | mandatory |
| output | any | Result of processing the request if status is "completed", as defined in the dataschema of the action or operation. If status is "failed" then this can contain additional error information. | optional |
| error | string | Error title if status is "failed". | optional |
| received | string | Timestamp the request was received by the Thing (or its agent) | optional |
| updated | string | Timestamp the status was updated | optional |
| thingID | string | ID of the thing the request applies to | optional |
| name | string | Name of the affordance the request applies to, if applicable. The type of affordance (event, action, property) is determined by the operation | optional |
| messageID | string | Unique identification of the message. | optional |
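
As a sketch of the response envelope, the helper below builds a response dictionary and rejects unknown status values. The function name `make_response` and the sample values are assumptions made for illustration; the field names and the four status values come from the table above.

```python
from typing import Any, Optional

VALID_STATUS = {"pending", "running", "completed", "failed"}


def make_response(correlation_id: str, status: str, output: Any = None,
                  error: Optional[str] = None,
                  operation: Optional[str] = None) -> dict:
    """Build a response envelope for the request identified by correlation_id."""
    if status not in VALID_STATUS:
        raise ValueError(f"unknown status: {status}")
    msg = {"type": "response", "correlationID": correlation_id, "status": status}
    if operation is not None:
        msg["operation"] = operation      # optional, mainly a debugging aid
    if output is not None:
        msg["output"] = output
    if status == "failed" and error is not None:
        msg["error"] = error              # error title, only meaningful on failure
    return msg


ok = make_response("abc-123", "completed", output={"on": True},
                   operation="invokeaction")
bad = make_response("abc-124", "failed", error="unknown action")
```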

Notification messages

Notifications serve to notify subscribers of a change as identified by the operation, thingID and affordance name. Notifications are not targeted to a single receiver but intended for subscribers (or observers).

All notifications use the same message format as implemented in NotificationMessage struct (golang, JS, Python). Protocol bindings can use this envelope directly or map from their protocol equivalent to this message format.

The following operations are considered notifications:

  • property: Update of a property value, sent by a Thing agent to observers of a property.
  • event: Notification of an event to subscribers.

The notification message defines the following fields:

| name | type | description | required |
|---|---|---|---|
| type | string | "notification". Identifies the message as a notification message | mandatory |
| operation | string | Identification of the notification | mandatory |
| data | any | Notification data as specified by the operation | optional |
| correlationID | string | Optional correlation with the request, for subscriptions or streams | optional |
| created | string | Timestamp the notification was created | optional |
| thingID | string | ID of the thing the notification applies to | optional |
| name | string | Name of the affordance the notification applies to, if applicable. The type of affordance (event, action, property) is determined by the operation | optional |
| messageID | string | Unique identification of the message. | optional |
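
A sketch of a notification envelope builder. The function name and sample values are hypothetical, and the ISO 8601 UTC timestamp format is an assumption, since the proposal only says "timestamp".

```python
from datetime import datetime, timezone
from typing import Any, Optional

NOTIFICATION_OPERATIONS = ("property", "event")


def make_notification(operation: str, thing_id: str, name: str,
                      data: Any = None,
                      correlation_id: Optional[str] = None) -> dict:
    """Build a notification envelope using the field names from the table above."""
    if operation not in NOTIFICATION_OPERATIONS:
        raise ValueError(f"not a notification operation: {operation}")
    msg = {
        "type": "notification",
        "operation": operation,
        "thingID": thing_id,
        "name": name,
        "data": data,
        # Assumption: ISO 8601 in UTC; the proposal does not fix a format.
        "created": datetime.now(timezone.utc).isoformat(),
    }
    if correlation_id is not None:
        # Links the notification back to the subscribe/observe request.
        msg["correlationID"] = correlation_id
    return msg


note = make_notification("property", "urn:example:lamp1", "brightness", 80,
                         correlation_id="sub-1")
```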

Behavior

Any client can publish a request. They can choose to wait for a response message or handle the response asynchronously. This is dependent on the client implementation.

The server can be an agent for a thing or a hub or gateway.

Hubs or gateways will forward requests to the Thing agent. If the agent is not reachable at this point they return an error response, or a pending response if there is support for queuing requests.

Thing agents will process the request. They SHOULD send a response with one of three statuses: running, completed or failed. If a request is received without a correlationID then no response is sent.

If a 'running' response is sent, agents MUST also send a completed or failed response once the operation has finished.

If a hub or gateway is used then the response is received by the hub/gateway, which in turn forwards it to the client that sent the request. If the client is no longer reachable then the response can be queued or discarded, depending on the capabilities of the hub or gateway.

Clients will receive a response message containing the original correlationID, the payload and any error information. Client implementations can choose to wait for a response (with timeout) or handle it in a separate callback.

Agents can send notifications to subscribers. The notification message includes an operation that identifies the type of notification. A hub or gateway can implement its own subscription mechanism for consumers and ask agents to send all notifications.
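
The agent-side behavior described above can be sketched as follows. This is an illustration under assumptions, not HiveOT's implementation: `handle_request` and the `handlers` mapping are hypothetical names, and always sending a 'running' response before the final one is a simplification (a fast operation could go straight to 'completed').

```python
from typing import Any, Callable, Dict, List


def handle_request(request: dict,
                   handlers: Dict[str, Callable[[dict], Any]]) -> List[dict]:
    """Process one request and return the responses an agent would send, in order."""
    corr_id = request.get("correlationID")
    responses: List[dict] = []

    def reply(status: str, **extra: Any) -> None:
        # Without a correlationID the request is handled but nothing is sent back.
        if corr_id:
            responses.append({"type": "response", "correlationID": corr_id,
                              "status": status, **extra})

    handler = handlers.get(request.get("operation"))
    if handler is None:
        reply("failed", error="unsupported operation")
        return responses

    reply("running")
    try:
        output = handler(request)
    except Exception as exc:
        # A 'running' response must be followed by 'completed' or 'failed'.
        reply("failed", error=str(exc))
    else:
        reply("completed", output=output)
    return responses


handlers = {"readproperty": lambda req: 21.5}   # hypothetical property reader
sent = handle_request({"type": "request", "operation": "readproperty",
                       "correlationID": "c1"}, handlers)
```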

@RobWin
Collaborator

RobWin commented Jan 2, 2025

I suggest renaming requestID to correlationID.

This would enhance functionality, especially for subscriptions to property changes or events. Including a correlationID in the request message (subscribe, observe, unsubscribe, unobserve) and notification messages allows better correlation between notifications and their respective subscriptions.

For instance, a WebSocket client might return an individual reactive stream for each (property change or event) subscription, implemented for example with a reactive programming library like ReactiveX. Instead of returning a single stream of notifications, the client could separate notifications into multiple individual streams.
A correlationID (acting as a subscriptionId) can help dispatch notifications to the appropriate stream.
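
A minimal sketch of that dispatching idea, using plain callbacks in place of a reactive library. The class and method names are hypothetical; the only thing taken from the proposal is that a notification carries the correlationID of its subscription.

```python
from typing import Callable, Dict, List


class NotificationDispatcher:
    """Routes notifications to per-subscription callbacks by correlationID."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, correlation_id: str, on_next: Callable[[dict], None]) -> None:
        self._streams.setdefault(correlation_id, []).append(on_next)

    def unsubscribe(self, correlation_id: str) -> None:
        self._streams.pop(correlation_id, None)

    def dispatch(self, notification: dict) -> bool:
        """Deliver to the matching stream; return False if nobody is listening."""
        callbacks = self._streams.get(notification.get("correlationID"), [])
        for callback in callbacks:
            callback(notification)
        return bool(callbacks)


temperature: List[dict] = []
alarms: List[dict] = []
dispatcher = NotificationDispatcher()
dispatcher.subscribe("sub-temp", temperature.append)
dispatcher.subscribe("sub-alarm", alarms.append)
dispatcher.dispatch({"type": "notification", "operation": "property",
                     "name": "temperature", "data": 21.5,
                     "correlationID": "sub-temp"})
```

Each callback here stands in for pushing into one stream (an Rx subject, a Kotlin Flow, and so on).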

@hspaay
Collaborator Author

hspaay commented Jan 2, 2025

Yes, good feedback and the streaming use-case is really nice. I'll change it in the hiveot documentation and code and add it to notifications as well.

I just completed implementing this proposal in hiveot as an application layer. The http/wss/mqtt transports map from the three message types to the message types specified for that transport. On the hub the mapping is reversed and further processing uses the three message types again (so far only implemented for http/sse). I love the simplicity of the approach!

Some observations so far:

  1. This works well with intermediaries like a hub. Previously it was a struggle to mentally keep track of who is sending/receiving what and in reply to what. That problem has pretty much disappeared. A request goes from consumer to Thing, a response goes the other way, and notifications are send-and-forget by the Thing. The only difficulties come from all the mapping to the various transport message types and back.
  2. Not having to do the mapping of messages types saves a lot of code. Agents (a Thing that connects as a client to the hub) are not supported in WoT so I didn't have to worry about interoperability. The agent just receives the RequestMessage and sends ResponseMessage and NotificationMessage envelopes. This really simplified implementation.
  3. Using 'operation' in the messages also works quite well to further process the message at the application level. There is no tension (mapping, translation) between protocol and application, and no need to define separate message types to deal with responses.
  4. Forms are also simpler. No need to have a form entry for each operation. Instead there are only 3 forms (http) with endpoint and a set of corresponding operations. I haven't run into a need to define forms with the affordances either.
  5. There is no need to define separate response messages like the current websocket protocol needs (actionstatus(es), propertyreading(s)). Instead the ResponseMessage envelope can be used.
  6. There is no need to define response message operations. Instead the operation in the ResponseMessage is that of the request it is responding to.
  7. It works equally well for consumers without a hub that talk directly to Thing agents (servients).
  8. Subscriptions, observations, and property writes are requests that can now be confirmed with a response message when applied, or with an error if they failed (by including a correlationID).

It looks to me that the protocol bindings got overly complicated for what is essentially a transport problem. This brings it back to what it essentially is.

Next steps:

  • apply this on top of the websocket binding. Do all the mapping unfortunately, unless Ben can be convinced to adopt this.
  • same for the mqtt binding
  • better document the approach with diagrams
  • define a 'hiveot' WoT protocol proposal with http/sse, wss, mqtt as simple transports
  • implement this as a core protocol in hiveot. Initially for internal use but maybe there is an interest from adopters.

@RobWin
Collaborator

RobWin commented Jan 2, 2025

But from a protocol level, I still miss acknowledgements.
Acknowledgements had been very useful for event subscriptions or property change subscriptions.

@unit9a

unit9a commented Jan 2, 2025

Sorry to crash in. I have been chewing on how to implement something like this for IoT via WoT with WebRTC + WebSocket (for any messaging that needs more reliability than WebRTC). I would like to contribute here too eventually.

@RobWin, you mentioned the use of libraries like ReactiveX:

reactive programming library like ReactiveX.

Would you happen to be aware of any W3C standards for similar functionality?

@hspaay, about the following response message fields:

operation | string | The request operation this is a response to. | mandatory
correlationID | string | identifies the request this is a response to. | required

Shouldn't the sender of a request be able to determine the operation from the correlationID in the response alone? I am under the impression that the correlationID is tracked/stored by the sender along with the operation, so why send "operation" again?

i ask because i am interested in using minimal payloads for webRTC's MediaStreamTrack & data channels.

@hspaay
Collaborator Author

hspaay commented Jan 2, 2025

@RobWin

But from a protocol level, I still miss acknowledgements.

This is solved by making subscription, observation, and writeproperty operations requests. All requests are acknowledged using their correlationID.
So, apart from transport constraints (such as mapping to transport messages), these protocol messages do send an acknowledgement.
Does that address your issue or am I missing something?

@hspaay
Collaborator Author

hspaay commented Jan 2, 2025

@unit9a

shouldn't the sender of a request be able to determine the operation from the correlationID in a response only.

Yes you are correct. operation in response can be optional. The reason it was included is that I found it useful to assist in testing and debugging. You are right though, the protocol doesn't need it. I'll change it to optional in the proposal.

(ps: the media stream track looks very interesting. I'll experiment with support for it in hiveot once I have some more time)

@RobWin
Collaborator

RobWin commented Jan 2, 2025

@unit9a I'm using Kotlin Flows on the client and server side, not ReactiveX. But the concepts are quite similar to RX or Project Reactor.

There is something like https://developer.mozilla.org/en-US/docs/Web/API/WebSocketStream
and https://rsocket.io/ for Websockets and streams.
Rsocket spec is here: https://rsocket.io/about/protocol

@unit9a

unit9a commented Jan 2, 2025

@hspaay the media stream track API is something I am exploring for industrial IoT applications. It is closely related to the draft standard for WebCodecs. It would encourage the use of WoT and other web standards in a lot of applications. However, I don't think WebCodecs will be ratified any time soon.

@RobWin I'm excited about rsocket.io, THANKS SO MUCH!!! FYI, the RSocket Protocol Specification Community Group seems dead and I could not find a W3C draft or standards page. And I hope WebSocketStream is combined with RSocket at some point.

@unit9a

unit9a commented Jan 3, 2025

@hspaay this reminds me of json-rpc 2.0. do you mind if I explore what a Json rpc version of this looks like via protobufs?

do you even think json rpc is relevant at all?

@hspaay
Collaborator Author

hspaay commented Jan 3, 2025

@unit9a please go ahead and explore. Always interesting to gain more perspectives.

On the application side I found it helpful if the protocol matches the vocabulary and meaning of the application and stays as minimal as possible. This is why the proposed message envelopes use fields like operation, thingID, and (affordance) name, with the message types for request, response and notification.

On the transport layer below this, the more the merrier. Encodings like JSON-RPC, BSON, protobuf, Cap'n Proto with matching transports like websockets, gRPC, mqtt, and so on, are all good to support. This is at least my thinking with this proposal.

Looking forward to see what you have in mind.

(ps: lets also not lose sight of what @benfrancis has in mind for the websocket proposal. This issue is merely my own 2c's based on insights gained implementing the http, sse and proposed wss bindings)

@RobWin
Collaborator

RobWin commented Jan 3, 2025

@unit9a Yes, it seems there was no big interest in a reactive-streams-capable application protocol at W3C.

@hspaay

This is solved by making subscription, observation, and writeproperty operations requests. All requests are acknowledged using their correlationID.

Does this mean that for every request (subscribe, unsubscribe, observe, unobserve, writeproperty) there is a response without output as an acknowledgement? What would be the status?

@hspaay
Collaborator Author

hspaay commented Jan 3, 2025

Does this mean that for every request (subscribe, unsubscribe, observe, unobserve, writeproperty) there is a response without output as an acknowledgement? What would be the status?

That is correct. The status can be completed on success, or failed if the request isn't accepted for whatever reason. The error field in the response can contain the error message on failure while output optionally contains any further details.

Btw, if no acknowledgement is desired (can't think of a good reason) then the correlationID can be left empty.
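
A client-side sketch of this acknowledgement flow, assuming the client keeps a table of outstanding requests keyed by correlationID. The class and method names are hypothetical. Because the operation is stored with the correlationID, the response does not need to repeat it.

```python
from typing import Dict, Optional


class PendingRequests:
    """Tracks requests that expect an acknowledgement."""

    def __init__(self) -> None:
        self._operations: Dict[str, str] = {}

    def track(self, request: dict) -> None:
        corr_id = request.get("correlationID")
        if corr_id:  # empty or missing: no acknowledgement is expected
            self._operations[corr_id] = request["operation"]

    def resolve(self, response: dict) -> Optional[str]:
        """Return a human-readable result, or None for an unknown response."""
        corr_id = response.get("correlationID")
        operation = self._operations.get(corr_id)
        if operation is None:
            return None
        status = response["status"]
        if status in ("completed", "failed"):
            del self._operations[corr_id]   # final status, stop tracking
        if status == "failed":
            return f"{operation} failed: {response.get('error', 'unknown error')}"
        return f"{operation} {status}"


pending = PendingRequests()
pending.track({"type": "request", "operation": "subscribe", "thingID": "t1",
               "name": "overheated", "correlationID": "sub-1"})
ack = pending.resolve({"type": "response", "correlationID": "sub-1",
                       "status": "completed"})
```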

The part I'm still unclear about is how to include 'alternative results' that can be specified in Forms for actions, but that is a different topic.

@unit9a

unit9a commented Jan 3, 2025

(ps: lets also not lose sight of what @benfrancis has in mind for the websocket proposal. This issue is merely my own 2c's based on insights gained implementing the http, sse and proposed wss bindings)

@hspaay & @benfrancis, the use of WebRTC data channels should be the same as or similar to websockets per: https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel#instance_properties

Values are the same as allowed on the WebSocket.binaryType

regardless, i will prioritize using webSockets.

unit9a added a commit to unit9a/web-thing-protocol that referenced this issue Jan 3, 2025
@unit9a

unit9a commented Jan 3, 2025

@hspaay my idea so far.
#34 (comment)

transport layer concept: equivalent json rpc

assumption: this proposed web-thing-protocol (WoTP) is an application axiomatization of the domain axiomatization: The WoT Thing Description

goals

  1. stay as close as possible to the semantics & ontology of The WoT Thing Description: Hypermedia Controls Vocabulary Definitions
  2. maintain compatibility or easy conversion with json-rpc
  3. expect to be used with something like protobuf to convert between json and binary
  4. transmitted via websocket as an ArrayBuffer binary payload

my other assumptions/ideas

conversion of json-rpc member name

| JSON-RPC Request object members | proposed WoTP ontology names |
|---|---|
| jsonrpc | wotp |
| method | operation |

syntax

"-->" = Request
"<--" = Response

Request messages fields

| Request message fields | JSON-RPC Request object members | proposed WoTP ontology names |
|---|---|---|
| type | indicated by member names | |
| operation | method | op |
| thingID | params.thingID | thingID |
| name | params.name | affID (affordanceID) |
| input | params.input | input |
| correlationID | params.correlationID | corrID |
| senderID | params.senderID | senderID |
| messageID | id | msgID |

example 1:

--> { 
        "jsonrpc": "2.0", 
        "method": "string", 
        "params": {
            "thingID": "string", 
            "name": "string", 
            "input": "any", 
            "correlationID": "string", 
            "senderID": "string"
        },
        "id": "string"
    }

======= with wotp parameter names: ========
--> { 
        "wotp": "<version>", 
        "op": "string", 
        "params": {
            "thingID": "string", 
            "affID": "string", 
            "input": "any", 
            "corrID": "string", 
            "senderID": "string", 
            "msgID": "string"
        },
        "id": "string"
    }
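
The mapping table above can be sketched as a pair of conversion functions. This is an illustration under assumptions: the function names are hypothetical, only request messages are covered, and it uses the plain JSON-RPC member names rather than the shortened WoTP names.

```python
def to_jsonrpc(request: dict) -> dict:
    """Map a request envelope to a JSON-RPC 2.0 request object."""
    # operation -> method, messageID -> id, everything else goes into params.
    params = {k: v for k, v in request.items()
              if k not in ("type", "operation", "messageID")}
    rpc = {"jsonrpc": "2.0", "method": request["operation"], "params": params}
    if "messageID" in request:
        rpc["id"] = request["messageID"]
    return rpc


def from_jsonrpc(rpc: dict) -> dict:
    """Reverse mapping; the message type is implied by the member names."""
    request = {"type": "request", "operation": rpc["method"],
               **rpc.get("params", {})}
    if "id" in rpc:
        request["messageID"] = rpc["id"]
    return request


original = {"type": "request", "operation": "readproperty",
            "thingID": "urn:example:lamp1",   # hypothetical Thing ID
            "name": "brightness", "correlationID": "c-42", "messageID": "m-1"}
encoded = to_jsonrpc(original)
```

Note that JSON-RPC treats a request object without an "id" member as a notification, so a receiver cannot tell a request envelope without messageID from a notification by shape alone.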

Notification messages fields

same as a Request except "id" is moved to "params" as "messageID"

{
    "jsonrpc": "2.0",
    "method": "string", 
    "params": {
        "thingID": "string",
        ...
+   "msgID": "string"
    },
-   "id": "string"
}

====== becomes: ====== 
{ 
    "jsonrpc": "2.0",
    "method": "string", 
    "params": {
        "thingID": "string",
        ...
        "messageID": "string"
    }
}

======= using wotp member names: ========
{ 
        "wotp": "<version>",
        "op": "string", 
        "params": {
            "thingID": "string",
            ...
            "msgID": "string"
        }
}

Response messages fields

| Response message fields | JSON-RPC Response object members | proposed WoTP ontology names |
|---|---|---|
| type | indicated by member names | |
| status | result.status | status |
| thingID | result.thingID | thingID |
| name | result.name | affID (affordanceID) |
| output | result.output | results |
| error | result.error | errors |
| received | result.received | rxTs |
| updated | result.updated | udTs |
| correlationID | result.correlationID | corrID |
| messageID | id | msgID |

example 2:

<-- { 
        "jsonrpc": "2.0", 
        "method": "string", 
        "result": {
            "correlationID": "string",
            "error": "string",  
            "name": "string", 
            "output": "any",  
            "received": "string", 
            "status": "string", 
            "thingID": "string", 
            "updated": "string"
        },
        "id": "string"
    }

======= using wotp member names: ========
<-- { 
        "wotp": "<version>", 
        "op": "string",
        "result": {
            "corrID": "string",
            "error": "string",  
            "affID": "string", 
            "output": "any",  
            "rxTs": "string", 
            "status": "string", 
            "thingID": "string", 
            "udTs": "string"
        },
        "id": "string"
    }

@unit9a

unit9a commented Jan 4, 2025

@hspaay when using webSockets,

  1. is there a difference between an individual websocket connection ID and the "senderID" Request message field?
  2. can they even be the same thing?

@RobWin, I will start with WebSocketStream but will eventually support rsocket.io as well.

@unit9a

unit9a commented Jan 4, 2025

@benfrancis
I am chewing on the idea of replacing the "jsonrpc" version identifier member with a JSON-LD style "@context" member, used in a similar way to how "@context" is used in WoT Thing Description example 1

sequenceDiagram
    autonumber
    participant wot1 as thing1
    participant wot2 as thing2

    critical Negotiate "@context for rpc session"
        wot1->> wot2: { wotp:"0.1",<br>params = <@context value in example 1>}
        alt success
            wot2->> wot1: { wotp:"0.1", result = {wotpSessionID: <short uuid>, contextInfo:{<selected context>}}}

            Note right of wot1: both WoT entities map the <wotpSessionID> to @context value for the session

            par thing1 messages 
                wot1->> wot2: { @wotp: "<wotpSessionID>"... }
            and thing2 messages
                wot2->> wot1: { @wotp: "<wotpSessionID>"... }
            end

        else fail
            wot2->> wot1: { wotp:"0.1", result :{error: "no supported rpc context...", output: ...}}
        end
    end

@hspaay
Collaborator Author

hspaay commented Jan 4, 2025

@unit9a

  • is there a difference between an individual websocket connection ID and the "senderID" Request message field?
  • can they even be the same thing?

Yes there is a difference between a connectionID and senderID:

  • The purpose of the senderID is for use in authorization, logging or other client-specific behavior during processing of the message. It identifies the connected client and is intended to be set by the transport protocol, which has the authentication credentials for each connection.
  • A connection ID however is unique per connection. Connection-ID is used internally in hiveot as the reply-to address, (linked to by the correlationID), to ensure the response goes to the correct connection. It is intended for routing the response in a hub or gateway. It is not exposed. (no use-case?)
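
A sketch of how a hub might keep the two apart. This is not HiveOT's code: the class, method names and sample IDs are hypothetical. The senderID is stamped on the forwarded request from the authenticated connection, while the connection ID is kept internally as the reply-to address, looked up by correlationID.

```python
from typing import Dict, Optional


class HubRouter:
    """Routes responses back to the connection that sent the request."""

    def __init__(self) -> None:
        self._reply_to: Dict[str, str] = {}   # correlationID -> connection ID

    def on_request(self, connection_id: str, authenticated_client_id: str,
                   request: dict) -> dict:
        # senderID comes from the transport's authentication, never from the client.
        forwarded = dict(request, senderID=authenticated_client_id)
        corr_id = request.get("correlationID")
        if corr_id:
            self._reply_to[corr_id] = connection_id   # internal, not exposed
        return forwarded

    def on_response(self, response: dict) -> Optional[str]:
        """Return the connection to deliver the response to, if known."""
        corr_id = response.get("correlationID")
        if response.get("status") in ("completed", "failed"):
            return self._reply_to.pop(corr_id, None)  # final: forget the route
        return self._reply_to.get(corr_id)


router = HubRouter()
forwarded = router.on_request(
    "conn-7", "alice",
    {"type": "request", "operation": "invokeaction",
     "senderID": "spoofed", "correlationID": "c-1"})
```

The same client (senderID "alice") can hold several connections, each with its own connection ID, which is why the two cannot be merged.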

Hope that explains it.

@hspaay
Collaborator Author

hspaay commented Jan 4, 2025

@unit9a I'm unclear on what WoTP is and what problem it is intended to solve.
Is it an alternative format from the initially proposed messages or is there another purpose behind it? Is it intended for interoperability with another existing format?

For example, what is the thinking behind using "corrID" instead of "correlationID".

That aside I don't see a problem with the mapping you describe other than it looks like doing the same thing with different names. Doing the same thing is good as it makes mapping easy btw.

@unit9a

unit9a commented Jan 4, 2025

@hspaay

I'm unclear on what WoTP is and what problem it is intended to solve

"WoTP" was meant as an acronym for this "Web of Things Protocol" draft. I just wanted something short to type.



For example, what is the thinking behind using "corrID" instead of "correlationID".

Good catch, they are supposed to mean the same thing but i did not make that clear.

So I guess clearer, updated mappings are:

| Request & Response message fields | JSON-RPC object members | proposed WoTP ontology names |
|---|---|---|
| name | params.name | affordanceID or affID |
| correlationID | params.correlationID | correlationID or corrID |


Doing the same thing is good as it makes mapping easy btw.

yeah, that is what I wanted to convey

@unit9a

unit9a commented Jan 4, 2025

follow up, #34 (comment)

for some simplification, I realized that from the perspective of json-LD: "@context" functionally does the same thing as "type", "json-rpc" and "wotp".

| Response message fields | JSON-RPC Response object members | proposed WoTP ontology names |
|---|---|---|
| type | indicated by member names | indicated by member names |

so the purpose of Negotiating "@context for rpc session" just amounts to establishing a short string identifier for the schema of response, request, and notifications to be used for subsequent messages.

So rather than deviate from the json-rpc spec with all the "WoTP" stuff I proposed, taking a response message as an example,

<-- { 
        "wotp": "<version>", 
        "op": "string",
        "result": {
            "corrID": "string",
            "error": "string",  
            "affID": "string", 
            "output": "any",  
            "rxTs": "string", 
            "status": "string", 
            "thingID": "string", 
            "udTs": "string"
        },
        "id": "string"
    }

could be simplified down to:

<-- { 
        "jsonrpc": "2.0", 
        "result": {
            "@context": "ID assigned to the _negotiated_ schema and extension info", 
            "op": "string - request operation this is a response to",
            "corrID": "string - identifies the request this is a response to.",
            "error": "string - Error title if status is 'failed'",  
            "affID": "string- Name of the affordance ", 
            "output": "any",  
            "rxTs": "string - received Timestamp", 
            "status": "string - Status of the request processing ['pending', 'running', 'completed' or 'failed']", 
            "thingID": "string - ID of the thing the request applies to", 
            "udTs": "string - updated Timestamp "
        },
        "id": "messageID string"
    }

@hspaay do you think this makes it even more clear the same thing is being done?

Also, what do you think of "@context" being used to establish the schema of all possible WoT interactions for the remainder of the connection session between consumers, Thing agents, hubs, and gateways? Kind of like a JSON-LD version of an OpenAPI spec.

@unit9a

unit9a commented Jan 6, 2025

more simplification of my json-rpc transport layer
Request
rpc spec says that it can be extended by prefixing method (or operation) names with "rpc."

8 Extensions
Method names that begin with rpc. are reserved for system extensions, and MUST NOT be used for anything else. Each system extension is defined in a related specification. All system extensions are OPTIONAL.

so using rpc extensions, the Request method/operation names gain a prefix of: "rpc."
but i think a prefix like "rpc.wot." or "rpc.wotp." is more clear.

-->{ 
        "jsonrpc": "2.0",
        "method":  "<string - operation name>", 
        "params": {
            "@context": "ID assigned to the _negotiated_ schema and extension info", 
            "thingID": "string - ID of the thing the request applies to", 
            "affID": "string - Name of the affordance", 
            "input": "any", 
            "correlationID": "string - Unique identifier of the request...", 
            "senderID": "string - Authenticated sender of the request."
        },
        "id": "string - messageID"
}

becomes:

--> { 
        "jsonrpc": "2.0", 
        "method": "rpc.wot.<string - operation name>", 
        "params": {
            "@context": "ID assigned to the _negotiated_ schema and extension info", 
            "thingID": "string - ID of the thing the request applies to", 
            "affID": "string - Name of the affordance", 
            "input": "any", 
            "correlationID": "string - Unique identifier of the request...", 
            "senderID": "string - Authenticated sender of the request."
        },
        "id": "string - messageID"
    }

@RobWin
Collaborator

RobWin commented Jan 6, 2025

Somehow I have the feeling that this issue was hijacked :)
Please keep JSON RPC out of the discussions.
The web thing protocol and then proposal is first of all a protocol for Websockets.

@VigneshVSV

VigneshVSV commented Jan 6, 2025

I like this version much better than the previous/existing strawman proposal of webthing protocol.

There must be a separation between the "type" and the "operation".

@VigneshVSV

I would appreciate it if somebody here also started accounting for pre-encoded binary payloads in the message.

In JSON, I heard it's easier to encode base64 strings, but I think the use of a generic binary payload in a broader sense would be very useful.

This is similar to extracting the buffer value from InteractionOutput in node-wot, without deserializing with an existing known content type of the TD.
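
For the JSON case, one possible shape is sketched below: the pre-encoded bytes are carried as a base64 string next to their content type, and the receiver gets the original buffer back without deserializing it. The wrapper keys (`contentType`, `encoding`, `value`) and function names are assumptions for illustration; the proposal does not define them.

```python
import base64
import json


def wrap_binary(payload: bytes, content_type: str) -> dict:
    """Wrap pre-encoded bytes so they can travel inside a JSON envelope."""
    return {
        "contentType": content_type,     # hypothetical key
        "encoding": "base64",            # hypothetical key
        "value": base64.b64encode(payload).decode("ascii"),
    }


def unwrap_binary(wrapped: dict) -> bytes:
    """Return the original bytes without interpreting them."""
    if wrapped.get("encoding") != "base64":
        raise ValueError("unsupported encoding")
    return base64.b64decode(wrapped["value"])


raw = bytes([0x00, 0xFF, 0x10, 0x80])    # e.g. an already-encoded sensor frame
response = {"type": "response", "correlationID": "c-9", "status": "completed",
            "output": wrap_binary(raw, "application/octet-stream")}
wire = json.dumps(response)
```

Base64 grows the payload by roughly a third, which is one reason a native binary field on binary-capable transports could still be worth having.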

@unit9a

unit9a commented Jan 6, 2025

Somehow I have the feeling that this issue was hijacked :) Please keep JSON RPC out of the discussions. The web thing protocol and then proposal is first of all a protocol for Websockets.

@RobWin how? This is just a transport concept using JSON-RPC, in line with:

On the transport layer below this, the more the merrier. Encodings like json-rpc, bson, protobuf, capn'proto with matching transports like websockets, gRPC, mqtt, and so on, are all good to support. This is at least my thinking with this proposal.
also



The web thing protocol and then proposal is first of all a protocol for Websockets.

@RobWin are you talking about inserting the fields of @hspaay's web thing protocol proposal into the WebSocket message event class and life cycle? If so, then I apologize for not understanding and for raising the JSON-RPC stuff here.

My JSON-RPC idea is only meant to be a transport layer for this web thing protocol proposal's data, one that is encoded as the raw binary payload of the WebSocket message while still being compatible with current JSON-RPC use.

@RobWin & @hspaay should I move the JSON-RPC/JSON-LD transport idea to its own issue?


Also, what do you all think of a pure JSON-LD representation of this web thing protocol proposal's request, response, and notification messages?

@VigneshVSV

In JSON, I heard it's easier to encode base64 strings, but I think the use of a generic binary payload in a broader sense would be very useful.

Agreed! @RobWin pointed me to https://rsocket.io/ and its use of binary payloads in WebSockets. This approach is what I want to pass my JSON-RPC/JSON-LD objects into. Also, I came across this in a Stack Overflow comment:

however javascript environments (like v8/node.js) are heavily optimized for JSON handling (because it's a subset of javascript).

This tracks with my experience, so I don't think binary offers any significant advantage outside of the constraints of embedded IoT systems. But that still makes it a useful option to have.



There must be a separation between the "type" and the "operation".

I think @hspaay already addressed this with:

| name | data type | description | required |
|------|-----------|-------------|----------|
| type | string | "request". Identifies the message as a request message | mandatory |
| operation | string | Describes the request to perform | mandatory |

Or am I not understanding what you mean? In JSON-RPC the separation is done by the object's structure.

@RobWin
Collaborator

RobWin commented Jan 6, 2025

@unit9a I think that the JSON-RPC discussion should be moved out of this issue. Perhaps it could continue in the WoT Discord channel until someone expresses interest in creating a dedicated Sub-Protocol Community Group.

The scope of this Community Group is outlined in the Web Thing Protocol Charter and includes:

  • Definition of a WebSocket sub-protocol for the Web of Things.
  • Definition of an HTTP sub-protocol for the Web of Things.

Out of scope are:

  • Protocol bindings/sub-protocols for non-web or non-internet protocols.

I believe we should maintain a narrow focus within this Community Group's scope and discussions. While I also see the appeal of reusing the Web Thing Protocol as a sub-protocol for AMQP or MQTT bindings, this is beyond the current scope of this community group.

@benfrancis It might be worth clarifying in the charter document what is meant by "internet protocols" to set clearer boundaries.

As for the HTTP Profile, it is detailed in the HTTP Basic Profile. It follows a RESTful design and does not utilize JSON-RPC. In my understanding, JSON-RPC would constitute a distinct sub-protocol (or profile?) within the HTTP protocol binding. That said, we should aim to limit the number of profiles or sub-protocols for HTTP to preserve interoperability.

Looking forward to further discussions on this topic in the appropriate channels!

@hspaay
Collaborator Author

hspaay commented Jan 6, 2025

This is a great discussion. It highlights there are concerns on messaging, transport and encoding levels. Thank you all for your insights.

However, for the sake of keeping this specific discussion on track it is probably good to split the transport and encoding from the messaging issue as @RobWin pointed out.

@unit9a. This issue was originally intended as part of feedback to the websocket strawman proposal by @benfrancis . It offers a simpler alternative to the current message formats that are in the proposal.

The idea is that it can evolve into an application-level protocol with various transports and encodings, as your examples show. That does require, though, that the message format is adopted as a stand-alone application protocol. This is definitely out of scope for the strawman proposal, which is a subprotocol of the WoT HTTP binding. I don't want to lose this discussion, as it is valuable, but it should probably move to another issue, outside of the strawman proposal.

@benfrancis is an application protocol with transport protocols and encodings like discussed here in-scope for the web thing protocol discussion group? If so, should it be considered a separate proposal? What is the process for this?

@benfrancis
Member

On the issue of generalising the message payload format across message types I can certainly see the argument. As a Software Engineer I understand the instinct to abstract away similar features into a common parent class/schema, which is something we are literally trained to do (e.g. through object oriented programming).

However, I want to state that the Web Thing Protocol is not intended as a general purpose request/response or publish/subscribe protocol. It is designed for the very specific purpose of communicating with Things on the Web of Things, using the set of WoT operations and WoT native terminology.

It was therefore a conscious design decision to directly map WoT operations onto message types and not to try to design another general purpose protocol. If that's what someone is looking for then there is a very long list of existing WebSocket sub-protocols which try to achieve just that https://www.iana.org/assignments/websocket/websocket.xhtml

I have more to say on this topic and will work my way through all of the responses because I think this is an interesting discussion with some specific points I would like to respond to, but I wanted to reply to your top level post first of all.

To some specific points in your initial post:

@hspaay wrote:

This is proposal 2 draft for standardizing the message envelopes for all protocol bindings

If that is what your proposal is about then I would suggest this is the wrong place for it, it should be proposed in https://github.com/w3c/wot-binding-templates/. However, I don't personally think this payload format can or should be adopted by all protocol bindings. Many protocols have existing fields for many of these members, which is the reason that the vocabularies in protocol binding templates exist - to map WoT concepts onto fields in messages in existing protocols.

WoT protocol bindings all solve the same problem. Send requests, receive responses and send notifications.

I think this is the wrong abstraction. The Web of Things models interactions in terms of "properties", "actions" and "events". Some protocols use a request/response pattern, some protocols use a pub/sub pattern, and some use a combination of both or something entirely different. Protocol bindings map WoT operations onto messages in a given protocol.

Separating the application protocol from the transport enables defining a single application protocol that can be used on any transport.

I think you're going to have a very hard time pushing this "application protocol" onto all the other protocol bindings. Whilst you could argue the payload format makes sense for a raw TCP socket like WebSockets, it doesn't make sense for most of the other protocols used by WoT because it duplicates a lot of the features of existing protocols. For some protocols it would add a redundant extra wrapper around messages and for others it wouldn't be possible to implement at all.

The WoT protocol bindings does not describe how Thing agents connect as a client to a hub or gateway. Instead the assumption is that all Things run a server.

This is not strictly true. The HTTP Webhook Profile assumes that Things are clients and Consumers are servers, for example. The terms Consumer and Producer were used in the WoT specifications instead of client and server for this very reason. However, it is true that Thing Descriptions describe how to interact with a Thing using Forms which require URLs for endpoints. This does make it very tricky to describe Things which are clients rather than servers.

This is too limited of a view. Hiveot is a Hub where Thing agents are clients to the Hub just like consumers. WoT does not describe the interaction for these agents.

This is really a wider issue for the Web of Things (the Thing Description specification in particular), rather than the Web Thing Protocol. Defining a Web Thing Protocol which can flip the client/server roles does not solve the fundamental problem of how to describe devices in a WoT Thing Description.

readevent, readallevents (of a Thing) [HiveOT extension]

See w3c/wot-thing-description#892

FYI There is some work under the new WoT Working Group charter around querying time-series data.

readtd, readalltds (of a directory or thing) [HiveOT extension]

An API for managing a collection of Web Things is explicitly out of scope for the Web Thing Protocol and is already covered by the Directory Service API in the WoT Discovery specification.

@hspaay
Collaborator Author

hspaay commented Jan 6, 2025

Thank you for a very well thought out response @benfrancis. While I might not share the same perspective about this as you do, it is clear that you know exactly what you want to achieve. It looks like this issue has no chance of being adopted in the Strawman proposal, so I won't pursue it here any further.

This issue did provide very useful and insightful feedback for which I'm grateful to all commenters. It would be nice IMHO to keep it open as a useful discussion until it is addressed elsewhere.

I will look into defining a separate proposal elsewhere for a WoT application protocol with this message format and the mentioned multiple underlying transports. Let others decide whether they find it useful to them. I'll post the link once it becomes clear where this should go instead. Maybe @egekorkan can point in the right direction.

In the meantime @benfrancis I'll continue with full support for the strawman proposal as it evolves and attempt to provide an implementation in hiveot.

@benfrancis
Member

benfrancis commented Jan 7, 2025

@RobWin wrote:

suggest renaming requestID to correlationID.

This would enhance functionality, especially for subscriptions to property changes or events.

Just curious, why would changing the name enhance the functionality? I think we have a general consensus so far that some kind of requestID/correlationID would be a good idea, but I do have some open questions about how that should work when subscribing to events and observing properties.

See #31

For example a stream could be implemented with a reactive programming library like ReactiveX. And the websocket client does not return a single stream of notifications, but could separate notifications into multiple individual streams.
A correlationID (acting as a subscriptionId) can help dispatch notifications to the appropriate stream.

I don't have experience with ReactiveX and I'm not exactly sure what you mean by multiple individual streams. Is your intention that a Consumer should be able to have multiple subscriptions to the same event at any one time, with each subscription identified by a unique ID and individually unsubscribable? If so what benefit would that bring which justifies the added implementation complexity of managing multiple subscriptions per Consumer?

See #29

@hspaay wrote:

I just completed implementing this proposal in hiveot as an application layer. The http/wss/mqtt transports map from the three message types to the specified message types for that transport.

What do you mean by "mapping" to a "transport"? How do you map a notification to SSE for example? Are you sending the whole message envelope defined above inside the data field of an SSE event for example? Or are you somehow using the event and id fields as well? If it's the former then why aren't you using the built-in features of the protocol? If it's the latter that isn't so much a "transport" as a "binding" and no different to the HTTP SSE Profile, it's just an internal implementation detail.
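To make the two options in this question concrete, here is a minimal Python sketch of both ways a notification could be written as a Server-Sent Event. The `event`, `id` and `data` field names are standard SSE; the envelope member names (`name`, `messageID`, `data`) and the sample values are assumptions for illustration, not taken from either proposal.

```python
import json

def sse_whole_envelope(envelope: dict) -> str:
    """Option 1: serialise the complete envelope into the SSE data field."""
    return "data: " + json.dumps(envelope) + "\n\n"

def sse_native_fields(envelope: dict) -> str:
    """Option 2: map envelope members onto SSE's own event and id fields."""
    return (
        "event: " + envelope["name"] + "\n"
        + "id: " + envelope["messageID"] + "\n"
        + "data: " + json.dumps(envelope["data"]) + "\n\n"
    )

# Hypothetical notification; member names are assumptions for illustration.
notification = {
    "type": "notification",
    "operation": "subscribeevent",
    "thingID": "urn:example:lamp1",
    "name": "overheated",
    "data": {"temperature": 90},
    "messageID": "msg-7",
}
print(sse_whole_envelope(notification), end="")
print(sse_native_fields(notification), end="")
```

In the first variant the SSE layer is a plain pipe and the consumer must parse the envelope to learn anything about the message. In the second, the protocol's built-in fields carry part of the envelope, which is closer to a binding than a transport.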

Using 'operation' in the messages also works quite well to further process the message at the application level. There is no tension (mapping, translation) between protocol and application, and no need to define separate message types to deal with responses.

I've mentioned before that I started out with the approach of using operation names to differentiate message types but found the need for other message types (like propertyReading, propertyReadings, actionStatus, event, ping and pong) which don't have a 1:1 mapping with operations. I can see that having three message types of "request", "response" and "notification" with a separate operation field is quite neat, but it does assume exactly one request, response and notification message type for each WoT operation and no other message types.
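A minimal Python sketch of the "three message types plus an operation field" shape may help show where the assumption bites. The `Message` class, its `data` member and the `route` function are hypothetical names, not part of either proposal.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Message:
    type: str                           # "request", "response" or "notification"
    operation: str                      # WoT operation name, e.g. "readproperty"
    thingID: str
    name: str = ""                      # affordance name
    data: Any = None                    # hypothetical payload member
    correlationID: Optional[str] = None

def route(msg: Message) -> str:
    """Two-level dispatch: first on message type, then on operation."""
    if msg.type == "request":
        return f"handle request {msg.operation} for {msg.thingID}/{msg.name}"
    if msg.type == "response":
        return f"resolve pending call {msg.correlationID} ({msg.operation})"
    if msg.type == "notification":
        return f"deliver {msg.operation} update for {msg.thingID}/{msg.name}"
    raise ValueError(f"unknown message type: {msg.type}")

print(route(Message("request", "readproperty", "urn:example:lamp1", "on")))
```

Messages such as ping and pong, which correspond to no WoT operation, would need either a placeholder operation value or a fourth message type in a design like this.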

This works well with intermediaries like a hub. Prior there was a struggle to mentally keep track of who is sending/receiving what and in reply to what. That problem has pretty much disappeared. A request goes from consumer to Thing, a response the other way and notifications are send and forget by the Thing.

Would a requestID in all messages solve this problem?

Not having to do the mapping of messages types saves a lot of code.

Surely you still have to map operations onto messages and back, the format of the messages is just different...

Forms are also simpler. No need to have a form entry for each operation.

Why did you need multiple form entries for each operation before? Have you seen the example Thing Description in the strawman proposal which has one form per interaction affordance? That is actually one of the benefits of the WebSocket sub-protocol compared with a declarative HTTP binding for example.

I haven't run into a need to define forms with the affordances either.

Are you aware that in a Thing Description the forms member of an InteractionAffordance is mandatory and is not allowed to be empty?

@RobWin wrote:

But from a protocol level, I still miss acknowledgements.

See #29

@unit9a wrote:

do you even think json rpc is relevant at all?
Also, what do you all think of a pure JSON-LD representation of this web thing protocol proposal's request, response, and notification messages?

I can definitely see the similarity with @hspaay's proposal (except that in JSON-RPC notifications usually work in the opposite direction). I can certainly imagine something that looks like the WoT Scripting API implemented over JSON-RPC, but I'm not personally interested in that as a design for similar reasons to those given above. I would rather not abstract everything as a method.

@VigneshVSV wrote:

There must be a separation between the "type" and the "operation".

Why?

I would appreciate it if somebody here also started accounting for pre-encoded binary payloads in the message.
In JSON, I heard it's easier to encode base64 strings, but I think the use of a generic binary payload in a broader sense would be very useful.

This is one of the limitations of JSON, and yes base64 encoding values as strings is a common solution to that problem.
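For readers unfamiliar with the base64 workaround, here is a minimal Python sketch of carrying a binary value inside a JSON message. The message member names and sample bytes are hypothetical.

```python
import base64
import json

# Arbitrary binary payload that cannot be placed in JSON directly.
raw = bytes([0x00, 0xFF, 0x10, 0x80])

# Sender: encode the bytes as an ASCII base64 string before serialising.
encoded = base64.b64encode(raw).decode("ascii")
message = {
    "type": "response",          # hypothetical envelope members
    "operation": "readproperty",
    "name": "snapshot",
    "output": encoded,
}
wire = json.dumps(message)

# Receiver: parse the JSON, then decode the string back into bytes.
decoded = base64.b64decode(json.loads(wire)["output"])
print(decoded == raw)
```

The cost is size: base64 emits four characters for every three bytes of input, so payloads grow by roughly a third.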

There are alternatives like BSON, CBOR and Protocol Buffers which don't have this problem. However:

  1. The current requirements document specifies that all messages in the WebSocket sub-protocol should be serialised in JSON
  2. Because WoT Thing Descriptions use JSON Schema for data schemas, as far as I know the only way to describe binary payloads is as a string anyway

I'm not inclined to switch away from JSON for this reason alone, since its ubiquitous support brings a lot of benefits and it being the default serialisation assumed in Thing Descriptions makes it the obvious choice.

That said, the charter does allow for the "Evaluation of other potential Web of Things sub-protocols (e.g. for CoAP)". I can imagine a future variant of the Web Thing Protocol using CoAP + CBOR for constrained devices that would struggle with WebSockets + JSON, but that is not the current focus.

@RobWin wrote:

@benfrancis It might be worth clarifying in the charter document what is meant by "internet protocols" to set clearer boundaries.

I think it's clear that an "internet protocol" is anything that works over IP. Much trickier to define is a "web protocol". A broad definition would be anything with a URI scheme registered with IANA, but I tend to have a much stricter view and haven't yet come across anything other than HTTP and CoAP that I would describe as a web protocol. Even WebSockets is stretching the definition of "web" for me, but that's another story.

JSON-RPC can be used as a sub-protocol of both HTTP and WebSockets so arguably does fall within that scope, it's just not something I'm personally interested in. It could make an interesting protocol binding template though.

That said, we should aim to limit the number of profiles or sub-protocols for HTTP to preserve interoperability.

I strongly agree with this.

@hspaay wrote:

for the sake of keeping this specific discussion on track it is probably good to split the transport and encoding from the messaging issue as @RobWin pointed out

👍

@benfrancis
Member

benfrancis commented Jan 7, 2025

@hspaay wrote:

@benfrancis is an application protocol with transport protocols and encodings like discussed here in-scope for the web thing protocol discussion group?

No. But let me explain why because I think it's important.

The Web of Things is basically designed to be everything-agnostic. Protocol agnostic, serialisation format agnostic and programming language agnostic. If you wanted to you could create a valid "Web Thing" which communicates entirely using animated GIFs over the IRC protocol! Depending on who you listen to this is either its biggest strength or its biggest weakness.

The benefit of this approach is that it theoretically makes it possible to describe any brownfield device using any existing IoT protocol and have it be a valid Web Thing.

The downside of this approach is that any given WoT Consumer can only communicate with a tiny subset of all the possible Web Things in the world. Imagine a 2-dimensional matrix of all the possible protocols and payload formats and the proportion of combinations supported by a given Consumer implementation will be tiny. Now expand this by n dimensions to allow for the other WoT extension points like security mechanisms, discovery mechanisms, link relation types and semantic contexts. Now you have a serious interoperability problem.

The mission of this community group is to "define a common protocol for communicating with connected devices over the web, to enable ad-hoc interoperability on the Web of Things."

The ideal outcome would be a single protocol that all greenfield WoT implementations can opt into using in order to benefit from complete out-of-the-box interoperability.

As soon as you start abstracting things into layers and talk about alternative transport protocols and encodings you have immediately fragmented that landscape again, which is counter to the mission of the group.

The charter takes a slightly more pragmatic view than this by allowing for both an HTTP sub-protocol and WebSocket sub-protocol (because HTTP has some significant limitations for IoT use cases), and for the exploration of other potential WoT sub-protocols "e.g. for CoAP" for use cases where the other two options are simply not possible to use.

Apart from that, the main thing that concerns me about your proposal in this issue is that by generalising into requests, responses and notifications you are basically re-inventing HTTP+SSE (or CoAP+Observe) over WebSockets. The Web Thing Protocol WebSocket sub-protocol is intended for the very specific purpose of communicating with connected devices over the Web [of Things], not a general purpose application protocol for the web, which is why my strawman proposal directly maps WoT concepts onto messages.

Genuine suggestion: Rather than try to re-invent HTTP+SSE over WebSockets, how about you (or we) take the underlying features we need to the IETF task force that works on the HTTP protocol and propose them for the next version of HTTP?

If the next version of HTTP supported persistent connections, multiple requests and responses over the same socket and messages pushed from the server to the client then we arguably wouldn't need a WebSocket subprotocol. We could just define a default HTTP/4 (?) protocol binding instead. I would actually prefer that to having separate HTTP and WebSocket sub-protocols.

HTTP/2 has a push feature, but it doesn't really support our use cases, and HTTP/3 supports multiplexing over QUIC. But as far as I can tell not even HTTP/3 does everything that we need.

In the meantime, we have JSON and WebSockets.

@RobWin
Collaborator

RobWin commented Jan 7, 2025

Just curious, why would changing the name enhance the functionality? I think we have a general consensus so far that some kind of requestID/correlationID would be a good idea

I think the term correlationId is more fitting than requestId because it better captures the broader purpose of the identifier. While requestId implies a straightforward, one-to-one request-response interaction, correlationId highlights its role in linking related messages across various communication patterns.

This distinction is particularly relevant for scenarios like subscriptions and property observations, where a single request may lead to multiple or continuous responses over time. For instance, event messages in a subscription or propertyReading messages in an observation context are not one-time responses but part of an ongoing stream, making correlationId a more accurate descriptor.

Moreover, the name correlationId is versatile and can support diverse communication patterns:

  • Request/Reply: A single request corresponds to a single response.
  • Client-Streaming: Multiple request messages result in a single response. All messages share the same correlationId, including the response. (Currently not needed in WoT, but maybe in the future?)
  • Server-Streaming: A single request leads to a continuous stream of responses. All response messages in the stream share the same correlationId. (Example event subscription or property observation)

But this implies that "request" messages would also have a dedicated correlationId field, distinguishing it from the messageId, which is used to uniquely identify individual messages. The correlationId serves to group related messages together, providing a clear link across different messages, while the messageId identifies each individual message within that group.

In AMQP it is called correlation-id.
In MQTT 5 request-response it is called Correlation Data.

Is your intention that a Consumer should be able to have multiple subscriptions to the same event at any one time, with each subscription identified by a unique ID and individually unsubscribable?

No, my intention is that a Consumer can have subscriptions to multiple events (or observations to multiple properties), each subscription resulting in a dedicated (reactive) event stream that can be individually canceled (unsubscribed from).
Upon receiving an event or property change message, the consumer code dispatches it to the appropriate stream. Using the property or event name alone as a unique key isn’t sufficient for managing these streams. For example, imagine a Consumer subscribes to an event or property, then unsubscribes and subscribes again rapidly. If the event or property name is the only identifier, the system can't clearly differentiate between the initial and subsequent subscriptions, leading to race conditions where an event or property message might be dispatched to the wrong (cancelled) stream. Without a correlationId in messages the code is a little more difficult to be implemented in a thread-safe manner.
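The race described above can be sketched in a few lines of Python. This is a minimal illustration, not a proposed implementation: the `SubscriptionRegistry` class, the ID format and the notification member names are all hypothetical.

```python
import threading

class SubscriptionRegistry:
    """Routes notifications to per-subscription streams by correlationID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._streams = {}   # correlationID -> list of received payloads
        self._counter = 0

    def subscribe(self) -> str:
        """Create a new stream and return its unique correlationID."""
        with self._lock:
            self._counter += 1
            cid = f"sub-{self._counter}"
            self._streams[cid] = []
            return cid

    def unsubscribe(self, cid: str) -> None:
        with self._lock:
            self._streams.pop(cid, None)

    def dispatch(self, notification: dict) -> bool:
        """Deliver to the matching stream; drop messages for cancelled ones."""
        with self._lock:
            stream = self._streams.get(notification["correlationID"])
            if stream is None:
                return False
            stream.append(notification["data"])
            return True

registry = SubscriptionRegistry()
first = registry.subscribe()     # subscribe to an event
registry.unsubscribe(first)      # cancel it
second = registry.subscribe()    # quickly subscribe to the same event again

# A late notification from the first subscription is not misrouted.
late = registry.dispatch({"correlationID": first, "data": "stale"})
fresh = registry.dispatch({"correlationID": second, "data": "current"})
print(late, fresh)
```

Keyed by event name alone, the stale message would have landed in the second stream; keyed by correlationID, it is simply dropped.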

@hspaay
Collaborator Author

hspaay commented Jan 8, 2025

@benfrancis

Apart from that, the main thing that concerns me about your proposal in this issue is that by generalising into requests, responses and notifications you are basically re-inventing HTTP+SSE (or CoAP+Observe) over WebSockets.

Thank you for the elaboration, Ben. You've mentioned this before and unfortunately I don't agree that this is true, or that it is even relevant.

The proposed messages are clearly application level as they are tailored for messaging related to things with properties, events and action affordances. This is not a general purpose message format.

The websocket messages defined in the original proposal are very similar to this approach. Instead of defining a message per operation (and sometimes a separate message for its response), this proposal groups messages by their intention (request, response, notification). The payload is also similar but instead of changing the member names to match the affordance (event, property, action) it uses 'name'. In both cases you have to decode the message to determine further processing.

I don't see how this even gets close to reinventing http/sse or coap/observe over websockets. The application should not be driven by the underlying protocol but the use of the protocol should be driven by the need of the application. Top down, not bottom up. It really doesn't matter what features the underlying protocol supports and whether they are all used. What matters is that a WoT application can invoke an action, write a property, get a response and receive notifications. These application level concerns are expressed in the proposed messages.

I hope this kinda helps explain my perspective. I'll continue exploring both points of view.

@benfrancis
Member

@RobWin wrote:

But this implies that "request" messages would also have a dedicated correlationId field, distinguishing it from the messageId, which is used to uniquely identify individual messages.

Just checking, should "request" read "response" here? Is the idea that request messages only contain a messageID but response messages contain both a messageID and a correlationID? Or would request messages contain both as well?

@RobWin
Collaborator

RobWin commented Jan 8, 2025

No, I meant that some messaging protocols allow both a unique message ID and a correlation ID even in request messages. Not only in response messages.

These fields serve distinct purposes:

  • The message ID uniquely identifies each message, ensuring traceability and de-duplication.
  • The correlation ID links messages within a specific context, such as a request-response flow or a group of related messages.

For example, in a "multiple requests, single response" pattern, the correlation ID could be used to group multiple request messages under a shared context, while each request retains its own unique message ID.

In such messaging protocols, the consumer of the request typically copies the correlation ID from the request into the correlation ID field of the response, instead of using the message ID. This ensures the responder maintains the context established by the requester, enabling the requester to match responses to the original requests.
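A minimal Python sketch of that convention, assuming hypothetical member names (`messageID`, `correlationID`, `operation`, `output`) and a hypothetical `make_response` helper:

```python
import uuid

def make_response(request: dict, output) -> dict:
    """Build a response that echoes the request's correlation ID."""
    return {
        "type": "response",
        # Every message gets its own fresh message ID.
        "messageID": str(uuid.uuid4()),
        # The correlation ID is copied from the request, not derived
        # from the request's message ID.
        "correlationID": request["correlationID"],
        "operation": request["operation"],
        "output": output,
    }

request = {
    "type": "request",
    "messageID": str(uuid.uuid4()),
    "correlationID": "corr-42",
    "operation": "readproperty",
}
response = make_response(request, 21.5)
print(response["correlationID"])
```

The requester can then match on `correlationID` alone, while `messageID` remains available for tracing and de-duplication.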

@benfrancis
Member

benfrancis commented Jan 8, 2025

@hspaay wrote:

I don't see how this even gets close to reinventing http/sse or coap/observe over websockets.

OK fair enough, that is probably an exaggeration and you are right that there are lots of application specific members in your proposed message format.

I think what I'm trying to get at is that what I set out to do was to create a protocol with a very direct representation of WoT operations as messages, without needing a binding to map those operations onto the terminology and concepts of an existing protocol.

The generalisation of "requests", "responses" and "notifications" proposed in this issue feels like an additional layer of abstraction above WoT concepts.

In particular, although "response" is a term used in the WoT specifications (e.g. the response and additionalResponses members of Forms), the specifications don't have such a clear concept of a "notification" as distinct from a "response".

E.g. is the actionStatus message from the current strawman proposal a "response" or a "notification"? Some actions could immediately respond with a single completed status response message, but some long running actions could result in multiple actionStatus messages being sent each time the status of the action changes (pending -> running -> completed -> failed). (See the distinction between synchronous and asynchronous actions in the HTTP Basic Profile). If an actionStatus message is in response to a queryAction message it's a response, but if it's in response to an invokeAction message it's more like a notification.
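The ambiguity can be shown with a small Python sketch. The message member names and both helper functions are hypothetical; the classification rule is only the one stated in the paragraph above.

```python
def action_status_messages(correlation_id: str, statuses: list) -> list:
    """Build one actionStatus message per status change, sharing one ID."""
    return [
        {
            "messageType": "actionStatus",
            "correlationID": correlation_id,
            "status": status,
        }
        for status in statuses
    ]

def classify(trigger_operation: str) -> str:
    """Label an actionStatus by the operation that triggered it."""
    return "response" if trigger_operation == "queryaction" else "notification"

# A long-running action reports several status changes for one invocation.
messages = action_status_messages("corr-9", ["pending", "running", "completed"])
print(len(messages), classify("queryaction"), classify("invokeaction"))
```

The same message shape ends up with two different labels depending on what triggered it, which is the difficulty with forcing it into a single "response" or "notification" type.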

Having said all of the above, I do like the idea of having an "operation" member which directly takes an operation name from a Thing Description, and a consistent "name" member across interaction affordances (although that one is a bit less clear). I don't like property values being referred to as "input" and "output" though, because those terms are only used in relation to actions in the WoT specifications.

I will give it some more thought.

@benfrancis
Member

@RobWin wrote:

No, I meant that some messaging protocols allow both a unique message ID and a correlation ID in request messages. These fields serve distinct purposes

Ah OK. I think in all current WoT use cases the messageID from a request could be used as the correlationID in a response, but you're considering another use case we don't currently have where there may be multiple correlated request messages.

I see the rationale, but I am also reluctant to require three different ID members in every message if they are not needed.

@RobWin
Collaborator

RobWin commented Jan 8, 2025

Why 3? You meant the thingId as well?

@benfrancis
Member

Why 3? You meant the thingId as well?

Yep

@hspaay
Collaborator Author

hspaay commented Jan 8, 2025

@benfrancis wrote:

Genuine suggestion: Rather than try to re-invent HTTP+SSE over WebSockets, how about you (or we) take the underlying features we need to the IETF task force that works on the HTTP protocol and propose them for the next version of HTTP?

I'm honored that you think I can be of any help there. My skills at the bit level have deteriorated over time unfortunately and my focus is on hiveot. @RobWin seems to be much better at this though so maybe he would be able to help 😄

@benfrancis wrote:

I see the rationale, but I am also reluctant to require three different ID members in every message if they are not needed.

Yes, I don't see a use-case to always require the messageID, so IMHO it can be optional in requests. @RobWin has done an excellent job describing use-cases in #35. We can probably continue the messageID discussion over there.

OK fair enough, that is probably an exaggeration ..

Chuckle 😄

I think what I'm trying to get at is that what I set out to do was to create a protocol with a very direct representation of WoT operations as messages, without needing a binding to map those operations onto the terminology and concepts of an existing protocol.
The generalisation of "requests", "responses" and "notifications" proposed in this issue feels like an additional layer of abstraction above WoT concepts.

Yes, and the strawman proposal does exactly that. Can't argue on that. This hiveot proposal doesn't deviate that much on the request/notification side other than a minor generalization.
I wasn't sure at first either if this 'generalization' would be a help or hindrance in using it, so I implemented both approaches. (yeah that was my December :)) I was surprised how much easier the implementation became (coding, testing, debugging) using just the 3 messages. Most likely because the post processing after unmarshalling the message has to be done by the application anyways and it didn't make much of a difference there. If anything it simplified debugging. There is just less stuff to code with the 3 messages.

In particular, although "response" is a term used in the WoT specifications (e.g. the response and additionalResponses members of Forms), the specifications don't have such a clear concept of a "notification" as distinct from a "response".
E.g. is the actionStatus message from the current strawman proposal a "response" or a "notification"?

The lack of a clear specification on action acknowledgement is true, but to me that is more of an omission than intent. The same goes for acknowledgement of subscriptions and property write request as discussed elsewhere.

Hiveot needs these to function properly so I have no choice other than to support them somehow. My choice is to incorporate them in the messaging in the hope that WoT 2.0 will take notice and adopt some of these ideas. This doesn't take away from the use of these messages for things that are defined in the TD, so nothing lost. It opens the door to standardization of these improvements in the future. Hence this proposal.

Bottom line is that I can't just say, sorry no ack because the spec doesn't mention it. I'm also looking at the gateway/hub use-case which is quite a bit more demanding than simply consumer->Thing.

Wrt terminology overloading: 'Response' is widely used in many protocols so I didn't think it would be a problem. I'm open to renaming though if it helps make things clearer.

E.g. is the actionStatus message from the current strawman proposal a "response" or a "notification"? Some actions could immediately respond with a single completed status response message, but some long running actions could result in multiple actionStatus messages being sent each time the status of the action changes

In this proposal the action status message is replaced with a response message, which has a status field. There is no need for an actionStatus message or propertyReading message, which can be considered unnecessary artifacts. The combination of message type 'ResponseMessage' and the original operation is all that is needed. I haven't run into a problem with this.

The hiveot http binding implementation (golang) supports both an immediate response from the http request and async responses via SSE. In golang this is easily done using channels. A direct response is posted on the result channel, as are asynchronously received responses. The application waits for a response by listening on the channel and therefore doesn't care how it gets there. The 'status' field indicates if more responses are to be expected. Once the status is 'completed', the last response has been received.
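
A minimal sketch of this pattern, written in Python rather than Go, with `queue.Queue` standing in for a Go channel. The function name `wait_for_completion`, the message members and the set of final status values are assumptions for illustration and are not taken from the hiveot code base.

```python
import queue

# Status values assumed for illustration; the thread mentions
# 'pending', 'running', 'completed' and 'failed'.
FINAL_STATUSES = {"completed", "failed"}


def wait_for_completion(responses, correlation_id, timeout=5.0):
    """Collect responses for one request until a final status arrives.

    The caller does not care whether a response was the direct http
    result or arrived later over SSE; both are put on the same queue.
    """
    received = []
    while True:
        msg = responses.get(timeout=timeout)  # raises queue.Empty on timeout
        if msg.get("correlationID") != correlation_id:
            continue  # belongs to another request; a real client would route it
        received.append(msg)
        if msg.get("status") in FINAL_STATUSES:
            return received


# Usage: a direct result and asynchronous results share the same queue.
q = queue.Queue()
q.put({"messageType": "response", "correlationID": "abc", "status": "running"})
q.put({"messageType": "response", "correlationID": "other", "status": "completed"})
q.put({"messageType": "response", "correlationID": "abc", "status": "completed", "output": 50})
result = wait_for_completion(q, "abc")
```

The caller only blocks on the queue, so the transport that delivered each response stays invisible to the application.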

Clearly this needs much better documentation than I've provided thus far. Sequence diagrams for various use-cases should clarify the simplicity of this approach. This is in the works.

Anyways, I hope this helps explain that this can actually work. If a little demo down the road would help, along with documentation, I'd be happy to present.

In the meantime, I'm giving my full support to the strawman proposal. This hiveot proposal is in my eyes an evolution, which can be included or be a separate protocol. It's all good.

Thanks for the feedback and comments Ben and Rob.

@unit9a

unit9a commented Jan 9, 2025

@hspaay why is there no dedicated Error message type? I am not opposed to how it is now. can you give an example of how a response object with both error, output fields can be handled?

my assumption is that:

  • a response with both fields means the errors should be treated as warnings.
  • a response with only error should be treated as exceptions.

having a dedicated error response is more explicit but might not be as efficient. does it make for a better developer/debugging experience?

just curious and learning.

@RobWin
Collaborator

RobWin commented Jan 9, 2025

I’m not referring to this specific proposal here, but just to inform you, the Web Thing Protocol Strawman Proposal adopts dedicated error messages aligned with the Problem Details for HTTP APIs specification, a widely used standard for HTTP APIs. The WoT HTTP Profile also utilizes this approach, as detailed in Error Responses in WoT HTTP Profile.

One could argue that the RFC is too complex for most use cases.

@hspaay
Collaborator Author

hspaay commented Jan 9, 2025

@unit9a wrote:

@hspaay why is there no dedicated Error message type?

The motivation is that an error is also a response. The response has a status field with predefined values, one of them is 'failed'. If the status field holds failed then the error field holds the error title and the 'output' can hold additional detail.
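
A minimal sketch of how a consumer might handle a response that carries both an error and an output, assuming the member names `status`, `error` and `output` described above. The `ResponseError` class and the `unwrap_response` helper are hypothetical names introduced only for this example.

```python
class ResponseError(Exception):
    """Raised when a response carries status 'failed' (hypothetical helper)."""

    def __init__(self, title, detail=None):
        super().__init__(title)
        self.title = title
        self.detail = detail


def unwrap_response(msg):
    """Return the output of a response, or raise if it reports a failure."""
    if msg.get("status") == "failed":
        # 'error' holds the short title; 'output' may hold additional detail.
        raise ResponseError(msg.get("error", "unknown error"), msg.get("output"))
    return msg.get("output")


# Usage: a successful response yields its output, a failed one raises.
ok = unwrap_response({"status": "completed", "output": True})
try:
    unwrap_response({"status": "failed", "error": "Not Found",
                     "output": "No property found with the name on"})
    failure = None
except ResponseError as err:
    failure = (err.title, err.detail)
```

Under this reading, an error is decided by the status member alone, so the presence of an output next to an error does not downgrade it to a warning.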

I'm on the fence whether the 'error' field is even needed and whether the output field should contain the error if status is failed. The only reason that wasn't done is that the http error handling recommends a separate details field, so there are use cases where more information is useful. It is not the intent to implement RFC9457 btw.

It is of course possible to define a separate error message as the strawman proposal does here. Nothing wrong with that. For the proposed minimalistic application protocol this isn't needed though as explained.

@benfrancis
Member

benfrancis commented Jan 15, 2025

I've been thinking more about splitting out operation as a separate member, with messageTypes of:

  • request
  • response
  • notification
  • error

E.g.

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "c370da58-69ae-4e83-bb5a-ac6cfb2fed54",
  "operation": "readproperty",
  "messageType": "request",
  "name": "on",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}
{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "9e05ebad-1a65-47f8-8bf1-fdb2ab215a7e",
  "operation": "readproperty",
  "messageType": "response",
  "name": "on",
  "value": true,
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}
{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "f2cb66ae-fb3c-4f9f-b8f6-0967170b142d",
  "operation": "readproperty",
  "messageType": "notification",
  "name": "on",
  "value": true,
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}
{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "f6c3d1a5-1b4b-4e64-8876-0a864524ce9d",
  "operation": "readproperty",
  "messageType": "error",
  "title": "Not Found",
  "status": "404",
  "detail": "No property found with the name on",
  "instance": "https://mythingserver.com/errors/426653",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}
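
A minimal sketch of how a receiver could check the envelope of the example messages above before dispatching on them. Which members are mandatory is still under discussion in this thread, so the `REQUIRED` tuple and the `parse_envelope` name are assumptions for illustration only.

```python
import json

MESSAGE_TYPES = {"request", "response", "notification", "error"}
# Assumed minimum set of members; messageID and correlationID are left
# optional here because the thread has not settled on them.
REQUIRED = ("thingID", "operation", "messageType")


def parse_envelope(raw):
    """Parse a JSON message and check the envelope members."""
    msg = json.loads(raw)
    missing = [key for key in REQUIRED if key not in msg]
    if missing:
        raise ValueError(f"missing members: {missing}")
    if msg["messageType"] not in MESSAGE_TYPES:
        raise ValueError(f"unknown messageType: {msg['messageType']}")
    return msg


# Usage with a shortened version of the readproperty request above.
request = parse_envelope(
    '{"thingID": "https://mythingserver.com/things/mylamp1",'
    ' "operation": "readproperty", "messageType": "request", "name": "on"}'
)
```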

The problem is that the distinction between "response" and "notification" is not always clear. For example:

  • Is a property reading a "response" if in response to a readproperty request but a "notification" if in response to an observeproperty request (with an otherwise identical payload)?
  • Is an action status a "response" or a "notification" if there could either be one message (synchronous action) or multiple messages (asynchronous action)?
  • Is the acknowledgement of an event subscription/unsubscription or property observation/unobservation a "response"?
  • Does a cancelaction operation have a response too, and does that follow the same format as an invokeaction response/notification message?

It kind of works but I'm just not sure if there is an established enough WoT-native taxonomy to make this distinction intuitive and future proof. We'd really be inventing the concept of "notifications" as distinct from "responses" which doesn't clearly exist in other WoT specifications. Maybe that's OK given we're also introducing other terms like correlation ID...

What do you think?

@hspaay
Collaborator Author

hspaay commented Jan 15, 2025

Is a property reading a "response" if in response to a readproperty request but a "notification" if in response to an observeproperty request (with an otherwise identical payload)?

Yes that is correct.

  • When sending an observeproperty RequestMessage, the ResponseMessage is simply the acknowledgement with 'observeproperty' as the operation. No payload.
  • A response to a 'readproperty' RequestMessage will indeed return a ResponseMessage containing the 'readproperty' operation and the property value as payload.
  • Updates to observed properties are received as a NotificationMessage with the 'observeproperty' operation, containing the new property value as payload.

There is no ambiguity here. It is clear why each message was sent. Somewhat easier to debug as well since request flow messages don't 'interfere' (hmm, trying to find the right word for this. 'disguise' or 'look like') with the observation flow messages.
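
A minimal sketch of the three cases listed above, assuming the `messageType` and `operation` member names used in the JSON examples in this thread. The `describe` function and its return strings are made up for illustration.

```python
def describe(msg):
    """Explain why a message was sent, using only messageType and operation."""
    kind = (msg["messageType"], msg["operation"])
    if kind == ("response", "observeproperty"):
        return "observation acknowledged"  # no payload expected
    if kind == ("response", "readproperty"):
        return "value read on request"
    if kind == ("notification", "observeproperty"):
        return "observed value changed"
    return "other"


# Usage: the same property value arrives for two different reasons.
read_reply = {"messageType": "response", "operation": "readproperty", "name": "on", "value": True}
update = {"messageType": "notification", "operation": "observeproperty", "name": "on", "value": True}
reasons = [describe(read_reply), describe(update)]
```

No lookup of the correlationID is needed to tell the two flows apart, which is the debugging benefit described above.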

In comparison, a propertyreading message in the strawman proposal can be sent as a response to a request or as the result of an observation. You don't know which one until you use the correlationID and match them up. I found this a bit harder to track in the logs as well.

Is an action status a "response" or a "notification" if there could either be one message (synchronous action) or multiple messages (asynchronous action)?

This is a good example where IMHO the application and transport layers get mixed up. There should be a separation of concerns between transporting the action and performing the action. The status of an action is an application concern and has nothing to do with the transport.

In this proposal the "action status" is the payload of a ResponseMessage in response to a 'queryaction' RequestMessage. The operation in the ResponseMessage is 'queryaction'. It is similar to the readproperty request where the response contains the property value. The payload is an application concern (and defined in the TD).

So in short. ActionStatus is not a message. It is a payload of a response. Yeah I know, the HTTP Basic profile defines it too. This is the incorrect approach IMHO. The action status should be a TD concern, not a protocol binding concern. @egekorkan, my feedback to the WoT group.

Is the acknowledgement of an event subscription/unsubscription or property observation/unobservation a "response"?

Yes, just like any other request, a response will acknowledge the request. There is no payload.

Does a cancelaction operation have a response too, and does that follow the same format as an invokeaction response/notification message?

Yes. The sender of a 'cancelaction' request receives a response, and this is no different from invokeaction as far as the transport is concerned. Actions are application concerns and the protocol should only concern itself with transporting the requests and responses for them.

It kind of works but I'm just not sure if there is an established enough WoT-native taxonomy to make this distinction intuitive and future proof. We'd really be inventing the concept of...

Well this is a bit of a gray area. I feel we are breaking new ground a bit with this to improve on the specification and fill in some gaps. So stick to what is specified and live with the idiosyncrasies that this causes, or take the opportunity to improve, but bend the rules. Sometimes bending the rules will improve them :)

I can't help much in the taxonomy area as this is not my expertise. I appreciate that fitting this in somehow can be a challenge. I just hope that won't lead to throwing out a potentially good idea.

@benfrancis
Member

benfrancis commented Jan 15, 2025

@hspaay wrote:

I appreciate that fitting this in somehow can be a challenge. I just hope that won't lead to throwing out a potentially good idea.

I want to keep exploring the idea of a separate operation member because I do agree it could be cleaner. There are just a few areas where I can't quite figure out how it would work. Actions in particular.

The action status should be a TD concern, not a protocol binding concern.

I think you might be right, but unfortunately w3c/wot-thing-description#2068 suggests this might be under-specified in the Thing Description 1.1 specification. There is no way to describe an action status data schema separately from an action output data schema.

The queryaction operation came from trying to model a queue of actions in an HTTP REST API, but it arguably isn't a good design for a WebSocket sub-protocol. Ideally we'd have a way to observe an action rather than poll it via query requests.

I mainly included a queryaction operation in the strawman proposal for completeness, because the stated goal was to support the full set of WoT operations over a single connection, but also because there isn't really a great alternative at the moment.

I don't think we can wait for TD 2.0 to maybe fix the issues with actions, so we would need to figure out how actions work with the current set of operations and whatever message format we end up with.

So you could have an invokeaction request...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "c370da58-69ae-4e83-bb5a-ac6cfb2fed54",
  "operation": "invokeaction",
  "messageType": "request",
  "name": "fade",
  "input": {
    "level": 50,
    "duration": 1000
  },
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...with a response that just acknowledges the request was received...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "74173be7-1c85-419e-b671-297f07629d76",
  "operation": "invokeaction",
  "messageType": "response",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...then a query request...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "7004bbbc-c34e-4be0-9e41-de30c4f9bd78",
  "operation": "queryaction",
  "messageType": "request",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...which responds with a status and output...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "38512928-b41e-4677-abc4-0aa42655af63",
  "operation": "queryaction",
  "messageType": "response",
  "name": "fade",
  "status": {...},
  "output": {...},
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...then the action could be cancelled...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "03ebd580-3144-4ca8-9a7e-ec234de1d14f",
  "operation": "cancelaction",
  "messageType": "request",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...with an acknowledgement in response, with no payload...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "8b95278d-7714-46e9-8137-4ddf1d04182d",
  "operation": "cancelaction",
  "messageType": "response",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

However:

  • Does the response to a queryaction operation have both a status and an output? Where does the data schema for the status come from if not specified in the sub-protocol?
  • Do the queryaction and cancelaction operations share a correlationID with the original invokeaction operation? Or if not, do we now need an actionID to correlate them?
  • Do I have to keep polling the action until it is finished?

In the Strawman proposal the idea was that a Consumer sends an invokeAction message to a Thing and then gets an actionStatus message pushed to it every time the status of the action changes. We could automatically send notification messages to a Consumer which sends an invokeaction request (which is stretching the WoT operation model a little), but the data schema of those messages still needs to come from somewhere. And does the output come in a response or a notification?

@benfrancis
Member

What if we just throw out the idea of an action status (which was kind of invented in WoT Profiles) entirely and the response to a queryaction request just contained an output if there is one, or no payload if the action hasn't finished yet.

To replace the various states of the status member I borrowed from the HTTP Basic Profile:

  • pending - You know the action is pending once you receive a response to the initial invokeaction request
  • running - There is no separate concept of pending vs. running any more (again this concept was invented by WoT Profiles but might not even make sense for all use cases)
  • completed - You know an action is completed once an output appears in the response to a queryaction request. How about actions which don't have an output, would an output with a value of null be OK..?
  • failed - You know an action has failed if you receive an error message
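
A minimal sketch of how a Consumer could derive these states without a status member, assuming the `messageType` and `output` members from the examples in this thread. The `infer_action_state` name is hypothetical, and treating a null output as "completed" is only one possible answer to the open question above.

```python
def infer_action_state(msg):
    """Derive an action's state from one correlated message."""
    if msg["messageType"] == "error":
        return "failed"
    # Key presence, not truthiness: {"output": null} parses to None and
    # still counts as completed under this assumption.
    if "output" in msg:
        return "completed"
    # An acknowledgement without payload: requested but not yet finished.
    return "pending"


# Usage with an acknowledgement, a finished action without a value, and an error.
ack = {"messageType": "response", "operation": "invokeaction"}
done = {"messageType": "response", "operation": "queryaction", "output": None}
failed = {"messageType": "error", "operation": "queryaction", "title": "Not Found"}
states = [infer_action_state(m) for m in (ack, done, failed)]
```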

We could also specify invokeaction notification messages which are automatically sent (containing the output) when the action completes, so you don't have to keep polling it.

So you would send an invokeaction request

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "c370da58-69ae-4e83-bb5a-ac6cfb2fed54",
  "operation": "invokeaction",
  "messageType": "request",
  "name": "fade",
  "input": {
    "level": 50,
    "duration": 1000
  },
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...with a response that just acknowledges the request was received...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "74173be7-1c85-419e-b671-297f07629d76",
  "operation": "invokeaction",
  "messageType": "response",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...then a notification containing an output when the action is complete...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "de378e5b-c55f-495d-92c9-c2855296b41c",
  "operation": "invokeaction",
  "messageType": "notification",
  "name": "fade",
  "output": {...},
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...You can manually send a query request (e.g. if the connection dropped)...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "7004bbbc-c34e-4be0-9e41-de30c4f9bd78",
  "operation": "queryaction",
  "messageType": "request",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...which responds with either no output or an output depending on whether the action is complete...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "38512928-b41e-4677-abc4-0aa42655af63",
  "operation": "queryaction",
  "messageType": "response",
  "name": "fade",
  "output": {...},
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...The action could be cancelled...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "03ebd580-3144-4ca8-9a7e-ec234de1d14f",
  "operation": "cancelaction",
  "messageType": "request",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

...with an acknowledgement in response, with no payload...

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageID": "8b95278d-7714-46e9-8137-4ddf1d04182d",
  "operation": "cancelaction",
  "messageType": "response",
  "name": "fade",
  "correlationID": "5afb752f-8be0-4a3c-8108-1327a6009cbd"
}

If the action fails you would get an error notification (or an error response in response to a queryaction request).

We might need to add in some timestamps to indicate when the action was requested and when it completed/failed, as part of the message payload.

@hspaay
Collaborator Author

hspaay commented Jan 15, 2025

This is the response to your first post above. Will respond separately to your second post. (Both have excellent considerations!)

I think you might be right, but unfortunately w3c/wot-thing-description#2068 suggests this might be under-specified in the Thing Description 1.1 specification. There is no way to describe an action status data schema separately from an action output data schema.

This is an interesting point and I agree. It shows that action status belongs in the application domain and the transport protocol shouldn't mess around with it too much.

The queryaction operation came from trying to model a queue of actions in an HTTP REST API, but it arguably isn't a good design for a WebSocket sub-protocol. Ideally we'd have a way to observe an action (w3c/wot-thing-description#1775) rather than poll it via query requests.

Maybe good to take a step back and ask what are we trying to accomplish (use-cases) with queryaction. Hiveot's use-case is the UI where the last action can be shown to the user. In this case the requested input, current status and output (if completed) are presented. So this is the information I'm looking for in queryaction. With that said, queryaction is handled by the hub service that handles the action flow and knows about actions and responses. So it is not a requirement of a Thing response.

I mainly included a queryaction operation in the strawman proposal for completeness, because the stated goal was to support the full set of WoT operations over a single connection, but also because there isn't really a great alternative at the moment.

I find the operation itself quite useful but was struggling with yaas (yet another action status ;) record. You had an interesting proposal down below which I'll respond to.

So you could have an invokeaction request...
...
with a response that just acknowledges the request was received...

This is also how it goes in hiveot. Action requests are acknowledged with a 'pending' response as the hub forwards it to the device, whose response is passed back to the consumer asynchronously. (unless it is handled directly on the hub as in readproperty but that is another topic)
The only difference with your example is that hiveot includes a 'status' field, which is used to determine if the response is failed, running or completed. So it is possible to send a stream of responses if the application desires such a thing. (will comment more on your follow-up post).

...then a query request...
...which responds with a status and output...

This is where we differ in approach. The downside here is that you need to invoke an extra query to get the response, instead of including the response in ... well, the original response.

...then the action could be cancelled...
...with an acknowledgement in response, with no payload...

Yes, the correlation ID identifies the action to be cancelled.

Does the response to a queryaction operation have both a status and an output? Where does the data schema for the status come from if not specified in the sub-protocol?

Excellent question :)
Option 1: status is just a field as described in http basic and was adopted in the strawman proposal. I see no problem with it. Maybe this is one that can go in TD-2.0 in future.
Option 2: we only need to know if the action is still ongoing, failed or completed. Are there other ways to convey this? (you already proposed this, so will comment on that).

Do the queryaction and cancelaction operations share a correlationID with the original invokeaction operation? Or if not, do we now need an actionID to correlate them?

Please not another ID :) I don't think this overloads the use of correlationID as it relates directly to the original request. So yes this is a good way of linking cancelaction to invokeaction.
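
A minimal sketch of linking a cancelaction request to the original invokeaction request through a shared correlationID, using the member names from the examples in this thread. The `new_request` helper is hypothetical.

```python
import uuid


def new_request(thing_id, operation, name, correlation_id=None, **extra):
    """Build a request envelope; reuse a correlationID to refer to an earlier request."""
    msg = {
        "thingID": thing_id,
        "messageID": str(uuid.uuid4()),  # unique per message
        "messageType": "request",
        "operation": operation,
        "name": name,
        "correlationID": correlation_id or str(uuid.uuid4()),
    }
    msg.update(extra)
    return msg


# Usage: the cancel request gets a new messageID but keeps the correlationID.
invoke = new_request("https://mythingserver.com/things/mylamp1", "invokeaction", "fade",
                     input={"level": 50, "duration": 1000})
cancel = new_request(invoke["thingID"], "cancelaction", "fade",
                     correlation_id=invoke["correlationID"])
```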

Do I have to keep polling the action until it is finished?
In the Strawman proposal the idea was that a Consumer sends an invokeAction message to a Thing and then gets an actionStatus message pushed to it every time the status of the action changes

I prefer not to do any polling. It isn't necessary as the Thing can just keep sending responses instead as the strawman proposal describes. To me it is intuitive. You ask for an action and keep getting informed on the progress until completion. Nothing else is needed. Clean and simple.

We could automatically send notification messages to a Consumer

It is an option but I'd rather use responses.
In this proposal, actions are 1-on-1 interaction with the consumer, while notifications (observe, subscribe) are a 1-to-many interaction. Anyone can observe and receive notifications. Using notifications for progress updates however would make that distinction fuzzy with no benefit (that I can tell).

It does raise the question, should a consumer be able to observe other consumer's actions? The answer to that is IMHO no. State changes can be observed through properties if needed. (I have come around to your point from a while back that many actions can just be properties, but that is a separate topic)

but the data schema of those messages still needs to come from somewhere. And does the output come in a response or a notification?

The output comes in a response. The schema of that output is that of the Action output in the TD. No need to specify an additional message for the response. (which would be a yaas :)) Do you see any problems in this approach?

Sending responses until completion is what hiveot currently does. I am working on an update to the UI in hiveot to handle 'action progress' this way. I like the approach. So far it works pretty smoothly.
For example, a dialog to invoke an action can stay open and display progress until a completed response is received, or display an error if a 'failed' response is received. It only needs to handle the responses and doesn't have to do anything else. For the UI responses are always asynchronous. The http protocol binding converts a direct http result into an async response so the application doesn't have to deal with it.

Very interesting discussion, now on to your next iteration ... :)

@hspaay
Collaborator Author

hspaay commented Jan 15, 2025

What if we just throw out the idea of an action status ...

LOL, I love throwing things out and clean house. Too often problems are solved by adding complexity (I'm guilty). I'll try to follow along...

...the response to a queryaction request just contained an output if there is one, or no payload if the action hasn't finished yet...

So the client implementation needs to determine if the output field is missing in the response. Eg a response without output field is always just an acknowledgement that it was received, eg 'pending'. Yeah that could work. I'm slightly uncomfortable with using lack of data as an explicit status but can't really think of an argument against it.

Is it important that other consumers can obtain the complete action status, eg including the input of the action and timestamps? Right now I'm assuming this is out of scope.

One concern though, how would a data stream be returned? Eg, multiple responses when a single response would be too large or too slow?

To replace the various states of the status member I borrowed from the HTTP Basic Profile: ...

Hmm yes that could work.
Wrt empty output: "Json null value represents an empty or missing node in a Json document. It can be thought of as a placeholder representing an absence of a value or not applicable in a certain field. In a technical sense, null is an object data type, which means that it can be used to designate a lack of representation or type."

We could also specify invokeaction notification messages which are automatically sent (containing the output) when the action completes, so you don't have to keep polling it.

This is trading the status field for a notification with the invokeaction operation, and alternatively an error message with the same correlationID if things go bad. I'm not convinced this is an improvement, but let's continue.

One concern: Action completion should be a response IMHO as this is the primary purpose of a response, to carry the response. The progress updates can be sent as notifications however. That would still be in line with your idea here. If a consumer is only interested in the result, not its progress, then it would just wait for the response.
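
A minimal sketch of this variation, assuming progress arrives as notification messages and the result arrives as a response carrying an `output` member. The `final_result` name and the `progress` member are assumptions for illustration.

```python
def final_result(messages, correlation_id, on_progress=None):
    """Return the output of the completing response, optionally reporting progress."""
    for msg in messages:
        if msg.get("correlationID") != correlation_id:
            continue  # not part of this action
        if msg["messageType"] == "notification":
            if on_progress is not None:
                on_progress(msg)  # progress is optional to handle
        elif msg["messageType"] == "response" and "output" in msg:
            return msg["output"]
        # A response without output is treated as a plain acknowledgement.
    return None


# Usage: one consumer tracks progress, another only waits for the result.
stream = [
    {"messageType": "response", "operation": "invokeaction", "correlationID": "c1"},
    {"messageType": "notification", "operation": "invokeaction", "correlationID": "c1", "progress": 50},
    {"messageType": "response", "operation": "invokeaction", "correlationID": "c1", "output": "done"},
]
seen = []
with_progress = final_result(stream, "c1", on_progress=seen.append)
result_only = final_result(stream, "c1")
```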

...You can manually send a query request (e.g. if the connection dropped)...
...which responds with either no output or an output depending on whether the action is complete...

Yes, this is consistent with the above (using progress as notifications)

Question: What does the queryaction correlationID indicate here? Is it the correlation ID of the action queried or is this considered its own request?
If it is its own request then how to know if the response is that of the same request vs a new request made by another consumer. Does this actually matter?
That probably depends on the requirements of 'queryaction', which I haven't seen.
If we assume that queryaction only applies to the latest action then your idea works fine.

...The action could be cancelled...

Hmm, same issue as above. Assuming that actions are not concurrent, then any consumer can cancel the latest running action, regardless of who initiated it. I don't see a problem, just want to point out the implication. If I recall this is the same constraint as with the strawman proposal.

We might need to add in some timestamps to indicate when the action was requested and when it completed/failed, as part of the message payload.

Maybe just an 'updated' timestamp in both response and notification message. That carries all information needed. The action request has its own timestamp. We're not tracking the input value here so is it needed to track the 'requested' timestamp?
The consumer that sent the request knows when it was requested after all.

Is there a requirement that other consumers can obtain when an action was requested? In that case what about who requested it. It seems relevant to auditing and maybe an admin but not a regular consumer. Not sure if there are use-cases around this.

In summary, this could work. I also don't mind updating the proposal after it has proven to work in hiveot (which is a rather elaborate POC). Shouldn't be too much work.

PS: my main concern is the way a response stream would work. A progress update can be a notification, no problem, but a response stream is part of the output. ... hmm, not sure on this.

@benfrancis
Member

benfrancis commented Jan 16, 2025

@hspaay wrote:

This is an interesting point and I agree. It shows that action status belongs in the application domain and the transport protocol shouldn't mess around with it too much.

I'm still not quite clear on the distinction you're drawing here, since the Web Thing Protocol is very much an application layer protocol, not a transport layer protocol. But I can see the argument that the data schema of an action status (if there is one) should be described in a Thing Description rather than be fixed as part of the protocol. Unfortunately that's not currently possible in TD 1.1.

Maybe good to take a step back and ask what are we trying to accomplish (use-cases) with queryaction.

The history of the queryaction operation is that it was invented to try to help describe long running actions in HTTP (and CoAP) protocol bindings, where the action takes longer than the timeout period of an HTTP request. In the HTTP Basic Profile (inspired by Mozilla's legacy Web Thing REST API) this was modelled as an action queue where the status of each ongoing action is represented by its own dynamic resource with a unique URL. A GET request on that URL can be used to poll the status of an ongoing action. The queryallactions operation was also invented to get a list of those ongoing (and recently completed) actions.

None of this is strictly needed in a WebSocket-based protocol which isn't limited by HTTP request timeouts, though it could be useful after a WebSocket gets disconnected, or for a newly connected Consumer to find out about any currently ongoing actions being carried out by a device (e.g. a printer queue).

Hiveot's use-case is the UI where the last action can be shown to the user.

The queryaction operation wasn't really intended for querying historical data, but a side-effect of the polling approach is that it can kind of be used in this way (though the length of time old action statuses are kept for is entirely implementation specific).

The only difference with your example is that hiveot includes a 'status' field, which is used to determine if the response is failed, running or completed. So it is possible to send a stream of responses if the application desires such a thing.

I had actually assumed that each request would only get a single response. A request with multiple responses makes the distinction between responses and notifications even less clear to me.

I prefer not to do any polling. It isn't necessary as the Thing can just keep sending responses instead as the strawman proposal describes.

I agree.

In this proposal, actions are 1-on-1 interaction with the consumer, while notifications (observe, subscribe) are a 1-to-many interaction.

I'm not sure that's really true. All WoT operations are a 1:1 interaction between one Consumer and one Thing. Two Consumers could subscribe to the same event, but they would have to each receive separate messages because the correlation ID would be different.

I thought the distinction was between 1 request + 1 response (request/response pattern) vs. 1 request + many notifications (publish/subscribe pattern). But now I see that was not your intention.

It does raise the question, should a consumer be able to observe other consumer's actions?

I've never been completely sure about this. There are use cases where it could make sense (e.g. a printer queue), but it's a bit of a security nightmare. I think the simple answer is probably no.

The output comes in a response. The schema of that output is that of the Action output in the TD. No need to specify an additional message for the response. (which would be a yaas :)) Do you see any problems in this approach?

That makes total sense, I just thought that if an invokeaction response message contains the output only once the action is completed, then what message would be used to acknowledge the initial invokeaction request message?

Sending responses until completion is what hiveot currently does.

OK, so I understand you're saying that a single request can have multiple responses, which as I explained above I was not expecting.


@hspaay wrote:

Is it important that other consumers can obtain the complete action status, eg including the input of the action and timestamps? Right now I'm assuming this is out of scope.

I'm not completely sure but I don't imagine Consumers would expect the action input to be included when querying the status of an action. I do think timestamp(s) are important though.

One concern though, how would a data stream be returned? Eg, multiple responses when a single response would be too large or too slow?

Can you provide an example use case for this which wouldn't make more sense as an observed property or event? What do you imagine a stream would look like?

Action completion should be a response IMHO as this is the primary purpose of a response

OK, if a response message is sent when the action is completed, what message is sent to acknowledge the initial invokeaction request? Are there two responses to a single request?

Wrt empty output: "Json null value represents an empty or missing node in a Json document. It can be thought of as a placeholder representing an absence of a value or not applicable in a certain field. In a technical sense, null is an object data type, which means that it can be used to designate a lack of representation or type."

Yeah it seems to work, though I admit I'm not super comfortable with it. In JavaScript terms the default return type of a function is undefined, not null. You could have a function which returns null as a value, which is different from returning undefined. Given JSON comes from JavaScript, using null to mean that an action returned without an output feels a bit wrong.

Action completion should be a response IMHO as this is the primary purpose of a response, to carry the response. The progress updates can be sent as notifications however. That would still be in line with your idea here. If a consumer is only interested in the result, not its progress, then it would just wait for the response.

I feel a bit more comfortable with having multiple notifications of progress and then a single response upon completion, although really you only need a single message to acknowledge the invokeaction request then a single message to provide the output upon completion. Having a request, a single notification and then a single response also feels a bit odd.

What does the queryaction correlationID indicate here? Is it the correlation ID of the action queried or is this considered its own request?

This was really my question to you, but in the absence of an actionID I think it has to be the same correlationID as the invokeaction request.

If we assume that queryaction only applies to the latest action then your idea works fine.

It doesn't. There can be multiple actions of the same type ongoing in parallel.

Hmm, same issue as above. Assuming that actions are not concurrent, then any consumer can cancel the latest running action, regardless of who initiated it. I don't see a problem, just want to point out the implication. If I recall this is the same constraint as with the strawman proposal.

A Consumer can only cancel an action if it has the correlation ID which identifies the original invokeaction request. Depending on the application, implementations could add additional security around this, where a Consumer can only query and cancel actions invoked using the same credentials.
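A minimal Go sketch of that rule, assuming the Thing keeps a registry of ongoing actions keyed by the correlationID of the invokeaction request. The `ActionRegistry` type, the opaque `owner` string standing in for the credentials of a connection, and the error values are all hypothetical names introduced for illustration; none of them come from the proposal or from hiveot.

```go
package main

import (
	"errors"
	"fmt"
)

// runningAction is a hypothetical record kept per invoked action.
type runningAction struct {
	Name  string // action name, e.g. "fade"
	Owner string // opaque identity of the invoking connection (assumption)
}

// ActionRegistry indexes ongoing actions by the correlationID of the
// invokeaction request that started them.
type ActionRegistry struct {
	actions map[string]runningAction
}

// NewActionRegistry returns an empty registry.
func NewActionRegistry() *ActionRegistry {
	return &ActionRegistry{actions: make(map[string]runningAction)}
}

// Add records a newly invoked action.
func (r *ActionRegistry) Add(correlationID, name, owner string) {
	r.actions[correlationID] = runningAction{Name: name, Owner: owner}
}

var (
	ErrUnknownAction = errors.New("no ongoing action with this correlationID")
	ErrNotOwner      = errors.New("action was invoked with different credentials")
)

// Cancel removes the action only if the caller presents the right
// correlationID and is the same identity that invoked the action.
func (r *ActionRegistry) Cancel(correlationID, caller string) error {
	action, ok := r.actions[correlationID]
	if !ok {
		return ErrUnknownAction
	}
	if action.Owner != caller {
		return ErrNotOwner
	}
	delete(r.actions, correlationID)
	return nil
}

func main() {
	reg := NewActionRegistry()
	reg.Add("542e4567-e89b-12d3-a456-631968", "fade", "consumer-a")
	fmt.Println(reg.Cancel("542e4567-e89b-12d3-a456-631968", "consumer-b"))
	fmt.Println(reg.Cancel("542e4567-e89b-12d3-a456-631968", "consumer-a"))
}
```

Keying by correlationID rather than by action name is what lets several actions of the same type run in parallel and still be cancelled individually.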

Maybe just an 'updated' timestamp in both response and notification message. That carries all information needed. The action request has its own timestamp. We're not tracking the input value here so is it needed to track the 'requested' timestamp? The consumer that sent the request knows when it was requested after all.

I'm not sure. As a minimum I think a Consumer would want to know when an action was completed. It might also want to know when it was invoked, but as you say the Consumer that invoked the action technically already knows this. The HTTP Basic Profile includes both timeRequested and timeEnded (either completed or failed) in an ActionStatus.

Is there a requirement that other consumers can obtain when an action was requested? In that case what about who requested it. It seems relevant to auditing and maybe an admin but not a regular consumer. Not sure if there are use-cases around this.

I think we should avoid including any kind of user identifier or credentials in an action status in the protocol, the Thing might only have been sent a Bearer token when opening the WebSocket and we don't want to have to start decoding JWTs etc.


My own conclusion is that this kind of works but:

  1. I don't love relying on a null output to indicate that an output-less action has completed
  2. We haven't even discussed how queryallactions would work, and that's potentially even trickier
  3. The distinction between "response" and "notification" is even less clear than I thought it was <-- This is my biggest concern

@hspaay
Collaborator Author

hspaay commented Jan 16, 2025

Thanks for the clarifications @benfrancis. Good feedback.

The history of the queryaction operation ...
Ahh okay, I wasn't clear on this. It does change my thinking around this. So the intent is to read (long) running actions and there can be multiple concurrent actions. I need to let this sink in for a bit.

I had actually assumed that each request would only get a single response...

Ah I see. That explains some of the confusion.
Each request can have multiple responses while the status progresses until the last response which is either completed or failed. This is intentional to provide progress feedback and to support streamed response data.

I'm not sure that's really true. All WoT operations are a 1:1 interaction between one Consumer and one Thing. Two Consumers could subscribe to the same event, but they would have to each receive separate messages because the correlation ID would be different.

You are right. My 'explanation' was a bit loose and fast. Please disregard this comment. Where this came from is that in a hub setup, the Thing publishes a notification without knowledge of the consumers, eg a 1:many. The hub server handles the subscription of the consumer. The Thing is unaware of this. In a request-response, the response is targeted to the consumer.

With that said, you can argue that a notification is simply a response to a subscription request. ... and that is probably true ... I need to think about this a bit more ☕... this implies that notifications can be replaced with responses. It does however raise the question of how agents publish event and property updates. While not supported in WoT I would like to be able to support this in this proposal.

I thought the distinction was between 1 request + 1 response (request/response pattern) vs. 1 request + many notifications (publish/subscribe pattern). But now I see that was not your intention.

It does raise the question, should a consumer be able to observe other consumer's actions?
I've never been completely sure about this. ... I think the simple answer is probably no.

Yes agreed this is a can of worms.
With that said, I think it must be possible for the Thing to know who the sender of an action request is. For example, a state store service that stores a consumer state blob. When connected through a hub or gateway, the auth info of the connection isn't available. Although I realize this might be out of scope for WoT as it assumes a direct connection between a consumer and Thing.

Wrt empty output
Yeah it seems to work, though I admit I'm not super comfortable with it. In

I was just going along with the thinking to try it out. I agree this is not a good approach. I'd rather have a status field instead that explicitly says that the request is complete.

If we assume that queryaction only applies to the latest action then your idea works fine.
It doesn't. There can be multiple actions of the same type ongoing in parallel.

Okay, thanks for clarifying.

My own conclusion is that this kind of works but:
I don't love relying on a null output to indicate that an output-less action has completed

Agreed.

We haven't even discussed how queryallactions would work, and that's potentially even trickier

Yep, that still needs to be addressed

The distinction between "response" and "notification" is even less clear than I thought it was <-- This is my biggest concern

I understand now.

This has been very helpful Ben, I'll keep chugging away at this. I'll consider the following improvements:

  1. Remove NotificationMessage and just use ResponseMessage.
    For property updates and events, the correlationID is that of the subscription/observe request. They are considered long running requests.
  2. Clarify that multiple responses can be received until the status is completed or failed.
  3. Keep the status field to support progress and streams (status is running)
  4. Propose a solution for queryaction and queryallactions operations. Maybe just respond with the last action ResponseMessage and an array of these for queryallactions.
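As an illustration of points 2 and 3, here is a minimal Go sketch of a consumer that keeps reading responses for one request until the status is completed or failed. The reduced `ResponseMessage` struct, the channel-based delivery and the `awaitResult` helper are assumptions made for this example and are not taken from the hiveot code.

```go
package main

import "fmt"

// ResponseMessage is a reduced, hypothetical version of the envelope
// discussed in this thread; field names follow the JSON members used here.
type ResponseMessage struct {
	Operation     string `json:"operation"`
	CorrelationID string `json:"correlationID"`
	Status        string `json:"status"` // pending, running, completed or failed
	Output        any    `json:"output,omitempty"`
	Error         string `json:"error,omitempty"`
}

// isFinal reports whether no further responses are expected for a request.
func isFinal(status string) bool {
	return status == "completed" || status == "failed"
}

// awaitResult reads responses until the one that ends the request with the
// given correlationID. Intermediate responses are collected as progress.
// ok is false if the channel closes before a final response arrives.
func awaitResult(responses <-chan ResponseMessage, correlationID string) (progress []ResponseMessage, final ResponseMessage, ok bool) {
	for resp := range responses {
		if resp.CorrelationID != correlationID {
			continue // belongs to another request on the same connection
		}
		if isFinal(resp.Status) {
			return progress, resp, true
		}
		progress = append(progress, resp)
	}
	return progress, ResponseMessage{}, false
}

func main() {
	ch := make(chan ResponseMessage, 4)
	ch <- ResponseMessage{Operation: "invokeaction", CorrelationID: "12345", Status: "pending"}
	ch <- ResponseMessage{Operation: "invokeaction", CorrelationID: "99999", Status: "completed"}
	ch <- ResponseMessage{Operation: "invokeaction", CorrelationID: "12345", Status: "running"}
	ch <- ResponseMessage{Operation: "invokeaction", CorrelationID: "12345", Status: "completed", Output: "done"}
	close(ch)

	progress, final, ok := awaitResult(ch, "12345")
	fmt.Println(len(progress), final.Status, final.Output, ok)
}
```

A consumer that is only interested in the result can ignore `progress`. The same loop would serve a subscription if property updates and events are delivered as responses with status running.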

@hspaay
Collaborator Author

hspaay commented Jan 31, 2025

Some findings after further experimentation:

  1. I found that removing NotificationMessage does not cause any issues. With everything being request-response there is no more confusion between notifications and responses.
  2. I can also confirm that the approach to have responses contain the request operation does not cause any ambiguity and avoids the need to add more constants as in the strawman proposal.
  3. For querying actions, I found myself turning back in the direction of the http-basic and strawman proposal. An ActionStatus type is needed in a response to queryaction and queryallactions. It isn't needed in the response to invokeaction as the response contains the information already.
    I do wonder about the use-case for having concurrent actions of the same name. For stateful actions this seems contradictory. For safe (stateful) actions it seems unnecessary ??. The http-basic protocol returns only a single action for queryaction but a map with an array of actions for queryallactions, which is inconsistent. I wonder if that was intentional.
  4. For queryallactions, simply returning a map of [name][]ActionStatus works well.
  5. Similarly, for reading properties, a ThingValue type (PropertyReading in strawman) is needed to hold the result of readproperty. Http-basic just returns the value but this would omit the 'updated' timestamp and the unique ID (messageID) of the value.
  6. For readallproperties, returning a map of [name]ThingValue also works well.
  7. HiveOT also adds readevent/readallevents operations which also return ThingValue objects.
  8. This still meets the goal of preventing creating a different message type per operation.

In summary (from the hiveot repo):

// ResponseMessage serves to notify a client of the result of a request.
//
// The Output field contains the message response data as defined by the operation
// Action related response output:
//   - invokeaction             action output as per TD, when status==completed
//   - queryaction              []ActionStatus object array
//   - queryallactions          map [name][]ActionStatus objects
//
// Property related response output:
//   - observeproperty          property value as per TD, when status==running
//   - observeallproperties     map[name]value  (multiple updates) 
//   - readproperty             ThingValue object
//   - readallproperties        map[name]ThingValue objects
//
// Event related response output
//   - subscribeevent           event value as per TD, when status==running
//   - readevent                ThingValue object
//   - readallevents            map[name]ThingValue objects
type ResponseMessage struct {...}

Additional notes:

  • The ActionStatus type is similar to ResponseMessage and includes the action input value.
  • The ThingValue object holds the messageID, name, output, thingID and update timestamp of the event or property.
  • ActionStatus and ThingValue types contain the messageID to uniquely identify the action, event and property values.
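To make these notes concrete, here is a rough Go sketch of what the two wrapper types could look like. The field lists follow the notes above and the JSON examples elsewhere in this thread, but the exact Go field names, the JSON tags on `ThingValue` and the `indexByName` helper are assumptions for illustration, not the hiveot definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ThingValue wraps a property or event value with its metadata.
// Field names and tags are assumptions based on the notes above.
type ThingValue struct {
	MessageID string `json:"messageID"`
	ThingID   string `json:"thingID"`
	Name      string `json:"name"`
	Output    any    `json:"output"`
	Updated   string `json:"updated"` // timestamp of the value update
}

// ActionStatus describes one invoked action as returned by queryaction
// and queryallactions.
type ActionStatus struct {
	CorrelationID string `json:"correlationID"`
	Status        string `json:"status"`
	Input         any    `json:"input,omitempty"`
	Output        any    `json:"output,omitempty"`
	TimeRequested string `json:"timeRequested"`
	TimeEnded     string `json:"timeEnded,omitempty"`
}

// indexByName builds the map[name]ThingValue shape used as the output of
// readallproperties and readallevents.
func indexByName(values []ThingValue) map[string]ThingValue {
	out := make(map[string]ThingValue, len(values))
	for _, v := range values {
		out[v.Name] = v
	}
	return out
}

func main() {
	props := indexByName([]ThingValue{
		{MessageID: "m1", ThingID: "mylamp1", Name: "on", Output: true, Updated: "2024-11-11T11:43:20.135Z"},
		{MessageID: "m2", ThingID: "mylamp1", Name: "level", Output: 50, Updated: "2024-11-10T09:00:00.000Z"},
	})
	actions := map[string][]ActionStatus{
		"fade": {{CorrelationID: "542e4567-e89b-12d3-a456-631968", Status: "pending", TimeRequested: "2024-11-11T11:43:20.135Z"}},
	}
	for _, v := range []any{props, actions} {
		b, err := json.MarshalIndent(v, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

With this shape each property in a readallproperties result carries its own updated timestamp.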

(As an aside, in HiveOT these types are used as the application level protocol elements that are mapped to the corresponding underlying transport protocol, providing a uniform API to the application. HiveOT has a second websocket protocol binding that simply passes the ResponseMessage without the need to map them to a type per operation.)

I hope this addresses your remaining questions @benfrancis

@benfrancis
Member

An ActionStatus type is needed in a response to queryaction and queryallactions. It isn't needed in the response to invokeaction as the response contains the information already.

I think this could benefit from some examples.

So in your proposal the output member of an invokeaction ResponseMessage may directly contain the action output (or error information), the output member of a queryaction ResponseMessage would contain the action output wrapped in an array of ActionStatus objects, and the output member of a queryallactions ResponseMessage would contain action outputs wrapped in a map of arrays of ActionStatuses? This is a good example of where you're giving the member the same name for "consistency" but it actually contains completely different information.

I assume the "status" of a ResponseMessage for an invokeaction operation is assumed to be "completed"? And if the action failed the output of a ResponseMessage contains an error instead of an output? Does the response include timeRequested and timeEnded?

I do wonder about the use-case for having concurrent actions of the same name.

Examples of an action queue:

  1. A print action which adds a job to a printer queue on a thermal printer
  2. A move action which can queue a series of movements for a robot arm to carry out
  3. A sequence action that can display sequences of flashing lights one after another
  4. A message action which queues a message to scroll across an LCD screen

I'm sure there are variations on these use cases where multiple actions can happen concurrently, e.g. in manufacturing.

The http-basic protocol returns only a single action for queryaction but a map with an array of actions for queryallactions, which is inconsistent. I wonder if that was intentional.

It's complicated.

In Mozilla's legacy Web Thing REST API there are actually three separate types of resources:

During TD standardisation I wasn't able to convince people that querying the pending action requests for one action and querying the pending action requests for all actions were separate operations, so we ended up with just queryaction (which returns a single ActionStatus object) and queryallactions (which returns a map of arrays of ActionStatus objects).

For queryallactions, simply returning a map of [name][]ActionStatus works well.

This is consistent with the HTTP Basic Profile, which I like (the alternative being to just have one array with a name in each ActionStatus). Can you give an example of what this kind of message looks like? Is there a correlationID per ActionStatus (i.e. multiple correlationIDs in one message)?

This is something I hadn't worked out in the strawman proposal, because it's a bit awkward to fit into the message format. With the strawman approach it would probably have to look something like:

{
  "thingID": "https://mythingserver.com/things/mylamp1",
  "messageType": "actionStatuses",
  "messageID": "123e4567-e89b-12d3-a456-426655",
  "statuses": {
    "fade": [
      {
        "status": "pending",
        "correlationID": "542e4567-e89b-12d3-a456-631968",
        "timeRequested": "2024-11-11T11:43:20.135Z"
      },
      {
        "status": "completed",
        "correlationID": "321e4567-e89b-12d3-a456-531531",
        "timeRequested": "2024-11-10T11:43:20.135Z",
        "timeEnded": "2024-11-10T11:43:25.135Z",
        "output": "..."
      }
    ]
  }	 
}

Similarly, for reading properties, a ThingValue type (PropertyReading in strawman) is needed to hold the result of readproperty. Http-basic just returns the value but this would omit the 'updated' timestamp and the unique ID (messageID) of the value.

I'm not convinced this is simpler than just having a value member and separate timestamp members at the top level of the message.

For readallproperties, returning a map of [name]ThingValue also works well.

Again, is a map of ThingValue objects really simpler than a map of values? The only argument I can see is that in the strawman proposal it isn't possible to provide a different "last updated" timestamp for each property in a propertyReadings message.

The ThingValue object holds the messageID

Does that mean there could be multiple messageIDs per message in a readallproperties operation?

ActionStatus and ThingValue types contain the messageID to uniquely identify the action, event and property values

Do you mean messageID, or correlationID?


In conclusion, in theory I still like the idea of splitting out operation into a separate member, if we can make it work. Combining responses and notifications is maybe a bit better, though just overloads the ResponseMessage even more. Having a single output member of a ResponseMessage which can have a completely different format and meaning depending on the operation doesn't seem like a good design to me. If the member has a different purpose I think it's better to just give it a different name.

I'm still not quite convinced this is a better design than just having separate message types, but I'm open to other opinions.

@hspaay
Collaborator Author

hspaay commented Feb 7, 2025

An ActionStatus type is needed in a response to queryaction and queryallactions. It isn't needed in the response to invokeaction
I think this could benefit from some examples.

ResponseMessage to InvokeAction

{
   "operation": "invokeaction",
   "output": {/*output as defined in the action affordance*/},
   "status": "completed",
   "correlationID": "12345"
}

ResponseMessage of queryaction

{
   "operation": "queryaction",
   "name": "fade",
   "status": "completed",  // status of the response to queryaction
   "output": [    // Array of ActionStatus objects
      {
         "status": "pending",
         "correlationID": "542e4567-e89b-12d3-a456-631968",
         "timeRequested": "2024-11-11T11:43:20.135Z"
      },
      {
         "status": "completed",
         "correlationID": "321e4567-e89b-12d3-a456-531531",
         "timeRequested": "2024-11-10T11:43:20.135Z",
         "timeEnded": "2024-11-10T11:43:25.135Z",
         "output": "..."
      }
   ]
}

ResponseMessage to queryallactions

{
   "operation": "queryallactions",
   "status": "completed",  // status of the response to queryallactions
   "output": {   // map of arrays of ActionStatus objects
      "fade": [
         {
            "status": "pending",
            "correlationID": "542e4567-e89b-12d3-a456-631968",
            "timeRequested": "2024-11-11T11:43:20.135Z"
         },
         {
            "status": "completed",
            "correlationID": "321e4567-e89b-12d3-a456-531531",
            "timeRequested": "2024-11-10T11:43:20.135Z",
            "timeEnded": "2024-11-10T11:43:25.135Z",
            "output": "..."
         }
      ]
   }
}

This is a good example of where you're giving the member the same name for "consistency" but it actually contains completely different information.

Hmm, I don't think so. The content of the output field is determined by the operation it is a response to. The operation of invokeaction has its output defined in the ActionAffordance, so this is returned in the output field of the response. The response of queryaction is defined as an array of ActionStatus objects, hence the output field contains the array of ActionStatus objects. I think this is quite logical :)
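A hedged Go sketch of how a consumer might apply this rule: the envelope is parsed once, and the operation member selects the type used to decode the output member. The struct definitions and the `decodeOutput` helper are assumptions written for this example; only the JSON member names are taken from the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ActionStatus mirrors the objects shown in the queryaction example.
type ActionStatus struct {
	Status        string `json:"status"`
	CorrelationID string `json:"correlationID"`
	TimeRequested string `json:"timeRequested"`
	TimeEnded     string `json:"timeEnded,omitempty"`
	Output        any    `json:"output,omitempty"`
}

// ResponseMessage keeps output undecoded until the operation is known.
type ResponseMessage struct {
	Operation     string          `json:"operation"`
	Name          string          `json:"name,omitempty"`
	Status        string          `json:"status"`
	CorrelationID string          `json:"correlationID,omitempty"`
	Output        json.RawMessage `json:"output,omitempty"`
}

// decodeOutput parses a response and decodes its output according to the
// operation it responds to.
func decodeOutput(raw []byte) (any, error) {
	var msg ResponseMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, err
	}
	switch msg.Operation {
	case "queryaction":
		var statuses []ActionStatus
		err := json.Unmarshal(msg.Output, &statuses)
		return statuses, err
	case "queryallactions":
		var byName map[string][]ActionStatus
		err := json.Unmarshal(msg.Output, &byName)
		return byName, err
	default:
		// For invokeaction and others the schema comes from the TD, so this
		// sketch only decodes into a generic value.
		if len(msg.Output) == 0 {
			return nil, nil
		}
		var v any
		err := json.Unmarshal(msg.Output, &v)
		return v, err
	}
}

func main() {
	raw := []byte(`{"operation":"queryaction","name":"fade","status":"completed",
	  "output":[{"status":"pending","correlationID":"542e4567-e89b-12d3-a456-631968",
	  "timeRequested":"2024-11-11T11:43:20.135Z"}]}`)
	out, err := decodeOutput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %+v\n", out, out)
}
```

The switch is the cost of a single output member: the field name is fixed, but the decoder still needs one case per operation whose output type is defined by the protocol rather than by the TD.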

I suspect that the confusion comes from the fact that queryaction is actually an operation handled by an unspecified service. There is no requirement that it is handled by the same service that handles the action itself. This approach makes no such assumption.

I assume the "status" of a ResponseMessage for an invokeaction operation is assumed to be "completed"? And if the action failed the output of a ResponseMessage contains an error instead of an output? Does the response include timeRequested and timeEnded?

Kinda yes, the status in ResponseMessage for invokeaction can be completed, running, pending or failed. Only when the status is completed does its output contain the action output as per its action affordance. As you mentioned, if the status is failed then the Error field in the ResponseMessage contains the error and output is empty.

The response contains the 'updated' timestamp which is the time the status was updated. There is no need to include the timeRequested as the caller did the request and knows the time.

The queryaction operation is defined as returning the ActionStatus, so .. it returns ActionStatus objects. The ActionStatus object does include both timeRequested and timeEnded as the audience has no way of knowing the request time otherwise.

So this approach treats invoking an action as completely separate from the queryaction (and queryallactions) request. ActionStatus belongs to the latter.

Examples of an action queue:

Very good examples. Thank you.

It's complicated
During TD standardisation I wasn't able to convince people that querying the pending action requests for one action and querying the pending action requests for all actions were separate operations,

Thanks for clarifying. For what it's worth, I think you were right.

For queryallactions, simply returning a map of [name][]ActionStatus works well.
This is consistent with the HTTP Basic Profile, which I like (the alternative being to just have one array with a name in each ActionStatus). Can you give an example of what this kind of message looks like?

Yes indeed. As the above example (hopefully) shows, each ActionStatus contains the correlationID of the action request, which can be used to cancel the action if it is still running.

Similarly, for reading properties, a ThingValue type (PropertyReading in strawman) is needed to hold the result of readproperty. Http-basic just returns the value but this would omit the 'updated' timestamp and the unique ID (messageID) of the value.
I'm not convinced this is simpler than just having a value member and separate timestamp members at the top level of the message.

Mmmm, I don't follow. Are you saying that the ResponseMessage (or PropertyReading in strawman) can have the value in the output field instead of a ThingValue wrapper object?

I started out that way but ran into the problem that I think the timestamp of the value is important:

  1. The ResponseMessage is the response to the readproperty request, not the response to the update of the value. So the timestamp is different.
  2. ReadAllProperties would require some kind of envelope for each property value returned to hold the timestamp.

Another consideration is that, just like queryaction, readproperty and readallproperties are operations separate from observeproperty. Observeproperty can return the observed value in the ResponseMessage as that is what the response is about. Readproperty however is an operation whose response contains the requested data, which happens to be a property value and corresponding timestamp. That 'ThingValue' object is therefore the response.

The ThingValue object holds the messageID
Does that mean there could be multiple messageIDs per message in a readallproperties operation?

Yes, readallproperties returns a map of properties containing a ThingValue object that has the correlationID (a similar approach as queryallactions).

ActionStatus and ThingValue types contain the messageID to uniquely identify the action, event and property values
Do you mean messageID, or correlationID?

Yes you're right, ActionStatus holds the correlationID of the invokeaction request.

For readproperty, the ThingValue has no correlationID. correlationIDs are created for requests, and thing value updates are sent as responses to subscribers (observers). Each observer has a different correlationID, which is meaningless in the context of readproperty.

In conclusion, in theory I still like the idea of splitting out operation into a separate member, if we can make it work. Combining responses and notifications is maybe a bit better, though just overloads the ResponseMessage even more.

It is not really overloading a response any more. This approach now sees event subscriptions and property observations as long running requests, where the response is a stream instead of a single value. I think it fits quite logically once you get to see it from this perspective.

Having a single output member of a ResponseMessage which can have a completely different format and meaning depending on the operation doesn't seem like a good design to me.

Looks like this is the main bastion of resistance. Hahaha :) I don't see it as a problem because the "real output" is the type described in the property dataschema and action outputschema. So in strawman the output type of a message also varies based on that. The only difference is that the field name used to carry the output changes based on the messagetype, which you now have to map differently for each request. This proposal just says that you can always find the output in the 'output' field. Looks like we have to agree to disagree. I can't think of a logical argument that warrants defining all these different request and response message types.

I'm still not quite convinced this is a better design than just having separate message types, but I'm open to other opinions.

Yeah I'd love to hear that as well. As usual, I appreciate your feedback Ben.

@hspaay
Collaborator Author

hspaay commented Feb 8, 2025

One thought that I haven't highlighted enough is that a proposal such as strawman works well if you build a Thing or consumer that only supports this protocol. When there is a need to support multiple transport protocols the story changes as you need to find common ground between messages from each protocol.

This proposal is that common ground. A protocol can pass the request/response messages as-is or it can map to equivalent messages of another protocol based on the request operation. This is what hiveot does. The http-basic, strawman websocket, and hiveot websocket transport bindings all provide the API to send RequestMessage and receive ResponseMessage. The hub server, Things, and all consumers only need to implement support for this format.

So rather than embed a protocol dependency in the application, the consumer can implement this application level message format and select a protocol binding as desired, even at runtime, depending on the protocol provided by discovery.
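A small Go sketch of this idea, under stated assumptions: the `Transport` interface, the `loopback` stand-in binding and the `selectTransport` helper are hypothetical names for illustration and are not the hiveot API. For brevity the interface returns a single response per request, which leaves out the multiple-response case discussed earlier in this thread.

```go
package main

import "fmt"

// RequestMessage and ResponseMessage are reduced versions of the envelopes
// discussed in this thread.
type RequestMessage struct {
	Operation     string
	ThingID       string
	Name          string
	Input         any
	CorrelationID string
}

type ResponseMessage struct {
	Operation     string
	CorrelationID string
	Status        string
	Output        any
}

// Transport is a hypothetical abstraction of a protocol binding. A binding
// either passes the envelope as-is or maps it to its own messages.
type Transport interface {
	SendRequest(req RequestMessage) (ResponseMessage, error)
}

// loopback is a stand-in binding that completes every request immediately
// and reports which scheme handled it.
type loopback struct{ scheme string }

func (l loopback) SendRequest(req RequestMessage) (ResponseMessage, error) {
	return ResponseMessage{
		Operation:     req.Operation,
		CorrelationID: req.CorrelationID,
		Status:        "completed",
		Output:        l.scheme,
	}, nil
}

// selectTransport picks a binding at runtime, e.g. from the URL scheme
// found through discovery.
func selectTransport(scheme string, bindings map[string]Transport) (Transport, error) {
	t, ok := bindings[scheme]
	if !ok {
		return nil, fmt.Errorf("no protocol binding for scheme %q", scheme)
	}
	return t, nil
}

func main() {
	bindings := map[string]Transport{
		"https": loopback{scheme: "https"},
		"wss":   loopback{scheme: "wss"},
	}
	t, err := selectTransport("wss", bindings)
	if err != nil {
		panic(err)
	}
	resp, _ := t.SendRequest(RequestMessage{Operation: "readproperty", Name: "level", CorrelationID: "1"})
	fmt.Println(resp.Status, resp.Output)
}
```

The application code above the interface never names a protocol, which is the separation the proposal argues for.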

If every single protocol would adopt the strawman message types I'd be happy as a pig in mud too ;)
