Well-known Media Types content parameters #161

Closed
wants to merge 1 commit into from


relu91
Member

relu91 commented Jun 8, 2022

In implementing the MODBUS protocol binding for node-wot, we found the need to define a Media Type parameter for indicating the endianness of a payload encoded with application/octet-stream. Since the parameter might also be used in other protocols, this PR updates the main document with a section dedicated to a set of well-known parameters used in WoT.
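A minimal sketch of the underlying problem, written in JavaScript (Node.js) since node-wot is the implementation in question. The payload bytes are made up. The same four bytes of an application/octet-stream payload decode to different numbers depending on the byte order the consumer assumes:

```javascript
// Sketch: one application/octet-stream payload, two possible readings.
// The payload bytes are illustrative, not taken from a real device.

/**
 * Decode an unsigned 32-bit integer from the start of a byte array.
 * @param {Uint8Array} bytes - raw payload bytes
 * @param {boolean} littleEndian - true for little-endian, false for big-endian
 * @returns {number}
 */
function decodeUint32(bytes, littleEndian) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(0, littleEndian);
}

const payload = Uint8Array.from([0x00, 0x00, 0x01, 0x02]);

console.log(decodeUint32(payload, false)); // big-endian: 258
console.log(decodeUint32(payload, true));  // little-endian: 33619968
```

Nothing in application/octet-stream itself tells the consumer which of the two readings is intended, which is the gap the proposed parameter is meant to fill.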

Consider it a first draft of the section; I would like to discuss our options for describing these parameters.



@netlify

netlify bot commented Jun 8, 2022

Deploy Preview for wot-binding-templates ready!

🔨 Latest commit: c610eba
🔍 Latest deploy log: https://app.netlify.com/sites/wot-binding-templates/deploys/62a06ea46fcbb80008fb6a34
😎 Deploy Preview: https://deploy-preview-161--wot-binding-templates.netlify.app/

@egekorkan
Contributor

I think that the byteSeq contribution is fine, but the other introductory content should actually go to the Architecture spec. There is already a section there with almost the same content: https://w3c.github.io/wot-architecture/#media-types

@relu91
Member Author

relu91 commented Jun 9, 2022

> I think that the byteSeq contribution is fine but the other introductory content should actually go to Arch spec. There is already the following section there that almost has the same https://w3c.github.io/wot-architecture/#media-types

Should I solve this issue here, or can we refactor it later, perhaps?

@egekorkan
Contributor

> I think that the byteSeq contribution is fine but the other introductory content should actually go to Arch spec. There is already the following section there that almost has the same https://w3c.github.io/wot-architecture/#media-types

> Should I solve this issue here, or can we refactor it later, perhaps?

It would be better not to delay it, but I would be open to commenting it out and putting a TODO label in the HTML.

@relu91
Member Author

relu91 commented Jun 27, 2022

TODO: also add an example in MODBUS.

@relu91
Member Author

relu91 commented Apr 12, 2023

I rebased this PR; adding this information to the main document might still be controversial. Furthermore, I noticed that in the content-type section we refer to types with the + sign as "parametrized", but I can't find this nomenclature in RFC 6838. RFC 6838 defines parameters as the list of name-value pairs after the ;, so I think our wording might confuse readers.
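To illustrate the distinction, here is a simplified sketch. The helper parseMediaType is hypothetical (not from any library) and skips quoted values and other edge cases. It shows that a + in the subtype is a structured syntax suffix, while parameters are the ;name=value pairs that follow:

```javascript
// Hypothetical helper for illustration only: a simplified media type parser
// that separates the structured syntax suffix ("+json") from the parameters
// (";name=value"). Quoted-string values and other edge cases are ignored.

function parseMediaType(input) {
  const [essence, ...paramParts] = input.split(";");
  const [type, subtype] = essence.trim().toLowerCase().split("/");
  if (!type || !subtype) throw new Error(`invalid media type: ${input}`);

  // A "+" in the subtype marks a structured syntax suffix (RFC 6838, Section 4.2.8),
  // not a parameter.
  const plus = subtype.lastIndexOf("+");
  const suffix = plus >= 0 ? subtype.slice(plus + 1) : null;

  // Parameters are the name=value pairs after ";". Parameter names are
  // case-insensitive per the RFC; this sketch keeps them as written.
  const parameters = {};
  for (const part of paramParts) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // skip malformed entries
    parameters[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return { type, subtype, suffix, parameters };
}

console.log(parseMediaType("application/td+json"));                         // suffix "json", no parameters
console.log(parseMediaType("application/octet-stream;byteSeq=BIG_ENDIAN")); // no suffix, one parameter
```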

@egekorkan
Contributor

  1. The use of ; is already specified in the paragraph above. I would remove that in this PR but add RFC references to the paragraph above.
  2. I think we need to discuss whether this is applicable to multiple protocols or only to Modbus. If we remove the restrictions on what a protocol binding can specify, we might be better off (see Removing Categories Discussion #281).
  3. The use of these parameters needs to be reflected in the Modbus binding document.

{
    "href": "modbus+tcp://127.0.0.1:60000/1",
    "modbus:function": "readCoil",
    "op": "readproperty",
    "contentType": "application/octet-stream;byteSeq=LITTLE_ENDIAN"
}
Member


According to the IANA registration of application/octet-stream, there are two possible parameters that can be used: TYPE and PADDING. There is no parameter byteSeq defined, and IMHO this document is not the right place to update the definition of this media type.

If the data returned when reading this property actually has semantics that are more specific than that of an octet stream, I think it's a sign that you need a media type that is more specific than application/octet-stream.

Member Author


Yes, I agree. As discussed during the call, we should probably find other means to support that use case.

@egekorkan
Contributor

Call of 12.04:

  • @lu-zero : If we add non-standard parameters, some parsers will fail. We should use something like application/x-modbus. You are not sending octet streams but integers. We can also register a new media type like application/modbus and then define parameters there.
  • @relu91 : RFC 6838 does not restrict unknown params, and parsers should ignore them.
  • @relu91 : In the case of Modbus, the payloads really are bytes, so it is weird to create a new contentType.
  • Both Cris and Luca: It would be weird to add new parameters that are not in the IANA registration.
  • @lu-zero : We can use another vocabulary term for the endianness.
  • @ashimura : Binary streams are a big topic; we need to evaluate them in general and collect information (use cases, experience).

Question to @relu91 : Does the contentType matter for Modbus? Can you choose other media types?

@sebastiankb
Contributor

sebastiankb commented Jul 18, 2023

I do not see any conflict in using byteSeq and/or length as parameters with application/octet-stream. It is true that the IANA entry from 1996 does not list these parameters, but it does not say that no other parameters are allowed (which would also comply with RFC 6838).

I played around with

const contentTypeParser = require("content-type-parser");
const contentType = contentTypeParser(`application/octet-stream;byteSeq=BIG_ENDIAN;length=4`);
console.log(contentType.toString());
console.log(contentType.get("byteSeq"));
console.log(contentType.get("length"));

and the content type parser passes the parameters through. In the context of the Web of Things (as defined by the bindings), the parameters can be interpreted correctly and forwarded to the right place (e.g., the protocol driver).
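A sketch of how a binding could consume such parameters, assuming the proposed, non-registered byteSeq and length parameters with the BIG_ENDIAN/LITTLE_ENDIAN values used in this thread. The names decodeOctetStream and readParams are hypothetical, and defaulting to big-endian when byteSeq is absent is an assumption. The point is that a decoder seeing only the contentType string has everything it needs:

```javascript
// Sketch only: byteSeq and length are proposed parameters, not part of the
// IANA registration of application/octet-stream. Unknown parameters are ignored.

function readParams(contentType) {
  const params = {};
  for (const part of contentType.split(";").slice(1)) {
    const [name, value] = part.split("=").map((s) => s.trim());
    if (name && value !== undefined) params[name] = value;
  }
  return params;
}

function decodeOctetStream(bytes, contentType) {
  const params = readParams(contentType);
  // Assumption: big-endian unless byteSeq says otherwise.
  const littleEndian = params.byteSeq === "LITTLE_ENDIAN";
  const length = params.length ? Number(params.length) : bytes.length;
  const view = new DataView(bytes.buffer, bytes.byteOffset, length);
  switch (length) {
    case 1: return view.getUint8(0);
    case 2: return view.getUint16(0, littleEndian);
    case 4: return view.getUint32(0, littleEndian);
    default: throw new Error(`unsupported length: ${length}`);
  }
}

const bytes = Uint8Array.from([0x12, 0x34, 0x56, 0x78]);
decodeOctetStream(bytes, "application/octet-stream;byteSeq=BIG_ENDIAN;length=4");    // 0x12345678
decodeOctetStream(bytes, "application/octet-stream;byteSeq=LITTLE_ENDIAN;length=4"); // 0x78563412
```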

@sebastiankb
Contributor

sebastiankb commented Jul 19, 2023

OK, my suggestion does not seem to be well received.

What alternatives have been discussed here?

  1. Introduce a new media type (e.g., application/x-modbus)
  2. Extend application/octet-stream with new parameters
  3. Introduce vocabulary terms such as modbus:endian and modbus:byteLength in Modbus binding

Option 1 would need a REC, similar to what we did for the Thing Description and Thing Model media type registrations. Bindings, however, are defined as a W3C Note. Alternatively, we could create a new RFC, which also involves a certain amount of effort.
Option 2 seems unrealistic, since we would have to touch RFC 2046.

In my opinion, option 3 is the only realistic and simple option then.
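For comparison, a sketch of option 3, assuming hypothetical modbus:endian and modbus:byteLength terms (the names come from the list above; the "little" value and the big-endian default are invented here). The visible difference from a content-type-only codec is the signature: the decoder now needs the whole form, not just the contentType.

```javascript
// Sketch only: modbus:endian and modbus:byteLength are proposed, not defined
// vocabulary terms; their values here are invented for illustration.

const form = {
  href: "modbus+tcp://127.0.0.1:60000/1",
  op: "readproperty",
  contentType: "application/octet-stream",
  "modbus:endian": "little", // hypothetical term and value
  "modbus:byteLength": 2,    // hypothetical term
};

// The decoder reads its configuration from protocol-specific form terms.
function decodeWithForm(bytes, form) {
  const littleEndian = form["modbus:endian"] === "little"; // default: big-endian
  const length = form["modbus:byteLength"] ?? bytes.length;
  const view = new DataView(bytes.buffer, bytes.byteOffset, length);
  if (length === 2) return view.getUint16(0, littleEndian);
  if (length === 4) return view.getUint32(0, littleEndian);
  throw new Error(`unsupported byteLength: ${length}`);
}

decodeWithForm(Uint8Array.from([0x01, 0x02]), form); // 0x0201 = 513
```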

What other thoughts are there?

@relu91
Member Author

relu91 commented Jul 19, 2023

Some points on the discussion. I believe that in the last call there was a "misconception" that we are talking about streams of data, but the RFC simply states that octet-stream "is used to indicate that a body contains arbitrary binary data". There is no mention of any streaming nature; it is rather a plain blob of data, as the RFC further explains when describing how a client may interpret it.

Another unclear point is that we were supposedly trying to describe a Modbus-specific payload. We are not. In practice, some manufacturers even transfer application/json data over Modbus (a crazy thing to do, but possible). Funnily enough, we have the same problem with MQTT: since you can put whatever you want in an MQTT payload, there might be cases where developers decided to encode a single raw integer directly in it.
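A sketch of why the transport alone cannot settle this: the same value, 258, carried in a payload once as JSON text and once as a raw big-endian 16-bit integer. The value and encodings are chosen for illustration only:

```javascript
// Sketch: one value, two payload encodings. Only the content type tells the
// consumer which interpretation applies; MQTT or Modbus framing does not.

const asJson = Buffer.from(JSON.stringify(258), "utf8"); // text "258"
const asRawUint16 = Buffer.alloc(2);
asRawUint16.writeUInt16BE(258, 0);                       // raw big-endian integer

console.log([...asJson]);      // [ 50, 53, 56 ]
console.log([...asRawUint16]); // [ 1, 2 ]

// Each payload needs a different decoding process:
console.log(JSON.parse(asJson.toString("utf8"))); // 258
console.log(asRawUint16.readUInt16BE(0));         // 258
```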

So what we probably really need is a content type capable of indicating that the payload contains a number (e.g., application/number). I remember @lu-zero pointing to some examples, but those were audio-specific and I'm not sure they could be applied in other contexts.

About the options that @sebastiankb is suggesting above:

  1. As mentioned above, this would help, but only if it is not Modbus-specific. With the new charter and the new Protocol Binding Templates, we could have some room to create a REC for it.
  2. I still believe this is the fastest option and, as stated above, it remains syntactically correct. However, I agree that it kind of leaks information about the payload type and is therefore not really correct with respect to the RFC.
  3. We can do that, but it requires extra effort in parsers and validators, which would then need not only the content type but also some extra protocol-specific information to parse the payload correctly. Plus, this would only work for Modbus.

@lu-zero
Contributor

lu-zero commented Jul 19, 2023

For CoAP we are providing similar information as vocabulary terms.

In any case, the consumer has to be aware of the details of the protocol it uses, so it all depends on what is used for content_type consumption/validation and for protocol parsing. If everything is mapped to hashmaps, I'd argue that the amount of effort is the same: you can set up your deserializer only once you are done parsing the form and the affordance anyway.

@egekorkan
Contributor

> We can do that but we need extra effort in parsers and validators that should now know that they don't only need the content-type to parse correctly the payload but also some extra protocol-specific information. Plus, this would only work for Modbus.

Just for further clarification: in implementations like node-wot, the whole form is not passed on to the encoder/decoder, only the contentType. To accommodate Modbus, we would need to change this whole mechanism and move the encoder/decoder into each protocol binding separately (hopefully not, but that was the first solution that came to mind).

@lu-zero
Contributor

lu-zero commented Jul 20, 2023

I guess another potentially interesting aspect is distinguishing what is transport configuration from what is payload encoding.

In the case of modbus-tcp, there is already quite a rich set of vocabulary terms to express what you want to map to, so I'm not sure why endianness and framing should stay in the content type instead of being two more terms.

@relu91
Member Author

relu91 commented Jul 20, 2023

> In the case of modbus-tcp there is already a quite rich set of vocabulary terms to express what you want to map to, so I'm not sure why endianness and framing should stay in the content type instead of being two more terms.

Because, as I stated above, it is not Modbus-specific 😺.

> Just for further clarification, in implementations like node-wot, the whole form is not passed on to the encoder/decoder but only the contentType. To accommodate Modbus, we would need to change this whole mechanism and move encoder/decoder to each protocol separately (hopefully not but that was the first solution came to mind)

Thank you for clarifying that. I'm so used to the node-wot architecture that I sometimes conflate it with the abstract WoT architecture. However, I would say that this kind of processing can be seen as a generic way of handling communication between two endpoints. You have two levels: protocol message decoding and application payload. Keeping the two levels separate helps to share the application payload parsing logic across different bindings.
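A sketch of the two levels, assuming a Modbus TCP response to Read Holding Registers (function code 0x03). The function names are hypothetical and the frame is a made-up example. The first function belongs to the protocol binding; the second only sees payload bytes and could in principle be shared with other bindings:

```javascript
// Level 1 (protocol binding): unwrap a Modbus TCP response frame.
// MBAP header: transaction id (2 bytes), protocol id (2), length (2), unit id (1),
// followed by the PDU: function code (1), byte count (1), register data.
function extractModbusPayload(frame) {
  if (frame.readUInt16BE(2) !== 0) throw new Error("protocol id is not 0 (Modbus)");
  const functionCode = frame.readUInt8(7);
  if (functionCode & 0x80) {
    // Exception responses set the high bit and carry an exception code instead of data
    throw new Error(`Modbus exception code ${frame.readUInt8(8)}`);
  }
  const byteCount = frame.readUInt8(8);
  return frame.subarray(9, 9 + byteCount);
}

// Level 2 (application payload): knows nothing about Modbus framing; it only
// turns bytes into big-endian unsigned 16-bit values.
function decodeUint16ArrayBE(bytes) {
  const values = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) values.push(bytes.readUInt16BE(i));
  return values;
}

// Made-up response: transaction 1, protocol 0, length 7, unit 1,
// function 0x03, byte count 4, registers 0x000A and 0x0102.
const frame = Buffer.from([0x00, 0x01, 0x00, 0x00, 0x00, 0x07, 0x01, 0x03, 0x04, 0x00, 0x0a, 0x01, 0x02]);
console.log(decodeUint16ArrayBE(extractModbusPayload(frame))); // [ 10, 258 ]
```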

> I guess another part that is potentially interesting is what is transport configuration and what is payload encoding.

This is exactly what is happening here. Again, the proper solution is not Modbus-specific but rather a generic content type that can describe the encoding and decoding of a single number or an array of numbers.

I want to add another point from the octet-stream RFC (RFC 2046):

> The recommended action for an implementation that receives an
> "application/octet-stream" entity is to simply offer to put the data
> in a file, with any Content-Transfer-Encoding undone, or perhaps to
> use it as input to a user-specified process.

Basically, we are allowed to define our own user-specified process to handle this unstructured payload. Again, since we are dealing with IoT protocols, I wouldn't be surprised if, in the near future, we stumbled upon similar use cases in other protocols.

@lu-zero
Contributor

lu-zero commented Jul 20, 2023

Let me recap today's discussion with @relu91:

> Remark : the different fields are encoded in Big-endian.

> Data Encoding
> MODBUS uses a ‘big-Endian’ representation for addresses and data items. This means that when a numerical quantity larger than a single byte is transmitted, the most significant byte is sent first.

And that is less ambiguous.

  • At least the implementation I found returns Vec<u16> and does so by reading the integers as big-endian.
  • I did not read the Python implementation, but it probably works the same way.

If we want to support non-standard implementations that send data as little-endian or as raw bytes, we'd need both a content type (e.g., text/plain) AND a vocabulary term or two specifying the dialect, so that the bytes received by the content-type decoder are in the expected order.
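A sketch of that split, assuming a hypothetical modbus:byteSwap term (not defined anywhere). The binding first restores the standard Modbus big-endian register order, so the content-type decoder always receives bytes in the order it expects:

```javascript
// Sketch only: "modbus:byteSwap" is a hypothetical dialect term.

// Binding-level step: undo the device's non-standard byte order.
function normalizeRegisters(bytes, form) {
  const out = Buffer.from(bytes); // copy so the caller's buffer is not mutated
  if (form["modbus:byteSwap"] === true) out.swap16(); // swap the two bytes of each 16-bit register
  return out;
}

// Content-type decoder: always assumes standard big-endian registers.
function decodeRegistersBE(bytes) {
  const values = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) values.push(bytes.readUInt16BE(i));
  return values;
}

const fromNonStandardDevice = Buffer.from([0x0a, 0x00, 0x02, 0x01]); // registers sent little-endian
console.log(decodeRegistersBE(normalizeRegisters(fromNonStandardDevice, { "modbus:byteSwap": true }))); // [ 10, 258 ]
```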

On top of this, this protocol is probably also a good candidate for the data mapping activities, since some implementations in the wild apparently do bit-stuffing, so you'd also need a way to extract the data you care about from the data being exchanged.
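A sketch of the extraction step that bit-stuffing requires. The field layout below (a running flag, an alarm flag, and a 4-bit mode packed into one 16-bit register) is invented for illustration, as is the helper name extractFields:

```javascript
// Sketch: several values packed into one 16-bit register.
// The layout is hypothetical; real devices document their own.

const layout = {
  running: { offset: 0, width: 1 }, // bit 0
  alarm:   { offset: 1, width: 1 }, // bit 1
  mode:    { offset: 4, width: 4 }, // bits 4..7
};

function extractFields(register, layout) {
  const result = {};
  for (const [name, { offset, width }] of Object.entries(layout)) {
    const mask = (1 << width) - 1; // e.g. width 4 -> 0b1111
    result[name] = (register >> offset) & mask;
  }
  return result;
}

console.log(extractFields(0x0031, layout)); // { running: 1, alarm: 0, mode: 3 }
```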

@egekorkan
Contributor

By the way, @fexpal pointed me to a Modbus driver in Go that also handles word/byte swapping. This means that we have four cases in total.

See https://github.com/simonvetter/modbus/blob/master/encoding.go#L49
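A sketch of those four cases for a 32-bit value spread over two 16-bit registers. The ABCD/CDAB/BADC/DCBA labels are a common informal convention, not something taken from the linked Go driver, and decodeUint32Registers is a hypothetical helper:

```javascript
// Sketch: reorder the four wire bytes back to ABCD (A = most significant byte),
// then read them as a big-endian unsigned 32-bit integer.

function decodeUint32Registers(wire, order) {
  const [w0, w1, w2, w3] = wire; // bytes in the order they were received
  const toABCD = {
    ABCD: [w0, w1, w2, w3], // standard Modbus: big-endian bytes, high word first
    CDAB: [w2, w3, w0, w1], // word swap: low register sent first
    BADC: [w1, w0, w3, w2], // byte swap inside each register
    DCBA: [w3, w2, w1, w0], // byte and word swap (fully little-endian)
  }[order];
  if (!toABCD) throw new Error(`unknown order: ${order}`);
  return Buffer.from(toABCD).readUInt32BE(0);
}

// The value 0x12345678 as sent by four differently configured devices:
console.log(decodeUint32Registers([0x12, 0x34, 0x56, 0x78], "ABCD").toString(16)); // 12345678
console.log(decodeUint32Registers([0x56, 0x78, 0x12, 0x34], "CDAB").toString(16)); // 12345678
console.log(decodeUint32Registers([0x34, 0x12, 0x78, 0x56], "BADC").toString(16)); // 12345678
console.log(decodeUint32Registers([0x78, 0x56, 0x34, 0x12], "DCBA").toString(16)); // 12345678
```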

@relu91
Member Author

relu91 commented Aug 3, 2023

To add more to the discussion: I started from text/plain and asked how different endiannesses are handled in such use cases. As you know, in text/plain the "encoding" is expressed through the charset parameter. Unsurprisingly, for charsets that use more than one byte per code unit, the endianness needs to be configured somehow. For this reason, UTF-16 has three variants: UTF-16, UTF-16BE, and UTF-16LE. This once again supports this PR's proposal of expressing the endianness in the content type. If we introduce yet another form property, we also have to handle situations like the following:

{
   // ... other fields
   "contentType": "text/plain;charset=UTF-16LE",
   "endianness": "BE" // <-- What does it mean? We should not allow BE here.
}
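A sketch of the point about charset: with text/plain the byte order is already carried by the charset parameter, so a separate endianness term could only repeat or contradict it. decodeUtf16 is a hypothetical helper; plain "UTF-16" (byte order from a BOM) is deliberately not handled here:

```javascript
// Sketch: the charset parameter alone decides the byte order.
// Node's Buffer supports "utf16le" but has no "utf16be" encoding,
// so big-endian input is byte-swapped first.

function decodeUtf16(bytes, charset) {
  const buf = Buffer.from(bytes); // copy so swap16() does not touch the input
  switch (charset.toUpperCase()) {
    case "UTF-16LE":
      return buf.toString("utf16le");
    case "UTF-16BE":
      return buf.swap16().toString("utf16le");
    default:
      throw new Error(`unsupported charset: ${charset}`);
  }
}

console.log(decodeUtf16([0x48, 0x00, 0x69, 0x00], "UTF-16LE")); // Hi
console.log(decodeUtf16([0x00, 0x48, 0x00, 0x69], "UTF-16BE")); // Hi
```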

So once again, I'm not really sure whether either of the two solutions is good.

@egekorkan
Contributor

TD Call of 22.11: we will close this PR without merging. We can revisit it later when we know more about how to handle this generically, which can include media (audio/video) streaming or byte streams in general. Modbus will have a Modbus-specific solution, at least for now.

egekorkan closed this Nov 22, 2023
5 participants