Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethernet packet handling calls _unpack_data twice when EtherType > 1500 #671

Open
philipaxer opened this issue Dec 5, 2024 · 5 comments

Comments

@philipaxer
Copy link

Describe the bug
Calling Ethernet(buf) will always internally double call the _unpack_data() chain.
The first call comes from dpkt.Packet.__init__ --> self.unpack --> _unpack_data
The second call then comes later from within dpkt.Ethernet.__init__:

   # if data was given in kwargs, try to unpack it
        if self.data:
            .....

At this point self.data is always set, since dpkt.Packet.__init__ sets it to be an empty byte array

To Reproduce
Add a print into Ethernet's _unpack_data function and call e = Ethernet(buf) with some Ethernet packet. You see it printed twice.

Expected behavior
The decode chain should only be executed once.

Fix
replace if self.data with if kwargs.get('data', None): in ethernet.py

If we agree that this is an issue, I will create a PR

@obormot
Copy link
Collaborator

obormot commented Dec 9, 2024

To Reproduce
Add a print into Ethernet's _unpack_data function and call e = Ethernet(buf) with some Ethernet packet. You see it printed twice.

So far I wasn't able to reproduce this. The print is called once every time. Running cPython 3.10.15 on OSX

In [11]: e = Ethernet(b'foobarbazaaaaa')
unpack data

In [12]: e = Ethernet(data=b'zzzz')
unpack data

In [13]: e = Ethernet(b'foobarbazaaaaa', data=b'zzzz')
unpack data

philipaxer added a commit to philipaxer/dpkt that referenced this issue Dec 10, 2024
@philipaxer
Copy link
Author

Using this annotated code: https://github.com/philipaxer/dpkt/tree/bug/multiple_call_to_unpack

with the following test:

userdata = """08 ff 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C
1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A 2B 2C
2D 2E 2F 30 31 32 33 34 00 01""".strip().replace("\n", "").replace(" ", "")
userdata = bytes(bytearray.fromhex(userdata))

e = Ethernet(userdata)

I get:

unpack
  -> unpack and type > 1500
_unpack_data
__init__
  -> isstr or is bytes
_unpack_data

By the way, it makes a difference if i pass a bytes or bytearray object as argv[0].

@obormot
Copy link
Collaborator

obormot commented Dec 10, 2024

Ok so type > 1500 condition was critical to reproduce it.
This makes the bug description of "Calling Ethernet(buf) will always internally double call the _unpack_data() chain." not accurate.
I agree that calling _unpack_data explicitly from __init__ looks like a bad idea. However the suggested fix (replace if self.data with if kwargs.get('data', None): in ethernet.py) will just mask the issue, since the userdata is not given in kwargs, but in args. And it will still leave the call to _unpack_data within __init__.
Is the given userdata a real-world example? The ethernet parser is pretty intricate and I'd be hesitant to introduce potentially breaking changes unless they are based on real-world evidence.

@philipaxer philipaxer changed the title Ethernet packet handling calls _unpack_data twice. Ethernet packet handling calls _unpack_data twice when EtherType > 1500 Dec 10, 2024
@philipaxer
Copy link
Author

philipaxer commented Dec 10, 2024

Ok so type > 1500 condition was critical to reproduce it.

This is THE common case. Virtually all protocols are of this nature.

unless they are based on real-world evidence.

What evidence do you want? Wireshark a trace on your machine, open the PCAP with dpkt, parse the frames and you will run into this issue. However, this should be mostly a performance issue.

I agree that calling _unpack_data explicitly from init looks like a bad idea. However the suggested fix (replace if self.data with if kwargs.get('data', None): in ethernet.py) will just mask the issue, since the userdata is not given in kwargs, but in args.

I think you are mixing two concepts. data and userdata and how they tie into the API. Also, I disagree to the 'mask the issue' statement. The code block has its justification, from what i can see.
If you call Ethernet(data) it will treat data as the entire Ethernet frame (with DA, SA, ...). If you call Ethernet(data=data) it will treat data as userdata (aka payload).

Looking at dpkt.Packet.__init__ there are two distinct paths.:

  1. Ethernet(data) - calling the thing with at least one positional argument (usecase: raw decode)
  2. Ethernet(data=data) - calling the thing with a keyword argument (usecase: programatic frame assembly)
    In case 2, the actual 'parsing' of userdata is left to the user and not kicked-off by dpkt.Packet. This is why (I think) there is this werid if self.data statement.

However by the way this was put together it accidentally causes double-calls in the Ethernet(data) case.

The ethernet parser is pretty intricate and I'd be hesitant to introduce potentially breaking changes unless they are based on real-world evidence.

I dont see an API documenation, except for a handful of examples. Thus, I cannot conclusivly judge if things break or not. With this, I leave it to you to decide the fate of this issue :)

Unrelated to this, I dislike the inflexible handling of MPLS, VLANs and thus MACsec (see my PR). The tags are hardwired in the Ethernet parser and don't allow me to craft deliberatly broken tagging structures (broken double tagging for example or VLAN-in-clear style frames). Also, I would like to use broken tags to generate test vectors, but cannot do that right now. Thus, I am considering to propose an alternative Ethernet parser that implements this differently and allows more freedom. Would you/the team be open to accept an PR for an EthernetNG implemenation that has a different/breaking tags API? If not, i will probably still implement this as my private fork to float my boat.

@obormot
Copy link
Collaborator

obormot commented Dec 10, 2024

What evidence do you want? Wireshark a trace on your machine, open the PCAP with dpkt, parse the frames and you will run into this issue.

Most unit tests in dpkt are based on real-world examples taken from PCAPs. Take the 1st unit test in ethernet.py for example (test_eth). It's Ethernet II, THE most common, the data buffer is provided via args, and yet _unpack_data is not called twice.
I'm asking for a real-world example of packet data that demonstrates there's a bug in dpkt.

Unrelated to this, I dislike the inflexible handling of MPLS, VLANs and thus MACsec (see my PR).

Off topic here. Feel free to open a separate issue / PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants