Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rlm_python3: attribute values that fail utf8 conversion will be returned as bytes, not str #5242

Open
wants to merge 7 commits into
base: v3.2.x
Choose a base branch
from

Conversation

aren
Copy link

@aren aren commented Dec 6, 2023

which will return a byte string if it a string can't be interpreted as UTF8.

Also improve exception handling so that no python exceptions remain after the do_python_single call. Otherwise the next request will immediately fail.

…rn a byte string if it a string can't be interpreted as UTF8.

Also improve exception handling so that no python exceptions remain after the do_python_single call. Otherwise the next request will immediately fail.
@arr2036
Copy link
Member

arr2036 commented Dec 25, 2023

Maybe just do the conversion by default? Don't see the need for a magic knob.

@aren
Copy link
Author

aren commented Dec 26, 2023

I'm fine with that. However, I added the knob because it's a potentially breaking change. Without the knob, suddenly, someone using the python3 module who is expecting every string in authorize() (for example) to be a str is suddenly getting bytes objects too where they didn't before.

@arr2036
Copy link
Member

arr2036 commented Dec 27, 2023

All RADIUS attributes should be UTF8, so it's unlikely they were getting binary data in string attributes anyway.

@aren
Copy link
Author

aren commented Dec 27, 2023

The problematic attribute is "MS-CHAP-Error". It frequently has binary data embedded in it when an MS-CHAP error is encountered.

@alandekok
Copy link
Member

https://datatracker.ietf.org/doc/html/rfc2548#section-2.1.5 says

      This field contains specially formatted ASCII text, which is
      interpreted by the authenticating peer.

So anything putting non-ASCII data in there is likely wrong.

For FreeRADIUS, the rlm_mschap module prints it as ASCII text. See function mschap_error()

@aren
Copy link
Author

aren commented Dec 27, 2023

I'll capture an example and post it in here; it's possible this has exposed another issue elsewhere.

@aren
Copy link
Author

aren commented Dec 29, 2023

Here's the attribute:

Error: Conversion to Unicode failed, returning MS-CHAP-Error as bytes: ?E=691 R=1 C=34991e4639c657a2c4870ec5fce94731 V=3 M=Authentication rejected

No obvious non-ascii chars. This is formatted with the vp_prints_value() function (with arguments vp_prints_value(buf, sizeof(buf), vp, '\0'))

vp_prints_value() uses fr_prints() which says it will Escape any non printable or non-UTF8 characters in the input string so I'm confused as to why Python is getting (or thinks it's getting) any non-UTF8 to begin with.

Is there an easy way to dump the attribute in hex with just the vp pointer? (if I had access to the request object it looks like I could call rad_print_hex on request->packet but it's not easily available in this context.

@alandekok
Copy link
Member

@aren The key is:

returning MS-CHAP-Error as bytes: ?E=691 R=1 C=34991e4639c657a2c4870ec5fce94731 V=3 M=Authentication rejected

The first ? is really a non-ASCII character. It's the one octet "ident" which is used to match requests and responses.

So the MS-CHAP-Error attributes is partly binary:

https://www.rfc-editor.org/rfc/rfc2548#section-2.1.5

  ...

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Vendor-Type  | Vendor-Length |     Ident     |    String...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

My take is to skip the flag, and automatically convert the attribute to bytes if it isn't UTF-8.

The issue is that while the RFCs say things like User-Name is UTF-8, there is no way to enforce that. So clients can send any random garbage in any attribute of type string. And the server just has to live with it.

Since the previous behavior was to fail conversion, I think it's safe to just convert the bad data to bytes.

The only issue is that when it's converted back from Python to VALUE_PAIR, it has to check the destination data type. An attribute of type string can have arbitrary binary data added to it, and the length must not depend on doing strlen(data). Instead, the string length is the length of the Python data, plus one for a trailing zero.

alandekok added a commit that referenced this pull request Jan 5, 2024
so that no python exceptions remain after the do_python_single call.
Otherwise the next request will immediately fail.

Patch from #5242, but separated out to keep commit history
a little clearer.
@vo-va
Copy link

vo-va commented Jan 10, 2024

Hi, I have something similar in the logs. Sometimes, I see this sometimes

Mon Sep 18 10:01:11 2023 : Error: python_error_log:209, Exception type: <class 'UnicodeDecodeError'>, Exception value: 'utf-8' codec can't decode byte 0xdd in position 1: invalid continuation byte
Mon Sep 18 10:01:11 2023 : Error: do_python_single:496, authorize - mod_populate_vps failed
Mon Sep 18 10:01:11 2023 : Error: do_python_single:675, authorize - RLM_MODULE_FAIL
Mon Sep 18 10:01:11 2023 : Error: mod_populate_vptuple:402, vp->da->name: User-Password

I think if I understand it correctly I can't do anything about it, i.e. users need to send the password or other field in correct encoding to not have such errors in the logs. I saw this issue, and just want to confirm my understanding of it. Am I right or I am missing something?

@alandekok
Copy link
Member

@vo-va If it's complaining about the User-Password containing invalid data, then the shared secret is wrong. When you run the server in debug mode it will give you a large error message saying this.

alandekok added a commit that referenced this pull request Feb 16, 2024
so that no python exceptions remain after the do_python_single call.
Otherwise the next request will immediately fail.

Patch from #5242, but separated out to keep commit history
a little clearer.
@aren
Copy link
Author

aren commented Feb 21, 2024

The only issue is that when it's converted back from Python to VALUE_PAIR, it has to check the destination data type. An attribute of type string can have arbitrary binary data added to it, and the length must not depend on doing strlen(data). Instead, the string length is the length of the Python data, plus one for a trailing zero.

@alandekok circling back to this PR so I can get it closed out. Does another module (perl?) do this, so I can use it as an example?

@alandekok
Copy link
Member

@aren You can just use the function fr_pair_value_bstrncpy(). It takes the parent VP, a pointer to data, and a length of data. It will produce the correctly formatted VP.

aren added 2 commits March 7, 2024 22:19
… fr_pair_value_bstrncpy and/or fr_pair_value_from_str. Also silence some warnings.
@aren
Copy link
Author

aren commented Mar 8, 2024

@alandekok fr_pair_value_bstrncpy() didn't work on its own, but when I mimicked what rlm_perl does (by using fr_pair_value_from_str() in the non-string case) it seems to work. Is this what you had in mind? Feedback welcome.

@alandekok
Copy link
Member

Why doesn't fr_pair_value_bstrncpy() work? if you pass it a string + length, it will copy all of the data into the pair.

@aren
Copy link
Author

aren commented Mar 8, 2024

fr_pair_value_bstrncpy() doesn't work with if vps type is something other than PW_TYPE_STRING.

For example, this assignment: config:Auth-Type=Accept which is of type integer in dictionary.freeradius.internal (and probably translates to PW_TYPE_INTEGER)

Should it handle this case? I can dive deeper if so. But it's a special case in rlm_perl so I assumed fr_pair_value_bstrncpy() could only handle PW_TYPE_STRING types.

@alandekok
Copy link
Member

The fr_pair_value_bstrncpy() function is only for assigning string values to a string type. It doesn't parse the a string into a data type like integer.

@aren
Copy link
Author

aren commented Mar 10, 2024

Ok great, that's what I figured. Then I think this PR does the right thing.

@aren aren changed the title Add rlm_python3 config parameter 'utf8_fail_as_bytes' … rlm_python3: attribute values that fail utf8 conversion will be returned as bytes, not str Sep 13, 2024
@aren
Copy link
Author

aren commented Sep 13, 2024

Anything I can do to help get this PR over the finish line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants