Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agi.verbose("string") bug #19

Open
Scinawa opened this issue Nov 18, 2015 · 8 comments
Open

agi.verbose("string") bug #19

Scinawa opened this issue Nov 18, 2015 · 8 comments

Comments

@Scinawa
Copy link
Contributor

Scinawa commented Nov 18, 2015

File "/opt/agi/unknow", line 13, in
agi.verbose("python agi started")
File "/usr/local/lib/python3.4/dist-packages/pyst2-0.4.7-py3.4.egg/asterisk/agi.py", line 606, in verbose
self.execute('VERBOSE', self._quote(message), level)
File "/usr/local/lib/python3.4/dist-packages/pyst2-0.4.7-py3.4.egg/asterisk/agi.py", line 121, in _quote
return ''.join(['"', string.encode('ascii', 'ignore'), '"'])
TypeError: sequence item 1: expected str instance, bytes found

@tuxpowered
Copy link
Contributor

What is the variable (data) you are passing into it?
You need to pass a string to the function. The _quote function will convert int and float to strings automatically, any other objects passed must be of a type that can process string.encode() meaning they need to be a string.

Is it possible you are passing an object reference ?

@eill
Copy link

eill commented Dec 9, 2015

you are trying to join together string and byte objects. Oops, you've just dropped the python3 support.

In [6]: ''.join(['"', 'teststr'.encode('ascii', 'ignore'), '"'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9e7ac0978072> in <module>()
----> 1 ''.join(['"', 'teststr'.encode('ascii', 'ignore'), '"'])

TypeError: sequence item 1: expected str instance, bytes found

did you ever test your own patches?

@tuxpowered
Copy link
Contributor

Eill, Asterisk AGI works by standard in/out. And this stdin/out is expected to be plain text. The previous code converted everything passed to it to text str(). This had the side effect of if you passed a string that contained string encodings, as shown below, that the data would not be translated

r = '\xc2\xae' # The ® trademark symbol.

Because python can not determine the actual encoding used on the string, this can not be converted. Furthermore it can cause your asterisk application to crash.
The code change assures that only valid 'ascii' text is sent back to asterisk to be processed, thus preventing dialplan errors and crashes.

Enter Python 3.
Unfortunately strings in python 3 are not really strings. they are byte strings, in Python 2 we have str and unicode, and unicode is not the same as byte strings. There is many more articles about the differences documented elsewhere on the web, which is better at explaining them.

The issue comes from python3 when it performs the encoding it returns a byte string, and in python 2 it returns an ascii string.

Finally, if you care to look at the first line of the agi.py file. this module is designed to work with python2 not python3.

I am happy to review alternative methods, if you present them. BUT do keep in mind that things should remain compatible with python2.

As to your question "did you ever test your own patches?", yes, obviously I have. Under the conditions that the script is designed to work, using the correct python interpreter, this functions correctly.
Did you ever run them under the supported interpreter?

Is PyST2 compatible with Python3? No. Not fully. It's getting there.

Can you submit a test condition and code to be reviewed? Absolutely, and I would be happy to run tests against it. Keep in mind the code should remain backward compatible with python2 installations.

@eill
Copy link

eill commented Dec 10, 2015

this module is designed to work with python2 not python3

that's wrong, I've added python3 support a year ago (and the module works with python3 fine). Yes, I forgot to change the shebang, but it does not matter here (you almost never directly execute the agi.py file but import it).

for all, could you tell me what's the purpose of function _quote? Let's see the code (I've omitted the isinstance checks for clarity):

Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> string='®'
>>> def quote(string):
...     return ''.join(['"', string.encode('ascii', 'ignore'), '"'])
... 
>>> quote(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in quote
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

if you want to dispose of exceptions (according to your commit cef87f4 comment) you've done it wrong, because exceptions are still here.

so code needs to be rewritten here.

@eill
Copy link

eill commented Dec 10, 2015

ok, I've read again your comment and I see that you just want not to pass the unicode strings to asterisk. I wish I finally understood your logic and I'll try to rewrite this code as soon as possible.

@tuxpowered
Copy link
Contributor

The purpose of _quote() is to prevent illegal characters are passed back over AGI.

Python 2.7.9 (v2.7.9:648dcafa7e5f, Dec 10 2014, 10:10:46)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> string='®'
>>> escaped_string = string.decode('utf-8')
>>> def _quote(string):
...     return ''.join(['"', string.encode('ascii', 'ignore'), '"'])
...
>>> _quote(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in _quote
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

string here can not be encoded to 'ascii' because the string is encoded. You would need to decode this first.

>>> _quote(escaped_string)
'""'

Passing the decoded string to _quote functions correctly.

>>> string
'\xc2\xae'

Python 2 automatically provides the output of string as the escaped characters. While Python3 does not.

>>> str(string)
'\xc2\xae'
>>>

The old code would have taken executed the above returning escaped characters under Python2 and a utf-8 (or whatever encoded character) under Python3. There is also no .decode() method on the string in python3 which adds to the problem.

Under python2 converting of values such as '®' do not have an ascii equivalent and therefore are ignored. It would be considered reasonable to expect that a user would not want '\xc2\xae' to be the data returned to asterisk.

The problem becomes more interesting when add things like JSON. Take for example the following under python3

>>> import json
>>> string
'®'
>>> j = { 'char': string}
>>> j
{'char': '®'}
>>> json.dumps(j)
'{"char": "\\u00ae"}'

and under python2

>>> string='®'
>>> import json
>>> j = {'char': string}
>>> j
{'char': '\xc2\xae'}
>>> json.dumps(j)
'{"char": "\\u00ae"}'

No shock here as python transforms each as expected. But note what happens when it is stored as JSON. We get yet another encoding. \u00ae

So lets assume for example that your AGI application has made a call to some web service, or maybe a DB that is returning a JSON payload, something that is a rather common task. If the data contains our sample string, it would have been converted, to the cryptic \u00ae value.

The new code change will ignore this char.

>>> json.loads(json.dumps(j))['char'].encode('ascii', 'ignore')
''

While the older code, converting all values to strings would have crashed.

>>> str(json.loads(json.dumps(j))['char'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 0: ordinal not in range(128)

This is what the code change is intended to prevent.

Moving Forward
We can not just convert everything to str() this would error as seen above on python2. while python3 understands it. So we need a better solution, I agree. We want to support both.

Admittedly I did not do a lot of testing on python3, as python2 is still the default on distributions. (I am at least not aware of a python3 as default on any distro).

I think the only way to make this work is to do some additional condition checking.
Perhaps we attempt to convert the value first and capture the exception error, and then try to convert to str() in Python3. In that way it can support python2/3 as it appears with python3 if you force it to str() you can pass .encode() to it.

>>> str(json.loads(json.dumps(j))['char'])
'®'
>>> str(json.loads(json.dumps(j))['char']).encode('ascii', 'ignore')
b''

Okay I hope that gave enough examples and use case to help paint a better picture.
I will likely do some testing this weekend on this as well and welcome your feedback.

@ghost ghost mentioned this issue Nov 2, 2016
@ghost
Copy link

ghost commented Nov 2, 2016

Just wondering if there were any more thoughts on this? Given that Python 3 is almost a decade old now... also this is the subject of several issues and pull requests, eg:

5f37f3e
#28
#30

In my case, I just ended up doing this:

def _quote(self, string):
    joined = '"' + string + '"'
    return joined`

So far, it SEEMS to be OK. What might bite me here?

In fact, I even tried a little test, pushing some weird characters back from pyst2 to Asterisk.
Output looks OK...

<PJSIP/6001-00000009>AGI Tx >> 200 result=1
<PJSIP/6001-00000009>AGI Rx << VERBOSE "ʞsᴉɹǝʇsɐ uᴉ ǝuᴉɟ sʞɹoʍ puɐ uʍop ǝpᴉsdn sᴉ sᴉɥʇ" 1
 chartest.py: ʞsᴉɹǝʇsɐ uᴉ ǝuᴉɟ sʞɹoʍ puɐ uʍop ǝpᴉsdn sᴉ sᴉɥʇ
<PJSIP/6001-00000009>AGI Tx >> 200 result=1
    -- <PJSIP/6001-00000009>AGI Script chartest.py completed, returning 0

@kevin-olbrich
Copy link

This indeed works but better make sure the other types are converted:

    def _quote(self, string):
        """ provides double quotes to string, converts int/bool to string """
        if isinstance(string, int):
          string = str(string)
        if isinstance(string, float):
          string = str(string)
        joined = '"' + string + '"'
        return joined

Or better !isinstance(string, str) which will fail in cases, where casting will not make sense.

@ghost ghost mentioned this issue Jun 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants