why checking blank line from stdin input #40

Open
hhkbp2 opened this issue Feb 10, 2014 · 11 comments

Comments

@hhkbp2

hhkbp2 commented Feb 10, 2014

Hi,

I use Petrel to run some gevent-patched Python code on top of Storm. The code runs for a while and then an error occurs. The log file looks like this:

[2014-02-10 12:58:00,745][storm][DEBUG]Message line #1 is blank. Pipe to Storm supervisor may be broken.
[2014-02-10 12:58:00,746][storm][DEBUG]Message line #2: 
[2014-02-10 12:58:00,746][storm][DEBUG]Message line #2 is blank. Pipe to Storm supervisor may be broken.
[2014-02-10 12:58:00,746][storm][DEBUG]Message line #3: 
[2014-02-10 12:58:00,746][storm][DEBUG]Message line #3 is blank. Pipe to Storm supervisor may be broken.
[2014-02-10 12:58:00,746][storm][DEBUG]Message line #4: 
[2014-02-10 12:58:00,747][storm][DEBUG]Message line #4 is blank. Pipe to Storm supervisor may be broken.

......{some more similar lines}

[2014-02-10 12:58:00,754][storm][DEBUG]Message line #19: 
[2014-02-10 12:58:00,754][storm][DEBUG]Message line #19 is blank. Pipe to Storm supervisor may be broken.
[2014-02-10 12:58:00,754][storm][DEBUG]Message line #20: 
[2014-02-10 12:58:00,755][storm][DEBUG]Message line #20 is blank. Pipe to Storm supervisor may be broken.
[2014-02-10 12:58:00,756][storm][ERROR]Sent failure message ("E_BOLTFAILED__splitsentence__rds-secondary__pid__10243__port__-1__taskindex__-1__StormIPCException") to Storm
[2014-02-10 12:58:05,760][storm][ERROR]Caught exception in Bolt.run
Traceback (most recent call last):
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 368, in run
    tup = readTuple()
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 104, in readTuple
    cmd = readCommand()
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 97, in readCommand
    msg = readMsg()
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 71, in readMsg
    msg = ''.join('%s\n' % line for line in read_message_lines())
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 71, in <genexpr>
    msg = ''.join('%s\n' % line for line in read_message_lines())
  File "/py_petrel/lib/python2.7/site-packages/petrel-0.8.1.0.1-py2.7.egg/petrel/storm.py", line 65, in read_message_lines
    raise StormIPCException('Pipe to Storm supervisor seems to be broken!')
StormIPCException: Pipe to Storm supervisor seems to be broken!

I checked the Petrel code in "storm.py". This exception is thrown when 20 empty lines are read from stdin, rather than when stdin is closed.

What is the purpose of checking for blank lines from stdin? Why 20 empty lines rather than 30 or more? Under what circumstances could the blank-line check be disabled?

@barrywhart
Contributor

I have encountered cases where the Java process that Python communicates with fails, and as a result, reading from stdin yields an endless stream of empty lines. The blank line check was a hacky workaround, and the number 20 was arbitrary; I thought blank lines might legitimately occur in some data streams, but 20 in a row was enough to confirm there was a problem. Since writing this code, I have found the following note in the Python documentation:

http://docs.python.org/2/library/stdtypes.html

The advantage of leaving the newline on is that returning an empty string is then an unambiguous EOF indication.

I should fix this at some point.
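
To illustrate the documentation note above, here is a minimal sketch of how a reader could tell EOF apart from a blank line by inspecting what readline() returns before stripping the newline. This is not Petrel's actual code: the function name mirrors the one in the traceback, but the body, the PipeClosedError class, and the assumption that each message ends with a line containing only "end" are illustrative. It is written for Python 3.

```python
import io


class PipeClosedError(Exception):
    """Hypothetical exception, standing in for Petrel's StormIPCException."""


def read_message_lines(stream):
    """Yield the lines of one message, stopping at the terminator line.

    Assumption: each message is terminated by a line containing only 'end'.
    """
    while True:
        raw = stream.readline()
        if raw == '':
            # readline() returns '' only at EOF; a blank line comes back as '\n'.
            raise PipeClosedError('stdin reached EOF; the parent process may have exited')
        line = raw.rstrip('\n')
        if line == 'end':
            return
        yield line


# A blank line inside a message is passed through as ordinary data.
print(list(read_message_lines(io.StringIO('{"a": 1}\n\nend\n'))))
```

With a check like this, a closed pipe is detected on the first read instead of after an arbitrary number of blank lines, and genuine blank lines in the data never count as a failure.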

@hhkbp2
Author

hhkbp2 commented Feb 10, 2014

The link is broken. Could you fix it or post the note you found?

@barrywhart
Contributor

I fixed the link. I was missing the "l" in "html".

@hhkbp2
Author

hhkbp2 commented Feb 24, 2014

Thanks.

As we know, Storm uses ZeroMQ as the communication layer between machines. It's somewhat strange that Storm chooses a native pipe instead of ZeroMQ for IPC between the Storm worker process and the external multi-lang process.
I don't think using a pipe is a good idea. It's hard to debug. The Storm dev team is obviously aware of that, although there is no concrete schedule for refactoring it to use sockets or ZeroMQ.

@barrywhart
Contributor

I think they were trying to keep the multilanguage binding very simple. I agree it makes things hard to debug. Have you looked at the unit test support in Petrel? Using those tools, it's possible to do most of your development and testing outside of Storm, which makes debugging much easier. For example, you can use pdb and print statements.

@hhkbp2
Author

hhkbp2 commented Feb 24, 2014

Yes, I have looked at 'mock.py' and the 'tests' dir in the source code. They should be quite helpful when debugging the application logic.
In my use case, some simple hacks are probably needed in 'mock.py' to simulate the concurrency/asynchrony of Storm worker processes. I will try those tools later.

@philejmath

Any idea how to fix this? I have a topology that runs in local mode and on my local one-node cluster, but it hits this error when I push it to my new QA cluster. I guess it has to be something in the Storm configuration on the new cluster?

@barrywhart
Contributor

If the Python workers are exiting due to blank lines, then this is almost certainly a symptom, not the problem itself. It indicates that the Java process on the other end of the stdin/stdout "channel" has exited, probably due to an error. Check the Storm logs and your setup, ask on the Google group, etc. Sorry I can't offer more specific help.

@philejmath

Thank you, I figured it out: Petrel wasn't installed properly on all the servers. This might be obvious, but you may want to add something to the docs stating specifically which nodes Petrel is required on.

@barrywhart
Contributor

Can you read the following section?

https://github.com/AirSage/Petrel#deploy-and-run

Did you have to install Petrel itself on the machines? As I mentioned in that section, each machine needs some underlying tools but should not require Petrel itself, because it is deployed by Storm, embedded within the topology jar file.

@philejmath

Once I installed Petrel on all the nodes, this problem went away.
