dtests sometimes fail with unable to connect to scylla-jmx #98

Open
bhalevy opened this issue Mar 5, 2020 · 2 comments
@bhalevy
Member

bhalevy commented Mar 5, 2020

See scylladb/scylla-ccm#223 (comment)

Still seeing this, e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/testReport/junit/bootstrap_test/TestBootstrap/start_stop_test/
Scylla version 359b32fb63e2c5f88ff855e535b647984e2fe623

Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 645, in run
    testMethod()
  File "/jenkins/workspace/scylla-master/next/scylla-dtest/bootstrap_test.py", line 53, in start_stop_test
    cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 137, in start
    started = self.start_nodes(**args)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 109, in start_nodes
    profile_options=profile_options, no_wait=no_wait)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_node.py", line 516, in start
    raise NodeError(e_msg, scylla_process)
ccmlib.node.NodeError: Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189

https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/artifact/logs-release.2/dtest.log indicates that 2 processes were killed during teardown.
Since the test starts only 1 node, these should be scylla and scylla-jmx:

2020-03-03 15:44:01,849 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - cluster ccm directory: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,850 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Starting Scylla cluster from directory /jenkins/workspace/scylla-master/next/scylla-dtest/../scylla/build/release/
2020-03-03 15:44:01,853 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Allocated cluster ID 89: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,860 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - configuring skip_wait_for_gossip_to_settle=0 for single_node test
2020-03-03 15:44:01,861 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - populating cluster with one node
2020-03-03 15:44:15,809 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - starting cluster
2020-03-03 15:44:45,900 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Test failed with errors: [(<bootstrap_test.TestBootstrap testMethod=start_stop_test>, (<class 'ccmlib.node.NodeError'>, NodeError('Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189'), <traceback object at 0x7f208c536690>))]
2020-03-03 15:44:45,905 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - removing ccm cluster test at: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:46,981 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - proc 182 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - proc 184 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Freeing cluster ID 89: link /jenkins/workspace/scylla-master/next/scylla/.dtest/89

So it seems like the scylla-jmx process is up but unresponsive.
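
To tell "process is up but not listening" apart from "process is listening but slow", a plain TCP probe of the JMX port can help. This is only a minimal sketch, not the check that ccm performs: `wait_for_port` is a hypothetical helper, and the timeout and polling interval are assumed values.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of ccm. Returns True on the first
    successful connect, False once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the failure above, the call would be `wait_for_port("127.0.89.1", 7189)`. A False result while the scylla-jmx pid is still alive would point at the JVM never opening the port, rather than at a slow response.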

@bhalevy
Member Author

bhalevy commented May 5, 2020

As I wrote on scylladb/scylla-ccm#223 (comment), I saw this today in node1_jmx.log:

https://jenkins.scylladb.com/view/master/job/scylla-master/job/byo/job/dtest-byo/144/artifact/logs-release.2/1588687609026_materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test/node1_jmx.log

Using config file: /jenkins/workspace/scylla-master/byo/dtest-byo/scylla/.dtest/dtest-3ngmni08/test/node1/conf/scylla.yaml
library initialization failed - unable to allocate file descriptor table - out of memory
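
When the connect error shows up, scanning the node's jmx log for this kind of message can save a round trip to the artifacts page. The following is a rough sketch under assumptions: `find_startup_failures` is a hypothetical helper, not something ccm or dtest provides, and the two patterns are taken only from the log line above.

```python
from pathlib import Path

# Substrings taken from the node1_jmx.log excerpt above; extend as needed.
KNOWN_FAILURES = (
    "library initialization failed",
    "out of memory",
)


def find_startup_failures(log_path, patterns=KNOWN_FAILURES):
    """Return log lines containing any pattern (case-insensitive match).

    Hypothetical helper: it only reads the file and filters lines.
    """
    hits = []
    text = Path(log_path).read_text(errors="replace")
    for line in text.splitlines():
        lowered = line.lower()
        if any(p in lowered for p in patterns):
            hits.append(line.strip())
    return hits
```

Called on a node's `node1_jmx.log`, it returns the matching lines, which could then be appended to the NodeError message so the real cause is visible in the test report.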

@penberg
Contributor

penberg commented May 6, 2020

@bhalevy The "unable to allocate file descriptor table" error is an artifact of the node running out of memory. You ran the test on thor, so it's unfortunately a pretty common scenario.

@DoronArazii DoronArazii added this to the Backlog milestone May 15, 2023