Lorre is extremely aggressive on the tezos-node #950

Open · ghost opened this issue Dec 1, 2020 · 9 comments

ghost commented Dec 1, 2020

Hi,
I have a tezos-node (mainnet) with Lorre and the Conseil API connected to it.
I do not use your Docker images; I run the processes myself.

My problem is that while Lorre is running I can barely connect to the tezos-node anymore, e.g. it takes 20 to 60 seconds to check /chains/main/blocks/head:

 time curl -s --noproxy "*" --connect-timeout 60 --max-time 60 -X GET -H 'Content-Type: application/json' 'http://*****/chains/main/blocks/head' | jq '.header.level'
1239159

real    0m0.050s
user    0m0.008s
sys     0m0.005s

time curl -s --noproxy "*" --connect-timeout 60 --max-time 60 -X GET -H 'Content-Type: application/json' 'http://*****/chains/main/blocks/head' | jq '.header.level'
1239163

real    0m20.426s
user    0m0.008s
sys     0m0.013s
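
To keep sampling the latency while Lorre runs, a small loop like the following works (just a sketch; $NODE stands for the redacted endpoint above):

while true; do
  curl -s -o /dev/null -w '%{time_total}\n' --max-time 60 "$NODE/chains/main/blocks/head"
  sleep 10
done

It shows the same pattern as the timings above: tens of milliseconds while Lorre is idle, tens of seconds once it starts indexing.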

My current Lorre config looks like this:

platforms: [ {
  name: tezos
  network: mainnet
  enabled: true
  node: {
    protocol: "http"
    hostname: "****"
    port: *****
    pathPrefix: ""
    }
  }
]

lorre {

  request-await-time: 120 s
  get-response-entity-timeout: 90s
  post-response-entity-timeout: 1s

  sleep-interval: 5 s
  bootup-retry-interval: 10 s
  bootup-connection-check-timeout: 10 s
  fee-update-interval: 20
  fees-average-time-window: 3600
  depth: newest
  chain-events: []
  block-rights-fetching: {
    init-delay: 2 minutes
    interval: 60 minutes
    cycles-to-fetch: 5
    cycle-size: 4096
    fetch-size: 200
    update-size: 16
    enabled: true
  }

  batched-fetches {
    account-concurrency-level: 5
    block-operations-concurrency-level: 10
    block-page-size: 500
    block-page-processing-timeout: 1 hour
    account-page-processing-timeout: 15 minutes
    delegate-page-processing-timeout: 15 minutes
  }

  db {
    dataSourceClass: "org.postgresql.ds.PGSimpleDataSource"
    properties {
      user: "***********"
      password: "***********"
      url: "jdbc:postgresql://************"
    }
  }

}

akka {
  tezos-streaming-client {
    max-connections: 10
    max-open-requests: 512
    idle-timeout: 10 minutes
    pipelining-limit: 7
    response-entity-subscription-timeout: 15 seconds
  }
  tezos-dispatcher {
    type: "Dispatcher"
    executor: "thread-pool-executor"
    throughput: 1

    thread-pool-executor {
      fixed-pool-size: 16
    }
  }

  http {
    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

I built Lorre from the master branch today.

What can I do to make it less aggressive?

ivanopagano (Contributor) commented:

You can start by halving a couple of values in the akka.tezos-streaming-client section.

Try something like:

max-connections: 5 # <- half the number of concurrent open connections
max-open-requests: 512
idle-timeout: 10 minutes
pipelining-limit: 7
response-entity-subscription-timeout: 15 seconds

This should essentially cut the number of ongoing requests in half, because Lorre will use fewer connections.
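
Note that pipelining-limit multiplies this: assuming these settings map onto Akka HTTP's host-connection-pool (as the names suggest), up to max-connections × pipelining-limit requests can be in flight at once, i.e. 5 × 7 = 35 with your current values. A crude way to see how that load feels to your node is to fire that many head requests in parallel (just a sketch; $NODE stands for your redacted RPC endpoint):

seq 35 | xargs -P 35 -I{} curl -s -o /dev/null -w '%{time_total}\n' "$NODE/chains/main/blocks/head"

If the timings blow up here as well, the node itself is the bottleneck, and lowering pipelining-limit to 1 would cap the in-flight requests at max-connections.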

What I don't know for sure is why your tezos-node should have less capacity than the one we use in our Docker images. Unless the node can auto-tune based on available system resources?
Did you set any custom configuration to run the tezos-node?

ghost (Author) commented Dec 2, 2020

@ivanopagano
Thank you for the advice - I am testing it now. I run the tezos-node like this:

tezos-node run -v --history-mode=archive --data-dir=/tezos --network=mainnet --rpc-addr=0.0.0.0:8732 --config-file=mainnet.json --connections=5

where mainnet.json contains:

{
  "data-dir": "/tezos",
  "p2p": {
    "bootstrap-peers": [
      "boot.tzbeta.net",
      "dubnodes.tzbeta.net:9732",
      "franodes.tzbeta.net:9732",
      "sinnodes.tzbeta.net:9732",
      <... many more peers ...>
    ],
    "listen-addr": "[::]:9732"
  }
}

ghost (Author) commented Dec 2, 2020

@ivanopagano
I tried it like this:

akka {
  tezos-streaming-client {
    max-connections: 3
    max-open-requests: 256
    idle-timeout: 10 minutes
    pipelining-limit: 7
    response-entity-subscription-timeout: 15 seconds
  }
  tezos-dispatcher {
    type: "Dispatcher"
    executor: "thread-pool-executor"
    throughput: 1

    thread-pool-executor {
      fixed-pool-size: 16
    }
  }

  http {
    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

And it did not improve the situation. Any other ideas?

ghost (Author) commented Dec 9, 2020

I have experimented a little more.
First I lowered the akka values as follows:

akka {
  tezos-streaming-client {
    max-connections: 3
    max-open-requests: 128
    idle-timeout: 10 minutes
    pipelining-limit: 7
    response-entity-subscription-timeout: 15 seconds
  }
  tezos-dispatcher {
    type: "Dispatcher"
    executor: "thread-pool-executor"
    throughput: 1

    thread-pool-executor {
      fixed-pool-size: 16
    }
  }

  http {
    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

It still did not make a tangible difference. So as a workaround I set up a second tezos-node on the same machine: one that I can query directly and one that is dedicated to Conseil.
This way it works and I get quick responses.
What I take from this is that it is not a hardware/IO/network issue, since things run better with more processes than with fewer.
It seems to me that either there is something like a "max-rpc-calls-per-second" limit on the tezos-node, or Conseil ignores my akka config?
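
One way I could check whether the connection cap is honored at all would be to count the TCP connections Lorre keeps open to the node's RPC port from the Lorre host (a sketch, assuming the default RPC port 8732):

ss -tn 'dport = :8732' | tail -n +2 | wc -l

If that number stays at or below max-connections, the akka config is probably being picked up, and the slowdown would come from how heavy each request is rather than from how many connections are open.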

ghost (Author) commented Dec 27, 2020

Hi there, any news on this? Do you have a suggestion for what to do?

ghost (Author) commented Jan 13, 2021

Hi, any idea what I should do? The advice I received did not have any effect. Conseil keeps paralyzing the tezos-node.

jun0tpyrc commented Jan 14, 2021

I have a docker-compose setup of Conseil + PostgreSQL + tezos-node running for mainnet.
Most tunings did not help much until I decided to scale my instance up to an 8-core / 32 GB memory one with a fast gp3 disk on AWS, which together seems to have solved the I/O bottleneck for me.
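
For anyone hitting the same wall, a quick way to check whether disk I/O is the bottleneck while Lorre is syncing (assuming the sysstat package is installed):

iostat -x 5 3

High %util and await values on the node's data-directory disk point to I/O saturation rather than a Conseil or tezos-node problem.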

vishakh (Contributor) commented Jan 29, 2021

Please try the latest release and let us know how it looks. There is improved logging so it should be easier to identify the root issue.

https://github.com/Cryptonomic/Conseil/releases/tag/2021-january-release-35

vishakh (Contributor) commented Jan 29, 2021

@g574 @jun0tpyrc Please see the above comment about the latest release.
