Replies: 9 comments
-
It is recommended to use a headless service instead when deploying NATS as part of a StatefulSet, and to let the NATS servers and clients handle the randomizing of the connections:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nats
  labels:
    app: nats
spec:
  selector:
    app: nats
  clusterIP: None
  ports:
  - name: client
    port: 4222
  - name: cluster
    port: 6222
  - name: monitor
    port: 8222
  - name: metrics
    port: 7777
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nats-config
data:
  nats.conf: |
    pid_file: "/var/run/nats/nats.pid"
    http: 8222

    cluster {
      port: 6222
      routes [
        nats://nats:6222
      ]
      cluster_advertise: $CLUSTER_ADVERTISE
      connect_retries: 10
    }
```

As part of the pod spec, you can define the cluster advertise setting via env vars:

```yaml
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: CLUSTER_ADVERTISE
  value: $(POD_NAME).nats.$(POD_NAMESPACE).svc
```

I have an example here of how to deploy NATS + NATS Streaming as part of a StatefulSet: https://gist.github.com/wallyqs/1f7460072b8cf743b9ff616f0a32007a, which I'll use as the basis for updates to the Helm chart soon.
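For completeness, here is a minimal sketch of how a StatefulSet could tie the headless service, the ConfigMap, and those env vars together. It is not taken from the gist, so the image tag and the exact command line are assumptions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
  labels:
    app: nats
spec:
  serviceName: nats              # must match the headless service above
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      volumes:
      - name: config-volume
        configMap:
          name: nats-config
      - name: pid
        emptyDir: {}             # backs the pid_file path from nats.conf
      containers:
      - name: nats
        image: nats:latest       # assumed tag
        ports:
        - containerPort: 4222
          name: client
        - containerPort: 6222
          name: cluster
        - containerPort: 8222
          name: monitor
        command:
        - "nats-server"
        - "--config"
        - "/etc/nats-config/nats.conf"
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CLUSTER_ADVERTISE
          value: $(POD_NAME).nats.$(POD_NAMESPACE).svc
        volumeMounts:
        - name: config-volume
          mountPath: /etc/nats-config
        - name: pid
          mountPath: /var/run/nats
```

The server resolves `$CLUSTER_ADVERTISE` in `nats.conf` from the container environment, so each pod advertises its own per-pod DNS name to the cluster.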
-
Thank you for the information. So the gossip protocol is used for auto discovery -> finding routes? I have also faced this error after manually deleting the pod with the active server:

Is it expected to happen sometimes? What I also noticed is that my approach is totally different and simpler: I have only one StatefulSet and one service definition with the image. Thank you.
-
@howkey666 How did you configure the FT group? The error you see means that a standby has become active (it missed FT heartbeats and was able to acquire the lock on the datastore) but then received an FT heartbeat from another server that claims to be active (only active servers send FT heartbeats), and the FT heartbeat contains the time the server became active. In your situation, it means that the newly elected active realized that there was another server that had been active for longer, so it "stepped down".

In FT, there should be only one active server, and the state (datastore) needs to be shared and support distributed locking. That is, the active server asks for the state to be locked, and the OS/file system releases that lock if the file is closed or the process crashes. So in the event that a standby server misses the heartbeats and tries to become active, it will first attempt to lock the state, which should fail if the active server still has the lock on it. If that is the case, the standby goes back into standby mode waiting for FT heartbeats - and possibly attempting to grab the lock again, etc.

In FT mode you can have more than one standby, so other nodes could take over after that panic. Still, we need to make sure that you have configured them properly, that the state is really shared, and that distributed locking is working properly.
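To make the "configured properly" part concrete, here is a minimal sketch of the FT-related container arguments in a StatefulSet pod spec. The image, cluster ID, FT group name, NATS URL, and mount path are assumptions for illustration, not values from this thread:

```yaml
containers:
- name: stan
  image: nats-streaming:latest              # assumed image/tag
  args:
  - "-cluster_id=stan"                      # assumed cluster ID, must match on every replica
  - "-ft_group=stan-ft"                     # assumed FT group name, shared by active and standbys
  - "-store=file"                           # FT requires a persistent store
  - "-dir=/data/stan/store"                 # must point at storage shared by ALL replicas
  - "-nats_server=nats://nats:4222"         # assumed URL of the NATS cluster
  volumeMounts:
  - name: stan-store
    mountPath: /data/stan                   # the shared volume (see the storage discussion below)
```

The important point, per the explanation above, is that `-dir` must resolve to the same underlying files on every replica, so that the active server's lock actually blocks the standbys.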
-
It's happening only sometimes. I also found another problem, and I think it is related: sometimes two out of three servers start as active. I have set these arguments:

Here is my repository: https://github.com/howkey666/nats-streaming-chart/tree/init/charts/file-store
-
@howkey666 It looks to me like the storage is not actually shared, or distributed locking is not working; otherwise that should not happen.
I am thinking that it could be that ReadWriteOnce is misunderstood here, in that if 2 pods are deployed on the same node then they both have access to the mount (https://stackoverflow.com/questions/56592929/how-pods-are-able-to-mount-the-same-pvc-with-readwriteonce-access-mode-when-stor). If that is the case and the mount does not do anything about locking across pods, that could explain two servers becoming active.

Btw, have you made sure that if you send messages to the active server and then cause a failover, the state is coherent after a standby takes over? (That would confirm that they are really sharing storage and do not each have their own individual storage.)
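To illustrate the shared-storage alternative, here is a hedged sketch of a single claim that every replica could mount. The name, size, and storage class are assumptions, and the backing storage must genuinely support `ReadWriteMany` plus POSIX file locking:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stan-shared-store        # assumed name
spec:
  accessModes:
  - ReadWriteMany                # shared across nodes, unlike ReadWriteOnce
  storageClassName: nfs-client   # assumed class; needs an RWX-capable backend (NFS, CephFS, etc.)
  resources:
    requests:
      storage: 1Gi
```

With ReadWriteOnce and `volumeClaimTemplates`, each pod of the StatefulSet gets its own volume, which is exactly the "individual storage" situation described above.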
-
@kozlovic If I understand it correctly, there must be only one storage per group? Right now I have a separate storage for each server.
-
Hi, it's worth mentioning that each pod has its own PV, as listed below. Every PV is an RBD volume supplied from a Ceph cluster.

```
catalogue-event-bus-catalogue-event-bus-ha-0   Bound   pvc-4134ae41-a310-11e9-9af3-5254009037f4   1Gi   RWO   ceph-rbd   1d
```
-
@howkey666 In FT mode, the storage location needs to be shared, that is, from each streaming server's perspective, the file(s) they access are the same. The active server "locks" the file, that is, when opening the file it executes a system call that requires the OS to grant it a lock (`syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, spec)`). That means that if a standby server incorrectly tries to become active, or is just missing heartbeats, but has access to the store, opening the file should fail since the active server still has the lock.
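Tying that back to the Kubernetes objects, here is a hedged sketch of a pod template that mounts one shared claim for every replica (the claim and volume names are assumptions, matching the PVC sketch above), instead of a per-pod volume from `volumeClaimTemplates`:

```yaml
# In the StatefulSet pod template: every replica mounts the SAME claim,
# so they all open the same store files and the fcntl lock can arbitrate
# which server is active.
volumes:
- name: stan-store
  persistentVolumeClaim:
    claimName: stan-shared-store   # assumed name; one claim for all replicas
containers:
- name: stan
  volumeMounts:
  - name: stan-store
    mountPath: /data/stan
```

Note that the lock only helps if the shared filesystem honors fcntl locks across clients; with a separate RBD volume per pod, each server locks its own private copy of the store and nothing prevents two servers from becoming active.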
-
@kozlovic @wallyqs Thank you guys for your help. The problem was a wrongly used data store (wrong type and number). Will try the operator in the near future.
-
Hello,
I am trying to write a Helm chart for NATS Streaming using the fault tolerance mode. In a YAML file specifying a `StatefulSet` I have a few `args`, but the important ones are:

I run 3 replicas, so I have 3 pods named `nats-0`, `nats-1` and `nats-2`. From the logs I can see that the NATS server on `nats-0` is set as the active one and the others are in standby mode. When I manually delete the `nats-0` pod, a new active server is elected from the remaining two servers. That's expected, and it means that they know about each other.

But what is confusing for me is that in the log of a particular server I see

`x.x.x.x:6222 - rid:6 - Route connection created`

where `x.x.x.x` is a `Service` IP, and on another line the `x.x.x.x` is a pod IP. In Kubernetes we are using the IPVS proxy mode (https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs), and if I understand it correctly, it means that a client (in this case it can be any NATS server running within the cluster) will not know the real IP of the pod it talks to, only the `Service` IP.

My assumption is that this isn't a problem. But I can be wrong, because I don't know how the routing works internally and I was unable to find any resources other than the source code.

Is my assumption correct? Can someone write more about how the routing works? It would be great to have a step-by-step explanation of this process with an example.
Thank you very much.
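Relating this to the headless-service recommendation in the first reply, here is a hedged sketch of a routes section that lists the peers by their per-pod DNS names (this assumes a 3-replica StatefulSet named `nats` behind a headless service named `nats` in the `default` namespace), so that route connections are made to pod IPs rather than through a Service virtual IP:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nats-config
data:
  nats.conf: |
    cluster {
      port: 6222
      routes [
        # With a headless service (clusterIP: None), these names resolve
        # directly to the pod IPs, so no Service VIP sits in between.
        nats://nats-0.nats.default.svc:6222
        nats://nats-1.nats.default.svc:6222
        nats://nats-2.nats.default.svc:6222
      ]
    }
```

With a regular (non-headless) Service, a seed route like `nats://nats:6222` resolves to the Service IP, which would explain seeing both the Service IP and pod IPs in the route logs.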