This repository has been archived by the owner on Jun 23, 2020. It is now read-only.

kubernetes: persistent volumes #19

Closed
bradrydzewski opened this issue Oct 24, 2018 · 22 comments
Labels
kubernetes Issues related to the Kubernetes Engine

Comments

@bradrydzewski
Member

This helps ensure all pods have access to a shared workspace and can run on the same machine. It also helps us implement temp_dir volumes (as defined in the drone yaml). Persistent volumes are currently disabled while we figure out an approach to scheduling pods on specific nodes.

bradrydzewski added the kubernetes label Dec 5, 2018
@based64god

Has any progress been made on this? I'd be interested in taking a crack at it if it hasn't yet been touched.

@bradrydzewski
Member Author

No progress really, but I would love some assistance :) I added some code to create the pv and pvc data objects but never had time to finish the job:
https://github.com/drone/drone-runtime/blob/master/engine/kube/volume.go

@zetaab
Contributor

zetaab commented Dec 11, 2018

I am checking this currently. However, the problem that I see is that we need ReadWriteMany volumes in the Kubernetes cluster. At least we do not have those; we would need to install something like https://github.com/gluster/gluster-kubernetes

@bradrydzewski
Member Author

bradrydzewski commented Dec 11, 2018

@zetaab a ReadWriteMany volume is not required because all Pipeline steps execute on the same node, using a shared workspace (make sure you pass the --kube-node flag when testing with the cli). This means the persistent volume needs to be of type HostPath.
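
For illustration, a minimal sketch of what that could look like (the drone-workspace name and the host path are hypothetical, not what the runtime currently generates):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: drone-workspace            # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/drone/workspace     # hypothetical path on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drone-workspace
spec:
  storageClassName: ""             # skip dynamic provisioning and bind to a pre-created HostPath PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi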

@zetaab
Contributor

zetaab commented Dec 11, 2018

Sharing hostPath volumes in Kubernetes is not recommended. On platforms like OpenShift it is not even allowed (without modifying things).

This could maybe be used in the future for hostPath-style volumes (once dynamic provisioning is supported): https://kubernetes.io/docs/concepts/storage/storage-classes/#local

Currently the problem is that if namespace x takes hostPath /foo, namespace y can do the same, so it is a kind of isolation issue between namespaces. Hopefully that local-storage dynamic provisioner will solve that issue somehow.
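
For reference, the local storage class from that page looks roughly like this (no provisioner, so volumes are not created dynamically and binding waits for a consuming pod):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer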

@zetaab
Contributor

zetaab commented Dec 11, 2018

@bradrydzewski btw, how are you planning to use that --kube-node thing? When a new build is about to start, do you just define one node where everything should be executed? Do you need to know beforehand which node has enough resources to execute it? Or should the user define on which node the build always runs?

@bradrydzewski
Member Author

The --kube-node parameter is only required from the command line so you can more closely emulate how drone works. Under the hood drone ensures all pipeline steps are assigned to the same node. With a persistent volume claim this would no longer be necessary.

I think the default volume type should be HostPath, because installing a volume plugin should not be a requirement for using Drone. But we can certainly give teams the option to use alternate volume plugin types if they want or need to.

@zetaab
Contributor

zetaab commented Dec 11, 2018

Yes, I agree with you: there should be a hostPath option (in the future this can be moved to a PVC using a local hostPath dynamic provisioner) and a PVC option. Also, hostPath should maybe be the default one, because installing things like RWX volumes is not that easy. An RWX volume is needed if people are executing two pipeline steps simultaneously on different nodes; otherwise RWO is enough. However, it might be quite slow to execute pipelines with RWO because detaching/attaching the volume for each step takes time.

@laszlocph

laszlocph commented Feb 8, 2019

I think this issue is very important to make the Kubernetes runtime fully native. The implementation uses hostPath, which limits a pipeline to running on a single node, giving up the real benefits of using a scheduler like Kubernetes.

Other aspects of the Kubernetes implementation actually embrace the Kubernetes scheduler. The decision that there is no agent concept in the Kubernetes runtime is a forward-looking one. Currently Drone starts every build in about the time it takes to spin up a container. Practically this means that builds and their steps always start, getting rid of the queue concept.

But a running step on the UI sometimes means a Pending pod state if you configure resource requests/limits for your Drone steps - which is a must for any predictable behavior. With a fully Kubernetes-native implementation this would mean the cluster scales up and in a minute the Pending state deterministically resolves into Running.

Today that's not the case: if you fan out and one of the steps/pods is Pending, it has to wait until the other steps/pods on the same node are done. This is non-deterministic and non-transparent behavior that annoys me as a user.

Furthermore, limiting all pods of a pipeline to a specific node prevents the cluster from scaling up. That is a shame, given that the abstraction could rely on Kubernetes for fully queueless autoscaling behavior, while the current implementation prevents Kubernetes from doing its job. It also prevents me from migrating my (homegrown) autoscaling 0.8 Drone setup to the Kubernetes runtime.

See how a Pending pod doesn't trigger scale-up:

Events:
  Type     Reason             Age               From                Message
  ----     ------             ----              ----                -------
  Warning  FailedScheduling   1m (x18 over 2m)  default-scheduler   0/2 nodes are available: 1 Insufficient cpu, 1 node(s) didn't match node selector.                                                            
  Warning  FailedScheduling   1m (x7 over 1m)   default-scheduler   0/2 nodes are available: 1 node(s) didn't match node selector, 2 Insufficient cpu.                                                            
  Normal   NotTriggerScaleUp  7s (x11 over 2m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)

@kevinsimper

@laszlocph So if you run multiple pipelines, do they still only run on one machine?

I can see why you would use hostPath for a single pipeline, but not for multiple pipelines.

@laszlocph

@laszlocph So if you run multiple pipelines, do they still only run on one machine?

Each pipeline has a dedicated machine. If you have two nodes and two pipelines, Drone can schedule them on different nodes; I've seen that happening.

The problem happens when a pipeline fans out to 4 steps - let's say each of those steps requires a single core, and you have a 2- or 4-core machine. Then at least one of those supposedly parallel steps becomes sequential, no matter how many other idle nodes you have.

@max-rocket-internet

The problem happens when a pipeline fans out to 4 steps - let's say each of those steps requires a single core, and you have a 2- or 4-core machine. Then at least one of those supposedly parallel steps becomes sequential, no matter how many other idle nodes you have.

For sure. But I do think a certain amount of blame for this problem lies with the cluster owner. If you have workloads that require specific amounts of resources, resources in the cluster are at or near the limit, and/or you are relying on the cluster-autoscaler to add the required resources to schedule your loads, then you will end up with some issues at some point. In this situation you can apply some specific configuration to help, for example:

  • Create specific node groups with higher or specific sets of resources
  • Add a separate cluster-autoscaler for the specific node groups (and optionally allow them to scale to 0)
  • Use node labels to schedule pods on specific node groups

It won't solve all problems, but with some careful consideration you can solve a lot of scheduling problems like this; a rough sketch of the node-label idea is below.
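
A minimal sketch, assuming a hypothetical node-role/ci=true label applied to a dedicated CI node group (this is illustrative, not something Drone generates today):

apiVersion: v1
kind: Pod
metadata:
  name: drone-step                 # hypothetical step pod
spec:
  nodeSelector:
    node-role/ci: "true"           # hypothetical label on the dedicated CI node group
  containers:
    - name: step
      image: golang:1.12
      resources:
        requests:
          cpu: "1"                 # explicit request so the scheduler and cluster-autoscaler can size the group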

@max-rocket-internet

Also worth noting that in k8s version 1.12 they improved scheduling with regard to volumes and zones: https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/

But I think until the cluster-autoscaler is zone aware this will always be tricky.

@laszlocph

It won't solve all problems but with some careful consideration you can solve a lot of scheduling problems like this.

I agree, but overprovisioning and being smart about placing workloads are exactly the problems schedulers are meant to eliminate.

So let's say we overprovision each node.
How do you tell Drone how many pipelines it is allowed to place on a single node? This was possible with DRONE_MAX_PROCS=2 in the non-Kubernetes implementation. Now it is not - and it would be silly to introduce - so I can't prevent Drone from scheduling all pipelines on a single node.

How do you know when to add a node to the cluster?
Without defining DRONE_MAX_PROCS again, it's just guesswork, with bad scheduling in the corner cases.

@max-rocket-internet

FYI I haven't used Drone prior to version 1.0 so my knowledge there is lacking.

How do you tell Drone how many pipelines it is allowed to place on a single node?

Ideally I would say it isn't Drone's responsibility to know or control this. This is scheduling, and that is the responsibility of k8s. But when Drone creates the job, it could set a nodeSelector with node labels, or podAffinity / podAntiAffinity, to control this to some degree.
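
As a rough sketch (the drone.io/build label is hypothetical), Drone could label each step pod per build and use podAffinity to keep the steps of one build on the same node:

apiVersion: v1
kind: Pod
metadata:
  name: drone-step
  labels:
    drone.io/build: "1234"                     # hypothetical per-build label
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              drone.io/build: "1234"
          topologyKey: kubernetes.io/hostname  # co-locate all matching pods on one node
  containers:
    - name: step
      image: golang:1.12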

How do you know when to add a node to the cluster?

The cluster-autoscaler will do this. If you want limits around its behaviour then you can set these (to some degree) with min/max, cool down, delays etc.
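
For example, rough limits on its behaviour might look like this in the autoscaler's container args (the ci-node-group name is an assumption; the flags are standard cluster-autoscaler flags):

# fragment of the cluster-autoscaler container spec
command:
  - ./cluster-autoscaler
  - --nodes=0:10:ci-node-group           # min:max:<node group>, allowed to scale to zero
  - --scale-down-delay-after-add=10m
  - --scale-down-unneeded-time=10m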

I'm not saying either of these solutions are ideal for Drone but normally that's how these issues are solved on k8s 🙂

@bradrydzewski
Member Author

bradrydzewski commented Feb 8, 2019

How do you tell Drone how many pipelines it is allowed to place on a single node?

Drone for Nomad handles this by requesting CPU and RAM resources for each Pipeline. In Nomad, the request does not place any actual limits, and is only used for scheduling. I believe a similar concept could be used for Kubernetes via resource requests vs limits.

(nomad task definition)

Name:"stage"
Resources:Object
CPU:200
DiskMB:0
IOPS:0
MemoryMB:4000
Networks:null
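
A rough Kubernetes equivalent of that idea (values are illustrative) would be to set only requests on each step container, so they influence scheduling without hard-capping the step:

apiVersion: v1
kind: Pod
metadata:
  name: drone-step                 # hypothetical step pod
spec:
  containers:
    - name: step
      image: golang:1.12
      resources:
        requests:                  # used by the scheduler for placement only
          cpu: 200m
          memory: 4000Mi
        # no limits set, so the step is only constrained by overall node capacity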

I think this issue is very important to make the Kubernetes runtime fully native. The implementation uses hostPath that limits a pipeline to run only on a single node giving up on the real benefits of using a scheduler like Kubernetes.

Fundamentally Drone does not care about whether or not steps run on the same or different nodes, however, Drone does care that there is a shared disk (workspace) available for all steps in the pipeline. The current implementation -- running all steps on a single node -- was the easiest way for us to create the initial proof of concept, which we can now build upon.

It is possible Drone can support ReadWriteMany volumes in the future (Ceph, Gluster, etc.), although we still need to get Drone working well with vanilla Kubernetes and a HostPath persistent volume claim, since a cluster may not have ReadWriteMany volume plugins available and this should not be a requirement for running Drone.

To support ReadWriteMany volumes, we will need to assign resource limits to every step in the pipeline in order for it to be properly scheduled by Kubernetes. In terms of user experience, this will sort of suck, so we need to find a way to minimize this or come up with sane defaults that can easily be overridden.

@laszlocph

laszlocph commented Feb 9, 2019

It is possible Drone can support ReadWriteMany volumes in the future (Ceph, Gluster, etc)

Using the storageClass approach from @zetaab earlier in this issue would be a clean solution in my opinion.

although we still need to get Drone working well with vanilla Kubernetes and HostPath persistent volume claim, since a cluster may not have ReadWriteMany volume plugins available and this should not be a requirement of running Drone.

Ceph and Gluster are a pain to set up, but I just tried ReadWriteMany volumes with NFS and only needed a single pod. Maybe it could be bundled with the Drone yaml. The linked branch contains a drone-runtime implementation that uses NFS-based ReadWriteMany volumes.

Or if you choose the pluggable storageClass approach, an NFS provisioner can be showcased as best practice. It's more approachable than Ceph/Gluster and works everywhere.
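
As a sketch, with an NFS-backed storage class the workspace claim could simply request ReadWriteMany (the claim name and the nfs-client class name are assumptions here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drone-workspace            # hypothetical
spec:
  storageClassName: nfs-client     # assumed name of the NFS provisioner's storage class
  accessModes:
    - ReadWriteMany                # NFS supports RWX, so steps can land on any node
  resources:
    requests:
      storage: 5Gi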

To support ReadWriteMany volumes, we will need to assign resource limits to every step in the pipeline in order for it to be properly scheduled by Kubernetes. In terms of user-experience, this will sort of suck, so we need to find a way to minimize this or come up with sane defaults that can easily be overriden.

I don't get this part, what kind of limits are needed?

@zetaab
Contributor

zetaab commented Mar 6, 2019

I did my PR once already (#27), but @bradrydzewski closed it. As I see it, storage classes are the correct solution for this.

Of course the storage class needs to support RWX or similar, but with a generic storage class implementation we could use more than one storage backend.

@galexrt

galexrt commented Apr 17, 2019

What about adding a new kubernetes section to the config to add this whole storage configuration part?

That way we could use Kubernetes-native "objects" for configuration by the user: e.g., custom volumes for secrets (the user would specify those volumes in normal yaml format), a storage class name separated from the Docker config, the namespace pattern / which namespace should be used for the jobs, and more.

Though I'm not sure how the user would set the configuration "globally", and maybe even per project, to be passed to the Kubernetes Drone Runtime.

Example idea:

{
    "metadata": {
[...]
	},
	"steps": [
[...]
	],
	"docker": {}.
	"kubernetes": {
		// Storage Class name to use for the PersistentVolumeClaims to use per Job
		"storageClassName": "my-awesome-rwx-storageclass",
		// Attach other volumes to the job pod(s)
		"additionalVolumes": [
			{
				"volumeMount": {
					"name": "foo",
					"mountPath: "/etc/foo",
					"readOnly": true
				},
				"volume": {
					"name": "foo",
					"secret": {
						"secretName": "mysecret"
					}
				}
			}
		],
		// How the namespace should be named and / or if separate namespaces per Job should be used
		"namespaces": {
			"mode": "isolated",
			"prefix": "drone-"
			// or
			"mode": "shared",
			"name": "my-droneci-ci-namespace"
		},
		// Clean up jobs after X time, other items to clean up?
		// E.g., adding a label to the created namespaces and making sure that none are left hanging on deletion, checked every X interval
		"cleanUp": {
			"jobDeletionDelay": "24h"
		}
	}
}

(the example covers more than just storageClassName, but I'm including the rest to give a bigger picture of what I can think of here)

This is the kind of config section for Kubernetes that I can think of when looking at the Drone CI Kubernetes runtime right now, and what I would like to see in the future. 😉

@bradrydzewski
Member Author

bradrydzewski commented Apr 17, 2019

I see us going more the direction of adopting something like Knative or Tekton as a runtime target for Drone (see #65). Drone was designed for Docker and trying to force-fit this design into Kubernetes is not working very well. I expect we will invest a lot more time into Knative in the coming months, as opposed to investing in the existing experimental implementation.

@zetaab
Contributor

zetaab commented Apr 17, 2019

@bradrydzewski as I see it, #27 could work if some day we have a working local volume provisioner which can dynamically create volumes on the current host machine. It works pretty much the same way as the current solution (except with no hardcoded hostPath mount; a persistentvolumeclaim instead).

See https://kubernetes.io/docs/concepts/storage/volumes/#local and https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/

"Dynamic provisioning is not supported yet" - but some day it could be supported.
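
Until then, local PersistentVolumes have to be created statically, something like this (the name, path, storage class and node name are all illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ci-local-pv                # hypothetical
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # hypothetical path that must already exist on the node
  nodeAffinity:                    # required for local volumes: pins the PV to one node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1           # hypothetical node name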

@bradrydzewski
Member Author

bradrydzewski commented Nov 5, 2019

The kubernetes implementation in this repository was scrapped for reasons described here. We have a new implementation, created from scratch, that no longer uses a persistent volume, which in turn obsoletes this issue. The new implementation can be found at drone-runners/drone-runner-kube.

New kubernetes runner documentation can be found here:
https://kube-runner.docs.drone.io/
