# Copyright (C) Nicolas Lamirault <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# SPDX-License-Identifier: Apache-2.0
---
categories:
- category: "Introduction"
content: |
If you've researched cloud native applications and technologies, you've probably come
across the [CNCF cloud native landscape](https://landscape.cncf.io).
In this guide, we'll break down the components used by Portefaix, and provide an overview
of its layers, columns and categories.
subcategories:
- subcategory: "What is the Portefaix landscape?"
content: |
The goal of the Portefaix landscape is to compile and organize the cloud native open
source projects and proprietary products used by Portefaix into categories, providing
an overview of the current ecosystem.
- subcategory: "How to use this guide"
content: |
In this guide, you'll find one chapter per layer and column which discusses each category
within it. Categories are broken down into: what it is, the problem it addresses, how it
helps, and technical 101. While the first three sections assume no technical background,
the technical 101 is targeted to engineers just getting started with cloud native.
- category: "Provisioning"
content: |
Provisioning is the first layer in the cloud native landscape. It encompasses tools that
are used to *create and harden* the foundation on which cloud native apps are built.
You'll find tools to automatically configure, create, and manage the infrastructure,
as well as for scanning, signing, and storing container images. The layer also extends
to security with tools that enable policy setting and enforcement, embedded authentication
and authorization, and the handling of secrets distribution. That's a mouthful, so let's
discuss the categories one at a time.
subcategories:
- subcategory: "Automation & Configuration"
keywords:
- "Infrastructure-as-Code (IaC)"
- "Automation"
- "Declarative Configuration"
content: |
#### What it is
Automation and configuration tools speed up the creation and configuration of compute
resources (virtual machines, networks, firewall rules, load balancers, etc.). Tools in
this category either handle different parts of the provisioning process or try to control
everything end-to-end. Most provide the ability to integrate with other projects and
products in the space.
#### Problem it addresses
Traditionally, IT processes relied on lengthy and labor-intensive manual release cycles,
typically between three to six months. Those cycles came with lots of human processes and
controls that slowed down changes to production environments. These slow release cycles
and static environments aren't compatible with cloud native development. To deliver on
rapid development cycles, infrastructure must be provisioned dynamically and without
human intervention.
#### How it helps
Tools of this category allow engineers to build computing environments without human
intervention. By codifying the environment setup it becomes reproducible with the click
of a button. While manual setup is error prone, once codified, environment creation
matches the exact desired state -- a huge advantage.
While tools may take different approaches, they all aim at reducing the required work
to provision resources through automation.
#### Technical 101
As we move from old-style human-driven provisioning to a new on-demand scaling model
driven by the cloud, the patterns and tools we used before no longer meet our needs.
Most organizations can't afford a large 24x7 staff to create, configure, and manage
servers. Automated tools like Terraform reduce the level of effort required to scale
tens of servers and networks with hundreds of firewall rules. Tools like Puppet, Chef,
and Ansible provision and/or configure these new servers and applications
programmatically as they are spun up and allow them to be consumed by developers.
Some tools interact directly with the infrastructure APIs provided by platforms like
AWS or vSphere, while others focus on configuring the individual machines to make them
part of a Kubernetes cluster. Many, like Chef and Terraform, can interoperate to provision
and configure the environment. Others, like OpenStack, exist to provide an
Infrastructure-as-a-Service (IaaS) environment that other tools could consume.
Fundamentally, you'll need one or more tools in this space as part of laying down the
computing environment, CPU, memory, storage, and networking, for your Kubernetes clusters.
You'll also need a subset of these to create and manage the Kubernetes clusters
themselves.
There are now more than five CNCF projects in this space, more if you count projects like Cluster
API which don't appear on the landscape. There is also a very robust set of other open
source and vendor-provided tools.
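To make this concrete, here's a minimal sketch of declarative configuration using Ansible, one
of the tools named above. The host group, package, and service names are placeholders:
```yaml
# Illustrative Ansible playbook: ensure nginx is installed and running
# on every host in the (placeholder) "web" group.
- name: Configure web servers
  hosts: web
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```
Because the playbook describes a desired end state rather than a sequence of commands,
re-running it is safe and produces the same result every time.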
- subcategory: "Container Registry"
keywords:
- "Container"
- "OCI Image"
- "Registry"
content: |
#### What it is
Before diving into container registries, we need to define three tightly related concepts:
* **Container** is "a running process with resource and capability constraints managed by a
computer's operating system"
([Cloud Native Glossary](https://github.com/cncf/glossary/blob/main/content/en/container.md)).
* **Image** is a set of archive files needed to run a container and its process. You can
see it as a template from which you can create an unlimited number of containers.
* **Repository**, or just repo, is a space to store images.
And **container registries** are specialized web applications that categorize and store repositories.
Let's recap real quick: images contain the information needed to execute a program
(within a container) and are stored in repositories which in turn are categorized and
grouped in registries. Tools that build, run, and manage containers need access to those
images. Access is provided by referencing the registry (the path to access the image).
#### Problem it addresses
Cloud native applications are packaged and run as containers. Container registries store and provide
the container images needed to run these apps.
#### How it helps
By centrally storing all container images in one place, they are easily accessible for any developer
working on that app.
#### Technical 101
Container registries either store and distribute images or enhance an existing registry in some
way. Fundamentally, a registry is a web API that allows container runtimes to store and retrieve
images. Many provide interfaces to allow container scanning or signing tools to enhance the
security of the images they store. Some specialize in distributing or duplicating images in a
particularly efficient manner. Any environment using containers will need to use one or more
registries.
Tools in this space provide integrations to scan, sign, and inspect the images they store.
Dragonfly and Harbor are CNCF projects and Harbor recently gained the distinction of
[being the first](https://goharbor.io/blog/harbor-2.0/) OCI-compliant registry. Each major cloud
provider provides its own hosted registry and many other registries can be deployed standalone or
directly into your Kubernetes cluster via tools like Helm.
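As a small illustration, here is how a Kubernetes pod references an image by its registry path;
the registry host, repository, and tag are all placeholders:
```yaml
# Illustrative Pod spec: the image field is the registry path used to
# locate and pull the image. "registry.example.com" is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: app
      image: registry.example.com/team/demo-app:1.0.0
```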
- subcategory: "Security & Compliance"
keywords:
- "Image scanning"
- "Image signing"
- "Policy enforcement"
- "Audit"
- "Certificate Management"
content: |
#### What it is
Cloud native applications are designed to be rapidly iterated on. Think of your mobile phone's
continuous flow of app updates — they evolve every day, presumably getting better. In order to
release code on a regular cadence you must ensure that the code and operating environment are
secure and only accessed by authorized engineers. Tools and projects in this section provide
some of the abilities needed to build and run modern applications securely.
#### Problem it addresses
Security and compliance tools help harden, monitor, and enforce platform and application security.
From containers to Kubernetes environments, these tools allow you to set policy (for compliance),
get insights into existing vulnerabilities, catch misconfigurations, and harden the containers and
clusters.
#### How it helps
To run containers securely, containers must be scanned for known vulnerabilities and signed to
ensure they haven't been tampered with. Kubernetes has extremely permissive access control settings
by default that are unsuitable for production use. The result: Kubernetes clusters are an attractive
target for anyone looking to attack your systems. The tools and projects in this space help harden
the cluster and detect when the system is behaving abnormally.
#### Technical 101
* Audit and compliance
* Path to production:
* Code scanning
* Vulnerability scanning
* Image signing
* Policy creation and enforcement
* Network layer security
Some of these tools are rarely used directly. Trivy, Clair, and Notary, for example, are leveraged
by registries or other scanning tools. Others represent key hardening components of a modern
application platform. Examples include Falco or Open Policy Agent (OPA).
You'll find a number of mature vendors providing solutions in this space, as well as startups
founded explicitly to bring Kubernetes native frameworks to market. At the time of this writing
Falco, Notary/TUF, and OPA are CNCF projects in this space.
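For a flavor of what these tools look like in practice, here is a sketch of a Falco-style rule;
the condition relies on the `spawned_process` and `container` macros that ship with Falco's
default rules, and the rule itself is illustrative rather than taken from any ruleset:
```yaml
# Illustrative Falco rule: alert whenever an interactive shell starts
# inside a container, a common sign of abnormal behavior.
- rule: Shell spawned in container
  desc: Detect a shell process started inside a container
  condition: spawned_process and container and proc.name in (bash, sh)
  output: "Shell started in container (user=%user.name container=%container.name)"
  priority: WARNING
```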
- subcategory: "Key Management"
keywords:
- "AuthN and AuthZ"
- "Identity"
- "Access"
- "Secrets"
content: |
#### What it is
Before digging into key management, let's first define cryptographic keys. A key is a string of
characters used to encrypt or sign data. Like a physical key, it locks (encrypts) data so that
only someone with the right key can unlock (decrypt) it.
As applications and operations adapt to a new cloud native world, security tools are evolving to
meet new security needs. The tools and projects in this category cover everything from how to
securely store passwords and other secrets (sensitive data such as API keys, encryption keys, etc.)
to how to safely eliminate passwords and secrets from your microservices environment.
#### Problem it addresses
Cloud native environments are highly dynamic, requiring on-demand secret distribution. That means
it has to be entirely programmatic (no humans in the loop) and automated.
Additionally, applications need to know if a given request comes from a valid source
(authentication) and if that request has the right to do whatever it's trying to do
(authorization). This is commonly referred to as AuthN and AuthZ.
#### How it helps
Each tool or project takes a different approach but they all provide a way to either securely
distribute secrets and keys or a service or specification related to authentication, authorization,
or both.
#### Technical 101
Tools in this category can be grouped into two sets: 1) key generation, storage, management, and
rotation, and 2) single sign-on and identity management. Vault, for example, is a rather generic
key management tool allowing you to manage different types of keys. Keycloak, on the other hand,
is an identity broker which can be used to manage access keys for different services.
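At the most basic level, Kubernetes itself offers a Secret object for distributing sensitive
data to workloads; dedicated tools like Vault build far richer rotation and access controls on
top of this idea. A minimal sketch, with placeholder names and values:
```yaml
# Basic Kubernetes Secret holding an API key. stringData is a write-only
# convenience field; the API server stores it base64-encoded under "data".
apiVersion: v1
kind: Secret
metadata:
  name: payment-api-credentials
type: Opaque
stringData:
  api-key: "replace-me"
```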
- subcategory: "Summary"
content: |
As we've seen, the provisioning layer focuses on building the foundation of your cloud native
platforms and applications with tools handling everything from infrastructure provisioning to
container registries to security. Next, we'll discuss the runtime layer containing cloud native
storage, container runtime, and networking.
- category: "Runtime"
content: |
Now that we've established the foundation of a cloud native environment, we'll move one
infrastructure layer up and zoom into the runtime layer. It encompasses everything a container
needs to run in a cloud native environment. That includes the code used to start a container,
referred to as a container runtime; tools to make persistent storage available to containers;
and those that manage the container environment networks.
But note, these resources are not to be confused with the networking and storage work handled by
the provisioning layer discussed above, which focuses on getting the container platform running.
Tools in this category are used to start and stop containers, help them store data, and allow them
to talk to each other.
subcategories:
- subcategory: "Cloud Native Storage"
keywords:
- "Persistent volume"
- "CSI"
- "Storage API"
- "Backup and restore"
content: |
#### What it is
Storage is where the persistent data of an app is stored, often referred to as a persistent volume.
To function reliably, applications need to have easy access to storage. Generally, when we say
persistent data, we mean storing things like databases, messages, or any other information we want
to ensure doesn't disappear when an app gets restarted.
#### Problem it addresses
Cloud native architectures are fluid, flexible, and elastic, making persisting data between
restarts challenging. To scale up and down or self-heal, containerized apps are continuously
created and deleted, changing physical location over time. That's why cloud native storage must
be provided node-independently. To store data, though, you'll need hardware, a disk to be specific,
and disks, just like any other hardware, are infrastructure-bound — our first big challenge.
Then there is the actual storage interface which can change significantly between datacenters
(in the old world, each infrastructure had its own storage solution with its own interface),
making portability really tough.
And lastly, manual provisioning and autoscaling aren't compatible, so, to benefit from the
elasticity of the cloud, storage must be provisioned automatically.
Cloud native storage is tailored to this new cloud native reality.
#### How it helps
The tools in this category help either:
1. Provide cloud native storage options for containers,
2. Standardize the interfaces between containers and storage providers, or
3. Provide data protection through backup and restore operations.
The first category covers storage that uses a cloud native compatible container storage interface
(the tools in the second category) and that can be provisioned automatically, enabling autoscaling
and self-healing by eliminating the human bottleneck.
#### Technical 101
Cloud native storage is largely made possible by the Container Storage Interface (CSI) which
provides a standard API for providing file and block storage to containers. There are a number
of tools in this category, both open source and vendor-provided, that leverage the CSI to provide
on-demand storage for containers.
Additionally, there are technologies aiming to solve other cloud native storage challenges.
MinIO is a popular project that provides an S3-compatible API for object storage, among other
things. Tools like Velero help simplify the process of backing up and restoring both the
Kubernetes clusters themselves as well as persistent data used by the applications.
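In Kubernetes, on-demand storage is requested through a PersistentVolumeClaim, which a CSI
driver fulfills behind the scenes. A minimal sketch:
```yaml
# Minimal PersistentVolumeClaim: requests 1Gi of storage. With no
# storageClassName set, the cluster's default StorageClass (and the CSI
# driver behind it) provisions the volume dynamically.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```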
- subcategory: "Container Runtime"
keywords:
- "Container"
- "MicroVM"
content: |
#### What it is
As discussed under container registry, a container is a set of compute constraints used to execute
(or launch) an application. Containerized apps believe they are running on their own dedicated
computer and are oblivious that they are sharing resources with other processes
(similar to virtual machines).
The container runtime is the software that executes containerized (or "constrained") applications.
Without the runtime, you only have the container image, the at-rest file specifying what the
containerized app should look like. The runtime will start an app within a container and provide
it with the needed resources.
#### Problem it addresses
Container images (the files with the application specs) must be launched in a standardized, secure,
and isolated way. Standardized because you need standard operating rules no matter where they are
running. Secure, well, because you don't want anyone who shouldn't access it to do so. And isolated
because you don't want the app to affect or be affected by other apps (for instance, if a
co-located application crashes). Isolation basically functions as protection. Additionally, the
application needs to be provided resources, such as CPU, storage, and memory.
#### How it helps
The container runtime does all that. It launches apps in a standardized fashion across all
environments and sets security boundaries. The latter is where some of these tools differ. Runtimes
like CRI-O or gVisor have hardened their security boundaries. The runtime also sets resource limits
for the container. Without it, the app could consume resources as needed, potentially taking
resources away from other apps, so you always need to set limits.
#### Technical 101
Not all tools in this category are created equal. Containerd (part of the famous Docker product)
and CRI-O are standard container runtime implementations. Then there are tools that expand the use
of containers to other technologies, such as Kata which allows you to run containers as VMs. Others
aim at solving a specific container-related problem such as gVisor which provides an additional
security layer between containers and the OS.
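Here's a minimal sketch of how those resource limits are expressed in a Kubernetes pod spec; the
runtime then enforces them for the container (the image is a placeholder):
```yaml
# Pod with explicit CPU and memory requests and limits. The runtime enforces
# the limits so this app cannot starve co-located workloads.
apiVersion: v1
kind: Pod
metadata:
  name: limited-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```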
- subcategory: "Cloud Native Network"
keywords:
- "SDN"
- "Network Overlay"
- "CNI"
content: |
#### What it is
Containers talk to each other and to the infrastructure layer through a cloud native network.
[Distributed applications](https://thenewstack.io/primer-distributed-systems-and-cloud-native-computing/)
have multiple components that use the network for different purposes. Tools in this category create
a virtual network on top of existing networks specifically for apps to communicate, referred to
as an **overlay network**.
#### Problem it addresses
While it's common to refer to the code running in a container as an app, the reality is that most
containers hold only a small specific set of functionalities of a larger application. Modern
applications such as Netflix or Gmail are composed of a number of these smaller components each
running in its own container. To allow all these independent pieces to function as a cohesive
application, containers need to communicate with each other privately. Tools in this category
provide that private communication network.
Data and messages flowing between containers may have sensitive or private data. Because cloud
native networking uses software for controlling, inspecting and modifying data flows, it is a lot
easier to manage, secure and isolate connections between containers. In some cases you may want to
extend your container networks and network policies such as firewall and access rules to allow an
app to connect to virtual machines or services running outside the container network. The
programmable and often declarative nature of cloud native networking makes this possible.
#### How it helps
Projects and products in this category use the Container Network Interface (CNI), a CNCF project,
to provide networking functionalities to containerized applications. Some tools, like Flannel, are
rather minimalist, providing bare-bones connectivity to containers. Others, such as NSX-T, provide a
full software-defined networking layer creating an isolated virtual network for every Kubernetes
namespace.
At a minimum, a container network needs to assign IP addresses to pods (that's where containerized
apps run in Kubernetes), allowing other processes to access it.
#### Technical 101
The variety and innovation in this space are largely made possible by the CNI (similar to storage
and the Container Storage Interface mentioned above). The CNI standardizes the way network layers
provide functionalities to pods. Selecting the right container network for your Kubernetes
environment is critical and you've got a number of tools to choose from. Weave Net, Antrea, Calico,
and Flannel all provide effective open source networking layers. Their functionalities vary widely
and your choice should be ultimately driven by your specific needs.
Numerous vendors support and extend Kubernetes networks with Software Defined Networking (SDN)
tools, providing additional insights into network traffic, enforcing network policies, and even
extending container networks and policies to your broader datacenter.
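Network policies are one common way these capabilities surface to users. A minimal sketch (the
label names are placeholders) that only lets frontend pods reach backend pods:
```yaml
# NetworkPolicy sketch: only pods labeled app=frontend may reach pods
# labeled app=backend, and only on TCP port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```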
- subcategory: "Summary"
content: |
This concludes our overview of the runtime layer which provides all the tools containers need to
run in a cloud native environment:
* Cloud native storage gives apps easy and fast access to data needed to run reliably
* The container runtime creates and starts the containers that execute application code
* Cloud native networking provides connectivity for containerized apps to communicate.
- category: "Orchestration & Management"
content: |
Now that we've covered both the provisioning and runtime layer we can now dive into orchestration
and management. Here you'll find tooling to handle running and connecting your cloud native
applications. This section covers everything from Kubernetes itself, one of the key enablers of
cloud native development, to the infrastructure layers responsible for inter-app and external
communication. Inherently scalable, cloud native apps rely on automation and resilience, enabled
by these tools.
subcategories:
- subcategory: "Scheduling & Orchestration"
keywords:
- "Cluster"
- "Scheduler"
- "Orchestration"
content: |
#### What it is
Orchestration and scheduling refer to running and managing
[containers](https://github.com/cncf/glossary/blob/main/content/en/container.md) across a cluster.
A cluster is a group of machines, physical or virtual, connected over a network (see cloud native
networking).
Container orchestrators (and schedulers) are somewhat similar to the operating system (OS) on your
laptop. The OS manages all your apps, such as Microsoft 365, Slack, and Zoom; executes them; and
schedules when each app gets to use your laptop's hardware resources like CPU, memory and storage.
While running everything on a single machine is great, most applications today are a lot bigger
than one computer can possibly handle. Think Gmail or Netflix. These massive apps are distributed
across multiple machines forming a
[distributed application](https://thenewstack.io/primer-distributed-systems-and-cloud-native-computing/).
Most modern-day applications are built this way, requiring software that is able to manage all
components running across these different machines. In short, you need a "cluster OS." That's
where orchestration tools come in.
You probably noticed that containers come up time and again. Their ability to run apps in many
different environments is key. Container orchestrators, in most cases,
[Kubernetes](https://kubernetes.io/), provide the ability to manage these containers. Containers
and Kubernetes are both central to cloud native architectures, which is why we hear so much about
them.
#### Problem it addresses
As mentioned in the section 'cloud native networking', in cloud native architectures, applications
are broken down into small components, or services, each placed in a container. You may have heard
of them referred to as [microservices](https://github.com/cncf/glossary/blob/main/content/en/microservices-architecture.md).
Instead of having one big app (often known as a 'monolith') you now have dozens or even hundreds
of (micro)services. And each of these services needs resources, monitoring, and fixing if a problem
occurs. While it may be feasible to do all those things manually for a single service, you'll need
automated processes when dealing with multiple services, each with its own containers.
#### How it helps
Container orchestrators automate container management. But what does that mean in practice? Let's
answer that for Kubernetes since it is the de facto container orchestrator.
Kubernetes does something called desired state reconciliation: it matches the current state of
containers within a cluster to the desired state. The desired state is specified by the engineer
(e.g. ten instances of service A running on three nodes, i.e. machines, with access to database B,
etc.) and continuously compared against the actual state. If the desired and actual state don't
match, Kubernetes reconciles them by creating or destroying objects. For example, if a container
crashes, Kubernetes will spin up a new one.
In short, Kubernetes allows you to treat a cluster as one computer. You focus only on what that
environment should look like, and Kubernetes handles the implementation details for you.
#### Technical 101
Kubernetes lives in the orchestration and scheduling section along with other less widely
adopted orchestrators like Docker Swarm and Mesos. It enables users to manage a number of
disparate computers as a single pool of resources in a declarative way.
Declarative configuration management in Kubernetes is handled via
[control loops](https://kubernetes.io/docs/concepts/architecture/controller/), a pattern in which
a process running in Kubernetes monitors the Kubernetes store for a particular object type and
ensures the actual state in the cluster matches the desired state.
As an example, a user creates a Kubernetes deployment that states there must be 3 copies of a web
application. The deployment controller will ensure that those 3 web application containers get
created then continue to monitor the cluster to see if the number of containers is correct. If a
particular container gets removed for any reason the deployment controller will cause a new
instance to be created. Alternatively if the deployment is modified to scale down to 1 web app
instance it will instruct Kubernetes to delete 2 of the running web apps.
This core controller pattern can also be used to extend Kubernetes by users or software developers.
The operator pattern allows people to write custom controllers for custom resources and build any
arbitrary logic and automation into Kubernetes itself.
While Kubernetes isn't the only orchestrator the CNCF hosts (both Crossplane and Volcano are
incubating projects), it is the most commonly used and actively maintained orchestrator.
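The deployment example above looks roughly like this as a manifest (image and labels are
placeholders):
```yaml
# Deployment sketch matching the example above: the deployment controller
# keeps exactly 3 replicas of the web app running at all times.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```
Change `replicas: 3` to `replicas: 1` and re-apply, and the controller deletes two of the running
pods — exactly the reconciliation behavior described above.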
- subcategory: "Coordination & Service Discovery"
keywords:
- "DNS"
- "Service Discovery"
content: |
#### What it is
Modern applications are composed of multiple individual services that need to collaborate to
provide value to the end user. To collaborate, they communicate over a network (see cloud native
networking), and to communicate, they must first locate one another. Service discovery is the
process of figuring out how to do that.
#### Problem it addresses
Cloud native architectures are dynamic and fluid, meaning they are constantly changing. When a
container crashes on one node, a new container is spun up on a different node to replace it. Or,
when an app scales, replicas are spread out throughout the network. There is no one place where a
particular service is — the location of everything is constantly changing. Tools in this category
keep track of services within the network so services can find one another when needed.
#### How it helps
Service discovery tools address this problem by providing a common place to find and potentially
identify individual services. There are basically two types of tools in this category:
1. **Service discovery engines**: database-like tools that store information on all services and
how to locate them
2. **Name resolution tools**: tools that receive service location requests and return network
address information (e.g. CoreDNS)
> ##### INFOBOX
> In Kubernetes, to make a pod reachable a new abstraction layer called "service" is introduced.
> Services provide a single stable address for a dynamically changing group of pods.
>
> Please note that "service" may have different meanings in different contexts, which can be quite
> confusing. The term "services" generally refers to the service placed inside a container and pod.
> It's the app component or microservice with a specific function within the actual app, for
> example your mobile phone's face recognition algorithm.
>
> A Kubernetes service is the abstraction that helps pods find and connect to each other. It is an
> entry point for a service (functionality) as a collection of processes or pods. In Kubernetes,
> when you create a service (abstraction), you create a group of pods which together provide a
> service (functionality within one or more containers) with a single end point (entry point)
> which is the Kubernetes service.
#### Technical 101
As distributed systems became more and more prevalent, traditional DNS processes and load
balancers were often unable to keep up with changing endpoint information. To make up for
these shortcomings, service discovery tools handle individual application instances rapidly
registering and deregistering themselves. Some options such as CoreDNS and etcd are CNCF projects
and are built into Kubernetes. Others have custom libraries or tools to allow services to operate
effectively.
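Here is a minimal sketch of the Kubernetes service abstraction described in the infobox above:
whatever pods currently match the selector are reachable through one stable name, which CoreDNS
resolves inside the cluster. Names and ports are placeholders:
```yaml
# Service sketch: a stable virtual IP and DNS name ("web") for the
# ever-changing set of pods labeled app=web.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```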
- subcategory: "Remote Procedure Call"
keywords:
- "gRPC"
content: |
#### What it is
Remote Procedure Call (RPC) is a particular technique enabling applications to talk to each other.
It's one way of structuring app communication.
#### Problem it addresses
Modern apps are composed of numerous individual services that must communicate in order to
collaborate. RPC is one option for handling the communication between applications.
#### How it helps
RPC provides a tightly coupled and highly opinionated way of handling communication between
services. It allows for bandwidth-efficient communications and many programming languages enable
RPC interface implementations.
#### Technical 101
RPC has a lot of potential benefits: it makes coding connections easier and allows for
extremely efficient use of the network layer and well-structured communication between services.
RPC has also been criticized for creating brittle connection points and forcing users to do
coordinated upgrades for multiple services. gRPC is a particularly popular RPC implementation and
has been adopted by the CNCF.
- subcategory: "Service Proxy"
keywords:
- "Service Proxy"
- "Ingress"
content: |
#### What it is
A service proxy is a tool that intercepts traffic to or from a given service, applies some logic to
it, then forwards that traffic to another service. It essentially acts as a "go-between" that can
collect information about network traffic as well as apply rules to it. This can be as simple as
serving as a load balancer that forwards traffic to individual applications or as complex as
an interconnected mesh of proxies running side by side with individual containerized applications
handling all network connections.
While a service proxy is useful in and of itself, especially when driving traffic from the broader
network into a Kubernetes cluster, service proxies are also building blocks for other systems, such
as API gateways or service meshes, which we'll discuss below.
#### Problem it addresses
Applications should send and receive network traffic in a controlled manner. To keep track of the
traffic and potentially transform or redirect it, we need to collect data. Traditionally, the code
enabling data collection and network traffic management was embedded within each application.
A service proxy "externalizes" this functionality. No longer does it have to live within the app.
Instead, it's embedded in the platform layer (where your apps run). This is incredibly powerful
because it allows developers to fully focus on writing their value-generating application code,
allowing the universal task of handling traffic to be managed by the platform team, whose
responsibility it should be in the first place. Centralizing the distribution and management of
globally needed service functionality such as routing or TLS termination from a single common
location allows communication between services to become more reliable, secure, and performant.
#### How it helps
Proxies act as gatekeepers between the user and services or between different services. With this
unique positioning, they provide insight into what type of communication is happening and can then
determine where to send a particular request or even deny it entirely.
Proxies gather critical data, manage routing (spreading traffic evenly among services or rerouting
if some services break down), encrypt connections, and cache content (reducing resource
consumption).
#### Technical 101
Service proxies work by intercepting traffic between services, applying logic on it, and allowing
it to move on if permitted. Centrally controlled capabilities embedded into proxies allow
administrators to accomplish several things. They can gather detailed metrics about inter-service
communication, protect services from being overloaded, and apply other common standards to
services, like mutual TLS. Service proxies are fundamental to other tools like service meshes as
they provide a way to enforce higher-level policies to all network traffic.
Please note, the CNCF includes load balancers and ingress providers in this category.
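For driving traffic from the broader network into a cluster, the Kubernetes Ingress resource is
a familiar example: it tells the cluster's ingress proxy where to route incoming HTTP requests.
A minimal sketch, with a placeholder hostname and service name:
```yaml
# Ingress sketch: route HTTP traffic for example.com through the cluster's
# ingress proxy to the "web" Service on port 80.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```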
- subcategory: "API Gateway"
keywords:
- "API Gateway"
content: |
#### What it is
While humans generally interact with computer programs via a GUI (graphical user interface) such as
a web page or a desktop application, computers interact with each other through APIs
(application programming interfaces). But an API shouldn't be confused with an API gateway.
An API gateway allows organizations to move key functions, such as authorizing or limiting the
number of requests between applications, to a centrally managed location. It also functions as a
common interface to (often external) API consumers.
#### Problem it addresses
While most containers and core applications have an API, an API gateway is more than just an API.
An API gateway simplifies how organizations manage and apply rules to all interactions.
API gateways allow developers to write and maintain less custom code (the system functionality
is encoded into the API gateway, remember?). They also enable teams to see and control the
interactions between application users and the applications themselves.
#### How it helps
An API gateway sits between the users and the application. It acts as a go-between that takes the
messages (requests) from the users and forwards them to the appropriate service. But before handing
the request off, it evaluates whether the user is allowed to do what they're trying to do and
records details about who made the request and how many requests they've made.
Put simply, an API gateway provides a single point of entry with a common user interface for app
users. It also enables you to hand off tasks otherwise implemented within the app to the gateway,
saving developer time and money.
> ##### EXAMPLE
>
> Take Amazon store cards. To offer them, Amazon partners with a bank that will issue and manage
> all Amazon store cards. In return, the bank will keep, let's say, $1 per transaction. The bank
> will use an API gateway to authorize the retailer to request new cards, keep track of the number
> of transactions for billing, and maybe even restrict the number of requested cards per minute.
> All that functionality is encoded into the gateway, not the services using it. Services just
> worry about issuing cards.
#### Technical 101
Like proxies and service meshes (see below), an API gateway takes custom code out of our apps and
brings it into a central system. The API gateway works by intercepting calls to backend services,
performing some kind of value add activity like validating authorization, collecting metrics,
or transforming requests, then performing whatever action it deems appropriate.
API gateways serve as a common entry point for a set of downstream applications while at the same
time providing a place where teams can inject business logic to handle authorization, rate
limiting, and chargeback. They allow application developers to abstract away changes to their
downstream APIs from their customers and offload tasks like onboarding new customers to the gateway.
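As one concrete way to express gateway routing in Kubernetes (not covered in the text above),
here is a sketch using the Gateway API's HTTPRoute; the gateway and service names are
placeholders:
```yaml
# HTTPRoute sketch (Kubernetes Gateway API): requests under /cards arriving
# at "external-gateway" are forwarded to the card-service backend.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cards-api
spec:
  parentRefs:
    - name: external-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /cards
      backendRefs:
        - name: card-service
          port: 8080
```
Cross-cutting concerns such as authorization or rate limiting are then attached at the gateway,
not coded into card-service itself.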
- subcategory: "Service Mesh"
keywords:
- "Service mesh"
- "Sidecar"
- "Data plane"
- "Control plane"
content: |
#### What it is
Service meshes manage traffic (i.e. communication) between services. They enable platform teams
to add reliability, observability, and security features uniformly across all services running
within a cluster without requiring any code changes.
Along with Kubernetes, service meshes have become some of the most critical infrastructure
components of the cloud native stack.
#### Problem it addresses
In a cloud native world, we are dealing with multiple services all needing to communicate. This
means a lot more traffic is going back and forth on an inherently unreliable and often slow
network. To address this new set of challenges, engineers must implement additional functionality.
Prior to the service mesh, that functionality had to be encoded into every single application.
This custom code often became a source of technical debt and provided new avenues for failures
or vulnerabilities.
#### How it helps
Service meshes add reliability, observability, and security features uniformly across all services
on a platform layer without touching the app code. They are compatible with any programming
language, allowing development teams to focus on writing business logic.
> ##### INFOBOX
>
> Traditionally, these service mesh features had to be coded into each service. Each time
> a new service was released or updated, the developer had to ensure these features were
> functional, too, leaving a lot of room for human error. And here's a dirty little secret:
> developers prefer focusing on business logic (value-generating functionalities) rather than
> building reliability, observability, and security features.
>
> For the platform owners, on the other hand, these are core capabilities and central to everything
> they do. Making developers responsible for adding features that platform owners need is
> inherently problematic. This, by the way, also applies to general-purpose proxies and API
> gateways mentioned above. Service meshes and API gateways solve that very issue as they are
> implemented by the platform owners and applied universally across all services.
#### Technical 101
Service meshes bind all services running on a cluster together via service proxies, creating a mesh
of services, hence the name. These are managed and controlled through the service mesh control
plane. Service meshes allow platform owners to perform common actions or collect data on
applications without having developers write custom logic.
In essence, a service mesh is an infrastructure layer that manages inter-service communications by
providing command and control signals to a network of service proxies (your mesh). Its power lies
in its ability to provide key system functionality without having to modify the applications.
Some service meshes use a general-purpose service proxy (see above) for their data plane. Others
use a dedicated proxy; Linkerd, for example, uses the [Linkerd2-proxy "micro proxy"](https://linkerd.io/)
to gain an advantage in performance and resource consumption. These proxies are uniformly attached
to each service through so-called sidecars. Sidecar refers to the fact that the proxy runs in its
own container but lives in the same pod. Just like a motorcycle sidecar, it's a separate module
attached to the motorcycle, following it wherever it goes.
> ##### EXAMPLE
>
> Take circuit breaking. In microservice environments, individual components often fail or begin
> running slowly. Without a service mesh, developers would have to write custom logic to handle
> downstream failures gracefully and potentially set cooldown timers to prevent upstream services
> from continually requesting responses from degraded or failed downstream services. With a service
> mesh, that logic is handled at a platform level.
>
> Service meshes provide many useful features, including the ability to surface detailed metrics,
> encrypt all traffic, limit what operations are authorized by what service, provide additional
> plugins for other tools, and much more. For more detailed information, check out the
> [service mesh interface](https://smi-spec.io/) specification.
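Joining a mesh is typically a matter of opting workloads in rather than changing code. With
Linkerd, for instance, a single annotation asks the injector to add the sidecar proxy to every
pod created in a namespace (the namespace name is a placeholder):
```yaml
# Namespace sketch: the linkerd.io/inject annotation opts every new pod in
# this namespace into the mesh by injecting the sidecar proxy.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  annotations:
    linkerd.io/inject: enabled
```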
- subcategory: "Summary"
content: |
As we've seen, tools in this layer deal with how all these independent containerized services are
managed as a group. Orchestration and scheduling tools can be thought of as a cluster OS
managing containerized applications across your cluster. Coordination and service discovery,
service proxies, and service meshes ensure services can find each other and communicate effectively
in order to collaborate as one cohesive app. API gateways are an additional layer providing even
more control over service communication, in particular between external applications. Next, we'll
discuss the application definition and development layer — the last layer of the CNCF landscape. It
covers databases, streaming and messaging, application definition, and image build, as well as
continuous integration and delivery.
- category: "App Definition and Development"
content: |
Everything we have discussed up to this point was related to building a reliable, secure environment
and providing all needed dependencies. We've now arrived at the top layer of the CNCF cloud
native landscape. As the name suggests, the application definition and development layer focuses
on the tools that enable engineers to build apps.
subcategories:
- subcategory: "Database"
keywords:
- "SQL"
- "DB"
- "Persistence"
content: |
#### What it is
A database is an application through which other apps can efficiently store and retrieve data.
Databases allow you to store data, ensure only authorized users access it, and enable users to
retrieve it via specialized requests. While there are numerous different types of databases with
different approaches, they ultimately all have these same overarching goals.
#### Problem it addresses
Most applications need an effective way to store and retrieve data while keeping that data safe.
Databases do this in a structured way with proven technology though there is quite a bit of
complexity that goes into doing this well.
#### How it helps
Databases provide a common interface for applications to store and retrieve data. Developers use
these standard interfaces and a relatively simple query language to store, query, and retrieve
information. At the same time, databases allow users to continuously backup and save data, as
well as encrypt and regulate access to it.
#### Technical 101
Databases are apps that store and retrieve data, using a common language and interface compatible
with a number of different languages and frameworks.
In general, there are two common types of databases: structured query language (SQL) databases and
NoSQL databases. Which database a particular application uses should be driven by its needs and
constraints.
With the rise of Kubernetes and its ability to support stateful applications, we've seen a new
generation of databases take advantage of containerization. These new cloud native databases aim
to bring the scaling and availability benefits of Kubernetes to databases. Tools like YugabyteDB
and Couchbase are examples of cloud native databases, although more traditional databases like
MySQL and Postgres run successfully and effectively in Kubernetes clusters.
Vitess and TiKV are CNCF projects in this space.
> ##### INFOBOX
>
> If you look at this category, you'll notice multiple names ending in DB (e.g. MongoDB,
> CockroachDB, FaunaDB) which, as you may guess, stands for database. You'll also see various
> names ending in SQL (e.g. MySQL or memSQL) — they are still relevant. Some are "old school"
> databases that have been adapted to a cloud native reality. There are also some databases that
> are NoSQL but SQL-compatible, such as YugabyteDB and Vitess.
- subcategory: "Streaming & Messaging"
keywords:
- "Choreography"
- "Streaming"
- "MQ"
- "Message bus"
content: |
#### What it is
To accomplish a common goal, services need to communicate with one another and keep each other in
the loop. Each time a service does something, it sends a message about that particular event.
Streaming and messaging tools enable service-to-service communication by transporting messages
(i.e. events) between systems. Individual services connect to the messaging service to either
publish events, read messages from other services, or both. This dynamic creates an environment
where individual apps are either publishers, meaning they write events, or subscribers that read
events, or more likely both.
#### Problem it addresses
As services proliferate, application environments become increasingly complex, making the
management of communication between apps more challenging. A streaming or messaging platform
provides a central place to publish and read all the events that occur within a system,
allowing applications to work together without necessarily knowing anything about one another.
#### How it helps
When a service does something other services should know about, it "publishes" an event to the
streaming or messaging tool. Services that need to know about these types of events "subscribe"
and watch the streaming or messaging tool. That's the essence of a publish-subscribe, or just
pub-sub, approach and is enabled by these tools.
By introducing a "go-between" layer that manages all communication, we are decoupling services
from one another. They simply watch for events, take action, and publish a new one.
Here's an example. When you first sign up for Netflix, the "signup" service publishes a "new signup
event" to a messaging platform with further details such as name, email address, subscription
level, etc. The "account creator" service, which subscribes to signup events, will see the event and
create your account. A "customer communication" service that also subscribes to new signup
events will add your email address to the customer mailing list and generate a welcome email,
and so on.
This allows for a highly decoupled architecture where services can collaborate without needing to
know about one another. This decoupling enables engineers to add new functionality without
updating downstream apps, known as consumers, or sending a bunch of queries. The more decoupled a
system is, the more flexible and amenable it is to change. And that is exactly what engineers
strive for in a system.
#### Technical 101
Messaging and streaming tools have been around since long before cloud native became a thing. To
centrally manage business-critical events, organizations have built large enterprise service
buses. But when we talk about messaging and streaming in a cloud native context, we're generally
referring to tools like NATS, RabbitMQ, Kafka, or cloud provided message queues.
What these systems have in common are the architecture patterns they enable. Application
interactions in a cloud native environment are either orchestrated or choreographed. There's a
lot more to it, but let's just say that orchestrated refers to systems that are centrally managed,
and choreographed systems allow individual components to act independently.
Messaging and streaming systems provide a central place for choreographed systems to communicate.
The message bus provides a common place where all apps can go to tell others what they're doing
by publishing messages, or see what's going on by subscribing to messages.
The NATS and CloudEvents projects are both incubating CNCF projects in this space. NATS provides a
mature messaging system and CloudEvents is an effort to standardize message formats between
systems. Strimzi, Pravega, and Tremor are sandbox projects with each being tailored to a unique
use case around streaming and messaging.
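To picture what travels over these systems, here is the hypothetical "new signup" event from the
Netflix example above, shaped as a CloudEvents message. It is shown as YAML for readability; on
the wire it would typically be JSON, and all values are made up:
```yaml
# Hypothetical CloudEvents message for the "new signup" example.
specversion: "1.0"
type: com.example.signup.created
source: /signup-service
id: "0001"
time: "2024-01-01T12:00:00Z"
datacontenttype: application/json
data:
  name: Jane Doe
  email: jane@example.com
  plan: premium
```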
- subcategory: "Application Definition & Image Build"
keywords:
- "Package Management"
- "Charts"
- "Operators"
content: |
#### What it is
Application definition and image build is a broad category that can be broken down into two main
subgroups. First, developer-focused tools that help build application code into containers and/or
Kubernetes. And second, operations-focused tools that deploy apps in a standardized way. Whether
you intend to speed up or simplify your development environment, provide a standardized way to
deploy third-party apps, or wish to simplify the process of writing a new Kubernetes extension,
this category serves as a catch-all for a number of projects and products that optimize the
Kubernetes developer and operator experience.
#### Problem it addresses
Kubernetes, and containerized environments more generally, are incredibly flexible and powerful.
With that flexibility also comes complexity, mainly in the form of multiple configuration options
as well as multiple demands for the various use cases. Developers need the ability to create
reproducible images when they containerize their code. Operators need a standardized way to deploy
apps into container environments, and finally, platform teams need to provide tools to simplify
image creation and application deployment, both for in-house and third party applications.
#### How it helps
Tools in this space aim to solve some of these developer or operator challenges. On the developer
side, there are tools that simplify the process of extending Kubernetes to build, deploy, and
connect applications. A number of projects and products help to store or deploy pre-packaged apps.
These allow operators to quickly deploy a streaming service like NATS or Kafka or install a service
mesh like Linkerd.
Developing cloud native applications brings a whole new set of challenges calling for a large set
of diverse tools to simplify application build and deployments. As you start addressing operational
and developer concerns in your environment, look for tools in this category.
#### Technical 101
Application definition and build tools encompass a huge range of functionality. From extending
Kubernetes to virtual machines with KubeVirt, to speeding app development by allowing you to port
your development environment into Kubernetes with tools like Telepresence. At a high level, tools
in this space solve either developer-focused concerns, like how to correctly write, package, test,
or run custom apps, or operations-focused concerns, such as deploying and managing applications.
Helm, the only graduated project in this category, underpins many app deployment patterns. Helm
allows Kubernetes users to deploy and customize many popular third-party apps, and it has been
adopted by other projects like the Artifact Hub (a CNCF sandbox project). Companies like Bitnami
also provide curated catalogs of apps. Finally, Helm is flexible enough to allow users to customize
their own app deployments and is often used by organizations for their own internal releases.
The Operator Framework is an incubating project aimed at simplifying the process of building and
deploying operators. Operators are out of scope for this guide but let's note here that they help
deploy and manage apps, similar to Helm (you can read more about operators
[here](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)). Cloud Native Buildpacks,
another incubating project, aims to simplify the process of building application code into
containers.
There's a lot more in this space and exploring it all would require a dedicated chapter. But
research these tools further if you want to make Kubernetes easier for developers and operators.
You'll likely find something that meets your needs.
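For a sense of scale, the entry point of a Helm chart is a small metadata file; the templates and
values that accompany it do the heavy lifting. A minimal sketch, with placeholder fields:
```yaml
# Minimal Helm Chart.yaml for a hypothetical internal app. Together with its
# templates/ directory, this makes the app installable with "helm install".
apiVersion: v2
name: demo-app
description: Example chart for an internal web application
version: 0.1.0
appVersion: "1.0.0"
```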
- subcategory: "Continuous Integration & Delivery"
keywords:
- "CI/CD"
- "Continuous integration"
- "Continuous delivery"
- "Continuous deployment"
- "Blue/green"
- "Canary deploy"
content: |
#### What it is
Continuous integration (CI) and continuous delivery (CD) tools enable fast and efficient development
with embedded quality assurance. CI automates the integration of code changes by immediately
building and testing the code, ensuring it produces a deployable artifact. CD goes one step further
and pushes the artifact through the deployment phases.
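A minimal CI pipeline, sketched here in GitHub Actions syntax as one common example (the make
targets are placeholders), builds and tests every pushed commit:
```yaml
# Illustrative CI workflow: every push triggers a build and test run,
# so broken changes surface immediately.
name: ci
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build
      - name: Test
        run: make test
```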