Propose release 1.5.4 #534

Merged on Mar 5, 2024
49 commits
87686b0
Fix xrefs for director Operator (#481)
leifmadsen Jun 19, 2023
be5cec0
Initial pass for external ES (#483)
csibbitt Aug 22, 2023
982cec2
Trivial leftover suggestions (#485)
csibbitt Aug 22, 2023
1f7aca7
Link ES section to KB article (#486)
csibbitt Sep 5, 2023
564130c
Initial changes to installation for STF 1.5.3 (#484)
leifmadsen Sep 5, 2023
4856eda
use_redhat and migration link (#462)
csibbitt Sep 7, 2023
b248a57
Override qdr::router_id defaults in stf-connectors (#487)
leifmadsen Sep 11, 2023
563a551
Don't enable event collection by default on OSP (#488)
leifmadsen Sep 12, 2023
72d91b6
No longer import the events dashboard (#490)
leifmadsen Sep 13, 2023
e2ba966
Installation of cluster monitoring is no longer necessary (#491)
leifmadsen Sep 14, 2023
ae4bbb9
Adjust the default polling interval for collectd (#489)
leifmadsen Sep 21, 2023
a17f01e
Remove logs configuration from sample CR (#493)
leifmadsen Sep 26, 2023
720ac91
mg_master_RHOSPDOC-1380_chunk-installation-procedure (#492)
mickogeary Sep 26, 2023
428b8ae
Reduce the number of Ceilometer pollsters (#497)
leifmadsen Sep 28, 2023
bd2ba0c
Deprecate the use of high availability mode in STF (#494)
leifmadsen Sep 28, 2023
56ba82e
Fix up the table syntax in Observability Strategy (#495)
leifmadsen Sep 28, 2023
24f9d92
Do not manage the event pipeline by default (#498)
leifmadsen Sep 28, 2023
e99b8a1
Minor clean up and user experience updates (#496)
leifmadsen Oct 2, 2023
ac64c72
Creating an alert does not use curl (#500)
leifmadsen Oct 3, 2023
a5a82aa
Eliminate duplicate line (#501)
csibbitt Oct 4, 2023
3e400fc
Adding details for QDR password auth (#502)
csibbitt Oct 12, 2023
13e7f21
Support OCP versions 4.12 through 4.14 (#503)
leifmadsen Oct 24, 2023
3802f4a
Summary: Replace incorrect stf-connectors.yaml filename with enable-s…
rheslop Oct 24, 2023
f94a996
Clean up the STF install (#505)
leifmadsen Oct 24, 2023
741aba4
Provide the preferred STF object for deployment (#507)
leifmadsen Oct 24, 2023
51e894c
Fix various RHOSP links and versions (#508)
leifmadsen Oct 25, 2023
0c88867
Update and adjust dashboard procedures (#509)
leifmadsen Oct 25, 2023
0c5a236
Add deprecation note for Grafana authentication (#510)
leifmadsen Oct 25, 2023
c8386ab
Update deprecated Grafana login warning (#511)
leifmadsen Oct 25, 2023
e2f1961
Add updated architecture diagrams (#499)
leifmadsen Nov 3, 2023
9c79b75
Update install guide for dependent operators (#513)
leifmadsen Nov 23, 2023
b7ef9b6
Clean up the prerequisites lists (#514)
leifmadsen Nov 23, 2023
0cad5de
Add removal instructions for COO (#516)
leifmadsen Nov 28, 2023
66bd308
Refer to cert-manager removal documentation (#515)
leifmadsen Nov 28, 2023
86aec74
Pre-STF 1.5.3 Documentation Walkthrough and Cleanup (#517)
leifmadsen Nov 30, 2023
bd3472c
Modularize STF architecture changes (#518)
leifmadsen Nov 30, 2023
d098587
Update diagrams for Cluster Observability Operator (#519)
leifmadsen Dec 1, 2023
c1d1be8
mg_master_517_minor-style-edits (#521)
mickogeary Dec 5, 2023
0e56add
Reference 17.1 in docinfo.xml (#522)
leifmadsen Dec 5, 2023
36d54d1
PrometheusRules must reference monitoring.rhobs (#523)
leifmadsen Dec 5, 2023
eea657b
Basic Auth in Grafana no longer supported (#525)
csibbitt Dec 7, 2023
3b88889
Adjust prometheus query to use token (#520)
csibbitt Dec 12, 2023
e534bf3
Update installation to target Grafana Operator v5 (#526)
leifmadsen Jan 30, 2024
0d5626e
Add enable dashboard procedure (#527)
leifmadsen Jan 31, 2024
8e1b8fa
Update OCP version support status (#529)
leifmadsen Feb 8, 2024
d8e2c12
Update required resource permission reference (#528)
leifmadsen Feb 8, 2024
8477306
Drop unused module found in other issue (#533)
leifmadsen Mar 1, 2024
fe90630
mg-master_RHOSPDOC-1200_STF-disconnected (#531)
mickogeary Mar 5, 2024
9f9f050
Merge branch 'origin/master' into stable-1.5
elfiesmelfie Mar 5, 2024
2 changes: 0 additions & 2 deletions common/global/stf-attributes.adoc
Original file line number Diff line number Diff line change
@@ -39,7 +39,6 @@ endif::[]
ifeval::["{build}" == "upstream"]
:ObservabilityOperator: Observability{nbsp}Operator
:OpenShift: OpenShift
:OpenShiftShort: OKD
:OpenStack: OpenStack
:OpenStackShort: OSP
:OpenStackVersion: Wallaby
@@ -58,7 +57,6 @@ endif::[]
ifeval::["{build}" == "downstream"]
:ObservabilityOperator: Cluster{nbsp}Observability{nbsp}Operator
:OpenShift: Red{nbsp}Hat{nbsp}OpenShift{nbsp}Container{nbsp}Platform
:OpenShiftShort: OCP
:OpenStack: Red{nbsp}Hat{nbsp}OpenStack{nbsp}Platform
:OpenStackShort: RHOSP
:OpenStackVersion: 17.1

@@ -18,17 +18,11 @@ ifdef::include_when_16[]
* xref:container-health-and-api-status_assembly-advanced-features[Monitoring container health and API status]
endif::include_when_16[]


//Dashboards
include::../modules/con_dashboards.adoc[leveloffset=+1]
include::../modules/proc_setting-up-grafana-to-host-the-dashboard.adoc[leveloffset=+2]
ifdef::include_when_16[]
// TODO: either rewrite or drop this procedure. We now provide the preferred downstream RHEL Grafana workload image in the deployment procedure.
//include::../modules/proc_overriding-the-default-grafana-container-image.adoc[leveloffset=+2]
include::../modules/proc_importing-dashboards.adoc[leveloffset=+2]
endif::include_when_16[]
include::../modules/proc_retrieving-and-setting-grafana-login-credentials.adoc[leveloffset=+2]

include::../modules/proc_connecting-an-external-dashboard-system.adoc[leveloffset=+2]

//Editing the metrics retention time period
include::../modules/con_metrics-retention-time-period.adoc[leveloffset=+1]
@@ -69,13 +63,10 @@ include::../modules/con_resource-usage-of-openstack.adoc[leveloffset=+1]
include::../modules/proc_disabling-resource-usage-monitoring-of-openstack-services.adoc[leveloffset=+2]

//Monitoring container health

include::../modules/con_container-health-and-api-status.adoc[leveloffset=+1]
include::../modules/proc_disabling-container-health-and-api-status-monitoring.adoc[leveloffset=+2]
endif::include_when_16[]



//reset the context
ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]

@@ -21,18 +21,18 @@ ifeval::["{SupportedOpenShiftVersion}" == "{NextSupportedOpenShiftVersion}"]
* {OpenShift} version {SupportedOpenShiftVersion} is running.
endif::[]
ifeval::["{SupportedOpenShiftVersion}" != "{NextSupportedOpenShiftVersion}"]
* An {OpenShift} version inclusive of {SupportedOpenShiftVersion} through {NextSupportedOpenShiftVersion} is running.
* An {OpenShift} Extended Update Support (EUS) release version {SupportedOpenShiftVersion} or {NextSupportedOpenShiftVersion} is running.
endif::[]
* You have prepared your {OpenShift} environment and ensured that there is persistent storage and enough resources to run the {ProjectShort} components on top of the {OpenShift} environment. For more information about {ProjectShort} performance, see the Red Hat Knowledge Base article https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling].
* Your environment is fully connected. {ProjectShort} does not work in a {OpenShift}-disconnected environments or network proxy environments.
* You are deploying {ProjectShort} in a fully connected or {OpenShift}-disconnected environment. {ProjectShort} is unavailable in network proxy environments.

ifeval::["{build}" == "downstream"]
[IMPORTANT]
ifeval::["{SupportedOpenShiftVersion}" == "{NextSupportedOpenShiftVersion}"]
{ProjectShort} is compatible with {OpenShift} version {SupportedOpenShiftVersion}
endif::[]
ifeval::["{SupportedOpenShiftVersion}" != "{NextSupportedOpenShiftVersion}"]
{ProjectShort} is compatible with {OpenShift} version {SupportedOpenShiftVersion} through {NextSupportedOpenShiftVersion}.
{ProjectShort} is compatible with {OpenShift} versions {SupportedOpenShiftVersion} and {NextSupportedOpenShiftVersion}.
endif::[]
endif::[]
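The EUS-only requirement can be checked against a running cluster. The following is a minimal sketch rather than part of the documented procedure: the function name `is_eus_supported` is hypothetical, and its second and third arguments stand in for whatever values `{SupportedOpenShiftVersion}` and `{NextSupportedOpenShiftVersion}` resolve to in your documentation build. It compares only the major and minor parts of the cluster version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the cluster's major.minor version
# equals one of the two supported EUS versions passed as arguments.
# Usage: is_eus_supported CLUSTER_VERSION SUPPORTED_VERSION NEXT_SUPPORTED_VERSION
is_eus_supported() {
  local cluster="$1" supported="$2" next="$3"
  local major_minor
  # Reduce a full version such as 4.14.8 to its major.minor part, for example 4.14.
  major_minor=$(printf '%s\n' "$cluster" | cut -d. -f1,2)
  [ "$major_minor" = "$supported" ] || [ "$major_minor" = "$next" ]
}
```

You can pass it the version recorded in the cluster's `ClusterVersion` object, for example `is_eus_supported "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" "$SUPPORTED" "$NEXT"`, where `SUPPORTED` and `NEXT` are hypothetical variables that hold the two supported versions.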

@@ -42,6 +42,7 @@ endif::[]
* For more information about Operator catalogs, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/operators/understanding/olm-rh-catalogs.html[_Red Hat-provided Operator catalogs_].
* For more information about the cert-manager Operator for Red Hat, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/security/cert_manager_operator/index.html[_cert-manager Operator for Red Hat OpenShift overview_].
* For more information about {ObservabilityOperator}, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/monitoring/cluster_observability_operator/cluster-observability-operator-overview.html[_Cluster Observability Operator Overview_].
* For more information about OpenShift life cycle policy and Extended Update Support (EUS), see https://access.redhat.com/support/policy/updates/openshift[_Red Hat OpenShift Container Platform Life Cycle Policy_].

include::../modules/con_deploying-stf-to-the-openshift-environment.adoc[leveloffset=+1]

@@ -31,7 +31,7 @@ ifeval::["{SupportedOpenShiftVersion}" == "{NextSupportedOpenShiftVersion}"]
{ProjectShort} is compatible with {OpenShift} version {SupportedOpenShiftVersion}
endif::[]
ifeval::["{SupportedOpenShiftVersion}" != "{NextSupportedOpenShiftVersion}"]
{ProjectShort} is compatible with {OpenShift} version {SupportedOpenShiftVersion} through {NextSupportedOpenShiftVersion}.
{ProjectShort} is compatible with {OpenShift} Extended Update Support (EUS) release versions {SupportedOpenShiftVersion} and {NextSupportedOpenShiftVersion}.
endif::[]
endif::[]

@@ -40,6 +40,7 @@ endif::[]
* https://access.redhat.com/documentation/en-us/openshift_container_platform/{NextSupportedOpenShiftVersion}/[{OpenShift} product documentation]
* https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling]
* https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/welcome/index.html#cluster-installer-activities[OpenShift Container Platform {NextSupportedOpenShiftVersion} Documentation]
* https://access.redhat.com/support/policy/updates/openshift[Red Hat OpenShift Container Platform Life Cycle Policy]



@@ -10,7 +10,6 @@ To prepare your {OpenShift} environment for {Project} ({ProjectShort}), you must

* Ensure that you have persistent storage available in your {OpenShift} cluster for a production-grade deployment. For more information, see <<persistent-volumes_assembly-preparing-your-ocp-environment-for-stf>>.
* Ensure that enough resources are available to run the Operators and the application containers. For more information, see <<resource-allocation_assembly-preparing-your-ocp-environment-for-stf>>.
* Ensure that you have a fully connected network environment. For more information, see xref:con-network-considerations-for-service-telemetry-framework_assembly-preparing-your-ocp-environment-for-stf[].

include::../modules/con_observability-strategy.adoc[leveloffset=+1]
include::../modules/con_persistent-volumes.adoc[leveloffset=+1]

@@ -7,7 +7,6 @@ Use the third-party application, Grafana, to visualize system-level metrics that
For more information about configuring data collectors, see xref:configuring-red-hat-openstack-platform-overcloud-for-stf_assembly-completing-the-stf-configuration[].

ifdef::include_when_16[]
//TODO: can re-work this once we have OSP13 dashboard(s) to show. Can't use container health checks or monitoring in OSP13.
You can use dashboards to monitor a cloud:

Infrastructure dashboard::

@@ -26,7 +26,7 @@
= Customizing the deployment

[role="_abstract"]
The Service Telemetry Operator watches for a `ServiceTelemetry` manifest to load into {OpenShift} ({OpenShiftShort}). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
The Service Telemetry Operator watches for a `ServiceTelemetry` manifest to load into {OpenShift}. The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.

[WARNING]
====

@@ -3,4 +3,4 @@
[id="con-network-considerations-for-service-telemetry-framework_{context}"]
= Network considerations for Service Telemetry Framework

You can only deploy {Project} ({ProjectShort}) in a fully connected network environment. You cannot deploy {ProjectShort} in {OpenShift}-disconnected environments or network proxy environments.
You can deploy {Project} ({ProjectShort}) in fully connected network environments or in {OpenShift}-disconnected environments. You cannot deploy {ProjectShort} in network proxy environments.

@@ -87,10 +87,12 @@ ifeval::["{SupportedOpenShiftVersion}" == "{NextSupportedOpenShiftVersion}"]
* {OpenShift} {SupportedOpenShiftVersion}
endif::[]
ifeval::["{SupportedOpenShiftVersion}" != "{NextSupportedOpenShiftVersion}"]
* {OpenShift} {SupportedOpenShiftVersion} through {NextSupportedOpenShiftVersion}
* {OpenShift} Extended Update Support (EUS) releases {SupportedOpenShiftVersion} and {NextSupportedOpenShiftVersion}
endif::[]
* Infrastructure platform

For more information about the {OpenShift} EUS releases, see link:https://access.redhat.com/support/policy/updates/openshift[Red Hat OpenShift Container Platform Life Cycle Policy].

[[osp-stf-server-side-monitoring]]
.Server-side STF monitoring infrastructure
image::363_OpenStack_STF_updates_0923_deployment_prereq.png[Server-side STF monitoring infrastructure]

@@ -4,6 +4,6 @@
[role="_abstract"]
Red Hat supports the core Operators and workloads, including {MessageBus}, {ObservabilityOperator} (Prometheus, Alertmanager), Service Telemetry Operator, and Smart Gateway Operator. Red Hat does not support the community Operators or workload components, inclusive of Elasticsearch, Grafana, and their Operators.

You can only deploy {ProjectShort} in a fully connected network environment. You cannot deploy {ProjectShort} in {OpenShift}-disconnected environments or network proxy environments.
You can deploy {Project} ({ProjectShort}) in fully connected network environments or in {OpenShift}-disconnected environments. You cannot deploy {ProjectShort} in network proxy environments.

For more information about {ProjectShort} life cycle and support status, see the https://access.redhat.com/node/6225361[{Project} Supported Version Matrix].

@@ -4,7 +4,18 @@
[role="_abstract"]
In {OpenShift}, applications are exposed to the external network through a route. For more information about routes, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/networking/configuring_ingress_cluster_traffic/overview-traffic.html[Configuring ingress cluster traffic].

In {Project} ({ProjectShort}), HTTPS routes are exposed for each service that has a web-based interface. These routes are protected by {OpenShift} RBAC and any user that has a `ClusterRoleBinding` that enables them to view {OpenShift} Namespaces can log in. For more information about RBAC, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/authentication/using-rbac.html[Using RBAC to define and apply permissions].
In {Project} ({ProjectShort}), HTTPS routes are exposed for each service that has a web-based interface. These routes are protected by {OpenShift} role-based access control (RBAC).

You need the following permissions to access the corresponding component UIs:

[source,json,options="nowrap"]
----
{"namespace":"service-telemetry", "resource":"grafana", "group":"grafana.integreatly.org", "verb":"get"}
{"namespace":"service-telemetry", "resource":"prometheus", "group":"monitoring.rhobs", "verb":"get"}
{"namespace":"service-telemetry", "resource":"alertmanager", "group":"monitoring.rhobs", "verb":"get"}
----

For more information about RBAC, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/authentication/using-rbac.html[Using RBAC to define and apply permissions].
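To check in advance whether your user holds these permissions, you can ask the API server with `oc auth can-i`. The following is a minimal sketch, assuming STF runs in the `service-telemetry` namespace; the function name `check_stf_ui_access` and the `NAMESPACE` variable are hypothetical. Because `oc auth can-i` resolves resource names through API discovery, its answer approximates, but is not guaranteed to match, the check that each route performs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user holds the "get"
# permission for each STF UI resource listed above (resource.group form).
NAMESPACE="${NAMESPACE:-service-telemetry}"

check_stf_ui_access() {
  local resource
  for resource in grafana.grafana.integreatly.org \
                  prometheus.monitoring.rhobs \
                  alertmanager.monitoring.rhobs; do
    # oc auth can-i exits 0 when the action is allowed and non-zero otherwise;
    # its printed yes/no answer is discarded here.
    if oc auth can-i get "$resource" -n "$NAMESPACE" >/dev/null 2>&1; then
      echo "allowed: $resource"
    else
      echo "denied:  $resource"
    fi
  done
}
```

Run `check_stf_ui_access` after you log in with `oc login`. Any `denied` line points to a missing role binding for that component.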

.Procedure

@@ -24,7 +24,7 @@ EOF
+
[source,bash]
----
$ for o in alertmanager/default prometheus/default elasticsearch/elasticsearch grafana/default; do oc delete $o; done
$ for o in alertmanagers.monitoring.rhobs/default prometheuses.monitoring.rhobs/default elasticsearch/elasticsearch grafana/default-grafana; do oc delete $o; done
----
+
. To verify that all workloads are operating correctly, view the pods and the status of each pod:

@@ -0,0 +1,77 @@

[id="connecting-an-external-dashboard-system_{context}"]
= Connecting an external dashboard system

You can configure third-party visualization tools to connect to the {ProjectShort} Prometheus to retrieve metrics. Access is controlled by an OAuth token. {ProjectShort} already creates the `stf-prometheus-reader` service account, which has only the required permissions, and you can generate a new token for this account for the external system to use.

To use the authentication token, you must configure the third-party tool to supply an HTTP Bearer Token `Authorization` header as described in RFC 6750. For information about how to configure this header, see the documentation for your third-party tool, for example, link:https://grafana.com/docs/grafana/latest/datasources/prometheus/configure-prometheus-data-source/#custom-http-headers[Configure Prometheus - Custom HTTP Headers] in the _Grafana Documentation_.

.Procedure

. Log in to {OpenShift}.

. Change to the `service-telemetry` namespace:
+
[source,bash]
----
$ oc project service-telemetry
----

. Create a new token secret for the `stf-prometheus-reader` service account:
+
[source,bash]
----
$ oc create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-prometheus-reader-token
  namespace: service-telemetry
  annotations:
    kubernetes.io/service-account.name: stf-prometheus-reader
type: kubernetes.io/service-account-token
EOF
----

. Retrieve the token from the secret:
+
[source,bash]
----
$ TOKEN=$(oc get secret my-prometheus-reader-token -o template='{{.data.token}}' | base64 -d)
----

. Retrieve the Prometheus host name:
+
[source,bash]
----
$ PROM_HOST=$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')
----

. Test the access token:
+
[source,bash]
----
$ curl -k -H "Authorization: Bearer ${TOKEN}" "https://${PROM_HOST}/api/v1/query?query=up"

{"status":"success",[...]
----

. Configure your third-party tool with the `PROM_HOST` and `TOKEN` values from the previous steps:
+
[source,bash]
----
$ echo $PROM_HOST
$ echo $TOKEN
----

. The token remains valid for as long as the secret exists. To revoke the token, delete the secret:
+
[source,bash]
----
$ oc delete secret my-prometheus-reader-token
secret "my-prometheus-reader-token" deleted
----
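For Grafana specifically, the custom header described in the linked _Configure Prometheus - Custom HTTP Headers_ page can also be set in a data source provisioning file. The following is a minimal sketch, assuming the `PROM_HOST` and `TOKEN` variables from the procedure are set; the function name `write_grafana_datasource`, the output path, and the data source name `STF Prometheus` are hypothetical. The `tlsSkipVerify: true` setting mirrors the `curl -k` used in the test step and is not advisable when you can supply the cluster CA instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file that sends
# the STF Prometheus token as an "Authorization: Bearer <token>" header.
# Argument: output file path. Reads PROM_HOST and TOKEN from the environment.
write_grafana_datasource() {
  local out="$1"
  if [ -z "${PROM_HOST:-}" ] || [ -z "${TOKEN:-}" ]; then
    echo "PROM_HOST and TOKEN must be set" >&2
    return 1
  fi
  cat > "$out" <<EOF
apiVersion: 1
datasources:
  - name: STF Prometheus
    type: prometheus
    access: proxy
    url: https://${PROM_HOST}
    jsonData:
      # Name of the custom header; its value goes in secureJsonData.
      httpHeaderName1: Authorization
      # Mirrors curl -k from the test step; prefer trusting the cluster CA.
      tlsSkipVerify: true
    secureJsonData:
      httpHeaderValue1: Bearer ${TOKEN}
EOF
}
```

For example, `write_grafana_datasource stf-datasource.yaml` writes the file to the current directory. Grafana stores `secureJsonData` values encrypted, which is why the token goes there rather than in `jsonData`.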

.Additional information

For more information about service account token secrets, see link:https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/nodes/pods/nodes-pods-secrets.html#nodes-pods-secrets-creating-sa_nodes-pods-secrets[Creating a service account token secret] in the _OpenShift Container Platform Documentation_.
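When you query the endpoint by hand, PromQL expressions often contain characters such as `[`, `]`, `{`, `}`, and spaces that must be URL-encoded. The following is a minimal sketch, assuming the `PROM_HOST` and `TOKEN` variables from the procedure are set; the function name `prom_query` is hypothetical. It relies on the `curl` options `--get` and `--data-urlencode` to build the `query` parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query against the STF Prometheus
# route with the bearer token. Reads PROM_HOST and TOKEN from the environment.
prom_query() {
  local query="$1"
  # --get moves the --data-urlencode payload into the URL as ?query=...,
  # percent-encoding characters that PromQL expressions commonly contain.
  # -k skips TLS verification, as in the procedure's test step.
  curl -sk --get \
    -H "Authorization: Bearer ${TOKEN}" \
    --data-urlencode "query=${query}" \
    "https://${PROM_HOST}/api/v1/query"
}
```

For example, `prom_query 'up'` returns the same kind of response as the test step, and `prom_query 'rate(sg_total_collectd_msg_received_count[1m])'` evaluates the expression used by the sample alert rule without manual encoding.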

@@ -43,7 +43,7 @@ To change the rule, edit the value of the `expr` parameter.
+
[source,bash,options="nowrap"]
----
$ curl -k --user "internal:$(oc get secret default-prometheus-htpasswd -ogo-template='{{ .data.password | base64decode }}')" https://$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')/api/v1/rules
$ curl -k -H "Authorization: Bearer $(oc create token stf-prometheus-reader)" https://$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"state":"inactive","name":"Collectd metrics receive count is zero","query":"rate(sg_total_collectd_msg_received_count[1m]) == 0","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","evaluationTime":0.00034627,"lastEvaluation":"2021-12-07T17:23:22.160448028Z","type":"alerting"}],"interval":30,"evaluationTime":0.000353787,"lastEvaluation":"2021-12-07T17:23:22.160444017Z"}]}}
----