You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Step 8&9: Error from server (NotFound): services "topology-monitor" not found.
I couldn't find any instruction for topology-monitor. If there are any, please let me know.
Some astronomy_shop pods also got OOMKilled.
It also prevents deployment.
How I patch this:
adService: increase the memory limit to 500Mi.
frauddetectionService: increase the memory limit to 500Mi.
kafka: increase the memory limit to 700Mi.
recommendationService: increase the memory limit to 700Mi.
In my test on astronomy shop, after patching (including elasticsearch), the total memory usage is 16GiB-17GiB, which is close to what the local cluster setup mentioned.
The deployment took 15min, which is too long and not debug-friendly. Is there a way to disable certain options for a faster deployment?
Grafana: I followed the instruction and still found errors in grafana page (http://localhost:8080/prometheus/alerting/list):
Errors loading rules
Failed to load the data source configuration for loki: Unable to fetch alert rules. Is the loki data source properly configured?
Environment
Issues (Based on the SRE scenario)
ansible-galaxy install -r requirements.yaml
does not able to install without--force
option on my machine.nano group_vars/all.yaml
orsed xxx
to the code block in step 1.value: data.resourcesPreset="large"
.data.resourcesPreset
is set tomedium
by default.large
.http://localhost:8080/prometheus/alerting/list
):install_chaos_mesh.yaml
.INCIDENT_NUMBER=26 make remove_incident_fault
)The text was updated successfully, but these errors were encountered: