From a302d6c501eb2979b4373095f310ccf3210d15eb Mon Sep 17 00:00:00 2001 From: DeDe Morton Date: Fri, 22 Nov 2024 11:58:33 -0800 Subject: [PATCH] Apply IA changes to serverless docs (#4532) * Apply IA changes to serverless docs * Remove inline comments * Apply changes from colleenmcginnis (cherry picked from commit 5b8130767b56a3d277281c0faf35608a5e7f340c) # Conflicts: # docs/en/serverless/ai-assistant/ai-assistant.asciidoc # docs/en/serverless/dashboards/dashboards-and-visualizations.asciidoc # docs/en/serverless/index.asciidoc # docs/en/serverless/infra-monitoring/infra-monitoring.asciidoc # docs/en/serverless/logging/log-monitoring.asciidoc # docs/en/serverless/machine-learning/aiops-analyze-spikes.asciidoc # docs/en/serverless/machine-learning/aiops-detect-anomalies.asciidoc # docs/en/serverless/machine-learning/aiops-detect-change-points.asciidoc # docs/en/serverless/machine-learning/aiops-forecast-anomaly.asciidoc # docs/en/serverless/machine-learning/aiops-tune-anomaly-detection-job.asciidoc # docs/en/serverless/machine-learning/machine-learning.asciidoc # docs/en/serverless/monitor-datasets.asciidoc # docs/en/serverless/observability-overview.asciidoc # docs/en/serverless/quickstarts/collect-data-with-aws-firehose.asciidoc # docs/en/serverless/quickstarts/k8s-logs-metrics.asciidoc # docs/en/serverless/quickstarts/monitor-hosts-with-elastic-agent.asciidoc # docs/en/serverless/redirects.asciidoc # docs/en/serverless/reference/elastic-entity-model.asciidoc # docs/en/serverless/reference/metrics-app-fields.asciidoc # docs/en/serverless/slos/slos.asciidoc # docs/en/serverless/what-is-observability-serverless.asciidoc --- docs/en/observability/create-alerts.asciidoc | 2 +- docs/en/observability/slo-overview.asciidoc | 6 +- .../ai-assistant/ai-assistant.asciidoc | 354 ++++++++++++++++++ ...pplication-and-service-monitoring.asciidoc | 17 + .../dashboards-and-visualizations.asciidoc | 44 +++ docs/en/serverless/images/get-started.svg | 21 ++ 
.../serverless/incident-management.asciidoc | 19 + docs/en/serverless/index.asciidoc | 225 +++++++++++ .../infra-monitoring.asciidoc | 30 ++ ...ructure-and-host-monitoring-intro.asciidoc | 20 + .../logging/log-monitoring.asciidoc | 120 ++++++ .../aiops-analyze-spikes.asciidoc | 71 ++++ .../aiops-detect-anomalies.asciidoc | 273 ++++++++++++++ .../aiops-detect-change-points.asciidoc | 68 ++++ .../aiops-forecast-anomaly.asciidoc | 45 +++ .../aiops-tune-anomaly-detection-job.asciidoc | 184 +++++++++ .../machine-learning.asciidoc | 26 ++ docs/en/serverless/monitor-datasets.asciidoc | 76 ++++ .../observability-get-started.asciidoc | 78 ++++ .../observability-overview.asciidoc | 149 ++++++++ .../collect-data-with-aws-firehose.asciidoc | 130 +++++++ .../quickstarts/k8s-logs-metrics.asciidoc | 51 +++ .../monitor-hosts-with-elastic-agent.asciidoc | 126 +++++++ docs/en/serverless/redirects.asciidoc | 14 + docs/en/serverless/reference.asciidoc | 7 + .../reference/elastic-entity-model.asciidoc | 57 +++ .../reference/metrics-app-fields.asciidoc | 295 +++++++++++++++ docs/en/serverless/slos/slos.asciidoc | 104 +++++ .../what-is-observability-serverless.asciidoc | 27 ++ 29 files changed, 2633 insertions(+), 6 deletions(-) create mode 100644 docs/en/serverless/ai-assistant/ai-assistant.asciidoc create mode 100644 docs/en/serverless/application-and-service-monitoring.asciidoc create mode 100644 docs/en/serverless/dashboards/dashboards-and-visualizations.asciidoc create mode 100644 docs/en/serverless/images/get-started.svg create mode 100644 docs/en/serverless/incident-management.asciidoc create mode 100644 docs/en/serverless/index.asciidoc create mode 100644 docs/en/serverless/infra-monitoring/infra-monitoring.asciidoc create mode 100644 docs/en/serverless/infrastructure-and-host-monitoring-intro.asciidoc create mode 100644 docs/en/serverless/logging/log-monitoring.asciidoc create mode 100644 docs/en/serverless/machine-learning/aiops-analyze-spikes.asciidoc create mode 100644 
docs/en/serverless/machine-learning/aiops-detect-anomalies.asciidoc create mode 100644 docs/en/serverless/machine-learning/aiops-detect-change-points.asciidoc create mode 100644 docs/en/serverless/machine-learning/aiops-forecast-anomaly.asciidoc create mode 100644 docs/en/serverless/machine-learning/aiops-tune-anomaly-detection-job.asciidoc create mode 100644 docs/en/serverless/machine-learning/machine-learning.asciidoc create mode 100644 docs/en/serverless/monitor-datasets.asciidoc create mode 100644 docs/en/serverless/observability-get-started.asciidoc create mode 100644 docs/en/serverless/observability-overview.asciidoc create mode 100644 docs/en/serverless/quickstarts/collect-data-with-aws-firehose.asciidoc create mode 100644 docs/en/serverless/quickstarts/k8s-logs-metrics.asciidoc create mode 100644 docs/en/serverless/quickstarts/monitor-hosts-with-elastic-agent.asciidoc create mode 100644 docs/en/serverless/redirects.asciidoc create mode 100644 docs/en/serverless/reference.asciidoc create mode 100644 docs/en/serverless/reference/elastic-entity-model.asciidoc create mode 100644 docs/en/serverless/reference/metrics-app-fields.asciidoc create mode 100644 docs/en/serverless/slos/slos.asciidoc create mode 100644 docs/en/serverless/what-is-observability-serverless.asciidoc diff --git a/docs/en/observability/create-alerts.asciidoc b/docs/en/observability/create-alerts.asciidoc index 270a4c948c..7622d64633 100644 --- a/docs/en/observability/create-alerts.asciidoc +++ b/docs/en/observability/create-alerts.asciidoc @@ -11,7 +11,7 @@ Alerting enables you to detect complex conditions defined by a *rule* within the Applications, Logs, Infrastructure, Synthetics, and Uptime UIs. When a condition is met, the rule tracks it as an *alert* and responds by triggering one or more *actions*. 
-Alerts and rules related to service level objectives (SLOs), and {observability} apps, including Applications, Logs, Infrastructure, Synthetics, and Uptime, can be managed in the {observability} UI. +Alerts and rules related to service-level objectives (SLOs), and {observability} apps, including Applications, Logs, Infrastructure, Synthetics, and Uptime, can be managed in the {observability} UI. You can also manage {observability} app rules alongside rules for other apps from the {kibana-ref}/create-and-manage-rules.html[{kib} Management UI]. [discrete] diff --git a/docs/en/observability/slo-overview.asciidoc b/docs/en/observability/slo-overview.asciidoc index a9a9832e9f..968c246f7b 100644 --- a/docs/en/observability/slo-overview.asciidoc +++ b/docs/en/observability/slo-overview.asciidoc @@ -1,9 +1,5 @@ [[slo]] -= SLOs - -++++ -Service-level objectives (SLOs) -++++ += Service-level objectives (SLOs) // tag::slo-license[] [IMPORTANT] diff --git a/docs/en/serverless/ai-assistant/ai-assistant.asciidoc b/docs/en/serverless/ai-assistant/ai-assistant.asciidoc new file mode 100644 index 0000000000..d630d02ef6 --- /dev/null +++ b/docs/en/serverless/ai-assistant/ai-assistant.asciidoc @@ -0,0 +1,354 @@ +[[observability-ai-assistant]] += {observability} AI Assistant + +// :keywords: serverless, observability, overview + +The AI Assistant uses generative AI to provide: + +* **Chat**: Have conversations with the AI Assistant. Chat uses function calling to request, analyze, and visualize your data. +* **Contextual insights**: Open prompts throughout {obs-serverless} that explain errors and messages and suggest remediation. + +[role="screenshot"] +image::images/ai-assistant-overview.gif[Observability AI assistant preview, 60%] + +The AI Assistant integrates with your large language model (LLM) provider through our supported Elastic connectors: + +* {kibana-ref}/openai-action-type.html[OpenAI connector] for OpenAI or Azure OpenAI Service. 
+* {kibana-ref}/bedrock-action-type.html[Amazon Bedrock connector] for Amazon Bedrock, specifically for the Claude models. +* {kibana-ref}/gemini-action-type.html[Google Gemini connector] for Google Gemini. + +[IMPORTANT] +==== +The AI Assistant is powered by an integration with your large language model (LLM) provider. +LLMs are known to sometimes present incorrect information as if it's correct. +Elastic supports configuration and connection to the LLM provider and your knowledge base, +but is not responsible for the LLM's responses. +==== + +[IMPORTANT] +==== +Also, the data you provide to the Observability AI assistant is _not_ anonymized, and is stored and processed by the third-party AI provider. This includes any data used in conversations for analysis or context, such as alert or event data, detection rule configurations, and queries. Therefore, be careful about sharing any confidential or sensitive details while using this feature. +==== + +[discrete] +[[observability-ai-assistant-requirements]] +== Requirements + +The AI assistant requires the following: + +* An account with a third-party generative AI provider that preferably supports function calling. +If your AI provider does not support function calling, you can configure AI Assistant settings under **Project settings** → **Management** → **AI Assistant for Observability Settings** to simulate function calling, but this might affect performance. ++ +Refer to the {kibana-ref}/action-types.html[connector documentation] for your provider to learn about supported and default models. +* The knowledge base requires a 4 GB {ml} node. + +[IMPORTANT] +==== +The free tier offered by third-party generative AI providers may not be sufficient for the proper functioning of the AI assistant. +In most cases, a paid subscription to one of the supported providers is required. +The Observability AI assistant doesn't support connecting to a private LLM. 
+Elastic doesn't recommend using private LLMs with the Observability AI assistant. +==== + +[discrete] +[[observability-ai-assistant-your-data-and-the-ai-assistant]] +== Your data and the AI Assistant + +Elastic does not use customer data for model training. This includes anything you send the model, such as alert or event data, detection rule configurations, queries, and prompts. However, any data you provide to the AI Assistant will be processed by the third-party provider you chose when setting up the OpenAI connector as part of the assistant setup. + +Elastic does not control third-party tools, and assumes no responsibility or liability for their content, operation, or use, nor for any loss or damage that may arise from your using such tools. Please exercise caution when using AI tools with personal, sensitive, or confidential information. Any data you submit may be used by the provider for AI training or other purposes. There is no guarantee that the provider will keep any information you provide secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. + +[discrete] +[[observability-ai-assistant-set-up-the-ai-assistant]] +== Set up the AI Assistant + +To set up the AI Assistant: + +. Create an authentication key with your AI provider to authenticate requests from the AI Assistant. You'll use this in the next step. Refer to your provider's documentation for information about creating authentication keys: ++ +** https://platform.openai.com/docs/api-reference[OpenAI API keys] +** https://learn.microsoft.com/en-us/azure/cognitive-services/openai/reference[Azure OpenAI Service API keys] +** https://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html[Amazon Bedrock authentication keys and secrets] +** https://cloud.google.com/iam/docs/keys-list-get[Google Gemini service account keys] +. 
From **Project settings** → **Management** → **Connectors**, create a connector for your AI provider: ++ +** {kibana-ref}/openai-action-type.html[OpenAI] +** {kibana-ref}/bedrock-action-type.html[Amazon Bedrock] +** {kibana-ref}/gemini-action-type.html[Google Gemini] +. Authenticate communication between {obs-serverless} and the AI provider by providing the following information: ++ +.. In the **URL** field, enter the AI provider's API endpoint URL. +.. Under **Authentication**, enter the key or secret you created in the previous step. + +[discrete] +[[observability-ai-assistant-add-data-to-the-ai-assistant-knowledge-base]] +== Add data to the AI Assistant knowledge base + +[IMPORTANT] +==== +**If you started using the AI Assistant in technical preview**, +any knowledge base articles you created using ELSER v1 will need to be reindexed or upgraded before they can be used. +Going forward, you must create knowledge base articles using ELSER v2. +You can either: + +* Clear all old knowledge base articles manually and reindex them. +* Upgrade all knowledge base articles indexed with ELSER v1 to ELSER v2 using a https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb[Python script]. +==== + +The AI Assistant uses {ml-docs}/ml-nlp-elser.html[ELSER], Elastic's semantic search engine, to recall data from its internal knowledge base index to create retrieval augmented generation (RAG) responses. Adding data such as Runbooks, GitHub issues, internal documentation, and Slack messages to the knowledge base gives the AI Assistant context to provide more specific assistance. + +[NOTE] +==== +Your AI provider may collect telemetry when using the AI Assistant. Contact your AI provider for information on how data is collected. +==== + +You can add information to the knowledge base by asking the AI Assistant to remember something while chatting (for example, "remember this for next time"). 
The assistant will create a summary of the information and add it to the knowledge base. + +You can also add external data to the knowledge base either in the Project Settings UI or using the {es} Index API. + +[discrete] +[[observability-ai-assistant-use-the-ui]] +=== Use the UI + +To add external data to the knowledge base in the Project Settings UI: + +. Go to **Project Settings**. +. In the _Other_ section, click **AI assistant for Observability settings**. +. Then select the **Elastic AI Assistant for Observability**. +. Switch to the **Knowledge base** tab. +. Click the **New entry** button, and choose either: ++ +** **Single entry**: Write content for a single entry in the UI. +** **Bulk import**: Upload a newline-delimited JSON (`ndjson`) file containing a list of entries to add to the knowledge base. +Each object should conform to the following format: ++ +[source,json] +---- +{ + "id": "a_unique_human_readable_id", + "text": "Contents of item" +} +---- + +[discrete] +[[observability-ai-assistant-use-the-es-index-api]] +=== Use the {es} Index API + +. Ingest external data (GitHub issues, Markdown files, Jira tickets, text files, etc.) into {es} using the {es} {ref}/docs-index_.html[Index API]. +. Reindex your data into the AI Assistant's knowledge base index by completing the following query in **Developer Tools** → **Console**. Update the following fields before reindexing: ++ +** `InternalDocsIndex`: Name of the index where your internal documents are stored. +** `text_field`: Name of the field containing your internal documents' text. +** `timestamp`: Name of the timestamp field in your internal documents. +** `public`: If `true`, the document is available to all users with access to your Observability project. If `false`, the document is restricted to the user indicated in the following `user.name` field. +** `user.name` (optional): If defined, restricts the internal document's availability to a specific user. 
+** You can add a query filter to index specific documents. + +[source,console] +---- +POST _reindex +{ + "source": { + "index": "", + "_source": [ + "", + "", + "namespace", + "is_correction", + "public", + "confidence" + ] + }, + "dest": { + "index": ".kibana-observability-ai-assistant-kb-000001", + "pipeline": ".kibana-observability-ai-assistant-kb-ingest-pipeline" + }, + "script": { + "inline": "ctx._source.text = ctx._source.remove(\"\");ctx._source.namespace=\"\";ctx._source.is_correction=false;ctx._source.public=;ctx._source.confidence=\"high\";ctx._source['@timestamp'] = ctx._source.remove(\"\");ctx._source['user.name'] = \"\"" + } +} +---- + +[discrete] +[[observability-ai-assistant-interact-with-the-ai-assistant]] +== Interact with the AI Assistant + +You can chat with the AI Assistant or interact with contextual insights located throughout {obs-serverless}. +See the following sections for more on interacting with the AI Assistant. + +[TIP] +==== +After every answer the LLM provides, let us know if the answer was helpful. +Your feedback helps us improve the AI Assistant! +==== + +[discrete] +[[observability-ai-assistant-chat-with-the-assistant]] +=== Chat with the assistant + +Click the AI Assistant button (image:images/ai-assistant-button.png[AI Assistant icon]) in the upper-right corner where available to start the chat. + +This opens the AI Assistant flyout, where you can ask the assistant questions about your instance: + +[role="screenshot"] +image::images/ai-assistant-chat.png[Observability AI assistant chat, 60%] + +[IMPORTANT] +==== +Asking questions about your data requires function calling, which enables LLMs to reliably interact with third-party generative AI providers to perform searches or run advanced functions using customer data. + +When the Observability AI Assistant performs searches in the cluster, the queries are run with the same level of permissions as the user. 
+==== + +[discrete] +[[observability-ai-assistant-suggest-functions]] +=== Suggest functions + +beta::[] + +The AI Assistant uses several functions to include relevant context in the chat conversation through text, data, and visual components. Both you and the AI Assistant can suggest functions. You can also edit the AI Assistant's function suggestions and inspect function responses. For example, you could use the `kibana` function to call a {kib} API on your behalf. + +You can suggest the following functions: + +|=== +| Function | Description + +| `alerts` +| Get alerts for {obs-serverless}. + +| `elasticsearch` +| Call {es} APIs on your behalf. + +| `kibana` +| Call {kib} APIs on your behalf. + +| `summarize` +| Summarize parts of the conversation. + +| `visualize_query` +| Visualize charts for ES|QL queries. +|=== + +Additional functions are available when your cluster has APM data: + +|=== +| Function | Description + +| `get_apm_correlations` +| Get field values that are more prominent in the foreground set than the background set. This can be useful in determining which attributes (such as `error.message`, `service.node.name`, or `transaction.name`) are contributing to, for instance, a higher latency. Another option is a time-based comparison, where you compare before and after a change point. + +| `get_apm_downstream_dependencies` +| Get the downstream dependencies (services or uninstrumented backends) for a service. Map the downstream dependency name to a service by returning both `span.destination.service.resource` and `service.name`. Use this to drill down further if needed. + +| `get_apm_error_document` +| Get a sample error document based on the grouping name. This also includes the stacktrace of the error, which might hint to the cause. + +| `get_apm_service_summary` +| Get a summary of a single service, including the language, service version, deployments, the environments, and the infrastructure that it is running in. 
For example, the number of pods and a list of their downstream dependencies. It also returns active alerts and anomalies. + +| `get_apm_services_list` +| Get the list of monitored services, their health statuses, and alerts. + +| `get_apm_timeseries` +| Display different APM metrics (such as throughput, failure rate, or latency) for any service or all services and any or all of their dependencies. Displayed both as a time series and as a single statistic. Additionally, the function returns any changes, such as spikes, step and trend changes, or dips. You can also use it to compare data by requesting two different time ranges, or, for example, two different service versions. +|=== + +[discrete] +[[observability-ai-assistant-use-contextual-prompts]] +=== Use contextual prompts + +AI Assistant contextual prompts throughout {obs-serverless} provide the following information: + +* **Alerts**: Provides possible causes and remediation suggestions for log rate changes. +* **Application performance monitoring (APM)**: Explains APM errors and provides remediation suggestions. +* **Logs**: Explains log messages and generates search patterns to find similar issues. + +// Not included in initial serverless launch + +// - **Universal Profiling**: explains the most expensive libraries and functions in your fleet and provides optimization suggestions. + +// - **Infrastructure Observability**: explains the processes running on a host. + +For example, in the log details, you'll see prompts for **What's this message?** and **How do I find similar log messages?**: + +[role="screenshot"] +image::images/ai-assistant-logs-prompts.png[Observability AI assistant example prompts for logs, 60%] + +Clicking a prompt generates a message specific to that log entry. +You can continue a conversation from a contextual prompt by clicking **Start chat** to open the AI Assistant chat. 
+ +[role="screenshot"] +image::images/ai-assistant-logs.png[Observability AI assistant example, 60%] + +[discrete] +[[observability-ai-assistant-add-the-ai-assistant-connector-to-alerting-workflows]] +=== Add the AI Assistant connector to alerting workflows + +You can use the {kibana-ref}/obs-ai-assistant-action-type.html[Observability AI Assistant connector] to add AI-generated insights and custom actions to your alerting workflows. +To do this: + +. <> and specify the conditions that must be met for the alert to fire. +. Under **Actions**, select the **Observability AI Assistant** connector type. +. In the **Connector** list, select the AI connector you created when you set up the assistant. +. In the **Message** field, specify the message to send to the assistant: + +[role="screenshot"] +image::images/obs-ai-assistant-action-high-cpu.png[Add an Observability AI assistant action while creating a rule in the Observability UI] + +You can ask the assistant to generate a report of the alert that fired, +recall any information or potential resolutions of past occurrences stored in the knowledge base, +provide troubleshooting guidance and resolution steps, +and also include other active alerts that may be related. +As a last step, you can ask the assistant to trigger an action, +such as sending the report (or any other message) to a Slack webhook. + +.NOTE +[NOTE] +==== +Currently you can only send messages to Slack, email, Jira, PagerDuty, or a webhook. +Additional actions will be added in the future. +==== + +When the alert fires, contextual details about the event—such as when the alert fired, +the service or host impacted, and the threshold breached—are sent to the AI Assistant, +along with the message provided during configuration. 
+The AI Assistant runs the tasks requested in the message and creates a conversation you can use to chat with the assistant: + +[role="screenshot"] +image::images/obs-ai-assistant-output.png[AI Assistant conversation created in response to an alert] + +[IMPORTANT] +==== +Conversations created by the AI Assistant are public and accessible to every user with permissions to use the assistant. +==== + +It might take a minute or two for the AI Assistant to process the message and create the conversation. + +Note that overly broad prompts may result in the request exceeding token limits. +For more information, refer to <>. +Also, attempting to analyze several alerts in a single connector execution may cause you to exceed the function call limit. +If this happens, modify the message specified in the connector configuration to avoid exceeding limits. + +When asked to send a message to another connector, such as Slack, +the AI Assistant attempts to include a link to the generated conversation. + +[role="screenshot"] +image::images/obs-ai-assistant-slack-message.png[Message sent to Slack by the AI Assistant includes a link to the conversation] + +The Observability AI Assistant connector is called when the alert fires and when it recovers. + +To learn more about alerting, actions, and connectors, refer to <>. + +[discrete] +[[observability-ai-assistant-known-issues]] +== Known issues + +[discrete] +[[token-limits]] +=== Token limits + +Most LLMs have a set number of tokens they can manage in a single conversation. +When you reach the token limit, the LLM will throw an error, and Elastic will display a "Token limit reached" error. +The exact number of tokens that the LLM can support depends on the LLM provider and model you're using. +If you are using an OpenAI connector, you can monitor token usage in the **OpenAI Token Usage** dashboard. +For more information, refer to the {kibana-ref}/openai-action-type.html#openai-connector-token-dashboard[OpenAI Connector documentation]. 
diff --git a/docs/en/serverless/application-and-service-monitoring.asciidoc b/docs/en/serverless/application-and-service-monitoring.asciidoc new file mode 100644 index 0000000000..96002a5037 --- /dev/null +++ b/docs/en/serverless/application-and-service-monitoring.asciidoc @@ -0,0 +1,17 @@ +[[application-and-service-monitoring]] += Application and service monitoring + +++++ +Applications and services +++++ + +Explore the topics in this section to learn how to observe and monitor software applications and services running in your environment. + +[cols="1,1"] +|=== +|<> +|Monitor software services and applications in real time, by collecting detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, and more. + +|<> +|Monitor the availability of network endpoints and services. +|=== \ No newline at end of file diff --git a/docs/en/serverless/dashboards/dashboards-and-visualizations.asciidoc b/docs/en/serverless/dashboards/dashboards-and-visualizations.asciidoc new file mode 100644 index 0000000000..6dca71ecc2 --- /dev/null +++ b/docs/en/serverless/dashboards/dashboards-and-visualizations.asciidoc @@ -0,0 +1,44 @@ +[[observability-dashboards]] += Get started with dashboards + +// :description: Visualize your observability data using pre-built dashboards or create your own. +// :keywords: serverless, observability, overview + +Elastic provides a wide range of pre-built dashboards for visualizing observability data from a variety of sources. +These dashboards are loaded automatically when you install https://docs.elastic.co/integrations[Elastic integrations]. + +You can also create new dashboards and visualizations based on your data views to get a full picture of your data. + +In your Observability project, go to **Dashboards** to see installed dashboards or create your own. 
+This example shows dashboards loaded by the System integration: + +[role="screenshot"] +image::images/dashboards.png[Screenshot showing list of System dashboards] + +Notice you can filter the list of dashboards: + +* Use the text search field to filter by name or description. +* Use the **Tags** menu to filter by tag. To create a new tag or edit existing tags, click **Manage tags**. +* Click a dashboard's tags to toggle filtering for each tag. + +[discrete] +[[observability-dashboards-create-new-dashboards]] +== Create new dashboards + +To create a new dashboard, click **Create Dashboard** and begin adding visualizations. +You can create charts, graphs, maps, tables, and other types of visualizations from your data, +or you can add visualizations from the library. + +You can also add other types of panels — such as filters, links, and text — and add +controls like time sliders. + +For more information about creating dashboards, +refer to {kibana-ref}/create-a-dashboard-of-panels-with-web-server-data.html[Create your first dashboard]. + +[NOTE] +==== +The tutorial about creating your first dashboard is written for {kib} users, +but the steps for serverless are very similar. +To load the sample data in serverless, go to **Project Settings** → **Integrations** in the navigation pane, +then search for "sample data". 
+==== diff --git a/docs/en/serverless/images/get-started.svg b/docs/en/serverless/images/get-started.svg new file mode 100644 index 0000000000..487355b2f9 --- /dev/null +++ b/docs/en/serverless/images/get-started.svg @@ -0,0 +1,21 @@ + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/en/serverless/incident-management.asciidoc b/docs/en/serverless/incident-management.asciidoc new file mode 100644 index 0000000000..8c8bf6b8a4 --- /dev/null +++ b/docs/en/serverless/incident-management.asciidoc @@ -0,0 +1,19 @@ +[[incident-management]] += Incident management + +Explore the topics in this section to learn how to respond to incidents detected in your {observability} data. + + +[cols="1,1"] +|=== +|<> +|Trigger alerts when incidents occur, and use built-in connectors to send the alerts to email, slack, or other third-party systems, such as your external incident management application. + +|<> +|Collect and share information about {observability} issues by opening cases and optionally sending them to your external incident management application. + +|<> +|Set clear, measurable targets for your service performance, based on factors like availability, response times, error rates, and other key metrics. +|=== + + diff --git a/docs/en/serverless/index.asciidoc b/docs/en/serverless/index.asciidoc new file mode 100644 index 0000000000..908c8a6b84 --- /dev/null +++ b/docs/en/serverless/index.asciidoc @@ -0,0 +1,225 @@ +include::{asciidoc-dir}/../../shared/versions/stack/current.asciidoc[] +include::{asciidoc-dir}/../../shared/attributes.asciidoc[] + +[[what-is-observability-serverless]] +== Elastic Observability serverless + +++++ +Elastic Observability +++++ + +include::./what-is-observability-serverless.asciidoc[leveloffset=+2] + +// Group: Get started with Elastic Observability +include::observability-get-started.asciidoc[leveloffset=+2] + +// What is Observability? 
+include::./observability-overview.asciidoc[leveloffset=+3] + +// Observability billing dimensions +include::./projects/billing.asciidoc[leveloffset=+3] + +// Create an Elastic Observability Serverless project +include::./projects/create-an-observability-project.asciidoc[leveloffset=+3] + +// Quickstarts +include::./quickstarts/monitor-hosts-with-elastic-agent.asciidoc[leveloffset=+3] +include::./quickstarts/k8s-logs-metrics.asciidoc[leveloffset=+3] +include::./quickstarts/collect-data-with-aws-firehose.asciidoc[leveloffset=+3] + +// Dashboards +include::./dashboards/dashboards-and-visualizations.asciidoc[leveloffset=+3] + +// Group: Application and service monitoring +include::./application-and-service-monitoring.asciidoc[leveloffset=+2] + +// APM +include::./apm/apm.asciidoc[leveloffset=+3] +include::./apm/apm-get-started.asciidoc[leveloffset=+4] +include::./apm/apm-send-traces-to-elastic.asciidoc[leveloffset=+4] +include::./apm-agents/apm-agents-elastic-apm-agents.asciidoc[leveloffset=+5] +include::./apm-agents/apm-agents-opentelemetry.asciidoc[leveloffset=+5] +include::./apm-agents/apm-agents-opentelemetry-opentelemetry-native-support.asciidoc[leveloffset=+6] +include::./apm-agents/apm-agents-opentelemetry-collect-metrics.asciidoc[leveloffset=+6] +include::./apm-agents/apm-agents-opentelemetry-limitations.asciidoc[leveloffset=+6] +include::./apm-agents/apm-agents-opentelemetry-resource-attributes.asciidoc[leveloffset=+6] +include::./apm-agents/apm-agents-aws-lambda-functions.asciidoc[leveloffset=+5] +include::./apm/apm-view-and-analyze-traces.asciidoc[leveloffset=+4] +include::./apm/apm-find-transaction-latency-and-failure-correlations.asciidoc[leveloffset=+5] +include::./apm/apm-integrate-with-machine-learning.asciidoc[leveloffset=+5] +include::./apm/apm-create-custom-links.asciidoc[leveloffset=+5] +include::./apm/apm-track-deployments-with-annotations.asciidoc[leveloffset=+5] +include::./apm/apm-query-your-data.asciidoc[leveloffset=+5] 
+include::./apm/apm-filter-your-data.asciidoc[leveloffset=+5] +include::./apm/apm-observe-lambda-functions.asciidoc[leveloffset=+5] +include::./apm/apm-ui-overview.asciidoc[leveloffset=+5] +include::./apm/apm-ui-services.asciidoc[leveloffset=+6] +include::./apm/apm-ui-traces.asciidoc[leveloffset=+6] +include::./apm/apm-ui-dependencies.asciidoc[leveloffset=+6] +include::./apm/apm-ui-service-map.asciidoc[leveloffset=+6] +include::./apm/apm-ui-service-overview.asciidoc[leveloffset=+6] +include::./apm/apm-ui-transactions.asciidoc[leveloffset=+6] +include::./apm/apm-ui-trace-sample-timeline.asciidoc[leveloffset=+6] +include::./apm/apm-ui-errors.asciidoc[leveloffset=+6] +include::./apm/apm-ui-metrics.asciidoc[leveloffset=+6] +include::./apm/apm-ui-infrastructure.asciidoc[leveloffset=+6] +include::./apm/apm-ui-logs.asciidoc[leveloffset=+6] +include::./apm/apm-data-types.asciidoc[leveloffset=+4] +include::./apm/apm-distributed-tracing.asciidoc[leveloffset=+4] +include::./apm/apm-reduce-your-data-usage.asciidoc[leveloffset=+4] +include::./apm/apm-transaction-sampling.asciidoc[leveloffset=+5] +include::./apm/apm-compress-spans.asciidoc[leveloffset=+5] +include::./apm/apm-stacktrace-collection.asciidoc[leveloffset=+5] +include::./apm/apm-keep-data-secure.asciidoc[leveloffset=+4] +include::./apm/apm-troubleshooting.asciidoc[leveloffset=+4] +include::./apm/apm-reference.asciidoc[leveloffset=+4] +include::./apm/apm-kibana-settings.asciidoc[leveloffset=+5] +include::./apm/apm-server-api.asciidoc[leveloffset=+5] + +// Synthetics +include::./synthetics/synthetics-intro.asciidoc[leveloffset=+3] + +include::./synthetics/synthetics-get-started.asciidoc[leveloffset=+4] +include::./synthetics/synthetics-get-started-project.asciidoc[leveloffset=+5] +include::./synthetics/synthetics-get-started-ui.asciidoc[leveloffset=+5] + +include::./synthetics/synthetics-journeys.asciidoc[leveloffset=+4] +include::./synthetics/synthetics-create-test.asciidoc[leveloffset=+5] 
+include::./synthetics/synthetics-monitor-use.asciidoc[leveloffset=+5] +include::./synthetics/synthetics-recorder.asciidoc[leveloffset=+5] + +include::./synthetics/synthetics-lightweight.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-manage-monitors.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-params-secrets.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-analyze.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-private-location.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-command-reference.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-configuration.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-mfa.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-settings.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-feature-roles.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-manage-retention.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-scale-and-architect.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-security-encryption.asciidoc[leveloffset=+4] + +include::./synthetics/synthetics-troubleshooting.asciidoc[leveloffset=+4] + +// Group: Infrastructure and hosts +include::./infrastructure-and-host-monitoring-intro.asciidoc[leveloffset=+2] + +include::./infra-monitoring/infra-monitoring.asciidoc[leveloffset=+3] +include::./infra-monitoring/get-started-with-metrics.asciidoc[leveloffset=+4] +include::./infra-monitoring/view-infrastructure-metrics.asciidoc[leveloffset=+4] +include::./infra-monitoring/analyze-hosts.asciidoc[leveloffset=+4] +include::./infra-monitoring/detect-metric-anomalies.asciidoc[leveloffset=+4] +include::./infra-monitoring/configure-infra-settings.asciidoc[leveloffset=+4] + +include::./infra-monitoring/troubleshooting-infra.asciidoc[leveloffset=+3] +include::./infra-monitoring/handle-no-results-found-message.asciidoc[leveloffset=+4] +
+include::./infra-monitoring/metrics-reference.asciidoc[leveloffset=+3] +include::./infra-monitoring/host-metrics.asciidoc[leveloffset=+4] +include::./infra-monitoring/container-metrics.asciidoc[leveloffset=+4] +include::./infra-monitoring/kubernetes-pod-metrics.asciidoc[leveloffset=+4] +include::./infra-monitoring/aws-metrics.asciidoc[leveloffset=+4] + +// Group: Logs +// TODO: Check the diff to see which changes need to be applied to these files. Also ask Mike if it's expected for serverless and stateful to be out of sync. + +include::./logging/log-monitoring.asciidoc[leveloffset=+2] + +include::./logging/get-started-with-logs.asciidoc[leveloffset=+3] +include::./logging/stream-log-files.asciidoc[leveloffset=+3] +include::./logging/correlate-application-logs.asciidoc[leveloffset=+3] +include::./logging/plaintext-application-logs.asciidoc[leveloffset=+4] +include::./logging/ecs-application-logs.asciidoc[leveloffset=+4] +include::./logging/send-application-logs.asciidoc[leveloffset=+4] +include::./logging/parse-log-data.asciidoc[leveloffset=+3] +include::./logging/filter-and-aggregate-logs.asciidoc[leveloffset=+3] +include::./logging/view-and-monitor-logs.asciidoc[leveloffset=+3] +include::./logging/add-logs-service-name.asciidoc[leveloffset=+3] +include::./logging/run-log-pattern-analysis.asciidoc[leveloffset=+3] +include::./logging/troubleshoot-logs.asciidoc[leveloffset=+3] + +//TODO: Figure out where to put this. It's under "view and analyze data" in stateful, but that category doesn't exist in serverless yet. 
+include::./inventory.asciidoc[leveloffset=+2] + +// Group: Incident management +include::./incident-management.asciidoc[leveloffset=+2] + +// Alerting +include::./alerting/alerting.asciidoc[leveloffset=+3] +include::./alerting/create-manage-rules.asciidoc[leveloffset=+4] +include::./alerting/aiops-generate-anomaly-alerts.asciidoc[leveloffset=+5] +include::./alerting/create-anomaly-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-custom-threshold-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-elasticsearch-query-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-error-count-threshold-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-failed-transaction-rate-threshold-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-inventory-threshold-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-latency-threshold-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/create-slo-burn-rate-alert-rule.asciidoc[leveloffset=+5] +include::./alerting/synthetic-monitor-status-alert.asciidoc[leveloffset=+5] +include::./alerting/aggregation-options.asciidoc[leveloffset=+4] +include::./alerting/rate-aggregation.asciidoc[leveloffset=+5] +include::./alerting/view-alerts.asciidoc[leveloffset=+4] +include::./alerting/triage-slo-burn-rate-breaches.asciidoc[leveloffset=+5] +include::./alerting/triage-threshold-breaches.asciidoc[leveloffset=+5] + +// Cases +include::./cases/cases.asciidoc[leveloffset=+3] + +include::./cases/create-manage-cases.asciidoc[leveloffset=+4] + +include::./cases/manage-cases-settings.asciidoc[leveloffset=+4] + +//SLOs +include::./slos/slos.asciidoc[leveloffset=+3] + +include::./slos/create-an-slo.asciidoc[leveloffset=+4] + +//Data Set Quality +include::./monitor-datasets.asciidoc[leveloffset=+2] + +//Observability AI Assistant +include::./ai-assistant/ai-assistant.asciidoc[leveloffset=+2] + +//Machine learning + +include::./machine-learning/machine-learning.asciidoc[leveloffset=+2] 
+include::./machine-learning/aiops-detect-anomalies.asciidoc[leveloffset=+3] +include::./machine-learning/aiops-tune-anomaly-detection-job.asciidoc[leveloffset=+4] +include::./machine-learning/aiops-forecast-anomaly.asciidoc[leveloffset=+4] +include::./machine-learning/aiops-analyze-spikes.asciidoc[leveloffset=+3] +include::./machine-learning/aiops-detect-change-points.asciidoc[leveloffset=+3] + +// Reference group + +include::./reference.asciidoc[leveloffset=+2] + +// Fields + +include::./reference/metrics-app-fields.asciidoc[leveloffset=+3] + +// Elastic Entity Model + +include::./reference/elastic-entity-model.asciidoc[leveloffset=+3] + +// Technical preview limitations + +include::./limitations.asciidoc[leveloffset=+2] + +// add redirects file +include::redirects.asciidoc[] \ No newline at end of file diff --git a/docs/en/serverless/infra-monitoring/infra-monitoring.asciidoc b/docs/en/serverless/infra-monitoring/infra-monitoring.asciidoc new file mode 100644 index 0000000000..3dbc6bcdbf --- /dev/null +++ b/docs/en/serverless/infra-monitoring/infra-monitoring.asciidoc @@ -0,0 +1,30 @@ +[[observability-infrastructure-monitoring]] += Analyze infrastructure and host metrics + +// :description: Monitor metrics from your servers, Docker, Kubernetes, Prometheus, and other services and applications. +// :keywords: serverless, observability, overview + +{obs-serverless} allows you to visualize infrastructure metrics to help diagnose problematic spikes, +identify high resource utilization, automatically discover and track pods, +and unify your metrics with logs and APM data. + +Using {agent} integrations, you can ingest and analyze metrics from servers, +Docker containers, Kubernetes orchestrations, explore and analyze application +telemetry, and more. + +For more information, refer to the following links: + +* <>: +Learn how to onboard your system metrics data quickly. 
+* <>: +Use the **Inventory page** to get a metrics-driven view of your infrastructure grouped by resource type. +* <>: +Use the **Hosts** page to get a metrics-driven view of your infrastructure backed by an easy-to-use interface called Lens. +* <>: Detect and inspect memory usage and network traffic anomalies for hosts and Kubernetes pods. +* <>: Learn how to configure infrastructure UI settings. +* <>: Learn about key metrics used for infrastructure monitoring. +* <>: Learn about the fields required to display data in the Infrastructure UI. + +By default, the Infrastructure UI displays metrics from {es} indices that +match the `metrics-*` and `metricbeat-*` index patterns. To learn how to change +this behavior, refer to <>. diff --git a/docs/en/serverless/infrastructure-and-host-monitoring-intro.asciidoc b/docs/en/serverless/infrastructure-and-host-monitoring-intro.asciidoc new file mode 100644 index 0000000000..e621408bc9 --- /dev/null +++ b/docs/en/serverless/infrastructure-and-host-monitoring-intro.asciidoc @@ -0,0 +1,20 @@ +[[infrastructure-and-host-monitoring-intro]] += Infrastructure and host monitoring + +++++ +Infrastructure and hosts +++++ + +Explore the topics in this section to learn how to observe and monitor hosts and other systems running in your environment. + +[cols="1,1"] +|=== +|<> +|Visualize infrastructure metrics to help diagnose problematic spikes, identify high resource utilization, automatically discover and track pods, and unify your metrics with other observability data. + +|<> +|Troubleshoot common issues on your own or ask for help. + +|<> +|Learn about the key metrics displayed in the Infrastructure UI and how they are calculated. 
+|=== diff --git a/docs/en/serverless/logging/log-monitoring.asciidoc b/docs/en/serverless/logging/log-monitoring.asciidoc new file mode 100644 index 0000000000..ccc9ce684e --- /dev/null +++ b/docs/en/serverless/logging/log-monitoring.asciidoc @@ -0,0 +1,120 @@ +[[observability-log-monitoring]] += Log monitoring + +++++ +Logs +++++ + +// :description: Use Elastic to deploy and manage logs at a petabyte scale, and get insights from your logs in minutes. +// :keywords: serverless, observability, overview + +Elastic Observability allows you to deploy and manage logs at a petabyte scale, giving you insights into your logs in minutes. You can also search across your logs in one place, troubleshoot in real time, and detect patterns and outliers with categorization and anomaly detection. For more information, refer to the following links: + +* <>: Onboard system log data from a machine or server. +* <>: Send log files to your Observability project using a standalone {agent}. +* <>: Parse your log data and extract structured fields that you can use to analyze your data. +* <>: Filter and aggregate your log data to find specific information, gain insight, and monitor your systems more efficiently. +* <>: Find information on visualizing and analyzing logs. +* <>: Find patterns in unstructured log messages and make it easier to examine your data. +* <>: Find solutions for errors you might encounter while onboarding your logs. + +[discrete] +[[observability-log-monitoring-send-logs-data-to-your-project]] +== Send logs data to your project + +You can send logs data to your project in different ways depending on your needs: + +* {agent} +* {filebeat} + +When choosing between {agent} and {filebeat}, consider the different features and functionalities between the two options. +See {fleet-guide}/beats-agent-comparison.html[{beats} and {agent} capabilities] for more information on which option best fits your situation. 
+ +[discrete] +[[observability-log-monitoring-agent]] +=== {agent} + +{agent} uses https://www.elastic.co/integrations/data-integrations[integrations] to ingest logs from Kubernetes, MySQL, and many more data sources. +You have the following options when installing and managing an {agent}: + +[discrete] +[[observability-log-monitoring-fleet-managed-agent]] +==== {fleet}-managed {agent} + +Install an {agent} and use {fleet} to define, configure, and manage your agents in a central location. + +See {fleet-guide}/install-fleet-managed-elastic-agent.html[install {fleet}-managed {agent}]. + +[discrete] +[[observability-log-monitoring-standalone-agent]] +==== Standalone {agent} + +Install an {agent} and manually configure it locally on the system where it’s installed. +You are responsible for managing and upgrading the agents. + +See {fleet-guide}/install-standalone-elastic-agent.html[install standalone {agent}]. + +[discrete] +[[observability-log-monitoring-agent-in-a-containerized-environment]] +==== {agent} in a containerized environment + +Run an {agent} inside of a container — either with {fleet-server} or standalone. + +See {fleet-guide}/install-elastic-agents-in-containers.html[install {agent} in containers]. + +[discrete] +[[observability-log-monitoring-filebeat]] +=== {filebeat} + +{filebeat} is a lightweight shipper for forwarding and centralizing log data. +Installed as a service on your servers, {filebeat} monitors the log files or locations that you specify, collects log events, and forwards them to your Observability project for indexing. + +* {filebeat-ref}/filebeat-overview.html[{filebeat} overview]: General information on {filebeat} and how it works. +* {filebeat-ref}/filebeat-installation-configuration.html[{filebeat} quick start]: Basic installation instructions to get you started. +* {filebeat-ref}/setting-up-and-running.html[Set up and run {filebeat}]: Information on how to install, set up, and run {filebeat}. 
+ +[discrete] +[[observability-log-monitoring-configure-logs]] +== Configure logs + +The following resources provide information on configuring your logs: + +* {ref}/data-streams.html[Data streams]: Efficiently store append-only time series data in multiple backing indices partitioned by time and size. +* {kibana-ref}/data-views.html[Data views]: Query log entries from the data streams of specific datasets or namespaces. +* {ref}/example-using-index-lifecycle-policy.html[Index lifecycle management]: Configure the built-in logs policy based on your application's performance, resilience, and retention requirements. +* {ref}/ingest.html[Ingest pipeline]: Parse and transform log entries into a suitable format before indexing. +* {ref}/mapping.html[Mapping]: Define how data is stored and indexed. + +[discrete] +[[observability-log-monitoring-view-and-monitor-logs]] +== View and monitor logs + +Use **Logs Explorer** to search, filter, and tail all your logs ingested into your project in one place. + +The following resources provide information on viewing and monitoring your logs: + +* <>: Discover and explore all of the log events flowing in from your servers, virtual machines, and containers in a centralized view. +* <>: Use {ml} to detect log anomalies automatically. + +[discrete] +[[observability-log-monitoring-monitor-data-sets]] +== Monitor data sets + +The **Data Set Quality** page provides an overview of your data sets and their quality. +Use this information to get an idea of your overall data set quality, and find data sets that contain incorrectly parsed documents. + +<> + +[discrete] +[[observability-log-monitoring-application-logs]] +== Application logs + +Application logs provide valuable insight into events that have occurred within your services and applications. +See <>. + +//// +/* ## Create a logs threshold alert + +You can create a rule to send an alert when the log aggregation exceeds a threshold. +See Create a logs threshold rule. 
*/ +//// diff --git a/docs/en/serverless/machine-learning/aiops-analyze-spikes.asciidoc b/docs/en/serverless/machine-learning/aiops-analyze-spikes.asciidoc new file mode 100644 index 0000000000..7496a75bee --- /dev/null +++ b/docs/en/serverless/machine-learning/aiops-analyze-spikes.asciidoc @@ -0,0 +1,71 @@ +[[observability-aiops-analyze-spikes]] += Analyze log spikes and drops + +// :description: Find and investigate the causes of unusual spikes or drops in log rates. +// :keywords: serverless, observability, how-to + +// + +{obs-serverless} provides built-in log rate analysis capabilities, +based on advanced statistical methods, +to help you find and investigate the causes of unusual spikes or drops in log rates. + +To analyze log spikes and drops: + +. In your {obs-serverless} project, go to **AIOps** → **Log rate analysis**. +. Choose a data view or saved search to access the log data you want to analyze. +. In the histogram chart, click a spike (or drop) to start the analysis. ++ +[role="screenshot"] +image::images/log-rate-histogram.png[Histogram showing log spikes and drops ] ++ +When the analysis runs, it identifies statistically significant field-value combinations that contribute to the spike or drop, +and then displays them in a table: ++ +[role="screenshot"] +image::images/log-rate-analysis-results.png[Table of field-value combinations that contribute to the log spike or drop ] ++ +Notice that you can optionally turn on **Smart grouping** to summarize the results into groups. +You can also click **Filter fields** to remove fields that are not relevant. ++ +The table shows an indicator of the level of impact and a sparkline showing the shape of the impact in the chart. +. Select a row to display the impact of the field on the histogram chart. +. From the **Actions** menu in the table, you can choose to view the field in **Discover**, +view it in <>, +or copy the table row information to the clipboard as a query filter.
+ +To pin a table row, click the row, then move the cursor to the histogram chart. +It displays a tooltip with exact count values for the pinned field which enables closer investigation. + +Brushes in the chart show the baseline time range and the deviation in the analyzed data. +You can move the brushes to redefine both the baseline and the deviation and rerun the analysis with the modified values. + +[discrete] +[[log-pattern-analysis]] +== Log pattern analysis + +// + +Use log pattern analysis to find patterns in unstructured log messages and examine your data. +When you run a log pattern analysis, it performs categorization analysis on a selected field, +creates categories based on the data, and then displays them together in a chart. +The chart shows the distribution of each category and an example document that matches the category. +Log pattern analysis is useful when you want to examine how often different types of logs appear in your data set. +It also helps you group logs in ways that go beyond what you can achieve with a terms aggregation. + +To run log pattern analysis: + +. Follow the steps under <> to run a log rate analysis. +. From the **Actions** menu, choose **View in Log Pattern Analysis**. +. Select a category field and optionally apply any filters that you want. +. Click **Run pattern analysis**. ++ +The results of the analysis are shown in a table: ++ +[role="screenshot"] +image::images/log-pattern-analysis.png[Log pattern analysis of the message field ] +. From the **Actions** menu, click the plus (or minus) icon to open **Discover** and show (or filter out) the given category there, which helps you to further examine your log messages. + +// TODO: Question: Is the log pattern analysis only available through the log rate analysis UI? + +// TODO: Add some good examples to this topic taken from existing docs or recommendations from reviewers. 
diff --git a/docs/en/serverless/machine-learning/aiops-detect-anomalies.asciidoc b/docs/en/serverless/machine-learning/aiops-detect-anomalies.asciidoc new file mode 100644 index 0000000000..c87a36cf8c --- /dev/null +++ b/docs/en/serverless/machine-learning/aiops-detect-anomalies.asciidoc @@ -0,0 +1,273 @@ +[[observability-aiops-detect-anomalies]] += Detect anomalies + +// :description: Detect anomalies by comparing real-time and historical data from different sources to look for unusual, problematic patterns. +// :keywords: serverless, observability, how-to + +:role: Editor +:goal: create, run, and view {anomaly-job}s +include::../partials/roles.asciidoc[] +:role!: + +:goal!: + +The anomaly detection feature in {obs-serverless} automatically models the normal behavior of your time series data — learning trends, +periodicity, and more — in real time to identify anomalies, streamline root cause analysis, and reduce false positives. + +To set up anomaly detection, you create and run anomaly detection jobs. +Anomaly detection jobs use proprietary {ml} algorithms to detect anomalous events or patterns, such as: + +* Anomalies related to temporal deviations in values, counts, or frequencies +* Anomalies related to unusual locations in geographic data +* Statistical rarity +* Unusual behaviors for a member of a population + +To learn more about anomaly detection algorithms, refer to the {ml-docs}/ml-ad-algorithms.html[{ml}] documentation. +Note that the {ml} documentation may contain details that are not valid when using a serverless project. + +.Some terms you might need to know +[NOTE] +==== +A _datafeed_ retrieves time series data from {es} and provides it to an +anomaly detection job for analysis. + +The job uses _buckets_ to divide the time series into batches for processing. +For example, a job may use a bucket span of 1 hour. 
+ +Each {anomaly-job} contains one or more _detectors_, which define the type of +analysis that occurs (for example, `max`, `average`, or `rare` analytical +functions) and the fields that are analyzed. Some of the analytical functions +look for single anomalous data points. For example, `max` identifies the maximum +value that is seen within a bucket. Others perform some aggregation over the +length of the bucket. For example, `mean` calculates the mean of all the data +points seen within the bucket. + +To learn more about anomaly detection, refer to the {ml-docs}/ml-ad-overview.html[{ml}] documentation. +==== + +[discrete] +[[create-anomaly-detection-job]] +== Create and run an anomaly detection job + +. In your {obs-serverless} project, go to **AIOps** → **Anomaly detection**. +. Click **Create anomaly detection job** (or **Create job** if other jobs exist). +. Choose a data view or saved search to access the data you want to analyze. +. Select the wizard for the type of job you want to create. +The following wizards are available. +You might also see specialized wizards based on the type of data you are analyzing. ++ +[TIP] +==== +In general, it is a good idea to start with single metric anomaly detection jobs for your key performance indicators. +After you examine these simple analysis results, you will have a better idea of what the influencers might be. +Then you can create multi-metric jobs and split the data or create more complex analysis functions as necessary. +==== ++ +-- +Single metric:: +Creates simple jobs that have a single detector. A _detector_ applies an analytical function to specific fields in your data. In addition to limiting the number of detectors, the single metric wizard omits many of the more advanced configuration options. + +Multi-metric:: +Creates jobs that can have more than one detector, which is more efficient than running multiple jobs against the same data. 
+ +Population:: +Creates jobs that detect activity that is unusual compared to the behavior of the population. + +Advanced:: +Creates jobs that can have multiple detectors and enables you to configure all job settings. + +Categorization:: +Creates jobs that group log messages into categories and use `count` or `rare` functions to detect anomalies within them. + +Rare:: +Creates jobs that detect rare occurrences in time series data. Rare jobs use the `rare` or `freq_rare` functions and also detect rare occurrences in populations. + +Geo:: +Creates jobs that detect unusual occurrences in the geographic locations of your data. Your data set must contain geo data. +-- ++ +For more information about job types, refer to the {ml-docs}/ml-anomaly-detection-job-types.html[{ml}] documentation. ++ +.Not sure what type of job to create? +[NOTE] +==== +Before selecting a wizard, click **Data Visualizer** to explore the fields and metrics in your data. +To get the best results, you must understand your data, including its data types and the range and distribution of values. + +In the **Data Visualizer**, use the time filter to select a time period that you’re interested in exploring, +or click **Use full data** to view the full time range of data. +Expand the fields to see details about the range and distribution of values. +When you're done, go back to the first step and create your job. +==== +. Step through the instructions in the job creation wizard to configure your job. +You can accept the default settings for most settings now and <> later. +. If you want the job to start immediately when the job is created, make sure that option is selected on the summary page. +. When you're done, click **Create job**. +When the job runs, the {ml} features analyze the input stream of data, model its behavior, and perform analysis based on the detectors in each job. +When an event occurs outside of the baselines of normal behavior, that event is identified as an anomaly. +. 
After the job is started, click **View results**. + +[discrete] +[[observability-aiops-detect-anomalies-view-the-results]] +== View the results + +After the anomaly detection job has processed some data, +you can view the results in {obs-serverless}. + +[TIP] +==== +Depending on the capacity of your machine, +you might need to wait a few seconds for the analysis to generate initial results. +==== + +If you clicked **View results** after creating the job, the results open in either the **Single Metric Viewer** or **Anomaly Explorer**. +To switch between these tools, click the icons in the upper-left corner of each tool. + +Read the following sections to learn more about these tools: + +* <> +* <> + +[discrete] +[[view-single-metric]] +== View single metric job results + +The **Single Metric Viewer** contains a chart that represents the actual and expected values over time: + +[role="screenshot"] +image::images/anomaly-detection-single-metric-viewer.png[Single Metric Viewer showing analysis ] + +* The line in the chart represents the actual data values. +* The shaded area represents the bounds for the expected values. +* The area between the upper and lower bounds contains the most likely values for the model, using a 95% confidence level. +That is to say, there is a 95% chance of the actual value falling within these bounds. +If a value is outside of this area then it will usually be identified as anomalous. + +[TIP] +==== +Expected values are available only if **Enable model plot** was selected under Job Details +when you created the job. +==== + +To explore your data: + +. If the **Single Metric Viewer** is not already open, go to **AIOps** → **Anomaly detection** and click the Single Metric Viewer icon next to the job you created. +Note that the Single Metric Viewer icon will be grayed out for advanced or multi-metric jobs. +. In the time filter, specify a time range that covers the majority of the analyzed data points. +.
Notice that the model improves as it processes more data. +At the beginning, the expected range of values is pretty broad, and the model is not capturing the periodicity in the data. +But it quickly learns and begins to reflect the patterns in your data. +The duration of the learning process heavily depends on the characteristics and complexity of the input data. +. Look for anomaly data points, depicted by colored dots or cross symbols, and hover over a data point to see more details about the anomaly. +Note that anomalies with medium or high multi-bucket impact are depicted with a cross symbol instead of a dot. ++ +.How are anomaly scores calculated? +[NOTE] +==== +Any data points outside the range that was predicted by the model are marked +as anomalies. In order to provide a sensible view of the results, an +_anomaly score_ is calculated for each bucket time interval. The anomaly score +is a value from 0 to 100, which indicates the significance of the anomaly +compared to previously seen anomalies. The highly anomalous values are shown in +red and the low scored values are shown in blue. An interval with a high +anomaly score is significant and requires investigation. +For more information about anomaly scores, refer to the {ml-docs}/ml-ad-explain.html[{ml}] documentation. +==== +. (Optional) Annotate your job results by drag-selecting a period of time and entering annotation text. +Annotations are notes that refer to events in a specific time period. +They can be created by the user or generated automatically by the anomaly detection job to reflect model changes and noteworthy occurrences. +. Under **Anomalies**, expand each anomaly to see key details, such as the time, the actual and expected ("typical") values, and their probability. 
+The **Anomaly explanation** section gives you further insights about each anomaly, such as its type and impact, to make it easier to interpret the job results: ++ +[role="screenshot"] +image::images/anomaly-detection-details.png[Single Metric Viewer showing anomaly details ] ++ +By default, the **Anomalies** table contains all anomalies that have a severity of "warning" or higher in the selected section of the timeline. +If you are only interested in critical anomalies, for example, you can change the severity threshold for this table. +. (Optional) From the **Actions** menu in the **Anomalies** table, you can choose to view relevant documents in **Discover** or create a job rule. +Job rules instruct anomaly detectors to change their behavior based on domain-specific knowledge that you provide. +To learn more, refer to <> + +After you have identified anomalies, often the next step is to try to determine +the context of those situations. For example, are there other factors that are +contributing to the problem? Are the anomalies confined to particular +applications or servers? You can begin to troubleshoot these situations by +layering additional jobs or creating multi-metric jobs. + +[discrete] +[[anomaly-explorer]] +== View advanced or multi-metric job results + +Conceptually, you can think of _multi-metric anomaly detection jobs_ as running multiple independent single metric jobs. +By bundling them together in a multi-metric job, however, +you can see an overall score and shared influencers for all the metrics and all the entities in the job. +Multi-metric jobs therefore scale better than having many independent single metric jobs. +They also provide better results when you have influencers that are shared across the detectors. + +.What is an influencer? +[NOTE] +==== +When you create an anomaly detection job, you can identify fields as _influencers_. 
+These are fields that you think contain information about someone or something that influences or contributes to anomalies. +As a best practice, do not pick too many influencers. +For example, you generally do not need more than three. +If you pick many influencers, the results can be overwhelming, and there is some overhead to the analysis. + +To learn more about influencers, refer to the {ml-docs}/ml-ad-run-jobs.html#ml-ad-influencers[{ml}] documentation. +==== + +You can also configure your anomaly detection jobs to split a single time series into multiple time series based on a categorical field. +For example, you could create a job for analyzing response code rates that has a single detector that splits the data based on the `response.keyword`, +and uses the `count` function to determine when the number of events is anomalous. +You might use a job like this if you want to look at both high and low request rates partitioned by response code. + +To view advanced or multi-metric results in the +**Anomaly Explorer**: + +. If the **Anomaly Explorer** is not already open, go to **AIOps** → **Anomaly detection** and click the Anomaly Explorer icon next to the job you created. +. In the time filter, specify a time range that covers the majority of the analyzed data points. +. If you specified influencers during job creation, the view includes a list of the top influencers for all of the detected anomalies in that same time period. +The list includes maximum anomaly scores, which in this case are aggregated for each influencer, for each bucket, across all detectors. +There is also a total sum of the anomaly scores for each influencer. +Use this list to help you narrow down the contributing factors and focus on the most anomalous entities. +. Under **Anomaly timeline**, click a section in the swim lanes to obtain more information about the anomalies in that time period. 
++ +[role="screenshot"] +image::images/anomaly-explorer.png[Anomaly Explorer showing swim lanes with anomaly selected ] ++ +You can see exact times when anomalies occurred. +If there are multiple detectors or metrics in the job, you can see which caught the anomaly. +You can also switch to viewing this time series in the **Single Metric Viewer** by selecting **View series** in the **Actions** menu. +. Under **Anomalies** (in the **Anomaly Explorer**), expand an anomaly to see key details, such as the time, +the actual and expected ("typical") values, and the influencers that contributed to the anomaly: ++ +[role="screenshot"] +image::images/anomaly-detection-multi-metric-details.png[Anomaly Explorer showing anomaly details ] ++ +By default, the **Anomalies** table contains all anomalies that have a severity of "warning" or higher in the selected section of the timeline. +If you are only interested in critical anomalies, for example, you can change the severity threshold for this table. ++ +If your job has multiple detectors, the table aggregates the anomalies to show the highest severity anomaly per detector and entity, +which is the field value that is displayed in the **found for** column. ++ +To view all the anomalies without any aggregation, set the **Interval** to **Show all**. + +[TIP] +==== +The anomaly scores that you see in each section of the **Anomaly Explorer** might differ slightly. +This disparity occurs because for each job there are bucket results, influencer results, and record results. +Anomaly scores are generated for each type of result. +The anomaly timeline uses the bucket-level anomaly scores. +The list of top influencers uses the influencer-level anomaly scores. +The list of anomalies uses the record-level anomaly scores. 
+==== + +[discrete] +[[observability-aiops-detect-anomalies-next-steps]] +== Next steps + +After setting up an anomaly detection job, you may want to: + +* <> +* <> +* <> diff --git a/docs/en/serverless/machine-learning/aiops-detect-change-points.asciidoc b/docs/en/serverless/machine-learning/aiops-detect-change-points.asciidoc new file mode 100644 index 0000000000..fbdce3cc33 --- /dev/null +++ b/docs/en/serverless/machine-learning/aiops-detect-change-points.asciidoc @@ -0,0 +1,68 @@ +[[observability-aiops-detect-change-points]] += Detect change points + +// :description: Detect distribution changes, trend changes, and other statistically significant change points in a metric of your time series data. +// :keywords: serverless, observability, how-to + +// + +The change point detection feature in {obs-serverless} detects distribution changes, +trend changes, and other statistically significant change points in time series data. +Unlike anomaly detection, change point detection does not require you to configure a job or generate a model. +Instead you select a metric and immediately see a visual representation that splits the time series into two parts, before and after the change point. + +{obs-serverless} uses a {ref}/search-aggregations-change-point-aggregation.html[change point aggregation] +to detect change points. This aggregation can detect change points when: + +* a significant dip or spike occurs +* the overall distribution of values has changed significantly +* there was a statistically significant step up or down in value distribution +* an overall trend change occurs + +To detect change points: + +. In your {obs-serverless} project, go to **AIOps** → **Change point detection**. +. Choose a data view or saved search to access the data you want to analyze. +. Select a function: **avg**, **max**, **min**, or **sum**. +. In the time filter, specify a time range over which you want to detect change points. +. 
From the **Metric field** list, select a field you want to check for change points. +. (Optional) From the **Split field** list, select a field to split the data by. +If the cardinality of the split field exceeds 10,000, only the first 10,000 values, sorted by document count, are analyzed. +Use this option when you want to investigate the change point across multiple instances, pods, clusters, and so on. +For example, you may want to view CPU utilization split across multiple instances without having to jump across multiple dashboards and visualizations. + +[NOTE] +==== +You can configure a maximum of six combinations of a function applied to a metric field, partitioned by a split field, to identify change points. +==== + +The change point detection feature automatically dissects the time series into multiple points within the given time window, +tests whether the behavior is statistically different before and after each point in time, and then detects a change point if one exists: + +[role="screenshot"] +image::images/change-point-detection.png[Change point detection UI showing change points split by process ] + +The resulting view includes: + +* The timestamp of the change point +* A preview chart +* The type of change point and its p-value. The p-value indicates the magnitude of the change; lower values indicate more significant changes. +* The name and value of the split field, if used. + +If the analysis is split by a field, a separate chart is shown for every partition that has a detected change point. +The chart displays the type of change point, its value, and the timestamp of the bucket where the change point has been detected. + +On the **Change point detection** page, you can also: + +* Select a subset of charts and click **View selected** to view only the selected charts.
++ +[role="screenshot"] +image::images/change-point-detection-view-selected.png[View selected change point detection charts ] +* Filter the results by specific types of change points by using the change point type selector: ++ +[role="screenshot"] +image::images/change-point-detection-filter-by-type.png[Change point detection filter by type list] +* Attach change points to a chart or dashboard by using the context menu: ++ +[role="screenshot"] +image::images/change-point-detection-attach-charts.png[Change point detection add to charts menu] diff --git a/docs/en/serverless/machine-learning/aiops-forecast-anomaly.asciidoc b/docs/en/serverless/machine-learning/aiops-forecast-anomaly.asciidoc new file mode 100644 index 0000000000..e7f0f2dad6 --- /dev/null +++ b/docs/en/serverless/machine-learning/aiops-forecast-anomaly.asciidoc @@ -0,0 +1,45 @@ +[[observability-aiops-forecast-anomalies]] += Forecast future behavior + +// :description: Predict future behavior of your data by creating a forecast for an anomaly detection job. +// :keywords: serverless, observability, how-to + +:role: Editor +:goal: create a forecast for an {anomaly-job} +include::../partials/roles.asciidoc[] +:role!: + +:goal!: + +In addition to detecting anomalous behavior in your data, +you can use the {ml} features to predict future behavior. + +You can use a forecast to estimate a time series value at a specific future date. +For example, you might want to determine how much disk usage to expect +next Sunday at 09:00. + +You can also use a forecast to estimate the probability of a time series value occurring at a future date. +For example, you might want to determine how likely it is that your disk utilization will reach 100% before the end of next week. + +To create a forecast: + +. <> and view the results in the **Single Metric Viewer**. +. Click **Forecast**. +. Specify a duration for your forecast. +This value indicates how far to extrapolate beyond the last record that was processed. 
+You must use time units, for example 1w, 1d, 1h, and so on. +. Click **Run**. +. View the forecast in the **Single Metric Viewer**: ++ +[role="screenshot"] +image::images/anomaly-detection-forecast.png[Single Metric Viewer showing forecast ] ++ +** The line in the chart represents the predicted data values. +** The shaded area represents the bounds for the predicted values, which also gives an indication of the confidence of the predictions. +** Note that the bounds generally increase with time (that is to say, the confidence levels decrease), +since you are forecasting further into the future. +Eventually if the confidence levels are too low, the forecast stops. +. (Optional) After the job has processed more data, click the **Forecast** button again to compare the forecast to actual data. ++ +The resulting chart will contain the actual data values, the bounds for the expected values, the anomalies, the forecast data values, and the bounds for the forecast. +This combination of actual and forecast data gives you an indication of how well the {ml} features can extrapolate the future behavior of the data. diff --git a/docs/en/serverless/machine-learning/aiops-tune-anomaly-detection-job.asciidoc b/docs/en/serverless/machine-learning/aiops-tune-anomaly-detection-job.asciidoc new file mode 100644 index 0000000000..a1d2048d33 --- /dev/null +++ b/docs/en/serverless/machine-learning/aiops-tune-anomaly-detection-job.asciidoc @@ -0,0 +1,184 @@ +[[observability-aiops-tune-anomaly-detection-job]] += Tune your anomaly detection job + +// :description: Tune your job by creating calendars, adding job rules, and defining custom URLs. +// :keywords: serverless, observability, how-to + +:role: Editor +:goal: create calendars, add job rules, and define custom URLs +include::../partials/roles.asciidoc[] +:role!: + +:goal!: + +After you run an anomaly detection job and view the results, +you might find that you need to alter the job configuration or settings. 
+ +To further tune your job, you can: + +* <> that contain a list of scheduled events for which you do not want to generate anomalies, such as planned system outages or public holidays. +* <> that instruct anomaly detectors to change their behavior based on domain-specific knowledge that you provide. +Your job rules can use filter lists, which contain values that you can use to include or exclude events from the {ml} analysis. +* <> to make dashboards and other resources readily available when viewing job results. + +For more information about tuning your job, +refer to the how-to guides in the {ml-docs}/anomaly-how-tos.html[{ml}] documentation. +Note that the {ml} documentation may contain details that are not valid when using a fully managed Elastic project. + +[TIP] +==== +You can also create calendars and add URLs when configuring settings during job creation, +but generally it's easier to start with a simple job and add complexity later. +==== + +[discrete] +[[create-calendars]] +== Create calendars + +Sometimes there are periods when you expect unusual activity to take place, +such as bank holidays, "Black Friday", or planned system outages. +If you identify these events in advance, no anomalies are generated during that period. +The {ml} model is not adversely affected, and you do not receive spurious results. + +To create a calendar and add scheduled events: + +. In your {obs-serverless} project, go to **AIOps** → **Anomaly detection**. +. Click **Settings**. +. Under **Calendars**, click **Create**. +. Enter an ID and description for the calendar. +. Select the jobs you want to apply the calendar to, or turn on **Apply calendar to all jobs**. +. Under **Events**, click **New event** or click **Import events** to import events from an iCalendar (ICS) file: ++ +[role="screenshot"] +image::images/anomaly-detection-create-calendar.png[Create new calendar page] ++ +A scheduled event must have a start time, end time, and calendar ID.
+In general, scheduled events are short in duration (typically lasting from a few hours to a day) and occur infrequently. +If you have regularly occurring events, such as weekly maintenance periods, +you do not need to create scheduled events for these circumstances; +they are already handled by the {ml} analytics. +If your ICS file contains recurring events, only the first occurrence is imported. +. When you're done adding events, save your calendar. + +You must identify scheduled events _before_ your anomaly detection job analyzes the data for that time period. +{ml-cap} results are not updated retroactively. +Bucket results are generated during scheduled events, but they have an anomaly score of zero. + +[TIP] +==== +If you use long or frequent scheduled events, +it might take longer for the {ml} analytics to learn to model your data, +and some anomalous behavior might be missed. +==== + +[discrete] +[[create-job-rules]] +== Create job rules and filters + +By default, anomaly detection is unsupervised, +and the {ml} models have no awareness of the domain of your data. +As a result, anomaly detection jobs might identify events that are statistically significant but are uninteresting when you know the larger context. + +You can customize anomaly detection by creating custom job rules. +_Job rules_ instruct anomaly detectors to change their behavior based on domain-specific knowledge that you provide. +When you create a rule, you can specify conditions, scope, and actions. +When the conditions of a rule are satisfied, its actions are triggered. + +.Example use case for creating a job rule +[NOTE] +==== +If you have an anomaly detector that is analyzing CPU usage, +you might decide you are only interested in anomalies where the CPU usage is greater than a certain threshold. +You can define a rule with conditions and actions that instruct the detector to refrain from generating {ml} results when there are anomalous events related to low CPU usage. 
+You might also decide to add a scope for the rule so that it applies only to certain machines. +The scope is defined by using {ml} filters. +==== + +_Filters_ contain a list of values that you can use to include or exclude events from the {ml} analysis. +You can use the same filter in multiple anomaly detection jobs. + +.Example use case for creating a filter list +[NOTE] +==== +If you are analyzing web traffic, you might create a filter that contains a list of IP addresses. +The list could contain IP addresses that you trust to upload data to your website or to send large amounts of data from behind your firewall. +You can define the rule's scope so that the action triggers only when a specific field in your data matches (or doesn't match) a value in the filter. +This gives you much greater control over which anomalous events affect the {ml} model and appear in the {ml} results. +==== + +To create a job rule, first create any filter lists you want to use in the rule, then configure the rule: + +. In your {obs-serverless} project, go to **AIOps** → **Anomaly detection**. +. (Optional) Create one or more filter lists: ++ +.. Click **Settings**. +.. Under **Filter lists**, click **Create**. +.. Enter the filter list ID. This is the ID you will select when you want to use the filter list in a job rule. +.. Click **Add item** and enter one item per line. +.. Click **Add** then save the filter list: ++ +[role="screenshot"] +image::images/anomaly-detection-create-filter-list.png[Create filter list] +. Open the job results in the **Single Metric Viewer** or **Anomaly Explorer**. +. From the **Actions** menu in the **Anomalies** table, select **Configure job rules**. ++ +[role="screenshot"] +image::images/anomaly-detection-configure-job-rules.png[Configure job rules menu selection] +. Choose which actions to take when the job rule matches the anomaly: **Skip result**, **Skip model update**, or both. +. 
Under **Conditions**, add one or more conditions that must be met for the action to be triggered. +. Under **Scope** (if available), add one or more filter lists to limit where the job rule applies. +. Save the job rule. +Note that changes to job rules take effect for new results only. +To apply these changes to existing results, you must clone and rerun the job. + +[discrete] +[[define-custom-urls]] +== Define custom URLs + +You can optionally attach one or more custom URLs to your anomaly detection jobs. +Links for these URLs will appear in the **Actions** menu of the anomalies table when viewing job results in the **Single Metric Viewer** or **Anomaly Explorer**. +Custom URLs can point to dashboards, the Discover app, or external websites. +For example, you can define a custom URL that enables users to drill down to the source data from the results set. + +To add a custom URL to the **Actions** menu: + +. In your {obs-serverless} project, go to **AIOps** → **Anomaly detection**. +. From the **Actions** menu in the job list, select **Edit job**. +. Select the **Custom URLs** tab, then click **Add custom URL**. +. Enter the label to use for the link text. +. Choose the type of resource you want to link to: ++ +|=== +| If you select... | Do this... + +| {kib} dashboard +| Select the dashboard you want to link to. + +| Discover +| Select the data view to use. + +| Other +| Specify the URL for the external website. +|=== +. Click **Test** to test your link. +. Click **Add**, then save your changes. + +Now when you view job results in **Single Metric Viewer** or **Anomaly Explorer**, +the **Actions** menu includes the custom link: + +[role="screenshot"] +image::images/anomaly-detection-custom-url.png[Actions menu showing the custom link] + +[TIP] +==== +It is also possible to use string substitution in custom URLs.
+For example, you might have a **Raw data** URL defined as: + +`discover#/?_g=(time:(from:'$earliest$',mode:absolute,to:'$latest$'))&_a=(index:ff959d40-b880-11e8-a6d9-e546fe2bba5f,query:(language:kuery,query:'customer_full_name.keyword:"$customer_full_name.keyword$"'))`. + +The value of the `customer_full_name.keyword` field is passed to the target page when the link is clicked. + +For more information about using string substitution, +refer to the {ml-docs}/ml-configuring-url.html#ml-configuring-url-strings[{ml}] documentation. +Note that the {ml} documentation may contain details that are not valid when using a fully managed Elastic project. +==== diff --git a/docs/en/serverless/machine-learning/machine-learning.asciidoc b/docs/en/serverless/machine-learning/machine-learning.asciidoc new file mode 100644 index 0000000000..3c49557cee --- /dev/null +++ b/docs/en/serverless/machine-learning/machine-learning.asciidoc @@ -0,0 +1,26 @@ +[[observability-machine-learning]] += Machine learning and AIOps + +++++ +Machine learning +++++ + +// :description: Automate anomaly detection and accelerate root cause analysis with AIOps. +// :keywords: serverless, observability, overview + +The machine learning capabilities available in {obs-serverless} enable you to consume and process large observability data sets at scale, reducing the time and effort required to detect, understand, investigate, and resolve incidents. +Built on predictive analytics and {ml}, our AIOps capabilities require no prior experience with {ml}. +DevOps engineers, SREs, and security analysts can get started right away using these AIOps features with little or no advanced configuration: + +|=== +| Feature | Description + +| <> +| Detect anomalies by comparing real-time and historical data from different sources to look for unusual, problematic patterns. + +| <> +| Find and investigate the causes of unusual spikes or drops in log rates.
+ +| <> +| Detect distribution changes, trend changes, and other statistically significant change points in a metric of your time series data. +|=== diff --git a/docs/en/serverless/monitor-datasets.asciidoc b/docs/en/serverless/monitor-datasets.asciidoc new file mode 100644 index 0000000000..897253df6a --- /dev/null +++ b/docs/en/serverless/monitor-datasets.asciidoc @@ -0,0 +1,76 @@ +[[observability-monitor-datasets]] += Data set quality monitoring + +++++ +Data set quality +++++ + +// :description: Monitor data sets to find degraded documents. +// :keywords: serverless, observability, how-to + +beta:[] + +The **Data Set Quality** page provides an overview of your log, metric, trace, and synthetic data sets. +Use this information to get an idea of your overall data set quality and find data sets that contain incorrectly parsed documents. + +Access the Data Set Quality page from the main menu at **Project settings** → **Management** → **Data Set Quality**. +By default, the page only shows log data sets. To see other data set types, select them from the **Type** menu. + +.Requirements +[NOTE] +==== +Users with the `viewer` role can view the Data Sets Quality summary. To view the Active Data Sets and Estimated Data summaries, users need the `monitor` {ref}/security-privileges.html#privileges-list-indices[index privilege] for the `logs-*-*` index. +==== + +The quality of your data sets is based on the percentage of degraded documents in each data set. +A degraded document in a data set contains the {ref}/mapping-ignored-field.html[`_ignored`] property because one or more of its fields were ignored during indexing. +Fields are ignored for a variety of reasons. +For example, when the {ref}/mapping-ignored-field.html[`ignore_malformed`] parameter is set to true, if a document field contains the wrong data type, the malformed field is ignored and the rest of the document is indexed. 
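As a rough illustration of the paragraph above, the following Python sketch computes the percentage of degraded documents in a batch and maps it to the Good, Degraded, and Poor scale listed later on this page. It is only a sketch under stated assumptions: the function names, the sample documents, and the idea of treating each document as a plain dictionary with an optional `_ignored` list are hypothetical and are not part of the Data Set Quality page or any Elastic client API.

```python
from typing import Iterable, Mapping


def degraded_percentage(documents: Iterable[Mapping]) -> float:
    """Return the percentage of documents that carry a non-empty `_ignored` property."""
    docs = list(documents)
    if not docs:
        return 0.0
    degraded = sum(1 for doc in docs if doc.get("_ignored"))
    return 100.0 * degraded / len(docs)


def quality_label(percentage: float) -> str:
    """Map a degraded-document percentage to the quality scale from this page."""
    if percentage == 0:
        return "Good"      # 0% of documents are degraded
    if percentage <= 3:
        return "Degraded"  # greater than 0% and up to 3%
    return "Poor"          # greater than 3%


# Hypothetical sample: one of four documents had a field ignored at index time.
sample = [
    {"message": "ok"},
    {"message": "bad status", "_ignored": ["http.response.status_code"]},
    {"message": "ok"},
    {"message": "ok"},
]

pct = degraded_percentage(sample)
print(pct, quality_label(pct))
```

The thresholds are taken from the quality scale in this page; everything else is illustrative, and the page itself remains the reference for how quality is actually reported.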
+ +From the data set table, you'll find information for each data set such as its namespace, when the data set was last active, and the percentage of degraded docs. +The percentage of degraded documents determines the data set's quality according to the following scale: + +* Good (image:images/green-dot-icon.png[Good icon]): 0% of the documents in the data set are degraded. +* Degraded (image:images/yellow-dot-icon.png[Degraded icon]): Greater than 0% and up to 3% of the documents in the data set are degraded. +* Poor (image:images/red-dot-icon.png[Poor icon]): Greater than 3% of the documents in the data set are degraded. + +Opening the details of a specific data set shows the degraded documents history, a summary for the data set, and other details that can help you determine if you need to investigate any issues. + +[discrete] +[[observability-monitor-datasets-investigate-issues]] +== Investigate issues + +The Data Set Quality page has a couple of different ways to help you find ignored fields and investigate issues. +From the data set table, you can open the data set's details page, and view commonly ignored fields and information about those fields. +Open a logs data set in Logs Explorer or other data set types in Discover to find ignored fields in individual documents. + +[discrete] +[[observability-monitor-datasets-find-ignored-fields-in-data-sets]] +=== Find ignored fields in data sets + +To open the details page for a data set with poor or degraded quality and view ignored fields: + +. From the data set table, click image:images/icons/expand.svg[expand icon] next to a data set with poor or degraded quality. +. From the details, scroll down to **Quality issues**. + +The **Quality issues** section shows fields that have been ignored, the number of documents that contain ignored fields, and the timestamp of last occurrence of the field being ignored. 
+ +[discrete] +[[observability-monitor-datasets-find-ignored-fields-in-individual-logs]] +=== Find ignored fields in individual logs + +To use Logs Explorer or Discover to find ignored fields in individual logs: + +. Find data sets with degraded documents using the **Degraded Docs** column of the data sets table. +. Click the percentage in the **Degraded Docs** column to open the data set in Logs Explorer or Discover. + +The **Documents** table in Logs Explorer or Discover is automatically filtered to show documents that were not parsed correctly. +Under the **actions** column, you'll find the degraded document icon (image:images/icons/indexClose.svg[degraded document icon]). + +Now that you know which documents contain ignored fields, examine them more closely to find the origin of the issue: + +. Under the **actions** column, click image:images/icons/expand.svg[expand icon] to open the document details. +. Select the **JSON** tab. +. Scroll towards the end of the JSON to find the `ignored_field_values`. + +Here, you'll find all of the `_ignored` fields in the document and their values, which should provide some clues as to why the fields were ignored. diff --git a/docs/en/serverless/observability-get-started.asciidoc b/docs/en/serverless/observability-get-started.asciidoc new file mode 100644 index 0000000000..30defd72be --- /dev/null +++ b/docs/en/serverless/observability-get-started.asciidoc @@ -0,0 +1,78 @@ +[[observability-get-started]] += Get started with Elastic Observability + +++++ +Get started +++++ + +New to Elastic {observability}? Discover more about our observability features and how to get started. 
+ +[discrete] +== Learn about Elastic {observability} + +Learn about key features available to help you get value from your observability data and what it will cost you: + +* <> +* <> + +[discrete] +[[get-started-with-use-case]] +== Get started with your use case + +Learn how to create an Observability project and use Elastic +Observability to gain deeper insight into the behavior of your applications and +systems. + +image::images/get-started.svg[] + +1. **Choose your source.** Elastic integrates with hundreds of data sources for +unified visibility across all your applications and systems. + +2. **Ingest your data.** Turn-key integrations provide a repeatable workflow to +ingest data from all your sources: you install an integration, configure it, and +deploy an agent to collect your data. + +3. **View your data.** Navigate seamlessly between Observability UIs and +dashboards to identify and resolve problems quickly. + +4. **Customize.** Expand your deployment and add features like alerting and anomaly +detection. + +To get started, <>, +then follow one of our <> to learn how to ingest and visualize your observability data. + +[discrete] +[[quickstarts-overview]] +=== Quickstarts + +Our quickstarts dramatically reduce your time-to-value by offering a fast path to ingest and visualize your Observability data. +Each quickstart provides: + +* A highly opinionated, fast path to data ingestion +* Sensible configuration defaults with minimal configuration required +* Auto-detection of logs and metrics for monitoring hosts +* Quick access to related dashboards and visualizations + +Follow the steps in these guides to get started quickly: + +* <> +* <> +* <> + +[discrete] +=== Get started with other features + +Want to use {fleet} or some other feature not covered in the quickstarts? +Follow the steps in these guides to get started: + +* <> +* <> +* <> + +[discrete] +== Additional guides + +Ready to dig into more features of Elastic Observability?
See these guides: + +* <> +* <> diff --git a/docs/en/serverless/observability-overview.asciidoc b/docs/en/serverless/observability-overview.asciidoc new file mode 100644 index 0000000000..8100cac66b --- /dev/null +++ b/docs/en/serverless/observability-overview.asciidoc @@ -0,0 +1,149 @@ +[[observability-serverless-observability-overview]] += Observability overview + +// :description: Learn how to accelerate problem resolution with open, flexible, and unified observability powered by advanced machine learning and analytics. +// :keywords: serverless, observability, overview + +{obs-serverless} provides granular insights and context into the behavior of applications running in your environments. +It's an important part of any system that you build and want to monitor. +Being able to detect and fix root cause events quickly within an observable system is a minimum requirement for any analyst. + +{obs-serverless} provides a single stack to unify your logs, metrics, and application traces. +Ingest your data directly to your Observability project, where you can further process and enhance the data, +before visualizing it and adding alerts. + +image::images/serverless-capabilities.svg[{obs-serverless} overview diagram] + +[discrete] +[[apm-overview]] +== Log monitoring + +Analyze log data from your hosts, services, Kubernetes, Apache, and many more. + +In **Logs Explorer** (powered by Discover), you can quickly search and filter your log data, +get information about the structure of the fields, and display your findings in a visualization. + +[role="screenshot"] +image::images/log-explorer-overview.png[Logs Explorer showing log events] + +<> + +// RUM is not supported for this release. + +// Synthetic monitoring is not supported for this release. + +// Universal Profiling is not supported for this release. 
+ +[discrete] +[[observability-serverless-observability-overview-application-performance-monitoring-apm]] +== Application performance monitoring (APM) + +Instrument your code and collect performance data and errors at runtime by installing APM agents like Java, Go, .NET, and many more. +Then use {obs-serverless} to monitor your software services and applications in real time: + +* Visualize detailed performance information on your services. +* Identify and analyze errors. +* Monitor host-level and APM agent-specific metrics like JVM and Go runtime metrics. + +The **Service** inventory provides a quick, high-level overview of the health and general performance of all instrumented services. + +[role="screenshot"] +image::images/services-inventory.png[Service inventory showing health and performance of instrumented services] + +<> + +[discrete] +[[metrics-overview]] +== Infrastructure monitoring + +Monitor system and service metrics from your servers, Docker, Kubernetes, Prometheus, and other services and applications. + +The **Infrastructure** UI provides a couple of ways to view and analyze metrics across your infrastructure: + +The **Inventory** page provides a view of your infrastructure grouped by resource type. + +[role="screenshot"] +image::images/metrics-app.png[{infrastructure-app} in {kib}] + +The **Hosts** page provides a dashboard-like view of your infrastructure and is backed by an easy-to-use interface called Lens. + +[role="screenshot"] +image::images/hosts.png[Screenshot of the Hosts page] + +From either page, you can view health and performance metrics to get visibility into the overall health of your infrastructure. +You can also drill down into details about a specific host, including performance metrics, host metadata, running processes, +and logs.
+ +<> + +[discrete] +[[observability-serverless-observability-overview-synthetic-monitoring]] +== Synthetic monitoring + +Simulate actions and requests that an end user would perform on your site at predefined intervals and in a controlled environment. +The end result is rich, consistent, and repeatable data that you can trend and alert on. + +For more information, see <>. + +[discrete] +[[observability-serverless-observability-overview-alerting]] +== Alerting + +Stay aware of potential issues in your environments with {obs-serverless}’s alerting +and actions feature that integrates with log monitoring and APM. +It provides a set of built-in actions and specific threshold rules +and enables central management of all rules. + +On the **Alerts** page, the **Alerts** table provides a snapshot of alerts occurring within the specified time frame. The table includes the alert status, when it was last updated, the reason for the alert, and more. + +[role="screenshot"] +image::images/observability-alerts-overview.png[Summary of Alerts on the {obs-serverless} overview page] + +<> + +[discrete] +[[observability-serverless-observability-overview-service-level-objectives-slos]] +== Service-level objectives (SLOs) + +Set clear, measurable targets for your service performance, +based on factors like availability, response times, error rates, and other key metrics. +Then monitor and track your SLOs in real time, +using detailed dashboards and alerts that help you quickly identify and troubleshoot issues. + +From the SLO overview list, you can see all of your SLOs and a quick summary of what’s happening in each one: + +[role="screenshot"] +image::images/slo-dashboard.png[Dashboard showing list of SLOs] + +<> + +[discrete] +[[observability-serverless-observability-overview-cases]] +== Cases + +Collect and share information about observability issues by creating cases. 
+Cases allow you to track key investigation details, +add assignees and tags to your cases, set their severity and status, and add alerts, +comments, and visualizations. You can also send cases to third-party systems, +such as ServiceNow and Jira. + +[role="screenshot"] +image::images/cases.png[Screenshot showing list of cases] + +<> + +[discrete] +[[observability-serverless-observability-overview-aiops]] +== Machine learning and AIOps + +Reduce the time and effort required to detect, understand, investigate, and resolve incidents at scale +by leveraging predictive analytics and machine learning: + +* Detect anomalies by comparing real-time and historical data from different sources to look for unusual, problematic patterns. +* Find and investigate the causes of unusual spikes or drops in log rates. +* Detect distribution changes, trend changes, and other statistically significant change points in a metric of your time series data. + +[role="screenshot"] +image::images/log-rate-analysis.png[Log rate analysis page showing log rate spike ] + +<> diff --git a/docs/en/serverless/quickstarts/collect-data-with-aws-firehose.asciidoc b/docs/en/serverless/quickstarts/collect-data-with-aws-firehose.asciidoc new file mode 100644 index 0000000000..fdf8f4514e --- /dev/null +++ b/docs/en/serverless/quickstarts/collect-data-with-aws-firehose.asciidoc @@ -0,0 +1,130 @@ +[[collect-data-with-aws-firehose]] += Quickstart: Collect data with AWS Firehose + +In this quickstart guide, you'll learn how to use AWS Firehose to send logs and metrics to Elastic. + +The AWS Firehose streams are created using a CloudFormation template, which can collect all available CloudWatch logs and metrics for your AWS account. + +This approach requires minimal configuration as the CloudFormation template creates a Firehose stream, enables CloudWatch metrics collection across all namespaces, and sets up an account-level subscription filter for CloudWatch log groups to send logs to Elastic via Firehose. 
+You can use an AWS CLI command or upload the template to the AWS CloudFormation portal to customize the following parameter values: + +[%collapsible] +.Required Input Parameters +==== +* `ElasticEndpointURL`: Elastic endpoint URL. +* `ElasticAPIKey`: Elastic API Key. +==== + +[%collapsible] +.Optional Input Parameters +==== +* `HttpBufferInterval`: The Kinesis Firehose HTTP buffer interval, in seconds. Default is `60`. +* `HttpBufferSize`: The Kinesis Firehose HTTP buffer size, in MiB. Default is `1`. +* `S3BackupMode`: Source record backup mode in Amazon S3: failed data only or all data. Default is `FailedDataOnly`. +* `S3BufferInterval`: The Kinesis Firehose S3 buffer interval, in seconds. Default is `300`. +* `S3BufferSize`: The Kinesis Firehose S3 buffer size, in MiB. Default is `5`. +* `S3BackupBucketARN`: By default, an S3 bucket for backup will be created. You can override this behavior by providing the ARN of an existing S3 bucket that ensures the data can be recovered if record processing transformation does not produce the desired results. +* `Attributes`: List of attribute name-value pairs for the HTTP endpoint, separated by commas. For example "name1=value1,name2=value2". +==== + +[%collapsible] +.Optional Input Parameters Specific for Metrics +==== +* `EnableCloudWatchMetrics`: Enable CloudWatch Metrics collection. Default is `true`. When CloudWatch metrics collection is enabled, by default a metric stream will be created with metrics from all namespaces. +* `FirehoseStreamNameForMetrics`: Name for Amazon Data Firehose Stream for collecting CloudWatch metrics. Default is `elastic-firehose-metrics`. +* `IncludeOrExclude`: Select the metrics you want to stream. You can include or exclude specific namespaces and metrics. If no namespace filter is given, all namespaces are selected. Default is `Include`. +* `MetricNameFilters`: Comma-delimited list of namespace and metric-name pairs to use for filtering metrics from the stream.
If no metric name filter is given, then default to all namespaces and all metrics. For example "AWS/EC2:CPUUtilization|NetworkIn|NetworkOut,AWS/RDS,AWS/S3:AllRequests". +* `IncludeLinkedAccountsMetrics`: If you are creating a metric stream in a monitoring account, specify `true` to include metrics from source accounts that are linked to this monitoring account, in the metric stream. Default is `false`. +* `Tags`: Comma-delimited list of tags to apply to the metric stream. For example "org:eng,project:firehose". +==== + +[%collapsible] +.Optional Input Parameters Specific for Logs +==== +* `EnableCloudWatchLogs`: Enable CloudWatch Logs collection. Default is `true`. When CloudWatch logs collection is enabled, an account-level subscription filter policy is created for all CloudWatch log groups (except the log groups created for Firehose logs). +* `FirehoseStreamNameForLogs`: Name for Amazon Data Firehose Stream for collecting CloudWatch logs. Default is `elastic-firehose-logs`. +==== + +IMPORTANT: Some AWS services need additional manual configuration to properly ingest logs and metrics. For more information, check the +link:https://www.elastic.co/docs/current/integrations/aws[AWS integration] documentation. + +Data collection with AWS Firehose is supported on Amazon Web Services. + +[discrete] +== Prerequisites + +* An {obs-serverless} project. To learn more, refer to <>. +* A user with the **Admin** role or higher—required to onboard system logs and metrics. To learn more, refer to <>. +* An active AWS account and the necessary permissions to create delivery streams. + +NOTE: The default CloudFormation stack is created in the AWS region selected for the user's account. This region can be modified either through the AWS Console interface or by specifying a `--region` parameter in the AWS CLI command when creating the stack. + +[discrete] +== Limitations + +The AWS Firehose receiver has the following limitations: + +* It does not support AWS PrivateLink. 
+* The CloudFormation template detects and ingests logs and metrics within a single AWS region only. + +The following table shows the type of data ingested by the supported AWS services: + +|=== +| AWS Service | Data type + +| VPC Flow Logs |Logs +| API Gateway|Logs, Metrics +| CloudTrail | Logs +| Network Firewall | Logs, Metrics +| Route53 | Logs +| WAF | Logs +| DynamoDB | Metrics +| EBS | Metrics +| EC2 | Metrics +| ECS | Metrics +| ELB | Metrics +| EMR | Metrics +| MSK | Metrics +| Kinesis Data Stream | Metrics +| Lambda | Metrics +| NAT Gateway | Metrics +| RDS | Metrics +| S3 | Metrics +| SNS | Metrics +| SQS | Metrics +| Transit Gateway | Metrics +| AWS Usage | Metrics +| VPN | Metrics +| Uncategorized Firehose Logs | Logs + +|=== + +[discrete] +== Collect your data + +. <>, or open an existing one. +. In your {obs-serverless} project, go to **Add Data**. +. Go to **Cloud** > **AWS**, and then select **AWS Firehose**. ++ +[role="screenshot"] +image::images/quickstart-aws-firehose-entry-point.png[AWS Firehose entry point] + +. Click **Create Firehose Stream in AWS** to create a CloudFormation stack from the CloudFormation template. + +. Go back to the **Add Observability Data** page. + +[discrete] +== Visualize your data + +After installation is complete and all relevant data is flowing into Elastic, +the **Visualize your data** section allows you to access the different dashboards for the various services. + +[role="screenshot"] +image::images/quickstart-aws-firehose-dashboards.png[AWS Firehose dashboards] + +Here is an example of the VPC Flow logs dashboard: + +[role="screenshot"] +image::images/quickstart-aws-firehose-vpc-flow.png[AWS Firehose VPC flow] + +Refer to <> for a description of other useful features. 
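As a rough illustration of the input parameters described in this quickstart, the following Python sketch builds a `MetricNameFilters` value and a CloudFormation parameter list as JSON. The helper names (`format_metric_name_filters`, `build_parameters`), the endpoint URL, and the API key are placeholders invented for this example; only the parameter names and the filter format come from the lists above.

```python
import json


def format_metric_name_filters(filters):
    """Build a MetricNameFilters value from {namespace: [metric names]}.

    An empty list means "all metrics in this namespace", which mirrors the
    bare "AWS/RDS" entry in the example given in this quickstart.
    """
    parts = []
    for namespace, metrics in filters.items():
        if metrics:
            parts.append(f"{namespace}:{'|'.join(metrics)}")
        else:
            parts.append(namespace)
    return ",".join(parts)


def build_parameters(endpoint_url, api_key, **optional):
    """Return CloudFormation parameters as a list of key/value dicts."""
    values = {"ElasticEndpointURL": endpoint_url, "ElasticAPIKey": api_key}
    values.update(optional)
    parameters = []
    for key, value in values.items():
        # CloudFormation parameter values are strings; booleans are lowercased.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parameters.append({"ParameterKey": key, "ParameterValue": text})
    return parameters


metric_filters = format_metric_name_filters(
    {
        "AWS/EC2": ["CPUUtilization", "NetworkIn", "NetworkOut"],
        "AWS/RDS": [],
        "AWS/S3": ["AllRequests"],
    }
)

parameters = build_parameters(
    "https://example.invalid:443",  # placeholder endpoint (assumption)
    "REPLACE_WITH_API_KEY",  # placeholder API key (assumption)
    MetricNameFilters=metric_filters,
    HttpBufferInterval=60,
)

# JSON sidesteps quoting problems caused by the commas inside MetricNameFilters.
parameters_json = json.dumps(parameters, indent=2)
print(parameters_json)
```

One way to use the result, assuming you create the stack from the command line, is to save the JSON to a file and pass it to `aws cloudformation create-stack` with `--parameters file://params.json`, together with the `--region` option mentioned in the prerequisites.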
diff --git a/docs/en/serverless/quickstarts/k8s-logs-metrics.asciidoc b/docs/en/serverless/quickstarts/k8s-logs-metrics.asciidoc new file mode 100644 index 0000000000..aad133d18b --- /dev/null +++ b/docs/en/serverless/quickstarts/k8s-logs-metrics.asciidoc @@ -0,0 +1,51 @@ +[[observability-quickstarts-k8s-logs-metrics]] += Quickstart: Monitor your Kubernetes cluster with Elastic Agent + +// :description: Learn how to monitor your cluster infrastructure running on Kubernetes. +// :keywords: serverless, observability, how-to + +In this quickstart guide, you'll learn how to create the Kubernetes resources that are required to monitor your cluster infrastructure. + +This new approach requires minimal configuration and provides you with an easy setup to monitor your infrastructure. You no longer need to download, install, or configure the Elastic Agent. Everything happens automatically when you run the kubectl command. + +The kubectl command installs the standalone Elastic Agent in your Kubernetes cluster, downloads all the Kubernetes resources needed to collect metrics from the cluster, and sends the collected data to Elastic. + +[discrete] +[[observability-quickstarts-k8s-logs-metrics-prerequisites]] +== Prerequisites + +* An {obs-serverless} project. To learn more, refer to <>. +* A user with the **Admin** role or higher (required to onboard system logs and metrics). To learn more, refer to <>. +* A running Kubernetes cluster. +* https://kubernetes.io/docs/reference/kubectl/[Kubectl]. + +[discrete] +[[observability-quickstarts-k8s-logs-metrics-collect-your-data]] +== Collect your data + +. <>, or open an existing one. +. In your {obs-serverless} project, go to **Add Data**. +. Select **Monitor infrastructure**, and then select **Kubernetes**. ++ +[role="screenshot"] +image::images/quickstart-k8s-entry-point.png[Kubernetes entry point] +. To install the Elastic Agent on your host, copy and run the install command.
+ +You will use the kubectl command to download a manifest file, inject the user's API key generated by Kibana, and create the Kubernetes resources. +. Go back to the **Add Observability Data** page. +There might be a slight delay before data is ingested. When the data is ready, you will see the message **We are monitoring your cluster**. +. Click **Explore Kubernetes cluster** to navigate to dashboards and explore your data. + +[discrete] +[[observability-quickstarts-k8s-logs-metrics-visualize-your-data]] +== Visualize your data + +After installation is complete and all relevant data is flowing into Elastic, +the **Visualize your data** section allows you to access the Kubernetes Cluster Overview dashboard that can be used to monitor the health of the cluster. + +[role="screenshot"] +image::images/quickstart-k8s-overview.png[Kubernetes overview dashboard] + +You can also access other useful prebuilt dashboards for monitoring Kubernetes resources, for example running pods per namespace, as well as the resources they consume, like CPU and memory. + +Refer to <> for a description of other useful features. diff --git a/docs/en/serverless/quickstarts/monitor-hosts-with-elastic-agent.asciidoc b/docs/en/serverless/quickstarts/monitor-hosts-with-elastic-agent.asciidoc new file mode 100644 index 0000000000..77c781c4dc --- /dev/null +++ b/docs/en/serverless/quickstarts/monitor-hosts-with-elastic-agent.asciidoc @@ -0,0 +1,126 @@ +[[observability-quickstarts-monitor-hosts-with-elastic-agent]] += Quickstart: Monitor hosts with {agent} + +// :description: Learn how to scan your hosts to detect and collect logs and metrics. +// :keywords: serverless, observability, how-to + +In this quickstart guide, you'll learn how to scan your host to detect and collect logs and metrics, +then navigate to dashboards to further analyze and explore your observability data. +You'll also learn how to get value out of your observability data.
+ +To scan your host, you'll run an auto-detection script that downloads and installs {agent}, +which is used to collect observability data from the host and send it to Elastic. + +The script also generates an {agent} configuration file that you can use with your existing Infrastructure-as-Code tooling. + +[discrete] +[[observability-quickstarts-monitor-hosts-with-elastic-agent-prerequisites]] +== Prerequisites + +* An {obs-serverless} project. To learn more, refer to <>. +* A user with the **Admin** role or higher (required to onboard system logs and metrics). To learn more, refer to <>. +* Root privileges on the host (required to run the auto-detection script used in this quickstart). + +[discrete] +[[observability-quickstarts-monitor-hosts-with-elastic-agent-limitations]] +== Limitations + +* The auto-detection script currently scans for metrics and logs from Apache, Docker, Nginx, and the host system. +It also scans for custom log files. +* The auto-detection script works on Linux and macOS only. The `lsof` command must also be available if you want to detect custom log files. +* If you've installed Apache or Nginx in a non-standard location, you'll need to specify log file paths manually when you run the scan. +* Because Docker Desktop runs in a VM, its logs are not auto-detected. + +[discrete] +[[observability-quickstarts-monitor-hosts-with-elastic-agent-collect-your-data]] +== Collect your data + +. <>, or open an existing one. +. In your {obs-serverless} project, go to **Add Data**. +. Select **Collect and analyze logs**, and then select **Auto-detect logs and metrics**. +. Copy the command that's shown. For example: ++ +[role="screenshot"] +image::images/quickstart-autodetection-command.png[Quick start showing command for running auto-detection] ++ +You'll run this command to download the auto-detection script and scan your system for observability data. +. Open a terminal on the host you want to scan, and run the command. +. 
Review the list of log files: ++ +** Enter `Y` to ingest all the log files listed. +** Enter `n` to either exclude log files or specify additional log paths. Enter `Y` to confirm your selections. + +When the script is done, you'll see a message like "{agent} is configured and running." + +There might be a slight delay before logs and other data are ingested. + +.Need to scan your host again? +[NOTE] +==== +You can re-run the script on the same host to detect additional logs. +The script will scan the host and reconfigure {agent} with any additional logs that are found. +If the script misses any custom logs, you can add them manually by entering `n` after the script has finished scanning the host. +==== + +[discrete] +[[observability-quickstarts-monitor-hosts-with-elastic-agent-visualize-your-data]] +== Visualize your data + +After installation is complete and all relevant data is flowing into Elastic, +the **Visualize your data** section will show links to assets you can use to analyze your data. +Depending on what type of observability data was collected, +the page may link to the following integration assets: + +|=== +| Integration asset | Description + +| **System** +| Prebuilt dashboard for monitoring host status and health using system metrics. + +| **Apache** +| Prebuilt dashboard for monitoring Apache HTTP server health using error and access log data. + +| **Docker** +| Prebuilt dashboard for monitoring the status and health of Docker containers. + +| **Nginx** +| Prebuilt dashboard for monitoring Nginx server health using error and access log data. + +| **Custom .log files** +| Logs Explorer for analyzing custom logs. +|=== + +For example, you can navigate the **Host overview** dashboard to explore detailed metrics about system usage and throughput. +Metrics that indicate a possible problem are highlighted in red. 
+ +[role="screenshot"] +image::images/quickstart-host-overview.png[Host overview dashboard] + +[discrete] +[[observability-quickstarts-monitor-hosts-with-elastic-agent-get-value-out-of-your-data]] +== Get value out of your data + +After using the dashboards to examine your data and confirm you've ingested all the host logs and metrics you want to monitor, +you can use {obs-serverless} to gain deeper insight into your data. + +For host monitoring, the following capabilities and features are recommended: + +* In the <>, analyze and compare data collected from your hosts. +You can also: ++ +** <> for memory usage and network traffic on hosts. +** <> that notify you when an anomaly is detected or a metric exceeds a given value. +* In the <>, search and filter your log data, +get information about the structure of log fields, and display your findings in a visualization. +You can also: ++ +** <> to find degraded documents. +** <> to find patterns in unstructured log messages. +** <> that notify you when an Observability data type reaches or exceeds a given value. +* Use <> to apply predictive analytics and machine learning to your data: ++ +** <> by comparing real-time and historical data from different sources to look for unusual, problematic patterns. +** <>. +** <> in your time series data. + +Refer to <> for a description of other useful features. diff --git a/docs/en/serverless/redirects.asciidoc b/docs/en/serverless/redirects.asciidoc new file mode 100644 index 0000000000..8c9bffd8b7 --- /dev/null +++ b/docs/en/serverless/redirects.asciidoc @@ -0,0 +1,14 @@ +["appendix",role="exclude",id="redirects"] += Deleted pages + +The following pages have moved or been deleted. + +[role="exclude",id="observability-technical-preview-limitations"] +=== Technical preview limitations + +Refer to <>. + +[role="exclude",id="observability-aiops"] +=== AIOps + +Refer to <>. 
diff --git a/docs/en/serverless/reference.asciidoc b/docs/en/serverless/reference.asciidoc new file mode 100644 index 0000000000..8774ed826c --- /dev/null +++ b/docs/en/serverless/reference.asciidoc @@ -0,0 +1,7 @@ +[[reference]] += Reference + +This section contains reference information related to using Elastic {observability}. + +* <> +* <> diff --git a/docs/en/serverless/reference/elastic-entity-model.asciidoc b/docs/en/serverless/reference/elastic-entity-model.asciidoc new file mode 100644 index 0000000000..ecd42912c7 --- /dev/null +++ b/docs/en/serverless/reference/elastic-entity-model.asciidoc @@ -0,0 +1,57 @@ +[[observability-elastic-entity-model]] += Elastic Entity Model + +// :description: Learn about the model that empowers entity-centric Elastic solution features and workflows. +// :keywords: serverless, observability, overview + +The Elastic Entity Model consists of: + +* a data model and related entity indices +* an Entity Discovery Framework, which consists of {ref}/transforms.html[transforms] and {ref}/ingest.html[Ingest pipelines] that read from signal indices and write data to entity indices +* a set of management APIs that empower entity-centric Elastic solution features and workflows + +In Elastic Observability, +an _entity_ is an object of interest that can be associated with produced telemetry and identified as unique. +Note that this definition is intentionally closely aligned to the work of the https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md#data-model[OpenTelemetry Entities SIG]. +Examples of entities include (but are not limited to) services, hosts, and containers. + +The concept of an entity is important as a means to unify observability signals based on the underlying entity that the signals describe. + +.Notes +[NOTE] +==== +The Elastic Entity Model currently supports the <> limited to service, host, and container entities. 
+==== + +[discrete] +[[observability-elastic-entity-model-enable-the-elastic-entity-model]] +== Enable the Elastic Entity Model + +:role: Admin +:goal: enable the Elastic Entity Model +include::../partials/roles.asciidoc[] +:role!: + +:goal!: + +You can enable the Elastic Entity Model from the new <>. If it is already enabled, you will not be prompted to enable it again. + +[discrete] +[[observability-elastic-entity-model-disable-the-elastic-entity-model]] +== Disable the Elastic Entity Model + +:role: Admin +:goal: disable the Elastic Entity Model +include::../partials/roles.asciidoc[] +:role!: + +:goal!: + +From the Dev Console, run the command: `DELETE kbn:/internal/entities/managed/enablement` + +[discrete] +[[observability-elastic-entity-model-limitations]] +== Limitations + +* https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cross-cluster-search.html[Cross-cluster search (CCS)] is not supported. EEM cannot leverage data stored on a remote cluster. +* Services are only detected from documents that contain `service.name` and are stored in indices matching either `logs-*` or `apm-*`. diff --git a/docs/en/serverless/reference/metrics-app-fields.asciidoc b/docs/en/serverless/reference/metrics-app-fields.asciidoc new file mode 100644 index 0000000000..1e076018c6 --- /dev/null +++ b/docs/en/serverless/reference/metrics-app-fields.asciidoc @@ -0,0 +1,295 @@ +[[observability-infrastructure-monitoring-required-fields]] += Infrastructure app fields + +// :description: Learn about the fields required to display data in the Infrastructure UI. +// :keywords: serverless, observability, reference + +This section lists the fields the Infrastructure UI uses to display data. +Note that some of the fields listed here are not {ecs-ref}/ecs-reference.html#_what_is_ecs[ECS fields].
+ +[discrete] +[[observability-infrastructure-monitoring-required-fields-additional-field-details]] +== Additional field details + +The `event.dataset` field is required to display data properly in some views. This field +is a combination of `metricset.module`, which is the {metricbeat} module name, and `metricset.name`, +which is the metricset name. + +To determine each metric's optimal time interval, all charts use `metricset.period`. +If `metricset.period` is not available, then it falls back to 1 minute intervals. + +[discrete] +[[base-fields]] +== Base fields + +The `base` field set contains all fields which are on the top level. These fields are common across all types of events. + +|=== +| Field | Description | Type + +| `@timestamp` +a| Date/time when the event originated. + +This is the date/time extracted from the event, typically representing when the source generated the event. +If the event source has no original timestamp, this value is typically populated by the first time the pipeline received the event. +Required field for all events. + +Example: `May 27, 2020 @ 15:22:27.982` +| date + +| `message` +a| For log events the message field contains the log message, optimized for viewing in a log viewer. + +For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event. + +If multiple messages exist, they can be combined into one message. + +Example: `Hello World` +| text +|=== + +[discrete] +[[host-fields]] +== Hosts fields + +These fields must be mapped to display host data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `host.name` +a| Name of the host. + +It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. + +Example: `MacBook-Elastic.local` +| keyword + +| `host.ip` +| IP of the host that records the event. 
+| ip +|=== + +[discrete] +[[docker-fields]] +== Docker container fields + +These fields must be mapped to display Docker container data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `container.id` +a| Unique container id. + +Example: `data` +| keyword + +| `container.name` +| Container name. +| keyword + +| `container.ip_address` +a| IP of the container. + +_Not an ECS field_ +| ip +|=== + +[discrete] +[[kubernetes-fields]] +== Kubernetes pod fields + +These fields must be mapped to display Kubernetes pod data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `kubernetes.pod.uid` +a| Kubernetes Pod UID. + +Example: `8454328b-673d-11ea-7d80-21010a840123` + +_Not an ECS field_ +| keyword + +| `kubernetes.pod.name` +a| Kubernetes pod name. + +Example: `nginx-demo` + +_Not an ECS field_ +| keyword + +| `kubernetes.pod.ip` +a| IP of the Kubernetes pod. + +_Not an ECS field_ +| keyword +|=== + +[discrete] +[[aws-ec2-fields]] +== AWS EC2 instance fields + +These fields must be mapped to display EC2 instance data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `cloud.instance.id` +a| Instance ID of the host machine. + +Example: `i-1234567890abcdef0` +| keyword + +| `cloud.instance.name` +| Instance name of the host machine. +| keyword + +| `aws.ec2.instance.public.ip` +a| Instance public IP of the host machine. + +_Not an ECS field_ +| keyword +|=== + +[discrete] +[[aws-s3-fields]] +== AWS S3 bucket fields + +These fields must be mapped to display S3 bucket data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `aws.s3.bucket.name` +a| The name or ID of the AWS S3 bucket. + +_Not an ECS field_ +| keyword +|=== + +[discrete] +[[aws-sqs-fields]] +== AWS SQS queue fields + +These fields must be mapped to display SQS queue data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `aws.sqs.queue.name` +a| The name or ID of the AWS SQS queue. 
+ +_Not an ECS field_ +| keyword +|=== + +[discrete] +[[aws-rds-fields]] +== AWS RDS database fields + +These fields must be mapped to display RDS database data in the {infrastructure-app}. + +|=== +| Field | Description | Type + +| `aws.rds.db_instance.arn` +a| Amazon Resource Name (ARN) for each RDS. + +_Not an ECS field_ +| keyword + +| `aws.rds.db_instance.identifier` +a| Contains a user-supplied database identifier. This identifier is the unique key that identifies a DB instance. + +_Not an ECS field_ +| keyword +|=== + +[discrete] +[[group-inventory-fields]] +== Additional grouping fields + +Depending on which entity you select in the **Inventory** view, these additional fields can be mapped to group entities by. + +|=== +| Field | Description | Type + +| `cloud.availability_zone` +a| Availability zone in which this host is running. + +Example: `us-east-1c` +| keyword + +| `cloud.machine.type` +a| Machine type of the host machine. + +Example: `t2.medium` +| keyword + +| `cloud.region` +a| Region in which this host is running. + +Example: `us-east-1` +| keyword + +| `cloud.instance.id` +a| Instance ID of the host machine. + +Example: `i-1234567890abcdef0` +| keyword + +| `cloud.provider` +a| Name of the cloud provider. Example values are `aws`, `azure`, `gcp`, or `digitalocean`. + +Example: `aws` +| keyword + +| `cloud.instance.name` +| Instance name of the host machine. +| keyword + +| `cloud.project.id` +a| Name of the project in Google Cloud. + +_Not an ECS field_ +| keyword + +| `service.type` +a| The type of service data is collected from. + +The type can be used to group and correlate logs and metrics from one service type. + +For example, the service type for metrics collected from {es} is `elasticsearch`. + +Example: `elasticsearch` + +_Not an ECS field_ +| keyword + +| `host.hostname` +a| Name of the host. This field is required if you want to use {ml-features} + +It normally contains what the `hostname` command returns on the host machine. 
+ +Example: `Elastic.local` +| keyword + +| `host.os.name` +a| Operating system name, without the version. + +Multi-fields: + +os.name.text (type: text) + +Example: `Mac OS X` +| keyword + +| `host.os.kernel` +a| Operating system kernel version as a raw string. + +Example: `4.4.0-112-generic` +| keyword +|=== diff --git a/docs/en/serverless/slos/slos.asciidoc b/docs/en/serverless/slos/slos.asciidoc new file mode 100644 index 0000000000..cddcf857bf --- /dev/null +++ b/docs/en/serverless/slos/slos.asciidoc @@ -0,0 +1,104 @@ +[[observability-slos]] += Service-level objectives (SLOs) + +// :description: Set clear, measurable targets for your service performance with service-level objectives (SLOs). +// :keywords: serverless, observability, overview + +SLOs allow you to set clear, measurable targets for your service performance, based on factors like availability, response times, error rates, and other key metrics. +You can define SLOs based on different types of data sources, such as custom KQL queries and APM latency or availability data. + +Once you've defined your SLOs, you can monitor them in real time, with detailed dashboards and alerts that help you quickly identify and troubleshoot any issues that may arise. +You can also track your progress against your SLO targets over time, with a clear view of your error budgets and burn rates. + +[discrete] +[[slo-important-concepts]] +== Important concepts + +The following table lists some important concepts related to SLOs: + +|=== +| | + +| **Service-level indicator (SLI)** +| The measurement of your service's performance, such as service latency or availability. + +| **SLO** +| The target you set for your SLI. It specifies the level of performance you expect from your service over a period of time. + +| **Error budget** +| The amount of time that your SLI can fail to meet the SLO target before it violates your SLO. + +| **Burn rate** +| The rate at which your service consumes your error budget. 
+|=== + +[discrete] +[[slo-in-elastic]] +== SLO overview + +From the SLO overview, you can see all of your SLOs and a quick summary of what's happening in each one: + +[role="screenshot"] +image::images/slo-dashboard.png[Dashboard showing list of SLOs] + +Select an SLO from the overview to see additional details including: + +* **Burn rate:** the percentage of bad events over different time periods (1h, 6h, 24h, 72h) and the risk of exhausting your error budget within those time periods. +* **Historical SLI:** the SLI value and how it's trending over the SLO time window. +* **Error budget burn down:** the remaining error budget and how it's trending over the SLO time window. +* **Alerts:** active alerts if you've set any <> for the SLO. + +[role="screenshot"] +image::images/slo-detailed-view.png[Detailed view of a single SLO] + +[discrete] +[[filter-SLOs]] +== Search and filter SLOs + +You can apply searches and filters to quickly find the SLOs you're interested in. + +[role="screenshot"] +image::images/slo-filtering-options.png[Options for filtering SLOs in the overview] + +* **Apply structured filters:** Next to the search field, click the **Add filter** image:images/icons/plusInCircleFilled.svg[Add filter icon] icon to add a custom filter. Notice that you can use `OR` and `AND` to combine filters. The structured filter can be disabled, inverted, or pinned across all apps. +* **Enter a semi-structured search:** In the search field, start typing a field name to get suggestions for field names and operators that you can use to build a structured query. The semi-structured search will filter SLOs for matches, and only return matching SLOs. +* Use the **Status** and **Tags** menus to include or exclude SLOs from the view based on the status or defined tags. 
+ +There are also options to sort and group the SLOs displayed in the overview: + +[role="screenshot"] +image::images/slo-group-by.png[SLOs sorted by SLO status and grouped by tags] + +* **Sort by**: SLI value, SLO status, Error budget consumed, or Error budget remaining. +* **Group by**: None, Tags, Status, or SLI type. +* Click icons to switch between a card view (image:images/icons/apps.svg[Card view icon]), list view (image:images/icons/list.svg[List view icon]), or compact view (image:images/icons/tableDensityCompact.svg[Compact view icon]). + +[discrete] +[[observability-slos-slo-dashboard-panels]] +== SLO dashboard panels + +SLO data is also available as Dashboard _panels_. +Panels allow you to curate custom data views and visualizations to bring clarity to your data. + +Available SLO panels include: + +* **SLO Overview**: Visualize a selected SLO's health, including name, current SLI value, target, and status. +* **SLO Alerts**: Visualize one or more SLO alerts, including status, rule name, duration, and reason. In addition, configure and update alerts, or create cases directly from the panel. + +[role="screenshot"] +image::images/slo-dashboard-panel.png[Detailed view of an SLO dashboard panel] + +To learn more about Dashboards, see <>. + +[discrete] +[[slo-overview-next-steps]] +== Next steps + +Get started using SLOs to measure your service performance: + +// TODO: Find out if any special privileges are required to grant access to SLOs and document as required. 
Classic doclink was Configure SLO access + +* <> +* <> +* <> +* <> diff --git a/docs/en/serverless/what-is-observability-serverless.asciidoc b/docs/en/serverless/what-is-observability-serverless.asciidoc new file mode 100644 index 0000000000..d75b81b35a --- /dev/null +++ b/docs/en/serverless/what-is-observability-serverless.asciidoc @@ -0,0 +1,27 @@ +// :keywords: serverless, observability, overview + +Elastic Observability accelerates problem resolution with open, flexible, and unified observability powered by advanced machine learning and analytics. Elastic ingests all operational and business telemetry and correlates it for faster root cause detection. + +Not using serverless? Go to the {observability-guide}/index.html[Elastic Observability docs]. + +[discrete] +== Get started + +* <>: Discover more about our observability features and how to get started. +* <>: Scan your host to detect and collect logs and metrics. +* <>: Create the Kubernetes resources that are required to monitor your cluster infrastructure. +* <>: Add your log data to Elastic Observability and start exploring your logs. +* <>: Collect Application Performance Monitoring (APM) data and visualize it in real time. +* <>: Add your metrics data to Elastic Observability and visualize it in real time. + +[discrete] +== How to + +* <>: Use Discover to explore your log data. +* <>: Create rules to detect complex conditions and trigger alerts. +* <>: Measure key metrics important to the business. +* <>: Find unusual behavior in time series data. +* <>: Monitor your software services and applications in real time. +* <>: Reuse existing APM instrumentation to capture logs, traces, and metrics. +* <>: Get a metrics-driven view of your hosts backed by an interface called Lens. +