apify · TC-MO · Jan 13, 2025 · Dec 11, 2024 · Dec 12, 2024 · Dec 12, 2024
diff --git a/.github/styles/config/vocabularies/Docs/accept.txt b/.github/styles/config/vocabularies/Docs/accept.txt
@@ -1,9 +1,88 @@
-apify(?=-\w+)
+Apify(?=-\w+)
+@apify\.com
+\bApify\b
 Actor(s)?
+SDK(s)
+[Ss]torages
+Crawlee
+[Aa]utoscaling
+CU
+
 booleans
-Docusaurus
 env
+npm
+serverless
+[Bb]oolean
+node_modules
+[Rr]egex
+[Mm]onorepo
+[Gg]ist
+SDK
+Dockerfile
+Docker's
+
+Docusaurus
 navbar
 nginx
-npm
 
+:::caution
+:::note
+:::info
+:::tip
+:::warning
+
+maxWidth
+startUrls
+
+PDFs
+dataset's
+gif
+Gzip
+
+API's
+APIs
+webhook's
+idempotency
+backoff
+
+Authy
+reCaptcha
+OAuth
+untrusted
+unencrypted
+proxied
+
+LLM
+embedder
+chatbot
+[Ll]angchain
+
+[Kk]eboola
+[Aa]irbyte
+[Qq]drant
+[Pp]inecone
+[Mm]ilvus
+[Zz]illiz
+llama_index
+[Ff]lowise
+
+exploitability
+[Ww]hitepaper
+[Cc]ron
+scalably
+metamorph
+hostname
+IPs
+unscoped
+multistep
+[Aa]utogenerated
+preconfigured
+[Dd]atacenter
+
+[Ww]ikipedia
+[Zz]apier
+[Tt]rello
+[Pp]refill
+
+
+[Mm]ultiselect
diff --git a/.github/workflows/typos-check.yaml b/.github/workflows/typos-check.yaml
diff --git a/.github/workflows/vale.yaml b/.github/workflows/vale.yaml
@@ -32,3 +32,4 @@ jobs:
                     fail_on_error: true
                     vale_flags: '--minAlertLevel=error'
                     reporter: github-pr-annotations
+
diff --git a/vale.ini → .vale.ini b/vale.ini → .vale.ini
@@ -2,15 +2,19 @@ StylesPath = .github/styles
 MinAlertLevel = warning
 IgnoredScopes = code, tt, table, tr, td
 
-vocabularies = Docs
+Vocab = Docs
 
 Packages = write-good, Microsoft
 
 [formats]
 mdx = md
 
 [*.md]
-BasedOnStyles = Apify, write-good, Microsoft
+BasedOnStyles = Vale, Apify, write-good, Microsoft
+# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ) & email addresses
+TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), (`[^`]*`), (`\w+`)
+Vale.Spelling = YES
+
 
 # Disabling rules (NO)
 Microsoft.Contractions = NO

diff --git a/_typos.toml b/_typos.toml
@@ -86,9 +86,9 @@ navigator.permissions.query('some_permission');
 ```
 
 ### With canvases {#with-canvases}
-<!-- vale off -->
+
 This technique is based on rendering [WebGL](https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API) scenes to a canvas element and observing the pixels rendered. WebGL rendering is tightly connected with the hardware, and therefore provides high entropy. Here's a quick breakdown of how it works:
-<!-- vale on -->
+
 1. A JavaScript script creates a [`<canvas>` element](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API) and renders some font or a custom shape.
 2. The script then gets the pixel-map from the `<canvas>` element.
 3. The collected pixel-map is stored in a cryptographic hash specific to the device's hardware.

@@ -16,10 +16,10 @@ Unfortunately, most APIs will require a valid cookie to be included in the `cook
 Luckily, there are ways to retrieve and set cookies for requests prior to sending them, which will be covered more in-depth within future Scraping Academy modules. The most important things to know at the moment are:
 
 ## Cookies {#cookies}
-<!-- vale off -->
+
 1. For sites that heavily rely on cookies for user-verification and request authorization, certain generic requests (such as to the website's main page, or to the target page) will return back a (or multiple) `set-cookie` header(s).
 2. The `set-cookie` response header(s) can be parsed and used as the `cookie` header in the headers of a request. A great package for parsing these values from a response's headers is [`set-cookie-parser`](https://www.npmjs.com/package/set-cookie-parser). With this package, cookies can be parsed from headers like so:
-<!-- vale on -->
+
 ```js
 import axios from 'axios';
 

@@ -5,7 +5,7 @@ slug: /actors/development/actor-definition/actor-json
 sidebar_position: 1
 ---
 
-**Learn how to write the main Actor config in the `.actor/actor.json` file.**
+**Learn how to write the main Actor configuration in the `.actor/actor.json` file.**
 
 ---
 

@@ -18,7 +18,7 @@ The Actor input schema serves three main purposes:
 - It simplifies invoking your Actors from external systems by generating calling code and connectors for integrations.
 
 To define an input schema for an Actor, set `input` field in the `.actor/actor.json` file to an input schema object (described below), or path to a JSON file containing the input schema object.
-For backwards compatibility, if the `input` field is omitted, the system looks for an `INPUT_SCHEMA.json` file either in the `.actor` directory or the Actor's top-level directory—but note that this functionality is deprececated and might be removed in the future. The maximum allowed size for the input schema file is 500 kB.
+For backwards compatibility, if the `input` field is omitted, the system looks for an `INPUT_SCHEMA.json` file either in the `.actor` directory or the Actor's top-level directory—but note that this functionality is deprecated and might be removed in the future. The maximum allowed size for the input schema file is 500 kB.
 
 When you provide an input schema, the system will validate the input data passed to the Actor on start (via the API or Apify Console) against the specified schema to ensure compliance before starting the Actor.
 If the input object doesn't conform the schema, the caller receives an error and the Actor is not started.
@@ -343,7 +343,7 @@ The object where the proxy configuration is stored has the following structure:
 }
 ```
 
-Example of a blackbox object:
+Example of a black box object:
 
 ```json
 {

@@ -11,7 +11,7 @@ slug: /actors/development/builds-and-runs/builds
 
 ## Understand Actor builds
 
-Before an Actor can be run, it needs to be built. The build process creates a snapshot of a specific version of the Actor's settings, including its [source code](../actor_definition/source_code.md) and [environment variables](../programming_interface/environment_variables.md). This snapshot is then used to create a Docker image containing everything the Actor needs for its run, such as NPM packages, web browsers, etc.
+Before an Actor can be run, it needs to be built. The build process creates a snapshot of a specific version of the Actor's settings, including its [source code](../actor_definition/source_code.md) and [environment variables](../programming_interface/environment_variables.md). This snapshot is then used to create a Docker image containing everything the Actor needs for its run, such as `npm` packages, web browsers, etc.
 
 ### Build numbers
 

@@ -18,7 +18,7 @@ Long-running [Actor](../../index.mdx) jobs may need to migrate between servers.
 To prevent data loss, long-running Actors should:
 
 - Periodically save (persist) their state.
-- Listem for [migration events](/sdk/js/api/apify/class/PlatformEventManager)
+- Listen for [migration events](/sdk/js/api/apify/class/PlatformEventManager)
 - Check for persisted state when starting, allowing them to resume from where they left off.
 
 For short-running Actors, the risk of restarts and the cost of repeated runs are low, so you can typically ignore state persistence.

@@ -30,7 +30,7 @@ To set up automated builds and tests for your Actors you need to:
     ![Apify token in app](./images/ci-token.png)
 
 1. Add your Apify token to GitHub secrets
-   1. Go to your repo > Settings > Secrets > New repository secret
+   1. Go to your repository > Settings > Secrets > New repository secret
    1. Name the secret & paste in your token
 1. Add the Builds Actor API endpoint URL to GitHub secrets
    1. Use this format:
@@ -43,7 +43,7 @@ To set up automated builds and tests for your Actors you need to:
 
    1. Name the secret
 1. Create GitHub Actions workflow files:
-   1. In your repo, create the `.github/workflows` directory
+   1. In your repository, create the `.github/workflows` directory
    2. Add `latest.yml` and `beta.yml` files with the following content
 
     <Tabs groupId="main">

@@ -13,7 +13,7 @@ Deploying an Actor involves uploading your [source code](/platform/actors/develo
 
 ## Deploy using Apify CLI
 
-The fastest way to deploy and build your Actor is by uising the [Apify CLI](/cli). If you've completed one of the tutorials from the [academy](/academy), you should have already have it installed. If not, follow the [Apify CLI installation instructions](/cli/docs/installation).
+The fastest way to deploy and build your Actor is by using the [Apify CLI](/cli). If you've completed one of the tutorials from the [academy](/academy), you should have already have it installed. If not, follow the [Apify CLI installation instructions](/cli/docs/installation).
 
 To deploy your Actor using Apify CLI:
 
@@ -49,7 +49,7 @@ You can also pull an existing Actor from the Apify platform to your local machin
 apify pull [ACTORID]
 ```
 
-This command fetches the Actor's files to your current directory. If the Actor is defined as a Git repository, it will be cloned, for Actors defined in the Web IDE, the command will fetch the files diresctly.
+This command fetches the Actor's files to your current directory. If the Actor is defined as a Git repository, it will be cloned, for Actors defined in the Web IDE, the command will fetch the files directly.
 
 You can specify a particular version of the Actor to pull by using the `--version` flag:
 

@@ -9,10 +9,13 @@ sidebar_position: 1
 
 ---
 
-This section explains the various sources types available for Apify Actors and how to deploy an Actor from Github using CLI or Gist. Apify Actors supporst four source types:
+This section explains the various sources types available for Apify Actors and how to deploy an Actor from GitHub using CLI or Gist. Apify Actors supports four source types:
 
 - [Web IDE](#web-ide)
 - [Git repository](#git-repository)
+  - [Private repositories](#private-repositories)
+    - [How to configure deployment keys](#how-to-configure-deployment-keys)
+  - [Actor monorepos](#actor-monorepos)
 - [Zip file](#zip-file)
 - [GitHub Gist](#github-gist)
 
@@ -22,15 +25,15 @@ This is the default option when your Actor's source code is hosted on the Apify
 
 A `Dockerfile` is mandatory for all Actors. When using the default NodeJS Dockerfile, you'll typically need `main.js` for your source code and `package.json` for [NPM](https://www.npmjs.com/) package configurations.
 
-For more information on creating custom Dockersfiles or using Apify's base images, refer to the [Dockerfile](/platform/actors/development/actor-definition/dockerfile#custom-dockerfile) and [base Docker images](/platform/actors/development/actor-definition/dockerfile#base-docker-images) documentation.
+For more information on creating custom Dockerfiles or using Apify's base images, refer to the [Dockerfile](/platform/actors/development/actor-definition/dockerfile#custom-dockerfile) and [base Docker images](/platform/actors/development/actor-definition/dockerfile#base-docker-images) documentation.
 
 ## Git repository
 
 <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/NEzT_p_RE1Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
 
 Hosting your Actor's source code in a Git repository allows for multiple files and directories, a custom `Dockerfile` for build process control, and a user description fetched from `README.md`. Specify the repository location using the **Git URL** setting with `https`, `git`, or `ssh` protocols.
 
-To deploy an Actor from GitHub, set the **Source Type** to **Git repository** and enter the GitHub repository URL in the **Git URL** field. You can optionally specify a branch or tag by adding a URL fragmend (e.g., `#develop`).
+To deploy an Actor from GitHub, set the **Source Type** to **Git repository** and enter the GitHub repository URL in the **Git URL** field. You can optionally specify a branch or tag by adding a URL fragment (e.g., `#develop`).
 
 To use a specific directory, add it after the branch/tag, separated by a colon (e.g., `#develop:some/dir`)
 
@@ -72,14 +75,14 @@ Remember that each key can only be used once per Git hosting service (GitHub, Bi
 
 To manage multiple Actors in a single repository, use the `dockerContextDix` property in the [Actor definition](/platform/actors/development/actor-definition/actor-json) to set the Docker context directory (if not provided then the repository root is used). In the Dockerfile, copy both the Actor's source and any shared code into the Docker image.
 
-To enable sharing Dockerfiles between multiple Actors, the Actor build process passes the `ACTOR_PATH_IN_DOCKER_CONTEXT` build arg to the Docker build.
+To enable sharing Dockerfiles between multiple Actors, the Actor build process passes the `ACTOR_PATH_IN_DOCKER_CONTEXT` build argument to the Docker build.
 It contains the relative path from `dockerContextDir` to the directory selected as the root of the Actor in the Apify Console (the "directory" part of the Actor's git URL).
 
 For an example, see the [`apify/actor-monorepo-example`](https://github.com/apify/actor-monorepo-example) repository. To build Actors from this monorepo, you would set the source URL (including branch name and folder) as `https://github.com/apify/actor-monorepo-example#main:actors/javascript-actor` and `https://github.com/apify/actor-monorepo-example#main:actors/typescript-actor` respectively.
 
 ## Zip file
 
-Actors can also use source code from a Zip archive hosted on an external URL. This option supports multiple files and directories, allows for custom `Dockerfile`, and uses `README.md` for the Actor description. If not using a [custom Dockerfile](../actor_definition/docker.md#custom-dockerfile), ensure your main applicat file is named `main.js`.
+Actors can also use source code from a Zip archive hosted on an external URL. This option supports multiple files and directories, allows for custom `Dockerfile`, and uses `README.md` for the Actor description. If not using a [custom Dockerfile](../actor_definition/docker.md#custom-dockerfile), ensure your main file is named `main.js`.
 
 :::note Automatic use of ZIP file
 
@@ -91,6 +94,6 @@ This source type is used automatically when you are using Apify-CLI and the sour
 
 For smaller projects, GitHub Gist offers a simpler alternative to full Git repositories or hosted Zip files. To use a GitHub Gist, create your Gist at [https://gist.github.com/](https://gist.github.com/), set the **Source type** to **GitHub Gist**, and paste the Gist URL in the provided field.
 
-Like other source types, Gists can include multiple files, directories, and a custom Dockersfile. The Actor description is taken from `README.md`.
+Like other source types, Gists can include multiple files, directories, and a custom Dockerfile. The Actor description is taken from `README.md`.
 
 By understanding these source types, you can choose the most appropriate option for hosting and deploying your Apify Actors. Each type offers unique advantages, allowing you to select the best fit for your project's size, complexity, and collaboration needs.
@@ -11,7 +11,7 @@ slug: /actors/development/performance
 
 ## Optimization Tips
 
-This guide provides tips to help you maximize the poerformance of your Actors, minimize costs, and achieve optimal results.
+This guide provides tips to help you maximize the performance of your Actors, minimize costs, and achieve optimal results.
 
 ### Run batch jobs instead of single jobs
 

@@ -53,7 +53,7 @@ Here's a table of key system environment variables:
 | `APIFY_DISABLE_OUTDATED_WARNING` | Controls the display of outdated version warnings. Set to `1` to suppress notifications about updates. |
 | `APIFY_WORKFLOW_KEY` | Identifier used for grouping related runs and API calls together. |
 | `APIFY_META_ORIGIN` | Specifies how an Actor run was started. Possible values are [here](/platform/actors/running/runs-and-builds#origin) |
-| `APIFY_SDK_LATEST_VERSION` | Specifies the most recent release version of the Apify SDK for Javascript. Used for checking for updates. |
+| `APIFY_SDK_LATEST_VERSION` | Specifies the most recent release version of the Apify SDK for JavaScript. Used for checking for updates. |
 | `APIFY_INPUT_SECRETS_KEY_FILE` | Path to the secret key used to decrypt [Secret inputs](/platform/actors/development/actor-definition/input-schema/secret-input). |
 | `APIFY_INPUT_SECRETS_KEY_PASSPHRASE` | Passphrase for the input secret key specified in `APIFY_INPUT_SECRETS_KEY_FILE`. |
 

@@ -22,7 +22,7 @@ Apify's system notifies Actors about various events, such as:
 - Abort operations triggered by another Actor
 - CPU overload
 
-These events help you manage your Actor's behavior and resources effecetively.
+These events help you manage your Actor's behavior and resources effectively.
 
 ## System events
 

@@ -11,7 +11,7 @@ slug: /actors/development/quick-start/locally
 
 :::info Prerequisites
 
-You need to have [Node.js](https://nodejs.org/en/) version 16 or higher with NPM installed on your computer.
+You need to have [Node.js](https://nodejs.org/en/) version 16 or higher with `npm` installed on your computer.
 
 :::
 

@@ -31,7 +31,7 @@ https://apify.com/actor-badge?actor=<USERNAME>/<ACTOR>
 In order to embed the badge in the HTML documentation, just use it as an image wrapped in a link as shown in the example below. Don't froget to use the `username` and `actor-name` of your Actor.
 
 #### Example
-<!-- vale off -->
+
 <Tabs>
   <TabItem value="html" label="HTML" default>
     ```html
@@ -40,13 +40,13 @@ In order to embed the badge in the HTML documentation, just use it as an image w
     </a>
     ```
   </TabItem>
-  <TabItem value="markdown" label="Markdown">  
+  <TabItem value="markdown" label="Markdown">
     ```markdown
     [![Website Content Crawler Actor](https://apify.com/actor-badge?actor=apify/website-content-crawler)](https://apify.com/apify/website-content-crawler)
     ```
   </TabItem>
 </Tabs>
-<!-- vale on -->
+
 ### Supported Actor states
 
 The badge indicates the state of the Actor in the Apify platform as the result of the [automated testing](../development/automated_tests.md).

@@ -63,9 +63,9 @@ A good middle ground is `4096MB`. If you need the results faster, increase the m
 Autoscaling only applies to solutions that run multiple tasks (URLs) for at least 30 seconds. If you need to scrape just one URL or use Actors like [Google Sheets](https://apify.com/lukaskrivka/google-sheets) that do just a single isolated job, we recommend you lower the memory.
 
 [//]: # (TODO: It's pretty outdated, we now have platform credits in pricing)
-<!-- vale off -->
+
 [//]: # (If you read that you can scrape 1000 pages of data for 1 CU and you want to scrape approximately 2 million of them monthly, that means you need 2000 CUs monthly and should [subscribe to the Business plan]&#40;https://console.apify.com/billing-new#/subscription&#41;.)
-<!-- vale on -->
+
 
 If the Actor doesn't have this information, or you want to use your own solution, just run your solution like you want to use it long term. Let's say that you want to scrape the data **every hour for the whole month**. You set up a reasonable memory allocation like `4096MB`, and the whole run takes 15 minutes. That should consume 1 CU (4 \* 0.25 = 1). Now, you just need to multiply that by the number of hours in the day and by the number of days in the month, and you get an estimated usage of 720 (1 \* 24 \* 30) CUs monthly.
 

@@ -13,7 +13,7 @@ slug: /integrations/actors
 
 :::note Integration Actors
 
-You can check out a catalogue of our Integaration Actors within [Apify Store](https://apify.com/store/categories/integrations).
+You can check out a catalogue of our Integration Actors within [Apify Store](https://apify.com/store/categories/integrations).
 
 :::
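
A rough way to sanity-check the `TokenIgnores` patterns from the `.vale.ini` hunk above is to run them against a few sample lines. The sketch below is not part of this PR. It uses Python's `re` module, which is only assumed to behave like Vale's regex engine for these particular patterns, and `ignored_spans` and `is_ignored` are hypothetical helper names introduced for illustration.

```python
import re

# Patterns copied from the TokenIgnores line in .vale.ini, one per capture group.
# Assumption: Python's `re` treats these the same way Vale's engine does.
TOKEN_IGNORES = [
    r"(<\/?[A-Z].+>)",            # HTML/XML tags starting with a capital letter
    r"([^\n]+=[^\n]*)",           # lines containing an = sign
    r"(\[[^\]]+\]\([^\)]+\))",    # Markdown links: [text](target)
    r"([^\n]+@[^\n]+\.[^\n])",    # email addresses
    r"({[^}]*})",                 # {...} blocks
    r"(`[^`]*`)",                 # inline code
    r"(`\w+`)",                   # single-word inline code
]


def ignored_spans(text):
    """Return every substring matched by at least one pattern (hypothetical helper)."""
    spans = []
    for pattern in TOKEN_IGNORES:
        spans.extend(m.group(0) for m in re.finditer(pattern, text))
    return spans


def is_ignored(text, token):
    """True if `token` falls inside any span the patterns would skip."""
    return any(token in span for span in ignored_spans(text))


if __name__ == "__main__":
    samples = [
        ('<TabItem value="html" label="HTML" default>', "TabItem"),
        ("See the [Apify CLI](/cli) docs.", "/cli"),
        ("Install `npm` first.", "npm"),
        ("Plain prose with a typo.", "typo"),
    ]
    for line, token in samples:
        print(f"{token!r} ignored: {is_ignored(line, token)}")
```

Under Python's semantics, the `=` and `@` patterns begin with `[^\n]+`, so a match starts at the beginning of the line rather than at the token itself. If Vale behaves the same way, prose that shares a line with an `=` sign or an email address would also be skipped, which is worth checking when a misspelling seems to slip through.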