docs: add spell check capabilities to Vale linter #1345

Merged
merged 13 commits into from
Jan 13, 2025
85 changes: 82 additions & 3 deletions .github/styles/config/vocabularies/Docs/accept.txt
Original file line number Diff line number Diff line change
@@ -1,9 +1,88 @@
apify(?=-\w+)
Apify(?=-\w+)
@apify\.com
\bApify\b
Actor(s)?
SDK(s)
[Ss]torages
Crawlee
[Aa]utoscaling
CU

booleans
Docusaurus
env
npm
serverless
[Bb]oolean
node_modules
[Rr]egex
[Mm]onorepo
[Gg]ist
SDK
Dockerfile
Docker's

Docusaurus
navbar
nginx
npm

:::caution
:::note
:::info
:::tip
:::warning

maxWidth
startUrls

PDFs
dataset's
gif
Gzip

API's
APIs
webhook's
idempotency
backoff

Authy
reCaptcha
OAuth
untrusted
unencrypted
proxied

LLM
embedder
chatbot
[Ll]angchain

[Kk]eboola
[Aa]irbyte
[Qq]drant
[Pp]inecone
[Mm]ilvus
[Zz]illiz
llama_index
[Ff]lowise

exploitability
[Ww]hitepaper
[Cc]ron
scalably
metamorph
hostname
IPs
unscoped
multistep
[Aa]utogenerated
preconfigured
[Dd]atacenter

[Ww]ikipedia
[Zz]apier
[Tt]rello
[Pp]refill


[Mm]ultiselect
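
The entries in `accept.txt` are regular expressions, so a few of them behave differently from plain words. The sketch below shows what four of the patterns above match. It assumes JavaScript's `RegExp` engine as a stand-in; Vale's own engine may treat edge cases differently, and the variable names are only illustrative.

```js
// Patterns copied from accept.txt. Evaluating them with JavaScript's RegExp
// is an assumption made for illustration, not how Vale itself runs them.
const lowercasePrefix = /apify(?=-\w+)/; // "apify" only when a hyphenated suffix follows
const standaloneName = /\bApify\b/;      // "Apify" as a whole word
const sdkPlural = /SDK(s)/;              // the group is not optional, so the "s" is required
const actor = /Actor(s)?/;               // the group is optional: "Actor" or "Actors"

console.log(lowercasePrefix.test('apify-client')); // true
console.log(lowercasePrefix.test('apify'));        // false
console.log(sdkPlural.test('SDK'));                // false
console.log(actor.test('Actor'));                  // true
```

Because `SDK(s)` only matches the plural, the separate `SDK` entry further down is what covers the singular form.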
18 changes: 0 additions & 18 deletions .github/workflows/typos-check.yaml

This file was deleted.

1 change: 1 addition & 0 deletions .github/workflows/vale.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,4 @@ jobs:
fail_on_error: true
vale_flags: '--minAlertLevel=error'
reporter: github-pr-annotations

8 changes: 6 additions & 2 deletions vale.ini → .vale.ini
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,19 @@ StylesPath = .github/styles
MinAlertLevel = warning
IgnoredScopes = code, tt, table, tr, td

vocabularies = Docs
Vocab = Docs

Packages = write-good, Microsoft

[formats]
mdx = md

[*.md]
BasedOnStyles = Apify, write-good, Microsoft
BasedOnStyles = Vale, Apify, write-good, Microsoft
# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ) & email addresses
TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), (`[^`]*`), (`\w+`)
Vale.Spelling = YES


# Disabling rules (NO)
Microsoft.Contractions = NO
Expand Down
12 changes: 0 additions & 12 deletions _typos.toml

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -86,9 +86,9 @@ navigator.permissions.query('some_permission');
```

### With canvases {#with-canvases}
<!-- vale off -->

This technique is based on rendering [WebGL](https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API) scenes to a canvas element and observing the pixels rendered. WebGL rendering is tightly connected with the hardware, and therefore provides high entropy. Here's a quick breakdown of how it works:
<!-- vale on -->

1. A JavaScript script creates a [`<canvas>` element](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API) and renders some font or a custom shape.
2. The script then gets the pixel-map from the `<canvas>` element.
3. The collected pixel-map is stored in a cryptographic hash specific to the device's hardware.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,10 @@ Unfortunately, most APIs will require a valid cookie to be included in the `cook
Luckily, there are ways to retrieve and set cookies for requests prior to sending them, which will be covered more in-depth within future Scraping Academy modules. The most important things to know at the moment are:

## Cookies {#cookies}
<!-- vale off -->

1. For sites that heavily rely on cookies for user-verification and request authorization, certain generic requests (such as to the website's main page, or to the target page) will return back a (or multiple) `set-cookie` header(s).
2. The `set-cookie` response header(s) can be parsed and used as the `cookie` header in the headers of a request. A great package for parsing these values from a response's headers is [`set-cookie-parser`](https://www.npmjs.com/package/set-cookie-parser). With this package, cookies can be parsed from headers like so:
<!-- vale on -->

```js
import axios from 'axios';

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ slug: /actors/development/actor-definition/actor-json
sidebar_position: 1
---

**Learn how to write the main Actor config in the `.actor/actor.json` file.**
**Learn how to write the main Actor configuration in the `.actor/actor.json` file.**

---

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ The Actor input schema serves three main purposes:
- It simplifies invoking your Actors from external systems by generating calling code and connectors for integrations.

To define an input schema for an Actor, set `input` field in the `.actor/actor.json` file to an input schema object (described below), or path to a JSON file containing the input schema object.
For backwards compatibility, if the `input` field is omitted, the system looks for an `INPUT_SCHEMA.json` file either in the `.actor` directory or the Actor's top-level directory—but note that this functionality is deprececated and might be removed in the future. The maximum allowed size for the input schema file is 500 kB.
For backwards compatibility, if the `input` field is omitted, the system looks for an `INPUT_SCHEMA.json` file either in the `.actor` directory or the Actor's top-level directory—but note that this functionality is deprecated and might be removed in the future. The maximum allowed size for the input schema file is 500 kB.

When you provide an input schema, the system will validate the input data passed to the Actor on start (via the API or Apify Console) against the specified schema to ensure compliance before starting the Actor.
If the input object doesn't conform to the schema, the caller receives an error and the Actor is not started.
Expand Down Expand Up @@ -343,7 +343,7 @@ The object where the proxy configuration is stored has the following structure:
}
```

Example of a blackbox object:
Example of a black box object:

```json
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ slug: /actors/development/builds-and-runs/builds

## Understand Actor builds

Before an Actor can be run, it needs to be built. The build process creates a snapshot of a specific version of the Actor's settings, including its [source code](../actor_definition/source_code.md) and [environment variables](../programming_interface/environment_variables.md). This snapshot is then used to create a Docker image containing everything the Actor needs for its run, such as NPM packages, web browsers, etc.
Before an Actor can be run, it needs to be built. The build process creates a snapshot of a specific version of the Actor's settings, including its [source code](../actor_definition/source_code.md) and [environment variables](../programming_interface/environment_variables.md). This snapshot is then used to create a Docker image containing everything the Actor needs for its run, such as `npm` packages, web browsers, etc.

### Build numbers

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Long-running [Actor](../../index.mdx) jobs may need to migrate between servers.
To prevent data loss, long-running Actors should:

- Periodically save (persist) their state.
- Listem for [migration events](/sdk/js/api/apify/class/PlatformEventManager)
- Listen for [migration events](/sdk/js/api/apify/class/PlatformEventManager)
- Check for persisted state when starting, allowing them to resume from where they left off.

For short-running Actors, the risk of restarts and the cost of repeated runs are low, so you can typically ignore state persistence.
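
The three recommendations above can be sketched roughly as follows. The `platform` object and its `getValue`, `setValue`, and `on` members are hypothetical placeholders for whatever key-value store and event API your Actor uses, and the `STATE` key and the save interval of 10 items are arbitrary choices for illustration.

```js
// Hypothetical `platform` shape (an assumption for illustration only):
//   { getValue(key), setValue(key, value), on(eventName, handler) }
async function processWithPersistence(platform, items, handleItem) {
    const STATE_KEY = 'STATE';

    // Check for persisted state when starting, and resume from it if present.
    const state = (await platform.getValue(STATE_KEY)) ?? { nextIndex: 0 };

    // Listen for the migration event and save the state right away.
    platform.on('migrating', () => platform.setValue(STATE_KEY, state));

    for (let i = state.nextIndex; i < items.length; i++) {
        await handleItem(items[i]);
        state.nextIndex = i + 1;

        // Periodically persist the state (here: after every 10 items).
        if (state.nextIndex % 10 === 0) {
            await platform.setValue(STATE_KEY, state);
        }
    }

    // Persist the final state so that a restart does not repeat finished work.
    await platform.setValue(STATE_KEY, state);
    return state;
}
```

Storing only the index of the next item keeps the saved state small, at the cost of assuming the item list is identical after a restart.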
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ To set up automated builds and tests for your Actors you need to:
![Apify token in app](./images/ci-token.png)

1. Add your Apify token to GitHub secrets
1. Go to your repo > Settings > Secrets > New repository secret
1. Go to your repository > Settings > Secrets > New repository secret
1. Name the secret & paste in your token
1. Add the Builds Actor API endpoint URL to GitHub secrets
1. Use this format:
Expand All @@ -43,7 +43,7 @@ To set up automated builds and tests for your Actors you need to:

1. Name the secret
1. Create GitHub Actions workflow files:
1. In your repo, create the `.github/workflows` directory
1. In your repository, create the `.github/workflows` directory
2. Add `latest.yml` and `beta.yml` files with the following content

<Tabs groupId="main">
Expand Down
4 changes: 2 additions & 2 deletions sources/platform/actors/development/deployment/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Deploying an Actor involves uploading your [source code](/platform/actors/develo

## Deploy using Apify CLI

The fastest way to deploy and build your Actor is by uising the [Apify CLI](/cli). If you've completed one of the tutorials from the [academy](/academy), you should have already have it installed. If not, follow the [Apify CLI installation instructions](/cli/docs/installation).
The fastest way to deploy and build your Actor is by using the [Apify CLI](/cli). If you've completed one of the tutorials from the [academy](/academy), you should have already have it installed. If not, follow the [Apify CLI installation instructions](/cli/docs/installation).

To deploy your Actor using Apify CLI:

Expand Down Expand Up @@ -49,7 +49,7 @@ You can also pull an existing Actor from the Apify platform to your local machin
apify pull [ACTORID]
```

This command fetches the Actor's files to your current directory. If the Actor is defined as a Git repository, it will be cloned, for Actors defined in the Web IDE, the command will fetch the files diresctly.
This command fetches the Actor's files to your current directory. If the Actor is defined as a Git repository, it will be cloned, for Actors defined in the Web IDE, the command will fetch the files directly.

You can specify a particular version of the Actor to pull by using the `--version` flag:

Expand Down
15 changes: 9 additions & 6 deletions sources/platform/actors/development/deployment/source_types.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,13 @@ sidebar_position: 1

---

This section explains the various sources types available for Apify Actors and how to deploy an Actor from Github using CLI or Gist. Apify Actors supporst four source types:
This section explains the various sources types available for Apify Actors and how to deploy an Actor from GitHub using CLI or Gist. Apify Actors supports four source types:

- [Web IDE](#web-ide)
- [Git repository](#git-repository)
- [Private repositories](#private-repositories)
- [How to configure deployment keys](#how-to-configure-deployment-keys)
- [Actor monorepos](#actor-monorepos)
- [Zip file](#zip-file)
- [GitHub Gist](#github-gist)

Expand All @@ -22,15 +25,15 @@ This is the default option when your Actor's source code is hosted on the Apify

A `Dockerfile` is mandatory for all Actors. When using the default NodeJS Dockerfile, you'll typically need `main.js` for your source code and `package.json` for [NPM](https://www.npmjs.com/) package configurations.

For more information on creating custom Dockersfiles or using Apify's base images, refer to the [Dockerfile](/platform/actors/development/actor-definition/dockerfile#custom-dockerfile) and [base Docker images](/platform/actors/development/actor-definition/dockerfile#base-docker-images) documentation.
For more information on creating custom Dockerfiles or using Apify's base images, refer to the [Dockerfile](/platform/actors/development/actor-definition/dockerfile#custom-dockerfile) and [base Docker images](/platform/actors/development/actor-definition/dockerfile#base-docker-images) documentation.

## Git repository

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/NEzT_p_RE1Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Hosting your Actor's source code in a Git repository allows for multiple files and directories, a custom `Dockerfile` for build process control, and a user description fetched from `README.md`. Specify the repository location using the **Git URL** setting with `https`, `git`, or `ssh` protocols.

To deploy an Actor from GitHub, set the **Source Type** to **Git repository** and enter the GitHub repository URL in the **Git URL** field. You can optionally specify a branch or tag by adding a URL fragmend (e.g., `#develop`).
To deploy an Actor from GitHub, set the **Source Type** to **Git repository** and enter the GitHub repository URL in the **Git URL** field. You can optionally specify a branch or tag by adding a URL fragment (e.g., `#develop`).

To use a specific directory, add it after the branch/tag, separated by a colon (e.g., `#develop:some/dir`)
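
To make the fragment format concrete, here is a minimal sketch that splits a Git URL into its repository, branch or tag, and directory parts. The `parseGitSource` helper is hypothetical and only illustrates the `#branch:directory` convention described above; it is not the platform's actual parser.

```js
// Hypothetical helper: splits "<repo>#<branch-or-tag>:<directory>" into parts.
function parseGitSource(gitUrl) {
    const hashIndex = gitUrl.indexOf('#');
    if (hashIndex === -1) {
        // No fragment: neither a branch/tag nor a directory was specified.
        return { repoUrl: gitUrl, ref: null, directory: null };
    }

    const repoUrl = gitUrl.slice(0, hashIndex);
    const fragment = gitUrl.slice(hashIndex + 1);
    const colonIndex = fragment.indexOf(':');
    if (colonIndex === -1) {
        // Fragment without a colon: branch or tag only.
        return { repoUrl, ref: fragment || null, directory: null };
    }

    return {
        repoUrl,
        ref: fragment.slice(0, colonIndex) || null,
        directory: fragment.slice(colonIndex + 1) || null,
    };
}

console.log(parseGitSource('https://github.com/apify/actor-monorepo-example#main:actors/javascript-actor'));
```

For the URL in the usage line, the sketch yields `main` as the branch and `actors/javascript-actor` as the directory.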

Expand Down Expand Up @@ -72,14 +75,14 @@ Remember that each key can only be used once per Git hosting service (GitHub, Bi

To manage multiple Actors in a single repository, use the `dockerContextDir` property in the [Actor definition](/platform/actors/development/actor-definition/actor-json) to set the Docker context directory (if not provided then the repository root is used). In the Dockerfile, copy both the Actor's source and any shared code into the Docker image.

To enable sharing Dockerfiles between multiple Actors, the Actor build process passes the `ACTOR_PATH_IN_DOCKER_CONTEXT` build arg to the Docker build.
To enable sharing Dockerfiles between multiple Actors, the Actor build process passes the `ACTOR_PATH_IN_DOCKER_CONTEXT` build argument to the Docker build.
It contains the relative path from `dockerContextDir` to the directory selected as the root of the Actor in the Apify Console (the "directory" part of the Actor's git URL).

For an example, see the [`apify/actor-monorepo-example`](https://github.com/apify/actor-monorepo-example) repository. To build Actors from this monorepo, you would set the source URL (including branch name and folder) as `https://github.com/apify/actor-monorepo-example#main:actors/javascript-actor` and `https://github.com/apify/actor-monorepo-example#main:actors/typescript-actor` respectively.

## Zip file

Actors can also use source code from a Zip archive hosted on an external URL. This option supports multiple files and directories, allows for custom `Dockerfile`, and uses `README.md` for the Actor description. If not using a [custom Dockerfile](../actor_definition/docker.md#custom-dockerfile), ensure your main applicat file is named `main.js`.
Actors can also use source code from a Zip archive hosted on an external URL. This option supports multiple files and directories, allows for custom `Dockerfile`, and uses `README.md` for the Actor description. If not using a [custom Dockerfile](../actor_definition/docker.md#custom-dockerfile), ensure your main file is named `main.js`.

:::note Automatic use of ZIP file

Expand All @@ -91,6 +94,6 @@ This source type is used automatically when you are using Apify-CLI and the sour

For smaller projects, GitHub Gist offers a simpler alternative to full Git repositories or hosted Zip files. To use a GitHub Gist, create your Gist at [https://gist.github.com/](https://gist.github.com/), set the **Source type** to **GitHub Gist**, and paste the Gist URL in the provided field.

Like other source types, Gists can include multiple files, directories, and a custom Dockersfile. The Actor description is taken from `README.md`.
Like other source types, Gists can include multiple files, directories, and a custom Dockerfile. The Actor description is taken from `README.md`.

By understanding these source types, you can choose the most appropriate option for hosting and deploying your Apify Actors. Each type offers unique advantages, allowing you to select the best fit for your project's size, complexity, and collaboration needs.
2 changes: 1 addition & 1 deletion sources/platform/actors/development/performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ slug: /actors/development/performance

## Optimization Tips

This guide provides tips to help you maximize the poerformance of your Actors, minimize costs, and achieve optimal results.
This guide provides tips to help you maximize the performance of your Actors, minimize costs, and achieve optimal results.

### Run batch jobs instead of single jobs

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ Here's a table of key system environment variables:
| `APIFY_DISABLE_OUTDATED_WARNING` | Controls the display of outdated version warnings. Set to `1` to suppress notifications about updates. |
| `APIFY_WORKFLOW_KEY` | Identifier used for grouping related runs and API calls together. |
| `APIFY_META_ORIGIN` | Specifies how an Actor run was started. Possible values are [here](/platform/actors/running/runs-and-builds#origin) |
| `APIFY_SDK_LATEST_VERSION` | Specifies the most recent release version of the Apify SDK for Javascript. Used for checking for updates. |
| `APIFY_SDK_LATEST_VERSION` | Specifies the most recent release version of the Apify SDK for JavaScript. Used for checking for updates. |
| `APIFY_INPUT_SECRETS_KEY_FILE` | Path to the secret key used to decrypt [Secret inputs](/platform/actors/development/actor-definition/input-schema/secret-input). |
| `APIFY_INPUT_SECRETS_KEY_PASSPHRASE` | Passphrase for the input secret key specified in `APIFY_INPUT_SECRETS_KEY_FILE`. |

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ Apify's system notifies Actors about various events, such as:
- Abort operations triggered by another Actor
- CPU overload

These events help you manage your Actor's behavior and resources effecetively.
These events help you manage your Actor's behavior and resources effectively.

## System events

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ slug: /actors/development/quick-start/locally

:::info Prerequisites

You need to have [Node.js](https://nodejs.org/en/) version 16 or higher with NPM installed on your computer.
You need to have [Node.js](https://nodejs.org/en/) version 16 or higher with `npm` installed on your computer.

:::

Expand Down
6 changes: 3 additions & 3 deletions sources/platform/actors/publishing/badge.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ https://apify.com/actor-badge?actor=<USERNAME>/<ACTOR>
In order to embed the badge in the HTML documentation, just use it as an image wrapped in a link as shown in the example below. Don't forget to use the `username` and `actor-name` of your Actor.

#### Example
<!-- vale off -->

<Tabs>
<TabItem value="html" label="HTML" default>
```html
Expand All @@ -40,13 +40,13 @@ In order to embed the badge in the HTML documentation, just use it as an image w
</a>
```
</TabItem>
<TabItem value="markdown" label="Markdown">
<TabItem value="markdown" label="Markdown">
```markdown
[![Website Content Crawler Actor](https://apify.com/actor-badge?actor=apify/website-content-crawler)](https://apify.com/apify/website-content-crawler)
```
</TabItem>
</Tabs>
<!-- vale on -->

### Supported Actor states

The badge indicates the state of the Actor in the Apify platform as the result of the [automated testing](../development/automated_tests.md).
Expand Down
4 changes: 2 additions & 2 deletions sources/platform/actors/running/usage_and_resources.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,9 @@ A good middle ground is `4096MB`. If you need the results faster, increase the m
Autoscaling only applies to solutions that run multiple tasks (URLs) for at least 30 seconds. If you need to scrape just one URL or use Actors like [Google Sheets](https://apify.com/lukaskrivka/google-sheets) that do just a single isolated job, we recommend you lower the memory.

[//]: # (TODO: It's pretty outdated, we now have platform credits in pricing)
<!-- vale off -->

[//]: # (If you read that you can scrape 1000 pages of data for 1 CU and you want to scrape approximately 2 million of them monthly, that means you need 2000 CUs monthly and should [subscribe to the Business plan]&#40;https://console.apify.com/billing-new#/subscription&#41;.)
<!-- vale on -->


If the Actor doesn't have this information, or you want to use your own solution, just run your solution like you want to use it long term. Let's say that you want to scrape the data **every hour for the whole month**. You set up a reasonable memory allocation like `4096MB`, and the whole run takes 15 minutes. That should consume 1 CU (4 \* 0.25 = 1). Now, you just need to multiply that by the number of hours in the day and by the number of days in the month, and you get an estimated usage of 720 (1 \* 24 \* 30) CUs monthly.
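
The estimate above can be reproduced with a short calculation. This sketch assumes, as the worked example does, that one CU corresponds to 1 GB of memory used for one hour; the function and variable names are illustrative.

```js
// Compute units for one run, assuming 1 CU = 1 GB of memory for 1 hour
// (the relationship implied by the worked example: 4 * 0.25 = 1).
function computeUnits(memoryMb, runtimeHours) {
    return (memoryMb / 1024) * runtimeHours;
}

const perRun = computeUnits(4096, 0.25); // 4 GB for 15 minutes
const monthly = perRun * 24 * 30;        // one run every hour for 30 days

console.log(perRun);  // 1
console.log(monthly); // 720
```

Changing the memory allocation or the run duration in the first call shows directly how the monthly estimate scales.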

Expand Down
2 changes: 1 addition & 1 deletion sources/platform/integrations/actors/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ slug: /integrations/actors

:::note Integration Actors

You can check out a catalogue of our Integaration Actors within [Apify Store](https://apify.com/store/categories/integrations).
You can check out a catalogue of our Integration Actors within [Apify Store](https://apify.com/store/categories/integrations).

:::

Expand Down