diff --git a/docs/develop/docs.md b/docs/develop/docs.md
index e310500a3c..9171b75b43 100644
--- a/docs/develop/docs.md
+++ b/docs/develop/docs.md
@@ -110,20 +110,20 @@ There are a lot of different options provided by Material for MkDocs — So many
 ???+ Note
     The default call-out, used to highlight something if there isn't a more relevant one — should generally be expanded by default but can be collapsable by the user if the note is long.
 
-!!! Tip "Tip — May have a title stating the tip or best practice"
+!!! Tip "Tip: May have a title stating the tip or best practice"
     Used to highlight a point that is useful for everyone to understand about the documented subject — should be expanded and kept brief.
 
-???+ Info "Info — Must have a title describing the context under which this information is useful"
+???+ Info "Info: Must have a title describing the context under which this information is useful"
     Used to deliver context-based content such as things that are dependant on operating system or environment — should be collapsed by default.
 
-???+ Example "Example — Must have a title describing the content"
+???+ Example "Example: Must have a title describing the content"
     Used to deliver additional information about a feature that could be useful in a _specific circumstance_ or that might not otherwise be considered — should be collapsed by default.
 
-???+ Question "Question — Must have a title phrased in the form of a question"
+???+ Question "Question: Must have a title phrased in the form of a question"
     Used to answer frequently asked questions about the documented subject — should be collapsed by default.
 
-!!! Warning "Warning — Must have a title stating the warning"
+!!! Warning "Warning: Must have a title stating the warning"
     Used to deliver important information — should always be expanded.
 
-!!! Danger "Danger — Must have a title stating the warning"
+!!! Danger "Danger: Must have a title stating the warning"
     Used to deliver information about serious unrecoverable actions such as deleting large amounts of data or resetting things — should always be expanded.
diff --git a/docs/user-guide/archived-items.md b/docs/user-guide/archived-items.md
index 77dcef1aa6..fd1d1c23e8 100644
--- a/docs/user-guide/archived-items.md
+++ b/docs/user-guide/archived-items.md
@@ -1,6 +1,6 @@
 # Archived Items
 
-Archived Items consist of one or more WACZ files created by a Crawl Workflow, or uploaded to Browsertrix. They can be individually replayed, or combind with other Archived Items in a a [Collection](collections.md). The Archived Items page lists all items in the organization.
+Archived Items consist of one or more WACZ files created by a Crawl Workflow, or uploaded to Browsertrix. They can be individually replayed, or combined with other Archived Items in a [Collection](collections.md). The Archived Items page lists all items in the organization.
 
 ## Uploading Web Archives
 
@@ -24,11 +24,11 @@ For more details on navigating web archives within ReplayWeb.page, see the [Repl
 
 ### Files
 
-The Fies tab lists the individually downloadable WACZ files that make up the Archived Item as well as their file sizes.
+The Files tab lists the individually downloadable WACZ files that make up the Archived Item as well as their file sizes.
 
 ### Error Logs
 
-The Error Logs tab displays a list of errors encountered durring crawling. Clicking an errors in the list will reveal additional information.
+The Error Logs tab displays a list of errors encountered during crawling. Clicking an error in the list will reveal additional information.
 
 All log entries with that were recorded in the creation of the Archived Item can be downloaded in JSONL format by pressing the _Download Logs_ button.
 
diff --git a/docs/user-guide/browser-profiles.md b/docs/user-guide/browser-profiles.md
index c79d85a312..a2554829ed 100644
--- a/docs/user-guide/browser-profiles.md
+++ b/docs/user-guide/browser-profiles.md
@@ -1,8 +1,8 @@
 # Browser Profiles
 
-Browser profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configued, with any cookies or saved login sessions. Using a pre-configured profile also means that content that can only be viewed by logged in users can be archived, without archiving the actual login credentials.
+Browser profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configured, with any cookies or saved login sessions. Using a pre-configured profile also means that content that can only be viewed by logged in users can be archived, without archiving the actual login credentials.
 
-!!! tip "Best practice — Create and use web archiving-specific accounts for crawling with browser profiles"
+!!! tip "Best practice: Create and use web archiving-specific accounts for crawling with browser profiles"
 
     For the following reasons, we recommend creating dedicated accounts for archiving anything that is locked behind login credentials but otherwise public, especially on social media platforms.
 
diff --git a/docs/user-guide/collections.md b/docs/user-guide/collections.md
index 7430148eea..b2afa3e00f 100644
--- a/docs/user-guide/collections.md
+++ b/docs/user-guide/collections.md
@@ -2,8 +2,8 @@
 
 Collections are the primary way of organizing and combining archived items into groups for presentation.
 
-!!! tip "Tip — Combining items from multiple sources"
-    If the crawler has not captured every resource or interaction on a webpage, the [ArchiveWebpage browser extension](https://archiveweb.page/) can be used to manually capture missing content and upload it directly to your org.
+!!! tip "Tip: Combining items from multiple sources"
+    If the crawler has not captured every resource or interaction on a webpage, the [ArchiveWeb.page browser extension](https://archiveweb.page/) can be used to manually capture missing content and upload it directly to your org.
 
     After adding the crawl and the upload to a collection, the content from both will become available in the replay viewer.
 
@@ -19,4 +19,4 @@ Collections are private by default, but can be made public by marking them as sh
 
 After a collection has been made public, it can be shared with others using the public URL available in the share collection dialogue. The collection can also be embedded into other websites using the provided embed code. Unsharing the collection will break any previously shared links.
 
-For further resources on embedding archived web content into your own website, see the [ReplayWebpage docs page on embedding](https://replayweb.page/docs/embedding).
+For further resources on embedding archived web content into your own website, see the [ReplayWeb.page docs page on embedding](https://replayweb.page/docs/embedding).
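A minimal sketch of the kind of embed snippet referred to above, with placeholder URLs and an assumed CDN path for the `replaywebpage` package; the authoritative embed code is the one shown in Browsertrix's share collection dialogue and in the ReplayWeb.page embedding docs linked in the hunk above:

```html
<!-- Load the ReplayWeb.page web component (CDN path and version are assumptions; pin a current release) -->
<script src="https://cdn.jsdelivr.net/npm/replaywebpage/ui.js"></script>

<!-- Replay a shared WACZ file, starting at a chosen page; both URLs below are placeholders -->
<replay-web-page
  source="https://example.com/my-collection.wacz"
  url="https://example.org/">
</replay-web-page>

<!-- Per the ReplayWeb.page embedding docs, the embedding page also needs to serve the replay
     service worker (by default looked up at ./replay/sw.js); see the linked docs for details. -->
```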
diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md
index 73c86e76df..cf94af1705 100644
--- a/docs/user-guide/index.md
+++ b/docs/user-guide/index.md
@@ -1,17 +1,29 @@
-# Getting Started
+# Browsertrix User Guide
 
-## Signup
+Welcome to the Browsertrix User Guide. This page covers the basics of using Browsertrix, Webrecorder's high-fidelity web archiving system.
 
-### Invite Link
+## Getting Started
 
-If you have been sent an [invite](org-settings#members), enter a password and name to create a new account. Your account will be added to the organization you were invited to by an organization admin.
+To get started crawling with Browsertrix:
 
-### Open Registration
-
-If the server has enabled signups and you have been given a registration link, enter your email address, password, and name to create a new account. Your account will be added to the server's default organization.
+1. Create an account and join an Organization [as described here](signup).
+2. After being redirected to the organization's [Overview page](overview), click the _Create New_ button in the top right and select _[Crawl Workflow](crawl-workflows)_ to begin configuring your first crawl!
+3. For a simple crawl, choose the _Seeded Crawl_ option, and enter a page URL in the _Crawl Start URL_ field. By default, the crawler will archive all pages under the starting path.
+4. Next, click _Review & Save_, and ensure the _Run on Save_ option is selected. Then click _Save Workflow_.
+5. Wait a moment for the crawler to start and watch as it archives the website!
 
 ---
 
-## Start Crawling!
+After running your first crawl, check out the following to learn more about Browsertrix's features:
+
+- A detailed list of [crawl workflow setup](workflow-setup) options.
+- Adding [exclusions](workflow-setup/#exclusions) to limit your crawl's scope and evading crawler traps by [editing exclusion rules while crawling](crawl-workflows/#live-exclusion-editing).
+- Best practices for crawling with [browser profiles](browser-profiles) to capture content only available when logged in to a website.
+- Managing archived items, including [uploading previously archived content](archived-items/#uploading-web-archives).
+- Organizing and combining archived items with [collections](collections) for sharing and export.
+- If you're an admin: [Inviting collaborators to your org](org-settings/#members).
+
+
+### Have more questions?
 
-A [Crawl Workflow](crawl-workflows) must be created in order to crawl websites automatically. A detailed list of all available workflow configuration options can be found on the [Crawl Workflow Setup](workflow-setup) page.
+While our aim is to create intuitive interfaces, sometimes the complexities of web archiving require a little more explanation. If there's something that you found especially confusing or frustrating, [please get in touch](mailto:docs-feedback@webrecorder.net)!
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 4b517be493..895e0834d3 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -1,4 +1,4 @@
-# Overview
+# Org Overview
 
 The overview page delivers key statistics about the organization's resource usage. It also lets users create crawl workflows, uploaded archived items, collections, and browser profiles through the _Create New ..._ button.
 
@@ -12,7 +12,9 @@ For all organizations the storage panel displays the total number of archived it
 
 ## Crawling
 
-The crawling panel lists the amount of currently running and waiting crawls as well as the number of total pages captured.
+For organizations with a set execution minute limit, the crawling panel displays a graph of how much execution time has been used and how much is currently remaining. Monthly execution time limits reset on the first of each month at 12:00 AM GMT.
+
+The crawling panel also lists the number of currently running and waiting crawls, as well as the total number of pages captured.
 
 ## Collections
 
diff --git a/docs/user-guide/signup.md b/docs/user-guide/signup.md
new file mode 100644
index 0000000000..be4dc28636
--- /dev/null
+++ b/docs/user-guide/signup.md
@@ -0,0 +1,9 @@
+# Signup
+
+## Invite Link
+
+If you have been sent an [invite](../org-settings/#members), enter a name and password to create a new account. Your account will be added to the organization you were invited to by an organization admin.
+
+## Open Registration
+
+If the server has enabled signups and you have been given a registration link, enter your email address, name, and password to create a new account. Your account will be added to the server's default organization.
diff --git a/docs/user-guide/workflow-setup.md b/docs/user-guide/workflow-setup.md
index ad1b3f7bb9..4fabc3a399 100644
--- a/docs/user-guide/workflow-setup.md
+++ b/docs/user-guide/workflow-setup.md
@@ -27,7 +27,7 @@ It is also available under the _Additional URLs_ section for Seeded Crawls where
 When enabled, the crawler will visit all the links it finds within each page defined in the _List of URLs_ field.
 
 ??? example "Crawling tags & search queries with URL List crawls"
-    This setting can be useful for crawling the content of specific tags or searh queries. Specify the tag or search query URL(s) in the _List of URLs_ field, e.g: `https://example.com/search?q=tag`, and enable _Include Any Linked Page_ to crawl all the content present on that search query page.
+    This setting can be useful for crawling the content of specific tags or search queries. Specify the tag or search query URL(s) in the _List of URLs_ field, e.g.: `https://example.com/search?q=tag`, and enable _Include Any Linked Page_ to crawl all the content present on that search query page.
 
 ### Fail Crawl on Failed URL
 
@@ -205,7 +205,7 @@ Leave optional notes about the workflow's configuration.
 
 ### Tags
 
-Apply tags to the workflow. Tags applied to the workflow will propigate to every crawl created with it at the time of crawl creation.
+Apply tags to the workflow. Tags applied to the workflow will propagate to every crawl created with it at the time of crawl creation.
 
 ### Collection Auto-Add
 
diff --git a/mkdocs.yml b/mkdocs.yml
index e2c23322af..a28d737ae8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -57,12 +57,13 @@ nav:
     - develop/frontend-dev.md
     - develop/docs.md
   - User Guide:
-    - user-guide/overview.md
     - user-guide/index.md
+    - user-guide/signup.md
     - Crawling:
       - user-guide/crawl-workflows.md
       - user-guide/workflow-setup.md
       - user-guide/browser-profiles.md
+    - user-guide/overview.md
     - user-guide/archived-items.md
     - user-guide/collections.md
     - user-guide/org-settings.md