Karate UI

UI Test Automation Made Simple.

Hello World

Index

Start: ZIP Release | Java | Maven Quickstart | Karate - Main Index
Config: driver | configure driver | configure driverTarget | Docker / karate-chrome | Driver Types
Concepts: Syntax | Special Keys | Short Cuts | Chaining | Function Composition | Browser JavaScript | Debugging | Retries | Waits | Distributed Testing | Proxy
Locators: Locator Types | Wildcards | Friendly Locators | rightOf() | leftOf() | above() | below() | near() | Locator Lookup
Browser: driver.url | driver.dimensions | refresh() | reload() | back() | forward() | maximize() | minimize() | fullscreen() | quit()
Page: dialog() | switchPage() | switchFrame() | close() | driver.title | screenshot()
Actions: click() | input() | submit() | focus() | clear() | value(set) | select() | scroll() | mouse() | highlight() | highlightAll()
State: html() | text() | value() | attribute() | enabled() | exists() | position() | locate() | locateAll()
Wait / JS: retry() | waitFor() | waitForAny() | waitForUrl() | waitForText() | waitForEnabled() | waitForResultCount() | waitUntil() | delay() | script() | scriptAll() | Karate vs the Browser
Cookies: cookie() | cookie(set) | driver.cookies | deleteCookie() | clearCookies()
Chrome: Java API | pdf() | screenshotFull()
Appium: Screen Recording | hideKeyboard()

Capabilities

Comparison

To understand how Karate compares to other UI automation frameworks, this article can be a good starting point: The world needs an alternative to Selenium - so we built one.

Examples

Web Browser

Windows

Driver Configuration

configure driver

The following declares that the native (direct) Chrome integration should be used, on both Mac OS and Windows, from the default installed location.

* configure driver = { type: 'chrome' }

If you want to customize the start-up, you can use a batch-file:

* configure driver = { type: 'chrome', executable: 'chrome' }

Here a script called chrome can be placed in the system PATH (and made executable) with the following contents:

#!/bin/sh
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" "$@"

For Windows it would be chrome.bat in the system PATH as follows:

"C:\Program Files (x86)\Google\Chrome\Application\chrome" %*

Another example for WebDriver, again assuming that chromedriver is in the PATH:

* configure driver = { type: 'chromedriver', port: 9515, executable: 'chromedriver' }
key | description
type | see Driver Types
executable | if present, Karate will attempt to invoke this; if it is not in the system PATH, you can use a full path instead of just the name of the executable. Batch files should also work
start | default true, Karate will attempt to start the executable, and if the executable is not defined, Karate will even try to assume the default for the OS in use
port | optional, Karate will choose the "traditional" port for the given type
host | optional, defaults to localhost and you normally never need to change this
pollAttempts | optional, defaults to 20. This is the number of attempts Karate will make to wait for the port to be ready and accepting connections before proceeding. You normally never need to change this (changing pollInterval is preferred)
pollInterval | optional, defaults to 250 (milliseconds). You normally never need to change this (see pollAttempts) unless the driver executable takes a very long time to start
headless | headless mode only applies to { type: 'chrome' } for now; also see DockerTarget and the webDriverSession key
showDriverLog | default false, will include WebDriver HTTP traffic in the Karate report, useful for troubleshooting or bug reports
showProcessLog | default false, will include even executable (WebDriver or browser) logs in the Karate report
addOptions | default null, has to be a list / JSON array that will be appended as additional CLI arguments to the executable, e.g. ['--no-sandbox', '--window-size=1920,1080']
beforeStart | default null, an OS command that will be executed before commencing a Scenario (and before the executable is invoked if applicable), typically used to start video-recording
afterStop | default null, an OS command that will be executed after a Scenario completes, typically used to stop video-recording and save the video file to an output folder
videoFile | default null, the path to the video file that will be added to the end of the test report; if it does not exist, it will be ignored
httpConfig | optional, and typically only used for remote WebDriver usage where the HTTP client configuration needs to be tweaked, e.g. { readTimeout: 120000 }
webDriverUrl | see webDriverUrl
webDriverSession | see webDriverSession
webDriverPath | optional and rarely used: only when you need to append a path such as /wd/hub, typically for Appium (or a Selenium Grid) on localhost, where host, port / executable etc. are involved
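To make the defaults in this table concrete, here is a minimal JavaScript sketch of how user-supplied keys could be merged over them. It is an illustration, not Karate's implementation: resolveDriverConfig, DEFAULT_PORTS and maxStartupWaitMs are hypothetical names, and the port numbers are taken from the Driver Types table. It also shows why the defaults allow roughly 20 x 250 ms = 5 seconds for the port to become ready.

```javascript
// Hypothetical sketch (not Karate source): merging user keys over the
// documented driver defaults. Ports come from the Driver Types table.
const DEFAULT_PORTS = {
  chrome: 9222,
  chromedriver: 9515,
  geckodriver: 4444,
  safaridriver: 5555,
  mswebdriver: 17556,
  iedriver: 5555,
  msedge: 9222,
  winappdriver: 4727,
  android: 4723,
  ios: 4723,
};

function resolveDriverConfig(userConfig) {
  const defaults = {
    start: true,
    host: 'localhost',
    port: DEFAULT_PORTS[userConfig.type],
    pollAttempts: 20,
    pollInterval: 250, // milliseconds
    showDriverLog: false,
    showProcessLog: false,
    addOptions: null,
  };
  // keys supplied by the user win over the defaults
  const config = { ...defaults, ...userConfig };
  // rough upper bound on the wait for the port (ignores connection time)
  config.maxStartupWaitMs = config.pollAttempts * config.pollInterval;
  return config;
}

const cfg = resolveDriverConfig({ type: 'chromedriver' });
console.log(cfg.port, cfg.maxStartupWaitMs); // 9515 5000
```

This is also why the table suggests tuning pollInterval rather than pollAttempts: the total budget is the product of the two.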

For more advanced options such as for Docker, CI, headless, cloud-environments or custom needs, see configure driverTarget.

webDriverUrl

Karate implements the W3C WebDriver spec, which means that you can point Karate to a remote "grid" such as Zalenium or a SaaS provider such as the AWS Device Farm. The webDriverUrl driver configuration key is optional, but if specified, will be used as the W3C WebDriver remote server. Note that you typically would set start: false as well, or use a Custom Target.

For example, once you run the couple of Docker commands to get Zalenium running, you can do this:

* configure driver = { type: 'chromedriver', start: false, webDriverUrl: 'http://localhost:4444/wd/hub' }

Note that you can add showDriverLog: true to the above for troubleshooting if needed. You should be able to run tests in parallel with ease!

webDriverSession

When targeting a W3C WebDriver implementation, either as a local executable or Remote WebDriver, you can specify the JSON that will be passed as the payload to the Create Session API. The most important part of this payload is the capabilities. It will default to { browserName: '<name>' } for convenience where <name> will be chrome, firefox etc.

So most of the time this would be sufficient:

* configure driver = { type: 'chromedriver' }

This results in the following request to the WebDriver /session endpoint:

{"capabilities":{"alwaysMatch":{"browserName":"chrome"}}}

But in some cases, especially when you need to talk to remote driver instances, you need to pass specific "shapes" of JSON expected by the particular implementation - or you may need to pass custom data or "extension" properties. Use the webDriverSession property in those cases. For example:

* def session = { capabilities: { browserName: 'chrome' }, desiredCapabilities: { browserName: 'chrome' } }
* configure driver = { type: 'chromedriver', webDriverSession: '#(session)', start: false, webDriverUrl: 'http://localhost:9515/wd/hub' }
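The choice between the default payload and an explicit webDriverSession can be sketched in a few lines of JavaScript. This is a hypothetical illustration (sessionPayload is not a Karate function), and it assumes that a supplied webDriverSession is sent unchanged as the payload, as the paragraph above describes.

```javascript
// Hypothetical sketch (not Karate source): the Create Session payload.
function sessionPayload(browserName, webDriverSession) {
  // an explicit webDriverSession is used as the payload unchanged
  if (webDriverSession) {
    return webDriverSession;
  }
  // otherwise only the browser name is requested, under "alwaysMatch"
  return { capabilities: { alwaysMatch: { browserName: browserName } } };
}

console.log(JSON.stringify(sessionPayload('chrome')));
// {"capabilities":{"alwaysMatch":{"browserName":"chrome"}}}

const custom = {
  capabilities: { browserName: 'chrome' },
  desiredCapabilities: { browserName: 'chrome' },
};
console.log(sessionPayload('chrome', custom) === custom); // true
```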

Note that what you can customize here depends on the driver implementation.

Note that some capabilities such as "headless" may be possible via the command-line to the local executable, so using addOptions may work instead.

configure driverTarget

The configure driver options are fine for testing on "localhost" and when not in headless mode. But when the time comes for running your web-UI automation tests on a continuous integration server, things get interesting. To support all the various options such as Docker, headless Chrome, cloud-providers etc., Karate introduces the concept of a pluggable Target where you just have to implement two methods:

import java.util.Map;

public interface Target {

    Map<String, Object> start(com.intuit.karate.Logger logger);

    Map<String, Object> stop(com.intuit.karate.Logger logger);

}
  • start(): The Map returned will be used as the generated driver configuration. The start() method will be invoked as soon as any Scenario requests a web-browser instance (for the first time) via the driver keyword.

  • stop(): Karate will call this method at the end of every top-level Scenario (that has not been call-ed by another Scenario).
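A rough JavaScript sketch of this lifecycle follows. It assumes start() is called lazily on the first driver request and stop() only for a top-level Scenario that actually started a target; createScenarioContext and isCalled are hypothetical names, and console stands in for the com.intuit.karate.Logger that the real interface receives.

```javascript
// Hypothetical sketch (not Karate source) of the Target lifecycle:
// start() on the first driver request, stop() at the end of a top-level
// Scenario. console stands in for the Karate Logger.
function createScenarioContext(target, isCalled) {
  let driverConfig = null;
  return {
    // invoked whenever a step uses the "driver" keyword
    driver() {
      if (driverConfig === null) {
        driverConfig = target.start(console); // only the first time
      }
      return driverConfig;
    },
    // invoked when the Scenario finishes
    end() {
      // assumption: Scenarios invoked via "call" never trigger stop()
      if (driverConfig !== null && !isCalled) {
        target.stop(console);
      }
    },
  };
}

const calls = [];
const target = {
  start(logger) { calls.push('start'); return { type: 'chrome', start: false }; },
  stop(logger) { calls.push('stop'); return {}; },
};
const ctx = createScenarioContext(target, false);
ctx.driver();
ctx.driver(); // re-uses the configuration returned by start()
ctx.end();
console.log(calls); // [ 'start', 'stop' ]
```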

If you use the provided Logger instance in your Target code, any logging you perform will nicely appear in-line with test-steps in the HTML report, which is great for troubleshooting or debugging tests.

Combined with Docker, headless Chrome and Karate's parallel-execution capabilities - this simple start() and stop() lifecycle can effectively run web UI automation tests in parallel on a single node.

DockerTarget

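Karate has a built-in implementation for Docker (DockerTarget) that supports 2 existing Docker images out of the box:

  • justinribeiro/chrome-headless - for headless Chrome
  • ptrthomas/karate-chrome - the Karate image described in karate-chrome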

To use either of the above, you do this in a Karate test:

* configure driverTarget = { docker: 'justinribeiro/chrome-headless', showDriverLog: true }

Or for more flexibility, you could do this in karate-config.js and perform conditional logic based on karate.env. One very convenient aspect of configure driverTarget is that if in-scope, it will over-ride any configure driver directives that exist. This means that you can have the below snippet activate only for your CI build, and you can leave your feature files set to point to what you would use in "dev-local" mode.

function fn() {
    var config = {
        baseUrl: 'https://qa.mycompany.com'
    };
    if (karate.env == 'ci') {
        karate.configure('driverTarget', { docker: 'ptrthomas/karate-chrome' });
    }
    return config;
}

To use the recommended --security-opt seccomp=chrome.json Docker option, add a secComp property to the driverTarget configuration. And if you need to view the container display via VNC, set the vncPort to map the port exposed by Docker.

karate.configure('driverTarget', { docker: 'ptrthomas/karate-chrome', secComp: 'src/test/java/chrome.json', vncPort: 5900 });

Custom Target

If you have a custom implementation of a Target, you can easily construct any custom Java class and pass it to configure driverTarget. Here below is the equivalent of the above, done the "hard way":

var DockerTarget = Java.type('com.intuit.karate.driver.DockerTarget');
var options = { showDriverLog: true };
var target = new DockerTarget(options);
target.command = function(port){ return 'docker run -d -p ' 
    + port + ':9222 --security-opt seccomp=./chrome.json justinribeiro/chrome-headless' };
karate.configure('driverTarget', target);

The built-in DockerTarget is a good example of how to:

  • perform any pre-test set-up actions
  • provision a free port and use it to shape the start() command dynamically
  • execute the command to start the target process
  • perform an HTTP health check to wait until we are ready to receive connections
  • and when stop() is called, indicate if a video recording is present (after retrieving it from the stopped container)
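The second step above, shaping the start() command from a provisioned port, is easiest to see concretely. The following JavaScript sketch is illustrative only: dockerRunCommand is a hypothetical helper, the --cap-add=SYS_ADMIN fallback mirrors the manual docker run command in the karate-chrome section, and free-port provisioning and the HTTP health check are left out.

```javascript
// Hypothetical sketch (not Karate source): build a "docker run" command
// for a given host port, honoring the secComp and vncPort options.
function dockerRunCommand(port, image, options = {}) {
  // Chrome DevTools inside the container listens on 9222
  const parts = ['docker run -d', `-p ${port}:9222`];
  if (options.vncPort) {
    // map the container's VNC port 5900 to the requested host port
    parts.push(`-p ${options.vncPort}:5900`);
  }
  if (options.secComp) {
    parts.push(`--security-opt seccomp=${options.secComp}`);
  } else {
    parts.push('--cap-add=SYS_ADMIN');
  }
  parts.push(image);
  return parts.join(' ');
}

console.log(dockerRunCommand(9223, 'justinribeiro/chrome-headless', { secComp: './chrome.json' }));
// docker run -d -p 9223:9222 --security-opt seccomp=./chrome.json justinribeiro/chrome-headless
```

Passing the port in, rather than hard-coding 9222 on the host side, is what lets several containers run side by side for parallel execution.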

Controlling this flow from Java can take a lot of complexity out of your build pipeline and keep things cross-platform. And you don't need to line-up an assortment of shell-scripts to do all these things. You can potentially include the steps of deploying (and un-deploying) the application-under-test using this approach, but the top-level JUnit test-suite is probably the right place for those.

karate-chrome

The karate-chrome Docker image is created from scratch, using a Java / Maven image as a base, with the following features:

  • Chrome in "full" mode (non-headless)
  • Chrome DevTools protocol exposed on port 9222
  • VNC server exposed on port 5900 so that you can watch the browser in real-time
  • a video of the entire test is saved to /tmp/karate.mp4
  • after the test, when stop() is called, the DockerTarget will embed the video into the HTML report (expand the last step in the Scenario to view)

To try this, or especially when you need to investigate why a test is not behaving properly when running within Docker, follow these steps:

  • start the container:
    • docker run --name karate --rm -p 9222:9222 -p 5900:5900 -e KARATE_SOCAT_START=true --cap-add=SYS_ADMIN ptrthomas/karate-chrome
    • it is recommended to use --security-opt seccomp=chrome.json instead of --cap-add=SYS_ADMIN
  • point your VNC client to localhost:5900 (password: karate)
    • for example on a Mac you can use this command: open vnc://localhost:5900
  • run a test using the following driver configuration, and this is one of the few times you would ever need to set the start flag to false
    • * configure driver = { type: 'chrome', start: false, showDriverLog: true }
  • you can even use the Karate VS Code extension to debug and step-through a test
  • if you omit the --rm part in the start command, after stopping the container, you can dump the logs and video recording using this command (here . stands for the current working folder, change it if needed):
    • docker cp karate:/tmp .
    • this would include the stderr and stdout logs from Chrome, which can be helpful for troubleshooting

For more information on the Docker containers for Karate and how to use them, refer to the wiki: Docker.

Driver Types

The recommendation is that you prefer chrome for development, and once you have the tests running smoothly - you can switch to a different WebDriver implementation.

type | default port | default executable | description
chrome | 9222 | mac: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome; win: C:/Program Files (x86)/Google/Chrome/Application/chrome.exe | "native" Chrome automation via the DevTools protocol
chromedriver | 9515 | chromedriver | W3C Chrome Driver
geckodriver | 4444 | geckodriver | W3C Gecko Driver (Firefox)
safaridriver | 5555 | safaridriver | W3C Safari Driver
mswebdriver | 17556 | MicrosoftWebDriver | W3C Microsoft Edge WebDriver
iedriver | 5555 | IEDriverServer | IE (11 only) Driver
msedge | 9222 | MicrosoftEdge | very experimental, using the DevTools protocol
winappdriver | 4727 | C:/Program Files (x86)/Windows Application Driver/WinAppDriver | Windows Desktop automation, similar to Appium
android | 4723 | appium | android automation via Appium
ios | 4723 | appium | iOS automation via Appium

Distributed Testing

Karate can split a test-suite across multiple machines or Docker containers for execution and aggregate the results. Please refer to the wiki: Distributed Testing.

Locators

The standard locator syntax is supported. For example, for web-automation a / prefix means XPath, otherwise the locator is evaluated as a "CSS selector".

And input('input[name=someName]', 'test input')
When submit().click("//input[@name='commit']")
platform | prefix | means | example
web | (none) | css selector | input[name=someName]
web, android, ios | / | xpath | //input[@name='commit']
web | {} | exact text content | {a}Click Me
web | {^} | partial text content | {^a}Click Me
win, android, ios | (none) | name | Submit
win, android, ios | @ | accessibility id | @CalculatorResults
win, android, ios | # | id | #MyButton
ios | : | -ios predicate string | :name == 'OK' type == XCUIElementTypeButton
ios | ^ | -ios class chain | ^**/XCUIElementTypeTable[name == 'dataTable']
android | - | -android uiautomator | -input[name=someName]
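The prefix rules in this table amount to a simple dispatch on the first characters of the locator. The JavaScript sketch below restates the table as code for reference; locatorKind and its return labels are hypothetical, and Karate's own parsing may handle more cases than shown here.

```javascript
// Hypothetical sketch (not Karate source): classify a locator by its prefix,
// following the table above. platform is 'web', 'win', 'android' or 'ios'.
function locatorKind(locator, platform = 'web') {
  if (platform === 'web') {
    if (locator.startsWith('/')) return 'xpath';
    if (locator.startsWith('{^')) return 'partial text';
    if (locator.startsWith('{')) return 'exact text';
    return 'css selector';
  }
  // the table lists xpath for android and ios, but not for win
  if (platform !== 'win' && locator.startsWith('/')) return 'xpath';
  if (locator.startsWith('@')) return 'accessibility id';
  if (locator.startsWith('#')) return 'id';
  if (platform === 'ios' && locator.startsWith(':')) return 'ios predicate string';
  if (platform === 'ios' && locator.startsWith('^')) return 'ios class chain';
  if (platform === 'android' && locator.startsWith('-')) return 'android uiautomator';
  return 'name';
}

console.log(locatorKind('{^a}Click Me')); // partial text
console.log(locatorKind('@CalculatorResults', 'win')); // accessibility id
```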

Wildcard Locators

The "{}" and "{^}" locator-prefixes are designed to make finding an HTML element by text content super-easy. You will typically also match against a specific HTML tag (which is preferred, and faster at run-time). But even if you use "{*}" (or "{}" which is the equivalent short-cut) to match any tag, you are selecting based on what the user sees on the page.

When you use CSS and XPath, you need to understand the internal CSS class-names and XPath structure of the page. But when you use the visible text-content, for example the text within a <button> or hyperlink (<a>), performing a "selection" can be far easier. And this kind of locator is likely to be more stable and resistant to cosmetic changes to the underlying HTML.

You have the option to adjust the "scope" of the match, and here are examples:

Locator | Description
click('{a}Click Me') | the first <a> where the text-content is exactly: Click Me
click('{^span}Click') | the first <span> where the text-content contains: Click
click('{div:2}Click Me') | the second <div> where the text-content is exactly: Click Me
click('{span/a}Click Me') | the first <a> where a <span> is the immediate parent, and where the text-content is exactly: Click Me
click('{^*:4}Me') | the fourth HTML element (of any tag name) where the text-content contains: Me

Note that "{:4}" can be used as a short-cut instead of "{*:4}".

You can experiment by using XPath snippets like the "span/a" seen above for even more "narrowing down", but try to expand the "scope modifier" (the part within curly braces) only when you need to do "de-duping" in case the same user-facing text appears multiple times on a page.
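One way to see what the scope modifier does is to translate a wildcard locator into XPath. The sketch below is an assumption for illustration: wildcardToXpath is hypothetical, the generated XPath shape (normalize-space over text(), with a positional index) is not necessarily what Karate produces internally, and quotes inside the text are not escaped.

```javascript
// Hypothetical sketch: translate "{^tag:index}text" into an XPath expression.
// The XPath shape is an assumption, not necessarily Karate's internal form.
function wildcardToXpath(locator) {
  // groups: optional "^", tag (may contain "/"), optional ":index", text
  const match = /^\{(\^?)([^}:]*)(?::(\d+))?\}(.*)$/.exec(locator);
  if (!match) {
    throw new Error('not a wildcard locator: ' + locator);
  }
  const [, caret, tagPart, indexPart, text] = match;
  const tag = tagPart === '' ? '*' : tagPart; // "{}" and "{:4}" mean any tag
  const index = indexPart ? Number(indexPart) : 1; // default: first match
  const predicate = caret
    ? `contains(normalize-space(text()),'${text}')` // partial match
    : `normalize-space(text())='${text}'`; // exact match
  return `(//${tag}[${predicate}])[${index}]`;
}

console.log(wildcardToXpath('{^span}Click'));
// (//span[contains(normalize-space(text()),'Click')])[1]
```

Note how "span/a" simply becomes part of the XPath path, which is why XPath snippets work inside the curly braces.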

Friendly Locators

The "wildcard" locators are great when the human-facing visible text is within the HTML element that you want to interact with. But this approach doesn't work when you have to deal with data-entry and <input> fields. This is where the "friendly locators" come in. You can ask for an element by its relative position to another element which is visible - such as a <span>, <div> or <label> and for which the locator is easy to obtain.

Method | Finds Element
rightOf() | to the right of given locator
leftOf() | to the left of given locator
above() | above given locator
below() | below given locator
near() | near given locator in any direction

The above methods return a chainable Finder instance. For example if you have HTML like this:

<input type="checkbox"><span>Check Three</span>

To click on the checkbox, you just need to do this:

* leftOf('{}Check Three').click()

By default, the HTML tag that will be searched for will be input. While rarely needed, you can over-ride this by calling the find(tagName) method like this:

* rightOf('{}Some Text').find('span').click()

One more variation supported is that instead of an HTML tag name, you can look for the textContent:

* rightOf('{}Some Text').find('{}Click Me').click()

One thing to watch out for is that the "origin" of the search will be the mid-point of the whole HTML element, not just the text. So especially when doing above() or below(), ensure that the "search path" is aligned the way you expect. If you get stuck trying to align the search path, especially if the "origin" is a small chunk of text that is aligned right or left - try near().
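The alignment caveat follows from the search starting at the mid-point of the origin element. The JavaScript sketch below models a search that probes outward from that mid-point in small steps and returns the first candidate whose box contains the probe point. It is loosely modeled on the description above rather than taken from Karate: findAlong, the step size and the rectangles are all hypothetical, and near() is not modeled.

```javascript
// Hypothetical sketch (not Karate source): probe outward from the origin's
// mid-point and return the first candidate whose box contains the probe.
// Rectangles are { x, y, width, height } in page pixels.
const STEPS = { rightOf: [1, 0], leftOf: [-1, 0], above: [0, -1], below: [0, 1] };

function containsPoint(rect, x, y) {
  return x >= rect.x && x <= rect.x + rect.width &&
    y >= rect.y && y <= rect.y + rect.height;
}

function findAlong(direction, origin, candidates, stepPx = 5, maxSteps = 200) {
  const [dx, dy] = STEPS[direction];
  // the search starts from the middle of the whole origin element
  const startX = origin.x + origin.width / 2;
  const startY = origin.y + origin.height / 2;
  for (let i = 1; i <= maxSteps; i++) {
    const x = startX + dx * stepPx * i;
    const y = startY + dy * stepPx * i;
    const hit = candidates.find((c) => containsPoint(c.rect, x, y));
    if (hit) return hit;
  }
  return null; // nothing lies on the search path
}

const label = { x: 100, y: 100, width: 80, height: 20 }; // mid-point (140, 110)
const candidates = [
  { name: 'checkbox', rect: { x: 70, y: 102, width: 16, height: 16 } },
  { name: 'inputRight', rect: { x: 200, y: 100, width: 150, height: 20 } },
  { name: 'inputBelow', rect: { x: 100, y: 140, width: 150, height: 20 } },
];
console.log(findAlong('leftOf', label, candidates).name); // checkbox
```

In this model, an input that sits below the label but starts to the right of the label's mid-point would be missed by below(), which is the situation where near() helps.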

In addition to <input> fields, <select> boxes are directly supported like this, so internally a find('select') is "chained" automatically:

* below('{}State').select(0)

rightOf()

* rightOf('{}Input On Right').input('input right')

leftOf()

* leftOf('{}Input On Left').clear().input('input left')

above()

* above('{}Input On Right').click()

below()

* below('{}Input On Right').input('input below')

near()

One reason you may need near() is that an <input> field may be either to the right of or below the label, depending on whether the "container" element had enough width to fit both on the same horizontal line. It is also useful when the element you are seeking is diagonally offset from the locator you have.

* near('{}Go to Page One').click()

Keywords

Only one keyword sets up UI automation in Karate, typically by specifying the URL to open in a browser. And then you would use the built-in driver JS object for all other operations, combined with Karate's match syntax for assertions where needed.

driver

Navigates to a new page / address. If this is the first instance in a test, this step also initializes the driver instance for all subsequent steps - using what is configured.

Given driver 'https://github.com/login'

And yes, you can use variable expressions from karate-config.js. For example:

* driver webUrlBase + '/page-01'

As seen above, you don't have to force all your steps to use the Given, When, Then BDD convention, and you can just use "*" instead.

driver JSON

A variation where the argument is JSON instead of a URL / address-string, used typically if you are testing a desktop (or mobile) application. This example is for Windows, and you can provide the app, appArguments and other parameters expected by the WinAppDriver via the webDriverSession. For example:

* def session = { desiredCapabilities: { app: 'Microsoft.WindowsCalculator_8wekyb3d8bbwe!App' } }
Given driver { webDriverSession: '#(session)' }

This form is just for convenience and readability; using configure driver achieves the same thing, like this:

* def session = { desiredCapabilities: { app: 'Microsoft.WindowsCalculator_8wekyb3d8bbwe!App' } }
* configure driver = { webDriverSession: '#(session)' }
Given driver {}

This design is so that you can use (and data-drive) all the capabilities supported by the target driver - which can vary a lot depending on whether it is local, remote, for desktop or mobile etc.

Syntax

The built-in driver JS object is where you script UI automation. It will be initialized only after the driver keyword has been used to navigate to a web-page (or application).

You can refer to the Java interface definition of the driver object to better understand what the various operations are. Note that Map<String, Object> translates to JSON, and JavaBean getters and setters translate to JS properties - e.g. driver.getTitle() becomes driver.title.

Methods

As a convenience, all the methods on the driver have been injected into the context as special (JavaScript) variables so you can omit the "driver." part and save a lot of typing. For example instead of:

And driver.input('#eg02InputId', Key.SHIFT)
Then match driver.text('#eg02DivId') == '16'

You can shorten all that to:

And input('#eg02InputId', Key.SHIFT)
Then match text('#eg02DivId') == '16'

When it comes to JavaBean getters and setters, you could call them directly, but the driver.propertyName form is much better to read, and you save the trouble of typing out round brackets. So instead of doing this:

And match getUrl() contains 'page-01'
When setUrl(webUrlBase + '/page-02')

You should prefer this form, which is more readable:

And match driver.url contains 'page-01'
When driver.url = webUrlBase + '/page-02'

Note that to navigate to a new address you can use driver - which is more concise.

Chaining

All the methods that return the following Java object types are "chain-able": Driver, Element, Mouse and Finder. This means that you can combine them to concisely express certain types of "intent", without having to repeat the locator.

For example, to retry() until an HTML element is present and then click() it:

# retry returns a "Driver" instance
* retry().click('#someId')

Or to wait until a button is enabled using the default retry configuration:

# waitForEnabled() returns an "Element" instance
* waitForEnabled('#someBtn').click()

Or to temporarily over-ride the retry configuration and wait:

* retry(5, 10000).waitForEnabled('#someBtn').click()

Or to move the mouse() to a given [x, y] co-ordinate and perform a click:

* mouse(100, 200).click()

Or to use Friendly Locators:

* rightOf('{}Input On Right').input('input right')

Also see waits.

Syntax

driver.url

Get the current URL / address for matching. Example:

Then match driver.url == webUrlBase + '/page-02'

This can also be used as a "setter" to navigate to a new URL during a test. But always use the driver keyword when you start a test, and in general you may prefer that shorter form.

* driver.url = 'http://localhost:8080/test'

driver.title

Get the current page title for matching. Example:

Then match driver.title == 'Test Page'

Note that if you do this immediately after a page-load, in some cases you need to wait for the page to fully load. You can use a waitForUrl() before attempting to access driver.title to make sure it works.

driver.dimensions

Set the size of the browser window:

 And driver.dimensions = { x: 0, y: 0, width: 300, height: 800 }

This also works as a "getter" to get the current window dimensions.

* def dims = driver.dimensions

The result JSON will be in the form: { x: '#number', y: '#number', width: '#number', height: '#number' }

position()

Get the position and size of an element by locator as follows:

* def pos = position('#someid')

The result JSON will be in the form: { x: '#number', y: '#number', width: '#number', height: '#number' }

input()

2 string arguments: locator and value to enter.

* input('input[name=someName]', 'test input')

As a convenience, there is a second form where you can pass an array as the second argument:

* input('input[name=someName]', ['test', ' input', Key.ENTER])

And an extra convenience third argument is a time-delay (in milliseconds) that will be applied before each array value. This is sometimes needed to "slow down" keystrokes, especially when there is a lot of JavaScript or security-validation behind the scenes.

* input('input[name=someName]', ['a', 'b', 'c', Key.ENTER], 200)
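The ordering implied by the array form and the delay argument can be written out as a plan of events. In this JavaScript sketch, planInput is hypothetical, and the Key values are the W3C WebDriver code points (for example Enter is \uE007); that Karate's Key constants use exactly these values is an assumption.

```javascript
// Hypothetical sketch (not Karate source): the order of events for
// input(locator, values, delay). The delay is applied before every entry.
// Key values are W3C WebDriver code points (assumed to match Karate's Key).
const Key = { TAB: '\uE004', ENTER: '\uE007', SHIFT: '\uE008' };

function planInput(values, delayMs = 0) {
  const list = Array.isArray(values) ? values : [values];
  const steps = [];
  for (const value of list) {
    if (delayMs > 0) {
      steps.push({ wait: delayMs });
    }
    steps.push({ send: value });
  }
  return steps;
}

console.log(planInput(['a', 'b', 'c', Key.ENTER], 200).length); // 8
```

With four entries and a 200 ms delay, the keystrokes are spread over at least 800 ms, which is the "slow down" effect described above.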

Special Keys

Special keys such as ENTER, TAB etc. can be specified like this:

* input('#someInput', 'test input' + Key.ENTER)

A special variable called Key will be available and you can see all the possible key codes here.

Also see value(locator, value) and clear()

submit()

Karate has an elegant approach to handling any action such as click() that results in a new page load. You "signal" that a submit is expected by calling the submit() function (which returns a Driver object) and then "chaining" the action that is expected to trigger a page load.

When submit().click('*Page Three')

The advantage of this approach is that it works with any of the actions. So even if your next step is the ENTER key, you can do this:

When submit().input('#someform', Key.ENTER)

Karate will do the best it can to detect a page change and wait for the load to complete before proceeding to any step that follows.

You can even mix this into mouse() actions.

For some SPAs (Single Page Applications) the detection of a "page load" may be difficult because page-navigation (and the browser history) is taken over by JavaScript. In such cases, you can always fall-back to a waitForUrl() or a more generic waitFor().

waitForUrl() instead of submit()

Sometimes, because of an HTTP re-direct, it can be difficult for Karate to detect a page URL change, or it will be detected too soon, causing your test to fail. In such cases, you can use waitForUrl(). For convenience, it will do a string contains match (not an exact match) so you don't need to worry about http vs https for example. Just supply a portion of the URL you are expecting. As another convenience, it will return a string which is the actual URL in case you need to use it for further actions in the test script.

So instead of this, which uses submit():

Given driver 'https://google.com'
And input('input[name=q]', 'karate dsl')
When submit().click('input[name=btnI]')
Then match driver.url == 'https://github.com/intuit/karate'

You can do this. Note that waitForUrl() will also act as an assertion, so you don't have to do an extra match.

Given driver 'https://google.com'
And input('input[name=q]', 'karate dsl')
When click('input[name=btnI]')
And waitForUrl('https://github.com/intuit/karate')

And you can even chain a retry() before the waitForUrl() if you know that it is going to take a long time:

And retry(5, 10000).waitForUrl('https://github.com/intuit/karate')
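The behavior described in this section (a contains match, returning the actual URL, failing like an assertion, and honoring a retry count and interval) can be modeled as a small polling loop. This JavaScript sketch is hypothetical: waitForUrl here is not Karate's code, the default count of 3 and interval of 3000 ms are assumptions, and sleep is injected so the example stays synchronous.

```javascript
// Hypothetical sketch (not Karate source) of waitForUrl() semantics.
// retryCount / retryInterval stand in for retry(count, interval).
function waitForUrl(getUrl, expected, { retryCount = 3, retryInterval = 3000, sleep = () => {} } = {}) {
  let url = '';
  for (let attempt = 1; attempt <= retryCount; attempt++) {
    url = getUrl();
    if (url.includes(expected)) {
      return url; // the actual URL, for use in later steps
    }
    if (attempt < retryCount) {
      sleep(retryInterval);
    }
  }
  throw new Error(`expected URL to contain '${expected}' but was '${url}'`);
}

// simulate a redirect chain that settles on the expected page
const urls = ['https://google.com', 'https://google.com/url?q=1', 'https://github.com/intuit/karate'];
let call = 0;
const waits = [];
const actual = waitForUrl(() => urls[Math.min(call++, urls.length - 1)], 'github.com/intuit/karate',
  { retryCount: 5, retryInterval: 10000, sleep: (ms) => waits.push(ms) });
console.log(actual, waits.length); // https://github.com/intuit/karate 2
```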

waitFor() instead of submit()

This is very convenient to use for the first element you need to interact with on a freshly-loaded page. It can be used instead of waitForUrl() and you can still perform a page URL assertion as seen below.

Here is an example of waiting for a search box to appear after a click(), and note how we re-use the Element reference returned by waitFor() to proceed with the flow. We even slip in a page-URL assertion without missing a beat.

When click('{a}Find File')
And def search = waitFor('input[name=query]')
Then match driver.url == 'https://github.com/intuit/karate/find/master'
Given search.input('karate-logo.png')

Of course if you did not care about the page URL assertion (you can still do it later), you could do this:

* waitFor('input[name=query]').input('karate-logo.png')

delay()

Of course, resorting to a "sleep" in a UI test is considered a very bad practice and you should always use retry() instead. But sometimes it is un-avoidable, for example to wait for animations to render before taking a screenshot. The nice thing here is that it returns a Driver instance, so you can chain any other method and the "intent" will be clear. For example:

* delay(1000).screenshot()

The other situation where we have found a delay() un-avoidable is for some super-secure sign-in forms - where a few milliseconds delay before hitting the submit button is needed.

click()

Just triggers a click event on the DOM element:

* click('input[name=someName]')

Also see submit() and mouse().

select()

You can use this for plain-vanilla <select> boxes that have not been overly "enhanced" by JavaScript. Nowadays, most "select" (or "multi-select") user experiences are JavaScript widgets, so you would need to fire a click() or two to get things done. But if you are really dealing with an HTML <select>, then read on.

There are four variations, and the first two use the locator prefix conventions for exact and contains matches against the <option> text-content.

# select by displayed text
Given select('select[name=data1]', '{}Option Two')

# select by partial displayed text
And select('select[name=data1]', '{^}Two')

# select by `value`
Given select('select[name=data1]', 'option2')

# select by index
Given select('select[name=data1]', 2)
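The way the second argument selects between the four variations can be summarized in a short JavaScript sketch. pickOption is hypothetical, and treating the numeric form as a zero-based index (like the DOM's selectedIndex) is an assumption.

```javascript
// Hypothetical sketch (not Karate source): resolve the select() argument
// to an <option>. options is a list of { text, value }; the result is a
// zero-based index (like the DOM's selectedIndex) or -1 if nothing matches.
function pickOption(options, selector) {
  if (typeof selector === 'number') {
    // select by index
    return selector >= 0 && selector < options.length ? selector : -1;
  }
  if (selector.startsWith('{^}')) {
    // partial match against the displayed text
    const part = selector.substring(3);
    return options.findIndex((o) => o.text.includes(part));
  }
  if (selector.startsWith('{}')) {
    // exact match against the displayed text
    const text = selector.substring(2);
    return options.findIndex((o) => o.text === text);
  }
  // anything else is matched against the value attribute
  return options.findIndex((o) => o.value === selector);
}

const options = [
  { text: 'Option One', value: 'option1' },
  { text: 'Option Two', value: 'option2' },
  { text: 'Option Three', value: 'option3' },
];
console.log(pickOption(options, '{^}Two')); // 1
```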

If you have trouble with <select> boxes, try using script() to execute custom JavaScript within the page as a work-around.

focus()

* focus('.myClass')

clear()

* clear('#myInput')

If this does not work, try value(selector, value).

scroll()

Scrolls to the element.

* scroll('#myInput')

Since a scroll() + click() (or input()) is a common combination, you can chain these:

* scroll('#myBtn').click()
* scroll('#myTxt').input('hello')

mouse()

This returns an instance of Mouse on which you can chain actions. A common need is to move (or hover) the mouse, and for this you call the move() method.

The mouse().move() method has two forms. You can pass 2 integers as the x and y co-ordinates, or you can pass the locator string of the element to move to. If the last method in the chain is not click() or up(), make sure you call go() at the end.

* mouse().move(100, 200).go()
* mouse().move('#eg02RightDivId').click()
# this is a "click and drag" action
* mouse().down().move('#eg02LeftDivId').up()

You can even chain a submit() to wait for a page load if needed:

* mouse().move('#menuItem').submit().click()

Since moving the mouse is a common task, these short-cuts can be used:

* mouse('#menuItem32').click()
* mouse(100, 200).go()
* waitForEnabled('#someBtn').mouse().click()

These are useful in situations where the "normal" click() does not work - especially when the element you are clicking is not a normal hyperlink (<a href="">) or <button>.
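The rule about go() makes sense if you think of the mouse chain as a queue of actions that is only sent when the chain is completed. The JavaScript sketch below models that idea; MouseSketch is a hypothetical class, not Karate's Mouse, and it assumes move() and down() only queue actions while click(), up() and go() send them.

```javascript
// Hypothetical sketch (not Karate source): a chainable mouse where move()
// and down() only queue actions, and click(), up() and go() send the queue.
class MouseSketch {
  constructor(perform) {
    this.perform = perform; // receives the list of queued actions
    this.queue = [];
  }
  move(xOrLocator, y) {
    const target = typeof xOrLocator === 'string' ? { locator: xOrLocator } : { x: xOrLocator, y };
    this.queue.push({ type: 'move', ...target });
    return this;
  }
  down() {
    this.queue.push({ type: 'down' });
    return this;
  }
  up() {
    this.queue.push({ type: 'up' });
    return this.go();
  }
  click() {
    this.queue.push({ type: 'down' }, { type: 'up' });
    return this.go();
  }
  go() {
    this.perform(this.queue);
    this.queue = [];
    return this;
  }
}

const log = [];
const mouse = new MouseSketch((actions) => log.push(actions.map((a) => a.type).join(',')));
mouse.move(100, 200); // queued only, nothing sent yet
mouse.go(); // sends "move"
mouse.down().move('#eg02LeftDivId').up(); // "click and drag" sent as one batch
console.log(log); // [ 'move', 'down,move,up' ]
```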

close()

Close the page / tab.

quit()

Close the browser.

html()

Get the outerHTML, which includes the markup of the selected element itself. Useful for match contains assertions. Example:

And match html('#eg01DivId') == '<div id="eg01DivId">this div is outside the iframe</div>'

text()

Get the textContent. Example:

And match text('.myClass') == 'Class Locator Test'

value()

Get the HTML form-element value. Example:

And match value('.myClass') == 'some value'

value(set)

Set the HTML form-element value. Example:

When value('#eg01InputId', 'something more')

attribute()

Get the HTML element attribute value by attribute name. Example:

And match attribute('#eg01SubmitId', 'type') == 'submit'

enabled()

Returns true if the element is enabled (not disabled):

And match enabled('#eg01DisabledId') == false

Also see waitUntil() for an example of how to wait until an element is "enabled" or until any other element property becomes the target value.

waitForUrl()

Very handy for waiting for an expected URL change and asserting that it happened. See waitForUrl() instead of submit().

Also see waits.

waitForText()

This is a convenience short-cut for waitUntil(locator, "_.textContent.includes('" + expected + "')"), since waiting for some text to appear is so frequently needed. Note the use of the JavaScript String.includes() function, which performs a "contains" match: with this, you don't need to worry about white-space around the text, such as line-feeds and invisible tab characters.

Because the expected text is placed inside single-quotes in that expression, avoid single-quotes within the string to be matched, or escape them using a back-slash (\) character.

* waitForText('#eg01WaitId', 'APPEARED')

If you really need to scan the whole page for some text, you can use the body element as shown below, but a more specific locator performs better:

* waitForText('body', 'APPEARED')
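To make the quoting rule concrete, here is a minimal Java sketch that builds the same includes() expression for a given piece of text. The class and method names are hypothetical (they are not part of Karate). Escaping back-slashes as well as single-quotes is an extra precaution assumed here so that the resulting JavaScript string literal stays well-formed; per the text above, Karate leaves the escaping to you.

```java
// Hypothetical helper, NOT part of Karate: it only mirrors the expression that
// waitForText() is described as evaluating against the element bound to "_".
final class WaitForTextSketch {

    private WaitForTextSketch() {}

    /** Escapes text so it can sit inside a single-quoted JavaScript string literal. */
    static String escapeForSingleQuotes(String text) {
        // escape back-slashes first, so the back-slashes added for quotes are not doubled
        return text.replace("\\", "\\\\").replace("'", "\\'");
    }

    /** Builds the "contains" predicate described above. */
    static String expression(String expected) {
        return "_.textContent.includes('" + escapeForSingleQuotes(expected) + "')";
    }
}
```

For example, expression("it's here") yields _.textContent.includes('it\'s here'), which is exactly the back-slash escaping the README asks you to do by hand.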

waitForEnabled()

This is just a convenience short-cut for waitUntil(locator, '!_.disabled') since it is so frequently needed:

And waitForEnabled('#someId').click()

Also see waits.

waitForResultCount()

A powerful way to wait until the number of elements that match a given locator equals a given number. This is useful when you need to wait for, say, a table of slow-loading results, where the table may contain fewer elements at first. There are two variations. The first simply returns a List of Element instances.

* waitForResultCount('div#eg01 div', 4)

Most of the time, you just want to wait until a certain number of elements match and then move on with your flow, and in that case the above is sufficient. If you need to actually do something with each returned Element, see locateAll() or the option below.

The second variant takes a third argument, a JavaScript expression that is evaluated on each matching element, just like the second argument of the scriptAll() method:

When def list = waitForResultCount('div#eg01 div', 4, '_.innerHTML')
Then match list == '#[4]'
And match each list contains '@@data'

So in a single step we can wait for the number of elements to match and extract data as an array.
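The two variations above boil down to "poll until the count matches, then optionally map each element". Below is a plain-Java sketch of that contract under stated assumptions: a hypothetical Supplier stands in for looking up all matching elements, and the retry count and interval are passed in explicitly. It is not Karate's implementation.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Sketch only: "locateAll" is a hypothetical stand-in for finding all matching elements.
final class ResultCountSketch {

    private ResultCountSketch() {}

    /** First variant: poll until exactly `expected` items are found, then return them. */
    static <E> List<E> waitForResultCount(Supplier<List<E>> locateAll, int expected,
                                          int retryCount, long intervalMillis) {
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            List<E> found = locateAll.get();
            if (found.size() == expected) {
                return found;
            }
            if (attempt < retryCount) {
                pause(intervalMillis); // wait before the next attempt
            }
        }
        throw new IllegalStateException("expected " + expected + " matches after "
                + retryCount + " attempts");
    }

    /** Second variant: the same wait, then extract a value from each element. */
    static <E, R> List<R> waitForResultCount(Supplier<List<E>> locateAll, int expected,
                                             int retryCount, long intervalMillis,
                                             Function<E, R> extract) {
        List<E> found = waitForResultCount(locateAll, expected, retryCount, intervalMillis);
        return found.stream().map(extract).collect(Collectors.toList());
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```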

waitFor()

This is typically used for the first element you need to interact with on a freshly loaded page. Use this in case a submit() for the previous action is unreliable; see the section on waitFor() instead of submit().

This will wait until the element (by locator) is present in the page and uses the configured retry() settings. This will fail the test if the element does not appear after the configured number of re-tries have been attempted.

Since waitFor() returns an Element instance on which you can call "chained" methods, this can be the pattern you use, which is very convenient and readable:

And waitFor('#eg01WaitId').click()

Also see waits.

waitForAny()

Rarely used, but accepts multiple arguments for those tricky situations where a particular element may or may not be present in the page. It returns the Element representation of whichever element was found first, so that you can perform conditional logic to handle either case.

But since the exists() API is designed to handle the case when a given locator does not exist, you can write some very concise tests, without needing to examine the returned object from waitForAny().

Here is a real-life example combined with the use of retry():

* retry(5, 10000).waitForAny('#nextButton', '#randomButton')
* exists('#nextButton').click()
* exists('#randomButton').click()

If you have more than two locators you need to wait for, use the single-argument-as-array form, like this:

* waitForAny(['#nextButton', '#randomButton', '#blueMoonButton'])

Also see waits.

exists()

This method returns an Element instance, which means it can be chained as you expect. But there is a twist: if the locator does not exist, any attempt to perform actions on it will not fail your test, and will instead silently perform a "no-op".

This is designed specifically for the kind of situation described in the example for waitForAny(). If you want to check whether the returned Element exists, you can use the "getter" as follows:

* assert exists('#someId').exists

But the above is more elegantly expressed using locate():

* assert locate('#someId').exists

But what is most useful is that you can now click only if the element exists. As you can imagine, this can handle unpredictable dialogs, advertisements and the like.

* exists('#elusiveButton').click()
# or if you need to click something else
* if (locate('#elusivePopup').exists) click('#elusiveButton')

And yes, you can use an if statement in Karate !

Note that the exists() API is a little different from the other Element actions: it does not honor any intent to retry(), and immediately checks the HTML for the given locator. This is intentional, because it is designed to answer the question: "does the element exist in the HTML page right now ?"
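The exists() behavior is essentially the Null Object pattern: a missing element is still an object, but its actions do nothing. The Java sketch below illustrates that idea with a hypothetical page modelled as a set of locators that currently match; none of these types are Karate API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustration of "no-op if missing" using the Null Object pattern.
// PageSketch, Element and the Set-of-locators "page" are hypothetical stand-ins.
final class ExistsSketch {

    private ExistsSketch() {}

    interface Element {
        boolean exists();
        Element click();
    }

    /** Shared "missing" element: every action silently does nothing. */
    static final Element MISSING = new Element() {
        public boolean exists() { return false; }
        public Element click() { return this; }
    };

    static final class PageSketch {
        private final Set<String> present;              // locators that match right now
        final List<String> clicks = new ArrayList<>();  // records real clicks only

        PageSketch(Set<String> present) {
            this.present = present;
        }

        /** Checks once, immediately (no retry), and never fails. */
        Element exists(String locator) {
            if (!present.contains(locator)) {
                return MISSING;
            }
            return new Element() {
                public boolean exists() { return true; }
                public Element click() { clicks.add(locator); return this; }
            };
        }
    }
}
```

This mirrors the waitForAny() example: calling click() on both candidates only acts on the one that is actually present.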

waitUntil()

Waits for the browser JS expression to evaluate to true, polling with the configured retry() settings.

* waitUntil("document.readyState == 'complete'")

Note that the JS here has to be a "raw" string that is sent to the browser as-is and evaluated there. This means that you cannot use any Karate JS objects or API-s such as karate.get() or driver.title. So driver.title == 'My Page' will not work; instead, you have to do this:

* waitUntil("document.title == 'My Page'")

Also see Karate vs the Browser.

waitUntil(locator,js)

This variant takes a locator parameter and a JavaScript "predicate" function, which will be evaluated on the element returned by the locator in the HTML DOM. Most of the time you will prefer the short-cut boolean-expression form that begins with an underscore (or "!"): Karate will inject the JavaScript DOM element reference into a variable named "_".

One limitation is that you cannot use double-quotes within these expressions, so stick to the pattern seen in this real-life example:

And waitUntil('.alert-message', "_.innerHTML.includes('Some Text')")

Karate vs the Browser

One thing you need to get used to is the "separation" between the code that is evaluated by Karate and the JavaScript that is sent to the browser (as a raw string) and evaluated there. Pay attention to the fact that the includes() function you see in the above example is pure JavaScript.

The use of includes() is needed in this real-life example because innerHTML can contain leading and trailing white-space (such as line-feeds and tabs), which would cause an exact "==" comparison in JavaScript to fail.

But guess what - this example is baked into a Karate API, see waitForText().

For an example of what JavaScript looks like on the "Karate side", see Function Composition.

This form of waitUntil() is very useful for waiting for some HTML element to stop being disabled. Note that Karate will fail the test if the expression still evaluates to false after the configured number of re-tries have been attempted.

And waitUntil('#eg01WaitId', "function(e){ return e.innerHTML == 'APPEARED!' }")

# if the expression begins with "_" or "!", Karate will wrap the function for you !
And waitUntil('#eg01WaitId', "_.innerHTML == 'APPEARED!'")
And waitUntil('#eg01WaitId', '!_.disabled')

Also see waitForEnabled(), which is the preferred short-cut for the last example above, as well as the examples for chaining and the section on waits.
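The short-cut rule above can be sketched as a small string transformation. This is a hypothetical Java helper: only the rule itself (a leading "_" or "!" triggers wrapping, and the element is bound to "_") comes from this README; the exact function text Karate generates may differ.

```java
// Hypothetical sketch of the "_" / "!" short-cut rule; not Karate's actual code.
final class ShortcutSketch {

    private ShortcutSketch() {}

    static String toFunction(String js) {
        String trimmed = js.trim();
        if (trimmed.startsWith("_") || trimmed.startsWith("!")) {
            // bind the DOM element to a parameter literally named "_"
            return "function(_){ return " + trimmed + " }";
        }
        return trimmed; // already a full function, e.g. "function(e){ ... }"
    }
}
```

So '!_.disabled' becomes function(_){ return !_.disabled }, the same shape as the explicit function(e){ ... } form in the first example.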

waitUntil(function)

A very powerful variation of waitUntil() takes a full-fledged JavaScript function as the argument. The function can check any user-defined condition and can use any variable (or Karate or Driver JS API) in scope. Returning null keeps the loop going, and returning any non-null object stops it. As a convenience, whatever object is returned becomes the result of waitUntil(), so it can be re-used in future steps.

This is best explained with an example. Note that scriptAll() will return an array, as opposed to script().

When search.input('karate-logo.png')

# note how we return null to keep looping
And def searchFunction =
  """
  function() {
    var results = scriptAll('.js-tree-browser-result-path', '_.innerText');
    return results.size() == 2 ? results : null;
  }
  """

# note how we returned an array from the above when the condition was met
And def searchResults = waitUntil(searchFunction)

# and now we can use the results like normal
Then match searchResults contains 'karate-core/src/main/resources/karate-logo.png'

The above logic can actually be replaced with Karate's built-in short-cut, waitForResultCount(). Also see waits.
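Here is the same contract expressed in plain Java, as a sketch: call the function repeatedly, treat null as "keep waiting", and return the first non-null value. The helper name and its explicit count / interval parameters are assumptions made for illustration, not Karate's API.

```java
import java.util.function.Supplier;

// Sketch of the waitUntil(function) contract: null means "keep looping",
// and the first non-null value is returned to the caller.
final class WaitUntilSketch {

    private WaitUntilSketch() {}

    static <T> T waitUntil(Supplier<T> fn, int retryCount, long intervalMillis) {
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            T result = fn.get();
            if (result != null) {
                return result; // stops the loop and becomes the return value
            }
            if (attempt < retryCount) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
        throw new IllegalStateException("condition not met after " + retryCount + " attempts");
    }
}
```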

Function Composition

The above example can be re-factored in a very elegant way as follows, using Karate's native support for JavaScript:

# this can be a global re-usable function !
And def innerText = function(locator){ return scriptAll(locator, '_.innerText') }

# we compose a function using another function (the one above)
And def searchFunction =
  """
  function() {
    var results = innerText('.js-tree-browser-result-path');
    return results.size() == 2 ? results : null;
  }
  """

The great thing here is that the innerText() function can be defined in a common feature which all your scripts can re-use. It can be used anywhere to scrape the contents out of any HTML tabular data: all you need to do is supply the locator that matches the elements you are interested in.
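For comparison, the same composition idea in Java uses Function.andThen(). The page map below is a hypothetical stand-in for the browser (locator to inner-text values), so this only illustrates how the re-usable piece and the condition combine.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Java analogue of the feature-file composition; "page" is a hypothetical stand-in
// mapping a locator to the inner text of every element it matches.
final class CompositionSketch {

    private CompositionSketch() {}

    /** Re-usable building block, like innerText(locator) above. */
    static Function<String, List<String>> innerText(Map<String, List<String>> page) {
        return locator -> page.getOrDefault(locator, Collections.emptyList());
    }

    /** Composes innerText with a condition; null means "keep waiting". */
    static Supplier<List<String>> searchFunction(Map<String, List<String>> page) {
        Function<List<String>, List<String>> twoResults = r -> r.size() == 2 ? r : null;
        Function<String, List<String>> composed = innerText(page).andThen(twoResults);
        return () -> composed.apply(".js-tree-browser-result-path");
    }
}
```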

retry()

For tests that need to wait for slow pages or deal with un-predictable element load-times or state / visibility changes, Karate allows you to temporarily tweak the internal retry settings. Here are the few things you need to know.

Retry Defaults

The default retry settings are:

  • count: 3, interval: 3000 milliseconds (try three times, and wait for 3 seconds before the next re-try attempt)
  • it is recommended that you stick to these defaults, which should suffice for most applications
  • if you really want, you can change this "globally" in karate-config.js like this:
    • configure('retry', { count: 10, interval: 5000 });
  • or any time within a script (*.feature file) like this:
    • * configure retry = { count: 10, interval: 5000 }

Retry Actions

By default, actions such as click() are not re-tried, and this is what you should stick to most of the time, for tests that run smoothly and quickly. But some troublesome parts of your flow will require re-tries, and this is where the retry() API comes in. There are 3 forms:

  • retry() - just signals that the next action will be re-tried if it fails, using the currently configured retry settings
  • retry(count) - the next action will temporarily use the count provided, as the limit for retry-attempts
  • retry(count, interval) - temporarily change the retry count and retry interval (in milliseconds) for the next action

And since you can chain the retry() API, you can have tests that clearly express the "intent to wait". This results in easily understandable one-liners, only at the point of need, and to anyone reading the test - it will be clear as to where extra "waits" have been applied.

Here are the various combinations for you to compare using click() as an example.

| Script | Description |
| --- | --- |
| click('#myId') | Try to stick to this default form for 95% of your tests. If the element is not found, the test will fail immediately, but your tests will run smoothly and super-fast. |
| waitFor('#myId').click() | Use waitFor() for the first element on a newly loaded page, or any element that takes time to load after the previous action. For the best performance, use this only if using submit() for the (previous) action that triggered the page-load is not reliable. It uses the currently configured retry settings. With the defaults, the test will fail after waiting for 3 x 3000 ms, which is 9 seconds. Prefer this over any of the options below; in other words, stick to the defaults as far as possible. |
| retry().click('#myId') | This happens to be exactly equivalent to the above ! When you request a retry(), internally it is just a waitFor(). Prefer the above form as it is more readable. The advantage of this form is that it is easy to quickly add (and remove) when working on a test in development mode. |
| retry(5).click('#myId') | Temporarily use 5 as the max retry attempts and apply a "wait". Since retry() expresses an intent to "wait", the waitFor() can be omitted for the chained action. |
| retry(5, 10000).click('#myId') | Temporarily use 5 as the max retry attempts and 10 seconds as the time to wait before the next retry attempt. Again like the above, the waitFor() is implied. The test will fail if the element does not load within 50 seconds. |
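The arithmetic in the table above can be captured in a few lines. RetrySettings below is a hypothetical class, not Karate API; it assumes, as the table does, that the worst-case wait is roughly the retry count multiplied by the interval, and that retry(count) keeps the configured interval.

```java
// Hypothetical, immutable model of retry settings; not Karate API.
final class RetrySettings {

    /** The documented defaults: count 3, interval 3000 ms. */
    static final RetrySettings DEFAULTS = new RetrySettings(3, 3000);

    final int count;
    final long intervalMillis;

    RetrySettings(int count, long intervalMillis) {
        this.count = count;
        this.intervalMillis = intervalMillis;
    }

    /** Like retry(count): override the count, keep the configured interval. */
    RetrySettings withCount(int newCount) {
        return new RetrySettings(newCount, intervalMillis);
    }

    /** Like retry(count, interval): override both for the next action only. */
    RetrySettings with(int newCount, long newIntervalMillis) {
        return new RetrySettings(newCount, newIntervalMillis);
    }

    /** Approximate worst-case wait, as quoted in the table: count x interval. */
    long maxWaitMillis() {
        return count * intervalMillis;
    }
}
```

Because each override returns a new object, the defaults stay untouched, which matches the "temporarily, for the next action" wording of retry().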

Wait API

The set of built-in functions that start with "wait" handle all the cases you would need to typically worry about. Keep in mind that:

  • all of these examples will retry() internally by default
  • you can prefix a retry() only if you need to over-ride the settings for this "wait" - as shown in the second row

| Script | Description |
| --- | --- |
| waitFor('#myId') | waits for an element as described above |
| retry(10).waitFor('#myId') | like the above, but temporarily overrides the settings to wait for a longer time; this can be done for all the examples below as well |
| waitForUrl('google.com') | for convenience, this uses a "string contains" match, so for example you can omit the http or https prefix |
| waitForText('#myId', 'appeared') | frequently needed short-cut for waiting until a string appears, using a "string contains" match for convenience |
| waitForEnabled('#mySubmit') | frequently needed short-cut for waitUntil(locator, '!_.disabled') |
| waitForResultCount('.myClass', 4) | wait until a certain number of rows of tabular data is present |
| waitForAny('#myId', '#maybe') | handle an element that may or may not appear, e.g. to get rid of an ad popup or dialog |
| waitUntil(expression) | wait until any user-defined JavaScript expression evaluates to true in the browser |
| waitUntil(function) | use custom logic to handle any kind of situation where you need to wait, and use other API calls if needed |

Also see the examples for chaining.

script()

Evaluates the given string as JavaScript within the browser.

* assert 3 == script("1 + 2")

To avoid problems, stick to the pattern of using double-quotes to "wrap" the JavaScript snippet, and you can use single-quotes within.

* script("console.log('hello world')")

A more useful variation is to perform a JavaScript eval on a reference to the HTML DOM element retrieved by a locator. For example:

And match script('#eg01WaitId', "function(e){ return e.innerHTML }") == 'APPEARED!'
# which can be shortened to:
And match script('#eg01WaitId', '_.innerHTML') == 'APPEARED!'

Normally you would use text() to do the above, but you get the idea. Expressions follow the same short-cut rules as for waitUntil().

Here is an interesting example where a JavaScript event can be triggered on a given HTML element:

* waitFor('#someId').script("_.dispatchEvent(new Event('change'))")

Also see the plural form scriptAll().

scriptAll()

Just like script(), but will perform the script eval() on all matching elements (not just the first) - and return the results as a JSON array / list. This is very useful for "bulk-scraping" data out of the HTML (such as <table> rows) - which you can then proceed to use in match assertions:

# get text for all elements that match css selector
When def list = scriptAll('div div', '_.textContent')
Then match list == '#[3]'
And match each list contains '@@data'

See Function Composition for another good example. Also see the singular form script().

scriptAll() with filter

scriptAll() can take a third argument which has to be a JavaScript "predicate" function, that returns a boolean true or false. This is very useful to "filter" the results that match a desired condition - typically a text comparison. For example if you want to get only the cells out of a <table> that contain the text "data" you can do this:

* def list = scriptAll('div div', '_.textContent', function(x){ return x.contains('data') })
* match list == ['data1', 'data2']

Note that the JS in this case is run by Karate, not the browser, so you use the Java String.contains() API, not the JavaScript String.includes() one.
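The filter step can be pictured as an ordinary Java stream filter over the scraped strings, which is why String.contains() applies. A minimal sketch; FilterSketch is a hypothetical name and the scraped values are made-up sample data.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the Karate-side filtering step: a predicate applied to each
// scraped value, using Java's String API (contains, not includes).
final class FilterSketch {

    private FilterSketch() {}

    static List<String> filter(List<String> scraped, Predicate<String> keep) {
        return scraped.stream().filter(keep).collect(Collectors.toList());
    }
}
```

Usage: filter(Arrays.asList("data1", "header", "data2"), x -> x.contains("data")) keeps only "data1" and "data2", the same result as the feature-file example.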

locate()

Rarely used, but handy when you just want to get an Element instance, typically when you are writing custom re-usable functions. See also locateAll().

* def e = locate('{}Click Me')
# now you can have multiple steps refer to "e"
* if (e.exists) karate.call('some.feature')

It is also useful if you just want to check if an element is present - and this is a bit more elegant than using exists():

* if (locate('{}Click Me').exists) karate.call('some.feature')

locateAll()

This will return all elements that match the locator as a list of Element instances. You can now use Karate's core API and call chained methods. Here are some examples:

# find all elements with the text-content "Click Me"
* def elements = locateAll('{}Click Me')
* match karate.sizeOf(elements) == 7
* elements.get(6).click()
* match elements.get(3).script('_.tagName') == 'BUTTON'

Take a look at how to loop and transform data for more ideas.

refresh()

Normal page reload, does not clear cache.

reload()

Hard page reload, which will clear the cache.

back()

forward()

maximize()

minimize()

fullscreen()

cookie(set)

Set a cookie. The method argument is JSON, so that you can pass more data in addition to the value such as domain and url. Most servers expect the domain to be set correctly like this:

Given def myCookie = { name: 'hello', value: 'world', domain: '.mycompany.com' }
When cookie(myCookie)
Then match driver.cookies contains '#(^myCookie)'

Note that you can do the above as a one-liner like this: * cookie({ name: 'hello', value: 'world' }). Just keep in mind that it then follows the rules of Enclosed JavaScript (not Embedded Expressions).

cookie()

Get a cookie by name. Note how Karate's match syntax comes in handy.

* def cookie1 = { name: 'foo', value: 'bar' }
And match driver.cookies contains '#(^cookie1)'
And match cookie('foo') contains cookie1

driver.cookies

See above examples.

deleteCookie()

Delete a cookie by name:

When deleteCookie('foo')
Then match driver.cookies !contains '#(^cookie1)'

clearCookies()

Clear all cookies.

When clearCookies()
Then match driver.cookies == '#[0]'

dialog()

There are two forms. The first takes a single boolean argument - whether to "accept" or "cancel". The second form has an additional string argument which is the text to enter for cases where the dialog is expecting user input.

Also works as a "getter" to retrieve the text of the currently visible dialog:

* match driver.dialog == 'Please enter your name:'

switchPage()

When multiple browser tabs are present, allows you to switch to one based on page title (or URL).

When switchPage('Page Two')

switchFrame()

This "sets context" to a chosen frame (or <iframe>) within the page. There are 2 variants, one that takes an integer as the param, in which case the frame is selected based on the order of appearance in the page:

When switchFrame(0)

Or you use a locator that points to the <iframe> element that you need to "switch to".

When switchFrame('#frame01')

After you have switched, any future actions such as click() would operate within the "selected" <iframe>. To "reset" so that you are back to the "root" page, just switch to null (or integer value -1):

When switchFrame(null)

screenshot()

There are two forms: if a locator is provided, only that HTML element will be captured, otherwise the entire browser viewport will be captured. This method returns a byte array.

This will also automatically perform a karate.embed(), so that the image appears in the HTML report.

* screenshot()
# or
* screenshot('#someDiv')

If you want to disable the "auto-embedding" into the HTML report, pass an additional boolean argument as false, e.g:

* screenshot(false)
# or
* screenshot('#someDiv', false)

highlight()

Visually highlights an element in the browser, which is especially useful when working in the debugger.

* highlight('#eg01DivId')

highlightAll()

Plural form of the above.

* highlightAll('input')

Debugging

You can use the Visual Studio Code Karate extension for stepping through and debugging a test. You can see a demo video here. We recommend that you get comfortable with this because it is going to save you lots of time. And creating tests may actually turn out to be fun !

When you are in a hurry, you can pause a test in the middle of a flow just to look at the browser developer tools to see what CSS selectors you need to use. For this you can use karate.stop() - but of course, NEVER forget to remove this before you move on to something else !

* karate.stop()

And then you would see something like this in the console:

*** waiting for socket, type the command below:
curl http://localhost:61963
in a new terminal (or open the URL in a web-browser) to proceed ...

In most IDE-s, you would even see the URL above as a clickable hyperlink, so just clicking it would end the stop(). This is really convenient in "dev-local" mode.

Locator Lookup

Other UI automation frameworks spend a lot of time encouraging you to follow a so-called "Page Object Model" for your tests. The Karate project team is of the opinion that things can be made simpler.

One indicator of a good automation framework is how much work a developer needs to do in order to perform any automation action, such as clicking a button or retrieving the value of some HTML object / property. In Karate, these are typically one-liners. And especially when it comes to test-automation, we have found that attempts to apply patterns in the pursuit of code re-use more often than not result in hard-to-maintain code, and severely impact readability.

That said, there is some benefit to re-use of just locators and Karate's support for JSON and reading files turns out to be a great way to achieve DRY-ness in tests. Here is one suggested pattern you can adopt.

First, you can maintain a JSON "map" of your application locators. It can look something like this. Observe how you can mix different locator types, because they are all just string-values that behave differently depending on whether they start with "/" (XPath), "{" (wildcard, for example {span}Home), or neither (CSS). Also note that this is pure JSON, which means that you have excellent IDE support for syntax-coloring, formatting, indenting, and ensuring well-formed-ness. And you can have a "nested" hierarchy, which means you can neatly "name-space" your locator look-ups, as you will see below.

{
  "testAccounts": {
    "numTransactions": "input[name=numTransactions]",
    "submit": "#submitButton"
  },
  "leftNav": {
    "home": "{span}Home",
    "invoices": "{span}Invoices",
    "transactions": "{span}Transactions"
  },
  "transactions": {
    "addFirst": ".transactions .qwl-secondary-button",
    "descriptionInput": ".description-cell input",
    "description": ".description-cell .header5",
    "amount": ".amount-cell input"
  }
}

Karate has great options for re-usability, so once the above JSON is saved as locators.json, you can do this in a common.feature:

* call read 'locators.json'

This looks deceptively simple, but what happens is very interesting. It will inject all top-level "keys" of the JSON file into the Karate "context" as global variables. In normal programming languages, global variables are a bad thing, but for test-automation (when you know what you are doing) - this can be really convenient.

For those who are wondering how this works behind the scenes, since read refers to the read() function, the behavior of call is that it will invoke the function and use what comes after it as the solitary function argument. And this call is using shared scope.

So now you have testAccounts, leftNav and transactions as variables, and you have a nice "name-spacing" of locators to refer to - within your different feature files:

* input(testAccounts.numTransactions, '0')
* click(testAccounts.submit)
* click(leftNav.transactions)

* retry().click(transactions.addFirst)
* retry().input(transactions.descriptionInput, 'test')

And this is how you can have all your locators defined in one place and re-used across multiple tests. You can experiment for yourself (probably depending on the size of your test-automation team) to see whether this leads to any appreciable benefits, because the down-side is that you need to keep switching between 2 files when writing and maintaining tests.
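The rule used in the locator JSON above (a leading "/" means XPath, a leading "{" means a wildcard locator, anything else is CSS) can be sketched as follows. This is a simplified, hypothetical classifier for illustration; Karate's own parsing may handle more cases.

```java
// Simplified, hypothetical classifier for locator strings; not Karate's parser.
final class LocatorKind {

    enum Kind { XPATH, WILDCARD, CSS }

    private LocatorKind() {}

    static Kind of(String locator) {
        if (locator.startsWith("/")) {
            return Kind.XPATH;    // e.g. //button[@type='submit']
        }
        if (locator.startsWith("{")) {
            return Kind.WILDCARD; // e.g. {span}Home or {}Click Me
        }
        return Kind.CSS;          // e.g. #submitButton
    }
}
```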

Chrome Java API

Karate also has a Java API to automate the Chrome browser directly, designed for common needs such as converting HTML to PDF - or taking a screenshot of a page. Here is an example:

import com.intuit.karate.FileUtils;
import com.intuit.karate.driver.chrome.Chrome;
import java.io.File;
import java.util.Collections;

public class Test {

    public static void main(String[] args) {
        Chrome chrome = Chrome.startHeadless();
        chrome.setLocation("https://github.com/login");
        byte[] bytes = chrome.pdf(Collections.emptyMap());
        FileUtils.writeToFile(new File("target/github.pdf"), bytes);
        bytes = chrome.screenshot();
        // this will attempt to capture the whole page, not just the visible part
        // bytes = chrome.screenshotFull();
        FileUtils.writeToFile(new File("target/github.png"), bytes);
        chrome.quit();
    }
    
}

Note that in addition to driver.screenshot() there is a driver.screenshotFull() API that will attempt to capture the whole "scrollable" page area, not just the part currently visible in the viewport.

The parameters that you can optionally customize via the Map argument to the pdf() method are documented here: Page.printToPDF.

If Chrome is not installed in the default location, you can pass a String argument like this:

Chrome.startHeadless(executable)
// or
Chrome.start(executable)

For more control or custom options, the start() method takes a Map<String, Object> argument where the following keys (all optional) are supported:

  • executable - (String) path to the Chrome executable or batch file that starts Chrome
  • headless - (Boolean) if headless
  • maxPayloadSize - (Integer) defaults to 4194304 (bytes, around 4 MB), but you can override it if you deal with very large output / binary payloads
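As a sketch, the options Map for start() can be built with the three documented keys. The executable path used in the test is a placeholder assumption, and the Chrome.start(options) call itself is left out so the snippet compiles without karate-core on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Builds the Map<String, Object> described above using only the documented keys.
// With karate-core available, you would pass the result to Chrome.start(options).
final class ChromeOptionsSketch {

    private ChromeOptionsSketch() {}

    static Map<String, Object> options(String executable, boolean headless, int maxPayloadSize) {
        Map<String, Object> options = new HashMap<>();
        options.put("executable", executable);         // String: path to Chrome or a batch file
        options.put("headless", headless);             // Boolean (auto-boxed)
        options.put("maxPayloadSize", maxPayloadSize); // Integer: default is 4194304 (4 x 1024 x 1024)
        return options;
    }
}
```

Raising maxPayloadSize (for example to 8 * 1024 * 1024) is the knob to reach for when very large PDF or screenshot payloads are expected.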

screenshotFull()

Only supported for driver type chrome. See Chrome Java API. This will snapshot the entire page, not just what is visible in the viewport.

pdf()

Only supported for driver type chrome. See Chrome Java API.

Proxy

For driver type chrome, you can use the addOptions key to pass command-line options that Chrome supports:

* configure driver = { type: 'chrome', addOptions: [ '--proxy-server="https://somehost:5000"' ] }

For the WebDriver based driver types like chromedriver, geckodriver etc, you can use the webDriverCapabilities driver configuration as per the W3C WebDriver spec:

* configure driver = { type: 'chromedriver', webDriverCapabilities: { proxy: { proxyType: 'manual', httpProxy: 'somehost:5000' } } }

Appium

Screen Recording

Only supported for driver type android | ios.

* driver.startRecordingScreen()
# test
* driver.saveRecordingScreen("invoice.mp4",true)

The above example would save the file and perform "auto-embedding" into the HTML report.

You can also use startRecordingScreen() and stopRecordingScreen(), and both methods take recording options as JSON input.

hideKeyboard()

Only supported for driver type android | ios, for hiding the "soft keyboard".