Test automation with Selenium

Selenium is an open source framework for browser automation that remotely controls all common browsers via standardized WebDriver interfaces. Its main use case is the automated testing of web applications: page elements can be read, clicked and filled in. Selenium supports many programming languages and, since version 4, is considered much more stable because browser vendors supply the WebDriver themselves.

Key Takeaways

Selenium is not a testing tool, but a browser automation tool: it remotely controls any browser via WebDriver, regardless of vendor and operating system.
Those who master CSS and XPath selectors and consistently use the page object pattern keep maintenance costs to a minimum, even with frequent UI changes.
Choosing the same programming language for test and product code makes developers more willing to fix failing tests themselves and to write their own test cases.
Cucumber in combination with Selenium shows the product owner which business functionality a test covers, without anyone having to read the code.

What Selenium really is: browser automation, not a testing tool

Selenium is an interface for browser automation and not a testing tool in the narrower sense. This distinction is often blurred. In tenders and job advertisements, Selenium appears as “the test automation tool”, but the Selenium project itself describes it as a means of remotely controlling browsers.

Nevertheless, the main practical use case is test automation. Because so much software is built as web applications, it makes sense to control a browser precisely and use it to execute test cases.

Technically, the WebDriver sits between Selenium and the browser. Each browser has its own WebDriver, which handles the remote control. Selenium talks to this WebDriver and hides the differences between Chrome, Firefox, Safari or Opera. You write your test once without worrying about browser-specific details.
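A minimal sketch of this idea in Python, assuming the selenium package is installed and the chosen browser is available locally. The helper names create_driver and page_title and the SUPPORTED_BROWSERS mapping are illustrative and not part of Selenium; the point is only that the test logic does not change with the browser.

```python
# Maps a user-facing browser name to the driver class name in selenium.webdriver.
# The mapping and both helpers are illustrative, not part of Selenium.
SUPPORTED_BROWSERS = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "safari": "Safari",
    "edge": "Edge",
}


def create_driver(browser_name):
    """Return a WebDriver instance for the given browser name."""
    key = browser_name.strip().lower()
    if key not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser: {browser_name!r}")
    # Imported lazily so the name check works even without Selenium installed.
    from selenium import webdriver
    return getattr(webdriver, SUPPORTED_BROWSERS[key])()


def page_title(driver, url):
    """The same test logic runs unchanged against any browser."""
    driver.get(url)
    return driver.title
```

Calling page_title(create_driver("firefox"), url) and then the same with "chrome" would run identical test code against two browsers. Remember to call driver.quit() when the run is finished.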

Since WebDriver became a W3C standard, the browser vendors themselves have been responsible for making this connection work. Any vendor that ships a browser provides the matching WebDriver support. The result is more stability, more efficiency and better performance than in the past.

In which programming languages Selenium is used

Selenium offers bindings for many programming languages, and this is precisely what makes it flexible. Tests can be written in scripting languages such as Ruby or Python, as well as in Java, Kotlin or C#. There are more language bindings than a single developer is ever likely to master.

Integration works through the library for the respective language. In Java or Kotlin, you add the dependencies to your project and then use the Selenium library to control the browser.

This freedom has a concrete consequence for teams: you can choose the language that is already used in the project. Test automation then stays within the technology stack of the product.

How Selenium controls the browser

Selenium works on two levels, similar to how a human operates a browser. At the browser level, you open tabs, navigate, reload pages and change settings. At the page level, you read information and interact with the web elements.

At element level, you query states, click, enter text, select values or perform drag-and-drop. This essentially covers what a user does in the browser.
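The two levels can be sketched in a few lines of Python. This is an illustration under assumptions: the function name, the field name "search" and the selectors belong to a hypothetical page. The locator strategies are passed as plain strings, which equal the values of Selenium's By.NAME, By.CSS_SELECTOR and By.TAG_NAME constants.

```python
def search_and_read_heading(driver, url, term):
    """Browser-level navigation followed by element-level interaction."""
    # Browser level: navigate and reload.
    driver.get(url)
    driver.refresh()

    # Element level: query state, enter text, click.
    field = driver.find_element("name", "search")  # "name" equals By.NAME
    if field.is_enabled():
        field.clear()
        field.send_keys(term)
    driver.find_element("css selector", "button[type='submit']").click()

    # Page level: read information from the resulting page.
    return driver.find_element("tag name", "h1").text
```

The function accepts any WebDriver instance, so it works the same way with Chrome, Firefox or another supported browser.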

Browser settings can be controlled in the same way. A typical example: a web application offers a PDF for download, which the browser opens in its internal viewer by default. A browser setting can force the file to be downloaded instead. Such configurations are among the adjustments that make a test run stable.
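For Chrome, the PDF case could look roughly like the sketch below. It assumes the selenium package and Chrome; the two function names are illustrative. The preference keys are Chrome-specific, and other browsers need their own settings.

```python
def pdf_download_prefs(download_dir):
    """Chrome preferences that save PDFs to disk instead of opening the viewer."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }


def chrome_options_for_downloads(download_dir):
    """Build ChromeOptions carrying the preferences above."""
    # Imported lazily so the preference dictionary can be inspected without Selenium.
    from selenium.webdriver import ChromeOptions

    options = ChromeOptions()
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return options
```

The resulting options object would be passed when starting the browser, for example webdriver.Chrome(options=chrome_options_for_downloads("/tmp/downloads")), where the directory is only a placeholder.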

Good selectors determine the maintenance effort

The choice of selectors determines how maintenance-intensive test automation is. For simple cases, an ID or tag name is sufficient. As soon as things get more complex, you need solid knowledge of CSS selectors and, above all, XPath selectors. If you combine them well, you will keep the maintenance effort to a minimum.

A common mistake is the absolute XPath from the browser context menu. Right-clicking and choosing “Copy XPath” puts a path into the test code that runs from the root element all the way to the target element. If one small thing changes on the page, the test breaks and nobody can tell which element was meant. Selectors should be readable.
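The effect can be reproduced without a browser. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath, on two versions of a hypothetical page. In Selenium, the corresponding expressions would be /html/body/div/form/button and //button[@id='login'].

```python
import xml.etree.ElementTree as ET

# Two versions of a hypothetical page; version 2 only adds a wrapping <section>.
PAGE_V1 = "<html><body><div><form><button id='login'>Log in</button></form></div></body></html>"
PAGE_V2 = ("<html><body><div><section><form><button id='login'>Log in</button>"
           "</form></section></div></body></html>")

ABSOLUTE_PATH = "./body/div/form/button"   # mirrors a copied absolute XPath
READABLE_PATH = ".//button[@id='login']"   # states which element is meant


def is_found(page_source, path):
    """Return True if the path matches an element in the page."""
    root = ET.fromstring(page_source)
    return root.find(path) is not None


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, is_found(page, ABSOLUTE_PATH), is_found(page, READABLE_PATH))
```

The loop prints "v1 True True" and then "v2 False True": the absolute path stops matching after a purely structural change, while the attribute-based path still finds the button and also tells the reader what it is looking for.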

There are libraries that automatically replace a failed locator with a new one. However, a test that fails after a UI change is not a defect, but a signal. The test has detected that something has changed, and that is its purpose.

If you follow the page object pattern, you change a locator in exactly one place, even if a hundred test scripts address it. Clean structure reduces the maintenance effort to a minimum.
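A minimal page object might look like the following sketch. The LoginPage class, its selectors and the log_in method describe a hypothetical login page; the locator tuples use the plain string "css selector", which equals Selenium's By.CSS_SELECTOR.

```python
class LoginPage:
    """Page object for a hypothetical login page."""

    # Each locator is defined exactly once. The first item equals By.CSS_SELECTOR.
    USERNAME = ("css selector", "#username")
    PASSWORD = ("css selector", "#password")
    SUBMIT = ("css selector", "button[type='submit']")

    def __init__(self, driver):
        self.driver = driver

    def log_in(self, username, password):
        """Business-level action that test scripts call instead of raw locators."""
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()
```

Test scripts only call LoginPage(driver).log_in(...). If the ID of the username field changes, the USERNAME constant is the single line that needs updating.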

Selenium is development work

Many problems that appear to be caused by Selenium are actually caused by the programming language used. Test automation with Selenium is development work, not a click script.

Many of the problems you think you have with Selenium are usually not caused by Selenium itself, but by the programming language you are using it with. (Boris Wrubel)

This leads to a clear recommendation for the choice of language: use the language in which the product is built. This gets the developers on board. If an error occurs during a test run, they will find it more quickly and can act themselves, both when fixing it and when writing new tests.

This approach fits the Scrum idea that there is no separate tester role. Quality becomes the task of the whole team, not something that is outsourced to an individual.

How to integrate the business side

The business side comes into play via Cucumber. Selenium in conjunction with Cucumber shows product owners and business stakeholders what a test is checking, without them having to read the code. Well-written scenarios show the tested business functionality directly.

Who writes the scenarios is a separate question. Having the business department write the scenarios alone is not ideal. It makes more sense to formulate them together so that the business view and technical feasibility come together.

Cucumber steps need to be cut sensibly. Endlessly long scenarios in which every single UI step is described in highly parameterized form become confusing. In such cases, it is fair to ask what Cucumber is for at all. If you map every click to a step, you can just as well write the process down cleanly in the test class. Cucumber is only worthwhile if the scenarios express business functionality, not mouse movements.

How to cut UI tests correctly

UI tests belong at the top of the test pyramid, so they should be deliberately few. You pick out a business case and play it through at business case level. Variants that do not change anything significant in the UI belong in integration tests and unit tests.

This may look different at the start of a project. As long as the software is small, it is tempting to test several variants via the UI and check whether things are displayed correctly, simply to have something in hand. You can delete or archive such tests later, once the lower levels take over the coverage.

This discipline prevents the slow, expensive UI layer from being overloaded with a hundred scenarios that are better off elsewhere.

Why good reporting determines acceptance in the team

Reporting is the lever that motivates developers to participate. If a Selenium test fails, the report must quickly make the cause visible. Otherwise, test automation will remain an island.

A useful report shows the step at which the test failed and the reason, for example an element that could not be located or an unavailable backend service. It includes a screenshot, the HTML code of the page and one or two additional pieces of information as required. Videos of the test runs are possible, but are usually only worthwhile in exceptional cases where normal reporting is not sufficient.
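Collecting the screenshot and the HTML takes only a few lines. The sketch below assumes a Selenium WebDriver instance and uses its save_screenshot method and page_source property; the function name capture_failure and the file naming scheme are illustrative.

```python
from pathlib import Path


def capture_failure(driver, report_dir, test_name):
    """Save a screenshot and the page HTML for a failed test; return both paths."""
    target = Path(report_dir)
    target.mkdir(parents=True, exist_ok=True)

    screenshot = target / f"{test_name}.png"
    html = target / f"{test_name}.html"

    driver.save_screenshot(str(screenshot))  # WebDriver writes the PNG file
    html.write_text(driver.page_source, encoding="utf-8")
    return screenshot, html
```

A helper like this would typically be called from the failure hook of the test framework, so that every failed step automatically leaves its evidence in the report directory.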

The central question that a report must answer is simple: does the application have an error or does the test have an error? Recognizing this quickly keeps the hurdle for the team low.

Jenkins offers one of the best Cucumber report engines for integration into the pipeline. Cucumber delivers its result as a JSON file, and there are open source tools on GitHub that convert this file into a readable format and in some cases generate a PDF from it, including screenshots and other attachments.
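To give an idea of what such tools work with, the following sketch extracts the failed steps from a Cucumber JSON result using only the Python standard library. It assumes the usual layout of Cucumber's JSON output (features containing "elements", which contain "steps" with a "result"); the function name and the sample report are made up for illustration.

```python
import json


def failed_steps(report_json):
    """Return (feature, scenario, step, error) tuples for every failed step."""
    failures = []
    for feature in json.loads(report_json):
        for scenario in feature.get("elements", []):
            for step in scenario.get("steps", []):
                result = step.get("result", {})
                if result.get("status") == "failed":
                    failures.append((
                        feature.get("name", ""),
                        scenario.get("name", ""),
                        step.get("keyword", "").strip() + " " + step.get("name", ""),
                        result.get("error_message", ""),
                    ))
    return failures


# Invented sample data in the assumed layout.
SAMPLE_REPORT = json.dumps([{
    "name": "Login",
    "elements": [{
        "name": "Valid credentials",
        "steps": [
            {"keyword": "Given ", "name": "the login page is open",
             "result": {"status": "passed"}},
            {"keyword": "When ", "name": "the user logs in",
             "result": {"status": "failed", "error_message": "NoSuchElementException"}},
        ],
    }],
}])
```

For the sample, failed_steps(SAMPLE_REPORT) returns one entry naming the feature "Login", the scenario "Valid credentials", the step "When the user logs in" and the error message.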

SeleniumCucumberGrow: ready to go in under an hour

A common counter-argument is that setting up a project from scratch is too expensive, even though Selenium itself costs nothing. This is precisely the problem addressed by the open source project SeleniumCucumberGrow, whose name stands for Selenium, Cucumber and Grow.

The tool is a project generator. You enter the project name and package name, and it generates an executable Selenium project with an example that searches for “software testing” on a Wikipedia page. This eliminates the repeated renaming of packages and classes that made copying earlier projects so tedious.

The project’s claim is in the title of its conference presentations: Get started with less than one hour. Within an hour, the first test case is ready for use against your own website. The project is kept up to date with the latest Selenium versions and dependencies and uses Cucumber’s built-in tools for reporting, supplemented by the information needed for quick troubleshooting. An extension for accessibility testing is in progress; it highlights the element that failed an accessibility check.

What you need to get off to a clean start

The biggest pitfall is the lack of a concept for test automation. A working example alone is not enough; you need an idea of how to approach automation before you swap the Wikipedia demo for your own application.

Two things will get you started:

Follow the page object pattern: it bundles locators in one place and keeps tests maintainable.
Master CSS and XPath locators: without this knowledge, you’ll end up with brittle absolute paths that break at the slightest change.

Modern IDEs such as Eclipse or IntelliJ offer good plugins that speed up test development. In combination with the page object pattern, this means that the first test case against your own website really is ready in a short time.

Current developments: from Selenium 3 to 4

Selenium 4 brought many changes after a long lead-up. Version 3 was current for a long time and the jump to 4 took a while, but version 4 is now well established.

Relative locators are new. You search for elements that are to the left of, to the right of, above or below another element. This simplifies locating elements in certain layouts.
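In the Python bindings, a relative locator can be sketched as follows. The example assumes Selenium 4 and a hypothetical form in which an input field is rendered below an element with a known ID; the function name find_field_below is illustrative.

```python
def find_field_below(driver, anchor_id):
    """Find the first <input> rendered below the element with the given id."""
    # Imported lazily; requires Selenium 4 or later.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.relative_locator import locate_with

    locator = locate_with(By.TAG_NAME, "input").below({By.ID: anchor_id})
    return driver.find_element(locator)
```

Besides below, the relative locator also offers above, to_left_of, to_right_of and near. Because the result depends on the rendered layout, such locators are best reserved for cases where no stable ID or attribute is available.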

The biggest simplification concerns WebDriver handling. Previously, for each Chrome version you had to obtain the matching ChromeDriver for your operating system, place the binary in the correct directory and set the path. Boni Garcia’s WebDriver Manager, originally a separate GitHub project, has been integrated into the project as Selenium Manager. Since versions 4.6/4.8, driver handling works out of the box, and you no longer have to worry about downloading the correct driver.
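The difference shows up in how a browser session is started. The sketch below assumes the selenium package and a local Chrome installation; the function name start_chrome and its driver_path parameter are illustrative.

```python
def start_chrome(driver_path=None):
    """Start Chrome, with or without an explicitly provided driver binary."""
    # Imported lazily; requires the selenium package.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path:
        # Older approach: point Selenium at a manually downloaded ChromeDriver.
        return webdriver.Chrome(service=Service(executable_path=driver_path))
    # Current approach: Selenium Manager resolves a matching driver on its own.
    return webdriver.Chrome()
```

With a recent Selenium version, calling start_chrome() without arguments should be enough in most setups; the explicit path remains useful when the driver must come from a fixed location, for example on a machine without internet access.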

Where test automation with AI is heading

Artificial intelligence will change test automation, most likely at first in a supporting role. Early approaches hand parts of testing over to AI, especially in the visual area.

In the mobile sector, there are libraries that receive an instruction such as “press the login button” and find out for themselves which element is the login button, whether using machine learning or other methods. This type of assisted automation is foreseeable.

The vision of a tester bot that you simply hand your website to, and that tests independently and returns a report, is further away. It immediately raises the open question of trust: how sure can you be that the bot is testing the right thing? Or, more to the point: who is testing the test automation?