Migrating from Cypress to Playwright means replacing a test automation framework in a structured way while day-to-day operations continue. Typical triggers are technical limitations such as weak iFrame support, scrolling problems and dependence on third-party parallelization services. The changeover works best with a hackathon to evaluate candidate frameworks, a clear prioritization of critical test paths and a gradual migration in parallel with day-to-day business.
Key Takeaways
- A hackathon in which seven tools are tested against real test scenarios on your own website gives you more confidence in the decision than any purely static market research or vendor demo.
- From version 13 onward, Cypress blocks the third-party parallelization service “Currents”, which effectively forces teams that do not want to pay for the expensive Cypress Cloud to switch frameworks.
- At Displate, the transition from Cypress to Playwright is taking more than a year despite clear prioritization, because the QA team has to keep its regular testing operations running in parallel.
- Anyone who translates Cypress code one-to-one into Playwright is fighting against the framework instead of with it, because the two tools take fundamentally different approaches to session and data management.
- Tests that run continuously against the production environment skew analytics data unless cookies and tracking parameters are explicitly disabled.
Why Cypress is reaching its limits after years of use
Cypress is quick to learn, but forces teams into a very particular way of writing tests. The framework comes with a domain-specific language, its own logic for assertions and fixed ideas of what a test should check. Anyone who deviates from this works against the tool instead of with it.
This is exactly what became a problem for an e-commerce provider with its own factory. Behind the visible store lie many in-house admin functions: management of artworks, copyrights and coordination with artists. This mix of standard store and custom code did not fit well with the assumptions Cypress makes about a typical test case.
A recurring point of friction was iFrames. Third-party providers such as payment service providers are often integrated as iFrames, and Cypress can only deal with this via additional plugins. Such plugins solve the problem in the short term, but have to be checked for compatibility every time a new version is released.
Failure screenshots also caused extra work. On a very long page, the full-page screenshot did not reliably capture the point where the test failed; instead, it showed the header several times. Workaround followed workaround without a clean solution emerging.
Cypress parallelization is a business model, not a feature
Cypress charges money for convenient parallelization in the cloud, and this was the trigger for the change. The framework itself is free, and local parallelization can be built using custom plugins. It only becomes convenient via the paid Cypress Cloud or via third-party providers.
The team used such a third-party provider because it needed more user seats, not more test runs. The matching Cypress plan would have cost far more and would have bundled many runs the team did not need, with no flexibility to adapt it to the situation. These costs were higher than the entire QA budget for the year.
In addition, a conflict between Cypress and the third-party provider escalated to the point where newer Cypress versions could no longer be combined with the tool. As a result, the repository is still stuck on the last supported Cypress version. For the team, this was a signal to leave the Cypress ecosystem altogether. The decision was made in September 2023.
How to really choose a test automation tool
Reliable selection does not start with the market, but with your own use. Maciej Wyrodek and his team first analyzed what they were using Cypress for, what was working well and what was not. This led to a more general question: how do we want to test, and what characteristics does a framework need to have for this?
This clarification resulted in a list of criteria. The tools were given points in a large evaluation table, with some criteria weighing more heavily than others. This left around seven tools that went through to the next round.
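A weighted evaluation table of this kind can be sketched in a few lines. The criteria names, weights and scores below are purely hypothetical and are not the team's actual list; the sketch only shows the mechanics of letting some criteria count more than others.

```typescript
// Weighted scoring sketch. All criteria, weights and scores are hypothetical.
type Scores = Record<string, number>;

interface Criterion {
  name: string;
  weight: number; // a higher weight means a more important criterion
}

// Weighted sum over all criteria; a missing score counts as 0.
function weightedScore(criteria: Criterion[], scores: Scores): number {
  return criteria.reduce(
    (sum, c) => sum + c.weight * (scores[c.name] ?? 0),
    0,
  );
}

// Rank tools by their weighted score, highest first.
function rankTools(
  criteria: Criterion[],
  tools: Record<string, Scores>,
): Array<{ tool: string; score: number }> {
  return Object.entries(tools)
    .map(([tool, scores]) => ({ tool, score: weightedScore(criteria, scores) }))
    .sort((a, b) => b.score - a.score);
}

const exampleCriteria: Criterion[] = [
  { name: "iframeSupport", weight: 3 },
  { name: "builtInParallelization", weight: 3 },
  { name: "learningCurve", weight: 1 },
];

const exampleRanking = rankTools(exampleCriteria, {
  toolA: { iframeSupport: 5, builtInParallelization: 4, learningCurve: 3 },
  toolB: { iframeSupport: 2, builtInParallelization: 1, learningCurve: 5 },
});
```

A cut-off score or a fixed number of top-ranked tools then decides which candidates go through to the next round.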
The market research itself was sobering. Between 2019 and 2024, little had changed in terms of the tools available. The last major tool to emerge was Playwright, which was still young at the time. An earlier expectation that record-and-play tools would dominate the market had not materialized. Such tools exist, but are severely limited and only fit into the workflow for very specific requirements.
The hackathon: proof of concept instead of a feature list
A static evaluation is not enough; a tool has to prove that it works on your own system. That’s why the table was followed by a hands-on day. QA engineers and some developers built real proofs of concept with the candidates on their own website.
To ensure that the results remained comparable, there was a fixed list of ten test scenarios. Everyone tried to implement the same cases in the respective tool. This showed how many tests were created per tool and how time-consuming it was to get there.
The weaknesses became apparent immediately in the practical test. With one code-based tool, two experienced people were unable to create even one executable test in an entire day. The documentation was outdated after a major update, and answers could only be found in the issues of the GitHub repository. A tool that can only be used with the help of support tickets was therefore out of the running.
“Then explain to your boss why, after all this research, you’re still opting for the tool that was the obvious choice from the outset. But then you have the proof.” (Maciej Wyrodek)
When AI test generation takes six minutes to reach the shopping cart
An AI-supported tool generated tests from natural language, but delivered unusable paths. The task was as follows: go to the homepage, select the first visible product, place it in the shopping cart, buy it. With thousands of products on the homepage, the automation got lost.
Instead of navigating directly to the product, the tool clicked through menus, from one collection to the next, then to a brand page. A newsletter window popped up and closed by itself. Only after six minutes of clicking did the test land on a product page and add something to the shopping cart. Such a scenario is not maintainable.
The reason lies in the mechanics: a language model regenerates the path every time the test is triggered. Something stable only emerges once a generated path is confirmed and frozen as a script. Without this step, the same tool produces a different path on each run.
Why Playwright clearly won the comparison
Playwright delivered the best results in the practical test and clearly outperformed the alternatives. The only two participants who completed all ten scenarios both worked with Playwright. One used its recording function, another combined Playwright with an AI-supported IDE and got all ten cases to run stably during the hackathon.
There is an honest caveat to this classification. The tests generated in this way were not yet ready for production. Without clean code, the entire test would have had to be re-recorded every time a change was made, similar to pure record-and-play. The hackathon showed the potential, not the finished result.
Despite the predictable choice, the effort was worthwhile. Only the practical test turns an assumption into a proven decision that can also be defended to management.
Migration in layers: from the critical path to the outside
A sensible migration sequence is based on the business value of the tests, not their order in the repository. The team structured its end-to-end testing in multiple layers:
| Test suite | Purpose | Run interval |
|---|---|---|
| Top 10 | The ten most business-critical paths (“money-makers”) in production | Every 10 minutes during peak season, otherwise every 30 minutes |
| Smoke suite | Runs after every deployment, errors mean fix or rollback | Per deployment |
| Full regression | Broad coverage, errors are not critical but can be fixed quickly | Several times a day |
The team started with the Top 10: these tests need to run the fastest, are rarely the simplest and cut vertically through the entire system. This is precisely why they are a suitable first touchstone for the new framework.
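The run intervals from the table can be captured as a small lookup, for example to drive a scheduler or a CI trigger. The type and function names are assumptions made for illustration; only the intervals themselves come from the table above.

```typescript
// Suite triggers as described in the table; all names are hypothetical.
type Suite = "top10" | "smoke" | "regression";

type Trigger =
  | { kind: "interval"; minutes: number }
  | { kind: "perDeployment" }
  | { kind: "severalTimesADay" };

function triggerFor(suite: Suite, isPeakSeason: boolean): Trigger {
  switch (suite) {
    case "top10":
      // Every 10 minutes during peak season, otherwise every 30 minutes.
      return { kind: "interval", minutes: isPeakSeason ? 10 : 30 };
    case "smoke":
      // Runs after every deployment.
      return { kind: "perDeployment" };
    case "regression":
      // Broad coverage, run several times a day.
      return { kind: "severalTimesADay" };
  }
}
```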
A pragmatic rule applies to the existing Cypress tests. Small defects such as a changed locator are repaired in Cypress. If the logic of a test changes, it is moved to Playwright. New features are tested exclusively in Playwright.
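Written down as code, the rule is a simple three-way decision. The names below are hypothetical labels for the three cases described above, not identifiers from the team's repository.

```typescript
// The three kinds of change the rule distinguishes (labels are hypothetical).
type ChangeKind = "smallDefect" | "logicChange" | "newFeature";
type Action = "repairInCypress" | "moveToPlaywright" | "writeInPlaywright";

function actionFor(change: ChangeKind): Action {
  switch (change) {
    case "smallDefect":
      // For example a changed locator: fix it where the test lives today.
      return "repairInCypress";
    case "logicChange":
      // The test needs real rework anyway, so it moves to the new framework.
      return "moveToPlaywright";
    case "newFeature":
      // New coverage is never added to the old framework.
      return "writeInPlaywright";
  }
}
```

The benefit of such an explicit rule is that nobody has to decide case by case whether a broken Cypress test is worth migrating right now.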
Session handling and request interception differ more than expected
Frameworks do not handle sessions, cookies and network requests in the same way, and this becomes expensive during migration. The difference in session management was the biggest surprise of the changeover. The website relies heavily on A/B testing and keeps data in local storage, session storage, cache and cookies. For tests to run reliably, this state must be set up cleanly.
One specific reason is that many tests run directly in production. Analytics cookies must therefore be disabled, otherwise test runs show up in the analytics reports. In the past, the team had already noticed internally that a product page frequently used in testing showed high visitor numbers without any purchases, simply because the automation was accessing it hundreds of thousands of times.
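One way to set such state up front in Playwright is a storage state object, which can be passed as the `storageState` option when a browser context is created. The following is a minimal sketch under stated assumptions: the cookie names `analytics_opt_out` and `ab_variant` and the local storage key are invented for illustration, and how a real site switches off its tracking depends on its consent and analytics setup. The interfaces are declared locally so the sketch runs without Playwright installed; they mirror the cookie and origin shape of Playwright's storage state format.

```typescript
// Local mirror of the storage state shape (cookies plus per-origin localStorage).
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // -1 marks a session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface TestStorageState {
  cookies: StoredCookie[];
  origins: Array<{
    origin: string;
    localStorage: Array<{ name: string; value: string }>;
  }>;
}

function sessionCookie(domain: string, name: string, value: string): StoredCookie {
  return { name, value, domain, path: "/", expires: -1, httpOnly: false, secure: true, sameSite: "Lax" };
}

// Builds a deterministic starting state: tracking off, A/B variant pinned.
// All cookie and key names here are hypothetical.
function buildTestStorageState(baseUrl: string, abVariant: string): TestStorageState {
  const url = new URL(baseUrl);
  return {
    cookies: [
      sessionCookie(url.hostname, "analytics_opt_out", "1"),
      sessionCookie(url.hostname, "ab_variant", abVariant),
    ],
    origins: [
      {
        origin: url.origin,
        localStorage: [{ name: "isAutomatedTest", value: "true" }],
      },
    ],
  };
}
```

Pinning the A/B variant in the same place keeps a test from landing in a different experiment on every run.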
Intercepting API requests was more compact in Cypress; in Playwright, the same thing required more research. What was a two-liner in Cypress, such as reading an authorization request, took several statements in Playwright.
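A rough sketch of what reading an authorization header from an outgoing request can look like is shown below. It is modeled on Playwright's `page.on('request', ...)` event and the `url()` and `headers()` methods of its request object, but the interfaces are declared locally so the example runs on its own. The helper name and the URL fragment are assumptions, and this is not the team's actual code.

```typescript
// Minimal local stand-ins for the parts of the page and request API used here.
interface RequestLike {
  url(): string;
  headers(): Record<string, string>; // header names are expected in lower case
}

interface RequestSource {
  on(event: "request", listener: (request: RequestLike) => void): unknown;
}

// Registers a listener and returns a getter for the most recently seen
// authorization header on requests whose URL contains urlPart.
function captureAuthorization(
  source: RequestSource,
  urlPart: string,
): () => string | undefined {
  let captured: string | undefined;
  source.on("request", (request) => {
    if (request.url().includes(urlPart)) {
      captured = request.headers()["authorization"] ?? captured;
    }
  });
  return () => captured;
}
```

In a test, the getter would be read after the action that triggers the request. Playwright's documentation notes that `headers()` omits some security-related headers and points to `allHeaders()` for the complete set, so a real implementation may need that call instead.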
The blocking of hosts was particularly underestimated. In Cypress, one option in the configuration block was sufficient. Playwright has no direct equivalent, so the team had to build the request interception themselves. This requirement was not even on the criteria list because it was tacitly assumed that a tool with request handling would cover it.
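A self-built host block can be sketched as a route handler that aborts every request to a listed host. The shape follows Playwright's `route(url, handler)` method, which accepts a predicate on the URL, and `route.abort()`; the interfaces are declared locally so the sketch runs without Playwright. The function names and the host list are hypothetical, and the matching rule (exact host or any subdomain) is an assumption rather than a copy of the Cypress behavior.

```typescript
// True if hostname equals a blocked host or is a subdomain of one.
function isBlockedHost(hostname: string, blockedHosts: string[]): boolean {
  const host = hostname.toLowerCase();
  return blockedHosts.some((blocked) => {
    const b = blocked.toLowerCase();
    return host === b || host.endsWith("." + b);
  });
}

// Minimal local stand-ins for the routing API used below.
interface RouteLike {
  abort(): Promise<void>;
}

interface Routable {
  route(
    matcher: (url: URL) => boolean,
    handler: (route: RouteLike) => Promise<void>,
  ): Promise<void>;
}

// Aborts every request whose host is on the block list.
async function installHostBlocking(
  target: Routable,
  blockedHosts: string[],
): Promise<void> {
  await target.route(
    (url) => isBlockedHost(url.hostname, blockedHosts),
    (route) => route.abort(),
  );
}
```

Installing the handler on the browser context rather than on a single page would cover every page opened during the test, which is closer to a global configuration option.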
Translating tests is not the same as rethinking tests
Porting Cypress code line by line to Playwright creates bad tests. The original agreement was to adopt the logic, but not the code one-to-one. Under deadline pressure, this rule was lost and Cypress code was rewritten directly into Playwright. Maciej describes this as trying to turn a square back into a circle.
This is not an individual failure, but a typical effect of pressure. The deadline is approaching, the new tool has not yet become second nature, and the familiar old code becomes a crutch. Those in the middle of familiarization fall back on familiar patterns, even if they don’t fit the new framework.
Some of these tests are old; one predates the current QA lead’s time at the company. Tests that have grown over the years need a deliberate redesign rather than a translation when they are moved.
AI as a migration helper: potential with an open maintainability risk
AI can speed up the translation from Cypress to Playwright, but the maintainability of the results is not yet assured. A developer experimented with an AI-supported IDE and fed it with the Cypress repository, the Playwright documentation and the task of recreating a specific test in Playwright.
After a few attempts, working tests were created. In a live demo, however, the same procedure failed, showing the typical gap between individual success and reproducibility. The process is promising, but not yet reliable.
The real leverage lies elsewhere. If the entire team learns to use the AI IDE better and write more precise prompts, the speed of creating tests will increase. The central question remains whether the generated tests are maintainable in the long term.
Get front-end developers involved in test automation
The team is using the switch to Playwright to anchor testing more firmly in the development team. Frontend developers are already taking part in pull requests during the migration. As soon as the top 10 suite is stabilized, they will write and maintain their own tests.
This fits in with the QA team’s working model. It does not work with one tester per team, but like a platform team that supports others, comparable to the structure of a DevOps team. The developers mainly test their features themselves, while the QA team provides methodology and solves difficult cases, such as how a new payment method can be tested in a specific market.
This results in a realistic view of the pace. There is not enough capacity for migration throughout the year, and in some months there is hardly any. Rotating responsibility distributes the automation work, and the order is based on the backlog, for which some capacity is deliberately reserved.


