Your team is probably feeling this already. A product manager writes a test in a ticket, a QA engineer translates it into Cypress or Playwright, the UI shifts a week later, and the test fails for reasons that have nothing to do with the feature.
That’s the gap plain-English tests can close, especially when an AI agent is the executor rather than a human reader. Writing test cases in plain English isn’t just about making them easier to read. It’s about making intent explicit enough that a machine can act on it reliably in a real browser, without forcing your team to maintain a pile of brittle selectors and helper code.
The catch is that most advice on test writing was built for manual QA or BDD ceremonies. It doesn’t help much when you need a test case that can run inside CI, produce a useful failure report, and survive normal product change. For AI execution, wording matters more than people expect. Structure matters even more.
The Anatomy of an Effective Plain-English Test
A good plain-English test is compact, specific, and boring in the right way. If a step needs interpretation, you’ve already introduced failure risk.
For AI execution, I use four core parts:
- Test ID
- Preconditions
- Numbered steps
- Expected result
That sounds basic, but skipping any one of them creates noise. AI agents do better when the test reads like an operating instruction, not a note from a sprint planning session.

Start with an identifier and purpose
The Test ID isn’t admin overhead. It gives your team a stable reference in pull requests, bug reports, and pipeline output.
A title like “Login works” is too loose. A title like “AUTH-LOGIN-001: User signs in with valid credentials” tells everyone what the test is about and what category it belongs to.
The purpose should be one sentence. Keep it narrow.
- Good: Verify that a registered user can sign in from the login page and reach the dashboard.
- Weak: Validate authentication, navigation, user state, and account access across the app.
One test should prove one thing clearly.
Preconditions are where flakiness starts or ends
Most unreliable tests don’t fail in the click step. They fail because the starting state was never defined.
Write preconditions that answer these questions:
- User state: Is the user logged out, logged in, suspended, or newly created?
- Data state: Does the account already exist?
- Environment state: Which page should be open at the start?
- Feature flags: Are any experiments or toggles relevant?
Example:
- Precondition 1: A user account exists with an active email and valid password.
- Precondition 2: The user is logged out.
- Precondition 3: The login page is accessible.
If your team needs a reusable starting format, a practical test case template for browser-based workflows helps keep cases consistent.
Practical rule: If a precondition only exists in someone’s head, it doesn’t exist.
Steps should be commands, not commentary
Write each step as one action. Don’t combine input, navigation, and assertion in a single sentence unless the sequence is inseparable.
A usable login test looks like this:
- Test ID: AUTH-LOGIN-001
- Title: User signs in with valid credentials
- Preconditions: Active user account exists. User is logged out. Login page is open.
Steps
- Enter the registered email address into the Email field.
- Enter the valid password into the Password field.
- Click the Sign in button.
Expected result
- The user is redirected to the dashboard.
- The dashboard header shows the signed-in user’s name.
- No login error message is visible.
Notice what’s missing. No implementation detail. No CSS selector language. No “wait for API response”. No assumption that the agent knows which dashboard widget matters.
Keep expected results singular in intent
You can list multiple checks, but they should support one outcome. The expected result isn’t a second script. It defines what success means.
That’s the main difference between a readable test and an executable one. A readable test describes what should happen. An executable one leaves little room for interpretation about what counts as proof.
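The four-part anatomy above can be sketched as a small data model. This is a minimal illustration, not a real tool’s schema: the `TestCase` type, its field names, and `missing_parts` are all assumed for the example.

```python
from dataclasses import dataclass, field

# Minimal sketch of the four-part structure described above.
# The TestCase type and its field names are illustrative, not a real tool's API.
@dataclass
class TestCase:
    test_id: str                  # e.g. "AUTH-LOGIN-001"
    title: str                    # one-line purpose
    preconditions: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return which core parts are absent, since skipping any one creates noise."""
        parts = {
            "test_id": self.test_id,
            "title": self.title,
            "preconditions": self.preconditions,
            "steps": self.steps,
            "expected": self.expected,
        }
        return [name for name, value in parts.items() if not value]

login = TestCase(
    test_id="AUTH-LOGIN-001",
    title="User signs in with valid credentials",
    preconditions=["Active user account exists.", "User is logged out.", "Login page is open."],
    steps=[
        "Enter the registered email address into the Email field.",
        "Enter the valid password into the Password field.",
        "Click the Sign in button.",
    ],
    expected=[
        "The user is redirected to the dashboard.",
        "The dashboard header shows the signed-in user's name.",
        "No login error message is visible.",
    ],
)
print(login.missing_parts())  # → [] (all core parts are present)
```

The point of the sketch is the check at the end: a case with an empty steps or expected list should fail review before it ever reaches an agent.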
Translating User Needs into Testable Steps
Most bad tests are born before anyone writes step one. They start with a vague user story and stay vague all the way into execution.
Take a common backlog item: a user wants to filter products so they can find what they need. That sounds fine in planning. It’s not enough for an AI agent, and it’s not enough for a human tester either.

Break the story into product decisions
Start with the business need, then force it into observable behaviour.
For a product filter, I’d ask:
- Filter type: Can the user filter by category, price, brand, or availability?
- Result behaviour: Does the page update instantly or only after clicking Apply?
- Empty state: What should appear when no products match?
- Boundary rules: What are the minimum and maximum values for price?
- Persistence: Does the filter stay applied after refresh or navigation?
- Conflict handling: What happens if the user selects incompatible filters?
Those questions aren’t ceremony. They expose the assumptions that usually get left out of tickets.
Turn one story into several scenarios
From that single filter story, you often get multiple tests rather than one long script.
A practical split might look like this:
- Happy path: User applies a valid category filter and sees matching items.
- No-result path: User applies filters that return no products and sees an empty-state message.
- Boundary path: User enters the highest allowed price and the filter still applies correctly.
- Validation path: User enters an invalid price range and sees an error or blocked action.
- Persistence path: User refreshes the page and the selected filter remains visible, if that’s the intended behaviour.
Plain-English testing becomes useful at this stage. You’re not writing code first. You’re clarifying what the product should do so that the eventual execution is dependable.
The fastest way to produce flaky tests is to automate a requirement that nobody pinned down.
From a rough story to a clean scenario
Here’s how I’d convert the vague filter story into one concrete case.
User story
As a shopper, I want to filter products by price so I can find items within my budget.
Questions answered
- The price filter has a minimum and maximum field.
- Clicking Apply updates the product grid.
- If no products match, the page displays a no-results message.
- The filter accepts values only within the configured range for the product catalogue.
Plain-English test
- Test ID: CATALOG-FILTER-003
- Title: User filters products by valid price range
- Preconditions: Product listing page is open. Products exist within the selected range.
Steps
- Enter the minimum allowed value for the desired price range into the Min price field.
- Enter the maximum allowed value for the desired price range into the Max price field.
- Click the Apply filter button.
Expected result
- The product grid refreshes.
- Only products within the selected price range are shown.
- No validation error is displayed.
That’s testable. It’s also reviewable by product, QA, and engineering without translating between different dialects.
Watch for hidden coupling
The mistake teams make here is folding unrelated checks into one scenario. A filter test suddenly becomes a sorting test, then a pagination test, then a saved-preferences test.
Don’t do that. Keep user intent clean. If the user need changes, start a new test.
When writing test cases in plain English for AI execution, clarity starts in requirement analysis, not in the wording of the final step.
Proven Phrasing Patterns and Common Pitfalls
The wording of a test can decide whether an AI agent moves through the browser cleanly or wanders into the wrong element, the wrong page state, or the wrong assertion.
This isn’t just style. According to a 2024 Australian QA benchmark by TestRail, test cases written in clear, plain English achieved 92% pass rates on their first run, compared to 78% for those using technical jargon or ambiguous phrasing (TestRail’s effective test case templates article).
Use language that points to one obvious action
The best phrasing usually has three traits:
- Direct verb: click, enter, select, open, verify
- Visible target: the button, field, tab, or message the user can identify
- Single intention: one action per line
Passive voice causes trouble because it hides the actor and blurs the sequence.
Compare these:
- Weak: The Sign in button should be clicked.
- Better: Click the Sign in button.
The second version is shorter and easier for both humans and machines to follow.
Plain-English Phrasing Do’s and Don’ts
| Anti-Pattern (Don't) | Effective Pattern (Do) |
|---|---|
| Click the button to continue | Click the Continue button |
| Enter valid details | Enter a registered email address into the Email field |
| Submit the form and verify success | Click the Submit button. Verify that the success message is visible |
| Go to settings and update the profile and save it | Open Settings. Update the Display name field. Click Save changes |
| Check the product appears correctly | Verify that the product card shows the selected product name |
| If needed, log in first | Add User is logged in to preconditions |
| Click the blue button on the right | Click the Sign up button in the header |
| Confirm the page loaded properly | Verify that the page heading Your Orders is visible |
| Create a new order with random data | Create a new order using the test data defined in preconditions |
| Verify everything worked | Verify that the order confirmation message is visible |
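Several of the anti-patterns in the table are mechanical enough to flag automatically. Here is a hypothetical lint pass over individual steps; the phrase list and regular expressions are illustrative, not an exhaustive standard.

```python
import re

# Hypothetical lint pass over plain-English steps, based on the table above.
# The phrase list and rules are illustrative, not an exhaustive standard.
AMBIGUOUS_PHRASES = [
    "the button", "the icon", "valid details", "correctly",
    "properly", "everything", "if needed", "random data",
]

def lint_step(step: str) -> list[str]:
    """Return human-readable warnings for one test step."""
    warnings = []
    lowered = step.lower()
    for phrase in AMBIGUOUS_PHRASES:
        if phrase in lowered:
            warnings.append(f"ambiguous reference: '{phrase}'")
    # Bundled actions: "and" followed by another direct verb usually means split the step.
    if re.search(r"\band\b.*\b(click|enter|open|verify|select)\b", lowered):
        warnings.append("possible bundled actions: split into separate steps")
    # Passive voice hides the actor ("should be clicked").
    if re.search(r"\bshould be \w+ed\b", lowered):
        warnings.append("passive voice: start the step with a direct verb")
    return warnings

print(lint_step("The Sign in button should be clicked."))
print(lint_step("Click the Sign in button."))
```

A check like this won’t catch everything, but running it in review keeps the obvious offenders out of the suite before an agent ever sees them.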
The mistakes that make tests brittle
Some anti-patterns look harmless because a person can infer the meaning. An AI agent can’t rely on that same level of shared context.
Common failure points include:
- Ambiguous references: “click the icon” when there are several icons on screen.
- Bundled actions: “log in and open billing and download the invoice”.
- Hidden assumptions: “use the existing test user” when no test user is defined.
- Overly technical phrasing: “trigger the auth submit event” instead of describing the visible action.
- Premature selector thinking: writing steps as if you’re already coding locators.
Write for the screen the agent sees, not the DOM you remember.
Phrasing that survives product change
Teams often swing too far in either direction. They write steps that are too vague to execute, or so rigid that any UI tweak breaks intent.
A stable middle ground looks like this:
- Name visible labels where possible.
- Describe relative context only when needed, such as “in the header” or “below the product image”.
- Avoid styling language unless colour or position is the actual requirement.
- Put data setup in preconditions instead of squeezing it into action steps.
Good plain-English tests don’t pretend the interface will never change. They anchor the instruction to what users can recognise. That’s why writing test cases in plain English works best when the language is specific without becoming implementation-heavy. You want the instruction to survive a component refactor, not collapse because a class name moved.
Writing Effective Assertions and Handling Dynamic Data
Clicks aren’t the hard part. Verification is.
A plain-English test only becomes valuable when its assertions prove something meaningful about system behaviour. “Page loaded” isn’t enough. “Element exists” usually isn’t enough either.
Write assertions around observable outcomes
Strong assertions describe what the user or system should clearly show after an action.
Useful assertion patterns include:
- Text content: Verify that the confirmation banner shows “Profile updated”.
- State checks: Verify that the Submit button is disabled.
- Visibility checks: Verify that the error message is not visible.
- Navigation checks: Verify that the account page heading is visible after save.
- Selection checks: Verify that the chosen plan is marked as selected.
These are better than broad statements like “verify success” because they tell the agent what evidence counts.
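The assertion patterns above can be made concrete against a simplified page snapshot. The snapshot shape and helper names here are assumptions for illustration; a real agent would read these facts from the live browser.

```python
# Sketch of the assertion patterns listed above, checked against a simplified
# page snapshot. The snapshot shape and helper names are illustrative; a real
# agent would read these facts from the live browser.
page = {
    "visible_text": ["Profile updated", "Your Orders"],
    "disabled": ["Submit"],
    "selected": ["Pro plan"],
}

def verify_text_visible(snapshot: dict, text: str) -> bool:
    return text in snapshot["visible_text"]

def verify_disabled(snapshot: dict, control: str) -> bool:
    return control in snapshot["disabled"]

def verify_not_visible(snapshot: dict, text: str) -> bool:
    return text not in snapshot["visible_text"]

# Each assertion names the evidence that counts, not a vague "verify success".
assert verify_text_visible(page, "Profile updated")   # text content
assert verify_disabled(page, "Submit")                # state check
assert verify_not_visible(page, "Invalid password")   # visibility check
```

Notice that every check names a specific piece of visible evidence, which is exactly what “verify success” fails to do.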
Handle dynamic values explicitly
Real products generate moving data. Usernames, order IDs, timestamps, and email addresses often change on every run.
The fix isn’t to avoid dynamic data. The fix is to name it.
A clean pattern is to define data once, then reference it consistently:
- Precondition: Generate a unique email address for this test run.
- Step: Enter the generated email address into the Email field.
- Assertion: Verify that the account summary shows the generated email address.
You can do the same with order numbers:
- Complete the purchase flow.
- Store the displayed order number from the confirmation page.
- Open the Orders page.
- Verify that the stored order number appears in the order list.
That keeps the test self-contained. It also stops later assertions from relying on hard-coded values that won’t exist on the next run.
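The define-once, reference-consistently pattern above can be sketched in a few lines. The `run_data` store and the email domain are assumptions for the example, not part of any real runner.

```python
import uuid

# Sketch of the "define once, reference consistently" pattern above.
# The run_data store and the email domain are illustrative assumptions.
run_data = {}

def generate_unique_email() -> str:
    """Precondition: generate a unique email address for this test run."""
    run_data["email"] = f"test-{uuid.uuid4().hex[:8]}@example.com"
    return run_data["email"]

def store_order_number(displayed: str) -> None:
    """Step: store the displayed order number from the confirmation page."""
    run_data["order_number"] = displayed

generate_unique_email()
store_order_number("ORD-10482")  # the value the agent read from the page

# Later assertions refer to the stored values, never hard-coded ones.
assert run_data["email"].endswith("@example.com")
assert run_data["order_number"] == "ORD-10482"
```

Because each value is generated or captured once and then referenced by name, the same test passes on every run without hard-coded data.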
Keep setup data separate from behavioural checks
A common mistake is mixing generated data rules into the action itself.
- Messy: Enter a random valid email and password and verify account creation works.
- Cleaner: A unique valid email address is available for this test run. Enter the generated email address into the Email field.
That separation matters because it makes failures easier to diagnose. If the test fails, you know whether the problem was data preparation, action execution, or application behaviour.
For teams validating structured payloads or API-backed UI states, this also pairs well with a practical guide to validating a JSON object in test workflows.
Check this explicitly: If a value is created during the test, say how later steps should refer to it.
Negative assertions need care
“Verify that no error appears” sounds simple, but it can become weak if the test doesn’t define which error matters.
Prefer explicit language:
- Verify that the “Invalid password” message is not visible.
- Verify that the cart count does not increase.
- Verify that the Save button remains disabled until all required fields are completed.
That gives the AI agent a concrete absence to verify, not a vague sense that “nothing bad happened”.
Connecting Your Tests to the CI Pipeline with an AI Agent
A plain-English test file sitting in a doc is still manual work. The primary payoff comes when the same scenarios run automatically on pull requests, staging deploys, or release checks.
That’s where AI execution becomes more than a writing style. It becomes an operating model for quality.

Why teams are moving away from coded UI suites
For small product teams, coded browser tests often start as a sensible choice and then turn into maintenance debt. Locators drift. Helper abstractions pile up. A small UI change breaks five unrelated specs.
That cost is real. According to the 2025 Australian Computer Society Digital Pulse report, maintenance of brittle test suites built with frameworks like Playwright and Cypress consumes up to 30% of development time in Australian SaaS startups (Coursera’s summary referencing the ACS Digital Pulse report).
That doesn’t mean code-based testing is useless. It means teams should be selective about where code is worth the upkeep.
What CI-friendly plain-English tests need
If you want an AI agent to run tests reliably in CI, the test case should include more than user actions.
It should define:
- Trigger context: When should the suite run, such as pull request, nightly build, or post-deploy?
- Environment target: Which environment should the agent open?
- Data expectations: What seeded or generated data must exist?
- Pass criteria: Which visible outcomes count as success?
- Failure output: What evidence should the pipeline report back, such as failed step, screenshot, or run log?
A CI-ready test is less like a manual script and more like an executable specification.
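One way to picture that executable specification is as a case definition that carries its execution context alongside the steps. The keys and the `ci_ready` check below are illustrative, not a real runner’s schema.

```python
# Sketch of a CI-ready case carrying the execution context listed above.
# The keys and values are illustrative, not a real runner's schema.
ci_test = {
    "test_id": "AUTH-LOGIN-001",
    "trigger": "pull_request",                      # when the suite should run
    "environment": "https://preview.example.com",   # which environment to open
    "data": ["seeded active user account"],         # what must exist beforehand
    "pass_criteria": [
        "Dashboard heading is visible",
        "No login error message is visible",
    ],
    "failure_output": ["failed step", "screenshot", "run log"],
}

REQUIRED_KEYS = {"test_id", "trigger", "environment", "data",
                 "pass_criteria", "failure_output"}

def ci_ready(case: dict) -> bool:
    """A case is CI-ready only if every execution-context field is filled in."""
    return REQUIRED_KEYS <= case.keys() and all(case[k] for k in REQUIRED_KEYS)

print(ci_ready(ci_test))  # → True
```

A gate like `ci_ready` makes the difference visible: a case that only lists user actions is a manual script, not an executable specification.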
A practical pipeline flow
A simple workflow usually looks like this:
- A developer opens a pull request.
- The CI system deploys a preview environment.
- The AI agent receives a small suite of plain-English tests tied to that change.
- The agent runs those tests in a real browser.
- The pipeline posts results back into the pull request.
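The last step of that flow can be sketched as a small formatter. The results structure is hypothetical; a real pipeline would receive it from the execution layer and post the comment through the CI system’s API.

```python
# Sketch of the final pipeline step above: turning agent results into a
# pull-request comment. The results structure is hypothetical; a real pipeline
# would receive it from the execution layer and post via the CI system's API.
results = [
    {"test_id": "AUTH-LOGIN-001", "status": "passed", "failed_step": None},
    {"test_id": "CATALOG-FILTER-003", "status": "failed",
     "failed_step": "Click the Apply filter button"},
]

def format_pr_comment(results: list[dict]) -> str:
    """Render one line per test, naming the failed step when there is one."""
    lines = ["AI agent test results:"]
    for r in results:
        mark = "PASS" if r["status"] == "passed" else "FAIL"
        line = f"- {mark} {r['test_id']}"
        if r["failed_step"]:
            line += f" (failed at: {r['failed_step']})"
        lines.append(line)
    return "\n".join(lines)

print(format_pr_comment(results))
```

Because the comment names the failed step in plain English, anyone reviewing the pull request can understand the failure without opening the test framework.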
The biggest benefit isn’t just automation. It’s that product, QA, and engineering can all review the same test text.
That changes ownership. Quality stops being trapped inside whoever knows the test framework best.
Where tools fit
This model works with common CI systems like GitHub Actions and GitLab CI, provided the execution layer can consume plain-English scenarios and return structured results. One option is an AI testing agent workflow. Tools like e2eAgent.io are built around that pattern. They execute browser tests from plain-English instructions rather than requiring teams to hand-author selectors and test code.
The trade-off is straightforward. You give up some low-level scripting control in exchange for lower maintenance and broader team participation.
That trade is often worth making for regression flows, release checks, and cross-functional acceptance tests. It’s less compelling for highly custom framework logic or deep implementation-level edge cases, where code still gives tighter control.
Common Questions About Plain-English Testing
Is this the same as BDD with Cucumber
Not quite. BDD frameworks give you a formal syntax and often a layer of step definitions underneath. Plain-English AI testing focuses less on ceremony and more on executable clarity.
If your team already has a disciplined BDD setup, that may still work. But many teams end up maintaining both the feature file and the underlying automation glue. Plain-English AI execution removes much of that translation layer.
Can it handle conditional logic
Yes, but the test should stay readable. If a scenario has many branches, split it into multiple tests instead of writing one giant conditional narrative.
A/B tests, role-based differences, and feature-flag behaviour usually work better as separate cases with explicit preconditions.
Is it only useful for UI testing
No. The most obvious use is browser-based testing, but the same writing discipline helps with API-backed checks and state validation. The key is still the same. Name the input, the action, and the expected outcome clearly.
When teams struggle, it’s rarely because the system is too complex. It’s because the test mixes too many concerns.
What’s the learning curve for non-technical team members
Usually lower than expected, provided someone sets the standard. Product managers and manual testers can write valuable test cases quickly when the structure is fixed and examples are available.
What they need isn’t coding skill. They need feedback on ambiguity, scope, and expected results.
“Click the button” is easy to write. “Click the Save button in the billing form” is what makes the test executable.
Will this replace engineers who write automation
No. It changes where engineers spend time.
Instead of spending most of their effort maintaining fragile UI scripts, engineers can focus on test strategy, risk coverage, environment design, and the cases that need coded control. That’s a better use of experienced automation skill.
What should a team do first
Start with a narrow regression slice. Pick one business-critical flow such as login, checkout, or account creation. Rewrite those cases in a plain-English format with clear preconditions, single-action steps, and explicit assertions.
Then run them in CI and review the failures closely. The first round usually teaches the team where their wording is vague, where their environments are unstable, and which assumptions were never documented.
If your team wants to stop babysitting brittle browser scripts and start running executable tests from plain-English scenarios, e2eAgent.io is worth evaluating. It’s built for teams that want AI to execute real browser workflows from written test steps, with less dependence on hand-maintained Playwright or Cypress code.
