
The challenge for this week deals with a common way of structuring feature file headers and scenario headers.

We write feature introductions in the user story format (As a… I want… So that…). Then we add scenarios related to different user personas and different user needs, and the original description no longer fits the contents. As the feature file grows, the description drifts further and further from what the file actually covers. Any ideas on structuring the story at the top to be more generic, but not too vague?

Gherkin, the format of Given-When-Then files, allows users to add any text after a scenario or a feature title, and just ignores it from an automation perspective. This is incredibly useful to provide more context for readers. Most online examples show the feature description in the Connextra user story format, such as the one below:

Feature: pending invoices report

  As a client account manager, 
  I want a report showing all pending (issued but unpaid) invoices for a client
  So that I can control client credit risk


This usually works well when a feature is first introduced, since any new functionality should come with a clearly associated user need. But as more stories extend the feature, the connection between the feature and the stories becomes complicated.

For example, we might initially add a pending invoice report because an account manager wants it, in order to control client credit risk. But as the system grows, we might add features to the pending invoice report that help account managers achieve other goals. A regular customer might be eligible for a discount, so account managers might want to know quarterly subtotals and averages to decide on discount amounts. Other types of users might want changes to the same report to achieve their own objectives. For example, call centre operators might need a few tweaks to solve client problems faster, or accountants might use it to prepare end-of-year tax returns. After a few such updates, the original value statement no longer captures the purpose of the feature. Teams sometimes try to make the user story at the top more and more generic, to encapsulate all the needs and personas, but then it becomes too vague.

The challenge for this week is: How do you write a good description for a feature or a scenario? Think about a feature that evolved over time, potentially through dozens of stories or tasks. What are your ideas on how to structure those descriptions? Should they look like user stories, or something else? What should they contain?

Post your suggestions using the link below. If you’d like to send a longer comment, perhaps write a blog post and then just send us the URL (you can do that using the same form, link below).

Also, if you have a problem related to Given-When-Then that you’d like help with, or know a topic that might be interesting for the community to discuss, please propose it as a challenge for one of the next articles using the second link.

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇

The challenge for last week was to improve specifications that deal with pauses, in particular those that wait for a period of time. For a detailed explanation of the problem, check out the original post. This article contains an analysis of the community responses, two ways of cleaning up the problematic specs, and tips on more general approaches to solving similar problems.

Some people noted that Given-When-Then tools aren’t the right solution for testing systems requiring synchronization. However, issues with waiting and synchronization are general problems of test design, not something specific to Given-When-Then tools. The design ideas outlined in this article are applicable to other classes of tools, and other types of tests as well. However, there is one specific aspect of Given-When-Then that is important to consider first: where to define the pauses.

Move waiting into step implementations

Lada Flac, commenting on the problem of asynchronous web page elements, suggested pushing the pauses into step implementations:

“I don’t see a need to add the waits to gherkin steps. I would add them only to the implementation steps. And those would not be implicit waits, but those where a script would try to find an element until the timeout.”

Lada is spot on. The most common cause for waiting in tests is the way a test is executed. For example, testing through a browser, or over a network, implies asynchronous network operations, so it may require synchronisation. In cases such as that, it’s best to place the waiting in the step implementations, not the scenario definition. Describing a flexible periodic polling process with exponential back-off is a huge challenge using Given-When-Then, but it’s very easy to do with C#, Java or any other programming language.
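
To make this concrete, here is a minimal sketch of such a polling helper in C#. It is not taken from any particular framework; the class name, defaults and the usage comment are all invented for illustration:

using System;
using System.Threading;

// Minimal polling helper: retries a condition with exponential back-off
// until it passes or the overall timeout expires.
public static class Poll
{
    public static void Until(Func<bool> condition,
        TimeSpan? timeout = null, TimeSpan? initialDelay = null)
    {
        var deadline = DateTime.UtcNow + (timeout ?? TimeSpan.FromSeconds(30));
        var delay = initialDelay ?? TimeSpan.FromMilliseconds(100);

        while (!condition())
        {
            if (DateTime.UtcNow + delay > deadline)
                throw new TimeoutException("Condition not met before the timeout expired");
            Thread.Sleep(delay);
            delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2); // back off
        }
    }
}

// Example use inside a step implementation (ordersApi is hypothetical):
// Poll.Until(() => ordersApi.GetOrder(orderId).Status == "Confirmed");

A step implementation can call a helper like this, while the scenario itself keeps talking only about the business outcome.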

Kuba Nowak suggested a similar way to rephrase the waiting, using steps such as “And waits until payment page opens”, and then dealing with the meaning of “opens” within the step implementation.

Moving the waiting into step implementations makes the scenario definitions shorter and more focused on the problem domain. A team can discuss such scenarios more easily with business representatives. By moving the pauses into step implementations, as Lada and Kuba suggest, we can also postpone the decision on how to perform them. This gives us the option to avoid waiting for time. Jonathan Timm nicely explained it:

“Regardless of the tool, automating user interaction needs to be a two-way conversation between the application under test and the automation code. Human users take cues from user interfaces unconsciously, and the same cues can be listened for with code using existing native functionality in tools like Selenium.”

Jonathan noted that WebDriver, a popular user interface automation tool, supports waiting until an element becomes visible or clickable. Instead of pausing for a specified period, the implementation of a step can pause until some interface elements appear or become active.
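
For example, Kuba’s “waits until payment page opens” step could be implemented with Selenium’s explicit waits rather than a fixed pause. This is only a sketch: the element id and timeout are made up, and it assumes the WebDriver instance is made available to the binding class through the tool’s dependency injection:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;
using TechTalk.SpecFlow;

[Binding]
public class PaymentSteps
{
    private readonly IWebDriver driver;

    public PaymentSteps(IWebDriver driver) => this.driver = driver;

    [When(@"waits until payment page opens")]
    public void WaitUntilPaymentPageOpens()
    {
        // Wait for an event (the payment form becoming visible),
        // not for an arbitrary amount of time.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        wait.Until(d => d.FindElement(By.Id("payment-form")).Displayed);
    }
}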

Whenever possible, use this trick: instead of waiting for time, try waiting for events. Pausing for a pre-defined period should be the technique of last resort, applied only when there are no other ways of handling the synchronization.

Wait for events, not for time

David Wardlaw wrote a nice blog post with his thinking about the problem, documenting several ideas with increasing complexity. One of the very important things he noticed is that waiting for time is generally not a good idea since the background process can depend on many different factors. David wrote: “The timing can be affected by things like CPU usage and network traffic and you may get inconsistent test results”. This is especially true for test environments, which are usually underpowered, and sometimes shared across teams.

Stan Desyatnikov proposed a way to rephrase the waiting scenarios. “Don’t reflect delays (of the application under testing) in steps of scenarios… Wait for a particular condition to be met…”. Most of the responses to this challenge correctly identified some other condition for waiting, often related to the user interface. David Wardlaw suggested looking at the user experience, for example specifying the condition “When the account page loading progress bar is at 100%”.

Faith Peterson wrote a very detailed blog response with excellent ideas, proposing to “raise the level of abstraction and ignore the pause”. Faith explains it:

“This has worked for me when my primary interest is verifying the human-observable result of an integrated system, and I’m less interested in verifying internal operations or handoffs.”

I fully agree with Faith, but it’s important to note that this trick can work for a much wider set of contexts than just human-observable results. As Jonathan noted, people subconsciously take cues from a user interface, so it’s easy to think about human-observable results. But we can, and in most cases should, raise the level of abstraction even higher. Mathieu Roseboom looked at this problem from the perspective of relevance:

“The fact that a step needs to wait for a certain amount of time should not be part of the scenario in my opinion. It is mostly (there are always exceptions) not relevant for the business.”

A progress bar reaching a certain percentage or a page loading fully is definitely better than specifying a pause, but it’s still an implementation detail, not a core business requirement. Putting Faith’s idea of raising the abstraction level and Mathieu’s idea of relevance to work, Dave Nicolette suggested the following improvement to the first scenario in last week’s challenge:

Given a user has $200 worth of goods in their cart 
When the user completes the purchase 
Then..

Dalibor Karlović suggested replacing the waiting statements in the second scenario with a meaningful business event, using a step such as:

And the account verification completes

Both these suggestions frame the waiting period in terms of an event from the business domain, not the user interface.

So how do we decide if the condition for waiting should be described in terms of user interface elements, some more generic user-observable behaviour, or a business process? The answer is in one of the most useful techniques for clarifying scenarios: focus on what, not on how.

Focus on what, not how

There are usually two dimensions of relevance in Given-When-Then scenarios and related tests. The first is the purpose of a test; the second is the process of testing. For example, in a specification of the registration process, observing the progress bar is not relevant for the purpose of a test. It may be relevant for the process of testing. On the other hand, in a specification for user interface interactions, the progress bar activity is relevant for the purpose of the test as well.

When scenarios focus on how something should be tested rather than what a feature should do, then the purpose of a feature is obscured, and the scenario depends too much on a specific implementation or system constraints. When the implementation changes, or when the code executes on a different system, tests based on those specifications often start to fail although there are no bugs in the underlying code. As a general guideline, try to keep the scenario definitions focused on things relevant for the purpose of the test, not on the mechanics of test execution. Move the mechanics into step implementations, into the automation layer.

Sometimes, the distinction can be very subtle. For example, Stan Desyatnikov proposed restructuring the second scenario from the challenge in the following way:

When a user registers successfully
Then the account page displays "Account approved"

In cases such as this one, it’s interesting to consider whether the account page is relevant for the purpose of the test, or whether it is just there to explain the process of testing. We could rephrase the post-condition as just one line:

Then the user account status is "approved"

By specifying the wait condition in the terminology of the business domain, instead of test execution, we can postpone the discussion on test automation. The scenarios will be clearer and easier to discuss with business representatives. We can potentially optimise the execution later so it can go below the user interface, or even avoid asynchronous issues altogether.
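
If the team later decides to automate below the user interface, a domain-level step like that can be bound to a service or API call instead of page interactions. The client interface and test data below are hypothetical, just to show the shape of such a binding:

using TechTalk.SpecFlow;
using Xunit; // any assertion library would do

// Hypothetical client for whatever service layer sits below the user interface.
public interface IAccountsApi
{
    string GetStatus(string email);
}

[Binding]
public class AccountStatusSteps
{
    private readonly IAccountsApi accounts;
    private string registeredEmail;

    public AccountStatusSteps(IAccountsApi accounts) => this.accounts = accounts;

    [When(@"a user registers successfully")]
    public void WhenAUserRegistersSuccessfully()
    {
        registeredEmail = "new.user@example.com"; // illustrative test data
        // ... call the registration endpoint here ...
    }

    [Then(@"the user account status is ""(.*)""")]
    public void ThenTheUserAccountStatusIs(string expectedStatus)
    {
        // Verify the domain outcome directly, without driving a browser.
        Assert.Equal(expectedStatus, accounts.GetStatus(registeredEmail));
    }
}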

Bring the interactions into the model

Time-based pauses in scenarios are not necessarily symptoms of the test process leaking into specifications. They could also be caused by a wrong model, which in turn causes bad user experience and makes the system error-prone. Any web site that warns against pressing the back button during a background process is a good example of this. The worst offenders are airline ticket sites with warnings that you mustn’t close the browser window during payment. Why not? As if the user watching some web progress bar is magically going to improve payment approval rates. Connection problems are a fact of life on the Internet, especially with consumer applications. Wi-Fi signals drop, phone batteries die, and people close windows by mistake.

Showing a warning against something users can’t control doesn’t make the problem disappear. When an asynchronous process is fundamental for the purpose of the feature, not just for the process of testing, then we don’t want to hide it in step definition code. This is not accidental technical complexity, it’s a fundamental property of the problem domain. We want to expose such domain properties, so we can openly discuss them and define what should happen when the situation develops in a predictable but unwanted way.

René Busch suggested rephrasing the registration scenario in terms of events, such as the one below:

Given the process of registration is running
when the registration process completes successfully
then the user gets a notification the registration is completed successfully
then the user can confirm that registration

Given the process of registration is running
when the registration process completes in error 
then the user gets a notification the registration is completed with failure
then the user can do/see ....

The original challenge had two example scenarios. The second was stuck in waiting purely because of badly described user interface constraints. But the first one, dealing with order approvals, is a lot more tricky. In the challenge post, I explained that “platform passes orders through a risk evaluation module before confirming”. This is a hint that there might be a fundamentally asynchronous problem in this domain. For example, although an automated risk evaluation module might take a few seconds most of the time, fraud prevention can also require escalating to a human investigator, which might take hours or days.

In such cases, the events and notifications are what we need to test, not how we’re testing something. They need to be explicitly defined and included in the scenarios. This allows us to specify examples when the risk evaluation is not yet complete, without worrying how long the process actually takes. Treating the process as fundamentally asynchronous allows us to design better user notifications, improve user experience, system operations and support.

If a risk review can take a while, there is a good chance that a user will want to view the status of an order during that process, or make new orders. Identifying such important domain events might lead to further refinement of the underlying software model, and asking more interesting questions. For example, is the risk assessment the only thing that happens before an order is confirmed? Perhaps there are other things that need to happen, such as checking the inventory. Alternatively, we can start asking questions around what happens when the order confirmation is pending, and when the order confirmation results in a negative response. This might lead us to discover some further events, such as the order being rejected. We might need some additional examples around that. Discovering domain events is key to modelling asynchronous systems correctly. For some further background on this, check out Alberto Brandolini’s work on Event Storming.

Similarly to how identifying user interface events (such as a button becoming visible) helps to automate tests better with UI drivers, identifying domain events is important because it allows us to introduce additional control points into the system. René’s example shows this nicely. Instead of actually running an external registration process, the test framework can just submit the appropriate results synchronously. Such tests will run faster and more reliably. We can also avoid the complexity of setting up data for an external system, and keeping it up to date as the external system changes. In the best case scenario, this can also turn a test case that was previously asynchronous into something that can run synchronously. Of course, it’s good to complement this with a proper integration test which shows that the system under test and the registration process communicate correctly, but this is a technical concern that should be handled outside Given-When-Then feature files.
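
As a rough sketch of such a control point (all type names here are invented), the steps from René’s example could drive a test double directly, instead of waiting for a real external registration process:

using TechTalk.SpecFlow;

// Hypothetical port through which the application learns the outcome of the
// external registration process. In tests, the steps control it directly.
public interface IRegistrationProcess
{
    void Complete(string email, bool successful);
}

[Binding]
public class RegistrationProcessSteps
{
    private readonly IRegistrationProcess registration;
    private readonly string currentEmail = "new.user@example.com"; // illustrative

    public RegistrationProcessSteps(IRegistrationProcess registration) =>
        this.registration = registration;

    [When(@"the registration process completes successfully")]
    public void WhenTheRegistrationProcessCompletesSuccessfully() =>
        registration.Complete(currentEmail, successful: true);

    [When(@"the registration process completes in error")]
    public void WhenTheRegistrationProcessCompletesInError() =>
        registration.Complete(currentEmail, successful: false);
}

With the external process replaced by a controllable double, both the success and the failure paths from René’s scenarios can be exercised synchronously.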

Modelling time

Separating what we’re specifying from how it is tested is a great technique to decide whether something should go into scenario definitions or step implementations. If what we’re specifying is the user interface, then user interface interactions should stay in the scenario definition. If what we’re specifying is a business process, and the user interface is just how we’re testing it, then user interface interactions should go into the step implementations. The same applies to time.

Most responses to this challenge correctly suggested moving away from specific time periods, but there are cases where the passing of time is actually what we’re testing. David Wardlaw had one such example in his post, specifying that user registration must complete within three minutes, otherwise the process should fail and users need to be notified about an external system not being available. Performance requirements, service level agreements and operational constraints often involve specific time limits, and in such cases the period itself needs to be visible in the scenario. This will allow us to discuss other examples, probe for hidden assumptions and change rules in the future more easily.

However, even in such cases, it is usually wrong to implement the test mechanics by waiting for a period of time. With longer periods, such as checking that a regulatory report is generated at the end of every quarter or that transactions are reconciled at the end of the business day, people tend to avoid automating the tests at all because they don’t know how to handle the waits. With short periods, such as seconds or minutes, sometimes people are tempted to just block the test and wait it out. Please don’t. That makes testing unnecessarily slow.

If time is critical to an aspect of your system, model it in the domain and represent it as business time. Technically, this often involves creating a wrapper around the system clock, with methods to schedule and wait for events. Anything else that depends on time should not use the system clock, but connect to the business clock instead. That allows the test system to easily move forwards and backwards in time and prove complex business rules, without delaying the test execution. This is the right approach for dealing with time-based events, such as expiring unpaid accounts if more than two hours passed since registration, generating end of day or end of quarter reports, or generating monthly statements. We can easily move forward a whole month, or stop just one second short of a month start, to prove that the right things happened or did not happen.
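
A minimal sketch of such a business clock could look like the class below; the names and the scheduling model are invented for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

// A business clock: the application schedules time-based work against it,
// and tests can move it forwards without actually waiting.
public class BusinessClock
{
    private readonly List<(DateTime due, Action action)> scheduled = new();

    public DateTime Now { get; private set; }

    public BusinessClock(DateTime start) => Now = start;

    public void Schedule(DateTime due, Action action) => scheduled.Add((due, action));

    public void AdvanceTo(DateTime newTime)
    {
        // Run everything that fell due while time was moving forward.
        var dueNow = scheduled.Where(s => s.due <= newTime).OrderBy(s => s.due).ToList();
        scheduled.RemoveAll(s => s.due <= newTime);
        Now = newTime;
        foreach (var entry in dueNow)
            entry.action();
    }
}

// In a step implementation, expiring unpaid accounts two hours after registration
// can then be proven without any real waiting (registrationTime is illustrative):
// clock.AdvanceTo(registrationTime.AddHours(2));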

When a business clock runs the show, there is only one time definition, so everything can be synchronous to that clock. This often means that test automation can switch to synchronous execution, which is much more reliable and resilient than working with asynchronous events.

The next challenge

The challenge for this week deals with a common way of structuring feature file headers and scenario headers.

Check out the challenge

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the solution with your friends and team members by clicking on one of the social icons below👇

The challenge for this week is dealing with wait statements and pauses in specifications.

My testers keep adding pauses into Given-When-Then. They claim it’s necessary because of how the tests work. I don’t like it because the numbers look random and do not reflect our business rules. Also, the pauses sometimes aren’t long enough, so the tests fail although there are no bugs in the system version we’re testing. How do we avoid that?

This problem is symptomatic of working with an asynchronous process, often an external system or an executable outside of your immediate control. For example, think of an e-commerce platform that passes orders through a risk evaluation module before confirming. The risk check might take a few seconds, and the user may not be able to pay until the order is confirmed. In a visual user interface, this waiting period is often shown with a spinner or a progress bar. A bad, but common way of capturing this in Given-When-Then is to add a pause between actions:

Given a user checks out with a shopping cart worth $200
When the user submits an order 
And the system waits 3 seconds
And the user authorises the payment
Then ...

In some cases, it’s not the system under test requiring pauses, but the way a test is automated. A common example is executing tests through a web browser automation tool such as Selenium or Puppeteer. The framework can load a web page using a browser, but that page might need to fetch external JavaScript files or additional content to fully initialise the user interface. Triggering an action immediately after a web page loads might cause the test to fail, because the related elements are not yet available. Adding pauses to Given-When-Then scenarios is also a common workaround for these kinds of issues:

Given a user registers successfully
When the account page reloads
And the user waits 2 seconds
Then the account page displays "Account approved"

In both these situations the testing process requires a bit of time between an action and observing its result. But the actual period will vary from test to test, even between two executions of the same test. It will depend on network latency, CPU load and the amount of data that needs to be processed. That is why it’s not possible to correctly select the period duration upfront. Choosing a short pause causes tests to occasionally fail, because the asynchronous process sometimes does not complete before the test framework moves on to the next step, so people have to waste time chasing ghost problems. Setting a pause that’s too long delays feedback and slows down testing excessively.

How would you rephrase and restructure this instead? Post your suggestions using the link below. If you’d like to send a longer comment, perhaps write a blog post and then just send us the URL (you can do that using the same form, link below).

Read on for a selection of good ideas we’ve received and our suggestions on how to handle scenarios such as this one.

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇

Last week, we cleaned up a vague scenario, transforming it into a relatively long list of concrete examples. The challenge for this week was to provide a good structure to capture all those examples in a feature file.

A large set of varied examples can be overwhelming. Examples that show important boundaries usually differ only in small details, so in a long list of examples many will contain similar information. Unfortunately, showing a lot of similar scenarios with only minor differences will make readers tune out and stop seeing the details, which is exactly the opposite of what we want to achieve with a good feature file.

Usually, a big part of the repetitive data comes from common preconditions. Developers often try to clean up such scenarios by applying programming design guidelines, such as DRY (“don’t repeat yourself”). They will remove all the duplication to achieve something akin to a normalised data set. This makes it easy to maintain and evolve the examples in the feature files, but it also makes it very difficult to understand individual scenarios. Readers have to remember too many things spread throughout a file to fully grasp the context of a single test case.

The key to capturing long lists of examples in a feature file is to find a good balance between clarity and completeness, between repeating contextual information and hiding it. Repeating the contextual information close to where it is used helps with understanding. On the other hand, repeating too many things too often makes it difficult to see the key differences (and makes it difficult to update the examples consistently). Here are four ways to restructure long lists of examples into something easy to understand and easy to maintain.

Identify groups of examples

For me, the first step when restructuring a long list of examples is usually to identify meaningful groups among them. Analyse common aspects to identify groups, then focus on clearly showing variation in each group.

Discovering meaningful groups of examples that have a lot in common allows us to provide just enough context to understand each group, duplicating some information across groups, while focusing the examples in each group so they clearly show the differences between them. There’s no hard rule on how many examples should go into each group, but in my experience anywhere between three and ten is fine. If a group contains more than ten examples, then we start having the initial problem again within that group, where readers start to tune out. Fewer than three examples usually makes me ask whether we have covered enough boundaries to illustrate a rule.

Each group should have a specific scope. On a very coarse level, groups can demonstrate a specific business rule. If there are too many examples to demonstrate a business rule fully, then structure groups around answering individual questions about that rule. If there are too many examples to fully answer a specific question, create groups around demonstrating a specific boundary or problem type.

Example mapping, a scoping technique for collaborative analysis promoted by Matt Wynne from the Cucumber team, turns this idea of groups of examples into a fully fledged facilitation technique. It starts by creating a breakdown of scope into topics, questions and groups of examples. If you start with example mapping, the map itself may provide good hints about grouping examples. You may still want to restructure individual groups if they end up too big.

Capture the variations

Within each group, try identifying the things that really change. Scenario outlines, mentioned in the previous post, are a nice way of dividing data from examples into two parts. One part will be common for the whole group, and you can specify it in the Given/When/Then section of the scenario outline. Another part of the data shows what’s really different between individual examples, and you can specify it within the Examples block, following the Given/When/Then.

As an illustration, when demonstrating rules around detecting duplicated emails during registration, we might want to show some more contextual information. To avoid potential bad assumptions about other types of errors (such as a duplicated username), we might also want to show the proposed username and the name of the person registering. That information is not critical for each individual example, but it helps to understand the context. Instead of repeating it for each case, we can capture it once for each group, in the When section.

Scenario Outline: duplicated emails should be prevented

   Email is case insensitive. GMail is a very popular system so 
   many users will register with gmail emails. Sometimes they use 
   gmail aliases or labels. 
   To prevent users mistakenly registering for multiple accounts
   the user repository should recognise common Gmail tricks.

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with <registration message> 

Examples:

| email               | registration message      | 
| mi.ke@gmail.com     | Email already registered  |
| mike+test@gmail.com | Email already registered  |
| Mike@gmail.com      | Email already registered  |
| mike@Gmail.com      | Email already registered  |
| mike@googlemail.com | Email already registered  |

Notice that in this case the message is always the same, so there isn’t much point in repeating it. A table column that always has the same value is a good hint that it can be removed. A table column that only has two values in a group of 10 examples perhaps suggests that the group should be split into two sets of five examples.

We can move the values that are always the same to the common part of the scenario outline:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples:

| email               | 
| mi.ke@gmail.com     | 
| mike+test@gmail.com |
| Mike@gmail.com      |
| mike@Gmail.com      |
| mike@googlemail.com |

This set of examples is concise, but perhaps it’s too brief. It’s not clear why these examples need to be in the spec. With the earlier set of examples, the registration message helped to explain what’s going on. But the messages were the same for all these examples, so they were not pointing at differences between examples. We had to explain that in the scenario context, which is good, but we can do even better. When there’s nothing obvious in the domain to serve that purpose, give each example a meaningful name. You can, for example, introduce a comment column that will be ignored by test automation:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples:

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| emails should be case insensitive                          | Mike@gmail.com      |                           
| domains should be case insensitive                         | mike@Gmail.com      | 
| googlemail is an equivalent alias for gmail                | mike@googlemail.com | 

Identifying meaningful groups, and then structuring the examples into scenario outlines based on those groups, allows us to provide just enough context for understanding each group. It also allows us to use a different structure for each set of examples. The scenario outlines around duplicated usernames will have different When and Then clauses (perhaps using a common email to show context). The examples themselves in that group would show variations in usernames. The When and Then steps will look similar to the ones we used previously, but will likely use different placeholders than the examples that demonstrate email rules. The scenario outlines for password validity checks will have a totally different structure – perhaps not even showing the personal name and email.

Another useful trick to keep in mind with scenario outlines is that a single outline can have many example groups. If the list of examples around duplicated emails becomes too big, you can just split it into several groups. When doing this, I like to add a title to each group of examples, to show its scope. This allows us to ask further questions and identify further examples. For example, we only have two examples around case sensitivity. Thinking a bit harder about additional edge cases for that rule, we can start considering unicode problems. Unicode normalisation tricks can allow people to spoof data easily, abusing lowercase transformations in back-end components, and we might want to add a few examples to ensure this is taken care of.

Here is what an evolved scenario outline could look like:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples: uppercase/lowercase aliases should be detected

| comment                                                    | email               |
| emails should be case insensitive                          | Mike@gmail.com      |
| domains should be case insensitive                         | mike@Gmail.com      |
| unicode normalisation tricks should be detected            | mᴵke@gmail.com      |

Examples: gmail aliases should be detected

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| googlemail is an equivalent alias for gmail                | mike@googlemail.com |

Use a framing example

As we start discussing more specific rules around preventing duplication, the structure of examples will change to reflect that. Each set of examples may have a different structure, focused on the key aspects that it is trying to show. That’s how you can avoid overly complex examples and tables with too much data. However, explaining a single feature through lots of different scenario outlines with varying structures might make things difficult to grasp at first. To make a feature file easy to understand, I like to use a framing scenario first. That scenario should be simple in terms of domain rules, and it should not try to explain difficult boundaries. It’s there to help readers understand the structure, not to prevent problems. A good choice is usually a “happy day” case that demonstrates the common flow by showing the full structure of data. For example, this could be the successful registration case. The framing scenario can then be followed by increasingly complex scenarios or scenario outlines. For example, I would first list the generic email rules, then follow that with system-specific rules such as the ones for GMail.

Feature: Preventing duplicated registrations

Scenario: Successful registration
...
Scenario: allowing duplicated personal names
...
Scenario Outline: preventing duplicated emails
...
Scenario Outline: preventing GMail aliases
...

(Maybe) extract common preconditions into a background

With the framing scenario structure, you will sometimes find preconditions shared among all scenarios. In this case, the initial users in the repository might be the same for all examples. In cases such as that, you have an option to avoid duplication and move common preconditions from individual scenarios to a Background section. Automation tools will copy the steps from the background section before each individual scenario when they execute them.

When doing this, beware of hiding too much. The background section is useful only when the actual data is relatively simple, so people can remember it. A common pitfall with feature file backgrounds is that they become quite complex, and readers lose their understanding of the feature before they even get to the interesting part. If you can’t keep the common background very simple, it’s perhaps worth introducing just the minimum required inputs within each scenario outline itself.

Here is what a fully structured feature file, with a background section and a framing scenario, would look like:

Feature: Preventing duplicated registrations

   To prevent users mistakenly registering for multiple accounts
   the user repository should reject registrations matching
   existing users

Background:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |

Scenario: Users with unique data should be able to register

When John Michaels attempts to register with the username "john" and email "johnm@gmail.com"
Then the user repository will contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
| john     | John Michaels | johnm@gmail.com  |

Scenario: Personal names do not have to be unique

When Mike Smith attempts to register with the username "john" and email "johnm@gmail.com"
Then the user repository will contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
| john     | Mike Smith    | johnm@gmail.com  |

Scenario Outline: Usernames should be unique

  Detecting simple duplication is not enough, since usernames that are visually
  similar may lead to support problems and security issues. See 
  https://engineering.atspotify.com/2013/06/18/creative-usernames/ for more information.

When Steve James tries to register with <requested username> and "steve5@gmail.com"
Then the registration should fail with "Username taken"
And the user repository should contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |

Examples:

| comment                     | requested username | 
| identical username          | steve123           |
| minor spelling difference   | Steve123           |
| unicode normalisation       | sᴛᴇᴠᴇ123           |
| interpunction difference    | steve123.          |

Scenario Outline: Duplicated emails should be disallowed

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples: uppercase/lowercase aliases should be detected

| comment                                                    | email               |
| emails should be case insensitive                          | Mike@gmail.com      |
| domains should be case insensitive                         | mike@Gmail.com      |
| unicode normalisation tricks should be detected            | mᴵke@gmail.com      |

Examples: Gmail aliases should be detected

   GMail is a very popular system, so many users will register
   with gmail emails. Sometimes they use gmail aliases or labels.
   To prevent users mistakenly registering for multiple accounts,
   the user repository should recognise common Gmail tricks.

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| googlemail is an equivalent alias for gmail                | mike@googlemail.com |

The next challenge

The challenge for this week is a bit more technical – dealing with situations that require waiting.

Check out the challenge

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇

Last week, we started a community challenge to suggest improvements to a badly worded Given-When-Then feature file, specifically addressing the problem of setting up something that’s not supposed to exist. This article contains an analysis of the responses, some tips on a more general approach to solving similar problems, and a new challenge for this week.

I’ve outlined some of the most common problems with describing something that doesn’t exist in the original challenge, so I will not repeat them here – check the linked post for a refresher.

We received lots of excellent responses to the first challenge, from people in sixteen different countries. Alister Scott even wrote a wonderful blog post about his proposal.

Describe what exists instead

Although the problems with describing something that doesn’t exist sound like technical challenges that are purely related to testing, they are often just symptoms of an unexplored model. It’s possible to solve or avoid most of them by exploring the model differently, and ensuring shared understanding between team members and stakeholders. Concrete examples are a great way to do that, and many solutions tried to make things more concrete by introducing different properties of users, or talking about users before they register in a different way.

For example, several suggestions involved introducing a specific qualifier, such as an “unregistered user”. This helps to differentiate between two different types of entities: one that exists before registration, and another that exists after. Similarly, some solutions tried to move away from ambiguous user identifiers, such as “john”, to more concrete data that is available even before users are registered. For example, it’s possible to consider user emails even before they sign up to our system. The following suggestion, which came in anonymously, illustrates this nicely:

Given user john@doe.com tries to create an account
When there is already an account for john@doe.com 
Then registration is not possible

All these attempts are going in the right direction, but they are still just workarounds for a problem and they aren’t solving it fully. On a conceptual level, a great way to approach similar problems is another anonymous suggestion, “Describe the state of the system without the value”. Instead of trying to describe something that is not supposed to exist, we can describe what else exists instead.

There are two ways of approaching this suggestion practically. The first, suggested by Mathieu Roseboom, is to look outside the system. Mathieu wrote “I’d emphasize that John is a person, and not a user.” Rather than talking about unregistered users that do not exist, let’s talk about people that do exist. The suggestions that tried to use emails to describe unregistered users lean in that direction, without fully benefiting from it. Thinking about a person that exists outside of our system might lead us to discover other attributes which could be important for the current feature.

The second way to specify something that exists instead of talking about vague non-existing entities is to describe all the other registered users. For example, one response suggested splitting the problem into two cases:

Given list of users is empty
...
Given a "normal" database

Another reader suggested setting up a separate feature file that just creates the relevant users, and then technically ensuring that all other feature files run after it.

Feature: Setup users

Scenario: Register users
Given the following users are registered:
| name | email |
| Rick | rick@mail.com |

David Wardlaw suggested starting the scenario with:

Given the following user is already registered
| First Name | Last Name | Email Address | 
| John       | Smith     | js@bob.com    |
And a new user wants to register with a first name of <First Name>

These three ideas illustrate different levels of visibility that feature files could offer. In the first option, readers will need to know what the “normal” database contains, but the solution is quite easy to reuse across different scenarios. On the other hand, if someone modifies the “normal” database for some unrelated reason, tests might start to break unexpectedly.

The second solution optimises performance because it inserts records only once, and provides better visibility to readers about the assumed state. The downside of this idea is that it imposes a very specific order for executing tests. This feature file must run before everything else, otherwise things start to misbehave in strange ways. I usually try to avoid imposing a specific order of test execution, since then we can’t just run a single test when needed. This approach also suffers from implied shared state. If someone updates that central setup file for some unrelated test case, our tests may magically start to fail.

The third suggestion provides full visibility to a reader, and it does not impose any shared state that could cause problems later when things change. This approach might get a bit wordy if we need to set up a lot of users, but there are ways around that as well.

The key problem the first and second solutions are trying to solve is the complexity and performance of working with a real database. Databases are slow and difficult to control compared to simple in-memory objects, so tests involving a real database often have to compensate for those downsides somehow. Of the three ideas, I would go with the third one unless there is some very specific performance constraint we want to solve. I promise to come back to this next week, but since we’re not solving database performance in this challenge, let’s just postpone that discussion.

So which approach should we choose? For that, I’d like to explore this problem on a more conceptual level, and explain several techniques which you can use in other similar situations.

Ask an extreme question

In order to reason about the existence of an item (such as a user or a product), we first need to answer a key probing question: where? Where should that item exist or not exist? Defining existence is, at its core, reasoning about the state of system data. Figuring out where the data resides is critical for three reasons:

  1. To resolve the chicken-and-egg issue of describing something before it exists, since we can talk about an item that simultaneously exists in one context, but not in another.
  2. To clarify the underlying business model, exposing previously hidden domain concepts. This will help us get a better shared understanding about their constraints.
  3. To create fast, reliable and resilient test automation, since identifying new domain concepts makes it possible to introduce additional control points. That’s how we can avoid premature performance optimisation, and having to deal with a real database.

Team members closer to implementation work, such as developers or testers, often think about entities as system data. They will intuitively understand that an entity always exists in a specific location, but they might not be able to share their assumptions about this easily with other team members. People not used to working with system state, such as business representatives, might struggle with that idea. They tend to think about existence in a more absolute way. A helpful probing technique to start the conversation about this concept is to ask an extreme question.

  • What does it mean for a user not to exist? Were they not born yet?
  • What does it mean for a product not to exist? Was it not manufactured? Was it not even designed yet?

These questions might sound silly, but try them out and you’ll quickly see their value as conversation starters. Extreme questions often result in a hard “No”, and get people to start thinking about multiple contexts, a timeline or a scale. You will start identifying where an entity exists directly before it appears in your context, and what related information you can actually reason about at that point. Mathieu Roseboom’s suggestion to talk about a “person” is one potential outcome of such discussions.

The first set of answers to an extreme question is usually an overly generic statement. For example, the user might exist as a living person, but they don’t yet exist in “the system” or in “the database”. That’s a good starting point for the discussion, since we can now consider what’s known about an entity in different contexts. We can explore the domain much more easily.

Provide a meaningful domain name

A non-registered user likely exists as a living person, so we can talk about their personal name. They do not yet have a username in our system, so we can’t talk about it yet. They likely have an email, which is why so many responses to the challenge focused on that attribute. They might also have a preferred username in mind, which has nothing to do with them existing or not existing in our system. The When part can then become clearer:

When John Michaels attempts to register with the preferred username "john"

A good technique to continue the discussion about something generic such as “system” or “database” is to provide a meaningful domain name. In Domain-Driven Design terminology, domain representations of entity stores are called Repositories. When developers hear the word “repository”, they often think about a common technical pattern that involves specific technical operations. Ignore that for the moment – that’s an implementation detail. A repository is a meaningful first-order domain concept that encapsulates storing and retrieving data. Tell the rest of the team to imagine it as a box where you keep the users. If they can’t get over that, then think of a different name. I’ll keep using “repository” in this post.

Instead of a user existing or not existing in an absolute sense, we can potentially talk about the repository:

Given John does not exist in the user repository

Even better, let’s make the user repository the subject of the sentence. The repository exists regardless of the users:

Given a user repository that does not include a user called John

The real precondition here is that a user repository exists, with some specific constraints about its contents. There’s no more chicken-and-egg issue. We identified an important first-order domain concept that we can reason about, regardless of a specific user.

Instead of just saying “does not include a user”, which is still a bit vague, we can now start capturing the constraints of the user repository in a more specific way, using an approach very similar to what David Wardlaw suggested. Here’s how I’d start writing it:

Given a user repository with the following users
| username | personal name |
| mike     | Mike Smith    |
| steve123 | Steve James   |
When John Michaels attempts to register with the username "john"
Then the user repository should contain the following users:
| username | personal name |
| mike     | Mike Smith    |
| steve123 | Steve James   |
| john     | John Michaels |
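
A nice side effect of making the repository a first-order domain concept is that the automation layer can implement it in memory, so the Given step populates a simple object instead of a real database. The classes below are invented for illustration; only the SpecFlow table helper (CreateSet) is an existing API:

using System.Collections.Generic;
using System.Linq;
using TechTalk.SpecFlow;
using TechTalk.SpecFlow.Assist;

// Hypothetical domain types backing the steps below.
public class User
{
    public string Username { get; set; }
    public string PersonalName { get; set; }
}

public class InMemoryUserRepository
{
    private readonly List<User> users = new();
    public void Add(User user) => users.Add(user);
    public bool UsernameTaken(string username) => users.Any(u => u.Username == username);
}

[Binding]
public class UserRepositorySteps
{
    private readonly InMemoryUserRepository repository = new();

    [Given(@"a user repository with the following users")]
    public void GivenAUserRepositoryWithTheFollowingUsers(Table table)
    {
        // CreateSet maps the table columns ("username", "personal name")
        // onto the properties of the User class.
        foreach (var user in table.CreateSet<User>())
            repository.Add(user);
    }
}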

Add counter-examples

We now have a simple but concrete example. We’re not done yet; this is still just a good conversation starter. A good technique to continue the discussion is Simple-Counter-Boundary:

  1. Add a few more simple examples to show a range of potential outcomes, and discuss them.
  2. Try to provide counterexamples that disprove some of the proposed rules, and could lead to different outcomes. This often helps to identify additional attributes, and a different structure for the Given section of the scenario.
  3. After you have a good structure, start listing important boundaries that illustrate the key examples.

The first step is to add at least one simple example, which could lead to a different outcome:

When Steve James attempts to register with the username "steve123"

The outcome in this case might be obvious to everyone, but vary the data a bit. What happens if another “Steve James” attempts to register with the username “steveo”?

This can lead to an interesting conversation around the meaning of “uniqueness” and “existence”. Is the purpose of this feature to prevent the same person from creating multiple accounts in the system? If so, we should probably stop Steve from registering again, even with a different username. But if two different people called Steve James try to register, we should not prevent them. We need some other way of determining uniqueness, and personal names are obviously not good enough. Still, we might care about personal names in this scenario to ensure that we’re capturing them correctly in the repository. Are there any other attributes of a person that we should care about? This brings us back to the emails, suggested by several readers responding to the challenge. But how do we know that emails are the right thing to capture?

We can now start discussing the meaning of “unique” as relating to the person, and if emails are the right way to approach it. Some systems need to be lax about this, so they might let people register even with the same email. For example, a multi-tenant cloud app might want to allow opening sub-accounts with the same admin email to provide centralised billing. Those users will genuinely be different people, sharing a common email inbox. Many online systems today want to prevent abuse by validating emails, so they insist that every user has a unique email. Some systems need to be more strict, and they might enforce a unique mobile phone number. For government or banking purposes, even that’s not enough, and we might want to enforce unique date of birth, social security number, passport number or some other government-issued identifier. Different contexts will have different rules around this. Exposing those rules often leads to much better shared understanding, and also points to the attributes which should be captured for the key examples.

Discussing simple examples and counterexamples allows us to probe further into the model. What does it actually mean for a user to exist in our context? Does that mean the user repository has a record matching the specified email, or proposed username, or something else? How do we know that two users are not the same person? To keep things simple, let’s limit our case to just checking that the emails are different. We’ll say that a user already exists in the system if another account matches the same email, even if the proposed username is not taken. The simple scenario now evolves to capture the email as well:

Scenario: Users with a unique email should register successfully 

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
When John Michaels attempts to register with the username "john" and email "johnm@gmail.com"
Then the user repository should contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
| john     | John Michaels | johnm@gmail.com  |

We could add three or four more examples checking for duplicated emails and usernames in the same way, but this will very quickly become difficult to read.

If you’ve not yet read Alister Scott’s blog post about this challenge, now would be a good time to do it. He argues against misusing Given-When-Then to capture similar examples with small variances over and over again, suggesting that tables would be a better match for this case.

Tables are indeed a much better way of capturing related examples, but they can sometimes make it difficult to understand the purpose of a specific example from the group. Alister solves this nicely by pointing out differences between examples in an additional field, called Result. This field contains the explanation why a case failed, and what should have happened instead. For example, one of the results in his blog post is “Registration Unsuccessful – Reset password page displayed with email prefilled”. This points to another interesting thing we might want to check. Knowing that registration failed may not be enough – we need to know that it failed for the right reasons.

Extract scenario outlines

I agree with Alister that misusing Given-When-Then to copy and paste examples is a waste of time (and screen space). Tables are a good solution, and most Given-When-Then tools support listing tables of examples with scenario outlines. Scenario outlines use placeholders marked with <> inside a Given-When-Then scenario, followed by a table of examples with placeholder values. Here is how we could show two failure examples in the same structure:

Scenario Outline: Users with existing emails or usernames should be rejected

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
When Tony James tries to register with <requested username> and <email>
Then the registration should fail with <registration message> 
And the user repository should contain only the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |

Examples:

| requested username | email             | registration message      | 
| steve123           | steve2@yahoo.com  | Username taken            |
| steve456           | steve@yahoo.com   | Email already registered  |

Note that the When and Then statements contain placeholders relating to the table of examples, which is shown at the bottom.

Evolving Alister’s “result” idea, the registration message helps to explain what’s going on, but it’s also an important domain concept. Now that we have identified it, we can talk about what it should contain. For a closely controlled system, we might want to immediately remind an existing user of their username, so they can sign in easily instead of registering again or calling support staff. For an open public system, we might want to go in the other direction and mask the fact that an email is registered, preventing potential malicious actors from probing for accounts by trying to sign up.
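As a sketch of how that decision could surface in the examples (the exact wording is an assumption, not from Alister’s post), the same duplicate-email row might carry a different registration message depending on the policy. For the closely controlled system:

| requested username | email             | registration message                                                   |
| steve456           | steve@yahoo.com   | Email already registered – sign in as steve123 or reset your password |

and for the open public system, where we never confirm whether an email is registered:

| requested username | email             | registration message                                                   |
| steve456           | steve@yahoo.com   | Registration request received – check your email for the next step    |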

Showing a comment in the table of examples is quite a common trick to explain the purpose of individual examples. Some people prefer to use the first column for that. I tend to use the first column when it holds a generic comment or the name of the test, and plays no role in the testable outputs. If we actually want to check it against some system output (in this case the registration message), I prefer to list inputs on the left and outputs on the right side of the table.
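For contrast, a purely descriptive comment in the first column might look like this (a sketch – the case column is just a label and would not be checked against any system output):

| case               | requested username | email             | registration message      |
| duplicate username | steve123           | steve2@yahoo.com  | Username taken            |
| duplicate email    | steve456           | steve@yahoo.com   | Email already registered  |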

Probe for boundaries

The technique I mentioned earlier is Simple-Counter-Boundary, and we still need to do the third step. So far we have identified a few simple examples and some counter-examples. That’s a great start, because it gives us the structure for discovering boundaries: inspect each individual property in the Given section and probe it with further examples.

What if someone tries to register with the proposed username “Steve123”? How about the email “Steve@Yahoo.com”? An email written in a different case is still the same address, so the registration should probably fail. Some systems (notably Amazon AWS Cognito) do not perform case-insensitive username checks by default, so two people could register usernames that differ only in capitalisation. This is a support and maintenance nightmare, so let’s make sure we prevent it.

The nice thing about a scenario outline is that it’s very easy to add more examples. We can just append the following two cases to the table:

| Steve123           | steve2@yahoo.com  | Username taken            |
| steve456           | Steve@Yahoo.com   | Email already registered  |

If we really want to avoid support problems with accounts that are too similar, perhaps we should also prevent people from registering usernames that differ from existing accounts only in punctuation symbols.

| steve.456          | Steve3@Yahoo.com   | Username taken  |
| steve_456          | Steve3@Yahoo.com   | Username taken  |

Discussing boundaries often leads to more examples and helps identify hidden domain rules. And again, some systems will have different rules. Expose an example, discuss it, then decide whether you want to keep it or not. Perhaps the punctuation checking in usernames is too much or too complicated, so we can skip it for now. We could monitor whether it becomes a problem, and add it in a later iteration if necessary. In most teams, it’s up to developers and testers to offer this for discussion, and for business representatives to decide on priority and scoping.

Even if the username punctuation examples end up out of scope, thinking about them can lead you to consider similar boundaries for other input fields. Should we care about punctuation symbols in emails?

For example, Gmail allows users to put a dot anywhere in the local part of the address, or to append a plus sign and a label, so “steve.o@gmail.com” and “steveo@gmail.com” are actually the same physical account, as is “steveo+anything@gmail.com”. Popular email systems often have aliases, so “steve@gmail.com” is the same as “steve@googlemail.com”. How much do we care about preventing duplicated emails? Should we try to fight against commonly known cases such as these, or just ignore them? These are very specific domain questions, and the answers will depend on the risks you are trying to control. For a complex set of rules, there may be many more examples, and this table might become too big to read.
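To make that concrete, if we did decide to treat the common Gmail variants as duplicates, the extra rows might look roughly like this (a sketch only, reusing the existing mike@gmail.com account and an assumed rejection message):

| requested username | email                 | registration message      |
| mike2              | m.ike@gmail.com       | Email already registered  |
| mike3              | mike+spam@gmail.com   | Email already registered  |
| mike4              | mike@googlemail.com   | Email already registered  |

Three extra rows for a single provider already hint at how quickly such a table grows once alias handling enters the picture.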

The next challenge

This brings us to the challenge for next week. Let’s say that we actually do want to be a bit more strict about preventing the same person from registering twice, so all those Gmail hacks need to be handled. Plus, the support managers came back and said that they liked the idea of avoiding problems with usernames that are too similar. This will lead to dozens, if not hundreds, of examples. Alister also looks at examples around password complexity, which is another common aspect we might want to add to the registration feature. Putting it all into a single scenario outline is definitely not a good idea. How would you structure such a big list of examples so that it stays easy to understand and maintain?

Next week, we’ll publish an analysis of the responses and a proposed solution, along with some ideas on handling automation and avoiding database performance issues.

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇

This week’s challenge is the tricky problem of explaining a missing value.

What’s the best way to describe a Given for a value that’s not supposed to be there? For example, starting the specification from a point of a user who does not yet exist in a database, or a product that’s not been set up. “Given a user that doesn’t exist” sounds vague, but “Given user John that does not exist” sounds silly – how can we talk about John if he doesn’t exist yet?

Key examples should ideally be specific enough to avoid ambiguity, but it’s difficult to be specific about something that does not exist. Such problematic Given-When-Then scenarios are usually overly generic, and do not really help with shared understanding or testing. Here’s a typical trivial example that should ensure unique accounts:

Given user John does not exist
When John tries to register
Then the registration is successful

Given user John exists
When John tries to register
Then the registration fails

This set of scenarios seems simple, but it can hide many important constraints and provide a false sense of shared understanding. A team might think they have nailed down a feature, but develop something full of bugs. Scenarios such as these leave too much for later exploratory testing, forcing the team to discover seemingly unexpected problems that could easily have been predicted and documented with better examples.

As a trivial counter-example, consider what should happen if John tries to register with an email already assigned to a different user. For a more complex edge case, consider what should happen if two Johns tried to register at the same time. And what is “John” anyway in this case? Is it a personal name or a username? If it’s a personal name, should we really prevent two people with the same name from opening two different accounts?

Oversimplified examples often lead to overcomplicated test automation. Proving that a user successfully registered usually requires accessing a database, which means that the related tests will be slow and brittle. Accessing external systems is a huge performance penalty compared to in-process tests. Asynchronous networking introduces a whole set of technical edge cases that have nothing to do with user registration business rules.

Data persistence leads to issues around reproducibility. John might not exist in the database the first time you run a test, but he will certainly be there after that test completes. To make such a test repeatable, you’ll either need to introduce complex set-up or clean-up procedures, or make the examples even more generic by introducing placeholders for random or unique values.

Database resources are usually difficult to set up from scratch, so team members and test automation systems sometimes share testing databases, which can lead to issues around reproducibility and isolation. If two people execute tests against the same database at the same time, one might add data that the other does not expect, causing tests to fail intermittently without a good explanation.

Read on for a selection of good ideas we’ve received and our suggestions on how to handle scenarios such as this one.

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇

I’m very excited to partner with SpecFlow on a new series of weekly blog articles, aimed at helping you get the most out of Given-When-Then feature files.

Tips and tricks for better feature specifications

Earlier this year, before this damned virus situation, I published the results of research aimed at understanding what has changed in the Specification by Example space in the ten years since the book came out. One of the most surprising discoveries was how popular the Given-When-Then format has become. When I wrote the book, it was used by a tiny minority.

Now, more than 70% of the teams that drive specifications and tests with examples do so using Given-When-Then steps. No doubt one of the key reasons is that the format is very easy to get started with, but unfortunately, lots of people never move beyond the basics. That’s what we’re trying to fix with the “Given-When-Then With Style” challenge.

The goal of this series of articles is to help you build more successful software through better shared understanding, powered by examples.

  • For those new to specification by example, we’ll explore how to get started easily and how to avoid the most common problems.
  • For more experienced readers, we’ll cover how to capture examples through collaborative analysis, how to structure and organize feature files when dealing with complex domains, and how to ensure that documents stay easy to understand and maintain over a long period.
  • For the expert readers, we’ll cover tips and tricks on collaborative modeling, and how specification by example relates to other techniques.

Although the articles will be published on the SpecFlow website, they are tool-agnostic. You’ll be able to use most of the ideas with alternative tools, such as Cucumber, but also as more general techniques when doing collaborative analysis, even if you never automate any tests.

Each week, we’ll post a challenge on this web site, explaining a common problem people face when trying to capture examples with Given-When-Then steps. We invite the community to participate. Send your ideas and solutions, and the following week we’ll publish an analysis with answers.

List of published challenges

In case you have missed a challenge, here is an overview of our past challenges:

Do you have any challenges you’d like us to explore in one of our future posts?

Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.

PS: … and don’t forget to share the challenge with your friends and team members by clicking on one of the social icons below 👇