Solving: How to structure a long list of examples? #GivenWhenThenWithStyle

Last week, we cleaned up a vague scenario, transforming it into a relatively long list of concrete examples. The challenge for this week was to provide a good structure to capture all those examples in a feature file.

A large set of varied examples can be overwhelming. Examples that show important boundaries usually differ only in small details, so in a long list of examples many will contain similar information. Unfortunately, showing a lot of similar scenarios with only minor differences will make readers tune out and stop seeing the details, which is exactly the opposite of what we want to achieve with a good feature file.

Usually, a big part of the repetitive data comes from common preconditions. Developers often try to clean up such scenarios by applying programming design guidelines, such as DRY (“don’t repeat yourself”). They will remove all the duplication to achieve something akin to a normalised data set. This makes it easy to maintain and evolve the examples in the feature files, but it also makes it very difficult to understand individual scenarios. Readers have to remember too many things spread throughout a file to fully grasp the context of a single test case.

The key to capturing long lists of examples in a feature file is to find a good balance between clarity and completeness, between repeating contextual information and hiding it. Repeating the contextual information close to where it is used helps with understanding. On the other hand, repeating too many things too often makes it difficult to see the key differences (and makes it difficult to update the examples consistently). Here are four ways to restructure long lists of examples into something easy to understand and easy to maintain.

Identify groups of examples

For me, the first step when restructuring a long list of examples is usually to identify meaningful groups among them. Analyse common aspects to identify groups, then focus on clearly showing variation in each group.

Discovering meaningful groups of examples that have a lot in common allows us to provide just enough context to understand each group. Contextual information can be duplicated across groups, while the examples within each group focus on clearly showing the differences between them. There’s no hard rule about how many examples should go into each group, but in my experience anywhere between three and ten is fine. If a group contains more than ten examples, we start having the initial problem again within that group, where readers tune out. Fewer than three examples usually prompts me to ask whether we have covered enough boundaries to show a rule.
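If your examples are already captured in a structured form, the three-to-ten rule of thumb is easy to check. The following Python sketch is a hypothetical helper, not part of SpecFlow or Cucumber; it assumes the Examples groups have already been parsed into a dictionary mapping each group title to its rows.

```python
# Hypothetical lint for the "three to ten examples per group" rule of thumb.
# Assumes Examples tables were already parsed into {group title: [rows]}.
MIN_EXAMPLES = 3
MAX_EXAMPLES = 10


def check_group_sizes(groups: dict[str, list[dict[str, str]]]) -> list[str]:
    """Return a warning for every group outside the suggested size range."""
    warnings = []
    for title, rows in groups.items():
        if len(rows) < MIN_EXAMPLES:
            warnings.append(f"{title}: only {len(rows)} examples, are enough boundaries covered?")
        elif len(rows) > MAX_EXAMPLES:
            warnings.append(f"{title}: {len(rows)} examples, consider splitting the group")
    return warnings


groups = {
    "case sensitivity": [{"email": "Mike@gmail.com"}, {"email": "mike@Gmail.com"}],
    "gmail aliases": [
        {"email": "mi.ke@gmail.com"},
        {"email": "mike+test@gmail.com"},
        {"email": "mike@googlemail.com"},
    ],
}
for warning in check_group_sizes(groups):
    print(warning)
```

Treat the output as a prompt for a conversation rather than a hard failure; the limits are a rule of thumb.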

Each group should have a specific scope. On a very coarse level, groups can demonstrate a specific business rule. If there are too many examples to demonstrate a business rule fully, then structure groups around answering individual questions about that rule. If there are too many examples to fully answer a specific question, create groups around demonstrating a specific boundary or problem type.

Example mapping, a scoping technique for collaborative analysis promoted by Matt Wynne from the Cucumber team, turns this idea of groups of examples into a fully fledged facilitation technique. It starts by creating a breakdown of scope into topics, questions and groups of examples. If you start with example mapping, the map itself may provide good hints about grouping examples. You may still want to restructure individual groups if they end up too big.

Capture the variations

Within each group, try to identify the things that really change. Scenario outlines, mentioned in the previous post, are a nice way of splitting the data from examples into two parts. One part is common to the whole group, and you can specify it in the Given/When/Then section of the scenario outline. The other part shows what’s really different between individual examples, and you can specify it within the Examples block, following the Given/When/Then.
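Conceptually, test automation turns a scenario outline into one concrete scenario per row of the Examples block, substituting each placeholder in the common steps with that row’s value. Here is a simplified Python model of that expansion. It is an illustration under my own assumptions, not how SpecFlow or Cucumber implement it internally; real tools do considerably more, such as binding steps to code.

```python
import re

# Matches placeholders such as <email> or <registration message>.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Produce one concrete scenario (a list of steps) per Examples row."""
    scenarios = []
    for row in examples:
        # Only columns named by a placeholder are looked up; other columns are unused.
        concrete = [PLACEHOLDER.sub(lambda match: row[match.group(1)], step) for step in steps]
        scenarios.append(concrete)
    return scenarios


steps = [
    "When Steve James tries to register with username steve123 and <email>",
    "Then the registration should fail with <registration message>",
]
examples = [
    {"email": "mi.ke@gmail.com", "registration message": "Email already registered"},
    {"email": "Mike@gmail.com", "registration message": "Email already registered"},
]
for scenario in expand_outline(steps, examples):
    print(scenario)
```

Because the substitution only looks up columns that appear as placeholders, any extra column in the table has no effect on the generated steps.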

As an illustration, when demonstrating rules around detecting duplicated emails during registration, we might want to show some more contextual information. To avoid potential bad assumptions about other types of errors (such as duplicated usernames), we might also want to show the proposed username and the name of the person registering. That information is not critical for each individual example, but it helps to understand the context. Instead of repeating it for each case, we can capture it once for each group, in the When step.

Scenario Outline: duplicated emails should be prevented

   Email is case insensitive. Gmail is a very popular system, so
   many users will register with Gmail addresses. Sometimes they use
   Gmail aliases or labels.
   To prevent users mistakenly registering for multiple accounts,
   the user repository should recognise common Gmail tricks.

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with <registration message> 

Examples:

| email               | registration message      | 
| mi.ke@gmail.com     | Email already registered  |
| mike+test@gmail.com | Email already registered  |
| Mike@gmail.com      | Email already registered  |
| mike@Gmail.com      | Email already registered  |
| mike@googlemail.com | Email already registered  |

Notice that in this case the message is always the same, so there isn’t much point in repeating it. A table column that always has the same value is a good hint that it can be removed. A table column that only has two values in a group of 10 examples perhaps suggests that the group should be split into two sets of five examples.
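These hints can be checked mechanically. The sketch below is a hypothetical Python helper, assuming each Examples row is a dictionary keyed by column header. It lists columns with a single value (candidates for moving into the common steps) and columns with exactly two values (hints that the table may hide two groups).

```python
def analyse_columns(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Point out Examples columns that never vary or only take two values."""
    constant, split_candidates = [], []
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) == 1:
            # Same value everywhere: move it into the Given/When/Then steps.
            constant.append(column)
        elif len(distinct) == 2:
            # Two values only: the table may really be two groups.
            split_candidates.append(column)
    return {"constant": constant, "split candidates": split_candidates}


rows = [
    {"email": "mi.ke@gmail.com", "registration message": "Email already registered"},
    {"email": "mike+test@gmail.com", "registration message": "Email already registered"},
    {"email": "Mike@gmail.com", "registration message": "Email already registered"},
]
print(analyse_columns(rows))
# {'constant': ['registration message'], 'split candidates': []}
```

The two-value hint only means something for larger groups; in a table of three or four rows, two distinct values are normal.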

We can move the values that are always the same to the common part of the scenario outline:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples:

| email               | 
| mi.ke@gmail.com     | 
| mike+test@gmail.com |
| Mike@gmail.com      |
| mike@Gmail.com      |
| mike@googlemail.com |

This set of examples is concise, but perhaps it’s too brief. It’s not clear why these examples need to be in the spec. With the earlier set of examples, the registration message helped to explain what’s going on. But the messages were the same for all these examples, so they were not pointing at differences between examples. We had to explain that in the scenario context, which is good, but we can do even better. When there’s nothing obvious in the domain to serve that purpose, give each example a meaningful name. You can, for example, introduce a comment column that will be ignored by test automation:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples:

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| emails should be case insensitive                          | Mike@gmail.com      |                           
| domains should be case insensitive                         | mike@Gmail.com      | 
| googlemail is an equivalent alias for gmail                | mike@googlemail.com |
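The comments double as a specification for the matching logic. Purely as an illustration of what these examples demand from an implementation, here is a Python sketch of a canonical form for email addresses, assuming that two addresses count as duplicates when their canonical forms are equal. The function name and approach are my assumptions; the rules encoded are only the ones listed in the comment column.

```python
# Assumption: duplicates are detected by comparing canonical forms.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical_email(email: str) -> str:
    """Reduce an address to the form used for duplicate detection."""
    local, _, domain = email.rpartition("@")
    # Emails and domains should be case insensitive.
    local, domain = local.lower(), domain.lower()
    if domain in GMAIL_DOMAINS:
        # Google allows labels: everything after "+" is ignored.
        local = local.split("+", 1)[0]
        # Google ignores dots, so mi.ke is equal to mike.
        local = local.replace(".", "")
        # googlemail.com is an equivalent alias for gmail.com.
        domain = "gmail.com"
    return f"{local}@{domain}"


existing = canonical_email("mike@gmail.com")
for candidate in ["mi.ke@gmail.com", "mike+test@gmail.com", "Mike@gmail.com",
                  "mike@Gmail.com", "mike@googlemail.com"]:
    print(candidate, canonical_email(candidate) == existing)
```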

Identifying meaningful groups, and then structuring the examples into scenario outlines based on those groups, allows us to provide just enough context for understanding each group. It also allows us to use a different structure for each set of examples. The scenario outlines around duplicated usernames will have different When and Then clauses (perhaps using a common email to show context). The examples themselves in that group would show variations in usernames. The When and Then steps will look similar to the ones we used previously, but will likely use different placeholders from the examples that demonstrate email rules. The scenario outlines for password validity checks will have a totally different structure – perhaps not even showing the personal name and email.

Another useful trick to keep in mind with scenario outlines is that a single outline can have many example groups. If the list of examples around duplicated emails becomes too big, you can just split it into several groups. When doing this, I like to add a title to each group of examples, to show its scope. This allows us to ask further questions and identify further examples. For example, we only have two examples around case sensitivity. Thinking a bit harder about additional edge cases for that rule, we can start considering Unicode problems. Unicode normalisation tricks can allow people to spoof data easily by abusing lowercase transformations in back-end components, and we might want to add a few examples to ensure this is taken care of.
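To see why the extra Unicode examples matter, consider a check that simply lowercases both addresses. The Python sketch below uses the standard `unicodedata` module; `fold` is a hypothetical helper name, and NFKC normalisation followed by case folding is one common approach rather than a complete defence against lookalike characters.

```python
import unicodedata

# "\u1d35" is MODIFIER LETTER CAPITAL I, so this renders as mᴵke@gmail.com.
# str.lower() has no mapping for it, so a naive comparison misses the duplicate.
spoof = "m\u1d35ke@gmail.com"
print(spoof.lower() == "mike@gmail.com")  # False


def fold(text: str) -> str:
    # NFKC maps compatibility characters such as ᴵ to their plain form (I),
    # then casefold() performs caseless matching.
    return unicodedata.normalize("NFKC", text).casefold()


print(fold(spoof) == "mike@gmail.com")  # True
```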

Here is what an evolved scenario outline could look like:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
When Steve James tries to register with username steve123 and <email>
Then the registration should fail with "Email already registered"

Examples: uppercase/lowercase aliases should be detected

| comment                                                    | email               |
| emails should be case insensitive                          | Mike@gmail.com      |
| domains should be case insensitive                         | mike@Gmail.com      |
| unicode normalisation tricks should be detected            | mᴵke@gmail.com      |

Examples: gmail aliases should be detected

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| googlemail is an equivalent alias for gmail                | mike@googlemail.com |

Use a framing example

As we start discussing more specific rules around preventing duplication, the structure of examples will change to reflect that. Each set of examples may have a different structure, focused on the key aspects that it is trying to show. That’s how you can avoid overly complex examples and tables with too much data. However, explaining a single feature through lots of different scenario outlines with varying structures might make things difficult to grasp at first. To make a feature file easy to understand, I like to use a framing scenario first. That scenario should be simple in terms of domain rules, and it should not try to explain difficult boundaries. It’s there to help readers understand the structure, not to prevent problems. A good choice is usually a “happy day” case that demonstrates the common flow by showing the full structure of data. For example, this could be the successful registration case. The framing scenario can then be followed by increasingly complex scenarios or scenario outlines. For example, I would first list the generic email rules, then follow that with system-specific rules such as the ones for GMail.

Feature: Preventing duplicated registrations

Scenario: Successful registration
...
Scenario: allowing duplicated personal names
...
Scenario Outline: preventing duplicated emails
...
Scenario Outline: preventing GMail aliases
...

(Maybe) extract common preconditions into a background

With the framing scenario structure, you will sometimes find preconditions shared among all scenarios. In our example, the initial users in the repository might be the same for all scenarios. When that happens, you can avoid duplication by moving common preconditions from individual scenarios into a Background section. When running the feature, automation tools execute the background steps before each individual scenario.
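A rough Python model of that behaviour, assuming a scenario is just a name and a list of step strings (hooks, tags, and step bindings are deliberately left out):

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    steps: list[str]


def apply_background(background: list[str], scenarios: list[Scenario]) -> list[Scenario]:
    """Return the scenarios as they are executed: background steps first."""
    # Build new step lists so the original scenarios are left unchanged.
    return [Scenario(s.name, background + s.steps) for s in scenarios]


# The users table is abbreviated to a single line for brevity.
background = ["Given a user repository with the following users: mike, steve123"]
scenarios = [
    Scenario("Users with unique data should be able to register",
             ['When John Michaels attempts to register with the username "john"']),
    Scenario("Personal names do not have to be unique",
             ['When Mike Smith attempts to register with the username "john"']),
]
for scenario in apply_background(background, scenarios):
    print(scenario.name, scenario.steps)
```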

When doing this, beware of hiding too much. The background section is useful only when the actual data is relatively simple, so people can remember it. A common pitfall with feature file backgrounds is that they become quite complex, and readers lose track of the feature before they even get to the interesting part. If you can’t keep the common background very simple, it’s perhaps worth introducing just the minimum required inputs within each scenario outline itself.

Here is what a fully structured feature file, with a background section and a framing scenario, could look like:

Feature: Preventing duplicated registrations

   To prevent users mistakenly registering for multiple accounts
   the user repository should reject registrations matching
   existing users

Background:

Given a user repository with the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |

Scenario: Users with unique data should be able to register

When John Michaels attempts to register with the username "john" and email "johnm@gmail.com"
Then the user repository will contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
| john     | John Michaels | johnm@gmail.com  |

Scenario: Personal names do not have to be unique

When Mike Smith attempts to register with the username "john" and email "johnm@gmail.com"
Then the user repository will contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |
| john     | Mike Smith    | johnm@gmail.com  |

Scenario Outline: Usernames should be unique

  Detecting simple duplication is not enough, since usernames that are visually
  similar may lead to support problems and security issues. See 
  https://engineering.atspotify.com/2013/06/18/creative-usernames/ for more information.

When Steve James tries to register with <requested username> and "steve5@gmail.com"
Then the registration should fail with "Username taken"
And the user repository should contain the following users:
| username | personal name | email            |
| mike     | Mike Smith    | mike@gmail.com   |
| steve123 | Steve James   | steve@yahoo.com  |

Examples:

| comment                     | requested username | 
| identical username          | steve123           |
| case difference             | Steve123           |
| unicode normalisation       | sᴛᴇᴠᴇ123           |
| interpunction difference    | steve123.          |

Scenario Outline: Duplicated emails should be disallowed

When Steve James tries to register with username steve5 and <email>
Then the registration should fail with "Email already registered"

Examples: uppercase/lowercase aliases should be detected

| comment                                                    | email               |
| emails should be case insensitive                          | Mike@gmail.com      |
| domains should be case insensitive                         | mike@Gmail.com      |
| unicode normalisation tricks should be detected            | mᴵke@gmail.com      |

Examples: Gmail aliases should be detected

   Gmail is a very popular system, so many users will register
   with Gmail addresses. Sometimes they use Gmail aliases or labels.
   To prevent users mistakenly registering for multiple accounts,
   the user repository should recognise common Gmail tricks.

| comment                                                    | email               |
| google ignores dots in an email, so mi.ke is equal to mike | mi.ke@gmail.com     |
| google allows setting labels by adding +label to an email  | mike+test@gmail.com |
| googlemail is an equivalent alias for gmail                | mike@googlemail.com |
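The username examples need more than an equality check. One possible approach, sketched below in Python as an assumption rather than something the feature prescribes, compares usernames through a normalised key. NFKC does not decompose the small capital letters used in the sᴛᴇᴠᴇ123 example, so the sketch adds a small hand-written mapping (hypothetical, covering only those letters); a real system would more likely rely on Unicode confusables data.

```python
import string
import unicodedata

# Hypothetical lookalike map for the small capitals used in the examples
# (U+1D1B small capital T, U+1D07 small capital E, U+1D20 small capital V).
# NFKC leaves these letters unchanged, so they need explicit handling.
SMALL_CAPITALS = str.maketrans({"\u1d1b": "t", "\u1d07": "e", "\u1d20": "v"})


def username_key(username: str) -> str:
    """Reduce a username to a key; equal keys mean the username is taken."""
    text = unicodedata.normalize("NFKC", username).translate(SMALL_CAPITALS)
    text = text.casefold()
    # Treat ASCII punctuation as insignificant, so "steve123." matches "steve123".
    return "".join(ch for ch in text if ch not in string.punctuation)


taken = username_key("steve123")
for requested in ["steve123", "Steve123", "s\u1d1b\u1d07\u1d20\u1d07123", "steve123."]:
    print(requested, username_key(requested) == taken)
```

Stripping all punctuation is a design choice implied by the trailing-dot example; whether steve.123 should also collide with steve123 is exactly the kind of question worth turning into another example.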

The next challenge

The challenge for this week is a bit more technical – dealing with situations that require waiting.


Stay up to date with all the tips and tricks and follow SpecFlow on Twitter or LinkedIn.
