Set up complex relationships | Gherkin examples

This challenge seems to be the #1 problem with Given-When-Then, according to many people who filled in our survey on this article series. I’ve combined the question from several similar responses:

How to set up complex relationships for the “Given” part (things with 1:n relations and conditions on the children), so that they are easy to follow.

Here is the challenge

This issue mostly appears with systems that have a big database, and strict data storage requirements. Sometimes it can also happen if there is no database, but the in-memory object model requires a complex network of collaborators to fully operate on something we need to test.

For example, consider an online retailer trying to test order refunds. The retailer buys items from multiple providers, and sells them to customers. To test an order refund, we must first have a valid order. An order must contain at least one (but usually many) order items, which can’t exist in a vacuum. For each order item, we must have a corresponding inventory item. Inventory items must have a name, quantity, and provider. In order to set up a provider, we first need to set up one or more purchase contracts, and for each contract we must set up a billing schedule, and several types of provider contacts (billing, technical, logistics). A valid order also must be created by an active customer. To create the customer we need a payment method, which must be validated first. For personal customers, we have just one contact email. For corporate customers, we might also need to add multiple types of contacts (billing, technical, logistics). Some of that set-up can take a long time, as corporate customer contacts need to be validated by an operator to avoid issues. By the time you’re reading this sentence, you probably forgot what we wanted to test at the start of the paragraph. With concrete data in several rows and columns under each of the sentences, there’s very little chance that anyone will understand what’s going on.

An easy way out of this situation would be to break the model – relax the constraints so the system becomes more easily testable (“Let’s allow orders without any items?”). Although in some cases you may be able to get away with this, to make the challenge more interesting, we’ll want to directly test the functionality involving end-points in that graph of relationships. Here are the rules to try to capture in a good way:

  1. If a customer is asking for a refund because the order was not delivered, we take the responsibility for it. The customer gets a refund (using the original payment method from the order), and a notification about the successful refund. For corporate customers, both the billing and the logistics contacts should receive the notification. The provider is not going to be involved at all in this.
  2. If a customer is asking for a refund because the order was delivered, but one of the items was defective, then the provider of that item takes the responsibility. The customer gets a refund (using the original payment method from the order), with contact notifications as in the previous option, but we also immediately contact the provider about it. The provider billing contact should get a notification about a refund so they can credit our account. The technical contact should get the information supplied by the customer in the refund request, along with the defective item information from the order, and customer’s technical contact information so they can follow up directly. If an order contains items from multiple providers, only the provider supplying the defective item should be contacted, the others should not.

Your challenge is to capture the two requirements in a Given-When-Then spec that’s easy to follow. If you avoid listing any of the information directly in the feature file, which you absolutely should try to do, then also explain briefly how the automation layer is supposed to fill in the missing pieces of the puzzle. (How should it connect the information from the feature file to database items? What’s being set up and how?).

Solving: How to set up complex relationships?

The challenge was to manage complex relationships in test set-ups, especially when creating a whole network of collaborator objects, but keep the test easy to read and understand.

For a detailed explanation of the problem, check out the original challenge post. This article is a summary of the community responses and has some additional ideas on how to solve similar problems.

Manage storage constraints outside Given-When-Then files

Consistency requirements enforced by a database, or by the object model, are a large part of the problem with complex data setup. An inventory item needs a provider, the provider requires a purchase contract, with each contract in turn depending on a billing schedule. We might not care about the details of all those objects for a specific test case, but we can’t avoid setting them up. The typical – but not so good – solution is to list all these objects with all their properties explicitly in the background section of a feature file. There are two major perceived benefits of that approach:

  1. The data is completely visible to the readers of a Given-When-Then file
  2. The set-up step implementation can be relatively generic and simple

The first perceived benefit is great in theory, but it's usually wasted because of the overwhelming amount of information. Complex object networks tend to be difficult to read and understand, so even though the information might be in a feature file, readers can't consume it easily.

The second perceived benefit is just plainly wrong. It’s a wrong local optimisation. By creating generic test set-ups, we might be saving programming time, but we’ll lose a lot more in trying to understand and maintain complexity in plain text. As a general guideline, avoid trying to do complex coding tasks in Given-When-Then scenarios. Push that complexity to a programming language environment where you have proper support for loops, conditions, type checking and full IDE tooling. Focus on clarity and understanding in executable specifications.

To make the important data visible to the readers of a feature file, we’ll need to deal with all the transitive relationships and storage constraints in the step implementations, not in the feature descriptions or scenario set-ups. There are three good ways of achieving that:

  • Object factories
  • Golden Source databases
  • Object finders

I’ll explain each of these in the following sections.

Use object factories to construct complex networks from attributes

Factory methods are one of the traditional object design patterns, mentioned in the original Gang-of-Four book. The pattern is a typical solution for situations where the process of creating an object is complex, and not appropriate for the local class constructor. In that sense, it matches the situation of complex data set-ups perfectly.

To implement this pattern for Given-When-Then scenarios, I usually create a separate utility class, so I can use it from many step implementations. This allows me to limit the scenario description to the bare essentials needed for a test, such as the one below:


Given a "not-delivered" refund request 

The implementation of this step could call the RefundRequestFactory object and just pass ‘not-delivered’ as the reason. The RefundRequestFactory would set up the customer, the orders, the payment methods, the inventory items, the providers and the billing contracts as needed. For situations where we need to specify a bit more about the scenario starting point, for example in order to test that the refunded amount matched the order amount minus the fees, we can make the factory take a few more parameters. I usually do that by allowing a table of properties that will be passed directly to the factory.


Given a "not-delivered" refund request for
| order amount | customer email    |
| 100 USD      | test@customer.com |

The major benefit of this approach is that it can be very flexible. Factories can provide default values for all non-essential properties and collaborators, and ensure that the provided attributes are correctly mapped. Although the order amount and customer email belong to different levels of a hierarchy, we can specify them in a flat list in the Given-When-Then scenario. The factory can deal with distributing the property values to the right objects. Factories are also a very effective way to reduce duplication. For example, the order amount may be copied to invoices, refund requests and account postings, but we don’t need to specify it three times. The factory can ensure that the dependent objects match the request. With more complex object relationships, the collaborators might have their own factories. So a RefundRequestFactory might just call the OrderRequestFactory to build the bulk of its dependencies. This is another good reason for pushing the object construction into code, and away from feature files. Other similar objects can just reuse the OrderRequestFactory when needed.
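As a rough illustration of the idea, here is a minimal Python sketch of such a factory. Only the names RefundRequestFactory and OrderRequestFactory come from the text above. The data classes, their fields, the default values and the `create` signature are assumptions made up for this example, and a real factory would also persist the objects and build contracts, billing schedules and contacts.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List

# Hypothetical domain model, reduced to a few fields for illustration.
@dataclass
class Provider:
    name: str = "Default Provider"
    billing_contact: str = "billing@provider.example"
    technical_contact: str = "tech@provider.example"

@dataclass
class InventoryItem:
    name: str = "Default Item"
    quantity: int = 10
    provider: Provider = field(default_factory=Provider)

@dataclass
class Customer:
    email: str = "default@customer.example"
    payment_method: str = "validated-test-card"

@dataclass
class Order:
    customer: Customer
    items: List[InventoryItem]
    amount: Decimal
    currency: str

@dataclass
class RefundRequest:
    reason: str
    order: Order
    amount: Decimal


class OrderRequestFactory:
    """Builds a valid order, filling in defaults for everything not specified."""

    def create(self, order_amount="50 USD", customer_email=None):
        value, currency = order_amount.split()
        customer = Customer(email=customer_email) if customer_email else Customer()
        return Order(customer=customer, items=[InventoryItem()],
                     amount=Decimal(value), currency=currency)


class RefundRequestFactory:
    """Delegates the bulk of the set-up to the order factory."""

    def __init__(self, order_factory=None):
        self.order_factory = order_factory or OrderRequestFactory()

    def create(self, reason, properties=None):
        # Turn flat table headings ("order amount") into keyword names ("order_amount"),
        # so the factory can route each value to the right object in the hierarchy.
        kwargs = {key.strip().replace(" ", "_"): value.strip()
                  for key, value in (properties or {}).items()}
        order = self.order_factory.create(**kwargs)
        # The refund amount is copied from the order, so the scenario states it only once.
        return RefundRequest(reason=reason, order=order, amount=order.amount)


# What a step implementation might do with one row of the table from the scenario.
request = RefundRequestFactory().create(
    "not-delivered",
    {"order amount": "100 USD", "customer email": "test@customer.com"},
)
```

The step implementation stays a one-liner, and the scenario never has to mention providers or inventory items, even though the resulting order has them.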

Another benefit of object factories compared to other approaches is that the set-up process is easy to version, and relatively easy to change. It’s all contained in a single class, so programmers can easily update it, and track changes through history.

The downside of the factory approach is that the process can be quite slow if the collaborator objects need to be saved to external storage (for example, a database). Combining factories and databases can also cause problems for multiple concurrent test runs, as factories may be creating overlapping objects.

Use Golden Source databases for external storage

‘Golden Source’ (also known as ‘Golden Record’ or ‘Master Copy’) databases are the polar opposite of object factories. Instead of relying on dynamic creation, these databases contain a well-known starting point for the key reference data of an application. For example, we might pre-populate a database with a set of inventory items, providers, billing schedules and contact information. An individual scenario does not need to set up any of that data, as long as it knows what to expect in the database.

The key trick for using golden source data is to use identifiers that imply the underlying references. For example, “Unavailable_Item” could be a good identifier for an inventory item that is no longer available. The key risk, conversely, is to use generic identifiers that make it difficult to understand the scenario.

The benefit of this approach is that database setup for individual scenarios is usually very fast, so it speeds up feedback.

The downside of this approach is that the data can become obscure, and that people may have incorrect assumptions about the relationships. “Unavailable_Item” might mean a completely different thing to different parts of the business.

Another common issue with golden source databases is that versions are very difficult to control. Database storage files are usually binary, and they don’t work nicely with modern version control systems.

A potential way to manage golden data sources in a more controlled way is to use a set of SQL scripts as the primary source, and then create the binary database files from scratch. The SQL scripts are easy to store in version control systems. However, this requires rebuilding the database from scratch every time, so it can slow down the testing process. Because of that, full data set-up is usually not done for each test, but instead just once for the entire test suite, or even just when the SQL scripts change. Keeping SQL scripts in a version control system, and using a live “testing” database that is automatically built from those scripts but kept outside version control, often provides a good balance between confidence and feedback speed.
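A small Python sketch of building a golden source from scripts might look like the following. It uses the standard sqlite3 module and an in-memory database so that it runs anywhere. The table layout, the identifiers other than “Unavailable_Item”, and the helper names are all assumptions for illustration. In a real project the scripts would be separate files under version control rather than strings.

```python
import sqlite3

# Hypothetical golden-source scripts, inlined here to keep the example self-contained.
GOLDEN_SCRIPTS = [
    # Schema script
    """
    CREATE TABLE providers (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE inventory_items (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        provider_id TEXT NOT NULL REFERENCES providers(id)
    );
    """,
    # Reference data script, using identifiers that explain themselves
    """
    INSERT INTO providers VALUES ('Reliable_Provider', 'Reliable Provider Ltd');
    INSERT INTO inventory_items VALUES ('Available_Item', 'Widget', 25, 'Reliable_Provider');
    INSERT INTO inventory_items VALUES ('Unavailable_Item', 'Gadget', 0, 'Reliable_Provider');
    """,
]


def build_golden_database(path=":memory:"):
    """Create the test database from scratch by running every script in order."""
    connection = sqlite3.connect(path)
    for script in GOLDEN_SCRIPTS:
        connection.executescript(script)
    connection.commit()
    return connection


def find_item(connection, identifier):
    """Look up a well-known inventory item by its descriptive identifier."""
    return connection.execute(
        "SELECT name, quantity FROM inventory_items WHERE id = ?", (identifier,)
    ).fetchone()


# Built once for the whole suite; scenarios then refer to items by identifier only.
golden = build_golden_database()
unavailable = find_item(golden, "Unavailable_Item")
```

Passing a file path instead of `":memory:"` would produce the binary database file that stays outside version control.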

Another issue with a single shared golden source is that the data is easy to mess up. One test can change the inventory status of the “Unavailable_Item” and all of a sudden we’ll get a whole bunch of unexpected test failures. There are two good workarounds for that:

  • using database transactions to reverse changes
  • limiting golden data sources only to immutable reference data

Wrap tests into database transactions

Most relational databases provide transactions as a way of isolating concurrent processes and batching operations. By wrapping a test run into a database transaction, we can roll back the transaction at the end and just undo any changes to the data. With SpecFlow, the usual way to implement this would be to set up before/after scenario hooks.
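The same idea can be sketched in Python with the standard sqlite3 module. The two hook functions below are modelled on the scenario hooks that the Python behave library looks for in its environment file, but here they are plain functions called by hand so the example runs on its own. The `shared_connection` attribute and the table are assumptions for illustration.

```python
import sqlite3
from types import SimpleNamespace


def before_scenario(context, scenario):
    # Hand the shared connection to the steps. With default settings, sqlite3
    # opens a transaction implicitly before the first data-modifying statement.
    context.db = context.shared_connection


def after_scenario(context, scenario):
    # Undo everything the scenario changed, without knowing what it was.
    context.db.rollback()


# Golden data, committed once before any scenario runs.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE inventory_items (id TEXT PRIMARY KEY, quantity INTEGER)")
connection.execute("INSERT INTO inventory_items VALUES ('Unavailable_Item', 0)")
connection.commit()

# Simulate one scenario that modifies shared data and never commits.
context = SimpleNamespace(shared_connection=connection)
before_scenario(context, scenario=None)
context.db.execute("UPDATE inventory_items SET quantity = 5 WHERE id = 'Unavailable_Item'")
after_scenario(context, scenario=None)

quantity = connection.execute(
    "SELECT quantity FROM inventory_items WHERE id = 'Unavailable_Item'"
).fetchone()[0]
```

After the rollback, the quantity read back is the committed golden value again. Note that this only works because the scenario itself never calls `commit()`.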

The benefit of this approach is that it is very easy to implement technically, and that it’s relatively generic. A test framework doesn’t need to know or care about data changes in individual tests. It can just roll back the current database transaction.

The downside of this approach is that it cannot be used to test processes that explicitly manage transactions, or coordinate distributed systems. For example, an API call might explicitly commit the changes to a database, and a subsequent rollback might then not completely clean up everything. With distributed systems, collaborators may not be able to see uncommitted data, so this approach is not applicable.

Many databases can also lock out readers in case of uncommitted data changes to records, so transactions can also limit our ability to run concurrent tests.

Split transactional and reference data

An alternative workaround is to commit the changes to the database, but ensure that the key data is not modified. Usually, this involves splitting the data into reference and transactional information.

Reference information is relatively static, key set-up information, designed to be the same for all tests. For example, billing schedules, provider information and item inventory set-up could be reference data for an order management system. That kind of information could be relatively small and generic.

Transactional information is dynamic, created or modified by individual test cases. For example, orders, refunds and notifications would be transactional data for an order management system.

A well designed split between transactional and reference data also makes it easier to manage the golden data set, since we can keep the SQL scripts minimal and restore databases more easily.

The problem with this approach is that it’s difficult to make a clean cut between the two categories. Customer information may fit into both, depending on the perspective. We can optimise the test set-up by creating a few customers upfront, and then use them to create orders and request refunds. Alternatively, we can make tests more isolated by creating a new customer for each test. This is further complicated if the same test suite runs different types of tests. For example, the inventory items might need to behave like transactional data for tests related to inventory management, but they can be reference data for tests related to order refunds.

Use object finders

Factories work well for in-memory systems. Golden data sources work well for databases, but can get tricky with modifications. A third popular approach for solving the complex data set-up issues is to combine the two, and use a pre-populated database that can contain partial information, complementing it with a factory that knows how to fill in the missing information.

Object finders are usually responsible for creating an object with all its dependencies, but unlike factories they start from an existing data source. For example, if a scenario requires a valid credit card, the finder might look into the database to find a customer with a valid card and return it. If it doesn’t find anything, it can use any existing customer, create a new valid credit card and associate it with the customer. If there are no customers, it can create a valid customer object and so on. The process resembles a factory, but it treats the existing database like a temporary cache. You can decide how deep the finder goes, and at what point it gives up.

The benefit of the finder approach is that it’s much easier to ensure data isolation than just with a golden data source. For example, a step implementation may suggest to a finder that it wants to later modify a customer object, so the finder can clone an existing customer and return a modifiable copy instead of a shared reference. This allows finders to deal with different use cases, and avoid polluting reference data. A finder can treat inventory information as reference data for refund tests, but it can work with the same records as transactional data for inventory tests.
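A minimal Python sketch of a finder could look like this. A plain list stands in for the pre-populated database, and the class names, fields and the `for_modification` flag are assumptions made up for the example rather than an established API.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Card:
    number: str
    valid: bool = True


@dataclass
class Customer:
    email: str
    cards: List[Card] = field(default_factory=list)


class CustomerFinder:
    """Finds suitable existing data first, and creates only what is missing."""

    def __init__(self, store):
        self.store = store  # stand-in for the pre-populated database

    def with_valid_card(self, for_modification=False):
        # 1. Prefer an existing customer that already has a valid card.
        customer = next(
            (c for c in self.store if any(card.valid for card in c.cards)), None
        )
        if customer is None:
            # 2. Otherwise reuse any customer, or 3. create one, then attach a card.
            customer = self.store[0] if self.store else self._create_customer()
            customer.cards.append(Card(number="generated-test-card"))
        if for_modification:
            # The caller intends to change the data, so return an isolated copy
            # and leave the shared reference record untouched.
            clone = copy.deepcopy(customer)
            clone.email = f"copy-{len(self.store)}-{customer.email}"
            self.store.append(clone)
            return clone
        return customer

    def _create_customer(self):
        customer = Customer(email="generated@customer.example")
        self.store.append(customer)
        return customer


store = [Customer("shared@customer.example", [Card("reference-card")])]
finder = CustomerFinder(store)
shared = finder.with_valid_card()                        # read-only use: shared record
private = finder.with_valid_card(for_modification=True)  # modifiable clone
private.cards[0].valid = False                           # does not affect the shared record
```

The step implementation declares its intent through a single flag, and the finder decides whether cloning is needed.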

The downside of the finder approach is that it is the most complicated of all. We need to manage both object construction and database maintenance, and scenarios need to correctly report what they want to modify, and what they just want to read.


Although there is no perfect solution for all cases, the three approaches to constructing complex object networks all push the mess away from feature files into step implementation. They differ in terms of performance and ease of maintenance.

If you want to test in-memory systems, go with the object factory. If you must talk to an external database, consider how long creating the whole network takes each time, and whether this is too slow for your tests. If not, use an object factory again. If this would be too slow, then check if you can run tests in transactions easily. If so, a golden data source might be a better option. If you can’t run transactions easily, or if the golden data sources would take too long to set up, object finders are probably the best option.
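The decision process above can be written down as a small function, which some teams may find handy as a checklist. This is only a restatement of the guideline in Python. The function and parameter names are made up for the example, and the yes/no inputs are judgement calls rather than measurable thresholds.

```python
def choose_setup_approach(
    uses_external_database,
    factory_setup_too_slow=False,
    transactions_easy=False,
    golden_source_setup_too_slow=False,
):
    """Return the suggested approach for setting up complex object networks."""
    if not uses_external_database:
        return "object factory"            # in-memory systems
    if not factory_setup_too_slow:
        return "object factory"            # database, but creation is fast enough
    if transactions_easy and not golden_source_setup_too_slow:
        return "golden source database"    # roll back changes after each test
    return "object finder"                 # everything else


suggestion = choose_setup_approach(
    uses_external_database=True,
    factory_setup_too_slow=True,
    transactions_easy=False,
)
```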

Next challenge

The next Given-When-Then with style challenge is to remove duplication from similar scenarios, in particular when groups of steps are shared between different scenarios.