Integralist/rules for good testing.md

Last active November 27, 2025 13:04

Star (72) You must be signed in to star a gist
Fork (14) You must be signed in to fork a gist

Select an option

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/Integralist/7944948.js"></script>
Save Integralist/7944948 to your computer and use it in GitHub Desktop.

Download ZIP

Sandi Metz advice for writing tests

Raw

rules for good testing.md

Rules for good testing

Look at the following image...

...it shows an object being tested.

You can't see inside the object. All you can do is send it messages. This is an important point to make because we should be "testing the interface, and NOT the implementation" - doing so will allow us to change the implementation without causing our tests to break.

Messages can go 'into' an object and can be sent 'out' from an object (as you can see from the image above, there are messages going in as well as messages going out). That's fine, that's how objects communicate.

Now there are two types of messages: 'query' and 'command'...

Queries

Queries are messages that "return something" and "change nothing".

In programming terms they are "getters" and not "setters".

Commands

Commands are messages that "return nothing" and "change something".

In programming terms they are "setters" and not "getters".

What to test?

Test incoming query messages by making assertions about what they send back
Test incoming command messages by making assertions about direct public side effects

What NOT to test?

Messages that are sent from within the object itself (e.g. private methods).
Outgoing query messages (as they have no public side effects)
Outgoing command messages (use mocks and set expectations on behaviour to ensure rest of your code pass without error)
Incoming messages that have no dependants (just remove those tests)

Note: there is no point in testing outgoing messages because they should be tested as incoming messages on another object

What to Mock/Stub

Command messages should be mocked, while query messages should be stubbed

Contract Tests

Contract tests exist to ensure a specific 'role' (or 'interface' by another - stricter - name) actually presents an API that we expect.

These types of tests can be useful to ensure third party APIs do (or don't) cause our code to break when we update the version of the software.

Note: if the libraries we use follow Semantic Versioning then this should only happen when we do a major version upgrade. But it's still good to have contract/role/interface tests in place to catch any problems.

The following is a modified example (written in Ruby) borrowed from the book "Practical Object-Oriented Design in Ruby":

# Following test asserts that SomeObject (@some_object) 
# implements the method `some_x_interface_method`
module SomeObjectInterfaceTest
  def test_object_implements_the_x_interface
    assert_respond_to(@some_object, :some_x_interface_method)
  end
end

# Following test proves that Foobar implements the SomeObject role correctly
# i.e. Foobar implements the SomeObject interface
class FoobarTest < MiniTest::Unit::TestCase
  include SomeObjectInterfaceTest

  def setup
    @foobar = @some_object = Foobar.new
  end

  # ...other tests...
end

Alpheus commented Nov 12, 2025

@Alpheus Thanks so much for responding.

In this example, isn't the fact that we pass location to the API call part of the behaviour of our OUT, and therefore something that should be unit-tested?

Even more so if we were to eg. do some kind of transformation of the value (sanitisation/normalisation) before passing it to the collaborator. Consider this implementation:
def wear_a_jacket?(location)
  weather = @api.weather(location.strip.downcase)
  
  #...
end
I'd hope to have a spec that would break if I removed this normalisation.

Ah ruby isn't my strong suit. Let me digest that first. Right off the bat I'd put what you wrote and what we wrote into two axes: testing that the implementation works vs. defining the behavior spec so we can refactor. I'm leaning heavily onto the latter, with very little observable value-add on the former.
You seem to favor implementation-assurance over all else (or at least more than I do) for the sake of example.

Did I get that right so far?

Fundamentally what I reason about differently is I would never write a test such as:
"given API returns rain" -> "SUT returns raincoat"

Instead I would test
"given person in London and it's raining in London" -> "SUT returns raincoat"

And important, this also exposes the need to expose the base cases:
"given person's location is unknown" -> "SUT returns error with usage instructions"
"given person in London and weather at (or near) person's location is unknown" -> "SUT provides sorry message"

This small change would emphasise the semantics that the person's location and the weather are related and actually part of an underdesigned behavior contract, and not a collaborator contract.

Once you got the upstream (what-do-I-wear-side) of the code right and have 5-6 tests for the core behaviors, you can probably squeeze in a single test to cover the location + weather resolution, API, cache or otherwise but at that point it will be modelled correctly as an implementation detail for information retrieval and not in any way be related to decision-making.

dgmstuart commented Nov 12, 2025 •

edited

Loading

@Alpheus thanks again for your continued engagement 🙏

You seem to favor implementation-assurance over all else

I'm not sure what you mean by "implementation assurance" but certainly nothing is "over all else" - we're developers: everything is tradeoffs 😉.

I try to practice TDD, so I'm looking for the spec which forces me to write any correct implementation, otherwise my TDD loop doesn't progress the implementation beyond this:

weather = @api.weather # passes no arguments, even though the real implementation requires one argument

...but of course I want to avoid a spec which over-specifices the implementation, so if "implementation assurance" means that then no, I don't value that at all.

The way you frame it makes sense: rather than describing specs in terms of the behaviour of the collaborator, the specs are described in terms of the state of the world. 👍

...but I'm still not sure how to actually write the spec in such a way that I'm forced to write a valid implementation: something has to change (I guess in the stubbing?). Here's an updated straw man set of specs for discussion, which all pass with the above implementation despite the incorrect implementation:

RSpec.describe ClothingPicker do
  context "when the person is in London and it's raining in London" do
    it "returns 'raincoat'" do
      api = instance_double(WeatherApi, weather: :rain)
      picker = described_class.new(api:)

      expect(picker.wear_a_jacket?("London")).to be(true)
    end
  end

  context "when the person's location is unknown" do
    it "raises an exception" do
      api = instance_double(WeatherApi)
      allow(api).to receive(:weather).and_raise("Unknown location")
      picker = described_class.new(api:)

      expect { picker.wear_a_jacket?("Pitlochry") }.to raise_error("Unknown location")
    end
  end

  context "when the person's is in london and the weather is unknown" do
    it "raises an exception" do
      api = instance_double(WeatherApi)
      allow(api).to receive(:weather).and_raise("Unknown weather")
      picker = described_class.new(api:)

      expect { picker.wear_a_jacket?("London") }.to raise_error("Unknown weather")
    end
  end
end

Side-note: when you say "underdesigned" I'm not sure if that's supposed to be a positive (ie. it's implementation-agnostic?) or a negative (not enough design has been done in some sense?).

Alpheus commented Nov 12, 2025 •

edited

Loading

Definitely negative. I like your faux-implementation to highlight the red state.
A step up from the oversimplified example before.

Focusing in on the top test: I'd continue the exercise you started and continue removing the faux-implementation detail from the test.

Instead of
api = instance_double(WeatherApi) -> picker = new (api:)

I would focus on the state-mechanism (forgive me, I don't know how to create tuples in ruby):
UKWeather = instance_double(WeatherApi, [London: Rain, Bristol: Cloudy, Manchester: Fog])
-> picker = new(UKWeather).

This way it's obvious what part of the test setup is "controlled" and what part is "controlled against, AKA being observed for correct behavior".

It also highlights two anti-goals when it comes to the design of the system:

If you intercept the API call, then you couple the test to the SDK used for the weather api. If the SDK changes then the test may fail or become flaky
If you do not intercept the API call and use the real connection, the test may fail if there's network issues between you and the foreign API (ie. won't run on a local production-like setup on an airplane)

The design and assurance comments are there to highlight that the underlying collaborator is a network-based mapping device, rather than a weather API, and especially not a specific weather api on a specific url. That aspect of the code is a detail that can only be tested on the internet, facing the real API, thus becoming a monitoring concern, not a testing one.

For example, the anti-goal of writing an integration test against the real API:

Weather API is down temporarily. Test will fail. But I only want tests to fail if there is reason for developer intervention to improve or fix code. In this case a false-negative as we can't do anything about it (other than using a different API).
Weather API is down permanently. Test will always be red. Only option is removing feature or changing APIs. Either way the underlying behavior can be benign and fail gracefully, prompting the only real value here to be the "monitoring of the internet aspect". But why are we monitoring the internet with unit tests instead of... actual monitoring on production/telemetry?
Another anti-goal wrt to SDK usage: it's possible that SDK is v8 and works correctly with v8, but Weather Api ltd. changed their API version to v10 and now need SDK version of at least v9. All of the behavior tests will fail (imagine you have 100 of them), rather than just 1 test highlighting that the SDK-to-weatherlocation-mapping has stopped integrating. This can create the impression that all the failing tests need a fix, rather than there being 1 integration or contract test highlighting the main issue.

dgmstuart commented Nov 13, 2025

Thanks again.

For me running against a real API is never an option - or at least for the purposes of this discussion I'm only interested in how to test without a real API.

Unfortunately the stub you wrote doesn't make sense to me: maybe the stubbing frameworks you're used to work on a different paradigm?

If I read that stub naively then it implies an API with a different specific signature:

def weather # no arguments
   # => return a mapping of cities to weather
end

...which is different from the actual API in my example:

def weather(city)
   # => return a token representing a kind of weather.
end

...and I'm sure that's not what you intended.

Maybe it would make more sense if you write the stub in a language you're familiar with and I can look up how stubbing works in that language?

Alpheus commented Nov 13, 2025 •

edited

Loading

What I meant was

interface ProvidesWeather {
  async weather(city: string): Weather;
}

class WeatherAPI implements ProvidesWeather {
  constructor(private apiKey: string) {}
  async weather(city) {} // calls real API
}

// Used in whatshouldIwear test
class WeatherStub implements ProvidesWeather {
  constructor(private mappingUnderTest: Record<City, Weather>) {}
  async weather(city) {
    return this.mappingUnderTest[city] || throw Error();
  }
}

This is a fairly classical example, you could also use delegates or named stubs if your languages does not differentiate interfaces and classes or prefers ducktyping (like python).

Notably I strive to highlight that there is an intersection between (What the API does=SDK) and (What I use it for=Interface) that encompasses two different domains, and I only want my tests to actually execute real code when it is authored inside my domain (ie. don't test the SDK you didn't write, that's already test-covered in the SDK's repo).

dgmstuart commented Nov 13, 2025

Right - I think I understand - thank you for constructing this stub.

What I'm concluding is that in order to force me to write a valid implementation (in my TDD loop), I need to pass a stub where the output depends on the input.

Perhaps there's some rule of thumb that if the output of a class (eg. WeatherAPI) depends on its input, then a stub for that class also needs to depend on its input?

Consider the following, which implements the interface just fine, but doesn't have the property we want?

class HardcodedWeatherStub implements ProvidesWeather {
  async weather(_city) {
    return "rain";
  }
}

(for clarity this is more or less what instance_double(WeatherApi, weather: :rain) creates: even though Ruby doesn't have interfaces, instance_double checks that the method exists on the real class. It doesn't check types though, since Ruby doesn't have types either and by convention uses ducktyping and polymorphism)

I see the point that if we pass this in our spec:

const stubAPI = new WeatherStub({
  "London": "Rain",
  "Bristol": "Cloudy",
  "Manchester": "Fog"
});

...then we don't need any explicit assertion that the API was called.

...but won't a passing spec have implicitly asserted that the API was called with "London" since there's no other way it could possibly have come up with the correct result? 🙃.

To put it another way, it feels like passing this stub is more or less equivalent to:

use const stubAPI = new HardcodedWeatherStub() (always return "rain")
assert that stubAPI received weather with London (this is easy with testing frameworks in Ruby)

...but maybe there's some key difference that I'm missing?

I 100% agree on not testing the SDK I didn't write, and relying on the interface of the API rather than what it actually does: this was never in doubt.

I still feel like making an assertion on the outgoing query message is only relying on the interface to the WeatherAPI, not on its implementation? But there are of course other reasons to not make such assertions.

Alpheus commented Nov 13, 2025 •

edited

Loading

Right - I think I understand - thank you for constructing this stub.

What I'm concluding is that in order to force me to write a valid implementation (in my TDD loop), I need to pass a stub where the output depends on the input.

Yes, and further: it is important that the stub is async (if your language supports that). The fact it is a time-sensitive is a more important detail than things like api key and url or what it comes from.

Perhaps there's some rule of thumb that if the output of a class (eg. WeatherAPI) depends on its input, then a stub for that class also needs to depend on its input?

Feels like you took a step forward, but now two steps back. The output of a class of type WeatherAPI does not depend on its input. The output depends on whatever the weather server says it does. We don't know that for certain with a guarantee that a machine would consider proof. For all we know it could be returning Rainy all the time and ignore your input (or your input parameter could be wrong).

This detail does not matter for the SUT of what-should-I-wear because no amount of fiddling will prove that it is correct with a production-level certainty.

Consider the following, which implements the interface just fine, but doesn't have the property we want?
class HardcodedWeatherStub implements ProvidesWeather {
  async weather(_city) {
    return "rain";
  }
}

This would not work. The test has to specify the Stub, so the test has to define what it returns when asked, ie. by default "Sunny", and only for London return "rain". Then you implicitly verify it was called correctly if the output is raincoat. So the information (default: Sunny, London: Rain) is part of the assertion and should be right next to it in the same test.

I see the point that if we pass this in our spec:
const stubAPI = new WeatherStub({
  "London": "Rain",
  "Bristol": "Cloudy",
  "Manchester": "Fog"
});
...then we don't need any explicit assertion that the API was called.

Yes, correct! This is on track now, you got it.

...but won't a passing spec have implicitly asserted that the API was called with "London" since there's no other way it could possibly have come up with the correct result? 🙃.

For that you need a zero-case for the behavior for some other form of clothing that isn't rainy. You are right if Sunny -> Raincoat and Rain -> Raincoat. That's why it is important that you compose these tests from sensible base tests (remember the two I added in my original reply?)

To put it another way, it feels like passing this stub is more or less equivalent to:

use const stubAPI = new HardcodedWeatherStub() (always return "rain")

assert that stubAPI received weather with London (this is easy with testing frameworks in Ruby)

...but maybe there's some key difference that I'm missing?

Yes and you picked up on it already. The difference is that the stub needs to exhibit the intended behavior. Let me zoom in all the way:

When you wrote the test originally you said the behavior is Rain -> Raincoat. There is zero location in that. London is not in any shape or form the correct answer. So the simplest test for that is assert(wearing("Rain"), "raincoat").

However, this is now just a simple map, and the information that the weather is async got lost. So the async bit has to come back. Let's compose it.

The inverse isn't super sensible:
whatShouldIwear = fetchWeather(wearing(location))

So we keep the obvious one:
whatShouldIwear = wearing(fetchWeather(location))

So far so good. You can now see that whatShouldIwear is a function of (fetched location). The SDK (or fetch API) are details. So the object becomes

class ClothingPicker
  def initialize(resolvesLocationToWeather: WeatherApi.new)
    @resolvesLocationToWeather = resolvesLocationToWeather
  end

  def wear_a_jacket?(location)
    weather = @resolvesLocationToWeather.weather(1) # incorrect: our real API class expects us to pass a location
    
    case weather
    when :rain
      true
    else
      false
    end
  end
end

RSpec.describe ClothingPicker do
  context "when it's Sunny everywhere in the UK except raining in London" do
    it "returns 'raincoat' for London" do
      api = ... # a stub that has non-rain default + rain in London
      picker = described_class.new(api:)

      expect(picker.wear_a_jacket?("London")).to be(true) # Now it will fail, because giving the resolver 1 will return sunny
    end
  end
end

I still feel like making an assertion on the outgoing query message is only relying on the interface to the WeatherAPI, not on its implementation? But there are of course other reasons to not make such assertions.

Let's unpack this with a counter example. Let's say you put a "cache" in front of your outgoing query which is the WeatherAPI but it has the same interface. That cache returns "Rain" if it cannot access the server (ie. a unit test with an air gap) or if you flip a flag to not talk to external servers (ie. in CI mode).

This is sensible for the weatherAPI, but now none of our tests for ClothingPicker are failing or passing correctly. It also makes the question, "was the API called with London" completely non-deterministic because you don't know whether you got Rain because the location was passed or if you got Rain because the server is down.

This now exposes the core issue: You cannot write the test in such a way, that you could have the API be pick(location) but not know the weather. So the location-to-weather mapping is intrinsically the main information needed, which is exactly what the collaborator provides. The collaborator doesn't "return Rain". It provides "Rain for London", if it is indeed raining in London. It's that second piece of the information that we're mocking, not the mapping itself.

Now you ask, why is it not worth testing? Because if you want any code to execute at all, you need to call the real one. Because if you do not, then the only code executed is the creation of the mock, and the calling of a mock. In this case you fully ignore the fact that something has to be done with raincoat.

So your code becomes (pseudo code):

api = weather_mock_that_asserts_call_with_london_and_returns_rain
picker = new(api) # a class you know calls the api
picker.wear_a_jacket?(london)
#assert the mock was called with London

This breaks the determinism and structure-insensitivity on the test, because test has multiple falses:

test breaks if API got called multiple times
test passes incorrectly if the weather api changes, but the mock does not
test fails incorrectly if the mock changes, but the weather api does not
test fails if there are multiple ways to query london, and the one in the mock wasn't the obvious choice (Ie. does not survive refactor)
test fails if the value object for the location has multiple canonical forms (ie. trimming whitespaces or lowercasing location)

Ultimately you cannot test the invocation of the method, without screwing up what it does with the return value in any way that would be more reasonable thangiving the SUT a simpler stub through the code.

dgmstuart commented Nov 16, 2025

Thanks for your thorough response.

The test has to specify the Stub, so the test has to define what it returns when asked
....
So the information ... is part of the assertion and should be right next to it in the same test.

In Ruby we can (and do, with the ubiquitous testing framework RSpec) define such things inline in the test - in this case as

instance_double(WeatherApi, weather: :rain)

I was only writing it as typescript class for the sake of mutual intelligibility, but I see that this was misleading in another way.

Regardless: reading between the lines I interpret your main point as this:

To be a useful stub, it should only return the relevant result in the scenario that we're testing.

If so then that makes sense, and we have a syntax for this in RSpec:

api_stub = instance_double(WeatherApi)
allow(api_stub).to receive(:weather).with("London").and_return(:rain)

This, like WeatherStub:

doesn't assert that the message was called
will raise an error if "New York" or "london" is passed

This way we avoid the arbitrary values ("Sunny", "Manchester" etc.): I prefer to avoid arbitrary values, because it's not 100% clear to the reader whether it's significant that this specific value is used, or whether any other value would fulfil the same purpose.

Alpheus commented Nov 22, 2025 •

edited

Loading

Thanks for your thorough response.

The test has to specify the Stub, so the test has to define what it returns when asked
....
So the information ... is part of the assertion and should be right next to it in the same test.

In Ruby we can (and do, with the ubiquitous testing framework RSpec) define such things inline in the test - in this case as
instance_double(WeatherApi, weather: :rain)
I was only writing it as typescript class for the sake of mutual intelligibility, but I see that this was misleading in another way.

Regardless: reading between the lines I interpret your main point as this:

To be a useful stub, it should only return the relevant result in the scenario that we're testing.

If so then that makes sense, and we have a syntax for this in RSpec:
api_stub = instance_double(WeatherApi)
allow(api_stub).to receive(:weather).with("London").and_return(:rain)
This, like WeatherStub:

doesn't assert that the message was called

will raise an error if "New York" or "london" is passed

This way we avoid the arbitrary values ("Sunny", "Manchester" etc.): I prefer to avoid arbitrary values, because it's not 100% clear to the reader whether it's significant that this specific value is used, or whether any other value would fulfil the same purpose.

Yes, to really shorten the conversation down:
The test should contain all details that matter for the particular scenario being tested, and it should not contain any details about the current implementation.

Let's connect that with Sandi's talk from the original idea way up in the thread:

Why not test outgoing queries?
Because an outgoing query will definitely be tested as a scenario-fixture for one of the command tests or incoming queries

I can see how this can be confusing if you do not pin down tests to each scenario, and use broad-spectrum mocks instead.

dgmstuart commented Nov 27, 2025

OK thanks again for all the discussion - all the best of luck with your projects.

Alpheus commented Nov 27, 2025

OK thanks again for all the discussion - all the best of luck with your projects.

Thank you too, nice conversation!

Integralist/rules for good testing.md

Select an option

No results found

Select an option

No results found

Rules for good testing

Queries

Commands

What to test?

What NOT to test?

What to Mock/Stub

Contract Tests

Alpheus commented Nov 12, 2025

Uh oh!

dgmstuart commented Nov 12, 2025 •

edited

Loading

Uh oh!

Alpheus commented Nov 12, 2025 •

edited

Loading

Uh oh!

dgmstuart commented Nov 13, 2025

Uh oh!

Alpheus commented Nov 13, 2025 •

edited

Loading

Uh oh!

dgmstuart commented Nov 13, 2025

Uh oh!

Alpheus commented Nov 13, 2025 •

edited

Loading

Uh oh!

dgmstuart commented Nov 16, 2025

Uh oh!

Alpheus commented Nov 22, 2025 •

edited

Loading

Uh oh!

dgmstuart commented Nov 27, 2025

Uh oh!

Alpheus commented Nov 27, 2025

Uh oh!

Integralist/rules for good testing.md

Rules for good testing

Queries

Commands

What to test?

What NOT to test?

What to Mock/Stub

Contract Tests

Alpheus commented Nov 12, 2025

Uh oh!

dgmstuart commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Alpheus commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dgmstuart commented Nov 13, 2025

Uh oh!

Alpheus commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

This is a fairly classical example, you could also use delegates or named stubs if your languages does not differentiate interfaces and classes or prefers ducktyping (like python).

Uh oh!

dgmstuart commented Nov 13, 2025

Uh oh!

Alpheus commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dgmstuart commented Nov 16, 2025

Uh oh!

Alpheus commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dgmstuart commented Nov 27, 2025

Uh oh!

Alpheus commented Nov 27, 2025

Uh oh!

dgmstuart commented Nov 12, 2025 •

edited

Loading

Alpheus commented Nov 12, 2025 •

edited

Loading

Alpheus commented Nov 13, 2025 •

edited

Loading

Alpheus commented Nov 13, 2025 •

edited

Loading

Alpheus commented Nov 22, 2025 •

edited

Loading