Ensuring backward compatibility in your applications

In software development, backward compatibility is an important concept that is often overlooked, which leads to stressful software updates and production bugs. Many times we find that we have updated an API but forgot to update the dependent systems or apps, or didn’t even know who else was using it.

Especially in distributed systems, adding new features to existing code can be really challenging. Not only do we have to understand the impact our changes might have on the various other services and apps, but we often also risk introducing bugs and odd behaviour.

Backward compatibility helps us make such changes much more smoothly, by allowing us to add new features without breaking any of our existing code.

So what the heck is backward compatibility?

Compatibility describes how well one thing works with another. In the context of software, backward compatibility means that a newer version of a product still works with whatever was built for an older version.

So code is backward compatible if a program written against one version of that code continues to work the same way, without modification, against future versions of the code.

Every software product is bound to change according to the demands of the users. Hence tasks performed on an older version must work as before in the newer versions as well.

So when we write new code, we should write it in such a way that existing functionality continues to work and is not hampered. Backward compatibility doesn’t only exist in code; it also needs to be maintained in places like databases, apps, APIs, hardware and libraries. So when we design or improve any software, we keep compatibility in mind and keep the older code compatible with the new code.

Yeah, the concept is there, but why should I give a f*ck about it?

Well, if you write industry-level code and there are users who are using your product, then yeah, you surely need to. Backward compatibility is damn important because it eliminates the need to start over when you upgrade to a newer product.

Suppose you’re going to introduce a breaking change. You’ll have to coordinate releases: the update to system A will have to be done simultaneously with the update to system B. This is a risky situation.

First, it’s difficult to coordinate releases, especially if multiple systems are involved. But it’s also stressful because things can go wrong easily. And when they do, it’s difficult to fix the situation. If there’s a bug in the integration, both systems might need to be fixed, requiring even more coordination. It’s even worse if one system needs to be rolled back for whatever reason. Then the other needs to be rolled back as well. This might not always be possible, because the new version may have side effects that can’t be undone: new data structures, emails sent, lost data, etc.

Let’s consider a simple example. Suppose you released a mobile app, and there are users running it on various device models. Now you have to add a new feature that requires a change to one of the existing APIs. If you change the API signature, or modify the API’s behaviour to support the new functionality, then as soon as you deploy to production everyone needs to update the app, otherwise it will stop working. If you keep your API backward compatible, users who don’t update the app will not see the new feature, but they will at least continue to use the existing functionality.

Sometimes, however, it is necessary to sacrifice backward compatibility to take advantage of new technology, but that call needs to be made with proper consideration.

Is it worth the time?

Backward compatibility may sound like a lot of time invested on a maintainer’s part. While it certainly takes some effort, this is an investment in your future, not just an expenditure.

  • It proves you’re serious about maintaining your code. When I’m choosing between different software, the software with regular updates to address changes in dependencies is always the higher quality one.
  • Conversely, a lack of frequent updates tells you that you shouldn’t trust a piece of code. Bitrot is no joke; code gets worse over time! Having something that forces you to change code regularly is the best way to avoid bitrot.
  • It properly penalizes you for using too many dependencies. Encouraging decreased dependencies is a great way to avoid dependency problems!

So how is this useful?

Backward compatibility actually helps in many ways. Let’s talk about a few of them.

Streamline your releases

  • Backward compatibility saves your ass from doing big-bang deployments.
  • You can start deploying your code without stressing about sequencing your deployments.
  • You don’t have to plan big releases; changes can be deployed smoothly in smaller chunks.

Reduce the downtime

  • We can reduce downtime by introducing backward compatibility as the systems are less tightly coupled and can be deployed smoothly. That way if there are multiple systems dependent on your changes then other systems don’t need to be updated at the same time.

Decouple deployments

  • Backward compatibility reduces coupling between systems, because each one keeps working with the older version of the others.
  • This way the deployments can be decoupled and each system deployed independently.
  • With this, we reduce the dependencies between multiple systems.

Identify the problems early

  • It helps us identify problems early, because we can deploy the changes before finally releasing them.

Controlled roll-out

  • You can easily control and coordinate the deployments, and plan to release the changes in a very smooth manner.

At least we don’t break the existing functionality

  • The biggest benefit is that the existing functionality, at least, continues to work. A new feature may have a bug, but existing features will keep working and no user will be hampered.

So what are the ways to achieve it?

As I explained earlier, backward compatibility exists not only in code but almost everywhere. There are various places where we need to consider it. We are going to cover only a few areas here.

  • Web development
  • Libraries
  • Async processing
  • Databases

Over time, our code may need to change. Whether it’s from shifting business priorities or new strategies, we should accept from day one that our APIs will likely be modified.

Write code that is easy to evolve, and follow the robustness principle, summarised as “Be conservative in what you do, be liberal in what you accept.”

In the context of web development, this principle can be applied in several ways. Let’s look at some ways we can make our code backwards compatible.

Handle compatibility in code

Whether we can ensure backward compatibility in the code itself depends on the change we need to make, but it should be the first thing we try before reaching for the other techniques. Write code in a way that ensures it continues to support existing functionality. Let’s take the example of a method that is called for user login. Say we previously had a single kind of user, and the login method took only the user’s credentials.

Now suppose you start to accept logins for a different kind of user, stored in a different table. The method now takes a UserType as well and uses it to decide which table to look in.

But now every caller has to send UserType. As soon as you deploy this change, the existing login calls will start failing, because the code will not find the type and will throw an NPE.

Consider modifying the code slightly to manage compatibility with existing callers.

The internal API implementation accepts UserType as an additional, optional parameter. If it is sent, the code uses it; otherwise it continues to search the customer table as before.
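A minimal sketch of that compatible version might look like the following. The class name, the MERCHANT user type and the in-memory maps standing in for the two tables are all assumptions made for illustration; only UserType and the customer-table fallback come from the description above.

```java
import java.util.Map;

// Hypothetical user types; CUSTOMER is the original (and default) kind of user.
enum UserType { CUSTOMER, MERCHANT }

class LoginService {
    // In-memory stand-ins for the two user tables (illustrative data only).
    private final Map<String, String> customerTable = Map.of("alice", "secret1");
    private final Map<String, String> merchantTable = Map.of("bob", "secret2");

    // Old signature, kept so existing callers compile and behave as before.
    boolean login(String username, String password) {
        return login(username, password, null);
    }

    // New signature: userType is optional. A null value means the caller is an
    // older client that never sent it, so we fall back to the customer table
    // instead of throwing a NullPointerException.
    boolean login(String username, String password, UserType userType) {
        UserType effectiveType = (userType != null) ? userType : UserType.CUSTOMER;
        Map<String, String> table =
                (effectiveType == UserType.MERCHANT) ? merchantTable : customerTable;
        return username != null
                && password != null
                && password.equals(table.get(username));
    }
}
```

Old clients keep calling the two-argument form, while new clients pass a UserType explicitly, so both can be served by the same deployment.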

Use feature flags

There are times when backward compatibility in code alone isn’t enough, and we need a way to switch the new code flow on when we choose. A well-known technique for this is the feature flag, also known as a feature toggle or kill switch.

A feature flag is a piece of configuration that applications can use to determine if a particular feature is enabled or not. This allows us to release new code, but avoid actually executing it until a time of our choosing. Likewise, we can quickly disable new functionality if we find a problem with it.

There are plenty of tools that can be used to implement feature toggles, such as launchdarkly.com, rollout.io and Optimizely. Regardless of which tool we use, there are certain characteristics that we should look for.

It should be fast

Implementing feature toggles usually means adding lots of conditional checks to our applications, one wherever the old and new behaviour diverge.

Checking the state of a feature toggle must therefore be quick. We shouldn’t rely on reading from a database or remote file every time we need to check the state of a flag, as that could degrade our application very quickly.
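As a rough sketch, such a check could look like this. FeatureFlags, CheckoutService and the flag name "new-checkout-flow" are hypothetical and not taken from any particular tool; the flag state is held in an in-memory map so that each check is a cheap local lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory flag store. A real setup would refresh this map in the
// background from wherever the flags are managed.
class FeatureFlags {
    private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    void set(String name, boolean enabled) {
        flags.put(name, enabled);
    }

    // Unknown flags default to "off" so unreleased code paths stay dark.
    boolean isEnabled(String name) {
        return flags.getOrDefault(name, false);
    }
}

class CheckoutService {
    private final FeatureFlags flags;

    CheckoutService(FeatureFlags flags) {
        this.flags = flags;
    }

    String checkout() {
        // The toggle check: new code is deployed, but only runs once the flag is on.
        if (flags.isEnabled("new-checkout-flow")) {
            return "new flow";
        }
        return "old flow";
    }
}
```

Turning the flag off again sends every call back through the old flow without a redeploy, which is what makes the same mechanism usable as a kill switch.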

It should handle distributed systems

Since we’re dealing with distributed systems, it’s likely that a feature flag will need to be accessed by multiple applications. Therefore the state of a flag should be distributed so that every application sees the same state, along with any changes.

It should be atomic

Changing the state of a feature flag should be a single operation. If we have to update multiple sources of configuration, we’re increasing the chances that applications will get a different view of the flag.

Feature flags have a tendency to accumulate in code over time. While the performance impact of checking lots of flags may be negligible, they can quickly morph into tech debt and require periodic cleaning. Make sure to plan time to revisit them and clean up as necessary.

Versioning

Versioning allows us to support different functionality for the same resource.

For example, consider a blog application that offers an API for managing its core data such as users, blog posts, categories, etc. Let’s say the first iteration has an endpoint that creates a user with the following data: name, email, and a password. Six months later, we decide that every account now must include a role (admin, editor, author, etc). What should we do with the existing API?

We essentially have two options:

  1. Update the user API to require a role with every request.
  2. Simultaneously support the old and new user APIs.

With option 1, we update the code and any request that doesn’t include the new parameter is rejected as a bad request. This is easy to implement, but it also breaks existing API users.

With option 2, we implement the new API and also update the original API to provide some reasonable default for the new role parameter. While this is definitely more work for us, we don’t break any existing API users.
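Option 2 could be sketched roughly as follows. The class and method names are hypothetical, and the choice of "author" as the default role is an assumption; any role that is safe for existing accounts would do.

```java
// Simple value class for the created user (illustrative only).
final class User {
    final String name;
    final String email;
    final String role;

    User(String name, String email, String role) {
        this.name = name;
        this.email = email;
        this.role = role;
    }
}

class UserApi {
    // Assumed default for clients that predate the role field.
    static final String DEFAULT_ROLE = "author";

    // Original API: still accepts the three original fields and fills in a default role.
    static User createUserV1(String name, String email, String password) {
        return createUserV2(name, email, password, DEFAULT_ROLE);
    }

    // New API: role is required and rejected when missing.
    static User createUserV2(String name, String email, String password, String role) {
        if (role == null || role.isEmpty()) {
            throw new IllegalArgumentException("role is required");
        }
        // Password handling (hashing, storage) is omitted from this sketch.
        return new User(name, email, role);
    }
}
```

Because the old handler delegates to the new one, there is only one code path to maintain, and existing consumers never see the new requirement.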

Versioning can be maintained at multiple levels.

URI Path

This is the easiest and most common way. The version can go in the path:

POST /v2/blog/users

Or in a query parameter:

POST /blog/users?v=2

URLs are convenient because they’re a required part of every request, so your consumers have to deal with them. Most frameworks log URLs with every request, so it’s easy to track which consumers are using which versions.

Headers

You can do this with a custom header name that your services understand:

Accept-Version: 2

Using headers for versioning is more in line with RESTful practices. After all, the URL should represent the resource, not some version of it. Additionally, headers are already great at passing what is essentially metadata between clients and servers, so adding in version seems like a good fit.
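On the server side, whichever of these conventions is used, the requested version has to be resolved before the request is dispatched. The sketch below is one possible way to do that, assuming the request has already been parsed into a path and two plain maps. ApiVersionResolver, the lookup order and the fallback to version 1 are all assumptions.

```java
import java.util.Map;

class ApiVersionResolver {
    // Assumption: requests that carry no version at all are treated as version 1,
    // so clients written before versioning existed keep working.
    static final int DEFAULT_VERSION = 1;

    static int resolve(String path, Map<String, String> query, Map<String, String> headers) {
        // 1. Path prefix, e.g. /v2/blog/users
        String[] segments = path.split("/");
        if (segments.length > 1 && segments[1].matches("v\\d{1,3}")) {
            return Integer.parseInt(segments[1].substring(1));
        }
        // 2. Query parameter, e.g. /blog/users?v=2
        String fromQuery = query.get("v");
        if (fromQuery != null && fromQuery.matches("\\d{1,3}")) {
            return Integer.parseInt(fromQuery);
        }
        // 3. Custom header, e.g. Accept-Version: 2
        //    (real HTTP header names are case-insensitive; this map lookup is not).
        String fromHeader = headers.get("Accept-Version");
        if (fromHeader != null && fromHeader.matches("\\d{1,3}")) {
            return Integer.parseInt(fromHeader);
        }
        return DEFAULT_VERSION;
    }
}
```

In practice an API would normally pick one of the three conventions rather than support them all; they are combined here only to show each lookup side by side.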

Message body

We could wrap the message body with some metadata that includes the version.
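Such a wrapper could be as small as the generic class sketched here. Envelope and its field names are hypothetical; a real API would also need to decide how the wrapper is serialised.

```java
// Hypothetical wrapper: every request or response body travels inside an
// envelope that records which version of the payload it carries.
final class Envelope<T> {
    private final int version;
    private final T payload;

    Envelope(int version, T payload) {
        this.version = version;
        this.payload = payload;
    }

    int version() {
        return version;
    }

    T payload() {
        return payload;
    }
}
```

A handler would read version() first and then decide how to interpret payload().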

From a RESTful point of view, this violates the idea that message bodies are representations of resources, not a version of the resource. We also have to wrap all our domain objects in a common wrapper class, which doesn’t feel great — if that wrapper class ever needs to change, all of our APIs potentially have to change with it.

Handle async communication

Messaging services like JMS and Kafka are another way to connect distributed systems. Unlike web APIs, messaging services are fire-and-forget. This means we typically don’t get immediate feedback about whether the consumer accepted the message or not.

Because of that, we have to be careful when updating either the publisher or consumer. There are several strategies we can adopt to prevent breaking changes when upgrading our messaging apps.

Upgrade consumers first

A good practice is to upgrade consumer applications first. This gives us a chance to handle new message formats before we actually start publishing them.

The robustness principle applies here as well. Producers should always send the minimum required payload, and consumers should only consume the fields they care about and ignore anything else.
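The consumer side of that principle might look like the sketch below, assuming the message has already been deserialised into a map. OrderCreated and its field names are invented for illustration.

```java
import java.util.Map;

// Hypothetical event as seen by one consumer, which only cares about two fields.
final class OrderCreated {
    final String orderId;
    final long amountCents;

    private OrderCreated(String orderId, long amountCents) {
        this.orderId = orderId;
        this.amountCents = amountCents;
    }

    // Reads only the fields this consumer needs. Any extra keys that a newer
    // producer adds are simply never looked at, so they cannot break us.
    static OrderCreated fromMessage(Map<String, Object> message) {
        Object id = message.get("orderId");
        Object amount = message.get("amountCents");
        if (!(id instanceof String) || !(amount instanceof Number)) {
            throw new IllegalArgumentException("missing or malformed required field");
        }
        return new OrderCreated((String) id, ((Number) amount).longValue());
    }
}
```

With a consumer written this way, a producer can start sending additional fields without any coordinated release.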

Create new topics and queues

If message bodies change significantly or we introduce a new message type entirely, we should use a new topic or queue. This allows us to publish messages without worrying that consumers might not be ready to consume them. Messages will queue up in the brokers, and we are free to deploy the new or updated consumer whenever we want.

Use headers and filters

Most message buses offer message headers. Just like HTTP headers, this is a great way to pass metadata without polluting the message payload. We can use this to our advantage in multiple ways. Just like with web APIs, we can publish messages with version information in the header.

On the consumer side, we can filter for messages that match versions that are known to us, while ignoring others.
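A broker-agnostic sketch of such a filter is shown below. The Message class, the "version" header name and the rule that a missing header means version 1 are assumptions; real brokers expose headers and filtering through their own APIs.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, broker-independent view of a message: headers plus a body.
final class Message {
    final Map<String, String> headers;
    final String body;

    Message(Map<String, String> headers, String body) {
        this.headers = headers;
        this.body = body;
    }
}

class VersionFilter {
    // Versions this consumer knows how to handle.
    private static final Set<String> SUPPORTED = Set.of("1", "2");

    // Assumption: messages published before the header was introduced carry no
    // version at all and are treated as version 1.
    static boolean accepts(Message message) {
        String version = message.headers.getOrDefault("version", "1");
        return version != null && SUPPORTED.contains(version);
    }
}
```

Messages that fail the check are skipped rather than processed incorrectly; what happens to them afterwards (dead-letter queue, logging, a newer consumer) is a separate decision.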

Relational databases

Relational databases, such as Oracle, MySQL, and PostgreSQL, have several characteristics that can make upgrading them a challenge:

  • Tables have very strict schemas and will reject data that doesn’t exactly conform
  • Tables can have foreign key constraints amongst themselves

Changes to relational databases can be broken into three categories.

Adding new tables

This is generally safe to do and will not break any existing applications. We should avoid creating foreign key constraints in existing tables, but otherwise, there’s not much to worry about in this case.

Adding new columns

Always add new columns to the end of tables. If the column is not nullable, we should include a reasonable default value for existing rows.

Additionally, queries in our applications should always use named columns instead of numeric indices. This is the safest way to ensure new columns do not break existing queries.

Removing columns or tables

These types of updates pose the most risk to backwards compatibility. There’s no good way to ensure a table or column exists before querying it. The overhead of checking a table before each query simply isn’t worth it.

If possible, database queries should gracefully handle failure. Assuming the table or column that is being removed isn’t critical or part of some larger transaction, the query should continue execution if possible.

However, this won’t work for most cases. Chances are, every column or table in the schema is important, and having it disappear unexpectedly will break your queries.

Therefore the most practical approach to removing columns and tables is to first update the code that uses them. This means updating every query that references the table or column in question and modifying its behaviour. Once all those usages are gone, it is safe to drop it from the database.

How do you ensure that your code is backward compatible?

There are various ways, but we will discuss the most basic ones.

Make sure that the unit tests pass

You should have proper unit tests that verify the functionality is intact with a new release. The tests should be written in such a way that they fail if there are any backward compatibility problems. Ideally, this test suite is plugged into the CI/CD pipeline so that it checks for backward compatibility automatically and alerts when there is a violation.

Integration and regression tests

As with unit testing, we need to run integration tests against the existing flow and make sure that the existing integration and regression tests don’t fail because we introduced a new change or feature.

Deprecations

Backward compatibility can be a bit of a “double-edged sword” — in that we might add more code just to be able to keep serving our existing call sites, rather than just updating them in one go. Sometimes we want that extra code to stick around, for convenience or to simply avoid requiring our code users to change their code just because we needed to add a new feature, but sometimes we want to make it clear that the old code is going away.

Deprecations are a way to do just that. Just like how Apple uses deprecations in their SDKs and frameworks to give us hints and encouragement to move our code to more modern APIs, we can do the exact same thing for our own code as well.

Even though it’s a bit of extra effort, adding more finely grained deprecations when replacing types and APIs can really help with communication, especially in an open-source project or in a large team. Instead of being frustrated when a new version abruptly breaks their code, our API users will now get a clear indication of what changes need to be made to their call sites.

Please also make sure that you annotate or comment on the code that is being deprecated so that it can be cleaned after the planned date.
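In Java, for instance, this can be done with the @Deprecated annotation together with a Javadoc note. The sketch below uses a hypothetical ReportExporter class; the version number in `since` is made up.

```java
class ReportExporter {
    /**
     * Exports a report as CSV.
     *
     * @deprecated Use {@link #export(String, String)} and pass the format
     *             explicitly. Kept only so existing call sites keep compiling.
     */
    @Deprecated(since = "2.0", forRemoval = true)
    String export(String reportName) {
        // The old behaviour is preserved by delegating with the old default.
        return export(reportName, "csv");
    }

    // Replacement API: the format is now an explicit parameter.
    String export(String reportName, String format) {
        return reportName + "." + format;
    }
}
```

Callers of the one-argument method get a compiler warning pointing them at the replacement, while their code keeps working exactly as before.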

Custom warnings for deprecation

Over time you will have to stop supporting old features that are no longer required, but stopping them suddenly will cause big problems. Instead, we can start emitting custom warnings and then remove the deprecated code at the communicated time.
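One lightweight way to emit such a warning at runtime is sketched below, using java.util.logging. DeprecationTracker is a hypothetical helper; it warns once per feature to avoid flooding the logs, and counts calls so that remaining usage can be checked before the code is removed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper: call recordUse() from inside the deprecated code path.
class DeprecationTracker {
    private static final Logger LOG = Logger.getLogger(DeprecationTracker.class.getName());

    private final String feature;
    private final String removalDate;
    private final AtomicLong calls = new AtomicLong();
    private final AtomicBoolean warned = new AtomicBoolean(false);

    DeprecationTracker(String feature, String removalDate) {
        this.feature = feature;
        this.removalDate = removalDate;
    }

    void recordUse() {
        calls.incrementAndGet();
        // Log only on the first use so the warning is visible but not noisy.
        if (warned.compareAndSet(false, true)) {
            LOG.warning(feature + " is deprecated and scheduled for removal on " + removalDate);
        }
    }

    // Number of times the deprecated path has been hit since startup.
    long callCount() {
        return calls.get();
    }
}
```

If callCount() stays at zero over the monitoring period, that is reasonable evidence that the old path can be removed on schedule.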

Removal of unused code

Planned deprecation and removal of deprecated code should only be done after properly monitoring things for multiple days. Monitor the data and usage of the code; if it is still not being used, publish the warning and plan to remove the dead code. But make sure that you never delete the code as soon as you are done with the feature, and that the deletion is communicated properly.

Documentation is a must

Most of the time we write code but forget that it needs to be maintained as well. We should provide proper comments and a changelog to consumers with each new release. We should also annotate deprecated code with @Deprecated, so that it stays in place to support backward compatibility and we can plan to remove it at a defined time. In general, and especially for web APIs, the documentation should cover:

  • Version and effective date
  • Breaking changes that consumers will have to handle
  • New features that can optionally be used but don’t require any updates by consumers
  • Fixes and changes to existing APIs that don’t require consumers to change anything
  • Deprecation notices that are planned for future work

This last part is critical to making our APIs evolvable. Deleting an endpoint is clearly not backwards compatible, so instead, we should deprecate it. This means we continue to support it for a fixed period of time and give our consumers time to modify their code, instead of breaking them unexpectedly.

Conclusion

Taking a few extra steps to make API changes backward compatible might seem like an unnecessary effort at first, but it can often make larger refactors and API additions a lot quicker and easier to pull off. Especially in a larger team or when working on open source, avoiding breaking APIs with every change or addition can really help improve the workflow among developers and backward compatibility can also be a great communication tool as to why a specific change was made.

Not all changes can be backward compatible, of course, and maintaining backward compatibility over a long period of time is a complication in and of itself. In my opinion, it’s worth doing either to make larger changes or refactors easier and less risky, or when backward compatibility also adds convenience at a low cost. Like always, everything is a tradeoff, but the fewer disruptive changes we need to make, the smoother our workflow usually becomes. The tips and ideas above are only a starting point and don’t cover all the ways in which our systems might talk. Things like distributed caches and transactions can also be obstacles to building backwards compatible software.

Code, read, sleep & repeat.