Braincuber

Millions of Requests Per Hour: SoundCloud’s Microservices Evolution

An illustration depicting the scale of millions of requests per hour, emphasizing rapid data flow and system efficiency.

In life, a best friend forever can be a lifesaver, and the same is true of BFFs in software projects. That was SoundCloud's experience when it reworked its service architecture to handle millions of requests per hour. SoundCloud is an online platform where you can listen to and stream audio and music for free. With more than 320 million tracks, it hosts the biggest online community of artists, bands, DJs, and audio creators. At first, SoundCloud's web application was built as a single monolithic application.

A single API monolith served both SoundCloud's in-house applications and third-party integrations. As is typical of monoliths, this approach eventually stopped scaling, both operationally and organizationally, so SoundCloud began migrating from the monolith to microservices. As new microservices appeared and the monolith was split into services, clients had to make multiple service calls to fetch the data they needed. This complicated things for most of the client applications built on SoundCloud's platform.

This was not sustainable, so SoundCloud needed a middle ground that would solve the problem for client applications while keeping the microservices architecture intact. In this post, we look at how SoundCloud achieved these goals with BFFs, Value-Added Services, and Domain Gateways.

BFFs at SoundCloud

BFF stands for Backends-for-Frontends. Simply put, a BFF is a separate API gateway for each type of device or interface that interacts with the application.

SoundCloud's engineering team runs several BFFs, each tailored to a particular kind of client. For example, the Mobile API BFF serves the Android and iOS apps, the Web API BFF serves the web application and its widgets, and so on. There are also separate BFFs for the public and partner APIs.
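As a rough illustration of per-client routing, here is a minimal Python sketch. The BFF names follow the ones above, but the `X-Client-Type` header, the client-type keys, and the `route` function are hypothetical. The article does not describe how SoundCloud's edge actually selects a BFF.

```python
# Hypothetical edge router: maps each client type to the BFF that serves it.
# The "X-Client-Type" header and the keys below are assumptions for illustration.
BFF_BY_CLIENT = {
    "android": "mobile-api",
    "ios": "mobile-api",
    "web": "web-api",
    "widget": "web-api",
    "public": "public-api",
    "partner": "partner-api",
}


def route(headers: dict[str, str]) -> str:
    """Return the name of the BFF that should handle a request with these headers."""
    client = headers.get("X-Client-Type", "").lower()
    try:
        return BFF_BY_CLIENT[client]
    except KeyError:
        raise ValueError(f"no BFF for client type {client!r}") from None


print(route({"X-Client-Type": "iOS"}))  # mobile-api
```

Because every client type maps to exactly one BFF, all external traffic passes through some BFF, and an unknown client is rejected instead of reaching the services directly.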

All external traffic into SoundCloud is routed through one of the BFFs. The BFFs also perform common edge functions, including:

  • Rate Limiting 
  • Authentication 
  • Header Sanitization 
  • Cache-Control 
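Edge rate limiting is often implemented with a token bucket. The sketch below shows one common way to build it. It is not SoundCloud's documented approach, and the `TokenBucket` class, the per-client keying, and the limits (a burst of 5 requests, refilled at 1 per second) are assumptions for illustration.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: allows bursts of up to `capacity` requests
    and refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start full
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token on this request
            return True
        return False


# One bucket per API client, keyed by a hypothetical client ID.
buckets: dict[str, TokenBucket] = {}


def check_rate_limit(client_id: str) -> bool:
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity=5, rate=1.0)
    return buckets[client_id].allow()
```

A token bucket allows short bursts while enforcing a long-run average rate, which suits interactive clients that fire several requests at once.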


To keep these functions consistent, all the BFFs use shared edge logic packaged as an internal library. Changes to this library are rolled out to every BFF automatically, in less than a day.
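One shared edge library wrapping every BFF can be sketched as a chain of middleware. Everything here is a hypothetical stand-in: `with_edge_logic`, the header allow-list, the `Bearer` prefix check, and the `max-age` values. The article does not describe the library's actual API, and real authentication would validate the token, not just check its prefix.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    headers: dict[str, str]
    path: str


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict[str, str] = field(default_factory=dict)


Handler = Callable[[Request], Response]

# Hypothetical allow-list of headers that may reach a BFF's handlers.
ALLOWED_HEADERS = {"authorization", "accept", "user-agent"}


def sanitize_headers(next_handler: Handler) -> Handler:
    def handler(req: Request) -> Response:
        clean = {k: v for k, v in req.headers.items() if k.lower() in ALLOWED_HEADERS}
        return next_handler(Request(headers=clean, path=req.path))
    return handler


def authenticate(next_handler: Handler) -> Handler:
    def handler(req: Request) -> Response:
        # Placeholder check only; real token validation is out of scope here.
        token = next((v for k, v in req.headers.items() if k.lower() == "authorization"), "")
        if not token.startswith("Bearer "):
            return Response(status=401, body="unauthorized")
        return next_handler(req)
    return handler


def cache_control(max_age: int) -> Callable[[Handler], Handler]:
    def middleware(next_handler: Handler) -> Handler:
        def handler(req: Request) -> Response:
            resp = next_handler(req)
            resp.headers.setdefault("Cache-Control", f"max-age={max_age}")
            return resp
        return handler
    return middleware


def with_edge_logic(app: Handler, max_age: int = 60) -> Handler:
    """Wrap a BFF handler with the shared edge functions (outermost runs first)."""
    return sanitize_headers(authenticate(cache_control(max_age)(app)))


# Two BFFs share the same edge behavior but keep their own handlers.
mobile_api = with_edge_logic(lambda req: Response(200, "mobile:" + req.path))
web_api = with_edge_logic(lambda req: Response(200, "web:" + req.path), max_age=30)
```

Packaging the edge functions as one wrapper means that updating the library changes the behavior of every BFF that uses it, which matches the uniformity goal described above.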

SoundCloud develops these BFFs using an inner source model.

This means different teams can make changes to the BFFs, while a core team reviews every change following the structure defined by the Collective. The Collective, led by a Platform Lead, facilitates communication among its members by organizing meetings where they can raise concerns and share knowledge.


Infographic summarizing the benefits and drawbacks of BFFs (Backends-for-Frontends).
Image Source @Braincuber

Advantages of BFF

Using BFFs brings several benefits. Let us review the most important ones.

1 - Independence

The foremost benefit of BFFs is greater autonomy and control.

Each client type can get its own API, shaped to be as convenient as possible for that client.

For instance, at SoundCloud, mobile clients prefer larger responses with more embedded entities, which reduces the number of calls they make to the backend. The web frontend, by contrast, makes more granular, specific requests.

BFFs let SoundCloud meet these different needs for every client type.
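To make the contrast concrete, here is a hypothetical sketch of how two BFFs might shape the same track data differently. The data and field names are invented. The point is only that the mobile response embeds the user, while the web response stays granular.

```python
# Hypothetical core-service data; field names are invented for illustration.
TRACKS = {1: {"id": 1, "title": "Intro", "user_id": 7}}
USERS = {7: {"id": 7, "username": "dj_example"}}


def mobile_track_response(track_id: int) -> dict:
    """Mobile BFF: one coarse response with the user embedded, saving a round trip."""
    track = dict(TRACKS[track_id])                  # copy; don't mutate the source
    track["user"] = USERS[track.pop("user_id")]     # replace the ID with the entity
    return track


def web_track_response(track_id: int) -> dict:
    """Web BFF: a granular response that references the user by ID only;
    the web client fetches the user separately if it needs it."""
    return dict(TRACKS[track_id])
```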

2 - Resilience and Lower Risk

BFFs also lower the risk of the application failing as a whole. For instance, a failed deployment may take down an entire BFF in an availability zone, but the damage does not extend to the whole platform, as it could with a monolithic API. In the diagram below, you can see that when the Mobile BFF is down, the Web BFF is not necessarily down as well.

3 - Speed of Development

Autonomy and resilience together raise confidence, which in turn speeds up the development of new features. At SoundCloud, the main BFFs are deployed several times a day, with contributions from across the entire engineering organization.

Disadvantages of BFF

Every software design decision involves trade-offs, and adopting BFFs is no exception.

1 - Complexity

When the microservices behind the BFFs are very small, performing only CRUD operations without any business logic, features end up being integrated at the BFF layer. In other words, the business logic moves into the BFFs.

Furthermore, the BFF is often seen as merely an extension of the client, to be treated as that client's own backend.

This view is understandable given the name, but it turns the BFF into a dumping ground for complex functionality, such as pagination, that should be pushed down to the services.
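Pushing pagination into the service might look like the following sketch: the service returns a page plus a cursor, and the BFF only forwards it. The function names and the offset-based cursor are assumptions. Production APIs often use opaque cursors instead.

```python
# Hypothetical core-service data: the ordered track IDs of one playlist.
PLAYLIST_TRACK_IDS = list(range(1, 11))


def list_playlist_tracks(cursor: int = 0, limit: int = 3) -> dict:
    """Service-side pagination: returns one page and the cursor for the next
    page (None when there are no more results)."""
    page = PLAYLIST_TRACK_IDS[cursor:cursor + limit]
    end = cursor + limit
    next_cursor = end if end < len(PLAYLIST_TRACK_IDS) else None
    return {"items": page, "next_cursor": next_cursor}


def bff_playlist_page(cursor: int = 0) -> dict:
    # The BFF just forwards the cursor; it does not slice or re-page data itself.
    return list_playlist_tracks(cursor=cursor)
```

With this split, every BFF gets the same paging behavior without reimplementing it.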

2 - Duplication of Code

Centralized API gateways can also suffer from code duplication, but BFFs are especially prone to it.

With BFFs, the same business logic often gets implemented in several of them. As development continues over time, these copies can drift into diverging, context-specific implementations.

This problem also surfaced at SoundCloud, where authorization logic was duplicated across many BFFs. The reason was that the authorization logic needed Track and Playlist information from separate microservices, so the integration logic had to live in the BFFs.

3 - Proliferation of BFFs

Even best friends forever can cross the line at times. BFFs give teams more freedom, but they also bring extra burdens. If teams start building a BFF for every small use case, the maintenance burden quickly becomes too large.

Also, complete freedom is a myth. A BFF sits between the frontend and the backend, so frontend and backend developers must work closely together to build the right BFFs.

Value-Added Services (VAS)

Value-Added Services (VAS), based on the principles of Domain-Driven Design (DDD), are the core of this new architecture. In particular, four DDD concepts are relevant here:

  • Domain – the area of business that the software addresses, which draws the boundaries for integrating services.
  • Entity – an object with a distinct identity and lifecycle.
  • Value objects – descriptive attributes (metadata) that belong to an entity and have no identity of their own.
  • Aggregates – groups of one or more related entities that are treated as a unit.
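The following sketch maps these four concepts onto Python dataclasses. The `Track`, `Duration`, and `Playlist` types are hypothetical illustrations of the DDD vocabulary, not SoundCloud's domain model. Here a playlist acts as the aggregate root over its tracks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Duration:
    """Value object: defined only by its value, immutable, no identity."""
    seconds: int


@dataclass(eq=False)
class Track:
    """Entity: identified by its ID, so two tracks with different IDs are
    different tracks even if every other field matches."""
    track_id: int
    title: str
    duration: Duration

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Track) and other.track_id == self.track_id

    def __hash__(self) -> int:
        return hash(self.track_id)


@dataclass
class Playlist:
    """Aggregate root: groups related Track entities and is the single
    entry point for changing them."""
    playlist_id: int
    tracks: list[Track] = field(default_factory=list)

    def add_track(self, track: Track) -> None:
        # The aggregate enforces its own invariant: no duplicate tracks.
        if track in self.tracks:
            raise ValueError(f"track {track.track_id} already in playlist")
        self.tracks.append(track)

    @property
    def total_duration(self) -> Duration:
        return Duration(sum(t.duration.seconds for t in self.tracks))
```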

At SoundCloud, the Value-Added Services sit between the BFFs and the downstream core services and build aggregates for the BFFs. Tracks and playlists are the primary aggregates they handle. The VAS also took over the authorization and visibility rules for tracks. The Track VAS, for instance, filters out tracks that are geo-blocked in the requester's territory. With this design, the BFFs no longer had to call the individual microservices directly. The upside was that all the common functionality lived in a single codebase, and the different BFFs no longer made the same requests to the core services.
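A Track VAS that builds aggregates and applies visibility rules in one place could be sketched as follows. The in-memory dictionaries stand in for separate core microservices, and the `private` flag and `GEO_BLOCKS` table are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for core services (separate microservices in reality).
TRACK_SERVICE = {
    1: {"title": "Open Track", "owner_id": 7, "private": False},
    2: {"title": "Private Demo", "owner_id": 7, "private": True},
    3: {"title": "Regional Release", "owner_id": 8, "private": False},
}
USER_SERVICE = {7: "artist_a", 8: "artist_b"}
GEO_BLOCKS = {3: {"DE"}}  # track ID -> territories where the track is blocked


@dataclass(frozen=True)
class TrackAggregate:
    track_id: int
    title: str
    owner_name: str


def is_visible(track_id: int, viewer_id: Optional[int], territory: str) -> bool:
    """Authorization and visibility rules, kept in one place (the VAS)."""
    track = TRACK_SERVICE[track_id]
    if track["private"] and viewer_id != track["owner_id"]:
        return False
    return territory not in GEO_BLOCKS.get(track_id, set())


def get_tracks(track_ids: list[int], viewer_id: Optional[int], territory: str) -> list[TrackAggregate]:
    """Track VAS entry point: builds aggregates from the core services and
    drops anything the viewer is not allowed to see."""
    result = []
    for tid in track_ids:
        if tid in TRACK_SERVICE and is_visible(tid, viewer_id, territory):
            t = TRACK_SERVICE[tid]
            result.append(TrackAggregate(tid, t["title"], USER_SERVICE[t["owner_id"]]))
    return result
```

Every BFF calls `get_tracks` instead of querying the track and user services itself, so the visibility rules exist exactly once.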

VAS Migration Process

The SoundCloud engineering teams responsible for moving business logic from the BFFs into a VAS followed a three-step process. Here is how it worked for the Playlist VAS:

  • First, the playlist logic embedded in the BFFs was extracted to establish the new Playlist VAS. This required a great deal of analysis, investigation, and documentation.
  • Second, because central logic was being refactored, automated tests were put in place to ensure that the new service matched the existing logic. Integration tests were also run to verify the response format.
  • Lastly, about 50 playlist endpoints in the BFFs were migrated to the Playlist VAS. The responses from the old and new implementations were compared thoroughly to guarantee a safe migration.
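The article does not say how the responses were compared. One common technique is shadow comparison: keep serving the old response, also call the new service, and log any differences. The sketch below assumes hypothetical handler functions and uses Python's standard `logging` module.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("playlist_migration")


def diff(old: Any, new: Any, path: str = "$") -> list[str]:
    """List the JSON-style paths at which two JSON-like values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        out = []
        for key in sorted(old.keys() | new.keys()):
            if key not in old or key not in new:
                out.append(f"{path}.{key}")       # field missing on one side
            else:
                out.extend(diff(old[key], new[key], f"{path}.{key}"))
        return out
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        out = []
        for i, (a, b) in enumerate(zip(old, new)):
            out.extend(diff(a, b, f"{path}[{i}]"))
        return out
    return [] if old == new else [path]


def shadow_compare(old_handler: Callable[[int], Any],
                   new_handler: Callable[[int], Any]) -> Callable[[int], Any]:
    """Serve the old BFF response while checking the new VAS-backed one."""
    def handler(playlist_id: int) -> Any:
        old = old_handler(playlist_id)
        try:
            mismatches = diff(old, new_handler(playlist_id))
        except Exception:
            # A failure in the new path must never affect the served response.
            logger.exception("new path failed for playlist %s", playlist_id)
        else:
            if mismatches:
                logger.warning("playlist %s differs at %s", playlist_id, mismatches)
        return old
    return handler
```

Once the mismatch log stays empty for real traffic, the endpoint can be switched over to the new service with some confidence.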

Challenges of VAS

Despite these benefits, the VAS approach had a few limitations:

  • VAS calls to many core services caused service fan-out. Every time new features were added, the aggregates grew, and so did the number of calls to core services, making fan-out size a significant problem.
  • Different BFFs can need different variants of an aggregate, depending on the applications built on them. For instance, if a track feature is needed only in the mobile app, it makes no sense for the Web API to retrieve the whole track aggregate, including the data for that feature.
  • Adding BFF-specific endpoints to a VAS for each variant of its aggregates was not a solution either, because the number of such endpoints would only keep growing.


One way SoundCloud tackled these issues is a partial response feature, in which API consumers indicate the parts of the response they want to consume by including a FieldMask in the request.
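The sketch below loosely imitates FieldMask semantics (dot-separated field paths, as in protobuf's `google.protobuf.FieldMask`) on plain Python dictionaries. `apply_field_mask` is a hypothetical helper, not part of any library, and it ignores details such as repeated fields.

```python
import copy
from typing import Any


def apply_field_mask(message: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Keep only the fields named by dot-separated paths such as "user.username"."""
    # Split each path into its first segment and the remainder,
    # e.g. "user.username" -> ("user", "username").
    whole: set[str] = set()
    nested: dict[str, list[str]] = {}
    for path in paths:
        head, _, rest = path.partition(".")
        if rest:
            nested.setdefault(head, []).append(rest)
        else:
            whole.add(head)

    result: dict[str, Any] = {}
    for key, value in message.items():
        if key in whole:
            # The whole field was requested; copy it so callers can't mutate the source.
            result[key] = copy.deepcopy(value)
        elif key in nested and isinstance(value, dict):
            sub = apply_field_mask(value, nested[key])
            if sub:
                result[key] = sub
    return result


track = {
    "id": 1,
    "title": "Intro",
    "user": {"id": 7, "username": "dj", "avatar": "x.png"},
    "stats": {"plays": 10},
}
print(apply_field_mask(track, ["title", "user.username"]))
# {'title': 'Intro', 'user': {'username': 'dj'}}
```

With a mask like this, the VAS can return a client-specific subset of one aggregate without a dedicated endpoint per BFF.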

Domain Gateways

One of the most recent changes to SoundCloud's service architecture is the introduction of Domain Gateways.

They became necessary because, in addition to a consumer app that gives listeners access to a music library, SoundCloud offers features that let content creators upload and distribute their music.

In short, Consumer and Creator are distinct domains owned by different teams.

Implementing the concerns of both domains in one VAS worked for a while, but it eventually introduced a lot of coupling and complexity, which slowed down development.

As a remedy to this problem, the engineering team at SoundCloud came up with the idea of a Domain Gateway.

The SoundCloud engineering team identified the business domains that need a given entity or aggregate and built a Domain Gateway for each of those domains. Each gateway is run by a different team with its own perspective on the entity, while all of them use the same underlying microservices.

Conceptually, a Domain Gateway is a facade that also serves as a stable anti-corruption layer.

It accepts a certain amount of redundancy in exchange for independence and better scalability. This trade-off is appropriate when access patterns and feature sets differ significantly between domains.
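A minimal sketch of two Domain Gateways over the same core data follows. The gateway classes, field names, and URL are hypothetical. Each gateway translates the shared core model into its own domain type, which is what lets it shield its domain from the core model: if a core field changes, only that gateway's translation needs updating.

```python
from dataclasses import dataclass

# Hypothetical shared core data, read by both gateways.
CORE_TRACKS = {
    1: {"id": 1, "title": "Intro", "plays": 1200, "upload_status": "processed",
        "distribution": ["store_a"], "stream_url": "https://example.com/s/1"},
}


@dataclass(frozen=True)
class ListenerTrack:        # the Consumer domain's view of a track
    title: str
    stream_url: str


@dataclass(frozen=True)
class CreatorTrack:         # the Creator domain's view of the same track
    title: str
    plays: int
    upload_status: str
    distribution: tuple[str, ...]


class ConsumerGateway:
    """Facade for the Consumer domain: exposes only what listeners need and
    translates the core model into the domain's own type."""

    def get_track(self, track_id: int) -> ListenerTrack:
        raw = CORE_TRACKS[track_id]
        return ListenerTrack(raw["title"], raw["stream_url"])


class CreatorGateway:
    """Facade for the Creator domain, owned by a different team."""

    def get_track(self, track_id: int) -> CreatorTrack:
        raw = CORE_TRACKS[track_id]
        return CreatorTrack(raw["title"], raw["plays"], raw["upload_status"],
                            tuple(raw["distribution"]))
```

The two `get_track` methods overlap somewhat, which is the redundancy this design accepts in return for letting each team evolve its domain view independently.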

Conclusion

SoundCloud's internal service architecture began as a common monolith and later evolved into a three-tier architecture. We have traced that evolution in detail.

Some important conclusions are given below:

  • BFFs have been crucial to SoundCloud's ability to serve many types of clients while providing a consistent set of core edge functions.
  • As SoundCloud grew, Value-Added Services modelled on DDD concepts were introduced to act as authoritative entry points for accessing aggregates.
  • Finally, Domain Gateways were introduced to manage the contexts of multiple domains.


Today, SoundCloud continues to evolve its architecture along the same lines, aiming for greater agility and less redundancy.

With real-time insights, monitoring, and optimization, Braincuber can help platforms like SoundCloud keep operations running smoothly at scale. By providing visibility into a system's behavior, latency, and potential bottlenecks, Braincuber also helps with tuning microservices systems, managing load and resources, and preserving quality of service under stress.
