Secure your Spring Boot API with rate limiting.

Youssef EL Yamani
8 min read · Oct 27, 2020


Availability and reliability are paramount for all web applications and APIs. If you are providing an API, chances are you’ve already experienced sudden increases in traffic that affect the quality of your service, potentially even leading to a service outage for all your users.

However, when you’re running a production API, not only do you have to make it robust, you also need to build for scale and ensure that one bad actor can’t accidentally or deliberately affect its availability.

Rate limiting can help make your API more reliable.

Rate limiting is not a new concept; we used to call it “quality of service” to make it sound nicer. It is a well-known technique that many companies have used for years. For example, telecommunication companies and content providers frequently throttle requests from abusive users with popular rate-limiting algorithms such as leaky bucket, fixed window, sliding log, and sliding window. All of these prevent resource abuse and protect important resources. Companies have also developed rate-limiting solutions for inter-service communications, such as Doorman (https://github.com/youtube/doorman/blob/master/doc/design.md) and Ambassador (https://www.getambassador.io/reference/services/rate-limit-service), just to name a few.

What kinds of bot attacks are stopped by rate limiting?

Rate limiting is often employed to stop bad bots from negatively impacting a website or application. Bot attacks that rate limiting can help mitigate include:

Brute force attacks

DoS and DDoS attacks

Web scraping

Rate limiting also protects against API overuse, which is not necessarily malicious or due to bot activity, but is important to prevent nonetheless.

The way API rate limiting works, in general, is that each client is allowed X requests per time_interval. The time interval might be minutes, hours, or days. It might even be seconds. The reason for this is to prevent any given client (user) from consuming so many resources (memory, CPU, database) as to prevent the system from responding to other users.

Rate limiting for APIs helps protect against malicious bot attacks as well. An attacker can use bots to make so many repeated calls to an API that it renders the service unavailable for anyone else, or crashes the service altogether. This is a type of DoS or DDoS attack.


Social media platform rate limiting is basically just API rate limiting. Any third-party application that integrates Twitter, for instance, can only refresh to look for new tweets or messages a certain amount of times per hour. Instagram has similar limits for third-party apps. This is why users may occasionally encounter “rate limit exceeded” messages.

These limits typically don’t apply to users who are using the social media platform directly.

Examining rate limiting algorithms

Sliding Window Counter

This approach attempts to optimize some of the inefficiencies of both the fixed window counter and sliding logs technique. In this technique, the user’s requests are grouped by timestamp, and rather than log each request, we keep a counter for each group.

It keeps track of each user’s request count while grouping them by fixed time windows (usually a fraction of the limit’s window size). Here’s how it works.

When a user’s request is received, we check whether the user’s record already exists and whether there is already an entry for that timestamp. If both cases are true, we simply increment the counter on the timestamp.

In determining whether the user has exceeded their limit, we retrieve all groups created in the last window, and then sum the counters on them. If the sum equals or exceeds the limit, then the user has reached their limit and the incoming request is dropped. Otherwise, the timestamp is inserted or updated and the request processed.

As an addition, the timestamp groups can be set to expire after the window time is exhausted in order to control the rate at which memory is consumed.
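The steps above can be sketched in plain Java. This is an illustrative, single-node implementation (the class and field names are mine, not from any library); it keeps one counter per timestamp group, expires groups that fall out of the window, and sums the rest to decide whether to admit a request:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sliding-window-counter limiter (not production code).
class SlidingWindowCounter {
    private final int limit;          // max requests per window
    private final long windowMillis;  // full window size
    private final long bucketMillis;  // size of each timestamp group
    private final Map<Long, Integer> counters = new ConcurrentHashMap<>();

    SlidingWindowCounter(int limit, Duration window, Duration bucket) {
        this.limit = limit;
        this.windowMillis = window.toMillis();
        this.bucketMillis = bucket.toMillis();
    }

    synchronized boolean tryAcquire() {
        long now = Instant.now().toEpochMilli();
        long currentBucket = now / bucketMillis;
        long oldestBucket = (now - windowMillis) / bucketMillis;

        // Expire groups that fell out of the window to bound memory use.
        counters.keySet().removeIf(b -> b <= oldestBucket);

        // Sum the counters of all groups still inside the window.
        int total = counters.values().stream().mapToInt(Integer::intValue).sum();
        if (total >= limit) {
            return false; // limit reached, drop the request
        }
        counters.merge(currentBucket, 1, Integer::sum);
        return true;
    }
}
```

With a limit of 3 per minute, the first three calls to `tryAcquire()` succeed and the fourth is rejected until a group expires.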

Token bucket

In the token bucket algorithm, we simply keep a counter indicating how many tokens a user has left and a timestamp showing when it was last updated. This concept originates from packet-switched computer networks and telecommunications networks, in which a fixed-capacity bucket holds tokens that are added at a fixed rate (the window interval).

When the packet is tested for conformity, the bucket is checked to see whether it contains a sufficient number of tokens as required. If it does, the appropriate number of tokens are removed, and the packet passes for transmission; otherwise, it is handled differently.

In our case, when the first request is received, we log the timestamp and then create a new bucket of tokens for the user.

On subsequent requests, we test whether the window has elapsed since the last timestamp was created. If it hasn’t, we check whether the bucket still contains tokens for that particular window. If it does, we will decrement the tokens by 1 and continue to process the request; otherwise, the request is dropped and an error triggered.

In a situation where the window has elapsed since the last timestamp, we update the timestamp to that of the current request and reset the number of tokens to the allowed limit.
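A minimal sketch of those two cases in plain Java (class and field names are illustrative, not from bucket4j): a token count plus the timestamp of the last refill, reset whenever a full window has elapsed:

```java
import java.time.Duration;

// Illustrative token bucket: a counter of remaining tokens plus the
// timestamp of the last refill (not production code).
class SimpleTokenBucket {
    private final long capacity;
    private final long windowNanos;
    private long tokens;
    private long lastRefill;

    SimpleTokenBucket(long capacity, Duration window) {
        this.capacity = capacity;
        this.windowNanos = window.toNanos();
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        long now = System.nanoTime();
        if (now - lastRefill >= windowNanos) {
            // Window elapsed: update the timestamp and reset the tokens.
            lastRefill = now;
            tokens = capacity;
        }
        if (tokens > 0) {
            tokens--;     // request allowed, one token consumed
            return true;
        }
        return false;     // bucket empty, drop the request
    }
}
```

Note this resets the whole bucket at once; a refiller like bucket4j’s greedy one tops the bucket up gradually instead.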

Leaky bucket

The leaky bucket algorithm makes use of a queue that accepts and processes requests in a first-in, first-out (FIFO) manner. The limit is enforced on the queue size. If, for example, the limit is 10 requests per minute, then the queue would only be able to hold 10 requests at a time.

As requests get queued up, they are processed at a relatively constant rate. This means that even when the server is hit with a burst of traffic, the outgoing responses are still sent out at the same rate.

Once the queue is filled up, the server will drop any incoming requests until space is freed up for more.
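A small sketch of this idea in plain Java (the class is illustrative, under the assumption of one worker thread draining the queue): a bounded FIFO queue holds pending requests, a background worker drains it at a constant rate, and submissions are dropped once the queue is full:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative leaky bucket: a bounded FIFO queue drained at a constant rate.
class LeakyBucket {
    private final BlockingQueue<Runnable> queue;

    LeakyBucket(int capacity, long drainIntervalMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(drainIntervalMillis); // constant outflow rate
                    queue.take().run();                // process in FIFO order
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    // Returns false when the queue is full, i.e. the request is dropped.
    boolean submit(Runnable request) {
        return queue.offer(request);
    }
}
```

Bursts fill the queue immediately, but responses still leave at one per drain interval, which is exactly the smoothing property described above.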

The examples in this blog post use the bucket4j Java library which implements the token bucket algorithm.

Understanding and implementing rate limiting in Spring Boot

Prerequisites

In order to follow along effectively as you read through this, you are expected to have the following:

-A general understanding of how servers handle requests

-A good understanding of how to build REST APIs in Spring Boot.

If you’re lacking some or all of these, do not feel intimidated. We will make sure to break things down as much as possible so that you can easily understand every concept we end up exploring.

Project setup

Spring Boot Application Configuration

Let’s build a new Spring Boot application.

Creating the Spring Boot Application

To generate the initial project structure, visit Spring Initializr: https://start.spring.io/

Provide details as below.

Project: Maven or Gradle Project

Language: Java

Spring Boot: Select the latest stable version or keep the default selection as it is.

Project Metadata: Provide an artifact name and select your preferred Java version. Make sure your local environment has the selected Java version available. If not, download and install it.

pom.xml Changes

Open pom.xml and make the following changes:

Maven

<!-- https://mvnrepository.com/artifact/com.github.vladimir-bukhtoyarov/bucket4j-core -->
<dependency>
    <groupId>com.github.vladimir-bukhtoyarov</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>4.10.0</version>
</dependency>

Gradle

// https://mvnrepository.com/artifact/com.github.vladimir-bukhtoyarov/bucket4j-core
implementation 'com.github.vladimir-bukhtoyarov:bucket4j-core:4.10.0'

Now we create a Bucket:

public Bucket createNewBucket() {
    long capacity = 10; // maximum number of tokens the bucket can hold
    // refill the bucket with 10 tokens per minute
    Refill refill = Refill.greedy(10, Duration.ofMinutes(1));
    Bandwidth limit = Bandwidth.classic(capacity, refill);
    return Bucket4j.builder().addLimit(limit).build();
}

The basics of the algorithm are easy to understand. You have a bucket that holds a maximum number of tokens (the capacity). Whenever a consumer wants to call a service or consume a resource, it takes out one or more tokens. The consumer can only consume the service if it can take out the required number of tokens. If the bucket does not contain enough tokens, the consumer needs to wait until the bucket is refilled.

Refill.greedy(10, Duration.ofMinutes(1));

The greedy refiller adds tokens as smoothly as possible. With the configuration above, it splits one minute into 10 periods and puts 1 token into the bucket at each period; that is, 1 token every 6 seconds. After one minute, a greedy refiller and an interval-based refiller have put the same number of tokens into the bucket.
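For comparison, bucket4j also offers `Refill.intervally`, which waits for the whole interval and then adds all tokens at once. A short sketch of both configurations (this fragment assumes bucket4j 4.x on the classpath):

```java
// Both refillers add 10 tokens per minute; they differ only in cadence.

// Greedy: spreads the refill across the minute (1 token every 6 seconds).
Refill greedy = Refill.greedy(10, Duration.ofMinutes(1));

// Intervally: waits for the full minute, then adds all 10 tokens at once.
Refill intervally = Refill.intervally(10, Duration.ofMinutes(1));

Bucket smoothBucket = Bucket4j.builder()
        .addLimit(Bandwidth.classic(10, greedy))
        .build();
```

Greedy refills suit APIs that prefer a steady trickle of allowed requests; interval refills allow a fresh burst at each window boundary.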

Basic example

As an example server, I wrote a Spring Boot application that provides HTTP endpoints for fetching data about Products.

The following endpoint returns information about the Products.

@GetMapping("/products")
public ResponseEntity<?> getAllProducts(HttpServletRequest httpRequest) {
    HttpSession session = httpRequest.getSession(true);
    String appKey = "<Security Key>"; /* e.g. SecurityUtils.getThirdPartyAppKey() */
    // One bucket per client, stored in the HTTP session
    Bucket bucket = (Bucket) session.getAttribute("throttler-" + appKey);
    if (bucket == null) {
        bucket = createNewBucket();
        session.setAttribute("throttler-" + appKey, bucket);
    }
    boolean okToGo = bucket.tryConsume(1); // take one token per request
    if (okToGo) {
        return new ResponseEntity<>(
                Arrays.asList(new Product("product 1", "category 1", new Date()),
                        new Product("product 2", "category 1", new Date()),
                        new Product("product 3", "category 2", new Date())),
                HttpStatus.OK);
    }
    return new ResponseEntity<>(
            "You have exceeded the 10 requests in 1 minute limit!",
            HttpStatus.TOO_MANY_REQUESTS);
}

We decide that only 10 requests per minute can reach this service. First, we create a refiller, in this example a greedy refiller that refills the bucket with 10 tokens per minute. That means this refiller adds one token every 6 seconds. Next, we create a Bandwidth object that combines the maximum capacity of the bucket with the refiller, and then we build the bucket with the Bucket4j builder.

For HTTP endpoints, it’s common to send back the status code 429 (TOO_MANY_REQUESTS) if the calling client has exceeded the configured rate limit. This is not a hard requirement; your service can do whatever it wants when a client exceeds the rate limit. However, 429 is the standard convention, and you should follow it, especially when you create a public-facing HTTP API.
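A common refinement on top of the 429 convention is to tell clients how many requests they have left and when to retry. bucket4j supports this via `tryConsumeAndReturnRemaining`, which returns a `ConsumptionProbe`. A sketch of how the endpoint’s return path could use it (the header names follow common practice but are not standardized):

```java
// Sketch: expose rate-limit information in response headers.
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
if (probe.isConsumed()) {
    return ResponseEntity.ok()
            .header("X-Rate-Limit-Remaining",
                    Long.toString(probe.getRemainingTokens()))
            .body(products);
}
// Tell the client how long to wait before the next token is available.
long waitSeconds = TimeUnit.NANOSECONDS.toSeconds(probe.getNanosToWaitForRefill());
return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
        .header("Retry-After", Long.toString(waitSeconds))
        .body("You have exceeded the 10 requests in 1 minute limit!");
```

Well-behaved clients can then back off for `Retry-After` seconds instead of hammering the API.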

When we call the endpoint from the terminal or Postman, the response contains the product list. Once we exceed the configured rate limit, we instead receive the 429 response with the error message.

Conclusion

In this article, we have explored the concept of rate limiting — what it is, how it works, and the practical scenarios in which it is applicable.

We have also built our very own implementation in Spring Boot using bucket4j. I hope you enjoyed doing this with me.

You may find the source code for this tutorial here on GitHub.

See you in the next one!
