Tutorial

Setting Up Rate Limiting in Production

Alex Rodriguez
Oct 15, 2025
12 min read

Rate limiting is your first line of defense against API abuse, DDoS attacks, and resource exhaustion. In this comprehensive guide, we will walk through implementing production-grade rate limiting with real-world examples.

Why Rate Limiting Matters

Real Incident: The $72,000 Bill

In March 2024, a startup without rate limiting was hit with a DDoS attack. Their auto-scaling infrastructure spun up 300+ servers trying to handle the traffic. The attack lasted 6 hours. Their cloud bill: $72,000.

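With proper rate limiting, the excess requests would have been rejected at the gateway before they reached the application tier, so auto-scaling would never have been triggered and the excess infrastructure cost would have been close to $0.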

Rate Limiting Algorithms Compared

1. Token Bucket (Best for Bursts)

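Tokens are added to a bucket at a fixed rate, up to a maximum capacity. Each request consumes one token, and a request is rejected when the bucket is empty. Because unused tokens accumulate, a client can burst up to the bucket capacity while its long-run rate stays bounded by the refill rate.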

✓ Pros:

  • Allows traffic bursts
  • Smooth over time
  • Memory efficient

✗ Cons:

  • More complex to configure (refill rate and bucket size must be tuned together)
  • Can allow sudden spikes up to the bucket size
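
The refill-and-consume behavior described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not G8KEPR's implementation; the class name, the `rate` and `capacity` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill lazily from the elapsed time, capped at the bucket capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the bucket only needs to store a token count and a timestamp, which is why the memory cost per client stays small.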

2. Fixed Window (Simplest)

Count requests in fixed time windows (e.g., per minute). Reset counter at window boundary.

✓ Pros:

  • Very simple to implement
  • Low memory usage
  • Easy to understand

✗ Cons:

  • Counter resets abruptly at each window boundary
  • Allows up to 2x the limit in a short span (a full quota just before a boundary, another just after)
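
A short Python sketch makes the boundary problem concrete. It is an illustrative in-memory counter under assumed names (`FixedWindowLimiter`, `limit`, `window_seconds`), not code from any particular gateway.

```python
import time


class FixedWindowLimiter:
    """Fixed window: one counter per key, reset when the window index changes."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # key -> (window_index, count)

    def allow(self, key):
        current = int(self.clock() // self.window)
        index, count = self.counts.get(key, (current, 0))
        if index != current:
            index, count = current, 0  # a new window has started
        if count >= self.limit:
            return False
        self.counts[key] = (index, count + 1)
        return True
```

With a limit of 100 per 60 seconds, a client that sends 100 requests at second 59 and another 100 at second 60 has every one of them accepted, because the counter resets in between. That is 200 requests in about one second against a nominal limit of 100 per minute.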

3. Sliding Window (Recommended)

G8KEPR Uses This

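Counts requests over a window that moves with the current time instead of resetting at fixed boundaries, so a client cannot double its quota by straddling a reset. This combines fixed window simplicity with smooth rate enforcement.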

✓ Pros:

  • Accurate rate limiting
  • No boundary exploits
  • Smooth enforcement

~ Cons:

  • Slightly more memory
  • More computation
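
One common way to implement a sliding window is the "sliding window log", which keeps the timestamps of recent accepted requests. The sketch below assumes that variant; the text does not say which variant G8KEPR uses, and the class and parameter names are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Sliding window log: at most `limit` accepted requests in any window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.log = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        timestamps = self.log[key]
        # Drop timestamps that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Storing one timestamp per accepted request is where the extra memory and computation come from. In exchange, a client that uses its full quota at second 59 is still blocked at second 60 and only regains capacity once those requests are a full window old.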

Multi-Level Rate Limiting Strategy

Production systems need rate limiting at multiple levels. Here is the recommended approach:

Level            Limit        Purpose
Per IP Address   100/min      Prevent DDoS, scraping
Per User         1000/min     Fair usage across users
Per API Key      5000/min     Tier-based limits
Per Endpoint     Varies       Protect expensive operations
Global           50000/min    Infrastructure capacity
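
To show how these levels might be combined, here is a hedged Python sketch that checks every level before counting a request. The limits mirror the table above (the per-endpoint level is omitted because its values vary); the class name, method names, and the in-memory sliding window are assumptions for illustration only.

```python
import time
from collections import defaultdict, deque

# (requests, window in seconds) per level, taken from the table above.
LIMITS = {
    "ip": (100, 60),
    "user": (1000, 60),
    "api_key": (5000, 60),
    "global": (50000, 60),
}


class MultiLevelLimiter:
    def __init__(self, limits=LIMITS, clock=time.monotonic):
        self.limits = limits
        self.clock = clock
        self.log = defaultdict(deque)  # (level, identity) -> timestamps

    def _count(self, bucket, window, now):
        timestamps = self.log[bucket]
        # Expire entries older than the window before counting.
        while timestamps and timestamps[0] <= now - window:
            timestamps.popleft()
        return len(timestamps)

    def check(self, ip, user, api_key):
        """Return the name of the first exceeded level, or None if allowed."""
        now = self.clock()
        buckets = [("ip", ip), ("user", user),
                   ("api_key", api_key), ("global", "*")]
        # First pass: reject without recording if any level is exhausted.
        for level, identity in buckets:
            limit, window = self.limits[level]
            if self._count((level, identity), window, now) >= limit:
                return level
        # Second pass: the request is accepted, so count it at every level.
        for bucket in buckets:
            self.log[bucket].append(now)
        return None
```

Checking all levels before recording anything means a request rejected at one level does not consume quota at the others. Returning the name of the exceeded level also gives monitoring a way to tell an abusive IP apart from a global capacity problem.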

Implementation with G8KEPR

G8KEPR provides production-ready rate limiting out of the box. Here is a sample configuration:

# docker-compose.yml
services:
  gatekeeper:
    image: g8kepr/gateway:latest
    environment:
      # Sliding window rate limits
      RATE_LIMIT_PER_IP: "100/1m"
      RATE_LIMIT_PER_USER: "1000/1m"
      RATE_LIMIT_PER_API_KEY: "5000/1m"

      # Endpoint-specific limits
      RATE_LIMIT_LOGIN: "5/5m"        # Prevent brute force
      RATE_LIMIT_SIGNUP: "3/1h"       # Prevent spam accounts
      RATE_LIMIT_EXPORT: "10/1h"      # Expensive operation

      # Redis for distributed counting
      REDIS_URL: "redis://redis:6379"

      # Response headers
      RATE_LIMIT_HEADERS: "true"      # X-RateLimit-* headers
    ports:
      - "8080:8080"
    depends_on:
      - redis                         # Start the Redis container first

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
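
The limit values in this configuration appear to follow a "count/duration" pattern, so "5/5m" would mean 5 requests per 5 minutes. Assuming that reading (it is inferred from the inline comments, not from G8KEPR documentation), a small Python helper can convert such strings into numbers for sanity-checking a config. The function name and the supported units are hypothetical.

```python
import re

# Assumed duration units: seconds, minutes, hours.
_UNITS = {"s": 1, "m": 60, "h": 3600}
_SPEC = re.compile(r"(\d+)/(\d+)([smh])")


def parse_limit(spec):
    """Parse a spec such as "100/1m" into (max_requests, window_seconds)."""
    match = _SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized rate limit spec: {spec!r}")
    count, amount, unit = match.groups()
    return int(count), int(amount) * _UNITS[unit]
```

Under this reading, the login limit "5/5m" allows 5 attempts per 300 seconds, which is far stricter than the per-IP limit of 100 requests per 60 seconds.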

Monitoring and Alerts

Rate limiting is only effective if you monitor it. Set up alerts for:

  • Unusual spike in rate limit rejections (possible attack)
  • Specific IPs hitting limits repeatedly (block them)
  • Legitimate users hitting limits (increase their tier)
  • Global limit approaching capacity (scale infrastructure)
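
The first two alert conditions can be sketched as a simple check over one monitoring interval. This is an illustrative Python example, not a G8KEPR feature: the event shape `(ip, rejected)`, the function name, and both thresholds are assumptions that would need tuning against real traffic.

```python
from collections import Counter


def evaluate_alerts(events, spike_ratio=0.2, repeat_threshold=50):
    """Evaluate one interval of (ip, rejected) events and return alerts."""
    total = 0
    rejected_by_ip = Counter()
    for ip, rejected in events:
        total += 1
        if rejected:
            rejected_by_ip[ip] += 1

    alerts = []
    rejected_total = sum(rejected_by_ip.values())
    # Alert when a large share of all requests was rejected (possible attack).
    if total and rejected_total / total >= spike_ratio:
        alerts.append(("rejection_spike", rejected_total / total))
    # Alert on individual IPs that keep hitting the limit.
    for ip, count in rejected_by_ip.items():
        if count >= repeat_threshold:
            alerts.append(("repeat_offender", ip))
    return alerts
```

In practice these counts would usually come from gateway metrics or logs rather than an in-memory list, but the decision logic stays the same.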

Get Production-Ready Rate Limiting

G8KEPR includes sliding window rate limiting, Redis integration, and real-time monitoring dashboards.
