In the context of web applications, limiting the number of requests a host or user makes solves two problems:

  • withstanding denial-of-service (DoS) attacks (rate limiting)
  • ensuring that a single user doesn’t consume too many resources (throttling)

Rate limiting is often accomplished with firewall rules, whether on a dedicated network device, via iptables, or in the web server itself. These rules are enforced at the network or transport layer, before a request is ever delivered to the application. For example, a rule such as “an IP address may make no more than 20 requests per second” would queue or simply drop any request that exceeded the maximum rate, and the application would never see it.
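As a sketch of what such a rule might look like, here is a hypothetical iptables invocation on Linux using the hashlimit match module (the chain, port, and limit are illustrative, not a recommended production configuration):

```shell
# Drop packets from any source IP that exceeds 20 packets/second to port 80.
# hashlimit keeps a separate counter per source IP (--hashlimit-mode srcip).
iptables -A INPUT -p tcp --dport 80 \
    -m hashlimit --hashlimit-above 20/sec \
    --hashlimit-mode srcip --hashlimit-name http-rate \
    -j DROP
```

Because the packet is dropped in the kernel, the web application never spends any cycles on the excess request.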

Throttling can be thought of as application middleware that maintains a count of each user’s requests during a specific time period. If an incoming request exceeds the maximum for the period, the user receives an error response (e.g. HTTP 429 Too Many Requests) containing a helpful message.
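A fixed-window version of this counting logic can be sketched in a few lines of Python. The `Throttle` class and its in-memory store are illustrative only, not any particular framework’s API; a real deployment would typically keep the counters in a shared store such as Redis so that multiple application servers see the same counts.

```python
import time
from collections import defaultdict

class Throttle:
    """Fixed-window request counter: at most max_requests per window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (user_id, window index) -> count

    def allow(self, user_id, now=None):
        """Record one request; return False if the user is over the limit."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # current time bucket
        key = (user_id, window)
        self.counts[key] += 1
        return self.counts[key] <= self.max_requests

# e.g. "a user may make no more than 150 requests per 15-minute window"
throttle = Throttle(max_requests=150, window_seconds=15 * 60)
```

Middleware would then call `throttle.allow(user_id)` for each request and return the error response when it comes back `False`. Note that a fixed window allows brief bursts of up to twice the limit around a window boundary; sliding-window or token-bucket schemes trade a little extra bookkeeping for smoother enforcement.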

A good example of throttling is Twitter’s controversial API rate limiting. Twitter enforces several kinds of limits depending on the type of access token and the API endpoint being called. An example of a rule is “a user may make no more than 150 requests per 15-minute window”.

Although Twitter uses the term rate limiting, I find it helpful to distinguish network-layer rate limiting from application-specific request limiting, i.e. throttling.