Every public API needs rate limiting, or one bad actor, or one buggy script, can hammer it into the ground. But clumsy rate limiting punishes your best customers along with the abusers. The goal is to stop abuse without normal users ever noticing that a limit exists.
Choose the right key
Limiting by IP alone is blunt: an entire office behind one IP shares the limit, and one heavy user there blocks everyone. For an authenticated API, limit per API key or per account so each customer gets their own budget and one customer can’t affect another.
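As a rough illustration, key selection might look like the sketch below. The header name X-API-Key and the function name rate_limit_key are hypothetical, and falling back to the client IP for unauthenticated requests is an assumption, not something every API needs.

```python
from typing import Mapping


def rate_limit_key(headers: Mapping[str, str], client_ip: str) -> str:
    """Return the identity that a request's rate-limit budget is charged to."""
    # "X-API-Key" is a hypothetical header name; use whatever your API defines.
    api_key = headers.get("X-API-Key")
    if api_key:
        # Authenticated caller: each key gets its own budget.
        return f"key:{api_key}"
    # Assumed fallback for unauthenticated traffic: the blunt per-IP limit.
    return f"ip:{client_ip}"
```

The prefixes keep the two namespaces apart, so an API key that happens to look like an IP address cannot collide with a real IP bucket.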
Set limits to real usage, and be forgiving
Look at how your real users behave and set the ceiling well above normal peaks. A limit that a legitimate integration hits during normal work is too low. Different endpoints can have different limits: a cheap read can allow far more than an expensive export. Use a token bucket so that the short bursts real apps produce are allowed while sustained abuse is still capped.
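A minimal in-memory token bucket, with separate settings per endpoint, could be sketched as follows. The endpoint names and numbers in ENDPOINT_LIMITS are made-up placeholders, not recommendations, and a real deployment would usually keep bucket state in a shared store rather than a process-local dictionary.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_rate` tokens/second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.tokens = float(capacity)  # start full so the first burst passes
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical settings: (burst capacity, sustained requests per second).
ENDPOINT_LIMITS = {
    "read": (100, 50.0),   # cheap read: generous
    "export": (5, 0.1),    # expensive export: tight
}

buckets = {}  # (caller key, endpoint) -> TokenBucket


def allow_request(key, endpoint, now=None):
    """Check the caller's bucket for this endpoint, creating it on first use."""
    bucket_id = (key, endpoint)
    if bucket_id not in buckets:
        capacity, rate = ENDPOINT_LIMITS[endpoint]
        buckets[bucket_id] = TokenBucket(capacity, rate, now)
    return buckets[bucket_id].allow(now)
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained ceiling, so the two can be tuned independently against observed traffic.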
Then communicate. In response headers, return the limit, the number of requests remaining, and the time at which the limit resets. When someone is throttled, respond with the standard 429 Too Many Requests status and a Retry-After header. A developer who can see the limits will design around them; one who gets a silent failure will just be angry.
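The sketch below shows one way to build those headers, independent of any web framework. The X-RateLimit-* names are a widely used convention rather than a formal standard, and treating the reset value as seconds until reset (some APIs send a Unix timestamp instead) is an assumption here. The function names and the error body shape are hypothetical.

```python
import math


def rate_limit_headers(limit, remaining, reset_seconds):
    """Headers describing the caller's current budget, sent on every response."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumed meaning: whole seconds until the budget is replenished.
        "X-RateLimit-Reset": str(math.ceil(reset_seconds)),
    }


def throttled_response(limit, reset_seconds):
    """Return (status, headers, body) for a request that exceeded its limit."""
    headers = rate_limit_headers(limit, 0, reset_seconds)
    # Retry-After accepts a delay in whole seconds; round up so clients
    # never retry too early.
    headers["Retry-After"] = str(math.ceil(reset_seconds))
    body = {
        "error": "rate_limited",
        "message": f"Rate limit of {limit} requests exceeded. Retry after "
                   f"{headers['Retry-After']} seconds.",
    }
    return 429, headers, body
```

Sending the budget headers on successful responses too, not only on 429s, lets well-behaved clients slow down before they are ever throttled.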
Done well, rate limiting is invisible to everyone except the traffic you actually wanted to stop.