Introduction — what kind of counts does the distribution describe?
Build the entry-level feel for the Poisson distribution by looking only at counts inside a fixed interval.
The intuition we want to build first
The Poisson distribution does not look at "how large the value is" — it looks at "how many events occurred within a fixed interval." To start, we line up 5-minute buckets and look only at the count inside each bucket.
5-minute buckets lined up with only the per-bucket count visible. How those counts are distributed is exactly what the Poisson distribution describes.
Count events, not values
For example, when you think about API timeout counts, it's natural to say "0 this minute, 2 this minute, 5 this minute." Here what you want to summarize is not the exact timing of each event but how many events happened in that interval.
At the entry to Poisson, three things matter:
- Decide the interval first. For example: 1 minute, 10 minutes, 1 day.
- Look at the count per interval. For example: 3 events in the last 10 minutes.
- Set the fine-grained ordering of events aside for now.
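As a minimal sketch of the three points above, the snippet below turns a list of event timestamps into per-bucket counts. The function name `count_per_bucket` and the timestamps are hypothetical, invented only for illustration. The interval is fixed first (300 seconds = 5 minutes), and only the count per bucket is kept.

```python
from collections import Counter

def count_per_bucket(event_times_s, bucket_s, n_buckets):
    """Count events in each fixed-width bucket [i*bucket_s, (i+1)*bucket_s)."""
    hits = Counter(int(t // bucket_s) for t in event_times_s)
    # Buckets with no events must appear as 0 rather than be skipped.
    # Events at or beyond n_buckets * bucket_s are ignored in this sketch.
    return [hits.get(i, 0) for i in range(n_buckets)]

# Hypothetical timeout timestamps, in seconds since observation started.
timeout_times_s = [12, 95, 130, 610, 615, 640, 700, 899]
counts = count_per_bucket(timeout_times_s, bucket_s=300, n_buckets=3)
print(counts)  # [3, 0, 5]
```

The middle bucket shows up as 0 rather than disappearing: an empty interval is still an observation of the count.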
| What you want to model | Is Poisson a good fit? | Why |
|---|---|---|
| Number of inquiries per hour | Good fit | It's the count inside a fixed interval. |
| The heights of students | Not a fit | It's a continuous value, not a count. |
| Number of winning tickets drawn without replacement | Inappropriate (not impossible) | The number of draws is capped, and removing a ticket changes the composition of what remains, so the draws are not independent. This is the territory of the hypergeometric distribution. Poisson is not "unusable" here, but it is the wrong fit. |
The assumptions behind Poisson (the Poisson process)
The Poisson distribution fits naturally when the way events occur roughly satisfies the three conditions below. These are the assumptions of what is called a Poisson process.
- Independence: counts in non-overlapping intervals are independent of one another. Whether one event occurred does not change the chance of the next.
- Rarity (no simultaneous events): in a vanishingly short sub-interval, the chance that two or more events happen at the same instant is negligible.
- Stationarity (uniform rate): the average rate of occurrence is the same at every position inside the interval.
Keeping these three conditions in mind makes it easier to judge whether Poisson is appropriate in practice. Poisson always describes "how many events happened in a fixed interval" as a discrete random variable. Heights, by contrast, are continuous magnitudes rather than counts, which is why they are usually modeled with a normal distribution instead.
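One way to get a feel for the three conditions is to simulate them directly. The sketch below is an approximation, not an exact Poisson sampler: it splits one interval into many tiny slices and gives each slice the same small, independent chance of holding one event. The function name `simulate_count`, the slice count, and the seed are all assumptions chosen for illustration.

```python
import random

def simulate_count(lam, n_slices=1000, rng=random):
    """Approximate one interval's count by splitting it into tiny slices."""
    p = lam / n_slices  # same small chance in every slice (stationarity)
    # Each slice is an independent trial (independence) that holds
    # at most one event (rarity: no simultaneous events).
    return sum(1 for _ in range(n_slices) if rng.random() < p)

rng = random.Random(0)  # fixed seed so the run is repeatable
lam = 3.0
counts = [simulate_count(lam, rng=rng) for _ in range(2000)]
sample_mean = sum(counts) / len(counts)
print(sample_mean)  # should land close to lam, not exactly on it
```

Individual intervals vary (some have 0 events, some have 6 or more), but the average over many intervals settles near the mean count that was fed in.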
Check 1 — Carry the rate onto the interval
First, confirm your feel for 'count data' and 'fixed interval' by multiplying a rate by a window length.
Q1. Suppose timeouts occur at an average rate of 6 per 10 minutes. What is λ when you look at a 5-minute interval?
6 per 10 minutes means 0.6 per minute on average. Over 5 minutes that gives 0.6 × 5 = 3, so λ = 3.
Q2. Which of the following is the most natural candidate for a Poisson distribution?
Poisson is good for representing the count of events inside a fixed interval. Heights are continuous, and drawing without replacement is a capped combinatorial problem — neither is a Poisson fit.
λ is "the mean count for that interval"
The star of Poisson is λ, the mean count for the interval. If you double the observation window, the mean count doubles; halve it, and the mean count halves.
λ = rate × interval length

"Mean" here is the long-run center if you observed intervals with the same conditions many times. A single interval does not have to land exactly on λ events.
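The scaling rule can be sketched in a few lines. The helper name `mean_count` and its parameters are hypothetical. The only real requirement is that the rate window and the interval use the same time unit.

```python
def mean_count(rate_events, rate_window, interval):
    """Mean count for an interval: (rate_events / rate_window) * interval.

    rate_window and interval must be in the same time unit.
    """
    return rate_events * interval / rate_window

lam_5min = mean_count(6, 10, 5)    # 6 per 10 minutes, seen over 5 minutes
lam_10min = mean_count(6, 10, 10)  # doubling the window doubles the mean
print(lam_5min, lam_10min)  # 3.0 6.0
```

The underlying rate is the same in both calls. Only the window length changes, and λ follows it proportionally.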
When you only look at counts, what stays and what is lost?
When Poisson looks at the count per interval, how many events there were is preserved. On the other hand, which second within that minute they clustered in, and which came first, are all thrown away.
This is not "treating the data carelessly" — it is a deliberate simplification so we can focus on the count model first. When we need the rest, we move on to models of waiting times and arrival times.
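To see the trade-off concretely, the sketch below reduces two made-up one-minute event logs to their counts. The timestamps and the helper name `count_in_minute` are hypothetical. The two logs differ completely in timing, yet the count model sees them as identical.

```python
def count_in_minute(event_seconds):
    """Keep only how many events fell inside the minute [0, 60)."""
    return sum(1 for s in event_seconds if 0 <= s < 60)

# Two hypothetical one-minute logs with very different timing.
clustered = [57.1, 57.4, 58.0]  # all three near the end of the minute
spread_out = [5.0, 31.0, 52.0]  # spaced across the minute
print(count_in_minute(clustered), count_in_minute(spread_out))  # 3 3
```

Once only the count is stored, there is no way to tell from it which of the two logs produced it. That is the information a waiting-time or arrival-time model would need to keep.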
The point to keep in mind is that Chapter 1 is a bridge from raw counts to a probability model. Once we have counts inside fixed intervals that we want to describe by some distribution, the next chapter introduces a function that assigns a probability to each possible count — the probability mass function P(X = k).
Check 2 — The interval and what you throw away
λ is 'the rate carried onto this interval.' While you look at counts, check what you are letting go of at the same time.
Q1. Consider a monitoring system that emits on average 0.3 warnings per minute. Over a 20-minute window, what is the mean count λ?
At 0.3 per minute, 20 minutes gives 0.3 × 20 = 6. So λ = 6.
Q2. Pick the correct description of what this course means by 'fixed interval.'
Poisson counts the events that fall inside an interval you fixed in advance. If you change the interval after the fact, λ changes with it, so the counts can no longer be compared against the same λ.
Q3. From a record that says '3 events occurred in the same 1 minute,' which piece of information is being discarded?
At the entry to Poisson, we first look at how many events occurred. In exchange, we do not keep the timing information of which second within the minute each event happened.
Key takeaways from this chapter
- Poisson is a distribution for the count of events in a fixed interval.
- Its main parameter λ is the mean count for that interval.
- In exchange for looking at counts, the fine-grained timing within the interval is set aside.