# My Water Usage Follows a Normal Distribution

I have been collecting water usage data for three years. Home Assistant stores the readings in a SQLite database as hourly mean, min, and max values for the water level.
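As a rough sketch of how those hourly values could be pulled out of the database: the query below assumes a recent Home Assistant recorder schema, with a `statistics` table (`start_ts`, `mean`, `min`, `max`, `metadata_id`) joined to `statistics_meta` (`id`, `statistic_id`). Older schemas differ, and the function name is my own, so treat this as illustrative rather than the exact query used here.

```python
import sqlite3


def load_hourly_stats(conn, statistic_id):
    """Return (start_ts, mean, min, max) rows for one sensor, oldest first.

    Assumes the long-term statistics layout of a recent Home Assistant
    recorder database; column names may differ in older versions.
    """
    query = """
        SELECT s.start_ts, s.mean, s.min, s.max
        FROM statistics AS s
        JOIN statistics_meta AS m ON m.id = s.metadata_id
        WHERE m.statistic_id = ?
        ORDER BY s.start_ts
    """
    return conn.execute(query, (statistic_id,)).fetchall()
```

With a connection opened via `sqlite3.connect(...)` on a copy of the database file, calling `load_hourly_stats(conn, "sensor.water_level")` would return one row per hour. The entity id `sensor.water_level` is a made-up placeholder.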

# Raw Numbers

To set the context:

**Total data points**: 13,410

**Range of Values**: 0.05 - 99.93

Plot of all the points:

# Cleanup

Negative values come from malfunctioning sensors, and the earlier sensors also produced a lot of jitter. After removing the negative values, on the assumption that they are sensor noise, here is a plot of the data from 2021:

You might notice four conspicuous gaps in the dataset. The first two correspond to periods when I was tinkering with different sensors to improve system accuracy. The large gap in August-September was due to a system failure during the rainy season. This event led me to thoroughly waterproof the sensor setup, resulting in visibly more consistent data collection thereafter.
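The cleanup step itself is simple. A minimal sketch, assuming the readings are `(unix_timestamp, value)` pairs and that timestamps are interpreted in UTC (both are my assumptions, as is the function name):

```python
from datetime import datetime, timezone


def clean_readings(readings, year=None):
    """Drop missing and negative values; optionally keep a single year.

    `readings` is assumed to be an iterable of (unix_timestamp, value) pairs.
    """
    cleaned = []
    for ts, value in readings:
        # Negative readings are treated as sensor noise and discarded.
        if value is None or value < 0:
            continue
        # Optional year filter, evaluated in UTC (an assumption).
        if year is not None and datetime.fromtimestamp(ts, tz=timezone.utc).year != year:
            continue
        cleaned.append((ts, value))
    return cleaned
```

This only removes negative values. It does nothing about jitter or the gaps, which stay visible in the plot.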

# Patterns

To understand usage patterns, here is a plot of the mean values from Sept’23:

And this is Min, Mean, Max:

To see the pattern more clearly, here is a plot of mean values bucketed by hour of the day through the month of Sept’23:

Key observations:

- The water level dips early in the morning, when the family wakes up and starts using water.
- Another dip occurs around 20:00, signaling the family’s transition to winding down for the day.
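The hourly bucketing behind that plot can be sketched roughly as follows. It assumes `(unix_timestamp, value)` pairs and groups by hour of day in a configurable timezone. The function name and the UTC default are my own choices; for a real plot the local timezone would be the sensible one, so that "morning" lines up with the household's clock.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import fmean


def mean_by_hour(readings, tz=timezone.utc):
    """Average the values for each hour of the day (0-23).

    `readings` is assumed to be an iterable of (unix_timestamp, value) pairs.
    Hours with no readings are simply absent from the result.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = datetime.fromtimestamp(ts, tz=tz).hour
        buckets[hour].append(value)
    # Sort by hour so the result can be plotted directly.
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}
```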

Further insights can be derived from a histogram focusing on readings below 30:

# Normal Distribution

Drawing 1,000 random samples from the hourly mean values, and repeating this 10,000 times, gives us a nice bell curve!
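One way to reproduce this kind of resampling is sketched below. It makes two assumptions the text does not spell out: each draw of 1,000 values is reduced to its average before plotting, and sampling is done with replacement. The function name and the fixed seed are illustrative.

```python
import random
from statistics import fmean


def sample_means(values, sample_size=1000, repeats=10000, seed=0):
    """Return the average of `sample_size` random draws, repeated `repeats` times.

    Draws are made with replacement (an assumption); a histogram of the
    returned list is what would show the bell shape.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [fmean(rng.choices(values, k=sample_size)) for _ in range(repeats)]
```

If the averaging assumption holds, the bell shape is what the central limit theorem predicts for averages of many draws, so it describes the sample means rather than the raw hourly readings.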

When compared with a standard Normal Distribution curve, the resemblance is uncanny:

To check whether the distribution really is normal, we can use a quantile-quantile (Q-Q) plot, which shows a strong correlation:
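The points of such a Q-Q plot, and the correlation between the two axes, can be computed with the standard library alone. This is a sketch under my own choices: the reference normal is fitted to the data's mean and standard deviation, and the plotting positions are `(i + 0.5) / n`. Both function names are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(values):
    """Return (theoretical, observed) quantile pairs against a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # which avoids the infinite quantiles at 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Pearson correlation of the (x, y) pairs; close to 1 means a straight Q-Q line."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

A correlation near 1 is consistent with normality but does not prove it. Q-Q correlations tend to be high even for moderately skewed data, so the shape of the tails in the plot is worth a look as well.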

# Bonus: Frequency Spectrum Analysis

I conducted a frequency spectrum analysis to look for any recurring patterns or cycles in water usage:
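For illustration, here is a naive discrete Fourier transform that turns an evenly spaced hourly series into (period, amplitude) pairs. It is a sketch rather than the analysis used for the plot: it is O(n²), it assumes there are no gaps in the series, and the function name is my own. For the full dataset a real FFT such as `numpy.fft.rfft` would be the practical choice.

```python
import cmath
import math
from statistics import fmean


def amplitude_spectrum(values, sample_spacing_hours=1.0):
    """Return (period_hours, amplitude) pairs from a naive O(n^2) DFT.

    Assumes evenly spaced samples with no gaps.
    """
    n = len(values)
    mean = fmean(values)
    centered = [v - mean for v in values]  # remove the constant offset
    spectrum = []
    # Skip k = 0 (the mean) and the Nyquist bin, where the 2/n scaling is wrong.
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        period = n * sample_spacing_hours / k
        spectrum.append((period, 2 * abs(coeff) / n))
    return spectrum
```

A daily routine would show up as a peak at a period of 24 hours, and a weekly one at 168 hours, provided the series spans several such cycles.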