I have been collecting water usage data for three years. The way Home Assistant collects data, I get hourly mean, min and max values for the water level in a SQLite database.
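For readers who want to pull the same kind of data, here is a minimal sketch of reading the hourly values with Python's built-in `sqlite3` module. It assumes a recent Home Assistant recorder schema, with a `statistics` table (columns `start_ts`, `mean`, `min`, `max`, `metadata_id`) joined to `statistics_meta` on `statistic_id`. The schema may differ between versions, and the entity name `sensor.water_level` in the usage note is purely hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema: `statistics` holds the hourly rows, `statistics_meta`
# maps each row to its entity through `statistic_id`.
QUERY = """
SELECT s.start_ts, s.mean, s.min, s.max
FROM statistics AS s
JOIN statistics_meta AS m ON m.id = s.metadata_id
WHERE m.statistic_id = ?
ORDER BY s.start_ts
"""


def load_hourly_stats(conn, statistic_id):
    """Return (start time in UTC, mean, min, max) tuples, oldest first."""
    rows = conn.execute(QUERY, (statistic_id,)).fetchall()
    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc), mean, low, high)
        for ts, mean, low, high in rows
    ]
```

With a copy of the database file, this could be called as `load_hourly_stats(sqlite3.connect("home-assistant_v2.db"), "sensor.water_level")`. Working on a copy avoids touching the live database.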

Raw Numbers

To set the context:

  • Total data points: 13,410
  • Range of Values: 0.05 - 99.93

Plot of all the points:

Uncleaned data plot

Cleanup

Negative values come from malfunctioning sensors, and the earlier sensors also produced a lot of jitter. After removing the negative values, on the assumption that they are sensor noise, here is a plot of the data from 2021:

Cleaned data plot
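The cleanup step can be sketched in a few lines. This is only an illustration: it assumes the readings are held as `(timestamp, value)` pairs, and the helper name `drop_negative` is made up for this example.

```python
def drop_negative(rows):
    """Keep only rows whose value is present and non-negative.

    `rows` is assumed to be an iterable of (timestamp, value) pairs.
    Negative values are treated as sensor noise and discarded.
    """
    return [(ts, value) for ts, value in rows if value is not None and value >= 0]
```

Dropping the rows, rather than clamping them to zero, leaves a visible gap in the plot instead of inventing a reading.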

You might notice four conspicuous gaps in the dataset. The first two correspond to periods when I was tinkering with different sensors to improve system accuracy. The large gap in August-September was due to a system failure during the rainy season. This event led me to thoroughly waterproof the sensor setup, resulting in visibly more consistent data collection thereafter.

Patterns

To understand usage patterns, the following plot shows the mean values from Sept’23:

Sept mean values

And this one shows the min, mean and max values:

Sept mean min and max values

To see the pattern more clearly, here is a plot of mean values bucketed by hour through the month of Sept’23:

Sept hourly buckets
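One way to build such hourly buckets is to group the readings by hour of day and average each group. The sketch below assumes `(datetime, value)` pairs and a hypothetical helper name, `hourly_buckets`. The plot above may have been produced differently.

```python
from collections import defaultdict
from statistics import fmean


def hourly_buckets(rows):
    """Average the values that fall in each hour of the day (0-23).

    `rows` is assumed to be an iterable of (datetime, value) pairs.
    Hours with no readings are simply absent from the result.
    """
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.hour].append(value)
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}
```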

Key observations:

  • The water level dips early in the morning, when the family wakes up and starts using water.
  • Another dip occurs around 20:00, signaling the family’s transition to winding down for the day.

Further insights can be derived from a histogram focusing on readings below 30:

Sept hourly histogram

Normal Distribution

Drawing 1,000 random samples from the hourly mean values and averaging them, then repeating this 10,000 times, gives us a nice bell curve of sample means!

histogram 1
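The resampling experiment can be sketched with the standard library alone. Two things here are assumptions rather than details from the analysis above: each draw is reduced to its average, and sampling is done with replacement. The function name and the `seed` parameter are illustrative.

```python
import random
from statistics import fmean


def sample_means(values, sample_size=1000, repeats=10_000, seed=0):
    """Return the mean of `repeats` random samples drawn from `values`.

    Each sample has `sample_size` points drawn with replacement
    (an assumption; sampling without replacement would also work here).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [fmean(rng.choices(values, k=sample_size)) for _ in range(repeats)]
```

A histogram of the returned list is what produces the bell shape: by the central limit theorem, averages of large samples tend towards a normal distribution even when the underlying readings do not follow one.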

When compared with a standard Normal Distribution curve, the resemblance is uncanny:

histogram 2

To verify whether the distribution really follows a normal distribution, we can check it with a quantile-quantile plot, which shows a strong correlation:

quantile-quantile plot
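For reference, the points of a normal quantile-quantile plot and their correlation can be computed as in the sketch below. It fits a normal distribution to the data using the sample mean and standard deviation, and uses plotting positions `(i + 0.5) / n`. Both are common choices but assumptions here, as are the function names.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # which inv_cdf requires.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

If the data is close to normal, the pairs fall near a straight line and `pearson(qq_points(data))` is close to 1.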

Bonus: Frequency Spectrum Analysis

I conducted a frequency spectrum analysis to look for any recurring patterns or cycles in water usage:

frequency spectrum analysis
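A frequency spectrum like this can be approximated with a plain discrete Fourier transform. The sketch below is a naive O(n²) version using only the standard library, so it suits about a month of hourly values rather than the full dataset, where an FFT routine such as `numpy.fft.rfft` would be the practical choice. It assumes evenly spaced samples with no gaps, so any gaps would need filling first. The function name is illustrative.

```python
import cmath
import math


def amplitude_spectrum(values, sample_spacing_hours=1.0):
    """Return (period in hours, amplitude) pairs from a naive DFT.

    Only positive frequencies below the Nyquist frequency are returned.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant offset
    spectrum = []
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        period = n * sample_spacing_hours / k
        # Factor 2 folds in the matching negative-frequency component.
        spectrum.append((period, 2 * abs(coeff) / n))
    return spectrum
```

A daily usage cycle would show up as a peak at a period of 24 hours, and a weekly one at 168 hours.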