I put some time into providing proper histograms for Salstat. The basics work, and work well enough, but the advanced features are yet to come.
Histograms are very useful in statistics because they let us see the distribution of a vector's data at a glance. The screenshot above tells me that the data are unlikely to be normally distributed. If I wanted to perform an inferential test that assumes normally distributed data, I might need to transform the data so they are closer to normal first, or use a test that doesn't make that assumption.
The critical thing, however, is to see how the data look, and Salstat does this in a basic form.
How does it work?
module has a handy histogram function that, in its simplest form, takes a vector and returns two vectors: the frequencies and the bin limits. These are used directly to build the histogram.
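To make the idea concrete, here is a minimal sketch of what such a function computes. It is not Salstat's code, and `simple_histogram` is a hypothetical name. The sketch assumes ten equal-width bins spanning the data's minimum and maximum. It also follows a common convention, the one NumPy's `numpy.histogram` uses: every bin is half-open except the last, which also includes the maximum.

```python
def simple_histogram(data, bins=10):
    """Hypothetical sketch: return (frequencies, limits) for equal-width bins.

    `limits` has bins + 1 entries, the edges of the bins, running from
    the data's minimum to its maximum.
    """
    values = list(data)
    if not values:
        raise ValueError("data must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: widen the range so the bins have non-zero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    limits = [lo + i * width for i in range(bins + 1)]
    limits[-1] = hi  # pin the last edge to avoid floating-point drift
    frequencies = [0] * bins
    for x in values:
        # Map each value to a bin index; the maximum goes into the last bin.
        index = min(int((x - lo) / width), bins - 1)
        frequencies[index] += 1
    return frequencies, limits
```

For example, `simple_histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=4)` returns the frequencies `[1, 2, 3, 4]` and the limits `[1.0, 1.75, 2.5, 3.25, 4.0]`. There is always one more limit than there are frequencies.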
Once these are computed, a column chart is drawn in Highcharts using these values, with some additional 'plotOptions' so that no gaps appear between the columns.
What's left to do?
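The post doesn't list the exact 'plotOptions' Salstat uses, so the following is only a sketch of one plausible configuration, built as a Python dict and serialised to JSON for Highcharts. It sets `pointPadding` and `groupPadding` to 0 on `plotOptions.column`, both documented Highcharts options, so adjacent columns touch. It also sets `borderWidth` to 0 so that no thin white borders separate the columns. The function name `histogram_chart_options` and the "low to high" category labels are hypothetical choices.

```python
import json


def histogram_chart_options(frequencies, limits, title="Histogram"):
    """Hypothetical sketch: build Highcharts options for a gap-free column chart."""
    if len(limits) != len(frequencies) + 1:
        raise ValueError("limits must have one more entry than frequencies")
    # One category label per bin, e.g. "1.00 to 1.75".
    categories = [f"{a:.2f} to {b:.2f}" for a, b in zip(limits[:-1], limits[1:])]
    return {
        "chart": {"type": "column"},
        "title": {"text": title},
        "xAxis": {"categories": categories},
        "yAxis": {"title": {"text": "Frequency"}},
        "legend": {"enabled": False},
        "plotOptions": {
            "column": {
                # Remove the default padding so adjacent columns touch.
                "pointPadding": 0,
                "groupPadding": 0,
                # Drop the default column border so no thin lines separate bins.
                "borderWidth": 0,
            }
        },
        "series": [{"name": "Frequency", "data": list(frequencies)}],
    }


options = histogram_chart_options([1, 2, 3, 4], [1.0, 1.75, 2.5, 3.25, 4.0])
print(json.dumps(options, indent=2))  # JSON to hand to Highcharts.chart(...)
```

Using categories keeps each bin as one column. A numeric x-axis would be the alternative if the bin edges themselves should appear as tick values.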
- The histogram defaults to 10 bins. This is fine for basic uses, but more advanced use cases need the user to be able to define the bins.
- Histogram limits are currently set by the minimum and maximum of the data. Some users need to define their own.
- Rarer use cases might exist for defining weights for each bin and the histogram function might need to return the probability density function rather than the counts.
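To show what these options would change, here is a hypothetical sketch of an extended histogram function. It is not Salstat's planned design. Its parameters cover four features: user-defined bins (a bin count or explicit edges), user-defined limits, weights, and density output. One assumption differs from the wording above: it follows NumPy's `numpy.histogram` convention, where weights apply to individual data values rather than to bins.

```python
from bisect import bisect_right


def histogram(data, bins=10, limits=None, weights=None, density=False):
    """Hypothetical extended histogram returning (totals, edges).

    bins:    an int (number of equal-width bins) or a sorted list of edges.
    limits:  optional (low, high) range for int bins; defaults to the
             data's min and max. Ignored when explicit edges are given.
    weights: optional per-value weights, same length as data.
    density: if True, scale totals so the histogram's area is 1.
    """
    values = list(data)
    w = [1.0] * len(values) if weights is None else list(weights)
    if len(w) != len(values):
        raise ValueError("weights must match data in length")
    if isinstance(bins, int):
        lo, hi = limits if limits is not None else (min(values), max(values))
        if lo == hi:
            lo, hi = lo - 0.5, hi + 0.5
        step = (hi - lo) / bins
        edges = [lo + i * step for i in range(bins + 1)]
        edges[-1] = hi  # pin the last edge to avoid floating-point drift
    else:
        edges = list(bins)
    n = len(edges) - 1
    totals = [0.0] * n
    for x, wx in zip(values, w):
        if x < edges[0] or x > edges[-1]:
            continue  # values outside the limits are not counted
        # bisect_right finds the bin whose left edge is <= x;
        # the right-most edge is included in the last bin.
        i = min(bisect_right(edges, x) - 1, n - 1)
        totals[i] += wx
    if density:
        area = sum(totals)  # assumes at least one value was counted
        totals = [t / (area * (b - a)) for t, a, b in zip(totals, edges[:-1], edges[1:])]
    return totals, edges
```

For example, `histogram(data, bins=[0, 2, 5])` uses explicit edges and `histogram(data, bins=2, limits=(2, 3))` uses a custom range. With `density=True`, each total divided by its bin width gives values whose area sums to 1.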
We're keen to get the first two working but are unsure how to design the interface for them. The interface was designed for simpler charting needs, so it will need careful thought before it can accommodate these options.
For now, however, Salstat has a basic histogram charting function that probably meets around 80% of needs.