12-09-2018 11:57 PM
I do have a tricky task where I hope anybody out there might have some ideas or even a working solution.
I do need to identify if the raster values within a polygon do have a somehow normal distribution. The idea is to ensure that a given polygon contains only one land use type and not two different classes. This means I need to get rid of any polygons that show a bi- or multimodal histogram distribution of the underlying raster data.
I have found two operators that might be a first starting point: Skew Texture and Kurtosis Texture Per feature. Both might be good indicators for this task as described e.g. here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791391/
But unfortunately those two operators use a focal operation at all the pixel locations within a Feature and then average (or SD or Median) those values for the overall measure for the Feature, rather than simply taking the sample of pixels that fall within the Feature and calculating the Kurtosis/Skew of that distribution.
Another possible way would be to use Python modules like “shapiro” for Shapiro-Wilk Test or “anderson” for the Anderson-Darling Test. There I would need to pass the histogram off all underlying raster cells to a Python script.
Any ideas or working solutions are more than welcome.
Cheers and many thanks for your help in advance.
12-10-2018 08:31 AM - edited 12-10-2018 12:59 PM
As additional input to Fritz' request, attached is a Spatial Model which (I think) calculates Skew, Excess Kurtosis and the Bimodality Coefficient (for a single-band input image).
However given this Histogram
I was getting Kurtosis = 1 (i.e. skewed to the left), Excess Kurtosis = 7.5 (heavy tailed) and BC = 0.2 (not bimodal).
So either I’ve gotten the math wrong, or BC isn’t a good test of bimodality (as is implied in the paper).
So I'd appreciate it if everyone could please check my math in the model and see if I got it right.
And even if I did, I think other suggestions for robust tests for multi-modality would be appreciated.
12-12-2018 05:19 AM - edited 12-12-2018 05:21 AM
thanks a lot for your work. But it really seems that there is still something wrong with your model or as you said the BC value is not really the best practice for a normal distribution test.
In the meantime, I tried to write a small python script that would allow me to derive the p value for a raster subset. But I am still struggling with the normaltest of scipy. You should find the first version of the python script as well as some data in the attached zip. It is still a standalone script without any class functions.
I do read an image (line 7) and loop through rows and columns to get all pixel values stored in an array. For testing I write out the values into a text file (20). Those look good. Then I output the histogram of the array in a chart (lines 23 and 24). Those also look correct. And for me I see a somehow normal distribution.
Then I use the scipi.normaltest function to derive stats and p (line 27). I now would assume that p is somewhere in the region of 0.05. But I always to get 0.0 as result. I have no idea what I am doing wrong.
Beware: you need to have the python 3.x and modules scipy, mathplot and pil installed to be able to run the script. Use “python -m pip list” to list all install modules or “python -m pip install modulename” to install the needed modules.
Any help is more than welcome