Landsat 8 image on the left; results of running this model overlaid in the right view.
Satellite imagery suffers from the influences of weather, especially the presence of clouds. Clouds not only obscure the features below them but also cast shadows that attenuate the light reflected back from the features falling in those shadows. For many sensors, such as Landsat, a basic cloud mask is often provided with the data, but it can be inaccurate or missing entirely. An automated way to map the locations of clouds and, optionally, areas of shadowing/shading is frequently helpful for excluding these locations from further processing, or for identifying the locations which need "back filling" using older data of the same location.
This article looks at one such research area. It is not offered as a finished, fool-proof approach, but as a topic for further investigation and fine tuning. Feedback is, as always, encouraged and welcomed.
Landsat8_Cloud_Mask_v16_1_4.gmdx
The approach follows the paper "Generation of Cloud-free Imagery Using Landsat-8" by Byeonghee Kim, et al. The premise is that clouds generally reflect strongly in certain bands (such as the Cirrus and Coastal Blue bands), whereas they are also cold and therefore have low returns in the thermal band(s). Consequently, you should be able to threshold the bands to help identify cloud locations. The traditional problem is that the threshold has to be chosen manually, or is fixed based on certain assumptions about pre-processing, which can make it difficult to apply from scene to scene. For example, the article entitled "Simple Cloud Mask for Landsat Imagery" uses fixed thresholds that depend on the data having been corrected to Surface Reflectance. Instead, Kim et al. recommend using a clustering technique to automatically identify the threshold points. In this model we have used ISODATA to perform the automated thresholding.
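To make the idea of automated thresholding concrete, here is a minimal sketch of a two-class iterative (intermeans) threshold, which is a much simpler relative of ISODATA clustering. It is not the ISODATA operator used inside the model; the function name `isodata_threshold`, its `tol` and `max_iter` parameters, and the sample pixel values are all assumptions made for illustration.

```python
import numpy as np

def isodata_threshold(values, tol=0.5, max_iter=100):
    """Two-class iterative (intermeans) threshold for an array of pixel values.

    Hypothetical helper: a simplified stand-in for ISODATA clustering.
    """
    data = np.asarray(values, dtype=float).ravel()
    # Start at the overall mean, then alternate between splitting the data
    # at the threshold and moving the threshold to the midpoint of the two
    # class means, until the threshold stops moving.
    t = data.mean()
    for _ in range(max_iter):
        low = data[data <= t]
        high = data[data > t]
        if low.size == 0 or high.size == 0:
            break  # everything fell on one side; nothing left to split
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t

# Illustrative values only: mostly dark pixels with a few bright ones.
band9 = np.array([10, 12, 11, 13, 200, 210, 205, 12, 11, 198])
t_b9 = isodata_threshold(band9)
bright = band9 > t_b9  # candidate "bright in the Cirrus band" pixels
```

Because the threshold is derived from each scene's own histogram, no assumption about radiometric pre-processing is built in, which is the property the paper is after.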
The paper uses a similar assumption to attempt to identify shadows cast by the clouds. The authors suggest first identifying the cloud locations and then searching (buffering) out to a distance of 200 pixels (provided as a user-definable variable in this model), looking for a cluster of dark pixels in the lower SWIR (Band 6). Note: The paper actually states "band 6 (NIR)", which is contradictory, since on Landsat 8 Band 6 is SWIR1 and NIR is Band 5. Observation of several test images seemed to indicate that Band 6 (the lower SWIR band) gave both a low return in shadows and a high return in other areas, and so it was used.
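The buffered shadow search can be sketched as follows. This is only an illustration under stated assumptions: it uses a square window rather than a true circular buffer, the helper names `near_cloud` and `shadow_candidates` are hypothetical, and the Band 6 threshold is passed in rather than derived by clustering.

```python
import numpy as np

def near_cloud(cloud, max_offset):
    """True where any cloud pixel lies within a square window of
    half-width `max_offset` pixels (a box buffer, not a circular one)."""
    rows, cols = cloud.shape
    # Summed-area table with a leading row and column of zeros, so the
    # number of cloud pixels in any rectangle is four lookups.
    sat = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    sat[1:, 1:] = cloud.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    r = np.arange(rows)
    c = np.arange(cols)
    r0 = np.clip(r - max_offset, 0, rows)[:, None]
    r1 = np.clip(r + max_offset + 1, 0, rows)[:, None]
    c0 = np.clip(c - max_offset, 0, cols)[None, :]
    c1 = np.clip(c + max_offset + 1, 0, cols)[None, :]
    count = sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]
    return count > 0

def shadow_candidates(band6, cloud, b6_threshold, max_offset=200):
    """Dark Band 6 pixels that fall inside the buffer around the clouds."""
    dark = band6 < b6_threshold
    return dark & near_cloud(cloud, max_offset)

# Tiny illustrative scene: one cloud pixel and two dark pixels.
cloud = np.zeros((5, 7), dtype=bool)
cloud[2, 1] = True
band6 = np.full((5, 7), 100)
band6[2, 2] = 5  # dark pixel next to the cloud
band6[2, 6] = 5  # dark pixel far from the cloud
shadow = shadow_candidates(band6, cloud, b6_threshold=20, max_offset=2)
```

Only the dark pixel inside the buffer survives, which is why a shorter search distance reduces false shadow detections from unrelated dark features such as water.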
Based on these automated thresholds, 3 sets of criteria are evaluated and combined:
Type of Mask | Threshold Conditions |
---|---|
Thick Cloud | Band 9 > threshold B9 (i.e. bright) & Band 10 < threshold B10 (i.e. cold/dark) |
Thin Cloud | Band 1 > threshold B1 (i.e. bright) & Band 9 < threshold B9 (i.e. dark) & Band 10 < threshold B10 (i.e. cold/dark) |
Shadow | Band 6 < Lowest threshold B6 (i.e. dark) |
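The three rules in the table translate directly into boolean array expressions. The sketch below assumes the bands are already loaded as NumPy arrays of equal shape and that the thresholds have been found separately; the function name `classify`, the parameter names, and the sample pixel values are illustrative assumptions, not values from real Landsat 8 data.

```python
import numpy as np

def classify(b1, b6, b9, b10, t_b1, t_b6_low, t_b9, t_b10):
    """Apply the thick cloud, thin cloud and shadow rules from the table."""
    thick = (b9 > t_b9) & (b10 < t_b10)                # bright cirrus, cold
    thin = (b1 > t_b1) & (b9 < t_b9) & (b10 < t_b10)   # bright blue, dark cirrus, cold
    shadow = b6 < t_b6_low                             # dark in SWIR1
    return thick, thin, shadow

# Four made-up pixels: thick cloud, thin cloud, shadow, clear land.
b1 = np.array([300, 280, 50, 90])
b6 = np.array([200, 180, 10, 150])
b9 = np.array([120, 20, 5, 8])
b10 = np.array([250, 260, 290, 300])

thick, thin, shadow = classify(
    b1, b6, b9, b10, t_b1=150, t_b6_low=40, t_b9=60, t_b10=275
)
cloud = thick | thin  # the two cloud rules combined into one class
```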
Using these rules produced decent results. However, analysis of several test images highlighted at least two major concerns which needed addressing:
This appeared to increase the accuracy of the Cloud and Shadow identification. However, cloud locations were still generally under-classified (errors of omission), so an option to Dilate the cloud locations is provided and should generally be used.
Shadows also continued to be over-classified (errors of commission). Consequently, an option is provided to sieve (remove) shadows that fall below a user-provided size threshold.
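The two clean-up steps can be sketched in a few lines. This is a minimal illustration, not the operators used in the model: the 3x3 (8-connected) growth kernel for dilation and the 4-connected definition of a clump for sieving are both assumptions, as are the function names `dilate` and `sieve`.

```python
from collections import deque
import numpy as np

def dilate(mask, pixels=2):
    """Grow True regions by `pixels`, one 3x3 neighbourhood pass per pixel."""
    out = np.asarray(mask, dtype=bool)
    rows, cols = out.shape
    for _ in range(pixels):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dr in range(3):
            for dc in range(3):
                grown |= padded[dr:dr + rows, dc:dc + cols]
        out = grown
    return out

def sieve(mask, min_size):
    """Discard 4-connected clumps smaller than `min_size` pixels."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    keep = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Breadth-first search to collect one clump.
        clump = [(r0, c0)]
        seen[r0, c0] = True
        queue = deque(clump)
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    clump.append((nr, nc))
                    queue.append((nr, nc))
        if len(clump) >= min_size:
            for r, c in clump:
                keep[r, c] = True
    return keep

cloud = np.zeros((7, 7), dtype=bool)
cloud[3, 3] = True
grown_cloud = dilate(cloud, pixels=2)   # single pixel becomes a 5x5 block

shadow = np.zeros((7, 7), dtype=bool)
shadow[0, 0:3] = True                   # clump of 3 pixels
shadow[6, 6] = True                     # isolated pixel
sieved_shadow = sieve(shadow, min_size=2)
```

Dilation addresses the errors of omission at thinning cloud edges, while sieving addresses the errors of commission from small dark features that are unlikely to be cloud shadow.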
The model is designed for Landsat 8 data with both the Cirrus (Band 9) and Thermal 1 (Band 10) bands being available for processing. It also expects the Coastal Blue (Band 1) and bands 3, 4, 5, and 6 (Green, Red, NIR and SWIR1 respectively). The model could be modified to work on other sensors which provide those wavelengths.
Based on the sample images tested, the model as it currently stands provides good, but by no means perfect, results. Further research is required to fine-tune the approach. Feedback from the community is encouraged.
However, if the purpose is to identify pixels contaminated with clouds (and their effects, such as shadows) so that those pixels can be replaced with data from other sources (such as older, but cloud-free, imagery), the technique produces usable results.
Band 10 of Landsat 8: From a directory containing all (relevant) TIFF band files of a Landsat 8 scene select the file representing Band 10 (Thermal Infrared (TIRS) 1). Based on the naming pattern of this file, the other relevant band files will be automatically identified and used. For example: "LC80190362017118LGN00_B10.TIF"
Dilate Clouds?: The algorithm tends to under-detect the edges of clouds as they thin. Consequently a boolean (true/false) option is provided. If set to True (1), a filter is applied to the basic cloud locations to grow the initial detections a distance of 2 pixels.
Mask Shadows?: Optionally a second output class can be created for the shadows of the clouds. If the Shadow class is desired, set this option to True (1).
Maximum Shadow Offset Distance (in pixels): The distance away from the Cloud class to consider when looking for shadows. 200 pixels tends to be the maximum in most Landsat 8 scenes. If you can visually review the scene prior to running the model, setting an appropriate, shorter distance will increase both the speed and accuracy of the Shadow class.
Shadow Sieve Size (pixels): Over-classification of shadows can be minimised by sieving out small detections (which therefore probably aren't associated with clouds). Enter the size (in pixels) below which clumps of detected shadow pixels will be discarded.
Output Mask Filename: The name of the output image file containing the Cloud class (DN 1) and the optional Shadow class (DN 2).
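As an illustration of how the other band files might be located from the Band 10 filename, the sketch below swaps the `_B10` suffix for each required band number. How the model actually performs this lookup is not documented here, so the function name `sibling_band_paths`, the `scene/` directory, and the assumption that every band follows the `<scene id>_B<n>.TIF` pattern are all hypothetical.

```python
from pathlib import Path

# Bands the model expects: Coastal Blue, Green, Red, NIR, SWIR1, Cirrus, TIRS 1.
REQUIRED_BANDS = (1, 3, 4, 5, 6, 9, 10)

def sibling_band_paths(band10_path):
    """Map each required band number to its expected file path."""
    path = Path(band10_path)
    suffix = "_B10"
    if not path.stem.upper().endswith(suffix):
        raise ValueError(f"not a Band 10 file: {path.name}")
    scene_id = path.stem[:-len(suffix)]
    return {
        band: path.with_name(f"{scene_id}_B{band}{path.suffix}")
        for band in REQUIRED_BANDS
    }

paths = sibling_band_paths("scene/LC80190362017118LGN00_B10.TIF")
band1_name = paths[1].name
```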