pyshepseg¶

Introduction¶

A Python implementation of the image segmentation algorithm described in Shepherd et al. (2019), "Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination", Remote Sensing 11(6).

This package is a tool for Python programmers to implement the segmentation algorithm. It is not a stand-alone solution for people with no Python experience.

We thank the authors of the paper for their algorithm. This implementation was created independently of them, and they are in no way to blame for any mistakes we have made.

Downloads¶

From GitHub. Release notes by version can be viewed in Pyshepseg Release Notes.

Dependencies¶

The package requires the scikit-learn and numba packages, which must be installed before pyshepseg will run. They are installed automatically with the conda-forge pyshepseg package (see below), but must already be available when building from source.

Also recommended is the GDAL package for reading and writing raster file formats. It is not required by the core segmentation algorithm, but is highly recommended as a portable way to interface to a large range of raster formats. It is required by the tiling module to support segmentation of large rasters. This is installed when using the conda-forge pyshepseg package.

Installation¶

Installing from conda-forge is the recommended approach. Once you have installed Conda, run the following commands to install pyshepseg into a new environment:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -n mysegenv pyshepseg
conda activate mysegenv

Alternatively, this package can be installed directly from the source, using the setup.py script (see required dependencies above).

The source code is available from https://github.com/ubarsc/pyshepseg. Either unpack the latest release bundle from https://github.com/ubarsc/pyshepseg/releases, or clone the repository.
Run the setup.py script. This is best done by using pip as a wrapper around it, as follows. Note that pip has a --prefix option to allow installation in non-standard locations.

pip install .

Usage¶

from pyshepseg import shepseg

# Read in a multi-band image as a single array, img,
# of shape (nBands, nRows, nCols).
# Ensure that any null pixels are all set to a known
# null value in all bands. Failure to correctly identify
# null pixels can result in a poorer quality segmentation.

segRes = shepseg.doShepherdSegmentation(img, imgNullVal=nullVal)

The segimg attribute of the segRes object is an array of segment ID numbers, of shape (nRows, nCols).

See the help in the pyshepseg.shepseg module and pyshepseg.shepseg.doShepherdSegmentation() function for further details and tips.
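The comment above about null pixels can be read in more than one way. The following is a minimal sketch of one reading, in which a pixel that is null in any band is set to the null value in every band before segmentation. It uses only NumPy; the helper name unifyNulls and the synthetic array are hypothetical and are not part of pyshepseg.

```python
import numpy as np


def unifyNulls(img, nullVal):
    """
    Return a copy of img, of shape (nBands, nRows, nCols), in which any
    pixel that is null in at least one band is null in all bands.
    Hypothetical helper, not part of pyshepseg.
    """
    img = np.array(img, copy=True)
    # True wherever any band holds the null value; shape (nRows, nCols)
    nullMask = (img == nullVal).any(axis=0)
    img[:, nullMask] = nullVal
    return img


# Tiny synthetic image: 2 bands of 2 x 3 pixels, with 0 as the null value
img = np.array([
    [[10, 0, 30], [40, 50, 60]],
    [[11, 21, 31], [41, 51, 0]],
], dtype=np.uint8)
clean = unifyNulls(img, nullVal=0)
```

The resulting array could then be passed as img, with imgNullVal set to the same null value, as in the usage shown above.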

Large Rasters¶

The basic usage outlined above operates entirely in-memory. For very large rasters, this can be infeasible. A tiled implementation is provided in the pyshepseg.tiling module, which divides a large raster into fixed-size tiles, segments each tile in-memory, and stitches the results together to create a single segment image. The tiles are stitched such that segments are matched and merged across tile boundaries, so the result is seamless.
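To illustrate only the first step, dividing a raster into fixed-size tiles, here is a minimal sketch in plain Python. It is not the code used by pyshepseg.tiling, and it makes no attempt at the stitching step; the function name tileWindows and the tile size are assumptions chosen for illustration.

```python
def tileWindows(nRows, nCols, tileSize):
    """
    Yield (rowOff, colOff, nTileRows, nTileCols) windows which together
    cover an nRows x nCols raster. Tiles on the bottom and right edges
    are smaller when the raster size is not a multiple of tileSize.
    Hypothetical helper, not part of pyshepseg.
    """
    for rowOff in range(0, nRows, tileSize):
        for colOff in range(0, nCols, tileSize):
            yield (rowOff, colOff,
                   min(tileSize, nRows - rowOff),
                   min(tileSize, nCols - colOff))


windows = list(tileWindows(nRows=5000, nCols=7000, tileSize=4096))
```

Each window could be read and segmented independently, which is what keeps memory use bounded by the tile size rather than the raster size.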

This technique should be used with caution. See the docstring for the pyshepseg.tiling module and the pyshepseg.tiling.doTiledShepherdSegmentation() function for further discussion of usage and caveats.

Once a segmentation has been completed, statistics can be gathered per segment on large rasters using the functions defined in the pyshepseg.tilingstats module.

See the Concurrency section below for possible speed-ups of both these operations.

Command Line Scripts¶

A few basic command line scripts are also provided as entry points. Their main purpose is as test scripts during development, but they also serve as examples of how to write scripts which use the package, and can be used directly for simple segmentation tasks.

The pyshepseg_run_seg entry point is a wrapper around the basic in-memory usage.

The pyshepseg_tiling entry point is a wrapper around the tiled segmentation for large rasters.

The pyshepseg_subset entry point uses the pyshepseg.subset.subsetImage() function to subset a segmentation image, re-labelling the segments to contiguous segment ID numbers.

The pyshepseg_variograms entry point uses the pyshepseg.tilingstats.calcPerSegmentSpatialStatsTiled() function to calculate the given number of variograms.

The pyshepseg_runtests entry point runs some tests on packaged data and can be used to confirm that the behaviour of this package is as expected.

Use the --help option on each script for usage details.

Per-segment Statistics¶

It can be useful to calculate statistics of the pixels from the original input imagery on a per-segment basis. For example, for all the pixels in a single segment, one might calculate the mean value of one or more of the bands from the original imagery.

A routine is provided to do this in a memory-efficient way, given the original image and the completed segmentation image. A standard set of statistics is available, including mean, standard deviation, and arbitrary percentile values, amongst others. The selected per-segment statistics are written to the segment image file as columns of a raster attribute table (RAT).

For details, see the help on the pyshepseg.tilingstats.calcPerSegmentStatsTiled() and pyshepseg.tilingstats.calcPerSegmentSpatialStatsTiled() functions.
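As a small in-memory illustration of what a per-segment statistic means, the sketch below computes the mean of one band for each segment ID using NumPy. It is not the tiled, memory-efficient routine described above and writes nothing to a RAT; the function name perSegmentMean and the sample arrays are hypothetical.

```python
import numpy as np


def perSegmentMean(band, segimg):
    """
    Return an array indexed by segment ID, giving the mean of band over
    the pixels of that segment. IDs with no pixels are given NaN.
    Hypothetical helper, not part of pyshepseg.
    """
    segIds = segimg.ravel()
    counts = np.bincount(segIds)
    sums = np.bincount(segIds, weights=band.ravel())
    means = np.full(counts.shape, np.nan)
    valid = counts > 0
    means[valid] = sums[valid] / counts[valid]
    return means


# Segment image of shape (nRows, nCols), and one band of the same shape
segimg = np.array([[1, 1, 2], [3, 3, 2]])
band = np.array([[10.0, 20.0, 5.0], [7.0, 9.0, 15.0]])
means = perSegmentMean(band, segimg)
```

Here means[1] is the mean of the two pixels labelled 1, and so on; index 0 is NaN because no pixel in this sample carries that ID.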

Segment Colour Tables¶

The segment image contains a large number of individual segment values, and can be difficult to view in simple greyscale colouring. To improve this, two routines are provided in the pyshepseg.utils module which will attach a colour table.

The simplest routine is pyshepseg.utils.writeRandomColourTable(), which attaches a randomly-generated colour table, so that each segment is assigned a randomly chosen colour, which merely serves to distinguish it from the surrounding segments. See its help for details.
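The idea behind a random colour table can be sketched with NumPy as follows. This shows only how such a table might be generated in memory, not how pyshepseg.utils.writeRandomColourTable() builds or attaches it; the function name randomColourTable, the RGBA layout, and the treatment of segment ID 0 as null are all assumptions made for illustration.

```python
import numpy as np


def randomColourTable(maxSegId, seed=None):
    """
    Return a uint8 array of shape (maxSegId + 1, 4), holding one random
    RGBA colour per segment ID. Hypothetical helper, not part of pyshepseg.
    """
    rng = np.random.default_rng(seed)
    table = rng.integers(0, 256, size=(maxSegId + 1, 4), dtype=np.uint8)
    table[:, 3] = 255    # fully opaque
    table[0] = 0         # assumption: ID 0 is null, so make it transparent
    return table


table = randomColourTable(maxSegId=5, seed=42)
```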

More sophisticated and more useful is the function pyshepseg.utils.writeColorTableFromRatColumns(), which uses previously calculated columns in the raster attribute table (RAT) to create a colour table which approximates the original imagery. See its help for details, and the preceding section on how to create suitable RAT columns.

Subsetting¶

For large segmentations, it is sometimes necessary to subset the result into a smaller image which is easier to work with, while keeping contiguous segment IDs and a link back to the original segments. To do this, see the pyshepseg.subset module and the pyshepseg.subset.subsetImage() function.
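The re-labelling idea can be sketched in memory with NumPy: the segment IDs found in a subset are mapped to contiguous new IDs, and a lookup array records the original ID for each new one. This is an illustration of the concept only, not the implementation of subsetImage(); the function name relabelContiguous and the choice to number new IDs from 1 are assumptions.

```python
import numpy as np


def relabelContiguous(subsetSeg):
    """
    Return (newSeg, origIds). newSeg has the same shape as subsetSeg,
    with IDs 1..n in place of the original IDs. origIds[newId - 1] is
    the original ID of new segment newId.
    Hypothetical helper, not part of pyshepseg.
    """
    origIds, inverse = np.unique(subsetSeg.ravel(), return_inverse=True)
    newSeg = (inverse + 1).reshape(subsetSeg.shape)
    return newSeg, origIds


# A subset in which only three of the original segments appear
subsetSeg = np.array([[17, 17, 42], [103, 42, 42]])
newSeg, origIds = relabelContiguous(subsetSeg)
```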

Concurrency¶

There is some support for performing segmentation in parallel using threads, subprocesses or AWS Fargate. See the docstrings for pyshepseg.tiling.doTiledShepherdSegmentation() and pyshepseg.tiling.SegmentationConcurrencyConfig for more information.

In addition, read concurrency is available when calculating statistics from files with high latency (for instance, on AWS S3). Both routines for calculating per-segment statistics (calcPerSegmentStatsTiled and calcPerSegmentSpatialStatsTiled) accept an optional readCfg argument. This is an instance of pyshepseg.tilingstats.StatsReadConfig, and allows the user to specify a number of read workers. Each worker runs in a separate thread and reads ahead, placing tiles of raster data into a buffer ready for the statistics calculation to use.
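The read-ahead pattern described here can be sketched with the standard library alone. The following is a generic illustration of worker threads filling a bounded buffer, not the code behind StatsReadConfig; the names readAhead, readTile, numWorkers and bufferSize are hypothetical, and fakeRead stands in for a slow remote read.

```python
import queue
import threading


def readAhead(tiles, readTile, numWorkers=2, bufferSize=4):
    """
    Yield (tile, data) pairs, where data = readTile(tile) is computed by
    background threads. Results may arrive in any order.
    Hypothetical helper, not part of pyshepseg.
    """
    todo = queue.Queue()
    for tile in tiles:
        todo.put(tile)
    # A bounded buffer stops the workers from reading too far ahead
    buffer = queue.Queue(maxsize=bufferSize)
    done = object()

    def worker():
        try:
            while True:
                try:
                    tile = todo.get_nowait()
                except queue.Empty:
                    break
                buffer.put((tile, readTile(tile)))
        finally:
            # Always signal completion, even if readTile raised
            buffer.put(done)

    for _ in range(numWorkers):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < numWorkers:
        item = buffer.get()
        if item is done:
            finished += 1
        else:
            yield item


def fakeRead(tile):
    # Stand-in for a high-latency read of one tile
    return tile * 10


results = dict(readAhead(range(5), fakeRead, numWorkers=2))
```

While the consumer is busy with one tile, the workers can already be fetching the next ones, which is where the benefit comes from when each read has high latency.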