tiling

Routines in support of tiled segmentation of very large rasters.

The main entry routine is doTiledShepherdSegmentation(). See that function for further details.

The broad idea is that the Shepherd segmentation algorithm, as implemented in the shepseg module, runs entirely in memory. For larger raster files, it is more efficient to divide the raster into tiles, segment each tile individually, and stitch the results together to create a segmentation of the whole raster.

The main caveats arise from the fact that the initial clustering is performed on a uniform subsample of the whole image, in order to give consistent segment boundaries at tile intersections. This means that for a larger raster, with a greater range of spectra, one may wish to increase the number of clusters in order to allow sufficient initial segments to characterize the variation.

Related to this, one may also consider reducing the percentile used for automatic estimation of maxSpectralDiff (see pyshepseg.shepseg.doShepherdSegmentation() and pyshepseg.shepseg.autoMaxSpectralDiff() for further details).

Because of these caveats, one should be very cautious about segmenting something like a continental-scale image. There is a lot of spectral variation across an area like a whole continent, and it may be unwise to use all the same parameters for the whole area.

class pyshepseg.tiling.FargateConfig(containerImage=None, taskRoleArn=None, executionRoleArn=None, subnet=None, securityGroups=None, cpu='0.5 vCPU', memory='1GB', cpuArchitecture=None, cloudwatchLogGroup=None, tags=None)

Configuration for AWS Fargate (i.e. for use with CONC_FARGATE).

Parameters:
containerImage : str

URI of the container image to use for segmentation workers. This container must have pyshepseg installed. It can be the same container as used for the main script, as the entry point is over-written.

taskRoleArn : str

ARN for an AWS role. This allows your code to use AWS services. This role should include policies such as AmazonS3FullAccess, covering any AWS services the segmentation workers will need.

executionRoleArn : str

ARN for an AWS role. This allows ECS to use AWS services on your behalf. A good start is a role including AmazonECSTaskExecutionRolePolicy.

subnet : str

Subnet ID string associated with the VPC in which workers will run.

securityGroups : list of str

List of security group IDs associated with the VPC.

cpu : str

Number of CPU units requested for each segmentation worker, expressed in AWS’s own units. For example, ‘0.5 vCPU’, or ‘512’ (which corresponds to the same thing, since 1 vCPU is 1024 CPU units). Both must be strings. This helps Fargate to select a suitable VM instance type.

memory : str

Amount of memory requested for each segmentation worker, expressed in MiB, or with a units suffix. For example, ‘1024’ or its equivalent ‘1GB’. This helps Fargate to select a suitable VM instance type.

cpuArchitecture : str

If given, selects the CPU architecture of the hosts on which to run workers. Can be ‘ARM64’, defaults to ‘X86_64’.

cloudwatchLogGroup : str or None

If not None, the name of a CloudWatch log group. This group should already exist, in the region in which the job is running. Logs from workers will be sent to this log group. If None, no CloudWatch logging is done. Intended for tracking problems, rather than operational use.

tags : dict or None

Optional. If specified, this must be a dictionary of key/value pairs, which will be turned into AWS tags. These will be added to the ECS cluster, task definition and tasks. The keys and values must all be strings. Requires the ecs:TagResource permission.

class pyshepseg.tiling.HistogramAccumulator

Accumulator for histogram for the output segmentation image. This allows us to accumulate the histogram incrementally, tile-by-tile. Note that there are simplifying assumptions about being uint32, and the null value being zero, so don’t try to use this for anything else.

static addTwoHistograms(hist1, hist2)

Add the two given histograms together, and return the result.

If one is longer than the other, the shorter one is added to it.
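The length handling described above can be illustrated with a minimal sketch. This is not the library's implementation (which works on arrays); the function name add_two_histograms and the use of plain Python lists are assumptions made purely for illustration.

```python
def add_two_histograms(hist1, hist2):
    """Add two lists of counts which may have different lengths.

    Illustrative only: a plain-list stand-in for the behaviour
    described for HistogramAccumulator.addTwoHistograms().
    """
    # Work out which input is longer, so the shorter can be added into it
    if len(hist1) >= len(hist2):
        longer, shorter = hist1, hist2
    else:
        longer, shorter = hist2, hist1

    # Copy the longer histogram, so neither input is modified
    result = list(longer)
    for (i, count) in enumerate(shorter):
        result[i] += count
    return result


combined = add_two_histograms([0, 3, 2], [0, 1, 1, 4, 5])
print(combined)
```

The result always has the length of the longer input, so segment IDs seen in only one of the two histograms keep their counts.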

doHistAccum(arr)

Accumulate the histogram with counts from the given arr.

updateHist(newCounts)

Update the current histogram counts. If positive is True, then the counts for positive values are updated, otherwise those for the negative values are updated.

class pyshepseg.tiling.NetworkDataChannel(inQue=None, segResultCache=None, forceExit=None, exceptionQue=None, segDataDict=None, readSemaphore=None, timings=None, workerBarrier=None, hostname=None, portnum=None, authkey=None)

Single class to manage communication with workers running on different machines. Uses the facilities in multiprocessing.managers.

Created from either the server or the client end, the constructor takes

addressStr()

Return a single string encoding the network address of this channel

shutdown()

Shut down the NetworkDataChannel in the right order. This should always be called explicitly by the creator, when it is no longer needed. If left to the garbage collector and/or the interpreter exit code, things are shut down in the wrong order, and the interpreter hangs on exit.

I have tried __del__, also weakref.finalize and atexit.register, and none of them avoid these problems. So, just make sure you call shutdown explicitly, in the process which created the NetworkDataChannel.

The client processes don’t seem to care, presumably because they are not running the server thread. Calling shutdown on the client does nothing.

exception pyshepseg.tiling.PyShepSegTilingError

class pyshepseg.tiling.SegFargateMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)

Run tiled segmentation with concurrency based on AWS Fargate workers.

checkTaskErrors()

Check for errors reported via describe_tasks(). This mechanism seems rather unreliable, particularly when reporting the ‘reason’, but I am doing my best.

concurrencyType = 'CONC_FARGATE'

getClusterTaskCount()

Query the cluster, and return the number of tasks it has. This is the total of running and pending tasks. If the cluster does not exist, return None.

shutdown()

Shut down the workers and data channel

specificChecks()

Initial checks which are specific to the subclass

startWorkers()

Start all segmentation workers as AWS Fargate tasks

waitClusterTasksFinished()

Poll the given cluster until the number of tasks reaches zero

class pyshepseg.tiling.SegNoConcurrencyMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)

Runs tiled segmentation with no concurrency

checkWorkerExceptions()

Dummy. No workers, so no worker exceptions.

concurrencyType = 'CONC_NONE'

getTileSegmentation(col, row)

Read the requested tile of segmentation output from disk

loadOverlap(overlapCacheKey)

Load the requested overlap from disk cache

overlapCacheFilename(overlapCacheKey)

Return filename for given overlapCacheKey

saveOverlap(overlapCacheKey, overlapData)

Save given overlap data to disk file

segmentAllTiles()

Run segmentation for all tiles, and write output image. Just runs all tiles in sequence, then recodes and stitches them together for the final output.

writeTileToTemp(segResult, filename, outDrvr, xpos, ypos, xsize, ysize)

Write the segmented tile to a temporary image file

class pyshepseg.tiling.SegSubprocMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)

Run tiled segmentation with concurrency based on subprocess workers. This is used only as a test bed for the NetworkDataChannel and external worker command, and should not be used in real life.

concurrencyType = 'CONC_SUBPROC'

startWorkers()

Start all segmentation workers

class pyshepseg.tiling.SegThreadsMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)

Run tiled segmentation with concurrency based on threads within the main process.

concurrencyType = 'CONC_THREADS'

setupNetworkComms()

Dummy. No network communications required.

shutdown()

Shut down the thread pool

specificChecks()

Checks which are specific to the subclass

startWorkers()

Start worker threads for segmenting tiles

worker()

Worker function. Called for each worker thread.

class pyshepseg.tiling.SegmentationConcurrencyConfig(concurrencyType='CONC_NONE', numWorkers=0, maxConcurrentReads=20, tileCompletionTimeout=60, segResultCacheSize=30, segResultCacheAddTimeout=300, barrierTimeout=300, fargateCfg=None)

Configuration for concurrency in segmentation of multiple tiles.

The segmentation of each tile can be performed concurrently by individual workers. However, the stitching together of the resulting tiles is inherently sequential, and with sufficient workers, this easily becomes the dominant operation. Adding more workers after this simply increases the memory usage for tiles waiting to be stitched (up to segResultCacheSize), without any further speedup.

It is recommended that the user begin with a small number of workers, inspect the timings (see pyshepseg.utils.formatTimingRpt()), and then increase the number of workers so as to reduce the stitchwaitfortile time. When this no longer decreases, there is no further benefit in adding more workers.

Parameters:
concurrencyType : one of {CONC_NONE, CONC_THREADS, CONC_FARGATE, CONC_SUBPROC}

The mechanism used for concurrency

numWorkers : int

Number of segmentation workers

maxConcurrentReads : int

Maximum number of concurrent reads. Each segmentation worker does its own reading of input data. Since the number of workers can be quite large, this could load the read device too heavily. Given that the read step is a very small component of each worker’s activity, we can limit the number of concurrent reads to this value, without degrading throughput.

tileCompletionTimeout : int

Timeout (seconds) to wait for completion of each segmentation tile

segResultCacheSize : int

Maximum number of completed tile segmentations in cache

segResultCacheAddTimeout : int

Timeout (seconds) to wait to add a completed segmentation into the result cache. If this timeout is reached, it may indicate that there are too many workers.

barrierTimeout : int

Timeout (seconds) to wait for all workers to start. Used with CONC_FARGATE (and CONC_SUBPROC).

fargateCfg : None or instance of FargateConfig

Configuration for AWS Fargate (when using CONC_FARGATE)

class pyshepseg.tiling.SegmentationConcurrencyMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)

Base class for segmentation concurrency

checkForEmptySegments(hist, overlapSize)

Check the final segmentation for any empty segments. These can be problematic later, and should be avoided. Prints a warning message if empty segments are found.

Parameters:
hist : ndarray of uint32

Histogram counts for the segmentation raster

overlapSize : int

Number of pixels to use in overlaps between tiles

Returns:
hasEmptySegments : bool

True if there are segment ID numbers with no pixels

checkWorkerExceptions()

Check if any workers raised exceptions. If so, raise a local exception with the WorkerErrorRecord.

concurrencyType = 'CONC_NONE'

static crossesMidline(overlap, segLoc, orientation)

Check whether the given segment crosses the midline of the given overlap. If it does not, then it will lie entirely within exactly one tile, but if it does cross, then it will need to be re-coded across the midline.

Parameters:
overlap : shepseg.SegIdType ndarray (overlapNrows, overlapNcols)

Array of segments just for this overlap region

segLoc : shepseg.RowColArray

The row/col coordinates (within the overlap array) of the pixels for the segment of interest

orientation : {HORIZONTAL, VERTICAL}

Indicates the orientation of the midline

Returns:
crosses : bool

True if the given segment crosses the midline
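A minimal sketch of such a midline test is given below, under stated assumptions. The constant values for HORIZONTAL and VERTICAL, the function name crosses_midline, and the representation of the segment as a list of (row, col) pairs are all hypothetical, and the exact comparison used by the library may differ.

```python
# Hypothetical stand-ins for the module's orientation constants
HORIZONTAL = 0
VERTICAL = 1


def crosses_midline(overlap_shape, seg_pixels, orientation):
    """Return True if the segment has pixels on both sides of the midline.

    overlap_shape is (nrows, ncols) of the overlap region, and
    seg_pixels is a list of (row, col) pairs within that region.
    """
    (nrows, ncols) = overlap_shape
    if orientation == VERTICAL:
        # The midline runs top to bottom, so compare column numbers
        mid = (ncols - 1) / 2
        coords = [col for (row, col) in seg_pixels]
    else:
        # The midline runs left to right, so compare row numbers
        mid = (nrows - 1) / 2
        coords = [row for (row, col) in seg_pixels]

    # Crosses if the extent of the segment straddles the midline
    return min(coords) <= mid < max(coords)


straddling = [(0, 3), (0, 4), (0, 5)]
oneSide = [(0, 0), (1, 1)]
print(crosses_midline((4, 10), straddling, VERTICAL))
print(crosses_midline((4, 10), oneSide, VERTICAL))
```

Only the minimum and maximum coordinates are needed, because a connected segment with pixels on both sides must pass through the midline.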

getTileSegmentation(col, row)

Get the segmented tile output data from the local cache, and remove it from the cache

initialize()

Runs initial phase of segmentation. This does not have any concurrency, so is the same for every concurrencyType. The main job is to do the spectral clustering, setting self.kmeansObj

loadOverlap(overlapCacheKey)

Load the requested overlap from cache, and remove it from cache

static overlapCacheKey(col, row, edge)

Return the temporary cache key used for the overlap array

Parameters:
col, row : int

Tile column & row numbers

edge : {‘right’, ‘bottom’}

Indicates from which edge of the given tile the overlap is taken

Returns:
cachekey : str

Identifying key for the overlap

static popFromQue(que)

Pop out the next item from the given Queue, returning None if the queue is empty.

WARNING: don’t use this if the queued items can be None
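The behaviour described above can be sketched with the standard library's queue module. This is an illustration rather than the library's code, and the name pop_from_que is hypothetical. It also shows why the warning matters: a returned None cannot be distinguished from a queued None.

```python
import queue


def pop_from_que(que):
    """Return the next item from the queue, or None if it is empty."""
    try:
        # Non-blocking get raises queue.Empty when nothing is queued
        return que.get(block=False)
    except queue.Empty:
        return None


que = queue.Queue()
que.put((0, 0))
print(pop_from_que(que))   # the queued tile (col, row)
print(pop_from_que(que))   # the queue is now empty
```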

static recodeSharedSegments(tileData, overlapA, overlapB, orientation, recodeDict)

Work out a mapping which recodes segment ID numbers from the tile in tileData. Segments to be recoded are those which are in the overlap with an earlier tile, and which cross the midline of the overlap, which is where the stitchline between the tiles will fall.

Updates recodeDict, which is a dictionary keyed on the existing segment ID numbers, where the value of each entry is the segment ID number from the earlier tile, to be used to recode the segment in the current tile.

overlapA and overlapB are numpy arrays of pixels in the overlap region in question, giving the segment ID numbers in the two tiles. The values in overlapB are from the earlier tile, and those in overlapA are from the current tile.

It is critically important that the overlapping region is either at the top or the left of the current tile, as this means that the row and column numbers of pixels in the overlap arrays match the same pixels in the full tile. This cannot be used for overlaps on the right or bottom of the current tile.

Parameters:
tileData : shepseg.SegIdType ndarray (tileNrows, tileNcols)

Tile subset of segment ID image

overlapA, overlapB : shepseg.SegIdType ndarray (overlapNrows, overlapNcols)

Tile overlap subsets of segment ID image

orientation : {HORIZONTAL, VERTICAL}

The orientation parameter defines whether we are dealing with overlap at the top (orientation == HORIZONTAL) or the left (orientation == VERTICAL).

recodeDict : dict

Keys and values are both segment ID numbers. Defines the mapping which recodes segment IDs. Updated in place.

recodeTile(tileData, maxSegId, tileRow, tileCol, top, bottom, left, right)

Adjust the segment ID numbers in the current tile, to make them globally unique (and contiguous) across the whole mosaic.

Make use of the overlapping regions of tiles above and left, to identify shared segments, and recode those to segment IDs from the adjacent tiles (i.e. we change the current tile, not the adjacent ones). Non-shared segments are increased so they are larger than previous values.

Parameters:
tileData : shepseg.SegIdType ndarray (tileNrows, tileNcols)

The array of segment IDs for a single image tile

maxSegId : shepseg.SegIdType

The current maximum segment ID for all preceding tiles.

tileRow, tileCol : int

The row/col numbers of this tile, within the whole-mosaic tile numbering scheme. (These are not pixel numbers, but tile grid numbers)

top, bottom, left, right : int

Pixel coordinates within tile of the non-overlap region of the tile.

Returns:
newTileData : shepseg.SegIdType ndarray (tileNrows, tileNcols)

A copy of tileData, with new segment ID numbers.

static relabelSegments(tileData, recodeDict, maxSegId, top, bottom, left, right)

Recode the segment IDs in the given tileData array.

For segment IDs which are keys in recodeDict, these are replaced with the corresponding entry. For all other segment IDs, they are replaced with sequentially increasing ID numbers, starting from one more than the previous maximum segment ID (maxSegId).

A re-coded copy of tileData is created, the original is unchanged.

Parameters:
tileData : shepseg.SegIdType ndarray (tileNrows, tileNcols)

Segment IDs of tile

recodeDict : dict

Keys and values are segment ID numbers. Defines mapping for segment relabelling

maxSegId : shepseg.SegIdType

Maximum segment ID number

top, bottom, left, right : int

Pixel coordinates within tile of the non-overlap region of the tile.

Returns:
newTileData : shepseg.SegIdType ndarray (tileNrows, tileNcols)

Segment IDs of tile, after relabelling

newMaxSegId : shepseg.SegIdType

New maximum segment ID after relabelling
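The relabelling rule can be illustrated with a simplified sketch. It uses nested lists in place of arrays, treats zero as the null segment ID, and deliberately ignores the top, bottom, left and right limits, so it is an approximation of the idea rather than the library's implementation. The name relabel_segments is hypothetical.

```python
def relabel_segments(tile, recode, max_seg_id):
    """Return a recoded copy of tile, and the new maximum segment ID.

    IDs found in recode are replaced with the mapped value. All other
    non-null IDs receive new sequential IDs, starting at max_seg_id + 1.
    """
    # Work on a copy of the mapping, so the caller's dict is unchanged
    recode = dict(recode)
    new_tile = []
    for row in tile:
        new_row = []
        for seg_id in row:
            if seg_id == 0:
                # Null pixels stay null (assumes null segment ID is zero)
                new_row.append(0)
                continue
            if seg_id not in recode:
                # First time this ID is seen, so allocate the next new ID
                max_seg_id += 1
                recode[seg_id] = max_seg_id
            new_row.append(recode[seg_id])
        new_tile.append(new_row)
    return (new_tile, max_seg_id)


tile = [[1, 1, 2],
        [3, 3, 2],
        [0, 3, 2]]
# Segment 1 is shared with an earlier tile, where it has ID 7
(new_tile, new_max) = relabel_segments(tile, {1: 7}, 10)
print(new_tile)
print(new_max)
```

Here the earlier tiles have used IDs up to 10, so the two non-shared segments become 11 and 12, while the shared segment takes its ID from the earlier tile.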

saveOverlap(overlapCacheKey, overlapData)

Save given overlap data to cache

segmentAllTiles()

Run segmentation for all tiles, and write output image. Runs a number of segmentation workers, each working independently on individual tiles. The tiles to process are sent via a Queue, and the computed results are returned via the SegmentationResultCache.

Stitching the tiles together is run in the main thread, beginning as soon as the first tile is completed.

setupNetworkComms()

Set up a NetworkDataChannel to communicate with the workers outside the main process (e.g. Fargate instances)

setupOverviews(outDs)

Calculate a suitable set of overview levels to use for output segmentation file, and set these up on the given Dataset. Stores the overview levels list as self.overviewLevels

shutdown()

Any explicit shutdown operations

specificChecks()

Checks which are specific to the subclass. Called at the end of __init__().

startWorkers()

Start segmentation workers, if required

stitchTiles()

Recombine individual tiles into a single segment raster output file. Segment ID values are recoded to be unique across the whole raster, and contiguous.

Sets maxSegId and outDs on self.

static writeHistogramToFile(outBand, histAccum)

Write the accumulated histogram to the output segmentation file

writeOverviews(outBand, arr, xOff, yOff)

Calculate and write out the overview layers for the tile given as arr.

class pyshepseg.tiling.SegmentationResultCache(colRowList, timeout=None, size=10, addTimeout=300)

Thread-safe cache for segmentation results, by tile. As each worker completes a tile, it adds it directly to this cache. The writing thread can then pop tiles out of this when required.

addResult(col, row, segResult)

Add a single segResult object to the cache, for the given (col, row)

waitForTile(col, row)

Wait until the nominated tile is ready, and then pop it out of the cache.
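The interaction between workers and the writing thread can be sketched with a condition variable from the standard library. This is a reduced illustration, not the library's class: the name TileResultCache is hypothetical, and the cache size limit and add timeout are omitted.

```python
import threading


class TileResultCache:
    """Minimal thread-safe cache of per-tile results (illustrative only)."""

    def __init__(self, timeout=None):
        self._results = {}
        self._cond = threading.Condition()
        self._timeout = timeout

    def add_result(self, col, row, seg_result):
        # Called by a worker when it has finished a tile
        with self._cond:
            self._results[(col, row)] = seg_result
            self._cond.notify_all()

    def wait_for_tile(self, col, row):
        # Called by the writing thread. Blocks until the tile is present
        key = (col, row)
        with self._cond:
            ready = self._cond.wait_for(lambda: key in self._results,
                                        timeout=self._timeout)
            if not ready:
                raise TimeoutError("Timed out waiting for tile {}".format(key))
            # Pop, so the cache does not keep growing
            return self._results.pop(key)


cache = TileResultCache(timeout=10)
worker = threading.Thread(target=cache.add_result,
                          args=(0, 0, "segmented tile (0, 0)"))
worker.start()
result = cache.wait_for_tile(0, 0)
worker.join()
print(result)
```

Popping each tile as it is consumed keeps memory use bounded by the number of tiles completed but not yet stitched.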

class pyshepseg.tiling.TileInfo

Class that holds the pixel coordinates of the tiles within an image.

addTile(xpos, ypos, xsize, ysize, col, row)

Add a new tile to the set

Parameters:
xpos, ypos : int

Pixel column & row of top left pixel of tile

xsize, ysize : int

Number of pixel columns & rows in tile

col, row : int

Tile column & row

getNumTiles()

Get total number of tiles in the set

Returns:
numTiles : int

Total number of tiles

getTile(col, row)

Return the position and shape of the requested tile, as a single tuple of values

Parameters:
col, row : int

Tile column & row

Returns:
xpos, ypos : int

Pixel column & row of top left pixel of tile

xsize, ysize : int

Number of pixel columns & rows in tile

class pyshepseg.tiling.TiledSegmentationResult

Result of tiled segmentation

Attributes:
maxSegId : shepseg.SegIdType

Largest segment ID used in final segment image

numTileRows : int

Number of rows of tiles used

numTileCols : int

Number of columns of tiles used

subsamplePcnt : float

Percentage of image subsampled for clustering

maxSpectralDiff : float

The value used to limit segment merging (in all tiles)

kmeans : sklearn.cluster.KMeans

The sklearn KMeans object, after fitting

hasEmptySegments : bool

True if the segmentation contains segments with no pixels. This is an error condition, probably indicating that the merging of segments across tiles has produced inconsistent numbering. A warning message will also have been printed.

timings : pyshepseg.timinghooks.Timers

Timings for various key parts of the segmentation process

outDs : gdal.Dataset or None

Open GDAL dataset object to the output file. May be None, see the returnGDALDS parameter to doTiledShepherdSegmentation.

pyshepseg.tiling.calcHistogramTiled(segfile, maxSegId, writeToRat=True)

This function is now deprecated, and will probably be removed in a future version.

Calculate a histogram of the given segment image file.

Note that we need this function because GDAL’s GetHistogram function does not seem to work when attempting a histogram with very large numbers of entries. We want an entry for every segment, rather than an approximate count for a range of segment values, and the number of segments is very large. So, we need to write our own routine.

It works in tiles across the image, so that it can process very large images in a memory-efficient way.

For a raster which can easily fit into memory, a histogram can be calculated directly using pyshepseg.shepseg.makeSegSize().

Parameters:
segfile : str or gdal.Dataset

Segmentation image file. Can be either the file name string, or an open Dataset object.

maxSegId : shepseg.SegIdType

Maximum segment ID used

writeToRat : bool

If True, the completed histogram will be written to the image file’s raster attribute table. If segfile was given as a Dataset object, it would therefore need to have been opened with update access.

Returns:
hist : int ndarray (numSegments+1, )

Histogram counts for each segment (index is segment ID number)

pyshepseg.tiling.doTiledShepherdSegmentation(infile, outfile, tileSize=4096, overlapSize=1024, minSegmentSize=50, numClusters=60, bandNumbers=None, subsamplePcnt=None, maxSpectralDiff='auto', imgNullVal=None, fixedKMeansInit=False, fourConnected=True, verbose=False, simpleTileRecode=False, outputDriver='KEA', creationOptions=[], spectDistPcntile=50, kmeansObj=None, tempfilesDriver='KEA', tempfilesExt='kea', tempfilesCreationOptions=[], writeHistogram=True, returnGDALDS=False, concurrencyCfg=None)

Run the Shepherd segmentation algorithm in a memory-efficient manner, suitable for large raster files. Runs the segmentation on separate (overlapping) tiles across the raster, then stitches these together into a single output segment raster.

The initial spectral clustering is performed on a sub-sample of the whole raster (using fitSpectralClustersWholeFile), to create consistent clusters. These are then used as seeds for all individual tiles. Note that subsamplePcnt is used at this stage, over the whole raster, and is not passed through to shepseg.doShepherdSegmentation() for any further sub-sampling.

Most of the arguments are passed through to shepseg.doShepherdSegmentation, and are described in the docstring for that function.

Parameters:
infile : str

Filename of input raster

outfile : str

Filename of output segmentation raster

tileSize : int

Desired width & height (in pixels) of the tiles (i.e. desired tiles have shape (tileSize, tileSize)). Tiles on the right and bottom edges of the input image may end up slightly larger than tileSize to ensure there are no small tiles.

overlapSize : int

Number of pixels to overlap tiles. The overlap area is a rectangle, this many pixels wide, which is covered by both adjacent tiles.

minSegmentSize : int

Minimum number of pixels in a segment

numClusters : int

Number of clusters to request in k-means clustering

bandNumbers : list of int

The GDAL band numbers (i.e. start at 1) of the bands of input raster to use for segmentation

subsamplePcnt : float or None

See fitSpectralClustersWholeFile()

maxSpectralDiff : float or str

See shepseg.doShepherdSegmentation()

spectDistPcntile : int

See shepseg.doShepherdSegmentation()

imgNullVal : float or None

If given, use this as the null value for the input raster. If None, use the value defined in the raster file

fixedKMeansInit : bool

If True, use a fixed set of initial cluster centres for the KMeans clustering. This is good to ensure exactly reproducible results

fourConnected : bool

If True, use 4-way connectedness, otherwise use 8-way

verbose : bool

If True, print informative messages during processing (to stdout)

simpleTileRecode : bool

If True, use only a simple tile recoding procedure. See stitchTiles() for more detail

outputDriver : str

The short name of the GDAL format driver to use for output file

creationOptions : list of str

The GDAL output creation options to match the outputDriver

kmeansObj : sklearn.cluster.KMeans

See shepseg.doShepherdSegmentation() for details

tempfilesDriver : str

Short name of GDAL driver to use for temporary raster files

tempfilesExt : str

File extension to use for temporary raster files

tempfilesCreationOptions : list of str

GDAL creation options to use for temporary raster files

writeHistogram : bool

Deprecated, and ignored. The histogram is always written.

returnGDALDS : bool

Whether to set the outDs member of TiledSegmentationResult when returning. If set, this will be open in update mode.

concurrencyCfg : SegmentationConcurrencyConfig

Configuration for segmentation concurrency. Default is None, meaning no concurrency.

Returns:
tileSegResult : TiledSegmentationResult

pyshepseg.tiling.fitSpectralClustersWholeFile(inDs, bandNumbers, numClusters=60, subsamplePcnt=None, imgNullVal=None, fixedKMeansInit=False)

Given an open raster Dataset, read a selected sample of pixels and use these to fit a spectral cluster model. Uses GDAL to read the pixels, and shepseg.fitSpectralClusters() to do the fitting.

Parameters:
inDs : gdal.Dataset

Open GDAL Dataset object for the input raster

bandNumbers : list of int (or None)

List of GDAL band numbers for the bands of interest. If None, then use all bands in the dataset. Note that GDAL band numbers start at 1.

numClusters : int

Desired number of clusters

subsamplePcnt : float or None

Percentage of pixels to use in fitting. If it is None, then a suitable subsample is calculated such that around one million pixels are sampled. (Note - this would include null pixels, so if the image is dominated by nulls, this would undersample.) No further subsampling is carried out by fitSpectralClusters().

imgNullVal : float or None

Pixels with this value in the input raster are ignored. If None, the NoDataValue from the raster file is used

fixedKMeansInit : bool

If True, then use a fixed estimate for the initial KMeans cluster centres. See shepseg.fitSpectralClusters() for details.

Returns:
kmeansObj : sklearn.cluster.KMeans

The fitted KMeans object

subsamplePcnt : float

The subsample percentage actually used

imgNullVal : float

The image null value (possibly read from the file)
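As a rough illustration of the automatic choice of subsamplePcnt, the sketch below computes a percentage which yields around one million sampled pixels, and the matching proportion to apply along each axis. The function names, and the exact formulae, are assumptions made for illustration; the library's own calculation may differ in detail.

```python
import math


def auto_subsample_percent(nrows, ncols, target_pixels=1000000):
    """Percentage of pixels giving roughly target_pixels samples."""
    total = nrows * ncols
    # Never ask for more than the whole image
    return min(100.0, 100.0 * target_pixels / total)


def per_axis_proportion(subsample_pcnt):
    """Proportion to apply to rows and to columns separately.

    Sampling a proportion p along each axis keeps p * p of the pixels,
    so p is the square root of the overall proportion.
    """
    return math.sqrt(subsample_pcnt / 100.0)


# A 20000 x 20000 raster has 400 million pixels
pcnt = auto_subsample_percent(20000, 20000)
prop = per_axis_proportion(pcnt)
print(pcnt, prop)
```

For this raster the percentage is 0.25, and sampling about one row and column in every twenty gives roughly 1000 x 1000 pixels. Null pixels are counted in the total, which is why an image dominated by nulls would be undersampled.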

pyshepseg.tiling.getImgNullValue(inDs, bandNumbers)

Return the null value for the given dataset

Parameters:
inDs : gdal.Dataset

Open input Dataset

bandNumbers : list of int

GDAL band numbers of interest

Returns:
imgNullVal : float or None

Null value from input raster, None if there is no null value

Raises:
PyShepSegTilingError

If not all bands have the same null value

pyshepseg.tiling.getTilesForFile(ds, tileSize, overlapSize)

Return a TileInfo object for a given file and input parameters.

Parameters:
ds : gdal.Dataset

Open GDAL Dataset object for raster to be tiled

tileSize : int

Size of tiles, in pixels. Individual tiles may end up being larger in either direction, when they meet the edge of the raster, to ensure we do not use very small tiles

overlapSize : int

Number of pixels by which tiles will overlap

Returns:
tileInfo : TileInfo

TileInfo object detailing the sizes and positions of all tiles across the raster
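The general idea of overlapping tiles, with the last tile enlarged to avoid a very small one, can be sketched in one dimension as follows. This is not the library's algorithm. The function name tile_spans and the min_fraction threshold (the smallest acceptable final tile, as a fraction of the tile size) are hypothetical choices made only to show the principle.

```python
def tile_spans(img_size, tile_size, overlap, min_fraction=0.5):
    """Return a list of (start, length) spans along one image axis.

    Consecutive tiles start (tile_size - overlap) pixels apart, so each
    shares `overlap` pixels with its neighbour. If the next tile would
    be shorter than min_fraction * tile_size, the current tile is
    extended to the image edge instead.
    """
    step = tile_size - overlap
    spans = []
    pos = 0
    while True:
        end = min(pos + tile_size, img_size)
        next_pos = pos + step
        remaining = img_size - next_pos
        if end == img_size or remaining < min_fraction * tile_size:
            # Last tile: run all the way to the edge of the image
            spans.append((pos, img_size - pos))
            break
        spans.append((pos, end - pos))
        pos = next_pos
    return spans


print(tile_spans(10000, 4096, 1024))
print(tile_spans(8000, 4096, 1024))
```

In the second call the final tile is 4928 pixels long, larger than the requested 4096, because a separate third tile would have been too small under the assumed threshold. A two-dimensional layout would apply the same calculation to columns and rows independently.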

pyshepseg.tiling.readSubsampledImageBand(bandObj, subsampleProp)

Read in a sub-sampled copy of the whole of the given band.

Note that one can, in principle, do this directly using GDAL. However, if overview layers are present in the file, it will use these, and so is dependent on how these were created. Since these are often created just for display purposes, past experience has shown that they are not always to be trusted as data, so we have chosen to always go directly to the full resolution image.

Parameters:
bandObj : gdal.Band

An open Band object for input

subsampleProp : float

The proportion by which to sub-sample (i.e. a value between zero and 1, applied to rows and columns separately)

Returns:
imgSub : <dtype> ndarray (nRowsSub, nColsSub)

A numpy array of the image subsample, equivalent to calling gdal.Band.ReadAsArray()

pyshepseg.tiling.selectConcurrencyClass(concurrencyType, baseClass)

Choose the sub-class corresponding to the given concurrencyType

pyshepseg.tiling.updateCounts(tileData, hist)

Fast function to increment counts for each segment ID in the given tile