tiling
Routines in support of tiled segmentation of very large rasters.
Main entry routine is doTiledShepherdSegmentation(). See that function for further details.
The broad idea is that the Shepherd segmentation algorithm, as implemented in the shepseg module, runs entirely in memory. For larger raster files, it is more efficient to divide the raster into tiles, segment each tile individually, and stitch the results together to create a segmentation of the whole raster.
The main caveats arise from the fact that the initial clustering is performed on a uniform subsample of the whole image, in order to give consistent segment boundaries at tile intersections. This means that for a larger raster, with a greater range of spectra, one may wish to increase the number of clusters in order to allow sufficient initial segments to characterize the variation.
Related to this, one may also consider reducing the percentile used for automatic estimation of maxSpectralDiff (see pyshepseg.shepseg.doShepherdSegmentation() and pyshepseg.shepseg.autoMaxSpectralDiff() for further details).
Because of these caveats, one should be very cautious about segmenting something like a continental-scale image. There is a lot of spectral variation across an area like a whole continent, and it may be unwise to use all the same parameters for the whole area.
- class pyshepseg.tiling.FargateConfig(containerImage=None, taskRoleArn=None, executionRoleArn=None, subnet=None, securityGroups=None, cpu='0.5 vCPU', memory='1GB', cpuArchitecture=None, cloudwatchLogGroup=None, tags=None)¶
Configuration for AWS Fargate (i.e. for use with CONC_FARGATE).
- Parameters:
- containerImagestr
URI of the container image to use for segmentation workers. This container must have pyshepseg installed. It can be the same container as used for the main script, as the entry point is over-written.
- taskRoleArnstr
ARN for an AWS role. This allows your code to use AWS services. This role should include policies such as AmazonS3FullAccess, covering any AWS services the segmentation workers will need.
- executionRoleArnstr
ARN for an AWS role. This allows ECS to use AWS services on your behalf. A good start is a role including AmazonECSTaskExecutionRolePolicy.
- subnetstr
Subnet ID string associated with the VPC in which workers will run.
- securityGroupslist of str
List of security group IDs associated with the VPC.
- cpustr
Number of CPU units requested for each segmentation worker, expressed in AWS’s own units. For example, ‘0.5 vCPU’, or ‘1024’ (which corresponds to the same thing). Both must be strings. This helps Fargate to select a suitable VM instance type.
- memorystr
Amount of memory requested for each segmentation worker, expressed in MiB, or with a units suffix. For example, ‘1024’ or its equivalent ‘1GB’. This helps Fargate to select a suitable VM instance type.
- cpuArchitecturestr
If given, selects the CPU architecture of the hosts to run workers on. Can be ‘ARM64’, defaults to ‘X86_64’.
- cloudwatchLogGroupstr or None
If not None, the name of a CloudWatch log group. This group should already exist, in the region in which the job is running. Logs from workers will be sent to this log group. If None, no CloudWatch logging is done. Intended for tracking problems, rather than operational use.
- tagsdict or None
Optional. If specified, this must be a dictionary of key/value pairs which will be turned into AWS tags. These will be added to the ECS cluster, task definition and tasks. The keys and values must all be strings. Requires the ecs:TagResource permission.
- class pyshepseg.tiling.HistogramAccumulator¶
Accumulator for histogram for the output segmentation image. This allows us to accumulate the histogram incrementally, tile-by-tile. Note that there are simplifying assumptions about being uint32, and the null value being zero, so don’t try to use this for anything else.
- static addTwoHistograms(hist1, hist2)¶
Add the two given histograms together, and return the result.
If one is longer than the other, the shorter one is added to it.
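The length handling described above can be sketched as follows. This is a minimal illustration using plain Python lists, not the library's own code (which works on arrays of counts); the name add_two_histograms is hypothetical.

```python
def add_two_histograms(hist1, hist2):
    """Add two count lists, where index i holds the count for value i."""
    # Start from a copy of the longer histogram, then add the shorter one
    # into its leading entries. Neither input is modified.
    if len(hist1) >= len(hist2):
        longer, shorter = hist1, hist2
    else:
        longer, shorter = hist2, hist1
    result = list(longer)
    for i, count in enumerate(shorter):
        result[i] += count
    return result
```

The entries beyond the end of the shorter histogram are carried over unchanged, which is what allows the histogram to grow as tiles with larger segment IDs are accumulated.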
- doHistAccum(arr)¶
Accumulate the histogram with counts from the given arr.
- updateHist(newCounts)¶
Update the current histogram counts. If positive is True, then the counts for positive values are updated, otherwise those for the negative values are updated.
- class pyshepseg.tiling.NetworkDataChannel(inQue=None, segResultCache=None, forceExit=None, exceptionQue=None, segDataDict=None, readSemaphore=None, timings=None, workerBarrier=None, hostname=None, portnum=None, authkey=None)¶
Single class to manage communication with workers running on different machines. Uses the facilities in multiprocessing.managers.
Created from either the server or the client end, the constructor takes
- addressStr()¶
Return a single string encoding the network address of this channel
- shutdown()¶
Shut down the NetworkDataChannel in the right order. This should always be called explicitly by the creator, when it is no longer needed. If left to the garbage collector and/or the interpreter exit code, things are shut down in the wrong order, and the interpreter hangs on exit.
I have tried __del__, also weakref.finalize and atexit.register, and none of them avoid these problems. So, just make sure you call shutdown explicitly, in the process which created the NetworkDataChannel.
The client processes don’t seem to care, presumably because they are not running the server thread. Calling shutdown on the client does nothing.
- exception pyshepseg.tiling.PyShepSegTilingError¶
- class pyshepseg.tiling.SegFargateMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)¶
Run tiled segmentation with concurrency based on AWS Fargate workers.
- checkTaskErrors()¶
Check for errors reported via describe_tasks(). This mechanism seems rather unreliable, particularly when reporting the ‘reason’, but I am doing my best.
- concurrencyType = 'CONC_FARGATE'¶
- getClusterTaskCount()¶
Query the cluster, and return the number of tasks it has. This is the total of running and pending tasks. If the cluster does not exist, return None.
- shutdown()¶
Shut down the workers and data channel
- specificChecks()¶
Initial checks which are specific to the subclass
- startWorkers()¶
Start all segmentation workers as AWS Fargate tasks
- waitClusterTasksFinished()¶
Poll the given cluster until the number of tasks reaches zero
- class pyshepseg.tiling.SegNoConcurrencyMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)¶
Runs tiled segmentation with no concurrency
- checkWorkerExceptions()¶
Dummy. No workers, so no worker exceptions.
- concurrencyType = 'CONC_NONE'¶
- getTileSegmentation(col, row)¶
Read the requested tile of segmentation output from disk
- loadOverlap(overlapCacheKey)¶
Load the requested overlap from disk cache
- overlapCacheFilename(overlapCacheKey)¶
Return filename for given overlapCacheKey
- saveOverlap(overlapCacheKey, overlapData)¶
Save given overlap data to disk file
- segmentAllTiles()¶
Run segmentation for all tiles, and write output image. Just runs all tiles in sequence, and then the recode and stitch together for final output.
- writeTileToTemp(segResult, filename, outDrvr, xpos, ypos, xsize, ysize)¶
Write the segmented tile to a temporary image file
- class pyshepseg.tiling.SegSubprocMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)¶
Run tiled segmentation with concurrency based on subprocess workers. This is used only as a test bed for the NetworkDataChannel and external worker command, and should not be used in real life.
- concurrencyType = 'CONC_SUBPROC'¶
- startWorkers()¶
Start all segmentation workers
- class pyshepseg.tiling.SegThreadsMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)¶
Run tiled segmentation with concurrency based on threads within the main process.
- concurrencyType = 'CONC_THREADS'¶
- setupNetworkComms()¶
Dummy. No network communications required.
- shutdown()¶
Shut down the thread pool
- specificChecks()¶
Checks which are specific to the subclass
- startWorkers()¶
Start worker threads for segmenting tiles
- worker()¶
Worker function. Called for each worker thread.
- class pyshepseg.tiling.SegmentationConcurrencyConfig(concurrencyType='CONC_NONE', numWorkers=0, maxConcurrentReads=20, tileCompletionTimeout=60, segResultCacheSize=30, segResultCacheAddTimeout=300, barrierTimeout=300, fargateCfg=None)¶
Configuration for concurrency in segmentation of multiple tiles.
The segmentation of each tile can be performed concurrently by individual workers. However, the stitching together of the resulting tiles is inherently sequential, and with sufficient workers, this easily becomes the dominant operation. Adding more workers after this simply increases the memory usage for tiles waiting to be stitched (up to segResultCacheSize), without any further speedup.
It is recommended that the user begin with a small number of workers, inspect the timings (see pyshepseg.utils.formatTimingRpt()), and increase the number of workers so as to reduce the stitchwaitfortile time. When this no longer decreases, there is no further benefit to adding more workers.
- Parameters:
- concurrencyTypeOne of {CONC_NONE, CONC_THREADS, CONC_FARGATE, CONC_SUBPROC}
The mechanism used for concurrency
- numWorkersint
Number of segmentation workers
- maxConcurrentReadsint
Maximum number of concurrent reads. Each segmentation worker does its own reading of input data. Since the number of workers can be quite large, this could load the read device too heavily. Given that the read step is a very small component of each worker’s activity, we can limit the number of concurrent reads to this value, without degrading throughput.
- tileCompletionTimeoutint
Timeout (seconds) to wait for completion of each segmentation tile
- segResultCacheSizeint
Maximum number of completed tile segmentations in cache
- segResultCacheAddTimeoutint
Timeout (seconds) to wait to add a completed segmentation into the result cache. If this timeout is reached, it may indicate that there are too many workers.
- barrierTimeoutint
Timeout (seconds) to wait for all workers to start. Used with CONC_FARGATE (and CONC_SUBPROC).
- fargateCfgNone or instance of FargateConfig
Configuration for AWS Fargate (when using CONC_FARGATE)
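The point about stitching becoming the dominant operation can be made concrete with a deliberately crude timing model. The sketch below assumes that segmentation parallelizes perfectly across workers, that stitching is strictly sequential, and that the two overlap completely; the function name and the per-tile times are illustrative assumptions, not measurements from pyshepseg.

```python
def estimated_wall_time(num_tiles, seg_secs_per_tile,
                        stitch_secs_per_tile, num_workers):
    """Crude estimate of elapsed seconds for a tiled run."""
    # Segmentation work is shared between the workers
    segmentation = num_tiles * seg_secs_per_tile / num_workers
    # Stitching happens in one thread, whatever the number of workers
    stitching = num_tiles * stitch_secs_per_tile
    # Whichever is slower determines the elapsed time
    return max(segmentation, stitching)
```

With 100 tiles, 60 seconds of segmentation and 5 seconds of stitching per tile, this model gives 1500 seconds for 4 workers and 500 seconds for 12 workers, and no improvement beyond that. In practice the timing report is the reliable guide.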
- class pyshepseg.tiling.SegmentationConcurrencyMgr(infile, outfile, tileSize, overlapSize, minSegmentSize, numClusters, bandNumbers, subsamplePcnt, maxSpectralDiff, imgNullVal, fixedKMeansInit, fourConnected, verbose, simpleTileRecode, outputDriver, creationOptions, spectDistPcntile, kmeansObj, tempfilesDriver, tempfilesCreationOptions, writeHistogram, returnGDALDS, concCfg)¶
Base class for segmentation concurrency
- checkForEmptySegments(hist, overlapSize)¶
Check the final segmentation for any empty segments. These can be problematic later, and should be avoided. Prints a warning message if empty segments are found.
- Parameters:
- histndarray of uint32
Histogram counts for the segmentation raster
- overlapSizeint
Number of pixels to use in overlaps between tiles
- Returns:
- hasEmptySegmentsbool
True if there are segment ID numbers with no pixels
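A minimal sketch of the emptiness test, assuming that index 0 of the histogram holds the null value (so it is skipped) and that every later index is a segment ID. The name has_empty_segments is hypothetical, and the sketch ignores the overlapSize parameter and the warning message.

```python
def has_empty_segments(hist):
    """Return True if any segment ID from 1 upwards has a zero pixel count."""
    # hist[0] counts null pixels, so only hist[1:] is checked
    return any(count == 0 for count in hist[1:])
```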
- checkWorkerExceptions()¶
Check if any workers raised exceptions. If so, raise a local exception with the WorkerErrorRecord.
- concurrencyType = 'CONC_NONE'¶
- static crossesMidline(overlap, segLoc, orientation)¶
Check whether the given segment crosses the midline of the given overlap. If it does not, then it will lie entirely within exactly one tile, but if it does cross, then it will need to be re-coded across the midline.
- Parameters:
- overlapshepseg.SegIdType ndarray (overlapNrows, overlapNcols)
Array of segments just for this overlap region
- segLocshepseg.RowColArray
The row/col coordinates (within the overlap array) of the pixels for the segment of interest
- orientation{HORIZONTAL, VERTICAL}
Indicates the orientation of the midline
- Returns:
- crossesbool
True if the given segment crosses the midline
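One way to picture the midline test is shown below. This is a sketch under stated assumptions: the midline is taken as the integer half of the overlap's extent, the constants HORIZONTAL and VERTICAL are given arbitrary illustrative values, and the segment location is passed as separate lists of row and column numbers rather than as a shepseg.RowColArray.

```python
HORIZONTAL = 0  # illustrative values only
VERTICAL = 1


def crosses_midline(overlap_shape, seg_rows, seg_cols, orientation):
    """Check whether a segment has pixels on both sides of the midline."""
    nrows, ncols = overlap_shape
    if orientation == HORIZONTAL:
        # Midline runs left to right, so compare row numbers
        mid = nrows // 2
        coords = seg_rows
    else:
        # Midline runs top to bottom, so compare column numbers
        mid = ncols // 2
        coords = seg_cols
    # Some pixel lies before the midline and some pixel lies on or after it
    return min(coords) < mid and max(coords) >= mid
```

A segment for which this returns False sits wholly on one side of the stitch line, so it belongs to exactly one tile and needs no recoding.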
- getTileSegmentation(col, row)¶
Get the segmented tile output data from the local cache, and remove it from the cache
- initialize()¶
Runs initial phase of segmentation. This does not have any concurrency, so is the same for every concurrencyType. The main job is to do the spectral clustering, setting self.kmeansObj
- loadOverlap(overlapCacheKey)¶
Load the requested overlap from cache, and remove it from cache
- static overlapCacheKey(col, row, edge)¶
Return the temporary cache key used for the overlap array
- Parameters:
- col, rowint
Tile column & row numbers
- edge{‘right’, ‘bottom’}
Indicates from which edge of the given tile the overlap is taken
- Returns:
- cachekeystr
Identifying key for the overlap
- static popFromQue(que)¶
Pop out the next item from the given Queue, returning None if the queue is empty.
WARNING: don’t use this if the queued items can be None
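The behaviour described for popFromQue, including the reason for the warning, can be sketched with the standard library queue module. The function name pop_from_que is hypothetical and this is not necessarily how the library implements it.

```python
import queue


def pop_from_que(que):
    """Return the next item without blocking, or None if the queue is empty."""
    try:
        return que.get_nowait()
    except queue.Empty:
        # None signals "nothing available", which is why None must never be
        # used as a queued item: the caller could not tell the two apart.
        return None
```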
Work out a mapping which recodes segment ID numbers from the tile in tileData. Segments to be recoded are those which are in the overlap with an earlier tile, and which cross the midline of the overlap, which is where the stitchline between the tiles will fall.
Updates recodeDict, which is a dictionary keyed on the existing segment ID numbers, where the value of each entry is the segment ID number from the earlier tile, to be used to recode the segment in the current tile.
overlapA and overlapB are numpy arrays of pixels in the overlap region in question, giving the segment ID numbers in the two tiles. The values in overlapB are from the earlier tile, and those in overlapA are from the current tile.
It is critically important that the overlapping region is either at the top or the left of the current tile, as this means that the row and column numbers of pixels in the overlap arrays match the same pixels in the full tile. This cannot be used for overlaps on the right or bottom of the current tile.
- Parameters:
- tileDatashepseg.SegIdType ndarray (tileNrows, tileNcols)
Tile subset of segment ID image
- overlapA, overlapBshepseg.SegIdType ndarray (overlapNrows, overlapNcols)
Tile overlap subsets of segment ID image
- orientation{HORIZONTAL, VERTICAL}
The orientation parameter defines whether we are dealing with overlap at the top (orientation == HORIZONTAL) or the left (orientation == VERTICAL).
- recodeDictdict
Keys and values are both segment ID numbers. Defines the mapping which recodes segment IDs. Updated in place.
- recodeTile(tileData, maxSegId, tileRow, tileCol, top, bottom, left, right)¶
Adjust the segment ID numbers in the current tile, to make them globally unique (and contiguous) across the whole mosaic.
Make use of the overlapping regions of tiles above and left, to identify shared segments, and recode those to segment IDs from the adjacent tiles (i.e. we change the current tile, not the adjacent ones). Non-shared segments are increased so they are larger than previous values.
- Parameters:
- tileDatashepseg.SegIdType ndarray (tileNrows, tileNcols)
The array of segment IDs for a single image tile
- maxSegIdshepseg.SegIdType
The current maximum segment ID for all preceding tiles.
- tileRow, tileColint
The row/col numbers of this tile, within the whole-mosaic tile numbering scheme. (These are not pixel numbers, but tile grid numbers)
- top, bottom, left, rightint
Pixel coordinates within tile of the non-overlap region of the tile.
- Returns:
- newTileDatashepseg.SegIdType ndarray (tileNrows, tileNcols)
A copy of tileData, with new segment ID numbers.
- static relabelSegments(tileData, recodeDict, maxSegId, top, bottom, left, right)¶
Recode the segment IDs in the given tileData array.
For segment IDs which are keys in recodeDict, these are replaced with the corresponding entry. For all other segment IDs, they are replaced with sequentially increasing ID numbers, starting from one more than the previous maximum segment ID (maxSegId).
A re-coded copy of tileData is created, the original is unchanged.
- Parameters:
- tileDatashepseg.SegIdType ndarray (tileNrows, tileNcols)
Segment IDs of tile
- recodeDictdict
Keys and values are segment ID numbers. Defines mapping for segment relabelling
- maxSegIdshepseg.SegIdType
Maximum segment ID number
- top, bottom, left, rightint
Pixel coordinates within tile of the non-overlap region of the tile.
- Returns:
- newTileDatashepseg.SegIdType ndarray (tileNrows, tileNcols)
Segment IDs of tile, after relabelling
- newMaxSegIdshepseg.SegIdType
New maximum segment ID after relabelling
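The relabelling rule can be illustrated on a small tile held as a list of lists. This is a sketch, not the library's implementation: it assumes that zero is the null value and is left alone, that new IDs are handed out in order of first appearance in a row-by-row scan, and it ignores the top, bottom, left and right parameters. The name relabel_segments is hypothetical.

```python
def relabel_segments(tile, recode, max_seg_id):
    """Return a relabelled copy of tile, and the new maximum segment ID."""
    new_tile = [row[:] for row in tile]   # the original is left unchanged
    new_ids = {}                          # old ID -> freshly assigned ID
    next_id = max_seg_id
    for r, row in enumerate(tile):
        for c, seg_id in enumerate(row):
            if seg_id == 0:
                continue                  # assumed null value, kept as zero
            if seg_id in recode:
                # Shared segment: take the ID from the earlier tile
                new_tile[r][c] = recode[seg_id]
            else:
                if seg_id not in new_ids:
                    next_id += 1
                    new_ids[seg_id] = next_id
                new_tile[r][c] = new_ids[seg_id]
    return new_tile, next_id
```

For example, with a previous maximum of 10 and a recode mapping of {1: 7}, segment 1 becomes 7 and the remaining segments are numbered 11, 12, 13 in the order they are first seen.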
- saveOverlap(overlapCacheKey, overlapData)¶
Save given overlap data to cache
- segmentAllTiles()¶
Run segmentation for all tiles, and write output image. Runs a number of segmentation workers, each working independently on individual tiles. The tiles to process are sent via a Queue, and the computed results are returned via the SegmentationResultCache.
Stitching the tiles together is run in the main thread, beginning as soon as the first tile is completed.
- setupNetworkComms()¶
Set up a NetworkDataChannel to communicate with the workers outside the main process (e.g. Fargate instances)
- setupOverviews(outDs)¶
Calculate a suitable set of overview levels to use for output segmentation file, and set these up on the given Dataset. Stores the overview levels list as self.overviewLevels
- shutdown()¶
Any explicit shutdown operations
- specificChecks()¶
Checks which are specific to the subclass. Called at the end of __init__().
- startWorkers()¶
Start segmentation workers, if required
- stitchTiles()¶
Recombine individual tiles into a single segment raster output file. Segment ID values are recoded to be unique across the whole raster, and contiguous.
Sets maxSegId and outDs on self.
- static writeHistogramToFile(outBand, histAccum)¶
Write the accumulated histogram to the output segmentation file
- writeOverviews(outBand, arr, xOff, yOff)¶
Calculate and write out the overview layers for the tile given as arr.
- class pyshepseg.tiling.SegmentationResultCache(colRowList, timeout=None, size=10, addTimeout=300)¶
Thread-safe cache for segmentation results, by tile. As each worker completes a tile, it adds it directly to this cache. The writing thread can then pop tiles out of this when required.
- addResult(col, row, segResult)¶
Add a single segResult object to the cache, for the given (col, row)
- waitForTile(col, row)¶
Wait until the nominated tile is ready, and then pop it out of the cache.
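The add and wait pattern described for this cache can be sketched with a condition variable from the standard library. This is a simplified illustration: the class name TileResultCache is hypothetical, and the size limit and addTimeout of the real class are left out.

```python
import threading


class TileResultCache:
    """Minimal thread-safe cache of results, keyed by (col, row)."""

    def __init__(self, timeout=None):
        self._results = {}
        self._cond = threading.Condition()
        self._timeout = timeout

    def add_result(self, col, row, seg_result):
        # Called by a worker when it has finished a tile
        with self._cond:
            self._results[(col, row)] = seg_result
            self._cond.notify_all()

    def wait_for_tile(self, col, row):
        # Called by the writing thread; blocks until the tile is present
        key = (col, row)
        with self._cond:
            ready = self._cond.wait_for(lambda: key in self._results,
                                        timeout=self._timeout)
            if not ready:
                raise TimeoutError("Timed out waiting for tile {}".format(key))
            return self._results.pop(key)
```

Popping the entry on retrieval keeps the memory held by completed tiles bounded by how far the workers are ahead of the writing thread.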
- class pyshepseg.tiling.TileInfo¶
Class that holds the pixel coordinates of the tiles within an image.
- addTile(xpos, ypos, xsize, ysize, col, row)¶
Add a new tile to the set
- Parameters:
- xpos, yposint
Pixel column & row of top left pixel of tile
- xsize, ysizeint
Number of pixel columns & rows in tile
- col, rowint
Tile column & row
- getNumTiles()¶
Get total number of tiles in the set
- Returns:
- numTilesint
Total number of tiles
- getTile(col, row)¶
Return the position and shape of the requested tile, as a single tuple of values
- Parameters:
- col, rowint
Tile column & row
- Returns:
- xpos, yposint
Pixel column & row of top left pixel of tile
- xsize, ysizeint
Number of pixel columns & rows in tile
- class pyshepseg.tiling.TiledSegmentationResult¶
Result of tiled segmentation
- Attributes:
- maxSegIdshepseg.SegIdType
Largest segment ID used in final segment image
- numTileRowsint
Number of rows of tiles used
- numTileColsint
Number of columns of tiles used
- subsamplePcntfloat
Percentage of image subsampled for clustering
- maxSpectralDifffloat
The value used to limit segment merging (in all tiles)
- kmeanssklearn.cluster.KMeans
The sklearn KMeans object, after fitting
- hasEmptySegmentsbool
True if the segmentation contains segments with no pixels. This is an error condition, probably indicating that the merging of segments across tiles has produced inconsistent numbering. A warning message will also have been printed.
- timingspyshepseg.timinghooks.Timers
Timings for various key parts of the segmentation process
- outDsgdal.Dataset or None
Open GDAL dataset object to the output file. May be None, see the returnGDALDS parameter to doTiledShepherdSegmentation.
- pyshepseg.tiling.calcHistogramTiled(segfile, maxSegId, writeToRat=True)¶
This function is now deprecated, and will probably be removed in a future version.
Calculate a histogram of the given segment image file.
Note that we need this function because GDAL’s GetHistogram function does not seem to work when attempting a histogram with very large numbers of entries. We want an entry for every segment, rather than an approximate count for a range of segment values, and the number of segments is very large. So, we need to write our own routine.
It works in tiles across the image, so that it can process very large images in a memory-efficient way.
For a raster which can easily fit into memory, a histogram can be calculated directly using pyshepseg.shepseg.makeSegSize().
- Parameters:
- segfilestr or gdal.Dataset
Segmentation image file. Can be either the file name string, or an open Dataset object.
- maxSegIdshepseg.SegIdType
Maximum segment ID used
- writeToRatbool
If True, the completed histogram will be written to the image file’s raster attribute table. If segfile was given as a Dataset object, it would therefore need to have been opened with update access.
- Returns:
- histint ndarray (numSegments+1, )
Histogram counts for each segment (index is segment ID number)
- pyshepseg.tiling.doTiledShepherdSegmentation(infile, outfile, tileSize=4096, overlapSize=1024, minSegmentSize=50, numClusters=60, bandNumbers=None, subsamplePcnt=None, maxSpectralDiff='auto', imgNullVal=None, fixedKMeansInit=False, fourConnected=True, verbose=False, simpleTileRecode=False, outputDriver='KEA', creationOptions=[], spectDistPcntile=50, kmeansObj=None, tempfilesDriver='KEA', tempfilesExt='kea', tempfilesCreationOptions=[], writeHistogram=True, returnGDALDS=False, concurrencyCfg=None)¶
Run the Shepherd segmentation algorithm in a memory-efficient manner, suitable for large raster files. Runs the segmentation on separate (overlapping) tiles across the raster, then stitches these together into a single output segment raster.
The initial spectral clustering is performed on a sub-sample of the whole raster (using fitSpectralClustersWholeFile), to create consistent clusters. These are then used as seeds for all individual tiles. Note that subsamplePcnt is used at this stage, over the whole raster, and is not passed through to shepseg.doShepherdSegmentation() for any further sub-sampling.
Most of the arguments are passed through to shepseg.doShepherdSegmentation, and are described in the docstring for that function.
- Parameters:
- infilestr
Filename of input raster
- outfilestr
Filename of output segmentation raster
- tileSizeint
Desired width & height (in pixels) of the tiles (i.e. desired tiles have shape (tileSize, tileSize)). Tiles on the right and bottom edges of the input image may end up slightly larger than tileSize to ensure there are no small tiles.
- overlapSizeint
Number of pixels to overlap tiles. The overlap area is a rectangle, this many pixels wide, which is covered by both adjacent tiles.
- minSegmentSizeint
Minimum number of pixels in a segment
- numClustersint
Number of clusters to request in k-means clustering
- bandNumberslist of int
The GDAL band numbers (i.e. start at 1) of the bands of input raster to use for segmentation
- subsamplePcntfloat or None
See fitSpectralClustersWholeFile()
- maxSpectralDifffloat or str
See shepseg.doShepherdSegmentation()
- spectDistPcntileint
See shepseg.doShepherdSegmentation()
- imgNullValfloat or None
If given, use this as the null value for the input raster. If None, use the value defined in the raster file
- fixedKMeansInitbool
If True, use a fixed set of initial cluster centres for the KMeans clustering. This is good to ensure exactly reproducible results
- fourConnectedbool
If True, use 4-way connectedness, otherwise use 8-way
- verbosebool
If True, print informative messages during processing (to stdout)
- simpleTileRecodebool
If True, use only a simple tile recoding procedure. See stitchTiles() for more detail
- outputDriverstr
The short name of the GDAL format driver to use for output file
- creationOptionslist of str
The GDAL output creation options to match the outputDriver
- kmeansObjsklearn.cluster.KMeans
See shepseg.doShepherdSegmentation() for details
- tempfilesDriverstr
Short name of GDAL driver to use for temporary raster files
- tempfilesExtstr
File extension to use for temporary raster files
- tempfilesCreationOptionslist of str
GDAL creation options to use for temporary raster files
- writeHistogrambool
Deprecated, and ignored. The histogram is always written.
- returnGDALDSbool
Whether to set the outDs member of TiledSegmentationResult when returning. If set, this will be open in update mode.
- concurrencyCfgSegmentationConcurrencyConfig
Configuration for segmentation concurrency. Default is None, meaning no concurrency.
- Returns:
- tileSegResultTiledSegmentationResult
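A typical call might look like the sketch below. It uses only parameters and result attributes documented on this page. The wrapper name run_tiled_segmentation is hypothetical, and passing the concurrency type as the string 'CONC_THREADS' is an assumption based on the default value shown in the SegmentationConcurrencyConfig signature. The import sits inside the function so that the example can be loaded without pyshepseg installed.

```python
def run_tiled_segmentation(infile, outfile, num_workers=4):
    """Segment a large raster in tiles, using threads for concurrency."""
    # Requires pyshepseg (and GDAL) to be installed when actually called
    from pyshepseg import tiling

    conc_cfg = tiling.SegmentationConcurrencyConfig(
        concurrencyType='CONC_THREADS', numWorkers=num_workers)

    result = tiling.doTiledShepherdSegmentation(
        infile, outfile,
        tileSize=4096, overlapSize=1024,
        minSegmentSize=50, numClusters=60,
        concurrencyCfg=conc_cfg)

    # Attributes of TiledSegmentationResult, as documented above
    print("Tiles:", result.numTileRows, "rows x", result.numTileCols, "cols")
    print("Largest segment ID:", result.maxSegId)
    if result.hasEmptySegments:
        print("Warning: segmentation contains empty segments")
    return result
```

Leaving out concurrencyCfg runs the tiles one after another, which is a reasonable starting point before tuning the number of workers.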
- pyshepseg.tiling.fitSpectralClustersWholeFile(inDs, bandNumbers, numClusters=60, subsamplePcnt=None, imgNullVal=None, fixedKMeansInit=False)¶
Given an open raster Dataset, read a selected sample of pixels and use these to fit a spectral cluster model. Uses GDAL to read the pixels, and shepseg.fitSpectralClusters() to do the fitting.
- Parameters:
- inDsgdal.Dataset
Open GDAL Dataset object for the input raster
- bandNumberslist of int (or None)
List of GDAL band numbers for the bands of interest. If None, then use all bands in the dataset. Note that GDAL band numbers start at 1.
- numClustersint
Desired number of clusters
- subsamplePcntfloat or None
Percentage of pixels to use in fitting. If it is None, then a suitable subsample is calculated such that around one million pixels are sampled. (Note - this would include null pixels, so if the image is dominated by nulls, this would undersample.) No further subsampling is carried out by fitSpectralClusters().
- imgNullValfloat or None
Pixels with this value in the input raster are ignored. If None, the NoDataValue from the raster file is used
- fixedKMeansInitbool
If True, then use a fixed estimate for the initial KMeans cluster centres. See shepseg.fitSpectralClusters() for details.
- Returns:
- kmeansObjsklearn.cluster.KMeans
The fitted KMeans object
- subsamplePcntfloat
The subsample percentage actually used
- imgNullValfloat
The image null value (possibly read from the file)
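The automatic choice of subsamplePcnt described above amounts to simple arithmetic, sketched here. The function name and the cap at 100 percent for small rasters are assumptions for illustration; the target of one million pixels comes from the parameter description.

```python
def auto_subsample_pcnt(nrows, ncols, target_pixels=1_000_000):
    """Percentage of pixels needed to sample about target_pixels pixels."""
    total_pixels = nrows * ncols
    # Small rasters cannot supply more than all of their pixels
    return min(100.0, 100.0 * target_pixels / total_pixels)
```

For a raster of 10000 by 10000 pixels this gives 1.0, i.e. one percent. Because null pixels are counted in the total, a raster that is mostly null would yield fewer usable samples than the target.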
- pyshepseg.tiling.getImgNullValue(inDs, bandNumbers)¶
Return the null value for the given dataset
- Parameters:
- inDsgdal.Dataset
Open input Dataset
- bandNumberslist of int
GDAL band numbers of interest
- Returns:
- imgNullValfloat or None
Null value from input raster, None if there is no null value
- Raises:
- PyShepSegTilingError
If not all bands have the same null value
- pyshepseg.tiling.getTilesForFile(ds, tileSize, overlapSize)¶
Return a TileInfo object for a given file and input parameters.
- Parameters:
- dsgdal.Dataset
Open GDAL Dataset object for raster to be tiled
- tileSizeint
Size of tiles, in pixels. Individual tiles may end up being larger in either direction, when they meet the edge of the raster, to ensure we do not use very small tiles
- overlapSizeint
Number of pixels by which tiles will overlap
- Returns:
- tileInfoTileInfo
TileInfo object detailing the sizes and positions of all tiles across the raster
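The effect of tileSize and overlapSize on the layout can be sketched in one dimension. The rule used here for enlarging the final tile (stretch the current tile to the edge whenever another full tile would not fit) is an assumption chosen to match the description, not a transcription of the library's logic, and the function name is hypothetical. It requires overlap_size to be smaller than tile_size.

```python
def tile_layout_1d(raster_size, tile_size, overlap_size):
    """Return a list of (start, size) pairs covering one raster dimension."""
    step = tile_size - overlap_size   # distance between successive tile starts
    tiles = []
    pos = 0
    while True:
        size = tile_size
        # If a further full tile would run past the edge, make this the
        # last tile and stretch it to the edge instead
        if pos + step + tile_size > raster_size:
            size = raster_size - pos
        tiles.append((pos, size))
        if pos + size >= raster_size:
            break
        pos += step
    return tiles
```

With a raster 12000 pixels across, tileSize 4096 and overlapSize 1024, this gives tiles starting at 0, 3072 and 6144, each overlapping its neighbour by 1024 pixels, with the last one enlarged to 5856 pixels to reach the edge.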
- pyshepseg.tiling.readSubsampledImageBand(bandObj, subsampleProp)¶
Read in a sub-sampled copy of the whole of the given band.
Note that one can, in principle, do this directly using GDAL. However, if overview layers are present in the file, it will use these, and so is dependent on how these were created. Since these are often created just for display purposes, past experience has shown that they are not always to be trusted as data, so we have chosen to always go directly to the full resolution image.
- Parameters:
- bandObjgdal.Band
An open Band object for input
- subsamplePropfloat
The proportion by which to sub-sample (i.e. a value between zero and 1, applied to rows and columns separately)
- Returns:
- imgSub<dtype> ndarray (nRowsSub, nColsSub)
A numpy array of the image subsample, equivalent to calling gdal.Band.ReadAsArray()
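Applying the proportion to rows and columns separately can be pictured as picking evenly spaced row and column numbers from the full resolution image. The sketch below works on a list of lists and does not touch GDAL; the helper names and the exact spacing rule are assumptions for illustration.

```python
def subsample_indices(n, prop):
    """Evenly spaced indices selecting roughly prop * n of n positions."""
    n_sub = max(1, round(n * prop))
    return [int(i * n / n_sub) for i in range(n_sub)]


def subsample_image(img, prop):
    """Subsample a 2-d list of lists by prop along each axis."""
    rows = subsample_indices(len(img), prop)
    cols = subsample_indices(len(img[0]), prop)
    return [[img[r][c] for c in cols] for r in rows]
```

Since the proportion applies to each axis, the fraction of pixels kept is its square: a proportion of 0.5 keeps a quarter of the pixels, and keeping a percentage p of all pixels corresponds to a per-axis proportion of sqrt(p / 100).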
- pyshepseg.tiling.selectConcurrencyClass(concurrencyType, baseClass)¶
Choose the sub-class corresponding to the given concurrencyType
- pyshepseg.tiling.updateCounts(tileData, hist)¶
Fast function to increment counts for each segment ID in the given tile