tilingstats¶

Routines to support calculation of statistics on large rasters. The statistics are calculated per segment so require two input rasters - the segmentation output, and another image to gather statistics from, for each segment in the first image.

These are optimised to work on one tile of the images at a time so should be efficient in terms of memory use.

The main functions in this module are calcPerSegmentStatsTiled() and calcPerSegmentSpatialStatsTiled().

class pyshepseg.tilingstats.OpenRatContainer(ds=None, band=None, rz=None)¶

Hold all data structures for an open RAT, hiding the distinction between GDAL-based and Zarr-based RATs. The constructor takes either a single RatZarr object rz, or a pair of GDAL objects ds & band.

Parameters:

dsgdal.Dataset or None: Open Dataset object
bandgdal.Band or None: Open band on ds
rzratzarr.RatZarr or None: Open RatZarr object

CreateColumn(colName, colType)¶: Create the column with the given name and type. For GDAL RAT, always use GFU_Generic usage

SetRowCount(rowCount)¶: Set the row count for the table

WriteArray(colArr, colNumber, start)¶

Intended to look like GDAL’s RAT WriteArray function.

When the output RAT is a GDAL file, parameters are passed straight through. When using a RatZarr file, the colNdx is translated to a column name and the data written to that column.

checkColType(colName, colType)¶: Check that the given column (pre-existing) is compatible with the given GDAL column type (gdal.GFT_*). Raise exception if not.

close()¶: Close the open file handles

colExists(colName)¶: Check if the named column already exists in the RAT

getColNdx(colName)¶: Get the column index for the given name. The index is only meaningful for GDAL-based RAT, but we fake it for Zarr-based, so we can continue to use it as the basic identifier in the numba-compiled sections of code (i.e. the statsSelection_fast structure).

exception pyshepseg.tilingstats.PyShepSegStatsError¶

class pyshepseg.tilingstats.RatPage(numIntCols, numFloatCols, startSegId, numSeg)¶

Hold a single page of the paged RAT

getIndexInPage(segId)¶: Return the index for the given segment, within the current page.

getRatVal(segId, colType, colArrayNdx)¶: Get the RAT entry for the given segment.

getSegmentComplete(segId)¶: Returns True if the segment has been flagged as complete

pageComplete()¶: Return True if the current page has been completed

setRatVal(segId, colType, colArrayNdx, val)¶: Set the RAT entry for the given segment, to be the given value.

setSegmentComplete(segId)¶: Flag that the given segment has had all stats calculated.

class pyshepseg.tilingstats.SegPoint(x, y, val)¶

Class for handling a given data point and it’s location in pixel space (within the whole image, not a tile).

Used so that all the data for a given segment can be collected together even if the segment straddles a tile.

class pyshepseg.tilingstats.SegmentStats(segmentHistDict, missingStatsValue)¶

Manage statistics for a single segment

getPercentile(percentile)¶: Return the pixel value for the given percentile, e.g. getPercentile(50) would return the median value of the segment

getStat(statID, param)¶: Return the requested statistic

class pyshepseg.tilingstats.StatsReadConfig(numWorkers=0, bufferInsertTimeout=60, bufferPopTimeout=60)¶

Set up configuration information for running read workers

Parameters:

numWorkersint: Number of read workers to use. If zero, reading of each tile is done sequentially with processing.
bufferInsertTimeoutint: Number of seconds to wait to insert tile data into read buffer. Only relevant if using read workers.
bufferPopTimeoutint: Number of seconds to wait to get tile data from read buffer. Only relevant if using read workers.

class pyshepseg.tilingstats.StatsReadManager(imgfile, imgbandnum, segfile=None, segbandnum=1, segband=None, readCfg=None, tileSize=None, numXtiles=None, numYtiles=None, timings=None)¶

Open the input imgfile and segfile, optionally starting some read workers. It is assumed that the two rasters have same size/shape and pixel alignment.

Parameters:

imgfilestr or gdal.Dataset: Name or open gdal.Dataset of the imagery on which to collect statistics.
imgbandnumint: Band number (starts at 1) of imgfile on which to collect statistics
segfilestr or gdal.Dataset: Name or open gdal.Dataset of segmentation raster. This file has the segment ID of each pixel.
segbandnumint: Band number (starts at 1) of the band in segfile for the RAT. Default is 1 (the usual case).
readCfgInstance of StatsReadConfig: Configuration of read workers.
tileSizeint: Size (in pixels) of tiles i.e. shape is (tileSize, tileSize)
numXtilesint: Number of tiles in X direction across the images
numYtilesint: Number of tiles in Y direction across the images
timingsTimers: A Timers object in which read timings are recorded. Default will discard timings.

close()¶: Close the GDAL objects

popNextTile()¶

Get the data for the next tile. If using read workers, pop the next available tile out of the buffer, otherwise just read in directly from the files.

In the buffer case, note that we may lose the strict tile ordering, if a tile is available out of normal sequence. This is not generally a serious problem, and avoids a potential deadlock condition if we attempted to adhere to a strict order but read workers delivered them a long way out of order. Mostly would not be a problem, but serious if if it did occur.

readTile(segBand, imgBand, tileRow, tileCol)¶

Read a single tile from the two input rasters. The tile row/col numbers refer to a grid of tiles, so the first row of tiles is row 0, the second row is row 1, etc.

Parameters:

segBand, imgBandgdal.Band: GDAL Band objects for segmentation and image rasters
tileRowint: Row number of requested tile
tileColint: Col number of requested tile

Returns:

tilePairtuple of numpy.ndarray: Raster tiles as (tileSegments, tileImageData)

startReadWorkers()¶: Start the requested read workers, and set up the buffer they will feed into.

worker()¶: Function running in each read worker

class pyshepseg.tilingstats.TiledStatsResult¶

Result of tiled per-segment statistics

Attributes:

timingspyshepseg.timinghooks.Timers: Timings for various key parts of the per-segment stats calculation

pyshepseg.tilingstats.accumulateSegDict(segDict, noDataDict, imgNullVal, tileSegments, tileImageData)¶

Accumulate per-segment histogram counts for all pixels in the given tile. Updates segDict entries in-place.

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are histograms for the segment
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
imgNullValint: No data value for image
tileSegmentsint ndarray of shape (nRows, nCols): Contains the tile of segments currently being processed.
tileImageDataint ndarray of shape (nRows, nCols): Contains the tile of the image data currently being processed.

pyshepseg.tilingstats.accumulateSegSpatial(segDict, noDataDict, imgNullVal, tileSegments, tileImageData, topLine, leftPix)¶

Accumulates data for each segment for the given tile. The data is put into segDict.

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are a list of SegPoint objects.
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
imgNullValint: No data value for image
tileSegmentsint ndarray of shape (nRows, nCols): Contains the tile of segments currently being processed.
tileImageDataint ndarray of shape (nRows, nCols): Contains the tile of the image data currently being processed.
topLineint: row of the top of this tile in the full image
leftPixint: col of the left of this tile in the full image

pyshepseg.tilingstats.calcPerSegmentSpatialStatsRIOS(imgfile, imgbandnum, segfile, colNamesAndTypes, userFunc, userParam=None, concurrencyStyle=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False)¶

This function is deprecated. Consider using calcPerSegmentSpatialStatsTiled with a readCfg instead.

Similar to the calcPerSegmentStatsTiledRIOS() function but allows the user to calculate spatial statistics on the data for each segment. This is done by recording the location and value for each pixel within the segment. Once all the pixels are found for a segment the userFunc is called with the following parameters:

pts, imgNullVal, intArr, floatArr, userParam

where pts is List of SegPoint objects. If 2D Numpy tile is prefered the userFunc can call convertPtsInto2DArray(). intArray is a 1D numpy array which all the integer output values are to be put (in the same order given in colNamesAndTypes). floatArr is a 1D numpy array which all the floating point output values are to be put (in the same order given in colNamesAndTypes). These arrays are initialised with missingStatsValue so the function can skip any that it doesn’t have values for. userParam is the same value passed to this function and needs to be a type understood by Numba.

This function uses RIOS to perform the reading so it works in a memory-efficient way. Also, the number of read workers can be varied by passing an instance of rios.applier.ConcurrencyStyle which may be helpful when reading from data sources with high latency (ie S3). RIOS timeouts can also be changed using this method. Note that only 0 compute workers is supported and computeWorkerKind must be set to CW_NONE.

Parameters:

imgfilestring: Path to input file for collecting statistics from
imgbandnumint: 1-based index of the band number in imgfile to use for collecting stats
segfilestr or gdal.Dataset: Path to segmented file or an open GDAL Dataset. Will collect stats in imgfile for each segment in this file.
colNamesAndTypeslist of (colName, colType) tuples: This defines the names, types and order of the output RAT columns. colName should be a string containing the name of the RAT column to be created and colType should be one of gdal.GFT_Integer or gdal.GFT_Real and this controls the type of the column to be created. Note that the order of columns given in this parameter is important as this dicates the order of the intArray and floatArr parameters to userFunc.
userFunca Numba function (ie decorated with @jit or @njit).: See above for description
userParamanything that can be passed to a Numba function: This includes: arrays, scalars and @jitclass decorated classes.
concurrencyStylerios.applier.ConcurrencyStyle: Concurrency parameters for RIOS
missingStatsValueint: The value to fill in for segments that have no data.
outFilestr: Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr if set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
outFileIsZarrbool: Set to True if the outFile should be written as RatZarr format.

pyshepseg.tilingstats.calcPerSegmentSpatialStatsTiled(imgfile, imgbandnum, segfile, colNamesAndTypes, userFunc, userParam=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False, readCfg=None)¶

Similar to the calcPerSegmentStatsTiled() function but allows the user to calculate spatial statistics on the data for each segment. This is done by recording the location and value for each pixel within the segment. Once all the pixels are found for a segment the userFunc is called with the following parameters:

pts, imgNullVal, intArr, floatArr, userParam

where pts is List of SegPoint objects. If 2D Numpy tile is prefered the userFunc can call convertPtsInto2DArray(). intArray is a 1D numpy array which all the integer output values are to be put (in the same order given in colNamesAndTypes). floatArr is a 1D numpy array which all the floating point output values are to be put (in the same order given in colNamesAndTypes). These arrays are initialised with missingStatsValue so the function can skip any that it doesn’t have values for. userParam is the same value passed to this function and needs to be a type understood by Numba.

Parameters:

imgfilestring: Path to input file for collecting statistics from
imgbandnumint: 1-based index of the band number in imgfile to use for collecting stats
segfilestr or gdal.Dataset: Path to segmented file or an open GDAL Dataset. Will collect stats in imgfile for each segment in this file.
colNamesAndTypeslist of (colName, colType) tuples: This defines the names, types and order of the output RAT columns. colName should be a string containing the name of the RAT column to be created and colType should be one of gdal.GFT_Integer or gdal.GFT_Real and this controls the type of the column to be created. Note that the order of columns given in this parameter is important as this dicates the order of the intArray and floatArr parameters to userFunc.
userFunca Numba function (ie decorated with @jit or @njit).: See above for description
userParamanything that can be passed to a Numba function: This includes: arrays, scalars and @jitclass decorated classes.
missingStatsValueint: The value to fill in for segments that have no data.
outFilestr: Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr if set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
outFileIsZarrbool: Set to True if the outFile should be written as RatZarr format.
readCfgStatsReadConfig: Config for read manager, allowing multi-threaded reading. Default will run with no read workers.

pyshepseg.tilingstats.calcPerSegmentSpatialStats_riosFunc(info, inputs, outputs, otherArgs)¶: Called by RIOS from inside calcPerSegmentSpatialStatsRIOS. Do accumulation of statistics

pyshepseg.tilingstats.calcPerSegmentStatsRIOS(imgfile, imgbandnum, segfile, statsSelection, concurrencyStyle=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False)¶

This function is deprecated. Consider using calcPerSegmentStatsTiled with a readCfg instead.

Calculate selected per-segment statistics for the given band of the imgfile, against the given segment raster file. Calculated statistics are written to the segfile raster attribute table (RAT), so this file format must support RATs.

This function uses RIOS to perform the reading so it works in a memory-efficient way. Also, the number of read workers can be varied by passing an instance of rios.applier.ConcurrencyStyle which may be helpful when reading from data sources with high latency (ie S3). RIOS timeouts can also be changed using this method. Note that only 0 compute workers is supported and computeWorkerKind must be set to CW_NONE.

Parameters:

imgfilestring

Path to input file for collecting statistics from

imgbandnumint

1-based index of the band number in imgfile to use for collecting stats

segfilestr

Path to segmented file. Will collect stats in imgfile for each segment in this file.

statsSelectionlist of tuples.

One tuple for each statistic to be included. Each tuple is either 2 or 3 elements:

(columnName, statName)

or

(columnName, statName, parameter)

The columnName is a string, used to name the column in the output RAT. The statName is a string used to identify which statistic is to be calculated. Available options are:

'min', 'max', 'mean', 'stddev', 'median', 'mode', 'percentile', 'pixcount'

The ‘percentile’ statistic requires the 3-element form, with the 3rd element being the percentile to be calculated.

For example:

[('Band1_Mean', 'mean'),
 ('Band1_stdDev', 'stddev'),
 ('Band1_LQ', 'percentile', 25),
 ('Band1_UQ', 'percentile', 75)]

would create 4 columns, for the per-segment mean and standard deviation of the given band, and the lower and upper quartiles, with corresponding column names.

Any pixels that are set to the nodata value of imgfile (if set) are ignored in the stats calculations. If there are no pixels that aren’t the nodata value then the value passed in as missingStatsValue is put into the RAT for the requested statistics. The ‘pixcount’ statName can be used to find the number of valid pixels (not nodata) that were used to calculate the statistics.

concurrencyStylerios.applier.ConcurrencyStyle

Concurrency parameters for RIOS

missingStatsValueint or float

What to set for segments that have no valid pixels in imgile

outFilestr

Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr if set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.

outFileIsZarrbool

Set to True if the outFile should be written as RatZarr format.

pyshepseg.tilingstats.calcPerSegmentStatsTiled(imgfile, imgbandnum, segfile, statsSelection, missingStatsValue=-9999, outFile=None, outFileIsZarr=False, readCfg=None)¶

Calculate selected per-segment statistics for the given band of the imgfile, against the given segment raster file. Calculated statistics are written to the segfile raster attribute table (RAT), so this file format must support RATs.

Calculations are carried out in a memory-efficient way, allowing very large rasters to be processed. Raster data is handled in small tiles, attribute table is handled in fixed-size chunks.

Parameters:

imgfilestring

Path to input file for collecting statistics from

imgbandnumint

1-based index of the band number in imgfile to use for collecting stats

segfilestr or gdal.Dataset

Path to segmented file or an open GDAL dataset. Will collect stats in imgfile for each segment in this file.

statsSelectionlist of tuples.

One tuple for each statistic to be included. Each tuple is either 2 or 3 elements:

(columnName, statName)

or

(columnName, statName, parameter)

The columnName is a string, used to name the column in the output RAT. The statName is a string used to identify which statistic is to be calculated. Available options are:

'min', 'max', 'mean', 'stddev', 'median', 'mode', 'percentile', 'pixcount'

The ‘percentile’ statistic requires the 3-element form, with the 3rd element being the percentile to be calculated.

For example:

[('Band1_Mean', 'mean'),
 ('Band1_stdDev', 'stddev'),
 ('Band1_LQ', 'percentile', 25),
 ('Band1_UQ', 'percentile', 75)]

would create 4 columns, for the per-segment mean and standard deviation of the given band, and the lower and upper quartiles, with corresponding column names.

Any pixels that are set to the nodata value of imgfile (if set) are ignored in the stats calculations. If there are no pixels that aren’t the nodata value then the value passed in as missingStatsValue is put into the RAT for the requested statistics. The ‘pixcount’ statName can be used to find the number of valid pixels (not nodata) that were used to calculate the statistics.

missingStatsValueint or float

What to set for segments that have no valid pixels in imgile

outFilestr

Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr if set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.

outFileIsZarrbool

Set to True if the outFile should be written as RatZarr format.

readCfgStatsReadConfig

Config for read manager, allowing multi-threaded reading. Default will run with no read workers.

pyshepseg.tilingstats.calcPerSegmentStats_riosFunc(info, inputs, outputs, otherArgs)¶: Called by RIOS from inside calcPerSegmentStatsRIOS. Do accumulation of statistics

pyshepseg.tilingstats.calcStatsForCompletedSegs(segDict, noDataDict, missingStatsValue, pagedRat, statsSelection_fast, segSize, numIntCols, numFloatCols)¶

Calculate statistics for all complete segments in the segDict. Update the pagedRat with the resulting entries. Completed segments are then removed from segDict.

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are histograms for the segment
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
missingStatsValueint: value to insert into the RAT where no valid pixels were found
pagedRatnumba.typed.Dict: The RAT as a paged data structure
statsSelection_fastint ndarray of shape (numStats, 5): Allows quick access to the types of stats required
segSizeint indarray of shape (numSegments+1, ): Array containing the histograms of the segment file
numIntColsint: Number of Integer RAT cols to be created
numFloatColsint: Number of Float RAT cols to be created

pyshepseg.tilingstats.calcStatsForCompletedSegsSpatial(segDict, noDataDict, missingStatsValue, pagedRat, segSize, userFunc, userParam, statsSelection_fast, intArr, floatArr, imgNullVal)¶

Calls the userFunc on data for completed segs and saves the results into the pagedRat.

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are a numba.typed.List containing instances of SegPoint.
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
missingStatsValueint: Value to fill the intArr and floatArr parameters with so this valu gets written to the RAT if no valid pixels found
pagedRatnumba.typed.Dict: The RAT as a paged data structure
segSizeint indarray of shape (numSegments+1, ): Array containing the histograms of the segment file
userFuncNumba function: This is the user defined function to call for each completed segment
userParamNumba compatible argument: This is passed to the userFunc
statsSelection_fastint ndarray of shape (numStats, 5): Allows quick access to the types of stats required
intArrint indarray of shape (numIntCols,): Spaced used for storing the integer outputs for this segment
floatArrfloat indarray of shape (numFloatCols,): Spaced used for storing the float outputs for this segment
imgNullValint: No data value for image

pyshepseg.tilingstats.checkSegComplete(segDict, noDataDict, segSize, segId)¶

Return True if the given segment has a complete entry in the segDict, meaning that the pixel count is equal to the segment size

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are histograms for the segment
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
segSizeint indarray of shape (numSegments+1, ): Array containing the histograms of the segment file
segIdshepseg.SegIdType: Segment to check for completeness

Returns:

completebool: Whether the segId is complete or not

pyshepseg.tilingstats.checkSegCompleteSpatial(segDict, noDataDict, segSize, segId)¶

Return True if the given segment has a complete entry in the segDict, meaning that the pixel count is equal to the segment size

Note: this is distinct from checkSegComplete() as that function is working with a dictionary of histograms

Parameters:

segDictnumba.typed.Dict: Dictionary of segments keyed on segment id. Values are a list of SegPoint objects.
noDataDictnumba.typed.Dict: Dictionary of nodata values for each segment.
segSizeint indarray of shape (numSegments+1, ): Array containing the histograms of the segment file
segIdshepseg.SegIdType: Segment to check for completeness

Returns:

completebool: Whether the segId is complete or not

pyshepseg.tilingstats.convertPtsInto2DArray(pts, imgNullVal)¶

Given a list of points for a segment turn this back into a 2D array where the value for each pixel is the value of the pixel in the original (data) image.

The tile is created just large enough for the shape of the segment. Areas of the tile not within a segment is given the value of imgNullVal.

Parameters:

ptsnumba.typed.List containing SegPoint objects: This is the list passed to the userFunc.
imgNullValint: No data value for image

Returns:

tilenumbaTypeForImageType ndarray of shape (ysize, xsize): Where ysize and xsize are the total extent of the tile

pyshepseg.tilingstats.convertPtsInto2DMaskArray(pts, imgNullVal)¶

Similar to convertPtsInto2DArray() but burns in the value 1 where each pixel is within a segment.

Given a list of points for a segment turn this back into a 2D array for passing to the user function.

The tile is created just large enough for the shape of the segment. Areas of the tile not within a segment is given the value of imgNullVal.

Parameters:

ptsnumba.typed.List containing SegPoint objects: This is the list passed to the userFunc.
imgNullValint: No data value for image

Returns:

tilenumpy.uint8 ndarray of shape (ysize, xsize): Where ysize and xsize are the total extent of the tile

pyshepseg.tilingstats.copyRatCols(srcRat, destRat)¶

Copy all columns from srcRat to dstRat.

This is intended only for use copying from a temporary RAT file, which should only contain the columns to be copied. Use outside this could be dangerous.

Copies each column in fixed-size blocks, so quite memory-efficient.

Parameters:

srcRatstr: Name of temp KEA file with RAT to copy
destRatstr or gdal.Dataset: Name (or Dataset) of destination GDAL file to which RAT is copied

pyshepseg.tilingstats.createNoDataDict()¶

Create the dictionary that holds counts of nodata seen for each segment. The key is the segId, value is the count of nodata seen for that segment in the image data.

Returns:

nodataDictnumba.typed.Dict: Dictionary of nodata counts for each segment

pyshepseg.tilingstats.createPagedRat()¶

Create the dictionary for the paged RAT. Each element is a page of the RAT, with entries for a range of segment IDs. The key is the segment ID of the first entry in the page.

The returned dictionary is initially empty.

pyshepseg.tilingstats.createSegDict()¶

Create the Dict of Dicts for handling per-segment histograms. Each entry is a dictionary, and the key is a segment ID. Each dictionary within this is the per-segment histogram for a single segment. Each of its entries is for a single value from the imagery, the key is the pixel value, and the dictionary value is the number of times that pixel value appears in the segment.

Returns:

segDictnumba.typed.Dict: Dictionary of histograms used for calculating statistics.

pyshepseg.tilingstats.createSegSpatialDataDict()¶

Create a dictionary where the key is the segment ID and the value is a List of SegPoint objects.

Returns:

segDictnumba.typed.Dict: Dictionary containing lists of SegPoint objects.

pyshepseg.tilingstats.createStatColumns(statsSelection, openRat)¶

Create requested statistic columns on the segmentation image RAT. Statistic columns are of type gdal.GFT_Real for mean and stddev, and gdal.GFT_Integer for all other statistics.

Return the column indexes for all requested columns, in the same order.

Parameters:

statsSelectionlist of tuples: Same as passed to calcPerSegmentStatsTiled()
openRatOpenRatContainer: The file handle(s) for the RAT file

Returns:

colIndexListlist of ints: A list of the indexes of each of the requested new columns in the same order as statsSelection.

pyshepseg.tilingstats.createUserColumnsSpatial(colNamesAndTypes, openRat)¶

Used by calcPerSegmentSpatialStatsTiled() to create columns specified in the colNamesAndTypes structure.

Returns a tuple with number of integer columns, number of float columns and a statsSelection_fast array for use by calcStatsForCompletedSegsSpatial().

Parameters:

colNamesAndTypeslist of (colName, colType) tuples: Same as passed to calcPerSegmentSpatialStatsTiled().
openRatOpenRatContainer: The file handle(s) for the RAT file

Returns:

numIntColsint: The number of new Integer columns
numFloatColsint: The number of new Float columns
statsSelection_fastint ndarray of shape (numStats, 5): Allows quick access to the types of stats required

pyshepseg.tilingstats.doImageChecks(segfile, imgfile, imgbandnum)¶

Do the checks that the segment file and image file that is being used to collect the stats actually align. We refuse to process the files if they don’t as it is not clear how they should be made to line up - this is up to the user to get right. Also checks that imgfile is not a float image.

Check that there is a null value set on imgfile, and that the segfile has a histogram.

The two rasters are opened read-only, and closed again afterwards.

Parameters:

segfilestr or gdal.Dataset: Path to segmentation file or an open GDAL dataset.
imgfilestring: Path to input file for collecting statistics from
imgbandnumint: 1-based index of the band number in imgfile to use for collecting stats

Returns:

imgNullValnumba.int64: The null value set in the imgdata raster
segSizendarray: The Histogram column of the segfile, i.e. the pixel counts of each segment

pyshepseg.tilingstats.equalProjection(proj1, proj2)¶

Returns True if the proj1 is the same as proj2

Stolen from rios/pixelgrid.py

Parameters:

proj1string: WKT string for the first projection
proj2string: WKT string for the second projection

Returns:

equalbool: Whether the projections are equal or not

pyshepseg.tilingstats.getRatPageId(segId)¶: For the given segment ID, return the page ID. This is the segment ID of the first segment in the page.

pyshepseg.tilingstats.getSortedKeysAndValuesForDict(d)¶

The given dictionary is keyed by pixel values from the imagery, and the values are counts of occurences of the corresponding pixel value. This function returns a pair of numpy arrays (as a tuple), one for the list of pixel values, and one for the corresponding counts. The arrays are sorted in increasing order of pixel value.

Parameters:

ddictionary of int: Counts of each pixel value.

Returns:

keysSortednumbaTypeForImageType ndarray of shape (numValues,): Pixel values sorted
valuesSortedint ndarray of shape (numValues,): Counts of each pixel sorted by pixel value

pyshepseg.tilingstats.makeFastStatsSelection(colIndexList, statsSelection)¶

Make a fast version of the statsSelection data structure, combined with the global column index numbers.

Return a tuple of

(statsSelection_fast, numIntCols, numFloatCols)

The statsSelection_fast is a single array, of shape (numStats, 5). The first index corresponds to the sequence in statsSelection. The second index corresponds to the STATSEL_* values.

Everything is encoded as an integer value in a single numpy array, suitable for fast access within numba njit-ed functions.

This is all a bit ugly and un-pythonic. Not sure if there is a better way.

Parameters:

colIndexListlist of ints: The column indexes for all requested columns
statsSelectionlist of tuples: See tilingstats.calcPerSegmentStatsTiled() for a complete description of this parameter.

Returns:

statsSelection_fastint ndarray of shape (numStats, 5): The statsSelection_fast structure
intCountint: Number of int columns
floatCountint: Number of float columns

pyshepseg.tilingstats.makeOutRatKea(outFile)¶

Create a small KEA file to write a RAT into. Return a single object with all the open GDAL handles on it.

Parameters:

outFilestr: Name of output KEA file

Returns:

openRatOpenRatContainer: Holds all the open GDAL handles

pyshepseg.tilingstats.openEverything(readCfg, outFile, outFileIsZarr, tileSize, numXtiles, numYtiles, imgfile, imgbandnum, segfile, timings)¶: Open all the input and output files, ready for the stats routines to do their work.

pyshepseg.tilingstats.userFuncMeanCoord(pts, imgNullVal, intArr, floatArr, transform)¶

Calculates the mean coordinate of each segment. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if the mean coordinates of each segment are required.

The transform of the image (ie. returned by ds.GetGeoTransform() where ds is a GDAL dataset) is to be passed as userParam but converted to an array so it works with Numba.

The result will be written to floatArr (2 values - easting then northing). It is expected that at least 2 float columns are available.

Parameters:

ptsnumba.typed.List containing SegPoint objects: This is the list passed to the userFunc.
imgNullValint: The nodata value for the imagery
intArrint ndarray of shape (numIntCols, ): The integer columns - not used in this function
floatArrint ndarray of shape (numFloatCols, ): The float columns - output written here
transformfloat ndarray of shape (6,): The GDAL transform array. Passed in as a userParam to tilingstats.calcPerSegmentSpatialStatsTiled().

pyshepseg.tilingstats.userFuncNumEdgePixels(pts, imgNullVal, intArr, floatArr, fourConnected)¶

Calculates the number of ‘edge’ pixels for each segment. Edge pixels are pixels that touch either another segment or the edge of the image. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if the number of edge pixels are required.

The result will be written to intArr (1 value - number of edge pixels).

Parameters:

ptsnumba.typed.List containing SegPoint objects: This is the list passed to the userFunc.
imgNullValint: The nodata value for the imagery
intArrint ndarray of shape (numIntCols, ): The integer columns - output written here
floatArrint ndarray of shape (numFloatCols, ): The float columns - not used in this function
fourConnectedbool: If True, use four-way connectedness to judge neighbours, otherwise use eight-way.

pyshepseg.tilingstats.userFuncVariogram(pts, imgNullVal, intArr, floatArr, maxDist)¶

Calculates the variogram at the given distance for the segment contained in the tile. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if variograms are to be calculated.

maxDist is the number of variograms to calculate and should be passed in as the userParam argument to calcPerSegmentSpatialStatsTiled().

It is assumed that floatArr has enough space for the number of variograms requested (this is calculated from the colNamesAndTypes parameter to calcPerSegmentSpatialStatsTiled()).

Parameters:

ptsnumba.typed.List containing SegPoint objects: This is the list passed to the userFunc.
imgNullValint: The nodata value for the imagery
intArrint ndarray of shape (numIntCols, ): The integer columns - not used in this function
floatArrint ndarray of shape (numFloatCols, ): The float columns - output written here
maxDistint: Number of variograms to calculate

pyshepseg.tilingstats.writeCompletePages(pagedRat, openRat, statsSelection_fast)¶

Check for completed pages, and write them to the attribute table. Remove them from the pagedRat after writing.

Parameters:

pagedRatnumba.typed.Dict: The RAT as a paged data structure
attrTblgdal.RasterAttributeTable: The Raster Attribute Table object for the file
statsSelection_fastint ndarray of shape (numStats, 5): Allows quick access to the types of stats required

tilingstats¶

Table of Contents

Previous topic

Next topic

This Page