tilingstats¶
Routines to support calculation of statistics on large rasters. The statistics are calculated per segment, so two input rasters are required: the segmentation output, and another image from which statistics are gathered for each segment in the first image.
These are optimised to work on one tile of the images at a time so should be efficient in terms of memory use.
The main functions in this module are calcPerSegmentStatsTiled() and calcPerSegmentSpatialStatsTiled().
- class pyshepseg.tilingstats.OpenRatContainer(ds=None, band=None, rz=None)¶
Hold all data structures for an open RAT, hiding the distinction between GDAL-based and Zarr-based RATs. The constructor takes either a single RatZarr object rz, or a pair of GDAL objects ds & band.
- Parameters:
- ds : gdal.Dataset or None
Open Dataset object
- band : gdal.Band or None
Open band on ds
- rz : ratzarr.RatZarr or None
Open RatZarr object
- CreateColumn(colName, colType)¶
Create the column with the given name and type. For GDAL RAT, always use GFU_Generic usage
- SetRowCount(rowCount)¶
Set the row count for the table
- WriteArray(colArr, colNumber, start)¶
Intended to look like GDAL’s RAT WriteArray function.
When the output RAT is a GDAL file, parameters are passed straight through. When using a RatZarr file, the colNumber is translated to a column name and the data written to that column.
- checkColType(colName, colType)¶
Check that the given column (pre-existing) is compatible with the given GDAL column type (gdal.GFT_*). Raise exception if not.
- close()¶
Close the open file handles
- colExists(colName)¶
Check if the named column already exists in the RAT
- getColNdx(colName)¶
Get the column index for the given name. The index is only meaningful for GDAL-based RAT, but we fake it for Zarr-based, so we can continue to use it as the basic identifier in the numba-compiled sections of code (i.e. the statsSelection_fast structure).
- exception pyshepseg.tilingstats.PyShepSegStatsError¶
- class pyshepseg.tilingstats.RatPage(numIntCols, numFloatCols, startSegId, numSeg)¶
Hold a single page of the paged RAT
- getIndexInPage(segId)¶
Return the index for the given segment, within the current page.
- getRatVal(segId, colType, colArrayNdx)¶
Get the RAT entry for the given segment.
- getSegmentComplete(segId)¶
Returns True if the segment has been flagged as complete
- pageComplete()¶
Return True if the current page has been completed
- setRatVal(segId, colType, colArrayNdx, val)¶
Set the RAT entry for the given segment, to be the given value.
- setSegmentComplete(segId)¶
Flag that the given segment has had all stats calculated.
- class pyshepseg.tilingstats.SegPoint(x, y, val)¶
Class for handling a given data point and its location in pixel space (within the whole image, not a tile).
Used so that all the data for a given segment can be collected together even if the segment straddles a tile.
- class pyshepseg.tilingstats.SegmentStats(segmentHistDict, missingStatsValue)¶
Manage statistics for a single segment
- getPercentile(percentile)¶
Return the pixel value for the given percentile, e.g. getPercentile(50) would return the median value of the segment
- getStat(statID, param)¶
Return the requested statistic
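As an illustration of how a percentile can be read off a per-segment histogram, here is a minimal pure-Python sketch. The function name `percentile_from_hist` and the nearest-rank rule are assumptions made for this example; the rule pyshepseg applies inside getPercentile() may differ.

```python
def percentile_from_hist(hist, percentile):
    """Nearest-rank percentile of a {pixelValue: count} histogram.

    Hypothetical helper for illustration only, not part of pyshepseg.
    """
    if not hist:
        raise ValueError("empty histogram")
    total = sum(hist.values())
    # Number of pixels that must lie at or below the answer (ceiling, at least 1)
    target = max(1, -(-total * percentile // 100))
    running = 0
    for value in sorted(hist):
        running += hist[value]
        if running >= target:
            return value
    return max(hist)


# Segment with pixel values 10, 10, 20, 30, 30
hist = {10: 2, 20: 1, 30: 2}
median = percentile_from_hist(hist, 50)
```

Working from the histogram rather than the raw pixel list keeps the memory needed per segment proportional to the number of distinct pixel values.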
- class pyshepseg.tilingstats.StatsReadConfig(numWorkers=0, bufferInsertTimeout=60, bufferPopTimeout=60)¶
Set up configuration information for running read workers
- Parameters:
- numWorkers : int
Number of read workers to use. If zero, reading of each tile is done sequentially with processing.
- bufferInsertTimeout : int
Number of seconds to wait to insert tile data into read buffer. Only relevant if using read workers.
- bufferPopTimeout : int
Number of seconds to wait to get tile data from read buffer. Only relevant if using read workers.
- class pyshepseg.tilingstats.StatsReadManager(imgfile, imgbandnum, segfile=None, segbandnum=1, segband=None, readCfg=None, tileSize=None, numXtiles=None, numYtiles=None, timings=None)¶
Open the input imgfile and segfile, optionally starting some read workers. It is assumed that the two rasters have the same size/shape and pixel alignment.
- Parameters:
- imgfile : str or gdal.Dataset
Name or open gdal.Dataset of the imagery on which to collect statistics.
- imgbandnum : int
Band number (starts at 1) of imgfile on which to collect statistics
- segfile : str or gdal.Dataset
Name or open gdal.Dataset of segmentation raster. This file has the segment ID of each pixel.
- segbandnum : int
Band number (starts at 1) of the band in segfile for the RAT. Default is 1 (the usual case).
- readCfg : instance of StatsReadConfig
Configuration of read workers.
- tileSize : int
Size (in pixels) of tiles i.e. shape is (tileSize, tileSize)
- numXtiles : int
Number of tiles in X direction across the images
- numYtiles : int
Number of tiles in Y direction across the images
- timings : Timers
A Timers object in which read timings are recorded. Default will discard timings.
- close()¶
Close the GDAL objects
- popNextTile()¶
Get the data for the next tile. If using read workers, pop the next available tile out of the buffer, otherwise just read in directly from the files.
In the buffer case, note that we may lose the strict tile ordering, if a tile is available out of normal sequence. This is not generally a serious problem, and avoids a potential deadlock condition if we attempted to adhere to a strict order but read workers delivered them a long way out of order. Such a deadlock would be rare, but serious if it did occur.
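The buffered reading described above can be sketched with the standard library alone. This is only an illustration of the pattern (workers push tiles into a bounded buffer, the consumer pops whatever is available, in any order); the function `read_all_tiles`, its arguments and the buffer size of 4 are hypothetical and are not the StatsReadManager implementation.

```python
import queue
import threading


def read_all_tiles(tile_ids, read_func, num_workers=2,
                   insert_timeout=60, pop_timeout=60):
    """Read every tile using worker threads feeding a bounded buffer."""
    todo = queue.Queue()
    for tile_id in tile_ids:
        todo.put(tile_id)
    buffer = queue.Queue(maxsize=4)     # hypothetical buffer size

    def worker():
        while True:
            try:
                tile_id = todo.get_nowait()
            except queue.Empty:
                return                  # nothing left to read
            # Blocks while the buffer is full, up to insert_timeout seconds
            buffer.put((tile_id, read_func(tile_id)), timeout=insert_timeout)

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    results = {}
    for _ in range(len(tile_ids)):
        # Tiles are taken in whatever order the workers deliver them
        tile_id, data = buffer.get(timeout=pop_timeout)
        results[tile_id] = data
    for t in threads:
        t.join()
    return results


tiles = [(0, 0), (0, 1), (1, 0), (1, 1)]
got = read_all_tiles(tiles, lambda t: t[0] * 10 + t[1])
```

Because each result carries its tile identifier, the consumer does not need to rely on arrival order.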
- readTile(segBand, imgBand, tileRow, tileCol)¶
Read a single tile from the two input rasters. The tile row/col numbers refer to a grid of tiles, so the first row of tiles is row 0, the second row is row 1, etc.
- Parameters:
- segBand, imgBand : gdal.Band
GDAL Band objects for segmentation and image rasters
- tileRow : int
Row number of requested tile
- tileCol : int
Col number of requested tile
- Returns:
- tilePair : tuple of numpy.ndarray
Raster tiles as (tileSegments, tileImageData)
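To make the tile row/col numbering concrete, the following sketch converts a tile position into a pixel window. The helper `tile_window` and its clipping of partial tiles at the right and bottom edges are assumptions for illustration; the source does not state how readTile() treats edge tiles.

```python
def tile_window(tile_row, tile_col, tile_size, img_xsize, img_ysize):
    """Return (xoff, yoff, xcount, ycount) for a tile in a regular grid.

    Hypothetical helper: tiles on the right and bottom edges are clipped
    to the image extent.
    """
    xoff = tile_col * tile_size
    yoff = tile_row * tile_size
    if xoff >= img_xsize or yoff >= img_ysize:
        raise ValueError("tile lies outside the image")
    xcount = min(tile_size, img_xsize - xoff)
    ycount = min(tile_size, img_ysize - yoff)
    return xoff, yoff, xcount, ycount


# Second row of tiles, third column, in a 600 x 500 pixel image
window = tile_window(1, 2, 256, 600, 500)
```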
- startReadWorkers()¶
Start the requested read workers, and set up the buffer they will feed into.
- worker()¶
Function running in each read worker
- class pyshepseg.tilingstats.TiledStatsResult¶
Result of tiled per-segment statistics
- Attributes:
- timings : pyshepseg.timinghooks.Timers
Timings for various key parts of the per-segment stats calculation
- pyshepseg.tilingstats.accumulateSegDict(segDict, noDataDict, imgNullVal, tileSegments, tileImageData)¶
Accumulate per-segment histogram counts for all pixels in the given tile. Updates segDict entries in-place.
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are histograms for the segment
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- imgNullVal : int
No data value for image
- tileSegments : int ndarray of shape (nRows, nCols)
Contains the tile of segments currently being processed.
- tileImageData : int ndarray of shape (nRows, nCols)
Contains the tile of the image data currently being processed.
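The accumulation step can be pictured with plain dictionaries and nested lists. This is a simplified sketch, not the numba-compiled function: the name `accumulate_tile` is hypothetical, and any special handling of particular segment IDs is left out.

```python
def accumulate_tile(seg_hist, nodata_count, img_null, tile_segs, tile_img):
    """Add one tile's pixels to the per-segment histograms, in place."""
    for seg_row, img_row in zip(tile_segs, tile_img):
        for seg_id, val in zip(seg_row, img_row):
            if val == img_null:
                # Count nodata pixels separately, per segment
                nodata_count[seg_id] = nodata_count.get(seg_id, 0) + 1
            else:
                hist = seg_hist.setdefault(seg_id, {})
                hist[val] = hist.get(val, 0) + 1


seg_hist, nodata_count = {}, {}
# Two tiles; segment 2 straddles both, and 0 is the image nodata value
accumulate_tile(seg_hist, nodata_count, 0, [[1, 1], [2, 2]], [[5, 5], [7, 0]])
accumulate_tile(seg_hist, nodata_count, 0, [[1, 2], [2, 2]], [[6, 7], [7, 8]])
```

Since the dictionaries persist between calls, a segment that spans several tiles simply keeps accumulating counts until all of its pixels have been seen.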
- pyshepseg.tilingstats.accumulateSegSpatial(segDict, noDataDict, imgNullVal, tileSegments, tileImageData, topLine, leftPix)¶
Accumulates data for each segment for the given tile. The data is put into segDict.
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are a list of SegPoint objects.
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- imgNullVal : int
No data value for image
- tileSegments : int ndarray of shape (nRows, nCols)
Contains the tile of segments currently being processed.
- tileImageData : int ndarray of shape (nRows, nCols)
Contains the tile of the image data currently being processed.
- topLine : int
Row of the top of this tile in the full image
- leftPix : int
Col of the left of this tile in the full image
- pyshepseg.tilingstats.calcPerSegmentSpatialStatsRIOS(imgfile, imgbandnum, segfile, colNamesAndTypes, userFunc, userParam=None, concurrencyStyle=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False)¶
This function is deprecated. Consider using calcPerSegmentSpatialStatsTiled with a readCfg instead.
Similar to the calcPerSegmentStatsRIOS() function but allows the user to calculate spatial statistics on the data for each segment. This is done by recording the location and value for each pixel within the segment. Once all the pixels are found for a segment the userFunc is called with the following parameters:
pts, imgNullVal, intArr, floatArr, userParam
where pts is a List of SegPoint objects. If a 2D Numpy tile is preferred the userFunc can call convertPtsInto2DArray(). intArr is a 1D numpy array into which all the integer output values are to be put (in the same order given in colNamesAndTypes). floatArr is a 1D numpy array into which all the floating point output values are to be put (in the same order given in colNamesAndTypes). These arrays are initialised with missingStatsValue so the function can skip any that it doesn’t have values for. userParam is the same value passed to this function and needs to be a type understood by Numba.
This function uses RIOS to perform the reading so it works in a memory-efficient way. Also, the number of read workers can be varied by passing an instance of rios.applier.ConcurrencyStyle which may be helpful when reading from data sources with high latency (e.g. S3). RIOS timeouts can also be changed using this method. Note that only 0 compute workers is supported and computeWorkerKind must be set to CW_NONE.
- Parameters:
- imgfile : string
Path to input file for collecting statistics from
- imgbandnum : int
1-based index of the band number in imgfile to use for collecting stats
- segfile : str or gdal.Dataset
Path to segmented file or an open GDAL Dataset. Will collect stats in imgfile for each segment in this file.
- colNamesAndTypes : list of (colName, colType) tuples
This defines the names, types and order of the output RAT columns. colName should be a string containing the name of the RAT column to be created and colType should be one of gdal.GFT_Integer or gdal.GFT_Real; this controls the type of the column to be created. Note that the order of columns given in this parameter is important as this dictates the order of the intArr and floatArr parameters to userFunc.
- userFunc : a Numba function (i.e. decorated with @jit or @njit)
See above for description
- userParam : anything that can be passed to a Numba function
This includes: arrays, scalars and @jitclass decorated classes.
- concurrencyStyle : rios.applier.ConcurrencyStyle
Concurrency parameters for RIOS
- missingStatsValue : int
The value to fill in for segments that have no data.
- outFile : str
Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr is set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
- outFileIsZarr : bool
Set to True if the outFile should be written as RatZarr format.
- pyshepseg.tilingstats.calcPerSegmentSpatialStatsTiled(imgfile, imgbandnum, segfile, colNamesAndTypes, userFunc, userParam=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False, readCfg=None)¶
Similar to the calcPerSegmentStatsTiled() function but allows the user to calculate spatial statistics on the data for each segment. This is done by recording the location and value for each pixel within the segment. Once all the pixels are found for a segment the userFunc is called with the following parameters:
pts, imgNullVal, intArr, floatArr, userParam
where pts is a List of SegPoint objects. If a 2D Numpy tile is preferred the userFunc can call convertPtsInto2DArray(). intArr is a 1D numpy array into which all the integer output values are to be put (in the same order given in colNamesAndTypes). floatArr is a 1D numpy array into which all the floating point output values are to be put (in the same order given in colNamesAndTypes). These arrays are initialised with missingStatsValue so the function can skip any that it doesn’t have values for. userParam is the same value passed to this function and needs to be a type understood by Numba.
- Parameters:
- imgfile : string
Path to input file for collecting statistics from
- imgbandnum : int
1-based index of the band number in imgfile to use for collecting stats
- segfile : str or gdal.Dataset
Path to segmented file or an open GDAL Dataset. Will collect stats in imgfile for each segment in this file.
- colNamesAndTypes : list of (colName, colType) tuples
This defines the names, types and order of the output RAT columns. colName should be a string containing the name of the RAT column to be created and colType should be one of gdal.GFT_Integer or gdal.GFT_Real; this controls the type of the column to be created. Note that the order of columns given in this parameter is important as this dictates the order of the intArr and floatArr parameters to userFunc.
- userFunc : a Numba function (i.e. decorated with @jit or @njit)
See above for description
- userParam : anything that can be passed to a Numba function
This includes: arrays, scalars and @jitclass decorated classes.
- missingStatsValue : int
The value to fill in for segments that have no data.
- outFile : str
Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr is set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
- outFileIsZarr : bool
Set to True if the outFile should be written as RatZarr format.
- readCfg : StatsReadConfig
Config for read manager, allowing multi-threaded reading. Default will run with no read workers.
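Because the order of colNamesAndTypes dictates where each output lands in intArr and floatArr, it can help to work out the slots explicitly. The sketch below is an assumption-laden illustration: `assign_array_slots` is a hypothetical helper, and the strings 'integer' and 'real' stand in for gdal.GFT_Integer and gdal.GFT_Real so the example runs without GDAL.

```python
def assign_array_slots(col_names_and_types):
    """Map each column name to (array name, index within that array)."""
    slots = {}
    n_int = 0
    n_float = 0
    for name, col_type in col_names_and_types:
        if col_type == 'integer':       # stand-in for gdal.GFT_Integer
            slots[name] = ('intArr', n_int)
            n_int += 1
        elif col_type == 'real':        # stand-in for gdal.GFT_Real
            slots[name] = ('floatArr', n_float)
            n_float += 1
        else:
            raise ValueError("unsupported column type: %r" % (col_type,))
    return slots, n_int, n_float


cols = [('NumEdge', 'integer'), ('MeanX', 'real'), ('MeanY', 'real')]
slots, num_int_cols, num_float_cols = assign_array_slots(cols)
```

Here a userFunc would write the edge count to intArr[0] and the two coordinates to floatArr[0] and floatArr[1].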
- pyshepseg.tilingstats.calcPerSegmentSpatialStats_riosFunc(info, inputs, outputs, otherArgs)¶
Called by RIOS from inside calcPerSegmentSpatialStatsRIOS. Do accumulation of statistics
- pyshepseg.tilingstats.calcPerSegmentStatsRIOS(imgfile, imgbandnum, segfile, statsSelection, concurrencyStyle=None, missingStatsValue=-9999, outFile=None, outFileIsZarr=False)¶
This function is deprecated. Consider using calcPerSegmentStatsTiled with a readCfg instead.
Calculate selected per-segment statistics for the given band of the imgfile, against the given segment raster file. Calculated statistics are written to the segfile raster attribute table (RAT), so this file format must support RATs.
This function uses RIOS to perform the reading so it works in a memory-efficient way. Also, the number of read workers can be varied by passing an instance of rios.applier.ConcurrencyStyle which may be helpful when reading from data sources with high latency (ie S3). RIOS timeouts can also be changed using this method. Note that only 0 compute workers is supported and computeWorkerKind must be set to CW_NONE.
- Parameters:
- imgfile : string
Path to input file for collecting statistics from
- imgbandnum : int
1-based index of the band number in imgfile to use for collecting stats
- segfile : str
Path to segmented file. Will collect stats in imgfile for each segment in this file.
- statsSelection : list of tuples
One tuple for each statistic to be included. Each tuple is either 2 or 3 elements:
(columnName, statName)
or
(columnName, statName, parameter)
The columnName is a string, used to name the column in the output RAT. The statName is a string used to identify which statistic is to be calculated. Available options are:
'min', 'max', 'mean', 'stddev', 'median', 'mode', 'percentile', 'pixcount'
The ‘percentile’ statistic requires the 3-element form, with the 3rd element being the percentile to be calculated.
For example:
[('Band1_Mean', 'mean'), ('Band1_stdDev', 'stddev'), ('Band1_LQ', 'percentile', 25), ('Band1_UQ', 'percentile', 75)]
would create 4 columns, for the per-segment mean and standard deviation of the given band, and the lower and upper quartiles, with corresponding column names.
Any pixels that are set to the nodata value of imgfile (if set) are ignored in the stats calculations. If there are no pixels that aren’t the nodata value then the value passed in as missingStatsValue is put into the RAT for the requested statistics. The ‘pixcount’ statName can be used to find the number of valid pixels (not nodata) that were used to calculate the statistics.
- concurrencyStyle : rios.applier.ConcurrencyStyle
Concurrency parameters for RIOS
- missingStatsValue : int or float
What to set for segments that have no valid pixels in imgfile
- outFile : str
Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr is set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
- outFileIsZarr : bool
Set to True if the outFile should be written as RatZarr format.
- pyshepseg.tilingstats.calcPerSegmentStatsTiled(imgfile, imgbandnum, segfile, statsSelection, missingStatsValue=-9999, outFile=None, outFileIsZarr=False, readCfg=None)¶
Calculate selected per-segment statistics for the given band of the imgfile, against the given segment raster file. Calculated statistics are written to the segfile raster attribute table (RAT), so this file format must support RATs.
Calculations are carried out in a memory-efficient way, allowing very large rasters to be processed. Raster data is handled in small tiles, attribute table is handled in fixed-size chunks.
- Parameters:
- imgfile : string
Path to input file for collecting statistics from
- imgbandnum : int
1-based index of the band number in imgfile to use for collecting stats
- segfile : str or gdal.Dataset
Path to segmented file or an open GDAL dataset. Will collect stats in imgfile for each segment in this file.
- statsSelection : list of tuples
One tuple for each statistic to be included. Each tuple is either 2 or 3 elements:
(columnName, statName)
or
(columnName, statName, parameter)
The columnName is a string, used to name the column in the output RAT. The statName is a string used to identify which statistic is to be calculated. Available options are:
'min', 'max', 'mean', 'stddev', 'median', 'mode', 'percentile', 'pixcount'
The ‘percentile’ statistic requires the 3-element form, with the 3rd element being the percentile to be calculated.
For example:
[('Band1_Mean', 'mean'), ('Band1_stdDev', 'stddev'), ('Band1_LQ', 'percentile', 25), ('Band1_UQ', 'percentile', 75)]
would create 4 columns, for the per-segment mean and standard deviation of the given band, and the lower and upper quartiles, with corresponding column names.
Any pixels that are set to the nodata value of imgfile (if set) are ignored in the stats calculations. If there are no pixels that aren’t the nodata value then the value passed in as missingStatsValue is put into the RAT for the requested statistics. The ‘pixcount’ statName can be used to find the number of valid pixels (not nodata) that were used to calculate the statistics.
- missingStatsValue : int or float
What to set for segments that have no valid pixels in imgfile
- outFile : str
Name of a separate output file in which to write RAT columns. If this is None, then columns are written back to segfile. If this is to be a GDAL file, it will be updated if it exists, or created using the KEA driver (so should have ‘.kea’ extension). If outFileIsZarr is set to True, the output file will be a RatZarr file, and will either be created or updated as appropriate.
- outFileIsZarr : bool
Set to True if the outFile should be written as RatZarr format.
- readCfg : StatsReadConfig
Config for read manager, allowing multi-threaded reading. Default will run with no read workers.
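The statsSelection rules (2- or 3-element tuples, a fixed set of statistic names, 'percentile' needing a parameter) can be checked before any raster is opened. The following is a hypothetical validator written for illustration; it is not part of pyshepseg. The 'real'/'integer' labels follow the rule stated for createStatColumns() that mean and stddev are real columns and all other statistics are integer.

```python
VALID_STATS = ('min', 'max', 'mean', 'stddev', 'median', 'mode',
               'percentile', 'pixcount')
REAL_STATS = ('mean', 'stddev')


def normalise_stats_selection(stats_selection):
    """Return a list of (columnName, statName, parameter, columnKind)."""
    out = []
    for entry in stats_selection:
        if len(entry) == 2:
            col_name, stat_name = entry
            param = None
        elif len(entry) == 3:
            col_name, stat_name, param = entry
        else:
            raise ValueError("expected 2 or 3 elements: %r" % (entry,))
        if stat_name not in VALID_STATS:
            raise ValueError("unknown statistic: %r" % (stat_name,))
        if stat_name == 'percentile' and param is None:
            raise ValueError("'percentile' requires the 3-element form")
        kind = 'real' if stat_name in REAL_STATS else 'integer'
        out.append((col_name, stat_name, param, kind))
    return out


selection = [('Band1_Mean', 'mean'), ('Band1_stdDev', 'stddev'),
             ('Band1_LQ', 'percentile', 25), ('Band1_UQ', 'percentile', 75)]
normalised = normalise_stats_selection(selection)
```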
- pyshepseg.tilingstats.calcPerSegmentStats_riosFunc(info, inputs, outputs, otherArgs)¶
Called by RIOS from inside calcPerSegmentStatsRIOS. Do accumulation of statistics
- pyshepseg.tilingstats.calcStatsForCompletedSegs(segDict, noDataDict, missingStatsValue, pagedRat, statsSelection_fast, segSize, numIntCols, numFloatCols)¶
Calculate statistics for all complete segments in the segDict. Update the pagedRat with the resulting entries. Completed segments are then removed from segDict.
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are histograms for the segment
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- missingStatsValue : int
Value to insert into the RAT where no valid pixels were found
- pagedRat : numba.typed.Dict
The RAT as a paged data structure
- statsSelection_fast : int ndarray of shape (numStats, 5)
Allows quick access to the types of stats required
- segSize : int ndarray of shape (numSegments+1, )
Array containing the Histogram column of the segment file, i.e. the pixel count of each segment
- numIntCols : int
Number of Integer RAT cols to be created
- numFloatCols : int
Number of Float RAT cols to be created
- pyshepseg.tilingstats.calcStatsForCompletedSegsSpatial(segDict, noDataDict, missingStatsValue, pagedRat, segSize, userFunc, userParam, statsSelection_fast, intArr, floatArr, imgNullVal)¶
Calls the userFunc on data for completed segs and saves the results into the pagedRat.
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are a numba.typed.List containing instances of SegPoint.
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- missingStatsValue : int
Value to fill the intArr and floatArr parameters with so this value gets written to the RAT if no valid pixels are found
- pagedRat : numba.typed.Dict
The RAT as a paged data structure
- segSize : int ndarray of shape (numSegments+1, )
Array containing the Histogram column of the segment file, i.e. the pixel count of each segment
- userFunc : Numba function
This is the user defined function to call for each completed segment
- userParam : Numba compatible argument
This is passed to the userFunc
- statsSelection_fast : int ndarray of shape (numStats, 5)
Allows quick access to the types of stats required
- intArr : int ndarray of shape (numIntCols,)
Space used for storing the integer outputs for this segment
- floatArr : float ndarray of shape (numFloatCols,)
Space used for storing the float outputs for this segment
- imgNullVal : int
No data value for image
- pyshepseg.tilingstats.checkSegComplete(segDict, noDataDict, segSize, segId)¶
Return True if the given segment has a complete entry in the segDict, meaning that the pixel count is equal to the segment size
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are histograms for the segment
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- segSize : int ndarray of shape (numSegments+1, )
Array containing the Histogram column of the segment file, i.e. the pixel count of each segment
- segId : shepseg.SegIdType
Segment to check for completeness
- Returns:
- complete : bool
Whether the segId is complete or not
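The completeness test can be sketched as a simple count comparison. This sketch assumes that nodata pixels, which are tracked separately, also count towards the segment's total; `seg_complete` is a hypothetical name and the plain dictionaries stand in for the numba typed containers.

```python
def seg_complete(seg_hist, nodata_count, seg_size, seg_id):
    """True once every pixel of the segment has been seen in some tile."""
    valid = sum(seg_hist.get(seg_id, {}).values())
    nodata = nodata_count.get(seg_id, 0)
    return valid + nodata == seg_size[seg_id]


seg_size = [0, 3, 5]                    # pixel count per segment ID
hists = {1: {5: 2, 6: 1}, 2: {7: 1}}    # segment 2 only partly seen so far
nodata = {2: 1}
done_1 = seg_complete(hists, nodata, seg_size, 1)
done_2 = seg_complete(hists, nodata, seg_size, 2)
```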
- pyshepseg.tilingstats.checkSegCompleteSpatial(segDict, noDataDict, segSize, segId)¶
Return True if the given segment has a complete entry in the segDict, meaning that the pixel count is equal to the segment size
Note: this is distinct from checkSegComplete() as that function is working with a dictionary of histograms
- Parameters:
- segDict : numba.typed.Dict
Dictionary of segments keyed on segment id. Values are a list of SegPoint objects.
- noDataDict : numba.typed.Dict
Dictionary of nodata values for each segment.
- segSize : int ndarray of shape (numSegments+1, )
Array containing the Histogram column of the segment file, i.e. the pixel count of each segment
- segId : shepseg.SegIdType
Segment to check for completeness
- Returns:
- complete : bool
Whether the segId is complete or not
- pyshepseg.tilingstats.convertPtsInto2DArray(pts, imgNullVal)¶
Given a list of points for a segment turn this back into a 2D array where the value for each pixel is the value of the pixel in the original (data) image.
The tile is created just large enough for the shape of the segment. Areas of the tile not within the segment are given the value of imgNullVal.
- Parameters:
- pts : numba.typed.List containing SegPoint objects
This is the list passed to the userFunc.
- imgNullVal : int
No data value for image
- Returns:
- tile : numbaTypeForImageType ndarray of shape (ysize, xsize)
Where ysize and xsize are the total extent of the tile
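The conversion from a list of points back to a small 2D tile can be illustrated as follows. This is a pure-Python sketch using a namedtuple `Point` as a stand-in for SegPoint and nested lists instead of a numpy array; both are assumptions for the example.

```python
from collections import namedtuple

# Stand-in for SegPoint: pixel location in the whole image, plus its value
Point = namedtuple('Point', ['x', 'y', 'val'])


def pts_to_grid(pts, img_null):
    """Build the smallest grid covering the points, filled with img_null."""
    xmin = min(p.x for p in pts)
    ymin = min(p.y for p in pts)
    xsize = max(p.x for p in pts) - xmin + 1
    ysize = max(p.y for p in pts) - ymin + 1
    grid = [[img_null] * xsize for _ in range(ysize)]
    for p in pts:
        grid[p.y - ymin][p.x - xmin] = p.val
    return grid


pts = [Point(10, 20, 5), Point(11, 20, 6), Point(11, 21, 7)]
grid = pts_to_grid(pts, 0)
```

The L-shaped segment above yields a 2 x 2 grid in which the one cell outside the segment holds the null value.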
- pyshepseg.tilingstats.convertPtsInto2DMaskArray(pts, imgNullVal)¶
Similar to convertPtsInto2DArray() but burns in the value 1 where each pixel is within a segment.
Given a list of points for a segment turn this back into a 2D array for passing to the user function.
The tile is created just large enough for the shape of the segment. Areas of the tile not within the segment are given the value of imgNullVal.
- Parameters:
- pts : numba.typed.List containing SegPoint objects
This is the list passed to the userFunc.
- imgNullVal : int
No data value for image
- Returns:
- tile : numpy.uint8 ndarray of shape (ysize, xsize)
Where ysize and xsize are the total extent of the tile
- pyshepseg.tilingstats.copyRatCols(srcRat, destRat)¶
Copy all columns from srcRat to destRat.
This is intended only for use copying from a temporary RAT file, which should only contain the columns to be copied. Use outside this could be dangerous.
Copies each column in fixed-size blocks, so quite memory-efficient.
- Parameters:
- srcRat : str
Name of temp KEA file with RAT to copy
- destRat : str or gdal.Dataset
Name (or Dataset) of destination GDAL file to which RAT is copied
- pyshepseg.tilingstats.createNoDataDict()¶
Create the dictionary that holds counts of nodata seen for each segment. The key is the segId, value is the count of nodata seen for that segment in the image data.
- Returns:
- nodataDict : numba.typed.Dict
Dictionary of nodata counts for each segment
- pyshepseg.tilingstats.createPagedRat()¶
Create the dictionary for the paged RAT. Each element is a page of the RAT, with entries for a range of segment IDs. The key is the segment ID of the first entry in the page.
The returned dictionary is initially empty.
- pyshepseg.tilingstats.createSegDict()¶
Create the Dict of Dicts for handling per-segment histograms. Each entry is a dictionary, and the key is a segment ID. Each dictionary within this is the per-segment histogram for a single segment. Each of its entries is for a single value from the imagery, the key is the pixel value, and the dictionary value is the number of times that pixel value appears in the segment.
- Returns:
- segDict : numba.typed.Dict
Dictionary of histograms used for calculating statistics.
- pyshepseg.tilingstats.createSegSpatialDataDict()¶
Create a dictionary where the key is the segment ID and the value is a List of SegPoint objects.
- Returns:
- segDict : numba.typed.Dict
Dictionary containing lists of SegPoint objects.
- pyshepseg.tilingstats.createStatColumns(statsSelection, openRat)¶
Create requested statistic columns on the segmentation image RAT. Statistic columns are of type gdal.GFT_Real for mean and stddev, and gdal.GFT_Integer for all other statistics.
Return the column indexes for all requested columns, in the same order.
- Parameters:
- statsSelection : list of tuples
Same as passed to calcPerSegmentStatsTiled()
- openRat : OpenRatContainer
The file handle(s) for the RAT file
- Returns:
- colIndexList : list of ints
A list of the indexes of each of the requested new columns in the same order as statsSelection.
- pyshepseg.tilingstats.createUserColumnsSpatial(colNamesAndTypes, openRat)¶
Used by calcPerSegmentSpatialStatsTiled() to create columns specified in the colNamesAndTypes structure.
Returns a tuple with number of integer columns, number of float columns and a statsSelection_fast array for use by calcStatsForCompletedSegsSpatial().
- Parameters:
- colNamesAndTypes : list of (colName, colType) tuples
Same as passed to calcPerSegmentSpatialStatsTiled().
- openRat : OpenRatContainer
The file handle(s) for the RAT file
- Returns:
- numIntCols : int
The number of new Integer columns
- numFloatCols : int
The number of new Float columns
- statsSelection_fast : int ndarray of shape (numStats, 5)
Allows quick access to the types of stats required
- pyshepseg.tilingstats.doImageChecks(segfile, imgfile, imgbandnum)¶
Check that the segment file and the image file being used to collect the stats actually align. We refuse to process the files if they don’t, as it is not clear how they should be made to line up; this is up to the user to get right. Also checks that imgfile is not a float image.
Check that there is a null value set on imgfile, and that the segfile has a histogram.
The two rasters are opened read-only, and closed again afterwards.
- Parameters:
- segfile : str or gdal.Dataset
Path to segmentation file or an open GDAL dataset.
- imgfile : string
Path to input file for collecting statistics from
- imgbandnum : int
1-based index of the band number in imgfile to use for collecting stats
- Returns:
- imgNullVal : numba.int64
The null value set in the imgdata raster
- segSize : ndarray
The Histogram column of the segfile, i.e. the pixel counts of each segment
- pyshepseg.tilingstats.equalProjection(proj1, proj2)¶
Returns True if the proj1 is the same as proj2
Stolen from rios/pixelgrid.py
- Parameters:
- proj1 : string
WKT string for the first projection
- proj2 : string
WKT string for the second projection
- Returns:
- equal : bool
Whether the projections are equal or not
- pyshepseg.tilingstats.getRatPageId(segId)¶
For the given segment ID, return the page ID. This is the segment ID of the first segment in the page.
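One way to picture the page ID is as the segment ID rounded down to a multiple of the page size. The page size below (100000) is a made-up value for the example, and both helper names are hypothetical; the source does not state the actual page size or arithmetic used.

```python
PAGE_SIZE = 100000      # hypothetical page size, for illustration only


def rat_page_id(seg_id, page_size=PAGE_SIZE):
    """Segment ID of the first entry in the page containing seg_id."""
    return (seg_id // page_size) * page_size


def index_in_page(seg_id, page_size=PAGE_SIZE):
    """Offset of seg_id within its page."""
    return seg_id - rat_page_id(seg_id, page_size)


page = rat_page_id(250000)
offset = index_in_page(250000)
```

Keying the paged RAT on this value means a completed page can be written out and dropped without touching any other page.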
- pyshepseg.tilingstats.getSortedKeysAndValuesForDict(d)¶
The given dictionary is keyed by pixel values from the imagery, and the values are counts of occurrences of the corresponding pixel value. This function returns a pair of numpy arrays (as a tuple), one for the list of pixel values, and one for the corresponding counts. The arrays are sorted in increasing order of pixel value.
- Parameters:
- d : dictionary of int
Counts of each pixel value.
- Returns:
- keysSorted : numbaTypeForImageType ndarray of shape (numValues,)
Pixel values sorted
- valuesSorted : int ndarray of shape (numValues,)
Counts of each pixel sorted by pixel value
- pyshepseg.tilingstats.makeFastStatsSelection(colIndexList, statsSelection)¶
Make a fast version of the statsSelection data structure, combined with the global column index numbers.
Return a tuple of
(statsSelection_fast, numIntCols, numFloatCols)
The statsSelection_fast is a single array, of shape (numStats, 5). The first index corresponds to the sequence in statsSelection. The second index corresponds to the STATSEL_* values.
Everything is encoded as an integer value in a single numpy array, suitable for fast access within numba njit-ed functions.
This is all a bit ugly and un-pythonic. Not sure if there is a better way.
- Parameters:
- colIndexList : list of ints
The column indexes for all requested columns
- statsSelection : list of tuples
See tilingstats.calcPerSegmentStatsTiled() for a complete description of this parameter.
- Returns:
- statsSelection_fast : int ndarray of shape (numStats, 5)
The statsSelection_fast structure
- intCount : int
Number of int columns
- floatCount : int
Number of float columns
- pyshepseg.tilingstats.makeOutRatKea(outFile)¶
Create a small KEA file to write a RAT into. Return a single object with all the open GDAL handles on it.
- Parameters:
- outFile : str
Name of output KEA file
- Returns:
- openRat : OpenRatContainer
Holds all the open GDAL handles
- pyshepseg.tilingstats.openEverything(readCfg, outFile, outFileIsZarr, tileSize, numXtiles, numYtiles, imgfile, imgbandnum, segfile, timings)¶
Open all the input and output files, ready for the stats routines to do their work.
- pyshepseg.tilingstats.userFuncMeanCoord(pts, imgNullVal, intArr, floatArr, transform)¶
Calculates the mean coordinate of each segment. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if the mean coordinates of each segment are required.
The transform of the image (i.e. returned by ds.GetGeoTransform() where ds is a GDAL dataset) is to be passed as userParam but converted to an array so it works with Numba.
The result will be written to floatArr (2 values - easting then northing). It is expected that at least 2 float columns are available.
- Parameters:
- pts : numba.typed.List containing SegPoint objects
This is the list passed to the userFunc.
- imgNullVal : int
The nodata value for the imagery
- intArr : int ndarray of shape (numIntCols, )
The integer columns - not used in this function
- floatArr : float ndarray of shape (numFloatCols, )
The float columns - output written here
- transform : float ndarray of shape (6,)
The GDAL transform array. Passed in as a userParam to tilingstats.calcPerSegmentSpatialStatsTiled().
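The coordinate calculation can be sketched with the standard GDAL geotransform equations. The helper `mean_coord` is hypothetical, and the sketch applies the transform to the mean pixel column and row with no half-pixel offset, which is an assumption; the library function may treat pixel centres differently.

```python
def mean_coord(pixel_xy, transform):
    """Mean (easting, northing) of pixel locations, via a GDAL geotransform.

    transform is the 6-element sequence returned by GetGeoTransform().
    """
    n = len(pixel_xy)
    mean_col = sum(x for x, _ in pixel_xy) / n
    mean_row = sum(y for _, y in pixel_xy) / n
    easting = transform[0] + mean_col * transform[1] + mean_row * transform[2]
    northing = transform[3] + mean_col * transform[4] + mean_row * transform[5]
    return easting, northing


# North-up image with 30 m pixels (made-up values)
gt = (500000.0, 30.0, 0.0, 7000000.0, 0.0, -30.0)
easting, northing = mean_coord([(10, 20), (12, 20), (11, 23)], gt)
```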
- pyshepseg.tilingstats.userFuncNumEdgePixels(pts, imgNullVal, intArr, floatArr, fourConnected)¶
Calculates the number of ‘edge’ pixels for each segment. Edge pixels are pixels that touch either another segment or the edge of the image. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if the number of edge pixels is required.
The result will be written to intArr (1 value - number of edge pixels).
- Parameters:
- pts : numba.typed.List containing SegPoint objects
This is the list passed to the userFunc.
- imgNullVal : int
The nodata value for the imagery
- intArr : int ndarray of shape (numIntCols, )
The integer columns - output written here
- floatArr : float ndarray of shape (numFloatCols, )
The float columns - not used in this function
- fourConnected : bool
If True, use four-way connectedness to judge neighbours, otherwise use eight-way.
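The difference between four-way and eight-way connectedness is easiest to see on a small shape. The following sketch counts a pixel as an edge pixel when any neighbour is missing from the segment's own pixel set, which also covers neighbours that fall outside the image. `count_edge_pixels` is a hypothetical helper working on plain (x, y) tuples, not the library function.

```python
def count_edge_pixels(coords, four_connected=True):
    """Count pixels with at least one neighbour outside the segment."""
    inside = set(coords)
    if four_connected:
        offsets = [(0, -1), (-1, 0), (1, 0), (0, 1)]
    else:
        offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dx, dy) != (0, 0)]
    count = 0
    for x, y in inside:
        if any((x + dx, y + dy) not in inside for dx, dy in offsets):
            count += 1
    return count


# 3 x 3 block with one corner pixel removed
shape = [(x, y) for y in range(3) for x in range(3) if (x, y) != (2, 2)]
edges4 = count_edge_pixels(shape, four_connected=True)
edges8 = count_edge_pixels(shape, four_connected=False)
```

With four-way connectedness the centre pixel is interior, so 7 of the 8 pixels are edge pixels; with eight-way connectedness the missing diagonal neighbour makes the centre an edge pixel as well.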
- pyshepseg.tilingstats.userFuncVariogram(pts, imgNullVal, intArr, floatArr, maxDist)¶
Calculates the variogram at the given distance for the segment contained in the tile. This function is intended to be passed in as the userFunc parameter to calcPerSegmentSpatialStatsTiled() if variograms are to be calculated. maxDist is the number of variograms to calculate and should be passed in as the userParam argument to calcPerSegmentSpatialStatsTiled().
It is assumed that floatArr has enough space for the number of variograms requested (this is calculated from the colNamesAndTypes parameter to calcPerSegmentSpatialStatsTiled()).
- Parameters:
- pts : numba.typed.List containing SegPoint objects
This is the list passed to the userFunc.
- imgNullVal : int
The nodata value for the imagery
- intArr : int ndarray of shape (numIntCols, )
The integer columns - not used in this function
- floatArr : float ndarray of shape (numFloatCols, )
The float columns - output written here
- maxDist : int
Number of variograms to calculate
- pyshepseg.tilingstats.writeCompletePages(pagedRat, openRat, statsSelection_fast)¶
Check for completed pages, and write them to the attribute table. Remove them from the pagedRat after writing.
- Parameters:
- pagedRat : numba.typed.Dict
The RAT as a paged data structure
- openRat : OpenRatContainer
The file handle(s) for the RAT file
- statsSelection_fast : int ndarray of shape (numStats, 5)
Allows quick access to the types of stats required