ENH: Update the output from histogram ranges to be a 2 component array
Signed-off-by: Michael Jackson <[email protected]>
imikejackson committed Sep 30, 2024
1 parent 9bb1c89 commit 9afb9fc
Showing 9 changed files with 107 additions and 49 deletions.
@@ -179,7 +179,7 @@ nx::core::Result<> LoadEbsdData(const nx::core::ReadH5EbsdInputValues* mInputVal
err = ebsdReader->loadData(dcDims[0], dcDims[1], dcDims[2], mRefFrameZDir);
if(err < 0)
{
return {nx::core::MakeErrorResult(-50003, fmt::format("Error loading data from H5Ebsd file '{}'", mInputValues->inputFilePath))};
return {nx::core::MakeErrorResult(-50003, fmt::format("Error loading data from H5Ebsd file '{}'. Error from EbsdLib is {}", mInputValues->inputFilePath, err))};
}

nx::core::DataPath geometryPath = mInputValues->dataContainerPath;
5 changes: 5 additions & 0 deletions src/Plugins/SimplnxCore/docs/ComputeArrayHistogramFilter.md
@@ -8,6 +8,11 @@ Statistics(Ensemble)

This **Filter** accepts **DataArray(s)** as input, creates histogram **DataArray(s)** in specified **DataGroup** from input **DataArray(s)**, then calculates histogram values according to user parameters and stores values in created histogram **DataArray(s)**.

The output is in the form of 2 Data Arrays. The first data array will have the counts. The number of tuples of the array is
the same as the number of bins in the histogram. The second data array will have the bin ranges. The array has 2 components
where the first component of each tuple is the minimum of the bin (inclusive) and the second component of the tuple
is the maximum for that bin (exclusive).
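
The layout described above can be sketched in standalone C++. This is only an illustration, not the simplnx implementation: the `Histogram` struct and `computeHistogram` function are hypothetical names, the bins are assumed to be equal width, and values at or above the overall maximum are simply skipped here, which may differ from what the filter does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container used only for this sketch; not a simplnx type.
struct Histogram
{
  std::vector<std::uint64_t> counts; // numBins tuples, 1 component each
  std::vector<double> ranges;        // numBins tuples, 2 components each: [min, max)
};

// Builds equal-width bins over [minVal, maxVal) and counts the values in each bin.
Histogram computeHistogram(const std::vector<double>& data, double minVal, double maxVal, std::size_t numBins)
{
  assert(numBins > 0 && maxVal > minVal);

  Histogram h;
  h.counts.assign(numBins, 0);
  h.ranges.resize(numBins * 2);

  const double width = (maxVal - minVal) / static_cast<double>(numBins);
  for(std::size_t i = 0; i < numBins; i++)
  {
    h.ranges[i * 2] = minVal + width * static_cast<double>(i);         // bin minimum (inclusive)
    h.ranges[i * 2 + 1] = minVal + width * static_cast<double>(i + 1); // bin maximum (exclusive)
  }

  for(const double value : data)
  {
    if(value < minVal || value >= maxVal)
    {
      continue; // assumption of this sketch: out-of-range values are ignored
    }
    auto bin = static_cast<std::size_t>((value - minVal) / width);
    if(bin >= numBins)
    {
      bin = numBins - 1; // guard against floating-point rounding at the upper edge
    }
    h.counts[bin]++;
  }
  return h;
}
```

With this layout, the maximum of bin `i` and the minimum of bin `i + 1` hold the same value, so each tuple of the ranges array can be read on its own without looking at its neighbor.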

## Example Data

Using some data about the "Old Faithful" geyser in the United States from the [R site](http://www.r-tutor.com/elementary-statistics/quantitative-data/frequency-distribution-quantitative-data), here are the first few lines of data:
22 changes: 21 additions & 1 deletion src/Plugins/SimplnxCore/docs/ComputeArrayStatisticsFilter.md
@@ -30,7 +30,27 @@ The user must select a destination **Attribute Matrix** in which the computed st

Special operations occur for certain statistics if the supplied array is of type *bool* (for example, a mask array produced [when thresholding](@ref multithresholdobjects)). The length, minimum, maximum, median, mode, and summation are computed as normal (although the resulting values may be platform dependent). The mean and standard deviation for a boolean array will be true if there are more instances of true in the array than false. If *Standardize Data* is chosen for a boolean array, no actual modifications will be made to the input. These operations for boolean inputs are chosen as a basic convention, and are not intended to be representative of true boolean logic.

**Note**: If *Find Histogram* is on AND *Compute Statistics Per Feature/Ensemble* is on, then any feature that has the exact same value throughout the entire feature will have its first histogram bin set to the total count of feature values. All other bins will be 0.

## Histogram Notes:

When creating a histogram the output arrays can take 2 different layouts.

### Histogram and "Compute Statistics by Feature/Ensemble" is NOT selected

The output is in the form of 2 Data Arrays. The first data array will have the counts. The number of tuples of the array is
the same as the number of bins in the histogram. The second data array will have the bin ranges. The array has 2 components
where the first component of each tuple is the minimum of the bin (inclusive) and the second component of the tuple
is the maximum for that bin (exclusive).

### Histogram and "Compute Statistics by Feature/Ensemble" IS selected

The output is in the form of 2 arrays, but for each output array the number of tuples of the array
is the same as the number of features/ensembles for which you are calculating the statistics. The number of components
for the "Counts" array is now the number of bins. The second array is the same tuple shape as the
counts array but now the number of components is the number of bins * 2 and the data
is encoded as [Bin Min, Bin Max], [Bin Min, Bin Max].
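
As a rough illustration of the per-feature encoding described above, the helper below looks up one bin's range in a flat buffer. It is a sketch only: `binRange` is a hypothetical name, not a simplnx function, and it assumes the ranges are stored tuple by tuple (one tuple per feature, `numBins * 2` components per tuple).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {Bin Min, Bin Max} for one feature and one bin,
// assuming each feature's tuple holds numBins consecutive [Bin Min, Bin Max] pairs.
std::pair<double, double> binRange(const std::vector<double>& ranges, std::size_t numBins, std::size_t feature, std::size_t bin)
{
  assert(bin < numBins);
  const std::size_t componentsPerTuple = numBins * 2;
  const std::size_t offset = feature * componentsPerTuple + bin * 2;
  // at() throws std::out_of_range if the feature index is past the end of the buffer.
  return {ranges.at(offset), ranges.at(offset + 1)};
}
```

For example, with 2 bins per feature, the range of bin 1 for feature 1 starts at flat index `1 * 4 + 1 * 2 = 6`.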

**Note**:

% Auto generated parameter table will be inserted here

@@ -236,7 +236,7 @@ class ComputeArrayStatisticsByIndexImpl

if(m_Histogram && binCountsStorePtr != nullptr && binRangesStorePtr != nullptr)
{
std::vector<T> ranges(m_NumBins + 1);
std::vector<T> ranges(m_NumBins * 2);
std::vector<uint64> histogram(m_NumBins, 0);
if(length[localFeatureIndex] > 0)
{
@@ -630,7 +630,7 @@ void FindStatisticsImpl(const ContainerType& data, std::vector<IArray*>& arrays,
std::atomic_bool neverCancel{false};
std::atomic<usize> overflow{0};
std::vector<uint64> binCounts(inputValues->NumBins, 0);
std::vector<T> binRanges(inputValues->NumBins + 1);
std::vector<T> binRanges(inputValues->NumBins * 2);

Result<> result = {};
if constexpr(std::is_same_v<DataArray<T>, ContainerType>)
@@ -642,8 +642,12 @@ void FindStatisticsImpl(const ContainerType& data, std::vector<IArray*>& arrays,
result = HistogramUtilities::serial::GenerateHistogram(data, binRanges, range, neverCancel, inputValues->NumBins, binCounts, overflow);
}

binCountsStore.setTuple(0, binCounts);
binRangesStore.setTuple(0, binRanges);
for(size_t i = 0; i < inputValues->NumBins; i++)
{
binCountsStore.setComponent(i, 0, binCounts[i]);
binRangesStore.setComponent(i, 0, binRanges[i * 2]);
binRangesStore.setComponent(i, 1, binRanges[i * 2 + 1]);
}

auto maxElementIt = std::max_element(binCounts.begin(), binCounts.end());
uint64 index = std::distance(binCounts.begin(), maxElementIt);
@@ -699,10 +703,12 @@ void FindStatistics(const DataArray<T>& source, const Int32Array* featureIds, co
auto* modeArrayPtr = dynamic_cast<NeighborList<T>*>(arrays[5]);
auto* stdDevArrayPtr = dynamic_cast<Float32Array*>(arrays[6]);
auto* summationArrayPtr = dynamic_cast<Float32Array*>(arrays[7]);

auto* histBinCountsArrayPtr = dynamic_cast<UInt64Array*>(arrays[8]);
auto* histBinRangesArrayPtr = dynamic_cast<DataArray<T>*>(arrays[12]);
auto* mostPopulatedBinPtr = dynamic_cast<UInt64Array*>(arrays[10]);
auto* modalBinsArrayPtr = dynamic_cast<NeighborList<T>*>(arrays[11]);
auto* histBinRangesArrayPtr = dynamic_cast<DataArray<T>*>(arrays[12]);

auto* featureHasDataPtr = dynamic_cast<BoolArray*>(arrays[13]);

IParallelAlgorithm::AlgorithmArrays indexAlgArrays;
@@ -1094,8 +1100,8 @@ Result<> ComputeArrayStatistics::operator()()
const auto& featureIds = m_DataStructure.getDataRefAs<Int32Array>(m_InputValues->FeatureIdsArrayPath);
numFeatures = findNumFeatures(featureIds);

auto* destAttrMatPtr = m_DataStructure.getDataAs<AttributeMatrix>(m_InputValues->DestinationAttributeMatrix);
destAttrMatPtr->resizeTuples({numFeatures});
// auto* destAttrMatPtr = m_DataStructure.getDataAs<AttributeMatrix>(m_InputValues->DestinationAttributeMatrix);
// destAttrMatPtr->resizeTuples({numFeatures});

for(const auto& array : arrays)
{
@@ -40,13 +40,13 @@ Uuid ComputeArrayHistogramFilter::uuid() const
//------------------------------------------------------------------------------
std::string ComputeArrayHistogramFilter::humanName() const
{
return "Calculate Frequency Histogram";
return "Compute Attribute Array Frequency Histogram";
}

//------------------------------------------------------------------------------
std::vector<std::string> ComputeArrayHistogramFilter::defaultTags() const
{
return {className(), "Statistics", "Ensemble"};
return {className(), "Statistics", "Ensemble", "Histogram"};
}

//------------------------------------------------------------------------------
@@ -56,7 +56,7 @@ Parameters ComputeArrayHistogramFilter::parameters() const

// Create the parameter descriptors that are needed for this filter
params.insertSeparator(Parameters::Separator{"Input Parameter(s)"});
params.insert(std::make_unique<Int32Parameter>(k_NumberOfBins_Key, "Number of Bins", "Specifies number of histogram bins (greater than zero)", 1));
params.insert(std::make_unique<Int32Parameter>(k_NumberOfBins_Key, "Number of Bins", "Specifies number of histogram bins (greater than zero)", 10));
params.insertLinkableParameter(
std::make_unique<BoolParameter>(k_UserDefinedRange_Key, "Use Custom Min & Max Range", "Whether the user can set the min and max values to consider for the histogram", false));
params.insert(std::make_unique<Float64Parameter>(k_MinRange_Key, "Min Value", "Specifies the inclusive lower bound of the histogram.", 0.0));
@@ -67,7 +67,7 @@ Parameters ComputeArrayHistogramFilter::parameters() const
MultiArraySelectionParameter::ValueType{}, MultiArraySelectionParameter::AllowedTypes{IArray::ArrayType::DataArray},
nx::core::GetAllNumericTypes()));

params.insertSeparator(Parameters::Separator{"Output parameters"});
params.insertSeparator(Parameters::Separator{"Output Parameters"});
params.insertLinkableParameter(
std::make_unique<BoolParameter>(k_CreateNewDataGroup_Key, "Create New DataGroup for Histograms", "Whether or not to store the calculated histogram(s) in a new DataGroup", true));
params.insert(std::make_unique<DataGroupCreationParameter>(k_NewDataGroupPath_Key, "New DataGroup Path", "The path to the new DataGroup in which to store the calculated histogram(s)", DataPath{}));
@@ -111,7 +111,6 @@ IFilter::PreflightResult ComputeArrayHistogramFilter::preflightImpl(const DataSt
auto pBinRangeSuffix = filterArgs.value<std::string>(k_HistoBinRangeName_Key);

nx::core::Result<OutputActions> resultOutputActions;
;

if(pNewDataGroupValue)
{
@@ -137,7 +136,7 @@ IFilter::PreflightResult ComputeArrayHistogramFilter::preflightImpl(const DataSt
}

{
auto createArrayAction = std::make_unique<CreateArrayAction>(dataArray->getDataType(), std::vector<usize>{static_cast<usize>(pNumberOfBinsValue + 1)}, std::vector<usize>{1},
auto createArrayAction = std::make_unique<CreateArrayAction>(dataArray->getDataType(), std::vector<usize>{static_cast<usize>(pNumberOfBinsValue)}, std::vector<usize>{2},
parentPath.createChildPath((dataArray->getName() + pBinRangeSuffix)));
resultOutputActions.value().appendAction(std::move(createArrayAction));
}
@@ -7,6 +7,7 @@
#include "simplnx/DataStructure/DataPath.hpp"
#include "simplnx/Filter/Actions/CreateArrayAction.hpp"
#include "simplnx/Filter/Actions/CreateAttributeMatrixAction.hpp"
#include "simplnx/Filter/Actions/CreateDataGroupAction.hpp"
#include "simplnx/Filter/Actions/CreateNeighborListAction.hpp"
#include "simplnx/Parameters/ArraySelectionParameter.hpp"
#include "simplnx/Parameters/AttributeMatrixSelectionParameter.hpp"
@@ -57,7 +58,8 @@ OutputActions CreateCompatibleArrays(const DataStructure& dataStructure, const A

OutputActions actions;

auto amAction = std::make_unique<CreateAttributeMatrixAction>(destinationAttributeMatrixValue, tupleDims);
auto amAction = std::make_unique<CreateDataGroupAction>(destinationAttributeMatrixValue);

actions.appendAction(std::move(amAction));

if(computeByIndexValue)
@@ -115,7 +117,31 @@ OutputActions CreateCompatibleArrays(const DataStructure& dataStructure, const A
auto action = std::make_unique<CreateArrayAction>(DataType::float32, tupleDims, std::vector<usize>{1}, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
if(findHistogramValue)
if(findHistogramValue && !computeByIndexValue)
{
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_HistoBinCountName_Key);
auto action = std::make_unique<CreateArrayAction>(DataType::uint64, std::vector<usize>{numBins}, std::vector<usize>{1ULL}, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_HistoBinRangeName_Key);
auto action = std::make_unique<CreateArrayAction>(dataType, std::vector<usize>{numBins}, std::vector<usize>{2ULL}, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_MostPopulatedBinArrayName_Key);
auto action = std::make_unique<CreateArrayAction>(DataType::uint64, tupleDims, std::vector<usize>{2}, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
if(findModalBinRanges)
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_ModalBinArrayName_Key);
auto action = std::make_unique<CreateNeighborListAction>(dataType, tupleSize, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
}
if(findHistogramValue && computeByIndexValue)
{
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_HistoBinCountName_Key);
@@ -124,7 +150,7 @@ OutputActions CreateCompatibleArrays(const DataStructure& dataStructure, const A
}
{
auto arrayPath = args.value<std::string>(ComputeArrayStatisticsFilter::k_HistoBinRangeName_Key);
auto action = std::make_unique<CreateArrayAction>(dataType, tupleDims, std::vector<usize>{numBins + 1}, destinationAttributeMatrixValue.createChildPath(arrayPath));
auto action = std::make_unique<CreateArrayAction>(dataType, tupleDims, std::vector<usize>{numBins * 2}, destinationAttributeMatrixValue.createChildPath(arrayPath));
actions.appendAction(std::move(action));
}
{
@@ -186,7 +212,7 @@ std::string ComputeArrayStatisticsFilter::humanName() const
//------------------------------------------------------------------------------
std::vector<std::string> ComputeArrayStatisticsFilter::defaultTags() const
{
return {className(), "SimplnxCore", "Statistics"};
return {className(), "SimplnxCore", "Statistics", "Histogram", "Mean", "Average", "Min", "Max", "Standard Deviation", "Length"};
}

//------------------------------------------------------------------------------
@@ -198,16 +224,19 @@ Parameters ComputeArrayStatisticsFilter::parameters() const
params.insertSeparator(Parameters::Separator{"Input Data"});
params.insert(std::make_unique<ArraySelectionParameter>(k_SelectedArrayPath_Key, "Attribute Array to Compute Statistics", "Input Attribute Array for which to compute statistics", DataPath{},
nx::core::GetAllDataTypes(), ArraySelectionParameter::AllowedComponentShapes{{1}}));
params.insertSeparator(Parameters::Separator{"Output Data"});
params.insert(
std::make_unique<DataGroupCreationParameter>(k_DestinationAttributeMatrixPath_Key, "Destination Attribute Matrix", "Attribute Matrix in which to store the computed statistics", DataPath{}));

params.insertSeparator(Parameters::Separator{"Histogram Options"});
params.insertLinkableParameter(std::make_unique<BoolParameter>(k_FindHistogram_Key, "Find Histogram", "Whether to compute the histogram of the input array", false));
params.insert(std::make_unique<Float64Parameter>(k_MinRange_Key, "Histogram Min Value", "Min cutoff value for histogram", 0.0));
params.insert(std::make_unique<Float64Parameter>(k_MaxRange_Key, "Histogram Max Value", "Max cutoff value for histogram", 0.0));
params.insert(std::make_unique<Int32Parameter>(k_NumBins_Key, "Number of Bins", "Number of bins in histogram", 10));
params.insert(
std::make_unique<BoolParameter>(k_UseFullRange_Key, "Use Full Range for Histogram", "If true, ignore min and max and use min and max from array upon which histogram is computed", false));
params.insert(std::make_unique<Int32Parameter>(k_NumBins_Key, "Number of Bins", "Number of bins in histogram", 1));

params.insert(std::make_unique<Float64Parameter>(k_MinRange_Key, "Custom Histogram Min Value", "Min cutoff value for histogram", 0.0));
params.insert(std::make_unique<Float64Parameter>(k_MaxRange_Key, "Custom Histogram Max Value", "Max cutoff value for histogram", 1.0));

params.insert(std::make_unique<DataObjectNameParameter>(k_HistoBinCountName_Key, "Histogram Bin Counts Array Name", "The name of the histogram bin counts array", "Histogram Bin Counts"));
params.insert(std::make_unique<DataObjectNameParameter>(k_HistoBinRangeName_Key, "Histogram Bin Ranges Array Name", "The name of the histogram bin ranges array", "Histogram Bin Ranges"));
params.insert(std::make_unique<DataObjectNameParameter>(k_MostPopulatedBinArrayName_Key, "Most Populated Bin Array Name", "The name of the Most Populated Bin array", "Most Populated Bin"));
8 changes: 4 additions & 4 deletions src/Plugins/SimplnxCore/test/ComputeArrayHistogramTest.cpp
@@ -25,7 +25,7 @@ void compareHistograms(const AbstractDataStore<T>& calculated, const std::array<
{
if(calculated.getSize() != actual.size())
{
throw std::runtime_error("Improper sizing of DataStore");
throw std::runtime_error(fmt::format("Improper sizing of DataStore. {} vs {}", calculated.getSize(), actual.size()));
}
for(int32 i = 0; i < N; i++)
{
@@ -81,23 +81,23 @@ TEST_CASE("SimplnxCore::ComputeArrayHistogram: Valid Filter Execution", "[Simpln
SIMPLNX_RESULT_REQUIRE_VALID(executeResult.result);

{
std::array<float64, 5> binRangesSet = {-56.8, 184.475, 425.75, 667.025, 908.3};
std::array<float64, 8> binRangesSet = {-56.8, 184.475, 184.475, 425.75, 425.75, 667.025, 667.025, 908.3};
std::array<uint64, 4> binCountsSet = {11, 0, 0, 1};
const std::string name = k_Array0Name;

compareHistograms(dataStruct.getDataAs<Float64Array>(dataGPath.createChildPath((name + std::string{k_BinRangesSuffix})))->getDataStoreRef(), binRangesSet);
compareHistograms(dataStruct.getDataAs<UInt64Array>(dataGPath.createChildPath((name + std::string{k_BinCountsSuffix})))->getDataStoreRef(), binCountsSet);
}
{
std::array<int32, 5> binRangesSet = {-90, -44, 2, 48, 94};
std::array<int32, 8> binRangesSet = {-90, -44, -44, 2, 2, 48, 48, 94};
std::array<uint64, 4> binCountsSet = {1, 2, 3, 6};
const std::string name = k_Array1Name;

compareHistograms(dataStruct.getDataAs<Int32Array>(dataGPath.createChildPath((name + std::string{k_BinRangesSuffix})))->getDataStoreRef(), binRangesSet);
compareHistograms(dataStruct.getDataAs<UInt64Array>(dataGPath.createChildPath((name + std::string{k_BinCountsSuffix})))->getDataStoreRef(), binCountsSet);
}
{
std::array<uint32, 5> binRangesSet = {34, 2270, 4506, 6742, 8978};
std::array<uint32, 8> binRangesSet = {34, 2270, 2270, 4506, 4506, 6742, 6742, 8978};
std::array<uint64, 4> binCountsSet = {11, 0, 0, 1};
const std::string name = k_Array2Name;
