Class Histogram

Object
Histogram

public final class Histogram extends Object
A histogram of integer values.

The bin-size is always 1, so each bin corresponds to a discrete integer.

See histogram on Wikipedia.

This can be used to record a discrete probability distribution, and is typically used in the Anchor software to record the distribution of image voxel intensity values.

Note that this is a dense implementation: memory is allocated to store a count for every value from minValue to maxValue (inclusive). This can be a lot of memory for e.g. unsigned-short value types. However, it allows maximally efficient incrementing when iterating through the voxels in an image, without intermediate structures.
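
As a rough illustration of what dense storage implies, the following is a minimal sketch and not the actual Anchor implementation. The class name DenseCountsSketch and its internals are hypothetical; only the method names mirror those documented on this page.

```java
/** Hypothetical stand-in illustrating dense storage; not the Anchor Histogram class. */
class DenseCountsSketch {
    private final int minValue;
    private final int[] counts; // one slot per integer in [minValue, maxValue]

    DenseCountsSketch(int minValue, int maxValue) {
        this.minValue = minValue;
        this.counts = new int[maxValue - minValue + 1];
    }

    /** Incrementing is a single array write, with no lookup structure in between. */
    void incrementValue(int value) {
        counts[value - minValue]++;
    }

    int getCount(int value) {
        return counts[value - minValue];
    }

    /** Number of bins, equal to (maxValue - minValue + 1). */
    int size() {
        return counts.length;
    }

    public static void main(String[] args) {
        // An unsigned-short range allocates 65536 bins even if only a few are used.
        DenseCountsSketch histogram = new DenseCountsSketch(0, 65535);
        histogram.incrementValue(1000);
        histogram.incrementValue(1000);
        System.out.println(histogram.size() + " bins, count(1000) = " + histogram.getCount(1000));
    }
}
```

The trade-off is visible in the constructor: memory grows with the width of the value range, not with the number of distinct values actually recorded.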

Author:
Owen Feehan
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static interface
    Histogram.BinConsumer
    Consumes a bin and corresponding count.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Histogram(int maxValue)
    Constructs with a maximum value, and assuming a minimum value of 0.
    Histogram(int minValue, int maxValue)
    Constructs with a minimum and maximum value.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addHistogram(Histogram other)
    Adds the counts from another histogram to the current object.
    int
    calculateMaximum()
    Calculates the maximum value with non-zero count among the histogram values.
    int
    calculateMinimum()
    Calculates the minimum value with non-zero count among the histogram values.
    int
    calculateMode()
    Calculates the mode of the histogram values.
    long
    calculateSum()
    Calculates the sum of all values in the distribution considering their counts.
    long
    calculateSumCubes()
    Calculates the sum of the cubes of all values in the distribution considering their counts.
    long
    calculateSumSquares()
    Calculates the sum of the squares of all values in the distribution considering their counts.
    long
    countMatching(IntPredicate predicate)
    Gets the total count of all values that match a predicate.
    Histogram
    cropRemoveLargerValues(long maxCount)
    Like cropRemoveSmallerValues(long) but larger values are removed rather than smaller values if the total count is too high.
    Histogram
    cropRemoveSmallerValues(long maxCount)
    Creates a Histogram reusing the bins in the current histogram, but with an upper limit on the total count.
    Histogram
    duplicate()
    Creates a deep-copy of the current object.
    int
    getCount(int value)
    The count corresponding to a particular value.
    int
    getMaxValue()
    Maximum possible value in the histogram (inclusive).
    long
    getTotalCount()
    The total count across values in the histogram.
    boolean
    hasNonZeroCount(int threshold)
    Whether at least one value greater than or equal to threshold has a non-zero count.
    void
    incrementValue(int value)
    Increments the count for a particular value by one.
    void
    incrementValueBy(int value, int increase)
    Increments the count for a particular value.
    void
    incrementValueBy(int value, long increase)
    Like incrementValueBy(int, int) but accepts a long as the increase argument.
    boolean
    isEmpty()
    If no value exists in the histogram with a count greater than zero.
    void
    iterateValues(Histogram.BinConsumer consumer)
    Calls consumer for every value, increasing from min to max.
    void
    iterateValuesUntil(int limit, Histogram.BinConsumer consumer)
    Calls consumer for every value until a limit, increasing from min to limit.
    double
    mean()
    Calculates the mean of the histogram values, considering their frequency.
    double
    mean(double power)
    Calculates the mean of the values in the distribution, if each value is raised to a power.
    double
    mean(double power, double subtractValue)
    Like mean(double) but a value may be subtracted before raising to a power.
    int
    quantile(double quantile)
    Calculates the corresponding value for a particular quantile in the distribution of values in the histogram.
    void
    removeBelowThreshold(int threshold)
    All values less than threshold are removed.
    void
    reset()
    Sets the count for all values to 0.
    int
    size()
    The size of the range of values in the histogram.
    double
    standardDeviation()
    Calculates the standard-deviation of the distribution represented by the histogram.
    Histogram
    threshold(DoublePredicate predicate)
    Generates a new histogram containing only values that match a predicate.
    String
    toString()
    A string representation of what's in the histogram.
    void
    transferCount(int valueFrom, int valueTo)
    Moves all count for a particular value and adds it to the count for another.
    double
    variance()
    Calculates the variance of the distribution represented by the histogram.
    void
    zeroValue(int value)
    Sets the count for a particular value to 0.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • Histogram

      public Histogram(int maxValue)
      Constructs with a maximum value, and assuming a minimum value of 0.
      Parameters:
      maxValue - maximum possible value in the histogram (inclusive).
    • Histogram

      public Histogram(int minValue, int maxValue)
      Constructs with a minimum and maximum value.
      Parameters:
      minValue - minimum possible value in the histogram (inclusive).
      maxValue - maximum possible value in the histogram (inclusive).
  • Method Details

    • duplicate

      public Histogram duplicate()
      Creates a deep-copy of the current object.
      Returns:
      a deep-copy.
    • reset

      public void reset()
      Sets the count for all values to 0.
    • zeroValue

      public void zeroValue(int value)
      Sets the count for a particular value to 0.
      Parameters:
      value - the value whose count is zeroed.
    • transferCount

      public void transferCount(int valueFrom, int valueTo)
      Moves all count for a particular value and adds it to the count for another.
      Parameters:
      valueFrom - the value whose count is moved, after which its count is set to zero.
      valueTo - the value to which the count for valueFrom is added.
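
      The effect of the transfer can be sketched on a plain array of counts. This is an illustration only: TransferCountSketch is a hypothetical helper, it assumes a minimum value of 0 so that a value indexes the array directly, and leaving the count unchanged when valueFrom equals valueTo is an assumption, not documented behavior.

```java
/** Hypothetical helper mirroring the documented effect of transferCount on a raw count array. */
class TransferCountSketch {
    static void transferCount(int[] counts, int valueFrom, int valueTo) {
        // Read the count first so that valueFrom == valueTo leaves the total unchanged (an assumption).
        int moved = counts[valueFrom];
        counts[valueFrom] = 0;
        counts[valueTo] += moved;
    }

    public static void main(String[] args) {
        int[] counts = {3, 0, 5};
        transferCount(counts, 0, 2);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```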
    • incrementValue

      public void incrementValue(int value)
      Increments the count for a particular value by one.
      Parameters:
      value - the value whose count will be incremented by one.
    • incrementValueBy

      public void incrementValueBy(int value, int increase)
      Increments the count for a particular value.
      Parameters:
      value - the value whose count will be incremented.
      increase - how much to increase the count by.
    • incrementValueBy

      public void incrementValueBy(int value, long increase)
      Like incrementValueBy(int, int) but accepts a long as the increase argument.
      Parameters:
      value - the value whose count will be incremented.
      increase - how much to increase the count by.
      Throws:
      ArithmeticException - if increase cannot be converted to an int safely.
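
      One standard way to obtain this behavior in Java is Math.toIntExact, which throws ArithmeticException when a long does not fit in an int. Whether the library uses this call internally is an assumption; the sketch below, with the hypothetical helper LongIncreaseSketch, only shows the conversion check.

```java
/** Hypothetical helper showing a checked long-to-int conversion before incrementing. */
class LongIncreaseSketch {
    static void incrementValueBy(int[] counts, int value, long increase) {
        // Math.toIntExact throws ArithmeticException if increase overflows an int,
        // in which case the count is left untouched.
        counts[value] += Math.toIntExact(increase);
    }

    public static void main(String[] args) {
        int[] counts = new int[4];
        incrementValueBy(counts, 2, 5L);
        try {
            incrementValueBy(counts, 2, Integer.MAX_VALUE + 1L);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```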
    • removeBelowThreshold

      public void removeBelowThreshold(int threshold)
      All values less than threshold are removed.
      Parameters:
      threshold - values greater than or equal to this are kept in the histogram, lesser values are removed.
    • isEmpty

      public boolean isEmpty()
      If no value exists in the histogram with a count greater than zero.
      Returns:
      true iff the histogram has zero-count for all values.
    • getCount

      public int getCount(int value)
      The count corresponding to a particular value.
      Parameters:
      value - the value (the bin) to find a count for.
      Returns:
      the corresponding count.
    • size

      public int size()
      The size of the range of values in the histogram.

      This is equivalent to (maxValue - minValue + 1).

      Returns:
      the number of values represented in the histogram.
    • addHistogram

      public void addHistogram(Histogram other) throws OperationFailedException
      Adds the counts from another histogram to the current object.

      Both histograms must have identical minimum and maximum values, and therefore represent the same range of values.

      Parameters:
      other - the histogram to add.
      Throws:
      OperationFailedException - if the histograms do not have identical minimum and maximum values.
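
      The range check and the bin-wise addition can be sketched as follows. AddCountsSketch is a hypothetical helper working on raw arrays, and it throws IllegalArgumentException as a stand-in for the library's OperationFailedException.

```java
/** Hypothetical helper: adds counts bin by bin after checking that both ranges are identical. */
class AddCountsSketch {
    static void addCounts(int minA, int[] countsA, int minB, int[] countsB) {
        // Same minimum and same number of bins implies the same maximum.
        if (minA != minB || countsA.length != countsB.length) {
            throw new IllegalArgumentException("histograms must cover the same range of values");
        }
        for (int i = 0; i < countsA.length; i++) {
            countsA[i] += countsB[i];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {10, 20, 30};
        addCounts(0, a, 0, b);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```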
    • mean

      public double mean() throws OperationFailedException
      Calculates the mean of the histogram values, considering their frequency.

      Specifically, this is the sum of value * countFor(value) across all values, divided by the total count.

      Returns:
      the mean.
      Throws:
      OperationFailedException - if the histogram has no values.
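
      A frequency-weighted mean of this kind can be sketched on a raw count array. WeightedMeanSketch is a hypothetical helper that assumes a minimum value of 0 and uses IllegalStateException in place of the library's OperationFailedException.

```java
/** Hypothetical helper: frequency-weighted mean over a count array indexed by value. */
class WeightedMeanSketch {
    static double mean(int[] counts) {
        long sum = 0;
        long total = 0;
        for (int value = 0; value < counts.length; value++) {
            sum += (long) value * counts[value];
            total += counts[value];
        }
        if (total == 0) {
            throw new IllegalStateException("histogram has no values");
        }
        return (double) sum / total;
    }

    public static void main(String[] args) {
        // Two occurrences of 1 and two occurrences of 3.
        System.out.println(mean(new int[] {0, 2, 0, 2}));
    }
}
```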
    • quantile

      public int quantile(double quantile) throws OperationFailedException
      Calculates the corresponding value for a particular quantile in the distribution of values in the histogram.

      See Quantile on Wikipedia.

      A quantile of 0.3 returns the minimal value that is greater than or equal to at least 30% of the count.

      Parameters:
      quantile - the quantile, in the interval [0, 1].
      Returns:
      the value corresponding to the quantile.
      Throws:
      OperationFailedException - if the histogram has no values, or the quantile is outside acceptable bounds.
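
      One plausible reading of this definition is a cumulative scan: walk the bins in ascending order and return the first occupied value at which the running count reaches quantile * totalCount. The sketch below follows that reading. QuantileSketch is hypothetical, assumes a minimum value of 0, and may differ from the library in edge cases such as ties or rounding.

```java
/** Hypothetical helper: quantile via a cumulative scan over a count array indexed by value. */
class QuantileSketch {
    static int quantile(int[] counts, double quantile) {
        // Written this way so that NaN is rejected too.
        if (!(quantile >= 0.0 && quantile <= 1.0)) {
            throw new IllegalArgumentException("quantile must lie in [0, 1]");
        }
        long total = 0;
        for (int count : counts) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalStateException("histogram has no values");
        }
        double target = quantile * total;
        long cumulative = 0;
        for (int value = 0; value < counts.length; value++) {
            cumulative += counts[value];
            // Only occupied bins are candidates, so a quantile of 0 yields the smallest recorded value.
            if (counts[value] > 0 && cumulative >= target) {
                return value;
            }
        }
        throw new IllegalStateException("unreachable for a non-empty histogram");
    }

    public static void main(String[] args) {
        System.out.println(quantile(new int[] {1, 2, 3, 4}, 0.5));
    }
}
```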
    • hasNonZeroCount

      public boolean hasNonZeroCount(int threshold)
      Whether at least one value greater than or equal to threshold has a non-zero count.
      Parameters:
      threshold - only values greater than or equal to threshold are considered. Use 0 for all values.
      Returns:
      true iff at least one value in this range has a non-zero count, false if all values in the range are zero.
    • calculateMode

      public int calculateMode() throws OperationFailedException
      Calculates the mode of the histogram values.

      The mode is the most frequently occurring item.

      Returns:
      the mode.
      Throws:
      OperationFailedException - if the histogram has no values.
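
      Finding the mode amounts to locating the bin with the largest count. In the sketch below, ModeSketch is a hypothetical helper that assumes a minimum value of 0. Returning the smallest value on a tie is an assumption, as the tie-breaking rule is not stated here.

```java
/** Hypothetical helper: mode as the value with the largest count, lowest value winning ties. */
class ModeSketch {
    static int mode(int[] counts) {
        int best = -1;
        int bestCount = 0;
        for (int value = 0; value < counts.length; value++) {
            // Strict comparison keeps the first (smallest) value when counts are tied.
            if (counts[value] > bestCount) {
                best = value;
                bestCount = counts[value];
            }
        }
        if (best == -1) {
            throw new IllegalStateException("histogram has no values");
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mode(new int[] {1, 4, 4, 2}));
    }
}
```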
    • calculateMaximum

      public int calculateMaximum() throws OperationFailedException
      Calculates the maximum value with non-zero count among the histogram values.
      Returns:
      the maximal value with non-zero count.
      Throws:
      OperationFailedException - if the histogram has no values.
    • calculateMinimum

      public int calculateMinimum() throws OperationFailedException
      Calculates the minimum value with non-zero count among the histogram values.
      Returns:
      the minimal value with non-zero count.
      Throws:
      OperationFailedException - if the histogram has no values.
    • calculateSum

      public long calculateSum()
      Calculates the sum of all values in the distribution considering their counts.

      Specifically, the sum is value * countFor(value) across all values.

      Returns:
      the sum.
    • calculateSumSquares

      public long calculateSumSquares()
      Calculates the sum of the squares of all values in the distribution considering their counts.

      Specifically, the sum is value^2 * countFor(value) across all values.

      Returns:
      the sum of squares.
    • calculateSumCubes

      public long calculateSumCubes()
      Calculates the sum of the cubes of all values in the distribution considering their counts.

      Specifically, the sum is value^3 * countFor(value) across all values.

      Returns:
      the sum of cubes.
    • standardDeviation

      public double standardDeviation() throws OperationFailedException
      Calculates the standard-deviation of the distribution represented by the histogram.
      Returns:
      the standard-deviation.
      Throws:
      OperationFailedException - if the histogram has no values.
    • variance

      public double variance() throws OperationFailedException
      Calculates the variance of the distribution represented by the histogram.
      Returns:
      the variance.
      Throws:
      OperationFailedException - if the histogram has no values.
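
      The variance and standard-deviation can be related to the sum, sum of squares and total count through the identity variance = E[x^2] - (E[x])^2. The sketch below assumes the population variance (dividing by the total count, not by the total count minus one), which this page does not confirm. VarianceSketch is a hypothetical helper assuming a minimum value of 0.

```java
/** Hypothetical helper: population variance and standard deviation from running sums. */
class VarianceSketch {
    static double variance(int[] counts) {
        long total = 0;
        long sum = 0;
        long sumSquares = 0;
        for (int value = 0; value < counts.length; value++) {
            long count = counts[value];
            total += count;
            sum += value * count;
            sumSquares += (long) value * value * count;
        }
        if (total == 0) {
            throw new IllegalStateException("histogram has no values");
        }
        double mean = (double) sum / total;
        // Clamp at zero to guard against tiny negative results from floating-point rounding.
        return Math.max(0.0, (double) sumSquares / total - mean * mean);
    }

    static double standardDeviation(int[] counts) {
        return Math.sqrt(variance(counts));
    }

    public static void main(String[] args) {
        int[] counts = {0, 2, 0, 2};
        System.out.println(variance(counts) + " " + standardDeviation(counts));
    }
}
```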
    • countMatching

      public long countMatching(IntPredicate predicate)
      Gets the total count of all values that match a predicate.
      Parameters:
      predicate - the predicate a value must match to be included in the count.
      Returns:
      the sum of the counts corresponding to all values that match the predicate.
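
      The predicate is tested against the value (the bin), not against its count. A minimal sketch using java.util.function.IntPredicate on a raw count array, where CountMatchingSketch is a hypothetical helper assuming a minimum value of 0:

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: sums the counts of all values accepted by a predicate. */
class CountMatchingSketch {
    static long countMatching(int[] counts, IntPredicate predicate) {
        long total = 0;
        for (int value = 0; value < counts.length; value++) {
            if (predicate.test(value)) {
                total += counts[value];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] counts = {1, 2, 3, 4};
        // Total count of all values that are at least 2.
        System.out.println(countMatching(counts, value -> value >= 2));
    }
}
```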
    • threshold

      public Histogram threshold(DoublePredicate predicate)
      Generates a new histogram containing only values that match a predicate.

      This is an immutable operation. The existing histogram's values are unchanged.

      Parameters:
      predicate - a condition that must hold on the value for it to be included in the created histogram.
      Returns:
      a newly created Histogram containing values and corresponding counts from this object, but only if they fulfill the predicate.
    • toString

      public String toString()
      A string representation of what's in the histogram.
      Overrides:
      toString in class Object
    • getTotalCount

      public long getTotalCount()
      The total count across values in the histogram.

      This is pre-calculated, so calling this operation incurs no computational expense.

      Returns:
      the total count.
    • cropRemoveSmallerValues

      public Histogram cropRemoveSmallerValues(long maxCount)
      Creates a Histogram reusing the bins in the current histogram, but with an upper limit on the total count.

      If more total count exists than maxCount, values are removed in ascending order, until the count is under the limit.

      Parameters:
      maxCount - the maximum allowable total-count for the extracted histogram.
      Returns:
      a newly created Histogram, either a copy of the existing histogram (if the total count is less than maxCount) or cropped as per the above rules.
    • cropRemoveLargerValues

      public Histogram cropRemoveLargerValues(long maxCount)
      Like cropRemoveSmallerValues(long) but larger values are removed rather than smaller values if the total count is too high.
      Parameters:
      maxCount - the maximum allowable total-count for the extracted histogram.
      Returns:
      a newly created Histogram, either a copy of the existing histogram (if the total count is less than maxCount) or cropped as per the above rules.
    • mean

      public double mean(double power) throws OperationFailedException
      Calculates the mean of the values in the distribution, if each value is raised to a power.

      Specifically, it calculates the sum of countFor(value) * value^power across all values, divided by the total count.

      Parameters:
      power - the power to raise each value to.
      Returns:
      the calculated mean.
      Throws:
      OperationFailedException - if the histogram has no values.
    • mean

      public double mean(double power, double subtractValue) throws OperationFailedException
      Like mean(double) but a value may be subtracted before raising to a power.

      Specifically, it calculates the sum of countFor(value) * (value - subtractValue)^power across all values, divided by the total count.

      Parameters:
      power - the power to raise each value to (after subtraction).
      subtractValue - a value subtracted before raising to a power.
      Returns:
      the calculated mean.
      Throws:
      OperationFailedException - if the histogram has no values.
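
      With subtractValue set to the mean and power set to 2, a calculation of this form gives the second central moment. The sketch below illustrates the formula on a raw count array. PowerMeanSketch is a hypothetical helper assuming a minimum value of 0, and skipping empty bins is a choice made here to avoid NaN terms, not documented behavior.

```java
/** Hypothetical helper: count-weighted mean of (value - subtractValue)^power. */
class PowerMeanSketch {
    static double mean(int[] counts, double power, double subtractValue) {
        double sum = 0.0;
        long total = 0;
        for (int value = 0; value < counts.length; value++) {
            if (counts[value] == 0) {
                continue; // empty bins contribute nothing, and skipping them avoids 0 * NaN
            }
            sum += counts[value] * Math.pow(value - subtractValue, power);
            total += counts[value];
        }
        if (total == 0) {
            throw new IllegalStateException("histogram has no values");
        }
        return sum / total;
    }

    public static void main(String[] args) {
        // Values 1, 1, 3, 3 have mean 2, so this prints their second central moment.
        System.out.println(mean(new int[] {0, 2, 0, 2}, 2.0, 2.0));
    }
}
```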
    • iterateValues

      public void iterateValues(Histogram.BinConsumer consumer)
      Calls consumer for every value, increasing from min to max.
      Parameters:
      consumer - called for every bin.
    • iterateValuesUntil

      public void iterateValuesUntil(int limit, Histogram.BinConsumer consumer)
      Calls consumer for every value until a limit, increasing from min to limit.
      Parameters:
      limit - the maximum-value to consume (inclusive).
      consumer - called for every bin.
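
      The exact signature of Histogram.BinConsumer is not shown on this page. The sketch below assumes a two-argument callback receiving the bin and its count, which matches the description "consumes a bin and corresponding count" but is otherwise hypothetical, as is the helper class IterateSketch.

```java
/** Hypothetical sketch of ascending iteration over bins, up to an inclusive limit. */
class IterateSketch {
    /** Assumed shape of a bin consumer; the real interface may differ. */
    @FunctionalInterface
    interface BinConsumer {
        void accept(int bin, int count);
    }

    static void iterateValuesUntil(int minValue, int[] counts, int limit, BinConsumer consumer) {
        // Never run past the last bin, even if the limit exceeds the maximum value.
        int last = Math.min(limit, minValue + counts.length - 1);
        for (int value = minValue; value <= last; value++) {
            consumer.accept(value, counts[value - minValue]);
        }
    }

    public static void main(String[] args) {
        int[] counts = {1, 2, 3, 4};
        iterateValuesUntil(0, counts, 2, (bin, count) -> System.out.println(bin + " -> " + count));
    }
}
```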
    • getMaxValue

      public int getMaxValue()
      Maximum possible value in the histogram (inclusive).