public class ClusterByTimestamp extends CopyFilesNaming<ClusterMembership>
Associates a timestamp with each file, and clusters the files by these timestamps.

The timestamp is chosen, in this order of priority:

  • A date-time string extracted from the filename, if the filename matches one of the timestamp patterns.
  • Original photo-taken time from EXIF metadata if available, and the file has a jpg or jpeg extension.
  • File creation time.

Unless otherwise indicated, timestamps are assumed to be in the current system time-zone.

File modification time is not considered.
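
The priority order above can be sketched as a simple fall-through. This is only an illustration, not the class's actual implementation: `TimestampPriority`, `select`, and the two extractor functions passed in are hypothetical names, and the case-insensitive extension check is an assumption. Only the creation-time lookup uses a known standard call (`BasicFileAttributes.creationTime()`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper illustrating the priority order; not part of the library. */
final class TimestampPriority {

    /** Picks the first available timestamp: filename, then EXIF (JPEG only), then creation time. */
    static Instant select(
            Path file,
            Function<Path, Optional<Instant>> fromFilename,
            Function<Path, Optional<Instant>> fromExif) {
        Optional<Instant> named = fromFilename.apply(file);
        if (named.isPresent()) {
            return named.get();
        }
        if (hasJpegExtension(file)) {
            Optional<Instant> exif = fromExif.apply(file);
            if (exif.isPresent()) {
                return exif.get();
            }
        }
        return creationTime(file);
    }

    /** Assumption: the extension check ignores case. */
    static boolean hasJpegExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".jpg") || name.endsWith(".jpeg");
    }

    /** Reads the creation time from the file system; modification time is never consulted. */
    static Instant creationTime(Path file) {
        try {
            return Files.readAttributes(file, BasicFileAttributes.class).creationTime().toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the extractors as functions keeps the sketch independent of any particular filename parser or EXIF library.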

The clusters are named 01, 02, 03 etc. depending on the number of clusters.
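
One plausible reading of this naming scheme is zero-padding, with the width chosen from the total number of clusters. The sketch below assumes exactly that, with a minimum width of two digits; `ClusterNames` and `name` are hypothetical and the real padding rule may differ.

```java
import java.util.Locale;

/** Hypothetical naming helper; the padding rule is an assumption, not the library's documented behavior. */
final class ClusterNames {

    /** Zero-pads a 1-based cluster index to at least two digits, wider if there are many clusters. */
    static String name(int index, int numberClusters) {
        int width = Math.max(2, String.valueOf(numberClusters).length());
        return String.format(Locale.ROOT, "%0" + width + "d", index);
    }
}
```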

The DBSCAN algorithm is used for clustering.

A special cluster OUTLIER_CLUSTER_IDENTIFIER may also be created for points that are not density-reachable from any other point, and therefore belong to no cluster.
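
To make the clustering step concrete, here is a minimal textbook DBSCAN over one-dimensional timestamps expressed in hours. It is a sketch under assumptions, not the code this class uses: `TimestampDbscan` is a hypothetical name, `eps` is assumed to play the role of the hours threshold, `minPoints` the role of the minimum files per cluster, and a point's neighborhood is assumed to include the point itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified DBSCAN on 1-D values; O(n^2), for illustration only. */
final class TimestampDbscan {

    /** Label given to points that end up in no cluster. */
    static final int OUTLIER = -1;

    private static final int UNVISITED = -2;

    /** Returns a cluster label (0, 1, ...) or OUTLIER for each timestamp. */
    static int[] cluster(double[] hours, double eps, int minPoints) {
        int[] labels = new int[hours.length];
        Arrays.fill(labels, UNVISITED);
        int nextCluster = 0;
        for (int i = 0; i < hours.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = neighbors(hours, i, eps);
            if (neighbors.size() < minPoints) {
                labels[i] = OUTLIER; // provisional: may later be claimed as a border point
                continue;
            }
            labels[i] = nextCluster;
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == OUTLIER) {
                    labels[j] = nextCluster; // border point reached from a core point
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = nextCluster;
                List<Integer> reachable = neighbors(hours, j, eps);
                if (reachable.size() >= minPoints) {
                    queue.addAll(reachable); // j is itself a core point, so keep expanding
                }
            }
            nextCluster++;
        }
        return labels;
    }

    /** Indices of all points within eps of the point at index, including itself. */
    private static List<Integer> neighbors(double[] hours, int index, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < hours.length; k++) {
            if (Math.abs(hours[k] - hours[index]) <= eps) {
                out.add(k);
            }
        }
        return out;
    }
}
```

With timestamps at 0, 0.5, 1.0, 10, 10.2, 10.4 and 50 hours, a threshold of 1 hour and a minimum of 2 points, the first three values form one cluster, the next three a second cluster, and the last value is an outlier.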

The relative path of each file is preserved, being added relative to the cluster subdirectory.

The default-patterns for matching filenames are:

  • yyyy-mm-dd hh:mm:ss
  • yyyymmdd_hhmmss
  • yyyymmdd hhmmss
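
As a rough illustration of how such patterns could be matched, the sketch below searches a filename with a regular expression and parses the match with `java.time`. It does not use the library's `TimestampPattern` class; `FilenameTimestamps` and `extract` are hypothetical. Note that in `DateTimeFormatter` syntax the month is written `MM` and the hour-of-day `HH`, so the patterns are spelled differently from the list above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extractor for the three default patterns; not the library's own parser. */
final class FilenameTimestamps {

    /** Regex used to locate a candidate substring, mapped to the formatter used to parse it. */
    private static final Map<Pattern, DateTimeFormatter> PATTERNS = new LinkedHashMap<>();

    static {
        add("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "uuuu-MM-dd HH:mm:ss");
        add("\\d{8}_\\d{6}", "uuuuMMdd_HHmmss");
        add("\\d{8} \\d{6}", "uuuuMMdd HHmmss");
    }

    private static void add(String regex, String format) {
        PATTERNS.put(Pattern.compile(regex), DateTimeFormatter.ofPattern(format));
    }

    /** Returns the first date-time found in the filename, trying the patterns in order. */
    static Optional<LocalDateTime> extract(String filename) {
        for (Map.Entry<Pattern, DateTimeFormatter> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getKey().matcher(filename);
            if (matcher.find()) {
                try {
                    return Optional.of(LocalDateTime.parse(matcher.group(), entry.getValue()));
                } catch (DateTimeParseException e) {
                    // The digits matched but do not form a valid date-time; try the next pattern.
                }
            }
        }
        return Optional.empty();
    }
}
```

For example, `FilenameTimestamps.extract("IMG_20210315_142530.jpg")` yields 15 March 2021 at 14:25:30, while a name with no matching substring yields an empty result.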
Author:
Owen Feehan
  • Constructor Details

    • ClusterByTimestamp

      public ClusterByTimestamp()
  • Method Details

    • beforeCopying

      public ClusterMembership beforeCopying(Path destinationDirectory, List<FileWithDirectoryInput> inputs) throws OperationFailedException
      Description copied from class: CopyFilesNaming
      Specified by:
      beforeCopying in class CopyFilesNaming<ClusterMembership>
      Parameters:
      destinationDirectory - the directory to which files are copied.
      inputs - the files to copy.
      Throws:
      OperationFailedException
    • destinationPathRelative

      public Optional<Path> destinationPathRelative(File file, DirectoryWithPrefix outputTarget, int index, CopyContext<ClusterMembership> context) throws OutputWriteFailedException
      Description copied from class: CopyFilesNaming
      Calculates the relative-output path (to be appended to destDir)
      Specified by:
      destinationPathRelative in class CopyFilesNaming<ClusterMembership>
      Parameters:
      file - file to be copied
      outputTarget - the directory and prefix associated with the file for outputting
      index - an increasing sequence of numbers for each file beginning at 0
      context - the context for the copying
      Returns:
      the relative path. If empty, the file should be skipped.
      Throws:
      OutputWriteFailedException
    • getThresholdHours

      public double getThresholdHours()
      Files whose timestamps differ by no more than this parameter are joined into the same cluster.

      This is the principal parameter affecting the sensitivity of the clustering. It is specified as the number of hours between the date-times of two files.

      A larger value encourages a smaller total number of clusters (or larger cluster-size). A smaller value encourages the opposite.

    • setThresholdHours

      public void setThresholdHours(double thresholdHours)
      Files whose timestamps differ by no more than this parameter are joined into the same cluster.

      This is the principal parameter affecting the sensitivity of the clustering. It is specified as the number of hours between the date-times of two files.

      A larger value encourages a smaller total number of clusters (or larger cluster-size). A smaller value encourages the opposite.
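
The comparison this threshold implies can be sketched as follows. `ThresholdCheck` and `withinThreshold` are hypothetical names, and the sketch assumes the difference is measured as an absolute duration converted to fractional hours.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical check of whether two timestamps fall within the hours threshold. */
final class ThresholdCheck {

    private static final double MILLIS_PER_HOUR = 3_600_000.0;

    /** True if the two instants differ by no more than thresholdHours. */
    static boolean withinThreshold(Instant first, Instant second, double thresholdHours) {
        double hours = Math.abs(Duration.between(first, second).toMillis()) / MILLIS_PER_HOUR;
        return hours <= thresholdHours;
    }
}
```

Under this reading, two files taken 30 minutes apart are neighbors when the threshold is 0.5, and stop being neighbors as soon as the gap exceeds half an hour.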

    • getMinimumPerCluster

      public int getMinimumPerCluster()
      The minimum number of files that must exist for a cluster.
    • setMinimumPerCluster

      public void setMinimumPerCluster(int minimumPerCluster)
      The minimum number of files that must exist for a cluster.
    • isPreserveSubdirectories

      public boolean isPreserveSubdirectories()
      If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
    • setPreserveSubdirectories

      public void setPreserveSubdirectories(boolean preserveSubdirectories)
      If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
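
The effect of this flag on the output path can be sketched with `java.nio.file.Path`. `ClusterPaths` and `destination` are hypothetical names used only for illustration; the cluster name is assumed to be the first path component.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper showing how the flag changes the path inside a cluster directory. */
final class ClusterPaths {

    /** Builds the path relative to the destination directory for one file. */
    static Path destination(String clusterName, Path relativePath, boolean preserveSubdirectories) {
        // Keep the whole relative path, or only its final component (the file-name).
        Path suffix = preserveSubdirectories ? relativePath : relativePath.getFileName();
        return Paths.get(clusterName).resolve(suffix);
    }
}
```

For a file at `trip/day1/a.jpg` assigned to cluster `01`, this gives `01/trip/day1/a.jpg` when the flag is true and `01/a.jpg` when it is false.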
    • getTimestampPatterns

      public List<TimestampPattern> getTimestampPatterns()
      The patterns which can be used to extract a date-time from a filename.
    • setTimestampPatterns

      public void setTimestampPatterns(List<TimestampPattern> timestampPatterns)
      The patterns which can be used to extract a date-time from a filename.
    • getTimeZoneOffset

      public int getTimeZoneOffset()
      If >= 0, sets a specific time-offset in hours. If == -1, then the offset is taken from the current system time-zone settings.
    • setTimeZoneOffset

      public void setTimeZoneOffset(int timeZoneOffset)
      If >= 0, sets a specific time-offset in hours. If == -1, then the offset is taken from the current system time-zone settings.
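
A minimal sketch of how such an offset might be applied when converting a local date-time to an instant. `OffsetResolution` and `toInstant` are hypothetical names, and rejecting values below -1 is an assumption, since the description above only covers -1 and non-negative values.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical conversion honoring the -1 sentinel for the system time-zone. */
final class OffsetResolution {

    /** Interprets a local date-time using a fixed hour offset, or the system zone if the offset is -1. */
    static Instant toInstant(LocalDateTime local, int timeZoneOffset) {
        if (timeZoneOffset == -1) {
            return local.atZone(ZoneId.systemDefault()).toInstant();
        }
        if (timeZoneOffset < -1) {
            // Assumption: other negative values are not meaningful here.
            throw new IllegalArgumentException("Unsupported offset: " + timeZoneOffset);
        }
        return local.toInstant(ZoneOffset.ofHours(timeZoneOffset));
    }
}
```

With an offset of 2, a local time of 12:00 corresponds to 10:00 UTC.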