Class ClusterByTimestamp
The timestamp of each file is chosen in the following order of priority:
- A date/time string extracted from the filename, if the filename contains one in a recognised pattern.
- The original photo-taken time from EXIF metadata, if available and if the file has a jpg or jpeg extension.
- File creation time.
Timestamps are assumed to be in the current system time-zone, unless otherwise indicated.
File modification time is not considered.
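The priority order above can be sketched as follows. This is a minimal illustration, not the class's actual code: the helper `chooseTimestamp` and its parameters are hypothetical, the filename and EXIF times are assumed to have been extracted already (as zone-less `LocalDateTime` values), and the creation time is assumed to be available as an `Instant` (for example from `BasicFileAttributes.creationTime()`).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Optional;

public class TimestampPriority {

    /** Hypothetical helper: picks a file's timestamp using the documented priority order. */
    static Instant chooseTimestamp(
            Optional<LocalDateTime> fromFilename,
            Optional<LocalDateTime> fromExif,
            String fileName,
            Instant creationTime,
            ZoneId zone) {
        String lower = fileName.toLowerCase();
        boolean isJpeg = lower.endsWith(".jpg") || lower.endsWith(".jpeg");
        // EXIF is only consulted for files with a jpg or jpeg extension.
        Optional<LocalDateTime> exif = isJpeg ? fromExif : Optional.empty();
        return fromFilename
                .or(() -> exif)
                // Zone-less date-times are interpreted in the supplied time-zone.
                .map(local -> local.atZone(zone).toInstant())
                // Last resort: the file creation time. Modification time is never used.
                .orElse(creationTime);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        Instant created = Instant.parse("2021-01-01T00:00:00Z");
        LocalDateTime exifTime = LocalDateTime.of(2020, 6, 1, 12, 0, 0);
        // No filename match and a jpg extension: the EXIF time is chosen.
        System.out.println(chooseTimestamp(Optional.empty(), Optional.of(exifTime), "IMG.JPG", created, utc));
        // Same EXIF time but a png file: falls back to the creation time.
        System.out.println(chooseTimestamp(Optional.empty(), Optional.of(exifTime), "IMG.png", created, utc));
    }
}
```

`Optional.or` (Java 9 or later) keeps each fallback lazy, so a lower-priority source is only used when every higher-priority one is empty.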
The clusters are named 01, 02, 03, etc., according to the number of clusters.
The DBSCAN algorithm is used for clustering.
A special cluster OUTLIER_CLUSTER_IDENTIFIER may also be created for files that were not density-reachable from any others, and so are not part of any particular cluster.
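A minimal one-dimensional DBSCAN over timestamps (expressed as hours since some reference time) illustrates how clusters and outliers arise. This is a textbook sketch, not the class's implementation: mapping `eps` to thresholdHours and `minPts` to minimumPerCluster is an assumption, this sketch counts a point as its own neighbour (some libraries do not), and the name `outliers` merely stands in for OUTLIER_CLUSTER_IDENTIFIER, whose real value is not given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TimestampDbscan {

    static final int NOISE = -1;     // files that end up in the outlier cluster
    static final int UNVISITED = 0;

    /** Indices of all points within eps hours of point i, including i itself. */
    static List<Integer> neighbors(double[] hours, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < hours.length; j++) {
            if (Math.abs(hours[j] - hours[i]) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    /** Returns a label per point: 1, 2, 3, ... for clusters, NOISE for outliers. */
    static int[] cluster(double[] hours, double eps, int minPts) {
        int[] labels = new int[hours.length]; // all start as UNVISITED
        int clusterId = 0;
        for (int i = 0; i < hours.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbors(hours, i, eps);
            if (seeds.size() < minPts) {
                labels[i] = NOISE; // may later be claimed as a border point
                continue;
            }
            clusterId++;
            labels[i] = clusterId;
            // Grow the cluster through all density-reachable points.
            for (int k = 0; k < seeds.size(); k++) {
                int q = seeds.get(k);
                if (labels[q] == NOISE) labels[q] = clusterId; // border point, not expanded
                if (labels[q] != UNVISITED) continue;
                labels[q] = clusterId;
                List<Integer> qNeighbors = neighbors(hours, q, eps);
                if (qNeighbors.size() >= minPts) seeds.addAll(qNeighbors); // q is a core point
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] hours = {0.0, 0.5, 1.0, 30.0, 30.2, 30.9, 100.0};
        int[] labels = cluster(hours, 2.0, 2);
        for (int i = 0; i < hours.length; i++) {
            String name = labels[i] == NOISE ? "outliers" : String.format(Locale.ROOT, "%02d", labels[i]);
            System.out.println(hours[i] + "h -> " + name);
        }
    }
}
```

With these values the first three files form cluster 01, the next three form 02, and the isolated file at 100 hours has too few neighbours to belong to any cluster, so it is an outlier.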
The relative-path of each file is preserved (when preserveSubdirectories is true), and is appended to the cluster subdirectory.
The default patterns for matching filenames are:
- yyyy-mm-dd hh:mm:ss
- yyyymmdd_hhmmss
- yyyymmdd hhmmss
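One way to match these default patterns is to locate the digits with a regular expression and then parse them with `java.time`. This is a hypothetical sketch: the class `FilenameTimestamp` and its nested `PatternPair` are illustrative and are not the library's TimestampPattern. In `java.time` notation, months are `MM` and 24-hour hours are `HH`, so the three patterns become `yyyy-MM-dd HH:mm:ss`, `yyyyMMdd_HHmmss` and `yyyyMMdd HHmmss`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameTimestamp {

    /** Hypothetical pairing of a regex (to locate the text) with a formatter (to parse it). */
    static final class PatternPair {
        final Pattern regex;
        final DateTimeFormatter formatter;

        PatternPair(String regex, String formatterPattern) {
            this.regex = Pattern.compile(regex);
            this.formatter = DateTimeFormatter.ofPattern(formatterPattern);
        }
    }

    // The three default patterns, tried in order.
    static final List<PatternPair> DEFAULTS = List.of(
            new PatternPair("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "yyyy-MM-dd HH:mm:ss"),
            new PatternPair("\\d{8}_\\d{6}", "yyyyMMdd_HHmmss"),
            new PatternPair("\\d{8} \\d{6}", "yyyyMMdd HHmmss"));

    /** Returns the first valid date-time found in the filename, if any. */
    static Optional<LocalDateTime> extract(String fileName) {
        for (PatternPair p : DEFAULTS) {
            Matcher m = p.regex.matcher(fileName);
            while (m.find()) {
                try {
                    return Optional.of(LocalDateTime.parse(m.group(), p.formatter));
                } catch (DateTimeParseException e) {
                    // The digits matched but are not a valid date (e.g. month 13); keep looking.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("IMG_20210314_153000.jpg"));
        System.out.println(extract("holiday.png"));
    }
}
```

A filename without a match yields an empty result, at which point the next source in the priority order would be used.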
Author:
Owen Feehan
Constructor Summary
ClusterByTimestamp()
Method Summary
Modifier and Type | Method | Description
ClusterMembership | beforeCopying(Path destinationDirectory, List<FileWithDirectoryInput> inputs) | To be called once before any calls to CopyFilesNaming.destinationPath(File, DirectoryWithPrefix, int, CopyContext).
Optional<Path> | destinationPathRelative(File file, DirectoryWithPrefix outputTarget, int index, CopyContext<ClusterMembership> context) | Calculates the relative-output path (to be appended to destDir).
int | getMinimumPerCluster() | The minimum number of files that must exist for a cluster.
double | getThresholdHours() | Files whose timestamps differ by at most this many hours are joined into the same cluster.
List<TimestampPattern> | getTimestampPatterns() | The patterns which can be used to extract a date-time from a filename.
int | getTimeZoneOffset() | If >= 0, sets a specific time-offset in hours.
boolean | isPreserveSubdirectories() | If true, the entire relative-path is used when copying files into the cluster directory.
void | setMinimumPerCluster(int minimumPerCluster) | The minimum number of files that must exist for a cluster.
void | setPreserveSubdirectories(boolean preserveSubdirectories) | If true, the entire relative-path is used when copying files into the cluster directory.
void | setThresholdHours(double thresholdHours) | Files whose timestamps differ by at most this many hours are joined into the same cluster.
void | setTimestampPatterns(List<TimestampPattern> timestampPatterns) | The patterns which can be used to extract a date-time from a filename.
void | setTimeZoneOffset(int timeZoneOffset) | If >= 0, sets a specific time-offset in hours.
Methods inherited from class org.anchoranalysis.plugin.io.bean.file.copy.naming.CopyFilesNaming
destinationPath
Methods inherited from class org.anchoranalysis.bean.AnchorBean
checkMisconfigured, describeBean, describeChildren, duplicateBean, fields, findFieldsOfClass, getBeanName, getLocalPath, localise, toString
Constructor Details
ClusterByTimestamp
public ClusterByTimestamp()
Method Details
beforeCopying
public ClusterMembership beforeCopying(Path destinationDirectory, List<FileWithDirectoryInput> inputs) throws OperationFailedException
Description copied from class: CopyFilesNaming
To be called once before any calls to CopyFilesNaming.destinationPath(File, DirectoryWithPrefix, int, CopyContext).
- Specified by:
beforeCopying in class CopyFilesNaming<ClusterMembership>
- Parameters:
destinationDirectory - the directory to which files are copied.
inputs - the files to be copied.
- Throws:
OperationFailedException
destinationPathRelative
public Optional<Path> destinationPathRelative(File file, DirectoryWithPrefix outputTarget, int index, CopyContext<ClusterMembership> context) throws OutputWriteFailedException
Description copied from class: CopyFilesNaming
Calculates the relative-output path (to be appended to destDir).
- Specified by:
destinationPathRelative in class CopyFilesNaming<ClusterMembership>
- Parameters:
file - the file to be copied.
outputTarget - the directory and prefix associated with the file for outputting.
index - the position of the file in an increasing sequence of numbers, beginning at 0.
context - the context for the copying.
- Returns:
the relative-path. If empty, the file should be skipped.
- Throws:
OutputWriteFailedException
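The two methods above follow a two-phase pattern: membership is computed once for all inputs, then looked up per file. The sketch below illustrates that pattern with simplified, hypothetical types (`Membership` standing in for ClusterMembership, plain `Path` keys instead of File, DirectoryWithPrefix and CopyContext). It assumes the cluster labels (1, 2, ... or -1 for outliers) were already computed, and uses the hypothetical name `outliers` for the outlier cluster.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class TwoPhaseNaming {

    static final String OUTLIERS = "outliers"; // hypothetical outlier-cluster name

    /** Hypothetical stand-in for ClusterMembership: relative path -> cluster directory name. */
    static final class Membership {
        final Map<Path, String> clusterOf = new HashMap<>();
    }

    /** Phase 1: called once with every input; labels are assumed to be precomputed. */
    static Membership beforeCopying(Map<Path, Integer> labels) {
        Membership membership = new Membership();
        labels.forEach((path, label) -> membership.clusterOf.put(
                path, label < 0 ? OUTLIERS : String.format(Locale.ROOT, "%02d", label)));
        return membership;
    }

    /** Phase 2: called once per file; an empty result means the file is skipped. */
    static Optional<Path> destinationPathRelative(Path relativePath, Membership context) {
        String cluster = context.clusterOf.get(relativePath);
        if (cluster == null) {
            return Optional.empty();
        }
        // Prefix the file's relative-path with its cluster directory.
        return Optional.of(Paths.get(cluster).resolve(relativePath));
    }

    public static void main(String[] args) {
        Map<Path, Integer> labels = new HashMap<>();
        labels.put(Paths.get("a", "IMG_1.jpg"), 1);
        labels.put(Paths.get("b", "IMG_2.jpg"), -1);
        Membership membership = beforeCopying(labels);
        System.out.println(destinationPathRelative(Paths.get("a", "IMG_1.jpg"), membership));
        System.out.println(destinationPathRelative(Paths.get("c", "unknown.jpg"), membership));
    }
}
```

Computing the clustering in the first phase matters because DBSCAN needs every timestamp at once; a per-file method alone could not decide which cluster a file belongs to.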
getThresholdHours
public double getThresholdHours()
Files whose timestamps differ by at most this many hours are joined into the same cluster.
This is the principal parameter affecting the sensitivity of the clustering. It is specified in hours between the date-times of two files.
A larger value encourages fewer clusters (i.e. larger clusters). A smaller value encourages the opposite.
setThresholdHours
public void setThresholdHours(double thresholdHours)
Files whose timestamps differ by at most this many hours are joined into the same cluster.
This is the principal parameter affecting the sensitivity of the clustering. It is specified in hours between the date-times of two files.
A larger value encourages fewer clusters (i.e. larger clusters). A smaller value encourages the opposite.
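The effect of the threshold can be seen in the simplest case. Assuming (hypothetically) that a minimum of one file per cluster makes every file a DBSCAN core point, the clustering reduces to splitting the sorted timestamps wherever consecutive files are more than the threshold apart. The helper `countClusters` below is illustrative only.

```java
import java.util.Arrays;

public class ThresholdEffect {

    /** Counts clusters when every file is a core point: split wherever a gap exceeds the threshold. */
    static int countClusters(double[] hours, double thresholdHours) {
        if (hours.length == 0) return 0;
        double[] sorted = hours.clone();
        Arrays.sort(sorted);
        int clusters = 1;
        for (int i = 1; i < sorted.length; i++) {
            // A gap larger than the threshold breaks the chain of neighbouring files.
            if (sorted[i] - sorted[i - 1] > thresholdHours) clusters++;
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[] hours = {0.0, 1.5, 4.0, 9.0, 30.0, 31.0};
        for (double threshold : new double[] {1.0, 3.0, 6.0, 24.0}) {
            System.out.println(threshold + "h -> " + countClusters(hours, threshold) + " clusters");
        }
    }
}
```

For these timestamps the gaps are 1.5, 2.5, 5, 21 and 1 hours, so thresholds of 1, 3, 6 and 24 hours give 5, 3, 2 and 1 clusters respectively: raising the threshold merges clusters.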
getMinimumPerCluster
public int getMinimumPerCluster()
The minimum number of files that must exist for a cluster.
setMinimumPerCluster
public void setMinimumPerCluster(int minimumPerCluster)
The minimum number of files that must exist for a cluster.
isPreserveSubdirectories
public boolean isPreserveSubdirectories()
If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
setPreserveSubdirectories
public void setPreserveSubdirectories(boolean preserveSubdirectories)
If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
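The difference between the two settings comes down to which part of the path is resolved against the cluster directory. A minimal sketch, using a hypothetical helper `destination` built on `java.nio.file.Path.resolve`:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PreserveSubdirectories {

    /** Hypothetical helper: where a file lands inside its cluster directory. */
    static Path destination(Path clusterDirectory, Path relativePath, boolean preserveSubdirectories) {
        return preserveSubdirectories
                ? clusterDirectory.resolve(relativePath)                 // e.g. 01/trip/day1/IMG_1.jpg
                : clusterDirectory.resolve(relativePath.getFileName());  // e.g. 01/IMG_1.jpg
    }

    public static void main(String[] args) {
        Path cluster = Paths.get("01");
        Path relative = Paths.get("trip", "day1", "IMG_1.jpg");
        System.out.println(destination(cluster, relative, true));
        System.out.println(destination(cluster, relative, false));
    }
}
```

When only the file-name is kept, two files with the same name from different subdirectories may map to the same destination if they fall into the same cluster.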
getTimestampPatterns
public List<TimestampPattern> getTimestampPatterns()
The patterns which can be used to extract a date-time from a filename.
setTimestampPatterns
public void setTimestampPatterns(List<TimestampPattern> timestampPatterns)
The patterns which can be used to extract a date-time from a filename.
getTimeZoneOffset
public int getTimeZoneOffset()
If >= 0, sets a specific time-offset in hours. If == -1, the offset is taken from the current system time-zone settings.
setTimeZoneOffset
public void setTimeZoneOffset(int timeZoneOffset)
If >= 0, sets a specific time-offset in hours. If == -1, the offset is taken from the current system time-zone settings.
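The documented convention maps naturally onto `java.time`. In this sketch the helper `resolveZone` is hypothetical, and rejecting negative values other than -1 is an assumption, since their behaviour is not documented here.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimeZoneOffsetExample {

    /** Hypothetical helper: >= 0 gives a fixed offset in hours, -1 gives the system time-zone. */
    static ZoneId resolveZone(int timeZoneOffset) {
        if (timeZoneOffset >= 0) {
            return ZoneOffset.ofHours(timeZoneOffset);
        }
        if (timeZoneOffset == -1) {
            return ZoneId.systemDefault();
        }
        // Assumption: other negative values are treated as invalid.
        throw new IllegalArgumentException("Unsupported offset: " + timeZoneOffset);
    }

    public static void main(String[] args) {
        LocalDateTime fromFilename = LocalDateTime.of(2021, 3, 14, 15, 30);
        // 15:30 at a +02:00 offset is 13:30 UTC.
        System.out.println(fromFilename.atZone(resolveZone(2)).toInstant());
        System.out.println(resolveZone(-1));
    }
}
```

Applying the zone only at the point of conversion keeps filename and EXIF times zone-less until the configured offset is known.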