- Using
importlib.metadata
orimportlib_metadata
, depending on the python version, to read the version from package metadata. This is in response topkg_resources
being deprecated. (#100)
- Adding
pyproject.toml
for our build definitions. - Python 3.6 has reached end-of-life and is no longer maintained, so it has been removed from the automated tests.
- Python 3.7 has reached end-of-life and is no longer maintained, so it has been removed from automated tests.
- Python 3.11 has been released, so
taskgraph
is now tested against this new version of the language. - Python 3.12 has been released, so
taskgraph
is now tested against this new version of the language.
- Testing against python 3.10 in github actions and officially noting support
for 3.10 in
setup.py
. - Testing against python 3.9 in github actions and noting support in
setup.py
. - Fixed an issue where exceptions raised during execution where the task
completed before
TaskGraph.join()
was called would not be raised. Now, if a task raises an exception, its exception will always be raised when eitherTask.join()
andTaskGraph.join()
is called. - Fixed an issue where tasks with
hash_algorithm='sizetimestamp'
would, under certain conditions, fail to re-execute when they should. This only occurred when a graph writing the same amount of , but possibly different, data is executed successively, with less than about 1.5 seconds between task executions. - After many years with the Natural Capital Project, Rich Sharp has stepped
down from the Project and as the maintainer of
taskgraph
. James Douglass is taking his place, and this change is now reflected insetup.py
. - Fixes an issue that causes an
EOFError
orBrokenPipeError
to occur when theTaskGraph
terminates. - Updated the
taskgraph
example in the README for the latest API changes and to clarify the need forif __name__ == '__main__':
- Fixed an issue that could cause the
TaskGraph
object to hang if duplicateTask
objects were created. - Fixed an issue that was causing TaskGraph to ignore a changed
hash_algorithm
if the TaskGraph was created on one run, was deconstructed, then restarted. If the user chose a different hash, TaskGraph would use the hash that the target file was originally hashed under rather than the new algorithm. - Removed
copy_duplicate_artifact
andhardlink_allowed
parameters and functionality from TaskGraph. This is to address a design error that TaskGraph is not well suited for caching file results to avoid recomputation. Rather than add additional complexity around the limitations of this feature it is being removed to guide a design toward a standalone cache library if needed.
- Fixed issue that could cause combinatorial memory usage leading to poor
runtime or
MemoryError
if a dictionary were passed that had thousands of elements. - Fixed issue that would cause
TaskGraph
to not recognize a directory that was meant to be ignored and in some cases causeTask
to unnecessarily reexecute.
- Fixed an issue that would raise an exception when __del__ was
deconstructing a taskgraph object and a thread
join()
would cause a deadlock.
- Fixed an issue that would ignore the state of a
transient_run
flag if a previous Task run had run it with that flag set to False. - Removed a limit on the number of times
TaskGraph
can attempt to update its database up to 5 minutes of continuous failures. This is to address expected issues when many parallel threads may compete for an update. Relevant information about why the database update fails is logged. - Fixed an issue where the logging queue would always report an exception even if the logging thread shut down correctly.
- Fixed several race conditions that could cause the
TaskGraph
object to hang on an otherwise ordinary termination. - Changed logging level to "INFO" on cases where the taskgraph was not
precalculated since it's an expected path of execution in
TaskGraph
. - Adding a
hardlink_allowed
parameter toadd_task
that allows the attempt to hardlink a file in a case where acopy_artifact=True
may permit one. This will save on disk space as well as computation time if large files are not needed to copy. - Adding a
store_result
flag toadd_task
that conditionally stores thefunc
result in the database for later.get
. This was added to guard against return types that were not picklable and would otherwise cause an exception when being executed normally. - Fixed issue that would cause the logger thread to continue reporting status after all tasks were complete and the graph was closed.
- Fixed issue that would cause an infinite loop if a
TaskGraph
object were created with a database from an incompatible previous version. Behavior now is to log the issue, delete the old database, and create a new compatible one. - Fixed issue that would cause some rare infinite loops if
TaskGraph
were to fail due to some kinds of task exceptions. - Adding open source BSD-3-Clause license.
- Updating primary repository URL to GitHub.
- Adding support for Python 3.8.
- Removing the
EncapsulatedOp
abstract class. In practice the development loop that encouraged the use ofEncapsulatedOp
is flawed and can lead to design errors. - Removing unnecessary internal locks which will improve runtime performance of processing many small Tasks.
- Refactor to support separate TaskGraph objects that use the same database.
- Removed the
n_retries
parameter fromadd_task
. Users are recommended to handle retries within functions themselves. - Added a
hash_target_files
flag toadd_task
that when set to False, causes TaskGraph to only note the existence of target files after execution or as part of an evaluation to determine if the Task was precalculated. This is useful for operations that initialize a file but subsequent runs of the program modify it such as a new database or a downloaded file. - Fixed an issue on the monitor execution thread that caused shutdown of a TaskGraph object to be delayed up to the amount of delay in the monitor reporting update.
- Added a
.get()
function forTask
objects that returns the result of the respectivefunc
call. This value is cached in the TaskGraph database and hence can be used to avoid repeated execution. Note the addition of this function changes the functionality of callingadd_task
with no target path list. In previous versions the Task would execute once per TaskGraph instance, now successiveTask
objects with the same execution signature will use cached results. - To support the addition of the
.get()
function atransient_run
parameter is added toadd_task
that causes TaskGraph to avoid recording a completedTask
even if the execution hash would have been identical to a previously completed run where the target artifacts still existed.
- Dropped support for Python 2.7.
- Fixed an issue where paths in
ignore_paths
were not getting ignored in the case ofcopy_duplicate_artifact=True
. - Fixed an issue where the "percent completed" in the logging monitor would sometimes exceed 100%. This occurred when a duplicate task was added to the TaskGraph object.
- Fixed an issue where a relative path set as a target path would always cause TaskGraph to raise an exception after the task was complete.
- Fixed an issue where kwargs that were unhashable were not considered when determining if a Task should be re-run.
- Fixed an issue where files with almost identical modified times and sizes would hash equal in cases even when the filenames were different.
- Fixed an exception that occurred when two tasks were constructed that targeted the same file but one path was relative and the other was absolute.
- Fixed an issue that would cause TaskGraph to raise an IOError if an
add_task
call was marked forcopy_duplicate_artifact
but the base target file was missing. - Fixed an issue that would prevent the source distribution from installing.
- Taskgraph is now tested against python versions 2.7, 3.6 and 3.7.
- Adjusted logging levels so most chatty information is lowered to debug and
oddness in
__del__
shutdown are degraded fromerror
todebug
so as not to cause alarm.
- Fixed an issue that would cause a deadlock if two tasks were added that had the same function signature except different target paths.
- Fixed a race condition that would sometimes cause an exception when multiple threads attempted to read or write to the completed Task Database.
- Fixed an issue that could cause an exception in
__del__
to print to stderr during Python interpreter shutdown. - Added a
hash_algorithm
parameter toadd_task
that is a string of either 'sizetimestamp' or anything inhashlib.algorithms_available
. This option tells TaskGraph how to fingerprint input and target files to determine the need for recomputation. - Added a
copy_duplicate_artifact
parameter toadd_task
that when True tells TaskGraph to copy duplicate target results to a new target so long as all the parameters and base/target files fingerprint to the same value. This can save significant computation time when use in scenarios where there are small changes in a workflow, but otherwise significant changes in filenames. This often occurs when putting timestamps or other suffixes on files that otherwise have identical content.
- TaskGraph now stores all task completion information in a single SQLite database stored in its cache directory. In previous versions TaskGraph would write a small text file for each task in a highly branching directory tree. This structure made removal of those directory trees computationally difficult.
- Fixed an issue that would cause TaskGraph to reexecute if the target path was included in the argument list and that path was not normalized to the operating system's path style.
- Fixed a deadlock in some cases where Tasks failed while other tasks checked for pre-execution clauses.
- Fixed an issue where very long strings might be interpreted as paths and Windows crashes because the path is too long.
- Fixed a deadlock issue where a Task might raise an unhandled exception as a new task was added to the TaskGraph.
- Fixed the occasional
BrokenPipeError
that could occur when a Task encountered an unhandled exception. - Added an
n_retries
parameter toadd_task
that lets TaskGraph attempt to reexecute a failing Task up ton_retries
times before terminating the TaskGraph. - Removed the
delayed_start
option.
- Resolving an issue with duplicate logging being printed to stdout when
n_workers > 0
. Logging is now only handled in the process that contains the TaskGraph instance. - Updated main logging message to indicate which tasks, by task name, are currently active and how many tasks are ready to execute but can't because there is not an open worker.
- Attempted to fix an issue where processes in the process pool were not terminating on a Linux system by aggressively joining all threads and processes when possible.
- Fixed an issue that would cause tasks that had been previously calculated to prematurely trigger children tasks even if the parent tasks of the current task needed to be reexecuted.
- Added a
delayed_start
flag to TaskGraph to allow for delayed execution of taskgraph tasks. If enabled on threaded or multiprocess mode, calls toadd_task
will not execute tasks until thejoin
method is invoked ontaskgraph
. This allows for finer control over execution order when tasks are passed non-equivalentpriority
levels. - Fixing an issue where a non-JSON serializeable object would cause
add_task
to crash. Now TaskGraph is more tolerant of non-JSON serializeable objects and will log warnings when parameters cannot be serialized. - TaskGraph constructor has an option to report a ongoing logging message at a set interval. The message reports how many tasks have been committed and completed.
- Fixed a bug that would cause TaskGraph to needlessly reexecute a task if
the only change was the order of the
target_path_list
ordependent_task_list
variables. - Fixed a bug that would cause a task to reexecute between runs if input argument was a file that would be generated by a task that had not yet executed.
- Made a code change that makes it very likely that tasks will be executed in priority order if added to a TaskGraph in delayed execution mode.
- Refactored internal TaskGraph scheduling to fix a design error that made it likely tasks would be needlessly reexecuted. This also simplified TaskGraph flow control and cause slight performance improvements.
- Fixed an issue discovered when a
scipy.sparse
matrix was passed as an argument andadd_task
crashed on infinite recursion. Type checking of arguments has been simplified and now iteration only occurs on the Pythonset
,dict
,list
, andtuple
types. - Fixed an issue where the
TaskGraph
was notjoin
ing the worker process pool on a closed/join TaskGraph, or when theTaskGraph
object was being deconstructed. This would occasionally cause a race condition where the TaskGraph may still have a cache.json
file open. Discovered through a flaky build test. - Added functionality to the
TaskGraph
object to propagate log messages from workers back to the parent process. This only applies for cases where aTaskGraph
instance is started withn_workers > 0
. - Fixed an issue where a function that was passed as an argument would cause
a reexecution on a separate run because the
__repr__
of a function includes its pointer address. - Adjusted logging levels so that detailed task information is shown on DEBUG but basic status updates are shown in INFO.
- Fixing an issue where a Task would hang on a
join
if the number of workers in TaskGraph was -1 and a call toadd_task
has a non-None
passed totarget_path_list
and the resulting task was\.join
ed after a second run of the same program.
- Fixing an issue where TaskGraph would hang on a
join
if the number of workers was -1 and a call toadd_task
hasNone
passed totarget_path_list
.
- Taskgraph now supports python versions 2 and 3 (tested with python 2.7, 3.6).
- Fixed an issue with
taskgraph.TaskGraph
that prevented a multiprocessed graph from executing on POSIX systems whenpsutil
was installed. - Adding matrix-based test automation (python 2.7, python 3.6, with/without
psutil
) viatox
. - Updating repository path to
https://bitbucket.org/natcap/taskgraph
.
- Auto-versioning now happens via
setuptools_scm
, replacing previous calls tonatcap.versioner
. - Added an option to
TaskGraph
constructor to allow negative values in then_workers
argument to indicate that the entire object should run in the main thread. A value of 0 will indicate that no multiprocessing will be used but concurrency will be allowed for non-blockingadd_task
. - Added an abstract class
task.EncapsulatedTaskOp
that can be used to instance a class that needs scope in order to be used as an operation passed to a process. The advantage of usingEncapsulatedTaskOp
is that the__name__
hash used byTaskGraph
to determine if a task is unique is calculated in the superclass and the subclass need only worry about implementation of__call__
. - Added a
priority
optional scalar argument toTaskGraph.add_task
to indicates the priority preference of the task to be executed. A higher priority task whose dependencies are satisfied will executed before one with a lower priority.
- Refactor of core scheduler. Old scheduler used asynchronicity to attempt to test if a Task was complete, occasionally testing all Tasks in potential work queue per task completion. Scheduler now uses bookkeeping to keep track of all dependencies and submits tasks for work only when all dependencies are satisfied.
- TaskGraph and Task
.join
methods now have a timeout parameter. Additionallyjoin
now also returns False ifjoin
terminates because of a timeout. - More robust error reporting and shutdown of TaskGraph if any tasks fail during execution using pure threading or multiprocessing.
- Fixed a critical error from the last hotfix that prevented
taskgraph
from avoiding recomputation of already completed tasks.
- Fixed an issue from the previous hotfix that could cause
taskgraph
to exceed the number of available threads if enough tasks were added with long running dependencies. - Additional error checking and flow control ensures that a TaskGraph will catastrophically fail and report useful exception logging a task fails during runtime.
- Fixed a deadlock issue where a failure on a subtask would occasionally cause a TaskGraph to hang.
Task.is_complete
raises a RuntimeError if the task is complete but failed.- More efficient handling of topological progression of task execution to attempt to maximize total possible CPU load.
- Fixing an issue from the last release that caused the test cases to fail. (Don't use 0.2.5 at all).
- Fixed a bug where tasks with satisfied dependencies or no dependencies were blocked on dependent tasks added to the task graph earlier in the main thread execution.
- Indicating that
psutil
is an optional dependency through thesetup
function.
- Empty release. Possible bug with PyPI release, so re-releasing with a bumped up version.
- More robust testing on a chain of tasks that might fail because an ancestor failed.
- Changed how TaskGraph determines of work is complete. Now records target paths in file token with modified time and file size. When checking if work is complete, the token is loaded and the target file stats are compared for each file.
- Handling cases where a function might be an object or something else that can't import source code.
- Using natcap.versioner for versioning.
- Fixing an issue where
types.StringType
is not the same astypes.StringTypes
. - Redefined
target
inadd_task
tofunc
to avoid naming collision withtarget_path_list
in the same function.
- Fixing a TYPO on
__version__
number scheme. - Importing
psutil
if it exists.
- Initial release.