* Simplifying documentation for getting started: Removed outdated information about installing Golang on different platforms and instead included a link to the official Golang website, which has more up-to-date information. Instructions for installing docker-compose have also been added.
* Added documentation to all exported functions and structs.
* Unexported some structures and functions that were needlessly exported.
* Adding golang CI default configuration which can be useful while developing and may be turned on later in the CI.
* Moving build process in CI to xenial.
* Reducing line length in some files and variable shadowing in some test cases.
* Making abort job synchronous to avoid scenarios where a kill is received before the job update lock is released.
* Adding missing cases for terminal update statuses to JobUpdate monitor.
* Monitors now return errors which provide context through behavior.
* Adding notes to the doc explaining what happens when AbortJob times out.
* ZK retries have been cleaned up. We will now retry after every error
EXCEPT when we have a badly formed path.
* ZK library has been reworked to use the optional arguments pattern so that it is
no longer so intertwined with the cluster.json file.
* Timeout error has been re-implemented as RetryError. RetryError
behaves like a Timeout error but is used exclusively to add more context
privately. This allows us to have unit tests that check our retry
mechanism is actually retrying.
* Additional logging has been added to retry mechanisms as well as to
the Zookeeper library we use.
* Deleting permanent error as it doesn't make sense. Just return a plain old error and that will be considered permanent.
* Removing double closure as it's unmaintainable and can be error-prone. Separated backoffs into a generic one and a thrift call specific one.
* ZK leader finder now returns a temporary error instead of always reporting that no leader was found and quitting. It could be that the leader info is still being propagated, so it's worth trying another time.
* Adding more logging to the retry.
* Wrapping lock and unlock in an anonymous function so that we can use defer on unlock such that it is called in the case of a panic.
* Changing incorrect license in some source files.
* Changing CreateService to mimic CreateJob by setting the batch size to the instance count.
* Changing Getcerts to GetCerts to match the style of the rest of the codebase.
* Overhauled error handling. Backoff now recognizes temporary errors and continues to retry if it finds one.
* Changed thrift function call wrapper to be more explicitly named and to perform more safety checks.
* Moved Jitter function from realis to retry.
* API code is now more uniform and follows a certain template.
* Lock added whenever a thrift call is made or when a modification is done to the connection. Note that calling ReestablishConn externally may result in some race conditions. We will move to make this function private in the near future.
* Added test for Realis session thread safety. Tested ScheduleStatus monitor. Tested monitor timing out.
* Returning nil for the result whenever an error is returned so that there are no ambiguities.
* Using defer with unlock so that the lock is still released if a panic is invoked.