Commit graph

6 commits

Author SHA1 Message Date
Renan DelValle
df8fc2fba1
Documentation and linting improvements (#108)
* Simplifying documentation for getting started: Removed outdated information about install Golang on different platforms and instead included a link to the official Golang website which has more up to date information. Instructions for installing docker-compose have also been added.

* Added documentation to all exported functions and structs.

* Unexported some structures and functions that were needlessly exported.

* Adding golang CI default configuration which can be useful while developing and may be turned on later in the CI.

* Moving build process in CI to xenial.

* Reducing line size. in some files and shadowing in some test cases.
2019-06-12 11:22:59 -07:00
Renan DelValle
2b7eb3a852
Making abort job synchronous (#95)
* Making abort job synchronous to avoid scenarios where kill is received before job update lock is released.
* Adding missing cases for terminal update statues to JobUpdate monitor.
* Monitors now return errors which provide context through behavior.
* Adding notes to the doc explaining what happens when AbortJob times out.
2019-01-15 14:55:59 -08:00
Renan DelValle
3d62df1684
* Errors have been refactored.
* ZK retries have been cleaned up. We will now retry after every error
EXCEPT when we have a badly formed path.
* ZK library has been reworked with optional arguments pattern to not be
so intertwined with the cluster.json file.
* Timeout error has been re-implemented as RetryError. RetryError
behaves like a Timeout error but is used exclusively to add more context
privately. This allows us to have unit tests that check our retry
mechanism is actually retrying.
* Additional logging has been added to retry mechanisms as well as to
the Zookeeper library we use.
2018-03-03 14:08:04 -08:00
Renan DelValle
a43dc81ea8
Simplifying retry mechanism for Thrift Calls (#56)
* Deleting permament error as it doesn't make sense. Just return a plain old error and that will be considered permanent.

* Removing double closure at as it's unmaintainable and can be error prone. Separated back offs into a generic one and a thrift call specific one.

* ZK leader finder now returns a temporary error instead of constantly no leader found and quitting. It could be that the leader info is being propagated so it's worth trying another time.

* Adding more logging to the retry.

* Wrapping lock and unlock in an anonymous function so that we can use defer on unlock such that it is called in the case of a panic.
2018-02-15 15:16:39 -08:00
Renan DelValle
a941bcb679
Thread safety, misc fixes, and refactoring (#51)
* Changing incorrect license in some source files.

* Changing CreateService to mimic CreateJob by setting the batch size to the instance count.

* Changing Getcerts to GetCerts to match the style of the rest of the codebase.

* Overhauled error handling. Backoff now recognizes temporary errors and continues to retry if it finds one.

* Changed thrift function call wrapper to be more explicitly named and to perform more safety checks.

* Moved Jitter function from realis to retry.

* API code is now more uniform and follows a certain template.

* Lock added whenever a thrift call is made or when a modification is done to the connection. Note that calling ReestablishConn externally may result in some race conditions. We will move to make this function private in the near future.

* Added test for Realis session thread safety. Tested ScheduleStatus monitor. Tested monitor timing out.

* Returning nil whenever there is an error return so that there are no ambiguities.

* Using defer with unlock so that the lock is still released if a panic is invoked.
2018-01-21 19:30:01 -08:00
Renan DelValle
b2ffb73183
Introducing temporary errors. Refactored reestablish connection code … (#50)
* Introducing temporary errors. 

* Refactored reestablish connection code to use NewClient.

* Added reestablish connection test to end to end tests.
2018-01-16 14:35:01 -08:00