Commit graph

35 commits

Author SHA1 Message Date
Renán I. Del Valle
62df98a3c8
Bug fix for auto paused update monitor (#136)
Returns success if the update has finished updating successfully.
2021-08-06 16:02:52 -07:00
Renán I. Del Valle
a9d99067ee
Documentation fix (#130)
Fixes documentation so that it is more compliant with godoc format.
2021-04-29 10:48:43 -07:00
Renán I. Del Valle
2b6025e67d
Checking previously ignored error which caused issues. (#123)
Error in monitor was going unchecked which caused some issues when the monitor timed out.
2020-05-27 12:45:53 -07:00
Renan I. Del Valle
976dc26dcc Adding autopause APIs to future (#110)
* Updating thrift definitions to add autopause for batch based update strategies.

* Adding batch calculator utility and test cases for it.

* Adding PauseUpdateMonitor which allows users to poll Aurora for information on an active Update being carried out until it enters the ROLL_FORWARD_PAUSED state.

* Tests for PauseUpdateMonitor and VariableBatchStep added to the end to end tests.

* Adding TerminalUpdateStates function which returns a slice containing all terminal states for an update. Changed signature of JobUpdateStatus from using a map for desired states to a slice. A map is no longer necessary with the new version of thrift and only adds complexity.
2020-01-14 15:50:10 -08:00
Renan DelValle
df8fc2fba1
Documentation and linting improvements (#108)
* Simplifying documentation for getting started: Removed outdated information about install Golang on different platforms and instead included a link to the official Golang website which has more up to date information. Instructions for installing docker-compose have also been added.

* Added documentation to all exported functions and structs.

* Unexported some structures and functions that were needlessly exported.

* Adding golang CI default configuration which can be useful while developing and may be turned on later in the CI.

* Moving build process in CI to xenial.

* Reducing line size. in some files and shadowing in some test cases.
2019-06-12 11:22:59 -07:00
Renan DelValle
6dc4bf93b9
Retry temporary errors by default (#107)
* Adding Aurora URL validator in order to handle scenarios where incomplete information is passed to the client. The client will do its best to guess the missing information such as protocol and port.

* Upgraded to testify 1.3.0.

* Added configuration to fail on a non-temporary error. This is reverting to the original behavior of the retry mechanism. However, this allows the user to opt to fail in a non-temporary error.
2019-06-11 11:47:14 -07:00
Renan DelValle
1a15c4a5aa
V1 CreateService and StartJobUpdate Timeout signal and cleanup (#105)
* Bumped up version to 1.21.1

* Moving admin functions to a new file. They are still part of the same pointer receiver type.

* Removing dead code and fixing some comments to add space between backslash and comment.

* Adding set up and tear down to run tests script. It sets up a pod, runs all tests, and then tears down the pod.

* Added `--rm` to run tests Mac script.

* Removing cookie jar from transport layer as it's not needed.

* Changing all error messages to start with a lower case letter. Changing some messages around to be more descriptive.

* Adding an argument to allow the retry mechanism to stop if a timeout has been encountered. This is useful for mutating API calls. Only StartUpdate and CreateService have enabled by default stop at timeout.

* Added 2 tests for when a call goes through despite the client timing out. One is with a good payload, one is with a bad payload.

* Updating changelog with information about the error type returned.

* Adding test for duplicate metadata.

* Refactored JobUpdateStatus monitor to use a new monitor called JobUpdateQuery. Update monitor will now still continue if it does not find an update to monitor. Furthermore, it has been optimized to reduce returning payloads from the scheduler as much as possible. This is through using the GetJobUpdateSummaries API instead of JobUpdateDetails and by including a the statuses we're searching for as part of the query.


* Added documentation as to how to handle a timeout on an API request.

* Optimized GetInstancesIds to create a copy of the JobKey being passed down in order to avoid unexpected behavior. Instead of setting every variable name separately, now a JobKey array is being created.
2019-05-05 11:46:22 -07:00
Renan DelValle
79fa7ba16d
Upgrading gorealis v1 to Thrift 0.12.0 code generation. End to end tests cleanup (#96)
* Ported all code from Thrift 0.9.3 to Thrift 0.12.0 while backporting some fixes from gorealis v2

* Removing git.apache.org dependency from Vendor folder as this dependency has migrated to github.

* Adding github.com thrift dependency back but now it points to github.com

* Removing unnecessary files from Thrift Vendor folder and adding them to .gitignore.

* Updating dep dependencies to include Thrift 0.12.0 from github.com

* Adding changelog.

* End to end tests: Adding coverage for killinstances.

*  End to end tests: Deleting instances after partition policy recovers them.

*  End to end tests: Adding more coverage to the realis API.

*  End to end tests: Allowing arguments to be passed to runTestMac so that '-run <test name>' can be passed in.

*  End to end tests: Reducing the resources used by CreateJob test.

*  End to end tests: Adding coverage for Pause and Resume update.

*   End to end tests: Removed checks for Aurora_OK response as that should always be handled by the error returned by the API. Changed names to be less verbose and repetitive.

*  End to end tests: Reducing watch time for instance running when creating service for reducing time it takes to run end to end test.
2019-02-20 11:11:46 -08:00
Renan DelValle
2b7eb3a852
Making abort job synchronous (#95)
* Making abort job synchronous to avoid scenarios where kill is received before job update lock is released.
* Adding missing cases for terminal update statues to JobUpdate monitor.
* Monitors now return errors which provide context through behavior.
* Adding notes to the doc explaining what happens when AbortJob times out.
2019-01-15 14:55:59 -08:00
Renan DelValle
4f5766b443
Misc. bug fixes and addition of debug logging (#61)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default. If debug is turned on, but a logger has not been assigned, a default logger that will print to STDOUT will be created.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Removing a leftover helper function from before we changed how we configured the client.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode in the sample client.
2018-04-13 11:03:29 -07:00
Renan DelValle
1c426dd363
Changing the drain monitor to match the rest of the monitors using timer and ticker. Made a generic schedule status monitor that can be used with any of the default sets provided. (#49) 2018-01-07 13:30:02 -08:00
Renan DelValle
e614e04f27
Code cleanup, added ability to attach logger, added CreateService api
* Code cleanup: Deleted multiple functions which have become stale. Removed cluster example as we replaced the need to create the Cluster object.

* Cleaned up ZK connection code by using the backoff function. Added a test to the end to end to test that we're getting the host correctly from ZK. Changed clusters test to be an outside package.

* Added LeaderFromZKURL test to end to end tests.

* Added logger to realisConfig so that users can attach their own Loggers to the client. Logger is an interface that shadows most popular logging libraries. Only Print, Println, and Printf are needed to be a realis.Logger type. Example in the client uses the std library log.

* Moved most fmt.Print* calls to be redirected to user provided logger. Logger by default is a no-op logger.

* Adding CreateService to realis interface. Uses the StartJobUpdate API to create services instead of the createJobs API.

* Bumping up version number inside client in anticipation of new release.
2017-11-30 12:02:50 -08:00
Sivaram Mothiki
72b746e431 use exponential back off func from realis lib (#39)
* use exponential back off func from realis lib

* remove exponential backoffs from monitors

* dont compare for retry errors
2017-11-04 15:06:26 -07:00
Renan DelValle
a1350c6d55 out with the old (address) in with the new (address) 2017-10-12 17:11:01 -07:00
Renan DelValle
1fd07b5007 Avoided going through the entire list of monitored hosts by keeping a set of hosts that had transistioned to a desired mode. 2017-10-04 15:56:59 -07:00
Renan DelValle
922e8d6b5a Changing HostMaintenance to return a map[string]bool where true indicates success, false indicates failure to transition to the desired state. 2017-10-02 17:24:01 -07:00
Renan DelValle
3111b358fc Host Maintenance monitor now returns a list of hosts that did enter the desired mode(s) instead of a boolean. This means the monitor can see a partial success. 2017-09-29 18:21:30 -07:00
Renan DelValle
430764f025 Added tests for draining. run go test with a aurora vagrant image running to test. 2017-09-28 17:49:15 -07:00
Renan DelValle
8334dde12f Sample client now blocks until all hosts entered desired state. Cleaned up host maintenance monitor. 2017-09-28 16:50:46 -07:00
Renan DelValle
dc6848f804 Host Maintenance monitor which allows to block until all hosts in a list have entered of the states in a provided set 2017-09-28 16:35:24 -07:00
Mothiki
4bc0a694ae use switch instead of if else conditions 2017-08-22 17:03:35 -07:00
Mothiki
dcab5e698f monitor job update returns on Rolle back and update fail 2017-08-22 16:55:20 -07:00
Mothiki
7e578f80bd check for staus failed and return error 2017-08-21 19:45:36 -07:00
Kumar Krishna
e57dc98d65 add resiliency for LeaderFromZK and other fixes . 2017-04-06 23:15:44 -07:00
Kumar Krishna
b10df0603e gorealis config refactoring 2017-03-30 18:17:21 -07:00
Kumar Krishna
3add32a585 gorealis resiliency 2017-03-20 22:34:45 -07:00
Kumar Krishna
d97e59b9e6 inital commit gorealis resiliency 2017-03-16 23:19:30 -07:00
Kumar Krishna
3b10c10dd1 update api change and remove os.exit 2016-11-14 23:16:36 -08:00
Renan DelValle
3bf2e8a831 Updating zookeeper dependency since logging problem has been solved in main repository. Go fmt run on project to tidy it up. 2016-11-02 20:41:43 -04:00
Renan DelValle
c83e5d268a Monitor Instances now returns a tuple so an error can be passed instead of exiting on error. Tuple is: Result of action, error if it exists. 2016-10-20 14:39:48 -04:00
Renan DelValle
3fd957fe5c Updating library for compatibility with Aurora 0.16.0 final. Thrift API differs from main Aurora one due to TaskUpdateQuery fields resulting in errors when serialized by thrift as non-optionals. 2016-09-29 20:45:24 -04:00
Renan DelValle
3a78e32e27 Changing location from where to get thrift bidnings to that it points to github location. Ran go fmt on the entire project. 2016-09-19 15:34:56 -04:00
Renan DelValle
494f733f4e Changing timeouts to be more reasonable in sample client 2016-08-26 19:19:37 -07:00
Renan DelValle
928fc42fc2 Updated thrift API to the latest in the Aurora respository.
Added new monitors for watching number of instances get to a certain count using polling.
Added new commands to sample client which give some statistics.
2016-08-26 16:35:31 -07:00
Renan DelValle
01b700554a Added the ability to monitor job updates.
Added the ability to kill and restart specific instances.
Fixed incorrect documentation on using-the-sample-client.
Added helper functions under the response package to extract fields from
aurora.Response.
2016-08-25 18:56:55 -07:00