Commit graph

98 commits

Author SHA1 Message Date
Renan DelValle
fe4a0dc06e Minor error message clarification 2019-09-25 17:20:30 -07:00
Renan DelValle
d67b8ca1d7 Removing uncessary functions which previously handled initializing thrift protocol. Changed how which protocol is chosen based upon configuration. 2019-09-25 17:20:30 -07:00
Renan DelValle
ecd59f7a8d Removing unnecessary cookie jar from thrift protocol initialization. 2019-09-25 17:20:30 -07:00
Renan DelValle
5d75dcc15e Adding MonitorJobUpdateQuery which serves as the basis for other monitors. 2019-09-25 17:20:30 -07:00
Renan DelValle
9a70711537 Making JobUpdate synchronous. MonitorJobUpdateStatus creates a local copy of job key in order to guard against side effects cuased by mutations to the JobKey being performed externally. 2019-09-25 17:20:30 -07:00
Renan DelValle
203f178d68 Changing error messages to be lower case in realis API. 2019-09-25 17:20:30 -07:00
Renan DelValle
04471c6918 Adding trace logging. 2019-09-25 17:20:30 -07:00
Renan DelValle
461b23400c
V2.0 thrift repository migration and cleanup (#98)
* Remove unnecessary files from the thrift repository that come along with the go library.

* Updating thrift generated code to be 0.12.0 final generated code.

* Remove git.apache.org dependency in vendor folder.

* Migrating from git.apache.org/thrift.git to github.com/apache/thrift

* Upgrading dep (although it will not work now that imports are using mod format, it allows for users to easily fix this with a replacement of the import path).

* Upgrading mod dependencies for Thrift to point to github.com location of the repository.

* Bug fix for Thermos Payload generation relating to the GPU being set.
2019-02-19 16:40:41 -08:00
Renan DelValle
8d67d8c2f3
Releasing 2.0.1. 2019-01-08 15:59:56 -08:00
Renan DelValle
e13349db26
Initial support for Thermos and GPU resources. 2019-01-07 14:39:47 -08:00
Renan DelValle
51597ecb32
Changing paths to refer to gorealis v2 in order for dependencies to be correct. 2018-12-27 10:09:22 -08:00
Renan DelValle
56b325ed80
Aurora endpoint may now be explicitly provided with or without protocol and with or without port. 2018-12-17 18:00:20 -08:00
Renan DelValle
992e52eba2
Changing realis API to use new JobUpdate struct and to use concrete JobKey types. 2018-12-12 14:13:45 -08:00
Renan DelValle
5836ede37b
Splitting off Aurora task from Aurora Job since Update mechanism only needs task. 2018-12-10 18:57:16 -08:00
Renan DelValle
76300782ba
Renaming RealisClient to Client to avoid stuttering. Moving monitors under Client. Making configuration object private. Deleted legacy code to generate configuration object. 2018-12-08 08:57:15 -08:00
Renan DelValle
54378b2d8a
Changing the signature for some API. Specifically, result objects that hold a single variable are now returning that variable instead of a result object. Tests have been refcatored to use new v2 API. All tests are currently passing. 2018-11-28 20:13:49 -08:00
Renan DelValle
59e3a7065e
Refactoring code to be compatible with Thrift 0.12.0 generated code. Tests are still not refactored. 2018-11-27 18:45:10 -08:00
Renan DelValle
d747a48626
Simplifying API. Many API calls have gone from a tuple of two returns to a single return. 2018-11-22 12:23:18 -08:00
Renan DelValle
8a9a97c150
Removing unnecessary interface from Aurora Job. 2018-11-22 12:22:26 -08:00
Renan DelValle
1146736c2b
Refactoring variable names and variable types to saner versions. 2018-11-22 12:22:25 -08:00
Renan DelValle
c65a47f6e2
Changing Certspath to CertsPath 2018-11-22 12:22:25 -08:00
Renan DelValle
4471c62659
Removing retries as an option since it's a dup of Backoff. 2018-11-22 12:22:25 -08:00
Renan DelValle
a23bd1b2cc
Shedding interface because there is no good reason to have it. 2018-11-22 12:22:22 -08:00
Renan DelValle
2eaa60f681
Support Drain SLA API (#88)
* Bringing thrift API up to date with Aurora 0.21.0.

* Adding support for SLA Drain Host API.
2018-11-16 11:41:09 -08:00
Renan DelValle
a09a18ea3b
Stop retrying if we find a permanent url error. (#85)
* Detecting if the transport error was not temporary in which case we stop retrying. Changed bug where get results was being called before we checked for an error.

* Adding exception for EOF error. All EOF errors will be retried.

* Addressing race conditions that may happen when client is closed or connection is re-established.

* Adding documentation about how this particular implemantion of the realis client uses retries in scenarios where a temporary error is found.
2018-11-01 17:00:03 -07:00
Renan DelValle
6762c1784b
Bug fix: get quota and set quota would not retry if an error was hit. (#84) 2018-10-29 14:56:24 -07:00
Renan DelValle
fa5133c13d
Test coverage improvement (#83)
* Adding tests for getPendingReasons and startMaintenance.

* Added tests for ThriftBinary and ThriftJSON.

* Adding test for NOOP Logger.
2018-10-28 19:16:44 -07:00
JC Martin
5de913493c Add Start Maintenance and Get Pending Reason (#82)
* Add startMaintenance

* Add getPendingReason
2018-10-26 11:38:03 -07:00
Renan DelValle
2306d6180f
Adding force Implicit and force Explicit recon to gorealis. (#81) 2018-10-22 16:43:35 -07:00
Renan DelValle
e0f33ab60e
Bumping up the version number advertised by gorealis to the scheduler. 2018-10-05 08:09:30 -07:00
Renan DelValle
149d03988c
Sample Client cleanup, misc cleanup (#74)
* Changing print + os.exit to log.Fatal. Leaving a TODO to move documentation to interface.
2018-10-04 11:28:32 -07:00
Renan DelValle
037c636d6d
Retry switch fallthrough fix and create multiple tests (#77)
* Bugfix: switch statements were missing fallthrough statement thus making them retry non-retriable errors. Using a list to catch cases now.

* Adding tests for CreateService, createService when the executor doesn't exist, and createJob when the executor doesn't exist. Renamed Pulse test to reflect that it's using CreateService instead of CreateJob.

* Repsonse propagate back up to caller for context for CreateJob, CreateService, and StartJobUpdate.

* Deleting PR template as Travis CI takes care of running tests and formatting tests now.
2018-10-04 10:47:08 -07:00
Renan DelValle
9ebf118e71
Create job bevaviour does not override default batch size. (#75) 2018-09-25 16:37:17 -07:00
Renan DelValle
5099d7e6ec
Adding force snapshot and force backup APIs (#73)
* Adding force snapshot and force backup APIs.
2018-09-14 15:04:16 -07:00
Ezequiel Torres Feyuk
fe567ee966 Task query optional parameters (#69)
* Change TaskQuery struct parameters to optional

* Thrift API is modified to make all the parameters in the
  TaskQuery struct optional

* Autogenerated code is regenerated

* Changes in TaskQuery structs used in the project

* Now that TaskQuery receive optional values, pointers
  instead of values must be passed to the struct
2018-06-28 11:48:28 -07:00
Renan DelValle
6c8ab10b64 Merge develop branch into master (#68)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default.
Info logger is enabled by default but prints out less information.

* Removing OK Aurora acknowledgment.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Changing %v to %+v for composite structs. Removing a repetitive statement for the Aurora return code.

* Removing another superflous debug statement.

* Removing a leftover helper function from before we changed how we configured the client.

* Changing the logging paradigm to only require a single logger. All logging will be disabled by default. If debug is enabled, and a logger has not been set, the library will default to printing all logging (INFO and DEBUG) to the stdout.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode.

* Removing port override as it is not needed

* Changing code comments to reflect getting rid of port override.

* Adding port override back in.

* Bug fix: Logger was being set to NOOP despite no logger being provided when debug mode is turned on.

* Turn on logging by default.

* Removing option to override schema and ports for information found on Zookeeper.

* Turning off debug mode for tests because it's too verbose. Making sure LevelLogger is initialized correctly under all scenarios.

* Removing override fields for zk config.

* Remove space.

* Removing info that is now incorrect about zk options.
2018-06-22 12:57:21 -07:00
Renan DelValle
4f5766b443
Misc. bug fixes and addition of debug logging (#61)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default. If debug is turned on, but a logger has not been assigned, a default logger that will print to STDOUT will be created.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Removing a leftover helper function from before we changed how we configured the client.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode in the sample client.
2018-04-13 11:03:29 -07:00
Robert Allen
c0d2969976 Adding Admin Client calls GetQuota & SetQuota (#59)
* Adding Admin Client calls `GetQuota` & `SetQuota`

This change set adds admin client calls to fetch and
mutate the OwnerRole quota[cpu,ram,disk].
2018-03-07 16:24:27 -08:00
Renan DelValle
3d62df1684
* Errors have been refactored.
* ZK retries have been cleaned up. We will now retry after every error
EXCEPT when we have a badly formed path.
* ZK library has been reworked with optional arguments pattern to not be
so intertwined with the cluster.json file.
* Timeout error has been re-implemented as RetryError. RetryError
behaves like a Timeout error but is used exclusively to add more context
privately. This allows us to have unit tests that check our retry
mechanism is actually retrying.
* Additional logging has been added to retry mechanisms as well as to
the Zookeeper library we use.
2018-03-03 14:08:04 -08:00
Sivaram Mothiki
dc327bebad change config for certs path (#57)
Bug fix. Changing hardcoded certificate to one provided by configuration object.
2018-03-02 15:21:45 -08:00
Renan DelValle
a43dc81ea8
Simplifying retry mechanism for Thrift Calls (#56)
* Deleting permament error as it doesn't make sense. Just return a plain old error and that will be considered permanent.

* Removing double closure at as it's unmaintainable and can be error prone. Separated back offs into a generic one and a thrift call specific one.

* ZK leader finder now returns a temporary error instead of constantly no leader found and quitting. It could be that the leader info is being propagated so it's worth trying another time.

* Adding more logging to the retry.

* Wrapping lock and unlock in an anonymous function so that we can use defer on unlock such that it is called in the case of a panic.
2018-02-15 15:16:39 -08:00
Renan DelValle
64948c3712
Backoff mechanism fix (#54)
* Fixing logic that can lead to nil error being returned and retry stopping early.

* Fixing possible code path that may lead to an incorrect nil error.
2018-02-06 12:44:27 -08:00
kkrishna
a6b077d1fd Aurora jobupdate functionality -- pause/resume/pulse api (#55)
* Adding GetJobs api

* Adding Aurora pause/resume/pulse api
2018-02-06 12:39:02 -08:00
kkrishna
8bd3957247 GetJobs api (#53)
* GetJobs API added
2018-01-27 10:33:55 -08:00
Renan DelValle
dbb08ded90
Simplifying KillJob API implementation (#52)
By submitting an empty set to the thrift call, the Aurora scheduler will look for active tasks itself.
2018-01-24 15:37:12 -08:00
Renan DelValle
a941bcb679
Thread safety, misc fixes, and refactoring (#51)
* Changing incorrect license in some source files.

* Changing CreateService to mimic CreateJob by setting the batch size to the instance count.

* Changing Getcerts to GetCerts to match the style of the rest of the codebase.

* Overhauled error handling. Backoff now recognizes temporary errors and continues to retry if it finds one.

* Changed thrift function call wrapper to be more explicitly named and to perform more safety checks.

* Moved Jitter function from realis to retry.

* API code is now more uniform and follows a certain template.

* Lock added whenever a thrift call is made or when a modification is done to the connection. Note that calling ReestablishConn externally may result in some race conditions. We will move to make this function private in the near future.

* Added test for Realis session thread safety. Tested ScheduleStatus monitor. Tested monitor timing out.

* Returning nil whenever there is an error return so that there are no ambiguities.

* Using defer with unlock so that the lock is still released if a panic is invoked.
2018-01-21 19:30:01 -08:00
Renan DelValle
b2ffb73183
Introducing temporary errors. Refactored reestablish connection code … (#50)
* Introducing temporary errors. 

* Refactored reestablish connection code to use NewClient.

* Added reestablish connection test to end to end tests.
2018-01-16 14:35:01 -08:00
PRADYUMNA KAUSHIK
9631aa3aab Specify field names when initializing structs (#47)
* Added field names to struct initializations.
2017-12-23 10:33:42 -08:00
Renan DelValle
c338c03355
Thrift API update and Pull Request template. (#43)
* Adding a Pull request template to serve as a reminder of what
to do before creating the pull request.

* Updating our thrift API to match changes made by Aurora.

* Update go bindings to match update thrift API.
2017-12-14 14:37:08 -08:00
Sivaram Mothiki
d4027bc95c make insecureskipverify configurable (#40)
* make inseucreskipverify configurable

* add insecure and certspath to configs

* add certs test

* add config support for client key and cert
2017-12-12 14:04:11 -08:00