Commit graph

109 commits

Author SHA1 Message Date
lawwong1
d2fd7b9ba9
merge retry mechanism change from gorealis v1 to gorealis v2 (#21) 2023-01-26 13:36:40 -08:00
Tan N. Le
e33a2d99d8
release 2.28.0 (#19) 2022-08-02 09:55:36 -07:00
Tan N. Le
4258634ccf
Capacity report (#18)
- pull capacity report via /offers endpoint.
- calculate how many tasks (with resource and constraints) can be fit in the cluster.
examples of using the above 2 features are in aurora-scheduler/australis#33
2022-07-28 19:27:53 -07:00
lenhattan86
b1661698c2
GetJobSummary API (#8)
* Adds GetJobSummary API
2021-01-12 16:18:09 -08:00
Renan DelValle
851f9686b6
Bumping up version. 2020-05-07 11:35:22 -07:00
Renán I. Del Valle
02710e5434
Moving repository to aurora-scheduler organization. (#2)
gorealis v2 will now live in the aurora-scheduler organization
2020-02-19 11:40:40 -08:00
Renan DelValle
4fc4953ec4 Change JobUpdateKey pointers to be literals, then we deep copy the JobKey pointer to a new JobKey in order to avoid side effects. 2019-09-25 17:20:30 -07:00
Renan DelValle
119d1c429b Moving client configuration options from realis to its own sepearate file to make code more digestible. 2019-09-25 17:20:30 -07:00
Renan DelValle
a8a7cf779f Splitting realis into regular API and admin API files. 2019-09-25 17:20:30 -07:00
Renan DelValle
f72fdacfb0 Changing the names of the protocol constants to be more descriptive. 2019-09-25 17:20:30 -07:00
Renan DelValle
55cf9bcb70 Adding more fine grained controls to retry mechanism. Retry mechanism may now be configured to not retry if an error is hit or to specifically stop retrying if a timeout error is encountered. 2019-09-25 17:20:30 -07:00
Renan DelValle
fe4a0dc06e Minor error message clarification 2019-09-25 17:20:30 -07:00
Renan DelValle
d67b8ca1d7 Removing uncessary functions which previously handled initializing thrift protocol. Changed how which protocol is chosen based upon configuration. 2019-09-25 17:20:30 -07:00
Renan DelValle
ecd59f7a8d Removing unnecessary cookie jar from thrift protocol initialization. 2019-09-25 17:20:30 -07:00
Renan DelValle
5d75dcc15e Adding MonitorJobUpdateQuery which serves as the basis for other monitors. 2019-09-25 17:20:30 -07:00
Renan DelValle
9a70711537 Making JobUpdate synchronous. MonitorJobUpdateStatus creates a local copy of job key in order to guard against side effects cuased by mutations to the JobKey being performed externally. 2019-09-25 17:20:30 -07:00
Renan DelValle
203f178d68 Changing error messages to be lower case in realis API. 2019-09-25 17:20:30 -07:00
Renan DelValle
04471c6918 Adding trace logging. 2019-09-25 17:20:30 -07:00
Renan DelValle
461b23400c
V2.0 thrift repository migration and cleanup (#98)
* Remove unnecessary files from the thrift repository that come along with the go library.

* Updating thrift generated code to be 0.12.0 final generated code.

* Remove git.apache.org dependency in vendor folder.

* Migrating from git.apache.org/thrift.git to github.com/apache/thrift

* Upgrading dep (although it will not work now that imports are using mod format, it allows for users to easily fix this with a replacement of the import path).

* Upgrading mod dependencies for Thrift to point to github.com location of the repository.

* Bug fix for Thermos Payload generation relating to the GPU being set.
2019-02-19 16:40:41 -08:00
Renan DelValle
8d67d8c2f3
Releasing 2.0.1. 2019-01-08 15:59:56 -08:00
Renan DelValle
e13349db26
Initial support for Thermos and GPU resources. 2019-01-07 14:39:47 -08:00
Renan DelValle
51597ecb32
Changing paths to refer to gorealis v2 in order for dependencies to be correct. 2018-12-27 10:09:22 -08:00
Renan DelValle
56b325ed80
Aurora endpoint may now be explicitly provided with or without protocol and with or without port. 2018-12-17 18:00:20 -08:00
Renan DelValle
992e52eba2
Changing realis API to use new JobUpdate struct and to use concrete JobKey types. 2018-12-12 14:13:45 -08:00
Renan DelValle
5836ede37b
Splitting off Aurora task from Aurora Job since Update mechanism only needs task. 2018-12-10 18:57:16 -08:00
Renan DelValle
76300782ba
Renaming RealisClient to Client to avoid stuttering. Moving monitors under Client. Making configuration object private. Deleted legacy code to generate configuration object. 2018-12-08 08:57:15 -08:00
Renan DelValle
54378b2d8a
Changing the signature for some API. Specifically, result objects that hold a single variable are now returning that variable instead of a result object. Tests have been refcatored to use new v2 API. All tests are currently passing. 2018-11-28 20:13:49 -08:00
Renan DelValle
59e3a7065e
Refactoring code to be compatible with Thrift 0.12.0 generated code. Tests are still not refactored. 2018-11-27 18:45:10 -08:00
Renan DelValle
d747a48626
Simplifying API. Many API calls have gone from a tuple of two returns to a single return. 2018-11-22 12:23:18 -08:00
Renan DelValle
8a9a97c150
Removing unnecessary interface from Aurora Job. 2018-11-22 12:22:26 -08:00
Renan DelValle
1146736c2b
Refactoring variable names and variable types to saner versions. 2018-11-22 12:22:25 -08:00
Renan DelValle
c65a47f6e2
Changing Certspath to CertsPath 2018-11-22 12:22:25 -08:00
Renan DelValle
4471c62659
Removing retries as an option since it's a dup of Backoff. 2018-11-22 12:22:25 -08:00
Renan DelValle
a23bd1b2cc
Shedding interface because there is no good reason to have it. 2018-11-22 12:22:22 -08:00
Renan DelValle
2eaa60f681
Support Drain SLA API (#88)
* Bringing thrift API up to date with Aurora 0.21.0.

* Adding support for SLA Drain Host API.
2018-11-16 11:41:09 -08:00
Renan DelValle
a09a18ea3b
Stop retrying if we find a permanent url error. (#85)
* Detecting if the transport error was not temporary in which case we stop retrying. Changed bug where get results was being called before we checked for an error.

* Adding exception for EOF error. All EOF errors will be retried.

* Addressing race conditions that may happen when client is closed or connection is re-established.

* Adding documentation about how this particular implemantion of the realis client uses retries in scenarios where a temporary error is found.
2018-11-01 17:00:03 -07:00
Renan DelValle
6762c1784b
Bug fix: get quota and set quota would not retry if an error was hit. (#84) 2018-10-29 14:56:24 -07:00
Renan DelValle
fa5133c13d
Test coverage improvement (#83)
* Adding tests for getPendingReasons and startMaintenance.

* Added tests for ThriftBinary and ThriftJSON.

* Adding test for NOOP Logger.
2018-10-28 19:16:44 -07:00
JC Martin
5de913493c Add Start Maintenance and Get Pending Reason (#82)
* Add startMaintenance

* Add getPendingReason
2018-10-26 11:38:03 -07:00
Renan DelValle
2306d6180f
Adding force Implicit and force Explicit recon to gorealis. (#81) 2018-10-22 16:43:35 -07:00
Renan DelValle
e0f33ab60e
Bumping up the version number advertised by gorealis to the scheduler. 2018-10-05 08:09:30 -07:00
Renan DelValle
149d03988c
Sample Client cleanup, misc cleanup (#74)
* Changing print + os.exit to log.Fatal. Leaving a TODO to move documentation to interface.
2018-10-04 11:28:32 -07:00
Renan DelValle
037c636d6d
Retry switch fallthrough fix and create multiple tests (#77)
* Bugfix: switch statements were missing fallthrough statement thus making them retry non-retriable errors. Using a list to catch cases now.

* Adding tests for CreateService, createService when the executor doesn't exist, and createJob when the executor doesn't exist. Renamed Pulse test to reflect that it's using CreateService instead of CreateJob.

* Repsonse propagate back up to caller for context for CreateJob, CreateService, and StartJobUpdate.

* Deleting PR template as Travis CI takes care of running tests and formatting tests now.
2018-10-04 10:47:08 -07:00
Renan DelValle
9ebf118e71
Create job bevaviour does not override default batch size. (#75) 2018-09-25 16:37:17 -07:00
Renan DelValle
5099d7e6ec
Adding force snapshot and force backup APIs (#73)
* Adding force snapshot and force backup APIs.
2018-09-14 15:04:16 -07:00
Ezequiel Torres Feyuk
fe567ee966 Task query optional parameters (#69)
* Change TaskQuery struct parameters to optional

* Thrift API is modified to make all the parameters in the
  TaskQuery struct optional

* Autogenerated code is regenerated

* Changes in TaskQuery structs used in the project

* Now that TaskQuery receive optional values, pointers
  instead of values must be passed to the struct
2018-06-28 11:48:28 -07:00
Renan DelValle
6c8ab10b64 Merge develop branch into master (#68)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default.
Info logger is enabled by default but prints out less information.

* Removing OK Aurora acknowledgment.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Changing %v to %+v for composite structs. Removing a repetitive statement for the Aurora return code.

* Removing another superflous debug statement.

* Removing a leftover helper function from before we changed how we configured the client.

* Changing the logging paradigm to only require a single logger. All logging will be disabled by default. If debug is enabled, and a logger has not been set, the library will default to printing all logging (INFO and DEBUG) to the stdout.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode.

* Removing port override as it is not needed

* Changing code comments to reflect getting rid of port override.

* Adding port override back in.

* Bug fix: Logger was being set to NOOP despite no logger being provided when debug mode is turned on.

* Turn on logging by default.

* Removing option to override schema and ports for information found on Zookeeper.

* Turning off debug mode for tests because it's too verbose. Making sure LevelLogger is initialized correctly under all scenarios.

* Removing override fields for zk config.

* Remove space.

* Removing info that is now incorrect about zk options.
2018-06-22 12:57:21 -07:00
Renan DelValle
4f5766b443
Misc. bug fixes and addition of debug logging (#61)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default. If debug is turned on, but a logger has not been assigned, a default logger that will print to STDOUT will be created.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Removing a leftover helper function from before we changed how we configured the client.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode in the sample client.
2018-04-13 11:03:29 -07:00
Robert Allen
c0d2969976 Adding Admin Client calls GetQuota & SetQuota (#59)
* Adding Admin Client calls `GetQuota` & `SetQuota`

This change set adds admin client calls to fetch and
mutate the OwnerRole quota[cpu,ram,disk].
2018-03-07 16:24:27 -08:00
Renan DelValle
3d62df1684
* Errors have been refactored.
* ZK retries have been cleaned up. We will now retry after every error
EXCEPT when we have a badly formed path.
* ZK library has been reworked with optional arguments pattern to not be
so intertwined with the cluster.json file.
* Timeout error has been re-implemented as RetryError. RetryError
behaves like a Timeout error but is used exclusively to add more context
privately. This allows us to have unit tests that check our retry
mechanism is actually retrying.
* Additional logging has been added to retry mechanisms as well as to
the Zookeeper library we use.
2018-03-03 14:08:04 -08:00