234 commits

Author SHA1 Message Date
Renán I. Del Valle
5ec22fab98
Restoring location of r.Close() in retry mechanism since the move created a deadlock. (#122)
Moving the r.Close() call in the retry mechanism created a deadlock since r.Close() also uses the client lock to avoid multiple routines closing at the same time.

This commit reverts that change.
2020-05-27 12:36:52 -07:00
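The deadlock described in the commit above can be sketched as follows. This is a minimal illustration, not the library's code: the `client` type, its fields, and the `reconnect` method are hypothetical names, and the only assumption carried over from the commit message is that `Close()` takes the same lock as the retry path.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for the real client type.
type client struct {
	mu     sync.Mutex
	closed bool
}

// Close takes the client lock so that only one goroutine closes at a time.
func (c *client) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.closed = true
}

// reconnect shows the safe ordering: Close runs before the retry path
// takes the lock itself.
func (c *client) reconnect() {
	c.Close() // acquires and releases c.mu on its own

	c.mu.Lock()
	defer c.mu.Unlock()
	// Calling c.Close() at this point instead would block forever:
	// sync.Mutex is not reentrant and this goroutine already holds c.mu.
	c.closed = false // pretend a fresh connection was established
}

func main() {
	c := &client{}
	c.reconnect()
	fmt.Println("reconnected, closed =", c.closed)
}
```

Because Go's `sync.Mutex` cannot be locked twice by the same goroutine, any method that locks must be called outside the caller's own critical section.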
Renan DelValle
f196aa9ed7 Fixing some cosmetic issues and a potential race condition. 2020-05-26 20:40:09 -07:00
Renan DelValle
bb5408f5e2 Bumping up Thrift Version to v0.13.2 forked as v0.13.1 contains a bug. 2020-05-26 20:40:09 -07:00
Renán I. Del Valle
ea8e48f3b8
Allow users to define what extensions CA certs will have (#120)
* Allow users to define what extensions CA certs will have. Skip any files that don't have the right extension.
2020-02-26 08:24:41 -08:00
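Skipping files by extension, as the commit above describes, might look like the sketch below. The function name `filterByExt` and the sample file names are hypothetical; the real option name and signature in the library are not shown in this log.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// filterByExt returns the names whose extension is in allowed.
// Extensions include the leading dot and are compared case-insensitively.
func filterByExt(names []string, allowed ...string) []string {
	set := make(map[string]struct{}, len(allowed))
	for _, ext := range allowed {
		set[strings.ToLower(ext)] = struct{}{}
	}

	var kept []string
	for _, name := range names {
		if _, ok := set[strings.ToLower(filepath.Ext(name))]; ok {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	files := []string{"ca.crt", "README.md", "root.pem", "notes.txt"}
	fmt.Println(filterByExt(files, ".crt", ".pem"))
}
```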
Renán I. Del Valle
3dc3b09a8e
Point to temporary Thrift fork while we wait for 0.14.0 to be released (#118)
* Updating readme to reflect changes made to the Aurora Scheduler project.

* Changing dependency of mod to point to forked version of the Thrift library while 0.14.0 is released.
2020-02-18 14:18:13 -08:00
Renan I. Del Valle
3fa2a20fe4
Thrift Upgrade to v0.13.0 (#117)
* Removing go.sum file as it's no longer required as of go1.13.

* Removing unnecessary client command.

* Bumping up thrift version to v0.13.0
2020-02-12 12:31:56 -08:00
Renan I. Del Valle
c6a2a23ddb
Changing how constraints are handled internally (#115)
* Updating Changelog to reflect what's changing in 1.22.1

* Bug fix: Setting the same constraint multiple times is no longer allowed.

* Constraints map has been added to handle constraints being added to Aurora Jobs.

* Lowering timeout to avoid flaky test for bad payload timeout.

* Adding attributes to Mesos agents in order to test limits by constraint.

* Make two instances schedulable per zone in order to experience flaky behavior.
2020-01-15 08:21:12 -08:00
Renan I. Del Valle
9da3b96b1f Moving future to final 0.22.0 release and Mesos 1.6.2 (#114)
Changes in compose testing setup:
* Upgrading Aurora to 0.22.0
* Upgrading Mesos to 1.6.2
2020-01-14 15:50:10 -08:00
Renan I. Del Valle
976dc26dcc Adding autopause APIs to future (#110)
* Updating thrift definitions to add autopause for batch based update strategies.

* Adding batch calculator utility and test cases for it.

* Adding PauseUpdateMonitor which allows users to poll Aurora for information on an active Update being carried out until it enters the ROLL_FORWARD_PAUSED state.

* Tests for PauseUpdateMonitor and VariableBatchStep added to the end to end tests.

* Adding TerminalUpdateStates function which returns a slice containing all terminal states for an update. Changed signature of JobUpdateStatus from using a map for desired states to a slice. A map is no longer necessary with the new version of thrift and only adds complexity.
2020-01-14 15:50:10 -08:00
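A batch calculator for variable batch updates could be sketched roughly as below. This is not the utility added in the commit above: the name `currentBatch` is hypothetical, and the sketch assumes that once the list of batch sizes is exhausted, the last size is reused for the remaining instances.

```go
package main

import "fmt"

// currentBatch returns the zero-based index of the batch that is in
// progress after `updated` instances have finished updating.
// Assumption: the final size repeats once the sizes list runs out.
func currentBatch(updated int, sizes []int) int {
	if len(sizes) == 0 {
		return 0
	}
	batch, remaining := 0, updated
	for {
		size := sizes[len(sizes)-1]
		if batch < len(sizes) {
			size = sizes[batch]
		}
		// Stop on a non-positive size to avoid looping forever.
		if size <= 0 || remaining < size {
			return batch
		}
		remaining -= size
		batch++
	}
}

func main() {
	sizes := []int{1, 2, 3}
	for _, updated := range []int{0, 1, 3, 6} {
		fmt.Println(updated, "updated -> batch", currentBatch(updated, sizes))
	}
}
```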
Renan DelValle
fe692040aa Variable Batch Update Support (#100)
* Changing generateBinding.sh check to check for thrift 0.12.0 and adding support for Variable Batch updates.

* Adding update strategies change to changelog, changed docker-compose to point to aurora 0.22.0 snapshot. Added test coverage for update strategies.
2020-01-14 15:50:10 -08:00
Renan DelValle
0b2dd44d94 Increasing aurora version for future branch. 2020-01-14 15:50:10 -08:00
Renan DelValle
df8fc2fba1
Documentation and linting improvements (#108)
* Simplifying documentation for getting started: Removed outdated information about installing Golang on different platforms and instead included a link to the official Golang website, which has more up-to-date information. Instructions for installing docker-compose have also been added.

* Added documentation to all exported functions and structs.

* Unexported some structures and functions that were needlessly exported.

* Adding golang CI default configuration which can be useful while developing and may be turned on later in the CI.

* Moving build process in CI to xenial.

* Reducing line size in some files and shadowing in some test cases.
2019-06-12 11:22:59 -07:00
Renan DelValle
6dc4bf93b9
Retry temporary errors by default (#107)
* Adding Aurora URL validator in order to handle scenarios where incomplete information is passed to the client. The client will do its best to guess the missing information such as protocol and port.

* Upgraded to testify 1.3.0.

* Added configuration to fail on a non-temporary error. This reverts to the original behavior of the retry mechanism, but now the user can opt in to failing on a non-temporary error.
2019-06-11 11:47:14 -07:00
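Guessing a missing protocol and port, as the URL validator in the commit above is described as doing, might be sketched like this. The function name `completeURL` and both default values are assumptions made for illustration; the log does not state which defaults the client actually uses.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

const (
	defaultScheme = "http" // assumed default, not taken from the library
	defaultPort   = "8081" // assumed default, not taken from the library
)

// completeURL fills in a scheme and port when the caller left them out.
func completeURL(raw string) (string, error) {
	// Add the scheme first: url.Parse would otherwise read "host:8081"
	// as scheme "host" followed by opaque data.
	if !strings.Contains(raw, "://") {
		raw = defaultScheme + "://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	if u.Port() == "" {
		u.Host = net.JoinHostPort(u.Hostname(), defaultPort)
	}
	return u.String(), nil
}

func main() {
	for _, in := range []string{"localhost", "example.com:9000", "https://example.com/api"} {
		out, err := completeURL(in)
		fmt.Println(in, "->", out, err)
	}
}
```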
Renan DelValle
4ffb509939
Adding go mod files to v1 (#106)
* Declaring dependencies using go mod.
2019-05-06 11:33:14 -07:00
Renan DelValle
1a15c4a5aa
V1 CreateService and StartJobUpdate Timeout signal and cleanup (#105)
* Bumped up version to 1.21.1

* Moving admin functions to a new file. They are still part of the same pointer receiver type.

* Removing dead code and fixing some comments to add a space between the slashes and the comment text.

* Adding set up and tear down to run tests script. It sets up a pod, runs all tests, and then tears down the pod.

* Added `--rm` to run tests Mac script.

* Removing cookie jar from transport layer as it's not needed.

* Changing all error messages to start with a lower case letter. Changing some messages around to be more descriptive.

* Adding an argument to allow the retry mechanism to stop if a timeout has been encountered. This is useful for mutating API calls. Only StartUpdate and CreateService stop at timeout by default.

* Added 2 tests for when a call goes through despite the client timing out. One is with a good payload, one is with a bad payload.

* Updating changelog with information about the error type returned.

* Adding test for duplicate metadata.

* Refactored JobUpdateStatus monitor to use a new monitor called JobUpdateQuery. The update monitor now continues even if it does not find an update to monitor. Furthermore, it has been optimized to reduce the payloads returned by the scheduler as much as possible. This is done by using the GetJobUpdateSummaries API instead of JobUpdateDetails and by including the statuses we're searching for as part of the query.

* Added documentation as to how to handle a timeout on an API request.

* Optimized GetInstancesIds to create a copy of the JobKey being passed down in order to avoid unexpected behavior. Instead of setting every variable name separately, now a JobKey array is being created.
2019-05-05 11:46:22 -07:00
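The stop-at-timeout argument described in the commit above can be illustrated with the sketch below. It is a simplified loop under stated assumptions: `retry`, `errTimedOut`, and `timeoutErr` are hypothetical names, and backoff between attempts is omitted. The idea is that a mutating call may have reached the server even though the client timed out, so retrying blindly could apply it twice.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// errTimedOut is a hypothetical sentinel telling the caller that the
// request may still have gone through on the server side.
var errTimedOut = errors.New("timed out; the call may have reached the server")

// retry runs call up to attempts times. With stopOnTimeout set, it gives
// up on the first timeout instead of re-sending the request.
func retry(attempts int, stopOnTimeout bool, call func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(); err == nil {
			return nil
		}
		var netErr net.Error
		if stopOnTimeout && errors.As(err, &netErr) && netErr.Timeout() {
			return fmt.Errorf("%w: %v", errTimedOut, err)
		}
	}
	return err
}

// timeoutErr is a fake error that satisfies net.Error for the demo.
type timeoutErr struct{}

func (timeoutErr) Error() string   { return "i/o timeout" }
func (timeoutErr) Timeout() bool   { return true }
func (timeoutErr) Temporary() bool { return true }

func main() {
	calls := 0
	err := retry(3, true, func() error { calls++; return timeoutErr{} })
	fmt.Println("calls:", calls, "timed out:", errors.Is(err, errTimedOut))
}
```

A caller that receives the timeout error would then query the scheduler to find out whether the job or update was actually created before deciding to re-send.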
Renan DelValle
e16e390afe
1.21.0 (formerly 1.4.0) release 2019-03-15 15:15:37 -07:00
Renan DelValle
f7bd7cc20f
Bug fix for metadata duplicates as well as un-initialized GPU re… (#103)
* Fix for metadata duplicates as well.
* Fix for un-initialized GPU resource when creating a new job update.
2019-03-15 15:10:31 -07:00
Renan DelValle
c997b90720
Adding future branch to testing. 2019-03-15 12:17:43 -07:00
Renan DelValle
773d842b03
Adding missing GPU to Job interface. 2019-03-05 11:43:50 -08:00
Renan DelValle
1f459dd56a
Adds support for Tier and SlaPolicy to the Job interface (#99)
* Adding parameter for Aurora so that we're able to run SLA aware updates with less than 20 instances. Lowered time it takes to run test by reducing watch time per instance as well.

* Reducing the number of instances and time for SLA aware instances in docker-compose set up.

* Adding another Mesos agent to the docker-compose setup.

* Huge thanks to @zircote for this contribution.
2019-02-20 16:36:50 -08:00
Renan DelValle
79fa7ba16d
Upgrading gorealis v1 to Thrift 0.12.0 code generation. End to end tests cleanup (#96)
* Ported all code from Thrift 0.9.3 to Thrift 0.12.0 while backporting some fixes from gorealis v2

* Removing git.apache.org dependency from Vendor folder as this dependency has migrated to github.

* Adding the Thrift dependency back, now pointing to github.com

* Removing unnecessary files from Thrift Vendor folder and adding them to .gitignore.

* Updating dep dependencies to include Thrift 0.12.0 from github.com

* Adding changelog.

* End to end tests: Adding coverage for killinstances.

* End to end tests: Deleting instances after partition policy recovers them.

* End to end tests: Adding more coverage to the realis API.

* End to end tests: Allowing arguments to be passed to runTestMac so that '-run <test name>' can be passed in.

* End to end tests: Reducing the resources used by CreateJob test.

* End to end tests: Adding coverage for Pause and Resume update.

* End to end tests: Removed checks for Aurora_OK response as that should always be handled by the error returned by the API. Changed names to be less verbose and repetitive.

* End to end tests: Reducing watch time for instance running when creating service for reducing time it takes to run end to end test.
2019-02-20 11:11:46 -08:00
Renan DelValle
2b7eb3a852
Making abort job synchronous (#95)
* Making abort job synchronous to avoid scenarios where kill is received before job update lock is released.
* Adding missing cases for terminal update statuses to JobUpdate monitor.
* Monitors now return errors which provide context through behavior.
* Adding notes to the doc explaining what happens when AbortJob times out.
2019-01-15 14:55:59 -08:00
Renan DelValle
10c620de7b
Fixing logger not unrolling variadic argument when appending to the front of it. 2019-01-11 12:20:01 -08:00
Renan DelValle
1d3854aa5f
Trace level for logger (#94)
* Add trace level to print out response thrift objects. Allows user to control whether these are printed or not to avoid pollution.

* Using named parameters to be more explicit about what is being set for LevelLogger.

* Adding TracePrint and TracePrintln. Inlined library level prefixes.
2019-01-10 16:58:59 -08:00
Renan DelValle
73e7ab2671
Releasing version 1.3.1 2019-01-08 15:57:19 -08:00
Renan DelValle
22b1d82d88
Bug fix for logger interface. Variadic arguments need to be unrolled when passed to print functions. 2019-01-08 15:37:25 -08:00
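The variadic issue named in the commit above is a common Go pitfall and can be shown in a few lines. The helper names `withPrefix` and `debugLine` are hypothetical and only illustrate the pattern; they are not the library's logger.

```go
package main

import "fmt"

// withPrefix puts a level prefix in front of the caller's arguments.
func withPrefix(prefix string, a ...interface{}) []interface{} {
	// a... unrolls the slice into individual elements. Writing
	// append(..., a) would add the whole slice as a single element.
	return append([]interface{}{prefix}, a...)
}

// debugLine formats its arguments the way a Println-style method would.
func debugLine(a ...interface{}) string {
	// The trailing ... matters here as well: without it Sprintln would
	// receive one []interface{} value and print it in brackets.
	return fmt.Sprintln(withPrefix("[DEBUG]", a...)...)
}

func main() {
	fmt.Print(debugLine("retrying", 3))
}
```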
Renan DelValle
2f7015571c
Adding support for setting GPU as a resource. (#93)
* Adding support for setting GPU as a resource.
* Refactoring pulse update test.
2019-01-08 15:11:52 -08:00
Robert Allen
296af622d1 This adds the following function to the PartitionPolicy configuration to the Job interface (#91)
* Adding Partition Policy API
2018-12-20 14:38:06 -08:00
Renan DelValle
9a835631b2
Running goimports on all repository to conform to newest goimports. 2018-12-19 15:33:35 -08:00
Renan DelValle
b100158080
Updating Travis CI config file to include running CI on master-v2.0 branch 2018-12-19 15:30:22 -08:00
Renan DelValle
45a4416830
Adding .gitattributes to ignore generated files. 2018-12-03 16:09:46 -08:00
Renan DelValle
2eaa60f681
Support Drain SLA API (#88)
* Bringing thrift API up to date with Aurora 0.21.0.

* Adding support for SLA Drain Host API.
2018-11-16 11:41:09 -08:00
Renan DelValle
a09a18ea3b
Stop retrying if we find a permanent url error. (#85)
* Detecting if the transport error was not temporary, in which case we stop retrying. Fixed bug where get results was being called before we checked for an error.

* Adding exception for EOF error. All EOF errors will be retried.

* Addressing race conditions that may happen when client is closed or connection is re-established.

* Adding documentation about how this particular implementation of the realis client uses retries in scenarios where a temporary error is found.
2018-11-01 17:00:03 -07:00
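The retry decision described in the commit above (stop on a permanent URL error, always retry EOF) could be sketched like this. `shouldRetry` and the two helper constructors are hypothetical; retrying every other kind of error by default is an assumption of the sketch, not something the log states.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/url"
)

// shouldRetry reports whether a failed call is worth another attempt.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	// EOF is always retried, even when wrapped in a *url.Error.
	if errors.Is(err, io.EOF) {
		return true
	}
	// A *url.Error is retried only if it reports itself as temporary.
	var urlErr *url.Error
	if errors.As(err, &urlErr) {
		return urlErr.Temporary()
	}
	return true // assumption: everything else is retried
}

// eofURLError and permanentURLError build sample errors for the demo.
func eofURLError() error {
	return &url.Error{Op: "Post", URL: "http://localhost:8081", Err: io.EOF}
}

func permanentURLError() error {
	return &url.Error{Op: "Post", URL: "http://localhost:8081", Err: errors.New("certificate signed by unknown authority")}
}

func main() {
	fmt.Println("EOF:", shouldRetry(eofURLError()))
	fmt.Println("permanent:", shouldRetry(permanentURLError()))
}
```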
Renan DelValle
6762c1784b
Bug fix: get quota and set quota would not retry if an error was hit. (#84) 2018-10-29 14:56:24 -07:00
Renan DelValle
fa5133c13d
Test coverage improvement (#83)
* Adding tests for getPendingReasons and startMaintenance.

* Added tests for ThriftBinary and ThriftJSON.

* Adding test for NOOP Logger.
2018-10-28 19:16:44 -07:00
JC Martin
5de913493c Add Start Maintenance and Get Pending Reason (#82)
* Add startMaintenance

* Add getPendingReason
2018-10-26 11:38:03 -07:00
Renan DelValle
2306d6180f
Adding force Implicit and force Explicit recon to gorealis. (#81) 2018-10-22 16:43:35 -07:00
Renan DelValle
231793df71
Adding a separate function to add dedicated attributes. (#80)
Dedicated wrapper for "dedicated" constraints
2018-10-11 09:43:35 -07:00
Renan DelValle
e0f33ab60e
Bumping up the version number advertised by gorealis to the scheduler. 2018-10-05 08:09:30 -07:00
Renan DelValle
9dcb7a8969
Moving the Codecov badge to right beside the Travis CI badge. 2018-10-05 08:09:05 -07:00
Renan DelValle
4395c2ae1a
Code coverage (#79)
* Turning on code coverage from Codecov.
2018-10-05 07:57:19 -07:00
Renan DelValle
70252ffacf Updating Aurora compatibility in anticipation of next release. 2018-10-04 18:46:27 -07:00
Renan DelValle
4963bbb922 Sharing layers in docker compose between agent and master. 2018-10-04 18:46:27 -07:00
Renan DelValle
149d03988c
Sample Client cleanup, misc cleanup (#74)
* Changing print + os.exit to log.Fatal. Leaving a TODO to move documentation to interface.
2018-10-04 11:28:32 -07:00
Renan DelValle
037c636d6d
Retry switch fallthrough fix and create multiple tests (#77)
* Bugfix: switch statements were missing fallthrough statement thus making them retry non-retriable errors. Using a list to catch cases now.

* Adding tests for CreateService, createService when the executor doesn't exist, and createJob when the executor doesn't exist. Renamed Pulse test to reflect that it's using CreateService instead of CreateJob.

* Response is propagated back up to the caller for context for CreateJob, CreateService, and StartJobUpdate.

* Deleting PR template as Travis CI takes care of running tests and formatting tests now.
2018-10-04 10:47:08 -07:00
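The switch bug fixed in the commit above comes from a Go rule that differs from C: a `case` with an empty body does nothing and does not fall through to the next case. The sketch below uses hypothetical response codes to contrast the buggy shape with the list form the commit mentions.

```go
package main

import "fmt"

// code is a hypothetical response code type used only for this sketch.
type code int

const (
	invalidRequest code = iota
	authFailed
	transientError
)

// retriableBuggy mirrors the bug: the empty invalidRequest case does not
// fall through, so that code is wrongly treated as retriable.
func retriableBuggy(c code) bool {
	switch c {
	case invalidRequest:
	case authFailed:
		return false
	}
	return true
}

// retriable lists both non-retriable codes in a single case.
func retriable(c code) bool {
	switch c {
	case invalidRequest, authFailed:
		return false
	}
	return true
}

func main() {
	fmt.Println("buggy:", retriableBuggy(invalidRequest))
	fmt.Println("fixed:", retriable(invalidRequest))
}
```

Listing the values in one `case` avoids relying on explicit `fallthrough` statements, which are easy to forget when a new code is added.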
Renan DelValle
9ebf118e71
Create job behaviour does not override default batch size. (#75) 2018-09-25 16:37:17 -07:00
Renan DelValle
e85781e6d4
Upgrade Aurora to 0.21.0 and Mesos to 1.5.1 for compose setup. 2018-09-14 16:38:05 -07:00
Renan DelValle
5099d7e6ec
Adding force snapshot and force backup APIs (#73)
* Adding force snapshot and force backup APIs.
2018-09-14 15:04:16 -07:00
Renan DelValle
0f2ece10ac
Ignoring vendor folder when checking for goimports failure. 2018-09-13 17:22:04 -07:00
Renan DelValle
ad0da8c867
Adding goimports check. From here on in, any PR that doesn't pass goimports will fail the CI build. 2018-09-13 17:14:38 -07:00