gorealis

Author	SHA1	Message	Date
Tan N. Le	db9bebb802	enable default sla for slaDrain (#138 )	2021-11-01 18:17:49 -07:00
Renán I. Del Valle	62df98a3c8	Bug fix for auto paused update monitor (#136 ) Returns success if the update has finished updating successfully.	2021-08-06 16:02:52 -07:00
Renan I. Del Valle	c6a2a23ddb	Changing how constraints are handled internally (#115 ) * Updating Changelog to reflect what's changing in 1.22.1 * Bug fix: Setting the same constraint multiple times is no longer allowed. * Constraints map has been added to handle constraints being added to Aurora Jobs. * Lowering timeout to avoid flaky test for bad payload timeout. * Adding attributes to Mesos agents in order to test limits by constraint. * Make two instances schedulable per zone in order to experience flaky behavior.	2020-01-15 08:21:12 -08:00
Renan I. Del Valle	976dc26dcc	Adding autopause APIs to future (#110 ) * Updating thrift definitions to add autopause for batch based update strategies. * Adding batch calculator utility and test cases for it. * Adding PauseUpdateMonitor which allows users to poll Aurora for information on an active Update being carried out until it enters the ROLL_FORWARD_PAUSED state. * Tests for PauseUpdateMonitor and VariableBatchStep added to the end to end tests. * Adding TerminalUpdateStates function which returns a slice containing all terminal states for an update. Changed signature of JobUpdateStatus from using a map for desired states to a slice. A map is no longer necessary with the new version of thrift and only adds complexity.	2020-01-14 15:50:10 -08:00
Renan DelValle	fe692040aa	Variable Batch Update Support (#100 ) * Changing generateBinding.sh check to check for thrift 0.12.0 and adding support for Variable Batch updates. * Adding update strategies change to changelog, changed docker-compose to point to aurora 0.22.0 snapshot. Added test coverage for update strategies.	2020-01-14 15:50:10 -08:00
Renan DelValle	df8fc2fba1	Documentation and linting improvements (#108 ) * Simplifying documentation for getting started: Removed outdated information about installing Golang on different platforms and instead included a link to the official Golang website which has more up to date information. Instructions for installing docker-compose have also been added. * Added documentation to all exported functions and structs. * Unexported some structures and functions that were needlessly exported. * Adding golang CI default configuration which can be useful while developing and may be turned on later in the CI. * Moving build process in CI to xenial. * Reducing line size in some files and shadowing in some test cases.	2019-06-12 11:22:59 -07:00
Renan DelValle	6dc4bf93b9	Retry temporary errors by default (#107 ) * Adding Aurora URL validator in order to handle scenarios where incomplete information is passed to the client. The client will do its best to guess the missing information such as protocol and port. * Upgraded to testify 1.3.0. * Added configuration to fail on a non-temporary error. This is reverting to the original behavior of the retry mechanism. However, this allows the user to opt to fail in a non-temporary error.	2019-06-11 11:47:14 -07:00
Renan DelValle	1a15c4a5aa	V1 CreateService and StartJobUpdate Timeout signal and cleanup (#105 ) * Bumped up version to 1.21.1 * Moving admin functions to a new file. They are still part of the same pointer receiver type. * Removing dead code and fixing some comments to add space between backslash and comment. * Adding set up and tear down to run tests script. It sets up a pod, runs all tests, and then tears down the pod. * Added `--rm` to run tests Mac script. * Removing cookie jar from transport layer as it's not needed. * Changing all error messages to start with a lower case letter. Changing some messages around to be more descriptive. * Adding an argument to allow the retry mechanism to stop if a timeout has been encountered. This is useful for mutating API calls. Only StartUpdate and CreateService stop at a timeout by default. * Added 2 tests for when a call goes through despite the client timing out. One is with a good payload, one is with a bad payload. * Updating changelog with information about the error type returned. * Adding test for duplicate metadata. * Refactored JobUpdateStatus monitor to use a new monitor called JobUpdateQuery. Update monitor will now still continue if it does not find an update to monitor. Furthermore, it has been optimized to reduce returning payloads from the scheduler as much as possible. This is through using the GetJobUpdateSummaries API instead of JobUpdateDetails and by including the statuses we're searching for as part of the query. * Added documentation as to how to handle a timeout on an API request. * Optimized GetInstancesIds to create a copy of the JobKey being passed down in order to avoid unexpected behavior. Instead of setting every variable name separately, now a JobKey array is being created.	2019-05-05 11:46:22 -07:00
Renan DelValle	1f459dd56a	Adds support for Tier and SlaPolicy to the Job interface (#99 ) * Adding parameter for Aurora so that we're able to run SLA aware updates with less than 20 instances. Lowered time it takes to run test by reducing watch time per instance as well. * Reducing the number of instances and time for SLA aware instances in docker-compose set up. * Adding another Mesos agent to the docker-compose setup. * Huge thanks to @zircote for this contribution.	2019-02-20 16:36:50 -08:00
Renan DelValle	79fa7ba16d	Upgrading gorealis v1 to Thrift 0.12.0 code generation. End to end tests cleanup (#96 ) * Ported all code from Thrift 0.9.3 to Thrift 0.12.0 while backporting some fixes from gorealis v2 * Removing git.apache.org dependency from Vendor folder as this dependency has migrated to github. * Adding github.com thrift dependency back but now it points to github.com * Removing unnecessary files from Thrift Vendor folder and adding them to .gitignore. * Updating dep dependencies to include Thrift 0.12.0 from github.com * Adding changelog. * End to end tests: Adding coverage for killinstances. * End to end tests: Deleting instances after partition policy recovers them. * End to end tests: Adding more coverage to the realis API. * End to end tests: Allowing arguments to be passed to runTestMac so that '-run <test name>' can be passed in. * End to end tests: Reducing the resources used by CreateJob test. * End to end tests: Adding coverage for Pause and Resume update. * End to end tests: Removed checks for Aurora_OK response as that should always be handled by the error returned by the API. Changed names to be less verbose and repetitive. * End to end tests: Reducing watch time for instance running when creating service for reducing time it takes to run end to end test.	2019-02-20 11:11:46 -08:00
Renan DelValle	2f7015571c	Adding support for setting GPU as a resource. (#93 ) * Adding support for setting GPU as a resource. * Refactoring pulse update test.	2019-01-08 15:11:52 -08:00
Robert Allen	296af622d1	This adds the following function to the PartitionPolicy configuration to the Job interface (#91 ) * Adding Partition Policy API	2018-12-20 14:38:06 -08:00
Renan DelValle	9a835631b2	Running goimports on all repository to conform to newest goimports.	2018-12-19 15:33:35 -08:00
Renan DelValle	2eaa60f681	Support Drain SLA API (#88 ) * Bringing thrift API up to date with Aurora 0.21.0. * Adding support for SLA Drain Host API.	2018-11-16 11:41:09 -08:00
Renan DelValle	fa5133c13d	Test coverage improvement (#83 ) * Adding tests for getPendingReasons and startMaintenance. * Added tests for ThriftBinary and ThriftJSON. * Adding test for NOOP Logger.	2018-10-28 19:16:44 -07:00
Renan DelValle	2306d6180f	Adding force Implicit and force Explicit recon to gorealis. (#81 )	2018-10-22 16:43:35 -07:00
Renan DelValle	037c636d6d	Retry switch fallthrough fix and create multiple tests (#77 ) * Bugfix: switch statements were missing fallthrough statement thus making them retry non-retriable errors. Using a list to catch cases now. * Adding tests for CreateService, createService when the executor doesn't exist, and createJob when the executor doesn't exist. Renamed Pulse test to reflect that it's using CreateService instead of CreateJob. * Response is propagated back up to the caller for context for CreateJob, CreateService, and StartJobUpdate. * Deleting PR template as Travis CI takes care of running tests and formatting tests now.	2018-10-04 10:47:08 -07:00
Renan DelValle	5099d7e6ec	Adding force snapshot and force backup APIs (#73 ) * Adding force snapshot and force backup APIs.	2018-09-14 15:04:16 -07:00
Renan DelValle	1c2b1c5079	Continuous integration through Travis CI (#71 ) * Adding Travis CI badge * Modifying end to end tests to reflect testing against docker-compose setup in Travis CI. * Adding bash script to run simple container with tests within bridge network for Mac. * Adding documentation for setting up a developer environment. * Decreasing amount of CPU needed for CreateJobWithPulse because a higher value causes Travis CI to hang.	2018-08-13 20:09:25 -07:00
Ezequiel Torres Feyuk	fe567ee966	Task query optional parameters (#69 ) * Change TaskQuery struct parameters to optional * Thrift API is modified to make all the parameters in the TaskQuery struct optional * Autogenerated code is regenerated * Changes in TaskQuery structs used in the project * Now that TaskQuery receive optional values, pointers instead of values must be passed to the struct	2018-06-28 11:48:28 -07:00
Renan DelValle	6c8ab10b64	Merge develop branch into master (#68 ) * Fixing possible race condition when passing backoff around as a pointer. * Adding a debug logger that is turned off by default. Info logger is enabled by default but prints out less information. * Removing OK Aurora acknowledgment. * Making Mutex a pointer so that there's no chance it can accidentally be copied. * Changing %v to %+v for composite structs. Removing a repetitive statement for the Aurora return code. * Removing another superfluous debug statement. * Removing a leftover helper function from before we changed how we configured the client. * Changing the logging paradigm to only require a single logger. All logging will be disabled by default. If debug is enabled, and a logger has not been set, the library will default to printing all logging (INFO and DEBUG) to the stdout. * Minor changes to demonstrate how a logger can be used in conjunction to debug mode. * Removing port override as it is not needed * Changing code comments to reflect getting rid of port override. * Adding port override back in. * Bug fix: Logger was being set to NOOP despite no logger being provided when debug mode is turned on. * Turn on logging by default. * Removing option to override schema and ports for information found on Zookeeper. * Turning off debug mode for tests because it's too verbose. Making sure LevelLogger is initialized correctly under all scenarios. * Removing override fields for zk config. * Remove space. * Removing info that is now incorrect about zk options.	2018-06-22 12:57:21 -07:00
Renan DelValle	4f5766b443	Misc. bug fixes and addition of debug logging (#61 ) * Fixing possible race condition when passing backoff around as a pointer. * Adding a debug logger that is turned off by default. If debug is turned on, but a logger has not been assigned, a default logger that will print to STDOUT will be created. * Making Mutex a pointer so that there's no chance it can accidentally be copied. * Removing a leftover helper function from before we changed how we configured the client. * Minor changes to demonstrate how a logger can be used in conjunction to debug mode in the sample client.	2018-04-13 11:03:29 -07:00
Robert Allen	c0d2969976	Adding Admin Client calls `GetQuota` & `SetQuota` (#59 ) * Adding Admin Client calls `GetQuota` & `SetQuota` This change set adds admin client calls to fetch and mutate the OwnerRole quota[cpu,ram,disk].	2018-03-07 16:24:27 -08:00
Renan DelValle	3d62df1684	* Errors have been refactored. * ZK retries have been cleaned up. We will now retry after every error EXCEPT when we have a badly formed path. * ZK library has been reworked with optional arguments pattern to not be so intertwined with the cluster.json file. * Timeout error has been re-implemented as RetryError. RetryError behaves like a Timeout error but is used exclusively to add more context privately. This allows us to have unit tests that check our retry mechanism is actually retrying. * Additional logging has been added to retry mechanisms as well as to the Zookeeper library we use.	2018-03-03 14:08:04 -08:00
Renan DelValle	64948c3712	Backoff mechanism fix (#54 ) * Fixing logic that can lead to nil error being returned and retry stopping early. * Fixing possible code path that may lead to an incorrect nil error.	2018-02-06 12:44:27 -08:00
kkrishna	a6b077d1fd	Aurora jobupdate functionality -- pause/resume/pulse api (#55 ) * Adding GetJobs api * Adding Aurora pause/resume/pulse api	2018-02-06 12:39:02 -08:00
kkrishna	8bd3957247	GetJobs api (#53 ) * GetJobs API added	2018-01-27 10:33:55 -08:00
Renan DelValle	a941bcb679	Thread safety, misc fixes, and refactoring (#51 ) * Changing incorrect license in some source files. * Changing CreateService to mimic CreateJob by setting the batch size to the instance count. * Changing Getcerts to GetCerts to match the style of the rest of the codebase. * Overhauled error handling. Backoff now recognizes temporary errors and continues to retry if it finds one. * Changed thrift function call wrapper to be more explicitly named and to perform more safety checks. * Moved Jitter function from realis to retry. * API code is now more uniform and follows a certain template. * Lock added whenever a thrift call is made or when a modification is done to the connection. Note that calling ReestablishConn externally may result in some race conditions. We will move to make this function private in the near future. * Added test for Realis session thread safety. Tested ScheduleStatus monitor. Tested monitor timing out. * Returning nil whenever there is an error return so that there are no ambiguities. * Using defer with unlock so that the lock is still released if a panic is invoked.	2018-01-21 19:30:01 -08:00
Renan DelValle	b2ffb73183	Introducing temporary errors. Refactored reestablish connection code … (#50 ) * Introducing temporary errors. * Refactored reestablish connection code to use NewClient. * Added reestablish connection test to end to end tests.	2018-01-16 14:35:01 -08:00
Renan DelValle	1c426dd363	Changing the drain monitor to match the rest of the monitors using timer and ticker. Made a generic schedule status monitor that can be used with any of the default sets provided. (#49 )	2018-01-07 13:30:02 -08:00
Renan DelValle	8d445c1c77	Moving from govendor to dep, updated dependencies (#48 ) * Moving from govendor to dep. * Making the pull request template more friendly. * Fixing awkward space in PR template. * goimports run on whole project using `goimports -w $(find . -type f -name '*.go' -not -path "./vendor/*" -not -path "./gen-go/*")` source of command: https://gist.github.com/bgentry/fd1ffef7dbde01857f66	2018-01-07 13:13:47 -08:00
PRADYUMNA KAUSHIK	9631aa3aab	Specify field names when initializing structs (#47 ) * Added field names to struct initializations.	2017-12-23 10:33:42 -08:00
Sivaram Mothiki	d4027bc95c	make insecureskipverify configurable (#40 ) * make insecureskipverify configurable * add insecure and certspath to configs * add certs test * add config support for client key and cert	2017-12-12 14:04:11 -08:00
Renan DelValle	e614e04f27	Code cleanup, added ability to attach logger, added CreateService api * Code cleanup: Deleted multiple functions which have become stale. Removed cluster example as we replaced the need to create the Cluster object. * Cleaned up ZK connection code by using the backoff function. Added a test to the end to end to test that we're getting the host correctly from ZK. Changed clusters test to be an outside package. * Added LeaderFromZKURL test to end to end tests. * Added logger to realisConfig so that users can attach their own Loggers to the client. Logger is an interface that shadows most popular logging libraries. Only Print, Println, and Printf are needed to be a realis.Logger type. Example in the client uses the std library log. * Moved most fmt.Print* calls to be redirected to user provided logger. Logger by default is a no-op logger. * Adding CreateService to realis interface. Uses the StartJobUpdate API to create services instead of the createJobs API. * Bumping up version number inside client in anticipation of new release.	2017-11-30 12:02:50 -08:00
Renan DelValle	a1350c6d55	out with the old (address) in with the new (address)	2017-10-12 17:11:01 -07:00
Renan DelValle	922e8d6b5a	Changing HostMaintenance to return a map[string]bool where true indicates success, false indicates failure to transition to the desired state.	2017-10-02 17:24:01 -07:00
Renan DelValle	3111b358fc	Host Maintenance monitor now returns a list of hosts that did enter the desired mode(s) instead of a boolean. This means the monitor can see a partial success.	2017-09-29 18:21:30 -07:00
Renan DelValle	430764f025	Added tests for draining. Run go test with an Aurora Vagrant image running to test.	2017-09-28 17:49:15 -07:00
Renan DelValle	7db2395df1	Changed from the old style of creating clients to the new closure pattern.	2017-09-28 17:36:41 -07:00
Renan DelValle	d27d8a4706	Updated end to end test on vagrant images to reflect new client creation.	2017-03-23 20:44:45 -04:00
Renan DelValle	58c560061f	Added timing to Thrift calls in order for end to end test to test changes that affect thrift call speed such as Thrift protocol changes	2017-02-13 19:32:48 -05:00
Renan DelValle	5f155f4337	Moving from the Thrift JSON protocol to the Thrift Binary protocol by default. Realis config now holds transport and Protocol factory for improved flexibility. Renamed realis_test to reflect the true nature of the test along with a minor fix for an API change.	2017-02-10 19:23:20 -05:00

42 commits