Commit graph

175 commits

Author SHA1 Message Date
Renan DelValle
0e387c4a73
Changing to new format of update strategies which uses a union. 2018-05-13 18:30:02 -07:00
Renan DelValle
77bb78927e
API update to support staggered updates prototype. 2018-05-09 15:33:32 -07:00
Renan DelValle
40153d5cb1
Temporary modification for testing, do not use in live environment. 2018-05-08 14:35:30 -07:00
Renan DelValle
4f5766b443
Misc. bug fixes and addition of debug logging (#61)
* Fixing possible race condition when passing backoff around as a pointer.

* Adding a debug logger that is turned off by default. If debug is turned on, but a logger has not been assigned, a default logger that will print to STDOUT will be created.

* Making Mutex a pointer so that there's no chance it can accidentally be copied.

* Removing a leftover helper function from before we changed how we configured the client.

* Minor changes to demonstrate how a logger can be used in conjunction to debug mode in the sample client.
2018-04-13 11:03:29 -07:00
Robert Allen
c0d2969976 Adding Admin Client calls GetQuota & SetQuota (#59)
* Adding Admin Client calls `GetQuota` & `SetQuota`

This change set adds admin client calls to fetch and
mutate the OwnerRole quota[cpu,ram,disk].
2018-03-07 16:24:27 -08:00
kkrishna
66809c55f7
Merge pull request #58 from paypal/zkPolish
Zookeeper functions and retry functions cleanup
2018-03-05 12:09:18 -08:00
Renan DelValle
acc54c1015
Adding logging when there is a client error. 2018-03-05 11:20:39 -08:00
Renan DelValle
0bb23cec71
Adding unit tests for Zookeeper related functions to prevent regressions. 2018-03-03 14:13:47 -08:00
Renan DelValle
3d62df1684
* Errors have been refactored.
* ZK retries have been cleaned up. We will now retry after every error
EXCEPT when we have a badly formed path.
* ZK library has been reworked with optional arguments pattern to not be
so intertwined with the cluster.json file.
* Timeout error has been re-implemented as RetryError. RetryError
behaves like a Timeout error but is used exclusively to add more context
privately. This allows us to have unit tests that check our retry
mechanism is actually retrying.
* Additional logging has been added to retry mechanisms as well as to
the Zookeeper library we use.
2018-03-03 14:08:04 -08:00
Sivaram Mothiki
dc327bebad change config for certs path (#57)
Bug fix. Changing hardcoded certificate to one provided by configuration object.
2018-03-02 15:21:45 -08:00
Renan DelValle
a43dc81ea8
Simplifying retry mechanism for Thrift Calls (#56)
* Deleting permament error as it doesn't make sense. Just return a plain old error and that will be considered permanent.

* Removing double closure at as it's unmaintainable and can be error prone. Separated back offs into a generic one and a thrift call specific one.

* ZK leader finder now returns a temporary error instead of constantly no leader found and quitting. It could be that the leader info is being propagated so it's worth trying another time.

* Adding more logging to the retry.

* Wrapping lock and unlock in an anonymous function so that we can use defer on unlock such that it is called in the case of a panic.
2018-02-15 15:16:39 -08:00
Renan DelValle
64948c3712
Backoff mechanism fix (#54)
* Fixing logic that can lead to nil error being returned and retry stopping early.

* Fixing possible code path that may lead to an incorrect nil error.
2018-02-06 12:44:27 -08:00
kkrishna
a6b077d1fd Aurora jobupdate functionality -- pause/resume/pulse api (#55)
* Adding GetJobs api

* Adding Aurora pause/resume/pulse api
2018-02-06 12:39:02 -08:00
kkrishna
8bd3957247 GetJobs api (#53)
* GetJobs API added
2018-01-27 10:33:55 -08:00
Renan DelValle
dbb08ded90
Simplifying KillJob API implementation (#52)
By submitting an empty set to the thrift call, the Aurora scheduler will look for active tasks itself.
2018-01-24 15:37:12 -08:00
Renan DelValle
a941bcb679
Thread safety, misc fixes, and refactoring (#51)
* Changing incorrect license in some source files.

* Changing CreateService to mimic CreateJob by setting the batch size to the instance count.

* Changing Getcerts to GetCerts to match the style of the rest of the codebase.

* Overhauled error handling. Backoff now recognizes temporary errors and continues to retry if it finds one.

* Changed thrift function call wrapper to be more explicitly named and to perform more safety checks.

* Moved Jitter function from realis to retry.

* API code is now more uniform and follows a certain template.

* Lock added whenever a thrift call is made or when a modification is done to the connection. Note that calling ReestablishConn externally may result in some race conditions. We will move to make this function private in the near future.

* Added test for Realis session thread safety. Tested ScheduleStatus monitor. Tested monitor timing out.

* Returning nil whenever there is an error return so that there are no ambiguities.

* Using defer with unlock so that the lock is still released if a panic is invoked.
2018-01-21 19:30:01 -08:00
Renan DelValle
b2ffb73183
Introducing temporary errors. Refactored reestablish connection code … (#50)
* Introducing temporary errors. 

* Refactored reestablish connection code to use NewClient.

* Added reestablish connection test to end to end tests.
2018-01-16 14:35:01 -08:00
Renan DelValle
1c426dd363
Changing the drain monitor to match the rest of the monitors using timer and ticker. Made a generic schedule status monitor that can be used with any of the default sets provided. (#49) 2018-01-07 13:30:02 -08:00
Renan DelValle
8d445c1c77
Moving from govendor to dep, updated dependencies (#48)
* Moving from govendor to dep.

* Making the pull request template more friendly.

* Fixing akward space in PR template.

* goimports run on whole project using ` goimports -w $(find . -type f -name '*.go' -not -path "./vendor/*" -not -path "./gen-go/*")`

source of command: https://gist.github.com/bgentry/fd1ffef7dbde01857f66
2018-01-07 13:13:47 -08:00
PRADYUMNA KAUSHIK
9631aa3aab Specify field names when initializing structs (#47)
* Added field names to struct initializations.
2017-12-23 10:33:42 -08:00
PRADYUMNA KAUSHIK
ff545e8aa6 Fixing semantic errors in docs/getting-started.md (#44)
* fixed semantics. Earlier mentioned that 'Pystachio does yet support...'. Changed now to mention 'Pystachio does not yet support...'

* fixed grammatical mistake.

* Ran gofmt on project.
2017-12-22 08:53:05 -08:00
Renan DelValle
c338c03355
Thrift API update and Pull Request template. (#43)
* Adding a Pull request template to serve as a reminder of what
to do before creating the pull request.

* Updating our thrift API to match changes made by Aurora.

* Update go bindings to match update thrift API.
2017-12-14 14:37:08 -08:00
Sivaram Mothiki
d4027bc95c make insecureskipverify configurable (#40)
* make inseucreskipverify configurable

* add insecure and certspath to configs

* add certs test

* add config support for client key and cert
2017-12-12 14:04:11 -08:00
kkrishna
dd804af0a8
Merge pull request #42 from paypal/log
Adding missing Logger interface.
2017-11-30 12:33:00 -08:00
Renan DelValle
06cfa214ec Adding missing Logger interface. 2017-11-30 12:11:54 -08:00
Renan DelValle
e614e04f27
Code cleanup, added ability to attach logger, added CreateService api
* Code cleanup: Deleted multiple functions which have become stale. Removed cluster example as we replaced the need to create the Cluster object.

* Cleaned up ZK connection code by using the backoff function. Added a test to the end to end to test that we're getting the host correctly from ZK. Changed clusters test to be an outside package.

* Added LeaderFromZKURL test to end to end tests.

* Added logger to realisConfig so that users can attach their own Loggers to the client. Logger is an interface that shadows most popular logging libraries. Only Print, Println, and Printf are needed to be a realis.Logger type. Example in the client uses the std library log.

* Moved most fmt.Print* calls to be redirected to user provided logger. Logger by default is a no-op logger.

* Adding CreateService to realis interface. Uses the StartJobUpdate API to create services instead of the createJobs API.

* Bumping up version number inside client in anticipation of new release.
2017-11-30 12:02:50 -08:00
Sivaram Mothiki
72b746e431 use exponential back off func from realis lib (#39)
* use exponential back off func from realis lib

* remove exponential backoffs from monitors

* dont compare for retry errors
2017-11-04 15:06:26 -07:00
Renan DelValle
23430cbf30 Merge pull request #38 from paypal/newAddress
Out with the old (address) in with the new (address)
2017-10-13 00:17:59 -07:00
Renan DelValle
a1350c6d55 out with the old (address) in with the new (address) 2017-10-12 17:11:01 -07:00
Renan DelValle
8a4a9bdb8c Merge pull request #37 from rdelval/auroraAdmin
Added end maintenance API which allows DRAINED hosts to be transition…
2017-10-11 12:28:15 -07:00
Renan DelValle
65398fdfd6 Removed print statement as it makes no sense after the monitor change 2017-10-04 17:40:33 -07:00
Renan DelValle
bd008dbb39 Changing client to reflect monitor changes 2017-10-04 17:12:06 -07:00
Renan DelValle
1fd07b5007 Avoided going through the entire list of monitored hosts by keeping a set of hosts that had transistioned to a desired mode. 2017-10-04 15:56:59 -07:00
Renan DelValle
fa7833a749 Updating client to reflect changes made on the Monitor's side 2017-10-04 14:34:47 -07:00
Renan DelValle
922e8d6b5a Changing HostMaintenance to return a map[string]bool where true indicates success, false indicates failure to transition to the desired state. 2017-10-02 17:24:01 -07:00
Renan DelValle
3111b358fc Host Maintenance monitor now returns a list of hosts that did enter the desired mode(s) instead of a boolean. This means the monitor can see a partial success. 2017-09-29 18:21:30 -07:00
Renan DelValle
430764f025 Added tests for draining. run go test with a aurora vagrant image running to test. 2017-09-28 17:49:15 -07:00
Renan DelValle
7db2395df1 Changed from the old style of creating clients to the new clojure pattern. 2017-09-28 17:36:41 -07:00
Renan DelValle
bf354bcc0a Fixed bug with NewDefaultClientUsingUrl where backoff was not initialized leading to nil pointer error. 2017-09-28 17:14:55 -07:00
Renan DelValle
8334dde12f Sample client now blocks until all hosts entered desired state. Cleaned up host maintenance monitor. 2017-09-28 16:50:46 -07:00
Renan DelValle
dc6848f804 Host Maintenance monitor which allows to block until all hosts in a list have entered of the states in a provided set 2017-09-28 16:35:24 -07:00
Renan DelValle
c03e8bf79c Added Maintenance status API 2017-09-28 16:32:17 -07:00
Renan DelValle
8fe3780949 Added end maintenance API which allows DRAINED hosts to be transitioned to ACTIVE. Fixed bug where payload error would never be returned if call failed due to a bad payload. 2017-09-27 14:02:43 -07:00
Renan DelValle
f59940f9a7 Merge pull request #36 from rdelval/auroraAdmin
New API to set hosts to DRAINING. Cleaned up some of the sample client code.
2017-09-26 12:33:50 -07:00
Renan DelValle
f59f0bbdc3 Setting retry for drain hosts to be a little more agressive for now. Added comments for go doc. DrainHosts now returns also returns a result type which will give API users the ability to operate directly on that object wihout using a response helper. 2017-09-25 15:45:40 -07:00
Renan DelValle
0d3126c468 New API to set hosts to DRAINING. Cleaned up some of the client code, and fixed a few error printing bugs. 2017-09-22 12:55:03 -07:00
kkrishna
d9f4086853 Merge pull request #35 from rdelval/jobAsPtr
AuroraJob is now passed as pointer, resources are now initialized with create job
2017-09-18 16:48:57 -07:00
Renan DelValle
f301affdd0 Initializing CPU, RAM, and DISK resources to have an address when a new job object is created so that were able to modify the values when constructing job updates. 2017-09-18 16:10:10 -07:00
Renan DelValle
13cc103faa Changing AuroraJob to *AuroraJob in order to not create new objects so that UpdateJob is able to change parameters. 2017-09-14 16:38:26 -07:00
Renan DelValle
ef49df747f Merge pull request #33 from smothiki/jobupdate
make monitor job update return false, err on rolled back and update failures
2017-08-22 17:14:07 -07:00