Calls to rapl/Cap.go#Cap() result in connection leaks. #20
Reference: spdf/elektron#20
The RAPL power cap does not close the SSH connection. Even after the SSH session is closed, the connection is kept open so that new sessions can be established without redoing the handshake. However, the existing code establishes a new connection on each call to the function. This results in the following error:
tcp: accept4: too many open connections
There are two ways to go about this. One is to close the connection (defer connection.Close()) before returning from the function.
This was more or less a hacky way to get this done quickly. I'd always intended to get back to it and write a daemon that runs on the worker nodes, receives messages from Elektron, and sends back an acknowledgement.
Maybe I'll assign this to myself if I have some time. I'll create a new repo since it'll be a new binary.
So instead of having an SSH process, the daemon will receive a payload and apply the change to RAPL, replying with a confirmation that the message was received and everything went OK.
I think we should use gRPC for this.
Makes sense and sounds good. Couple of things to consider.
Just a couple of things that come to mind.
I've started working on this, so I'll assign it to myself. I started out with a gRPC implementation, which I then realized was total overkill. I've switched to just making the daemon a server that listens on a port (9090 by default) for incoming JSON payloads that contain the percentage.
May add some weak auth if time allows.
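To make the daemon idea concrete, here is a rough sketch of a handler that accepts a JSON payload carrying the percentage. Everything specific in it is an assumption: the thread does not say the daemon speaks HTTP, and the `percentage` field name, the `/cap` path, the 1 to 100 range check and the `apply` callback are all hypothetical. The demo in `main` drives the handler in memory; a real daemon would serve it with something like `http.ListenAndServe(":9090", handler)`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// capRequest is an assumed payload shape; the real field name may differ.
type capRequest struct {
	Percentage int `json:"percentage"`
}

// parseCapRequest decodes and validates a payload. The 1..100 range is an
// assumption about what a sensible power cap percentage looks like.
func parseCapRequest(body []byte) (capRequest, error) {
	var req capRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.Percentage < 1 || req.Percentage > 100 {
		return req, errors.New("percentage must be between 1 and 100")
	}
	return req, nil
}

// capHandler returns a handler that applies the cap via the supplied callback
// and replies with a short confirmation. apply is a placeholder for whatever
// actually writes the RAPL limit on the worker node.
func capHandler(apply func(percentage int) error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<10)) // payloads are tiny
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		req, err := parseCapRequest(body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := apply(req.Percentage); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "capped to %d%%\n", req.Percentage)
	})
}

func main() {
	applied := 0
	h := capHandler(func(p int) error { applied = p; return nil })

	// Exercise the handler in memory instead of binding a port.
	req := httptest.NewRequest(http.MethodPost, "/cap", strings.NewReader(`{"percentage": 50}`))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

	fmt.Println("status:", rec.Code, "applied:", applied)
}
```

Keeping the parsing in its own function makes the validation easy to test without a network, and passing `apply` in as a callback keeps the RAPL-specific part out of the request handling.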
Re: rate limiting, this is definitely something we should consider. From the point of view of running experiments, though, while I agree that it is impractical to change caps quickly, I would be cautious about introducing anything that is outside the control of the user running the experiment.
I'd err on the side of letting the user control how often they send a capping request in academic exercises.
Makes sense. For experimentation purposes, we should provide more freedom to the user.
Also, it is possible that the user is trying to test the efficiency of RAPL and how quickly it responds to changes in the power cap.
I agree with you on this.
Pull request #21 merged. Will close this once the codebase has been refactored to pass payloads to rapl-daemon instead of opening SSH connections.
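As a sketch of what the refactored caller might look like, the snippet below posts a JSON payload to the daemon instead of opening an SSH connection. It rests on the same assumptions as before: an HTTP transport, a `percentage` field, and the hypothetical helper names `encodeCap` and `sendCap`. The throwaway test server in `main` only stands in for rapl-daemon so the example can run locally.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// capRequest is an assumed payload shape; the real field name may differ.
type capRequest struct {
	Percentage int `json:"percentage"`
}

// encodeCap builds the JSON body for a capping request.
func encodeCap(percentage int) ([]byte, error) {
	return json.Marshal(capRequest{Percentage: percentage})
}

// sendCap posts the payload to the daemon at url and treats any non-200
// reply as a failure. The response body is drained and closed so the
// underlying connection can go back to the client's pool.
func sendCap(client *http.Client, url string, percentage int) error {
	payload, err := encodeCap(percentage)
	if err != nil {
		return err
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("rapl-daemon replied %s", resp.Status)
	}
	return nil
}

func main() {
	// Stand-in for rapl-daemon: prints whatever payload it receives.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Printf("daemon received: %s\n", body)
	}))
	defer srv.Close()

	if err := sendCap(srv.Client(), srv.URL, 50); err != nil {
		fmt.Println("error:", err)
	}
}
```

Reusing a single `http.Client` across calls lets the standard library pool and reuse connections, so this path should not accumulate open connections the way a new SSH connection per call did.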