Moving workloads between nodes

Hi there, I’m not sure if this is the right place (it probably isn’t), so please point me in the right direction.

Will it become possible for a farmer to move workloads around in our farms?

Scenario:

  • I have a farm with multiple servers
  • I need to retire one of the servers (or just its storage etc.) that currently has workloads deployed on it.
  • I have other nodes that have capacity for the workload.

I would like to be able to move workloads from one node to another at the farm level, so that the workload owner doesn’t suffer unnecessary downtime.

Will something like this be possible in future?

2 Likes

Hi @heaps, are you asking if the farmer can move the workloads of other people using capacity on their farm? It’s a good question. I’m not too sure of this one. Let’s see if @scott or @Geert or @weynandkuijpers knows the answer.

Hi, yep, pretty much.

An active workload may be trickier of course, but if my nodes just have, let’s say, SSD reservations, it would be great to be able to shunt those to other nodes in the farm when needed.

At scale, sure, apps need to be built with an architecture in mind where nodes can come and go. But it might be handy if, for example, I know a particular server needs downtime (or will be removed from my farm permanently), to be able to relocate storage etc. to other nodes in the farm.

2 Likes

True, great points. Will let one of the others chime in. :pray:

1 Like

As the system stands, only deployers have the ability to change workloads. This is enforced through the smart contracts and signatures that deployers provide with their workloads.

Implementing something like this is more complex than it might seem at first. Transferring data from one node to another requires a compute workload that has access to that data. These kinds of administrative capabilities are intentionally left out of Zero OS as part of the security model.

Our goal is a self healing autonomous system without single points of failure. The Quantum Safe Storage system is one example, which is able to automatically restore the intended resilience level when a node it’s using goes offline.

In general, I think the better approach is to develop solutions that provide self healing and redundant features while hiding complexity from the end user. Right now, for example, I can easily spin up a VM, but if the node goes offline it’s toast. Corporate clouds will keep backups and automatically spin up a new VM for you if something happens to one of their nodes. If someone can develop a similar service on the Grid, it would represent a great value and they can earn through sales channel fees.

At the same time, there’s nothing to stop those utilizing the Grid to coordinate with farmers to mitigate the impacts of planned service. So such a service provider could subscribe to alerts provided by farmers they work with and handle migrations on their end as needed. This could all be done without any changes to the base system, but if it turned out that some less intrusive feature could be added to facilitate such a system, I’d say we should consider it.
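To make the idea a bit more concrete, here is a rough sketch of the planning step such a service provider might run when a farmer announces that a node is going away. Everything in it is hypothetical: the `Node` and `Workload` types, the `plan_migrations` function, and the "most free space first" placement rule are assumptions for illustration, not part of Zero OS or any Grid API. It only decides where things could go; the actual redeployment would still have to be done by the deployer, since only they can change their workloads.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of a node: its ID and the SSD space still free on it.
    node_id: int
    free_ssd_gb: int


@dataclass
class Workload:
    # Hypothetical view of a deployment: a name, the node it runs on, and its SSD reservation.
    name: str
    node_id: int
    ssd_gb: int


def plan_migrations(retiring_node_id, workloads, nodes):
    """Map each workload on the retiring node to a target node ID.

    Workloads that fit on no remaining node are mapped to None, so the
    deployer knows they need capacity elsewhere.
    """
    # Track remaining free space on every node except the one being retired.
    free = {n.node_id: n.free_ssd_gb for n in nodes if n.node_id != retiring_node_id}
    affected = [w for w in workloads if w.node_id == retiring_node_id]

    plan = {}
    # Place the largest reservations first; they are the hardest to fit.
    for w in sorted(affected, key=lambda w: w.ssd_gb, reverse=True):
        candidates = [nid for nid, gb in free.items() if gb >= w.ssd_gb]
        if not candidates:
            plan[w.name] = None
            continue
        # Assumed placement rule: pick the node with the most free space.
        target = max(candidates, key=lambda nid: free[nid])
        free[target] -= w.ssd_gb
        plan[w.name] = target
    return plan
```

A deployer-side service could run something like this whenever it receives a maintenance alert, then redeploy each workload to its target with the deployer's own signature and cancel the old deployment afterwards.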

4 Likes

Groovy, Thanks for your thoughts :slight_smile:

2 Likes

Thanks @scott :pray: