Power saving feature in development for farmers

scott · December 13, 2022, 8:27pm

Hi everyone,

You may have already heard about a new power saving feature in development for farmers, allowing idle nodes to power off and still farm tokens. I’ve discussed this a bit in our Telegram chats and wanted to post a more formal announcement about what’s in the works.

First off, I want to emphasize that this is still just a spec proposed by our developers and is subject to change as the discussion proceeds and any issues arise in implementation. Here’s the outline:

Idle nodes can power down and be “woken up” by another node in the same local network using Wake-on-LAN (WoL)
This means each local network needs at least one node powered on at all times to receive the wake-up signal from TF Chain and broadcast it to the other nodes
Nodes must have WoL enabled in their BIOS settings to participate, but nodes can be excluded from powering down if they don’t have this feature available or enabled (in the case of certified nodes, the feature will need to be enabled by the vendor)
Each “sleeping” node will wake up periodically (perhaps once per day) to verify its capacity
Nodes farm the same tokens as if they were running idle
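If you haven't used WoL before: the wake-up signal on a LAN is normally a "magic packet". This is a UDP broadcast made of six 0xFF bytes followed by the target node's MAC address repeated 16 times. The outline above doesn't say how Zos will send it. What follows is only a minimal Python sketch of the standard packet format. The function names, the default broadcast address and port 9 are assumptions for illustration and are not part of the spec.

```python
import socket

def build_magic_packet(mac: str) -> bytes:
    """Build a standard Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16

def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the local network segment (port 9 is an assumed default)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The packet has to reach the sleeping node's network card at layer 2. This is why at least one node must stay awake inside each local network: a broadcast like this normally doesn't cross routers.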

This will also require some changes in how deployments are handled:

A farm is redefined as a set of nodes running in the same local network
Grid users create contracts with a farm, rather than with nodes individually
Nodes within the farm handle the logic of how to provision capacity when a new contract is generated
If the currently powered on nodes don’t have enough capacity to handle a new contract, they will wake up another node in the farm
This means deployment times will increase in some cases by the time it takes for a node to power on and be ready (5-10 minutes in good circumstances)
Since some deployments, like Kubernetes clusters, would prefer to be spread over multiple nodes for redundancy, a new “cluster” contract type is proposed, allowing rules such as requiring that the contained deployments not run on the same node
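Here is a rough Python sketch of how the powered-on nodes might place a "cluster" contract. It prefers nodes that are already awake, wakes a sleeping node only when no online node fits, and never puts two members of the same cluster on one node. The `Node`/`Workload` classes, the `place_cluster` function and the greedy strategy are all hypothetical. The actual Zos/TF Chain logic hasn't been specified yet.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical model of a node in a farm, not the real TF Chain/Zos structure.
    name: str
    free_cpu: int
    free_mem_gb: int
    powered_on: bool

@dataclass
class Workload:
    cpu: int
    mem_gb: int

def fits(node: Node, w: Workload) -> bool:
    return node.free_cpu >= w.cpu and node.free_mem_gb >= w.mem_gb

def place_cluster(nodes: list[Node], workloads: list[Workload]) -> tuple[list[str], list[str]]:
    """Greedily place each cluster member on a distinct node.

    Online nodes are preferred; a sleeping node is only chosen (and marked
    for wake-up) when no online node can fit the workload.
    Returns (placements, nodes_to_wake). Mutates the nodes' free capacity.
    """
    used: set[str] = set()       # anti-affinity: at most one cluster member per node
    placements: list[str] = []
    to_wake: list[str] = []
    for w in workloads:
        candidates = [n for n in nodes if n.name not in used and fits(n, w)]
        # Stable sort: online nodes (key False) come before sleeping ones (key True).
        candidates.sort(key=lambda n: not n.powered_on)
        if not candidates:
            raise RuntimeError("farm cannot satisfy this cluster contract")
        chosen = candidates[0]
        if not chosen.powered_on:
            to_wake.append(chosen.name)  # would trigger a WoL packet; the user waits for boot
            chosen.powered_on = True
        chosen.free_cpu -= w.cpu
        chosen.free_mem_gb -= w.mem_gb
        used.add(chosen.name)
        placements.append(chosen.name)
    return placements, to_wake
```

Suppose a farm has two awake nodes and one sleeping node, and a three-member cluster arrives. The sketch fills both awake nodes first, then reports the sleeping node as the one to wake. That is the case where the user waits the extra 5-10 minutes.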

Overall, this should mean that farmers can reduce their energy bills substantially for idle nodes while maintaining their farming rewards. For grid users, the experience will remain the same in many cases. Larger deployments, or of course reserving a whole node, are more likely to trigger the boot up of a new node and require the user to wait. Small deployments should often fit within available capacity of already powered on nodes and come alive as quickly as they do now.

I’ll update this thread with any updates as they come. There’s no timeline for release yet, but I hope we can get an idea once the full specification is completed.

RobertL · October 4, 2022, 7:50pm

Will you be able to select the node that wakes up the others? That could be the ultimate power saver, if you could select a low-power node to wake the more powerful nodes in your farm.

scott · October 4, 2022, 10:17pm

This hasn’t been settled yet. Once the farm has enough workloads that two of its nodes must be online to accommodate them, there’s no longer a need to worry about any node being dedicated to the task of waking up the others.

Introducing nodes specifically for this task also changes the makeup of the farm. For example, choosing a NUC to manage the wake ups also means that this becomes the first available node for workloads deployed to the farm. That NUC won’t have the characteristics that a rack mount server might have, like redundant power supplies and dual CPUs, to provide more resilience.

Such nodes could also be designated to not accept workloads, thus not farming any tokens either. Then the farmer would need to weigh the power savings versus the cost of the unit to see over what time frame this would be economical. Then again, these nodes could also be sold or used to farm on a separate farm once the farm began sustaining enough utilization to make them redundant, as described above.
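The trade-off Scott describes is a simple break-even calculation. A dedicated wake-up unit pays for itself once the power saved by letting a bigger node sleep covers its purchase price. Below is a minimal Python sketch. The wattages and electricity price are hypothetical inputs. It also ignores the small standby draw of sleeping nodes and their periodic capacity-check wake-ups.

```python
import math

HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month

def monthly_savings(big_idle_watts: float, wake_node_watts: float, price_per_kwh: float) -> float:
    """Money saved per month by letting a big node sleep while a small wake-up node stays on."""
    saved_watts = big_idle_watts - wake_node_watts
    return saved_watts * HOURS_PER_MONTH / 1000 * price_per_kwh

def break_even_months(unit_cost: float, big_idle_watts: float,
                      wake_node_watts: float, price_per_kwh: float) -> float:
    """Months until the wake-up unit's purchase price is recovered (inf if it never is)."""
    savings = monthly_savings(big_idle_watts, wake_node_watts, price_per_kwh)
    if savings <= 0:
        return math.inf
    return unit_cost / savings
```

Take a hypothetical 250 USD unit drawing 12 W that lets a 150 W idle server sleep, with electricity at 0.40 per kWh. The break-even point is a little over six months.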

RobertL · October 6, 2022, 8:14pm

Hi Scott, I see your points.
I can imagine that Bronze, Silver (I made that one up myself for the R620 farmers who can’t get Gold) and Gold farmers are excluded, or will lose their status if they include a NUC in their farm. But I assume it would work fine for all other home rack server users.

jakubprogramming · October 6, 2022, 6:44am

I very much like this proposal. Anything that can increase energy efficiency and reduce electricity costs for farmers is greatly needed, especially for farmers in Europe right now.

jakubprogramming · October 6, 2022, 6:55am

From the way I understand it, they should still be able to include a NUC as long as the NUC is configured so that it cannot accept any workloads. As a result, though, the first workload deployed on that farm will see the delay (5-10 min?) until one of the actual Gold certified nodes has fully booted and is ready. The second time a workload is deployed to the same farm, the Gold certified node is already up and running, which makes the NUC somewhat obsolete. So depending on how long the farm sits unused, it may or may not make sense to include a NUC in the setup.

RobertL · October 6, 2022, 8:19pm

We’ll see what they come up with, but it would surprise me. Gold farming was introduced to give users the best uptime and quality guarantee. That’s why you need double power feeds, double network, double routers, etc. I can’t imagine we’ll let a 250 USD NUC be the weakest link in this chain, when for some reason its power or connection fails and it can’t wake up anything.

scott · October 7, 2022, 2:18am

Yes, it would need at least two NUCs running redundantly, or one NUC and one big node awake at all times.

I think the approach will be to keep things simple and focus on the proverbial “80%” savings that can be achieved without additional complication.

jeroenburjs · October 10, 2022, 8:08am

Hi Scott,

Great idea, as it is very much in line with the philosophy.
We would support this, with some remarks:

I think the “wake up” node should indeed be a full node. I don’t think people want to wait when deploying.
When doing the wake-up cycle, preferably don’t wake all nodes at the same time, as this will cause peak loads on power, cooling and network infrastructure.
Will the “wake up” node also provide the PXE boot for the other machines? That might save some traffic.
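On the point about not waking everything at once: one simple approach is to spread the periodic capacity-check wake-ups evenly over a window. The sketch below assumes a fixed window and evenly spaced slots. The function name and the scheduling policy are hypothetical, not something the devs have proposed.

```python
from datetime import datetime, timedelta

def stagger_wake_times(node_names: list[str], window_start: datetime,
                       window: timedelta) -> dict[str, datetime]:
    """Assign each node its own evenly spaced wake-up slot inside the window,
    so power, cooling and network load never peak all at once."""
    if not node_names:
        return {}
    step = window / len(node_names)
    return {name: window_start + i * step for i, name in enumerate(node_names)}
```

Four nodes in a two-hour window starting at 02:00 would wake at 02:00, 02:30, 03:00 and 03:30.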

igg · October 10, 2022, 3:20pm

Hey this is a great idea!

I mostly agree with jeroenburjs. The “wake up” (sentry) node must not host any deployments. It should not be loaded with unnecessary applications and data. The sentry must hold the latest ZOS image, so that when it wakes up another node, it can provide ZOS more quickly than the original servers. This saves time and traffic, and takes load off the main servers.

jeroenburjs · October 11, 2022, 6:31pm

Apparently I need to be more specific: I think the always-on node SHOULD provide capacity for people who want to run workloads, and preferably that would be 2 nodes for redundancy purposes.

farmingtech · October 11, 2022, 8:54pm

Why not have both options?

1st - It can be a Full Node ready for deployments and have the ability to “Wake Up” other nodes

2nd - It can be just a “Wake up” node that will never host deployments (its online status should be checkable on chain, which would assure the farmer that everything is OK)

Also, the “wake up” node should let the farmer turn its PXE boot server on or off. PXE boot is great, but we should not force anyone to have a PXE boot server present in their network.

The ISO that boots the “Wake up” node could hold these options in a TXT file, for example.

Each farmer would choose their best scenario.

scott · October 12, 2022, 4:11am

Hi Jeroen,

I agree that keeping a node awake which is also ready to accept workloads makes the most sense, and it also keeps things simple. Good idea on the second point; I also don’t see any reason to wake up all the nodes at once. Allowing one node to serve its copy of Zos to its neighbors could indeed avoid generating a lot of extra traffic under this scheme. I’ll propose this to the dev team.

jeroenburjs · October 12, 2022, 2:46pm

1: This would mean that someone might want to deploy something, but it could take up to 20-30 min before the workload can be accepted. And what happens if that node turns out to be stuck at boot? Maybe we should argue that 10%, or at least 2 nodes, should be online to accept workloads.

2: As far as I understand it, everything uses PXE boot in the end to download the latest image. It would be nice if that came locally (good for farmer bandwidth and TFT bandwidth usage).
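Point 1 proposes a rule of "10%, or at least 2 nodes". Here is a minimal sketch of that rule; the function name is hypothetical. It computes the ceiling with integer arithmetic, so that 10% of 30 nodes comes out as exactly 3. It also never asks for more nodes than the farm has.

```python
def min_always_on(total_nodes: int, percent: int = 10, floor: int = 2) -> int:
    """Nodes to keep powered on: max(floor, ceil(percent% of the farm)), capped at the farm size."""
    if total_nodes <= 0:
        return 0
    # Integer ceiling division avoids float rounding (e.g. 0.1 * 30 == 3.0000000000000004).
    share = -(-total_nodes * percent // 100)
    return min(total_nodes, max(floor, share))
```

For a 25-node farm this gives 3 nodes; for a single-node farm it gives 1, since there is nothing else to keep awake.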

FLnelson · October 12, 2022, 6:48pm

I would hope that a certain amount of capacity per region will always be booted and ready to go, with more coming online as needed. I agree, booting only as needed for an install is a terrible idea.

scott · October 13, 2022, 6:42am

Initiating the whole boot sequence from a 3Node is too fragile/interactive, I think, since it depends on the farmer pointing their DHCP towards that node, and then the setup can’t change without breaking the whole farm. Better, in my mind, is to leave the farmer solely responsible for delivering the initial bootstrap to the nodes, but to investigate ways to serve Zos from other live nodes in the same LAN after bootstrap.

Dany · October 16, 2022, 3:17pm

What’s the timeline for this feature??? Can you please speed up the implementation?!? That would not only help avoid unnecessary costs but also would stop wasting resources and giant amounts of energy!! “Planet First” and so on… you know?!?

weynandkuijpers · October 17, 2022, 7:36am

It’s planned for release 3.8.x. You can follow the feature request here: https://github.com/orgs/threefoldtech/projects/172/views/1

m19s · October 21, 2022, 10:04am

I would think about keeping some percentage of the whole grid always on as buffer capacity with instant access. People will probably set their least powerful nodes as masters, so that may not be enough later when we get into the utilization phase. I think certified nodes (Gold ones?) could be a good choice to carry this load, since they were expected to be rewarded better, but with greater responsibility.

checkkill · October 21, 2022, 10:12am

I read that a ‘NUC’ or other low-power node would not be considered sufficient for accepting contracts. To be clear, a small node with an i3, 32/64GB of RAM and a 1TB drive can be a very efficient node and draws only 12 watts, like a Lenovo M93p. These are available refurbished / second-hand for around 100 euros (you need to upgrade the memory and drive then). Still, it’s a very cost-effective way to reduce power in the farm.
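For a sense of scale, a node drawing a constant 12 W (the figure quoted above) uses the following energy per year. Here p stands for the local electricity price per kWh; it is left as a variable because it varies a lot between countries.

$$
E = 12\,\text{W} \times 24\,\tfrac{\text{h}}{\text{day}} \times 365\,\text{days} = 105{,}120\,\text{Wh} \approx 105.1\,\text{kWh},
\qquad \text{yearly cost} \approx 105.1\,\text{kWh} \times p
$$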

Or buy a new Intel NUC with an i7. With 64GB RAM and a 1TB drive it’s a decent node, and it generates over 520 TFT each month… though it will set you back quite a bit more, of course (around 800 euros). If this power feature becomes a thing, I’ll definitely go for a second-hand HP or Lenovo mini-PC.