How does ThreeFold achieve 10x and more energy efficiency numbers?

weynandkuijpers · June 17, 2020, 5:23am

This thread was started by a question in the Ambassadors Telegram group. Here’s the original question: "Hi @weynandkuijpers where can I find whatever details/data we have on energy savings?"

Here’s my answer:

Hi Colin, we did a “study” on compute workloads not too long ago. This is the hardest (and smallest) energy gain to identify, as “processing data” takes CPU power and the differences come down to more or less efficient operating systems. I’ll look for that data, but be warned: it is not going to impress you.

The big advantages and large efficiency gain numbers start with storage. The reasons why are simple: traditional IT relies on creating backups (= copies) of data. Every IT manager of about my age has been raised with the 1, 2, 3 (and 4) rule.

1. There is one copy of the data that is the active copy, which is the workhorse.
2. The second copy you need to have is a “hot standby” copy (meaning online and alive and kicking) for when the active one fails. It is almost as complete as the original active one, and keeping the two copies synchronized requires a lot of complex activities.
3. The third copy is a backup copy. This is a multiple in size of the original data set, as you keep multiple copies: full weekly backups for a couple of weeks and then daily / hourly “incremental” copies, so that you can go back to a certain point in time and recover from data corruption, ransomware or any of the other data destruction threats (human beings deleting files … :-).
4. The last and ultimate one: when you really want to do a good job, you need to send a copy of the data to a secure vault service. Usually, specialized companies run this service for you.

This easily leads to a 10x overhead in energy consumption compared to dispersed (ThreeFold) storage: electricity for storing and transporting data, administration, human (brain) power, and even gas to transport physical data-holding devices. Dispersed storage does not keep any full copies of datasets; it cleverly spreads the data over multiple dispersed devices (on other servers, in another rack, in another datacenter or even in another country/continent). This is driven by a storage policy, and by choosing the right policy for a specific data type (16+4 = 25% extra storage, 16+8 = 50% extra storage, but certainly not 300% extra storage or more) you can achieve very reliable (if not more reliable) storage solutions. Dispersed storage is described well here: https://sdk.threefold.io/#/architecture_storage
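The overhead percentages quoted above can be checked with a little arithmetic. What follows is only an illustrative sketch: the function names are made up for this example, and treating the traditional model as exactly four full copies is a simplifying assumption (the backup set alone is described above as a multiple of the original data).

```python
def dispersed_overhead(data_parts: int, parity_parts: int) -> float:
    """Extra storage, as a fraction of the original data, for a data+parity policy."""
    return parity_parts / data_parts


def full_copy_overhead(copies: int) -> float:
    """Extra storage, as a fraction of the original data, when keeping whole copies."""
    return float(copies - 1)


# The two policies mentioned in the post.
for data_parts, parity_parts in [(16, 4), (16, 8)]:
    extra = dispersed_overhead(data_parts, parity_parts)
    print(f"{data_parts}+{parity_parts}: {extra:.0%} extra storage")

# Assumption: four full copies (active, hot standby, backup, vault).
print(f"4 full copies: {full_copy_overhead(4):.0%} extra storage")
```

Note that this only compares raw storage volume; it says nothing about the energy spent on transport, administration or synchronization.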

The higher efficiency gain numbers (100x+) also come from storage, but more specifically from the blockchain components of the ThreeFold grid. Working with permissioned nodes (“lottery”-based consensus) rather than proof of work (“race”-based consensus, where only one participant wins and the effort of everyone else is wasted energy) is much more efficient. The second aspect is that public blockchains have 1000’s of (full) nodes, which by definition means 1000’s of copies of the same data, and that is 100’s of times less efficient in terms of energy consumption.
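To get a feel for the orders of magnitude in this argument, here is a rough back-of-the-envelope sketch. It takes the framing above at face value and rests on two simplifying assumptions that are not from the post: every participant in a "race" spends the same effort, and the dispersed side uses the 16+4 policy mentioned earlier. The function names are hypothetical.

```python
def race_wasted_share(participants: int) -> float:
    """Share of total effort spent by non-winners when exactly one participant wins.

    Simplifying assumption: every participant spends the same effort per round.
    """
    if participants < 1:
        raise ValueError("need at least one participant")
    return (participants - 1) / participants


def storage_ratio(full_nodes: int, data_parts: int, parity_parts: int) -> float:
    """How many times more raw storage full replication uses than a data+parity policy."""
    replicated = float(full_nodes)                        # one whole copy per full node
    dispersed = (data_parts + parity_parts) / data_parts  # e.g. 16+4 -> 1.25
    return replicated / dispersed


print(f"wasted share with 1000 racers: {race_wasted_share(1000):.1%}")
print(f"1000 full nodes vs 16+4: {storage_ratio(1000, 16, 4):.0f}x the raw storage")
```

This is a model of the reasoning, not a measurement: real networks differ in node counts, hardware and how much of the chain each node stores.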

By blending all of these specific choices and architectural components together, we can claim that we are a lot more energy efficient than traditional technology.

So we do not have empirical data to underwrite the achieved efficiencies, but a bit of knowledge about how we do things compared to the rest of the industry, together with common sense, leads to the conclusion that these claims are valid. We will get the claims verified when we have found the right partner to do this with, and when the time is right. Hope this helps!

colin · June 17, 2020, 2:03pm

Many thanks @weynandkuijpers, this most definitely helps, and thanks for using simple language which non-technical people (like me!) can relate to.
I look forward to others contributing their thoughts to help build a model to estimate actual energy savings, and of course an independent assessment in time.
Ultimately the efficiencies will allow us to supply competitively priced capacity and that will be a proof in itself, as otherwise we would not be able to compete with the incumbent hyper-scale data centre providers and the economies of scale they benefit from.

Geert · June 17, 2020, 3:13pm

Not using PoW is not that exceptional anymore in blockchain projects; I think the gain lies somewhere else.
Due to the ‘smart contract for IT’ approach, the low-level writing to an immutable database does not allow any human intervention. As machines are, in a way, writing everything down, there is simply no need to duplicate all info as many times as there are validators.

Next to that, there is also the ‘single tenant’ aspect: CPUs only serve one ‘customer’, and as a consequence there is no need for ‘context switching’. In computing, context switching wastes a lot of instructions, and every instruction costs energy.

Next to that, the decentralized nature of the nodes, all pretty automated, means that small nodes don’t necessarily need to be brought together in a dense spot, and thus can be air-cooled. Air conditioning is a big waste of energy in a hyperscale data center. These data centers claim to produce very green energy, but remember: the greenest energy is energy that you DON’T consume.

One more thing I think about is the fact that capacity can be consumed locally. There again, if no transport needs to happen over a long distance, only a little routing infrastructure needs to be put to work.

weynandkuijpers · June 18, 2020, 6:50am

Thank you Geert. All in all it is a very unique stack and method to create compute and storage. Simplification is the key word in a lot that we do. It’s really hard to not take any shortcuts, and it requires continuous self-reflection on what has been built. Kudos to all the engineers and developers in our movement who are doing that on a daily basis.

zaibon · June 18, 2020, 6:55am

I just want to come back to this sentence. This is not how modern computing works, and especially not in a shared system like 0-OS. The CPU is shared between all the applications running on a node. At the time of writing there is no way to “pin” a CPU to a specific application. The CPU time is shared between all workloads running on the node. The more CPU you allocate to a workload, the more CPU time your workload is allocated.
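As a rough illustration of "more allocation means more CPU time", here is a minimal sketch of a proportional split. It is not the 0-OS scheduler: the strictly proportional rule is an assumption made for this example, and the workload names and numbers are hypothetical.

```python
def cpu_time_shares(allocations: dict[str, int]) -> dict[str, float]:
    """Split a node's CPU time in proportion to each workload's allocation."""
    total = sum(allocations.values())
    if total <= 0:
        raise ValueError("at least one workload needs a positive allocation")
    return {name: alloc / total for name, alloc in allocations.items()}


# Hypothetical workloads and allocations, for illustration only.
shares = cpu_time_shares({"web": 1, "database": 2, "batch": 1})
for name, share in shares.items():
    print(f"{name}: {share:.0%} of CPU time when every workload is busy")
```

The split only matters when workloads compete; an otherwise idle node can give a single busy workload far more than its nominal share.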

gosam · July 16, 2020, 3:42pm

@weynandkuijpers I saw a comment on one of our social channels and I’m wondering if you can help:

Do you have numbers concerning the energy usage in production at ThreeFold compared to the current infrastructure? As far as I know, Google (as an example) is 100% green energy. Looking forward to hard numbers.

weynandkuijpers · July 17, 2020, 9:16am

Hi Sam, thank you for the forward. I noticed that the remark was made on Facebook. Unfortunately, I stopped using Facebook a while ago as I felt that they were not treating my digital self with respect. Anyway, that is a totally different topic.

@Bram - long time no speak - it’s been a while since we worked from the same office. With regard to energy consumption and being energy efficient, there are a few things to keep in mind:

Using 100% renewable energy is a result of smart procurement. Unless you have a solar/wind farm that connects directly to your datacenter (with appropriate batteries for the nights), you are getting power from the grid that is produced in any possible way: coal, gas, nuclear and renewable. In general we do not have separate grids delivering “different flavors of electricity”. So although it is noble and provides Google with a marketing hook to make noise around, it does not add a thing to actually consuming less energy.
You as an architect should know that packing a lot of heat sources together (people in a building) compounds into a larger problem than having fewer heat sources packed together. Therefore the “hyperscale” datacenter model (although powered by renewable energy) adds to the problem of “bad efficiencies” in power consumption, as the cooling problem becomes bigger and more power consuming (PUE factor).
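For readers who have not met the PUE factor: Power Usage Effectiveness is the total energy a facility draws divided by the energy that reaches the IT equipment, so 1.0 would mean no cooling or other overhead at all. The sketch below just shows the calculation; the kWh figures are invented for illustration and do not describe any real operator or ThreeFold node.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical figures, not measurements from any real site.
it_load_kwh = 1000.0
more_cooling = pue(it_load_kwh + 500.0, it_load_kwh)  # 500 kWh of cooling and other overhead
less_cooling = pue(it_load_kwh + 100.0, it_load_kwh)  # 100 kWh of overhead

print(f"site with more cooling overhead: PUE {more_cooling:.2f}")
print(f"site with less cooling overhead: PUE {less_cooling:.2f}")
```

For the same IT load, the site with the lower PUE draws less total energy, which is the point being made about cooling.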

The TF technology stack is all about “less is more”. Less densely packed in a datacenter, less overhead in automation layers, fewer people involved, etc.

One of our larger farmers has done research and found that compute workloads do achieve efficiency gains and therefore consume less power for data processing. And you can read in the thread above that there is a lot to gain from smarter data storage and from running blockchain technology that does not use proof of work for consensus. So all in all, there is enough evidence to make the claims we do.

I’ll ask the farmer for the link to the document and post it here, for some hard numbers researched by others than us.

https://github.com/threefoldfoundation/info_threefold/blob/development/src/Green%20Edge%20-%20Sustainability%20Paper.pdf

Keep safe Bram!

Mik · June 25, 2024, 4:00pm