Storage / Bandwidth Ratio

Hi farmers of the New Internet,

I have a quick question. Do you know the ratio of storage / bandwidth for a 3node?

For example, I read in the Telegram farmers’ channel that the maximum TB you can have is 30 TB SSD for a “normal” internet connection. But then what is a normal connection?

Is there a precise ratio of how many TB of SSD you can have per X/Y (upload/download) megabits per second of Internet bandwidth?

Thank you!

3 Likes

This is a great question which comes up from time to time in the farmers chat. Unfortunately, there’s no hard rule we can provide, and this ultimately depends on what kind of workloads get deployed to your node. As Grid utilization grows, we can start to gather some data points and give a firmer answer.

I’ll share something I wrote in the chat on this, about one specific farm:

One useful reference point is to calculate how long it would take to fill up that storage given a certain amount of bandwidth. For example, a 1 gigabit/second connection can move about 10 terabytes/day. So for archival purposes, 18 TB is no problem, since it can be filled in a few days even with a substantial buffer for regular home use. With a 10 megabit/second connection, on the other hand, it would take about six months at constant full bandwidth usage to fill 18 TB, which would be too slow for many applications.

We can calculate the data per day capacity of a given connection speed like this:

bytes/day = 24 * 60 * 60 * bits/second/8

So:

24 * 60 * 60 * 1 gigabit/second/8 = 10800 gigabytes/day = 10.8 terabytes/day

And:

24 * 60 * 60 * 10 megabit/second/8 = 108 gigabytes/day
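The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming decimal units (1 terabyte = 10^12 bytes) and full saturation of the link; the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

GIGABIT = 1e9    # bits
MEGABIT = 1e6    # bits
GIGABYTE = 1e9   # bytes (decimal units, an assumption)
TERABYTE = 1e12  # bytes (decimal units, an assumption)


def bytes_per_day(bits_per_second: float) -> float:
    """Bytes a link can move in one day at full saturation."""
    return SECONDS_PER_DAY * bits_per_second / 8


def days_to_fill(storage_bytes: float, bits_per_second: float) -> float:
    """Days needed to fill the storage at constant full bandwidth usage."""
    return storage_bytes / bytes_per_day(bits_per_second)


print(bytes_per_day(1 * GIGABIT) / TERABYTE)        # terabytes/day on 1 gigabit/second
print(bytes_per_day(10 * MEGABIT) / GIGABYTE)       # gigabytes/day on 10 megabit/second
print(days_to_fill(18 * TERABYTE, 10 * MEGABIT))    # days to fill 18 TB on 10 megabit/second
```

The last line comes out at roughly 167 days, which is where the "about six months" figure for 18 TB comes from.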

Now that’s at full saturation, and you likely want to keep a good amount of bandwidth free for other things. Live streaming a single video feed can use as much as 25 mb/s. You may also have total bandwidth usage caps imposed by your ISP.

Seen another way, Titans are intended to work on any broadband internet connection. DSL usually has a minimum speed of 1 mb/s and the Titan comes with 1 TB of storage. Based on that, we could consider 1 mb/s per 1 TB of storage a baseline minimum. Of course, this ratio probably doesn’t scale linearly, so a large farmer might need less bandwidth per TB, especially if it’s dedicated to the farm and not also split with home usage.

Those are my thoughts, anyway. Who has something to add? :slight_smile:

4 Likes

Thank you for this great answer. This clearly gives a good idea of how to think this through.

(1) Would there be a difference between HDD and SSD? Do the equations above still work for HDD?

I wonder, since HDDs are slower than SSDs. Would it be fair to say that, at a certain threshold of Internet bandwidth, HDD drives hit a wall in terms of data transfer, and do so sooner than SSD drives would? If your Internet bandwidth is greater than the data transfer rate of the drives in the 3node itself, the drives could become a bottleneck, I would presume.


In a similar fashion, it would be good to have such calculations with compute units this time, instead of storage units. In other words, (2) how many mb/s of Internet is needed per GB of RAM in order to have an efficient 3node?


I think this kind of rule of thumb is very useful and could be shared in the farming community for farmers that want to expand their farm! As Titans have SSDs, it would be a great equation mainly for SSD farms.

As you said, it might not be linear, but might be akin to a square root function as the derivative of the mb/s needed will decrease as the farm gets bigger.


Having a set of such equations for the complete 3node activity would give a great approximation of the bandwidth needed per farm. This would drastically improve the farmers’ management of their farms. If every 3node is optimized in terms of data transfer, this would also increase the ThreeFold Grid’s effectiveness, predictability and trust level.

I’ll be curious to know what the community has to say on this!

Thanks for reading.

2 Likes

Thanks for the reply. Indeed, considering the intrinsic bottleneck for HDD would also be helpful here. A couple quick facts I dug up:

  • HDD writes top out at about 200 MB/s (megabytes per second), with 80-160 MB/s being more typical
  • When considering multiple disks, a single storage controller will quickly become the bottleneck, with RAM and buses being potential further bottlenecks

In practice, applications that store a lot of data may use some mix of SSD and HDD to improve performance through caching. Thus, a farmer’s bandwidth could still be saturated while filling the SSD cache in the short term, without being bottlenecked by HDD write speed.
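To compare disk speed with connection speed, both need to be in the same unit. The sketch below assumes the HDD figures quoted above are in megabytes per second (the usual unit for disk throughput) and converts them to megabits per second; the function names are hypothetical, and caching, controllers and buses are ignored.

```python
def disk_mbps(megabytes_per_second: float) -> float:
    """Convert disk throughput from megabytes/s to megabits/s."""
    return megabytes_per_second * 8


def bottleneck(link_mbps: float, disk_megabytes_per_second: float) -> str:
    """Return which side limits a sustained write: 'network' or 'disk'."""
    if link_mbps <= disk_mbps(disk_megabytes_per_second):
        return "network"
    return "disk"


print(disk_mbps(80))          # a slow HDD, expressed in megabits/s
print(bottleneck(100, 80))    # 100 mbps link vs. 80 MB/s disk
print(bottleneck(1000, 80))   # 1 gbps link vs. 80 MB/s disk
```

Under these assumptions, a single HDD only becomes the limiting factor once the connection gets close to gigabit speeds; on a typical residential link the network is the bottleneck.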

I agree that giving some straightforward numbers and formulas will help farmers a lot while sizing their nodes relative to their available connectivity. However, I also hesitate to write these rules without some more perspectives.

What I think would also be helpful is some benchmarks from typical cloud workloads, including as you mentioned, their CU consumption.

2 Likes

I’ve been researching to find reference points on how to spec bandwidth relative to capacity sizing. The answers tend to focus on metering existing setups to know when it’s time to add more bandwidth before problems come up. It’s hard to find any information on what kind of bandwidth is typically used for X amount of storage in a data center context. Cloud providers surely have this data, but it’s proprietary information.

This got me thinking :bulb: Why not look at the bandwidth allocations for some cloud packages on the market? This could serve as a reasonable baseline.

Storage allocation for compute workloads

Checking Vultr and DigitalOcean, I found similar figures for storage included in VMs:

25-50 GB Storage / 1-2 TB bandwidth (1:40)
80 GB Storage / 3-4 TB bandwidth (1:37.5-50)
160 GB Storage / 4-5 TB bandwidth (1:25-31.25)
320 GB Storage / 5-6 TB bandwidth (1:15.625-18.75)
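The ratios in parentheses are simply the monthly bandwidth allowance divided by the storage size. A quick sketch to reproduce them, assuming 1 TB = 1000 GB and assuming the low and high ends of each range pair up as listed in `packages` (that pairing is an assumption made for this example):

```python
def bandwidth_ratio(storage_gb: float, bandwidth_tb_per_month: float) -> float:
    """Monthly transfer allowance per unit of storage (the N in 1:N)."""
    return bandwidth_tb_per_month * 1000 / storage_gb


# (storage in GB, monthly bandwidth in TB)
packages = [
    (25, 1), (50, 2),
    (80, 3), (80, 4),
    (160, 4), (160, 5),
    (320, 5), (320, 6),
]

for storage_gb, bandwidth_tb in packages:
    ratio = bandwidth_ratio(storage_gb, bandwidth_tb)
    print(f"{storage_gb} GB / {bandwidth_tb} TB -> 1:{ratio:g}")
```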

Now these are total bandwidth figures per month, so we’ll need to calculate a connection’s monthly throughput to compare:

10 mbps = ~3 TB / month
100 mbps = ~30 TB / month
1 gbps = ~300 TB / month
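For reference, these monthly figures can be derived as follows. This is a sketch assuming a 30-day month, decimal units and a fully saturated link; `tb_per_month` is a name made up for this example.

```python
def tb_per_month(mbps: float, days: int = 30) -> float:
    """Terabytes moved in `days` days at full saturation (decimal units)."""
    bytes_moved = mbps * 1e6 / 8 * 24 * 60 * 60 * days
    return bytes_moved / 1e12


for speed in (10, 100, 1000):
    print(speed, "mbps ->", round(tb_per_month(speed), 2), "TB / month")
```

The exact values come out slightly above the rounded figures (about 3.24 TB per month for 10 mbps), so the round numbers leave a small margin.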

If we look at the low end of cloud provider allocations (1:15), this means a farmer with 1 TB of storage needs about 50 mbps dedicated to the node to achieve the same level of service. Scaling up, a farmer with 50 TB of storage would need at least 2.5 gbps.
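Here is a rough sketch of that sizing step, turning a storage size and a target ratio into a dedicated connection speed. It assumes 1 mbps moves about 0.324 TB in a 30-day month at full saturation; the constant and function names are illustrative only.

```python
# TB moved per month by 1 mbps at full saturation (30 days, decimal units)
TB_PER_MONTH_PER_MBPS = 0.324


def required_mbps(storage_tb: float, ratio: float) -> float:
    """Bandwidth needed to move `ratio` times the storage every 30 days."""
    return storage_tb * ratio / TB_PER_MONTH_PER_MBPS


print(round(required_mbps(1, 15)))    # 1 TB at 1:15
print(round(required_mbps(50, 15)))   # 50 TB at 1:15
```

This gives roughly 46 mbps and 2.3 gbps, which round up to the 50 mbps and 2.5 gbps figures quoted here.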

This is interesting, but it’s not the whole picture. These are compute products with some storage attached. How about storage products?

Storage product pricing

When it comes to object storage, Vultr and DO have the same baseline package:

250 GB / 1 TB of bandwidth (1:4)

This actually corresponds closely to one recommendation I found in a web hosting forum (1:5). That means our farmer with 50 TB, who is probably hosting quite a few storage workloads, only needs to dedicate ~600 mbps to meet this service level.

Of course, that’s the baseline package, and many users of cloud services are paying extra for additional bandwidth. Farmers at home also need to be sure they have enough bandwidth for their normal needs.

Conclusion

Using these figures as a guide can be a way to ensure that the Grid is able to support the minimum service levels of existing cloud providers. When it comes to Titans, I think we should stand by our specification that any residential high speed connection is sufficient for one node.

For DIY and large scale certified operations, I think we should be targeting the 1:4 - 1:15 range as baseline.

5 Likes

So most residential farmers will not have enough bandwidth to host 50TB of storage, since it is quite hard, and sometimes not an option, to get 2/3 Gbps to meet the service level.

By 2/3 I meant the fraction, approximately equal to 0.67, but I can see that’s probably confusing and will update :slight_smile:

I see what you are doing here, but I do believe there is a “sales” element in these packages: they make sure you purchase sufficient (read: over-provisioned) bandwidth. If you take the first two ratios, they mean that you download or upload (network traffic is usually bi-directional in terms of what you purchase) the whole storage allocation at least once every day of the month.

A ratio of 1:40 means that during the month you consume enough bandwidth to download (or upload) the entire stored dataset 40 times. IMHO this is a bit over the top for many use cases, for example archive or backup.

So I completely agree that a target of 1:4 up to 1:15 (the latter only for very active sites / use cases) is a reasonable range to work with. Taking it back to bandwidth numbers:

A 10 Mbps connection comfortably allows for 3 TB to be transported per month (without saturating the link, and leaving room for other use of the network), which means that you can support:

  • 750GB of storage per 10mbps connection for low activity workloads.
  • 200GB of storage per 10mbps connection for high activity workloads.
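These two figures follow from dividing the monthly transfer by the target ratio. A small sketch, assuming 3 TB per month per 10 mbps and the 1:4 and 1:15 ratios discussed in this thread (function and parameter names are made up for the example):

```python
def supported_storage_gb(
    link_mbps: float,
    ratio: float,
    tb_per_month_per_10mbps: float = 3.0,
) -> float:
    """Storage (GB) a link can serve if it must move `ratio` times that storage per month."""
    monthly_tb = link_mbps / 10 * tb_per_month_per_10mbps
    return monthly_tb * 1000 / ratio


print(supported_storage_gb(10, 4))    # low activity workloads (1:4)
print(supported_storage_gb(10, 15))   # high activity workloads (1:15)
```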
2 Likes

Would it be appropriate to assume storage related to “low activity workloads” to be HDD size and “high activity workloads” to be SSD size?

That sounds like a very good addition to what I wrote. Agree!

Great discussion. Thanks everyone for sharing your thoughts.

So here we covered some ground on the storage level, then we’d need to understand and consider the needs when it comes to vcores and RAM in relation to the bandwidth.

I think it would be amazing if Threefold could come up soon with clear thresholds so farmers can align with their ISP. Imagine the scenario where Grid 3 is full of farms and 3nodes but when cultivation kicks in, we get major bottlenecks of transfers and thus a poor user experience.

I know it’s on the way, but I wonder when, as I think many farmers are building farms in March and in the coming months.

Personally, I plan to size my bandwidth according to the data transfer logging on my router. When we get closer to utilization I will move my rack to a data center. A 1 gbps connection at a DC is standard or very cheap. That will probably suffice for a while and I will be able to tell from the % utilization of my nodes and my router when more bandwidth is needed. Full utilization of my rack will need ~8gbps at my current build out. A 10gbps connection is big $$$, but that level of utilization will mean TFT is at a price that paying for that connection will not be of any concern.

1 Like

I agree that this is a good way to see the situation and it is very pragmatic. The worst case scenario is someone using your farm all of a sudden and needing much bandwidth, not having it on time then moving on to another farm.

This is why it’s kind of tricky to know in advance what will be the requirements in terms of bandwidth.

For someone who doesn’t want to go into a DC, it is essential to know how many 3nodes one can install with the most ISP bandwidth they can get.

I feel like the purpose of ThreeFold is spreading the nodes out to build a people’s internet. If in the end everyone puts their server in a data center, that defeats the purpose of ThreeFold. Just a thought.

1 Like

I partially disagree. I think the project is largely about the people hosting the services that run the internet instead of Amazon and Microsoft. They have proven themselves willing to pull things down due to popular opinion.

Having a decentralized network of nodes is an important part of the project as well, and I do operate a network of small, at-home servers with friends and family. But even us having servers in a datacenter is more decentralized than you would think. Even a moderately sized city can have many datacenters. Jacksonville, FL has at least 5 available, and that is not a high tech place. While I couldn’t easily find how many DCs Amazon runs, I think it’s only about 50 in the US. That’s pretty centralized given its market share.

3 Likes

I think you guys, @noretreat and @FLnelson, are in an interesting discussion.

I would agree that I thought at first ThreeFold would be against data centers in general, but lately I have read and heard the team explain how they do have great advantages.

Indeed there’s a world of difference between one giant Amazon DC and many smaller DCs run by the people.

Personally, I chose the route of having servers elsewhere than in a DC. Maybe I’ll change my mind when cultivation kicks in!

I’ll be glad to hear what you and others have to say on this topic.

1 Like

Interesting indeed. ThreeFold is pushing (a part of) the internet to go back to its roots: by people for people. This means that capacity “production” is not in the hands of a few large providers but can be done by anyone who wishes to be involved. How they deliver their capacity depends on what they can and cannot do.

We truly believe that local produce (like everything else, food being a good example) in general is better for the world and therefore want this internet capacity to happen everywhere, close to use cases and IT capacity consumption.

How this is done is entirely up to the people that opt in to produce. A rack in the basement or a rack in a datacenter, servers in the attic, or a server in a mobile telephone mast or windfarm. Altogether this presents a much more sustainable and equal way to create IT (internet) capacity than a few hyperscale datacenters servicing customers globally (thus far away from consumption and putting a lot of strain on networks everywhere).

So capacity production local to consumer needs and by anyone. :sunflower:

3 Likes

Hey guys,

Just to complement our discussion here, I’ll add some things.

In another post, @kristof made some good points and we came up with a draft of an equation to know the bandwidth needed:

The general equation is:

min_bandwidth = bandwidth_per_TB-HDD * Qty_TB-HDD + bandwidth_per_TB-SSD * Qty_TB-SSD + bandwidth_per_GB-RAM * Qty_GB-RAM

Taking into account what @FLnelson and @weynandkuijpers said with HDD and SSD, we could have

750GB HDD storage per 10 mbps --> ~13 mbps per 1TB HDD (10 / 0.75 = 13.33)
200GB SSD storage per 10 mbps --> 50 mbps per 1TB SSD (10 / 0.2 = 50)

This would give:

min_bandwidth = 13 * Qty_TB-HDD + 50 * Qty_TB-SSD + bandwidth_per_GB-RAM * Qty_GB-RAM

For the sake of exploration, let’s say a 32 GB RAM Titan needs a 10 mbps connection; we’d have 10 / 32 = 0.3125 mbps per 1 GB of RAM. The equation becomes:

Equation (1): min_bandwidth = 13 * Qty_TB-HDD + 50 * Qty_TB-SSD + 0.3125 * Qty_GB-RAM

Say we have a 3node with 128GB of ram, 10TB HDD and 2 TB SSD. We’d have:

min_bandwidth = 13 * 10 + 50 * 2 + 0.3125 * 128 = 270 mbps

(Using the unrounded 13.33 mbps per TB of HDD gives about 273 mbps.)

Doesn’t it seem too much bandwidth for a 128GB/10TB HDD/2TB SSD 3node?

Let’s say we cut in half the mbps needed for HDD and SSD. We’d have about 155 mbps needed. Does this make more sense?

It’s clearly not rocket science here but this kind of equation can give us a good idea of what could be done.

I remember that we said 1 mbps would be the minimum for a Titan which is 32GB ram/0TB HDD/1TB SSD.

With the above equation (1), a Titan would need 60 mbps. Clearly 60 > 1. So we are oversizing bandwidth in this regard.
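Equation (1) is easy to play with in code. The sketch below is only a draft calculator: the default coefficients are the exploratory values from this post (13, 50 and 0.3125 mbps), not official ThreeFold figures, and the function name is made up for the example.

```python
def min_bandwidth_mbps(
    hdd_tb: float,
    ssd_tb: float,
    ram_gb: float,
    per_tb_hdd: float = 13.0,     # mbps per TB of HDD (draft value)
    per_tb_ssd: float = 50.0,     # mbps per TB of SSD (draft value)
    per_gb_ram: float = 0.3125,   # mbps per GB of RAM (draft value)
) -> float:
    """Draft equation (1): minimum bandwidth in mbps for a 3node."""
    return per_tb_hdd * hdd_tb + per_tb_ssd * ssd_tb + per_gb_ram * ram_gb


# Titan: 32 GB RAM, no HDD, 1 TB SSD
print(min_bandwidth_mbps(hdd_tb=0, ssd_tb=1, ram_gb=32))

# Example 3node: 128 GB RAM, 10 TB HDD, 2 TB SSD
print(min_bandwidth_mbps(hdd_tb=10, ssd_tb=2, ram_gb=128))

# Same node with the HDD and SSD coefficients cut in half
print(min_bandwidth_mbps(10, 2, 128, per_tb_hdd=6.5, per_tb_ssd=25.0))
```

With the HDD coefficient rounded to 13, the example node comes to 270 mbps; using the unrounded 13.33 gives about 273 mbps. Finding fitting ratios then just means adjusting the three keyword arguments.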

Let me know what you guys think!

With equation (1), basically we just need to find fitting ratios and it could work pretty well.

1 Like

Do you know if DC’s let you use your own hardware or will you have to use theirs or upgrade?

1 Like

They can offer either. Some DC’s will rent servers out, but this isn’t going to be very economical. Many data centers offer colocation aka colo, where you can put your server in the datacenter.

1 Like