Storage / Bandwidth Ratio

Thanks for all this information.

So this new calculation does not take the CPU into account? Only storage (SSD + HDD) is counted?

Yesss, an Austin Powers reference.


Sure thing. My hope here is that we can converge on a reasonable “final answer” by gathering as many reference points as possible. To be clear, this is just another perspective for consideration.

I haven’t considered CPU or memory in any of my analysis or proposals. My approach has just been to keep things simple and stick to the original question of a storage to bandwidth ratio.

Is it worth the extra complexity to consider compute resources as well? I’m not sure. Many farmers are already sizing the SSD in their nodes to optimize their CUs, and I’d bet that compute capacity is already roughly correlated with SSD. The idea of having SSD as a bounding factor in the CU formula is that the useful capacity of a node does not increase solely with more RAM and CPU cores. In the same way, I’d think that the bandwidth it requires doesn’t either.

I’d love to combine forces and do this together (open to anyone who wants to help here). @scott is probably miles ahead of me, but I have a little “project” starter, which is:

  • A Dockerfile to create a local testing facility on Docker
  • Export the final QA’d Docker image and import it into hub.grid.tf
  • A Terraform script that deploys this flist with a “programmable” init.sh script that starts (or doesn’t start) certain binaries based on environment variables
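As a rough sketch of the “programmable” init.sh idea (the variable names and gated binaries below are placeholders, not the actual flist contents):

```shell
#!/bin/sh
# Hypothetical init.sh for the test flist: each binary is gated on an
# environment variable passed in from the Terraform deployment.
# start_gated prints what it would launch; the real commands are
# commented-out assumptions.
start_gated() {
    # $1 = flag value ("1" to enable), $2 = label
    if [ "$1" = "1" ]; then
        echo "starting $2"
    fi
}

start_gated "${RUN_IPERF:-0}" "iperf3 server"     # iperf3 -s --daemon
start_gated "${RUN_BENCH:-0}" "CPU benchmark"     # ./cpu-bench --quick
start_gated "${LOG_PUBIP:-0}" "public IP logger"  # curl -s ifconfig.me
```

Terraform would then set `RUN_IPERF=1` (etc.) per deployment, so one flist can serve several test roles.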

I’ll create a GitHub repo for it (a starting point for other projects as well) and start working on it.


A caveat to consider when polling data from farms about their available bandwidth:

I’m currently subscribed to a 20x20Mbps package on dedicated fiber. I have 10Gb core routing/switching and can turn the fiber pipe up to as much as 10Gbps to match, but there’s no point in doing so while the network essentially idles.

For potential customers shopping the grid, would we farmers be able to add a note saying “Bandwidth currently is set to X, but capable of turning up to Y”?


I think what is missing here is the concurrency factor. Think of any server or service you have access to that provides you bandwidth above 1 Gbit/s; that’s just about the state of the art. Apart from quality, the bandwidth should be sized against the amount of storage more than against whatever workload might be deployed in the future. In my opinion, providing more than 1 Gbit/s for hosting services (massive data-transfer scenarios excepted) is not necessary. Depending on how many nodes are behind that uplink (considering the concurrency factor), you should simply be ready to turn up the pipe when more bandwidth is needed. This should be monitored and increased when necessary, of course.

Same situation here: I’m running 10 Gbit/s SFP+ behind the uplink but am far from providing that same speed in uplink bandwidth for now. That would cost thousands per month and just doesn’t make sense at the moment. It’s the same problem as providing public IPs in advance of utilisation. We have to find a way to be more efficient, and that would need monitoring and defined thresholds. Overall it’s kind of a chicken-and-egg problem: does not providing high-quality bandwidth and capacity keep users from using the grid? I don’t think so, as long as we meet present standards.
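The concurrency factor can be made concrete with a back-of-the-envelope formula (the numbers below are illustrative, not measurements): if N nodes can each burst to B Mbit/s but only a fraction f are transferring at any given moment, the uplink only needs about N·B·f.

```python
def required_uplink_mbps(nodes: int, peak_per_node_mbps: float,
                         concurrency: float) -> float:
    """Rough uplink sizing: only a fraction of nodes (the concurrency
    factor) are assumed to transfer at peak simultaneously."""
    return nodes * peak_per_node_mbps * concurrency

# 20 nodes, each able to burst to 500 Mbit/s, with 10% of them
# active at once, need roughly a 1 Gbit/s uplink:
print(required_uplink_mbps(20, 500, 0.10))  # -> 1000.0
```

The hard part, of course, is picking f: it has to come from monitoring real traffic, which is exactly the thresholds-and-monitoring point above.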

Don’t know how much capacity you are providing. Also don’t know what you are referring to when talking about 20x20 Mbit/s. Do you have a 400 Mbit/s uplink? Not too bad… but I would go for 1 Gbit/s. That should be good for plenty of nodes.

20 Mbps symmetrical: 20 up and 20 down. I’ve got 44 active nodes and am about to bump up to 80. But with the network idling right now, all 44 barely eke out 600 kbps of continuous traffic. Like you said, no need to crank up to 1 Gb at this stage; it would be a waste of an extra $1k per month.

44 nodes behind a 20 Mbps connection? Really? Well… I don’t know the specs of your nodes, but to me this looks like nowhere near enough bandwidth. Let’s assume your nodes have at least 1 TB of storage installed each (44 TB in total). It would take over 200 days just to transfer that much data to your disks, and that calculation is already very much simplified. Your capacity would simply never be used, and I guess that’s not what you are aiming for, right?
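For reference, the 200-day figure follows directly from the link math, assuming the 20 Mbps link is fully saturated the whole time:

```python
def transfer_days(terabytes: float, link_mbps: float) -> float:
    """Days needed to move `terabytes` of data over a `link_mbps` link,
    assuming full saturation and ignoring protocol overhead."""
    bits = terabytes * 1e12 * 8          # TB -> bits
    seconds = bits / (link_mbps * 1e6)   # time at line rate
    return seconds / 86_400              # seconds -> days

print(round(transfer_days(44, 20), 1))  # 44 TB over 20 Mbps -> 203.7 days
```

Real-world overhead and contention would only push that number higher.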

If I were a user and deployed a workload on your nodes, then realized there was a lack of proper bandwidth, I would not reach out to you, ask you to increase it, and wait until it’s done. I would simply move to another node with better bandwidth quality and quantity.

Again, I’m only at 20x20 temporarily because there is no demand on the grid for services. Dedicated fiber is expensive. If I’m only moving less than 1Mbps at present so my nodes can tell the grid “Hi, I’m here” then why would I bump it up to 1Gb and spend an extra $1k to do so? I can crank it up instantly with a phone call when the demand is there.

So circling back to the whole point of my original post: if there’s a bot that polls and posts available bandwidth for a farm, this could be an issue for people like me, where it may seem like I’m underpowered when in actuality I’m ready and willing to provide much more and am just being smart with my OPEX.


I think this way of thinking makes sense as of now.

What would be problematic is farmers not having enough bandwidth in potential and thus diminishing the TF Grid’s overall quality. If you can upgrade your bandwidth when needed, it seems fair and convenient.

I have the same thinking as you. I can upgrade my bandwidth anytime it becomes needed, and I can go well above what the most demanding TF Grid use cases would ask of my farm.

Absolutely. I’m sure this is the concern sigzag and Dany are getting at, and I wholeheartedly agree with you all. You can have all the compute power in the world, but if you’re not pushing it through a capable pipe, then it is all for naught.

Another caveat to add that I just learned from my fiber rep: the OPP equipment my particular dedicated fiber provider (Spectrum) deploys comes in two flavors. The first-tier hardware can pass up to 2 Gbps. If I need beyond 2 Gbps, they have to come out and swap the OPP hardware for the next tier, which can go up to 10 Gbps. I’m trying to sweet-talk them into installing the second-tier OPP now so that if demand goes beyond 2 Gbps, the hardware is already in place.

Normally, they do not install the second-tier hardware unless the subscription is already in place. There’s an extra cost in testing the fiber backhaul for the area, certifying it can connect at the desired speed, ordering and configuring the new OPP, etc., which I completely understand. However, this entire process, they tell me, takes 30 days from start to finish to go from tier 1 to tier 2 and be up and running. I made them aware of the 36-hour SLA for TF nodes and that this would be an issue leading to a big loss of revenue and reputation.

Fingers crossed that they give me tier 2 hardware now. Who knows if I would ever actually need to go beyond 2Gbps anyway once there’s real-world traffic. But this is food for thought for anybody else in the community that may be looking at Spectrum dedicated fiber in the US as an option.

For reference since a few have asked, my collective farm resources are:
176TB SRU
96TB HRU
14.1TB MRU
1,760 CRU

Once I move into my new location at the end of July and turn up an additional 36 nodes, collective resources will be:
320TB SRU
168TB HRU
25.6TB MRU
3200 CRU


Absolutely. I think you’re actually ahead in terms of having a reproducible method that’s compatible with scripting :wink:

I’m thinking a little Python program which:

  1. Determines a list of nodes to query, by checking which are currently online according to the grid proxy or GraphQL
  2. Runs said Terraform script for each node
  3. Collects all the data and spits out a CSV or something
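A minimal skeleton of that flow might look like this; note that the grid proxy URL, the JSON field names, and the Terraform variable are assumptions, and the per-node test step is stubbed out:

```python
import csv
import io
import json
import subprocess
from urllib.request import urlopen

# Assumed grid proxy endpoint for listing online nodes.
PROXY_URL = "https://gridproxy.grid.tf/nodes?status=up"

def online_nodes(proxy_url: str = PROXY_URL) -> list[int]:
    """Step 1: ask the grid proxy which nodes are currently online."""
    with urlopen(proxy_url) as resp:
        return [n["nodeId"] for n in json.load(resp)]

def run_node_test(node_id: int) -> dict:
    """Step 2 (stub): deploy the test flist to one node via Terraform
    and collect its results."""
    subprocess.run(["terraform", "apply", "-auto-approve",
                    f"-var=node_id={node_id}"], check=True)
    return {"node_id": node_id}  # real fields would come from the flist

def to_csv(results: list[dict]) -> str:
    """Step 3: flatten the collected per-node results into CSV text."""
    fields = ["node_id", "bandwidth_mbps", "cpu_score", "public_ip"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()
```

The real work is in `run_node_test`, which would parse the bandwidth, benchmark, and IP output produced by the deployed flist.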

Functions that could be included in said flist:

  1. Bandwidth test
  2. Passmark CPU benchmark
  3. Log public IP
  4. …?

This is something I’ve been thinking about lately, because quite a few farmers are in the same situation. I think it’s silly for farmers to pay for large amounts of bandwidth that will sit idle, but we need to be careful about a couple things:

  1. Paying tokens to farmers who either don’t actually have the bandwidth upgrade available or never intend to actually turn it up. Only a human can provide any kind of verification of what future bandwidth is available at a site. That’s feasible for certified farmers, but a big job across all farmers with similar plans.
  2. We would need to open lines of communication between farmers and deployers. Maybe something like a “farmers directory.” This could include information about a farmer’s hardware, where it’s located, why their setup will be reliable, etc.

Dropped in some Cat 6a; node 3049 should be able to hit 2500 Mbps if you have an appropriate device on the other end of the connection.

For the benefit of all involved in the network testing, I have deployed some iperf3 endpoints at major sections of the grid:

162.205.240.132:1337 Tulsa node 3049
185.69.166.147:1337 Lochristi node 8
162.205.240.133:1337 Tulsa node 3081
45.156.243.1:1337 Vienna node 298
195.192.213.4:1337 Salzburg node 334

I was able to hit 1.6 Gbps between Vienna and Tulsa.
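For anyone who wants to reproduce this kind of measurement against their own servers (the endpoints above are no longer up, and the host, port, and options here are just an example of standard iperf3 client usage):

```shell
# 30-second test with 4 parallel streams against an iperf3 server
# listening on the non-default port 1337; add -R to measure the
# reverse (download) direction as well.
IPERF_CMD="iperf3 -c 162.205.240.132 -p 1337 -t 30 -P 4"
echo "$IPERF_CMD"   # replace the echo with the command itself to run it
```

Parallel streams (`-P`) help saturate long fat pipes like a Vienna-to-Tulsa path, where a single TCP stream often can’t fill the link.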

EDIT: I had to take these down, as they were drawing an inordinate amount of TFT to keep up; they ate almost 300 TFT in 2 days at 0.4 TFT per hour. The math doesn’t make sense.
