Thoughts on TFgrid and self-hosting - is reliable storage possible on one node?

Hi there,

I’m starting this thread to check with you whether my idea is possible on the TFgrid: tech enthusiasts like me tend to take back control of their data by self-hosting basic services to replace the Google suite, like a mail server and a Nextcloud instance to manage their digital life (calendar, contacts, file storage, collaborative editing, project management…)

The idea would be to host them on a farm node and rent the rest of the capacity to people who need it. With modern end-user hardware, it’s easy to build a 16-core/32-thread AMD box with 128 GB of RAM and a lot of storage.

Hosting your own services on your farm is as easy as renting your own capacity. But workloads like Nextcloud or mail servers need ultra-reliable storage. That is actually possible with the S3 service, which Nextcloud can use, but to create your own S3 storage service I think you need a lot of nodes to get the 16+4 segments described on this page (https://manual.threefold.io/#/architecture_storage), which is not possible for a small self-hosting farmer like the one I would like to be!

Would it be possible to join an existing S3 cluster by contributing the same storage capacity from a single self-hosting node? Or would it be possible to create a simple RAID-5 array on the HDDs to ensure that data is not lost when an HDD dies? I’d prefer the efficiency of the dispersed storage solution, but if RAID-5 is the only option in my case, I’d be glad to use it if the functionality exists…
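To be clear about what I mean by RAID-5-style protection, here is a toy sketch of the XOR-parity idea (generic RAID arithmetic, nothing ThreeFold-specific):

```python
# Toy illustration of RAID-5 parity: the parity block is the XOR of the data
# blocks in a stripe, so any single lost block can be rebuilt from the others.
# Not a real RAID implementation, just the underlying idea.
from functools import reduce

def parity(blocks):
    """XOR all blocks together (they must have the same length)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [b"disk0data", b"disk1data", b"disk2data"]  # blocks on 3 data disks
p = parity(stripe)                                   # parity stored on a 4th disk

# Simulate losing disk 1: rebuild its block from the survivors plus parity.
rebuilt = parity([stripe[0], stripe[2], p])
assert rebuilt == stripe[1]
print("disk 1 rebuilt:", rebuilt)
```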

Without this kind of reliable storage on a single node, I think self-hosting is a no-go on the TFgrid, which is a shame, because it could bring a lot of tech enthusiasts like me to become farmers on the grid and greatly accelerate the grid’s geographic diversity!

I can also imagine promoting my node on a local scale so people could host small workloads near their physical location, with hardware and monitoring managed by someone they know, all thanks to the hard work you guys are doing with ThreeFold tech :slight_smile:

So what do you think? Is it a good idea? Is reliable storage possible on a one-node farm?

Thanks for your answers!

3 Likes

Hi @naturecrypto

I will try to answer all your questions; hopefully it’s going to be useful for you and others.

  • You can perfectly rent your own capacity, and the rest of the capacity can be rented to users as well. So you got this part right.
  • The node, like any other physical node, is prone to failure due to power loss or hardware failure (disk failure, memory, etc.), so hosting all your services on one node won’t be 100% reliable.
  • An S3 solution is also a possibility, but it does not have to be 16+4; simpler setups such as 4+1, 3+1, or 2+1 are perfectly fine, although with those ratios you can only lose 1 node out of your total. But again, this won’t work on a single node! (See the sketch after this list for how the ratios compare.)
  • We don’t support RAID-5 at the moment and it’s not on the roadmap; the HDDs on a node are only available for 0-db namespaces (those namespaces are what we use as the backend for our S3 solution, which is based on MinIO).
  • Theoretically, if you have multiple disks (say 4), you can use a 3+1 data/parity ratio for MinIO and become resilient against disk failure (but not node failure); unfortunately, right now there is no way to choose which disk a namespace is deployed on. Note that this setup is similar to RAID-5.
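To make the trade-off concrete, here is a rough sketch of usable capacity and failure tolerance for a few data/parity ratios (plain erasure-coding arithmetic, assuming one equally sized shard per node or disk; nothing grid-specific):

```python
# Usable capacity and failure tolerance for a few data/parity ratios, assuming
# one equally sized shard per node (or per disk). Plain erasure-coding
# arithmetic, nothing grid-specific.

def ratio_summary(data, parity, shard_gb=1000):
    total = (data + parity) * shard_gb
    usable = data * shard_gb
    return (f"{data}+{parity}: {data + parity} shards, "
            f"{usable} GB usable out of {total} GB raw "
            f"({100 * usable // total}% efficient), "
            f"tolerates {parity} lost shard(s)")

for d, p in [(16, 4), (4, 1), (3, 1), (2, 1)]:
    print(ratio_summary(d, p))
```

As you can see, 3+1 has the same 25% overhead and single-failure tolerance as a 4-disk RAID-5, which is why the setup from the last bullet is comparable to RAID-5.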

Just a suggestion: instead of building a farm that is only one BIG single node, it would be better to build multiple “smaller” nodes. This can be more economical and easier, and then you can have an S3 solution installed that spans all your nodes, which will be resilient against both disk and node failure.

2 Likes

Hi @azmy, thanks for your reply !

I understand your point, and I agree that from the perspective of building a farm to rent out capacity, it’s better to have at least 4 nodes to get redundancy.

My perspective is more oriented towards self-hosting, and I think ThreeFold could be a very good building block for this. In my opinion, building a “big” node (note the quotes, we’re talking about a single-socket machine) has a lot of advantages: less hardware investment (only one PSU, only one motherboard with remote management (the cheapest I found in France is this one https://www.ldlc.com/fiche/PB00314723.html, which still costs €245), only one chassis and cooling system… less heat to dissipate thanks to a single 7 nm CPU, less electricity for the same capacity than 4 small nodes… and so on). Fewer disks also means less hardware maintenance, as the chances of one of your disks failing are lower when you have fewer of them.

You’re right, I won’t be node-redundant, but in my experience hardware failures are pretty rare except for HDDs; that’s why it’s mandatory for me to have a redundancy mechanism for HDDs so I don’t lose my data!

Theoretically, if you have multiple disks (say 4), you can use a 3+1 data/parity ratio for MinIO and become resilient against disk failure (but not node failure); unfortunately, right now there is no way to choose which disk a namespace is deployed on. Note that this setup is similar to RAID-5.

That would be exactly what I need! Do you think you could implement this feature? It would be the perfect starting point for self-hosters like me! I’d be glad to test it on testnet or devnet! I’d be really grateful, because I really do want to use ThreeFold tech, and who knows, maybe in the future extend to a more complete and more standard farm :slight_smile:

I can create a feature request, but I really doubt it will get accepted, for the following reasons:

  • How zos decides which disk to use when creating a namespace is based on many factors, including size and also power consumption. The node will spin up as few hard disks as possible, and will fill up one disk before spinning up another one. That’s part of the zos philosophy of being green (in terms of power consumption). See the sketch after this list for the general idea.
  • Allowing the user to pick which disk a namespace is deployed on means the user has knowledge of the internal hardware setup of the node, which is not the case and will never be exposed in the future. The node’s capacity is exposed as a single storage capacity number per type (SRU for SSD and HRU for HDD).
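To give an idea of the behaviour, here is a simplified sketch of such a fill-first placement policy (my own illustration of the idea above, not the actual zos allocator):

```python
# Simplified sketch of a "fill one disk before waking up another" policy,
# roughly the behaviour described above. Not the actual zos allocator.

disks = [
    {"name": "hdd-0", "size_gb": 1000, "used_gb": 0},
    {"name": "hdd-1", "size_gb": 1000, "used_gb": 0},
    {"name": "hdd-2", "size_gb": 1000, "used_gb": 0},
    {"name": "hdd-3", "size_gb": 1000, "used_gb": 0},
]

def place_namespace(size_gb):
    # Keep only disks with enough free space, then prefer disks that are
    # already in use (already spinning) and, among those, the fullest one,
    # so one disk is filled up before the next one gets touched.
    usable = [d for d in disks if d["size_gb"] - d["used_gb"] >= size_gb]
    if not usable:
        raise RuntimeError("no disk has enough free space")
    usable.sort(key=lambda d: (d["used_gb"] == 0, -d["used_gb"]))
    chosen = usable[0]
    chosen["used_gb"] += size_gb
    return chosen["name"]

# Four 300 GB namespaces: the first three all land on hdd-0, only then hdd-1.
print([place_namespace(300) for _ in range(4)])
# ['hdd-0', 'hdd-0', 'hdd-0', 'hdd-1']
```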

I understand your point. Still, the feature could be easy to implement: during the workflow, if the user selects only one node as the S3 storage destination, you could show a warning saying that this is not a recommended configuration and then allow the storage to be created on the different disks available on that single node.

The user would not have to select the HDDs; the workflow could check that there are enough disks and enough free space available to create the S3 deployment.

With this feature, self-hosting farmers could go live on ThreeFold! And it would lower the initial investment needed to create a hosting solution with resilient storage inside a single farm.

Thank you for considering this :slight_smile:

The problem is that the “workflow” is just a client to the grid. It only pulls together the primitives provided by the grid to build a working solution. Let me clarify:

  • The grid provides a few primitives, some of them being:
    • a zdb namespace (a persisted append-only key-value store)
    • a volume (like a disk partition that you can attach to a container)
    • a container (well, like Docker)
  • The S3 solution workflow uses the grid APIs to pull together a solution; what it does is:
    • create as many zdb namespaces as needed (in your case, let’s assume all on one node)
    • create a volume for the S3 metadata (it will be attached to the MinIO container)
    • create a container that runs our modified MinIO version using the configured zdbs

The workflow knows how the solution is pulled together, but the grid does not. For zos there is no relation between any of the provisioned primitives, so it cannot decide how to optimize for your solution. And the current grid API can’t (and will not) allow the user to choose which disk a namespace should use.
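As a purely illustrative sketch, this is roughly what the workflow side does; every function below is a hypothetical stand-in, not the real grid API:

```python
# Purely illustrative sketch of how an S3 "workflow" client pulls together the
# grid primitives listed above. Every function here is a hypothetical stand-in,
# not the real grid API; the point is only how the pieces fit together.

def reserve_zdb_namespace(node, size_gb):
    return {"kind": "zdb-namespace", "node": node, "size_gb": size_gb}

def reserve_volume(node, size_gb):
    return {"kind": "volume", "node": node, "size_gb": size_gb}

def reserve_container(node, image, volume, env):
    return {"kind": "container", "node": node, "image": image,
            "volume": volume, "env": env}

def deploy_s3(nodes, data, parity, shard_gb):
    # 1. One zdb namespace per shard; on a single-node farm every shard
    #    necessarily lands on the same node.
    shards = [reserve_zdb_namespace(nodes[i % len(nodes)], shard_gb)
              for i in range(data + parity)]
    # 2. A small SSD volume for the MinIO metadata.
    meta = reserve_volume(nodes[0], size_gb=10)
    # 3. The modified MinIO container, wired up to the namespaces.
    return reserve_container(nodes[0], image="minio", volume=meta,
                             env={"shards": shards, "data": data, "parity": parity})

print(deploy_s3(["node-a"], data=3, parity=1, shard_gb=500))
```

Each of those reservations is an independent primitive from the grid’s point of view; only the workflow knows they belong to one S3 solution.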

Maybe I am mistaken, but the number of nodes in a capacity pool is shown, so it should be accessible to the S3 storage creation workflow when we select the desired capacity pool, shouldn’t it? Isn’t the number of nodes in a capacity pool a primitive provided by the grid?

And the current grid API can’t (and will not) allow the user to choose which disk a namespace should use.

I get this. But as a farmer renting my own capacity, I know what hardware I’ve got underneath. For example, if my node has 4 × 1 TB HDDs and 1 × 256 GB SSD, and I rent a capacity pool of 2 CU (2 vCPU, 8 GB RAM) and 4 SU (3200 GB HDD and 160 GB SSD), zos should be forced to create the zdb namespaces on the 4 available disks, shouldn’t it? It should be predictable one way or another…

Edit: It’s difficult to know what the real reservations for 1 SU are. The wiki states (https://wiki.threefold.io/#/cloud_units):

The manual states (https://manual.threefold.io/#/3bot_capacity_new):

More precisely, 1 CU corresponds to 1 core and 4 GB of RAM. 1 SU corresponds to a storage capacity of 800 GB of HDD plus 40 GB of SSD (definition until end of 2020).
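Applying that definition to my example above, just to double-check my own arithmetic (and assuming the quoted definition is still the current one):

```python
# Quick arithmetic applying the quoted definition (1 CU = 1 core + 4 GB RAM,
# 1 SU = 800 GB HDD + 40 GB SSD) to my example reservation above. Just a
# sanity check of my own numbers; the definition itself may of course change.

def pool_resources(cu, su):
    return {
        "cores": cu * 1,
        "ram_gb": cu * 4,
        "hdd_gb": su * 800,
        "ssd_gb": su * 40,
    }

print(pool_resources(cu=2, su=4))
# {'cores': 2, 'ram_gb': 8, 'hdd_gb': 3200, 'ssd_gb': 160}
```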

What are the real reservation values? I’m kind of lost here :slight_smile:

1 Like

Hey @naturecrypto,

I’ve been thinking along these same lines: I’d like to host some services on my own node while avoiding a single point of failure. I’m personally leaning towards adding redundancy by also renting capacity on one or more nodes outside of my (single node) farm. The reasons for this include wanting my website to remain up even if I had a temporary network outage and also wanting data backed up in different locations in case of a local catastrophe like fire or flood.

That said, I understand your desire for redundancy within a single node and hope I can lend some insight. The info on SUs is indeed confusing. I know a change was made recently to better reflect the current market price ratio between SSD and HDD, and it seems that didn’t get fully integrated into the wiki. As for that bit from the manual… not sure. I’ve only played with capacity pools a bit, but my understanding is that those SUs can be used on either SSD or HDD. Also, the capacity pool doesn’t claim the underlying resources until a solution is deployed.

If I understand your idea correctly, you’re thinking it should be possible to strategically reserve several storage blocks on your node so that they end up sitting on different drives, based on your knowledge of the drive sizes. This seems unlikely to be possible. You’d need to know the precise amount of space occupied by zos and then align your reservations precisely with the drive boundaries so there’s no overlap.

Last I heard it is possible to run zos as a VM, albeit that means forfeiting farming revenue (VM nodes would still generate cultivation income). You could run four zos VMs on your server and achieve redundancy that way. However, it would probably make more sense to farm and use some of the farmed tokens to rent space elsewhere.

1 Like

Hi @scott! It’s nice to see your reply on this thread! Your Farming FAQ post was really helpful :slight_smile:

you’re thinking it should be possible to strategically reserve several storage blocks on your node so that they end up sitting on different drives, based on your knowledge of the drive sizes. This seems unlikely to be possible. You’d need to know the precise amount of space occupied by zos and then align your reservations precisely with the drive boundaries so there’s no overlap.

I thought I read somewhere that ZOS runs in RAM, but I’m probably mistaken :slight_smile:
But yeah, the idea was something like that! If ZOS does indeed live in RAM, the capacity of brand-new unused HDDs is easy to determine. The main question is how zos distributes the namespaces across the available HDDs.

How zos decides which disk to use when creating a namespace is based on many factors, including size and also power consumption. The node will spin up as few hard disks as possible, and will fill up one disk before spinning up another one. That’s part of the zos philosophy of being green (in terms of power consumption).

Based on what @azmy is saying, the namespaces are probably placed on the same HDD until it fills up. I get the energy consumption point, but I still think it’s a big penalty from an IO perspective. If this is the case, my idea indeed couldn’t work; I’d be glad to know the exact rules for namespace placement on the available storage!

I’m personally leaning towards adding redundancy by also renting capacity on one or more nodes outside of my (single node) farm.

I’m not sure this is possible with the current S3 MinIO solution; I think the nodes must be placed in the same capacity pool. Need to check, though.

Last I heard it is possible to run zos as a VM, albeit that means forfeiting farming revenue (VM nodes would still generate cultivation income)

I can confirm you can: my test node is running in a KVM VM and seems to work OK (even if I didn’t manage to expose my solutions to the net, see my other post here: Guidance needed on solution Expose - no gateway available (following Hugo website tutorial)).

I could indeed create a RAID-5 array and a big VM with zos inside, but it would be a shame to miss out on the farming opportunities even though I’d be providing most of my capacity to the ThreeFold network…

1 Like

Great to hear that you found the FAQ helpful :blush:

You could very well be right about zos living in RAM. It gets downloaded at boot and doesn’t need to store state, so I don’t see why not. The only reference I could find to zos using disk space outside of deployments is this part about the storage module keeping a cache. It actually took digging into the code to confirm that the cache lives on disk, but it appears that it does, as long as there’s a disk available.

I’m not familiar with S3 or how it’s implemented on the grid. I do know, however, that providing distributed self-healing storage is a major goal behind building the grid, so I expect there is a way :slight_smile: I know that TF has plans to provide these kinds of end-user experiences on the grid in a way that’s easy to access for less technical folks, aka the “Digital Twin”. My guess is that those products will achieve data security/redundancy by using geographically distributed farms.

In my experience, farming has provided enough income that I really wouldn’t mind spending some TFT to secure my data elsewhere. Given how cheap hard drives are getting, setting up a NAS off grid to act as a backup might be a reasonable alternative, even if you’re losing the benefits of RAID.

1 Like

@scott, I’d be really grateful if you could share some details of your farming experience if you don’t mind (here or via PM):

  • How much farming revenue do you get per month (I saw the post on your farming setup, very nice indeed for the price :wink:)?
  • Are people renting capacity on your server, or does all the revenue come from farming?

If farming revenue is enough, you’re right, I could rent some S3 space on another farm for backup. I’ll think about it depending on your answer :wink:

I would still prefer to host my data on my own server with RAID-5-like protection, but well, if it’s really not possible…

1 Like

ZOS indeed mostly runs in RAM, but like @scott said, it also reserves a piece of a disk to store some internal files and the cache for container image data.

At the moment, the strategy is to fill a disk as much as possible before moving to another one. The reason is that the node can then fully shut down the disks that are not used and save on power consumption. For farmers with servers that have a lot of disks and low usage, this can make a big difference on the electricity bill.

It does not really have any impact on IO, though. The reason is that for an HDD, the only process that has write access is the 0-DB running on top of it, and this is the case whether there is 1 or 10 namespaces in that 0-DB. Since 0-DB writes data in an append-only style, it avoids moving the write head of the spinning disk too much and thus gets the best write performance.
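To illustrate the append-only idea, here is a toy sketch (not the actual 0-DB code): every write simply goes to the end of the data file, and an in-memory index remembers where each key was written.

```python
# Toy sketch of an append-only store, just to show why this access pattern is
# gentle on spinning disks. Not the actual 0-DB code.
import os
import tempfile

class AppendOnlyStore:
    def __init__(self, path):
        self.path = path
        self.index = {}                          # key -> (offset, length)
        open(path, "ab").close()                 # make sure the file exists

    def put(self, key, value):
        with open(self.path, "ab") as f:         # writes only ever go to the end
            offset = f.seek(0, os.SEEK_END)
            f.write(value)
        self.index[key] = (offset, len(value))   # an update just appends a new copy

    def get(self, key):
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

path = os.path.join(tempfile.gettempdir(), "zdb-sketch.data")
store = AppendOnlyStore(path)
store.put("hello", b"world")
store.put("hello", b"world v2")
print(store.get("hello"))                        # b'world v2'
os.remove(path)
```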

It is not. You can use 0-DBs from any node in any farm in your S3 deployment. This is even recommended if you want solid resilience and protection against a disaster in one of the farms you use.

Right again, that’s the idea :slight_smile:

2 Likes

Sure thing. I don’t mind sharing here, especially as these details are semi-public. It’s possible, though certainly not easy, to see farming revenue being distributed by browsing the Stellar public keys on the capacity explorer and then referencing transactions via a Stellar ledger viewer. My farm, with that single node (very nice, I’d say :slight_smile:), has been generating ~1500 TFT per month in farming revenue. I see only one transaction suggesting “cultivation” income from someone using my node, and it’s a very small amount. Then again, I’ve also enabled the option to accept FreeTFT, so it’s possible there’s more usage I’m not seeing, as I’m pretty sure that FreeTFT are burned rather than deposited to the farmer.

Another neat possibility I’m seeing would be organizing a kind of self-hosting co-op, with all members spreading data across each other’s nodes. With such an arrangement, the only real expense would be the small percentage that the TF Foundation keeps to promote Grid development.

Hi @scott,

Thanks a lot for your answer! This kind of revenue would certainly allow 3 reservations of 1 SU each for S3 redundancy :slight_smile:

Another neat possibility I’m seeing would be organizing a kind of self-hosting co-op, with all members spreading data across each other’s nodes. With such an arrangement, the only real expense would be the small percentage that the TF Foundation keeps to promote Grid development.

I really love your idea! If my technical problems on testnet get solved and my tests are successful, let’s create that together! :smiley:

Thanks for this information! If I understand correctly, HDDs are meant for archival purposes on ThreeFold, is that right? That would explain the way data is written to the disks, and all other purposes are meant to be served by SSDs. How is data written on SSDs? Is it different?

You are welcome :slight_smile:

I’ve got an eye on your other thread too, but please do keep me posted on your progress. I’d love to chat more about how we can start utilizing the grid in a cooperative manner.