RAID Support is a must if we want server HW on Threefold

If we really want to create an AWS competitor, we need RAID support in ZeroOS. That would make it easier to have servers farming on ThreeFold. Without RAID support it seems we are creating a cloud of desktops. I think it’s important to make sure that a good part of this “cloud” is made of server-grade HW. Why isn’t there support for RAID?

Well, I could be wrong, but you may get RAID-like functionality by using different nodes together. I believe it’s called the Quantum Safe File System.

Hi @farmingtech. RAID is a technology that has brought resilience and security to the IT industry, but it has some limitations that we at ThreeFold did not want to get stuck in. We developed a different (and more efficient) way to store data reliably. Please have a look here.

This Quantum Safe Storage overcomes some of the shortfalls of RAID and is able to work over multiple nodes geographically spread on the TF Grid.


Hi, Quantum Safe seems to be a good technology, but it doesn’t really replace RAID. I think it works more like a geographical “copy”, whereas RAID provides resilience and uptime for the node itself.

Most servers will be second-hand, with used disks. With RAID we can maintain uptime and data on the server even if disks fail, and we can also improve speed.

Quantum Safe healing will have its drawbacks, and RAID decreases the need for that healing. Across thousands of servers, there will always be some with bad disks.

It’s of course a choice not to include the RAID drivers, but having to disable a key component of a server doesn’t seem to be the best way of gaining the trust of end clients.

It also lowers the use of servers on the grid. Right now it’s easier to put a workstation to work on ThreeFold than a good server, so everyone will keep their servers for other projects. This seems counterproductive for the decentralized cloud.

You can still use the on board storage on a server without RAID. You can reflash the RAID card, turn on HBA mode, or install a different card. No need for RAID and lots of servers running on threefold already.

Yes, RAID can be disabled; that is not the main issue. It’s more that we are disabling one of the core features of a server, a feature that exists mainly for reliability purposes.

A comparison could be made with a UPS power backup. If, for example, ThreeFold didn’t support UPSs because Quantum Safe is resilient enough, that wouldn’t mean a UPS doesn’t increase uptime and avoid the need to heal data and boot another node/server for the same service.

Maybe I don’t get the point, but one of the key features of TF is that the data is not stored only on your server. If RAID comes in, we again have some IT maintenance at server level, which I think we don’t want. RAID always adds some risk of losing data if the RAID controller stops working or other problems occur, and if one disk has an error you have to replace it… Without RAID it doesn’t matter if a disk is broken; the data is saved somewhere on the grid.

Or do you mean you want to use the server’s RAID controller? Then use JBOD. I have had problems with IBM servers, but after some research and adjustments to the controller I managed to get it working with JBOD.

True, but we follow a different path. In IT, server redundancy has been regarded as very important. Many companies have innovated and created all sorts of technology to make servers redundant and “always up”: HA clusters, RAID controllers, redundant network and power, etc. All well and good, but what does not help in getting this single server (or double, in a cluster) up and running is all the complexity and cost it brings: not just the capital to buy all of it, but also the licenses and maintenance contracts, plus the skilled engineers to maintain it.

At ThreeFold we want to build resilience and redundancy for end-user use cases and services, but not at the hardware/server level. Data processing and storage should be reliable, but not by making servers redundant. Servers (3Nodes) can stop working, or the network, or the power, and the service/application should continue to run because the “Smart Contract for IT” can contract other capacity and restart the part of the architecture that failed with that single server.

It’s a very different approach, simple on the hardware requirements (that’s why anyone can own and participate in creating this compute and storage fabric).


Amazing answer. That’s really good to know. I added this, quoted, to the FAQ.

Well said. I added what you wrote here as a new question on the FAQ.

I’m not saying RAID should be mandatory, only that the drivers should be included in the Linux boot system so that, where available, they can be used to improve reliability. It’s not meant to be a cost, as stated in your reply.

This will be very difficult to achieve with today’s technology. When a working server is shut down improperly, data will always be lost. It can’t be synced in real time over the internet, especially not over home internet connections. A server cluster is hard enough to keep in sync within the same datacenter; it will not be feasible over home connections. A new server may pop up to fulfill the Smart Contract, but data consistency will be very hard to guarantee. Having reliable servers would help with this.

Support for RAID and other resilience systems does not keep other people from joining, as long as they aren’t made mandatory.

Is RAID really unsupported? We advise farmers whose controllers don’t have JBOD/HBA mode to use a single RAID 0 per disk. I would think that redundant configs look the same way to Zos.

What’s certainly not supported is any vendor specific (or generic) tooling to enable RAID features like identifying failed disks without rebooting to BIOS. This would introduce significant development overhead for Zero OS, assuming said tools could be legally included in the first place.

Embracing RAID also implies more complexity in the farming model. I think most farmers would not bother with RAID unless there’s a boost in earnings. This means setting reward levels for different RAID types then tracking and reporting this within Zos. Of course RAID is only an asset if farmers actually replace failed drives in a timely manner, and it can multiply the liability from losing a single disk if they don’t. This creates additional considerations within the incentive model and for deployers when choosing capacity.
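To put rough numbers on the point about liability, here is a small sketch. It assumes each disk fails independently with the same yearly probability `p` and that no failed disk is replaced during the year; both the function names and the 5% figure are made up for illustration and are not measurements from the Grid.

```python
def loss_probability_striped(p: float, n: int) -> float:
    """Chance of losing the whole array when data is striped over n disks
    with no redundancy (RAID 0): any single disk failure is fatal."""
    return 1 - (1 - p) ** n


def loss_probability_mirrored(p: float, n: int) -> float:
    """Chance of losing the data when it is mirrored on n disks (RAID 1)
    and nobody replaces a failed disk: all n copies must fail."""
    return p ** n


if __name__ == "__main__":
    p = 0.05  # hypothetical yearly failure probability per disk
    print(f"single disk:      {p:.4f}")
    print(f"4-disk stripe:    {loss_probability_striped(p, 4):.4f}")
    print(f"2-disk mirror:    {loss_probability_mirrored(p, 2):.4f}")
```

Under these assumptions a four-disk stripe is several times more likely to lose data than a single disk, while a mirror only helps as much as this suggests if the first failed disk is actually noticed and replaced.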

Erasure coding systems like QSS are being recognized as a rising alternative to RAID with desirable qualities in many applications. Where RAID can clearly help is making working storage space (versus archival) more reliable. SSDs already provide some reliability boost over HDD for this purpose. Having the ability to replicate data over two or more SSDs within a single node for especially fault sensitive workloads could mitigate this concern with minimal additional complexity.
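For readers unfamiliar with erasure coding, here is a toy sketch of the idea using a single XOR parity shard, which is the same parity trick RAID 5 uses. It is not how QSS is implemented; the function names and the choice of four data shards are assumptions for illustration only. The point is that each shard could sit on a different disk or a different node, and any one lost shard can be rebuilt from the rest.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost shard")
    result = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the lost one.
        result[missing[0]] = reduce(xor_bytes, present)
    return result


if __name__ == "__main__":
    shards = encode(b"hello threefold grid", 4)
    damaged = list(shards)
    damaged[2] = None  # pretend the disk or node holding shard 2 died
    print(recover(damaged) == shards)
```

Real erasure codes use more than one parity shard, so they can survive several simultaneous losses at a lower storage overhead than full replication.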

My admittedly limited understanding of the cloud deployment landscape is that the trend towards containerized workloads means that applications are being developed to tolerate the failure of individual containers and even entire node outages gracefully. I think the impact of a drive failure under the current Grid model and ways to mitigate are certainly worth discussing further.

I have seen RAID listed as unsupported everywhere and assumed it wouldn’t work. I will check whether RAID is supported/working and report back.

I understand that the tools for RAID management are not supported/included; BIOS management of the RAID is a good first measure.

I completely agree with your point about RAID management and disk replacement, and about increasing farmer rewards for using RAID. Maybe this will be discussed later, when the network is more mature and the requirements for data resiliency are better understood.

Thank you for your insight, I think these issues will be evaluated when the Quantum DataSafe is in development.


Wait, should RAID not work? I have 4 drives in RAID that are detected and producing…

It depends on the server and RAID card.