What will happen if somebody tries to host illegal content on the ThreeFold Grid?

trout2 · March 1, 2022, 4:00pm

We absolutely need a process. The DAO can’t simply vote against taking down copyrighted material; there’s a legal requirement to do so.

Is ThreeFold Grid a Cloud Service Provider for GDPR purposes…or, ThreeFold Tech perhaps? https://www.tripwire.com/state-of-security/security-data-protection/cloud/impact-of-gdpr-on-cloud-service-providers

If a data subject asks/demands that their data is deleted (within 30 days) - who does the deletion and how?

Who is in charge of getting the consent of data subjects for their data to be held and for it to be processed and for stopping it leaving the EU etc.?

Data retention periods are very strict under GDPR - who is responsible for policing those - is this TF Tech?

Is ThreeFold considering becoming ISO 27001 compliant? Hard to demonstrate GDPR compliance without it.

If someone on the Grid is hosting copyright protected material and the copyright holder issues a take-down request / demand - Who does it get sent to? The farmer(s) who are hosting the content, the intermediary who sold the hosting solution, TF Tech?

scott · March 3, 2022, 2:42am

The problem I see is that any top level mechanism for this can also be used to achieve censorship. Giving farmers tools for controlling what happens on their own node can help protect minority political speech in the case that the DAO is attempting censorship and also allow farmers to respond to legal issues with greater agility.

No, at least not according to the terms and conditions. Farmers, however, may be.

Your post got me curious about the GDPR implications for farmers, who could fall under the data processor category. Of particular interest to me was article 28, section 1:

Where processing is to be carried out on behalf of a controller, the controller shall use only processors providing sufficient guarantees to implement appropriate technical and organisational measures in such a manner that processing will meet the requirements of this Regulation and ensure the protection of the rights of the data subject.

This suggests to me that data controllers are responsible for choosing compliant processors. Since it’s probably not currently technically feasible for farmers to achieve GDPR compliance, I think the answer for now is that data controllers who need this compliance should not use the Grid. What changes we’d need to see in the system to allow compliance is another question.

They can trace it to the public IP and whichever farmer owns it. The fact that the IP was rented from the Grid is unlikely to turn up, unless you knew where to look.

noretreat · March 3, 2022, 3:37am

Do we even have access to remove the file if they sent it to the owner of the farm?

trout2 · March 3, 2022, 10:46am

GDPR compliance is non-optional in the EU - it’s the law of the land:

Who does the GDPR apply to?

All EU organisations that collect, store or otherwise process the personal data of individuals residing in the EU, even if they’re not EU citizens.
Organisations based outside the EU that offer goods or services to EU residents, monitor their behaviour, or process their personal data.

Fines for non-compliance:

There are two tiers of administrative fines that can be levied as penalties for non-compliance:

Up to €10 million, or 2% annual global turnover – whichever is higher.
Up to €20 million, or 4% annual global turnover – whichever is higher.

governance.eu/nl-nl/dpa-and-gdpr-penalties-nl

“If you have not yet started your GDPR journey, you should prioritise tackling those areas where a lack of action leaves your organisation exposed. When an infringement occurs, demonstrating you have made a start could help reduce potential penalties.”

trout2 · March 3, 2022, 11:11am

A couple more things: the data controller isn’t usually the individual (individuals are “data subjects”). All EU residents have the rights contained in GDPR - it’s up to the data controllers and processors to protect/enforce those rights.

If, as in the t&c’s, the farmer is indeed the data processor -“You undertake to comply with any legal obligations which may possibly be applicable to you as a data processor under the GDPR and/or any other applicable data privacy regulations.” - then farmers need the tools to be able to stay on the right side of the law. Right? How can we undertake to comply with any legal obligations if we don’t have the tools to do so?

For instance:

The most famous one is Article 17 - the right to be forgotten. Data subjects have the right to have their personal data erased without undue delay (about 30 days).

If the farmer is indeed the data processor and they don’t know what’s happening on their nodes, how can they effect this erasure? As the rights relate to the individual under GDPR…someone has to take responsibility for enforcing them or they would be meaningless.

Someone has to be the controller and someone has to be the processor.

The most important / responsible person/entity is the data controller. They’re the ones that take responsibility for the data when the data subject hands it over. The data processor is any third party that acts on that data to carry out a service for the controller.

In the case of, say, an OwnCloud user - I would argue that OwnCloud is the data controller for GDPR purposes, as the user gives their data to OwnCloud initially.

But OwnCloud then puts that data onto ThreeFold Grid / farmer servers - using them as they would any other cloud service provider. Cloud service providers are typically seen as Data Processors:

“Typically a cloud service provider would qualify as a processor when your enterprise uses their services. The cloud service provider will process personal data, which are stored within their databases or servers, on your behalf: the controller.”

If a user put their own data onto a ThreeFold server, I think they would individually be a data controller…but it still leaves open the question of who the Data Processor is.

I would be very keen to read any legal advice ThreeFold has taken in this regard. If the farmers truly are seen as Data Processors under GDPR then responsibilities are non-trivial and the fines for non-compliance are very serious.

The contract between the Data Controller and the Data Processor (and the obligations within) becomes really important.

If TF Tech goes out into the market and takes personal data as the “Data Controller” and then utilizes the services of farmers who then become Data Processors to store that data (storage = processing) - then…we need to think this through.

Geert · March 4, 2022, 9:20am

I partly agree on this.
If there are legal obligations for farmers, these need to be addressed, absolutely. And indeed, with regard to GDPR, there is an interesting article written here. I conclude from it that farmers are IaaS providers, hence data processors (not data controllers, as they have no view of the data!), but that the question still needs further investigation, not only within ThreeFold but across the global sector. So the solution should not come from ThreeFold alone either, but should be looked at together with the whole industry.
All ThreeFold can do is provide technology that protects the farmer from fines by design, as a farmer is unable to look into the real data of the user.

trout2 · March 7, 2022, 8:27am

Thanks @Geert, I agree that the farmers are data processors under the GDPR. The question then is who the Data Controller is. Is it the individual themselves (i.e. the Data Subject and the Data Controller are the same person)? I would suggest it is whoever the Data Subject is in contract with first - that’s the first link in the chain, right? So where is that first contract, the first terms and conditions? Is it TF Tech?

Geert · March 7, 2022, 8:54am

It depends on the workload. For many workloads, the data subject is his own data controller. If only the data subject has control over his workloads, evidently it’s him. For SaaS applications running on the grid, it’s the developer / seller of this SaaS software. But it all depends on who has access to the data.

trout2 · March 17, 2022, 9:14am

Something for ThreeFold Tech to keep an eye on from the legislative perspective - UK criminal liability, including prison time, for tech CEOs that don’t deal with harmful content properly

Geert · March 17, 2022, 9:47am

If I read it correctly, it’s about “social media companies and other content-focused platforms”. ThreeFold Tech is not content-focused, quite the opposite. It does not want to know anything about your data, and has implemented features to make sure data controlled by the owner is not exploited by third parties.

altsens · March 28, 2022, 2:39am

Hello Weynandkuijpers,

Could you give me more information on that topic about data ownership? I am intrigued.

What I understand so far is that I could use the Grid to power a website and store the data on it in a decentralized fashion. So far so good.

But let’s say Facebook decides to host its website on the grid. I’ll probably have to accept their terms of use, which will state that they own anything I publish on their site. Even though they don’t own the grid, they own their website. How do I own my data then?

Is there something missing in my understanding?

Thank you for helping me grasp the concept.

Geert · March 29, 2022, 6:55am

What is meant is that there is now an alternative, where a social media app can be created with no one having a full view of the data hosted by everyone. You, as owner of the data, post data and share it with your friends and relatives, but there is no single instance that has a view of all the data.
Facebook probably won’t do that; it’s their business model to steal our data and get rich by selling our profiles to advertisers, with the side effect that scandals such as the Cambridge Analytica one can happen. But that enrages quite a few people, and this very business model basically goes against the spirit of privacy regulations such as GDPR.

altsens · March 29, 2022, 6:07pm

Thank you @Geert

What happens concretely when I upload, let’s say, my profile picture on a site hosted on the grid? Let’s say a forum like this one.

The old way of doing it would be:
1- The user uploads his picture to the hosting server.
2- The hosting server saves the picture on its drives and the link in a database.

Then, when someone accesses the site and displays my profile picture, the site has to:
1- Fetch the link from the database and insert it in its response (usually ‘a’ and ‘img’)
2- When your browser displays the response, it fetches the picture from where the link is pointing to.

On this forum, my profile picture is pointing to:
https://forum.threefold.io/user_avatar/forum.threefold.io/altsens/45/1413_2.png

I understand that on the grid, the data is saved in a decentralized fashion. It is not the image per se that is saved, but something that helps rebuild the information. I understand this concept.

But how does the host know which data is related to its website? How does the host serve the data when I request a page? Where, in the chain between the user and the data, does the host sit?

Doesn’t it need to have some knowledge about where the data is? If so, how does it work?

I am sorry for all those questions! lol
I’ve been a software engineer for the past 15 years and I have a huge interest in a decentralized internet. I just need to wrap my head around ThreeFold’s nuts and bolts

Thank you again for your time, it is greatly appreciated!

Geert · March 30, 2022, 10:50am

You have to be aware that ThreeFold is basically offering the very low-level foundation to make this decentralisation happen.
With the Digital Twin there is a solution in the making that brings true decentralisation to the application level as well, where a user is the true owner of the applications he is running.
This forum is still an open-source application running on a shared infrastructure, but the one launching it is the full owner of the running application. So the picture will also appear on this HW infrastructure.
But with Digital Twin the identity, incl. the picture you mention, is hosted in a user’s digital vault and shared with whomever the data owner chooses. Big tech aggregators are then no longer in the loop. For a picture, that is nice, but maybe not strictly required. When we talk about personal documents such as an identity, my health info, or my financial data, that’s info where you absolutely don’t want any intrusion from an unwanted third party.

sensorium · March 31, 2022, 3:40pm

I’m still not convinced that as-is it’s “fine”. Assume you’re a farmer hosting some content. Okay, sure, you don’t know it’s illegal, but someone somewhere will load an application which requests some illegal data from your server. And through traffic monitoring one can see where it’s coming from. So that particular someone then alerts the authorities and gives them your IP, or files a complaint against your IP for hosting something illegal/malicious.

Next step, you receive that email from your provider with a threat that you need to fix it within 24 hours or your server will be disconnected. Now let’s not go into whether it is right or not to censor, but let’s say you agree that whatever it is should be taken down. But as a farmer you have no idea how to remove only that particular element from your farm without wiping the whole farm.

Or am I wrong? Is there a system where you can trace down the “issue” and blacklist a certain contract/user?

I was interested in bringing up a server and wanted to start getting some applications developed; then I saw this issue and it stopped me, so I looked for mentions of it and found this thread. I would love someone to prove me wrong or give some concrete solution to the problem other than “you don’t know what data is stored and it’s okay”, since I know it’s not. I’ve been running servers for years and have occasionally received an email from my provider to take down certain content which someone uploaded.

weynandkuijpers · April 1, 2022, 5:40am

Hi @sensorium. There are two types of storage that can be used on the TF Grid.

VM which has a virtual disk. The virtual disk is a straightforward volume on a local hard disk. Everything stored on this virtual disk is stored only on this virtual (and thus physical) disk. Delete the VM and the virtual disk and the content is gone.
Quantum safe storage. Quantum safe storage uses a “Storage Engine” that parts, compresses, encrypts and then mathematically describes the data. Please see picture below in the diagram for a description of this process.

So the first way to store and use data creates a single point to point at when illegal content is discovered. Depending on the jurisdiction, the farmer with that VM on his server might get that email. At this point in time there is no mechanism that allows the farmer to delete that single VM with that single virtual disk. This is a decentralized system where there is no third party that can intervene and delete a specific VM (smart contract). In this case a farmer might be forced to wipe his server. This is no different (I think) than with any of the market-leading cloud providers today; they also wipe disks and servers when required by the authorities. So it’s very important when architecting IT workloads that use VMs and virtual disks (for the ones that do not have illegal content) to always build in redundancy at the application level to sustain a server crash (wipe).

The second storage mechanism stores data on multiple disks in multiple farms without having any original part of the data stored on those disks. It’s a zero knowledge proof storage system.

heaps · April 3, 2022, 5:10pm

Cloud providers can definitely remove offending content without removing/impacting data from workloads from other users / clients that are compliant.

Having to nuke an entire farm node / disk means potentially removing / affecting workloads on that node that have nothing to do with the problematic data.

This really isn’t an acceptable solution, nor can we say it’s the same as with existing cloud providers. It isn’t.

Mik · April 3, 2022, 6:20pm

There are two types of storage that can be used on the TF Grid.

That’s great. I’ll make a Q+A out of this with a link to your explanation. Nice.

weynandkuijpers · April 4, 2022, 4:28am

Okay - nuking a node and removing other people’s workloads and/or data is indeed not ideal. But can it be done in a different way, and is it done in a different way in today’s centralized systems? Help me think this through: the smart contract for IT has in its simplest form one “owner” that underwrites (and pays for) the contract. In order to “undo” the contract you need to have another signer that is listed in the contract (could be the farmer) or someone else (DAO members?).
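To make the multi-signer idea concrete, here is a minimal sketch of a cancellation rule. It is not how TF Grid smart contracts work today: `HostingContract`, `co_signers`, `approvals_needed` and `can_cancel` are hypothetical names, and the rule itself (the owner can always cancel; otherwise enough listed co-signers must approve) is just one assumption about how such a scheme could look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingContract:
    """Hypothetical contract record: one paying owner plus optional co-signers."""
    owner: str
    co_signers: frozenset = frozenset()  # e.g. the farmer, or DAO members
    approvals_needed: int = 1            # co-signer approvals required to cancel


def can_cancel(contract: HostingContract, approvers: set) -> bool:
    """Return True if this set of approvers is allowed to undo the contract."""
    # The owner can always cancel their own contract.
    if contract.owner in approvers:
        return True
    # Otherwise only approvals from listed co-signers count.
    valid = contract.co_signers & approvers
    return len(valid) >= max(contract.approvals_needed, 1)
```

With no co-signers listed, only the owner can cancel, which matches the simplest form of the contract described above. The sketch deliberately leaves open who gets to choose the co-signers.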

This is a slippery slope that raises more questions than it answers: who is to be that second / third signer, who decides who that second signer is, and then who decides who the decision makers are that assign the second / third signer? Once you go down that path the (centralized) pyramid starts to build.

And I wonder, in really bad cases I think the authorities impound the whole server to investigate which means that all the other data / workloads are lost (for a while) as well.

Those are my thoughts, how do you see partial removal working in a decentralized system?

sensorium · April 4, 2022, 10:35am

Ok, let’s dissect that.

First I disagree that anyone will impound a server unless something really really bad has transpired.

What will usually happen is that you get an email from your colocation/server provider letting you know there is a problem with some content on your server, or that your server did something malicious (e.g. it’s part of a DoS network), and you need to deal with it within 24 hours or so; if not, they will simply deny service. So if I have 10 servers running all kinds of stuff and one of these is a farmer machine running some stupid contract, I could lose hosting for all of my servers because of that. So yeah, most people running serious hardware won’t go into this, and anyone hosting it for fun will stop farming when they get their first email.

So, we agree that just nuking the whole server is not an option, also, who guarantees the same content won’t arrive again after restarting? So that’s not really a solution as-is now.

Double or triple signing, yes, not something we should demand. But there are 2 tweaks to this I can see.

a) while not mandatory, a 2nd or 3rd signee could raise the level of “trustfulness” one can give to a contract, and maybe even earn it cheaper hosting (optional implementation if there is desire), as there is less risk of “having to deal with issues”. Also, additional signees will likely want some transparency about the content and/or the app, so the application itself will likely be more public, e.g. its public endpoints will be known, thus allowing anyone to inspect what the application appears to be doing. So, more public trust == good thing. This still allows people to run a 0 signee contract at their own risk (for all parties), especially in the beginning.

b) there should be a way for a farmer to deny certain contracts/issuers hosting on their farm(s), allowing you to say “get off of my lawn”. Right now, if I understand correctly, the farmer has no such ability. Basically a hosts.deny kind of thing. Actually a good possibility would be a contracts.deny / contracts.allow method, which can be * by default, but would subsequently allow someone to pick their contracts. That is also freedom, the freedom to choose with whom you do business, plus it protects the farmer in these edge cases of complaints, so they have a tool to ban certain contracts/issuers.
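As a rough illustration of the contracts.deny / contracts.allow idea, here is a minimal sketch in Python. Nothing like this exists on the Grid as far as this thread establishes: `ContractPolicy`, its `allow` and `deny` lists, and the issuer strings are all hypothetical, and letting deny always win over allow is a design assumption, not a description of how hosts.deny itself resolves conflicts.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ContractPolicy:
    """Hypothetical per-farm policy in the spirit of hosts.allow / hosts.deny."""
    allow: list = field(default_factory=lambda: ["*"])  # default: accept everyone
    deny: list = field(default_factory=list)            # default: deny no one

    def accepts(self, issuer: str) -> bool:
        # A deny match always wins, so a farmer can ban a single issuer
        # while leaving the wildcard allow rule in place.
        if any(fnmatchcase(issuer, pattern) for pattern in self.deny):
            return False
        # Otherwise the issuer must match at least one allow pattern.
        return any(fnmatchcase(issuer, pattern) for pattern in self.allow)
```

For example, `ContractPolicy(deny=["issuer-42"])` keeps the farm open to everyone except one issuer, while `ContractPolicy(allow=["my-*"])` restricts the farm to the farmer's own workloads.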

Now, obviously if it’s something contested (not something as simple as CP), there will be people who will support such content, so it’s a game of whack-a-mole for the people wanting to take it down.

But without such a mechanism there is a real adoption problem of the tech. And I would like to support this and host it, and possibly also bring some developers to help with this. But I would like to know if there is a will to get something in place which will shield people who do honest farming.

For this to work you would need to be able to:
a) identify which contracts are running on your farm(s)
b) identify which contract was active at the time of the issue, so you are able to blacklist it.
c) identify applications which use your contracts so you can trace the illicit behaviour/content.

a) in my mind is simple, and you guys probably already have it or it can be added; b) and c) are potentially tricky, depending on the inner workings of TF.
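Points a) and b) boil down to a lookup over deployment records. The sketch below assumes the farmer can get at such records at all, which this thread does not confirm; `ContractRecord`, its fields, and `active_at` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ContractRecord:
    """Hypothetical record of one contract deployed on one node."""
    contract_id: str
    node_id: str
    started: datetime
    ended: Optional[datetime] = None  # None means the contract is still running


def active_at(records, node_id: str, when: datetime) -> list:
    """Return the IDs of contracts that were running on node_id at time `when`."""
    return [
        r.contract_id
        for r in records
        if r.node_id == node_id
        and r.started <= when
        and (r.ended is None or when < r.ended)
    ]
```

Given the timestamp from a provider's complaint email, `active_at(records, "node-A", timestamp)` would give the list of candidate contracts. Point c), tying a contract to the application behind it, is not covered here.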

Potentially you could just ban all contracts currently on your server, and if they move to other servers while we play a game of whack-a-mole, after a few “local bans” the suspicious contract could be identified. So a simple solution would be a “button” to: publish a complaint against all contracts on my machine at time x (the time of the infraction you got; usually there is a timestamp, but even “right now” will probably also be okay?), and ban them from being used on my machine(s) (people will probably want to ban a contract not just on a single machine but across all their machines, so it won’t just migrate to a 2nd machine in the cluster).

This could be implemented so that the contracts migrate gracefully (depending on volume, obviously), but they all get a +1 on the ban counter. The farmer can truthfully say he did all he could to fix the problem (usually you have to say you made sure the issue will not repeat when you get such an email), and if the issue happens to a few more farmers with the same contract… we’ll quickly identify the bad one: most likely all the contracts would go to different machines, so the chances of the same contracts getting blacklisted again are slim. Now certainly, with age certain contracts might accumulate a number of these “infractions”, but with time one can probably identify the really problematic ones, as they will get e.g. 1 ban/month vs. 1 ban per year for the accidental ones.

Then again, the farmer could have control over how comfortable he is with the number of bans an application has collected per, e.g., month. E.g. a limit of 0.25 per month would be one ban every four months.
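The whack-a-mole counter and the per-farmer tolerance could be sketched roughly as follows. This is only an illustration of the idea in this post: `record_complaint`, `bans_per_month` and `farmer_accepts` are hypothetical names, and treating a contract younger than one month as one month old is an assumption made to avoid dividing by a tiny age.

```python
from collections import Counter


def record_complaint(ban_counts: Counter, contracts_on_node) -> None:
    """Give every contract that was on the node a +1 on its ban counter."""
    ban_counts.update(contracts_on_node)


def bans_per_month(ban_count: int, age_months: float) -> float:
    """Average bans per month over the contract's lifetime."""
    # Assumption: contracts younger than one month count as one month old.
    return ban_count / max(age_months, 1.0)


def farmer_accepts(ban_count: int, age_months: float, max_bans_per_month: float) -> bool:
    """True if the contract's ban rate is within this farmer's tolerance."""
    return bans_per_month(ban_count, age_months) <= max_bans_per_month
```

After a few complaints on different nodes, the contract that keeps reappearing stands out: if one farmer reports contracts a, b, c and another reports a, d, then `ban_counts.most_common(1)` points at a. A year-old contract with a single accidental ban would still pass a 0.25 per month limit, while one collecting a ban every month would not.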

Maybe this (probably a bit more fleshed out) could work?

I know this goes against what @heaps said, and what I also said, but assuming it’s difficult to identify offending contracts (which it probably is), this whack-a-mole-counter could be a gray solution.

Depending on how complex an implementation we want, this could be paired with signed/allowed/denied contracts, so if you know certain workloads are okay (your own?), you could ban selectively. In any case, something to think about. Thoughts?