What happens if somebody tries to host illegal content on the ThreeFold Grid?

Hello Weynandkuijpers,

Could you give me more information on that topic about data ownership? I am intrigued.

What I understand so far is that I could use the Grid to power a website and store its data in a decentralized fashion. So far so good.

But let’s say Facebook decides to host its website on the grid. I’ll probably have to accept their terms of use, which will state that they own anything I publish on their site. Even though they don’t own the grid, they own their website. How do I own my data then?

Is there something missing in my understanding?

Thank you for helping me grasp the concept :slight_smile:

What is meant is that there is now an alternative: a social media app can be built where no one has a full view of the data hosted by everyone. You, as owner of the data, post it and share it with your friends and relatives, and there is no single instance that has a view of all the data.
Facebook probably won’t do that; its business model is to steal our data and get rich by selling our profiles to advertisers, with the side effect that scandals such as the Cambridge Analytica one can happen. That enrages quite a few people, and this very business model basically goes against the spirit of privacy regulations such as GDPR.

Thank you @Geert

What happens concretely when I upload, let’s say, my profile picture to a site hosted on the grid? Let’s say a forum like this one.

The old way of doing it would be:
1- The user uploads his picture to the hosting server.
2- The hosting server saves the picture on its drives and the link in a database.

Then, when someone accesses the site and displays my profile picture, the site has to:
1- Fetch the link from the database and insert it in its response (usually in ‘a’ and ‘img’ tags).
2- When your browser displays the response, it fetches the picture from where the link points.

On this forum, my profile picture is pointing to:
https://forum.threefold.io/user_avatar/forum.threefold.io/altsens/45/1413_2.png

I understand that on the grid, the data is saved in a decentralized fashion. It is not the image per se that is saved but something that helps rebuild the information. I understand this concept.

But how does the host know which data is related to its website? How does the host serve the data when I request a page? Where, in the chain between the user and the data, does the host sit?

Doesn’t it need to have some knowledge about where the data is? If so, how does it work?

I am sorry for all those questions! lol
I’ve been a software engineer for the past 15 years and I have a huge interest in a decentralized internet. I just need to wrap my head around ThreeFold’s nuts and bolts :slight_smile:

Thank you again for your time, it is greatly appreciated!

You have to be aware that ThreeFold basically offers the very low-level foundation to make this decentralisation happen.
With the Digital Twin there is a solution in the making that brings true decentralisation to the application level as well, where a user is the true owner of the applications he is running.
This forum is still an open-source application running on a shared infrastructure, where the one launching it is full owner of the running application. So the picture will also be stored on this hardware infrastructure.
But with the Digital Twin, the identity, including the picture you mention, is hosted in a user’s digital vault and shared with whomever the data owner chooses. Big tech aggregators are then no longer in the loop. For a picture that is nice, but maybe not strictly required. When we talk about personal documents such as an identity, my health info, or my financial data, that’s info where you absolutely don’t want to see any intrusion from an unwanted third party.

I’m still not convinced that as-is it’s “fine”. Assume you’re a farmer hosting some content. Okay, sure, you don’t know it’s illegal, but someone somewhere will load an application which requests some illegal data from your server. Through traffic monitoring one can see where it’s coming from. That particular someone then alerts the authorities and gives them your IP, or files a complaint against your IP for hosting something illegal/malicious.

Next step, you receive that email from your provider with a threat that you need to fix it within 24 hours or your server will be disconnected. Let’s not go into whether it is right or not to censor, but let’s say you agree that whatever it is should be taken down. As a farmer you have no idea how to remove only that particular element from your farm without wiping the whole farm.

Or am I wrong? Is there a system where you can trace down the “issue” and blacklist a certain contract/user?

I was interested in bringing up a server and wanted to start getting some applications developed; then I saw this issue and it stopped me, so I looked for mentions of it and found this thread. I would love someone to prove me wrong or give some concrete solution to the problem other than “you don’t know what data is stored and it’s okay”, since I know it’s not. I’ve been running servers for years and have occasionally received an email from my provider to take down certain content which someone uploaded.

Hi @sensorium. There are two types of storage that can be used on the TF Grid.

  1. VM which has a virtual disk. The virtual disk is a straightforward volume on a local hard disk. Everything stored on this virtual disk is stored only on this virtual (and thus physical) disk. Delete the VM and the virtual disk and the content is gone.
  2. Quantum safe storage. Quantum safe storage uses a “Storage Engine” that splits, compresses, encrypts and then mathematically describes the data. Please see the diagram below for a description of this process.
    image

So the first way to store and use data creates a single point to point at when illegal content is discovered. Depending on the jurisdiction, the farmer with that VM on his server might get that email. At this point in time there is no mechanism that allows the farmer to delete that single VM with that single virtual disk. This is a decentralized system where there is no third party that can intervene and delete a specific VM (smart contract). In this case a farmer might be forced to wipe his server. This is no different (I think) than with any of the market-leading cloud providers today; they also wipe disks and servers when required by the authorities. So when architecting IT workloads that use VMs and virtual disks (for the ones that do not have illegal content :wink: ), it’s very important to always build in redundancy at the application level to survive a server crash (wipe).

The second storage mechanism stores data on multiple disks in multiple farms without having any original part of the data stored on those disks. It’s a zero knowledge proof storage system.

Cloud providers can definitely remove offending content without removing/impacting data from workloads from other users / clients that are compliant.

Having to nuke an entire farm node / disk means potentially removing / affecting workloads on that node that have nothing to do with the problematic data.

This really isn’t an acceptable solution, nor can we say it’s the same as with existing cloud providers. It isn’t.

There are two types of storage that can be used on the TF Grid.

That’s great. I’ll make a Q+A out of this with a link to your explanation. Nice.

Okay - nuking a node and removing other people’s workloads and/or data is indeed not ideal. But can it be done in a different way, and is it done in a different way in today’s centralized systems? Help me think this through: the smart contract for IT has in its simplest form one “owner” that underwrites (and pays for) the contract. In order to “undo” the contract you need another signer that is listed in the contract (could be the farmer) or someone else (DAO members?).

This is a slippery slope that raises more questions than it answers: who is to be that second / third signer, who decides who that second signer is, and then who decides who the decision makers are that assign the second / third signer? Once you go down that path, the (centralized) pyramid starts to build.

And I wonder, in really bad cases I think the authorities impound the whole server to investigate which means that all the other data / workloads are lost (for a while) as well.

Those are my thoughts. How do you see partial removal working in a decentralized system?

Ok, let’s dissect that.

First, I disagree that anyone will impound a server unless something really really bad has transpired.

Most commonly, you will get an email from your colocation/server provider letting you know there is a problem with some content on your server, or that your server did something malicious (e.g. it’s part of a DoS network), and that you need to deal with it within 24 hours or so; if not, they will simply deny service. So if I have 10 servers running all kinds of stuff and one of them is a farmer machine with some stupid contract, I could lose hosting for all of my servers because of that. So yeah, most people running serious hardware won’t go into this, and anyone hosting it for fun will stop farming when they get their first email.

So, we agree that just nuking the whole server is not an option. Also, who guarantees the same content won’t arrive again after restarting? So that’s not really a solution as it stands now.

Double or triple signing, yes, is not something we should demand. But there are 2 tweaks to this I can see.

a) While not mandatory, a 2nd or 3rd signee could raise the level of “trustfulness” one can give to a contract, and maybe even earn cheaper hosting (optional implementation if there is desire), as there is less risk of “having to deal with issues”. Additional signees will also likely want some transparency about the content and/or the app, so the application itself will likely be more public, e.g. its public endpoints will be known, allowing anyone to inspect what the application appears to be doing. So, more public trust == good thing. This still allows people to run a 0-signee contract at their own risk (for all parties), especially in the beginning.

b) There should be a way for a farmer to deny certain contracts/issuers hosting on their farm(s), allowing you to say “get off of my lawn”. Right now, if I understand correctly, the farmer has no such ability. Basically a hosts.deny kind of thing. Actually, a good possibility would be a contracts.deny / contracts.allow method, which can be * by default, but subsequently allows someone to pick their contracts. That is also freedom, the freedom to choose with whom you do business, plus it protects the farmer in these edge cases of complaints, so they have a tool to ban certain contracts/issuers.
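
To make the allow/deny idea a bit more concrete, here is a minimal Python sketch. It is only an illustration: nothing in this thread says such a mechanism exists on the grid, and `FarmPolicy`, `is_allowed` and the `twin:` / `contract:` rule format are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FarmPolicy:
    """Hypothetical per-farm policy, loosely modelled on hosts.allow / hosts.deny."""

    allow: set = field(default_factory=lambda: {"*"})  # "*" = accept everyone
    deny: set = field(default_factory=set)

    def is_allowed(self, twin_id: int, contract_id: int) -> bool:
        keys = {f"twin:{twin_id}", f"contract:{contract_id}"}
        if keys & self.deny:  # deny rules always win
            return False
        return "*" in self.allow or bool(keys & self.allow)


policy = FarmPolicy()
policy.deny.add("twin:42")           # "get off of my lawn"
print(policy.is_allowed(42, 1001))   # False
print(policy.is_allowed(7, 1002))    # True
```

With the default `*` in the allow set the farm stays open to everyone, and a farmer who wants to hand-pick workloads could replace it with explicit entries.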

Now, obviously if it’s something contested (not something as simple as CP), there will be people who will support such content, so it’s a game of whack-a-mole for the people wanting to take it down.

But without such a mechanism there is a real adoption problem for the tech. I would like to support this and host it, and possibly also bring some developers to help with this. But I would like to know if there is a will to get something in place which will shield people who do honest farming.

For this to work you would need to be able to:
a) identify which contracts are running on your farm(s)
b) identify which contract was active at the time of the issue, so you are able to blacklist it.
c) identify applications which use your contracts so you can trace the illicit behaviour/content.

a) is simple in my mind, and you guys probably already have it or it can be added; b) and c) are potentially tricky, depending on the inner workings of TF.

Potentially you could just ban all contracts currently on your server, and if they move to other servers while we play a game of whack-a-mole, after a few “local bans” the suspicious contract could be identified. So a simple solution would be a “button” to: publish a complaint against all contracts on my machine at time x (the time of the infraction; usually there is a timestamp, but even “right now” will probably also be okay?), and ban them from being used on my machine(s) (people will probably want to ban a contract not on a single machine but across all their machines, so it won’t just migrate to a 2nd machine in the cluster).

This could be implemented so that the contracts migrate gracefully (depending on volume, obviously), but they all get a +1 on the ban counter. The farmer can truthfully say he did all he could to fix the problem (usually you have to say you made sure the issue will not repeat when you get such an email), and if the issue happens to a few more farmers with the same contract… we’ll quickly identify the bad one, as most likely all contracts would go to different machines, so chances of the same contracts getting blacklisted together again are slim. Now certainly, with age certain contracts might accumulate a number of these “infractions”, but with time one can probably identify the really problematic ones, as they will get e.g. 1 ban per month vs. 1 ban per year for the accidental ones.

Then again, the farmer could have control over how comfortable he is with the number of bans per application per, e.g., month. E.g. 0.25 would be one ban every four months.
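
A rough Python sketch of that whack-a-mole counter, just to show the bookkeeping. Everything here is an assumption for illustration: the `BanLedger` class, the 120-day window, the 30-day month and the contract IDs are made up, and where such a ledger would live (on chain or elsewhere) is left open.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class BanLedger:
    """Hypothetical shared record of farmer-issued bans per contract."""

    def __init__(self):
        self._bans = defaultdict(list)  # contract_id -> list of ban timestamps

    def ban_all(self, contract_ids, when):
        """A farmer bans every contract that was on the node at `when`."""
        for cid in contract_ids:
            self._bans[cid].append(when)

    def bans_per_month(self, contract_id, now, window_days=120):
        """Average bans per 30-day month over the trailing window."""
        start = now - timedelta(days=window_days)
        recent = [t for t in self._bans.get(contract_id, []) if start <= t <= now]
        return len(recent) / (window_days / 30)

    def acceptable(self, contract_id, now, max_rate=0.25):
        """True if the contract stays at or below the farmer's comfort level."""
        return self.bans_per_month(contract_id, now) <= max_rate


ledger = BanLedger()
now = datetime(2022, 6, 1)
ledger.ban_all([11, 12, 13], datetime(2022, 4, 10))  # farmer A got a letter
ledger.ban_all([11, 14], datetime(2022, 5, 2))       # farmer B got a letter
ledger.ban_all([11, 15], datetime(2022, 5, 20))      # farmer C got a letter

suspects = [c for c in (11, 12, 13, 14, 15) if not ledger.acceptable(c, now)]
print(suspects)  # [11]
```

The bystander contracts each collect a single ban and stay at 0.25 per month, while the one contract that keeps showing up wherever a complaint lands climbs to 0.75 and stands out.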

Maybe this (probably a bit more fleshed out) could work?

I know this goes against what @heaps said, and what I also said, but assuming it’s difficult to identify offending contracts (which it probably is), this whack-a-mole-counter could be a gray solution.

Depending on how complex an implementation we would like, this could be paired with signed/allowed/denied contracts, so if you know certain workloads are okay (your own?), you could ban selectively. In any case, something to think about. Thoughts?

Thank you for the dissection. To make it simple, let’s say a letter from the authorities (not the service provider) arrives at a farmer. What would such a letter have as information to identify that there is illicit content on the device, and how does this letter connect the device to the farmer?

Before we think about creating a technology solution, let’s get clear on the requirements of what we are trying to solve. Let’s brainstorm (the list is not meant to be exhaustive, just the first things that come to mind):

Identify the node / farmer:
“authorities were monitoring the network traffic of the ISP of the farmer and have found unencrypted traffic that contains something that cannot be.”

Ok - so that seems reasonable and possible, but only if the farmer provides public IP addresses. All traffic between VMs on nodes that do not have public IP addresses to rent (in other words, all devices behind a NAT) uses encrypted, tunnelled, point-to-point wireguard / planetary networking, so it cannot be intercepted (and read) by the authorities. So we are talking about a subset of farmers that are potentially going to see this.

So the letter can be addressed to the owner of the IP address, as traffic from his IP address contains (unencrypted) illicit content. How do we match IP addresses to specific farmers? IP addresses are bought and owned, or rented. So in the end (with more steps in between) the letter will reach either the owner of the IP addresses or the person renting them.

Good - so the letter arrives. What does it state with regards to what was monitored?

  • a username of the person doing illicit online activities?
  • an IPv4 address of the application that “processed” the illicit content?
  • an IPv4 address of the VM on which the application ran “processing” illicit content.

There is really nothing more that they can see when they monitor / filter network traffic. So now the farmer/someone needs to go and see which contract is using that specific IP address.

So we have a query tool to query TF Chain, and this has a lot of functionality to query the chain. For example to query what contracts are active for a specific node (8) and are using 1 or more IP addresses:

query MyQuery {
  nodeContracts(where: {numberOfPublicIPs_gt: 0, nodeId_eq: 8}) {
    contractId
    numberOfPublicIPs
  }
}

presenting:

{
  "data": {
    "nodeContracts": [
      {
        "contractId": 97,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 159,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 202,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 284,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 289,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 291,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 737,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 850,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 861,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 883,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 879,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 885,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 882,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 891,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 892,
        "numberOfPublicIPs": 1
      }
    ]
  }
}

And per contract you can find out the IPs used, here from a query on publicIps:

{
  "data": {
    "publicIps": [
      {
        "contractId": 885,
        "ip": "185.69.166.150/24"
      }
    ]
  }
}

So if IP 185.69.166.150 was mentioned in the letter we now know that contract 885 is the one breaking the rules.
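
If the letter names an IP, the matching step can be scripted. A small Python sketch, assuming the response has the shape shown above; the helper name `contracts_for_ip` is made up, and the response is hard-coded here rather than fetched from the chain.

```python
import ipaddress

# Shape copied from the publicIps response above; a real response may hold more rows.
public_ips_response = {
    "data": {"publicIps": [{"contractId": 885, "ip": "185.69.166.150/24"}]}
}


def contracts_for_ip(response, reported_ip):
    """Return contract IDs whose public IP equals the address named in the letter."""
    wanted = ipaddress.ip_address(reported_ip)
    return [
        row["contractId"]
        for row in response["data"]["publicIps"]
        # ip_interface accepts the "address/prefix" form and exposes the host address
        if ipaddress.ip_interface(row["ip"]).ip == wanted
    ]


print(contracts_for_ip(public_ips_response, "185.69.166.150"))  # [885]
```

Comparing parsed addresses instead of raw strings avoids tripping over the `/24` suffix in the chain data.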

Finding out more about the contract:

query MyQuery {
  nodeContracts(where: {contractId_eq: 885}) {
    resourcesUsed {
      cru
      mru
      hru
      sru
    }
    createdAt
    createdById
    id
    nodeId
    deploymentData
    state
    twinId
    updatedAt
    updatedById
    version
    contractId
  }
}

This provides us with:

{
  "data": {
    "nodeContracts": [
      {
        "resourcesUsed": {
          "cru": "4",
          "mru": "8589934592",
          "hru": 0,
          "sru": "53687091200"
        },
        "createdAt": "2022-04-05T11:54:36.000Z",
        "createdById": "SddLDONB-l",
        "id": "X9idA_uWF",
        "nodeId": 8,
        "deploymentData": "0x",
        "state": "Created",
        "twinId": 3983,
        "updatedAt": "2022-04-05T11:55:32.159Z",
        "updatedById": null,
        "version": 4,
        "contractId": 885
      }
    ]
  }
}

Which presents twin 3983 as the contract owner. We also see that the contract is for:

          "cru": "4",
          "mru": "8589934592",
          "hru": 0,
          "sru": "53687091200"

which is 2 CU and 50 GB of SSD space. So, sherlocking our way from the IP address, we now know twin 3983 is doing it.
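
For anyone who wants to check those numbers, a short Python sketch of the conversion. The byte values come from the response above; the compute-unit formula is an assumption on my side (it is how I remember the ThreeFold cloud unit definition, so verify it against the current documentation), and `compute_units` is just an illustrative helper name.

```python
GIB = 1024 ** 3

# Values copied from the resourcesUsed block above (the chain returns strings).
resources_used = {"cru": "4", "mru": "8589934592", "hru": 0, "sru": "53687091200"}

cru = int(resources_used["cru"])
mru_gib = int(resources_used["mru"]) / GIB   # memory in GiB
sru_gib = int(resources_used["sru"]) / GIB   # SSD in GiB


def compute_units(cru, mru_gib):
    # Assumed formula, not confirmed in this thread.
    return min(
        max(mru_gib / 4, cru / 2),
        max(mru_gib / 8, cru),
        max(mru_gib / 2, cru / 4),
    )


print(mru_gib, sru_gib, compute_units(cru, mru_gib))  # 8.0 50.0 2.0
```

So the contract reserves 4 virtual cores, 8 GiB of memory and 50 GiB of SSD, which under the assumed formula lines up with the 2 CU mentioned above.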

query MyQuery {
  twins(where: {twinId_eq: 3983}) {
    accountId
    createdAt
    createdById
    deletedAt
    deletedById
    gridVersion
    id
    ip
    twinId
    updatedAt
    updatedById
    version
  }
}

provides

{
  "data": {
    "twins": [
      {
        "accountId": "5DZrBLxL3XKA2DAefpxZbRpTgQ64egGDV86AkTitSvTqujsF",
        "createdAt": "2022-04-05T09:16:48.000Z",
        "createdById": "gvMlq7xSwV",
        "deletedAt": null,
        "deletedById": null,
        "gridVersion": 1,
        "id": "JhBnEeFCQ",
        "ip": "127.0.0.1",
        "twinId": 3983,
        "updatedAt": "2022-04-05T09:16:48.000Z",
        "updatedById": null,
        "version": 1
      }
    ]
  }
}

So we can find which contract and which twin (individual) is involved in the detected activities. Now we need to look at what to do with that info.

  • contact 3983?
  • cancel contract 885 without notifying 3983
  • cancel contract 885 with notifying 3983
  • send information to the authorities that 3983 is the one to send the letter to.

This is excellent, thanks for the detailed path! But I do have some questions (before going into what about 3983’s contract).

You say communication between the nodes is encrypted. That’s right, I understand that, but what about exit traffic? If the node is doing a DoS (or anything else, like posting spam) against some external target, I would assume the traffic would go directly from the farmer’s IP? Or does that never happen, and it has to use gateways? (Theoretically it could, if the contract hacks the machine, even if it’s protected.) But still, if it’s only gateways, the gateways would then have a double problem, since they would likely not be able to identify who used them at that point in time. And obviously, encrypted or not, if the traffic is doing something illicit it can still be an issue and you can still get a “letter”.

So what to do about the potentially illicit contract (maybe it’s just buggy and causing issues)? I think it would be fair that the farmer has options to reach out and/or cancel the contract, possibly through blockchain interaction, so there is a trace of the action taken that is visible to all parties. This can help with identifying malicious farmers (cancelling contracts for no proper reason) and malicious users.

It was fun to do the trace - I don’t do this every day :slight_smile:

Every VM / container deployment can have an IPv4, IPv6 and planetary network address, if these are available in the farm. See HTTP://play.grid.tf

image

So if DDoS activities are started on a VM with a public address, it’s easy to trace and pinpoint. But a VM on a farm which has every server behind a NAT (Network Address Translation, basically a lot of private addresses behind one single public address) can also start a DDoS, which will then highlight the farmer’s public address as the culprit.

Whichever way the 3node(s) are connected, IP-wise it will always point back to the farmer owning the IP. In the case of a gateway being deployed on a 3node / farm, it is that farmer that will be exposed with its IP address. So yes - that is where the letter will arrive.

This is the only solution if we want to stay decentralized. The contract is between the application deployer/consumer and the farmer, so only those two people should be involved in case such an event arises. These two people should be able to get in touch and discuss the matter, and if no agreement is reached the farmer should be able to cancel the contract.

Today a lot of what can be done on mainnet (beta) is without any form of identity (verification). It’s based on private keys signing smart contracts and signing token transfers to pay for them. That’s it. In all honesty, this is nothing worse and nothing better than any online service today: you can have unverified email and prepaid cash cards to pay for services as well. Believe me, we are looking at connecting technologies that will bring verified credentials forward, where people that consume or produce on the grid have / will build a reputation.

So I think we all agree now that the farmer needs a tool to handle these situations. At the very least to be able to “pause” the load so it can be investigated / dealt with.

Incentive-wise, the farmer should not be hitting pause without just cause. Thoughts?

At least in some first iteration, later there can be more advanced methods if this happens often enough to justify creating e.g. migration tools. So a farmer hits pause, the customer then selects migration and moves to a different farm.

Sorry, I had not seen this reply when I was writing the previous one; maybe we were writing at the same time or something. Anyway, yes, I agree. But in any case a farmer has to have the ability, as right now you also have the ability: if you are hosting multiple people on your server, you can delete a particular account if the customer is not answering and it’s doing something malicious. Anything less than the current capabilities and people won’t use it.

Thanks for this thread @weynandkuijpers and @sensorium super helpful.

Absolutely.

I appreciate the raising of DoS or spam outbound traffic, and this complicates things. Workloads that don’t reserve public IPs will all egress from the same IP, so now we also need a way to identify which workload is generating which traffic. I think giving farmers some visibility into this is okay without harming the privacy of the users.

You can stop a contract, and you can block an account, but you can’t block a user who can generate unlimited free accounts. To your point @weynandkuijpers, I think this is worse than the current situation, because blockchain accounts are much easier to generate than prepaid cards. We have a “Sybil” issue here. Maybe all of those accounts have a common funding source which can be identified and used to block all accounts which receive funds from it, but this becomes a complicated thing for a farmer to manage.

Blacklisting is a way around this and probably works well enough for media. But what about code? How persistent will those DoS attackers and spammers be in scrambling their code so that it bypasses the filter, and then trying again? Building this blacklisting system will be no small undertaking and will introduce a lot of overhead, as everything passing through nodes needs to get scanned.

The simplest way forward seems to be adding some Sybil resistance to the system, and the simplest way to do that is requiring deployment accounts to cost something. Maybe that’s TFT which is staked with a time-based unlock, or burned altogether. Then give farmers enough visibility and power to protect themselves by stopping contracts and banning accounts.
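
As a back-of-the-envelope illustration of why a per-account cost helps, here is a small Python sketch. It does not describe anything that exists on TF Chain: the `Stake` class, `may_deploy`, `sybil_cost` and all the amounts are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stake:
    """Hypothetical TFT stake locked by an account before it may deploy."""

    amount_tft: float
    locked_at: datetime
    lock_period: timedelta

    def unlocked(self, now: datetime) -> bool:
        """The stake can be withdrawn only once the lock period has passed."""
        return now >= self.locked_at + self.lock_period


def may_deploy(stake, min_stake_tft: float) -> bool:
    """Only accounts with a sufficient stake may create contracts."""
    return stake is not None and stake.amount_tft >= min_stake_tft


def sybil_cost(accounts: int, min_stake_tft: float) -> float:
    """TFT an attacker must tie up to run this many throwaway accounts."""
    return accounts * min_stake_tft


stake = Stake(100.0, datetime(2022, 1, 1), timedelta(days=90))
print(may_deploy(stake, 100.0))                  # True
print(stake.unlocked(datetime(2022, 2, 1)))      # False
print(sybil_cost(1000, 100.0))                   # 100000.0
```

The point is simply that the cost of evading bans grows linearly with the number of accounts, whereas today a fresh account is free.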

Returning to GDPR concerns…

This is exactly what I meant: if you run a service that collects data from EU citizens, I would not recommend using random farmers on the Grid to host those services. Probably the biggest liability in all this is the case where someone’s node is stolen and there’s unencrypted data on the drives that gets used in some way that triggers GDPR enforcement.

That leaves the case where someone within the EU decides to use the Grid on their own. Then yes, they become the data controller. If they want to be forgotten, they can just delete their own data or cancel the contract paying for the storage. Maybe there’s a case where someone loses access to their workload and loses control of the keys that opened the contract, but in that case they would also be unable to demonstrate that their data is stored on the node or that they ever owned the contract in the first place.

You mentioned ownCloud as a potential data controller when someone deploys a solution. To be clear, ownCloud does not control or process your data when you deploy an ownCloud instance on the Grid. You simply load code developed by ownCloud into capacity under your control. This is the beauty of the Grid, really, no intermediaries that can snoop on or control your digital experience.

Likewise, neither TF Tech nor the TF Foundation ever play the role of data controller, when we’re talking about Grid deployments. We supply the tooling for individuals to use the Grid, but all interactions with deployments happen directly, peer to peer.

Very interesting. So the Grid is essentially operated by each peer in the peer-to-peer transaction, and TF Tech and the Foundation never process or control any data for GDPR purposes. But every farmer who hosts a shard of QSFS would then be a data processor under GDPR, even if the Grid / TF Tech / Foundation wasn’t. The duties of data processors start with the need for a contract with the data controller to keep personal data confidential, and go from there. https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/controllers-and-processors/what-does-it-mean-if-you-are-a-processor/

Minimum required terms of the contract linked here:
https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/contracts-and-liabilities-between-controllers-and-processors-multi/what-needs-to-be-included-in-the-contract/#2