What will happen if somebody tries to host illegal content on the ThreeFold Grid?

weynandkuijpers · April 6, 2022, 7:32am

Thank you for the dissection. To make it simple, let’s say a letter from the authorities (not the service provider) arrives at a farmer. What would such a letter have as information to identify that there is illicit content on the device, and how does this letter connect the device to the farmer?

Before we think about creating a technology solution, let’s get clear on the requirements of what we are trying to solve. Let’s brainstorm (the list is not meant to be exhaustive, just the first things that come to mind):

Identify the node / farmer:
“authorities were monitoring the network traffic of the ISP of the farmer and have found unencrypted traffic that contains something that should not be there.”

Ok - so that seems reasonable and possible. But only if the farmer provides public IP addresses, because all traffic between VMs on nodes that do not have public IP addresses to rent (in other words, all devices behind a NAT) uses encrypted, tunnelled, point-to-point WireGuard / planetary networking. So this cannot be intercepted (and read) by the authorities. So we are talking about a subset of farmers that are potentially going to see this.

So the letter can be addressed to the owner of the IP address, as traffic from his IP address contains (unencrypted) illicit content. How do we match IP addresses to specific farmers? IP addresses are bought and owned, or rented. So in the end (with more steps in between) the letter will reach either the owner of the IP addresses or the person renting them.

Good - so the letter arrives. What does it state with regard to what was monitored?

a username of the person doing illicit online activities?
an IPv4 address of the application that “processed” the illicit content?
an IPv4 address of the VM on which the application ran “processing” illicit content?

There is really nothing more that they can see when they monitor / filter network traffic. So now the farmer/someone needs to go and see which contract is using that specific IP address.

So we have a query tool to query TF Chain, and this has a lot of functionality to query the chain. For example to query what contracts are active for a specific node (8) and are using 1 or more IP addresses:

query MyQuery {
  nodeContracts(where: {numberOfPublicIPs_gt: 0, nodeId_eq: 8}) {
    contractId
    numberOfPublicIPs
  }
}

presenting:

{
  "data": {
    "nodeContracts": [
      {
        "contractId": 97,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 159,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 202,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 284,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 289,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 291,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 737,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 850,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 861,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 883,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 879,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 885,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 882,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 891,
        "numberOfPublicIPs": 1
      },
      {
        "contractId": 892,
        "numberOfPublicIPs": 1
      }
    ]
  }
}
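For anyone who would rather script this step than type it into the query tool, here is a minimal Python sketch that builds the same query for any node ID and pulls the contract IDs out of a response. The function names are my own invention, and the sample response is a shortened copy of the output above. Sending the payload to a GraphQL endpoint is left out on purpose, because the endpoint URL is not given in this post.

```python
import json


def build_node_contracts_query(node_id: int) -> dict:
    """Build a GraphQL request body for contracts with public IPs on one node."""
    query = (
        "query MyQuery {\n"
        f"  nodeContracts(where: {{numberOfPublicIPs_gt: 0, nodeId_eq: {int(node_id)}}}) {{\n"
        "    contractId\n"
        "    numberOfPublicIPs\n"
        "  }\n"
        "}"
    )
    return {"query": query}


def contract_ids(response_text: str) -> list:
    """Extract the contract IDs from a JSON response shaped like the one above."""
    data = json.loads(response_text)
    return [c["contractId"] for c in data["data"]["nodeContracts"]]


# Shortened copy of the response shown in this post (two of the fifteen entries).
sample_response = (
    '{"data": {"nodeContracts": ['
    '{"contractId": 97, "numberOfPublicIPs": 1}, '
    '{"contractId": 885, "numberOfPublicIPs": 1}]}}'
)

payload = build_node_contracts_query(8)
ids = contract_ids(sample_response)
print(payload["query"])
print(ids)
```

The payload is a plain dictionary, so it can be posted as JSON with whatever HTTP client you prefer.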

And per contract you can find out the IPs used:

{
  "data": {
    "publicIps": [
      {
        "contractId": 885,
        "ip": "185.69.166.150/24"
      }
    ]
  }
}

So if IP 185.69.166.150 was mentioned in the letter we now know that contract 885 is the one breaking the rules.
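The matching step can be sketched in a few lines of Python. Note that the chain stores the address with its prefix length (/24), while a letter would most likely quote the bare address, so the comparison has to strip the prefix. The function name is hypothetical and the record is copied from the output above.

```python
import ipaddress

# Record copied from the publicIps output in this post.
public_ips = [
    {"contractId": 885, "ip": "185.69.166.150/24"},
]


def contract_for_ip(reported_ip: str, records: list):
    """Return the contract ID whose public IP equals the reported address, else None."""
    target = ipaddress.ip_address(reported_ip)
    for record in records:
        # ip_interface parses "address/prefix"; .ip is the host address on its own.
        if ipaddress.ip_interface(record["ip"]).ip == target:
            return record["contractId"]
    return None


match = contract_for_ip("185.69.166.150", public_ips)
print(match)
```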

Finding out more about the contract:

query MyQuery {
  nodeContracts(where: {contractId_eq: 885}) {
    resourcesUsed {
      cru
      mru
      hru
      sru
    }
    createdAt
    createdById
    id
    nodeId
    deploymentData
    state
    twinId
    updatedAt
    updatedById
    version
    contractId
  }
}

This provides us with:

{
  "data": {
    "nodeContracts": [
      {
        "resourcesUsed": {
          "cru": "4",
          "mru": "8589934592",
          "hru": 0,
          "sru": "53687091200"
        },
        "createdAt": "2022-04-05T11:54:36.000Z",
        "createdById": "SddLDONB-l",
        "id": "X9idA_uWF",
        "nodeId": 8,
        "deploymentData": "0x",
        "state": "Created",
        "twinId": 3983,
        "updatedAt": "2022-04-05T11:55:32.159Z",
        "updatedById": null,
        "version": 4,
        "contractId": 885
      }
    ]
  }
}

Which presents twin 3983 as the contract owner. We also see that the contract is for

          "cru": "4",
          "mru": "8589934592",
          "hru": 0,
          "sru": "53687091200"

which is 2CU and 50GB of SSD space. So sherlocking our way from the IP address we now know 3983 is doing it.
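To make the raw numbers easier to read, here is a small Python sketch that converts them to more familiar units. It assumes that mru, hru and sru are byte counts and that cru is a number of cores, which fits the 50GB SSD figure above but is not spelled out in the query output. The function name is my own.

```python
# Values copied from the contract output in this post.
resources_used = {"cru": "4", "mru": "8589934592", "hru": 0, "sru": "53687091200"}

GIB = 1024 ** 3  # bytes per GiB


def summarize(resources: dict) -> dict:
    """Convert the raw resource fields (assumed: cores and bytes) to cores and GiB."""
    return {
        "cores": int(resources["cru"]),
        "memory_gib": int(resources["mru"]) / GIB,
        "hdd_gib": int(resources["hru"]) / GIB,
        "ssd_gib": int(resources["sru"]) / GIB,
    }


summary = summarize(resources_used)
print(summary)
```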

query MyQuery {
  twins(where: {twinId_eq: 3983}) {
    accountId
    createdAt
    createdById
    deletedAt
    deletedById
    gridVersion
    id
    ip
    twinId
    updatedAt
    updatedById
    version
  }
}

provides

{
  "data": {
    "twins": [
      {
        "accountId": "5DZrBLxL3XKA2DAefpxZbRpTgQ64egGDV86AkTitSvTqujsF",
        "createdAt": "2022-04-05T09:16:48.000Z",
        "createdById": "gvMlq7xSwV",
        "deletedAt": null,
        "deletedById": null,
        "gridVersion": 1,
        "id": "JhBnEeFCQ",
        "ip": "127.0.0.1",
        "twinId": 3983,
        "updatedAt": "2022-04-05T09:16:48.000Z",
        "updatedById": null,
        "version": 1
      }
    ]
  }
}

So we can find which contract and which twin (individual) is involved in the detected activities, now we need to look at what to do with that info.

contact 3983?
cancel contract 885 without notifying 3983
cancel contract 885 with notifying 3983
send information to the authorities that 3983 is the one to send the letter to.

sensorium · April 6, 2022, 9:48am

This is excellent, thanks for the detailed path! But I do have some questions (before going into what about 3983’s contract).

You say communication between the nodes is encrypted, and I understand that, but what about exit traffic? If the node is doing a DoS (or anything else, like posting spam) against some external target, I would assume the traffic would go directly from the farmer’s IP? Or does that never happen, and it has to use gateways? (Theoretically it could, if the contract hacks the machine, even if it’s protected.) But still, if it’s only gateways, the gateways would then have a double problem, since they likely would not be able to identify who used them at that point in time. And obviously, even if it’s not encrypted, if the traffic is doing something illicit it can still be an issue and you can still get a “letter”.

So what to do about the potentially illicit contract (maybe it’s just buggy and causing issues)? I think it would be fair to the farmer that the farmer has options to reach out and/or cancel the contract, possibly through blockchain interaction, so there is a trace of the action taken that is visible to all parties. This can help with identifying malicious farmers (cancelling contracts for no proper reason) and malicious users.

weynandkuijpers · April 6, 2022, 12:13pm

It was fun to do the trace - I don’t do this every day.

Every VM / container deployment can have an IPv4, IPv6 and planetary network address, if these are available in the Farm. See HTTP://play.grid.tf

So if DDoS activities are started on a VM with a public address, it’s easy to trace and pinpoint. But a DDoS can also be started from a farm that has every server behind a NAT (Network Address Translation, basically a lot of private addresses behind one single public address), which will then highlight the farmer’s public address as the culprit.

Whichever way the 3Node(s) are connected, IP-wise it will always point back to the farmer owning the IP. In the case of a gateway being deployed on a 3Node / farm, it is that farmer that will be exposed with its IP address. So yes - that is where the letter will arrive.

weynandkuijpers · April 6, 2022, 12:25pm

This is the only solution if we want to stay decentralized. The contract is between the application deployer/consumer and the farmer. So only those two people should be involved in the case such an event arises. So these two people should be able to get in touch and discuss the matter, and if no agreement is reached the farmer should be able to cancel the contract.

Today a lot of what can be done on mainnet (beta) is without any form of identity (verification). It’s based on private keys signing smart contracts and signing token transfers to pay for it. That’s it. In all honesty, this is nothing worse and nothing better than any online service today: you can have unverified email, and prepaid cash cards to pay for services as well. Believe me, we are looking at connecting technologies that will bring verified credentials forward, where people that consume or produce on the grid have / will build a reputation.

sensorium · April 6, 2022, 12:27pm

So I think we all agree now that the farmer needs a tool to handle these situations. At the very least to be able to “pause” the load so it can be investigated / dealt with.

Incentive-wise, the farmer should not be hitting pause without just cause. Thoughts?

At least in some first iteration, later there can be more advanced methods if this happens often enough to justify creating e.g. migration tools. So a farmer hits pause, the customer then selects migration and moves to a different farm.

sensorium · April 8, 2022, 5:18am

Sorry, I had not seen this reply when I was writing the previous one; maybe we were writing at the same time or something. Anyway, yes, I agree. But in any case a farmer has to have the ability, as right now you also have the ability. If you are hosting multiple people on your server, you have the ability to delete that particular account if the customer is not answering and it’s doing something malicious. Anything less than the current capabilities and people won’t use it.

trout2 · April 8, 2022, 7:56am

Thanks for this thread @weynandkuijpers and @sensorium, super helpful.

scott · April 22, 2022, 7:55am

Absolutely.

I appreciate the raising of DoS or spam outbound traffic, and this complicates things. Workloads that don’t reserve public IPs will all egress from the same IP, so now we also need a way to identify which workload is generating which traffic. I think giving farmers some visibility into this is okay without harming the privacy of the users.

You can stop a contract, and you can block an account, but you can’t block a user who can generate unlimited free accounts. To your point @weynandkuijpers, I think this is worse than the current situation because blockchain accounts are much easier to generate than prepaid cards. We have a “Sybil” issue here. Maybe all of those accounts have a common funding source which can be identified and used to block all accounts which receive funds from it, but this becomes a complicated thing for a farmer to manage.
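To give a feel for what the common funding source idea involves, here is a rough Python sketch. Everything in it is hypothetical: the transfer list, the account names and the function are made up for illustration, and a real version would need transfer history from the chain and would have to cope with funds hopping through intermediate accounts.

```python
from collections import defaultdict

# Hypothetical (sender, receiver) transfers, for illustration only.
transfers = [
    ("funder_A", "acct_1"),
    ("funder_A", "acct_2"),
    ("funder_B", "acct_3"),
    ("funder_A", "acct_4"),
]


def accounts_sharing_funder(transfers: list, flagged_account: str) -> set:
    """Return every account funded by any sender that also funded the flagged account."""
    funded_by = defaultdict(set)
    for sender, receiver in transfers:
        funded_by[sender].add(receiver)

    related = set()
    for receivers in funded_by.values():
        if flagged_account in receivers:
            related |= receivers
    return related


to_block = accounts_sharing_funder(transfers, "acct_2")
print(sorted(to_block))
```

Even this toy version only looks one hop back, which shows why it gets complicated quickly for a farmer to manage.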

Blacklisting is a way around this and probably works well enough for media. But what about code? How persistent are those DoS attackers and spammers to scramble up their code in such a way that will bypass the filter and then try again? Building this blacklisting system will be no small undertaking and will introduce a lot of overhead as everything passing through nodes needs to get scanned.

The simplest way forward seems to be adding some Sybil resistance to the system, and the simplest way to do that is requiring deployment accounts to cost something. Maybe that’s TFT which is staked with a time-based unlock or burned altogether. Then give farmers enough visibility and power to protect themselves by stopping contracts and banning accounts.

scott · April 22, 2022, 8:32am

Returning to GDPR concerns…

This is exactly what I meant: if you run a service that collects data from EU citizens, I would not recommend using random farmers on the Grid to host those services. Probably the biggest liability in all this is the case where someone’s node is stolen and there’s unencrypted data on the drives that gets used in some way that triggers GDPR enforcement.

That leaves the case where someone within the EU decides to use the Grid on their own. Then yes, they become the data controller. If they want to be forgotten, they can just delete their own data or cancel the contract paying for the storage. Maybe there’s a case where someone loses access to their workload and loses control of the keys that opened the contract, but in that case they would also be unable to demonstrate that their data is stored on the node or that they ever owned the contract in the first place.

You mentioned ownCloud as a potential data controller when someone deploys a solution. To be clear, ownCloud does not control or process your data when you deploy an ownCloud instance on the Grid. You simply load code developed by ownCloud into capacity under your control. This is the beauty of the Grid, really, no intermediaries that can snoop on or control your digital experience.

Likewise, neither TF Tech nor the TF Foundation ever play the role of data controller, when we’re talking about Grid deployments. We supply the tooling for individuals to use the Grid, but all interactions with deployments happen directly, peer to peer.

trout2 · April 22, 2022, 9:07am

Very interesting. So the Grid is essentially operated by each peer in the peer-to-peer transaction, and TF Tech and the Foundation never process or control any data for GDPR purposes. But every farmer who hosts a shard of QSFS would then be a data processor for GDPR, even if the Grid/TF Tech/Foundation wasn’t. The duties of data processors start with a need for a contract with the data controller to keep personal data confidential, and go from there. https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/controllers-and-processors/what-does-it-mean-if-you-are-a-processor/

Minimum required terms of the contract linked here:
https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/contracts-and-liabilities-between-controllers-and-processors-multi/what-needs-to-be-included-in-the-contract/#2

sensorium · April 25, 2022, 10:09am

Easy, just block everything by default and make it so that the one doing the deployment has to explicitly ask for permission. Immediately there is a list of who requested what. Plus you don’t have any strays doing something nobody wants them to do. Then it’s easy for the farmer to block that particular IP and/or port from being used on his farm. So even if it’s deployed again it won’t work. That is, the contract initiation can immediately fail, as the requested ports/IPs won’t get whitelisted by the system.

The worst that can happen is that you have more contracts using the same outbound IP/port, but you just drop them all.
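Roughly what I have in mind, as a Python sketch. This is not how the grid works today; the class, its methods and the example addresses (taken from the documentation-only IP ranges) are all made up to illustrate the block-by-default idea.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    """Default-deny outbound policy for one farm (hypothetical, illustration only)."""
    granted: dict = field(default_factory=dict)  # contract_id -> set of (ip, port)
    blocked: set = field(default_factory=set)    # (ip, port) pairs banned by the farmer

    def request(self, contract_id: int, ip: str, port: int) -> bool:
        """A deployment asks for an outbound target; refused if the farmer banned it."""
        if (ip, port) in self.blocked:
            return False
        self.granted.setdefault(contract_id, set()).add((ip, port))
        return True

    def allows(self, contract_id: int, ip: str, port: int) -> bool:
        """Anything that was never requested and granted is dropped."""
        return (ip, port) in self.granted.get(contract_id, set())

    def block(self, ip: str, port: int) -> list:
        """Ban a target and revoke it from every contract that was using it."""
        self.blocked.add((ip, port))
        affected = []
        for contract_id, targets in self.granted.items():
            if (ip, port) in targets:
                targets.discard((ip, port))
                affected.append(contract_id)
        return sorted(affected)


policy = EgressPolicy()
policy.request(885, "203.0.113.10", 443)
policy.request(891, "203.0.113.10", 443)
affected = policy.block("203.0.113.10", 443)
print(affected)
```

The list returned by block is the "who requested what" trace: every contract that asked for the offending target.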

scott · April 28, 2022, 9:58pm

I’m not sure that this is the case. Maybe a legal precedent would still need to be established here. A “shard” of QSFS data is meaningless in itself. It only becomes meaningful with the proper metadata, accompanying shards to make a full block, and the keys needed to decrypt. Effectively destroying QSFS data can be done even if nodes holding shards fail to comply.

That doesn’t totally skirt the issue though, because there’s always at least one node in a QSFS setup that’s responsible for intaking and reconstructing the plain data. And of course users can store their data without using QSFS too.

If all European Grid users need to sign a contract with the individual farmer whose capacity they are using, that should be technically possible, although I imagine many farmers would choose not to participate.

scott · April 28, 2022, 11:00pm

This seems like a high cost to the UX of the non-offending majority. Actually, this would probably kill certain use cases which choose a random outbound port to open a connection on. Or, the deployment gets to say “I want all ports in this range”, which then dilutes the utility of requiring them to be explicitly opened.

Let’s say, for example, that a bad actor carries out a DoS attack against Wikipedia from some 3Node. Does the farmer then ban all future workloads from connecting to that Wikipedia IP to carry out legitimate requests?

sensorium · April 29, 2022, 5:07am

Anyone choosing all ports will know the risk. (Any report coming in to that machine will get your contract killed)

And yes, if I get an email from Wikipedia I would keep them blacklisted on my machine for a while (because a repeat offense would mean I have not solved the issue, and they ask you if you solved it and what exactly you did. Saying you blacklisted the IP is a reasonable measure). Anyone who wants to deploy a machine will see the blacklist and can choose a different machine. Not everyone will blacklist Wikipedia, and not everyone will use it.

trout2 · May 2, 2022, 12:00pm

It’s just my view based on doing a bunch of reading (as we all are). I think getting an official legal opinion is a good idea; then we can say that this is the best legal advice we have, and the farmers can point to it and say either “we are processors, so these are our obligations” or “according to this legal advice we’re not”, etc.

weynandkuijpers · May 5, 2022, 9:20am

Love the discussion here, and the effort and results. Now it’s time to condense what was discussed into a one-pager that we agree describes the identified problems.

When that’s done let’s bring in someone that has a legal head or background. If you know one, please invite her or him to this discussion after we have finished the one-page problem definition.

Let’s use our collective brains to list the issues and make it a very specific problem statement. I will start copying and pasting some of the things posted ^^^ in a doc, but I’d love it to be a community effort. (BTW will try to find a collaboration tool that is not centralized. If I fail I will post a gdoc to co-edit).

Tried Github, but in order to edit, you have to have an account which I am sure the future law expert might not have…

sensorium · May 6, 2022, 6:56am

Cryptpad? Not decentralized but good enough.

Rasputin_Rick · May 6, 2022, 11:45am

I was under the impression Uhuru would be such a decentralized collaboration tool. Is it not?

weynandkuijpers · May 6, 2022, 12:12pm

Yes - and we have a beta version online. Anyway, we have (had?) the cryptpad software on the grid. Will figure out if it is still available.

weynandkuijpers · May 6, 2022, 2:12pm

It is available. I have re-read the whole thread and condensed it into a document with a problem statement and a list of findings relevant to the problem statement. Please feel free to read and augment. I want to have the problem statement and (relevant) findings clear and agreed before moving forward and formulating a solution (enhancement proposal).

Here’s the cryptpad: https://secure.threefold.me/pad/#/2/pad/edit/4Ttg2i7pPcBKIaNeAPUZ3r3r/