New grid user has some questions... :-)

I was introduced to Dmytro a little while ago and he has made his first steps onto the ThreeFold grid. He has a history of building (larger) enterprise solutions and I do believe his company can benefit from using the ThreeFold grid.

While taking his first steps on the grid he has some findings which I believe make sense to share with the larger community, so that together we can help him find his way quicker through the forest of documentation and things to learn. Here are his findings with my initial responses:

  1. Your documentation for resource grid_name_proxy is not up to date. Spent some time figuring out cryptic errors, like "invalid deployment: failed to validate back end 'https://185.69.166.155:443': failed to parse back end https://185.69.166.155:443 with error: address https://185.69.166.155:443: too many colons in address" or "invalid deployment: failed to validate back end 'https://185.69.166.155': invalid port in back end: //185.69.166.155".
    I found this resolved bug: https://github.com/threefoldtech/zos/pull/1843 and based on the description there should be meaningful errors... But the above does not enlighten you that for HTTP the back end must be in the form "http://<ip>:<port>", while for HTTPS it is just "<ip>:<port>".

response: We need to improve / update the documentation here. @samehabouelsaad Can you have a look and get this one updated? You seem to have dealt with the issue mentioned.
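In the meantime, a configuration along these lines should work (a minimal sketch based on my understanding of the provider schema - the attribute names node, backends and tls_passthrough are worth double-checking against the current provider docs, and the back end format follows the behaviour described above):

resource "grid_name_proxy" "p1" {
  node            = 4000                      # gateway node ID (example value)
  name            = "myapp"
  tls_passthrough = true                      # the back end terminates TLS itself
  backends        = ["185.69.166.155:443"]    # HTTPS back end: no scheme, just <ip>:<port>
  # for a plain HTTP back end, set tls_passthrough = false
  # and use e.g. backends = ["http://185.69.166.155:80"]
}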

  2. resource grid_scheduler is not reliable in selecting reachable nodes (or maybe there is another reason, not sure) - sometimes operations time out, which leads to terraform failure. For example, with errors like "couldn't reach node 17: context deadline exceeded" (this is an error from the grid_network resource that tries to configure the nodes provided by grid_scheduler).

response: I am not sure how stable the grid scheduler is. This is a question for the development team. I personally always do the search and select work manually: I choose the region where I would like the nodes to have VM / storage and then find a server with public IPs (or not) as required. Then you can specify where to deploy in the terraform script directly and have less ambiguity.

Example:

resource "grid_network" "net12" {
    nodes = [4000, 5453, .....]
    ip_range = "10.2.0.0/16"
    name = "network2"
    description = "newer network"
    add_wg_access = false
}
resource "grid_deployment" "d1" {
  node = 4000
  network_name = grid_network.net12.name
  disks {
    name = "data"
    size = 20
    description = "volume holding app data"
  }
  vms {
    name = "nextcloud"
    flist = "https://hub.grid.tf/tf-official-vms/ubuntu-22.04-lts.flist"
    cpu = 2
    publicip = true
    memory = 4096
    planetary = true
    entrypoint = "/sbin/zinit init"
    mounts {
        disk_name = "data"
        mount_point = "/app"
    }
    env_vars = {
      SSH_KEY ="...."
	}
  }
}
  3. Is there any mechanism that configures the VM during start up? I re-created the Kubernetes cluster several times but the SSH fingerprint (and even the external IP address) did not change (this usually means that the SSH host key is embedded into the VM image and not cleaned up or regenerated during start up).

response: The so-called flist is an image that is used to boot the (micro or full) VM. This flist is fully configurable and you can make your own. There are a few ways to do this. The easiest is to create a docker image on any other platform and make it do exactly what it needs to do, including all the startup commands, parsed ENV vars, etc. (micro VM). The other way is a little bit more involved: you can start with any of the cloud images provided and turn it into a full VM image customized to your needs.

When done (with either of the two ways) you can import the created image on http://hub.grid.tf and use it to deploy your specific custom images.
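Once your image is on the hub, deploying it looks the same as for the official images - inside a grid_deployment the vms block would be something like this (the hub path and checksum below are placeholders, not a real image):

  vms {
    name           = "customapp"
    flist          = "https://hub.grid.tf/<your-twin>/my-custom-image.flist"  # hypothetical hub path
    flist_checksum = "...."      # optional: the flist is rejected if the hash differs
    cpu            = 2
    memory         = 2048
    entrypoint     = "/sbin/zinit init"
    planetary      = true
    env_vars = {
      SSH_KEY = "...."
    }
  }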

There are probably some more ways to customize the existing terraform based simple kubernetes install, but I am not the expert on Kubernetes. Anyone?

  4. In the example documentation the Kubernetes cluster is deployed with a public IP for the master node only, but all nodes in the cluster have Internet access. How do I determine the outgoing NAT address for these nodes? My experiments showed that even if I run worker nodes on a farm without public addresses, all egress traffic goes via some IP in the subnet of the master node (not the IP address of the master node).
    It looks like adding a public IP to the master node just exposes this node to the Internet (just a 1-to-1 NAT). Checked with netcat/tcpdump that my SYN packets are forwarded correctly to the node. I assume that is for easier demonstration only and the recommended way is to run without a public IP and connect via WireGuard?

response: Not sure about this, @scott do you have some more insights here?

  5. It looks like when nodes are provisioned, they are given a public IPv6 address implicitly; at least Terraform tries to remove this attribute on each update:

       # grid_kubernetes.k8s1 will be updated in-place
       ~ resource "grid_kubernetes" "k8s1" {
             id                 = "83582926-c3f5-49f1-bf8b-ef96c12b8c4b"
             name               = "myk8s"
           + nodes_ip_range     = (known after apply)
             # (5 unchanged attributes hidden)
           ~ master {
               - flist_checksum = "c87cf57e1067d21a3e74332a64ef9723" -> null
                 name           = "mr"
               - publicip6      = true -> null
                 # (10 unchanged attributes hidden)
             }
    

    As I don't request an IPv6 address - this looks like a huge security hole to me.

response: For normal VMs you specify whether an IPv6 address is to be provided. Example:

......
    publicip  = true        # IPv4 address request
    publicip6 = true        # IPv6 address request
    planetary = true        # (yggdrasil) planetary network address request
.......

I expect the Kubernetes workers / master to have the same manner of turning addresses on and off.
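Until the defaults are confirmed, it seems safest to set all three flags explicitly in the vms block. A sketch, reusing the network from the earlier example:

resource "grid_deployment" "d2" {
  node         = 4000
  network_name = grid_network.net12.name
  vms {
    name       = "vm1"
    flist      = "https://hub.grid.tf/tf-official-vms/ubuntu-22.04-lts.flist"
    cpu        = 2
    memory     = 2048
    entrypoint = "/sbin/zinit init"
    publicip   = false     # no public IPv4
    publicip6  = false     # explicitly no public IPv6
    planetary  = true      # reachable over the planetary (yggdrasil) network only
    env_vars = {
      SSH_KEY = "...."
    }
  }
}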

BTW - the planetary network is based on the yggdrasil technology: a peer-to-peer, encrypted overlay network between the various nodes, where routing is done by all nodes involved (optional), and therefore it does not use traditional routing on international and national networks.

  6. The Kubernetes example (the one in the demo video) uses grid_name_proxy to expose a service running in Kubernetes. The master node is running with a public IP, so this name_proxy is configured with a public IP address as a back end. Question - in that case, is traffic from the name_proxy to the back end routed via the Internet (as it is a public IP), or are you doing some magic to contain this public traffic inside your overlay network?

response: In the kubernetes cluster a private IP network connects the worker and controller nodes. This is a WireGuard mesh.
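For illustration, a sketch of a cluster attached to such a WireGuard network (attribute names as I recall them from the provider schema - worth double-checking against the docs):

resource "grid_kubernetes" "k8s1" {
  network_name = grid_network.net12.name   # the WireGuard mesh the nodes talk over
  token        = "...."
  ssh_key      = "...."
  master {
    name      = "mr"
    node      = 4000
    disk_size = 20
    cpu       = 2
    memory    = 4096
    planetary = true      # no public IP: reach the cluster over WireGuard / planetary
  }
  workers {
    name      = "w0"
    node      = 5453
    disk_size = 20
    cpu       = 2
    memory    = 4096
  }
}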


Yes, I understand that.
For the Terraform provider you do not have defaults, so if this value is not specified I assume it is off (false), but your API (or whatever is creating the VMs) has a default value of true if not specified (or if just publicip=true is specified) - that's where the confusion comes from. I did not specify publicip6=true, so I expect it to be false, but on the next terraform run I see that the state refresh shows it as true...

Even without the scheduler it looks like nodes are not uniformly configured (even though they are seen as UP in the explorer). For example I tried to launch a VM (k8s master) with public IP on node 19.
The VM started, it has the public IP attached and correct (it seems) routes:

# ip r
default via 185.69.166.1 dev eth1
10.1.0.0/16 via 10.1.3.1 dev eth0
10.1.3.0/24 dev eth0 proto kernel scope link src 10.1.3.2
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
100.64.0.0/16 via 10.1.3.1 dev eth0
185.69.166.0/24 dev eth1 proto kernel scope link src 185.69.166.156 

but can't reach the internet:

# apt-get update
Err:1 http://security.ubuntu.com/ubuntu focal-security InRelease
  Temporary failure resolving 'security.ubuntu.com'

(Looks more like an infra problem to me than node misconfiguration, but VMs on other nodes operate normally)

Hi - this I have never seen. Again, I have spent more time deploying and configuring plain vanilla VMs than kubernetes managers and workers. It also seems more like a name resolution issue / challenge than a network issue. What if you do a manual ping of security.ubuntu.com - does it resolve then?

Got it. @reem Could you verify / enlighten us on the default settings for the terraform provider scripts?

If I understand correctly how your networking works, then grid_name_proxy is not able to connect to the WireGuard network (in the latest version of the terraform provider at least; the unreleased development version has a network parameter for that). And if a public IP address (IPv4 or IPv6) is specified as a back end, traffic is routed via local/ISP networks, not via the encrypted overlay network.

Yes, I could see that the DNS servers specified in the VM are external, so that might be a temporary connectivity issue.
Unfortunately I do not keep VMs running indefinitely; with terraform I can launch one for an experiment and terminate it immediately after, so I can't reproduce the error.


There is no DNS server specified on the public address by default, but there is on the WireGuard interface.

Typically public is interface 1 and WireGuard is 0. Add a DNS server to the public interface in the netplan config at

etc/netplan/something cloudinit.yaml 

Then

netplan apply

I have a year old github issue for this actually; it's closed, they didn't see why it needed one.

As for configuration of full vm images, forget the concept of starting with a basic ubuntu image from the other clouds.

You can create a customized image with all of your payloads in place (minus anything private) and host it on the hub. So for example I have some images that deploy with a cockpit web interface already installed and running, and I can use them in code just by referencing their url on the hub; you can also use them by finding my hub page.

Every twin (so anyone on the forums) has a hub page where they can host their own images, but it's completely public. This has always been a pain point for me; we need private flists.

I have a tutorial on prepping images for the hub.

Once you have your full vm image the only env variable passed will be the ssh key, but you can sneak a tailscale key into the ssh key comment; Scott has info on that.

So you really can't fully automate deployment, but you can streamline it so all you have to do is put the private payloads in by user input or script.

---

I also have a script for generating Terraform configuration files that could pretty easily be retooled to take non-user inputs for automatic main.tf creation. And it does have reasonable defaults. I don't think there have been any changes since it was last updated, but I'm not a team member so it hasn't been updated in a good while. If you run into problems just shoot me a pm and I can take a look.


The other thing to watch out for here is that your planetary address masquerades as an IPv6 address; it's actually the public part of a cryptographic key pair that allows your node to send an outgoing packet to the yggdrasil network and establish a tunnel to your vm.

That address is, theoretically, a public address to any other client on the yggdrasil network.

---

Public IP addresses are provided by nodes that have access to a complete public IP block. When you reserve one, zos creates a static reservation for that IP address within the host network and attaches it to the VM. What you expose from there is up to you, but that IP is dedicated to your VM and is a direct connection, not behind any local NAT.


Further remarks/feedback from @dmz_tftest


Some answers on the public forum really helped me understand how things work, but, as a person coming from public clouds or enterprise infrastructure, I'm still struggling to understand some operational specifics.

For example, internet access - it looks like for VMs without a public IP there is an implicit NAT configured (which, I would assume, always routes to Node #1) and there is no way to control this behaviour.


Response: Deployed VMs need access to the outside world to communicate with a number of grid components.


Enterprises usually tend to control the IP addresses they use (firewalls, access policies, etc.). The same applies to public IPs - it seems that there is no way to reserve and re-use IP addresses (actually this is true for internal addressing as well).


Response: For internal IP addressing you can specify the IP address (in the defined network) for each VM:

### Nested Schema for `vms`

Required:

- `flist` (String) Flist used on this vm, e.g. https://hub.grid.tf/tf-official-apps/base:latest.flist. All flists could be found in `https://hub.grid.tf/`.
- `name` (String) Vm (zmachine) workload name. This has to be unique within the deployment.

Optional:

- `corex` (Boolean) Flag to enable corex. More information about corex could be found [here](https://github.com/threefoldtech/corex)
- `cpu` (Number) Number of virtual CPUs.
- `description` (String) Description of the vm.
- `entrypoint` (String) Command to execute as the ZMachine init.
- `env_vars` (Map of String) Environment variables to pass to the zmachine.
- `flist_checksum` (String) if present, the flist is rejected if it has a different hash.
- `ip` (String) The private wireguard IP of the vm.
- `memory` (Number) Memory size in MB.
- `mounts` (Block List) List of vm (ZMachine) mounts. Can reference QSFSs and Disks. (see [below for nested schema](#nestedblock--vms--mounts))
- `planetary` (Boolean) Flag to enable Yggdrasil IP allocation.
- `publicip` (Boolean) Flag to enable public ipv4 reservation.
- `publicip6` (Boolean) Flag to enable public ipv6 reservation.
- `rootfs_size` (Number) Root file system size in MB.
- `zlogs` (List of String) List of Zlogs workloads configurations (URLs). Zlogs is a utility workload that allows you to stream `ZMachine` logs to a remote location.

I am assuming that the VM's private IP is then also the Kubernetes controller / worker main IP address. However I am not sure about this.

So when needed you can control which VM uses which internal IP address.
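For example, pinning a VM to a fixed address inside the network's ip_range (a sketch of the vms block; the address is just an example and must fall within the node's subnet of that network):

  vms {
    name       = "nextcloud"
    flist      = "https://hub.grid.tf/tf-official-vms/ubuntu-22.04-lts.flist"
    ip         = "10.1.3.5"     # fixed private (WireGuard) IP inside the network's ip_range
    cpu        = 2
    memory     = 4096
    entrypoint = "/sbin/zinit init"
    env_vars = {
      SSH_KEY = "...."
    }
  }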

For the kubernetes part it is something similar: in the schema below you can see the IP address attribute of a controller or worker node (in this schema it is computed from nodes_ip_range rather than set directly):

			"workers": {
				Type:        schema.TypeList,
				Optional:    true,
				Description: "Workers is a list holding the workers configuration for the kubernetes cluster.",
				Elem: &schema.Resource{
					Schema: map[string]*schema.Schema{
						"name": {
							Type:        schema.TypeString,
							Required:    true,
							Description: "Worker node ZMachine workload name. This has to be unique within the node.",
						},
						"flist": {
							Type:        schema.TypeString,
							Optional:    true,
							Default:     "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist",
							Description: "Flist used on worker node, e.g. https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist. All flists could be found in `https://hub.grid.tf/`.",
						},
						"flist_checksum": {
							Type:        schema.TypeString,
							Optional:    true,
							Description: "if present, the flist is rejected if it has a different hash.",
						},
						"disk_size": {
							Type:        schema.TypeInt,
							Required:    true,
							Description: "Data disk size in GBs.",
						},
						"node": {
							Type:        schema.TypeInt,
							Required:    true,
							Description: "Node ID to deploy worker node on.",
						},
						"publicip": {
							Type:        schema.TypeBool,
							Optional:    true,
							Description: "Flag to enable/disable public ipv4 reservation.",
						},
						"computedip": {
							Type:        schema.TypeString,
							Computed:    true,
							Description: "The reserved public ipv4.",
						},
						"publicip6": {
							Type:        schema.TypeBool,
							Optional:    true,
							Description: "Flag to enable/disable public ipv6 reservation.",
						},
						"computedip6": {
							Type:        schema.TypeString,
							Computed:    true,
							Description: "The reserved public ipv6.",
						},
						"ip": {
							Type:        schema.TypeString,
							Computed:    true,
							Description: "The private IP (computed from nodes_ip_range).",
						},
						"cpu": {
							Type:        schema.TypeInt,
							Required:    true,
							Description: "Number of virtual CPUs.",
						},
						"memory": {
							Type:        schema.TypeInt,
							Required:    true,
							Description: "Memory size in MB.",
						},
						"planetary": {
							Type:        schema.TypeBool,
							Optional:    true,
							Default:     false,
							Description: "Flag to enable Yggdrasil IP allocation.",
						},
						"ygg_ip": {
							Type:        schema.TypeString,
							Computed:    true,
							Description: "The allocated Yggdrasil IP.",
						},

The networking part is not well documented, so there are still open questions about routing from the Web Gateway to a VM public IP address, or about the supported protocols over the internal WireGuard mesh network - such topics are very well covered in public cloud documentation.

Response: I agree that the documentation can be improved upon. I had to go and search the source code to find answers; I think we should do a better job of creating detailed documentation to make this information available outside the code.


I believe the networking part is not specific to Kubernetes; Kubernetes just makes extensive use of it.


Response: I'll see what I can do to start documenting this better.

To expand on this point a bit, every VM gets a virtual NIC within the private overlay network (even if it's a single node). This indeed has NAT to the public internet by default, but it goes over the node that the VM is deployed on - there is nothing special about node #1 in terms of this part of the networking.

I actually don't think this is true. Both the WireGuard and Yggdrasil connections are facilitated by the node and simply exposed within the VM. I'm not sure if there's any advantage to allowing VMs to be configured without outbound internet access, versus using a firewall within the VM, but I don't see any reason it wouldn't be possible.

WireGuard is an off-the-shelf open source component, so there should be plenty of information available about what's supported on top of it. As for our bespoke components like the gateways, indeed more documentation is needed.