I was introduced to Dmytro a little while ago and he has since taken his first steps onto the ThreeFold grid. He has a history of building (larger) enterprise solutions and I do believe his company can benefit from using the ThreeFold grid.
While taking these first steps on the grid he made some findings which I believe are worth sharing with the larger community, so that together we can help him find his way more quickly through the forest of documentation and things to learn. Here are his findings with my initial responses:
- Your documentation for the resource grid_name_proxy is not up to date. I spent some time figuring out cryptic errors like "invalid deployment: failed to validate back end 'https://185.69.166.155:443': failed to parse back end https://185.69.166.155:443 with error: address https://185.69.166.155:443: too many colons in address" or "invalid deployment: failed to validate back end 'https://185.69.166.155': invalid port in back end: //185.69.166.155".
I found this resolved bug: https://github.com/threefoldtech/zos/pull/1843 and based on the description there should be meaningful errors… But the above does not tell you that an HTTP back end must be in the form "http://<ip>:<port>" while for HTTPS it is just "<ip>:<port>".
response: We need to improve / update the documentation here. @samehabouelsaad, can you have a look and get this one updated? You seem to have dealt with the issue mentioned.
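To make the back end format concrete, here is a minimal sketch of a grid_name_proxy resource following the formats described above. The node ID, name and port are placeholders and the attribute names (backends, tls_passthrough) are how I understand the current provider, so please verify against the provider documentation:
resource "grid_name_proxy" "p1" {
  node = 4000                                    # placeholder node ID of a gateway node
  name = "myapp"                                 # placeholder name used for the generated fqdn
  # HTTP back end: scheme, IP and port all spelled out
  backends        = ["http://185.69.166.155:80"]
  tls_passthrough = false
  # For HTTPS (TLS passthrough) the back end would instead be just "<ip>:<port>", e.g.:
  # backends        = ["185.69.166.155:443"]
  # tls_passthrough = true
}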
- The resource grid_scheduler is not reliable in selecting reachable nodes (or maybe there is another reason, not sure) - sometimes operations time out, which leads to a Terraform failure. For example, with errors like "couldn't reach node 17: context deadline exceeded" (this is an error from the grid_network resource that tries to configure the nodes provided by grid_scheduler).
response: I am not sure how stable the grid scheduler is. This is a question for the development team. I personally always do the search and select work manually: I choose the region where I would like the nodes that will host the VMs / storage and then find a server with public IPs (or not) as required. Then you can specify where to deploy directly in the Terraform script and have less ambiguity.
Example:
resource "grid_network" "net12" {
nodes = [4000, 5453, .....]
ip_range = "10.2.0.0/16"
name = "network2"
description = "newer network"
add_wg_access = false
}
resource "grid_deployment" "d1" {
node = 4000
network_name = grid_network.net12.name
disks {
name = "data"
size = 20
description = "volume holding app data"
}
vms {
name = "nextcloud"
flist = "https://hub.grid.tf/tf-official-vms/ubuntu-22.04-lts.flist"
cpu = 2
publicip = true
memory = 4096
planetary = true
entrypoint = "/sbin/zinit init"
mounts {
disk_name = "data"
mount_point = "/app"
}
env_vars = {
SSH_KEY ="...."
}
}
}
- Is there any mechanism that configures the VM during start up? I re-created the Kubernetes cluster several times but the SSH fingerprint (and even the external IP address) did not change (this usually means that the SSH host key is embedded in the VM image and not cleaned up or regenerated during start up).
response: The so-called flist is an image that is used to boot the (micro or full) VM. This flist is fully configurable and you can make your own. There are a few ways to do this, the easiest being to create a docker image on any other platform and make it do exactly what it needs to do, including all the startup commands, the ENV vars to be parsed etc. (micro VM). The other way is a little bit more involved: you start from any of the cloud images provided and turn it into a full VM image customized to your needs.
When done (with either of the two ways) you can import the created image on http://hub.grid.tf and use it to deploy your specific customized images.
There are probably some more ways to customize the existing Terraform-based simple Kubernetes install, but I am not the expert on Kubernetes. Anyone?
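As a small illustration of using your own image (a sketch, not an official recipe): once a custom flist is published on the hub, a deployment points at it through the flist URL and entrypoint in the vms block. The URL, entrypoint and node ID below are placeholders:
resource "grid_deployment" "custom" {
  node         = 4000                      # placeholder node ID
  network_name = grid_network.net12.name
  vms {
    name       = "customvm"
    # placeholder URL: replace with the flist you published on hub.grid.tf
    flist      = "https://hub.grid.tf/<your-user>/<your-image>.flist"
    cpu        = 2
    memory     = 2048
    planetary  = true
    # the entrypoint depends on how the image was built (zinit in the official images)
    entrypoint = "/sbin/zinit init"
    env_vars = {
      SSH_KEY = "...."
    }
  }
}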
- In the example documentation the Kubernetes cluster is deployed with a public IP for the master node only, but all nodes in the cluster have Internet access. How can I determine the outgoing NAT address for these nodes? My experiments showed that even if I run worker nodes on a farm without public addresses, all egress traffic goes via some IP in the subnet of the master node (not the IP address of the master node).
It looks like adding a public IP to the master node just exposes this node to the Internet (just a 1-to-1 NAT). I checked with netcat/tcpdump that my SYN packets are forwarded correctly to the node. I assume that this is for easier demonstration only and the recommended way is to run without a public IP and connect via WireGuard?
response: Not sure about this, @scott do you have some more insights here?
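On the last point (running without a public IP and connecting via WireGuard): the grid_network resource used in the example above has an add_wg_access flag; as far as I understand the provider, setting it to true adds a WireGuard access point into the private network and exposes the client configuration as an attribute. A rough sketch, to be checked against the provider docs:
resource "grid_network" "net12" {
  nodes         = [4000, 5453]        # placeholder node IDs
  ip_range      = "10.2.0.0/16"
  name          = "network2"
  description   = "network with WireGuard access"
  add_wg_access = true                # request a WireGuard access point into the private network
}

# the generated WireGuard client configuration (attribute name as I understand the provider)
output "wg_config" {
  value = grid_network.net12.access_wg_config
}
You would then load that configuration into a local WireGuard client and reach the VMs on their private 10.2.x.x addresses.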
- It looks like when nodes are provisioned, they are given a public IPv6 address implicitly; at least Terraform tries to remove this attribute on each update:
  # grid_kubernetes.k8s1 will be updated in-place
  ~ resource "grid_kubernetes" "k8s1" {
        id             = "83582926-c3f5-49f1-bf8b-ef96c12b8c4b"
        name           = "myk8s"
      + nodes_ip_range = (known after apply)
        # (5 unchanged attributes hidden)

      ~ master {
          - flist_checksum = "c87cf57e1067d21a3e74332a64ef9723" -> null
            name           = "mr"
          - publicip6      = true -> null
            # (10 unchanged attributes hidden)
        }
    }
As I don't request an IPv6 address, this looks like a huge security hole to me.
response: For normal VMs you specify whether an IPv6 address is to be provided. Example:
......
  publicip  = true   # IPv4 address request
  publicip6 = true   # IPv6 address request
  planetary = true   # (yggdrasil) planetary network address request
.......
I expect the Kubernetes master / workers to have the same way of turning addresses on and off.
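If that is the case, explicitly not requesting IPv6 on the master would look roughly like the sketch below. The attribute names mirror the plan output above, but I have not verified the full grid_kubernetes schema, so node IDs, token, SSH key and sizes are all placeholders:
resource "grid_kubernetes" "k8s1" {
  name         = "myk8s"
  network_name = grid_network.net12.name
  token        = "...."                  # placeholder cluster token
  ssh_key      = "...."                  # placeholder SSH public key
  master {
    name      = "mr"
    node      = 4000                     # placeholder node ID
    cpu       = 2
    memory    = 2048
    disk_size = 22
    publicip  = true                     # IPv4 on the master only
    publicip6 = false                    # explicitly do not request an IPv6 address
    planetary = true
  }
  workers {
    name      = "w0"
    node      = 5453                     # placeholder node ID
    cpu       = 2
    memory    = 2048
    disk_size = 15
  }
}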
BTW - the planetary network is based on the yggdrasil technology: a peer-to-peer, encrypted overlay network between the various nodes where routing is (optionally) done by all nodes involved, and therefore it does not use traditional routing over international and national networks.
- The Kubernetes example (the one in the demo video) uses grid_name_proxy to expose a service running in Kubernetes. The master node is running with a public IP, so this name_proxy is configured with a public IP address as a back end. Question: in that case, is traffic from the name_proxy to the back end routed via the Internet (as it is a public IP), or do you do some magic to contain this public traffic inside your overlay network?
response: In the Kubernetes cluster a private IP network connects the worker and controller nodes. This is a WireGuard mesh.