Node 1655 does not seem to be connected correctly [RESOLVED]

Hi, not sure who to contact, but I have been deploying VMs on node 1655, farmID 84 (Terminator). The network configuration is not done properly. AFAICS I have provisioned an IPv4 address:

{
    "version": 0,
    "contractId": 14356,
    "nodeId": 1655,
    "name": "VMc02ecdfc",
    "created": 1673433242,
    "status": "ok",
    "message": "",
    "flist": "https://hub.grid.tf/tf-official-vms/ubuntu-18.04-lts.flist",
    "publicIP": {
        "ip": "87.251.36.6/24",
        "ip6": "",
        "gateway": "87.251.36.1"
    },
    "planetary": "304:5069:f7aa:c456:ede2:6255:b0c8:8607",
    "interfaces": [
        {
            "network": "NW61a62ca3",
            "ip": "10.20.2.2"
        }
    ],
    "capacity": {
        "cpu": 4,
        "memory": 4096
    },
    "mounts": [
        {
            "name": "DISK84fd1857",
            "mountPoint": "/",
            "size": 53687091200,
            "state": "ok",
            "message": ""
        }
    ],
    "env": {
        "SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAOP0h6VImNcxnIBRMoMfbMfb0xwGHDlaPxZ+nu0CL8ATJekVDHDLMGEPdvACfHBe0sqIw/l6jqoEMR4Dzhjgm4bVEUBVEnG1FvkeNB59sT2DOxDCZuqJvjx2M1bJlH8AR/JQXxUQ+zvfTbavc4/zfCuJm4PYNUsmEt/IQmRwLznGOkoJbwYLhKCC3ykZd0EGpmCWgUUYn0ihaaYkyrliQi5Ny00x0s6jOIJg0CG2Xh5xcrkhOfCZMxZAB+/LGQpZ3tu+Cy5jRf8V/JZ8XQmtYM2GmBUZ1KGcMcsGzrtuudn13JeYLtWJBw6A7Q3Fb7dQSCMLC9UA0uMSZk67M6DFV john@RescuedMac"
    },
    "entrypoint": "/init.sh",
    "metadata": "{\"type\":\"vm\",\"name\":\"VMc02ecdfc\",\"projectName\":\"Fullvm\"}",
    "description": "",
    "corex": false
}

But pinging it shows a routing issue inside the DC network, or in the switch/router connecting the 3nodes:

➜  ~ ping 87.251.36.6
PING 87.251.36.6 (87.251.36.6) 56(84) bytes of data.
From 213.136.2.25 icmp_seq=1 Destination Host Unreachable
From 213.136.2.25 icmp_seq=2 Destination Host Unreachable
From 213.136.2.25 icmp_seq=3 Destination Host Unreachable
From 213.136.2.25 icmp_seq=4 Destination Host Unreachable

Or:

 ping meet.mytrunk.org
PING meet.mytrunk.org (87.251.36.6) 56(84) bytes of data.
From lo0.leaf-sw1.bit-2d.network.bit.nl (213.136.2.25) icmp_seq=1 Destination Host Unreachable
From lo0.leaf-sw1.bit-2d.network.bit.nl (213.136.2.25) icmp_seq=2 Destination Host Unreachable
From lo0.leaf-sw1.bit-2d.network.bit.nl (213.136.2.25) icmp_seq=3 Destination Host Unreachable

From within the VM (you can connect to it over the planetary network):

root@meet:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
From 87.251.36.6 icmp_seq=1 Destination Host Unreachable

Also a tracepath from the VM does this:

root@meet:~# tracepath 1.1.1.1
 1?: [LOCALHOST]                      pmtu 1500
 1:  ???                                                 1403.418ms !H
     Resume: pmtu 1500 
root@meet:~# 
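For whoever picks this up, here are a few checks from inside the VM (over the planetary address) that should show whether the gateway answers at all. A rough sketch using standard iproute2 commands, with the addresses taken from the deployment above:

ip -4 addr show                # is 87.251.36.6/24 actually configured on an interface?
ip route show default          # does the default route point at 87.251.36.1?
ping -c 3 87.251.36.1          # can the gateway itself be reached?
ip neigh show 87.251.36.1      # FAILED/INCOMPLETE here means ARP towards the gateway is never answered

A Destination Host Unreachable coming from the VM's own address, as above, usually means that last step: the gateway simply does not answer on that segment.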

In the meantime I will delete the VM; I need to have something deployed that has network connectivity.

@teisie, I believe 1655 is yours?

Seems that I am late to the party; it's been detected and reported a long time ago. Here's the issue: https://github.com/threefoldtech/test_feedback/issues/364

Hello, yes indeed this is my node. The problem is that I asked the team many times to help me with the IPv4 setup but never got a clear answer. Before I went to the DC I asked @RobertL and Jan De Landsheert what I needed to do, because Robert got it working already. My problem with the setup is that every cable is plugged into a random switch, so there is no separation between private and public. Unfortunately I only heard that after going to the DC. I will be at the DC again in about 10 days and will test everything there to be 100% sure. If @weynandkuijpers can wait 10 days I should have a valid setup and the nodes should be working.

No worries, it's just that these are fantastic, large nodes and I have not been using them. What do you mean by "every cable is in random switches"? Let's do some planning before you go to the DC and make sure we can get your server park online and ready for workloads :slight_smile:

Hé Weynand, according to the GitHub issue that Jan unfortunately sent me after I went to the DC, there should be 2 switches, one for the first cable and one for the second, to separate the public and private networks. Also, cable 1 should be in NIC 1 and cable 2 in NIC 2.

What I did wrong is that all these cables are plugged in randomly, so there is really no separation.

Like it says here, all the way at the bottom:

If there is another way to fix this on a UniFi Dream Machine Pro, I would really like to know!

Well… it is possible to run both NIC connections (the ZOS primary grid connection, NATed via a router that supplies local IPs via DHCP, and a second connection for public IP routing) on a single switch. But it depends on how your WAN uplink is configured. For example, if your DC provides you public IPs (to be used for the nodes) in the exact same range/net as the gateway (and the router's WAN IP), it should work. In this case the second NIC connections would bypass your router and talk directly to the gateway. This would also work with a second (or more) switch(es).

Of course a clean setup would run on separated/isolated nets, which could easily be achieved by configuring VLANs on the Dream Machine and the switches. However, we need more information about your setup to figure it out. I had the setup described above working in a DC, running 40+ nodes on two Ubiquiti UniFi switches without VLANs. You can shoot me a DM and we can have a quick call to figure it out if you don't want to publish sensitive information about your setup here.
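If you want to verify that the public path really bypasses the router, a rough check from a Linux host plugged into the public segment (same switch/ports as the nodes' second NICs) is to look at which MAC address answers ARP for the DC gateway:

ping -c 1 87.251.36.1          # populate the neighbour table
ip neigh show 87.251.36.1      # note the MAC that answered

If that MAC belongs to the DC's gateway and not to one of your router's own interfaces (compare against the MAC addresses of your router's interfaces), the second NICs are talking to the gateway directly.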

I can partially follow what you mean. The internet gateway is the same as for the public IPs, if that's what you mean.

But somewhere it has to know which cable to bypass and which not, right?

The cables are plugged in completely randomly. Every node has 2 cables, but both cables are randomized. So there are also nodes where the 2 cables connect to 1 switch, and nodes where it's separated.

Sure, what's your Telegram?

The question is: how is the DC providing your internet uplink? I guess your rack is uplinked with one (or more) CAT/RJ45 connections (or fiber or SFP+, it doesn't make any difference) to which the public IP range is routed, and you have connected this uplink to the WAN side of your router? Or the DC has its own router/gateway installed in your rack, and you connect the WAN side of your own router to a LAN port of the gateway/router provided by the DC. Correct?

In this case your setup needs a very specific configuration (it also depends on some other parameters). Given that your gateway uses the same IP range as the public IPs provided to the nodes, your router isn't really routing… that's simple switching. A setup like this would require "bridging" two different router ports that both use the same IP range. This is possible, but it isn't needed, and it requires much more resources (CPU and memory) on your router as well as specific routing and firewall rules. "Routing" between similar nets is used when you want to build a "stealth firewall" (also called a "transparent filtering bridge"). I guess that's not what you want to do, unless you are trying to intercept your nodes' traffic and hack workloads deployed on your nodes with, for example, a man-in-the-middle attack.

There is a very simple solution that avoids a setup like this, but it requires that your DC uplink is established with a simple static IP (which is the case in most DCs, and also here). If you have to use PPPoE or PPTP/L2TP (like a consumer internet connection at most homes), this would not work.

Let's assume your WAN is established with a static IP. In this case you simply attach the WAN uplink provided by the DC to one of your switches and NOT to the WAN side of your own router. The WAN side of your router then needs to be attached to that switch too. By doing so, your nodes will be able to connect directly to the DC gateway (in the same way your router connects its WAN side to the gateway) without the public IP traffic being routed/bridged through your router (bypassing it). With a network configured like this, it is absolutely not important on which ports you connect which NIC of your nodes; you can just plug them in anywhere. But (!) with one restriction: the DC uplink must use a static IP. Dynamic IP would not work either, because you would then have two DHCP servers in the same physical network (the one from the DC and your own router).

Looks like this…

PS: If you are using any kind of LOM (lights-out management), you should NOT connect those interfaces to the switch where your WAN uplink goes, because you would expose the LOM interfaces directly to the internet.
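PPS: one way to sanity-check the uplink described above before plugging in any nodes is to take a spare Linux laptop, connect it to the switch that carries the DC uplink and give it one of the public IPs by hand. A rough sketch (87.251.36.10 and eth0 are placeholders for a free IP from your block and the laptop's NIC; run as root):

ip addr add 87.251.36.10/24 dev eth0
ip route add default via 87.251.36.1
ping -c 3 87.251.36.1          # is the DC gateway reachable directly?
ping -c 3 1.1.1.1              # and the internet beyond it?
# clean up afterwards
ip route del default via 87.251.36.1
ip addr del 87.251.36.10/24 dev eth0

If both pings work, a node's public NIC on that same switch should work the same way.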

Amazing advice and wisdom here @Dany

@teisie: would love to know if this works out for you.

Pretty sure @weynandkuijpers can check after you have reconfigured the setup

I will, thank you @Dany for all the extended insights and help!

@teisie: in addition to yesterday's call, here is the post on how to assign fixed public IPs on a 3node via the Polkadot UI:

Just went to the DC and this is the config:

So the public part is completely separated from the private part.

But unfortunately still not working :unamused:

The setup definitely should work!! When the router is connected to the internet/gateway with static public IP settings on its WAN side, then every other host connected to "switch 2" should be online too.

Are you sure that the IP block is routed correctly?

Can you ping the WAN side of your router from outside the DC? (The router needs to be configured to respond to pings on its WAN side.)
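A quick way to narrow it down from a machine outside the DC (the WAN address below is just a placeholder, fill in your router's real one):

ROUTER_WAN=203.0.113.1           # placeholder: replace with your router's WAN IP
ping -c 3 "$ROUTER_WAN"          # does the router's WAN side answer at all?
traceroute -n "$ROUTER_WAN"      # where does that path stop?
traceroute -n 87.251.36.6        # compare with the path towards the 3node's public IP

If the path towards the router's WAN IP is clean but the path towards the node's public IP dies at the DC's leaf switch (like the 213.136.2.25 replies earlier in this thread), the /24 is most likely not being routed towards your uplink.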

You need to talk to the DC guys… it looks like there is something weird with the routing.

This is the result of a traceroute to 87.251.36.1:

and this is the result of a traceroute to 87.251.36.7:

As you can see… the request is hopping across different hosts along the way. This is not normal for plain routing of a /24 net.

The weird thing is I tried a server with a static public IP to test the IPs, and it was working perfectly. It just seems the public NIC doesn't get a static IP. Can I check somewhere whether a NIC got a static IP through the deployment, or anywhere else?

Could it be that my DC doesn't automatically allow SSH port 22?

That's impossible. Apart from that… the public IP on the 3node should at least respond to pings.
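A blocked port 22 and a missing/unrouted IP look very different from outside; a rough way to tell them apart (assuming nc is installed):

ping -c 3 87.251.36.6            # ICMP is independent of any SSH/port-22 filtering
nc -vz -w 5 87.251.36.6 22       # does anything answer on the SSH port at all?
# and inside the VM, over the planetary address:
ip -4 addr show                  # was 87.251.36.6/24 actually configured on the public NIC?

If the ping already fails, a filtered port 22 is not the explanation.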