1/3 of grid nodes offline

When checking in on the Bancadati and TLRE farms, I noticed they had all been offline since Saturday.
I rebooted most of them, but they stayed online for only a couple of minutes before going offline again.

Also, according to the grid explorer, only 1/3 of the registered nodes seem to be online.

Right now, we set nodes offline when there is no IPv6, and that does indeed happen a while after they have been registered.
Next week (we’re finalizing tests) we’ll roll out our solution for nodes that are IPv4-only, so then all nodes will be online.

@delandtj I hope that cannot be the solution. This should be resolved sooner. If there was a code change, then to me this is a bug: it was working before.

This was indeed already introduced in 0.3.3 only to be rolled back by Christophe in 0.3.4 and now reintroduced in 0.3.5?

Then to me a mistake was made, because we cannot introduce a code change that has a negative impact without having a solution.

Btw, does this mean that the nodes without IPv6 support are not minting until next week?

Let me clarify the situation here.
Until 0.3.4, any node booted with 0-OS was eligible for farming. This means even nodes that were not capable of processing reservations were still producing tokens.

This process was put in place so as not to penalize farmers that did not have access to IPv6.

In 0.3.5, support for IPv4-only nodes was rolled out, which means there was no longer any reason for a node to be unable to process reservations (that is, to actually provide useful capacity). Following this logic, only nodes that are actually ready to process reservations are marked online.
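To make the difference between the two rules concrete, here is a minimal sketch. It is not the actual 0-OS or explorer code; the `Node` type, its fields, and both function names are hypothetical and only mirror the rule as described above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # All fields are hypothetical, for illustration only.
    node_id: str
    booted_zero_os: bool            # the node has booted 0-OS
    can_process_reservations: bool  # the node can actually provide capacity


def eligible_until_0_3_4(node: Node) -> bool:
    """Rule up to 0.3.4: any node booted with 0-OS counts for farming."""
    return node.booted_zero_os


def online_from_0_3_5(node: Node) -> bool:
    """Rule from 0.3.5: the node must also be ready to process reservations."""
    return node.booted_zero_os and node.can_process_reservations


# A node that boots fine but cannot process reservations.
broken = Node("node-a", booted_zero_os=True, can_process_reservations=False)
print(eligible_until_0_3_4(broken), online_from_0_3_5(broken))
```

A node like `broken` would have kept farming under the old rule but shows up as offline under the new one.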

@chrisvdg for you this means that if your nodes are marked offline, there is an issue somewhere, most probably in the network configuration of your farm. I noticed your farm going offline after the rollout of 0.3.5
and even personally asked to get in touch with your organization to figure out what was going on in your farm. It seems the communication there was not optimal. Let’s solve that tomorrow; I’m back from holiday, so we can have a look together.

The IPv4 in the farms seems to be working just fine, and hasn’t changed much since delivery of the nodes on Grid 1.0

Also, only 1/3 of ALL grid nodes being online suggests that it may not be a network config problem, unless all the other farms whose nodes are now offline also fail to meet the requirements for IPv4 support.
Bancadati and TLRE represent a bit less than 1/3 of the Grid nodes (275 / 930), which means another 1/3 is now offline for unknown reasons.
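As a quick check of that arithmetic, here is a small sketch. It uses the 275 and 930 figures above and assumes that exactly 1/3 of the registered nodes are online, which is only my reading of the explorer.

```python
total_nodes = 930   # registered grid nodes, as quoted above
farm_nodes = 275    # Bancadati + TLRE nodes, as quoted above

# Assumption: exactly one third of all registered nodes is online.
online_nodes = total_nodes // 3

farm_share = farm_nodes / total_nodes
other_offline = total_nodes - online_nodes - farm_nodes
other_share = other_offline / total_nodes

print(f"Bancadati + TLRE share: {farm_share:.1%}")
print(f"Other offline nodes: {other_offline} ({other_share:.1%})")
```

Under that assumption the two farms account for roughly 30% of the grid, leaving about 345 other nodes offline.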

Are there specific things I can check for you?

Your numbers are wrong. There were always around 500 nodes online before 0.3.5 was released. The only farms I’ve seen not coming back are Bancadati and TLRE.

Please get in touch with @delandtj to inspect your network. Or wait for me, but I’m only available in the afternoon.

@delandtj, were you able to find anything?
If not, we will turn off these nodes until we can get the IPv6 sorted out.

@chrisvdg things have been solved since last Friday already. Check your nodes’ status.

Oh wow, thanks guys!
Just out of curiosity, what was the issue?

Wrong config of your DHCP server

The nodes get their IPs correctly, so what was wrong with the DHCP?

Ok, so the nodes needed a second IP pool for the ndmz namespace (whatever that is) according to Jan.

This is a requirement no one told me about before, and I could not find any documentation for it.

Anyway, thanks for resolving this

Are we OK here? Is everything resolved?

Sorry for the late reply, but yes, the nodes are online again. Jan enabled dynamic DHCP so that the nodes can take a second IP address; until then the DHCP in the datacenter was “hard coded”.
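For anyone hitting the same thing, here is a rough sketch of why a “hard coded” DHCP falls short. It assumes each node asks for two addresses (one for the node itself, one for the ndmz namespace) and that a static-only DHCP setup has exactly one reservation per node. The function names and the numbers in the example are hypothetical, not taken from 0-OS.

```python
def leases_required(node_count: int, addresses_per_node: int = 2) -> int:
    """Total DHCP leases needed if every node requests several addresses."""
    return node_count * addresses_per_node


def unserved_requests(node_count: int,
                      static_reservations: int,
                      dynamic_pool_size: int,
                      addresses_per_node: int = 2) -> int:
    """Lease requests the DHCP server cannot answer (0 means all are served)."""
    available = static_reservations + dynamic_pool_size
    return max(0, leases_required(node_count, addresses_per_node) - available)


# Static-only setup: one reservation per node, no dynamic pool.
print(unserved_requests(node_count=100, static_reservations=100, dynamic_pool_size=0))

# Same setup after adding a dynamic pool large enough for the second address.
print(unserved_requests(node_count=100, static_reservations=100, dynamic_pool_size=100))
```

In the first case every node is still missing one lease; in the second case all requests can be served.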