tfgrid billing for down nodes

I noticed that a 3node (#969) on which I have a VM with a workload has been down for about 24 hours now. I believe billing is by the hour, and the hourly billing rate I see for it is the same as for my other workloads that are working. I’m just wondering whether this is addressed. Shouldn’t billing (and payout to the 3node operator) stop when a node goes down?

On a similar note, I have found some nodes I just can’t connect to even when they appear to be up and working. I’m wondering if some testing tools or other 3node metrics should be put in place so I’m not deploying on 3nodes that will not work.

Hi @jelco. I would expect billing to stop when a node with active deployments (smart contracts) on it goes down. That would be the expected behavior. Can you confirm that you are being billed (hourly) for the node that is not online? I’ll check myself as well.

I can confirm that the farming rewards are impacted. There is a process that periodically checks the uptime of the 3node and stores the result in TF Chain. This information is used at the end of the period to calculate the farming rewards.

With regard to your last point, I have the same experience: some nodes are presented as up and running but do not accept workloads (the deployment results in an error). Please share the node IDs with me (or here) if you come across one of those.

@weynandkuijpers Is there a place, here, on Telegram or elsewhere, where someone could test whether (my) nodes are fully functional as expected?

It was node 969. I deleted the deployment. After I saw that the workload had been down for about 24 hours, I tried to ssh in and it didn’t connect after waiting about a minute or so. 3433 and 3033 are a couple of others that I wasn’t even able to ssh into initially.

Hi, I will try to replicate a “down” node in my lab and reproduce your situation. I’ll also try to reach the other two nodes mentioned and see if I can connect.

Just tested the node IDs you mentioned. 3433 works for me:

➜  terraform-dany git:(main) ✗ eval `ssh-agent -s`
Agent pid 313880
➜  terraform-dany git:(main) ✗ ssh-add $HOME/.ssh/mainkey
Identity added: /home/johndoe/.ssh/mainkey (/home/johndoe/.ssh/mainkey)    
➜  terraform-dany git:(main) ✗ ping 300:99ac:56a5:41dc:959d:74c8:73fa:ef86
PING 300:99ac:56a5:41dc:959d:74c8:73fa:ef86(300:99ac:56a5:41dc:959d:74c8:73fa:ef86) 56 data bytes
64 bytes from 300:99ac:56a5:41dc:959d:74c8:73fa:ef86: icmp_seq=1 ttl=63 time=346 ms
64 bytes from 300:99ac:56a5:41dc:959d:74c8:73fa:ef86: icmp_seq=2 ttl=63 time=348 ms
64 bytes from 300:99ac:56a5:41dc:959d:74c8:73fa:ef86: icmp_seq=3 ttl=63 time=346 ms
64 bytes from 300:99ac:56a5:41dc:959d:74c8:73fa:ef86: icmp_seq=4 ttl=63 time=347 ms
64 bytes from 300:99ac:56a5:41dc:959d:74c8:73fa:ef86: icmp_seq=5 ttl=63 time=351 ms
^C
--- 300:99ac:56a5:41dc:959d:74c8:73fa:ef86 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 345.989/347.…
➜  terraform-dany git:(main) ✗ ssh root@300:99ac:56a5:41dc:959d:74c8:73fa:ef86
The authenticity of host '300:99ac:56a5:41dc:959d:74c8:73fa:ef86 (300:99ac:56a5:41dc:959d:74c8:73fa:ef86)' can't be established.
ED25519 key fingerprint is SHA256:GZduV+dbPnaRHqRwNzoO1JRVXS5eH+E7ZKeTcfzLrX8.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:33: 301:d12e:3351:4208:c32b:a725:4b6c:aa03
    ~/.ssh/known_hosts:35: 301:d12e:3351:4208:9a7c:cf44:d2d4:f788
    ~/.ssh/known_hosts:36: 301:d12e:3351:4208:6243:5c48:b899:8932
    ~/.ssh/known_hosts:37: 301:5393:bfab:9bfd:35f:140b:5736:7042
    ~/.ssh/known_hosts:38: 178.250.167.69
    ~/.ssh/known_hosts:39: 302:302f:4555:2f55:13f9:5f22:819c:f060
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '300:99ac:56a5:41dc:959d:74c8:73fa:ef86' (ED25519) to the list of known hosts.
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.12.9 x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@proxy1:~# 

Since this 3node does not have a public IPv4 address (see the explorer: https://explorerv3.grid.tf/nodes), you cannot ask it to provision an IPv4 address. Planetary network IP addresses are always available (but for that you have to install the planetary network on your device, e.g. laptop or desktop).

Same for the other (3033) node:

The authenticity of host '300:be:23eb:3755:549a:afcb:e1ac:f2db (300:be:23eb:3755:549a:afcb:e1ac:f2db)' can't be established.
ED25519 key fingerprint is SHA256:GZduV+dbPnaRHqRwNzoO1JRVXS5eH+E7ZKeTcfzLrX8.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:33: 301:d12e:3351:4208:c32b:a725:4b6c:aa03
    ~/.ssh/known_hosts:35: 301:d12e:3351:4208:9a7c:cf44:d2d4:f788
    ~/.ssh/known_hosts:36: 301:d12e:3351:4208:6243:5c48:b899:8932
    ~/.ssh/known_hosts:37: 301:5393:bfab:9bfd:35f:140b:5736:7042
    ~/.ssh/known_hosts:38: 178.250.167.69
    ~/.ssh/known_hosts:39: 302:302f:4555:2f55:13f9:5f22:819c:f060
    ~/.ssh/known_hosts:40: 300:99ac:56a5:41dc:959d:74c8:73fa:ef86
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '300:be:23eb:3755:549a:afcb:e1ac:f2db' (ED25519) to the list of known hosts.
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.12.9 x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@proxy1:~# 

Hi @roest. I’ll do a video today on how you can do that. I use a couple of standard, straightforward tests that tell me whether the node is working or not. Stay tuned!

And before doing this, maybe we as a community at large can think about which tests we would like to run to determine whether the VM (flist) works properly. We could create a “test” flist that runs a number of tests and sends a report to a provided email address.

Suggestions, I have my “standard” list as a starting point:

  • ICMP on any provisioned IP address (IPv4, IPv6, wireguard, planetary), depending on what is available on the 3node
  • launch a few services in the VM and access them over the (chosen) virtual NIC + port
  • htop to check proper raw resource unit allocation
  • depending on the base Unix flavor, update the package library and upgrade the system to the latest long-term-supported software
  • deploy a virtual disk
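
To make the first two items concrete, here is a minimal sketch of what such a check could look like when run from a laptop against a deployed VM. It is not an existing ThreeFold tool. The function names are made up for illustration, and it assumes a Linux-style `ping` that accepts `-c` and handles IPv6 addresses (on some systems that is `ping6` instead).

```python
import socket
import subprocess


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection handles both IPv4 and IPv6 addresses
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ping_ok(host: str, count: int = 3) -> bool:
    """Return True if the system `ping` command exits with code 0."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping missing, not permitted, or hanging: treat as unreachable
        return False
    return result.returncode == 0


def check_vm(host: str, ports=(22,)) -> dict:
    """Run the ICMP check plus one TCP check per port and collect the results."""
    report = {"icmp": ping_ok(host)}
    for port in ports:
        report[f"tcp/{port}"] = tcp_port_open(host, port)
    return report
```

Calling `check_vm("300:99ac:56a5:41dc:959d:74c8:73fa:ef86")` would ping the planetary address tested earlier in this thread and try port 22. A node that answers ICMP but refuses SSH would show up as `icmp` true and `tcp/22` false, which is the kind of "looks up but doesn’t work" case described above.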

This got me curious, so I did a little reading in the code. The hourly billing rate you see in the playground does not account for the fact that a node may be offline. It simply queries the last amount that was billed and displays that.

The expected behavior is that nodes submit their billing reports on an hourly basis and TF Chain bills you only when a report has been submitted. Nodes that are offline do not submit billing reports and you should not be billed for the time a node spent offline.
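
As a rough illustration of that rule (not the actual TF Chain code), the sketch below charges once per submitted report and flags stretches where reports are missing. The function names, the integer rate and the timestamps are all made up for the example.

```python
from datetime import datetime, timedelta


def total_billed(report_times, hourly_rate):
    """One charge per submitted report; hours without a report cost nothing."""
    return len(report_times) * hourly_rate


def offline_gaps(report_times, interval=timedelta(hours=1)):
    """Return (previous, next) report pairs that are more than one interval apart."""
    times = sorted(report_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > interval]


# Hypothetical node: reports hourly for 6 hours, goes offline, then resumes.
start = datetime(2022, 6, 1, 0, 0)
reports = [start + timedelta(hours=h) for h in range(6)]
reports += [start + timedelta(hours=h) for h in range(30, 36)]

cost = total_billed(reports, hourly_rate=5)  # rate in arbitrary integer units
gaps = offline_gaps(reports)
```

Here the contract spans 36 hours of wall-clock time but only 12 reports exist, so only 12 hours are charged, and the single gap between the report at hour 5 and the one at hour 30 marks the outage.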

With your contract ID, we could double check that this was the case. I tried querying all contracts for node 969, but I couldn’t find any that had more than a single billing report associated with them.

Yep, there’s definitely room for improvement with assessing and reporting node reliability. I’d be curious to know more about what happened in this case, since Weynand was able to connect to VMs on those same two nodes.

I don’t have a contract ID (that I can find) since I deleted the deployment. I was running a Presearch node on the VM. It had been working, but then I saw it went offline and I couldn’t ssh into it.

If you know your twinID, you might be able to find the contractID by querying nodeContracts at https://graphql.grid.tf/graphql.
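
A minimal sketch of such a query from Python, using only the standard library. The `nodeContracts`, `twinID` and `contractID` names come from this thread. The `twinID_eq` filter and the `nodeID` and `state` fields are assumptions about the schema, so check them in the GraphQL playground before relying on them.

```python
import json
import urllib.request

GRAPHQL_URL = "https://graphql.grid.tf/graphql"

# Field and filter names below are assumptions; verify them against the schema.
QUERY = """
query ($twin: Int!) {
  nodeContracts(where: {twinID_eq: $twin}) {
    contractID
    nodeID
    state
  }
}
"""


def build_request(twin_id: int) -> urllib.request.Request:
    """Build the POST request carrying the GraphQL query and its variable."""
    body = json.dumps({"query": QUERY, "variables": {"twin": twin_id}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_node_contracts(twin_id: int, timeout: float = 15.0) -> list:
    """Send the query and return the list of node contracts (empty if none)."""
    with urllib.request.urlopen(build_request(twin_id), timeout=timeout) as resp:
        payload = json.load(resp)
    return (payload.get("data") or {}).get("nodeContracts", [])
```

Calling `fetch_node_contracts(your_twin_id)` should then list your contracts, and filtering the result for the node in question (969 in this case) would narrow it down to the deleted deployment, provided deleted contracts are still indexed.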