Caprover deployment inaccessible [Resolved]

tototator · October 24, 2023, 7:04pm

I deployed a CapRover instance and launched a WordPress app on it.
A few issues have come up, so I am trying to log in to the CapRover instance, but it refuses incoming SSH connections.

    ssh: connect to host 185.69.167.215 port 22: Connection refused

The JSON file looks as follows:

    {
    "version": 0,
    "contractId": 14620,
    "nodeId": 3010,
    "name": "CRLnaiein",
    "created": 1674043928,
    "status": "ok",
    "message": "",
    "flist": "https://hub.grid.tf/tf-official-apps/tf-caprover-main.flist",
    "publicIP": {
        "ip": "185.69.167.215/24",
        "ip6": "",
        "gateway": "185.69.167.1"
    },
    "planetary": "",
    "interfaces": [
        {
            "network": "NWnaiein",
            "ip": "10.200.2.2"
        }
    ],
    "capacity": {
        "cpu": 1,
        "memory": 1024
    },
    "mounts": [
        {
            "name": "data0",
            "mountPoint": "/var/lib/docker",
            "size": 53687091200,
            "state": "ok",
            "message": ""
        }
    ],
    "env": {
        "SWM_NODE_MODE": "leader",
        "CAPROVER_ROOT_DOMAIN": "naiein.com",
        "CAPTAIN_IMAGE_VERSION": "v1.4.2",
        "PUBLIC_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCwa04MlP4jib8+UdKMOzoWUfAFqC2nGrLFlImSqdQUDdjDtfgzVAYcbjtex0hncP2rotX76uCnVdzWMIoJMMm+xNkHlkbUB9GT2LAijHdKZyxthwDielV1hRvUBVSsSB4xNGGgafSIoYF+qsGL9NftlqLv04tsVgL75mtJ9i82FJ6GiZ/mh64AsvWsF8IJHhm+O3y/Su1ta1scLzELzrrn8kEGRftkvJl3uQwStAi8/N7/WWYRb0fO7uuV1pKJg5kT5gCMzhjLS2Mwruo0bkE69p4y/N3NIbH2LNsKPWueyQAwd24e7zeCPNuY6Sz+RIS/UbBIahL68NNA3alOLwRT5KjzSXaI9fiUSQSyRi33H6/1NB6VJzXEENpXqvOe+Bj5N5GgxDZ0bB1sjci+/cPLDg8OnqBALqe62AtgLx998goowuItmIACBYVFsELpECazE1buTup3Fualy+IJi8x1yIxBwr5zKC8jKD7rXkFBmUtyd7kYddgRqzX27teyen0= toto@Totos-MacBook-Air.local",
        "DEFAULT_PASSWORD": "///"
    },
    "entrypoint": "/sbin/zinit init",
    "metadata": "{\"type\":\"vm\",\"name\":\"naiein\",\"projectName\":\"CapRover\"}",
    "description": "caprover leader machine/node",
    "corex": false
    }
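
The public IP in the deployment details is given in CIDR notation (185.69.167.215/24), while ssh needs the bare address. As a rough sketch, the fields relevant to connectivity could be pulled out of the JSON like this. The function name `summarize_deployment` and the trimmed sample are illustrative assumptions, not part of any ThreeFold tooling.

```python
import ipaddress
import json


def summarize_deployment(raw: str) -> dict:
    """Extract the fields that matter for a connectivity check (hypothetical helper)."""
    data = json.loads(raw)
    # "ip" is in CIDR form (address/prefix); ip_interface separates the two parts.
    iface = ipaddress.ip_interface(data["publicIP"]["ip"])
    gateway = ipaddress.ip_address(data["publicIP"]["gateway"])
    return {
        "status": data["status"],
        "ssh_host": str(iface.ip),
        "network": str(iface.network),
        "gateway_in_network": gateway in iface.network,
        "mounts_ok": all(m["state"] == "ok" for m in data["mounts"]),
    }


# Trimmed copy of the deployment details posted above.
sample = """
{
  "status": "ok",
  "publicIP": {"ip": "185.69.167.215/24", "ip6": "", "gateway": "185.69.167.1"},
  "mounts": [{"name": "data0", "state": "ok", "message": ""}]
}
"""

summary = summarize_deployment(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Note that a status of "ok" here only reflects what the grid reports for the workload; it does not show whether the SSH server inside the VM is actually accepting connections.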

http://captain.naiein.com says that nothing is there yet and https://captain.caprover.naiein.com/ times out. These used to be fine…

The wordpress editor is available though via http://naiein-web-wordpress.caprover.naiein.com/

But this is useless if I can’t modify the CapRover deployment, since I need to change a few PHP files and check the error logs.

Am I missing something here? How can I access the deployment?

Mik · February 16, 2023, 8:03pm

Hi @tototator

I am not sure but can you check if you have enough TFT in your deployment wallet?
Try and add some more, perhaps.

For example, Teis in the Telegram channel couldn’t SSH into his 3nodes due to a lack of funds in the wallet.
When he put more TFT back into the wallet, the SSH connection started working again.

You can see whether your deployments are in the grace period in the Deployments section of the Playground.

If they are in the grace period, then once you add more TFT to your wallet, the deployments’ status should go back to “Created” instead of “GracePeriod”. It can take up to an hour or so.

tototator · February 17, 2023, 10:22am

Hi Mik,

Thanks for the quick response.

Fortunately or unfortunately, I have enough TFT in the wallet, and the deployment is not in GracePeriod.

Mik · February 17, 2023, 10:13pm

OK. Let us troubleshoot some more then.

Do you have a firewall set up?
It could be blocking the port.
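
To narrow this down, it can help to distinguish a refused connection from a timeout. "Connection refused" generally means the host answered but nothing accepted the connection on that port (or something actively rejected it), while a timeout is more typical of packets being silently dropped. Here is a minimal Python sketch of such a probe; the function name `probe_port` is my own, and the result labels are just illustrative.

```python
import socket


def probe_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection performs a full TCP handshake, then we close right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no service accepted the connection.
        return "refused"
    except socket.timeout:
        # No reply within the timeout; packets may be dropped along the way.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host or a DNS failure.
        return "error"
```

Calling `probe_port("185.69.167.215")` from your own machine should return "refused" for the error you posted. Running it a few times over a day could also show whether the problem is intermittent.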

What if you try to restart the SSH server?

    sudo service ssh restart

Did you change the SSH key pair during your deployment?
Sometimes it happens and then the SSH connection doesn’t work anymore.

scott · February 18, 2023, 12:58am

The SSH server is running on the remote VM, which isn’t available by SSH in the first place. So this won’t help.

scott · February 18, 2023, 1:37am

I did some checks. The Caprover deployment includes an SSH server in the image, and I was able to connect via SSH to a freshly deployed instance.

Then I tried to SSH to the instance at 185.69.167.215, and it looks like it’s trying to authenticate.

Is the issue ongoing for you @tototator?
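
For anyone who wants to confirm from the outside that an SSH server is answering without needing a valid key, one option is to read the identification string that an SSH server sends right after the TCP connection opens. This is only a sketch: the helper names are hypothetical, and it assumes the identification string is the first line the server sends, which is the common case.

```python
import socket


def read_banner(sock: socket.socket, limit: int = 255) -> str:
    """Read one line from a connected socket and return it without the line ending."""
    data = b""
    while not data.endswith(b"\n") and len(data) < limit:
        chunk = sock.recv(1)
        if not chunk:  # peer closed the connection before sending a full line
            break
        data += chunk
    return data.decode("ascii", errors="replace").strip()


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return read_banner(sock)
```

If `ssh_banner("185.69.167.215")` returns a line starting with `SSH-`, the server is up and reachable, and any remaining failure is more likely on the authentication side.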

tototator · February 20, 2023, 9:01am

I just checked: the issue seems to be resolved, and I can SSH into the instance.

This bothers me, though: why does it work now and not before? What changed, and how can I prevent this from happening in the future?

Mik · February 20, 2023, 6:33pm

Did you use the same profile manager?

Perhaps you logged into the profile manager and the connection then worked.

tototator · February 21, 2023, 9:52am

Yes, I only use this one profile manager on this machine.

Mik · February 21, 2023, 6:44pm

OK. Thanks for the information.

I will open an issue on GitHub. The TF Dev team might be able to have a look and find out why this happened. I will let you know how it unfolds.

EDIT: The issue is available here https://github.com/threefoldtech/grid_weblets/issues/1315.

scott · February 22, 2023, 6:49am

Changing the profile in the manager could only change the SSH key supplied to future deployments—it couldn’t change an existing deployment.

The only other issue similar to this I could find was caused by a RAM issue in the node. I suspect it could be some intermittent problem with either the node hardware or networking, but I wouldn’t rule out some software issue in Zos either. Hopefully we can get some input from dev/ops to narrow it down.

Mik · October 24, 2023, 7:04pm