How to access a deployed QSFS?

Greetings,

I managed to deploy a QSFS on the testnet with Terraform, using the example on GitHub: https://github.com/threefoldtech/terraform-provider-grid/blob/development/examples/resources/qsfs/main.tf

Now I want to access it to upload some files and perform a few tests.

How do I access the server? I’m used to just SSH-ing into a VM.

For clarity, here are my outputs and my Terraform config:

terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4"]
}

resource "grid_network" "net1" {
    nodes = [7]
    ip_range = "10.1.0.0/16"
    name = "network"
    description = "newer network"
    add_wg_access = true
}

resource "grid_deployment" "d1" {
    node = 7
    dynamic "zdbs" {
        for_each = local.metas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "user"
        }
    }
    dynamic "zdbs" {
        for_each = local.datas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "seq"
        }
    }
}

resource "grid_deployment" "qsfs" {
  node = 7
  network_name = grid_network.net1.name
  ip_range = lookup(grid_network.net1.nodes_ip_range, 7, "")
  qsfs {
    name = "qsfs"
    description = "description6"
    cache = 10240 # 10 GB
    minimal_shards = 2
    expected_shards = 4
    redundant_groups = 0
    redundant_nodes = 0
    max_zdb_data_dir_size = 512 # 512 MB
    encryption_algorithm = "AES"
    encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
    compression_algorithm = "snappy"
    metadata {
      type = "zdb"
      prefix = "hamada"
      encryption_algorithm = "AES"
      encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode != "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
    groups {
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode == "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
  }
  vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"
    cpu = 2
    memory = 1024
    entrypoint = "/sbin/zinit init"
    publicip = true
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIwVBFL95gmLMcck2XVlZIKNDDOEWq09q8xFtsiMb7JU toto@adastraindustries.com"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }
}
output "metrics" {
    value = grid_deployment.qsfs.qsfs[0].metrics_endpoint
}
output "ygg_ip" {
    value = grid_deployment.qsfs.vms[0].ygg_ip
}

output "public_ip" {
    value = grid_deployment.qsfs.vms[0].computedip
}

output "node1_zmachine1_ip" {
    value = grid_deployment.qsfs.vms[0].ip
}

output "wg_config" {
    value = grid_network.net1.access_wg_config
}

So I got access over the ygg_ip while running Yggdrasil locally.
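
In case it helps anyone else, this is roughly what it looks like; the exact address comes from your own outputs, so treat it as a sketch:

# get the Yggdrasil IPv6 address from the deployment outputs
terraform output ygg_ip
# ssh in over the Yggdrasil overlay (needs a running yggdrasil daemon on your machine)
ssh root@<ygg_ip>
# alternatively, since the network has add_wg_access = true, the WireGuard config should work too
terraform output -raw wg_config > qsfs.conf && wg-quick up ./qsfs.conf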

But now I have another question: I don’t see zdbfs running on the system. Isn’t that required for the QSFS to work?

Are there any standard tests I can run to verify that the QSFS is up and running correctly?

vm:/qsfs# ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/zinit init
    2 root      0:00 [kthreadd]
    3 root      0:00 [rcu_gp]
    4 root      0:00 [rcu_par_gp]
    5 root      0:00 [kworker/0:0-eve]
    6 root      0:00 [kworker/0:0H-kb]
    8 root      0:00 [mm_percpu_wq]
    9 root      0:00 [rcu_tasks_kthre]
   10 root      0:00 [rcu_tasks_rude_]
   11 root      0:00 [rcu_tasks_trace]
   12 root      0:00 [ksoftirqd/0]
   13 root      0:00 [rcuc/0]
   14 root      0:00 [rcu_preempt]
   15 root      0:00 [rcub/0]
   16 root      0:00 [migration/0]
   17 root      0:00 [idle_inject/0]
   19 root      0:00 [cpuhp/0]
   20 root      0:00 [cpuhp/1]
   21 root      0:00 [idle_inject/1]
   22 root      0:00 [migration/1]
   23 root      0:00 [rcuc/1]
   24 root      0:00 [ksoftirqd/1]
   25 root      0:00 [kworker/1:0-rcu]
   26 root      0:00 [kworker/1:0H-ev]
   27 root      0:00 [kdevtmpfs]
   28 root      0:00 [netns]
   29 root      0:00 [inet_frag_wq]
   30 root      0:00 [kauditd]
   31 root      0:00 [kworker/1:1-eve]
   32 root      0:00 [khungtaskd]
   33 root      0:00 [oom_reaper]
   34 root      0:00 [writeback]
   35 root      0:00 [kcompactd0]
   36 root      0:00 [ksmd]
   37 root      0:00 [khugepaged]
   54 root      0:00 [cryptd]
   64 root      0:00 [kintegrityd]
   65 root      0:00 [kblockd]
   66 root      0:00 [blkcg_punt_bio]
   67 root      0:00 [md]
   68 root      0:00 [edac-poller]
   69 root      0:00 [devfreq_wq]
   70 root      0:00 [watchdogd]
   71 root      0:00 [kworker/u4:1-ev]
   72 root      0:00 [kworker/u4:2-ev]
   73 root      0:00 [kworker/1:1H-kb]
   76 root      0:00 [kswapd0]
   78 root      0:00 [kthrotld]
   79 root      0:00 [irq/1-ACPI:Ged]
   80 root      0:00 [acpi_thermal_pm]
   81 root      0:00 [hwrng]
   82 root      0:00 [kworker/0:2-eve]
   83 root      0:00 [drbd-reissue]
   84 root      0:00 [rbd]
   85 root      0:00 [raid5wq]
   86 root      0:00 [bch_btree_io]
   87 root      0:00 [bcache]
   88 root      0:00 [bch_journal]
   89 root      0:00 [dm_bufio_cache]
   90 root      0:00 [kmpathd]
   91 root      0:00 [kmpath_handlerd]
   92 root      0:00 [ipv6_addrconf]
   97 root      0:00 [kstrp]
   98 root      0:00 [ceph-msgr]
  114 root      0:00 [zswap1]
  115 root      0:00 [zswap1]
  116 root      0:00 [zswap-shrink]
  117 root      0:00 [kworker/u5:0]
  133 root      0:00 [kworker/0:1H-kb]
  154 root      0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
  159 root      0:00 sshd: root@pts/0
  161 root      0:00 -ash
  217 root      0:00 ps aux

Would be nice if someone could assist :+1:

Did you try to use /qsfs? Like copying/reading files from/to it?
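
A quick smoke test could look something like this (just an example, adjust the size as you like):

# inside the VM: write some random data into the QSFS mount and record its checksum
dd if=/dev/urandom of=/qsfs/test.bin bs=1M count=100
sha256sum /qsfs/test.bin
# drop the page cache so the read really goes through the filesystem, then check again
echo 3 > /proc/sys/vm/drop_caches
sha256sum /qsfs/test.bin   # should print the same hash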

It’s possible that zdbfs runs on the host and is not visible in the VM; I’ll double check.
EDIT: the services run on the host, so it’s normal that you don’t see them :slight_smile:

Thank you @maxux42

I did; I copied a few files over from my local machine and it seemed to work okay.

I would like to verify, though, that it is not just a plain disk mounted at /qsfs, and that the data really is spread over the shards so that the total storage, including the redundancy, is indeed more efficient than in traditional architectures.
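
A first sanity check could be to look at how /qsfs is mounted, although that alone doesn’t prove anything about the shards:

# check the size and filesystem type reported for the mount point
df -h /qsfs
mount | grep qsfs
# what shows up here depends on how zos exposes the QSFS into the VM,
# so this only shows it isn't an ordinary local disk, not where the data ends up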

Is there a test suite available where this is demonstrated?

I know I can remove a few data blocks through Terraform, but there still seems to be a problem with that: https://github.com/threefoldtech/terraform-provider-grid/issues/246

If you want to make sure the data is secure and the zdbs are actually used, you can query them directly :slight_smile:
Use a redis client to connect to a zdb (the Terraform outputs give you the information needed to reach them) and run an INFO on your namespace to see whether data is coming in, how much space is used, etc.

That’s a low-level way to find out whether it’s being used properly, but it’s reliable :stuck_out_tongue:
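
Roughly like this; addresses, ports and namespace names come from your Terraform state/outputs, and if I recall correctly the namespace commands are NSLIST and NSINFO:

# connect to one of the zdb backends with a plain redis client
redis-cli -h <zdb_address> -p <zdb_port>
# then, at the zdb prompt:
#   NSLIST              lists the namespaces hosted on that zdb
#   NSINFO <namespace>  shows entries, data size and index size for your namespace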


Great, thank you!

Can we also see which nodes the different shards are placed on with this?

Can you share the list of zdbs and their namespaces? Don’t include any passwords, of course :slight_smile:

There’s also the metrics endpoint, which Terraform prints a link to when it finishes applying the qsfs example, ending in :9100/metrics. It aggregates info from the zdbs as well as some data from the zstor and zdbfs components.
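
For example, something as simple as this shows the per-zdb counters; the metrics output already contains the full URL:

# fetch the aggregated metrics and filter for anything zdb related
curl "$(terraform output -raw metrics)" | grep -i zdb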

I guess this information is stored in the metadata zdbs. Zstor works with blocks of data, not individual files, so what you’d see might not be so interesting. At the metrics endpoint above you can see the total data stored in each zdb.

For the 2+2 setup in the example, I can see that my 1 GB of random data in /qsfs resulted in ~500 MB in each zdb, or 2 GB total. So we can tolerate two site failures at 2x data usage, whereas full replication would require 4x the data.
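
For reference, the test data was just written straight into the mount, along these lines:

# write 1 GB of random data into the QSFS mount
dd if=/dev/urandom of=/qsfs/random.bin bs=1M count=1024
# with minimal_shards = 2 and expected_shards = 4, each data backend ends up holding
# roughly 1024 MB / 2 = 512 MB, i.e. about 2 GB in total across the four seq zdbs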

Before QSFS was a zos primitive, I had some good methods to test that the system could actually reconstitute data when some backends were unreachable. Now that zos handles the work directly, these methods don’t work. I’ve been playing around a bit to find another way. I think the best test is reconstructing the data on a new VM with a subset of the backends available. I’ll share details if I’m able to get a working setup.
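
The rough idea I have in mind is checksum based, something like this (just a sketch, not a verified procedure):

# 1. on the original VM: record checksums of everything stored in the QSFS
find /qsfs -type f -exec sha256sum {} \; > /tmp/qsfs-checksums.txt
# 2. deploy a fresh VM pointing at the same QSFS config, but with only a subset of the
#    data backends reachable (a 2-of-4 setup should tolerate losing two of them)
# 3. copy the checksum file over and verify the data can still be reconstructed
sha256sum -c /tmp/qsfs-checksums.txt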
