What should we do with deployments that run out of tokens

The issue

Yesterday and today I was involved in two cases where individuals deployed workloads with a small number of tokens in their wallets, and the wallets ran dry. Unfortunately, when a wallet runs dry (i.e. zero tokens left) the contract is canceled and the workloads/deployments are deleted.

The grid is made up of a (growing) number of farmers who own and operate 3nodes. Funding wallets for overage is prone to abuse, and allowing workloads to exist for longer than the wallet balance supports will also attract smart cookies who will find a way to get “free” capacity.

So this post is here to source smart ideas from the community on how to deal with this issue. The current implementation is clearly not ready to go mainstream and be accepted.

Possible solution avenues.

  1. Based on the total value of the wallet at the start of a contract, the workload can overrun for a certain percentage of time (which translates easily into tokens, as the contract subtracts x TFT per hour from the wallet). This sounds like an idea that could work, but it only pushes the same issue out. What do we do if the extra time is used up? Do we then delete?

  2. Make the workload inaccessible when the wallet runs out of tokens, but keep all the data and configuration work done. This can be done in a number of ways:

    • just retain the data on SSD volumes and Quantum Safe Storage ZDBs, but stop the processing (CU capacity)
    • freeze / hibernate the whole architecture / solution until the wallet is funded again
    • just make the architecture inaccessible.

The challenge here is that while the workload is inaccessible, the resources it uses are still tied up, and we somehow need to fund the workload’s continued existence.
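To make option 1 concrete, here is a minimal sketch of the arithmetic. The percentage is a hypothetical policy knob and the numbers are made up; nothing here reflects how the grid actually bills.

```shell
#!/bin/bash

# overrun_allowance BALANCE RATE PERCENT
#   BALANCE: TFT in the wallet when the contract starts
#   RATE:    TFT the contract subtracts per hour (must be > 0)
#   PERCENT: share of the funded runtime granted as overrun (hypothetical)
# Prints: funded hours, overrun hours, TFT owed if the overrun is fully used.
overrun_allowance() {
    awk -v b="$1" -v r="$2" -v p="$3" 'BEGIN {
        funded  = b / r              # hours the balance pays for
        overrun = funded * p / 100   # extra hours granted on top
        printf "%.1f %.1f %.1f\n", funded, overrun, overrun * r
    }'
}

# Example: 100 TFT at 2 TFT per hour with a 10% overrun allowance
overrun_allowance 100 2 10
```

The last number is the debt that still has to be settled once the overrun ends, which is exactly the part this option leaves open.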

So what do you think is a smart way to deal with this situation?

4 Likes

The same happened to me while testing and getting my first experience with deployments on the grid. It definitely would have been an annoying experience if it had been a serious workload.

No matter where you rent server space, you’re responsible for paying for it.
But if you rent your equipment at a datacenter, they charge your credit card or bank account. Even if a payment fails once, they won’t shut down everything and delete your VM.

So you’re totally right, the current situation is a bit uncomfortable.

Freezing the job and the resources after the wallet is empty sounds fair to me, combined with a time limit and an extra fee to resume. The extra fee could be used to compensate the farmers if someone never shows up again to resume, so that resources were blocked without being rewarded.

Some reminders before the wallet runs out of tokens might also be helpful for people deploying, along with a calculator showing how long they can run their current configuration with the TFT they have.
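Such a calculator could be as small as the sketch below. It assumes a flat hourly burn rate that you supply yourself; the function name and the example numbers are made up for illustration.

```shell
#!/bin/bash

# runway_hours BALANCE RATE
#   BALANCE: TFT currently in the wallet
#   RATE:    TFT the deployment burns per hour (assumed constant)
# Prints how long the deployment can keep running on that balance.
runway_hours() {
    awk -v b="$1" -v r="$2" 'BEGIN {
        if (r <= 0) { print "rate must be positive" > "/dev/stderr"; exit 1 }
        hours = b / r
        printf "%.1f hours (%.1f days)\n", hours, hours / 24
    }'
}

# Example: 100 TFT in the wallet, deployment costs 0.5 TFT per hour
runway_hours 100 0.5
```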

3 Likes

Awesome write-up. I think solution 2 is the ideal one. Some people often run out of tokens and need to purchase more. Retaining the data for the user would be beneficial to them, but we should incentivize them to pay on time, e.g. the longer they don’t pay, the higher the fee, since we are using resources that need to be paid for.

Perhaps we could have some collateral. For example, you need 100 TFT for a 10-hour load. But you also need to have a deposit of (e.g.) 10 TFT more that is kept as a backup if you go over your resource time. Then you get a warning when your wallet is close to running dry, so you have time to add TFT if you want to continue. If you do not do it in time, the TFT deposit starts being consumed (Extended Deposit Period) and the server goes into a less expensive, bypass mode with less access for the user, to incite action to re-fund the wallet. At this point you can also decide to stop using the server, in which case you get the 10 TFT deposit back.

The TFT deposit could be consumed at a different rate depending on the situation. The first time you get a warning, before your main TFT runs dry, the server is running at 100%. Then, when you get into the TFT deposit stash, the functions of the server become limited, inciting the user to fund the wallet and limiting the TFT cost. After this period, if the wallet is not re-funded, the server could simply close. Or you have an additional period with interest to cover costs, described in the next paragraph.

When people go past this time, they would need to ask for a prolonged period on the Grid to keep the data until the wallet is funded. If they say no, the data is lost. If they want to prolong, this stage could include interest to cover potential losses.

This additional step would then be: you pay 100 TFT to use the 3node and 10 TFT more as a deposit. You run out of your 100 TFT and step into the 10 TFT deposit with limited access (EDP). When you run out of the 10 TFT deposit, you’re asked: do you want additional time to re-fund the wallet (this comes with X% interest per Y day(s))? So, e.g., you ask for 5 TFT of additional credit to have time to re-fund. But you’d need to pay some interest, let’s say 2 TFT more, which is high interest. Obviously, no one would want to get into this situation (Extended Interest Period), so they’d re-fund the wallet during the low-access period that consumes the TFT deposit.

The additional interest could end up in a treasury and cover the cost of situations where users don’t pay. We’d need reliable calculation methods to make sure no TFT is lost on the TF side.
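The stages described above can be sketched as a small decision function. This is only an illustration of the proposal, not anything the grid implements: the function name and phase labels are made up, and amounts are assumed to be whole TFT so that plain bash integer arithmetic works.

```shell
#!/bin/bash

# contract_phase MAIN DEPOSIT CREDIT
#   MAIN:    TFT left in the main balance
#   DEPOSIT: TFT left in the collateral deposit
#   CREDIT:  extra TFT credit the user asked for after the deposit ran out
# All three are assumed to be whole numbers (hypothetical simplification).
contract_phase() {
    local main=$1 deposit=$2 credit=$3
    if (( main > 0 )); then
        echo "running: full service"
    elif (( deposit > 0 )); then
        echo "EDP: limited access, deposit being consumed"
    elif (( credit > 0 )); then
        echo "EIP: data retained, interest accrues"
    else
        echo "closed: data deleted"
    fi
}

# Walk through the example from the post: 100 TFT main, 10 TFT deposit,
# then 5 TFT of credit requested once both are used up.
contract_phase 100 10 0
contract_phase 0 10 0
contract_phase 0 0 5
contract_phase 0 0 0
```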

I think that adding support for Grid 3 wallets into the TF Connect app and enabling notifications when balances run low will help a lot with this.

It’s certainly a bummer when your workload expires unexpectedly and data is lost (or so I’ve heard anyway :wink:).

In line with idea 2, nodes could retain data from decommissioned workloads, as long as they don’t need to reclaim the space for a new workload. Then a workload could be reinstated, assuming some CU are available at that time on the same node. This seems like the simplest approach, with little room for abuse.

2 Likes

I think choice is what could make the Grid stand out.
Offer ALL possible solutions at the beginning and make them either pre-paid or on-demand (deposit TFT):

  1. Overtime usage based on contract: X amount percentage of TFT in wallet.
  2. “Freezing” of access. Keep the workload and give customers a chance to refill the wallet.
  3. Make it possible to back up the workload to a private cloud or local servers, so customers can just re-upload the expired workload if their wallet runs dry.

1 Like

What I think we also need to discuss is what abuse cases any solution to the challenge might bring.

What I mean is the following: when zero wallet balances allow deployed workloads to be retained (inactive) for a period, you can turn this into a “DDoS attack” on 3nodes. Max out reservations on a 3node while not paying for it? The longer we make the period, the more we are open to mischief like this.

Does anybody have ideas on how to balance the “nice” level of retaining workloads against people abusing it? (And if I am overthinking this, please tell me too… :wink: )

1 Like

How about allowing user-configurable email warnings along with the warnings from the ThreeFold app? Or possibly building services that enable some form of auto-billing, a smart contract with automatic transfer of tokens, or backup payment wallets?
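A user-configurable warning could boil down to a threshold check like the sketch below. The function name, the idea of expressing the threshold in hours of runtime, and the numbers are all assumptions for illustration; how the warning is delivered (email, app, bot) is left out.

```shell
#!/bin/bash

# balance_status BALANCE RATE MIN_HOURS
#   BALANCE:   TFT currently in the wallet
#   RATE:      TFT burned per hour by the deployment
#   MIN_HOURS: user-chosen warning threshold, in hours of remaining runtime
# Prints "low" when the balance covers less than MIN_HOURS, otherwise "ok".
balance_status() {
    awk -v b="$1" -v r="$2" -v h="$3" 'BEGIN {
        if (b < r * h) print "low"; else print "ok"
    }'
}

# Example: warn when less than 24 hours of runtime are left
if [ "$(balance_status 30 2 24)" = "low" ]; then
    echo "Wallet covers less than 24 hours, top up soon"
fi
```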

@weynandkuijpers, shouldn’t the question “what should we do with deployments that run out of tokens?” also consider what type of deployment we’re talking about? All the feedback I read above starts from the options one has to avoid losing the workload itself, but from an individual perspective (e.g. buying time to be able to buy TFT). What about workloads that have clear humanitarian purposes at their core (not driven by profit) and that depend not on a specific individual or organization but on a network of backers, for instance? What type of network solution could we come up with that doesn’t boil down only to “buy time to buy TFT”, where this type of workload could be considered a building block of the so-called public digital commons on the TF Grid? Just wondering here, guys. Maybe this comment is totally off-topic; if that’s the case, please disregard it.

All good thoughts and remarks. Let me pull the information together and post a single “summary” post with all the thoughts and suggestions. I think we’re not at the end of the discussion but summarizing and then putting forward a few example use cases would be good to debate them one by one.

Please allow me to collect the input in this thread and put them together in the coming days. Looking forward to continue to discuss!

4 Likes

@weynandkuijpers did you actually have the time to put this thread together, so that all of us can continue to evolve some common understanding? Just checking…

1 Like

Hi, thank you for asking, working on it… Will try to finish today. In the meantime I wrote myself a little script that I run from crontab. It sends the wallet balances to a Telegram bot twice a day.

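#!/bin/bash

# constant definitions: wallet addresses to check
wallet=("<<valid_wallet_address_1>>" \
        "<<valid_wallet_address_2>>")
len=${#wallet[@]}

# github repos: path to the tfchain cli-tool checkout
export CLI_TOOL=$HOME/opt/github/threefoldtech/tfchain/cli-tool

# chain endpoint and credentials used by the cli-tool
export SUBSTRATE_API_URL=wss://tfchain.grid.tf
export MNEMONIC="<<insert_valid_mnemonic>>"

for (( num=0; num<len; num++ ));
do
        # keep the full output in .balance and the "Address" line in .out
        # (2>&1 merges stderr into the pipe; the original "2>1" wrote it to a file named "1")
        node "$CLI_TOOL/cli.js" balance get "${wallet[$num]}" 2>&1 | tee "${wallet[$num]}.balance" | grep Address > "${wallet[$num]}.out"
        # the balance is the fourth field of the "Address" line
        balance=$(awk '{ print $4 }' "${wallet[$num]}.out")
        /home/johndoe/bin/telegram-send "${wallet[$num]}:$balance"
        rm "${wallet[$num]}.out"
done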

PS> not a coder… :slight_smile:

It sends the output to a Telegram bot I started. Just FYI. Will come back with the synopsis later today.

4 Likes

That’s great! It’s always fun to read some code.

Got to learn how to work with the Grid like this.

Thanks for sharing.

Why not go the classic route of having third-party auto-refill gateways tied to credit cards? The reality is that most people will struggle with acquiring TFT, and making it simple so they don’t need to deal much with crypto can only be good for adoption. Obviously this would be more expensive, and whoever wants to pinch pennies can still buy TFT on the market, while keeping this as a backup with settings like: if balance < x, then add z TFT at the current market price, with a limit of y $ per TFT, or alert me via email if something does not work out. With a large enough x they still have a safety net, and the responsibility does not lie with TF.

That said, I also like the suggestion above of suspending the load while there are free resources on the machine. Maybe the length of the suspension can be tied to how much TFT the workload has already burned. This way a $0.01 spammer gains nothing from it (so a DoS is not feasible), but a serious load which ran for days gets a nice grace period.
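Both ideas can be sketched in a few lines. Everything below is hypothetical: the function names, the hours-per-TFT factor and the cap are made up for illustration, and no real gateway or price feed is called.

```shell
#!/bin/bash

# refill_action BALANCE X Z PRICE Y
#   BALANCE: current wallet balance in TFT
#   X:       refill threshold in TFT
#   Z:       amount of TFT to buy per refill
#   PRICE:   current market price in USD per TFT (supplied by the caller)
#   Y:       maximum acceptable price in USD per TFT
refill_action() {
    awk -v bal="$1" -v x="$2" -v z="$3" -v price="$4" -v y="$5" 'BEGIN {
        if (bal >= x)        print "none"
        else if (price <= y) printf "buy %s TFT for %.2f USD\n", z, z * price
        else                 print "alert: price above limit"
    }'
}

# grace_hours BURNED HOURS_PER_TFT MAX_HOURS
#   Suspension length grows with the TFT already burned, up to a cap.
grace_hours() {
    awk -v burned="$1" -v k="$2" -v max="$3" 'BEGIN {
        g = burned * k
        if (g > max) g = max
        printf "%.1f\n", g
    }'
}

# Example: balance 20 TFT, refill below 50 TFT, buy 50 TFT, price 0.05 USD, limit 0.08 USD
refill_action 20 50 50 0.05 0.08
# Example: a workload that burned only 0.2 TFT versus one that burned 1000 TFT
grace_hours 0.2 1 336
grace_hours 1000 1 336
```

With these made-up parameters, the throwaway deployment earns a grace period of minutes while the long-running one hits the cap, which is the asymmetry that makes spamming unattractive.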

2 Likes

You might be correct here. This is an opportunity for a lot of service providers, developers and system integrators out there to use the grid and offer a standard interface to consumers. They can do all of the complicated stuff, including the token purchase and payment, and provide services to consumers. They can (and will) have standard pre- and post-billing solutions.

2 Likes

We should always retain the data for a fixed time, 1 to 3 months. Can this be a source of attack, by deploying many, many workloads and not paying for them? Yes, but how much money would an attacker spend on this attack?

Also, when the Grid is getting full, this project will be a success and more servers will be coming online every day, making it even harder to fill the network.

Retaining data should be an option when creating a contract. It would default to ON and would require enough TFT to cover the data retention. If we use a bidding system (see my pricing post), that should solve many shortcomings, and it would not be expensive, since there is loads of free disk space and a reasonable bidding price should work fine.
But it should be optional, as not everyone needs the data from their VM once it’s done processing.
I thought about moving the data to an HDD once idle, but that complicates things.

Hi - you don’t need a lot of money to “block” a large contingent of storage. Since the system is post-pay, and I believe the initial amount of TFT you need to have in your wallet is just over 2 TFT, you can do a lot with very little. Executing a Terraform script (which can be done in parallel from different sources) does not take long, and you can block x times the deployment for just over 2 TFT.

And to repeat what I have said earlier, the current cloud industry offers this because they own the infrastructure. In this case we have thousands of farmers owning the equipment and a digital currency fueling a smart contract deployment mechanism for IT workloads.

We cannot just do what the hyperscalers are doing today; we have to come up with a better way of dealing with this.

3 Likes

Quick update on this discussion: With Grid 3.6.1, the team implemented a new feature for Zero-OS to solve this issue. Zero-OS now supports a function for pausing workloads for up to two weeks before cancelling the contract. Learn more in our new post on the 2-Week Grace Period for Unfunded Deployments! :mag:

5 Likes

Fantastic! This is a great step forward and evidence that TF listens to debates and requests on the forum :clap:

3 Likes