.tf Domain Outage & Action Plan

Note that the outage was resolved on the morning of June 18, although the team continues to assess the impact. For transparency, we’ve kept the original text of the post below and we’ll add updates at the bottom as the situation unfolds. A Telegram group has also been set up for this topic, and we invite you to join and follow along.

On Friday, we started experiencing a full outage of all of our sites on the .tf top-level domain, including some of our websites and docs and many essential grid services. We investigated and reached out to our domain registrar, Name.com, on Friday night to resolve the situation as quickly as possible. We were told they would escalate the situation.

On Saturday morning we talked to Afnic, who confirmed our suspicion that this was not an issue on our side but rather a decision by the registrar. Meanwhile, we reached out to Name.com again and were notified that our domains had been placed on client hold, a registrar-set status that stops a domain from resolving. This was done without warning or explanation. No further information was provided despite continued attempts to engage over the weekend.

Because of the lack of communication, we began to take action, registering alternative (sub)domains and modifying affected services to be exposed on these domains as much as possible.

On Monday evening, around midnight, we finally received a response from Name.com, citing an “administrative error” in the contact details of our .tf domains. These domains require registrants to be located within the European Union, which was not the case since ThreeFold DMCC is located in the United Arab Emirates. Following a policy change that was neither communicated nor transparent, they chose to place our domains (and those of other .tf owners in a similar situation) on hold and block us completely, instead of giving us the chance to rectify this by moving ownership to our Belgian company, TF Tech NV.

We have since changed the contact details on our Name.com account and provided the requested documents. We are still waiting for the client hold to be lifted so that we can finally be operational again.

We are currently in discussion with a company that is willing to help us migrate our .tf domains from Name.com to a Belgian registrar. If Name.com does not lift the hold within the next couple of days, we will start the migration process. Even if the domains are released soon, we believe that this breach of trust and transparency is a valid reason for us to migrate to a different domain registrar.

We have exposed some of our essential grid services on alternative domains:

Please note that while these domains have been changed, Zero OS (ZOS) integration with our services is still a problem. Such a change in ZOS requires re-downloading the boot image and rebooting the machine, which is not ideal.

The situation also highlighted a question from farmers: should we keep our nodes on?

Our suggestion is to keep nodes on, since we expect the issue to be resolved this week. Unfortunately, TF nodes will have experienced significant downtime due to this incident. A discussion about minting will be held, and we will keep the community posted.

What did this teach us and what’s the plan?

First and foremost, the situation demonstrates that registrars are essentially a single point of failure. A single entity can click a button and shut down any site registered through it, intentionally or unintentionally.

Second, it is a lesson in redundancy and decentralization of our own systems. As mentioned by Kristof in the chat, “We have redundancy on quite some levels but unfortunately we never expected that our .tf would go away in total.” He continued, “Of course there is something good to learn from this. We need to think about this and make sure that our validator stacks answer on multiple top level domains, this is not something we thought about.” As you can see, we have already taken action to rectify this and are continuing to do so.
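To make the "answer on multiple top level domains" idea a bit more concrete, here is a minimal client-side sketch, not how our stack actually does it. It assumes the same service is published under several hostnames on different top-level domains and simply picks the first one that responds. The hostnames and the function names `http_probe` and `first_reachable` are hypothetical placeholders.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP error, or bad URL.
        return False


def first_reachable(
    hosts: Iterable[str],
    path: str = "/",
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Try each hostname in order and return the first URL that responds."""
    for host in hosts:
        url = f"https://{host}{path}"
        if probe(url):
            return url
    return None  # every domain is unreachable


# Placeholder hostnames on different TLDs (assumption, not real endpoints).
CANDIDATE_HOSTS = ["service.example.com", "service.example.org"]
```

A client would call `first_reachable(CANDIDATE_HOSTS)` and use the returned URL, so that losing one top-level domain means falling through to the next instead of a full outage. This only helps if the alternative domains are held through independent registrars and registries, and if clients already know about them before the primary one disappears.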

Some suggestions were brought forward by the community (here and here) and are being considered.

Looking forward, V4 (the commercial version of the grid, meant to enable a cashflow-positive situation for all participants) will need to be able to deal with all of these issues so that this never happens again. This is already in the plans, with Mycelium and more.

Kristof suggested that we organize some sessions in mid-July (to be planned) and even come together in Interlaken (see the poll here if you are interested) for a couple of days to go over the V4 architecture in depth and discuss the future of the grid.

June 18 Update

After sending the necessary documents and adjusting the contact information on our Name.com account, we are happy to announce that all our services are back.

We are still assessing the impact of this issue and will keep everyone posted. In the meantime, we do have a plan to migrate away from Name.com. Everything has been prepared for this migration, and we do not expect any downtime. More to follow.
