Hey guys,
I just want to share some news on the FAQ Bot.
The Threefold FAQ Bot is Alive
The Simple Bot
How to Use It
The Threefold FAQ Bot is now deployed here: https://t.me/TF_FAQ_Bot
There are a few commands:

- /start: start or restart the bot
- /ask: ask the bot a question
- /more: once you have asked a question and received an answer, you can ask to see more related questions (3 new questions at a time)
On both cellphone and computer, you can either click the command as presented by the bot, or you can type the command (/start, /ask, /more).
The Threefold Grid Data and the Bot Architecture
The Threefold FAQ Bot is currently running on test net, on node 20.
It uses an IPv4 address and runs on a full virtual machine (VM) with Ubuntu 22.04.
The code is written in Python and the main AI/ML library is the almighty sentence-transformers.
First, the bot scrapes the raw FAQ from the GitHub page of the Threefold manual.
Then, it feeds the FAQ to the AI/ML model.
The model ranks the Q&As from best to worst match against the question asked by the user. Once it has the ordered list of Q&As, the bot sends the answer in markdown through the Telegram chat.
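The rank-then-answer step above can be sketched in a few lines. This is a simplified, hypothetical stand-in: it uses a bag-of-words count vector and cosine similarity instead of real sentence-transformers embeddings, and the FAQ entries are made up, but the flow is the same idea.

```python
import re
from math import sqrt
from collections import Counter

def embed(text):
    # Toy stand-in for a sentence embedding: a bag-of-words count
    # vector. The real bot would use SentenceTransformer.encode().
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_faq(question, faq):
    # Rank FAQ entries from best to worst match against the user's question.
    q = embed(question)
    return sorted(faq, key=lambda entry: cosine(q, embed(entry["q"])), reverse=True)

# Illustrative FAQ entries, not the actual Threefold FAQ content.
faq = [
    {"q": "How do I deploy a full virtual machine?", "a": "..."},
    {"q": "What is the TFT token?", "a": "..."},
    {"q": "How do I restart the bot?", "a": "..."},
]
print(rank_faq("deploy a virtual machine", faq)[0]["q"])  # the deployment question ranks first
```

The slow part in the real bot is producing the embeddings; the sorting itself is cheap.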
Limitations of the Bot
The facts: AI/ML workloads are most easily handled by GPUs (graphics processing units).
In short (and in a simplified presentation), natural language processing turns words into tokens (parts/segments of words), and those tokens are turned into numbers. The computer then compares numbers and produces answers, which are converted back into words before being presented.
Since AI/ML is a heavy workload that needs to compute many things in parallel, the more cores there are, the more efficient the AI/ML is.
For this reason, GPUs are preferred over CPUs (central processing units). Even if CPU cores are individually faster, GPUs can compute a greater quantity of work in parallel.
If you want to, picture this: a CPU can quickly calculate lines of numbers, while a GPU can more slowly calculate mountains of numbers. The mathematics behind AI/ML uses tensors, and tensors are very cool: they are basically matrices at each point of a mathematical grid.
Imagine 10 runners running 10 km per hour. What total distance do they cover in an hour? 10 * 10 = 100 kilometers. That would be the CPU.
Now imagine 50 runners running 5 km per hour. What total distance do they cover in an hour? 50 * 5 = 250 kilometers. That would be the GPU. Each runner runs slower, but the total distance covered is greater.
In essence, a GPU can do more calculations at once, but each individual calculation is slower.
So a CPU is good for computing a few calculations quickly. But for processing lots of data, a GPU will do better.
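The runner analogy boils down to one line of arithmetic: total throughput is the number of workers times the per-worker speed. A trivial sketch:

```python
def throughput(workers: int, speed: float) -> float:
    # Combined distance (or work) per hour across all workers.
    return workers * speed

print(throughput(10, 10))  # 10 fast runners (CPU-like): 100 km in an hour
print(throughput(50, 5))   # 50 slow runners (GPU-like): 250 km in an hour
```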
In short, a GPU would compute the answers faster, but GPUs are not yet available on the Threefold Grid. This current version of the bot thus runs on CPU.
Also, note that there are now TPUs (tensor processing units), which are made specifically for AI/ML compute workloads.
Enough nerdy stuff for now. Let us test and let us see how the bot fares.
Basic Tests for the FAQ Bot
Here are some basic tests to show the speed of the bot. More tests would give more precise numbers, but the results are significant enough, and in line with the literature, to draw basic conclusions.
We present here two different deployments of a full virtual machine deployed on the Threefold Grid test net.
Note that in both cases, the bot used around 7.2 GB of storage. The minimum storage for a full VM is 15 GB, so this parameter was constant.
Let’s go!
4 vcores, 8 GB of RAM, 15 GB of storage
- The first tested deployment was with 4 vcores and 8 GB of RAM.
- It used a maximum of 2.4 GB of RAM, and the vcores were heavily loaded (all above 95% during peaks).
- One question was answered in 2m12s on average (n=2, just basic testing).
- On test net, without yet holding a large amount of TFT for the 60% discount, it cost 0.944 TFT/hour to run with some Internet traffic.
8 vcores, 4 GB of RAM, 15 GB of storage
- The second tested deployment was with 8 vcores and 4 GB of RAM.
- It still used around 2.4 GB of RAM, and the vcores were still maxed out (95% and more during peaks).
- One question was answered in 1m12s (n=3).
- Two questions, from two different Telegram accounts, were answered in 1m06s (n=1).
- On test net, without TFT for the 60% discount, it cost 1.136 TFT/hour to run with some Internet traffic.
The Brief Analysis
In short, doubling the vcores more or less doubles the speed. Even with two queries at a time, it took less than 1m20s. This is expected: AI/ML needs cores to run different calculations in parallel, so the more cores, the quicker the bot.
For around 1 TFT/hour, anyone can run an AI/ML bot on CPU on the Threefold test net; count around 2 TFT/hour on main net. Of course, the cost will increase as the bot generates more Internet traffic.
To get more speed, you need more vcores: with 8 vcores, you nearly double the speed compared to 4 vcores.
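Using the measured numbers above, a quick back-of-the-envelope calculation (a sketch, not a rigorous benchmark) shows the speedup and the cost per answered question:

```python
# Measured numbers from the two test deployments above.
deployments = {
    "4vcores": {"seconds_per_answer": 132, "tft_per_hour": 0.944},
    "8vcores": {"seconds_per_answer": 72,  "tft_per_hour": 1.136},
}

speedup = (deployments["4vcores"]["seconds_per_answer"]
           / deployments["8vcores"]["seconds_per_answer"])
print(f"speedup from doubling vcores: {speedup:.2f}x")  # ~1.83x

for name, d in deployments.items():
    # Cost of one answered question: hourly rate prorated over the answer time.
    cost = d["tft_per_hour"] / 3600 * d["seconds_per_answer"]
    print(f"{name}: {cost:.4f} TFT per answer")
```

Interestingly, by this arithmetic the 8-vcore deployment is not only faster but also slightly cheaper per answered question, since each answer takes much less billable time.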
GPU and Future Testings
Of course, when GPU support is implemented, it will be very interesting to do more testing here.
AI/ML is a huge use case and the market will demand a lot of this type of cloud service.
The Bot: Now and Later
We could keep the FAQ bot as is, and then add more vcores if there is a need.
Of course, it depends on how much the Threefold community uses the Threefold FAQ Bot. We will check the pulse of the community in terms of interest in using the FAQ Bot.
With (e.g.) 10 TFT per hour, we could surely have a crazy fast FAQ Bot with a lot of vcores.
If we could have a FAQ bot that answers questions within 5-15 seconds, it would be a highly effective tool.
Still, waiting about a minute for an answer is convenient enough.
Also, note that what takes time is computing the best answers in order.
Once the calculations are done, the user can ask to see three (3) more questions (/more on the bot), and those 3 questions appear quickly. What takes time is taking the user's question, comparing it with all the FAQ questions, and ranking them. Once this is done, printing answers is very fast.
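The /more behavior described above can be sketched as a per-user cache: the expensive ranking runs once on /ask, and /more simply slices the next three entries from the stored list. All names here are illustrative, not the bot's actual code.

```python
# Illustrative sketch: the expensive ranking runs once per question;
# /more just reads the next slice from a per-user cache.
ranked_cache: dict[int, list[str]] = {}

def handle_ask(user_id: int, question: str, rank_fn) -> str:
    # rank_fn does the slow part: embed the question and sort all
    # FAQ answers from best to worst match (e.g. via sentence-transformers).
    results = rank_fn(question)
    ranked_cache[user_id] = results[1:]   # keep the runners-up for /more
    return results[0]                     # best answer, sent immediately

def handle_more(user_id: int, batch: int = 3) -> list[str]:
    # Cheap: no model inference, just slice the cached ranking.
    remaining = ranked_cache.get(user_id, [])
    ranked_cache[user_id] = remaining[batch:]
    return remaining[:batch]

# Usage with a dummy ranking function standing in for the model:
answers = [f"answer {i}" for i in range(6)]
print(handle_ask(42, "how do I deploy?", lambda q: answers))  # answer 0
print(handle_more(42))  # ['answer 1', 'answer 2', 'answer 3']
print(handle_more(42))  # ['answer 4', 'answer 5']
```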
When GPUs are implemented on the Threefold Grid, the Threefold FAQ Bot would gain much more speed.
Closing Words
The Threefold FAQ Bot will evolve as the Threefold community gives feedback and as we make improvements to the bot.
You are invited to try it and tell us what you think.
Is this the first AI/ML bot deployed by the Threefold community?
Is it the last? Most certainly not, as AI/ML and the Threefold Grid go very well together.
The future of Threefold, with GPUs, is shining brightly towards the AI/ML world.
In the meantime, we will explore this world gladly with CPUs.
Let us know what you think and tell us if you have ideas for future AI/ML tools for the Threefold community!
Note: A tutorial could be done on how to build such an AI/ML bot on the Threefold Grid. Let’s see if there is some interest first!