I am fascinated by Natural Language Processing.
The idea of a computer being able to understand and respond to text as humans do is incredibly interesting. I have been interested in it ever since I got my first personal computer.
I think it comes from the first adventure game I played on my ZX Spectrum: The Hobbit. I loved that game. It felt like magic to me. I could type sentences on the keyboard, and the computer responded accordingly.
Natural Language Processing has evolved in incredible and unpredictable ways.
Just think about what GPT-3 can do to get an idea of where we are today. GPT-3 is publicly available, and I suspect that much more advanced models already exist but have not yet been disclosed to the general public.
Recently, I was fantasizing about building a chatbot—no big deal. There are plenty of services and libraries that can help you do that.
I was thinking about writing a bot that acts just like me—a digital self.
I want to train the underlying model on:
– All of the e-mail messages I have written over time. I never deleted my personal and professional e-mail messages.
– All the chats on Telegram, WhatsApp, Signal, SMS, etc.
– All the documents I have written and archived.
– All my Twitter, Facebook, LinkedIn, and blog posts.
That would be a large amount of data to train a model on, though nowhere near as massive as the material current models have been trained on.
I want to find out whether the bot would end up talking the way I do.
It seems I have a new personal project to work on, and I think it will not be easy.
There are so many different sources of data. I could download some of that data in bulk from Facebook, Twitter, LinkedIn, and Google and then normalize it. Extracting the data from instant messaging applications is going to be more difficult. Documents stored locally, on my external hard drives, and in the cloud will be relatively easy.
I think I will have to find a way to automate this process, since new data comes in every day. It is a big ETL (Extract, Transform, Load) process that has to run again and again, ingesting new data as it becomes available.
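To make the ETL idea a bit more concrete, here is a minimal sketch of an incremental pipeline, assuming every source has already been exported and parsed into simple dictionaries. The `Message` schema, the `extract_email`/`extract_chat` helpers, the dictionary keys, and the use of SQLite as the store are all hypothetical choices for illustration, not a finished design. Each source keeps a "watermark" (the newest timestamp already loaded), so re-running the pipeline only ingests new messages.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Message:
    """Common shape every source is normalized into (hypothetical schema)."""
    source: str
    sent_at: datetime  # timezone-aware
    author: str
    text: str


def extract_email(raw: Iterable[dict]) -> Iterator[Message]:
    # Assumes e-mails were already parsed into dicts with an ISO-8601 "date"
    # (including an offset), "from" and "body"; real exports need their own parsers.
    for item in raw:
        yield Message("email", datetime.fromisoformat(item["date"]), item["from"], item["body"])


def extract_chat(raw: Iterable[dict], source: str) -> Iterator[Message]:
    # Assumes chat exports were converted to dicts with a Unix "ts", "sender" and "text".
    for item in raw:
        sent_at = datetime.fromtimestamp(item["ts"], tz=timezone.utc)
        yield Message(source, sent_at, item["sender"], item["text"])


def transform(msg: Message) -> Optional[Message]:
    # Normalize: collapse whitespace, convert timestamps to UTC, drop empty messages.
    text = " ".join(msg.text.split())
    if not text:
        return None
    return Message(msg.source, msg.sent_at.astimezone(timezone.utc), msg.author, text)


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "source TEXT, sent_at TEXT, author TEXT, text TEXT, "
        "UNIQUE (source, sent_at, author, text))"
    )
    # One watermark per source: the newest timestamp already loaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watermarks (source TEXT PRIMARY KEY, last_seen TEXT)"
    )


def run_etl(conn: sqlite3.Connection, source: str, messages: Iterable[Message]) -> int:
    """Load only messages newer than the stored watermark; return how many rows were added."""
    row = conn.execute(
        "SELECT last_seen FROM watermarks WHERE source = ?", (source,)
    ).fetchone()
    last_seen = datetime.fromisoformat(row[0]) if row else None
    newest = last_seen
    changes_before = conn.total_changes
    for msg in messages:
        if last_seen is not None and msg.sent_at <= last_seen:
            continue  # already ingested on a previous run
        clean = transform(msg)
        if clean is None:
            continue
        # UNIQUE constraint + OR IGNORE guards against duplicates within one run.
        conn.execute(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
            (clean.source, clean.sent_at.isoformat(), clean.author, clean.text),
        )
        if newest is None or clean.sent_at > newest:
            newest = clean.sent_at
    loaded = conn.total_changes - changes_before
    if newest is not None:
        conn.execute(
            "INSERT OR REPLACE INTO watermarks VALUES (?, ?)", (source, newest.isoformat())
        )
    conn.commit()
    return loaded
```

Running `run_etl` twice on the same export loads nothing the second time, because every message is at or before the stored watermark. The trade-off of a timestamp watermark is that a message that shows up late with an older timestamp would be skipped; a more robust variant could rely only on the `UNIQUE` constraint and re-scan everything.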
The next step is training the model. I don't know anything about this, and I will need to study to get a grip on it. This is probably the most exciting part of the project.
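Whatever model I end up using, the messages will probably have to be turned into training examples first. One common approach, assumed here rather than decided, is to pair a few preceding messages of a conversation (the prompt) with my reply (the target) and store them as JSON Lines. The `prompt`/`completion` keys, the `build_pairs` and `to_jsonl` helpers, and the `context_size` parameter are hypothetical; the exact format depends on the model and toolkit chosen.

```python
import json
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (author, text), in chronological order


def build_pairs(conversation: List[Turn], me: str, context_size: int = 3) -> List[Dict[str, str]]:
    """Turn one conversation into (recent context -> my reply) training examples."""
    pairs = []
    for i, (author, text) in enumerate(conversation):
        if author != me:
            continue  # only my own messages become training targets
        context = conversation[max(0, i - context_size):i]
        if all(a == me for a, _ in context):
            continue  # nothing precedes it, or only I was talking in the window
        prompt = "\n".join(f"{a}: {t}" for a, t in context) + f"\n{me}:"
        pairs.append({"prompt": prompt, "completion": " " + text})
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    # One JSON object per line, a layout many fine-tuning tools accept.
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```

For a short exchange such as a friend asking "Are you coming tonight?" and me answering, the friend's question becomes the prompt and my answer becomes the completion.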
Anyway, I think it would be interesting to get my hands on something I have always liked but never actually worked with from a programming perspective.
I am not sure I would love talking to myself, but I would like to try it.
Being able to answer the question, “Can I talk to myself?” is quite exciting. Yes, I do not need much to get excited about programming.