
Shirley’s Journey: Crafting a Truly Useful Self-Hosted AI Personal Assistant with ChatGPT, Long-Term Memory, and RAG

After reading Situational Awareness, I had the following thought: if AI is really only a few years away from AGI, then perhaps it is close enough to create the AI assistant I have always wanted to have.

Now, you may very well argue that AGI is nowhere near 3 years away, and that’s fine. My point was to ask whether current AI technology is good enough to really create a personal assistant.

I mean, we have them, sort of.  For example, I can say to my phone or my watch “remind me at 1pm next Tuesday that I have a dentist appointment” and it will faithfully do so.  But that’s not really AI.  That’s speech recognition being fed into an app.

What I Want

I want an AI that I can talk to (typing is fine, voice would be nice) that can do the following:

  1. Manage todo lists, appointments, notes, and projects
  2. Never forget anything I tell it
  3. Do all the stuff ChatGPT can do as far as answering gen-AI type questions (“write me a python function”, “who shot Abraham Lincoln”, etc.)
  4. Interact with the web
  5. Be able to do things on a schedule
  6. Be private (and by implication of that, secure)
  7. Have a personality

Can ChatGPT Do This?

The answer is no. ChatGPT cannot do #2 (never forget), can only partly do #4 (it can read from the web but it can’t POST), can’t do #5 (there’s no cron), and only partly does #6 (it’s sort of private with Teams).

I somewhat naively attempted to do this with ChatGPT anyway. I have a private Teams account with Memory on, so I just started talking to it.

  • I gave it the personality of a bitchy, chain-smoking 1950s secretary named Shirley, with some personality hints
  • I told it my travel agenda for the next five days
  • I told it to create a number of todo lists
  • I taught it to play Ironsworn: Starforged

…and then I opened a new session and that was all gone.

OK, attempt #2 involved giving it a bunch of “remember…” directives. That worked. To a point.

To about 5,000 words of condensed info, in fact. Beyond that, you exceed what Memories can hold. Now, I didn’t really expect to be able to say “here’s 20TB of data, remember it” but I figured it was a lot more than 5,000 words.

With Custom GPTs on OpenAI, you can create bots that have a lot more info. But that’s all set up before the bot starts working. In other words, that’s ROM (read only memory), and the amount of “disk space” you get is 5,000 words. Actually, I’m not sure Custom GPTs even get Memory.

Now, I must say, before I filled up the memory, it was fantastic. We discussed projects, reorganized todos, and made lists. That travel agenda was really fun because I said “ok, I’m here, now if I drive for 8 hours, what cities do I get to. What’s around that. What if I go a different way,” etc. in a very natural conversation like Shirley was a travel agent.

It was a brief, fleeting glimpse of digital paradise.

Can I Make It Myself?

I do have access to a couple GPUs: the one on my M1 Max laptop, and a 4080 in my gaming PC.

You can’t download ChatGPT, but you can download llama-3 and various other models. The easy on-ramp to your own self-hosted AI looks like this:

  • Go to ollama.com and download the package
  • If you’re on Windows, set up WSL
  • Install ollama
  • Use ollama commands to pull down different models, which you can search for on ollama.com. The commands are docker-ish. You might enjoy the llama3, codegemma, or mistral models, but there are many.
  • Then “ollama run <model>” and you’ll be at the prompt.
  • If you want something fancier, fire up docker and run open-webui. It’ll provide a nice, ChatGPT-ish interface.
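Assuming a Linux or WSL shell, the steps above look roughly like this (the install script and the open-webui invocation follow the ollama.com and Open WebUI docs; check those for current syntax):

```shell
# Install ollama (Linux/WSL; macOS and Windows have installers on ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model, then chat with it at the prompt -- docker-ish, as noted
ollama pull llama3
ollama run llama3

# Optional: a ChatGPT-ish web UI in docker, talking to the local ollama
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```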

NetworkChuck has a nice video on this. I love NetworkChuck.

So that’s straightforward and you have a local AI. It’s not quite as good as ChatGPT (which is trained on trillions of tokens – most of these models are on the 7 or 70 billion scale). However, llama-3 (one of the most advanced freely-available models, trained by Meta) is quite capable.

There is also a technique known as Retrieval-Augmented Generation (RAG), where you hook up a vector database and populate it with however much data you’d like. Then when you talk to the AI, if you ask something that’s in the database (“what is cousin mike’s address?” or “how many jars of applesauce did we sell in Ohio last September?”) it will consult it.
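A toy sketch of the retrieval half, with a bag-of-words similarity standing in for a real embedding model and vector database (all names here are illustrative, not any particular framework’s API):

```python
from collections import Counter
import math

# Tiny stand-in "vector DB": documents stored with word-count vectors.
docs = [
    "Cousin Mike's address is 12 Elm Street, Dayton.",
    "We sold 340 jars of applesauce in Ohio last September.",
]

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().replace("'", "").split())

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = [(d, embed(d)) for d in docs]

def retrieve(question):
    # Return the stored document most similar to the question.
    q = embed(question)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

# The retrieved document gets prepended to the prompt before the LLM sees it.
context = retrieve("what is cousin mike's address?")
prompt = f"Answer using this context:\n{context}\n\nQuestion: what is cousin mike's address?"
```

Real setups swap the bag-of-words for a proper embedding model and a vector store, but the shape — embed, retrieve nearest, stuff into the prompt — is the same.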

So we’re…getting kind of close, right? Let’s compare ChatGPT and a self-hosted system, based on my requirements, which I’ve reformatted a little:

Requirement                                        ChatGPT   Self-Hosted
Manage todo lists, appointments, notes, projects     ✅          ✅
Never forget anything I tell it                      ❌          ❌
Have an extensive knowledgebase of personal info     🟡          ❌
Normal GenAI stuff                                   ✅          ✅
Interact with web                                    🟡          🟡
Be able to do things on a schedule                   ❌          ❌
Have a personality                                   ✅          ✅
Be able to access anywhere                           ✅          ❌
Available by voice                                   ✅          ❌
Private                                              🟡          ✅

Some notes:

  • When I say “manage todo lists, appointments, etc.” I mean I can talk to it like it’s my secretary. It understands these concepts. In effect, it can “roleplay” a secretary (but I won’t be chasing it around a virtual desk. Though there is the llama3-uncensored model…)
  • ChatGPT can read from the web but can’t POST. Probably to prevent it from taking over the world. I kid…or do I?
  • One could run ChatGPT on a schedule through its API, but ChatGPT itself can’t initiate things
  • ChatGPT is sort of private. You can set up private workspaces on Teams and OpenAI says it doesn’t train on that data, but regardless, it’s still your stuff in the cloud on a platform that has to be undergoing massive attacks daily.
  • By “access anywhere” ideally that’s both typing on my laptop and talking to the AI on my phone. If I got the other 9 items, I’d invest the time to learn how to write an iOS app. In the meantime, just being able to talk to it over ssh or a web app would be fabulous.

So let me say that it’s kind of cool that I could build 50% of my ideal personal assistant at home and run it on relatively modest hardware today. You’re talking to a guy who started with a Timex Sinclair 1000.

The Easier Ones

“Interact with web” is probably the easiest. I’m not sure what’s involved, but it’s just stitching some kind of web interface to the AI. Out of the box, llama3 thinks it can POST to a URL but it doesn’t actually do anything, like it’s role-playing or something. I think this is just an app thing.
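One common pattern for the “app thing” (my sketch, not anything llama3 does out of the box) is to have the app, not the model, do the web call: prompt the model to emit a structured action, then parse it and perform the real request yourself:

```python
import json

def dispatch(model_output):
    """Parse a model reply like {"action": "http_post", ...} and perform
    the real side effect.  The model only role-plays; the app acts."""
    try:
        action = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # plain chat text, nothing to do
    if isinstance(action, dict) and action.get("action") == "http_post":
        # In a real app this would be: requests.post(action["url"], json=action["body"])
        return ("POST", action["url"], action["body"])
    return None

# Assume the model was prompted: "to post to the web, reply only with JSON
# of the form {action, url, body}" -- this reply is a made-up example.
reply = '{"action": "http_post", "url": "https://example.com/todo", "body": {"task": "call dentist"}}'
```

This is essentially what “function calling” or “tools” features do in the hosted APIs; self-hosted, you wire it up yourself.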

Likewise, “available by voice” is also an app thing. I don’t know the state of FOSS speech recognition, but macOS and iOS surely have good libraries.

Scheduling kind of violates the talk/response model. LLMs don’t just sit there and think; they transform text. It may be possible to solve this through automation: tell the AI “here’s what you need to remember,” then have a job that asks it every minute if there is anything due. Again, the talk/response metaphor is a limiting factor. On its own, it’s never going to say “hey, is anything due?” so you have to ask it.
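The polling workaround can be sketched like this: keep the reminders in plain data outside the model, and let a cron-style loop (not the LLM) decide what is due. Everything below is illustrative:

```python
from datetime import datetime

# Reminders the AI was told about, stored outside the model.
reminders = [
    {"when": datetime(2024, 7, 2, 13, 0), "what": "dentist appointment", "done": False},
]

def due(now):
    """Return reminders whose time has arrived and aren't handled yet.
    A cron job would call this every minute, since the LLM won't wake itself;
    each hit would then be handed to the model (or straight to the user)."""
    hits = [r for r in reminders if not r["done"] and r["when"] <= now]
    for r in hits:
        r["done"] = True  # fire each reminder only once
    return hits
```

The model’s job shrinks to parsing “remind me at 1pm next Tuesday…” into that structured record; dumb clock-watching code does the rest.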

The Really Hard Ones

We’re down to “knowledgebase” and “perfect memory”.

There is sort of a solution to the KB. You can hook up a RAG, but from what I’ve read, FOSS solutions here are pretty weak… more or less like a glorified grep.

When I’ve loaded spreadsheets into a custom GPT on ChatGPT, it’s been able to summarize, tabulate, etc. the info from these sheets, so it is possible; I’m just not sure the FOSS solutions are there yet.

In theory the AI should be able to feed info back to the RAG. That’s the holy grail: a memory store that grows.

However, turns out this is the frontier.
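The mechanics of write-back are easy to sketch; what’s hard is doing it well (summarizing, deduplicating, deciding what to keep). A minimal, illustrative version that grows a retrieval store at chat time:

```python
memory = []  # stands in for the vector DB behind the RAG

def handle(user_text):
    """If the user says 'remember: ...', write it to the store;
    otherwise retrieve anything relevant to prepend to the prompt."""
    if user_text.lower().startswith("remember:"):
        memory.append(user_text[len("remember:"):].strip())
        return "Noted."
    # Naive retrieval: any stored fact sharing a word with the question.
    words = set(user_text.lower().split())
    hits = [m for m in memory if words & set(m.lower().split())]
    return "Context: " + "; ".join(hits) if hits else "Context: (none)"
```

A real system would have the LLM itself decide what’s worth remembering and store embeddings rather than raw strings, but this is the loop: talk, write back, retrieve next time.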

My todo list:

  1. Investigate the state of RAG technologies
  2. Explore fabric more
  3. Create test cases, from easy to what I ultimately want, and start scoring LLMs and tech solutions based on how they can handle them. Track progress to the glorious digital future.

I’m tracking it all in a Jupyter notebook hosted on PikaPods.
