Charlie Mnemonic (Good AI)

The closest thing I have seen to a digital brain that is built for individuals and is open source, so it can be owned by them. I asked the devs to look at it and they came back with the architecture overview below. We're going to continue to investigate and see if we can get a hosted version up and running (if they don't already have one) that a non-techie can use.

Will post more updates here shortly, and if anyone has experience with this service, please post thoughts below:

How to run

  1. Install PostgreSQL 14, then follow the instructions below for your OS
  2. Clone the repo
  3. Create a virtual environment with Python 3.10: python3.10 -m venv env
  4. Activate the environment: source env/bin/activate (macOS)
  5. Create a users folder in the base path
  6. Create a .env file and copy the contents of .env_sample into it
  7. Add your OpenAI API key to OPENAI_API_KEY
  8. Configure the database URL properly (see the sketch below)
  9. Run uvicorn main:app --reload
  10. Open the app in your browser; you may be prompted to enter your OpenAI API key
  11. Sign in with username admin, password admin
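For steps 6-8, a minimal .env might look like the sketch below. OPENAI_API_KEY comes from the instructions above; the connection-string variable name and its value are assumptions based on .env_sample and a default local Postgres setup, so check the sample file for the exact keys:

```
# Minimal .env sketch — placeholder values, adjust for your setup
OPENAI_API_KEY=sk-...
# Assumed variable name; check .env_sample for the exact key
DATABASE_URL=postgresql://postgres:yourpassword@localhost:5432/charlie
```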

Common error: the database name.

How to fix: check that the database name in the connection URL from step 8 matches the database you actually created in Postgres.

Architecture

The architecture is built strictly on OpenAI LLMs.

RAG pipeline = pgvector for the Postgres DB and ChromaDB.

pgvector is used to create the vector embeddings, which are then saved into a local Chroma SQLite DB.
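As a rough illustration (not the project's exact code; the model name and collection name are assumptions), the embed-and-store step of such a pipeline could look like this:

```python
# Minimal sketch of an embed-and-store step, assuming the openai and chromadb
# packages; the model and collection names are assumptions, not the repo's values.
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./memory")  # local SQLite-backed store

def store_chunk(text: str, chunk_id: str) -> None:
    # Create the embedding with an OpenAI embedding model
    emb = client.embeddings.create(model="text-embedding-3-small", input=text)
    collection = chroma.get_or_create_collection("active_brain")
    collection.add(ids=[chunk_id], documents=[text],
                   embeddings=[emb.data[0].embedding])
```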

LLM roles

  • Machine: A computer program aiming to fulfill user requests.
  • Brain: An AI brain emulation tasked with updating user-specific data based on received messages, focusing on important information while discarding irrelevant details, within a word-count limit.
  • Subject: Identifying the observed entity in a given observation.
  • Observation: Breaking down the last message into search queries to retrieve relevant messages with a vector search, following a specific format.
  • Categorise_query: Categorizing the last message into a category and a search query to retrieve relevant messages with a vector search, adhering to a specific format.
  • Categorise: Categorizing the last message into predefined categories, each on a separate line.
  • Retriever: Breaking down the last message into multiple search queries to retrieve relevant messages with vector searches, following a specific format.
  • Notetaker: An advanced note and task processing assistant, determining whether to add, update, delete, or skip tasks based on the last message and existing notes, following a specific format.
  • Summary_memory: Summarizing the current notes while maintaining essential details, excluding imperative instructions.
  • Summarize: Summarizing the current message while maintaining all details.
  • Date-extractor: Extracting the target date for the last message, providing it in specific date formats or 'none'.
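One way to picture how these roles get wired in (a sketch, not the actual get_role_content implementation; the prompt strings are paraphrased from the list above):

```python
# Sketch of role-to-prompt dispatch; the prompt strings are paraphrased from
# the role list above, not copied from the repository.
ROLE_PROMPTS = {
    "observation": "Break the last message into search queries for a vector search.",
    "categorise_query": "Return a category and a search query for the last message.",
    "notetaker": "Decide whether to add, update, delete, or skip tasks.",
    "date-extractor": "Extract the target date for the last message, or 'none'.",
}

def get_role_prompt(role: str) -> str:
    """Return the system prompt for a role (stand-in for get_role_content)."""
    return ROLE_PROMPTS[role]
```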

When a user sends a message, the most recent messages stored in the database are retrieved and formatted into the (role: assistant, content: "") format, and the new message is appended. The message is then passed to the LLM, and the Observation role is used to retrieve data from memory. The resulting observations are added to the instruction string. If an image prompt is provided, it is appended to the message. The LLM decides the new role from the prompt, a full response is generated by the LLM, and the response is also processed to update the notes and the database memory.
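In OpenAI chat-completion terms, that flow is roughly the following sketch (the model name and system text are placeholders, not the project's actual prompt):

```python
# Rough sketch of the request assembly described above (not the project's code).
from openai import OpenAI

client = OpenAI()

def respond(history: list[dict], new_message: str, observations: str) -> str:
    # history holds prior turns as {"role": ..., "content": ...} dicts
    messages = [{"role": "system", "content": "Instructions...\n" + observations}]
    messages += history
    messages.append({"role": "user", "content": new_message})
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content
```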

Deep Dive

When a user sends a message, it is passed to the process_message function in the utils.py file. The user settings are retrieved from the load_settings function, and the previous messages through the get_recent_messages function. That function in turn calls get_most_recent_messages, which calls get_memories to retrieve the "active brain" data from the collection; get_memories sorts and filters the previous conversation and returns the last 100 entries, which get_most_recent_messages sorts further and get_recent_messages restructures into a variable called last_messages_string. last_messages_string is then further formatted into a (user: message, assistant: message) string and concatenated with the new message sent by the user (all_messages). all_messages and the new message, along with some configuration data, are then passed into the process_active_brain function.
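The final formatting step can be pictured like this (an illustrative reconstruction; the real logic lives in utils.py):

```python
# Illustrative reconstruction of the (user: message, assistant: message)
# formatting described above; names mirror the text, not the repo.
def format_last_messages(rows: list[dict]) -> str:
    return "\n".join(f"{row['role']}: {row['content']}" for row in rows)

last_messages_string = format_last_messages([
    {"role": "user", "content": "Remind me to buy milk."},
    {"role": "assistant", "content": "Added 'buy milk' to your list."},
])
all_messages = last_messages_string + "\nuser: What's on my list?"
```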

In the process_active_brain method, the new message is split into chunks by the split_text_into_chunks function and appended to the active_brain collection via the create_memory method, which calls the create_memory function in agentmemory/main.py. all_messages is then passed to the LLM with the Retriever role. The response is passed to the process_observation function to split the text into a list. The list is looped through, each item is passed to the search_memory function, and the results are stored in the corresponding lists in a process_dict dictionary. The method sorts each results list by distance and removes the result with the highest distance if the list contains more than one result. It then generates the final result string and checks whether it exceeds the remaining tokens; if it does, it removes the result with the highest distance until the token count is within the limit. It also maintains a set of unique results. Finally, the method returns the result string, the token count, and the unique results.
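The trim-until-it-fits behavior can be sketched like this (the token counter here is a crude stand-in for whatever tokenizer the project actually uses):

```python
# Sketch of the trim-by-distance loop described above; a crude word count
# stands in for a real tokenizer to keep the example self-contained.
def count_tokens(text: str) -> int:
    return len(text.split())

def trim_results(results: list[dict], token_limit: int) -> str:
    # results: [{"document": str, "distance": float}, ...]; lower distance = closer
    results = sorted(results, key=lambda r: r["distance"])
    while results and count_tokens("\n".join(r["document"] for r in results)) > token_limit:
        results.pop()  # drop the farthest (highest-distance) result
    return "\n".join(r["document"] for r in results)
```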

In summary, process_active_brain is responsible for processing the active brain and generating a response based on the user's input. It handles memory creation, search queries, and token management to ensure the response stays within the specified token limit.

In process_message, the new message and all messages are also passed to the process_episodic_memory method. This method passes the messages to the LLM with the Date-extractor role, and the extracted date is passed to search_memory_by_date, which searches the collection table and filters by that date. The result from the database and the response from the Date-extractor LLM call are then returned. all_messages and new_message are also passed to the process_incoming_memory function in the memory.py file. In this function, the message is passed to the OpenAI model with the categorise_query prompt, which is determined by the get_role_content function (datetime.datetime.now() is also passed into the prompt with this function). The LLM then decides whether the category is one of the following:

  • Factual Information
  • Personal Information
  • Procedural Knowledge
  • Conceptual Knowledge
  • Meta-knowledge
  • Temporal Information

The model response is then parsed by the process_category_query function, which returns a set of (category, message) pairs.

Example 1 response = [('procedural_knowledge', 'calculate BMI')] (query = "Can you calculate my BMI?")

The response is then looped through and reformatted into key/value form, with the input as the first key/value pair, and the category and the model's responses as the second.

Example 2 = {'input': 'can you calculate my BMI?', 'procedural_knowledge': {'query_results': {'calculate BMI': }}}

This is then passed into the search_query function, which returns the full search result (plain text) and a processed list of sets.
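Putting Examples 1 and 2 together, the parse-and-reshape step behaves roughly like the sketch below (a reconstruction from the examples, not the repo's code; None stands in for the query_results value that gets filled in later):

```python
# Reconstruction of the parse-and-reshape step from Examples 1 and 2 above.
def process_category_query(llm_response: str) -> list[tuple[str, str]]:
    # e.g. "Procedural Knowledge: calculate BMI" -> ("procedural_knowledge", "calculate BMI")
    pairs = []
    for line in llm_response.strip().splitlines():
        category, _, query = line.partition(":")
        pairs.append((category.strip().lower().replace(" ", "_"), query.strip()))
    return pairs

pairs = process_category_query("Procedural Knowledge: calculate BMI")
payload = {"input": "can you calculate my BMI?"}
for category, query in pairs:
    payload[category] = {"query_results": {query: None}}  # None: results filled in later
# payload == {'input': 'can you calculate my BMI?',
#             'procedural_knowledge': {'query_results': {'calculate BMI': None}}}
```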

In the search_query function, the query (the query_results value in Example 2) is converted to a list and looped through, and the category and query are passed into the search_memory function in the memory.py file, which in turn calls the search_memory function in the agentmemory/main.py file.

In the search_memory function, the category collection is retrieved from (or created in) ChromaDB, and the number of results is capped to prevent searching for more elements than are available. The types to include in the result are retrieved from get_include_types. If filter_metadata is provided and contains more than one key-value pair, each pair is mapped to an object shaped like { "key": { "$eq": "value" } }, and these objects are combined into a single object with the $and operator. If novel is set to True, a filter for memories marked as novel is added to the filter_metadata object. The function then performs a query using the collection's query method, providing the query texts, filters, and other parameters. The query result is then passed to chroma_collection_to_list to pull the relevant data from the metadata, documents, embeddings, and distances tables.
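In ChromaDB terms, the filter construction and query look like this sketch (the metadata keys and collection name are illustrative, not the repo's values):

```python
# Sketch of the $and/$eq filter construction and query described above,
# using the real ChromaDB query API; metadata keys are illustrative.
import chromadb

chroma = chromadb.PersistentClient(path="./memory")
collection = chroma.get_or_create_collection("procedural_knowledge")

filter_metadata = {"username": "admin", "novel": "True"}
if len(filter_metadata) > 1:
    # combine each {key: {"$eq": value}} clause under a single $and
    where = {"$and": [{k: {"$eq": v}} for k, v in filter_metadata.items()]}
else:
    where = filter_metadata

results = collection.query(
    query_texts=["calculate BMI"],
    n_results=min(5, max(collection.count(), 1)),  # don't ask for more than exists
    where=where,
)
```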

process_incoming_memory also returns some data and some unique data, just like process_active_brain. The unique results from both functions are then merged and formatted to retrieve the closest conversations based on similarity score.
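A minimal sketch of that merge step, assuming each result carries a document and a distance (lower distance = more similar):

```python
# Sketch of merging the unique results from both functions and keeping the
# closest conversations by distance (lower distance = more similar).
def merge_closest(a: list[dict], b: list[dict], top_k: int = 5) -> list[dict]:
    seen, merged = set(), []
    for r in sorted(a + b, key=lambda r: r["distance"]):
        if r["document"] not in seen:  # de-duplicate across the two sources
            seen.add(r["document"])
            merged.append(r)
    return merged[:top_k]
```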

all_messages and new_message are also passed into the note_taking method. This method gets the contents of the notes file and passes them to the LLM with the Summary_memory role. The user message is also passed to the LLM with the Summarize role, and then the current time, the Summary_memory response, the content of the notes, and the Summarize response are concatenated and assigned to a variable called final_message. final_message is then passed to the LLM with the Notetaker role. The response is passed to the process_note_taking_query function to extract the action to be performed (create, read, update, delete), and any action specified is performed on the notes.
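The action extraction might look like the sketch below (the "action: payload" line format is a hypothetical stand-in for the Notetaker prompt's "specific format"):

```python
# Hypothetical sketch of extracting note actions from the Notetaker response;
# the "action: payload" line format is an assumption, not the repo's format.
def process_note_taking_query(response: str) -> list[tuple[str, str]]:
    actions = []
    for line in response.strip().splitlines():
        verb, _, payload = line.partition(":")
        verb = verb.strip().lower()
        if verb in {"create", "read", "update", "delete", "skip", "add"}:
            actions.append((verb, payload.strip()))
    return actions

# e.g. "add: buy milk" -> [("add", "buy milk")]
```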

Finally, the messages and history_messages are passed to the generate_response function, which returns the response from the LLM using the system_prompt.

The content of the response is then extracted and passed to the process_response function, which updates the active brain memory through process_incoming_memory_assistant function.

The response from the process_response function is then returned to the user.

Yesterday I spoke with Aqeel, a developer who is helping me out, and tried to get this installed on my laptop. We were not successful, and I think that call was a good example of why a community like this is needed: it's currently way too complicated for non-developers like me to use these systems.

I would still like to test this out however so I asked Aqeel to set it up as a hosted service.

@MudassirAqeel I got logged in to the hosted service. It's working but running pretty slow, especially in comparison to ChatGPT. Is this something we have to live with on this service, or are there ways to speed it up? Thanks!

Hey Dave, the reason it's slower than typical ChatGPT is that it does multiple steps sequentially before generating the final answer. It can be optimized, but it will always be slower than a typical chat application like ChatGPT.

Ok thanks Aqeel. I’m going to play with it today and see how good the memory is and will post back here with thoughts. Dave


@MudassirAqeel ok, it's hanging up on me now. I have to log out and log back in to get the next message to appear; otherwise I just get the spinner. Can you take a look when you have a chance please? Thanks!

Alright let me look and I’ll get back to you


It should be working now, had some troubles with it.


Thanks @MudassirAqeel working for me now. I’ve been playing with it some more and here are some additional thoughts:

  1. The accuracy is good. I’ve been adding and removing things from a to do list and it remembers the list accurately and adds and removes things even when I ask it to do so using a variety of different languages.

  2. It takes 20 to 30 seconds to give me a response which feels like an eternity in comparison to the company owned digital brains like ChatGPT.

  3. I am able to use it on my iPhone, but the mobile responsiveness is not good, and that's where I primarily use it and where I think most others will as well. I think this is especially going to be a problem when there are lots of chats running on this system.

Do they have any solutions to #2 and #3 currently? If not I think we would need to work on helping them solve these before this is a viable option.

Also, I think we should have a standardized list of the things we look at when evaluating personally owned digital brain systems. I've added that to our to-do list and will work on getting it together for review.

Thanks
Dave

Well @davewaring, it's great that the solution is accurate. I don't think they're tackling either of the two problems we listed: I looked at their GitHub issues and neither is mentioned. I also found that there are only two contributors (devs) to the Charlie project.

I'm thinking of finding out the individual time each component takes that adds up to the 20-30 seconds.

Also, benchmarking is a really good idea; we can track some typical metrics as well, like recall, precision, hallucinations, etc.

I'm gonna try some runs myself over the weekend for a use case I have in mind on this app we have hosted. I'll create a different user and hope it doesn't mingle with yours.

Mudassir
