What Are Smallville and AI Town?
"Smallville" and "AI Town" are computer simulations featuring non-player characters (NPCs) that autonomously engage in human-like social interactions. Smallville was first introduced in a research paper from Stanford University and Google. Inspired by this paper, the A16Z infra team developed AI Town.
The most remarkable aspect of these simulations is their level of autonomy, which far surpasses that of conventional computer-controlled NPCs. These fully autonomous NPCs can plan, act, and reflect based on predefined identities and overarching goals, all powered by a large language model (e.g., GPT-3.5).
Why Are They Interesting?
These simulations matter not only for the gaming industry but also for large-scale research experiments that analyze human behavior. The ability to customize characters and design environments to your liking has considerable potential. It’s akin to being a god in a virtual realm, where you shape inhabitants and landscapes and watch events and interactions unfold in response to your creative decisions.
Let's Compare Them
We will compare the environmental setup and generative agent architecture, as well as explore customization options of these simulations. The simulation architecture, depicted in the diagram below, is designed with human-inspired intelligence to inform character behavior.
Our discussion is organized into sections, including:
- Perception and Memory
- Planning and Reflection
Image Source: Smallville paper (https://arxiv.org/abs/2304.03442)
Before diving into the details, here is a simplified overview of their differences.
| Aspect | Smallville | AI Town |
|---|---|---|
| 🌳 Environment | More realistic, like a human world | More like a chat room setup |
| 🧐 Perception | Most new events nearby | Chat history, character identity |
| 🧠 Memory Retrieval | Based on importance, relevance, and recency | Same as Smallville |
| 📝 Planning | Detailed hourly plans | Pre-programmed goal for simple plans |
| 🤔 Reflection | Focal-point-based reflection | Through a simple predefined prompt |
| 🤼🏼 Action | Many: party, coffee, cooking, etc. | Walk, talk, idle |
| ⚙️ Customization | Character identity | Character identity, map |
The agent’s actions and observations are strongly influenced by the environments in which they "live". In both simulations, agents follow the shortest calculated path to reach their target. The key difference between the simulation environments lies in the structure and the agent's understanding of their surroundings.
In Smallville, as in the human world, agents can discern environmental details such as houses, bedrooms, and grocery stores. In contrast, AI Town resembles a chat room, where agents cannot perceive specific details of the environment.
Interaction with Environment
In Smallville, agents not only recognize their environments but can also alter the state of objects they interact with. For example, if an agent brews a cup of coffee, the state of the coffee machine changes from ‘off’ to ‘on’.
In AI Town, this feature does not apply.
Agent Path Calculation Algorithm
Smallville uses a modified breadth-first search (BFS) algorithm with multiple targets. It finds the shortest path from an agent’s current location to the nearest of the four coordinates neighboring the endpoint.
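To make the multi-target idea concrete, here is a minimal sketch of a BFS that stops at the first walkable tile adjacent to the endpoint. It is not the simulation's actual code: the grid encoding (0 = walkable, 1 = blocked) and the names `is_walkable` and `bfs_to_neighbor` are assumptions made for illustration.

```python
from collections import deque

# Four-directional movement on a tile grid (up, down, left, right).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def is_walkable(grid, cell):
    """Hypothetical helper: a cell is walkable if it is in bounds and equals 0."""
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0


def bfs_to_neighbor(grid, start, end):
    """Return the shortest path from start to any walkable tile next to end, or None."""
    # The targets are the (up to four) walkable neighbors of the endpoint.
    targets = set()
    for dr, dc in MOVES:
        cell = (end[0] + dr, end[1] + dc)
        if is_walkable(grid, cell):
            targets.add(cell)
    if not targets:
        return None

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell in targets:
            # BFS expands in order of distance, so the first target reached is nearest.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if is_walkable(grid, nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Because every step costs the same, plain BFS is enough to guarantee the shortest path here; no heuristic or priority queue is needed.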
AI Town uses a modified A* search algorithm to generate the shortest path between an agent and its target destination. The algorithm combines A* search with custom endgame logic and player positioning management, and uses a priority queue to explore the most promising paths first.
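For comparison, here is a minimal sketch of textbook A* on a tile grid using a priority queue. It deliberately leaves out the custom endgame logic and player positioning management mentioned above, and the grid encoding (0 = walkable, 1 = blocked), the Manhattan heuristic, and the name `a_star` are assumptions for illustration only.

```python
import heapq


def a_star(grid, start, goal):
    """Return the shortest 4-directional path from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Heap entries are (estimated total cost, cost so far, cell).
    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    parent = {start: None}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if not in_bounds or grid[nxt[0]][nxt[1]] != 0:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

The priority queue is what lets A* expand cells that look closest to the goal first, instead of expanding evenly in all directions as BFS does.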
🧐🧠Perception and Memory
In both simulations, agents create memories based on their daily experiences. Each memory includes a memory description, creation timestamp, importance score, last access time, and memory Id. The key difference between the two simulations is where memories are stored and the types of memories the agents choose to remember.
In Smallville, as agents explore the environment, they build environment trees by storing area and object descriptions in interconnected nodes. These trees help agents generate precise string addresses for executing actions in a specific location. The environment trees are also updated each time an agent enters an area, so they reflect the most up-to-date environment state.
In AI Town, this feature does not apply.
"Vision" and "Text" Perception
In Smallville, what an agent can observe depends on three predefined values: vision radius, attention bandwidth, and retention. Agents use their vision radius to detect nearby objects or events, and their attention bandwidth limits the number of new observations they make. The retention value determines whether an event counts as new or as already perceived.
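Here is a rough sketch of how these three values could interact. It is an illustration under assumptions, not the simulation's code: the function name `perceive`, the event format, and the default values are all hypothetical.

```python
import math


def perceive(agent_pos, events, already_perceived,
             vision_radius=4, attention_bandwidth=3, retention=5):
    """Return descriptions of events the agent newly observes.

    events: list of (description, (x, y)) tuples.
    already_perceived: descriptions the agent has perceived, oldest first.
    All default values are illustrative assumptions.
    """
    # 1. Vision radius: keep only events close enough to be seen.
    nearby = []
    for description, position in events:
        distance = math.dist(agent_pos, position)
        if distance <= vision_radius:
            nearby.append((distance, description))

    # 2. Attention bandwidth: attend to the closest few events only.
    nearby.sort()
    attended = [description for _, description in nearby[:attention_bandwidth]]

    # 3. Retention: skip events that match one of the most recently perceived ones.
    recent = set(already_perceived[-retention:])
    return [description for description in attended if description not in recent]
```

For example, an agent standing at `(0, 0)` with a bandwidth of 3 would ignore a fourth in-range event, and would also skip any attended event it had perceived within its retention window.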
In AI Town, agents only make observations about conversation histories, their own identities, plans, reflections, and relationships with others. They form memories of chat histories at the end of conversations and generate new memories after self-reflection.
In Smallville, agent memories are stored locally in the memory stream, a comprehensive record of all the agent’s past experiences that can influence their future actions.
In AI Town, agent memories are stored in an external database called Convex.
Memory retrieval plays a significant role in how agents emulate human behavior in both simulations. Memories serve several purposes, such as contextualizing conversations, facilitating reflections, and aiding in planning. In both simulations, memories are ranked and retrieved based on a final retrieval score, which combines the importance, relevance, and recency scores of each memory. The implementation details of the memory retrieval procedure in Smallville and AI Town are outlined below.
In Smallville, memory retrieval works as follows:
- Retrieve all of the agent's memories (thoughts and events), sorted by creation time.
- Determine the recency, importance, and relevance scores.
  - Recency Score: calculated using an exponential decay function over the number of game hours since the memory was last retrieved, so more recent memories receive a higher score. Decay factor: 0.995.
  - Importance Score: assigned when the memory is created by prompting GPT to generate a score between 1 and 10 for each memory object. More important memories receive a higher score.
  - Relevance Score: the cosine similarity between the embedding vector of each memory and the embedding vector of the current situation. This determines how relevant a memory is to the current situation.
- Normalize the individual scores to between 0 and 1 and combine them into the final retrieval score.
- Return the top-ranked memories for each focal point.
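The scoring steps above, with the 0.995 decay factor, can be sketched roughly as follows. This is a simplified illustration, not the project's code: the memory dictionary fields, the min-max normalization, and the equal weighting of the three scores are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(values):
    """Min-max scale values into [0, 1]; assumed normalization scheme."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5] * len(values)
    return [(v - low) / (high - low) for v in values]


def retrieve(memories, query_embedding, now_hours, decay=0.995, top_k=3):
    """Rank memories by recency + importance + relevance and return the top_k.

    Each memory is a dict with hypothetical keys:
    "last_accessed" (game hours), "importance" (1-10), "embedding" (list of floats).
    """
    if not memories:
        return []
    # Exponential decay over game hours since the memory was last retrieved.
    recency = [decay ** (now_hours - m["last_accessed"]) for m in memories]
    importance = [m["importance"] for m in memories]
    relevance = [cosine_similarity(m["embedding"], query_embedding) for m in memories]

    # Equal weighting of the three normalized scores is an assumption.
    scores = [
        r + i + v
        for r, i, v in zip(normalize(recency), normalize(importance), normalize(relevance))
    ]
    ranked = sorted(zip(scores, range(len(memories))), reverse=True)
    return [memories[index] for _, index in ranked[:top_k]]
```

Normalizing before combining matters because the raw scores live on different scales: importance ranges from 1 to 10, while recency and cosine similarity stay at or below 1.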
In AI Town, memory retrieval works as follows:
- Retrieve all memories associated with the agent using memory embeddings.
- Determine the recency, importance, and relevance scores.
  - Recency Score: calculated using an exponential decay function over the number of game hours since the memory was last retrieved. More recent memories receive a higher score. Decay factor: 0.99.
  - Importance Score: GPT is prompted to rate, on a scale of 0 to 9, how important the memory is to the agent. A default value of 5 is assigned if no valid response is given.
  - Relevance Score: calculated using a built-in Convex vector similarity search function, which assigns each memory a relevance score between -1 and 1. A score of 1 indicates the most similar, while -1 indicates the least similar.
- Normalize the individual scores to between 0 and 1 and combine them into the final retrieval score.
- Return the top-ranked memories and, if they were not recently accessed, mark them with the most recent access time.
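The importance step with a fallback value, as described above for the 0-to-9 scale, might look something like this. The function name `parse_importance` and the parsing rule (take the first standalone single digit in the reply) are assumptions for illustration, not the project's actual parsing logic.

```python
import re


def parse_importance(response, default=5):
    """Extract a 0-9 importance score from an LLM reply, or fall back to default."""
    # \b[0-9]\b matches a single digit that is not part of a longer number.
    match = re.search(r"\b[0-9]\b", response)
    if match is None:
        return default
    return int(match.group())
```

A fallback like this keeps the retrieval pipeline running even when the model replies with something other than a number.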
📝🤔Planning and Reflection
Agents in Smallville and AI Town both rely on predefined long-term goals to guide their actions, but they differ significantly in how they plan each day.
In Smallville, agents autonomously create highly detailed daily plans, scheduling tasks in intervals as short as 5-15 minutes. The plans are flexible and are updated based on the agent’s reactions to new events.
AI Town agents, on the other hand, strictly follow a predefined broad plan, with the game engine dictating when they start a new task. To appreciate the differences, let’s dive into how Smallville and AI Town agents plan out their day.
Daily Long Term Plan Generation Procedure
Smallville agents generate their daily plan as follows:
- Based on the agent’s predefined lifestyle, determine their wake-up hour.
- Determine whether it is a “New Day” or a “First Day” and generate daily requirements as needed.
  - “New Day” means a brand-new day in an already ongoing simulation. A new long-term daily plan is created based on the agent’s memories of their schedule from the previous day. In this case, the agent already has a list of daily requirements that only needs to be updated.
  - “First Day” means the agent has no previous set of daily requirements, so GPT is prompted to generate a broad schedule of tasks from scratch, which is saved as the agent’s list of daily requirements.
- Generate the hourly schedule by recursively decomposing the broad-strokes plan from the daily requirements list into hour-specific tasks or activities, with assistance from GPT.
- Decompose the hourly schedule further into the specific actions or subtasks that the agent will execute during the day. Tasks are usually 5-15 minutes long, and GPT determines the specific actions for each task.
- Record the agent’s detailed daily plan in their memory for reference throughout the day.
Agents in AI Town operate without detailed plans, instead relying on the game engine for scheduling. The scheduling encompasses two main activities: engaging in conversations with other agents and reflecting on their past memories.
The game engine checks for potential collisions between agents by consulting a journal-structured event log. This determines whether agents are grouped for future conversations or designated for solo self-reflection. The engine systematically wakes agents and schedules new tasks only for those that have completed their previous task, which ensures tasks are completed sequentially.
Reacting and Updating Plans
In Smallville, agents modify their plans upon completing an action or perceiving an event. The type of reaction to an event is determined through GPT prompts. For example, an agent that observes another agent’s actions might initiate a conversation. This unplanned conversation leads to a revision of the agent’s plan, which is regenerated from the time of the conversation onwards.
In AI Town, agents do not react to events or update their long-term goal. However, the game engine reschedules agents for a new task once they complete their previous one. This involves regrouping agents for conversations and reflections and calculating new agent paths.
In both simulations, agents use reflections to gain higher-level insights by drawing upon their past memories. A reflection is a synthesized thought based on specific past memories. The primary distinction between the two simulations is their method of generating reflections: Smallville relies on focal points, while AI Town employs a predefined GPT prompt.
- In Smallville, a reflection is triggered when the sum of the importance scores of the agent’s latest memories exceeds 150, which typically happens two to three times per day.
- To initiate the reflection process, the 100 most recent memories are retrieved and GPT is prompted to generate the “3 most salient high-level questions” about them.
- These three questions serve as focal points to retrieve a subset of relevant memories for reflection generation.
- For each question, GPT is prompted to generate “5 high-level insights” about the subset of relevant memories.
- The generated insights are saved as reflections in the memory stream with pointers to the supporting memory objects.
In Smallville, as the agent generates more and more reflections, a reflection tree is created. Along with reflections on past memories, reflections on conversations are also made to inform an agent’s actions in future plans and build relationships between agents.
- In AI Town, the agent’s 100 most recent memories are retrieved using Convex's built-in query function.
- A reflection is triggered if the sum of the importance scores for these memories exceeds 500.
- A predefined reflection prompt instructs GPT to extract three high-level insights from the memory descriptions. The prompt includes all 100 memory descriptions and their memory Ids.
- Each insight is saved as a new reflection to the Convex database. Each reflection memory contains the playerId, insight description, memory data type (‘reflection’) and memory Ids of supporting memories.
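Both reflection triggers described above boil down to comparing a sum of importance scores against a threshold. Here is a minimal sketch of that check; the function and constant names are assumptions for illustration, while the threshold values (150 and 500) come from the descriptions above.

```python
# Threshold values as described in the text; the names are illustrative.
SMALLVILLE_THRESHOLD = 150
AI_TOWN_THRESHOLD = 500


def should_reflect(importance_scores, threshold):
    """Return True when the summed importance of recent memories exceeds threshold."""
    return sum(importance_scores) > threshold
```

For instance, twenty recent memories with an importance of 8 each sum to 160, which would trigger a reflection under the 150 threshold but not under the 500 threshold.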
In both simulations, agents exhibit emergent social behaviors such as having conversations and forming relationships. However, the two differ notably in the range of actions available to an agent. AI Town restricts agents to four actions: walking, talking, reflecting, and remaining idle. Smallville offers a more diverse set of actions that more closely resembles human behavior, including sleeping, cooking, decorating for a party, having conversations, and even writing a book. Although having conversations is common to both simulations, the implementation details differ slightly. Let’s dive into the specifics of how conversations are implemented in each.
Having conversations is only a fraction of an agent’s responsibilities in Smallville, but it is still crucial to relationship development between agents. Conversations in Smallville are less predictable than in AI Town and often occur at the agents’ discretion, except in specific predefined scenarios, such as when an agent is sleeping.
- The conversation is initiated by the speaker that approaches the conversation first.
- Conversations are guided by the agents’ relationship to each other, location, current context, past memories, agent traits, and chat history. LLM prompts are sent to GPT to generate messages.
- Conversation ends depending on which condition is met first: an agent chooses to end the conversation or 16 total messages have been exchanged throughout the entire conversation.
- To manage the frequency and duration of interactions, the implementation includes a chat buffer time and predefined conversation lengths. The chat buffer time is set to 800 after two agents engage in conversation. This value decreases over the course of the game, and once it reaches 0, the same agents can choose to converse again.
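The chat buffer described above behaves like a per-partner cooldown counter. Here is a minimal sketch of that idea. The class and method names are hypothetical, and decrementing the value by one per game step is an assumption, since the text does not state the unit.

```python
class ChatBuffer:
    """Tracks, per conversation partner, how long until the agent may chat again."""

    def __init__(self, cooldown=800):
        self.cooldown = cooldown  # starting buffer value, as described in the text
        self.remaining = {}       # partner name -> remaining buffer value

    def start_cooldown(self, partner):
        """Call when a conversation with partner has just taken place."""
        self.remaining[partner] = self.cooldown

    def tick(self, steps=1):
        """Advance the game by a number of steps (assumed unit), never going below 0."""
        self.remaining = {
            partner: max(0, value - steps)
            for partner, value in self.remaining.items()
        }

    def can_chat_with(self, partner):
        """An agent may converse again once the buffer for that partner reaches 0."""
        return self.remaining.get(partner, 0) == 0
```

A typical flow would be to call `start_cooldown` when a conversation ends, `tick` on every game step, and `can_chat_with` before an agent decides whether to start talking.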
In AI Town, having conversations is a major part of an agent’s role because the primary objective of the simulation is to observe the agents socializing. Conversations are highly predictable, since the game engine designates agents for conversation at the beginning of the simulation. Memories relevant to the current conversation are filtered beforehand.
- A collision between two agents will result in a conversation. Conversations are initiated by a chosen lead speaker (first name in grouped agents list).
- Conversation flow is guided by the agents' relationship to each other, the time of their last conversation, and the memories relevant to the current conversation. An LLM prompt with these details is sent to GPT to generate messages.
- The conversation ends when an agent decides to walk away from the conversation.
- To manage conversation frequency, an ignore list is used to prevent repetitive interactions. The ignore list is continually updated with the names of agents that an agent has already talked to.
- A memory embedding will be created by both agents to inform future conversations.
Both Smallville and AI Town simulations provide the option to customize agents, but they differ significantly in complexity.
In AI Town, you have the flexibility to customize the agent’s identity, plan, relationship, and starting position in the simulation.
Smallville, on the other hand, provides a more comprehensive and intricate level of customization than AI Town. In addition to what is available in AI Town, this includes components such as vision radius, attention bandwidth, lifestyle, and more.
In Smallville, further customizations that make agents more human-like are possible, in addition to those mentioned for AI Town.
In AI Town, you can customize an agent's identity, plan, relationship, and initial position on the game map.
In Smallville, map customization is unclear and may require adjustments to the source code.
In AI Town, map customization is straightforward using the Tiled software. It lets you design a map, export it as a CSV file, and convert it into a 2D array when integrating it into your code.
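As a rough idea of the CSV-to-2D-array step, here is a minimal sketch using only the Python standard library. The function name `load_tile_layer` and the assumption that every cell holds an integer tile ID are illustrative; how the resulting array is wired into the project is not covered here.

```python
import csv
import io


def load_tile_layer(csv_text):
    """Convert CSV text of integer tile IDs into a 2D list (rows of ints)."""
    reader = csv.reader(io.StringIO(csv_text))
    layer = []
    for row in reader:
        # Skip empty cells, e.g. those produced by a trailing comma.
        cells = [int(cell) for cell in row if cell.strip() != ""]
        if cells:
            layer.append(cells)
    return layer
```

To load a map from disk, you could read the exported file into a string first and pass its contents to `load_tile_layer`.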
Smallville's agents impress with their intricate, human-like complexity, which makes the simulation a compelling choice for users experienced with generative agents. It offers valuable insight into how effectively AI can emulate human behavior. However, customization can be challenging, particularly for beginners unfamiliar with the intricacies of generative agents.
AI Town offers an entirely different experience. It functions more like a game in which you can eavesdrop on character conversations. AI Town is notable for its user-friendly customization, which makes it accessible to users of all skill levels. The limited set of actions that agents can perform makes them feel less human-like, but the ease of customizing agents and the game environment keeps it an appealing choice.