A Slack Bot That Extracts Topic Entities and Saves Them to a Google Sheet

Objective

Learn how to build a Slack bot that automatically:

extracts key topic entities using open source large language model llama-2-70b-hf from Meta fine tuned by Anyscale.
saves the entity data to a Google Sheet using QueryStar's google_sheets.add_row().

Usefulness: ⭐⭐⭐⭐⭐ | Difficulty Level: ⭐

note

Source code: https://github.com/modelstar-labs/querystar-demo/tree/main/slack_gsheet_parser

About the Slack Bot

The Use Case and Idea

I am in a Slack channel where over a thousand senior technologists discuss different tech tools and stacks. All the questions and answers arise from their years of development experience. One example of such conversations:

Q: We're on the hunt for some AI Code Assistant tools like Copilot, TabNine, Codeium, etc. Anybody out there got some hands-on experience with these? We're especially interested in telemetry and analytics, you know, to see how well these tools are doing and if they're making our coding lives better. Thx! 🚀

A: We've got "xxxxxx" in action, and I chatted up our engineers to get the lowdown. Here's the scoop:
xx% of the crew keeps it fired up in their IDE 24/7.
About xx% use it daily.
The remaining xx% hit it up whenever they feel like it.
As for how it performs, the consensus is that the code suggestions are usually solid when they hit the mark.

It's valuable to analyze the conversations and save the results in an accessible place. An interesting perspective is to understand what popular tools and services often get their attention.

So we designed this bot, which can extract:

Tech products/services (e.g., AWS EC2, FastAPI) are discussed in a message, and
The product category, e.g., "cloud service", "business intelligence", etc.

Also, the bot should extract these entities on every message, and save the results to a Google Spreadsheet.

Module Design

AI (LLM) Function

After a new message is posted, this function should return: json { "product": ["<product1>", "<product2>"], "category": "<product category>" }

Trigger - Action:
- The bot should respond to messages that are sent to a designated channel
- The bot should extract Product Entities and their Category from the trigger
- The bot should save the results in a Google Sheet.
- Wait for future trigger events.

Tech Stack

We use OpenAI to build the LLM function, and Querystar for the bot. It only takes 3 API calls:

OpenAI's ChatCompletion API
QueryStar's triggers.slack.new_message()
QueryStar's actions.google_sheets.add_row()

The LLM Function

Prompt to `Llama-2-70b-chat-hf`

Llama-2-70b-hf is developed by Meta and released to public recently. It contains 70 billion parameters. To optimize the model for chat, Anyscale made a fine-tuned version, called Llama-2-70b-chat-hf. You can use this chat model from Anyscale's endpoint: https://app.endpoints.anyscale.com/

note

The cost of using Anyscale-hosted Llama-2-70b-chat-hf is $1 per 1 million tokens, which is 30% cheaper than that of GPT-3.5-turbo, and 97% cheaper than GPT-4 from OpenAI. The price data are referenced as is on Sept. 18, 2023.

The key capability of the function is to process a user message, and extract products and their category. This system prompt works quite well for this task.

You're a technologist. Your goal is to find what technical products are discussed in the user message, and what their category is. Output a dict:
{"product": ["product1", "product2"], "category": "product category"}
If no product can be found, return an empty dict: {}

Let's see how the prompt performs on the new Llama2 model.

Making LLM Function with Anyscale API

Anyscale provides an endpoint to run Llama-2 query, let's define the function.

Fist, create a .py file called app.py. We need Python's requests libraryto call the endpoint, and necessary token and the API url.

import os, requests, json

s = requests.Session()

ANYSCALE_TOKEN = os.getenv('ANYSCALE_TOKEN')    # add your own Anyscale token
url = "https://api.endpoints.anyscale.com/v1/chat/completions"

For the function definition, we start off by building the prompt, then call the API endpoint to get Llama-2-70b-chat-hf's response. Llama2 is supposed to respond with a JSON string, so we can parse the string with a simple parser. However, Llama sometimes adds extra characters in a response, which fails our JSON parser. There are many tricks you can do to enforce JSON output format. But for the purpose of this tutorial, let's do a quick workaround: when the parser fails, we record the failure in product['valid LLM answer']and save it to the Google Sheet as well.

def extract_product(message: str) -> dict:
    prompt = [
        {
            "role": "system",
            "content": ("You're a technologist. "
                        "Your goal is to find what technical products are discussed "
                        "in the user message, and what their category is. "
                        "Output a dict: \n"
                        "{\"product\": [\"product1\", \"product2\"],"
                        "  \"category\": \"product category\"}\n\n"
                        "If no product can be found, return an empty dict: {}")
        },
        {
            "role": "user",
            "content": message
        },
    ]
    body = {
        "model": "meta-llama/Llama-2-70b-chat-hf",
        "messages": prompt,
        "temperature": 0
        }

    with s.post(url, 
                headers={"Authorization": f"Bearer {ANYSCALE_TOKEN}"}, 
                json=body) as resp:
        response = resp.json()

    try:
        product = json.loads(response['choices'][0]['message']['content'])
        product['valid LLM answer'] = True
    except:
        product = {'valid LLM answer': False}
    
    return product

Now that the LLM function is ready, let's build the bot 💪🏽🤖

Trigger-Action

Prerequisite

note

If this is your first time using QueryStar, follow these steps to set it up in less than 10 mins.

Set up QueryStar (click to expand)

First off, let's get a QueryStar token, installed the library, and make sure you can run the hello world Slack bot. The setup process should only .

Note:

QueryStar automatically integrate 3rd party API services which also include Slack authorization, so we do NOT need a Slack token here.
QueryStar token is free for one Slack workspace connection and unlimited bots in that workspace.

`slack.new_message()` Trigger

This Slack message trigger can be easily done with QueryStar's triggers.slack.new_message(). Simply add these 2 lines to app.py:

import querystar as qs

message = qs.triggers.slack.new_message(channel_id='C***')

This script is quite self-explanatory. The bot is set to listen to new Slack messages sent to a channel (denoted by the channel_id). When the trigger even happens, a JSON object returns.

An example message object (click to expand)

{
   "client_msg_id":"...",
   "type":"message",
   "text":"I've been trying...",
   "user":"useid...",
   "ts":"1693582549.746649",
   "blocks":[...],
   "team":"...",
   "thread_ts":"1693536314.270179",
   "parent_user_id":"...",
   "channel":"C05PRRJ0H4N",
   "event_ts":"1693582549.746649",
   "channel_type":"group"
}

`google_sheet.add_row()` Action

Once a trigger event happens, the bot calls the LLM function and get extracted product entities and category. Then the bot adds the extracted data to a Google Sheet. The code:

product = extract_product(message['text'])

qs.actions.google_sheets.add_row(
    spreadsheet_id='add your spreadsheet_id here',
    worksheet_id='Sheet1',
    data=[[message['user'], message['text'], product['valid LLM answer'],
           repr(product['product']), product['category'] ]]
    )

There are five columns in the entity table:

User: Slack's user_id
message: Slack message content
valid GPT answer: if the GPT response is a valid JSON string
product name: a list of product names identified by GPT
product category: product category identified by GPT

We can add rows by sending this 2D array to Google Sheet via google_sheets.add_row() API.

[
  [
    message['user'],
    message['text'],
    product['valid GPT answer'],
    product['product'],
    product['category']
  ]
]

One thing requires attention here is that Google Sheet does not accept list data type. We need to convert a list (product['product']) to a string using Python built-in repr() function.

Result

Open your terminal and run this command in the folder that contains your app.py:

$ querystar run app.py

Now, go to Slack and send the example message as shown in the section of The Use Case and Idea. In 1-2 seconds, a new row is added to the Google Sheet as shown below.

A Slack Bot That Extracts Topic Entities and Saves Them to a Google Sheet

Objective​

About the Slack Bot​

The Use Case and Idea​

Module Design​

Tech Stack​

The LLM Function​

Prompt to Llama-2-70b-chat-hf​

Making LLM Function with Anyscale API​

Trigger-Action​

Prerequisite​

slack.new_message() Trigger​

google_sheet.add_row() Action​

Result​

Objective

About the Slack Bot

The Use Case and Idea

Module Design

Tech Stack

The LLM Function

Prompt to `Llama-2-70b-chat-hf`

Making LLM Function with Anyscale API

Trigger-Action

Prerequisite

`slack.new_message()` Trigger

`google_sheet.add_row()` Action

Result