Code Act

Name: Code Act
Author: xingyaoww

Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang

Price

Free

Access token required

Works With

Claude Code, Cursor, Windsurf, VS Code

Developer tool

About

Executable Code Actions Elicit Better LLM Agents

📃 Paper • 🤗 Data (CodeActInstruct) • 🤗 Model (CodeActAgent-Mistral-7b-v0.1) • 🤖 Chat with CodeActAgent!

We propose to use executable code to consolidate LLM agents’ actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions (check out this example!).
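The multi-turn interaction described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the `<execute>` tag convention, the helper names `run_code` and `codeact_loop`, and the `scripted_policy` function that stands in for an LLM are all assumptions made for the example.

```python
import contextlib
import io
import re
import traceback

# Assumption: code actions are wrapped in <execute>...</execute> tags.
EXECUTE_RE = re.compile(r"<execute>(.*?)</execute>", re.DOTALL)


def run_code(code, namespace):
    """Run code in a persistent namespace; return its stdout or the error text."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # The error message becomes the observation the agent can react to.
        buffer.write(traceback.format_exc(limit=0))
    return buffer.getvalue().strip()


def codeact_loop(policy, task, max_turns=5):
    """Alternate between model replies and execution results until no code is emitted."""
    history = [("user", task)]
    namespace = {}  # shared across turns, so variables persist like in a notebook
    for _ in range(max_turns):
        reply = policy(history)
        history.append(("assistant", reply))
        match = EXECUTE_RE.search(reply)
        if match is None:
            return reply, history  # plain text means the agent is done
        observation = run_code(match.group(1), namespace)
        history.append(("observation", observation))
    return None, history


def scripted_policy(history):
    """Hypothetical stand-in for an LLM: a buggy action, a revision, then an answer."""
    observations = [text for role, text in history if role == "observation"]
    if not observations:
        return "<execute>print(total / 2)</execute>"
    if "NameError" in observations[-1]:
        return "<execute>total = sum(range(5))\nprint(total / 2)</execute>"
    return f"The answer is {observations[-1]}."


answer, history = codeact_loop(scripted_policy, "Compute half of the sum of 0..4.")
print(answer)  # The answer is 5.0.
```

The first action fails with a NameError, the error text is fed back as an observation, and the second action revises the code. A real deployment would run the code in a sandboxed executor rather than calling `exec` in the host process.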

News

Apr 10, 2024: CodeActAgent Mistral is officially available at `ollama`!

Mar 11, 2024: We also added llama.cpp support for running CodeActAgent inference on a laptop (tested on macOS). Check out the instructions here!

Mar 11, 2024: We now support serving all CodeActAgent's components (LLM serving, code executor, MongoDB, Chat-UI) via Kubernetes ⎈! Check out this guide!

Feb 2, 2024: CodeAct is released!

Why CodeAct?

Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark M3ToolEval shows that CodeAct outperforms widely used alternatives like Text and JSON (up to 20% higher success rate). Please check our paper for more detailed analysis!

Comparison between CodeAct and Text / JSON as action.

Quantitative results comparing CodeAct and {Text, JSON} on M3ToolEval.
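To make the difference between the action formats concrete, here is a small sketch, under stated assumptions, of the same task expressed as JSON actions and as a single code action. The `get_price` tool, the `PRICES` table, and the two runner functions are hypothetical and exist only for this illustration; they are not taken from the paper or its benchmarks.

```python
import json

# Hypothetical tool and data, invented for this example.
PRICES = {"apple": 1.5, "pear": 2.0, "kiwi": 0.5}


def get_price(item):
    return PRICES[item]


TOOLS = {"get_price": get_price}


def run_json_action(action_text):
    """A JSON action names one tool and its arguments, so it triggers one call."""
    action = json.loads(action_text)
    return TOOLS[action["tool"]](**action["args"])


def run_code_action(code):
    """A code action can call tools repeatedly and combine results with control flow."""
    namespace = dict(TOOLS)
    exec(code, namespace)
    return namespace["result"]


# JSON as action: one tool call per action, so three actions are needed,
# and the results still have to be added up outside the action format.
json_actions = [
    json.dumps({"tool": "get_price", "args": {"item": item}}) for item in PRICES
]
json_total = sum(run_json_action(action) for action in json_actions)

# Code as action: one action loops over the items and aggregates the results.
code_action = 'result = sum(get_price(item) for item in ["apple", "pear", "kiwi"])'
code_total = run_code_action(code_action)

print(len(json_actions), json_total, code_total)
```

Both routes reach the same total, but the code action does so in one step because loops and arithmetic are part of the action itself.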

📁 CodeActInstruct

We collect an instruction-tuning dataset, CodeActInstruct, which consists of 7k multi-turn interactions using CodeAct. The dataset is released on Hugging Face Datasets 🤗. Please refer to the paper and this section for details of data collection.

Dataset Statistics. Token statistics are computed using the Llama-2 tokenizer.

🪄 CodeActAgent

Trained on CodeActInstruct and general conversations, CodeActAgent excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). We release two variants of CodeActAgent:

CodeActAgent-Mistral-7b-v0.1 (recommended, model link): using Mistral-7b-v0.1 as the base model with 32k context window.
CodeActAgent-Llama-7b (model link): using Llama-2-7b as the base model with 4k context window.

Evaluation results for CodeActAgent. ID and OD stand for in-domain and out-of-domain evaluation, respectively. The overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for fair comparison.
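The averaging described in the caption could be computed roughly as follows. This is a sketch under assumptions: the caption does not state how MT-Bench is normalized, so the example assumes a 10-point MT-Bench score is multiplied by 10 to match 0-100 task scores. The function name, task names, and numbers are placeholders, not results from the paper.

```python
def overall_average(scores, in_domain, mt_bench_key="MT-Bench"):
    """Average out-of-domain scores after rescaling MT-Bench.

    Assumption: MT-Bench is on a 10-point scale and the other tasks are on
    a 0-100 scale, so MT-Bench is multiplied by 10 before averaging.
    """
    kept = []
    for task, score in scores.items():
        if task in in_domain:
            continue  # in-domain tasks are excluded from the overall average
        if task == mt_bench_key:
            score = score * 10
        kept.append(score)
    return sum(kept) / len(kept)


# Placeholder values for illustration only, not reported results.
example_scores = {
    "in_domain_task": 80.0,
    "ood_task_a": 40.0,
    "ood_task_b": 50.0,
    "MT-Bench": 6.0,
}
average = overall_average(example_scores, in_domain={"in_domain_task"})
print(average)  # 50.0
```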
