- Home
- Communication
- Code Act
Code Act
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang
Rating
Votes
0
score
Downloads
0
total
Price
Free
Access token required
Works With
About
Executable Code Actions Elicit Better LLM Agents
π Paper β’ π€ Data (CodeActInstruct) β’ π€ Model (CodeActAgent-Mistral-7b-v0.1) β’ π€ Chat with CodeActAgent!
We propose to use executable code to consolidate LLM agentsβ actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions (check out this example!).
News
Apr 10, 2024: CodeActAgent Mistral is officially available at `ollama`!
Mar 11, 2024: We also add llama.cpp support for inferencing CodeActAgent on laptop (tested on MacOS), check out instructions here!
Mar 11, 2024: We now support serving all CodeActAgent's components (LLM serving, code executor, MongoDB, Chat-UI) via Kubernetes β! Check out this guide!
Feb 2, 2024: CodeAct is released!
Why CodeAct?
Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark M3ToolEval shows that CodeAct outperforms widely used alternatives like Text and JSON (up to 20% higher success rate). Please check our paper for more detailed analysis!
Comparison between CodeAct and Text / JSON as action.
Quantitative results comparing CodeAct and {Text, JSON} on M3ToolEval.
π CodeActInstruct
We collect an instruction-tuning dataset, CodeActInstruct, consists of 7k multi-turn interactions using CodeAct. Dataset is release at huggingface dataset π€. Please refer to the paper and this section for details of data collection.
Dataset Statistics. Token statistics are computed using Llama-2 tokenizer.
πͺ CodeActAgent
Trained on CodeActInstruct and general conversations, CodeActAgent excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). We release two variants of CodeActAgent:
- CodeActAgent-Mistral-7b-v0.1 (recommended, model link): using Mistral-7b-v0.1 as the base model with 32k context window.
- CodeActAgent-Llama-7b (model link): using Llama-2-7b as the base model with 4k context window.
Evaluation results for CodeActAgent. ID and OD correspondingly stand for in-domain and out-of-domain evaluation. Overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for fair comparison.
Don't lose this
Three weeks from now, you'll want Code Act again. Will you remember where to find it?
Save it to your library and the next time you need Code Act, itβs one tap away β from any AI app you use. Group it into a bench with the rest of the team for that kind of task and you can pull the whole stack at once.
β‘ Pro tip for geeks: add a-gnt π€΅π»ββοΈ as a custom connector in Claude or a custom GPT in ChatGPT β one click and your library is right there in the chat. Or, if youβre in an editor, install the a-gnt MCP server and say βuse my [bench name]β in Claude Code, Cursor, VS Code, or Windsurf.
a-gnt's Take
Our honest review
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang. Best for anyone looking to make their AI assistant more capable in communication. It's completely free and works across most major AI apps. This one just landed in the catalog β worth trying while it's fresh.
Tips for getting started
Tap "Get" above, pick your AI app, and follow the steps. Most installs take under 30 seconds.
What's New
Imported from GitHub
Ratings & Reviews
0.0
out of 5
0 ratings
No reviews yet. Be the first to share your experience.
From the Community
What Is Claude Code and Why Every Developer Should Try It
A deep dive into Anthropic's Claude Code CLI β what it does, how it works, and why it might change the way you write software.
Hacks: The Prompt That Turns a Code Diff Into a Changelog People Will Actually Read
The prompt that takes a raw diff and turns it into the changelog entry you were going to write tomorrow and now don't have to.