AI Employe

Create browser automation as if you were teaching a human using GPT-4 Vision.

Install

Try without Firebase authentication (temporary solution): https://github.com/vignshwarar/AI-Employe/issues/2#issuecomment-1880328518

Our stack consists of Next.js, Rust, Postgres, MeiliSearch, and Firebase Auth (for authentication). To set it up, sign up for a Firebase account and create a project.

In Firebase, navigate to Project settings -> Service accounts, generate a private key, and save it as `firebaseAdmin/cert/dev.json` for development or `firebaseAdmin/cert/prod.json` for production.

After that, set up the environment, install the dependencies, and start the app:

  • Copy the `.env.sample` file to `.env.production` or `.env.development`
  • Fill in the `.env` file with your credentials
  • Run `npm install`
  • Run `npm run db:deploy`
  • Run `npm run dev` (for development)
  • Run `npm run build` (for production)
  • Run `npm run start` (for production)
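The file naming convention in the steps above can be captured in a small lookup. This is a hypothetical helper for illustration, not code from the repository: `setupFilesFor` and `currentMode` are invented names, it assumes `prod.json` sits next to `dev.json`, and it assumes the mode follows `NODE_ENV` (Next.js itself sets `development` for `next dev` and `production` for `next build` and `next start`).

```typescript
// Hypothetical helper (not from the AI Employe repo): which env file and
// Firebase service-account key each mode expects.
type Mode = "development" | "production";

interface SetupFiles {
  envFile: string; // copied from .env.sample and filled with your credentials
  certFile: string; // private key from Project settings -> Service accounts
}

// Assumption: the mode follows NODE_ENV; any value other than "production"
// is treated as development.
function currentMode(nodeEnv: string | undefined): Mode {
  return nodeEnv === "production" ? "production" : "development";
}

function setupFilesFor(mode: Mode): SetupFiles {
  const isProd = mode === "production";
  return {
    envFile: isProd ? ".env.production" : ".env.development",
    certFile: `firebaseAdmin/cert/${isProd ? "prod" : "dev"}.json`,
  };
}

console.log(setupFilesFor(currentMode("production")));
// { envFile: '.env.production', certFile: 'firebaseAdmin/cert/prod.json' }
```

Keeping both files keyed on the same mode makes it harder to pair production credentials with a development key by accident.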

Once you have run `dev` or `build`, you will find the built extension in the `./client/extension/build` folder. You can then load this folder as an unpacked extension in your browser.

How it Works

Current browser agents have several problems. Below, we explain each problem and how we solved it.

Problem 1: Finding the Right Element

There are several existing techniques for this: sending a shortened form of the HTML to GPT-3, drawing bounding boxes labeled with IDs and sending the annotated screenshot to GPT-4-vision to choose actions, or asking GPT-4-vision directly for the X and Y coordinates of the element. However, none of these methods was reliable; all of them led to hallucinations.

To address this, we developed a new technique: we index the entire DOM in MeiliSearch, and GPT-4-vision generates commands that identify the target element by its inner text (which text to click, copy, or act on in some other way). We then search the index with the generated text, retrieve the matching element's ID, and send it back to the browser, which performs the action. This approach has limitations, such as the same text appearing in multiple elements or the target being an icon with no text. We have implemented some techniques to handle these, although icon clicking is still a work in progress.
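The lookup step can be sketched as follows. This is a minimal illustration, not the project's code: an in-memory array stands in for the MeiliSearch index (no MeiliSearch API is used), the DOM is modeled as plain objects, and `DomNode`, `ModelCommand`, the `el-N` IDs, and the ranking rule (exact match, then substring match, then shorter text) are all assumptions.

```typescript
// Sketch only: an in-memory array stands in for the MeiliSearch index.
// DomNode, IndexedElement, ModelCommand and the "el-N" IDs are hypothetical.
interface DomNode {
  tag: string;
  text?: string; // text directly inside this node
  children?: DomNode[];
}

interface IndexedElement {
  id: string; // ID the browser would use to find the element again
  tag: string;
  innerText: string; // normalized text of the node plus its descendants
}

type Action = "click" | "copy";
interface ModelCommand {
  action: Action;
  text: string; // inner text the vision model says to act on
}

const normalize = (s: string): string =>
  s.toLowerCase().replace(/\s+/g, " ").trim();

// Walk the tree depth-first, give every node a pre-order ID, and record its
// full inner text. Nodes with no text at all are left out of the index.
function indexDom(root: DomNode): IndexedElement[] {
  const out: IndexedElement[] = [];
  const visit = (node: DomNode): string => {
    const entry: IndexedElement = { id: `el-${out.length}`, tag: node.tag, innerText: "" };
    out.push(entry); // push before children so IDs follow document order
    const childTexts = (node.children ?? []).map(visit);
    entry.innerText = normalize([node.text ?? "", ...childTexts].join(" "));
    return entry.innerText;
  };
  visit(root);
  return out.filter((e) => e.innerText.length > 0);
}

// Rank matches: exact text first, then text that merely contains the query;
// within a tier, shorter (more specific) text first; remaining ties keep
// document order because Array.prototype.sort is stable.
function findCandidates(index: IndexedElement[], query: string): IndexedElement[] {
  const q = normalize(query);
  const tier = (e: IndexedElement): number =>
    e.innerText === q ? 2 : e.innerText.includes(q) ? 1 : 0;
  return index
    .filter((e) => tier(e) > 0)
    .sort((a, b) => tier(b) - tier(a) || a.innerText.length - b.innerText.length);
}

// Turn a model command into the payload sent back to the browser.
function resolveCommand(index: IndexedElement[], cmd: ModelCommand) {
  const [best] = findCandidates(index, cmd.text);
  return best ? { action: cmd.action, elementId: best.id } : null;
}

// Usage: a tiny page where "Sign in" appears in several elements.
const page: DomNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Welcome" },
    { tag: "a", children: [{ tag: "span", text: "Sign in" }] },
    { tag: "p", text: "Already have an account? Sign in to continue." },
  ],
};
const index = indexDom(page);
console.log(resolveCommand(index, { action: "click", text: "Sign in" }));
// { action: 'click', elementId: 'el-2' }
```

Returning every candidate, rather than only the first hit, keeps the duplicate-text problem visible to the caller: here the `<a>` and its inner `<span>` both match exactly, and document order picks the `<a>`.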

Problem 2: GPT Derailing from Workflow

a-gnt's Take

Best for anyone looking to make their AI assistant more capable in data & databases. It's completely free and works across most major AI apps. This one just landed in the catalog, so it's worth trying while it's fresh.

What's New

Version 1.0.0 (6 days ago)

Imported from GitHub
