Browser Automation Agents

AI-powered browser automation tools that can navigate websites, extract data, and perform web interactions autonomously for testing and scraping tasks.

42 Entries GitHub Stats Available

Open agentic framework for computer-use agents that operate desktop environments. Agent-S focuses on planning, memory, grounding, and GUI interaction across operating systems and applications.

Framework Python 11310 stars Apache-2.0
#benchmark #computer-use #desktop-agent #gui-agent #open-source

An AI-powered query language and suite of tools for connecting AI agents to the web, featuring natural language selectors for web scraping and automation. AgentQL offers resilient, self-healing web element location that adapts to website changes, with SDKs for Python and JavaScript, a REST API, and integrations with popular frameworks like LangChain.

Tool

Secure cloud browser infrastructure for computer-use agents and automated web workflows. Anchor Browser provides hosted Chromium environments, authentication support, VPN options, and enterprise controls for agent execution.

Platform Paid Web
#browser-automation #cloud-browser #computer-use #enterprise #security

Cloud platform for web scraping, browser automation, and AI agents with 7,000+ ready-made tools and automation solutions. Apify provides an AI Web Agent for natural language web browsing, an MCP Server integration that lets agents extract data from social media and search engines, and specialized agents for data extraction, monitoring, and analysis. It integrates with LangChain, LlamaIndex, and the wider LLM ecosystem for building production web automation workflows.

Tool

Open-source web navigation agent from Tsinghua's THUDM group. AutoWebGLM uses large language models, simplified HTML representations, and environment feedback to complete browser tasks across Chinese and English websites.

Framework Python 932 stars Apache-2.0
#open-source #web-agent #browser-automation #research #llm
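
The idea of a simplified HTML representation can be illustrated with a minimal sketch. This is not AutoWebGLM's actual algorithm: the `Simplifier` class, the `simplify_html` helper, and the choice of which tags and attributes to keep are all assumptions made for illustration, using only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import List

# Assumed filtering rules (hypothetical, not taken from AutoWebGLM).
KEEP_TAGS = {"a", "button", "input", "select", "textarea"}
KEEP_ATTRS = {"href", "name", "type", "placeholder", "value"}
SKIP_TAGS = {"script", "style"}


class Simplifier(HTMLParser):
    """Keeps interactive tags, a few attributes, and visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in KEEP_TAGS and self._skip_depth == 0:
            kept = " ".join(
                f'{k}="{v}"' for k, v in attrs
                if k in KEEP_ATTRS and v is not None
            )
            self.parts.append(f"<{tag} {kept}>" if kept else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def simplify_html(html: str) -> str:
    """Return a compact, LLM-oriented view of the page."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    page = (
        '<div class="nav"><script>var x = 1;</script>'
        '<a href="/login" class="btn">Log in</a>'
        '<input type="text" name="q" placeholder="Search"></div>'
    )
    print(simplify_html(page))
```

Dropping layout tags, scripts, and styling attributes shortens the prompt while keeping the elements an agent can act on.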

A no-code browser automation and RPA tool that enables users to build browser bots for automating website actions and repetitive tasks. Axiom.ai offers a Chrome extension for building bots, integrations with Zapier and Make, AI-powered automation with ChatGPT, and both scheduled and triggered bot execution.

Tool

An AI-powered data extraction platform that transforms websites into live data pipelines without requiring coding skills. Browse AI enables users to easily scrape web data, monitor webpage changes, and turn websites into APIs with built-in bot detection avoidance, proxy management, and automated data extraction workflows.

Tool

An open-source library that empowers AI agents to interact with web browsers using natural language commands for automated web tasks. Browser Use provides a simple interface for AI agents to control browsers, extract data from websites, fill forms, and perform complex web interactions without requiring specific selectors or manual configuration.

Tool 93948 stars

A headless browser infrastructure platform designed for AI agents and applications, offering scalable browser automation capabilities. Browserbase provides stealth features, CAPTCHA solving, residential proxies, and observability tools, along with enterprise security and compliance controls.

Tool

A computer-use capability from Anthropic that allows Claude to interact with computer interfaces by looking at screens, moving cursors, clicking buttons, and typing text, so that it can carry out multi-step computer tasks autonomously.

Tool

Open-source vision-language GUI agent project for computer-use research. CogAgent provides models and code for understanding interface screenshots and grounding actions in desktop or web environments.

Research Python 1179 stars Apache-2.0
#computer-use #gui-agent #open-source #research #vision-language-model

Open-weight mini version of Convergence AI's Proxy assistant for web automation. Proxy Lite demonstrates browser task execution and web interaction capabilities from the Convergence agent research line.

Framework 988 stars
#open-source #browser-automation #web-agent #computer-use #research

Open-source crawler and scraper that converts websites into LLM-friendly content for agents and RAG workflows. Crawl4AI supports asynchronous crawling, browser automation, extraction, and Markdown-oriented outputs.

Tool Python 65540 stars Apache-2.0
#browser-automation #llm-crawler #open-source #python #rag #web-scraping

Open-source infrastructure for computer-use agents, including sandboxes, SDKs, and benchmarks for full desktop control. CUA supports macOS, Linux, and Windows agent environments.

Framework Python 16717 stars MIT
#benchmark #computer-use #desktop-agent #open-source #sandbox

API and open-source platform for searching, scraping, crawling, and interacting with the web for AI applications. Firecrawl converts pages into Markdown, JSON, screenshots, and agent-ready web data.

Platform TypeScript 119882 stars AGPL-3.0
#api #browser-automation #llm-crawler #open-source #web-scraping

Google DeepMind browser agent research project for navigating and acting on web interfaces. Project Mariner uses multimodal reasoning to understand browser state, plan tasks, and execute actions in web workflows.

Research Web
#browser-automation #computer-use #google #hosted-service #multimodal

Cloud browser platform for automated browser sessions at scale. Hyperbrowser supports browser-use agents, computer-use models, stealth browsing, integrations, and hosted infrastructure for web automation.

Platform Paid Web
#api #browser-automation #cloud-browser #computer-use #hosted

Browser-based AI personal assistant that can research, manage email, fill forms, and complete online tasks through HyperWrite's Chrome extension and web assistant experience.

Platform Freemium
#browser-automation #personal-assistant #productivity #browser-extension #automation

Open-source service for converting URLs into LLM-friendly Markdown input. Jina AI Reader supports simple URL prefixes for page reading and search workflows used by agents and RAG systems.

Tool 10813 stars Apache-2.0
#api #llm-reader #markdown #open-source #rag #web-scraping
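
The URL-prefix usage can be sketched as follows. The `https://r.jina.ai/` and `https://s.jina.ai/` prefixes are assumptions not stated in this listing and should be checked against the service's documentation. The helper names `reader_url` and `search_url` are hypothetical, and the sketch only builds URLs without sending any request.

```python
from urllib.parse import quote

# Assumed endpoints (not stated in this listing); verify before use.
READER_PREFIX = "https://r.jina.ai/"
SEARCH_PREFIX = "https://s.jina.ai/"


def reader_url(page_url: str) -> str:
    """Prefix a page URL so the reader service can return it as Markdown."""
    return READER_PREFIX + page_url


def search_url(query: str) -> str:
    """Append the URL-encoded query to the assumed search prefix."""
    return SEARCH_PREFIX + quote(query)


if __name__ == "__main__":
    print(reader_url("https://example.com/docs"))
    print(search_url("browser automation agents"))
```

An agent would then fetch the resulting URL with any HTTP client and pass the response body to the model as context.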

Open-source framework for building AI web agents that can operate web interfaces. LaVague combines LLM planning with Selenium or Playwright execution for browser task automation and QA workflows.

Framework Python 6344 stars Apache-2.0
#browser-automation #open-source #playwright #selenium #web-agent

Open-source vision-first browser agent and test runner. Magnitude uses pixel-level browser interaction for natural-language browser automation and evaluation workflows.

Tool TypeScript 4050 stars Apache-2.0
#browser-automation #open-source #playwright #testing #vision

Vision-driven UI automation framework for web and mobile interfaces. Midscene.js lets developers write natural-language or TypeScript/YAML automation that interacts with UI through visual understanding.

Framework TypeScript 13031 stars MIT
#android #browser-automation #ios #mobile-agent #typescript #vision

Research family of GUI agents for operating mobile devices. Mobile-Agent uses multimodal perception and planning to automate Android and HarmonyOS tasks through visual understanding and device actions.

Research Python 8666 stars MIT
#android #mobile-agent #multi-agent #open-source #research #vision

Hosted web automation API for executing browser tasks from natural language commands. MultiOn exposes an autonomous browsing API and documentation for agents that browse websites, complete multi-step actions, extract data, and interact with web applications.

Platform
#browser-automation #web-agent #api #hosted-service #data-extraction

Framework for building reliable browser-using AI agents and serverless web automation functions. Notte focuses on production browser workflows, hosted browser infrastructure, and AI-assisted web task execution.

Framework Python 1957 stars
#browser-automation #open-source #playwright #python #web-agent

Microsoft Research screen-parsing tool for GUI agents. OmniParser converts screenshots into structured UI elements that vision-language agents can use for computer control and interface automation.

Tool Python 24759 stars CC-BY-4.0
#computer-use #gui-agent #microsoft #screen-parsing #vision

OpenAI browser-using agent powered by a Computer-Using Agent model. The original Operator research preview has since been integrated into ChatGPT agent mode, where it can inspect and interact with web interfaces.

Tool Paid ChatGPT
#browser-automation #computer-use #hosted-service #openai

Benchmark for evaluating multimodal agents on open-ended computer tasks in real operating-system environments. OSWorld includes task suites and tooling for browser, file, coding, and desktop automation evaluation.

Research 2845 stars Apache-2.0
#benchmark #computer-use #desktop-agent #open-source #research

A framework for web testing and automation around which an ecosystem of AI-powered tools has grown. Playwright provides cross-browser automation capabilities, and tools such as Auto-Playwright and ZeroStep build on it for natural language test generation and AI-powered element selection.

Tool

Official Microsoft MCP server exposing Playwright browser automation to AI agents. Playwright MCP uses accessibility snapshots for deterministic, token-efficient browser interaction without relying on screenshots.

Tool TypeScript 32516 stars Apache-2.0
#accessibility #browser-automation #mcp #microsoft #playwright

A Node.js library providing a high-level API to control Chrome or Firefox browsers, increasingly used in AI agent implementations for web automation. Puppeteer offers programmatic browser control and screenshot capture, which AI-powered web agents can pair with vision models for web navigation.

Tool

Python scraping framework that uses LLMs and graph-based pipelines to extract structured data from websites. ScrapeGraphAI supports natural-language extraction prompts and integrations with common AI workflow tools.

Tool Python 25317 stars MIT
#data-extraction #llm #open-source #python #web-scraping

Research project for computer control with visual language models. ScreenAgent uses a plan-action-reflection loop and accompanying dataset to study desktop task execution from screen observations.

Research Python 598 stars
#benchmark #computer-use #desktop-agent #research #vision-language-model
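
A plan-action-reflection loop of the kind described above can be sketched in a few lines. This is a generic illustration, not ScreenAgent's implementation: `run_loop`, `LoopState`, the verdict strings `"done"`, `"retry"`, and `"replan"`, and the stubbed planner, actor, and reflector in `demo` are all hypothetical stand-ins for model calls and real screen actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoopState:
    """Remaining plan steps plus a log of what was tried."""
    plan: List[str]
    log: List[str] = field(default_factory=list)


def run_loop(
    task: str,
    plan_fn: Callable[[str], List[str]],
    act_fn: Callable[[str], str],
    reflect_fn: Callable[[str, str], str],
    max_steps: int = 10,
) -> LoopState:
    """Plan once, then alternate acting and reflecting on each step."""
    state = LoopState(plan=plan_fn(task))
    for _ in range(max_steps):
        if not state.plan:
            break
        step = state.plan[0]
        observation = act_fn(step)              # action phase
        verdict = reflect_fn(step, observation)  # reflection phase
        state.log.append(f"{step} -> {verdict}")
        if verdict == "done":
            state.plan.pop(0)
        elif verdict == "replan":
            state.plan = plan_fn(task)
        # Any other verdict (e.g. "retry") repeats the same step.
    return state


def demo() -> LoopState:
    """Run the loop with stubs; the second step fails once, then succeeds."""
    attempts: Dict[str, int] = {}

    def plan_fn(task: str) -> List[str]:
        return ["open settings", "enable dark mode"]

    def act_fn(step: str) -> str:
        attempts[step] = attempts.get(step, 0) + 1
        return f"screen after attempt {attempts[step]} of '{step}'"

    def reflect_fn(step: str, observation: str) -> str:
        if step == "enable dark mode" and attempts[step] < 2:
            return "retry"
        return "done"

    return run_loop("turn on dark mode", plan_fn, act_fn, reflect_fn)


if __name__ == "__main__":
    for entry in demo().log:
        print(entry)
```

The `max_steps` cap keeps a step that never succeeds from looping forever.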

Open-source framework that lets multimodal models operate a computer by observing the screen and issuing actions. Self-Operating Computer is an early reference project for desktop computer-use agents.

Framework Python 10247 stars MIT
#computer-use #desktop-agent #multimodal #open-source

Open-source vision-language-action model for GUI agents and computer use. ShowUI provides research code and models for grounding visual UI understanding into executable actions.

Research Python 1831 stars Apache-2.0
#computer-use #gui-agent #open-source #research #vision-language-model

Open-source browser workflow automation platform for AI agents. Skyvern uses browser automation and AI reasoning to complete form filling, data extraction, and repetitive web tasks without relying only on brittle selectors.

Tool Python 21609 stars AGPL-3.0
#browser-automation #open-source #playwright #web-agent

Browserbase SDK for building browser agents on top of Playwright. Stagehand adds LLM-friendly act, extract, and observe primitives so developers can combine deterministic browser automation with natural-language instructions.

Framework TypeScript 22657 stars MIT
#browser-automation #llm #playwright #sdk #typescript

Open-source browser API and sandbox for AI agents and web automation. Steel Browser provides managed browser sessions, infrastructure primitives, and automation tooling for agent applications.

Platform TypeScript 7022 stars
#api #browser-automation #cloud-browser #headless #open-source

Open-source vision utilities library for multimodal web agents by Reworkd, enabling AI models to interact with and automate web browsers through visual understanding. Tarsier visually tags interactable elements with brackets and IDs for LLM action mapping, includes an OCR algorithm for converting screenshots to structured text, and reports 10-20% performance gains on benchmarks. It is used in production for tens of thousands of real web tasks.

Tool 1761 stars
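
The tagging idea can be shown with a small sketch. It does not use Tarsier's API: the `Element` type, the `tag_elements` helper, and the plain `[ID]` tag format are assumptions chosen for illustration. The point is that each element gets a bracketed ID in the text shown to the model, and a side table maps that ID back to a locator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    """A hypothetical interactable element found on a page."""
    kind: str   # e.g. "button", "link", "input"
    text: str   # visible label or placeholder
    xpath: str  # locator used to act on the element later


def tag_elements(elements: List[Element]) -> Tuple[str, Dict[int, str]]:
    """Return tagged page text and a mapping from tag ID to XPath."""
    lines: List[str] = []
    tag_to_xpath: Dict[int, str] = {}
    for tag_id, element in enumerate(elements):
        lines.append(f"[{tag_id}] {element.kind}: {element.text}")
        tag_to_xpath[tag_id] = element.xpath
    return "\n".join(lines), tag_to_xpath


if __name__ == "__main__":
    page_text, mapping = tag_elements([
        Element("input", "Search", "//input[@name='q']"),
        Element("button", "Go", "//button[1]"),
    ])
    print(page_text)
    # If the model answers "click [1]", the agent resolves the locator:
    print(mapping[1])
```

Because the model only ever refers to short IDs, its output can be mapped to a concrete browser action without asking it to produce selectors.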

Search and extraction API built for LLM agents and RAG workflows. Tavily provides web search, URL extraction, and crawl endpoints designed to return clean content for agent applications.

Tool Freemium API
#api #llm-search #rag #web-scraping

Open-source multimodal AI agent stack for desktop, browser, and mobile GUI automation. UI-TARS Desktop connects vision-language models with agent infrastructure for computer-use workflows.

Tool TypeScript 33926 stars Apache-2.0
#browser-automation #computer-use #gui-agent #mcp #mobile-agent #open-source

Research project for multimodal web agents that navigate real websites using visual understanding. WebVoyager introduced an end-to-end benchmark and reference implementation for web task completion with large multimodal models.

Research Python 1088 stars Apache-2.0
#benchmark #browser-automation #multimodal #research #web-agent