Browser Automation Agents
AI-powered browser automation tools that can navigate websites, extract data, and perform web interactions autonomously for testing and scraping tasks.
Open agentic framework for computer-use agents that operate desktop environments. Agent-S focuses on planning, memory, grounding, and GUI interaction across operating systems and applications.
An AI-powered query language and suite of tools for connecting AI agents to the web, featuring natural language selectors for web scraping and automation. AgentQL offers resilient, self-healing web element location that adapts to website changes, with SDKs for Python and JavaScript, a REST API, and integrations with popular frameworks like LangChain.
Secure cloud browser infrastructure for computer-use agents and automated web workflows. Anchor Browser provides hosted Chromium environments, authentication support, VPN options, and enterprise controls for agent execution.
Cloud platform for web scraping, browser automation, and AI agents with 7,000+ ready-made tools and automation solutions. Apify provides an AI Web Agent for natural language web browsing, an MCP Server integration that lets agents extract data from social media and search engines, and specialized agents for data extraction, monitoring, and analysis. It integrates with LangChain, LlamaIndex, and the wider LLM ecosystem for building production-ready web automation workflows.
Open-source web navigation agent from Tsinghua's THUDM group. AutoWebGLM uses large language models, simplified HTML representations, and environment feedback to complete browser tasks across Chinese and English websites.
A no-code browser automation and RPA tool that enables users to build browser bots for automating website actions and repetitive tasks. Axiom.ai offers a Chrome extension for bot building, integrations with Zapier and Make, AI-powered automation with ChatGPT, and both scheduled and triggered bot execution.
An AI-powered data extraction platform that transforms websites into live data pipelines without requiring coding skills. Browse AI enables users to easily scrape web data, monitor webpage changes, and turn websites into APIs with built-in bot detection avoidance, proxy management, and automated data extraction workflows.
An open-source library that empowers AI agents to interact with web browsers using natural language commands for automated web tasks. Browser Use provides a simple interface for AI agents to control browsers, extract data from websites, fill forms, and perform complex web interactions without requiring specific selectors or manual configuration.
A headless browser infrastructure platform specifically designed for AI agents and applications, offering scalable browser automation capabilities. Browserbase provides stealth features, captcha solving, residential proxies, and comprehensive observability tools, making it ideal for AI-powered web automation with enterprise-grade security and compliance.
A capability from Anthropic that allows Claude to interact with computer interfaces by looking at screens, moving cursors, clicking buttons, and typing text. Computer use lets Claude carry out multi-step computer tasks autonomously rather than only describing them.
Open-source vision-language GUI agent project for computer-use research. CogAgent provides models and code for understanding interface screenshots and grounding actions in desktop or web environments.
Open-weight mini version of Convergence AI's Proxy assistant for web automation. Proxy Lite demonstrates browser task execution and web interaction capabilities from the Convergence agent research line.
Open-source crawler and scraper that converts websites into LLM-friendly content for agents and RAG workflows. Crawl4AI supports asynchronous crawling, browser automation, extraction, and Markdown-oriented outputs.
API and open-source platform for searching, scraping, crawling, and interacting with the web for AI applications. Firecrawl converts pages into Markdown, JSON, screenshots, and agent-ready web data.
Google DeepMind browser agent research project for navigating and acting on web interfaces. Project Mariner uses multimodal reasoning to understand browser state, plan tasks, and execute actions in web workflows.
Cloud browser platform for automated browser sessions at scale. Hyperbrowser supports browser-use agents, computer-use models, stealth browsing, integrations, and hosted infrastructure for web automation.
Browser-based AI personal assistant that can research, manage email, fill forms, and complete online tasks through HyperWrite's Chrome extension and web assistant experience.
Open-source service for converting URLs into LLM-friendly Markdown input. Jina AI Reader supports simple URL prefixes for page reading and search workflows used by agents and RAG systems.
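The URL-prefix convention can be sketched in a few lines. The helper below only builds the request URLs by prepending the public `r.jina.ai` (read) and `s.jina.ai` (search) prefixes; the function names are illustrative assumptions, and actually fetching the result (not shown) needs network access and may need an API key.

```python
from urllib.parse import quote

# Public prefixes used by the hosted Reader service.
READER_PREFIX = "https://r.jina.ai/"
SEARCH_PREFIX = "https://s.jina.ai/"


def reader_url(page_url: str) -> str:
    """Return the Reader URL that serves page_url as LLM-friendly Markdown."""
    return READER_PREFIX + page_url


def search_url(query: str) -> str:
    """Return the search URL for a free-text query (percent-encoded)."""
    return SEARCH_PREFIX + quote(query)


if __name__ == "__main__":
    print(reader_url("https://example.com/docs"))
    print(search_url("browser agents"))
```

An agent would pass the returned URL to whatever HTTP client it already uses and feed the Markdown response into its context.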
Open-source framework for building AI web agents that can operate web interfaces. LaVague combines LLM planning with Selenium or Playwright execution for browser task automation and QA workflows.
Vision-driven UI automation framework for web and mobile interfaces. Midscene.js lets developers write natural-language or TypeScript/YAML automation that interacts with UI through visual understanding.
Research family of GUI agents for operating mobile devices. Mobile-Agent uses multimodal perception and planning to automate Android and HarmonyOS tasks through visual understanding and device actions.
Hosted web automation API for executing browser tasks from natural language commands. MultiOn exposes an autonomous browsing API and documentation for agents that browse websites, complete multi-step actions, extract data, and interact with web applications.
Framework for building reliable browser-using AI agents and serverless web automation functions. Notte focuses on production browser workflows, hosted browser infrastructure, and AI-assisted web task execution.
Microsoft Research screen-parsing tool for GUI agents. OmniParser converts screenshots into structured UI elements that vision-language agents can use for computer control and interface automation.
OpenAI browser-using agent powered by a Computer-Using Agent model. The original Operator research preview has since been integrated into ChatGPT agent mode, where it can inspect and interact with web interfaces.
Benchmark for evaluating multimodal agents on open-ended computer tasks in real operating-system environments. OSWorld includes task suites and tooling for browser, file, coding, and desktop automation evaluation.
A framework for web testing and automation that has been enhanced with AI-powered tools for intelligent web interaction. Playwright provides cross-browser automation capabilities and has been integrated with various AI tools like Auto-Playwright and ZeroStep for natural language test generation and AI-powered element selection.
Official Microsoft MCP server exposing Playwright browser automation to AI agents. Playwright MCP uses accessibility snapshots for deterministic, token-efficient browser interaction without relying on screenshots.
A Node.js library providing a high-level API to control Chrome or Firefox browsers, increasingly used in AI agent implementations for web automation. Puppeteer serves as the foundation for many AI-powered web agents, offering programmatic browser control, screenshot capabilities, and integration with vision models for intelligent web navigation.
Python scraping framework that uses LLMs and graph-based pipelines to extract structured data from websites. ScrapeGraphAI supports natural-language extraction prompts and integrations with common AI workflow tools.
Research project for computer control with visual language models. ScreenAgent uses a plan-action-reflection loop and accompanying dataset to study desktop task execution from screen observations.
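A plan-action-reflection loop of this kind can be sketched generically. The version below is not ScreenAgent's code: the `plan`, `act`, and `reflect` callables and the verdict strings `"done"`, `"replan"`, and `"continue"` are hypothetical stand-ins for model calls and screen interaction.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def run_task(
    goal: str,
    plan: Callable[[str, History], List[str]],
    act: Callable[[str], str],
    reflect: Callable[[str, str, str], str],
    max_steps: int = 10,
) -> History:
    """Run a plan -> act -> reflect loop and return (step, observation) pairs."""
    history: History = []
    steps = plan(goal, history)          # initial plan
    for _ in range(max_steps):
        if not steps:
            break                        # nothing left to execute
        step = steps.pop(0)
        observation = act(step)          # execute one step, observe the screen
        history.append((step, observation))
        verdict = reflect(goal, step, observation)
        if verdict == "done":
            break                        # goal judged complete
        if verdict == "replan":
            steps = plan(goal, history)  # rebuild the plan from what happened
        # any other verdict: continue with the remaining steps
    return history
```

With stub callables, for example a `plan` that returns `["open", "type", "save"]` and a `reflect` that answers `"done"` after `"save"`, the loop runs the three steps in order and stops.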
Open-source framework that lets multimodal models operate a computer by observing the screen and issuing actions. Self-Operating Computer is an early reference project for desktop computer-use agents.
Open-source vision-language-action model for GUI agents and computer use. ShowUI provides research code and models for grounding visual UI understanding into executable actions.
Open-source browser workflow automation platform for AI agents. Skyvern uses browser automation and AI reasoning to complete form filling, data extraction, and repetitive web tasks without relying only on brittle selectors.
Browserbase SDK for building browser agents on top of Playwright. Stagehand adds LLM-friendly act, extract, and observe primitives so developers can combine deterministic browser automation with natural-language instructions.
Open-source browser API and sandbox for AI agents and web automation. Steel Browser provides managed browser sessions, infrastructure primitives, and automation tooling for agent applications.
Open-source vision utilities library for multimodal web agents by Reworkd, enabling AI models to interact with and automate web browsers through visual understanding. Tarsier visually tags interactable elements with brackets and IDs for LLM action mapping, includes an OCR algorithm for converting screenshots to structured text, and reports 10-20% performance gains on benchmarks. It has been used extensively in production for tens of thousands of real web tasks.
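The bracket-and-ID tagging idea can be illustrated with a small sketch. This is not Tarsier's API: the `Element` type, the category names, and the marker characters are assumptions chosen for illustration. The point is that each interactable element gets a short tag the LLM can cite, plus a lookup table to map the tag back to the element.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    kind: str  # hypothetical categories: "input", "link", or anything else
    text: str  # visible label of the element


# Assumed marker per category; unknown kinds fall back to "$".
MARKERS = {"input": "#", "link": "@"}


def tag_elements(elements: List[Element]) -> Tuple[str, Dict[int, Element]]:
    """Return a tagged text view of the page and an ID -> element lookup."""
    lines = []
    id_to_element: Dict[int, Element] = {}
    for idx, el in enumerate(elements):
        marker = MARKERS.get(el.kind, "$")
        lines.append(f"[{marker}{idx}] {el.text}")
        id_to_element[idx] = el
    return "\n".join(lines), id_to_element


if __name__ == "__main__":
    page = [Element("input", "Search"), Element("link", "Docs"), Element("button", "Submit")]
    text, lookup = tag_elements(page)
    print(text)
```

When the model answers with something like "click 2", the agent resolves ID 2 through the lookup table instead of asking the model for a selector.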
Open-source multimodal AI agent stack for desktop, browser, and mobile GUI automation. UI-TARS Desktop connects vision-language models with agent infrastructure for computer-use workflows.
Research project for multimodal web agents that navigate real websites using visual understanding. WebVoyager introduced an end-to-end benchmark and reference implementation for web task completion with large multimodal models.