Agent S: An Open Agentic Framework that Uses Computers Like a Human

February 27, 2025

Saaket Agashe*,

Jiuzhou Han*,

State-of-the-Art Performance

Agent S is a new agentic framework designed to enable computers to be used as intuitively as a human would use them.

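We introduce an Experience-Augmented Hierarchical Planning method. It draws on Online Web Knowledge for up-to-date information about frequently changing software and websites, and on Narrative Memory for high-level experience gained from past interactions. By breaking complex tasks into manageable subtasks and using Episodic Memory for step-by-step guidance, Agent S continually refines its actions and learns from experience, which yields adaptable and effective task planning.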

Abstract

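We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), with the aim of transforming human-computer interaction by automating complex, multi-step tasks.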

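To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels and thereby facilitates efficient task planning and subtask execution.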

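In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37 percentage points in success rate (an 83.6% relative improvement) and achieves a new state of the art. A comprehensive analysis highlights the effectiveness of the individual components and provides insights for future improvements.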

Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark.

Agent S addresses three key challenges in automating computer tasks.

Task Instruction

Help me to remove the account "anonym-x2024@outlook.com"

Overview of the Agent S Framework

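Given a task T_u and an initial environment observation o_0, the Manager performs experience-augmented hierarchical planning, using web knowledge and narrative memory, to produce subtasks s_0, ..., s_n. For each subtask s_i, a Worker w_i draws on episodic memory to generate an action at time t; the ACI executes this action and returns the next immediate observation o_{t+1}. A self-evaluation module closes the loop by storing the summarized subtask and full-task trajectories in narrative and episodic memory.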
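The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Agent S implementation: `Memory`, `ToyACI`, `plan_subtasks`, `worker_action`, and `run_task` are hypothetical names, the Manager and Worker are stubbed with fixed rules instead of an MLLM, each subtask takes exactly one action, and the choice of which summary goes into which memory is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy stand-ins for the two memories (hypothetical structure)."""
    narrative: dict = field(default_factory=dict)  # task -> summary of the full task
    episodic: dict = field(default_factory=dict)   # subtask -> summary of the subtask


class ToyACI:
    """Stub Agent-Computer Interface: runs an action and returns an observation."""

    def __init__(self):
        self.executed = []

    def execute(self, action):
        self.executed.append(action)
        return f"observation after '{action}'"


def plan_subtasks(task, observation, memory):
    """Manager stub: a real Manager would combine web knowledge and narrative memory."""
    # Hard-coded plan, for illustration only.
    return ["open account settings", "select the account", "confirm removal"]


def worker_action(subtask, observation, memory):
    """Worker stub: reuse an episodic-memory entry for the subtask if one exists."""
    return memory.episodic.get(subtask, f"click: {subtask}")


def run_task(task, initial_observation, memory, aci):
    observation = initial_observation
    trajectory = []
    for subtask in plan_subtasks(task, observation, memory):
        action = worker_action(subtask, observation, memory)
        observation = aci.execute(action)  # the ACI returns the next observation
        trajectory.append((subtask, action))
        # Self-evaluation stub: keep a summary of the subtask (assumed: episodic memory).
        memory.episodic[subtask] = action
    # Keep a summary of the whole task (assumed: narrative memory).
    memory.narrative[task] = [subtask for subtask, _ in trajectory]
    return trajectory


memory = Memory()
aci = ToyACI()
trajectory = run_task("remove the account", "desktop", memory, aci)
print(len(trajectory), "subtasks executed")
```

Keeping the ACI behind a single `execute` call means the planning loop never touches the GUI directly, which is the separation the overview describes.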

Pipeline of Memory Construction and Update

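The pipeline of memory construction and update consists of two phases: Self-supervised Exploration and Continual Memory Update. The initial narrative and episodic memories are built from a few randomly curated tasks during the exploration phase and are then continually updated based on the inference tasks.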
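The two phases can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `run_agent`, `summarize`, and `build_memory` are hypothetical helpers, a single dictionary stands in for both narrative and episodic memory, and the "summary" simply keeps the first and last step of a trajectory rather than calling a self-evaluator model.

```python
import random


def run_agent(task):
    """Hypothetical agent run that returns a full trajectory of steps."""
    return [f"{task}: step {i}" for i in range(1, 5)]


def summarize(trajectory):
    """Stand-in for the self-evaluator: keep only the first and last step."""
    if len(trajectory) <= 2:
        return list(trajectory)
    return [trajectory[0], trajectory[-1]]


def build_memory(exploration_pool, inference_tasks, n_explore, seed=0):
    rng = random.Random(seed)
    memory = {}
    # Phase 1: self-supervised exploration on randomly curated tasks.
    for task in rng.sample(exploration_pool, n_explore):
        memory[task] = summarize(run_agent(task))
    initial_size = len(memory)
    # Phase 2: continual update as inference tasks are attempted.
    for task in inference_tasks:
        memory[task] = summarize(run_agent(task))
    return memory, initial_size


pool = [f"explore-{i}" for i in range(10)]
memory, initial_size = build_memory(pool, ["infer-a", "infer-b"], n_explore=3)
print(initial_size, "entries after exploration,", len(memory), "after inference")
```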

Main Results

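The table compares the performance of Agent S with the baseline models, evaluated on the full OSWorld test set. With the GPT-4o model, Agent S achieves an overall success rate of 20.58%, nearly doubling the performance of the best corresponding baseline (GPT-4o at 11.21%).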
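As a quick check, the two figures quoted here reproduce the improvement reported in the abstract. The snippet below is plain arithmetic on the numbers in the text; the variable names are illustrative.

```python
agent_s = 20.58   # overall success rate (%) of Agent S with GPT-4o
baseline = 11.21  # best corresponding GPT-4o baseline (%)

absolute_gain = agent_s - baseline              # in percentage points
relative_gain = absolute_gain / baseline * 100  # as a percentage of the baseline

print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.1f}%")
# prints "absolute: 9.37 points, relative: 83.6%"
```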

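Agent S consistently outperforms the baselines on the "Daily" and "Professional" tasks, where it reaches success rates of 27.06% and 36.73%, respectively, compared with the best baseline results of 12.33% and 14.29%. These tasks are either common in daily life or involve knowledge-intensive professional applications, so they benefit more from the retrieval augmentation of Agent S. With either Claude-3.5-Sonnet or GPT-4o as the backbone, Agent S outperforms the baseline versions on the majority of tasks. Claude-3.5-Sonnet even performs better than GPT-4o on the "Daily" and "Professional" tasks.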

The results demonstrate that Agent S handles diverse and complex tasks more effectively than the baseline approaches.

Main results of Success Rate (%) on the OSWorld full test set of all 369 test examples.

Analysis

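To demonstrate the effectiveness of the individual modules of Agent S, we drew a stratified sample of 65 instances, testsub, from the full test set for the ablation study. Considering the inference cost, we used GPT-4o as the LLM backbone in all ablation studies, for both the baseline and Agent S.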
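One common way to draw such a subset is proportional stratified sampling, sketched below. The text does not say how the 65 instances were allocated, so this is only an illustration: `stratified_sample` is a hypothetical helper, and the category names and sizes are assumptions chosen so that they total 369.

```python
import random


def stratified_sample(items_by_stratum, total, seed=0):
    """Draw `total` items, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    population = sum(len(items) for items in items_by_stratum.values())
    # Proportional quotas, rounded down.
    quotas = {name: len(items) * total // population
              for name, items in items_by_stratum.items()}
    # Give the leftover slots to the strata with the largest remainders.
    leftover = total - sum(quotas.values())
    by_remainder = sorted(
        items_by_stratum,
        key=lambda name: (len(items_by_stratum[name]) * total) % population,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return {name: rng.sample(items, quotas[name])
            for name, items in items_by_stratum.items()}


# Assumed category sizes (illustrative only; they sum to 369).
sizes = {"OS": 24, "Office": 117, "Daily": 78, "Professional": 49, "Workflow": 101}
pool = {name: [f"{name}-{i}" for i in range(n)] for name, n in sizes.items()}
testsub = stratified_sample(pool, total=65)
print({name: len(chosen) for name, chosen in testsub.items()})
```

Sampling within each category keeps the subset's task mix close to that of the full test set, so ablation results are less likely to be skewed toward one kind of task.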

Learning from experience enhances the domain knowledge of GUI agents.

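Learning from the universal experience available as web knowledge allows Agent S to make informed plans across a wide range of tasks, and it has the largest impact. Learning from narrative and episodic memories synergizes effectively with web retrieval, and the results show how ablating these memories affects the agent's ability to handle complex tasks, underscoring the value of experiential learning. These results demonstrate that each component plays a critical role in enhancing the agent's domain knowledge. Removing all three components (w/o All) degrades performance significantly, which reveals the importance of learning from experience in the design.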

ACI elicits better reasoning abilities from LLMs and supports better agentic learning.

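Comparing the baseline with Agent S (ACI-only) highlights the enhanced reasoning abilities gained by incorporating the ACI. We also examined the impact of the ACI on agentic learning by integrating the experiential learning process. For the baseline, adding experiential learning improved overall performance only slightly. When it was added to Agent S (ACI-only), however, performance improved significantly, which demonstrates the ACI's effectiveness in enhancing agentic learning.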

Hierarchical planning supports long-horizon workflows.

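The (ACI-only + Experiential Learning) setup shows the performance of Agent S without hierarchical planning. The performance drop observed relative to the full Agent S (26.15% to 20.00%) shows how important hierarchical planning is for modeling long-horizon workflows. The effect of the hierarchical formulation is most pronounced when experiential learning is present, because the Manager can then produce more detailed and accurate plans in the subtask planning stage.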

Exploration, Continual Memory Update, and the Self-Evaluator are indispensable for memory construction.

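Removing exploration limits memory updates to the inference phase only. Removing the continual memory update means that only the memory obtained in the exploration phase is used, with no subsequent updates. Removing the self-evaluator means replacing the summarized experiences with the original full trajectories. The results show that ablating either the continual memory update or the self-supervised exploration phase degrades performance, with self-supervised exploration having a much larger impact. The ablation of the self-evaluator further shows the benefit of using summarized trajectories, rather than full trajectory exemplars, for planning.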

Generalization to Different Operating Systems

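We test the Agent S framework, with no modification, on WindowsAgentArena, a Windows OS benchmark released contemporaneously with our work. We compare Agent S against the Navi agent in a similar configuration: GPT-4o as the MLLM backbone, Accessibility Tree + Image as the input, and parsing with OCR. As the table shows, Agent S outperforms the Navi agent without any adaptation to the new Windows environment.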

Results of Success Rate (%) on WindowsAgentArena using GPT-4o and Image + Accessibility Tree input on the full test set of all 154 test examples.

BibTeX

@misc{agent,
  title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
  author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
  year={2024},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Understanding the AI Agentic Framework

The AI agentic framework is a modern approach that combines artificial intelligence (AI) with agent-based modeling. This combination aims to improve decision-making processes. With this framework, intelligent agents can work on their own within a system, which makes workflows smoother and promotes collaboration. By using machine learning and automation, the agentic framework creates a solid foundation for developing multi-agent systems that adjust to various situations.

Here are some key components of this framework:

Intelligent Agents: These software entities can take independent actions to achieve specific goals.
Decision-Making Algorithms: These algorithms help agents make informed choices based on the information they receive.
Agent Systems: This refers to groups of interconnected agents collaborating to complete complex tasks.

Microsoft and other tech leaders are using this framework to create smarter applications that need less human involvement.

Key Concepts of the Agentic Framework

The agentic framework includes several important concepts that are essential for its successful application:

Agent-Based Framework: A setup where individual agents work together to accomplish tasks, boosting efficiency.
Agentic Approach: This method encourages agents to act independently and highlights their ability to learn and adapt.
Workflows: Built into AI workplace assistants, these are the planned paths that agents follow to streamline processes and ensure smooth task execution.
Human-Agent Interaction: This is how humans communicate and guide the agents.

With a language such as Python, developers can apply design patterns, build adaptive agents, and use standard debugging methods. This helps create better feedback loops and improves the overall performance of the system.

Applications of AI Agentic Framework

The applications of the AI agentic framework are broad and relevant across various fields:

AI Framework Variations: Different types can be adjusted to meet specific industry needs, ensuring flexibility.
AI Solutions: From virtual assistants to intricate management systems, these solutions expand operational possibilities.
Agent Orchestration: This involves coordinating multiple agents to achieve unified results.
Security and Management: The framework helps boost organizational efficiency while upholding security standards.
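Agent orchestration, as listed above, can be pictured as a dispatcher that routes each task to a suitable agent and gathers the results. The sketch below is a deliberately simple illustration: `orchestrate` and the two toy agents are hypothetical and do not correspond to any particular framework's API.

```python
from typing import Callable, Dict, List


def orchestrate(tasks: List[dict], agents: Dict[str, Callable[[str], str]]) -> List[str]:
    """Route each task to the agent registered for its kind and collect the results."""
    results = []
    for task in tasks:
        handler = agents.get(task["kind"])
        if handler is None:
            # No agent can handle this kind of task; record that explicitly.
            results.append(f"unhandled: {task['payload']}")
        else:
            results.append(handler(task["payload"]))
    return results


# Two toy agents: one truncates text, one counts words.
agents = {
    "summarize": lambda text: text[:10],
    "count": lambda text: str(len(text.split())),
}
tasks = [
    {"kind": "count", "payload": "open the settings page"},
    {"kind": "summarize", "payload": "remove the account from mail"},
    {"kind": "translate", "payload": "hello"},
]
results = orchestrate(tasks, agents)
print(results)
```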

Prominent examples include platforms like GitHub and tools such as LangChain, showing how agentic AI can be implemented in real-world settings. These applications illustrate how intelligent systems can reshape business functions and enhance user experiences.

Benefits of Using an Agentic Framework

Using an agentic framework comes with many advantages:

Efficiency: It increases productivity by automating repetitive tasks, reducing the need for manual work.
Quality Management: The framework ensures consistent quality in results through structured processes.
Continuous Integration: Updates and improvements become easier, keeping the systems current and effective.
Cooperative Agents: It encourages collaboration among different agents, leading to improved problem-solving abilities.

This framework also addresses ethical concerns in AI, promoting transparency and responsible use of self-learning agents.

Challenges in Implementing Agentic Frameworks

While there are clear benefits, organizations may face a few challenges when adopting agentic frameworks:

Data Privacy: Protecting sensitive data is critical when implementing intelligent systems.
AI Governance: Setting regulations is necessary to manage the proper use and oversight of AI technologies.
Agent Performance Metrics: Finding suitable metrics to measure how well agents perform their tasks is essential.
Real-Time Agents: Managing agents in fast-paced environments requires advanced strategies and resilient systems.
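One concrete performance metric for agents is the success rate per task category, which is how benchmark results for GUI agents are commonly reported. The helper below is a minimal sketch; `success_rates` and the sample records are hypothetical.

```python
from collections import defaultdict


def success_rates(results):
    """Return success rate (%) per category and overall from (category, succeeded) pairs."""
    if not results:
        return {}
    totals = defaultdict(int)
    wins = defaultdict(int)
    for category, succeeded in results:
        totals[category] += 1
        wins[category] += int(succeeded)
    rates = {category: 100 * wins[category] / totals[category] for category in totals}
    rates["overall"] = 100 * sum(wins.values()) / sum(totals.values())
    return rates


records = [
    ("Daily", True),
    ("Daily", False),
    ("Professional", True),
    ("Professional", True),
]
rates = success_rates(records)
print(rates)
```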

Tackling these challenges is vital for successfully integrating the AI agentic model into existing systems to ensure safety and trustworthiness.

Conclusion

The AI agentic framework shows promise in the realm of artificial intelligence by providing a structured way to effectively utilize intelligent systems. By grasping its core concepts, applications, benefits, and challenges, organizations can better leverage AI to foster innovation and enhance efficiency.

Understanding the AI Agentic Framework

The AI agentic framework is a collection of ideas and methods aimed at creating intelligent systems that can act and make decisions on their own. This framework enhances collaboration between human users and artificial intelligence (AI) agents, promoting smooth workflows and effective automation.

Key aspects of the agent-based framework include:

Intelligent Agent Frameworks: These form the foundation for developing AI solutions that function in real-time.
Collaboration Mechanisms: Good communication among multiple agents boosts system performance.
Human-Agent Interaction: This part emphasizes how people can work alongside cognitive agents, leading to better experiences.
Multi-Agent Systems: Different agents work together to accomplish complex tasks, which may be too much for a single agent to handle.

You can see real-world applications of this framework in areas like healthcare, finance, and logistics, where AI applications enhance processes, lower mistakes, and improve results.

Key Components of an Agentic Approach

An agentic approach consists of essential components that define how it works and its effectiveness.

Agent Autonomy: The level of independence an agent has is crucial for effective automation.
Decision-Making Algorithms: These allow agents to evaluate situations and make smart choices based on current data.
Agent-Based Modeling: This method helps simulate interactions within a system, improving understanding and optimization.
Design Patterns: Established design patterns assist with programming agent systems, making them easier to maintain and scale.
Agent Cooperation: Successful implementation depends on agents working together smoothly.

A strong agentic model includes these components, enabling powerful agent technologies that drive innovation across various sectors.

Applications of the AI Agentic Framework

The AI agentic framework has many applications across different sectors, highlighting its flexibility and effectiveness.

Some noteworthy examples are:

Project Management: AI agents improve project workflows, ensuring tasks are completed quickly and on time.
Data Privacy: Intelligent agents help manage sensitive data while ensuring compliance with regulations like GDPR.
Autonomous Agents: These self-operating agents take care of repetitive tasks, such as data entry, so that humans can concentrate on strategic work.
Task-Oriented Agents: Designed to perform specific functions, these agents carry out tasks with great accuracy.

Leading companies like Microsoft and Nvidia utilize the agentic AI framework, showing how AI capabilities can be integrated effectively into their operations.

Benefits of Implementing Agentic Systems

Implementing agentic systems brings a variety of benefits that can boost efficiency and effectiveness in organizations:

Automation: Cuts down on manual work, speeding up task completion.
Ease of Use: Built with user experience in mind, making acceptance simple.
Real-Time Analytics: Offers instant feedback, supporting data-driven decisions.
AI Ethics: Complies with ethical standards, building trust with users.
Performance Metrics: Measures agent effectiveness, promoting continuous improvement.

These benefits explain why many organizations are adopting agentic frameworks to stay competitive in their fields.

Challenges and Considerations

While the agentic framework offers many chances for improvement, it also presents challenges that businesses should think about:

Security Risks: Protecting data and systems from cyber threats is crucial.
Complexity: Creating and implementing multi-agent systems can be intricate and time-consuming.
Data Governance: Organizations must follow regulations and best practices for data management.
AI Accountability: Figuring out who is responsible when AI makes decisions is an important concern.

Addressing these challenges requires a solid grasp of the framework's varieties and the underlying technologies, along with effective governance and accountability strategies in distributed AI systems.

Understanding the AI Agentic Framework

The AI Agentic Framework marks a significant change in how we design and use artificial intelligence (AI) systems. This framework aims to create intelligent systems that can make decisions on their own, work together with other agents, and adjust to changing environments. It serves as a foundational structure for cognitive agents to interact, manage workflows, and respond to dynamic situations effectively.

Key aspects include:

Agent-based Approach: This involves using independent entities that act according to specific guidelines and goals.
Multi-Agent Systems: These systems enable various agents to collaborate, which boosts overall efficiency and effectiveness.
Decision-Making Algorithms: These sophisticated algorithms help agents make informed choices by analyzing available data and context.

By leveraging this framework, AI can perform tasks more like humans do, leading to increased productivity and innovative applications across various fields.

Key Components of Agentic AI Systems

To build successful agentic AI systems, several key components need to be considered:

Management Tools: These tools help streamline coordination among agents to ensure smooth operation.
Automation Features: Automation minimizes the need for manual input, which enhances process efficiency.
Reasoning Capabilities: Intelligent agents utilize strong reasoning skills to evaluate situations and make sound decisions.
Design Patterns: By implementing established design patterns, developers can effectively structure complex agent systems.
Debugging Tools: These tools are vital for maintaining system reliability by quickly identifying and fixing issues.
Agent Collaboration Mechanisms: Encouraging cooperation among agents is essential for achieving complex objectives.

Together, these components work to enhance the effectiveness of the agentic approach, paving the way for advanced AI solutions.

Applications of the Agentic Framework in AI

The agentic framework supports a wide range of applications that can greatly benefit different industries:

Virtual Agents: Often used in customer support, these agents provide 24/7 assistance, improving user satisfaction.
Autonomous Agents: In logistics and supply chain management, these agents optimize delivery processes.
Human-Agent Interaction: The framework helps improve user interfaces for better engagement and accessibility when used to build AI agent apps such as AI browser automation.
Data Integration: It enables seamless connectivity between various data sources, which enriches decision-making.
Feedback Mechanisms: These allow agents to learn from interactions, enhancing their capabilities over time.

This broad versatility illustrates how the framework adapts to different sectors, from finance to healthcare.

Challenges and Considerations

While the AI agentic framework holds great potential, it also brings certain challenges:

Data Privacy Concerns: With the increase in data usage, protecting personal information becomes essential.
Security Risks: Addressing vulnerabilities is crucial to safeguarding against cyber threats.
Ethical Considerations: The deployment of AI must follow ethical standards to prevent misuse.
Project Management Complexity: Coordinating multiple agent systems requires effective leadership and clear guidelines.
Performance Metrics: Setting performance metrics for agents is important for measuring success and adjusting strategies.

Tackling these challenges is important for the successful rollout of agentic systems, ensuring they remain efficient, secure, and ethically sound.

Overall, the AI Agentic Framework lays a solid foundation for developing advanced AI systems. By focusing on collaborative, intelligent agents, organizations can reach new heights in efficiency and creativity. As you explore the potential applications of this framework, keep in mind its benefits and the challenges that may arise to maintain a balanced approach to AI deployment.
