The ALRM framework consists of three modules. The Task Planner Agent, built on ReAct, decomposes high-level instructions into subtasks through iterative cycles of reasoning and feedback. The Task Executor Agent translates these subtasks into actions via two strategies: Code-as-Policy (CAP), which generates Python code to call functions in one run, and Tool-as-Policy (TAP), which generates nested tool calls. The API Server provides RESTful endpoints for robotic control, including pick-and-place, motion, and perception. Execution results are returned as observations, enabling the planner to refine subsequent actions until the task is completed or the step limit is reached.
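To make the difference between the two executor strategies concrete, here is a minimal sketch, not the ALRM implementation. The stub functions `detect`, `pick`, and `place`, the object names, and the JSON-like shape of the nested tool call are all hypothetical stand-ins; the text does not specify the actual API names or call format. The CAP path runs one generated Python snippet in a single pass, while the TAP path evaluates a nested tool call from the innermost call outward.

```python
# Hypothetical robot API stubs. The real ALRM endpoints are not named in the text.
log = []

def detect(obj):
    """Pretend perception call: return a fixed pose for the named object."""
    log.append(("detect", obj))
    return {"object": obj, "pose": (0.1, 0.2, 0.0)}

def pick(target):
    """Pretend grasp: return the name of the object now held."""
    log.append(("pick", target["object"]))
    return target["object"]

def place(held, destination):
    """Pretend placement of the held object onto the destination."""
    log.append(("place", held, destination["object"]))
    return "ok"

TOOLS = {"detect": detect, "pick": pick, "place": place}

# Code-as-Policy style: the model emits one Python snippet that is run once.
cap_output = """
cube = detect("red_cube")
box = detect("blue_box")
held = pick(cube)
result = place(held, box)
"""

def run_cap(code):
    scope = dict(TOOLS)   # expose only the tool functions to the snippet
    exec(code, scope)     # a real system would sandbox generated code
    return scope["result"]

# Tool-as-Policy style: a nested tool call whose arguments may be tool calls.
tap_output = {
    "tool": "place",
    "args": [
        {"tool": "pick", "args": [{"tool": "detect", "args": ["red_cube"]}]},
        {"tool": "detect", "args": ["blue_box"]},
    ],
}

def run_tap(call):
    # Resolve inner calls first, then invoke the outer tool with their results.
    if isinstance(call, dict) and "tool" in call:
        args = [run_tap(a) for a in call["args"]]
        return TOOLS[call["tool"]](*args)
    return call  # plain literal argument

if __name__ == "__main__":
    print(run_cap(cap_output))
    print(run_tap(tap_output))
```

Both paths end in the same `place` call here; they differ only in how the model's output is structured and executed.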
The proposed LLM-based agent architecture for solving high-level robotic arm manipulation tasks. The architecture contains three main modules: (1) task planner agent, (2) task executor agent, and (3) API server.
TBC...
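The planner and executor interaction described above (plan a subtask, execute it, feed the observation back, stop on completion or at the step limit) can be sketched as a simple loop. This is an illustrative sketch only: `run_task`, `scripted_planner`, `echo_executor`, and the default `max_steps` value are hypothetical, and the scripted planner stands in for an LLM call.

```python
def run_task(instruction, planner, executor, max_steps=5):
    """Alternate planning and execution until done or the step limit is hit."""
    observations = []
    for step in range(max_steps):
        # The planner sees the instruction plus all observations so far.
        subtask = planner(instruction, observations)
        if subtask is None:  # planner signals that the task is complete
            return {"status": "completed", "steps": step,
                    "observations": observations}
        # The executor turns the subtask into actions and reports back.
        observations.append(executor(subtask))
    return {"status": "step_limit_reached", "steps": max_steps,
            "observations": observations}

def scripted_planner(instruction, observations):
    """Stand-in for an LLM planner: walks through a fixed plan."""
    plan = ["locate cup", "pick cup", "place cup on shelf"]
    done = len(observations)
    return plan[done] if done < len(plan) else None

def echo_executor(subtask):
    """Stand-in for the executor and API server: always reports success."""
    return f"done: {subtask}"

if __name__ == "__main__":
    print(run_task("put the cup on the shelf", scripted_planner, echo_executor))
```

In this sketch the planner is consulted once more after the last subtask so that it can declare completion; with a step limit smaller than the plan length, the loop ends with `step_limit_reached` instead.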
We created a benchmark of 54 high-level, linguistically diverse tasks across three environments to evaluate LLM performance under the ALRM framework.
Evaluation uses an LLM-as-a-judge, which compares the natural language task, the ground-truth action sequence, and the LLM-generated actions, assigning a score of 0 (no subtasks solved), 1 (some subtasks solved), or 2 (all subtasks solved).
Evaluation involved 10 LLMs across CAP and TAP modes, grouped into large-scale (GPT-5, Gemini-2.5-Pro, Claude-4.1-Opus, DeepSeek-V3.1) and small-scale (Falcon-H1-7B, Qwen3-8B, Llama-3.1-8B, DeepSeek-R1-7B, Granite-3.3-8B, Mistral-7B). Three distinct LLMs (GPT-4.1, Claude-Sonnet-4, Gemini-2.5-Flash) served as judges, using majority vote or averaging in case of disagreement.
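One possible reading of the judge aggregation is sketched below: take the score that a strict majority of judges agree on, and fall back to the mean when no majority exists. The text does not spell out the exact rule, so this fallback order and the function name `aggregate_judges` are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

VALID_SCORES = (0, 1, 2)  # 0: no subtasks, 1: some subtasks, 2: all subtasks solved

def aggregate_judges(scores):
    """Combine per-judge scores: majority vote first, mean as a fallback."""
    if not scores:
        raise ValueError("at least one judge score is required")
    for s in scores:
        if s not in VALID_SCORES:
            raise ValueError(f"invalid score: {s!r}")
    value, count = Counter(scores).most_common(1)[0]
    if count > len(scores) / 2:   # strict majority agrees on one score
        return float(value)
    return float(mean(scores))    # full disagreement: average the scores

if __name__ == "__main__":
    print(aggregate_judges([2, 2, 1]))
    print(aggregate_judges([0, 1, 2]))
```

With three judges and three possible scores, the fallback is reached only when each judge gives a different score.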
In summary, the main findings related to success rate are:
In summary, the main findings related to latency are:
@misc{santos2026alrmagenticllmrobotic,
title={ALRM: Agentic LLM for Robotic Manipulation},
author={Vitor Gaboardi dos Santos and Ibrahim Khadraoui and Ibrahim Farhat and Hamza Yous and Samy Teffahi and Hakim Hacid},
year={2026},
eprint={2601.19510},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2601.19510},
}