One major challenge in deploying autonomous agents is building systems that can adapt to changes in their environments without the need to retrain the underlying large language models (LLMs).

Memento-Skills, a new framework developed by researchers at multiple universities, addresses this bottleneck by giving agents the ability to develop their skills by themselves. "It adds its continual learning capability to the existing offering in the current market, such as OpenClaw and Claude Code," Jun Wang, co-author of the paper, told VentureBeat.

Memento-Skills acts as an evolving external memory, allowing the system to progressively improve its capabilities without modifying the underlying model. The framework provides a set of skills that can be updated and expanded as the agent receives feedback from its environment.

For enterprise teams running agents in production, that matters. The alternative — fine-tuning model weights or manually building skills — carries significant operational overhead and data requirements. Memento-Skills sidesteps both.

The challenges of building self-evolving agents

Self-evolving agents are crucial because they overcome the limitations of frozen language models. Once a model is deployed, its parameters remain fixed, restricting it to the knowledge encoded during training and whatever fits in its immediate context window.

Giving the model an external memory scaffolding enables it to improve without the costly and slow process of retraining. However, current approaches to agent adaptation largely rely on manually-designed skills to handle new tasks. While some automatic skill-learning methods exist, they mostly produce text-only guides that amount to prompt optimization. Other approaches simply log single-task trajectories that don’t transfer across different tasks.

Furthermore, when these agents try to retrieve relevant knowledge for a new task, they typically rely on semantic similarity routers, such as standard dense embeddings; high semantic overlap does not guarantee behavioral utility. An agent relying on standard RAG might retrieve a "password reset" script to solve a "refund processing" query simply because the documents share enterprise terminology.

"Most retrieval-augmented generation (RAG) systems rely on similarity-based retrieval. However, when skills are represented as executable artifacts such as markdown documents or code snippets, similarity alone may not select the most effective skill," Wang said. 

How Memento-Skills stores and updates skills

To solve the limitations of current agentic systems, the researchers built Memento-Skills. The paper describes the system as “a generalist, continually-learnable LLM agent system that functions as an agent-designing agent.” Instead of keeping a passive log of past conversations, Memento-Skills creates a set of skills that act as a persistent, evolving external memory.

These skills are stored as structured markdown files and serve as the agent's evolving knowledge base. Each reusable skill artifact is composed of three core elements. It contains declarative specifications that outline what the skill is and how it should be used. It includes specialized instructions and prompts that guide the language model's reasoning. And it houses the executable code and helper scripts that the agent runs to actually solve the task.

Memento-Skills achieves continual learning through its "Read-Write Reflective Learning" mechanism, which frames memory updates as active policy iteration rather than passive data logging. When faced with a new task, the agent queries a specialized skill router to retrieve the most behaviorally relevant skill — not just the most semantically similar one — and executes it.

After the agent executes the skill and receives feedback, the system reflects on the outcome to close the learning loop. Rather than just appending a log of what happened, the system actively mutates its memory. If the execution fails, an orchestrator evaluates the trace and rewrites the skill artifacts. This means it directly updates the code or prompts to patch the specific failure mode. In case of need, it creates an entirely new skill.

M