Best Practices

Exploring Agentic AI Systems: A Hands-On Guide to Building Secure Agent Workflows

Learn how to design autonomous AI workflows using real-world examples, secure evaluation layers, and open-source tools like Autogen.

Introduction to AI Agents

The landscape of artificial intelligence has evolved dramatically in recent years, with generative AI systems capable of creating text, images, code, and other content with unprecedented sophistication. AI agents have now emerged as autonomous systems that can understand goals, make decisions, and take actions to accomplish specific tasks.

Want to learn more about the difference between AI Agents and Agentic AI? Watch this webinar.

The Rise of Agentic AI

Agentic AI represents the next step in AI evolution, moving beyond traditional generative AI models. While generative AI focuses on producing content in response to prompts, AI agents are designed to be interactive and purposeful, capable of maintaining context over extended interactions and executing multi-step processes. These agents can:

  • Navigate websites
  • Use external tools and APIs
  • Analyze complex data
  • Collaborate with other agents

Although the concept of AI agents has roots in early AI research on autonomous systems, mainstream adoption accelerated in 2023, fueled by large language models and frameworks that enable interaction with external systems. Companies like OpenAI, Anthropic, and Google are leading the charge in deploying agent-based systems, with a growing ecosystem of startups specializing in industry-specific applications.

A recent webinar by Arthur’s co-founder and Chief Scientist, John Dickerson, provided valuable insights into AI agents, covering their history, design patterns, advantages, and challenges. The session also explored emerging trends and predictions for AI agents in 2025.

Challenges of AI Agents

1. Security & Access Control

AI agents require broad system access to perform their tasks effectively, creating unique security risks. Unauthorized access to sensitive data and exploitation by bad actors are significant concerns. Organizations must implement strict security controls to balance access with risk mitigation.

2. Monitoring & Debugging Complexity

Unlike traditional software, AI agents make dynamic, multi-step decisions that may not follow predictable paths. This unpredictability makes debugging and quality assurance challenging, requiring robust monitoring tools to trace and understand agent behavior.

3. Governance & Accountability

Determining accountability for AI agent errors remains a major challenge. Is the developer, the deploying organization, or the agent itself responsible? This issue is particularly significant in regulated industries, where compliance and liability considerations are critical.

Use Case: Implementing a Task-Based Orchestrator in Financial Services

To illustrate the potential of AI agents, consider a large financial services firm employing analysts who research financial securities. Traditionally, analysts spend days gathering data on stock price movements, fund strategies, and risk exposure. However, a Task-Based Orchestrator can automate this workflow, drastically reducing turnaround time.

The Task-Based Orchestrator Workflow

  1. User Agent captures user queries and facilitates interaction.
  2. Orchestrator Agent processes the request, breaking it into structured subtasks.
  3. Tool Execution: The agent invokes tools to retrieve data, perform calculations, and generate insights.
  4. Aggregation: The Orchestrator Agent consolidates outputs into an actionable response.
  5. Delivery: The final output is returned to the user, ensuring efficiency and accuracy.
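Before looking at the real implementation, a minimal, self-contained sketch can make these five steps concrete. Everything in it is illustrative: the tool functions are stand-ins and the subtask plan is hard-coded, whereas the actual orchestrator (shown below) uses an LLM and the Autogen runtime to do this work.

import asyncio
from typing import Callable, Dict, List

# Illustrative stand-ins for real tools; the actual tools are defined later in this post.
async def fetch_stock_data(ticker: str) -> str:
    return f"(historical prices for {ticker})"

async def define_term(term: str) -> str:
    return f"(definition of {term})"

TOOLS: Dict[str, Callable] = {"fetch_stock_data": fetch_stock_data, "define_term": define_term}

async def orchestrate(user_query: str) -> str:
    # 2. Break the request into structured subtasks (in practice an LLM plans these).
    subtasks = [("fetch_stock_data", "AAPL"), ("define_term", "implied volatility")]
    # 3. Invoke a tool for each subtask.
    results: List[str] = [await TOOLS[name](arg) for name, arg in subtasks]
    # 4. Consolidate tool outputs into a single, actionable response.
    return "\n".join(results)

# 1 & 5. The User Agent captures the query and delivers the final answer.
print(asyncio.run(orchestrate("How has AAPL moved, and what is implied volatility?")))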

Building the Orchestrator: Implementation

The following implementation is built using the Autogen library (https://github.com/microsoft/autogen), which provides extensible APIs for building and composing agentic components. For the full implementation, along with instructions to run the code, check out the GitHub repository where it is hosted (https://github.com/arthur-ai/arthur-autogen-agentic-demo).

1. Creating Essential Tools

To implement this workflow, AI-driven tools are needed for:

  • Fetching historical stock data
  • Defining financial terms
  • Pricing options using the Black-Scholes model (a sketch of this calculation follows the list)
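Of these, the options-pricing tool is the most math-heavy: its core is the Black-Scholes formula. Below is a rough, self-contained sketch of that calculation; the function names and example values are illustrative rather than taken from the repo.

from math import erf, exp, log, sqrt

def _norm_cdf(x: float) -> float:
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot: float, strike: float, rate: float, sigma: float, t: float) -> float:
    # Price of a European call option under Black-Scholes (no dividends).
    d1 = (log(spot / strike) + (rate + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return spot * _norm_cdf(d1) - strike * exp(-rate * t) * _norm_cdf(d2)

# Example: at-the-money call, 1 year to expiry, 5% risk-free rate, 20% volatility.
print(f"{black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0):.2f}")  # ≈ 10.45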

Example: Stock Data Retrieval Tool

import yfinance as yf
from autogen_core import CancellationToken
from autogen_core.tools import BaseTool

# StockDataInput and StockDataOutput are the tool's input/output schemas (defined elsewhere in the repo).
class StockInfoTool(BaseTool[StockDataInput, StockDataOutput]):
    def __init__(self):
        super().__init__(
            args_type=StockDataInput,
            return_type=StockDataOutput,
            name="fetch_stock_data",
            description="Fetch only historical stock data for a given ticker.",
        )

    async def run(self, args: StockDataInput, cancellation_token: CancellationToken) -> StockDataOutput:
        try:
            stock = yf.Ticker(args.ticker)
            data = stock.history(period="1d")
            formatted_data = (
                f"Open: ${data['Open'].iloc[0]:.2f}, High: ${data['High'].iloc[0]:.2f}, "
                f"Low: ${data['Low'].iloc[0]:.2f}, Close: ${data['Close'].iloc[0]:.2f}"
            )
            return StockDataOutput(data=formatted_data)
        except Exception as e:
            raise RuntimeError(f"Error fetching stock data for {args.ticker}: {e}") from e
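The StockDataInput and StockDataOutput schemas are defined elsewhere in the repo. In Autogen, tool arguments and results are typically Pydantic models, so a minimal sketch of what they might look like, with field names inferred from the tool above, would be:

from pydantic import BaseModel, Field

class StockDataInput(BaseModel):
    ticker: str = Field(description="Ticker symbol to look up, e.g. 'AAPL'")

class StockDataOutput(BaseModel):
    data: str = Field(description="Formatted open/high/low/close prices for the ticker")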

2. Developing the Orchestrator Agent

The Orchestrator Agent coordinates tool usage and efficiently processes user interactions.

class SoloOrchestratorAssistantAgent(RoutedAgent):
    def __init__(self, name: str, description: str, model_client: ChatCompletionClient, initial_message: AssistantTextMessage = None) -> None:
        super().__init__(description)
        # Seed the buffered context with the initial message, if one was provided.
        initial_messages = [UserMessage(content=initial_message.content, source="user")] if initial_message else []
        self._model_context = BufferedChatCompletionContext(buffer_size=20, initial_messages=initial_messages)
        self._name = name
        self._model_client = model_client
        self._system_message = [SystemMessage(content="I am an AI assistant that helps parse and understand tasks.")]
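The excerpt above only shows the constructor. As a purely hypothetical sketch (the repo's actual handler appears later, in the guardrails section), the agent could use this buffered context and model client to break an incoming request into subtasks roughly like this:

# Hypothetical sketch: ask the model to decompose the request into subtasks
# using the buffered context and model client configured in __init__.
@message_handler
async def plan_subtasks(self, message: UserTextMessage, ctx: MessageContext) -> None:
    await self._model_context.add_message(UserMessage(content=message.content, source=message.source))
    completion = await self._model_client.create(
        self._system_message + list(await self._model_context.get_messages())
    )
    logger.info(f"Planned subtasks: {completion.content}")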

3. Managing Workflow Execution

A core Workflow Manager coordinates AI agent interactions, tracks conversations, and ensures smooth execution.

class WorkflowManager:
    def __init__(self):
        self.state_persister = MockPersistence()

    async def trigger_agentic_workflow(self, config_file: str, latest_user_input: Optional[str] = None) -> None:
        # Load the serialized model client configuration and instantiate the client.
        with open(config_file) as f:
            model_config = json.load(f)
        model_client = ChatCompletionClient.load_component(model_config)

        # Register the orchestrator with a single-threaded runtime and start processing.
        runtime = SingleThreadedAgentRuntime()
        await OrchestratorAgent.register(runtime, "Orchestrator", lambda: OrchestratorAgent("Orchestrator", model_client=model_client))
        runtime.start()
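With the orchestrator registered, kicking off the workflow is a matter of constructing the manager and pointing it at a model configuration file. A minimal usage sketch, assuming a JSON file containing a serialized ChatCompletionClient component config (the file name and query below are illustrative):

import asyncio

async def main() -> None:
    manager = WorkflowManager()
    # "model_config.json" is an illustrative path; it should hold a serialized
    # ChatCompletionClient component configuration (e.g. for an OpenAI model).
    await manager.trigger_agentic_workflow(
        "model_config.json",
        latest_user_input="Summarize AAPL's recent price action and price a 1-year at-the-money call.",
    )

if __name__ == "__main__":
    asyncio.run(main())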

Implementing Guardrails with Arthur’s Evaluation Engine

To secure the AI workflow, guardrails are implemented at two levels:

1. Tool-Level Protection

Guardrails at this level validate the output of each individual tool execution before it is passed back to the agent, catching unsafe or sensitive data early.

async def validate_tool_responses(self, tool_responses: list[dict], message: str, context: list[LLMMessage], conversation_id: str) -> str:
    """
    Validates tool responses using the evaluation engine to ensure safety and relevance.
    """
    tool_context = []
    for tool_response in tool_responses:
        validation_metric = get_eval_engine_metric("tools", tool_response["name"], self._config)
        inference_result = InferenceResult(
            await send_response_to_eval_engine(tool_response["response"], validation_metric, conversation_id, context)
        )
        tool_context.append(inference_result.get_pass_fail_string())
    tool_validation_message = "".join(tool_context)
    await self._model_context.add_message(SystemMessage(content=f"System: {tool_validation_message}"))
    # Perform PII check at the tool level
    PII_status = inference_result.return_pii()
    logger.debug(f"[ToolValidation] PII status: {PII_status}")
    if not PII_status:
        tool_validation_message += "Warning: The response contains sensitive information and cannot be shared."
    return tool_validation_message
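The helpers used above (get_eval_engine_metric, send_response_to_eval_engine, and InferenceResult) come from the demo repo rather than from Autogen. Based only on how they are called here, a simplified InferenceResult wrapper might look roughly like the following; the response schema and field names are assumptions, so consult the repo for the real implementation.

class InferenceResult:
    # Illustrative wrapper over an evaluation-engine response; the real schema lives in the repo.
    def __init__(self, response: dict):
        self._rules = response.get("rule_results", [])  # assumed field name

    def get_pass_fail_string(self) -> str:
        # Summarize each rule outcome so it can be appended to the model context.
        return " | ".join(f"{r.get('name')}: {r.get('result')}" for r in self._rules)

    def return_pii(self) -> bool:
        # Assumed convention: True when every PII-related rule passed (no sensitive data found).
        return all(r.get("result") == "Pass" for r in self._rules if "pii" in str(r.get("name", "")).lower())

    def return_hallucinations(self) -> bool:
        # Assumed convention: True when a hallucination-related rule failed.
        return any(r.get("result") == "Fail" for r in self._rules if "hallucination" in str(r.get("name", "")).lower())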

2. Orchestrator-Level Protection

At this level, the orchestrator's inputs and outputs are validated through the evaluation engine to ensure data integrity and compliance.

Changes to the Orchestrator:

@message_handler
async def handle_message(self, message: UserTextMessage, ctx: MessageContext) -> None:
    """
    Processes user messages, coordinates with evaluation engine, and manages tool interactions.
    """
    conversation_id = str(uuid.uuid4())
    messages = await self._model_context.get_messages()
    if len(messages) == 0:
        await self._model_context.add_message(SystemMessage(content=f"System:{self._system_message}"))
        logger.info(f"Received user message from {message.source}")
    result, tool_validation_message = await self.message_loop(message, ctx, self._system_message, 0, conversation_id)
    # Evaluation Engine call and final response formatting
    eval_engine_response = await send_prompt_to_eval_engine(message.content, self._orchestrator_task, conversation_id)
    inference_result = InferenceResult(eval_engine_response)
    await self._model_context.add_message(SystemMessage(content=f"Eval Engine response: {inference_result.get_pass_fail_string()}", source=message.source))
    hallucination_status = inference_result.return_hallucinations()
    logger.debug(f"[SoloOrchestratorAssistantAgent] Hallucination status: {hallucination_status}")
    if hallucination_status:
        result = "The answer is not safe to share."
    final_response = f"[Trace ID: {conversation_id}] {result}"
    speech = AssistantTextMessage(content=final_response, source=self.metadata["type"])

In a graphical representation of this architecture, each tool is protected by an Evaluation Model (represented by a purple bubble) and each agent is protected by one or more Evaluation Models (represented by purple bands).

Conclusion

AI agents are transforming workflows across industries by automating complex tasks. However, ensuring security, accuracy, and compliance is essential for responsible deployment. By implementing structured orchestrators, robust monitoring, and governance frameworks like Arthur’s Evaluation Engine, organizations can harness the full potential of AI agents while mitigating risks. As AI continues to evolve, the future of agentic systems promises greater efficiency, adaptability, and innovation.

Ready to take the next step? Explore the full implementation on GitHub to dive into the code, or reach out to our team at product@arthur.ai to learn how Arthur’s Evaluation Engine can help you deploy AI agents safely and at scale.