What is the difference between an AI agent and an AI workflow?

A workflow is a system where the steps are predefined in code. An agent is a system where the LLM itself decides what steps to take and in what order. The key difference is who controls the logic — the developer or the model.
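As a rough illustration of that distinction, here is a minimal sketch. The tool names (`search`, `summarize`) and the `scripted_policy` function are hypothetical stand-ins, not part of any real library; in a real agent, the policy would be an LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical tools; names and behavior are illustrative only.
def search(text: str) -> str:
    return f"results for {text}"

def summarize(text: str) -> str:
    return f"summary of {text}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}

def run_workflow(query: str) -> str:
    """Workflow: the developer fixes the steps and their order in code."""
    return summarize(search(query))

def run_agent(
    query: str,
    choose_next: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> str:
    """Agent: a policy (normally an LLM) picks the next tool, or 'stop', each turn."""
    state = query
    history: List[str] = []
    for _ in range(max_steps):  # cap the loop so a bad policy cannot run forever
        action = choose_next(state, history)
        if action == "stop":
            break
        state = TOOLS[action](state)
        history.append(action)
    return state

def scripted_policy(state: str, history: List[str]) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    plan = ["search", "summarize"]
    return plan[len(history)] if len(history) < len(plan) else "stop"
```

Both `run_workflow("llm firewalls")` and `run_agent("llm firewalls", scripted_policy)` walk the same two steps here, but only in the agent version could a different policy choose a different path.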

Do I need to know how to code to build an AI agent?

No. Tools like Claude Code let you describe what you want in plain language and handle most of the implementation. What matters more is clarity about what the system should do and what a good result looks like.

Why is observability important for AI agents?

AI systems are non-deterministic — they can behave differently across runs. Observability traces each step of an agent's execution so you can debug failures, understand outputs, and improve performance over time.
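To make the idea of tracing each step concrete, here is a minimal sketch using only the Python standard library. It is not the Arthur Engine API; the `traced` decorator, the in-memory `TRACE` list, and the `retrieve` and `answer` steps are all assumptions made for illustration.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory span log; a real system would export these records somewhere durable.
TRACE: List[Dict[str, Any]] = []

def traced(step_name: str) -> Callable:
    """Record inputs, output or error, and duration for each call to a step."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span: Dict[str, Any] = {"step": step_name, "args": args, "kwargs": kwargs}
            start = time.perf_counter()
            try:
                span["output"] = fn(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)  # keep the failure visible in the trace
                raise
            finally:
                span["seconds"] = time.perf_counter() - start
                TRACE.append(span)
        return wrapper
    return decorator

# Hypothetical agent steps, stubbed so the sketch runs without a model.
@traced("retrieve")
def retrieve(question: str) -> str:
    return f"context for {question}"

@traced("answer")
def answer(question: str, context: str) -> str:
    if not context:
        raise ValueError("empty context")
    return f"answer to {question} using {context}"
```

After a run, `TRACE` holds one record per step in call order, including failed steps, which is the raw material for debugging a run that behaved differently from the last one.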

What is the Arthur Engine?

The Arthur Engine is a free, open-source tool for AI observability and evaluation. It traces every step of an AI agent or workflow so teams can see what happened, measure performance, and catch issues before users do.

2023 has been an unprecedented year in many ways, especially for the AI space. Since the launch of ChatGPT in November 2022, there have been countless innovations around large language models and their capabilities. It’s never been more important for enterprises to be able to deploy LLMs into mission-critical applications quickly and safely.

At Arthur, we spent the year working on that and more. From the launch of three new LLM-centric products, to events and meetups at our office, to conferences and award ceremonies around the country, it’s been a year to remember. If you’ve been along for the ride, we appreciate your continued support—and if you’re just starting to follow the Arthur journey, welcome!

Without further ado, we present: Arthur’s 2023 Wrapped.

Product Launches

Arthur Shield

In May, we launched Arthur Shield: the world’s first firewall for LLMs. Shield is our solution to help companies deploy LLM applications like ChatGPT faster and more safely, helping to identify and resolve issues before they become costly business problems—or worse, result in harm to customers. Specifically, Shield protects against serious risks like hallucinations, prompt injection, toxic language generation, and sensitive data leakage.

Arthur Bench

In August, we followed up the wildly successful Shield launch with a new product: Arthur Bench. Bench is an open-source evaluation product that compares LLMs, prompts, and hyperparameters for generative text models. This enables businesses to compare how different LLMs will perform in real-world scenarios so they can make informed, data-driven decisions when integrating the latest AI technologies into their operations. Bench’s code is available in our GitHub repo.

In conjunction with the announcement of Arthur Bench, we also shared work from our Generative Assessment Project (GAP). GAP is an ongoing research initiative ranking the strengths and weaknesses of LLM offerings from industry leaders like OpenAI, Anthropic, and Meta, as well as other open-source models.

Arthur Chat

Last but not least, we introduced Arthur Chat this month. Chat is a turnkey, secure chat platform that empowers companies to quickly and safely deploy AI-powered chat apps leveraging their proprietary enterprise data. Not only does Chat’s flexibility allow enterprises to easily switch between language models, but it also has Arthur Shield built in, ensuring protection and real-time monitoring against risks like hallucinations, prompt injections, and data leakage.

Events

One of the more exciting things to happen this year was that we kicked off Ground Truth, our event series that features talks from the best and brightest in AI and ML.

Rachel Cummings on Differential Privacy

First up was Rachel Cummings, Associate Professor of Industrial Engineering and Operations Research at Columbia University. She joined us at Arthur HQ for a talk about differential privacy and public policy as they relate to machine learning and data science, as well as a Q&A session about her career and predictions for the future of the field.

Diego Oppenheimer on the Future of MLOps

Diego Oppenheimer is a Partner at Factory HQ, a venture fund specialized in AI investments, and was previously an executive vice president at DataRobot as well as the founder & CEO of Algorithmia. He sat down with Arthur’s CEO Adam Wenchel back in April to chat about the future of MLOps, LLMs, and other new and exciting developments in the space.

Jacopo Tagliabue on Recommender Systems

Jacopo Tagliabue was co-founder and CTO of Tooso, an NLP startup in San Francisco acquired by Coveo. He led Coveo’s AI and MLOps roadmap from scale-up to IPO, and built out Coveo Labs, an applied R&D practice rooted in collaboration, open source, and open science.

Jacopo gave a compelling presentation about testing recommender systems through RecList, a behavior-based methodology he co-created. John Dickerson, Arthur’s Chief Scientist, also hosted a Q&A/fireside chat with Jacopo where they further discussed recommender systems, MLOps, and Jacopo’s recent research, which sits at the intersection of language, learning, and retrieval.

The Future of LLMs with Arthur, MosaicML, LangChain, and Weaviate

Our biggest Ground Truth event yet, “The Future of LLMs” featured an all-star lineup of folks from the LLM world:

Angela McNeal, Ex-Palantir AI, Co-Founder & CEO
Jonathan Frankle, Chief Scientist, MosaicML
Bob van Luijt, Co-Founder & CEO, Weaviate
Harrison Chase, Co-Founder & CEO, LangChain
John Dickerson, Co-Founder & Chief Scientist, Arthur
Adam Wenchel, Co-Founder & CEO, Arthur

They discussed their experiences building and monetizing successful LLM companies, what’s next in the world of LLMs, and more.

Webinars

We also launched a series of webinars this year, focused on a variety of hot topics in the industry as well as some of our own research.

These sessions were hosted by our talented team of data scientists, researchers, and engineers.

Awards

We were thrilled to receive a number of industry awards this year, both as a company and as individuals.

Built In’s 2023 Best Places to Work

We started the year off by being honored for the second year in a row by Built In. Specifically, we were named to the following lists: New York City Best Startups to Work For, New York City Best Places to Work, and U.S. Best Startups to Work For.

Crunchbase’s 2023 Influential Women in Sales

Our Commercial Accounts Lead, Victoria Vassileva, was honored by Crunchbase on their 2023 Influential Women in Sales list. Victoria is an incredible member of the Arthur team who always leads with passion, drive, and a constant focus on our mission to make AI better for everyone.

VentureBeat’s 2023 Women in AI Awards

Victoria was also chosen as a nominee for VentureBeat’s Women in AI Awards in the “Responsibility and Ethics of AI” category. Victoria’s leadership and passion for responsible AI are an inspiration to our entire team. She has spoken about responsible and inclusive innovation on a panel that Arthur co-hosted with Out in Tech, and has also appeared on the Let’s Chat Ethics podcast and a Tech in Motion panel about the future of AI.

Bloomberg’s 2023 New Economy Catalysts

Recently, our CEO Adam Wenchel was selected as a 2023 Bloomberg New Economy Catalyst, joining a global community of leaders and innovators whose ideas are reshaping our world for the better. The full list includes leaders from 12 countries around the globe.

2023 Women in AI Awards

One of our incredibly talented ML Engineers, Teresa Datta, was awarded “Young AI Role Model of the Year” at the 2023 Women in AI Awards. Teresa is a rising star in the field of responsible AI who takes a human-centered approach to analyzing AI/ML within larger sociotechnical systems. Her work has appeared at SaTML and ICLR.

2023-2024 Cloud Awards

We were also honored to make the shortlist for the Cloud Awards in two categories: “Best Use of AI in Cloud Computing” and “Cloud Development Innovation of the Year.”

From being a founding member of the Amazon Web Services Generative AI Center of Excellence to speaking at the AI Summit New York and the Wall Street Journal’s Tech Live, there’s so much more we could add to this list.

What we’re most proud of from 2023 is that, by enabling enterprises to deploy LLMs quickly and safely through our suite of LLM-centered products, we have continued delivering on our mission to make AI better for everyone. We can’t wait for what’s to come in 2024 and beyond!

Want to be the first to know about what’s new with Arthur (as well as MLOps and LLMOps at large)? Subscribe to our newsletter and follow us on LinkedIn and Twitter.
