Building an AI Ops Agent: A Cloud Engineer's Journey | Ranjan Kumar

As a cloud engineer with years of experience in infrastructure, I’m venturing into uncharted territory: building an AI ops agent from scratch. I’ve never worked with AI before, and I’m excited (and a bit terrified) to dive into this new world.

My goal is to create an ops assistant that can chat naturally about our systems, search through internal docs, pull logs and metrics, analyze data, and take actions like restarting services or scaling resources. It’s ambitious, but I’m hoping to create something truly useful.

I’ve got a few questions for those who have done this before:

## Choosing the Right Tools
I’m a Go dev, so I’m wondering if LangChainGo (the Go implementation of LangChain) is a good fit for this project. Has anyone used it for building an AI ops agent?

## Handling Conversational Memory
How do you handle conversational memory without breaking the bank? I don’t want to burn through cash every month just to keep context between questions.
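
To make the question concrete, here is a minimal sketch of one common low-cost option: a sliding window that keeps only the most recent turns that fit a token budget. The `Message` type, `trimToBudget`, and the word-count `estimateTokens` are all hypothetical names for illustration; a real setup would count tokens with the model's own tokenizer, and might summarize older turns instead of dropping them.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is one turn in the conversation (hypothetical type for this sketch).
type Message struct {
	Role    string
	Content string
}

// estimateTokens is a crude stand-in for a real tokenizer: it counts
// whitespace-separated words. Actual token counts depend on the model.
func estimateTokens(m Message) int {
	return len(strings.Fields(m.Content))
}

// trimToBudget returns the most recent messages whose combined estimated
// token count fits within budget, preserving chronological order.
func trimToBudget(history []Message, budget int) []Message {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		cost := estimateTokens(history[i])
		if used+cost > budget {
			break // older turns no longer fit, so drop them
		}
		used += cost
		start = i
	}
	return history[start:]
}

func main() {
	history := []Message{
		{Role: "user", Content: "why is the checkout service slow"},
		{Role: "assistant", Content: "p99 latency rose after the last deploy"},
		{Role: "user", Content: "show me the error logs"},
	}
	// Only the turns that fit the budget would be sent with the next request.
	for _, m := range trimToBudget(history, 12) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

The trade-off is that anything outside the window is forgotten, which is where the cost-versus-context question really bites.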

## Safety and Architecture
Should I split out services for API calls, CLI actions, and other tasks, or is there a better pattern? How do you ensure safety checks are in place without sacrificing functionality?

## RAG and Embeddings
For doc Q&A and multi-hop questions, is RAG with embeddings the right approach? Does it actually work well across totally different docs?
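
For reference, the retrieval step in a RAG setup usually boils down to ranking document chunks by vector similarity to the question. The sketch below shows that step only, using cosine similarity over toy two-dimensional vectors; in practice the vectors would come from an embedding model, and `Chunk`, `cosine`, and `topK` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a piece of a document plus its embedding (hypothetical type).
type Chunk struct {
	Source string
	Text   string
	Vector []float64
}

// cosine returns the cosine similarity of a and b, or 0 if the lengths
// differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k chunks most similar to query, best first.
// It sorts a copy so the caller's slice is left untouched.
func topK(query []float64, chunks []Chunk, k int) []Chunk {
	ranked := make([]Chunk, len(chunks))
	copy(ranked, chunks)
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosine(query, ranked[i].Vector) > cosine(query, ranked[j].Vector)
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	// Toy vectors standing in for real embeddings.
	chunks := []Chunk{
		{Source: "runbook.md", Text: "How to restart the checkout service", Vector: []float64{1, 0}},
		{Source: "hr-policy.md", Text: "Vacation request process", Vector: []float64{0, 1}},
		{Source: "oncall.md", Text: "Escalation steps for checkout outages", Vector: []float64{0.9, 0.1}},
	}
	query := []float64{1, 0} // pretend embedding of "checkout is down"
	for _, c := range topK(query, chunks, 2) {
		fmt.Println(c.Source, "-", c.Text)
	}
}
```

Whether this holds up across very different docs depends largely on chunking and on the embedding model, which is exactly the part I'd like to hear real-world experience about.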

I know there’s a lot to learn, and I’m eager to hear from those who have built something similar. Any tips on architecture, safety, or ‘don’t make this mistake’ would be greatly appreciated.

This is a new frontier for me, and I’m excited to share my journey and learn from others.
