AIOps Tooling and Stack
This page is a practical reference for the tools and libraries that come up in AI-assisted SRE work. By the end, you should be able to pick a minimal stack for a pilot and justify its tradeoffs.
AIOps and intelligent observability (platforms)
Know these platforms conceptually; you do not need to learn every feature:
| Area | Examples |
|---|---|
| APM / monitoring AI | Datadog Watchdog, Dynatrace Davis |
| AWS | Amazon DevOps Guru |
| Incident correlation | Moogsoft, BigPanda, PagerDuty AIOps |
Anomaly detection (Python)
| Library | Typical use |
|---|---|
| prophet | Time-series forecasting (capacity, traffic) |
| scikit-learn | IsolationForest, clustering for alert grouping |
| statsmodels | Statistical tests, ARIMA |
| pyod | Outlier detection |
| river | Online/streaming models |
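
Before reaching for any of the libraries above, it can help to have a trivial baseline to compare against. The following is a minimal sketch using only the standard library: it flags outliers with a robust z-score based on the median absolute deviation (MAD), which is a simpler technique than IsolationForest or the pyod detectors, not a substitute for them. The function name `mad_outliers`, the 3.5 threshold, and the sample latency values are illustrative assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of points whose robust z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against; report nothing rather than divide by zero.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data is roughly normal.
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical per-minute latency samples (ms) with one obvious spike.
latency_ms = [102, 98, 101, 99, 100, 103, 97, 480, 101, 100]
print(mad_outliers(latency_ms))  # index of the 480 ms sample
```

If a model-based detector cannot beat this kind of baseline on your sample metrics, the extra dependency is probably not yet worth it for a pilot.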
RAG and LLM integration
| Layer | Examples |
|---|---|
| Orchestration | LangChain |
| RAG indexing | LlamaIndex |
| APIs | openai, anthropic SDKs |
Embeddings and vector stores
| Component | Examples |
|---|---|
| Embeddings | sentence-transformers |
| Local / prototype | ChromaDB, FAISS |
| Managed | Pinecone, Weaviate |
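
To make the role of a vector store concrete, here is a minimal sketch of the core operation: ranking stored vectors by cosine similarity to a query vector. It uses only the standard library and hand-written three-dimensional vectors; real embeddings (for example from sentence-transformers) have hundreds of dimensions, and stores such as FAISS use far more efficient indexes. The document IDs, vectors, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k document IDs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (assumed values, not produced by any model).
index = {
    "runbook-db-failover": [0.9, 0.1, 0.0],
    "runbook-cert-renewal": [0.0, 0.2, 0.9],
    "runbook-db-slow-queries": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded question about the database

print(top_k(query, index))
```

A brute-force scan like this is fine for a few thousand runbook chunks; the local and managed stores in the table matter once the corpus or query rate grows.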
Intelligent runbooks and coding assistants
| Tool | Notes |
|---|---|
| Cursor | Multi-file context |
| Aider | Terminal coding agent |
| Continue.dev | Open-source, self-hostable |
Log parsing and diagnostics
| Library | Notes |
|---|---|
| loguru | Structured logging |
| elasticsearch-py | Query logs for LLM context |
| tiktoken | Token counting for context limits |
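
Token counting matters because log excerpts have to fit inside the model's context window. The sketch below keeps the most recent log lines that fit a token budget. To stay dependency-free it defaults to a crude estimate of about four characters per token, which is an assumption and not an exact count; in practice you would pass a real counter, for example one built on tiktoken. The names `estimate_tokens` and `fit_to_budget` and the sample log lines are hypothetical.

```python
def estimate_tokens(text):
    """Crude placeholder: assume roughly 4 characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(lines, budget, count_tokens=estimate_tokens):
    """Keep the newest lines whose combined token count fits the budget."""
    kept, used = [], 0
    for line in reversed(lines):  # walk from newest to oldest
        cost = count_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# With tiktoken installed, an exact counter could be swapped in, e.g.:
#   enc = tiktoken.get_encoding("cl100k_base")
#   fit_to_budget(lines, 2000, count_tokens=lambda s: len(enc.encode(s)))

logs = [
    "10:01 INFO request ok",
    "10:02 WARN retrying upstream call",
    "10:03 ERROR upstream timeout after 30s",
    "10:04 ERROR circuit breaker open",
]
print(fit_to_budget(logs, budget=18))
```

Keeping the newest lines is one design choice among several; for some incidents the first error in a burst is more informative than the last.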
Structured output and incident summaries
| Library | Notes |
|---|---|
| instructor | Pydantic-structured LLM outputs |
| guidance (Microsoft) | Constrained generation |
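
The idea behind structured output is that an LLM reply is parsed into a typed object and rejected if it does not match the expected shape. The following is a standard-library sketch of that pattern using a dataclass and `json`; it is a stand-in for illustration, not how instructor or guidance work internally. The `IncidentSummary` fields, the allowed severity values, and the sample reply are all assumptions.

```python
import json
from dataclasses import dataclass

# Assumed severity vocabulary for this example.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class IncidentSummary:
    service: str
    severity: str
    summary: str


def parse_incident(raw):
    """Parse a JSON reply into an IncidentSummary, raising on bad input."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    incident = IncidentSummary(
        service=str(data["service"]),  # raises KeyError if a field is missing
        severity=str(data["severity"]).lower(),
        summary=str(data["summary"]),
    )
    if incident.severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {incident.severity!r}")
    return incident


# Hypothetical model reply; a real one would come from an LLM API call.
raw_reply = (
    '{"service": "checkout-api", "severity": "High", '
    '"summary": "p99 latency spike after deploy"}'
)
incident = parse_incident(raw_reply)
print(incident)
```

With a validating parser in place, a failed parse can trigger a retry or a fallback to a human-written summary instead of passing malformed fields downstream.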
Evaluation
| Tool | Notes |
|---|---|
| ragas | RAG faithfulness, relevance |
| deepeval | Test-style assertions |
| promptfoo | Prompt regression across models |
Experimentation and drift (optional)
| Tool | Notes |
|---|---|
| mlflow | Experiments, model registry |
| evidently | Drift monitoring |
| great-expectations | Data quality checks |
Highest-ROI practice exercises
- LangChain + ChromaDB + Anthropic: build a small RAG pipeline over runbooks.
- Prophet or IsolationForest: detect anomalies in sample metrics.
- Instructor + Anthropic: extract structured fields from a log snippet.
- Ragas: evaluate the RAG pipeline from the first exercise.
Related reading
- 60-Day AIOps Learning Plan
- Python: language fundamentals, if you need a refresher