60-Day AIOps Learning Plan
By the end of 60 days, you should have hands-on artifacts (sample RAG app, anomaly notebook, eval notes) and a short strategy document (about two pages) you could discuss in an internal review or with stakeholders.
Weeks 1–2: LLM fluency for SRE tasks
Focus: Build intuition for what LLMs do well and where they fail.
- Use Cursor, Continue.dev, Aider, or Claude / GPT-4 for:
  - drafting runbooks,
  - parsing log snippets,
  - drafting post-mortem sections.
- Deliverable: 3 short runbook drafts + a “failure log” (where the model was wrong and why).
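For the log-parsing tasks, one way to make the failure log concrete is to check the model's structured reply against a schema you define yourself. The following is a minimal sketch under stated assumptions: the field names (`service`, `severity`, `summary`), the allowed severities, and the `FailureEntry` record are all hypothetical, and the model call is omitted, so `raw_reply` stands in for whatever text the model returned.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: these field names are illustrative, not from any tool.
REQUIRED_FIELDS = {"service": str, "severity": str, "summary": str}
ALLOWED_SEVERITIES = {"info", "warning", "error", "critical"}


@dataclass
class FailureEntry:
    """One row of the failure log: which task failed and what was wrong."""
    task: str
    problem: str


def check_reply(task: str, raw_reply: str) -> list[FailureEntry]:
    """Return failure-log entries for a model reply; an empty list means it passed."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return [FailureEntry(task, f"reply is not valid JSON: {exc.msg}")]
    if not isinstance(data, dict):
        return [FailureEntry(task, "reply is not a JSON object")]

    failures = []
    # Check that every required field is present and has the expected type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            failures.append(FailureEntry(task, f"missing field '{name}'"))
        elif not isinstance(data[name], expected_type):
            failures.append(FailureEntry(task, f"field '{name}' has wrong type"))
    # Check that the severity is one of the values we agreed on.
    severity = data.get("severity")
    if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
        failures.append(FailureEntry(task, f"unknown severity '{severity}'"))
    return failures


good = '{"service": "checkout", "severity": "error", "summary": "DB timeouts"}'
bad = '{"service": "checkout", "severity": "catastrophic"}'
good_failures = check_reply("parse log snippet", good)
bad_failures = check_reply("parse log snippet", bad)
for entry in bad_failures:
    print(f"{entry.task}: {entry.problem}")
```

Schema checks only catch structural mistakes. A reply can pass them and still misread the log, so the "why" column of the failure log still needs a human review.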
Weeks 3–4: One AIOps platform in depth
Focus: Move past UI tours to how signals become anomalies or correlated incidents.
- Pick one: Datadog AI features, AWS DevOps Guru, or similar.
- Deliverable: Notes on inputs (metrics/logs), outputs (stories, insights), and limitations.
Weeks 5–6: RAG conceptually (and lightly in code)
Focus: Chunking, embeddings, retrieval, grounding.
- Skim the LangChain or LlamaIndex docs to build a mental model; even a tiny prototype counts.
- Deliverable: Diagram + minimal RAG prototype (see AIOps Tooling and Stack).
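The four focus concepts can be traced end to end in a few dozen lines before reaching for a framework. This is a dependency-free sketch, not the LangChain or LlamaIndex API: a bag-of-words count vector stands in for a learned embedding model, the runbook text is invented for illustration, and the final prompt is only built, never sent to a model. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Chunking: split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Embedding stand-in: word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def grounded_prompt(query: str, context: list[str]) -> str:
    """Grounding: instruct the model to answer only from retrieved text."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the numbered context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


# Invented runbook text, for illustration only.
runbook = (
    "Restart the payments service with systemctl restart payments after draining traffic. "
    "For disk pressure on the database host, rotate logs and expand the volume. "
    "When the cache hit rate drops, check for recent deploys and warm the cache."
)
query = "disk pressure on the database host"
chunks = chunk(runbook)
top = retrieve(query, chunks)
prompt = grounded_prompt(query, top)
print(prompt)
```

Swapping `embed` for a real embedding model and the sorted scan for a vector store is the step the frameworks handle for you; the chunk, retrieve, and ground stages keep the same shape.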
Weeks 7–8: Strategy and stakeholder readiness
Focus: Connect your past operational pain to AI interventions.
- Draft a mock “AI Strategy for SRE” document:
  - toil map → intervention,
  - 30/60/90 day rollout,
  - risks and metrics.
- Deliverable: 2-page doc you can present.
Practice checklist (highest ROI)
Aligned with AIOps Tooling and Stack:
- LangChain + ChromaDB + Anthropic — RAG over runbooks.
- Prophet or isolation forest — sample metrics anomalies.
- Instructor + Anthropic — structured extraction from logs.
- Ragas — evaluate your RAG pipeline.
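Before setting up Prophet or an isolation forest, it can help to have a simple baseline to compare against on the same sample metrics. The sketch below is neither of those methods: it is a plainly simpler stand-in that flags points by modified z-score (median and median absolute deviation), using the commonly cited 0.6745 scaling constant and 3.5 threshold. The latency values and the function name `mad_anomalies` are invented for illustration.

```python
import statistics


def mad_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of points whose modified z-score exceeds the threshold.

    Median and MAD are used instead of mean and standard deviation so that
    a single large spike does not inflate the baseline it is judged against.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be scored
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Invented sample: request latency in milliseconds with one obvious spike.
latency_ms = [120, 118, 125, 122, 119, 121, 480, 123, 117, 124]
anomalies = mad_anomalies(latency_ms)
print(anomalies)
```

A baseline like this ignores seasonality and trend, which is where a forecasting model such as Prophet is expected to do better; comparing the two on the same series is a useful entry for the anomaly notebook.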
What you can skip
Deep model training, full MLOps platforms, and heavy math, unless your role explicitly requires them. This plan targets consuming and deploying AI for operations.