🔧

DevOps & Cloud

Infrastructure as code, CI/CD, Kubernetes, and cloud architecture

Gemini · Advanced

Cloud Cost Optimization Audit

Use Case: Cloud cost reduction and FinOps

You are a FinOps engineer and cloud cost optimization specialist. Conduct a cloud cost optimization audit for [company] spending approximately $[monthly spend] on [AWS/GCP/Azure]. Known architecture: [describe major services and usage patterns]. Cost optimization analysis: 1) Waste Identification — idle resources, unattached volumes, unused IPs, oversized instances, forgotten snapshots, 2) Right-Sizing Analysis — EC2/VM instance type optimization using CPU/memory utilization data, 3) Commitment Discounts — Reserved Instances vs Savings Plans vs Committed Use — calculate break-even for this workload, 4) Storage Optimization — S3/GCS lifecycle policies, tiering strategy, snapshot cleanup, 5) Data Transfer Costs — the most expensive and avoidable data transfer patterns, 6) Architecture Optimization — spot instances for appropriate workloads, serverless candidates, 7) Tagging Strategy — cost allocation tagging standard to implement. Projected savings: calculate 3-month, 6-month, 12-month savings potential. Build a prioritized action plan by effort vs savings impact.
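
The break-even arithmetic in item 3 and the projected-savings figures can be sketched as below. This is a minimal illustration only: the function names are hypothetical, the rates passed in are placeholders rather than real provider pricing, and it assumes a commitment that is billed for every hour of the term whether or not the instance is running.

```python
HOURS_PER_MONTH = 730  # average hours per month (8760 / 12)


def break_even_utilization(on_demand_hourly: float, committed_hourly: float) -> float:
    """Fraction of hours a workload must run before the commitment is cheaper."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return committed_hourly / on_demand_hourly


def projected_savings(
    on_demand_hourly: float,
    committed_hourly: float,
    utilization: float,
    months: int,
) -> float:
    """Savings versus on-demand over `months`; a negative result means a loss."""
    # On-demand is billed only for the hours actually used.
    on_demand_cost = on_demand_hourly * utilization * HOURS_PER_MONTH * months
    # The commitment is billed for every hour, used or not (assumption).
    committed_cost = committed_hourly * HOURS_PER_MONTH * months
    return on_demand_cost - committed_cost
```

Calling `projected_savings` with `months` set to 3, 6, and 12 gives the three horizons the prompt asks for; a workload whose utilization sits below `break_even_utilization(...)` is a poor candidate for a commitment.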

Gemini · Advanced

Monolith to Microservices Migration

Use Case: Microservices architecture and migration

You are a distributed systems architect. Create a migration plan to break [monolith application description] into microservices. Team: [X engineers]. Timeline: [X months]. Current stack: [describe]. Target architecture goal: [scalability/team autonomy/deployment independence]. Migration strategy: 1) Domain Analysis — apply Domain-Driven Design to identify bounded contexts and natural service seams, 2) Dependency Map — identify tight couplings that make extraction hardest, 3) Migration Order — which services to extract first (start with least coupled, highest value), 4) Strangler Fig Pattern — how to run old and new side by side during transition, 5) Data Migration Strategy — how to break up the shared database (most dangerous part — address specifically), 6) Inter-service Communication — synchronous (REST/gRPC) vs asynchronous (events) decision framework for each service boundary, 7) Phase Plan — 4 phases with milestones and rollback criteria, 8) Org Design — team topology alignment with service architecture. Anti-patterns to avoid: distributed monolith, chatty services, shared database.
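
The Strangler Fig step (item 4) comes down to a routing layer that sends each request either to the monolith or to an extracted service. A minimal sketch follows, assuming routing by URL path prefix; the class name, method names, and backend labels are hypothetical and stand in for whatever proxy or gateway is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class StranglerRouter:
    """Sends requests to the legacy monolith unless their prefix has been migrated."""

    legacy_backend: str
    migrated: dict[str, str] = field(default_factory=dict)  # path prefix -> new service

    def migrate(self, prefix: str, backend: str) -> None:
        """Start routing `prefix` to the extracted service."""
        self.migrated[prefix] = backend

    def rollback(self, prefix: str) -> None:
        """Return `prefix` to the monolith, for example when rollback criteria are met."""
        self.migrated.pop(prefix, None)

    def route(self, path: str) -> str:
        # Plain string-prefix match; the longest matching prefix wins so that
        # nested routes can be moved independently of their parent.
        matches = [p for p in self.migrated if path.startswith(p)]
        if not matches:
            return self.legacy_backend
        return self.migrated[max(matches, key=len)]
```

Keeping `rollback` as cheap as `migrate` is the point of the pattern: each extraction can be reverted by changing one routing entry rather than redeploying.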

Claude · Advanced

SLO/SLA Design Framework

Use Case: SRE service level management

You are an SRE lead. Design a Service Level Objective (SLO) framework for [service/product name]. Service description: [what it does, who uses it, business criticality]. Framework design: 1) SLI Selection — for each user journey, define the right SLI: availability SLI (good requests/total), latency SLI (% under threshold), quality SLI (error-free responses). Justify why these are the right indicators, 2) SLO Targets — propose starting SLO values with rationale (be conservative — over-promising is worse than under-promising), 3) Error Budget — calculate and explain the error budget for each SLO (minutes/month of allowed downtime), 4) Error Budget Policy — what happens when 50%/75%/100% of budget is burned: feature freeze triggers, deploy halts, team notifications, 5) SLA — translate internal SLOs to customer-facing SLAs with appropriate buffer, 6) Measurement Implementation — exact Prometheus queries or APM configuration to measure each SLI, 7) Dashboard Design — what the SLO burn rate dashboard should show. Current uptime: [X nines]. Customer expectations: [describe].
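
The error budget in item 3 is simple arithmetic: the budget is the fraction of the window the SLO allows to fail. The sketch below assumes a time-based availability SLO over a 30-day window. The mapping of actions to the 50%/75%/100% thresholds in `policy_stage` is a hypothetical example, since the prompt leaves that assignment to the generated policy.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for an availability SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def policy_stage(budget_consumed: float) -> str:
    """Map the fraction of budget burned to a policy stage (example mapping only)."""
    if budget_consumed >= 1.0:
        return "halt deploys"
    if budget_consumed >= 0.75:
        return "feature freeze"
    if budget_consumed >= 0.5:
        return "notify team"
    return "normal operations"
```

For a 99.9% target over 30 days, the window is 43,200 minutes and the budget is 0.1% of that, or about 43.2 minutes.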

Claude · Advanced

Linux Server Security Hardening

Use Case: Linux server security and hardening

You are a Linux systems administrator and security engineer. Create a comprehensive security hardening guide for a production Linux server running [Ubuntu 22.04/RHEL 9/Amazon Linux 2]. Server purpose: [web server/database server/bastion host]. Hardening checklist with exact commands: 1) User and Access Management — disable root SSH login, configure sudo, implement key-based auth only, set password complexity, 2) SSH Hardening — sshd_config best practices, fail2ban configuration, 3) Firewall — ufw/iptables/nftables rules for this server type, 4) Kernel Hardening — sysctl parameters for network security and memory protection, 5) Service Minimization — how to audit and disable unnecessary services, 6) File System Security — AIDE integrity monitoring, immutable flags on critical files, noexec mounts, 7) Log Management — auditd rules, logrotate, centralized logging setup, 8) Patch Management — unattended upgrades setup, 9) Compliance — CIS Benchmark Level 1 items specific to this OS. Include a before/after security score estimation using Lynis.
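
Items 1 and 2 can be spot-checked mechanically once the hardening is applied. The following is a rough sketch of an `sshd_config` check for three real directives (`PermitRootLogin`, `PasswordAuthentication`, `PubkeyAuthentication`). The function name and the expected-values table are assumptions for illustration, and the parser deliberately ignores `Match` blocks, `Include` files, and the `key=value` form, so it is not a substitute for `sshd -T` or a Lynis run.

```python
# Expected values for key-based, non-root SSH access (illustrative baseline).
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
}


def audit_sshd_config(text: str) -> list[str]:
    """Return one finding per directive that is missing or not set as expected."""
    seen: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip().lower()
        # sshd keeps the first value it reads for a keyword, so do the same.
        seen.setdefault(key, value)

    findings = []
    for key, want in EXPECTED.items():
        got = seen.get(key)
        if got != want:
            findings.append(f"{key}: expected {want!r}, found {got!r}")
    return findings
```

A missing directive is reported as a finding even when the compiled-in default is acceptable, on the assumption that a hardened server should state these settings explicitly.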

Claude · Advanced

GitOps Workflow Design

Use Case: Platform engineering and GitOps adoption

You are a platform engineering lead. Design a GitOps workflow for [organization size: startup/scale-up/enterprise] managing [number] microservices across [number] environments (dev/staging/prod). GitOps tool selection: compare ArgoCD vs Flux vs Rancher Fleet for this use case — recommend one with justification. Workflow design: 1) Repository structure — mono-repo vs poly-repo for application code and infra manifests, 2) Branch and environment promotion strategy — how changes flow from dev to prod with required gates, 3) Secrets management in GitOps — Sealed Secrets vs External Secrets Operator vs Vault integration, 4) Drift detection and reconciliation policy, 5) Rollback procedure — how to revert a bad deployment in under 5 minutes, 6) Multi-cluster management — how to manage the same app across multiple clusters, 7) Developer experience — how a dev pushes a change and what they see. Include a Mermaid diagram of the full GitOps flow.

Claude · Intermediate

SRE Incident Runbook Generator

Use Case: SRE incident response and reliability

You are a Site Reliability Engineer. Create a detailed incident runbook for: Service: [service name]. Common failure mode: [describe, e.g., "database connection pool exhaustion" or "memory leak causing OOM kills"]. Runbook sections: 1) Alert Context — what triggered this runbook, what the metric/log looks like, normal baseline, 2) Impact Assessment — what user-facing impact does this cause, how to quantify severity, 3) Triage Steps — step-by-step diagnostic commands (include exact commands with placeholders for env-specific values), 4) Mitigation Options — ordered from fastest to most complete: a) immediate mitigation (restart/rollback/scale), b) root cause fix, c) permanent solution, 5) Escalation Path — when to escalate, who to page, and what information to have ready, 6) Verification — how to confirm the issue is resolved, 7) Prevention — what monitoring, alerting, or code changes would prevent recurrence. Include: exact CLI commands, links to relevant dashboards, and a post-incident review checklist.

Claude · Advanced

Observability Stack Design

Use Case: SRE and production monitoring

You are an SRE and observability engineer. Design a comprehensive observability stack for [system type, e.g., "a microservices platform with 20+ services handling 50k req/min"]. Requirements: metrics, logs, traces, and alerting. Design decisions to cover: 1) Metrics — Prometheus vs Datadog vs CloudWatch (recommend one for this scale with cost analysis), 2) Logging — structured logging standards, ELK vs Loki vs Datadog Logs (trade-offs for this volume), 3) Distributed Tracing — OpenTelemetry instrumentation strategy, Jaeger vs Tempo vs X-Ray, 4) Dashboards — Grafana dashboard design: what to show in a Golden Signals dashboard (Latency, Traffic, Errors, Saturation), 5) Alerting Strategy — the RIGHT alerts to set (avoid alert fatigue): SLO-based alerting vs threshold alerting, PagerDuty/OpsGenie integration, 6) Cost controls — estimated cost at this scale and how to reduce cardinality. Language/framework: [describe]. Current blind spots: [describe what you cannot see today].
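
The SLO-based alerting mentioned in item 5 is usually expressed as a burn rate: how fast the error budget is being spent relative to the pace that would exhaust it exactly at the end of the window. A minimal sketch follows, with hypothetical function names. The default threshold of 14.4 corresponds to spending 2% of a 30-day budget in one hour (14.4 × 1 h / 720 h = 0.02) and should be treated as a starting point, not a recommendation for any particular system.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    allowed = 1 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1")
    return error_ratio / allowed


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,
) -> bool:
    """Page only when both windows burn fast.

    The long window shows the burn is significant; the short window shows
    it is still happening, which avoids paging for an incident that is over.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

Compared with a fixed error-rate threshold, this ties the alert to budget consumption, which is one way to reduce the alert fatigue the prompt warns about.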

ChatGPT · Intermediate

Docker Compose Production Stack

Use Case: Containerized application deployment

You are a containerization expert. Create a production-ready Docker Compose stack for [application type: e.g., "MERN stack with Nginx reverse proxy and Redis cache"]. Stack components: [list your services]. For each service: correct image with pinned version tag, resource limits (cpus, memory), health checks with appropriate intervals, restart policy, and dependency ordering. Also include: 1) Nginx reverse proxy config with SSL termination, rate limiting, and security headers, 2) Environment variable management using .env file with a .env.example template, 3) Volume configuration for persistent data with backup labels, 4) Network segmentation (frontend network / backend network), 5) Logging configuration with log rotation, 6) A separate docker-compose.override.yml for local development (with hot-reload, no resource limits, debug ports). Also write a deploy.sh script that: pulls latest images, runs migrations, and performs a zero-downtime rolling update.

Claude · Advanced

AWS Architecture Review

Use Case: AWS cloud architecture optimization

You are an AWS Solutions Architect at the Professional level. Review and improve the following AWS architecture: [describe current architecture, services used, traffic patterns, and current issues]. Apply the AWS Well-Architected Framework review across all 6 pillars: 1) Operational Excellence: observability gaps, automation opportunities, runbook quality, 2) Security: IAM hygiene, data protection, network segmentation, secrets management, 3) Reliability: single points of failure, multi-AZ coverage, disaster recovery, backup strategy, 4) Performance Efficiency: right-sizing, caching layers, CDN usage, database optimization, 5) Cost Optimization: waste identification, reserved vs on-demand balance, storage tiers, data transfer costs, 6) Sustainability: carbon footprint optimization opportunities. For each pillar: identify the top 3 risks and provide a specific remediation with the exact AWS services and configurations to use. Estimated monthly spend: $[X]. Traffic expected to triple within [timeframe]: [yes/no].

ChatGPT · Intermediate

GitHub Actions CI/CD Pipeline

Use Case: Automated CI/CD pipeline setup

You are a DevOps engineer specializing in CI/CD automation. Create a complete GitHub Actions workflow for [project type: Node.js/Python/Go/Java]. The pipeline must: 1) Trigger on: push to main, pull requests, and manual workflow dispatch, 2) Jobs: a) lint-and-test (runs linting, unit tests, coverage report, uploads to Codecov), b) security-scan (runs SAST with appropriate tool for the language), c) build-and-push (builds Docker image, tags with commit SHA and "latest", pushes to [ECR/GCR/Docker Hub]), d) deploy-staging (deploys to staging on every merge to main), e) deploy-prod (deploys to production on tag push, requires manual approval). Include: matrix testing across [list versions], caching for dependencies, environment secrets usage, and a Slack notification on deploy success/failure. Stack specifics: [describe your stack]. Registry: [container registry]. Deploy target: [ECS/EKS/Cloud Run/etc.].

Claude · Advanced

Terraform Module Writer

Use Case: Infrastructure as Code and cloud provisioning

You are a Terraform and Infrastructure as Code expert. Write a production-grade Terraform module for: [infrastructure component, e.g., "a highly available RDS PostgreSQL cluster on AWS" or "a GKE autopilot cluster with VPC-native networking"]. Module requirements: 1) Complete main.tf with all necessary resources, 2) variables.tf with types, descriptions, defaults, and validation rules, 3) outputs.tf with useful outputs for downstream modules, 4) versions.tf with provider version constraints, 5) README.md with usage example and input/output documentation. Best practices to include: remote state backend configuration, tagging strategy, encryption at rest/in transit, least privilege IAM, cost optimization options. Provider: [AWS/GCP/Azure]. Environment parameterization: the module must work for dev/staging/prod via workspace or variable. Avoid: hardcoded credentials, overly permissive IAM, unencrypted resources.

Claude · Advanced

Kubernetes Manifest Generator

Use Case: Kubernetes workload deployment and configuration

You are a Kubernetes engineer with expertise in production-grade deployments. Generate complete, production-ready Kubernetes manifests for the following workload: Application: [name]. Type: [Deployment/StatefulSet/DaemonSet]. Container image: [image:tag]. Port(s): [list]. Environment variables: [list]. Resource requirements: CPU [Xm request / Xm limit], Memory [XMi request / XMi limit]. Replicas: [X]. Required manifests: 1) Deployment/StatefulSet with proper resource limits, liveness/readiness probes, and security context (non-root user), 2) Service (ClusterIP/LoadBalancer/NodePort), 3) ConfigMap for non-sensitive configuration, 4) Secret template (base64 placeholder), 5) HorizontalPodAutoscaler (min: X, max: Y, CPU target: Z%), 6) PodDisruptionBudget. Also include: rolling update strategy with maxSurge/maxUnavailable, pod anti-affinity rules for HA, and resource quota recommendations for the namespace. Namespace: [name]. Cloud provider: [AWS/GCP/Azure/on-prem].
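
To show the shape of output that item 5 asks for, the sketch below builds a HorizontalPodAutoscaler manifest from the prompt's min, max, and CPU-target placeholders. It assumes the `autoscaling/v2` API and a Deployment target named after the application; the function name and the example values are hypothetical. The result is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def hpa_manifest(
    name: str,
    namespace: str,
    min_replicas: int,
    max_replicas: int,
    cpu_target_percent: int,
) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler that targets a Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumes the workload is a Deployment with the same name.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(hpa_manifest("web", "demo", 2, 10, 70), indent=2))
```

CPU utilization targets are computed against the container's CPU request, so the HPA only behaves sensibly when the Deployment sets the requests listed under "Resource requirements".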
