DevOps & CI/CD Setup

Replace "hope it works"
with git push origin main.

Containers, CI/CD pipelines, Kubernetes, infrastructure-as-code, observability and release automation, all from one team. A developer runs "git push"; lint, test, build, security scan, image push and deploy then run automatically within minutes. If anything goes wrong, the release is rolled back in 18 seconds. Built to German DevOps standards.

The only way to turn releases into a non-event is disciplined automation. If you want to stop pulling Friday-night shifts, you are in the right place.

Request pipeline audit · Get in touch

Deploys this week +2250%

before: 2/week

~/app — zsh

$ git push origin main

Enumerating objects: 24, done.

Counting objects: 100% (24/24), done.

Delta compression using up to 8 threads

Writing objects: 100% (13/13), 2.41 KiB | done.

To github.com:partnerfy/api.git

a3f12b9..c8e441d main -> main

Pushed to GitHub.

# workflow #4719 started

CI/CD Pipeline #4719 PASS

Lint

Test

Build

Scan

Push

Deploy

lint → test → build → security → image → rollout

Production — rolling update 0s downtime

v1.2.1 (old)

v1.2.2 (new)

rollout 78% • readiness OK • healthcheck PASS

The problem

"Hope-it-works" releases are burning your team out.

Let us be honest about the numbers: a failed release is not just lost revenue — it is lost trust within the team. Here are the five most common patterns we see.

01 — Manual deploy = a two-day Friday

If a human SSHs into a server, pulls files and restarts a service, that release eats half a working day. If a step is missed, it surfaces on Monday. The Friday-evening deploy slips to midnight; the DevOps engineer who was supposed to be off-duty is not, really. By end of month this loop produces burnout, by end of quarter it produces resignations. The fix: let the pipeline press deploy, not a person.

02 — No rollback plan

When a bug appears after a release, the "git revert + redeploy" approach takes hours, not minutes. Payments fail, signups fail, the call centre drowns. In a proper pipeline, rollback is a button: the previous image still sits in the registry, the Kubernetes deployment is rolled back to the previous revision, and service is restored in 18 seconds. What you need is not a rollback "plan" but a mechanism.

03 — Untested code in production

With no CI pipeline, tests do not run, because "we are in a rush". Small bugs that surface in production are never reported back to the customer, so they pile up internally as support tickets. After a while developers stop writing code and start writing replies to those tickets. The fix: every commit triggers the full test suite, static analysis, type checks and security scan automatically. Failing code never merges to main.

04 — Undocumented infrastructure changes

Six months ago, someone changed an nginx config on the production server. That person no longer works here. Nobody remembers the change. When you spin up a new server, you cannot reproduce the same behaviour. Infrastructure-as-Code fixes this: Terraform, Pulumi, Helm charts, Kubernetes manifests — everything lives in git. Destroy a server and rebuild an identical one in eight minutes.

05 — Weekend fire-fighting

The system went down at 03:14 on Saturday. No Slack alert, only a customer on Monday morning saying "it does not open". It takes four hours to find logs, three hours to understand the cause. Real DevOps means: a Prometheus alert wakes you at 03:14 on Saturday, the cause is visible on the Grafana dashboard, the runbook is executed, the problem is resolved in twelve minutes. Weekend stays a weekend.

Engineering

A self-healing, continuously improving production.

From Kubernetes pod auto-healing to Terraform IaC, from DORA metrics to one-click rollback — below is what daily operation actually looks like.

Kubernetes — auto-heal

If a container dies, the kubelet restarts it; if a whole pod is lost, the Deployment's ReplicaSet creates a replacement within seconds

SELF-HEALING

replicas

6/6

restarts

uptime

99.98%

> liveness probe failed, container restarted (OOMKilled)
> pod web-7d4 ready in 2.1s — service continues

infrastructure/main.tf

resource "aws_eks_cluster" "prod" {
  name     = "partnerfy-prod"
  version  = "1.29"
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    subnet_ids = aws_subnet.priv[*].id
  }
}

$ terraform apply -auto-approve
Plan: 14 to add, 0 to change, 0 to destroy.
Apply complete! Resources: 14 added.

DORA Metrics

The four core indicators of production health

Last 30 days

Deploy Frequency

47 / week

Elite tier

Lead Time

34 min

Elite tier

MTTR

12 min

Elite tier

Change Failure Rate

2.1 %

Elite tier
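As a rough illustration of how the four indicators above can be derived from raw deploy records, here is a minimal Python sketch. The `Deploy` record, its field names and the sample data are assumptions made for this example, not the schema of any particular tool, and MTTR is measured here from deploy time to restore time, which is only one of several common conventions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deploy:
    """Hypothetical deploy record; field names are assumptions for this sketch."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: List[Deploy], window_days: int) -> Dict[str, float]:
    """Compute the four indicators over a window of deploy records."""
    if not deploys:
        raise ValueError("need at least one deploy in the window")
    failures = [d for d in deploys if d.failed]
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() / 60 for d in deploys]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 60
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "lead_time_min": median(lead_times),
        "mttr_min": sum(restore_times) / len(restore_times) if restore_times else 0.0,
        "change_failure_pct": 100 * len(failures) / len(deploys),
    }


# Illustrative data only: two deploys in one week, one failed and was restored.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(minutes=30)),
    Deploy(t0, t0 + timedelta(minutes=40), failed=True,
           restored_at=t0 + timedelta(minutes=52)),
]
metrics = dora_metrics(sample, window_days=7)
```

In practice the records would come from the CI/CD system and the incident tracker rather than being written by hand.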

One-click rollback

Roll back to previous revision

kubectl rollout undo deployment/web

Rolled back in 18 seconds
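A one-click rollback button can be little more than a thin wrapper around the command shown above. The sketch below builds the `kubectl rollout undo` call plus a follow-up `kubectl rollout status` call that waits for the rollback to finish. The function names, the `prod` namespace and the timeout value are assumptions for illustration, and running it requires `kubectl` and cluster access.

```python
import subprocess
from typing import List


def rollback_commands(deployment: str, namespace: str = "default",
                      timeout_s: int = 60) -> List[List[str]]:
    """Build the two kubectl invocations: undo, then wait for the rollout."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "rollout", "undo", target, "--namespace", namespace],
        ["kubectl", "rollout", "status", target, "--namespace", namespace,
         f"--timeout={timeout_s}s"],
    ]


def rollback(deployment: str, namespace: str = "default") -> None:
    """Run the commands in order; check=True raises if kubectl exits non-zero."""
    for argv in rollback_commands(deployment, namespace):
        subprocess.run(argv, check=True)


# Building the commands has no side effects; only rollback() touches the cluster.
commands = rollback_commands("web", namespace="prod", timeout_s=30)
```

Keeping command construction separate from execution makes the button easy to test without a cluster.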

Who it is for

Where it makes the biggest difference.

We have built DevOps for many industries over the years. The eight profiles below are the teams that extract the most value from us.

SaaS shipping daily

Customer request → branch → review → in production in 30 minutes. 5–15 releases a day should feel routine.

E-commerce with frequent A/B tests

Feature flags push a variant to 5% of traffic, metrics auto-compare, winner rolls out automatically.
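One common way to send a variant to a fixed share of traffic is deterministic hashing: each user lands in a stable bucket, so the same user always sees the same variant. The sketch below is a minimal illustration of that idea; the flag name `new-checkout` and the user ID format are made up for the example, and real feature-flag services add targeting rules on top.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_variant(user_id: str, flag: str, rollout_pct: int) -> bool:
    """True if this user falls inside the rollout percentage."""
    return bucket(user_id, flag) < rollout_pct


# The same user gets the same answer on every request.
enabled = in_variant("user-42", "new-checkout", 5)
```

Including the flag name in the hash keeps buckets independent between experiments, so the same 5% of users are not always the test group.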

Fintech under regulation

Approval chain, audit log, immutable builds, separation of duties. Pipeline ready for SOC 2 and PCI DSS.

Multi-tenant B2B

Per-tenant environment, staged rollout, tenant-aware flags, per-customer SLA monitoring.

CD to TestFlight / Play

Fastlane + Bitrise + signing automation. Every PR builds IPA/AAB, distributed to test devices.

Startup migrating monolith → services

Strangler-fig pattern, service mesh (Istio/Linkerd), gradual cut-over. Both worlds run side by side.

Enterprise modernising legacy

Containerise on-prem VMs, move legacy Jenkins to GitOps, migrate Active Directory to OIDC.

Agency running 20+ client repos

Pipeline templates, repo-of-repos, central secret management. New client = live in 2 hours.

Scope

The ten layers of a DevOps stack.

At every layer we pick the most appropriate tool — instead of forcing a stack, we balance the team's existing habits against long-term sustainability.

Docker · Buildah · Podman

Containerisation

The app is packaged with a Dockerfile, multi-stage builds produce minimal images, and every image is auto-pushed to the registry. The same image travels dev → staging → prod.

GitHub Actions · GitLab CI · Jenkins · CircleCI

CI pipeline

Every commit: lint, unit + integration tests, static analysis, security scan, image build. Failing code does not merge.
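The "failing code does not merge" rule comes down to running stages in a fixed order and stopping at the first failure. Real CI systems express this in their own configuration format; the Python sketch below only illustrates the gating logic, and the stage names and lambda checks are placeholders.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (passed, names of the stages that actually ran).
    """
    ran = []
    for name, check in stages:
        ran.append(name)
        if not check():
            return False, ran  # later stages never start
    return True, ran


# Placeholder checks; a real stage would shell out to a linter, test runner, etc.
passed, ran = run_pipeline([
    ("lint", lambda: True),
    ("test", lambda: False),
    ("build", lambda: True),
])
```

Here `build` never runs because `test` failed, which is the behaviour that keeps a broken commit from producing an image.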

ArgoCD · Flux · Spinnaker

CD pipeline

GitOps approach: git manifests are the single source of truth. ArgoCD reconciles continuously; if drift appears, it auto-corrects or alerts.
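Drift detection in a GitOps loop is, at its core, a comparison between the desired state in git and the live state in the cluster. The sketch below shows that comparison on plain dictionaries. It is a simplified illustration, not how ArgoCD or Flux implement it, and the manifest fields are invented for the example.

```python
from typing import Any, Dict, Tuple


def find_drift(desired: Dict[str, Any], live: Dict[str, Any],
               prefix: str = "") -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (desired_value, live_value)} for every field that differs.

    Only fields present in `desired` are compared, so fields the cluster adds
    on its own (status, defaults) are not reported as drift.
    """
    drift = {}
    for key, want in desired.items():
        path = f"{prefix}.{key}" if prefix else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.update(find_drift(want, have, path))
        elif want != have:
            drift[path] = (want, have)
    return drift


# Someone scaled the deployment by hand: git says 6 replicas, the cluster has 4.
desired = {"spec": {"replicas": 6, "image": "web:1.2.2"}}
live = {"spec": {"replicas": 4, "image": "web:1.2.2"}, "status": {"ready": 4}}
drift = find_drift(desired, live)
```

A reconciler would then either re-apply the desired values or raise an alert, depending on the sync policy.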

EKS · GKE · AKS · k3s

Kubernetes setup

Managed (EKS/GKE/AKS) or self-hosted (k3s, kubeadm). Cluster sizing, node pools, autoscaling and network policy designed together.

Helm · Kustomize

Helm charts

App is packaged as a parameterised chart — different values files for staging and prod. Upgrading a version is one command.

Terraform · Pulumi · OpenTofu

Terraform / Pulumi IaC

Cloud resources (VPC, RDS, S3, IAM) defined as code. Plan → review → apply loop, state remote + locked, with drift detection.

Vault · SOPS · AWS Secrets · Doppler

Secret management

Leak-prone .env files become history. Secrets are stored in Vault or encrypted with SOPS; the pipeline fetches them dynamically at runtime, and the developer never sees them.

Prometheus · Grafana · Loki · Tempo

Observability

Metrics (Prometheus) + logs (Loki) + traces (Tempo) + dashboards (Grafana). SLO-based alerts and error-budget tracking.
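Error-budget tracking is simple arithmetic: an availability SLO of 99.9% over a window tolerates 0.1% failed requests, and the budget is whatever share of that allowance is still unused. The sketch below works through one example; the SLO value and request counts are illustrative, not measurements.

```python
from typing import Dict


def error_budget(slo: float, total_requests: int, failed_requests: int) -> Dict[str, float]:
    """Error budget for an availability SLO over one window.

    slo is a fraction, e.g. 0.999 means 99.9% of requests must succeed.
    """
    allowed = (1 - slo) * total_requests  # failures the SLO tolerates
    remaining = allowed - failed_requests
    consumed = failed_requests / allowed if allowed else float("inf")
    return {
        "allowed_failures": allowed,
        "remaining": remaining,
        "consumed_fraction": consumed,
    }


# 1,000,000 requests at a 99.9% SLO tolerate about 1,000 failures;
# 250 failures so far means roughly a quarter of the budget is spent.
budget = error_budget(0.999, 1_000_000, 250)
```

An SLO-based alert would typically fire on how fast `consumed_fraction` grows rather than on any single failed request.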

PagerDuty · Opsgenie · Grafana OnCall

On-call & alerting

Which alert wakes whom? Rotation, escalation, alerts linked to runbooks. No more waking the wrong person at 3 a.m.
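The "which alert wakes whom" question can be answered with a small, deterministic rule. The sketch below assumes a plain weekly rotation with the next person in line as the escalation level; the names, start date and rotation length are made up, and tools such as PagerDuty or Opsgenie offer far richer schedules.

```python
from datetime import date
from typing import List


def on_call(rotation: List[str], start: date, today: date) -> str:
    """Weekly rotation: the person on call changes every 7 days from `start`."""
    weeks_elapsed = (today - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]


def escalation_chain(rotation: List[str], start: date, today: date,
                     levels: int = 2) -> List[str]:
    """Primary first, then the next people in the rotation as escalation levels."""
    first = rotation.index(on_call(rotation, start, today))
    return [rotation[(first + i) % len(rotation)] for i in range(levels)]


# Hypothetical three-person team; the rotation started on Monday 2024-01-01.
team = ["ada", "grace", "linus"]
start = date(2024, 1, 1)
chain = escalation_chain(team, start, date(2024, 1, 15))
```

Because the rule is a pure function of the date, anyone can check who is on call without opening a tool.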

Snyk · Trivy · Anchore · Dependabot

Security scanning

CVE in the image? Critical vulnerability in a dependency? Pipeline halts the build, opens an auto-fix PR. Shift-left security.
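Halting the build on a critical finding usually means parsing the scanner's report and turning it into an exit code. The sketch below assumes a JSON report shaped like Trivy's JSON output (a `Results` list whose entries carry `Vulnerabilities` with `VulnerabilityID` and `Severity`); treat that shape as an assumption and check it against the scanner version in use. The CVE identifiers are placeholders.

```python
import json
from typing import List, Sequence


def blocking_findings(report_json: str,
                      blocked: Sequence[str] = ("CRITICAL",)) -> List[str]:
    """Return the IDs of vulnerabilities whose severity is in `blocked`."""
    report = json.loads(report_json)
    found = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocked:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found


def gate(report_json: str) -> int:
    """Exit code for the CI step: 1 halts the build, 0 lets it continue."""
    return 1 if blocking_findings(report_json) else 0


# Placeholder report with one critical and one low-severity finding.
report = json.dumps({"Results": [{"Target": "web:1.2.2", "Vulnerabilities": [
    {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
    {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
]}]})
exit_code = gate(report)
```

Opening the auto-fix PR is a separate step and is left to the dependency-update tooling.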

Process

From old pipeline to modern stack in six steps.

We do not flip the whole infrastructure overnight. Phase by phase, without interrupting the business, bringing your team along with us.

Current pipeline audit

Two-week sprint: how do you deploy today, which steps are manual, which services are single points of failure, where are secrets kept.

Containerise applications

Every service gets a Dockerfile, slimmed via multi-stage build, pushed to a registry. Dev + prod same package.

Build the CI pipeline

Lint + test + build + scan automated. When a PR is opened, the bot returns green/red within 6–8 minutes.

Build the CD pipeline

GitOps with ArgoCD or Flux; staging auto on every merge, prod with one approval. Canary + rollback ready.

Wire up observability

Prometheus + Grafana + Loki. SLOs written, alert rules linked to runbooks, on-call rotation set up.

Train the team, hand over

Two-day workshop + documentation + runbook set. Next three months: quiet support, replies within the hour.

Toolset

The technologies we run in production.

Docker · Kubernetes · GitHub Actions · GitLab CI · ArgoCD · Flux · Terraform · Pulumi · Helm · Ansible · Prometheus · Grafana · Loki · Sentry · Datadog · Vault · Snyk · Trivy

Cases

Before / After — in numbers.

SaaS · 35-person team

2 deploys / week → 47 deploys / week

Jenkins → GitHub Actions + ArgoCD. Within 12 minutes of merge, change is in production. Friday-night deploy fear ended.

E-commerce · 4M visitors/mo

MTTR 4 hours → MTTR 12 minutes

Prometheus + PagerDuty + runbook automation. Alerts fire before the dip, to the right person, with resolution steps attached.

Fintech · payments platform

No audit log → SOC 2 Type II certified

Immutable builds, separation of duties, full audit trail, encrypted secrets. Audit-ready in 6 months, zero findings.

B2B SaaS · 200+ tenants

Customer-visible downtime → 99.97% SLA met

Rolling updates + readiness probes + canary releases. Not a single customer-visible outage in six months.

Mobile game · 8 countries

Manual TestFlight build → Auto build on PR

Fastlane + Bitrise + signing automation. QA sees a new build on their device within 4 minutes of every PR.

Agency · 24 client repos

Every client different → One pipeline template

Reusable GitHub Actions workflow + bootstrap CLI. New client opens a repo and is live in 2 hours.

FAQ

Most asked questions

Do we really need Kubernetes?

Short answer: no. If your scale is small (1–3 servers in one region, under 20k requests/min), Docker Compose, AWS ECS Fargate, GCP Cloud Run, Fly.io or Render bring far less operational burden. Kubernetes earns its complexity when you have 5+ services, multi-region, auto-scaling, advanced rollout strategies (canary/blue-green) and service-mesh needs. The decision is made together in the audit; we will not push you to "follow the trend". If we do go to K8s, we recommend managed (EKS/GKE/AKS), so you do not have to run your own control plane.

Do we have to throw away our existing Jenkins?

Usually no, not immediately. If Jenkins works, we run the new pipeline in parallel (GitHub Actions / GitLab CI / Drone). New services flow through the new pipeline, old ones stay on Jenkins. They co-exist for 2–3 months; once the team is comfortable, migration completes. The "throw it all away and rewrite" approach takes six months and leaves everything half-done. In our experience, a strangler-fig strategy almost always wins. If Jenkins is genuinely unmaintainable (ancient version, plugin explosion, no one understands it), we will tell you straight.

How long does the setup take?

Typical schedule: basic CI/CD (containers + test + build + deploy) takes 2–4 weeks. Adding Helm + Kubernetes: +2–4 weeks. Full observability (Prometheus + Grafana + Loki + alerts): +2–3 weeks. Cloud management with Terraform/IaC: +3–6 weeks (depends on existing resources). Total: small team 4 weeks, mid-size 8–10 weeks, enterprise 12–16 weeks. Important: the work is phased and the business never pauses. You benefit the moment phase one ships; every later phase adds value. We do not wait for a single "everything is ready" moment.

Will you train our team to run the pipeline?

Yes, and we treat that as part of the job. When setup completes, you get: (1) full markdown documentation of the pipeline, (2) a two-day hands-on workshop, (3) a 30-day shadow period in which we watch in the background, the team leads, and we answer questions. A developer's daily life barely changes: git push, pipeline runs. The complexity hides in infrastructure. For the ops side (alert handling, runbooks, rollback), we set up an on-call rotation. After 3 months your team can own the pipeline solo.

What does it cost?

Two line items: (a) Our setup + 3-month handover service: priced per project, low four figures for a small one, mid five figures for an enterprise build (EUR/USD, fixed price). (b) Ongoing infrastructure cost: you pay this from your own cloud account, we are not a middleman. Typical numbers: small SaaS 200–600 EUR/month cloud, mid 1500–4000 EUR, enterprise 8000+. Important: a correct design typically saves 30–50% of cloud spend (rightsizing, spot instances, auto-scaling). DevOps usually pays for itself within six months.

Can you work across multiple cloud providers?

Yes. We have active production experience on AWS, GCP, Azure, Hetzner and DigitalOcean. Terraform and Pulumi give one workflow across providers, although resource definitions remain provider-specific; Kubernetes manifests are largely portable between conformant clusters. "Multi-cloud" is often not what you actually want (operational complexity 2.5×). We usually recommend a single primary cloud + a second region (multi-region) + a second provider for DR only. If true multi-cloud is required (e.g. EU data on AWS, US data on GCP) we set that up too. Region choice driven by data sovereignty (GDPR/KVKK) is a frequent conversation.

Can you build pipelines that pass compliance audits?

Yes. We have built pipelines compliant with ISO 27001, SOC 2 Type II, PCI DSS, GDPR and KVKK. What we put in place: immutable builds (cannot be altered retroactively), separation of duties (the author cannot deploy), full audit trail (who approved what, and when), encrypted secrets (Vault/SOPS), encryption at rest + in transit, MFA-gated production access, vulnerability scan on every build. Auditor evidence packs come as ready templates. We have 6+ cases where independent auditors signed off on pipelines we built, and references are available.

How does rollback work when a release goes wrong?

When a problem is spotted after a release, there are two paths. (1) Kubernetes rollout undo: the previous revision still sits in the Deployment history; with one command or one ArgoCD UI button, the old image is live again in seconds, not minutes. Typical time: 12–25 seconds. (2) If a database migration is involved, more care is needed: reverse migration first, then container rollback. That is why we always design migrations as backward-compatible (NULL-allow → backfill → NOT-NULL, the expand-and-contract pattern). Automated rollback rule: an SLO breach or error-rate spike halts the progressive rollout (for example with Argo Rollouts alongside ArgoCD) and rolls back if needed. Human intervention is optional.
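The NULL-allow → backfill → NOT-NULL sequence can be sketched with an in-memory SQLite database. The `users` table, the `email` column and the placeholder backfill value are assumptions for illustration; the final NOT NULL step is database-specific and is represented here only by the precondition check that must pass before enforcing it.

```python
import sqlite3


def expand(db: sqlite3.Connection) -> None:
    """Step 1: add the column as nullable, so the old release keeps working."""
    db.execute("ALTER TABLE users ADD COLUMN email TEXT")


def backfill(db: sqlite3.Connection) -> None:
    """Step 2: fill existing rows while old and new releases can both run."""
    db.execute(
        "UPDATE users SET email = name || '@example.invalid' WHERE email IS NULL"
    )


def safe_to_contract(db: sqlite3.Connection) -> bool:
    """Step 3 precondition: enforce NOT NULL only once no NULLs remain."""
    (nulls,) = db.execute(
        "SELECT COUNT(*) FROM users WHERE email IS NULL"
    ).fetchone()
    return nulls == 0


# Demo on a throwaway in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (name) VALUES ('ada'), ('grace')")
expand(db)
# The old release does not know about email and still inserts successfully.
db.execute("INSERT INTO users (name) VALUES ('linus')")
ready_before_backfill = safe_to_contract(db)
backfill(db)
ready_after_backfill = safe_to_contract(db)
```

Because every intermediate schema works with both the old and the new release, a container rollback at any point does not break on the database.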

Free pipeline audit

Make releases a non-event.

In a 30-minute call we review your current pipeline together. Where the wins are, which steps are critical — we draft a concrete 90-day plan.

Schedule a meeting · Get in touch

Fixed price, fixed scope · 3-month handover support · German DevOps standard

