Job Description

Overview

Principal Platform Engineer, Reliability and Observability

Ncounter is hiring a senior Platform Engineer to own reliability and observability across a mission-critical trading platform. This is a deeply technical role focused on keeping complex, distributed systems stable, measurable, and predictable under real-time load. You will work directly on shared platform services that underpin trading and research workloads, where excess latency, partial failures, and monitoring blind spots are not tolerated.

Observability is a core engineering concern here, not a bolt-on toolset. You will design and operate metrics, logging, tracing, and alerting pipelines that ingest high-volume telemetry, expose system behaviour under stress, and materially reduce operational risk. The role blends production engineering, platform tooling, automation, and reliability-led architecture, with direct ownership of systems running at scale.

Responsibilities

Owning reliability and observability for shared platform services in Linux and Kubernetes environments
Designing and operating high-throughput metrics, logging, and tracing pipelines for real-time systems
Hardening services against latency degradation, cascading failure, and outages using reliability engineering principles
Reducing operational toil through automation, GitOps workflows, and platform tooling
Improving on-call signal quality through alert design, runbooks, and post-incident learning
Partnering with engineers to bake observability and resilience into services by default

Core Technical Background

Strong experience in SRE, production engineering, or platform reliability with ownership of live systems
Deep Linux systems knowledge, including debugging and performance tuning
Software engineering with Python or Go, plus solid Git and CI/CD experience
Hands-on expertise with observability stacks covering metrics, logs, traces, and alerting
Experience running systems at scale, including high availability (HA), disaster recovery (DR), and incident response

Nice to Have

Infrastructure automation with Terraform or Ansible

This is a role for engineers who enjoy understanding how systems really behave under pressure and who want to own reliability as a first-class engineering problem. If you like solving hard platform problems where observability directly drives system correctness, this is worth a conversation.
