Only considering candidates eligible to work in the USA ⚠️

Description

*Preference for applicants located in the Eastern time zone

*Full-time with benefits

What you will do

You will be responsible for building, scaling, and maintaining highly available systems that power our production environments and machine learning workloads. You’ll work closely with software engineers, data scientists, and platform architects to ensure our microservice and ML infrastructure runs reliably, securely, and efficiently across Kubernetes-based environments.

Responsibilities

  • Design, build, and maintain scalable, secure, and reliable cloud-native infrastructure on Kubernetes (EKS/GKE/AKS).
  • Architect and manage microservice deployments, ensuring consistent CI/CD and service reliability.
  • Develop services in Golang and Python.
  • Partner with ML and Data teams to design, optimize, and monitor ML/AI workflows (Databricks, Spark, Flyte, Airflow, or similar).
  • Drive improvements in system architecture, including performance, scalability, fault tolerance, and cost optimization.
  • Establish and enforce SLOs/SLIs, conduct incident postmortems, and improve production reliability and developer velocity.
  • Support secure infrastructure practices (IAM, secret management, policy-as-code, compliance controls).
  • Mentor junior engineers and contribute to best practices across observability, infrastructure-as-code, and production readiness.
  • Perform additional related duties as required.

Who you are

We value team members who are innovative, driven, and enthusiastic. You are a seasoned Site Reliability Engineer with a passion for building resilient, scalable, and highly available systems. You thrive at the intersection of software development and infrastructure, using automation and observability to ensure seamless, reliable service delivery.

Requirements

  • Bachelor’s degree in a related field or equivalent work experience.
  • At least 8 years of industry experience in software, systems, or DevOps engineering.
  • Strong expertise in Kubernetes (deployment, scaling, networking, monitoring, debugging).
  • Competency in Golang and Python.
  • Solid understanding of distributed systems, cloud architecture, and container orchestration.
  • Experience building and maintaining microservice-based architectures in production.
  • Proficiency with CI/CD pipelines (GitLab CI, ArgoCD, Flux, or similar).
  • Deep experience with monitoring/observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
  • Experience designing or operating ML workflows and data pipelines (e.g., Databricks, Spark, MLflow, Flyte) is preferred.
  • Background in system design or infrastructure architecture is preferred.
  • Exposure to multi-cloud environments (AWS, GCP, Azure) is preferred.
  • Experience with security, compliance, and automation in production-grade environments is preferred.
  • Contributions to open-source projects or internal platform tooling are a plus.
  • Experience leading large-scale migrations or SRE transformation initiatives is a plus.
  • Familiarity with service meshes (Istio, Linkerd) and API gateways (Kong, Envoy) is a plus.

Who we are

At Clearspeed, we are driven by a corporate DNA committed to the service of others and a passion for our AI-enabled technology, which is redefining risk assessment. Our fast-growing team spans operations across the US, Canada, and EMEA. Together, our team of brilliant minds, diverse in expertise and experience, collaborates and contributes to a shared vision of enabling trust, faster.

Our company is committed to providing equal employment opportunities to all individuals. We value diversity and inclusion in our workplace, and we welcome and encourage individuals of all backgrounds to apply and join us on our journey to build trust faster.

Why join us?

*Impactful work

*Collaborative environment

*Work-life balance

*Remote/hybrid work flexibility

Our benefits (may vary based on geographical location)

*Competitive compensation: salary + performance-based bonuses

*Stock options

*Unlimited paid time off

*Health and wellness coverage

Join us at Clearspeed and be a part of our success story. Together, we can make a difference!

Salary

The salary range is based on national benchmark data for comparable roles, skills, and experience levels. Exact compensation will depend on individual skills, experience, and location, which will be assessed during the interview process.

Salary Description

Salary range: $138,000 - $213,000

Clearspeed (clearspeed.com)

Clearspeed delivers AI-powered voice analytics that help global organisations detect fraud, verify trust and fast-track low-risk customers.

Working Week

We work a 9-day fortnight: 9 slightly longer days over 2 weeks (80 hours total), with every second Friday off.

  • Week 1: Mon, Tue, Wed, Thu, Fri
  • Week 2: Mon, Tue, Wed, Thu (Fri off 🏖️)

Remote Working Policy

Headquartered in San Diego with remote employees across North America, EMEA and APAC.

Company Benefits

  • Medical cover in primary country
  • 401(k) or local equivalent
  • Stock options
  • Equipment allowance
  • Annual learning budget
