Site Reliability Engineer

Lucidya

In June 2024 we announced that we are switching to a 4-day work week, making us the first company in Saudi Arabia to do so!

Only considering candidates eligible to work in Riyadh, Saudi Arabia ⚠️

We are looking for a Site Reliability Engineer (SRE) to join the Lucidya Cloud Engineering team and help improve the reliability, scalability, and automation of our cloud-based infrastructure. The ideal candidate will have hands-on experience with cloud environments, containerized workloads, automation tools, and monitoring systems, as well as a proactive mindset for enhancing system availability and performance.

Key Responsibilities:

  1. Infrastructure Reliability
  • Ensure high availability (HA) and scalability of critical infrastructure components (e.g., Redis, RabbitMQ, Kubernetes clusters).
  • Proactively identify and eliminate single points of failure across the cloud environment.
  • Linux Systems Administration: Handle infrastructure management tasks such as patching, performance tuning, and monitoring of Linux-based systems.
  2. Cloud Operations
  • Manage and optimize cloud-based workloads across AWS, GCP, or Azure.
  • Automate provisioning, scaling, and maintenance tasks using Infrastructure as Code (IaC) tools such as Terraform, AWS CloudFormation, or similar.
  3. Kubernetes Clusters
  • Manage the day-to-day operations of Kubernetes clusters, including deployment, scaling, upgrades, and troubleshooting.
  4. Monitoring and Incident Response
  • Implement and standardize monitoring solutions using tools like Datadog, Prometheus, or Grafana to track golden metrics and improve alerting systems.
  • Participate in on-call rotations, troubleshoot incidents, and drive post-incident reviews to implement lasting solutions.
  5. Automation and Scripting
  • Develop and maintain automation scripts for routine operational tasks to reduce manual effort and increase efficiency.
  • Advocate for AWX/Ansible adoption to automate configurations and deployments.
  6. Collaboration and Best Practices
  • Work closely with DevOps and Engineering teams to identify and resolve performance bottlenecks.
  • Contribute to the establishment of best practices for infrastructure and application reliability.

Key Requirements:

  1. Experience and Knowledge
  • Approximately 3 years of experience in a similar SRE, DevOps, or Infrastructure Engineer role.
  • Strong experience with at least one major cloud provider (AWS, GCP, or Azure).
  • Hands-on experience with Kubernetes and containerization (e.g., Docker).
  2. Technical Skills
  • Proficient in scripting languages such as Python, Bash, or similar for automation.
  • Familiarity with Infrastructure as Code (IaC) tools like Terraform, Pulumi, or AWS CloudFormation.
  • Strong understanding of load balancers, networking (IP management, subnetting), and HA architecture.
  • Experience with CI/CD tools (e.g., Bitbucket Pipelines, Jenkins, GitHub Actions).
  3. Monitoring and Observability
  • Experience with modern monitoring and observability tools (e.g., Datadog, ELK, Grafana).
  • Ability to define and track golden metrics and establish meaningful alerting thresholds.
  4. Problem Solving and Troubleshooting
  • Strong analytical skills and ability to resolve complex technical issues.
  • Proven track record in root cause analysis and incident management.
  5. Soft Skills
  • Excellent communication and collaboration skills to work across teams.
  • Self-motivated and proactive in improving systems and processes.
Lucidya (lucidya.com)

AI-powered platform for analyzing customer data and enhancing experiences.

Working Week

In June 2024 we announced that we are switching to a 4-day work week, making us the first company in Saudi Arabia to do so!

  • Mon
  • Tue
  • Wed
  • Thu
  • 🏖️ Fri

Our Vacation Policy

We offer 30 vacation days per year, including public holidays, with flexible time-off policies.

  • 30 days
  • 52 Fridays
  • 82 days off per year

Remote Working Policy

We mostly work on-site but offer some hybrid working for certain positions.

Company Benefits

  • Health insurance
  • 401(k) company contribution
  • Equity / options
  • Professional Development Budget
