𝗖𝗼𝗿𝗲 𝗦𝗸𝗶𝗹𝗹𝘀 & 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗶𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 (𝗠𝘂𝘀𝘁-𝗛𝗮𝘃𝗲)
✔️ 6+ years of hands-on experience as a 𝗦𝗶𝘁𝗲 𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿 (𝗦𝗥𝗘)
✔️ Define and manage 𝗦𝗟𝗜𝘀, 𝗦𝗟𝗢𝘀, and 𝗘𝗿𝗿𝗼𝗿 𝗕𝘂𝗱𝗴𝗲𝘁𝘀 across critical systems
✔️ Design and maintain highly reliable, scalable, and fault-tolerant production environments
✔️ Drive toil reduction through automation and self-healing systems built with Python or Bash
✔️ Strong Linux system administration experience in production environments
✔️ Hands-on experience managing Kubernetes or OpenShift platforms
✔️ Implement Infrastructure as Code (Terraform / Ansible)
✔️ Build and optimize CI/CD pipelines using GitLab, Jenkins, or Azure DevOps
✔️ Implement safe deployment strategies (Blue-Green, Canary, Rolling deployments)
✔️ Hands-on experience with AWS, Azure, or Private Cloud infrastructure
✔️ Own incident management, on-call rotations, root cause analysis (RCA), and post-incident reviews
✔️ Implement and manage observability & monitoring using Prometheus, Grafana, ELK/EFK
✔️ Reduce alert noise and improve alert quality to avoid alert fatigue
✔️ Collaborate with development and platform teams to improve system reliability
✔️ Ensure compliance with security and governance standards and frameworks (ISO, NIST, ITIL)
✔️ Experience supporting high-availability, mission-critical systems
𝗣𝗿𝗲𝗳𝗲𝗿𝗿𝗲𝗱 / 𝗚𝗼𝗼𝗱 𝘁𝗼 𝗛𝗮𝘃𝗲
➕ Experience in large-scale government or regulated environments
➕ Exposure to microservices-based architectures
➕ Familiarity with chaos engineering / resilience testing tools
➕ Strong stakeholder communication and documentation skills
Language skills: Arabic, English