Site Reliability Engineer

Job Overview

Location
Ibague, Tolima
Job Type
Full Time Job
Job ID
115193
Date Posted
10 months ago
Recruiter
John Jennifer

Job Description

Site Reliability Engineer (SRE) / Cloud Engineer

We are looking for an experienced Site Reliability Engineer who will be an integral member of a global team charged with running our production cloud systems. You will perform typical operations work alongside development teams as an engineer focused on eliminating toil and inefficiency. The ideal candidate has strong experience and expertise in running best-in-class, modern cloud infrastructure, operations, and observability.

As part of a brand new SRE team, you will have the opportunity to help decide what we focus on and which paths we take. You will work among experienced peers who want to help you grow in your career. This team promotes positivity, shared ownership, accountability, and self-initiative.

SRE duties include:
• Continually redefine reliability goals and service-level objectives, measure them, and work to improve our services as needed
• Become a master of hands-off administration of Kubernetes running on Azure Kubernetes Service (AKS)
• Participate in the on-call rotation to respond to availability incidents and support service engineers with customer incidents
• Use your on-call shift to prevent incidents from happening again
• Follow an "automate all things" approach to service delivery and management
• Efficiently code and deploy infrastructure using Terraform, Terraform Cloud, and Azure DevOps (AzDo)
• Make monitoring and alerting trigger on symptoms, not on outages
• Complete Root Cause Analysis (RCA) investigations and blameless postmortems
• Perform Readiness Reviews with internal service teams
• Plan the growth and control the costs of our infrastructure
• Create scalable and extendable patterns to apply across multiple teams

Qualifications:
• Bachelor's degree in CS, engineering, software engineering, or a related field
• 5-10 years of combined Operations and Software Development / Engineering experience, preferably in DevOps or SRE roles
• Experience with at least one programming or scripting language (preferences: PowerShell, C#, Python, Go)
• Solid understanding of the challenges and trade-offs involved in building and deploying systems to production
• Kubernetes certifications, or an interest in obtaining them, are a big plus: Certified Kubernetes Administrator (CKA) and Certified Kubernetes Security Specialist (CKS)
• Experience with large-scale distributed cloud service development, infrastructure, traffic management, and architecture
• Good self-awareness, accountability, and conflict resolution skills, and the ability to receive feedback well

Skills:
• Kubernetes, Terraform, Azure DevOps, Microsoft Azure, PowerShell, C#, PagerDuty, GitOps, SRE, DevOps, Infrastructure as Code (IaC), Operations, Cloud, Docker, Helm, Flux


