About the Position
As a member of the SRE team, you will bring a collaborative style to leading efforts that raise the maturity of engineering practices across all the agile teams delivering our products. The tools and use cases are diverse, and our challenge is to increase development velocity by optimizing various parts of the pipeline and to improve application stability. Much of our software development focuses on optimizing existing systems by measuring elasticity and saturation, building infrastructure through Infrastructure as Code (IaC), and eliminating or reducing toil through automation. We also look to instill core SRE practices into the engineering teams, including measuring SLIs/SLOs, increasing visibility and observability through monitoring tools, guiding chaos engineering efforts to improve overall resiliency, and leading Gameday and Production Readiness reviews across all engineering disciplines. We’re experts in AWS, use cutting-edge in-house and open-source tools, and enable teams to deploy faster with zero downtime.
We are looking for engineers who are passionate about automation, who like planting the seeds of DevOps in an organization and watching it benefit and grow from their ideas, and who want to own best practices grounded in SRE principles to build scalable, highly reliable applications.
If you love figuring out how all the pieces fit together, and if automation and building tools to monitor and manage your applications sound interesting to you, we want to talk to you.
Cox Automotive is transforming the way the world buys, sells and owns cars. Come join the transformation!
Primary Responsibilities and Essential Functions:
As a Senior Site Reliability Engineer at Cox Automotive you will:
Have a natural tendency to avoid toil and want to automate it away
Automate anything and everything! (testing, deploying, monitoring, etc.)
Take complex, possibly ill-defined problems and come up with technically sound solutions
Take ownership of processes or solutions that can be shared across teams globally
Build and roll out solutions to be consumed by multiple teams
Have an innate curiosity about how things work
Design and assist in the authoring of software tools that reliably manage application delivery & performance
Define requirements, functional specifications, and deliverables
Create automated delivery pipelines for deployment of internal and third-party services
Design and assist in the setup and maintenance of application monitoring and alerting
Engage with product/capability engineering teams to ensure best practices are implemented
Improve the predictability and reliability of software releases, workflows, and operating software
Reduce application deployment windows by leading engineering teams toward a Continuous Deployment environment
Reduce mean time to recovery (MTTR) by helping with troubleshooting, monitoring, alerting, and automated recovery
Facilitate Gamedays and Production Readiness reviews to continue increasing resiliency in our applications
Provide consulting expertise in AWS, cloud design, and operations
Identify new technologies that can improve our area of responsibility, design and conduct proofs-of-concept, and communicate results throughout the organization
Minimum Qualifications:
Bachelor’s degree in Computer Science or related field and 4+ years of relevant experience
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
Ability to debug, optimize code, and automate routine tasks
Systematic problem-solving approach, coupled with effective communication skills and a sense of drive
Understanding of Linux/Windows operating systems
Experience with Python or PowerShell or related scripting languages
Experience with configuration management systems (Chef, Puppet, or Ansible) and deployment tools such as Spinnaker
Experience rolling out highly available, mission-critical applications
Experience with version control systems (Git or SVN) and branching strategies
Experience with cloud computing platforms (AWS, Kubernetes, Heroku, etc.)
Experience in release engineering / automation with cloud environments
Experience with security and network / distributed computing concepts
Experience with continuous integration tools (Jenkins, GitHub Actions, CircleCI, TeamCity, etc.) and artifact repositories (Artifactory or Nexus)
Experience with database server infrastructure (RDS, Aurora, DynamoDB, MySQL, Postgres, etc.)
Experience with agile development, continuous integration and automated testing
Experience with Infrastructure as Code (Terraform or CloudFormation)
Excellent written communication, problem solving, and process management skills
Desire to work in a fast-paced, evolving, growing, dynamic environment
Job ID: 125721