Job Description
Do you like working with existing and new software product development teams? In this position you will instrument end-to-end observability and visibility for business-critical systems through log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who collaborates with product teams, infrastructure SMEs, DevOps engineers, and the proactive monitoring team to provide unique dashboards of relevant service-level analytics for various product stakeholders.
- Work closely with software product development teams (ITSO, Product Owner, SME) to implement monitoring & observability instrumentation within their platforms.
- Drive adoption of best practices in monitoring, alerting, automation, and site reliability.
- Lead or contribute to engineering efforts from design to implementation, focusing on the instrumentation of logs, metrics, and traces.
- Drive use of automation in software instrumentation as well as in response to service degradation events.
- Identify and execute on opportunities to implement instrumentation in pre-production environments.
- Proactively pursue continuous improvement and expansion in observability coverage, service reliability best practices, incident management, and problem management.
Qualifications
- Advanced Splunk experience and technical proficiency required.
- Computer science degree preferred.
- 5+ years of IT-related experience, preferably in a DevOps, sysadmin, and/or developer role.
- 3+ years of cumulative experience in the following technologies: Splunk/ITSI, AWS CloudWatch, APM (AppDynamics), SolarWinds, Grafana, Prometheus, or similar.
- 2+ years of experience in service-oriented architecture (SOA), microservices, and/or API network design paradigms.
- Working knowledge of software development using modern programming languages such as C#/VB (.NET Core), Python, Go, etc.
- Working knowledge of network protocols/technology, databases, and application servers and their roles in service delivery.
- Experience using cloud-native technologies (Kubernetes, OpenTelemetry, GitHub, etc.) in a production environment.
Additional Information
- Nation-wide Medical Plan/Dental/Vision
- Employee Fuel Discount
- 401(k) and Flexible Spending Accounts
- Adoption Assistance
- Tuition Reimbursement
- Weekly Pay
- Team Member Fuel Discount
All your information will be kept confidential according to EEO guidelines.
Job ID: 123501