Principal Site Reliability Engineer

Zayo Group
vision insurance, parental leave, paid time off, 401(k)
United States, Colorado, Denver
1401 Wynkoop Street
Nov 14, 2024

Company Description

Zayo provides mission-critical bandwidth to the world's most impactful companies, fueling the innovations that are transforming our society. Zayo's 141,000-mile network in North America and Europe includes extensive metro connectivity to thousands of buildings and data centers. Zayo's communications infrastructure solutions include dark fiber, private data networks, wavelengths, Ethernet, and dedicated Internet access. Zayo serves wireless and wireline carriers, media, tech, content, finance, healthcare, and other large enterprises.

Do you dream in highly scalable systems, thrive in fast-paced environments, and enjoy tackling complex technical challenges? Are you passionate about building and maintaining highly reliable, scalable systems? If so, Zayo is seeking a Principal Site Reliability Engineer (SRE)! We're looking for a talented engineer to play a critical role in ensuring the uptime, performance, and scalability of our critical infrastructure.

Responsibilities:

Automation: Develop and implement automation solutions to streamline operations and reduce manual effort.
Monitoring and Alerting: Design and implement effective monitoring and alerting systems to proactively identify and address issues.
Incident Management: Own the incident lifecycle, from leading root cause analysis and resolution to implementing preventative measures to avoid future occurrences. Be on-call to diagnose and resolve critical service outages.
Reliability Engineering: Proactively identify and mitigate potential system risks, focusing on automation, monitoring, and tooling to ensure high service availability.
Scalability and Performance: Design and implement solutions to ensure our infrastructure can handle ever-growing demands while maintaining optimal application performance.
Collaboration: Work closely with developers, product managers, and other engineers to translate business needs into robust and reliable technical solutions.
Become the beacon for best practices and efficient processes throughout the organization.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Minimum of ten (10) years of experience in a Site Reliability Engineering or related role.
Strong understanding of system administration, Linux, and scripting languages (Python and various shells).
Expert at developing automation tools for monitoring, alerting, and deployment to ensure efficient and reliable operations.
Expert at designing and implementing monitoring systems at scale.
Expert at container orchestration (Kubernetes and Docker).
Experience with monitoring platforms such as SevOne, Assure1, and Nagios, and with various vendor NMS systems.
Previous work in large-scale distributed production environments.
Experience with a variety of cloud platforms and tools (AWS, Google, etc.).
Experience with a variety of monitoring and alerting tools (Prometheus, Grafana, Cacti, etc.).
Strong working knowledge of networking concepts and application protocols, especially TCP/IP, BGP, DNS, TLS, and HTTP/S.
Experience with infrastructure management tools such as Ansible, Terraform, and Puppet to deploy and manage infrastructure at scale.
Proven leadership skills, with the ability to mentor and inspire others.
Excellent problem-solving, analytical, and critical thinking skills.
A passion for automation and building efficient systems.

Bonus points if you have:

Experience working with various vendor APIs (or NETCONF), including Nokia, Juniper, Fujitsu, Infinera, Cisco, and Ciena.
Experience with various network orchestration platforms such as Ciena Blue Planet MDSO, Cisco NSO, Nokia NSP, or others.

Base Salary Range: $147,800 - $180,600 USD annually, commensurate with experience.

Benefits, Rewards & Wellness

Excellent Health, Dental & Vision Insurance
Retirement 401(k) Savings Plan
Fitness membership discounts
Generous paid time off policy including paid parental leave

Zayo provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.