Job benefits
- Flexible work hours: Productivity is not steady or uniform; it depends on each person's traits and preferences. As long as your team stays in sync and you hit your goals, you decide when you work.
- Remote work options: Thanks to technology, we no longer have to be physically present at the office to be productive. You can work from anywhere, with no location constraints.
- Medical insurance: To support your health and wellbeing, you can choose from several medical plans depending on your situation and needs. From partial to full medical coverage, we have you covered.
- Vacation & Leave: Need a short break from work? Our company is flexible about leave, whether for vacation, illness, personal matters, or mental health. Discuss what you need and we will try to accommodate it.
Job description for Site Reliability Engineering Lead at Glints
- Working closely with the Chief Software Architect to develop a holistic SRE roadmap that improves our reliability and performance
- Leading other SREs to develop and implement a comprehensive alerting and monitoring system that surfaces problems before they become major production incidents
- Maintaining and optimizing our deployment and release workflow and its tooling to support 50 or more engineers
- Implementing a triage system for issues that arise in production and leading incident response as needed
- Partnering with the Chief Software Architect and other engineers on capacity planning, configuration, and secrets management for new and existing services
- Maintaining, testing and executing disaster recovery procedures as needed
- Composure: When production issues occur, SREs should be able to stay calm and systematically identify root causes.
- Good Communication Skills: This role cuts across many service teams and requires coordination with them.
- Infrastructure-as-Code: Glints uses Terraform to provision infrastructure, and Helm charts and Kubernetes manifests to provision services.
- Cloud Computing & Containers: Glints runs on cloud infrastructure, using a mix of AWS, GCP and DigitalOcean. We use Kubernetes, Docker and Linux extensively.
- Distributed Systems: Many of our services require high availability (HA), which in turn requires an understanding of the key characteristics of distributed systems.
- Monitoring, Logging and Alerting Tools: Glints uses the ELK stack and plans to deploy monitoring and alerting using Prometheus and Grafana (or similar).
- Deployment Automation: Glints uses GitLab CI/CD with shell scripts and Helm charts to deploy to target environments. Knowledge of TypeScript, Go, and/or Python is a plus, as these are the main languages in use.
- Documentation: It’s important to record operational knowledge in an easy-to-access location.