Job benefits
- Flexible work hours
Productivity is not steady or uniform; it depends on each person's traits and preferences. At our company, as long as your team is in sync and you hit your goals, you decide when you work.
- Remote work options
Thanks to technology, we no longer have to be physically present at the office to be productive. At our company, you can work from anywhere.
- Medical insurance
To support your health and wellbeing, you can choose from several medical plans depending on your situation and needs. From partial to full medical coverage, we've got you covered.
- Professional development
Every employee is an invaluable asset to the team, which is why we want to help you grow. Level up your skills and expertise through our professional co-development programs with notable organizations. We will cover the cost.
Job description for Senior Site Reliability Engineer (Remote possible) at Glints
- Work with other SREs to implement a comprehensive alerting and monitoring system that surfaces problems before they become major production issues
- Maintain and optimize our deployment and release workflow and its supporting tools for a team of at least 50 engineers
- Triage issues that may arise in production and participate in incident response as needed
- Partner with the Chief Software Architect and other engineers on capacity planning, configuration management and secrets management for new and existing services
- Participate in disaster recovery procedures as needed
- Key strategic role and contributor: You will be an early member of the Platform Engineering team and have the chance to experience first-hand what it's like to scale platform services for a growing engineering team.
- Cutting-edge technologies: You will learn and work with Kubernetes, Docker and other cloud-native technologies, and work alongside ML/AI teams to operationalize their models. We have no legacy or on-premises infrastructure.
- Composure: When production issues occur, SREs should be able to maintain composure and systematically identify root causes
- Good Communication Skills: This role cuts across many service teams and requires close coordination with them.
- Infrastructure-as-Code: Glints uses Terraform to provision infrastructure, and Helm charts and Kubernetes manifest files to provision services.
- Cloud Computing & Containers: Glints runs on cloud infrastructure, using a mix of AWS, GCP and DigitalOcean. We use Kubernetes, Docker and Linux extensively.
- Monitoring, Logging and Alerting Tools: Glints uses the ELK stack and plans to deploy monitoring and alerting using Prometheus and Grafana (or similar).
- Deployment Automation: Glints uses GitLab CI/CD with shell scripts and Helm charts to deploy to target environments.