Job Description: DevOps Engineer, PT Solusi Teknologi Niaga
Company Description
Qasir.id is a technology company focused on empowering micro, small, and medium enterprises (MSMEs) through a powerful yet easy-to-use mobile point-of-sale (POS) application. The company is committed to building tools that support the growth and long-term success of MSMEs across Indonesia and beyond. Our engineering team operates in a remote-first environment, building and maintaining a high-scale microservices platform running on a self-managed RKE2 Kubernetes cluster.
About the Role
We are looking for a Site Reliability Engineer (SRE) who will take ownership of our production infrastructure and platform reliability. This is a solo contributor role, working directly with the CTO and supported by an external infrastructure vendor and AI-powered tools. The ideal candidate is comfortable working independently, troubleshooting complex systems, and continuously improving operational excellence.
Key Responsibilities
Site Reliability Engineering
- Manage Kubernetes clusters (RKE2 preferred), virtual machines, containers, and other infrastructure components
- Monitor system health and performance using Grafana, Prometheus, Loki, Tempo, and Alertmanager
- Respond to production incidents, perform root cause analysis, and create post-mortem documentation
- Maintain platform reliability and availability targets
- Utilize AI tools to accelerate log analysis, alert triage, and troubleshooting
DevOps
- Manage and improve CI/CD pipelines using GitLab
- Coordinate application deployments with engineering teams
- Work with external vendors for infrastructure and networking tasks
- Create and maintain operational documentation and runbooks
Required Qualifications
- Experience in DevOps, Site Reliability Engineering, or Platform Engineering
- Hands-on experience managing Kubernetes environments (RKE2 experience is a strong plus)
- Experience with Docker, Helm, GitLab CI/CD, or GitHub Actions
- Strong Linux system administration skills
- Experience with observability and monitoring tools: Grafana, Prometheus, Loki, Tempo, Alertmanager
- Solid understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls)
- Working knowledge of MySQL, PostgreSQL, and Redis
- Familiarity with AI-assisted engineering tools such as GitHub Copilot, Cursor, Claude Code, or similar
- Ability to work independently and manage priorities with minimal supervision
Preferred Qualifications
- Experience with Istio or Service Mesh technologies
- Infrastructure as Code using Terraform or Ansible
- Experience working with AWS or GCP
- Knowledge of Vault or other secrets management solutions
- Experience supporting high-availability production environments
We're looking for someone who:
- Takes ownership and proactively solves problems
- Has strong analytical and troubleshooting skills
- Stays calm and methodical during production incidents
- Writes clear technical documentation and runbooks
- Communicates effectively with engineering teams and external vendors
- Enjoys automation and continuous improvement
What We Offer
- Fully Remote Working Environment
- AI Tools & Productivity Budget
- Modern Cloud-Native Technology Stack
- Direct Collaboration with the CTO
- High Ownership & Real Technical Impact
- Opportunity to Shape and Improve a Large-Scale Production Platform
If you're passionate about DevOps, Kubernetes, automation, and building reliable production systems, we'd love to hear from you.
