SDE II - Observability Platform
Who we are:
grofers is leading the charge in transforming India’s vast, unorganised grocery landscape through cutting-edge technology and innovation. We believe every Indian deserves the opportunity to continually improve their life – a process that often begins at home. As part of our mission of helping consumers make healthier, better choices when buying everyday products, we make a wide range of high-quality grocery and household products accessible, affordable, and available right at their doorsteps.
Built on a proprietary technology stack, the grofers platform serves as a convergence of consumers looking for everyday essentials, partner stores who serve their needs efficiently, and manufacturers looking for a channel to reach a nation of consumers. While our technology caters to the burgeoning population of urban India, it is ready and poised to serve the next 100+ million Indians who are yet to start shopping online.
We believe the ecosystem we power can transform the lives of a billion Indians significantly over the coming decade. They will have access to everyday essentials like groceries at the best value, be able to discover products that improve their health and wellbeing, and spend more meaningful time with their families – with the assurance that their essential needs are being looked after by us. On the other side of this virtuous cycle are the millions of local businesses catering to a nation’s needs, helping create more opportunities for employment, growth, and above all, a better life.
It's a $600 billion challenge to solve, which is why we are looking to hire smart, articulate and ambitious individuals to be part of the team building the future at grofers. If this seems exciting to you, join us! Read more about us here.
Why you will love working with us:
- Customer love: We always put the interests of customers ahead of our own. We work hard to earn and keep their trust, and to bring them delight
- Bias for action: We dream big, take risks and have a strong bias for action. In difficult situations we make sound decisions and take thoughtful action
- Frugality: We are always looking for ways to do more with less - by creating the highest leverage possible with our time, as well as resources
- Confidence: We are tenacious and optimistic, and do not take no for an answer. Our people are quietly confident and openly humble
- Challenge status-quo: We are candid, authentic and transparent. We speak our mind, make connections that others miss and take smart risks
- Learner’s mindset: We keep learning and evolving to be able to meet our audacious goal of empowering every Indian to lead a better life
About the Observability Platform team:
We are setting up the Observability Platform team, which will be responsible for system monitoring and all operational insights. The goal of this team is to provide our developers with platforms and tools to understand how their services are performing in production. You will achieve this by combining industry-leading third-party solutions with solutions developed in-house. Your work will span the entire stack: building and maintaining libraries and infrastructure for metrics, error tracking, alert routing, and incident management. You will work closely with developers to understand their requirements and their usage of technologies. You will also work closely with the Cloud Platform, Kubernetes Platform, and CI/CD teams to understand the continuous improvements in their respective platforms and build capabilities within the observability platform to support these advancements.
About the role:
As a database reliability expert, you will be responsible for maintaining a healthy database infrastructure and database usage practices for optimal performance, reliability, cost, security and compliance.
You will collaborate closely with engineers on the infrastructure team as well as product engineering teams (owners of microservices and their databases) to understand their usage of Postgres, help them optimize the database infrastructure, and recommend fixes in applications. You will be expected to build solutions in collaboration with developers to help them meet their uptime requirements. This may involve advising on schema design decisions, reviewing changes in database configuration, reviewing existing database usage with monitoring tools to identify performance bottlenecks, analyzing the architecture for availability and scalability, building disaster recovery plans, securing databases against unauthorized changes and access to sensitive data, and helping resolve production incidents to closure.
Another responsibility of this team is to build monitoring and observability tools that help developers identify incidents caused by databases quickly and accurately, thus reducing MTTR for production incidents.
We’re looking for people who have been developers and have a strong background and interest in systems and databases. We’d love to hear from you whether you’re a seasoned database admin, or whether you’ve just learned you might like working with databases.
What you will do:
- You will be working to ensure the availability of the systems that make up the grofers monitoring and observability platform.
- You will be expected to become an expert in the area of monitoring, observability and incident management, and to evangelise the benefits of proper instrumentation and incident management processes throughout the organisation to help teams improve their MTTX metrics.
- You will be building new features into our monitoring, logging, tracing and alerting pipelines to ensure that teams are notified of errors related to their domains and services.
- You will be building monitoring- and observability-related features in our DevOps platform to simplify the end-to-end monitoring, observability and incident management process for our developers.
EXPERTISE AND QUALIFICATIONS
What you need:
- Expertise in at least one modern programming language, preferably Golang, Python or Java. Other languages are fine too, as long as you are ready to learn
- Comfortable with common scripting languages and techniques for automating operations
- Experience with infrastructure monitoring tools like New Relic, CloudWatch, Datadog, Honeycomb, etc. and open-source alternatives such as the ELK stack, Grafana, Prometheus, InfluxDB, Alertmanager, Loki, etc. We use a combination of both.
- Experience with configuration management (such as Ansible, Chef, Puppet, etc.) and Infrastructure-as-Code (such as Terraform, CloudFormation, Pulumi, etc.)
- Experience working in Cloud (preferably AWS) and container orchestration platforms (preferably Kubernetes)
Good to have:
- Experience with OpenTelemetry.
Excited? You will be, once you visit our Engineering Blog where you can deep dive into all the cool stuff that our engineers have been working on.