SDE II | Site Reliability Engineering (Consumer)
Who we are:
grofers is leading the charge in transforming India’s vast, unorganised grocery landscape through cutting-edge technology and innovation. We believe every Indian deserves the opportunity to continually improve their life – a process that often begins at home. As part of our mission of helping consumers make healthier, better choices when buying everyday products, we make a wide range of high-quality grocery and household products accessible, affordable, and available right at their doorsteps.
Built on a proprietary technology stack, the grofers platform serves as a convergence of consumers looking for everyday essentials, partner stores who serve their needs efficiently, and manufacturers looking for a channel to reach a nation of consumers. While our technology caters to the burgeoning population of urban India, it is ready and poised to serve the next 100+ million Indians who are yet to start shopping online.
We believe the ecosystem we power can significantly transform the lives of a billion Indians over the coming decade. They will have access to everyday essentials like groceries at the best value, be able to discover products that improve their health and wellbeing, and spend more meaningful time with their families – with the assurance that their essential needs are being looked after by us. On the other side of this virtuous cycle are the millions of local businesses catering to a nation’s needs, helping create more opportunities for employment, growth, and above all, a better life.
It's a $600 billion challenge to solve, which is why we are looking to hire smart, articulate and ambitious individuals to be a part of the team building the future at grofers. If this seems exciting to you, join us!
Why you will love working with us:
- Customer love: We always put the interests of customers ahead of our own. We work hard to earn and keep their trust, and to bring them delight
- Bias for action: We dream big, take risks and have a strong bias for action. In difficult situations we make sound decisions and take thoughtful action
- Frugality: We are always looking for ways to do more with less - by creating the highest leverage possible with our time, as well as resources
- Confidence: We are tenacious and optimistic, and do not take no for an answer. Our people are quietly confident and openly humble
- Challenge status-quo: We are candid, authentic and transparent. We speak our mind, make connections that others miss and take smart risks
- Learner’s mindset: We keep learning and evolving to be able to meet our audacious goal of empowering every Indian to lead a better life
About the Site Reliability Engineering (Consumer) team:
You'll be joining Grofers as one of the founding members of the Site Reliability Team, which is part of the DevOps Tribe. The Site Reliability Team is responsible for ensuring that the resiliency measures we have implemented work as expected, discovering gaps and working with teams to fix them, and developing processes, tools, automation, and libraries that help ensure the reliability of our products. This team will be primarily aligned with the Consumer Tribe, which builds and maintains the Grofers apps and website that our customers interact with.
What you will do:
- Design and implement processes, tools, automation, and libraries that engineering teams (pods) can use to improve the reliability of the services they own. For example, adding a feature to our circuit breaker library, or extending our internal logging library to collect additional context.
- Drive a reliability-first culture and establish processes, policies and tools that promote reliability within product engineering teams. This includes SLOs, error budgets, on-call response, incident management, observability best practices, and tools and automation that empower developers to think reliability first.
- Work with product engineering teams to ensure that reliability tools and best practices are adopted in every service. Creating new tools and guidelines is not enough; we want to make sure every service actually uses them.
- Drive a culture of incident postmortems. Deeply investigate production incidents with product engineering teams, and apply learnings from postmortems to fix gaps in our code, architecture, processes and team knowledge.
- Participate in design meetings, interviews, code reviews and other organizational activities that help us become an elite engineering team.
EXPERTISE AND QUALIFICATIONS
What you need:
- 3+ years of experience developing complex, distributed web applications.
- Hands-on experience operating services in production and resolving production issues across the stack.
- Experience with programming languages such as Python, Java, Node.js, Golang, etc. We are polyglot and use almost all of these, but Python is the language we use most heavily. It is important to us that you have prior experience working as a developer.
- Disciplined coding practices, experience with code reviews and pull requests, and a creative, conceptual approach to problem-solving.
- Experience with database reliability. You should have dealt with common database-related issues. We primarily use Postgres at Grofers. Postgres experience is not a must, but experience with some RDBMS (like MySQL or MariaDB) and at least one common NoSQL datastore or messaging system (like Redis, MongoDB or RabbitMQ) is critical.
- Experience working with microservice architectures in large distributed cloud environments. We are hosted on AWS.
- Strong written and verbal communication and collaboration skills. As a site reliability engineer, you will need to work with multiple product engineering teams to get reliability-related changes done.
Good to have:
- Experience setting up and operating Kubernetes clusters
- Experience with GraphQL and RPC frameworks (such as gRPC or Thrift). Understanding how services communicate with each other is crucial for finding where failures can occur.
- Knowledge of networking protocols such as TCP, HTTP/2, WebSockets, DNS, etc.
- Experience with back-end technologies such as Django, Flask, Rails, etc.
- Understanding of compliance frameworks (such as ISO 27001, PCI DSS, SOX, etc.) and cloud-native security
Excited? You will be once you visit our Engineering Blog, where you can dive deep into all the cool stuff our engineers have been working on.