Senior Site Reliability Engineer
$100k/Year ($50/Hour for 40 hours of productive work per week), Remote Position, Long-term
Interested in using your years of software and systems engineering expertise to integrate, build & scale cloud-based SaaS systems for a new acquisition every week?
Want to learn how to ‘lift & shift’ acquired company infrastructures into a modern Kubernetes/Docker/VMware cluster? Are you inspired to drive great uptime across a portfolio of more than 100 enterprise software products?
If so, this role is for you.
-Madalina S, Software Engineering Manager
As a Senior Site Reliability Engineer, you will use your software and system engineering expertise to build, scale & improve our cloud-based SaaS systems and products.
You will be working with the world’s top 1% talent on cutting-edge cloud platforms and technologies. The role requires balancing availability, customer experience and the need to continually enhance the systems.
There’s a breadth of opportunities for SREs in our organization: from the due-diligence & import teams that handle our constant stream of acquisitions, through our infrastructure teams that manage and continuously improve our Kubernetes, Docker & VMware clusters, to our SaaS operations, which ensure great uptime and customer experience across our more than 100 products.
A hiring event is a scheduled online event where all relevant testing for a role is conducted on the same day. Submissions received during the event are graded the following week, and successful candidates are notified that they have progressed to the next round, which is an online interview with a Hiring Manager.
Ensure that our multi-tenant infrastructure running more than 100 different products yields four nines (99.99%) or more of availability
Use Infrastructure as Code (IaC) to automate and enable scaling of environments and systems
Eliminate complexity from both architecture and processes
Optimize our public cloud computing costs
Manage the uptime error budget of your product
Be proactive and work closely with the engineering teams to enhance our designs and improve our platform offering
Perform capacity planning and pre-launch reviews
Employ modern instrumentation to make production applications and infrastructure observable, and then act upon the results
Practice sustainable incident response and blameless postmortems
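To make the "four nines" target and the uptime error budget above concrete, here is a minimal sketch of the arithmetic. The function names and the 30-day window are assumptions chosen for illustration; they do not describe Crossover's actual tooling or reporting periods.

```python
# Illustrative only: converts an availability target into a downtime
# ("error") budget. Names and the 30-day window are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 (ignores leap years)
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200


def downtime_budget_minutes(availability_target: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime allowed in the period for a given target (e.g. 0.9999)."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return (1.0 - availability_target) * period_minutes


def budget_remaining_minutes(availability_target: float,
                             downtime_so_far_minutes: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return how much of the budget is left; negative means the budget is overspent."""
    budget = downtime_budget_minutes(availability_target, period_minutes)
    return budget - downtime_so_far_minutes


if __name__ == "__main__":
    # Four nines over a year allows roughly 52.56 minutes of downtime.
    print(round(downtime_budget_minutes(0.9999), 2))
    # Over a 30-day window the same target allows roughly 4.32 minutes.
    print(round(downtime_budget_minutes(0.9999, MINUTES_PER_30_DAYS), 2))
    # After a 3-minute outage in that window, about 1.32 minutes remain.
    print(round(budget_remaining_minutes(0.9999, 3.0, MINUTES_PER_30_DAYS), 2))
```

A remaining budget near zero is typically the signal to slow down risky changes and prioritize reliability work.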
As a Sr. Site Reliability Engineer at Crossover, you will get an opportunity to learn and work on cutting-edge monitoring tools, which allow you to operate at a scale of thousands of servers for 70+ different software companies. You will be exposed to our metrics-driven culture, which is the foundation of our success in measuring and improving every engineering process and product we deliver.
Working with the top 1% talent will help the Site Reliability Engineer master managerial as well as technical skills. Ultimately, Crossover’s dynamic environment will allow you to move fast and to set and achieve aggressive goals.
Work Examples
Every job creates excellent work. We want to show you the types of things you will learn, using real work examples of the processes, training examples, playbooks, and projects you will build on the job.
This is an interview with one of our SaaSOps Engineering Managers about the decision to build a central repository to host the Docker containers of all companies and products owned by Trilogy.
Relevant Files and Links
Here you can find external resources relevant to this role.
Prep notes for ASQ Certified Reliability Engineer exam
This article provides a practical tutorial on all the elements that make up the ASQ CRE body of knowledge.
This is a new ebook that combines thought leadership, best practices, and real-world learnings for professionals interested in leveraging the power of SRE.
Are there any cloud certifications required for the role of Site Reliability Engineer?
At Crossover we always encourage our team to learn continuously. We understand that some cloud certifications may help you demonstrate your knowledge and skills, but there is no particular certification required for the role of Site Reliability Engineer.
What would be the priorities for a Site Reliability Engineer on a daily basis?
The top priorities are to ensure the availability of our infrastructure and to optimize our public cloud costs by eliminating complexity from both architecture and processes.
What are the expectations of a Site Reliability Engineer during an outage?
You are expected to be personally involved and drive towards a timely resolution. You also have to be able to dive deep into all outages, keep track of the underlying root causes and push towards their resolution, to improve reliability over time.
What are the challenges of having a team of people spread across different countries?
The main challenges are bringing together people from multiple time zones and ensuring they are in sync on the work to be delivered. We have developed an online productivity tool that helps remote workers manage their time more efficiently and supports a fair working environment.
What is Crossover's approach to acquiring and integrating new software companies into the portfolio so aggressively?
Before acquiring a company, we conduct due diligence. Once a buying decision is made, we use our standard model to import the SaaS products into our centralized environment. We then enhance the products to match our standardized model.