Senior Site Reliability Engineer (Chief Architect)


Want to combine software and systems engineering to run distributed, fault-tolerant systems? Learn more about our Site Reliability Engineer role.

$100k/Year | Flexible | Long-term

Crossover, the world’s largest #remotework company, is looking for high-performing Senior SREs for long-term, full-time roles. We offer 100% #remotejobs working with the best talent from around the world. You will work with one of the largest cloud-based SaaS operations in the world, with over 100 applications and thousands of users. It is a great opportunity to design and manage large-scale, cloud-based, multi-tenant clusters as well as the SaaS products running on top of them.

As a technical leader in a high-performing SaaSOps organization, you will work with the top 1% of global engineering talent and be accountable for maintaining the best-in-class performance, reliability, and cost of our SaaS products and clusters. The pace is fast and rewarding, as you will be solving the hardest problems in our products and processes. Apply now to start your long-term career path with us and shape the #futureofwork.

WHY CROSSOVER?
Crossover recruits and builds world-class, high-performing teams to power the fastest-growing portfolio of software products in the world. No other company provides the training and the opportunities to test yourself on the depth and diversity of projects that we do. All roles are location-independent, so you are guaranteed to work with the best in the world. Challenge yourself. Be part of the change.
 
 
 
WHAT YOU'LL BE DOING

You will be responsible for the uptime, performance, and operational cost of a large-scale cloud platform or some of our SaaS-based products. You will make daily and weekly operational decisions with the goal of improving uptime while reducing cost. You will drive improvements by staying familiar with emerging trends in cloud and SaaS technologies. All your decisions will be focused on providing best-in-class service to the users of our SaaS products.

 
REMOTE CAMP PROGRAM

To apply for a role at Crossover, you will go through a series of online tests, usually during the online hiring event. If you pass these tests, you will be offered the opportunity to participate in our four-week, full-time Remote Camp training program. This is elite training taught by our top instructors.

Here’s what our graduates have to say about Remote Camp:

"I am very pleased to say that because of Crossover's unorthodox and unique way of transferring the knowledge through (Paired sessions, coaching sessions with CSMs), I have never been more confident in my technical skills and abilities for my role."
-Mikael F

"The CTO Remote Camp was another thing that motivated me. I wanted to see how CTOs across the globe work and learn from them."
-Javed Z

"I've been with the company since Aug (been part of the second Remote Camp) and since then I've learned SQL, databases, servers, tapes, other content management systems etc- and that's only been in 3 months. Usually when I'm in a new company, I learned a lot about their platform, their tools etc during the length of my time with them but never at this speed!"
-Monnaliza T

Remote Camp is offered as soon as you want to get started. You will be compensated for 40 hrs/week at the hourly rate for the role you are applying to. Remote Camp training is an excellent opportunity to learn about our culture, expectations, tools, processes, and procedures. It's an intensive and demanding program, but every graduate is guaranteed a job at the end of it.

 
 
KEY RESPONSIBILITIES

Design and deploy our world-class SaaS products using containerization (Kubernetes/Docker) and cloud services (AWS, VMware on AWS, etc.)

Drive our super-large-scale multi-tenant clusters while guaranteeing great uptime

Apply your software/programming expertise, combined with your in-depth understanding of systems and cloud services, to continuously improve our infrastructure and products

Commit to a 40-hour week, mostly Monday to Friday, and meet aggressive weekly goals

 
CANDIDATE REQUIREMENTS

Requirements:

Bachelor's degree or equivalent in Computer Science or related technical field

3+ years of hands-on experience administering large-scale SaaS applications on one of the major cloud platforms (AWS, GCP, Azure, IBM Cloud)

In-depth knowledge and 2+ years of hands-on experience with Linux

Fluent in writing code, with 2+ years of experience coding in a popular programming language

Knowledge of Docker and Kubernetes

Experience in automating cloud system monitoring and operations with scripting

Nice to have:
  • Experience working with large-scale cloud systems
  • Experience in a DevOps-oriented organization
  • Knowledge of Cloud Management Platforms
 
 
WHAT YOU WILL LEARN
 

As a Sr. Site Reliability Engineer at Crossover, you will get the opportunity to learn and work with cutting-edge monitoring tools, which allow you to operate at a scale of thousands of servers for 70+ different software companies.

You will be exposed to our metrics-driven culture, which is the foundation of our success in measuring and improving every engineering process and product we deliver.

Working with the top 1% of talent will help you master managerial as well as technical skills. Ultimately, Crossover’s dynamic environment will allow you to move fast and to set and achieve aggressive goals.

 
 
CAREER PATH
 
SaaSOps Software Engineering Manager
Manager

Responsible for a product's uptime, cost optimization, change requests, and automation for a business unit in the group and its products, as well as for the central Engineering/SaaSOps teams delivering value to all products across the organization.

Ensure that our multi-tenant infrastructure, running more than 100 different products, delivers four nines (99.99%) or better availability

Drive best practices by coaching your team of engineers to guarantee quality of service

Manage the uptime error budget of your product

50
Senior Site Reliability Engineer
Technical leader

Responsible for a product's uptime, cost optimization, change requests, and automation.

Ensure that our multi-tenant infrastructure, running more than 100 different products, delivers four nines (99.99%) or better availability

Use Infrastructure as Code (IaC) to automate and enable scaling of environments and systems

Eliminate complexity from both architecture and processes

Optimize our public cloud computing costs

Manage the uptime error budget of your product

50
Senior Cloud Operations Engineer
Individual Contributor

Individual contributor in a team responsible for a product's uptime, cost optimization, change requests, and automation.

Ensure that our multi-tenant infrastructure, running more than 100 different products, delivers four nines (99.99%) of availability

Eliminate complexity from both architecture and processes

Optimize our public cloud computing costs

Be proactive and work closely with the engineering teams to enhance our design and improve our platform offering

Perform capacity planning

30
 
 
 
Work Examples

  • Blog (Medium): An interview with one of our SaaSOps Engineering Managers about the decision to build a central repository to host the Docker containers of all companies and products owned by Trilogy.
    https://medium.com/the-crossover-cast/how-esw-capital-crossover-build-infrastructure-to-handle-100s-of-software-products-83eb62642d9

External resources

  • Article (Accendo Reliability): Prep notes for the ASQ Certified Reliability Engineer exam. A practical tutorial on all the elements that make up the ASQ CRE body of knowledge.
    https://accendoreliability.com/creprep/

  • Ebook (New Relic): SRE ebook. A new ebook that combines thought leadership, best practices, and real-world learnings for professionals interested in leveraging the power of SRE.
    https://newrelic.com/resource/site-reliability-engineering
 

Questions and Answers

  • Are there any Cloud certifications required in the role of Site Reliability Engineer?

    At Crossover, we always encourage our team to learn continuously. We understand that some Cloud certifications may help you demonstrate your knowledge and skills, but there is no particular certification required for the role of Site Reliability Engineer.

  • What would be the priorities for a Site Reliability Engineer on a daily basis?

    The top priorities are ensuring the availability of our infrastructure and optimizing our public cloud costs by eliminating complexity from both architecture and processes.

  • What are the expectations of a Site Reliability Engineer during an outage?

    You are expected to be personally involved and to drive towards a timely resolution. You must also be able to dig deep into every outage, keep track of the underlying root causes, and push for their resolution to improve reliability over time.

  • What are the challenges of having a team of people spread across different countries?

    The main challenges are bringing together people from multiple time zones and ensuring they stay in sync on the work to be delivered. We have developed an online productivity tool that helps remote workers manage their time more efficiently and supports a fair working environment.

  • What is Crossover's approach to acquiring and integrating new software companies into the portfolio so aggressively?

    Before acquiring a company, we conduct due diligence. Once a buying decision is made, we use our standard model to import the SaaS products into our centralized environment. We then enhance the products to match our standardized model.
