- Designing Products for an Asynchronous World
- Building a Process for Remote Product Creation
- Perfect Workflows with JIRA? Not a Chance.
- Then We Built ALP 2.0: Proving That Less is More
- 5 ALP System Elements: Explained
- The Upshot: Success for Remote Development Teams
Engineering teams worldwide grudgingly use what we call the JIRAgile process stack: Agile processes to define tasks, and JIRA to manage the task lifecycle. The alternatives do no better. Waterfall is just as dismal in outcomes as Agile, and all task management tools effectively converge to a JIRA core with a slightly better (or worse) UI.
Products still get built, because engineers are awesome, good-looking and underpaid (and they wrote this blog post). OK, mostly they just have coffee together and fill the cracks in the JIRAgile surface.
As a fully remote company, we do not have the luxury of random synchronous coffee breaks. This post covers our pain points with the JIRAgile stack, and why and how we started building a simpler alternative that we refer to internally as "ALP 2.0" (ALP is short for "assembly line production").
Ready? Here’s some context.
Why JIRA and Agile Are Unfit for Remote Development Teams
At the start of a typical project, product, engineering and project managers get together to define goals, which are then translated into tasks under an Agile (or Waterfall) process.
These tasks are then converted to JIRA tickets, which house the task description and the initial dependencies that keep your team on track.
Typically, the ticket list burns down on the JIRA dashboard. The problem is that the best laid plans of mice and managers often crumble when faced with a real technical bottleneck.
The JIRA dashboard is frozen, or worse still, it says 90% of the project is complete when the foundational 10% is not. If your home was built by a textbook JIRAgile team, you’d have perfect windows and no plumbing.
We dare you to try it!
In practice, engineers recognize this problem, and without any Hollywood music at all, they either fix the foundation or redo task splits so that the project actually gets finished.
But in a purely remote environment, your team members are in different time zones. You cannot rely on chance encounters to unearth problems, and even the most capable engineers may assume that someone else is already fixing the foundational problem they notice.
In fact, most “chance” encounters aren’t chance at all. Some engineer screams, “Hey, one of you is fixing the compiler flag, right?”, hears deafening silence, and then engineers a chance coffee meeting with a peer from another group they are not supposed to know about.
In a remote environment, this cannot happen because talking to oneself is generally misunderstood as insanity.
Designing Products for an Asynchronous World
Step 1: Realize that systems change in remote environments
The core problem with remote work is also its core strength - it is asynchronous. At his 2022 AWS re:Invent keynote, Dr. Werner Vogels spoke about the asynchrony of real life and how it’s central to building great computer systems. The Amazon CTO was certain that event-driven patterns are better models for the asynchronous nature of most human activity.
“Progress should be made under all circumstances,” he said.
That was the key to collective asynchronous work in remote dev teams. Progress was the benchmark.
At Crossover, we embrace asynchrony. Our mission is to assemble talented teams from all over the globe. We challenge these teams to build world-class enterprise software products for thousands of our customers.
To do this repeatedly and at scale, we have had to evolve both processes and tools to support asynchronous product building. We needed a system that enhances output instead of stifling it.
Building a Process for Remote Product Creation
Step 2: See what is missing in the current process and update requirements
As we said before, Agile processes work well in office environments, where team members can collaborate in person and create seamless feedback loops. But our software development team is 100% asynchronous.
Feedback loops took time, and ticket tasks stalled. For agile software teams used to constant collaboration, switching to work with team members in different time zones can be frustrating.
Even the most dedicated remote developers at Crossover were acutely aware that Agile methodologies kept us limping along, but were barely working for anyone.
Without face-to-face meetings to lean on, every software developer had to find ways to make progress alone. Agile was leaving us dead in the water.
Our processes needed an upgrade.
Here are the key process insights we arrived at.
While almost all development processes (think Agile, Waterfall, Scrum…) break projects into tasks, the key to supporting decentralized remote work is making hard choices up front.
That way you limit variability.
Your remote development team needs you to make sure these elements are in place.
1) Fewer Types of Work
- Proactively design a few work units, with well-defined quality bars. Fit projects into these work units, instead of defining custom tasks for every project. Every piece of work should be quality controlled (QCed).
2) Fewer Choke Points
- Invest in training and certifying remote developers to do multiple work units. The result will be that any given work unit can be executed by multiple remote workers.
3) Shorter Fixed Time Allocations
- Decide on the time allotted to each work unit up front. Don’t try to estimate it for each specific project! It’s better to fail fast and get feedback than to have one worker polish the apple. This is particularly true for asynchronous work where remote workers do not get feedback in chance offline meetings.
4) Fewer and More Trustworthy Metrics
- Identify a few key metrics, and invest in making sure that those metrics are accurate, and continuously updated in near real-time. Put the metrics on one shared dashboard that all stakeholders can understand (workers, remote team leads, executives).
- Avoid custom dashboards like they’re plague rats coming for the health of your project. The radical transparency of a shared dashboard is the key to removing guesswork and office politics. Everyone knows how their work is evaluated, and how other team members are doing on that same scale.
5) Fewer Unplanned Worker Priorities
- In most remote teams, workers decide for themselves what to work on. They go by their own perception or, more frequently and less efficiently, by whoever happens to be pushing them to look at something. These informal decisions are then reviewed and second-guessed, and are often the source of arguments.
- Instead, just feed all available tasks into a scheduler, which will pick the highest priority task that any given remote worker is qualified to do. Because of this, we also limit the number of priority levels and the rules for determining a task's priority level.
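The scheduler rule in the last point can be sketched in a few lines. This is only an illustration under stated assumptions, not the ALP 2.0 implementation: the `Task` and `Worker` classes, the three priority levels, and the "oldest first" tie-break are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a deliberately small set of priority levels (lower = more urgent).
PRIORITY_LEVELS = (0, 1, 2)


@dataclass(frozen=True)
class Task:
    task_id: str
    work_unit: str   # one of the few predefined work unit types
    priority: int    # expected to be one of PRIORITY_LEVELS
    created_at: int  # hypothetical sequence number; smaller means older


@dataclass
class Worker:
    name: str
    # Work units this worker has been trained and certified to perform.
    certified_units: set = field(default_factory=set)


def pick_next_task(worker: Worker, queue: list) -> Optional[Task]:
    """Return the most urgent task the worker is certified for, oldest first."""
    eligible = [t for t in queue if t.work_unit in worker.certified_units]
    if not eligible:
        return None
    # Sort key: priority level first, then age as a tie-break.
    return min(eligible, key=lambda t: (t.priority, t.created_at))


if __name__ == "__main__":
    queue = [
        Task("t1", "code_review", 2, 1),
        Task("t2", "bug_fix", 0, 2),
        Task("t3", "code_review", 1, 3),
    ]
    ana = Worker("ana", {"code_review"})
    print(pick_next_task(ana, queue))
```

Because workers never choose tasks themselves, the only inputs to argue about are the priority rules and the certifications, and both are visible to everyone.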
While many of these are unusual, they are also self-reinforcing.
Avoiding choke points and limiting time allocations, for example, is only possible because work units are designed up front and QCed to support the quality of training.
Mandating a shared dashboard guarantees that metrics are judiciously chosen, and that their speed and accuracy are easily achieved by a small remote team.
Perfect Workflows with JIRA? Not a Chance.
Step 3: Understand the current process limitations
We used JIRA as our project management tool of choice for many years. It worked for us because it’s built to work for practically anyone. It supports custom fields, workflows and dashboards. You can customize it to be anything you want it to be.
But when we added our remote development team must-haves, it became clear that JIRA’s flexibility was the opposite of what we needed.
Here are some rules we had to implement, and limitations we had to work around, to try to make it work.
- Do not create custom workflows because they only provide short-term convenience at the cost of long-term repeatability.
- Do not create information silos with custom dashboards that increase the time required to explain the data. This decreases the amount of total insight the team gets from using it.
- We wanted to assign tasks in a dynamic way to individual workers, but JIRA natively supported only static, planned task allocation.
- JIRA didn’t natively enforce timeouts for all tasks assigned.
We made JIRA work by writing automations, particularly for the last two items above. Eventually, we realized we had more JIRA automation than actual code for many of our projects. These automation rules were vague and hard for everyone to maintain.
We needed to sit down and radically rethink our how.
Then We Built ALP 2.0: Proving That Less is More
Step 4: Engineer a solution that meets the outlined requirements
ALP 2.0 (as it's currently known internally) is a less powerful tool that does the job better than the rickety JIRA/Agile system. Instead of being everything to everyone, it’s one thing that specifically works.
It takes a simple approach to managing the process of work, built on a few AWS services:
- Step Functions as a system of record and as a state machine, to simplify the mechanics of an assembly line.
- DynamoDB to store basic configurations and tasks.
- S3/Athena/QuickSight for flexible and customizable assembly line reporting.
Plus, all data and state changes are published to an EventBridge event stream, then routed to other components that subscribe to each event type.
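As a rough illustration of that event stream, the sketch below builds one entry in the shape that EventBridge's `PutEvents` API accepts, plus a rule pattern a subscriber could use. The bus name, source, detail type, and detail fields are all hypothetical; the post does not say what ALP 2.0 actually uses.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names, chosen only for this sketch.
EVENT_BUS_NAME = "alp-events"
EVENT_SOURCE = "alp.assembly-line"
DETAIL_TYPE = "TaskStateChanged"

# EventBridge rule pattern a subscriber could use to receive these events.
SUBSCRIBER_PATTERN = {
    "source": [EVENT_SOURCE],
    "detail-type": [DETAIL_TYPE],
}


def build_state_change_entry(
    task_id: str,
    old_state: str,
    new_state: str,
    when: Optional[datetime] = None,
) -> dict:
    """Build one PutEvents entry describing a task state change."""
    return {
        "EventBusName": EVENT_BUS_NAME,
        "Source": EVENT_SOURCE,
        "DetailType": DETAIL_TYPE,
        "Time": when or datetime.now(timezone.utc),
        # PutEvents expects Detail to be a JSON string, not a dict.
        "Detail": json.dumps(
            {"task_id": task_id, "old_state": old_state, "new_state": new_state}
        ),
    }


if __name__ == "__main__":
    entry = build_state_change_entry("t-42", "ready", "in_progress")
    print(entry["Detail"])
```

With boto3 available and AWS credentials configured, the entry could be published with `boto3.client("events").put_events(Entries=[entry])`.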
5 ALP System Elements: Explained
Here’s some of the thought-process around why we chose these specific system elements, and why they work.
In our initial JIRA-based system, we used spreadsheets to store task definitions. At the time it was easier for remote managers and flow designers. Then we wrote utilities that translated from the spreadsheet to JIRA API calls.
This was hard to maintain, error prone, and leaky! Soon we ended up with work unit sprawl because folks didn’t understand existing workflows and would unnecessarily create near-duplicates.
What we really needed was a state machine for each assembly line process. One that could maintain state for the time required to complete a work product, either days or weeks. It also needed to support features like asynchronous and parallel task execution.
Step Functions is the native AWS technology for building state machines that integrate with other AWS services. These can be defined using Amazon States Language (ASL) and low-code visual design tools.
We use Standard Workflows (not Express Workflows) because they can run for up to one year, versus five minutes for Express. Standard Workflows also have richer functionality and support all service integration patterns and Step Functions activities.
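To make this concrete, here is a minimal sketch of an Amazon States Language definition for a single work unit, built as a Python dictionary. It assumes a hypothetical Lambda function that hands the work to a remote worker and a fixed time allocation per work unit; the state names and the four-hour timeout are illustrative only. The `.waitForTaskToken` integration pattern and the `States.Timeout` error are standard Step Functions features, and the callback pattern is available only in Standard Workflows.

```python
import json


def build_work_unit_definition(function_name: str, timeout_seconds: int) -> dict:
    """Return an ASL definition: run one work unit, fail if it exceeds its timebox."""
    return {
        "Comment": "Sketch of one assembly line step with a fixed time allocation",
        "StartAt": "DoWorkUnit",
        "States": {
            "DoWorkUnit": {
                "Type": "Task",
                # Callback pattern: the execution pauses until the token is returned.
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": function_name,
                    "Payload": {
                        "taskToken.$": "$$.Task.Token",
                        "input.$": "$",
                    },
                },
                # Fixed allocation decided up front, not estimated per project.
                "TimeoutSeconds": timeout_seconds,
                "Catch": [{"ErrorEquals": ["States.Timeout"], "Next": "TimedOut"}],
                "Next": "QualityCheck",
            },
            # Placeholder for the QC step; a real line would do more here.
            "QualityCheck": {"Type": "Pass", "End": True},
            "TimedOut": {
                "Type": "Fail",
                "Error": "WorkUnitTimedOut",
                "Cause": "The work unit was not completed within its time allocation",
            },
        },
    }


if __name__ == "__main__":
    # "assign-work-unit" is a hypothetical Lambda function name.
    definition = build_work_unit_definition("assign-work-unit", 4 * 60 * 60)
    print(json.dumps(definition, indent=2))
```

The JSON output is what would be supplied as the state machine definition when creating a Standard Workflow.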
Step Functions already stores workflow state, which simplifies our database requirements. We wanted controlled flexibility in the underlying database.
In other words, we wanted the task management tool to support adding custom fields to work units, but we also wanted to force flow designers to think deeply about the work unit before creating it. This stood in opposition to just hacking up a few custom fields and workflows in JIRA.
DynamoDB can store virtually any data, but forces the designer to carefully consider design patterns. It’s a natural fit for our application.
[See more on DynamoDB data design in our upcoming posts on CloudCharging. Alex DeBrie, who literally wrote the book on the topic, has worked with us on some projects.]
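One way to picture "controlled flexibility" is a task item whose custom fields must be declared on the work unit before they can be stored. The sketch below is an assumption for illustration: the `PK`/`SK` key layout, the work unit names, and their fields are hypothetical and are not taken from the ALP 2.0 schema.

```python
from typing import Any, Optional

# Hypothetical work unit definitions: custom fields are declared up front.
WORK_UNIT_FIELDS = {
    "code_review": frozenset({"pull_request_url"}),
    "bug_fix": frozenset({"issue_id", "severity"}),
}


def build_task_item(
    line_id: str,
    task_id: str,
    work_unit: str,
    status: str,
    custom: Optional[dict] = None,
) -> dict:
    """Build a DynamoDB-style item, rejecting fields the work unit never declared."""
    if work_unit not in WORK_UNIT_FIELDS:
        raise ValueError(f"Unknown work unit: {work_unit}")
    custom = custom or {}
    undeclared = set(custom) - WORK_UNIT_FIELDS[work_unit]
    if undeclared:
        raise ValueError(f"Undeclared custom fields: {sorted(undeclared)}")
    item: dict = {
        # Composite key: all tasks of one assembly line share a partition.
        "PK": f"LINE#{line_id}",
        "SK": f"TASK#{task_id}",
        "work_unit": work_unit,
        "status": status,
    }
    item.update(custom)
    return item


if __name__ == "__main__":
    print(build_task_item("line-1", "t-42", "bug_fix", "ready", {"severity": "high"}))
```

Adding a field is still possible, but only by changing the work unit definition, which is the point where a flow designer has to stop and think.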
Lambda functions are an obvious way to deploy automations and translation logic. They keep with the event-driven, serverless ethos of the architecture, integrate natively with Step Functions, and can be used to build a wide range of automation capabilities.
Lambda functions support both synchronous request/response invocation and asynchronous task invocation with a callback.
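The asynchronous callback side can be sketched as follows. When a worker finishes, some piece of code returns the task token to Step Functions so the paused execution can resume. `send_task_success` and `send_task_failure` are real Step Functions API calls; the function name, the `passed_qc` flag, and the error code are assumptions made for this example.

```python
import json


def report_work_unit_result(sfn_client, task_token: str, result: dict, passed_qc: bool) -> None:
    """Resume the paused state machine with the outcome of a work unit.

    sfn_client is expected to behave like boto3.client("stepfunctions");
    it is passed in so the function can be exercised without AWS access.
    """
    if passed_qc:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps(result),  # output must be a JSON string
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="QualityCheckFailed",  # hypothetical error code
            cause=json.dumps(result),
        )
```

In a Lambda handler this would typically be called with a real client, for example `report_work_unit_result(boto3.client("stepfunctions"), token, result, passed_qc)`, where `token` is the value the state machine passed in when it assigned the work.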
S3 + Athena for Analytics:
Reporting is one of our key requirements. DynamoDB doesn’t lend itself to direct reporting, and our reporting is mostly in the aggregate (OLAP type queries) which can be delayed a few minutes.
An S3 + Athena pattern (with Glue doing the necessary transformations) meets our requirements, keeps with the Serverless pattern, and avoids expensive hard-to-maintain data warehouses.
Athena + QuickSight for Reporting:
QuickSight is not the most intuitive platform for data exploration but, if you recall, exploration was the exact opposite of what we wanted from reporting. The goal was to have one dashboard with shared visibility, rather than many siloed dashboards.
So QuickSight met our requirements and integrated easily into the rest of the AWS-based system.
SPICE is used in QuickSight to speed up query performance. SPICE capacity isn’t a current concern, as event data is aggregated using Glue before being loaded into SPICE.
The Upshot: Success for Remote Development Teams
By limiting the development process, we simplified the requirements for ALP 2.0 and built a project management tool that works. While any dedicated team can continually fit square pegs into round holes and force JIRA to work, we knew that it was doomed to fail as an effective remote strategy.
We’ll say it again: JIRA and Agile processes don’t work in asynchronous dev environments. As a stack for remote team management, they're rubbish.
Instead, our remote dev teams found that a simple tool built with AWS-native serverless components in an event-driven architecture was easier to maintain and grow. As a great side effect of simplification, ALP 2.0 also helped us reduce our ever-expanding JIRA costs.
Our remote development teams built ALP 2.0 to align the process and the tool into a truly scalable and efficient system. It will probably have another name when it reaches the market. Whatever it ends up being called, that alignment helps remote engineers and product managers beat in-office teams when it comes to product delivery and quality.
The team kept to our central tenets: bet on cloud-provider-native services, and use automation to analyze and reduce costs. Our product CloudFix is built on these same tenets.