My “Roadmap”: Spinnaker 2019

Back to Foundations

2019 is a year of both personal and professional rebuilding. As with any rebuilding, we need to revisit the foundations. Spinnaker is already a great product, but it’s also 5 years old and some parts are starting to weather. Spinnaker can be even better, but to get there we need to focus on the basics and rebuild the foundation where necessary to sail into the next 5 years. This isn’t an official roadmap, it’s my own, but you’re welcome to help.

A building can have any number of corners, but I like my buildings bland, so my foundation has four corners:

  • Code health
  • Error handling
  • Extensibility
  • Scalability

Before we get into it:

  1. I’ll be doing a short talk at RedisConf this year in San Francisco about Spinnaker’s use of Redis. Come say hi if you’re going! I may have swag (stickers, books).
  2. If you’re hosting a conference and want something on Spinnaker, get a hold of me: I’ll maybe come and do a thing, or recruit someone from my team.

Code Health

Like I said, Spinnaker is 5 years old! As far as codebases go, that’s good longevity, but now we need to build on lessons learned, clean up cruft, and address technical debt. There are a lot of areas within Spinnaker that do not spark joy. This is natural in a project experiencing rapid iteration and experimentation. Now that we’ve learned those lessons, we need to apply them more broadly!

The API, for one, is hard to use. It’s largely undocumented, inconsistent, and dynamic, and as a result it’s aggravating. At Netflix, we have a myriad of teams who have built entire applications and platforms on the Spinnaker API, and we understand how painful it can be today. Again, the API exists in its current form as a result of experimentation, so we need to iterate on lessons learned.

I believe we can do better despite a deeply polymorphic domain model, especially now that primitive feature sets are solidifying. I’d like to see more effort put into a V3 API that focuses on strong typing, automatic documentation, and tooling, and that treats third-party programmers as the customers (because they are). People shouldn’t need to open their browser’s Network Inspector to reverse-engineer API operations.

A couple more examples, both of which are underway:

  • Spring Boot 2 upgrade. Spinnaker is still on 1.5! This means we’re behind on critical bug fixes and Spring Boot 1.x is EOL this year.
  • Echo Scheduler refactor. Just one case of simplification that needs to happen across our services. This will not only simplify the codebase but also greatly improve the operational story of Echo.

We need to iterate on our RPC story as well (gRPC), continue to incrementally remove Groovy, and improve our developer tools. That’s a whole lot of stuff that is ripe for the community to bite off (if you so choose).

Thank you, old code, for all that you’ve done for us.

Error Handling

Stage failed (No reason provided).

We need to improve error handling. When errors happen, I want to equip customers to fix their own problem, not to come to “the Spinnaker team” to figure out why a pipeline failed. Spinnaker should give them context and guide them to resolution if the resolution can’t be automated.

We need a more standardized way of propagating errors across service boundaries and of understanding the difference between operator- and user-facing errors. Furthermore, we need a better story for distinguishing within code between 1) system errors, 2) integration errors, and 3) user errors. System errors are ones where Spinnaker did something bad because of bugs (probably my fault). Integration errors come from third-party services and plugins. For an end user, these will typically manifest as system errors, yet operators need this additional dimension.
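
To make the three categories concrete, here is a minimal Python sketch of one error type that carries its category and renders a different message for users than for operators. Spinnaker itself runs on the JVM, and every name here (ErrorKind, StageError, user_message, operator_message) is hypothetical and made up for illustration, not something that exists in the codebase.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical categories mirroring the three kinds described above."""
    SYSTEM = "system"            # a bug inside Spinnaker itself
    INTEGRATION = "integration"  # a third-party service or plugin failed
    USER = "user"                # bad input or configuration from the user


class StageError(Exception):
    """Hypothetical error that travels with its category and a remediation hint."""

    def __init__(self, kind: ErrorKind, detail: str, remediation: str = ""):
        super().__init__(detail)
        self.kind = kind
        self.detail = detail
        self.remediation = remediation

    def user_message(self) -> str:
        # Users see full detail only for problems they can fix themselves.
        if self.kind is ErrorKind.USER:
            hint = f" Suggested fix: {self.remediation}" if self.remediation else ""
            return f"{self.detail}{hint}"
        # System and integration errors look the same to an end user.
        return "Spinnaker hit an internal problem. Operators have the details."

    def operator_message(self) -> str:
        # Operators always get the category, so they can tell a Spinnaker bug
        # apart from a failing third-party dependency.
        return f"[{self.kind.value}] {self.detail}"


if __name__ == "__main__":
    err = StageError(ErrorKind.INTEGRATION, "CI service timed out")
    print(err.user_message())
    print(err.operator_message())
```

The point of the sketch is only that the category is attached once, where the error is raised, and each audience gets a view derived from it.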

For system errors and integration errors, we need to beef up resilience to these failures wherever possible and either retry or have fallbacks. For user errors, we should be able to directly identify the problem and offer suggestions for remediation. We also need to include more upfront validation so these runtime errors occur less frequently.
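
As a rough sketch of that split, again in Python with hypothetical names (RetryableError, UserError, call_with_retry, validate_stage, and the "type" and "account" fields are all assumptions for illustration): failures marked retryable get a few attempts with exponential backoff and then an optional fallback, user errors are never retried, and a small validator catches obvious problems before anything runs.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for system or integration failures worth retrying."""


class UserError(Exception):
    """Hypothetical marker for problems the user has to fix; never retried."""


def call_with_retry(
    operation: Callable[[], T],
    fallback: Optional[Callable[[], T]] = None,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry retryable failures with exponential backoff, then try the fallback."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[RetryableError] = None
    for attempt in range(attempts):
        try:
            return operation()
        except RetryableError as err:  # UserError is not caught and propagates
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    if fallback is not None:
        return fallback()
    assert last_error is not None
    raise last_error


def validate_stage(stage: dict) -> List[str]:
    """Upfront validation: return suggestions instead of failing at runtime."""
    problems = []
    if not stage.get("type"):
        problems.append("Stage has no 'type'. Pick a stage type before saving.")
    if not stage.get("account"):
        problems.append("Stage has no 'account'. Choose a configured account.")
    return problems


if __name__ == "__main__":
    print(validate_stage({"type": "deploy"}))
```

The sleep parameter is injectable only so the backoff can be skipped in tests. A real implementation would also want jitter and a cap on the delay.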

Extensibility

I recently had a meeting with a company in the community that was surprised to learn that anyone can extend Spinnaker via code without upstreaming the customizations. Netflix Spinnaker is built atop OSS, but it has a lot of customizations to integrate with other systems within Netflix. This is not a special sauce only Netflix can make! We need to be more obvious and explicit about the insane amount of extensibility Spinnaker offers today. This needs to be documented well, and we need to offer guidance on how to add new extension points when necessary.

Not everyone has the time, expertise, or resources to create custom builds of Spinnaker, however. We need a better drop-in system. The preconfigured Docker stage is a step in the right direction: it allows you to write arbitrary code in a container and run it as a native stage. This needs better documentation, and we need to open up similar extension points in other areas where it makes sense.

We, as a community, should work to support a marketplace of plugins that people can use to discover, download, install, and configure new features without bloating the core product. I’d love to see proposals come from the community on this. Don’t force me to start a side project. 😬

Scalability

My favorite “-ility”, so I saved it for last: continued investment in performance, reliability, and availability. By this I mean making individual services faster, making the system more resilient to internal and external failure, and supporting a strong multi-location deployment topology.

We’ve been focusing very heavily on this foundation over the last 6–8 months, and it’ll continue to be my personal primary focus. Month-over-month, we have been getting more reliable and faster, but what can I say, it’s not good enough for me yet.

Help!

Spinnaker needs you! If any of this sounds interesting to you, I’d be happy to talk. Join a SIG or propose one if you think one needs to be created, create an RFC or help solidify one that already exists, knock out issues that you’ve found, or add enhancements you think would be valuable. Pull requests are welcome and the Reviewers and Approvers are here to help.

Interested? Come join the #dev channel on the Spinnaker team Slack!


My “Roadmap”: Spinnaker 2019 was originally published in The Spinnaker Community Blog on Medium.