The Future of Taskcluster

2017-04-21

Last week the Taskcluster team met to talk about the future of the platform. We called this the Taskcluster 1.0 discussion and so the first topic up for debate was:

Just what the heck does 1.0 mean anyway?

With that question in mind, we'll cover herein what we've decided we mean when we say that and how we intend to get there. This will be a quick summary of our current position.

What is 1.0

It boils down to a few categories, each of which are interrelated.

No more breaking changes

Obviously a Taskcluster 2.0 will always be a possibility allowing us to break core functionality, but that sort of transition is hard to pull off and will take a very long time. For all realistic scenarios, we'll be on 1.0 forever. This means that we must be quite happy with the core concepts that we use before we move to this release.

Remove deprecated parts

This is less important to be done other than for general ease of maintenance for us and ease of getting-started for others. Before we get to 1.0, anything that we've said is deprecated should actually be turned off. The major example of this is Task-Graph Scheduler. No Taskcluster internal services use this anymore, but some users haven't been moved off of it yet.

Improved self-serve for scopes

The scopes system we have currently works quite well for the purposes it was designed for. Adding new users will require us to make scopes more flexible. This might involve renaming certain concepts, using longer IDs, and rethinking how we structure scopes. In general, the work will be to enable as much self-service as possible. In addition, we'd like to have the auth service itself do authorization (not just authentication) for clients that have Taskcluster credentials. This will be safer and allow for some improvements in general.

Better introduction docs

This is related to the first three goals. Once the basic concepts are nailed down, the process of adding a project to Taskcluster will be streamlined and we should make it easy to figure out how to do so. In addition, once the initial steps are taken, figuring out how to use some of the more complicated parts of Taskcluster should be straightforward. We'll never be as easy to use as Travis, but that's not our goal. We want to handle more complicated projects for which Travis just isn't enough.

What is going to change

This is not an exhaustive list, but covers some of the most obvious work we have ahead.

Index and Artifacts

There's a lot of ongoing discussion here, but mostly we want to make it easier to connect tasks together. Currently we have a well defined way to "export" objects from a task via the artifacts field. However, if you want to use those same artifacts in a downstream task, you must write some code to download the artifacts at the start of your task. This is not a terrible amount of work, but we'd like to see if we can help make this process easier and less error prone.

In addition, the index in general will probably be updated. We are not sure that the default way of indexing via pulse routes is ideal and may move to some other strategy. In addition, being able to index failed tasks and have some limited historical view of tasks may be desirable as well. We have a lot to think about here.

Auth

Everything we're thinking about changing with scopes will mean changes in the auth service. The most noticeable external changes will involve user credentials. We will be reworking the way they work so that services such as Treeherder can keep up-to-date credentials for Taskcluster for as long as they wish to keep a user logged in for. In addition, we'll be better able to support permissions from Github and other providers.

We'll also likely be reworking how scopes are used in general. We will need to do this to be more flexible with things like priority levels in the queue.

What is still being debated

The path forward can take two general paths. First would be to nail down an integration layer on top of Taskcluster that is nicely defined and created specifically for CI use case. The latter would be to make all of our currently existing services "complete" before we move forward. It seems the current focus will continue to be on tamping down the ground around the core, but at some point in the future we may start to think about the integration layer much more.

It is very important that we continue to think of Taskcluster as a meta-CI, in the lineage of Buildbot. Rather than a full CI solution that comes out of the box ready to go like Travis or Jenkins. Keeping that in mind, we will try to make the common tasks much more easy than they are now while still enabling uncommon tasks to be done with Taskcluster. If you have any thoughts on this, please let us know! Now is the time to discuss.