Explanations

The Build System

A Little History

I'm not sure there's a lesson in it, but here's a little Rubinius history to give (hopefully useful) context for the new build system.

Rubinius started as a Ruby implementation. There were some cool ideas in it: building as much as possible in Ruby itself, avoiding a global interpreter lock so native threads could run in parallel, and using both a JIT compiler and a precise, generational garbage collector to make things faster.

But it was still top-to-bottom "just" a Ruby implementation.

So we built it as one big monolithic thing. And since Rake was a beloved Ruby tool, we used Rake.

And it worked well. The whole thing. Using Rake, building it as one big monolith. We didn't wax poetic about "majestic monoliths" or anything like that; we just wanted it to be easy to build.

Even back then, though, Rubinius started to be more than just a Ruby implementation. A couple of folks created fun languages that target Rubinius (Atomy and Fancy are notable).

build2

In the past ten-ish years, a lot has happened in programming languages and software. For one, there's a major new build system for C/C++: build2.

The new Rubinius build system is based on build2 and is really about just three things:

Boundaries make good systems;
Do the same thing everywhere (i.e. no difference between a dev machine, CI, and a package builder); and
Use tools in the simplest way possible (but no simpler).

What we get from these three is principled composability: when some system simply must do things its own way, it can remix the components easily and safely.

The rest of this article explains in detail how the Rubinius build system works.

Scale & Complexity

"One of these is not like the others" is the source of a lot of pain and suffering in many software systems, because even small differences add complexity. Managing each difference in the right place helps keep that complexity in check.

Another challenge in systems is their scale. Because software is often intangible, it can be hard to get a sense of its scale beyond looking at all the source files, and even those can't directly show the functional complexity of a piece of software.

Scale:

Large galaxy: Linux;
Small galaxy: LLVM;
Solar system: Rubinius;
Jupiter-sized planet: rbx compiler;
Earth-sized planet: Ruby core library;
Large mountain on Earth: A library like oniguruma;
Big hill: A library like libasio;
A large rock: Python PEG parser;
A smaller rock: CLI11 command line library.

Based on this, we can say that anything the size of a large mountain or smaller should be its own separately buildable package, and anything larger than a large mountain should only be a composition of such packages.

Every biological system is a collection of well-bounded components, most of them microscopic. Only humans, in their infinite "wisdom," conjure something like Bazel.

Putting planets, solar systems, and galaxies into one package (or one monolithic build system) is just begging for unnecessary complexity, because the number of "one of these is not like the others" cases grows, and they tend to get mixed into places that make changes harder and harder.

One reason is that, without unbreakable boundaries (e.g. code that requires a separate git clone), the urge to "DRY" up the build script or "parameterize" a subroutine is often irresistible.

Setting Boundaries

Scale is one reason to impose a boundary. Some other reasons to put a component in its own repository:

It's in a different programming language;
It's not our code;
It's only used on some systems;
It's one of several viable options;
It has particular testing or security concerns.

Minimum Build Requirements

The intended requirements for building a Rubinius component are:

make
clang/clang++
build2
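A quick way to confirm these requirements before running any target is to look each tool up on the PATH. The sketch below is a hypothetical helper, not something the Rubinius scripts provide, and it assumes build2 is installed as its usual `b`, `bpkg`, and `bdep` executables:

```sh
#!/bin/sh
# missing_tools TOOL...: print each TOOL that cannot be found on PATH, one per line.
# Hypothetical helper for illustration; not part of the Rubinius scripts.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# check_requirements: report any missing build tool and fail if one is absent.
# Assumes build2 is installed as the b, bpkg and bdep executables.
check_requirements() {
  missing=$(missing_tools make clang clang++ b bpkg bdep)
  if [ -n "$missing" ]; then
    echo "missing build requirements:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "all build requirements found"
}
```

On a machine without clang, for example, `check_requirements` lists `clang` and `clang++` under the header and returns a non-zero status, which a CI job can treat as a failure.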

Makefile

PROJ = rbx

# Allow override (e.g., `make VERSION=1.2.3`)
VERSION ?= $(shell (git describe --tags 2>/dev/null || echo "develop") | sed 's/^v//')
REVISION ?= $(shell git rev-parse --short HEAD)

.PHONY: help setup config build install release test clean all

all: setup config build test

##@ Dependencies
setup: ## Clone all components
	@git submodule update --init --recursive
	@./.build2/scripts/setup-build2.sh $(PROJ)

config: ## Configure
	@./.build2/scripts/config-build2.sh $(PROJ)

##@ Development
build: ## Build all components
	@./.build2/scripts/build-build2.sh $(PROJ)

##@ Testing
test: ## Run the tests
	@./.build2/scripts/test-build2.sh $(PROJ)

##@ Maintenance
clean: ## Remove all build artifacts
	@./.build2/scripts/clean-build2.sh $(PROJ)

help: ## Display this help
	@awk 'BEGIN {FS = ":.*##"; printf "Usage:\n  make \033[36m<target>\033[0m\n"} /^[.a-zA-Z_-]+:.*?##/ { printf "  \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
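The `VERSION` default takes the nearest tag reported by `git describe --tags` (the tag itself when HEAD is tagged, otherwise something like `v1.2.3-4-gabc1234`), falls back to `develop` when that fails, and strips a leading `v`. A command-line assignment such as `make VERSION=1.2.3` overrides any ordinary makefile assignment; `?=` additionally lets an environment variable win, because it only assigns when the variable is not already defined. The same pipeline as standalone shell functions, a sketch for trying out tag formats:

```sh
# version_from_describe: mirror the Makefile's VERSION default.
# Prints the nearest tag (or "develop" if git describe fails) without a leading "v".
version_from_describe() {
  (git describe --tags 2>/dev/null || echo "develop") | sed 's/^v//'
}

# strip_v STRING: just the sed step, for checking tag formats without a repository.
strip_v() {
  printf '%s\n' "$1" | sed 's/^v//'
}
```

Only a leading `v` is removed, so `strip_v v1.2.3-4-gabc1234` prints `1.2.3-4-gabc1234`, and `develop` passes through unchanged.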

You may notice something a bit different in this Makefile: the help target. It's a neat way to give newcomers easy guidance, and it's often helpful even for experienced contributors.

$ make help
Usage:
  make <target>

Dependencies
  setup            Clone all components
  config           Configure

Development
  build            Build all components

Testing
  test             Run the tests

Maintenance
  clean            Remove all build artifacts
  help             Display this help
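The help target works by reading the Makefile itself (`$(MAKEFILE_LIST)`) with awk: a line shaped like `target: ... ## text` becomes a help entry, and a `##@ Section` line becomes a heading. Below is the same awk program lifted out of the Makefile (so `$$1` becomes `$1`) and without the color escape codes, as a sketch that can be run on any file. It also drops the `?` from the original `.*?`, since awk's regular expressions have no lazy quantifier and plain `.*##` selects the same lines:

```sh
# print_make_help FILE: print one entry per "target: ... ## description" line and
# a heading per "##@ Section" line, like the Makefile's help target minus colors.
print_make_help() {
  awk 'BEGIN { FS = ":.*##" }
       # Rule line: the field separator splits it into target ($1) and text ($2).
       /^[.a-zA-Z_-]+:.*##/ { printf "  %-15s %s\n", $1, $2 }
       # Section line: skip the 4-character "##@ " prefix and print the rest.
       /^##@/ { printf "\n%s\n", substr($0, 5) }' "$1"
}
```

Recipe lines start with a tab, so they never match either pattern, and a target without a `##` comment (like `all`) stays out of the help output.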

The Build Steps

The steps to build a component are standardized as much as possible across all components.
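Because every component answers to the same target names, a composition of components can be driven by a single loop. A hypothetical sketch, assuming each component is checked out in its own directory with a Makefile like the one above; `MAKE` can be overridden (for example with `echo`) to preview the invocations without running them:

```sh
# build_components DIR...: run the standard steps in each component directory,
# in order, stopping at the first component that fails.
# Hypothetical driver; set MAKE=echo to print the invocations instead.
build_components() {
  for dir in "$@"; do
    echo "==> $dir"
    ${MAKE:-make} -C "$dir" setup config build test || {
      echo "component failed: $dir" >&2
      return 1
    }
  done
}
```

For instance, `build_components ../component-a ../component-b` (hypothetical paths) runs the identical four steps in both directories, which is the "do the same thing everywhere" principle applied across packages.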

setup

The setup target ensures that all source code and input files are present. If some of the files need to be generated, this step of the build process should handle that.
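One way a setup step can handle generated inputs is to create each file only when it is missing, so running setup twice is harmless. A hypothetical helper sketching that idea; the real setup-build2.sh may work differently:

```sh
# ensure_file PATH CMD...: run CMD (expected to create PATH) only when PATH is
# missing, then fail if PATH still does not exist. Existing files are left alone,
# so re-running setup is cheap.
# Hypothetical helper for illustration.
ensure_file() {
  target=$1
  shift
  if [ ! -e "$target" ]; then
    "$@" || return 1
  fi
  if [ ! -e "$target" ]; then
    echo "setup: $target was not generated" >&2
    return 1
  fi
}
```

A call such as `ensure_file src/parser.c ./scripts/generate-parser.sh` (both paths hypothetical) would run the generator once and skip it on later runs.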

config

The config target prepares the source code for the build process.
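With build2, preparing a build typically means creating a build configuration. A sketch of what such a step could run, assuming bdep's `init -C` form (create a configuration directory, name it, and pick the C++ compiler with `config.cxx`); the directory and the `@clang` name are hypothetical, and this is not the contents of config-build2.sh. Setting `DRY_RUN=1` prints the command instead of running it:

```sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# config_clang [DIR]: create a build2 configuration in DIR (hypothetical default
# ../rbx-clang) named @clang, using clang++, and initialize the project in it.
config_clang() {
  run bdep init -C "${1:-../rbx-clang}" @clang cc config.cxx=clang++
}
```

Keeping the configuration outside the source tree means a dev machine, CI, and a package builder can each create their own configuration from the same command.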

build

The build target builds the default artifact. For a library, this would be the library itself. For an application, this would be the executable. For something like Ruby code, it may be the source files themselves, or for a system like Rubinius, it could be bytecode files or even an executable.
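Since the default artifact differs by component, a post-build sanity check has to know which kind of artifact to expect. A hypothetical sketch; the two kinds and their rules (a library is a non-empty file, an executable has its execute bit set) are illustrative assumptions, not Rubinius conventions:

```sh
# artifact_ok KIND PATH: succeed if PATH looks like a built artifact of KIND.
#   library    -> PATH exists and is non-empty
#   executable -> PATH is a regular file with execute permission
# Returns 2 for a KIND it does not know. Kinds and rules are assumptions.
artifact_ok() {
  case $1 in
    library)    [ -s "$2" ] ;;
    executable) [ -f "$2" ] && [ -x "$2" ] ;;
    *)          echo "artifact_ok: unknown kind '$1'" >&2; return 2 ;;
  esac
}
```

A component that produces bytecode files would need its own rule added to the case statement.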

test

The test target checks the integrity of the default artifact. For a library, this may be a sample application that exercises the library's API. For an executable, this may be automated tests that mimic a user's interaction.
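For an executable, a test that mimics a user can be as simple as running a command and comparing what it prints with what the user should see. A hypothetical helper along those lines:

```sh
# expect_output EXPECTED CMD...: run CMD and compare its combined stdout/stderr
# with EXPECTED. Prints PASS on success; prints FAIL and returns 1 otherwise.
# Hypothetical helper for illustration.
expect_output() {
  expected=$1
  shift
  if ! actual=$("$@" 2>&1); then
    echo "FAIL: $* exited with an error" >&2
    return 1
  fi
  if [ "$actual" = "$expected" ]; then
    echo "PASS: $*"
  else
    echo "FAIL: $*: expected '$expected', got '$actual'" >&2
    return 1
  fi
}
```

For example, `expect_output "1.2.3" ./my-app --version` (a hypothetical program and flag) passes only if that invocation prints exactly `1.2.3`. Command substitution strips trailing newlines, so EXPECTED should not end with one.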