When an architect designs a building, or an engineer designs a bridge, they don’t scribble rough sketches on paper and declare “We’re agile, we’ll figure out the details later!”. They create precise designs, and are able to prove important safety properties about their work before a single shovel hits the dirt.
But when designing highly complex, concurrent, fault-tolerant systems, developers often don’t create more than rough pseudo-code and a few Visio diagrams, before starting work and hoping that any design bugs will be discovered before going to production.
PlusCal/TLA+ is a pseudo-code language that allows for precise descriptions of digital systems, and provides tooling to run automated and exhaustive tests on the pseudo-code itself. Developers can discover design bugs before a single line of “real” code has been written.
Developers at AWS, Microsoft/Azure, etc. have used PlusCal to help design systems and find serious, highly-subtle bugs in products including S3, EC2, EBS, DynamoDB, the Xbox 360 memory system, and many others.
Its use at Amazon has been so successful that management now explicitly allocates engineering time to TLA+. To quote Chris Newcombe, former AWS Principal Engineer:
> TLA+ is the most valuable thing that I’ve learned in my professional career. It has changed how I work by giving me an immensely powerful tool to find subtle flaws in system designs. It has changed how I think…
This talk will provide a short introduction to specifying software designs and PlusCal.
Jay Parlar has been at Rackspace since early 2014, primarily working on internal support tooling and CI/CD pipelines. He has a Ph.D. in Software Engineering from McMaster University, and is currently working on a variety of customer-facing tools. Jay is one of three employees of Rackspace Canada.
Despite working mostly in JavaScript while at Rackspace, he considers himself to be a “Python guy”, having made Python his language of choice all the way back in 2001.