Orchestrating Pizza-Making: A Tutorial for AWS Step Functions with Stepwise

In this post, I will step through creating an AWS Step Functions state machine using Stepwise for a hypothetical pizza-making workflow. Before we get to that, in case you're wondering why you would care about AWS Step Functions, or application workflow orchestration in general, I previously discussed our 4-year journey towards a message-based, orchestrated architecture.

Let's make some pizzas! Suppose these are the 4 steps to making a tasty Pizza Margherita:

  • make dough
  • make sauce
  • put ingredients on dough
  • bake

And suppose you built some robots to perform each of the steps. How would you orchestrate your robots to work together? Well, one option is to use an application workflow orchestration engine. We will be using AWS Step Functions via Stepwise.

Why Stepwise?

Stepwise is our open source library to manage Step Functions using Clojure. Why would you use Stepwise instead of an infrastructure as code (IaC) tool, like CDK or Terraform, to manage Step Functions?

If your workers are written in Clojure, then using Stepwise helps keep your state machine definitions, which are essentially business logic, and your application code in one place, making your development and maintenance experience more seamless. And as you will see in this post, Stepwise has a few features that make working with Step Functions easier too.

Creating a state machine

Let's start easy and create a simple Step Functions (SFN) state machine. Launch a REPL from your local Stepwise repository:

~/stepwise$ lein repl

...

stepwise.dev-repl=>

Paste the following code into your REPL to create your first state machine on SFN. This assumes that your environment is already set up with awscli and that your account has permission to manage SFN.

(require '[stepwise.core :as stepwise])

(stepwise/ensure-state-machine
  :make-pizza ;; name of your state machine
  {:start-at :make-dough

   :states
   {:make-dough {:type     :task
                 :resource :make-pizza/make-dough
                 :next     :make-sauce}

    :make-sauce {:type     :task
                 :resource :make-pizza/make-sauce
                 :next     :put-ingredients-on-dough}

    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}

    :bake {:type     :task
           :resource :make-pizza/bake
           :end      true}}})

; output
; "arn:aws:states:us-west-2:111111111111:stateMachine:make-pizza"

We called ensure-state-machine to create a state machine named :make-pizza, specifying 4 states. The states are connected serially with the :next key. For the important concepts of SFN, see the AWS Step Functions documentation.
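Because the definition is plain Clojure data, the serial wiring can be checked locally before anything is sent to AWS. Here is a minimal sketch of that idea. Note that state-order is a hypothetical helper written for this post, not part of the Stepwise API, and it only handles a simple linear chain like this one.

```clojure
;; Local copy of the definition above, so this runs without AWS access.
(def make-pizza-definition
  {:start-at :make-dough
   :states
   {:make-dough               {:type     :task
                               :resource :make-pizza/make-dough
                               :next     :make-sauce}
    :make-sauce               {:type     :task
                               :resource :make-pizza/make-sauce
                               :next     :put-ingredients-on-dough}
    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}
    :bake                     {:type     :task
                               :resource :make-pizza/bake
                               :end      true}}})

(defn state-order
  "Hypothetical helper (not part of Stepwise): follows :next from :start-at
  and returns the states in the order they would run. Assumes a linear
  chain with no cycles and no parallel or choice states."
  [{:keys [start-at states]}]
  (->> start-at
       (iterate #(get-in states [% :next]))
       (take-while some?)
       vec))

(state-order make-pizza-definition)
;; => [:make-dough :make-sauce :put-ingredients-on-dough :bake]
```

A state with a mistyped :next would cut this list short, which makes the mistake easy to spot in the REPL.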

make pizza step functions, first version

To see what you created on AWS, navigate to your SFN dashboard on AWS and find your make-pizza state machine. The diagram below (from my SFN dashboard) shows the corresponding layout of your make-pizza state machine.

This state machine doesn't do anything yet. We still need to:

  • assign workers to each of the steps
  • start an execution

Assign workers to the steps

This is where the magic of Stepwise comes into play. If you're using an IaC tool to manage Step Functions, then you'd have to create either a Step Functions Activity or an AWS Lambda function to run your application logic. But with Stepwise, we simply use the stepwise/start-workers! function to subscribe your Clojure functions to the SFN state machine.

Note in the code block below that the key names match the resource names defined in the state machine. Stepwise will automatically create the Step Functions Activity and run your workers in a core.async background process to respond to your Step Functions jobs.

(def workers
  (stepwise/start-workers!
    {;; key names must match one of your state machine resources
     :make-pizza/make-dough
     (fn [_] (println "making dough..."))

     :make-pizza/make-sauce
     (fn [_] (println "making sauce..."))

     :make-pizza/put-ingredients-on-dough
     (fn [_] (println "putting on ingredients..."))

     :make-pizza/bake
     (fn [_]
       (println "baking...")
       :done)}))

; output
; #'stepwise.dev-repl/workers
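Since the worker keys have to line up with the resource names, a mistyped key is easy to miss. As a rough sketch, the two maps can be compared with clojure.set before starting the workers. The resource-mismatches function is a hypothetical helper of my own, not something Stepwise provides, and it only looks at a flat :states map.

```clojure
(require '[clojure.set :as set])

(defn resource-mismatches
  "Hypothetical helper (not part of Stepwise): compares the :resource values
  of a flat :states map with the keys of a worker map."
  [states workers]
  (let [resources   (set (keep :resource (vals states)))
        worker-keys (set (keys workers))]
    {:missing-workers (set/difference resources worker-keys)
     :unused-workers  (set/difference worker-keys resources)}))

(resource-mismatches
  {:make-dough {:type :task :resource :make-pizza/make-dough :next :bake}
   :bake       {:type :task :resource :make-pizza/bake :end true}}
  {:make-pizza/make-dough (fn [_] :ok)
   :make-pizza/bakee      (fn [_] :ok)}) ; deliberate typo in the key
;; => {:missing-workers #{:make-pizza/bake}, :unused-workers #{:make-pizza/bakee}}
```

Two empty sets mean every task resource has a worker and every worker has a task.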

Run an execution

WARNING: following this tutorial from this point forward may incur some AWS costs.

We have our pizza-making state machine defined. We have our pizza-making workers ready. Let's run this thing once on AWS.

(stepwise/start-execution!! :make-pizza {:input {}})

; output
;{:arn "arn:aws:states:us-west-2:111111111111:execution:make-pizza:242630d3-1b1c-4a76-b3c0-1bac266c29b2"
; :input {}
; :name "242630d3-1b1c-4a76-b3c0-1bac266c29b2"
; :output "done"
; :start-date #inst "2021-10-25T01:28:07.873-00:00"
; :state-machine-arn "arn:aws:states:us-west-2:111111111111:stateMachine:make-pizza"
; :status "SUCCEEDED"
; :stop-date #inst "2021-10-25T01:28:08.807-00:00"}

In the Stepwise API, a !! suffix on a function name denotes a blocking call and a ! suffix denotes a non-blocking call, similar to the core.async convention. We're using the blocking version, start-execution!!, here. A corresponding non-blocking start-execution! is also available.

You might have noticed that we passed in an empty map as input. For the sake of simplicity, we're ignoring input and output entirely in this tutorial. Everybody is getting the same Pizza Margherita.

Operational features

As discussed in my previous post, one of the main benefits of using an application workflow orchestration engine instead of rolling your own is the operational features it provides. I'll show a couple of examples that I find the most useful: execution history and error handling.

Execution history

Step Functions keeps track of every one of your executions. One way to access those logs is through the AWS Console. On the make-pizza state machine view, you can see the list of your executions.

AWS Console execution history

Not only do you get a high-level list of executions, you also have access to the detailed history of each execution, with input and output information for each step within the state machine, as well as information on each state transition.

Handling errors

Our robots can be prone to failure. What happens if one of them fails? Let's say the dough-making robot fails some of the time. One way to handle that is to retry the task upon failure. Let's add retry logic to our state machine definition.

{:make-dough {:type     :task
              :resource :make-pizza/make-dough
              :retry    [{:error-equals     :States.TaskFailed
                          :interval-seconds 30
                          :max-attempts     2}]
              :next     :make-sauce}}

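We've configured the :make-dough state to retry a failed task up to 2 times, waiting 30 seconds before the first retry. Since we didn't set a backoff rate, the Step Functions default of 2.0 applies, so a second retry would wait 60 seconds.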
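To make the retry timing concrete, here is a small sketch that works out the wait before each retry from a retry map. It follows the Retry semantics of the Amazon States Language as I understand them: the first retry waits interval-seconds, and each later retry multiplies the previous wait by the backoff rate, with defaults of 1 second, 3 attempts, and a rate of 2.0. The retry-waits function and the :backoff-rate key name are my own illustration and assumptions, not part of Stepwise.

```clojure
(defn retry-waits
  "Illustrative helper (not part of Stepwise): returns the seconds waited
  before each retry. The :or defaults mirror the Amazon States Language
  defaults for IntervalSeconds, MaxAttempts and BackoffRate."
  [{:keys [interval-seconds max-attempts backoff-rate]
    :or   {interval-seconds 1 max-attempts 3 backoff-rate 2.0}}]
  (->> (iterate #(* % backoff-rate) interval-seconds)
       (take max-attempts)
       (mapv double)))

;; The retry map from the make-dough state above.
(retry-waits {:error-equals     :States.TaskFailed
              :interval-seconds 30
              :max-attempts     2})
;; => [30.0 60.0]
```

So in the worst case the dough task runs three times in total: the original attempt, a retry after 30 seconds, and a final retry 60 seconds after that.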

State machine control flow

Guess what? Your Pizza Margheritas are in high demand. We want to save some time in your workflow by making the dough and sauce simultaneously. We can do that with a Parallel state provided by SFN.

Putting all of this together, here is our final state machine definition, with retry for the make-dough worker and simultaneous runs for the make-dough and make-sauce workers.

(stepwise/ensure-state-machine
  :make-pizza
  {:start-at :make-base

   :states
   {:make-base
    {:type     :parallel
     :branches [{:start-at :make-dough
                 :states   {:make-dough
                            {:type     :task
                             :resource :make-pizza/make-dough
                             :retry    [{:error-equals     :States.TaskFailed
                                         :interval-seconds 30
                                         :max-attempts     2}]
                             :end      true}}}

                {:start-at :make-sauce
                 :states   {:make-sauce
                            {:type     :task
                             :resource :make-pizza/make-sauce
                             :end      true}}}]
     :next :put-ingredients-on-dough}

    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}

    :bake {:type     :task
           :resource :make-pizza/bake
           :end      true}}})

make pizza step functions, second version with control flow

This definition is getting a bit involved. The flow diagram makes it easier to understand what's happening.
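With the task states now nested inside :branches, it is harder to see at a glance which resources need a worker. One way to list them is to walk the whole definition as a tree. This is a sketch only: task-resources is a hypothetical helper, not a Stepwise function, and the definition below is a trimmed copy of the one above with the retry left out for brevity.

```clojure
(def make-pizza-definition
  {:start-at :make-base
   :states
   {:make-base
    {:type     :parallel
     :branches [{:start-at :make-dough
                 :states   {:make-dough {:type     :task
                                         :resource :make-pizza/make-dough
                                         :end      true}}}
                {:start-at :make-sauce
                 :states   {:make-sauce {:type     :task
                                         :resource :make-pizza/make-sauce
                                         :end      true}}}]
     :next     :put-ingredients-on-dough}

    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}

    :bake {:type     :task
           :resource :make-pizza/bake
           :end      true}}})

(defn task-resources
  "Hypothetical helper (not part of Stepwise): collects every :resource in a
  definition, including those nested inside the :branches of parallel states."
  [definition]
  (->> (tree-seq coll? seq definition) ; visit every nested collection
       (filter map?)                   ; keep only the maps (states, branches)
       (keep :resource)                ; pull out :resource where present
       set))

(task-resources make-pizza-definition)
;; => #{:make-pizza/make-dough :make-pizza/make-sauce
;;      :make-pizza/put-ingredients-on-dough :make-pizza/bake}
```

The result is the same four resources as in the first version, so the worker map from earlier still covers the new definition.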

Worker concurrency

Let's scale this workflow further. You built more robots to make pizzas. Let's increase the number of workers for this state machine to make use of the increased capacity. You can define worker concurrency with Stepwise when you call start-workers! by passing in a second argument, a map containing a :task-concurrency key, like so.

(def workers
  (stepwise/start-workers!
    {:make-pizza/make-dough
     (fn [_] (println "making dough..."))

     :make-pizza/make-sauce
     (fn [_] (println "making sauce..."))

     :make-pizza/put-ingredients-on-dough
     (fn [_] (println "putting on ingredients..."))

     :make-pizza/bake
     (fn [_]
       (println "baking...")
       :done)}

    {:task-concurrency
     {:make-pizza/make-dough               2
      :make-pizza/make-sauce               2
      :make-pizza/put-ingredients-on-dough 1
      :make-pizza/bake                     4}}))

Suppose your ensemble of robots can make 2 doughs, 2 sauces, and bake 4 pizzas at a time. We've set the worker concurrency to reflect that capacity. So now, simultaneous executions will fill up those workers before queueing starts.
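To get a feel for where the queueing starts, here is a back-of-the-envelope sketch. It uses a deliberately simplified model of my own, in which a given number of executions all arrive at the same task at the same moment. Real executions reach each task at different times, so treat this as a rough illustration only. The queued function is hypothetical and not part of Stepwise.

```clojure
(def task-concurrency
  {:make-pizza/make-dough               2
   :make-pizza/make-sauce               2
   :make-pizza/put-ingredients-on-dough 1
   :make-pizza/bake                     4})

(defn queued
  "Hypothetical helper (not part of Stepwise): for each task, how many of
  `pending` simultaneous executions would have to wait for a free worker.
  Simplified model: all executions hit the task at the same moment."
  [concurrency pending]
  (into {}
        (for [[task workers] concurrency]
          [task (max 0 (- pending workers))])))

(queued task-concurrency 3)
;; => {:make-pizza/make-dough 1,
;;     :make-pizza/make-sauce 1,
;;     :make-pizza/put-ingredients-on-dough 2,
;;     :make-pizza/bake 0}
```

With only one worker, putting ingredients on the dough is the first place where pizzas pile up, so that is the robot to build next.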

Summary

In this tutorial, we've set up a make-pizza state machine on AWS Step Functions using Stepwise. We demonstrated the core API of Stepwise: ensure-state-machine, start-workers!, and start-execution!!. We also ran through some features of Step Functions, error handling and parallel flow, as well as a couple of features of Stepwise: seamless development of state machines and workers, and worker concurrency.

If you want to find out more about Step Functions or Stepwise, check out the AWS Step Functions web site or Stepwise repository.