In this post, I will step through creating an AWS Step Functions state machine using Stepwise for a hypothetical pizza-making workflow. Before we get to that, though, in case you're wondering why you would care about AWS Step Functions, or application workflow orchestration in general, I discussed our 4-year journey towards a message-based, orchestrated architecture in a previous post.
Let's make some pizzas! Suppose these are the 4 steps to making a tasty Pizza Margherita:
- make dough
- make sauce
- put ingredients on dough
- bake
And suppose you built some robots to perform each of the steps. How would you orchestrate your robots to work together? One option is to use an application workflow orchestration engine. We will be using AWS Step Functions via Stepwise.
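For contrast, here is a rough sketch of the simplest possible "orchestration": plain Clojure functions called one after another in a single process. The step functions below are hypothetical stand-ins for the robots, each recording its step on a pizza map. This works inside one REPL, but nothing here retries a failed step, records execution history, or lets the steps run on separate machines, which is what a workflow engine adds.

```clojure
;; Hypothetical stand-ins for the four robots. Each takes the pizza in
;; progress (a map) and returns it with its own step appended to :steps.
(defn make-dough [pizza]
  (update pizza :steps conj :make-dough))

(defn make-sauce [pizza]
  (update pizza :steps conj :make-sauce))

(defn put-ingredients-on-dough [pizza]
  (update pizza :steps conj :put-ingredients-on-dough))

(defn bake [pizza]
  (-> pizza
      (update :steps conj :bake)
      (assoc :status :done)))

;; Hand-rolled orchestration: run each step in order, in this process only.
(defn make-pizza-locally []
  (-> {:steps []}
      make-dough
      make-sauce
      put-ingredients-on-dough
      bake))

(make-pizza-locally)
;; => {:steps [:make-dough :make-sauce :put-ingredients-on-dough :bake], :status :done}
```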
Why Stepwise?
Stepwise is our open source library for managing Step Functions from Clojure. Why would you use Stepwise instead of an infrastructure as code (IaC) tool, like CDK or Terraform, to manage Step Functions?
If your workers are written in Clojure, Stepwise keeps your state machine definitions (which are essentially business logic) in the same place as your application code, making development and maintenance more seamless. And as you will see in this post, Stepwise has a few features that make working with Step Functions easier too.
Creating a state machine
Let's start easy and create a simple Step Functions (SFN) state machine. Launch a REPL from your local Stepwise repository:

```
paul@demo:~/stepwise$ lein repl
...
stepwise.dev-repl=>
```
Paste the following code into your REPL to create your first state machine on SFN. This assumes that your environment is already set up with the AWS CLI and that your account has permission to manage SFN.
```clojure
(require '[stepwise.core :as stepwise])

(stepwise/ensure-state-machine
  :make-pizza ;; name of your state machine
  {:start-at :make-dough
   :states
   {:make-dough               {:type     :task
                               :resource :make-pizza/make-dough
                               :next     :make-sauce}
    :make-sauce               {:type     :task
                               :resource :make-pizza/make-sauce
                               :next     :put-ingredients-on-dough}
    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}
    :bake                     {:type     :task
                               :resource :make-pizza/bake
                               :end      true}}})
; output
; "arn:aws:states:us-west-2:111111111111:stateMachine:make-pizza"
```
We called `ensure-state-machine` to create a state machine named `:make-pizza`, specifying 4 states. The states are connected serially with the `:next` keyword, and the final state is marked with `:end true`. For information on the important concepts of SFN, AWS provides helpful documentation here.
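Because the definition handed to `ensure-state-machine` is an ordinary Clojure map, you can also inspect it with ordinary Clojure before sending it to AWS. The sketch below copies the definition into a var (`make-pizza-definition`, a name chosen here; the same map could be passed as the second argument to `stepwise/ensure-state-machine`) and adds a hypothetical `step-order` helper, not part of Stepwise, that follows `:start-at` and each state's `:next` until a state has no successor. It only understands a linear chain of states.

```clojure
;; The same definition as above, held in a var so it can be inspected.
(def make-pizza-definition
  {:start-at :make-dough
   :states
   {:make-dough               {:type     :task
                               :resource :make-pizza/make-dough
                               :next     :make-sauce}
    :make-sauce               {:type     :task
                               :resource :make-pizza/make-sauce
                               :next     :put-ingredients-on-dough}
    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}
    :bake                     {:type     :task
                               :resource :make-pizza/bake
                               :end      true}}})

(defn step-order
  "Hypothetical helper: lists state names from :start-at, following :next
  until a state without a :next (such as one marked :end true)."
  [{:keys [start-at states]}]
  (loop [state-name start-at
         order      []]
    (let [order      (conj order state-name)
          next-state (get-in states [state-name :next])]
      (if next-state
        (recur next-state order)
        order))))

(step-order make-pizza-definition)
;; => [:make-dough :make-sauce :put-ingredients-on-dough :bake]
```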
To see what you created, open the Step Functions console on AWS and find your `make-pizza` state machine. The diagram below (from my SFN dashboard) shows the corresponding layout of your `make-pizza` state machine.
This state machine doesn't do anything yet. We still need to:
- assign workers to each of the steps
- start an execution
Assign workers to the steps
This is where the magic of Stepwise comes into play. If you're using an IaC tool to manage Step Functions, you'd have to create either a Step Functions activity or an AWS Lambda function to run your application logic. With Stepwise, we simply use the `stepwise/start-workers!` function to subscribe Clojure functions to the SFN state machine.
Note in the code block below that the key names match the resource names defined in the state machine. Stepwise automatically creates the Step Functions activities and runs your workers in the background using `core.async` to respond to your Step Functions tasks.
```clojure
(def workers
  (stepwise/start-workers!
    {;; key names must match one of your state machine resources
     :make-pizza/make-dough
     (fn [_] (println "making dough..."))
     :make-pizza/make-sauce
     (fn [_] (println "making sauce..."))
     :make-pizza/put-ingredients-on-dough
     (fn [_] (println "putting on ingredients..."))
     :make-pizza/bake
     (fn [_]
       (println "baking...")
       :done)}))
; output
; #'stepwise.dev-repl/workers
```
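Since each worker is just a Clojure function, you can also keep the handler map in its own var and exercise it at the REPL before starting any workers. The sketch below does that (`pizza-handlers`, `task-resources`, and `missing-workers` are names chosen here, not Stepwise functions) and checks that every task `:resource` in a definition has a matching handler key, including tasks nested in a Parallel state's `:branches`. The same map could then be passed to `stepwise/start-workers!` as above.

```clojure
;; The worker functions from above, kept in a var so they can be tested locally.
(def pizza-handlers
  {:make-pizza/make-dough               (fn [_] (println "making dough..."))
   :make-pizza/make-sauce               (fn [_] (println "making sauce..."))
   :make-pizza/put-ingredients-on-dough (fn [_] (println "putting on ingredients..."))
   :make-pizza/bake                     (fn [_] (println "baking...") :done)})

(defn task-resources
  "Set of :resource values of all :task states in a definition,
  descending into the :branches of :parallel states."
  [{:keys [states]}]
  (set (mapcat (fn [state]
                 (concat (when (= :task (:type state))
                           [(:resource state)])
                         (mapcat task-resources (:branches state))))
               (vals states))))

(defn missing-workers
  "Task resources in definition that have no handler in handlers."
  [definition handlers]
  (set (remove (set (keys handlers)) (task-resources definition))))

;; Call a handler directly, with an empty input map as in this tutorial.
((:make-pizza/bake pizza-handlers) {})
;; prints "baking..." and returns :done

;; A hypothetical definition with a :slice task that has no worker yet.
(missing-workers {:start-at :bake
                  :states   {:bake  {:type :task :resource :make-pizza/bake :next :slice}
                             :slice {:type :task :resource :make-pizza/slice :end true}}}
                 pizza-handlers)
;; => #{:make-pizza/slice}
```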
Run an execution
WARNING: following along with this tutorial from this point forward may incur some AWS costs.
We have our pizza-making state machine defined. We have our pizza-making workers ready. Let's run this thing once on AWS.
```clojure
(stepwise/start-execution!! :make-pizza {:input {}})
; output
; {:arn "arn:aws:states:us-west-2:111111111111:execution:make-pizza:242630d3-1b1c-4a76-b3c0-1bac266c29b2"
;  :input {}
;  :name "242630d3-1b1c-4a76-b3c0-1bac266c29b2"
;  :output "done"
;  :start-date #inst "2021-10-25T01:28:07.873-00:00"
;  :state-machine-arn "arn:aws:states:us-west-2:111111111111:stateMachine:make-pizza"
;  :status "SUCCEEDED"
;  :stop-date #inst "2021-10-25T01:28:08.807-00:00"}
```
In the Stepwise API, a `!!` suffix on a function name denotes a blocking call and a `!` suffix denotes a non-blocking call, similar to the `core.async` convention. We're using the blocking `start-execution!!` here, which waits for the execution to finish and returns its result. A corresponding non-blocking `start-execution!` is also available.
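If the blocking/non-blocking distinction is new to you, here is a rough analogy using only `clojure.core`'s `future`. It does not call Stepwise, and `fake-execution` is a hypothetical stand-in for a slow remote call. A blocking call makes the caller wait for the result, while a non-blocking call returns right away with a handle that can be waited on later. What `start-execution!` actually returns is not covered in this post, so treat the `future` here purely as an illustration.

```clojure
;; Hypothetical stand-in for a slow remote call such as running an execution.
(defn fake-execution [input]
  (Thread/sleep 100)
  {:status "SUCCEEDED" :input input})

;; Blocking style (cf. a *-!! function): the caller waits for the result.
(defn run-blocking [input]
  (fake-execution input))

;; Non-blocking style (cf. a *-! function): returns immediately with a
;; future; deref (@) waits for the result when the caller needs it.
(defn run-non-blocking [input]
  (future (fake-execution input)))

(:status (run-blocking {}))
;; => "SUCCEEDED"

(let [pending (run-non-blocking {})]
  ;; ... the caller is free to do other work here ...
  (:status @pending))
;; => "SUCCEEDED"
```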
You might have noticed that we passed in an empty map as input. For the sake of simplicity, we're ignoring input and output entirely in this tutorial. Everybody is getting the same Pizza Margherita.
Operational features
As discussed in my previous post, one of the main benefits of using an application workflow orchestration engine instead of rolling your own is the operational features it provides. I'll show a couple of examples that I find most useful: execution history and error handling.
Execution history
Step Functions keeps track of every one of your executions. One way to access those logs is the AWS Console: on the `make-pizza` state machine view, you can see the list of past executions.
Not only do you get a high-level list of executions, you also have access to the detailed history of each execution, with input and output for each step in the state machine as well as information about each state transition.
Handling errors
Our robots can be prone to failure. What happens if one of them fails? Let's say the dough-making robot fails some of the time. One way to handle that is to retry the task upon failure. Let's add retry logic to our state machine definition.
```clojure
{:make-dough {:type     :task
              :resource :make-pizza/make-dough
              :retry    [{:error-equals     :States.TaskFailed
                          :interval-seconds 30
                          :max-attempts     2}]
              :next     :make-sauce}}
```
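We've configured the `make-dough` state to retry the `:make-pizza/make-dough` task when it fails. `:max-attempts` counts retries rather than total attempts, so Step Functions will retry up to 2 more times: first after 30 seconds and then, with Step Functions' default backoff rate of 2.0 (no backoff rate is set here), after another 60 seconds. If all three attempts fail, the error propagates and the execution fails.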
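To make the retry arithmetic concrete, here is a small local sketch of the retry rule as documented for Step Functions: the wait before retry n (counting from 0) is `interval-seconds * backoff-rate^n`, `max-attempts` is the number of retries, and the documented defaults are 1 second, 3 retries, and a backoff rate of 2.0. `retry-delays`, `run-with-retry`, and `flaky-make-dough` are hypothetical helpers for illustration only; in the real workflow Step Functions performs the retries, and this sketch sums the waits instead of sleeping.

```clojure
(defn retry-delays
  "Seconds waited before each retry: interval-seconds * backoff-rate^n for
  n = 0 .. max-attempts - 1. Defaults mirror the documented Step Functions
  defaults (1 second, 3 retries, backoff rate 2.0)."
  [{:keys [interval-seconds max-attempts backoff-rate]
    :or   {interval-seconds 1, max-attempts 3, backoff-rate 2.0}}]
  (map (fn [n] (* interval-seconds (Math/pow backoff-rate n)))
       (range max-attempts)))

(defn run-with-retry
  "Calls f (no arguments) until it returns without throwing or the retries
  run out, then rethrows the last error. Waits are summed, not slept."
  [policy f]
  (loop [attempt 1
         delays  (retry-delays policy)
         waited  0.0]
    (let [outcome (try {:ok (f)}
                       (catch Exception e {:error e}))]
      (cond
        (contains? outcome :ok) {:result (:ok outcome) :attempts attempt :waited waited}
        (seq delays)            (recur (inc attempt) (rest delays) (+ waited (first delays)))
        :else                   (throw (:error outcome))))))

(retry-delays {:interval-seconds 30 :max-attempts 2})
;; => (30.0 60.0)

;; Hypothetical flaky dough robot: fails on its first two calls.
(def dough-calls (atom 0))
(defn flaky-make-dough []
  (if (< (swap! dough-calls inc) 3)
    (throw (ex-info "dough robot jammed" {}))
    :dough))

(run-with-retry {:interval-seconds 30 :max-attempts 2} flaky-make-dough)
;; => {:result :dough, :attempts 3, :waited 90.0}
```

A robot that failed a third time would exhaust both retries, and `run-with-retry` would rethrow its error, mirroring the execution failing.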
State machine control flow
Guess what? Your Pizza Margheritas are in high demand. We want to save some time in our workflow by making the dough and sauce simultaneously. We can do that with a Parallel state provided by SFN.
Putting all of this together, here is our final state machine definition, with a retry on the `make-dough` worker and the `make-dough` and `make-sauce` workers running simultaneously.
```clojure
(stepwise/ensure-state-machine
  :make-pizza
  {:start-at :make-base
   :states
   {:make-base
    {:type     :parallel
     :branches [{:start-at :make-dough
                 :states   {:make-dough
                            {:type     :task
                             :resource :make-pizza/make-dough
                             :retry    [{:error-equals     :States.TaskFailed
                                         :interval-seconds 30
                                         :max-attempts     2}]
                             :end      true}}}
                {:start-at :make-sauce
                 :states   {:make-sauce
                            {:type     :task
                             :resource :make-pizza/make-sauce
                             :end      true}}}]
     :next     :put-ingredients-on-dough}
    :put-ingredients-on-dough {:type     :task
                               :resource :make-pizza/put-ingredients-on-dough
                               :next     :bake}
    :bake                     {:type     :task
                               :resource :make-pizza/bake
                               :end      true}}})
```
This definition is getting a bit involved; the flow diagram representation makes it easier to understand what's happening.
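A Parallel state starts every branch on the same input, waits for all of them, and passes on an array of the branch outputs in branch order; if any branch fails, the whole Parallel state fails. Here is a rough local analogy with Clojure futures. `run-parallel` and the two branch functions are hypothetical, and this is not how Step Functions executes branches.

```clojure
(defn run-parallel
  "Starts every branch fn on the same input, waits for all of them, and
  returns their outputs as a vector in branch order. An exception in any
  branch is rethrown (wrapped in java.util.concurrent.ExecutionException)."
  [branches input]
  ;; mapv is eager, so every future is started before the first deref.
  (let [running (mapv (fn [branch] (future (branch input))) branches)]
    (mapv deref running)))

;; Hypothetical branch workers standing in for the two robots.
(defn make-dough [_] (Thread/sleep 200) :dough)
(defn make-sauce [_] (Thread/sleep 200) :sauce)

(time (run-parallel [make-dough make-sauce] {}))
;; => [:dough :sauce], in roughly 200 ms rather than 400 ms
```

Like the Parallel state, the combined output is a single vector, which then becomes the input of the next step (`put-ingredients-on-dough` in our workflow).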
Worker concurrency
Let's scale this workflow further. Suppose you've built more robots to make pizzas, and you want to increase the number of workers for this state machine to make use of the extra capacity. With Stepwise, you can define worker concurrency when you call `start-workers!` by passing in a second argument: a map containing a `:task-concurrency` key, like so.
```clojure
(def workers
  (stepwise/start-workers!
    {:make-pizza/make-dough
     (fn [_] (println "making dough..."))
     :make-pizza/make-sauce
     (fn [_] (println "making sauce..."))
     :make-pizza/put-ingredients-on-dough
     (fn [_] (println "putting on ingredients..."))
     :make-pizza/bake
     (fn [_]
       (println "baking...")
       :done)}
    {:task-concurrency
     {:make-pizza/make-dough               2
      :make-pizza/make-sauce               2
      :make-pizza/put-ingredients-on-dough 1
      :make-pizza/bake                     4}}))
```
Suppose your ensemble of robots can make 2 doughs, 2 sauces, put ingredients on 1 pizza, and bake 4 pizzas at a time. We've set the worker concurrency to reflect that capacity. Now, simultaneous executions will fill up those workers first, and any additional tasks wait in the queue until a worker becomes free.
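To see what a concurrency limit means in isolation, here is a local sketch with a fixed-size Java thread pool: at most n jobs run at once and the rest wait in the pool's queue. `run-with-concurrency` and `bake-job` are hypothetical names, and this is only an analogy; it is not meant to show how Stepwise implements `:task-concurrency`.

```clojure
(defn run-with-concurrency
  "Runs zero-argument jobs on a fixed pool of n threads, so at most n run at
  the same time while the rest queue. Returns results in job order."
  [n jobs]
  (let [^java.util.concurrent.ExecutorService pool
        (java.util.concurrent.Executors/newFixedThreadPool (int n))]
    (try
      (->> jobs
           ;; The Callable hint picks submit(Callable), so results are kept.
           (mapv (fn [^java.util.concurrent.Callable job] (.submit pool job)))
           (mapv deref))                 ; wait for every job to finish
      (finally
        (.shutdown pool)))))

;; Track how many hypothetical bake jobs run at once.
(def running (atom 0))
(def peak (atom 0))

(defn bake-job [i]
  (fn []
    (swap! peak max (swap! running inc))
    (Thread/sleep 50)
    (swap! running dec)
    [:baked i]))

(run-with-concurrency 4 (map bake-job (range 10)))
;; => a vector of 10 results, [:baked 0] through [:baked 9]

@peak
;; never more than 4
```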
Summary
In this tutorial, we set up a `make-pizza` state machine on AWS Step Functions using Stepwise. We demonstrated the core API of Stepwise: `ensure-state-machine`, `start-workers!`, and `start-execution!!`. We also ran through some features of Step Functions, namely error handling and parallel flow, as well as a couple of features of Stepwise: developing state machines and workers side by side, and worker concurrency.
If you want to find out more about Step Functions or Stepwise, check out the AWS Step Functions website or the Stepwise repository.