An Imperative View of Monads
Contents: Read this first | What is a semicolon? | The IO Monad | Operator Overloading and the Maybe Monad | And the Rest | That's It
Read this first
This explanation approaches Monads from the perspective of an imperative programming language such as C or C++. That's an awkward way to look at them, so if you haven't already, try reading Functors and Monads For People Who Have Read Too Many "Tutorials" first. It's a much clearer explanation than any other I've seen, and it uses the more natural viewpoint of function composition. Or try the Typeclassopedia, which covers Monads and several related abstractions.
If you're still a bit confused, or just want another way to look at things, read on.
What is a semicolon?
What is the difference between,
```c
#include <stdio.h>

int main( int argc, char** argv )
{
    double temperature;
    printf( "Enter temperature in Fahrenheit: " );
    scanf( "%lf", &temperature );
    printf( "Converted to Celsius: %.1lf\n", (temperature-32)*5/9 );
    return 0;
}
```
And this one with the ; replaced with + operators?
```c
#include <stdio.h>

int main( int argc, char** argv )
{
    double temperature;
    printf( "Enter temperature in Fahrenheit: " ) +
    scanf( "%lf", &temperature ) +
    printf( "Converted to Celsius: %.1lf\n", (temperature-32)*5/9 );
    return 0;
}
```
Nothing? Just a warning that the result of a computation was unused? Or does the program behave unpredictably, displaying a nonsense result before asking for input? Or does it run in the right order but print the wrong result?
It all depends on which compiler and options you choose.
The difference is sequence points. The C language definition says that in a; b, a (including all of its side effects) must be complete before b is evaluated. For a + b, however, there is no sequence point, so the compiler may evaluate the operands in either order. When a variable such as temperature is written by one operand (inside scanf) and read by another, the result depends on that order: the conversion may use temperature before scanf has stored anything in it. And when a write and a read of the same object are unsequenced, the behavior is undefined and anything can happen.
Now, if we change that code to use the , operator:
```c
#include <stdio.h>

int main( int argc, char** argv )
{
    double temperature;
    printf( "Enter temperature in Fahrenheit: " ) ,
    scanf( "%lf", &temperature ) ,
    printf( "Converted to Celsius: %.1lf\n", (temperature-32)*5/9 );
    return 0;
}
```
Now everything works again, with every compiler and every set of options. The , operator is defined to evaluate its left-hand operand, discard the result, and then, after a sequence point, evaluate its right-hand operand. So while ; has some extra meaning syntax-wise, we can think of it as carrying the same semantics as the , operator.
The IO Monad
Writing the same code in Haskell,
```haskell
import Numeric (showFFloat)

main = do
    putStrLn "Enter temperature in Fahrenheit: "
    temperature <- readLn
    putStrLn $ "Converted to Celsius: " ++ showFFloat (Just 1) ((temperature - 32)*(5/9)) []
```
Replacing the indentation-based layout of the do notation with explicit braces and semicolons,
```haskell
import Numeric (showFFloat)

main = do { putStrLn "Enter temperature in Fahrenheit: ";
            temperature <- readLn;
            putStrLn $ "Converted to Celsius: " ++ showFFloat (Just 1) ((temperature - 32)*(5/9)) [] }
```
And we're left with something that looks one-to-one with the C code.
Taking that a step further and removing the do notation completely,
```haskell
import Numeric (showFFloat)

main = putStrLn "Enter temperature in Fahrenheit: " >>= \_ ->
       ( readLn >>= \temperature ->
         ( putStrLn $ "Converted to Celsius: " ++ showFFloat (Just 1) ((temperature - 32)*(5/9)) [] ) )
```
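This exposes the underlying >>= operator from the Monad class. That is what the ; in do notation stands for, and for IO it has the same meaning as a ; in C/C++. (Strictly, a statement whose result is discarded, like the first putStrLn, desugars to >>, but a >> b is just a >>= \_ -> b.)
If we use unsafePerformIO to escape the IO monad, removing the "X then Y" meaning,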
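```haskell
import Numeric (showFFloat)
import System.IO.Unsafe (unsafePerformIO)

main :: IO ()
main = let puts str = unsafePerformIO $ putStrLn str >> return (0 :: Int)
           temperature = unsafePerformIO readLn :: Float
       -- seq forces the sum, but nothing says which operand of + runs first
       in (puts "Enter temperature in Fahrenheit: "
           + puts ("Converted to Celsius: " ++ showFFloat (Just 1) ((temperature - 32)*(5/9)) []))
          `seq` return ()
```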
Then, just like in C, the code behaves erratically: nothing enforces an order, so each effect happens whenever evaluation happens to demand its result.
The big idea here is that in C, the natural state of things is to execute in order. Sequence points are implicit in all sorts of constructs: between statements, between the operands of && and || (and of the , operator), after a function's arguments are evaluated and just before it is called, and so on. The requirement is relaxed only in a few places where it's more useful to omit it (or, more likely, by historical accident, where existing compilers weren't consistent).
Haskell works the opposite way: by default, an expression is evaluated only when its value is needed. The IO monad marks the places where sequential execution is required.
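To make the contrast concrete, here is a small sketch (lazyLength and orderedLog are made-up names for illustration). In pure code, a list element that is never demanded is never evaluated, even though evaluating it would crash; in IO, the actions run in exactly the order they are written, which the example observes by appending to an IORef from base's Data.IORef.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Pure code is evaluated only on demand: length walks the list's spine but
-- never looks at the elements, so the 'undefined' is never evaluated.
lazyLength :: Int
lazyLength = length [1, undefined, 3 :: Int]

-- IO actions run in exactly the order they are written; each step appends
-- a label to a log held in an IORef so that order can be observed.
orderedLog :: IO [String]
orderedLog = do
  ref <- newIORef []
  modifyIORef ref (++ ["first"])
  modifyIORef ref (++ ["second"])
  modifyIORef ref (++ ["third"])
  readIORef ref
```

In GHCi, lazyLength evaluates to 3 and orderedLog returns ["first","second","third"].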
Operator Overloading and the Maybe Monad
Now that we're thinking of the semicolon as an operator, what can we do with that?
Consider an INI file parsing library. A file consists of one or more sections, each section consists of one or more variables, and a variable's value can be interpreted as various data types. Each lookup can fail: the section may be missing, the variable may be missing, or the value may not be compatible with the requested type. To extract the user.name variable, we might write some code like,
```c
char* config_get_username( config_t *config )
{
    config_section_t *section = config_get_section( config, "user" );
    if( section == NULL )
        return NULL;

    config_variable_t *variable = config_get_variable( section, "name" );
    if( variable == NULL )
        return NULL;

    return config_get_string( variable );
}
```
This has a very distinct pattern: "Do X; if X returned NULL, return NULL." If we write the same code in Haskell, though, that pattern vanishes,
```haskell
config_get_username config = do
    section <- config_get_section config "user"
    variable <- config_get_variable section "name"
    -- config_get_string can fail too, so its Maybe result is returned as-is
    config_get_string variable
```
The Maybe monad overloads the ; operator. In simplified form, its instance reads,
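```haskell
-- simplified; base already defines this instance (alongside Functor and Applicative)
instance Monad Maybe where
    Nothing  >>= _ = Nothing
    (Just a) >>= b = b a
```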
That is, if the left-hand side is Nothing (NULL), the result is Nothing and b never runs; otherwise, the value a inside the Just is passed to b. Every ; turns into if( result == NULL ) return NULL;. Giving a function a type such as config_get_section :: Config -> String -> Maybe ConfigSection marks it as taking part in this automatic boilerplate.
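The whole lookup can be tried out against a hypothetical, minimal stand-in for the INI library: a config is an association list of sections, a section is an association list of values, and Prelude's lookup (which already returns a Maybe) does the searching. None of these types or functions come from a real library. The second version writes the Maybe monad's >>= out by hand, so each case ... of Nothing -> Nothing plays the role of if( result == NULL ) return NULL;.

```haskell
-- Hypothetical, minimal stand-ins for the INI library's types: a config is an
-- association list of sections, and a section an association list of values.
type Config = [(String, ConfigSection)]
type ConfigSection = [(String, ConfigValue)]

-- A value can hold different data types, so asking for the wrong one fails.
data ConfigValue = VString String | VInt Int
  deriving Show

-- Prelude's lookup already returns Maybe, so each accessor can fail.
config_get_section :: Config -> String -> Maybe ConfigSection
config_get_section config name = lookup name config

config_get_variable :: ConfigSection -> String -> Maybe ConfigValue
config_get_variable section name = lookup name section

config_get_string :: ConfigValue -> Maybe String
config_get_string (VString s) = Just s
config_get_string _           = Nothing

-- With the Maybe monad: each ; hides the "if NULL, return NULL" check.
config_get_username :: Config -> Maybe String
config_get_username config = do
  section  <- config_get_section config "user"
  variable <- config_get_variable section "name"
  config_get_string variable

-- The same function with Maybe's >>= written out by hand.
config_get_username_explicit :: Config -> Maybe String
config_get_username_explicit config =
  case config_get_section config "user" of
    Nothing      -> Nothing
    Just section ->
      case config_get_variable section "name" of
        Nothing       -> Nothing
        Just variable -> config_get_string variable
```

For example, config_get_username [("user", [("name", VString "alice")])] evaluates to Just "alice", while a missing section, a missing name, or a name stored as a VInt all give Nothing.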
And the Rest
The List Monad
What other boilerplate code can get absorbed into the ;?
```cpp
void send_newsletter( const vector<Client>& clients, const string& message )
{
    for( auto client = clients.cbegin(); client != clients.cend(); client++ ) {
        for( auto contact = client->poc.cbegin(); contact != client->poc.cend(); contact++ ) {
            send_mail( contact->address, message );
        }
    }
}
```
With the help of the list monad, this becomes,
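```haskell
-- one send_mail action per contact; sequence_ runs them in order
send_newsletter clients message = sequence_ $ do
    client <- clients
    contact <- poc client
    return (send_mail (address contact) message)
```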
All of the looping code ends up in the definition of instance Monad [], which says that a >>= b means "for each value in a, run b, and collect all of the results".
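Here is a runnable sketch of the same idea. To keep it pure and testable, it assumes hypothetical Client and Contact records (with the poc and address fields used above) and, instead of calling a real send_mail, returns the (address, message) pairs that would be sent. The second version replaces do notation with concatMap, which is what the list instance's >>= amounts to (xs >>= f is concatMap f xs).

```haskell
-- Hypothetical records mirroring the C++ Client and contact structs.
newtype Contact = Contact { address :: String }
newtype Client  = Client  { poc :: [Contact] }

-- The list monad supplies both loops: "client <- clients" means "for each
-- client", and "contact <- poc client" means "for each of its contacts".
-- Instead of sending mail, it returns the (address, message) pairs to send.
newsletter :: [Client] -> String -> [(String, String)]
newsletter clients message = do
  client  <- clients
  contact <- poc client
  return (address contact, message)

-- The same function without do notation: for lists, xs >>= f is concatMap f xs.
newsletterExplicit :: [Client] -> String -> [(String, String)]
newsletterExplicit clients message =
  concatMap (\client -> concatMap (\contact -> [(address contact, message)]) (poc client)) clients
```

A client with no contacts simply contributes nothing, just as the inner C++ loop would run zero times.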
The State Monad
Those libraries where every function takes a context object?
```c
signature_t* sign_message( key_t *private_key, uint8_t *message, size_t length )
{
    signature_ctx *ctx = signature_ctx_new();
    set_private_key( ctx, private_key );
    set_signature_mode( ctx, SIGNATURE_PSS );
    set_signature_digest( ctx, DIGEST_SHA256 );
    return sign( ctx, message, length );
}
```
With the State monad, the context object is hidden away,
```haskell
sign_message private_key message = fst $ flip runState signature_ctx_new $ do
    set_private_key private_key
    set_signature_mode SIGNATURE_PSS
    set_signature_digest DIGEST_SHA256
    sign message
```
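To show what hiding the context means, here is a self-contained sketch with a minimal State monad written from scratch instead of importing mtl's Control.Monad.State (which provides a production version under the same names runState, get and modify). The signing context, the setters, and sign are hypothetical stand-ins for the C library above, with plain Strings for the key and message; sign only describes the settings it sees and computes no real signature. The interesting part is >>=: it runs the left-hand action on the current context and hands the updated context to the rest, which is exactly the "pass ctx to every call" boilerplate.

```haskell
-- A minimal State monad: an action is a function from the hidden context s
-- to a result and an updated context.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  (State f) <*> (State g) = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  -- Run the first action, then pass its result and the updated context on.
  (State g) >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

modify :: (s -> s) -> State s ()
modify f = State $ \s -> ((), f s)

-- Hypothetical signing context; the fields mirror the C setters.
data Mode = SIGNATURE_PSS deriving (Show, Eq)
data Digest = DIGEST_SHA256 deriving (Show, Eq)

data SignatureCtx = SignatureCtx
  { ctxKey    :: Maybe String
  , ctxMode   :: Maybe Mode
  , ctxDigest :: Maybe Digest
  } deriving Show

signature_ctx_new :: SignatureCtx
signature_ctx_new = SignatureCtx Nothing Nothing Nothing

set_private_key :: String -> State SignatureCtx ()
set_private_key k = modify $ \c -> c { ctxKey = Just k }

set_signature_mode :: Mode -> State SignatureCtx ()
set_signature_mode m = modify $ \c -> c { ctxMode = Just m }

set_signature_digest :: Digest -> State SignatureCtx ()
set_signature_digest d = modify $ \c -> c { ctxDigest = Just d }

-- Stand-in for a real signature: it only reports the settings it received.
sign :: String -> State SignatureCtx String
sign message = do
  ctx <- get
  return (show (ctxMode ctx, ctxDigest ctx, ctxKey ctx) ++ " over " ++ message)

sign_message :: String -> String -> String
sign_message private_key message = fst $ flip runState signature_ctx_new $ do
  set_private_key private_key
  set_signature_mode SIGNATURE_PSS
  set_signature_digest DIGEST_SHA256
  sign message
```

Here sign_message "k" "hello" returns (Just SIGNATURE_PSS,Just DIGEST_SHA256,Just "k") over hello: every setter's change reached sign even though sign_message never mentions the context after creating it.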
That's It
Where in C we write a; b or res = a; b(res), Haskell writes a >> b or a >>= b. The operators >> and >>= come from the Monad instance for the type of a and b, and they define how the two statements are joined. Understanding any particular monad comes down to asking what >> (and >>=) means for it, i.e., what extra code is being inserted by the ; in a; b.
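One way to see this is to write a small piece of "imperative" code once, for any monad, and run it under different instances. In the sketch below (pairUp, pairUpDesugared and thenViaBind are illustrative names, not library functions), the do block and its hand-desugared >>= form are the same code, yet Maybe makes each ; abort on Nothing and the list monad makes it loop. thenViaBind spells out the Monad class's default definition of >>, namely a >> b = a >>= \_ -> b.

```haskell
-- Written once for any monad m; what each ; (each >>=) does is decided by
-- m's Monad instance.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb = do
  a <- ma
  b <- mb
  return (a, b)

-- The same code with the do notation removed by hand.
pairUpDesugared :: Monad m => m a -> m b -> m (a, b)
pairUpDesugared ma mb = ma >>= \a -> mb >>= \b -> return (a, b)

-- The default meaning of >>: run the first statement, ignore its result,
-- then run the second.
thenViaBind :: Monad m => m a -> m b -> m b
thenViaBind ma mb = ma >>= \_ -> mb
```

With these, pairUp (Just 1) (Just 'x') is Just (1,'x'), pairUp Nothing (Just 'x') is Nothing, and pairUp [1,2] "ab" is [(1,'a'),(1,'b'),(2,'a'),(2,'b')]. Likewise thenViaBind [1,2] "ab" is "abab", because for lists the ; repeats the second statement once per value of the first.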
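The IO Monad means evaluate sequentially, as in the imperative world. The Maybe Monad means evaluate with abort on error, as inside a try/catch block. The list monad means run the rest once for every value, like nested loops, and the State monad means pass a hidden context to every call. And so on.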