YAML Tutorial

This tutorial introduces YAML (YAML Ain't Markup Language) and demonstrates the yamlrw OCaml library through interactive examples. We'll start with the basics and work up to advanced features like anchors, aliases, and streaming.

What is YAML?

YAML is a human-readable data serialization format. It's commonly used for configuration files, data exchange, and anywhere you need structured data that humans will read and edit.

YAML is designed to be easier for people to read and write than JSON or XML.

YAML vs JSON

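YAML 1.2 is designed as a superset of JSON, so nearly any valid JSON document is also valid YAML. YAML adds features that JSON lacks, such as comments, anchors and aliases, multi-line strings, and multiple documents per stream. Its block style also drops most of the braces and quotes: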

JSON:                          YAML:
{                              name: Alice
  "name": "Alice",             age: 30
  "age": 30,                   active: true
  "active": true
}

The YAML version is cleaner for humans to read and write.

Setup

First, open the Yamlrw module so that its functions can be used without the Yamlrw. prefix:

open Yamlrw;;

Basic Parsing

The simplest way to parse YAML is with Yamlrw.of_string:

let simple = of_string "hello";;

Unquoted scalars are resolved to a matching type, so numbers, booleans, and null are recognized automatically:

of_string "42";;
of_string "3.14";;
of_string "true";;
of_string "null";;

Note that integers are stored as floats in the JSON-compatible Yamlrw.value type, so 42 becomes `Float 42. This mirrors JSON, which has only a single number type.
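Here is a small stand-alone sketch of what storing integers as floats means in practice. It defines its own copy of a JSON-style value type. The constructor names (`Null, `Bool, `Float, `String, `A, `O) are assumed from this tutorial's examples and are not copied from yamlrw. It also defines a hypothetical int_of_value helper that accepts a number only when it is integral and no larger than 2^53 in magnitude, the range in which a float represents every integer exactly.

```ocaml
(* Local stand-in for a JSON-compatible value type. The constructor names
   are assumed from this tutorial's examples, not taken from yamlrw. *)
type value =
  [ `Null
  | `Bool of bool
  | `Float of float
  | `String of string
  | `A of value list
  | `O of (string * value) list ]

(* Floats represent every integer exactly only up to 2^53 in magnitude. *)
let max_exact_int = 9007199254740992. (* 2^53 *)

(* Hypothetical helper: convert a numeric value to an int (64-bit platforms),
   rejecting fractional numbers instead of silently truncating them. *)
let int_of_value (v : value) : int option =
  match v with
  | `Float f when Float.is_integer f && Float.abs f <= max_exact_int ->
    Some (int_of_float f)
  | _ -> None
```

With this helper, `Float 42. converts to Some 42, while `Float 3.14 and any non-number give None.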

Boolean Values

Besides true and false, the YAML 1.1 spellings yes/no and on/off are also recognized as booleans:

of_string "yes";;
of_string "no";;
of_string "on";;
of_string "off";;
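The yes/no and on/off spellings come from YAML 1.1. YAML 1.2's core schema treats only true and false (in a few capitalizations) as booleans. The sketch below implements the YAML 1.1 rule for plain scalars. Whether yamlrw accepts exactly this set, including the single-letter y and n, is an assumption, and bool_of_plain_scalar is a hypothetical name.

```ocaml
(* Boolean spellings from the YAML 1.1 "bool" type, applied to plain
   (unquoted) scalars only. Assumption: yamlrw's accepted set may differ. *)
let bool_of_plain_scalar (s : string) : bool option =
  match s with
  | "y" | "Y" | "yes" | "Yes" | "YES"
  | "true" | "True" | "TRUE"
  | "on" | "On" | "ON" -> Some true
  | "n" | "N" | "no" | "No" | "NO"
  | "false" | "False" | "FALSE"
  | "off" | "Off" | "OFF" -> Some false
  | _ -> None (* mixed case such as "yEs" is not a YAML 1.1 boolean *)
```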

Strings

Strings can be plain, single-quoted, or double-quoted:

of_string "plain text";;
of_string "'single quoted'";;
of_string {|"double quoted"|};;

Quoting is useful when your string looks like another type:

of_string "'123'";;
of_string "'true'";;
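Quoting works because type resolution applies only to plain (unquoted) scalars. The sketch below shows that rule using its own small value type. The style names and the resolve function are hypothetical. Number detection via float_of_string_opt is looser than YAML's real grammar, since it also accepts forms such as 1_000 or nan.

```ocaml
(* Hypothetical types for this sketch only. *)
type value = [ `Null | `Bool of bool | `Float of float | `String of string ]
type style = [ `Plain | `Single_quoted | `Double_quoted ]

(* Resolve a scalar's text to a typed value. Quoted scalars are always
   strings; only plain scalars are matched against null, bool, and number. *)
let resolve (style : style) (text : string) : value =
  match style with
  | `Single_quoted | `Double_quoted -> `String text
  | `Plain ->
    (match text with
     | "" | "~" | "null" -> `Null
     | "true" -> `Bool true
     | "false" -> `Bool false
     | _ ->
       (* Simplification: OCaml's float parser stands in for YAML's rules. *)
       (match float_of_string_opt text with
        | Some f -> `Float f
        | None -> `String text))
```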

Mappings (Objects)

YAML mappings associate keys with values. In the JSON-compatible representation, these become association lists:

of_string "name: Alice\nage: 30";;

Keys and values are separated by a colon and space. Each key-value pair goes on its own line.

Nested Mappings

Indentation creates nested structures:

let nested = of_string {|
database:
  host: localhost
  port: 5432
  credentials:
    user: admin
    pass: secret
|};;

Accessing Values

Use the Yamlrw.Util module to navigate and extract values:

let db = Util.get "database" nested;;
Util.get_string (Util.get "host" db);;
Util.get_int (Util.get "port" db);;

For nested access, use Yamlrw.Util.get_path:

Util.get_path ["database"; "credentials"; "user"] nested;;
Util.get_path_exn ["database"; "port"] nested;;
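Path lookup amounts to repeated key lookup in nested association lists. Below is a minimal sketch over a local value type that is assumed to mirror the JSON-compatible representation. This get_path is a hypothetical reimplementation, not yamlrw's code. It returns None as soon as a key is missing or a non-mapping is reached.

```ocaml
(* Local stand-in for the JSON-compatible value type (assumed shape). *)
type value =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of value list | `O of (string * value) list ]

(* Follow the keys one mapping at a time; stop with None on a missing key
   or when a key is applied to something that is not a mapping. *)
let rec get_path (path : string list) (v : value) : value option =
  match path, v with
  | [], _ -> Some v
  | key :: rest, `O pairs -> Option.bind (List.assoc_opt key pairs) (get_path rest)
  | _ :: _, _ -> None

(* The nested example above, written out by hand. *)
let nested : value =
  `O [ ("database",
        `O [ ("host", `String "localhost");
             ("port", `Float 5432.);
             ("credentials",
              `O [ ("user", `String "admin"); ("pass", `String "secret") ]) ]) ]
```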

Sequences (Arrays)

In block style, a sequence is written one item per line, with each item introduced by a dash and a space:

of_string {|
- apple
- banana
- cherry
|};;

Or using flow style (like JSON arrays):

of_string "[1, 2, 3]";;

Sequences of Mappings

A common pattern is a list of objects:

let users = of_string {|
- name: Alice
  role: admin
- name: Bob
  role: user
|};;

Accessing Sequence Elements

Util.nth 0 users;;

match Util.nth 0 users with
| Some user -> Util.get_string (Util.get "name" user)
| None -> "not found";;
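Indexing into a sequence works the same way, but on the `A constructor. Below is a sketch with a hypothetical nth over a local value type. The argument order mirrors Util.nth above. Returning None for negative or out-of-range indices is a choice made for this sketch.

```ocaml
(* Local stand-in for the JSON-compatible value type (assumed shape). *)
type value =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of value list | `O of (string * value) list ]

(* Hypothetical nth: the i-th element of a sequence, if there is one. *)
let nth (i : int) (v : value) : value option =
  match v with
  | `A items when i >= 0 -> List.nth_opt items i
  | _ -> None

(* The users example above, written out by hand. *)
let users : value =
  `A [ `O [ ("name", `String "Alice"); ("role", `String "admin") ];
       `O [ ("name", `String "Bob"); ("role", `String "user") ] ]
```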

Serialization

Convert OCaml values back to YAML strings with Yamlrw.to_string:

let data = `O [
  ("name", `String "Bob");
  ("active", `Bool true);
  ("score", `Float 95.5);
];;
print_string (to_string data);;
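Serializing a flat mapping of scalars to block style comes down to writing one "key: value" line per pair. The sketch below is a hypothetical, deliberately simplified emitter and not yamlrw's to_string. It never quotes strings, so it is only safe for strings that would read back as strings, and it does not handle nan or infinity.

```ocaml
(* Scalars only; nested mappings and sequences are out of scope here. *)
type scalar = [ `Null | `Bool of bool | `Float of float | `String of string ]

(* Use %.15g when it reads back as the same float, otherwise %.17g,
   which always does. Integral floats print without a decimal point. *)
let float_repr (f : float) : string =
  let s = Printf.sprintf "%.15g" f in
  if float_of_string s = f then s else Printf.sprintf "%.17g" f

let scalar_repr : scalar -> string = function
  | `Null -> "null"
  | `Bool b -> string_of_bool b
  | `Float f -> float_repr f
  | `String s -> s (* no quoting: unsafe for strings like "true" or "42" *)

(* One "key: value" line per pair, in list order. *)
let emit_flat_mapping (pairs : (string * scalar) list) : string =
  pairs
  |> List.map (fun (key, v) -> key ^ ": " ^ scalar_repr v ^ "\n")
  |> String.concat ""
```

For the data value above, this sketch produces "name: Bob", "active: true", and "score: 95.5" on three lines. yamlrw's own output may differ in details such as quoting.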

Constructing Values

Use Yamlrw.Util constructors for cleaner code:

let config = Util.obj [
  "server", Util.obj [
    "host", Util.string "0.0.0.0";
    "port", Util.int 8080;
  ];
  "debug", Util.bool true;
  "tags", Util.strings ["api"; "v2"];
];;
print_string (to_string config);;

Controlling Output Style

You can control the output format with style options:

print_string (to_string ~layout_style:`Flow config);;

Scalar styles control how strings are written:

print_string (to_string ~scalar_style:`Double_quoted (Util.string "hello"));;
print_string (to_string ~scalar_style:`Single_quoted (Util.string "hello"));;
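The two quoted styles escape differently. Single-quoted scalars have just one escape: a quote character is written as ''. Double-quoted scalars use backslash escapes instead. Below is a sketch of both, with hypothetical function names. The double-quoted version covers only the common escapes. The single-quoted version ignores line breaks, which YAML folds inside quoted scalars.

```ocaml
(* Single-quoted style: the only escape is doubling the quote character. *)
let single_quote (s : string) : string =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '\'';
  String.iter
    (fun c -> if c = '\'' then Buffer.add_string buf "''" else Buffer.add_char buf c)
    s;
  Buffer.add_char buf '\'';
  Buffer.contents buf

(* Double-quoted style: backslash escapes. Only a few common characters are
   handled here; other control characters would also need escaping. *)
let double_quote (s : string) : string =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  String.iter
    (function
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      | '\t' -> Buffer.add_string buf "\\t"
      | c -> Buffer.add_char buf c)
    s;
  Buffer.add_char buf '"';
  Buffer.contents buf
```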

Full YAML Representation

The Yamlrw.value type is convenient but loses some YAML-specific information. For full fidelity, use the Yamlrw.yaml type:

let full = yaml_of_string ~resolve_aliases:false "hello";;

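The Yamlrw.yaml type preserves YAML-specific details that Yamlrw.value discards. These include each scalar's presentation style and, when ~resolve_aliases:false is passed, the aliases themselves.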

Scalars with Metadata

let s = yaml_of_string ~resolve_aliases:false "'quoted string'";;

match s with
| `Scalar sc -> Scalar.value sc, Scalar.style sc
| _ -> "", `Any;;

Anchors and Aliases

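YAML supports node reuse through anchors (&name) and aliases (*name). An anchor labels a node and an alias refers back to it, which avoids repeating shared settings. In the example below, << is the merge key (a YAML 1.1 convention). It copies the entries of the aliased mapping into the current one: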

defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults
  host: prod.example.com

staging:
  <<: *defaults
  host: stage.example.com

Parsing with Aliases

By default, Yamlrw.of_string resolves aliases:

let yaml_with_alias = {|
base: &base
  x: 1
  y: 2
derived:
  <<: *base
  z: 3
|};;
of_string yaml_with_alias;;
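The merge key copies entries from the aliased mapping into the mapping that contains <<, and keys written explicitly take precedence. Below is a hypothetical apply_merge over association lists that sketches this rule. It handles only a single merged mapping, although YAML also allows a sequence of them. It places inherited keys first, which is an arbitrary choice because mapping order carries no meaning in YAML.

```ocaml
(* Local stand-in for the JSON-compatible value type (assumed shape). *)
type value =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of value list | `O of (string * value) list ]

(* Apply a "<<" merge key inside one mapping: keys written explicitly in the
   mapping win, and the merged mapping only supplies keys that are missing. *)
let apply_merge (pairs : (string * value) list) : (string * value) list =
  match List.assoc_opt "<<" pairs with
  | Some (`O base) ->
    let own = List.filter (fun (k, _) -> k <> "<<") pairs in
    let inherited = List.filter (fun (k, _) -> not (List.mem_assoc k own)) base in
    inherited @ own
  | _ -> pairs (* no merge key, or a form this sketch does not handle *)
```

Applied to the derived mapping above, this yields x: 1, y: 2, z: 3.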

Preserving Aliases

To preserve the alias structure, use Yamlrw.yaml_of_string with ~resolve_aliases:false:

let y = yaml_of_string ~resolve_aliases:false {|
item: &ref
  name: shared
copy: *ref
|};;
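Resolving aliases means replacing each *name with the node most recently anchored as &name. The sketch below uses its own hypothetical tree type with explicit Anchored and Alias nodes; it does not reproduce yamlrw's yaml type. It resolves in document order and fails on an alias whose anchor has not appeared yet.

```ocaml
(* Hypothetical tree type for this sketch only. *)
type yaml =
  | Scalar of string
  | Seq of yaml list
  | Map of (string * yaml) list
  | Anchored of string * yaml (* &name node *)
  | Alias of string (* *name *)

(* Walk the tree left to right, recording anchored nodes as they appear and
   substituting them for later aliases. Reusing an anchor name shadows the
   earlier node. *)
let resolve_aliases (doc : yaml) : yaml =
  let anchors : (string, yaml) Hashtbl.t = Hashtbl.create 8 in
  let rec go = function
    | Scalar _ as s -> s
    | Seq items -> Seq (List.map go items)
    | Map pairs -> Map (List.map (fun (k, v) -> (k, go v)) pairs)
    | Anchored (name, node) ->
      let resolved = go node in
      Hashtbl.replace anchors name resolved;
      resolved
    | Alias name ->
      (match Hashtbl.find_opt anchors name with
       | Some node -> node
       | None -> failwith ("undefined alias: " ^ name))
  in
  go doc
```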

Multi-line Strings

YAML has special syntax for multi-line strings:

Literal Block Scalar

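The | (literal) indicator keeps every line break in the content. By default, trailing blank lines are dropped and a single final newline is kept: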

of_string {|
description: |
  This is a
  multi-line string.
|};;

Folded Block Scalar

The > (folded) indicator turns each single line break between lines of text into a space, and each blank line becomes a newline:

of_string {|
description: >
  This is a single
  line when folded.
|};;
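The difference between the two indicators can be sketched as functions over a block scalar's lines once their indentation has been removed. This assumes the default clip chomping, which drops trailing blank lines and keeps one final newline. It also ignores YAML's rule that more-indented lines are not folded. The names literal and folded are hypothetical.

```ocaml
(* Drop leading blank lines from a list of lines. *)
let rec drop_blank_prefix = function
  | "" :: rest -> drop_blank_prefix rest
  | lines -> lines

(* Clip chomping: remove trailing blank lines. *)
let clip (lines : string list) : string list =
  List.rev (drop_blank_prefix (List.rev lines))

(* "|": keep every line break and end with exactly one newline. *)
let literal (lines : string list) : string =
  match clip lines with
  | [] -> ""
  | kept -> String.concat "\n" kept ^ "\n"

(* ">": a break between two non-blank lines becomes a space; each blank
   line contributes a newline instead. *)
let folded (lines : string list) : string =
  match clip lines with
  | [] -> ""
  | kept ->
    let buf = Buffer.create 64 in
    let rec go prev_text = function
      | [] -> ()
      | "" :: rest -> Buffer.add_char buf '\n'; go false rest
      | line :: rest ->
        if prev_text then Buffer.add_char buf ' ';
        Buffer.add_string buf line;
        go true rest
    in
    go false kept;
    Buffer.add_char buf '\n';
    Buffer.contents buf
```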

Multiple Documents

A YAML stream can contain multiple documents separated by ---:

let docs = documents_of_string {|
---
name: first
---
name: second
...
|};;
List.length docs;;

The --- marker starts a document, and ... optionally ends it.
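To make the role of the markers concrete, here is a naive splitter that treats any line consisting of exactly --- or ... as a document boundary. It is a hypothetical illustration only. A real parser also handles content on the marker line itself (such as --- value) and directives such as %YAML. This version also drops empty documents, which YAML would keep as null documents.

```ocaml
(* Naive, hypothetical splitter: a line that is exactly "---" or "..." ends
   the current document. Blank-only documents are discarded. *)
let split_documents (stream : string) : string list =
  let flush current acc =
    let body = String.concat "\n" (List.rev current) in
    if String.trim body = "" then acc else body :: acc
  in
  let rec go current acc = function
    | [] -> List.rev (flush current acc)
    | ("---" | "...") :: rest -> go [] (flush current acc) rest
    | line :: rest -> go (line :: current) acc rest
  in
  go [] [] (String.split_on_char '\n' stream)
```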

Working with Documents

Each document has metadata and a root value:

List.map (fun d -> Document.root d) docs;;

Serializing Multiple Documents

let doc1 = Document.make (Some (of_json (Util.obj ["x", Util.int 1])));;
let doc2 = Document.make (Some (of_json (Util.obj ["x", Util.int 2])));;
print_string (documents_to_string [doc1; doc2]);;

Streaming API

For large files or fine-grained control, use the streaming API:

let parser = Stream.parser "key: value";;

Iterate over events:

Stream.iter
  (fun event _ _ -> Format.printf "%a@." Event.pp event)
  parser;;
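The event stream for a document is a flat, bracketed sequence, so a tree can be rebuilt from it with a small recursive-descent consumer. In this sketch the event type is hypothetical, modeled loosely on the emitter calls used in this tutorial rather than on yamlrw's Event module. Mapping keys are restricted to scalars, and only single-document streams are accepted.

```ocaml
(* Hypothetical event and tree types for this sketch only. *)
type event =
  | Stream_start | Stream_end
  | Document_start | Document_end
  | Mapping_start | Mapping_end
  | Sequence_start | Sequence_end
  | Scalar of string

type node = Str of string | Seq of node list | Map of (string * node) list

exception Unexpected of string

(* Consume one node from the front of the event list and return it together
   with the remaining events. *)
let rec parse_node (events : event list) : node * event list =
  match events with
  | Scalar s :: rest -> (Str s, rest)
  | Sequence_start :: rest -> parse_items [] rest
  | Mapping_start :: rest -> parse_pairs [] rest
  | _ -> raise (Unexpected "expected a node")

and parse_items acc events =
  match events with
  | Sequence_end :: rest -> (Seq (List.rev acc), rest)
  | _ ->
    let item, rest = parse_node events in
    parse_items (item :: acc) rest

and parse_pairs acc events =
  match events with
  | Mapping_end :: rest -> (Map (List.rev acc), rest)
  | Scalar key :: rest ->
    let value, rest = parse_node rest in
    parse_pairs ((key, value) :: acc) rest
  | _ -> raise (Unexpected "expected a scalar key or mapping end")

(* Accept exactly one document wrapped in stream and document events. *)
let root_of_events (events : event list) : node =
  match events with
  | Stream_start :: Document_start :: rest ->
    (match parse_node rest with
     | root, [ Document_end; Stream_end ] -> root
     | _ -> raise (Unexpected "expected document end and stream end"))
  | _ -> raise (Unexpected "expected stream start and document start")
```

For "key: value", the events would be stream start, document start, mapping start, the two scalars, and the matching ends, which this sketch rebuilds as Map [("key", Str "value")].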

Building YAML with Events

You can also emit YAML by sending events:

let emitter = Stream.emitter ();;
Stream.stream_start emitter `Utf8;;
Stream.document_start emitter ();;
Stream.mapping_start emitter ();;
Stream.scalar emitter "greeting";;
Stream.scalar emitter "Hello, World!";;
Stream.mapping_end emitter;;
Stream.document_end emitter ();;
Stream.stream_end emitter;;
print_string (Stream.contents emitter);;

Error Handling

Parse errors raise Yamlrw.Yamlrw_error:

try
  ignore (of_string "key: [unclosed");
  "ok"
with Yamlrw_error e -> Error.to_string e;;

Type Errors

The Yamlrw.Util module raises Yamlrw.Util.Type_error for type mismatches:

try
  ignore (Util.get_string (`Float 42.));
  "ok"
with Util.Type_error (expected, actual) ->
  Printf.sprintf "expected %s, got %s" expected (Value.type_name actual);;
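The pattern behind Util.Type_error is an exception that carries the expected type name and the offending value. It can be sketched on a local value type. The exception, get_string, type_name, and describe below are hypothetical stand-ins. In particular, the names returned by yamlrw's Value.type_name may differ from the ones used here.

```ocaml
(* Local stand-in for the JSON-compatible value type (assumed shape). *)
type value =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of value list | `O of (string * value) list ]

(* Expected type name, and the value that was actually found. *)
exception Type_error of string * value

(* Hypothetical type names; yamlrw's may differ. *)
let type_name : value -> string = function
  | `Null -> "null"
  | `Bool _ -> "bool"
  | `Float _ -> "float"
  | `String _ -> "string"
  | `A _ -> "array"
  | `O _ -> "object"

let get_string (v : value) : string =
  match v with
  | `String s -> s
  | other -> raise (Type_error ("string", other))

(* Turn the exception into a readable message, as in the example above. *)
let describe (v : value) : string =
  try get_string v with
  | Type_error (expected, actual) ->
    Printf.sprintf "expected %s, got %s" expected (type_name actual)
```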

Common Patterns

Configuration Files

A typical configuration file pattern:

let config_yaml = {|
app:
  name: myapp
  version: 1.0.0
server:
  host: 0.0.0.0
  port: 8080
  ssl: true
database:
  url: postgres://localhost/mydb
  pool_size: 10
|};;
let config = of_string config_yaml;;
let server = Util.get "server" config;;
let host = Util.to_string ~default:"localhost" (Util.get "host" server);;
let port = Util.to_int ~default:80 (Util.get "port" server);;

Working with Lists

Processing lists of items:

let items_yaml = {|
items:
  - id: 1
    name: Widget
    price: 9.99
  - id: 2
    name: Gadget
    price: 19.99
  - id: 3
    name: Gizmo
    price: 29.99
|};;
let items = Util.get_list (Util.get "items" (of_string items_yaml));;
let names = List.map (fun item ->
  Util.get_string (Util.get "name" item)
) items;;
let total = List.fold_left (fun acc item ->
  acc +. Util.get_float (Util.get "price" item)
) 0. items;;

Transforming Data

Modifying YAML structures:

let original = of_string "name: Alice\nstatus: active";;
let updated = Util.update "status" (Util.string "inactive") original;;
let with_timestamp = Util.update "updated_at" (Util.string "2024-01-01") updated;;
print_string (to_string with_timestamp);;
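Here is a sketch of an update function with the behavior this example relies on. It replaces the value of an existing key in place, or appends a new pair when the key is absent. This update is hypothetical. The tutorial does not say what yamlrw's Util.update does with a non-mapping argument, and this version simply returns such values unchanged.

```ocaml
(* Local stand-in for the JSON-compatible value type (assumed shape). *)
type value =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of value list | `O of (string * value) list ]

(* Hypothetical update: replace an existing key's value in place, or append
   a new pair at the end. Non-mappings are returned unchanged. *)
let update (key : string) (new_value : value) (v : value) : value =
  match v with
  | `O pairs when List.mem_assoc key pairs ->
    `O (List.map (fun (k, old) -> if k = key then (k, new_value) else (k, old)) pairs)
  | `O pairs -> `O (pairs @ [ (key, new_value) ])
  | other -> other
```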

Summary

The yamlrw library provides:

  1. Simple parsing: Yamlrw.of_string for JSON-compatible values
  2. Full fidelity: Yamlrw.yaml_of_string preserves all YAML metadata
  3. Easy serialization: Yamlrw.to_string with style options
  4. Navigation: Yamlrw.Util module for accessing and modifying values
  5. Multi-document: Yamlrw.documents_of_string for YAML streams
  6. Streaming: Yamlrw.Stream module for event-based processing

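Key types: Yamlrw.value, the JSON-compatible representation, and Yamlrw.yaml, the full-fidelity representation.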

For more details, see the API reference.