Hoon Basics: Advanced Data Structures

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • How are structured data typically represented, populated, and accessed in Hoon?

  • Which common patterns in Hoon exist?

Objectives
  • Identify common Hoon molds: units, sets, maps, jars, and jugs.

  • Explore the standard library’s functionality: text parsing & processing, functional hacks, randomness, hashing, time, and so forth.

Data Structures

Urbit employs a series of data structures (molds) which are either unique to the Urbit operating environment (such as a unit) or analogous to convenient forms in other languages (like a set or a map).

Units

Every atom in Hoon is an unsigned integer, even if interpreted by an aura. Auras do not carry prescriptive power, however, so they cannot be used to fundamentally distinguish a NULL-style or NaN-style non-result. That is, if the result of a query is a ~, how does one distinguish a value of zero from a non-result (missing value)?

Units mitigate the situation by acting as a type union of ~ (for no result) and a cell [~ u=item] containing the returned item (with face u).

++  unit
  |$  [item]
  $@(~ [~ u=item])

Many lookup operations return a unit precisely so you can differentiate a failed search from an answer of zero.

Sets

Many compound data structures in Hoon are trees, maps, or sets. Trees are the least important for these for our purposes today, so we will focus on +$map and +$set.

In computer science, a set is like a mathematical set with single-membership of values. A set is not ordered so the results may not be ordered or stored the same as the input.

> (sy ~[5 4 3 2 1])
[n=5 l={} r={1 2 3 4}]
> (sy ~[8 8 8 7 7 6 5 5 4 3 2 1])
[n=6 l={8 5 7} r={1 2 3 4}]

For the most part, we don’t worry too much about the particular structure chosen for internal representation of the set as a tree because we will use the ++in door for interacting with elements.

A door is a kind of core more general than a gate. In particular, rather than a default arm $ the caller must specify which arm of the door to pull. This produces a gate which must then be slammed on the values. The ++in core instruments +$set instances, which makes for a somewhat intuitive English-language reading of operations.

To produce a set, you can either construct it manually from a list using ++sy, or you can insert items using the ++put arm of by.

For each of the following, we assume the following set has been defined in Dojo:

=my-set (sy ~['a' 'B' "sea" 4 .5])

There are also a number of set-theoretical operations such as difference (dif), intersection (int), union (uni), etc.

Maps

A +$map is a set of key–value pairs: (tree (pair key value)). (Other languages know this as a dictionary or associative array.) The ++by core provides map services.

To produce a map, you can either construct it manually from cells using ++my, or you can insert items using the ++put or ++gas arms of by. The latter is more robust and we use it here.

Map keys are commonly @tas words.

For each of the following, we assume the following set has been defined in Dojo:

=greek (~(gas by *(map @tas @t)) ~[[%alpha 'α'] [%beta 'β'] [%gamma 'γ'] [%delta 'δ'] [%epsilon 'ε'] [%zeta 'ζ'] [%eta 'η'] [%theta 'θ'] [%iota 'ι'] [%kappa 'κ'] [%lambda 'λ'] [%mu 'μ'] [%nu 'ν'] [%xi 'ξ'] [%omicron 'ο'] [%pi 'π'] [%rho 'ρ'] [%sigma 'σ'] [%tau 'τ'] [%upsilon 'υ'] [%phi 'φ'] [%chi 'χ'] [%psi 'ψ'] [%omega 'ω']])

Many aspects of Hoon, in particular the parser, have their own characteristic data structures, but this should serve to get you started reading and retrieving data from structured sources.

Jars and Jugs (Optional)

A couple of compound structures are also used frequently that have their own terminology:

  • jar is a map of lists.

      ++  jar  |$  [key value]  (map key (list value))
    
  • jug is a map of sets.

      ++  jug  |$  [key value]  (map key (set value))
    

Standard Library

The Hoon standard library is split between sys/hoon.hoon and sys/zuse.hoon. Since Hoon follows the Lisp dictum that all code is data, the standard library functions can be quite helpful in operating on all sorts of inputs. We will focus on a few particular aspects: text parsing & processing, randomness, hashing, and time.

Text Parsing and Processing

Most of the text parsing library is built to process Hoon itself, and a variety of rule parsers and analyzers have been built up to this end. However, we are interested in the much simpler case of loading a structured text string and converting it to some sort of internal Hoon representation.

Urbit Types from @t

Frequently, one receives plaintext data from some source (like an HTTP PUT request) and needs to convert it to a Hoon-compatible atom format. Alternatively, one needs to convert from a raw Hoon atom into a text representation. ++slaw and ++scot provide this functionality:

> (slaw %p '~dopzod')
[~ 4.608]
> (scot %p 1.000)
~.~wanlyn

JSON Structured Data

JSON is a frequently-employed data representation standard that organizes data into series of lists and key-value pairs (or maps).

{
  "name": "John",
  "age": 30,
  "car": null
}

Hoon has built-in support for parsing and exporting JSON-styled data. THIS IS WHERE THINGS GET WEIRD. Frequently when working in Hoon, it becomes necessary for you the developer to think in terms of structures rather than representations. JSON is frequently thought of as a generic text representation of data, and while it is acknowledged that there are some different but compatible representations possible (such as by manipulating whitespace), in general JSON is the text. Not so for Hoon. It’s a very Lisp-y move to consider the data as more fundamental than the instantiation as one particular type of text, and thus there being multiple valid representations of the same data, some quite different from each other.

The foregoing example becomes in Hoon:

[~
  [%o p={
      [p='car'
       q=~
      ]
      [p='name'
       q=[%s p='John']
      ]
      [p='age'
       q=[%n p=~.30]
      ]
    }
  ]
]

Note the type tags on the information components, and that the map is unordered.

Hoon parses JSON in two passes: first, the JSON is converted in the raw to a tagged data structure using ++de-json:html. After this, schema-specific parsers are used to deconstruct the tagged data into usable components with ++dejs:format. This gets a little complicated, and we’ll revisit it again tomorrow when we need it.

The first part is relatively straightforward. Given a cord (@t) of JSON data, parse it to a tagged json data structure:

> =a '{"name":"John", "age":30, "car":null}'
> a
'{"name":"John", "age":30, "car":null}'
> (de-json:html a)
[~ [%o p={[p='car' q=~] [p='name' q=[%s p='John']] [p='age' q=[%n p=~.30]]}]]

JSON is a structured data type, however, so one cannot generically process it into particular values. One must know and account for the expected structure.

We have to build a parser to extract expected values (and ignore others).

> =parser %-  ot:dejs-soft:format
  :~  [%name so:dejs-soft:format]
      [%age no:dejs-soft:format]
      [%car (mu so):dejs-soft:format]  :: allows for unit result incl. null
  ==
> (parser u:+:(de-json:html a))

The dejs-soft library fails gracefully (as opposed to dejs). It can parse, among others, the following types. (Wrap with ++mu if the quantity is optional.)

JSON parsing code often ends up rather involved.

Randomness

We encountered random numbers previously when dealing with the six-sided dice problem.

The entropy eny is generated from /dev/random, itself seeded by factors such as keyboard typing latency.

For instance, to grab a particular entry at random from a list, you can convert eny to a valid key and then retrieve:

(snag (~(rad og eny) 5) (gulf 1 5))
(snag (~(rad og eny) 5) `(list @p)`~[~wanzod ~marzod ~binzod ~litzod ~samzod])

The og core provides a number of methods:

The main thing to keep in mind when using the RNG is that you must update the internal state. Since Hoon is a functional programming language without side effects, you have to pin the modified state as the old state’s name to continue to use it.

:-  %say
|=  [[* eny=@uv *] [n=@ud ~] ~]
  :-  %noun
  =/  values  `(list @ud)`~
  =/  count  0
  =/  rng  ~(. og eny)
  |-  ^-  (list @ud)
    ?:  =(count n)  values
    =^  r  rng  (rads:rng 100)
  $(count +(count), values (weld values ~[(add r 1)]))

Time

Hoon supports a native date-time aura, @da (absolute, relative to “the beginning of time”, ~292277024401-.1.1 = January 1, 292,277,024,401 B.C.), and @dr (relative). This is called the atomic date in the docs.

There is also a parsed time format tarp

> *tarp
[d=0 h=0 m=0 s=0 f=~]

and a parsed date format date

> *date
[[a=%.y y=0] m=0 t=[d=0 h=0 m=0 s=0 f=~]]

which are convenient for representations and interconversions.

Numerical Conversions

Since every value is at heart an unsigned decimal integer, there is a well-formed binary representation for each value. This determines the way that the system treats text characters and floating-point values, for instance, and if not careful one can quickly run into absurdity:

> (add 1 .-1)
3.212.836.865

If we specify the aura, we get a glimpse into what is happening:

> `@rs`(add 1 .-1)
.-1.0000001

That is, there is a binary representation of .-1 to which the value 1 has been added:

> `@ub`.-1
0b1011.1111.1000.0000.0000.0000.0000.0000
> `@ub`(add .-1 1)
0b1011.1111.1000.0000.0000.0000.0000.0001

Things can get much worse, too!

> (add .1 .1)
2.130.706.432
> `@rs`(add .1 .1)
.1.7014118e38

Auras have a characteristic core that enables one to consistently work with their values. For instance, for floating-point mathematics, one should use the rs core:

> (add:rs .1 .1)
.2

To convert between numbers, don’t do a simple aura cast, which won’t have the desired effect most of the time:

> `@rs`1.000
.1.401e-42
> `@rs`0
.0

Instead, employ the correct conversion routine:

> (sun:rs 5)
.5

Wrapping Up the First Day

At this point, you’ve been drinking from the firehose and it may have hurt a little.

via Gfycat

Our objective for today was to start you thinking in terms of the data structures of idiomatic Hoon. I want you to absorb as much of this as you can tonight and then tomorrow we will venture into the operating function, userspace, and software deployment, all of which you should be equipped to grapple with now.

If you are interested in exploring some single-purpose generators, I suggest these:

Key Points

  • Doors generalize the notion of a gate to a gate-factory.

  • Units, sets, maps, jars, and jugs provide a principled way to access and operate on data.