Documents vs UIs - holding back the web?

While updating this website I’ve found myself at a crossroads: do I persist with the existing format of a backend application that serves up a mix of templated HTML and static assets, or switch to a UI framework (specifically, elm-ui)? What are the trade-offs? If I’ve grown so fond of elm-ui, why isn’t this switch a simple decision? It comes down to the difference between documents (AKA “content”) and User Interfaces, and I think the answer reveals one of the main weaknesses of the web as a platform.

"State" of play

The web was built to display documents, but modern web applications instead typically require UIs. As the web has matured as a platform, we’ve seen client-side technologies move away from techniques that are well suited to document rendering. Instead, we now create applications that use a virtual DOM to render a UI from some in-application state. React is the most popular of these frameworks and I’ve already mentioned elm-ui, but this approach is very common in client-side development.

This is a huge leap forwards for our ability to create bespoke experiences on the web, but UIs are very rarely the whole application. Many web applications are primarily made up of “document elements” and even those that aren’t (e.g. games and true webapps) still contain large amounts of document content like help screens, information, and a home page introduction.

Embedding UI elements into a document is easy; we can look at the web’s history to see how that approach has developed over the decades. But the move towards building true UIs is at odds with the old document-first approach, and this comes down to how the document should be represented in the application. Since we ask our UI framework to use the application’s state to render the UI to the screen as a side-effect, the document content needs to be a part of the application state. But what does that state look like?
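To make the render-from-state model concrete, here is a minimal sketch in Python (chosen for brevity, not because any of the frameworks above use it). The `AppState` fields, `view` and `increment` are hypothetical names for illustration only. Note that the document content is just a plain string in the state here, which is exactly the representation problem in question.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class AppState:
    # Hypothetical application state: a counter (UI state) plus some help
    # text (document content) that has to live in the state somehow.
    count: int
    help_text: str


def view(state: AppState) -> str:
    """Describe the whole UI as a pure function of the state."""
    return (
        "<main>"
        f"<button>Clicked {state.count} times</button>"
        f"<p>{escape(state.help_text)}</p>"
        "</main>"
    )


def increment(state: AppState) -> AppState:
    """State transitions return a new state; the view is re-derived from it."""
    return AppState(count=state.count + 1, help_text=state.help_text)
```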

HTML?

Document representations are still mostly designed around the need to lay out content on a piece of paper. Google Docs and equivalents work this way, and produce opaque document formats with this goal in mind. For a pageless “machine-displayed document” we instead use markup languages like HTML to annotate a raw document with layout information. This already sounds like a poor fit for our UI-first world of client-side development, and unfortunately these document formats are also relatively opaque. HTML is notoriously loose and flexible! Parsing it into any kind of sensible application state is a genuinely difficult problem, exacerbated by HTML having become the language we use to express UIs on the web. Its expressive potential far exceeds what we need for document content within UIs, and necessarily so since all the UI frameworks mentioned above use HTML.

HTML is so complex to parse that we would essentially need to re-implement the browser in our application to be able to render the document state ourselves. This is impractical, but because using HTML as our document format is so attractive for other reasons, the practical choice is to defer to the browser for these elements. We either use an embedded web-view (e.g. an iframe or a custom element), break the rules of our framework to insert our content directly into the DOM for these nodes, or re-architect our UI around the document and thereby return to the old approach of embedding UI elements into a document.

Content authoring tools that produce HTML are myriad and mature, and by leveraging our runtime environment directly (the browser), we’re able to render these document fragments relatively easily. But there are some large costs associated with this approach:

We have to use browser-native styling and layout to use the browser-native UI, which means writing CSS in a separate location.
This approach opens up very serious security holes. Rendering arbitrary HTML at runtime is an XSS vulnerability waiting to happen. While I’m sure you and I would never allow untrusted HTML into the application, the approach pretty much guarantees it will happen to someone at some point.
There’s a big disconnect between the UI and the embedded document elements. Because the UI application completely ignores document content, we have no opportunity for inspecting or transforming the content. This also means it’s tricky to decide what to do along the fuzzy line between UI and document content. What if we want to embed an interactive graph into our document? This is a fractal problem, and we’re essentially back to square one here.
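The security cost in particular is easy to illustrate. The sketch below is in Python and uses only the standard library's `html.escape`; the two `render_comment_*` functions and the payload are hypothetical. It shows the bind we're in: inserting content verbatim lets markup through, while escaping it is safe but throws away the formatting that made HTML attractive as a document format in the first place.

```python
from html import escape


def render_comment_unsafe(body: str) -> str:
    # Inserts the content verbatim: any markup in `body` becomes live markup
    # once the browser parses the result.
    return f'<div class="comment">{body}</div>'


def render_comment_escaped(body: str) -> str:
    # Escapes the content so it is displayed as literal text. Safe, but all
    # document formatting is lost along with the dangerous parts.
    return f'<div class="comment">{escape(body)}</div>'
```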

Markdown?

An alternative markup language is Markdown, which is a lot more structured and limited than HTML. Setting aside the lack of a consistent technical specification for Markdown, it only supports a small number of document elements and this simplicity makes it easier to read and write, easier to parse, and easier to render.

But Markdown supports embedded HTML. This is convenient when you need to add more complex UI elements to a document, but our goal here is the opposite - to add document elements to a UI. Luckily, because we can parse Markdown documents, these document fragments are more transparent to our application than their HTML equivalent would be. It is possible for our UI to strip, ignore, or raise errors on HTML blocks if they are present in the parsed document representation. We can also provide support for specific custom HTML elements, which allows UI elements to be embedded back into our documents by parsing those elements directly into our UI’s application state without creating the fractal problem of rich embedded documents.
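As a rough illustration of what "strip, ignore, or raise errors" might look like once the document has been parsed, here is a sketch in Python. The block types, the `bar-chart` custom element and the allow-list are all hypothetical; a real Markdown parser would produce a richer tree than this.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Heading:
    level: int
    text: str


@dataclass(frozen=True)
class Paragraph:
    text: str


@dataclass(frozen=True)
class HtmlBlock:
    tag: str  # element name, e.g. "script" or "bar-chart"
    raw: str  # the original HTML source of the block


Block = Union[Heading, Paragraph, HtmlBlock]

# Hypothetical allow-list of custom elements the UI knows how to render itself.
ALLOWED_CUSTOM_ELEMENTS = {"bar-chart"}


class UnsupportedHtml(ValueError):
    pass


def strip_html(blocks: list[Block]) -> list[Block]:
    """Drop every HTML block except the allow-listed custom elements."""
    return [
        b for b in blocks
        if not isinstance(b, HtmlBlock) or b.tag in ALLOWED_CUSTOM_ELEMENTS
    ]


def reject_html(blocks: list[Block]) -> list[Block]:
    """Raise instead of silently dropping unsupported HTML."""
    for b in blocks:
        if isinstance(b, HtmlBlock) and b.tag not in ALLOWED_CUSTOM_ELEMENTS:
            raise UnsupportedHtml(f"unsupported HTML element: <{b.tag}>")
    return blocks
```

Which policy to apply is a product decision: stripping is forgiving towards content authors, while rejecting surfaces the problem at the point the content enters the application.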

A Markdown solution

The elm-markdown package gives us a practical example of how to use Markdown in this way. At its core, the library defines an Elm data structure for representing Markdown documents. It provides a parser for turning Markdown fragments into their Elm representations. Finally, we provide an instance of elm-markdown's renderer to convert this part of our application state into something that we can display in our UI. This renderer essentially duplicates the role of CSS by providing a consistent way to render document elements based on their type in the document, instead of styling element by element as we do with UIs.
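The renderer idea translates to other languages fairly directly. The sketch below is in Python and only mirrors the general shape described above (one function per element type, applied across the parsed document); it is not elm-markdown's actual API, and every name in it is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

View = TypeVar("View")


@dataclass(frozen=True)
class Heading:
    level: int
    text: str


@dataclass(frozen=True)
class Paragraph:
    text: str


Block = Union[Heading, Paragraph]


@dataclass(frozen=True)
class Renderer(Generic[View]):
    # One function per element type: this is where document-wide styling lives.
    heading: Callable[[int, str], View]
    paragraph: Callable[[str], View]


def render(renderer: Renderer[View], blocks: list[Block]) -> list[View]:
    """Turn parsed blocks into whatever view type the UI uses."""
    views = []
    for block in blocks:
        if isinstance(block, Heading):
            views.append(renderer.heading(block.level, block.text))
        else:
            views.append(renderer.paragraph(block.text))
    return views


# A toy renderer that produces plain strings; a real UI would return its own
# element type here instead.
text_renderer = Renderer(
    heading=lambda level, text: f"{'#' * level} {text.upper()}",
    paragraph=lambda text: text,
)
```

Because every heading goes through the same `heading` function, a styling decision is made once per element type rather than once per element, which is the sense in which the renderer takes over the job of a stylesheet.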

We can now accept Markdown content at runtime or as part of our source code. Because these Markdown document fragments are parsed into a data structure that represents the document’s structure, they can be inspected, manipulated and augmented as needed, just like any other part of our application’s state. Finally, we’re able to use our renderer to put this content in front of our users via the same mechanism we use for the rest of our UI.
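Two small examples of what "inspected" and "manipulated" could mean in practice, again sketched in Python with hypothetical block types rather than elm-markdown's real ones: pulling a table of contents out of a parsed document, and demoting headings so that a fragment nests under the UI's own page title.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Heading:
    level: int
    text: str


@dataclass(frozen=True)
class Paragraph:
    text: str


Block = Union[Heading, Paragraph]


def table_of_contents(blocks: list[Block]) -> list[str]:
    """Inspect: collect the text of every heading, in document order."""
    return [b.text for b in blocks if isinstance(b, Heading)]


def demote_headings(blocks: list[Block], by: int = 1) -> list[Block]:
    """Manipulate: shift heading levels down, capped at level 6."""
    return [
        replace(b, level=min(6, b.level + by)) if isinstance(b, Heading) else b
        for b in blocks
    ]
```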

While this seems to meet our requirements, everything feels a bit duct-taped. This isn’t yet the standard way to do things, so many steps in our workflow lack maturity.

The elm-markdown package looks great, but it isn’t quite as polished as we’d like for something so important to our workflow.
With no in-language Markdown support, we’d be writing these as multiline strings or cobbling together tooling to import content from separate Markdown files.
There are some great content authoring tools for Markdown documents, but they will expect embedded HTML to be permitted even though our application likely won’t support this. Conversely, they won’t understand our custom elements if we take that approach to augmenting the document.
elm-markdown parses embedded HTML with an XML parser, which is a great compromise for well-specified custom elements, but remains a source of complexity if we try to use this for raw HTML.
Although our styling is now UI-native it’s a bit off to the side in our renderer, since we’ve had to essentially duplicate CSS. This feels like inherent complexity, so I don’t think this is the first place to focus on improvements.

Rich-text editors?

Another approach is to adopt the data structures used by existing content editors like ProseMirror, instead of HTML or Markdown. These data structures are already de facto standards in their ecosystems and are well-optimised for the use case of editing document content. Could this be a better choice for a well-specified and broadly adopted document standard? If something like this gets some momentum then you’d expect communities to step up so that language ecosystems provide parsers, renderers and tools for document manipulation.

Back to my quandary…

Switching to elm-ui will make it much easier for me to provide a great user experience on my website. I’ll be able to use space and design elements to arrange and identify different parts of the site while providing a rich interactive experience for reading my content. But a website revolves around its content, and the content is created, stored and served as documents. Getting this document content into a rich interactive UI remains more difficult than adding UI elements on top of the documents. Ultimately, this pressure pulls us back towards techniques and technologies that are no longer state of the art, at the cost of the web platform as a whole.