VGVL Introduction
Introduction to Visual Grammars for Visual Languages

© 2009-2012 by Fred Lakin

The work described in these essays has two goals: to understand and to support the phenomenon of human text graphic activity. And if the phrase "phenomenon of human text graphic activity" bothers you, just substitute "blackboard behavior." All we need in order to proceed with the discussion is a general category that includes the doing of visual stuff by people on blackboards (and whiteboards too, what the heck).

From the beginning it was clear that the two goals were related. Generating text and graphics over time (the activity in question) is by its very nature artifactual. Therefore, as long as an artifact must be constructed anyway to support the activity, why not also design the artifact to help reveal the structure of that activity?

And in turn, better understanding of the activity will then allow better artifacts to be designed for understanding and supporting it (and so on, and so on, bootstrap all the way down).

So, many years ago, I set out to design and build such a dual-purpose tool, little noting but long remembering the amazing course upon which I had thus embarked.


I tried numerous times to write a paragraph called "Who would enjoy this book." But with each version, I was afraid that many of the very folks (the very visual, logical, and inquisitive folks) whom I hoped to entertain would read the paragraph, take it the wrong way, self-define outside my enclosure, and thus not venture further into matters which really were just their meat (or bread and butter for you vegetarians). Eventually I gave up and instead drew and wrote a very graphic intro to the intro, see next section, in hopes of enticing one and all.

[Exception condition, you know who you are: (first (last *footnotes*)) ]


The Visual Punchline: lumpy oatmeal, visual grammars for visual languages, and Dave the visual agent

First, for the very very visually impatient, this section is simply a sequence of pictures with explanations. The images and text are intended to show you: that expressions in a visual language are like coagulated lumps in oatmeal, all the same basic stuff, only just more highly structured; that "trees" (thin spider webs that actually look like roots) are a diagrammatic way to represent structure; and, that both humans and software can use that structure in dealing with images.

To start things off, here is a spread of metaphorical text graphic oatmeal with lumps:


And here is an actual text graphic spread, with dashed boxes around some of the more structured regions:


And then here is the notation I will use to represent details of structure. They are called "trees" even though they look more like roots. The fine lines show the grouping structure of text graphic objects without violating their visual integrity. The spider webs are simply an overlay to be turned on or off; they show the objects to be members of ordered, recursive lists (during manual manipulation of objects, the order is often ignored and the lists are just used as simple groupings).


So now we can employ tree notation, a much more precise diagrammatic tool than boxes, to show local structurings in the text graphic spread:


And finally, the visual grammar below is a graphic representation of rules which define the general patterns for some of those local structurings, namely the structurings described by the Sibtran visual language (the right-hand sides of the rules denote a spatial template to match against, as well as the prototypical tree ordering of objects):


When people perform text graphic activity, there are patterns in the text graphic stuff they leave behind. The text and graphics in these patterns are more locally organized than their surroundings; hence we humans see them as patterns. But they are not made of different material than their surroundings, they are simply more structured clumps of the same basic stuff; coagulations if you will. Hence we call this the "lumpy oatmeal model," and use visual grammars to describe the structure of the lumps. Pretty cool, huh?

But, for the visually impatient and ruthlessly practical, maybe not so much. A better understanding of visual languages, as evidenced by visual grammars to represent underlying structure, sounds like a very roundabout way to get somewhere vague. Where's the payoff? So, instead, how about this: visual grammars provide a way for a piece of software called a visual agent to recognize patterns in the text graphic flow and intervene to assist the human in her visual activity. This is a hard problem; powerful & sophisticated tools will come in handy.

So, what else was there? Oh yeah, the other practical consequence: build a performing instrument, namely the "artifact for generating text and graphics." Well, it turns out that trees (the simplest and yet most powerful data structure in the known galaxy) provide not only a system for understanding text graphics after the fact, but also an extremely useful schema for generating them in performance. An organizing framework if you will, which can be embedded in a visual instrument in the same way that the Western diatonic schema is embedded in and manifested by the piano keyboard.

Bottom line, trees are the answer. During the fact, providing a framework for improvisation; after the fact, providing a framework for analysis. And, for the grand finale combo shot, both in vivo and in cogitare, where Dave the visual agent in real time uses the spatial arrangement of objects generated by the performer, along with the structure specified by the trees in the grammar, to recognize patterns in the performer's activity and take immediate action.

The Plan

OK, so we have the phenomenon, text graphic activity, and we know the payoff, visual grammars to represent the underlying structure of the activity for use by a visual agent. It merely remains to connect the two in a useful way via a plan, which plan will unfold as the details of our graphic story ...

Having skipped ahead to the end, we must now return to the beginning of the project, where the first step was a definitional one. Or rather, the lack thereof. Another thing which had become clear to me was that the distinction between text and graphics was arbitrary and context dependent, varying widely from one visual communication system to another, and therefore must be deferred as long as possible.

Hence I decided early on to simply refer to all visual objects as "text graphic," leaving it a task for later as to which was which, and how and when to define the difference[1]. Some more examples of text graphic objects which the visual tool was designed to handle:



And, because musical improvisation was a guiding inspiration, I called this tool (the one to both support and reveal human text graphic manipulation) a "visual instrument." Here then is a diagram of the plan:



After constructing various paper based systems like the "Wall Scroll" ...

and the "Vacuum Board Vertical Desk", and recording their use with time lapse photography, and then studying the recordings, I decided to try building a visual instrument which employed computer graphics.

Although my original intent was simply to emulate my previous paper systems using a computer controlled display, I soon realized that computer science had some concepts which might be of great use to my project.

A key insight was the realization that John McCarthy's LISP could be generalized from "Computing with (textual) Symbolic Expressions" to "Computing with Text Graphic Forms". That is, visual objects could direct the processing of other visual objects. This system, called PAM for "PAttern Manipulating" (even as LISP stands for "LISt Processing") would be a simple and clear framework for describing text graphic objects and manipulations upon them. And graphics would be first class objects in PAM, able to be executed as instructions by a computer for performing manipulations on other graphic objects.

Because PAM was an extension of LISP, it would of course also be a general purpose programming language, enjoying all the power of LISP. That is, PAM is simply a programming language extremely well suited for building visual performing instruments[2] and analyzing their use.


The basic tree structures of LISP (and PAM) were very handy for supporting the dual aims of the project (you remember, building a visual performing instrument which was also a processing/measuring instrument for visual performance activity). Here is the underlying tree structure for a canonical text graphic object:


A visual performer in the heat of blackboard activity ("graphicist"), and an analyst after the fact ("visual linguist"), both have the same need: a simple underlying structure for the visual objects generated during performance. The performer uses this structure immediately during manipulation as grouping patterns which determine "what moves with what"; the analyst uses it afterwards in retrospection as "underlying syntactic structure based on the spatial arrangement of visual objects for purposes of interpretation".
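The "what moves with what" idea can be sketched in a few lines of Python. This is only an illustration, not PAM or vmacs code: the `Atom` class, the `move` and `atoms` functions, and all the coordinates are assumptions made up for the example. Groups are plain Python lists (ordered and recursive, like LISP lists), and moving a group moves every atom beneath it.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    """A hypothetical visual atom: a label plus a position on the display."""
    label: str
    x: float
    y: float

def move(obj, dx, dy):
    """Move an atom, or every atom inside a (possibly nested) group."""
    if isinstance(obj, Atom):
        obj.x += dx
        obj.y += dy
    else:  # a group is an ordered list of atoms and sub-groups
        for member in obj:
            move(member, dx, dy)

def atoms(obj):
    """Flatten a tree into its atoms, in tree order."""
    if isinstance(obj, Atom):
        return [obj]
    return [a for member in obj for a in atoms(member)]

title = Atom("title", 0, 0)
bullet = Atom("bullet", 0, 10)
item = Atom("item text", 5, 10)
stray = Atom("doodle", 50, 50)

pair = [bullet, item]      # sub-group: a bullet and its text
group = [title, pair]      # group: the title plus the pair
spread = [group, stray]    # everything on the display

move(group, 100, 0)        # drag the group; the doodle stays put
```

The performer's use is `move`; the analyst's use is a walk like `atoms`, which reads the same lists back as ordered structure.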

So if we contemplate for a moment hyper skilled visual performer David Sibbet in action during a Group Graphics session:

Then we can use trees to represent a reasonable underlying grouping structure for the complex text graphic image that was generated by David during the session:

And then (once again) we can write/draw a visual grammar for a simplified version of the visual language used by David to organize the text and graphics on the paper display:


The first three rules in this grammar "say" that an expression in the Sibtran language can be either a hollow arrow pattern, a bullet list pattern, or a straight arrow pattern. And then the last three rules go on to say that a hollow arrow pattern is a piece of text followed by a hollow arrow with a piece of text inside followed by any Sibtran expression (recursion, denoted by the extra dot, " "); a bullet list is a piece of text over three bullet text pairs; and a straight arrow pattern is a piece of text followed by a straight arrow followed by any Sibtran expression (recursion again).

Recursion is pretty cool. It allows very concise grammars to describe visual expressions of arbitrary complexity. Here is the underlying structure for a recursive Sibtran expression as recovered by the parser using the grammar and the spatial arrangement of the visual atoms:


The parser is a computer program. It can "use" the visual grammar because parsing is simply one kind of processing on visual objects, and we (well, me at least) have spent a lot of effort to build a system in which (drum roll): visual objects can direct the processing of other visual objects.
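To make "a grammar directing a parser" concrete, here is a minimal Python sketch under heavy simplifying assumptions. It is not the spatial parser from the book: the spatial arrangement is flattened to a left-to-right sequence of atom kinds, a hollow arrow and the text inside it are folded into one `hollow-arrow` atom, and the names `GRAMMAR` and `parse` are hypothetical. What it does keep is the shape of the six rules described above, with the grammar passed in as data.

```python
# The six rules of the simplified Sibtran grammar, written as data.
GRAMMAR = {
    "expr": [["hollow"], ["bullets"], ["straight"]],
    "hollow": [["text", "hollow-arrow", "expr"]],
    "bullets": [["text", "bullet", "text", "bullet", "text", "bullet", "text"]],
    "straight": [["text", "straight-arrow", "expr"]],
}

def parse(symbol, atoms, pos, grammar):
    """Try to match `symbol` at atoms[pos:]; return (tree, next_pos) or None."""
    if symbol not in grammar:                  # terminal: a kind of atom
        if pos < len(atoms) and atoms[pos] == symbol:
            return symbol, pos + 1
        return None
    for rhs in grammar[symbol]:                # try each rule in turn
        children, p = [], pos
        for part in rhs:
            result = parse(part, atoms, p, grammar)
            if result is None:
                break                          # rule fails: back up, try the next
            subtree, p = result
            children.append(subtree)
        else:
            return [symbol] + children, p      # every part of the rule matched
    return None

# text -> text [hollow arrow] text over three bullet-text pairs
atoms = ["text", "straight-arrow", "text", "hollow-arrow",
         "text", "bullet", "text", "bullet", "text", "bullet", "text"]
tree, end = parse("expr", atoms, 0, GRAMMAR)
```

Because the grammar is an argument rather than code, handing `parse` a different dictionary changes the language it recognizes without touching the parser itself.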


Visual Linguistics and the contents of this book

The various essays in this book all deal with "visual linguistics"[3], either indirectly through design of performing instruments for generating visual expressions, or directly by building tools for analyzing such expressions.

The basic tree structure for visual objects and computational use thereof is described in the chapter Computing with Text Graphic Forms [A], where the resulting framework is referred to as PAM for visual "PAttern Manipulating." Here is the result of evaluating a visual mapping function:



The visual performing instrument built using PAM is called vmacs; vmacs is described in the chapters A Structure from Manipulation for Text Graphic Objects and A Performing Medium for Working Group Graphics. Nuts and bolts vmacs functionality as an environment for graphic editing and programming (non-scrolling, manual display management, discretionary eval) is described in Viz Literals [B].

A good use of the vmacs instrument, to facilitate the communication of distributed groups, is described in The Visual Telefacilitation Project at PGC.

The kind of visual linguistics which can be done on graphic communication activity performed with such an instrument (when the linguist has available visual grammars that are text graphic patterns which direct the processing of other text graphic patterns in the task of spatial parsing [4]) is described in Visual Grammars for Visual Languages [C].

In that essay, I say:

"The purpose of spatial parsing is to aid in the processing of visual languages. As an operational definition of visual language, we say: A visual language is a set of spatial arrangements of text graphic symbols with a semantic interpretation that is used in carrying out communicative actions in the world. Spatial parsing deals with the spatial arrangement of the text graphic symbols in a visual phrase from a visual language: Spatial parsing is the process of recovering the underlying syntactic structure of a visual communication object from its spatial arrangement."

And let me add:

"A visual grammar is a graphic representation of the rules for laying out pieces of text and graphics in a spatial arrangement which expresses the syntactic structure of a properly constructed visual phrase in a particular visual language."

When people perform text graphic activity, there are patterns in the text graphic stuff they leave behind. Blackboard activity is live and spontaneous, general and unstructured: "conversational graphics". Visual language expressions often arise in the midst of conversational graphics like the coagulation of lumps in oatmeal. Such a "lump," or, expression in a particular visual language, is highly structured according to the rules for laying out pieces of text and graphics to construct phrases in that language. Those lay out rules are the grammar for that language. If the rules in that grammar are represented visually (written and drawn) then we have a visual grammar for a visual language.

In visual conversations, over time the amount of spatial organization in the text graphics tends to increase (or, "visual entropy" decreases). Loosely speaking, we can call this the "local coagulation of lumps in the text graphic oatmeal." This odd phrase points out the important fact that the basic text graphic stuff remains the same during the course of the activity, but that as it continues, a discernible infrastructure arises among the pieces. Where "over time" means both visual conversation time in minutes and cultural time in years. And where it is the job of the visual linguist to try and tease out that underlying structure, and then to represent it in a visual grammar.

Let's take another run at showing how visual grammars work. This time we will not come from the very complex end of the diagrammatic spectrum and try to grapple with the richness of a Sibbet panorama. Instead, we will use a very simple example, a "toy" domain as the computer guys like to say.

So, on the left is a complete grammar for a minimalistic family of bar charts, and on the right, one member of that very impoverished family.


Rules two and three allow for any number of bars in a chart, thanks to the tricky yet ever so useful functioning of recursion in rule two: a list of bars is simply one bar with a text label (rule three), or one bar with text label followed by (the rest of) a list of bars (rule two).
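The recursion in rules two and three can be sketched in Python as follows. This is a hedged illustration, not the parser from the book. It assumes that a label is the nearest text directly below its bar, and that "followed by" means "next bar to the right"; the actual spatial templates live in the grammar figure. Rule one is not modeled, and `Obj`, `labeled_bar`, `bar_list`, `parse_bars` and the tolerance `tol` are hypothetical names.

```python
from typing import NamedTuple

class Obj(NamedTuple):
    kind: str      # "bar" or "text"
    x: float       # left edge
    y: float       # top edge (y grows downward)
    name: str = ""

def labeled_bar(bar, objs, tol=2.0):
    """Match one bar plus the text directly below it, or return None."""
    below = [o for o in objs
             if o.kind == "text" and abs(o.x - bar.x) <= tol and o.y > bar.y]
    if not below:
        return None
    label = min(below, key=lambda o: o.y)    # nearest text under the bar
    return [bar, label]

def bar_list(bars, objs):
    """Rule three: one labeled bar. Rule two: a labeled bar, then a bar list."""
    if not bars:
        return None
    first = labeled_bar(bars[0], objs)
    if first is None:
        return None
    if len(bars) == 1:
        return [first]                       # rule three ends the recursion
    rest = bar_list(bars[1:], objs)          # rule two recurses on the rest
    return None if rest is None else [first] + rest

def parse_bars(objs):
    """Order the bars left to right, then apply the two rules."""
    bars = sorted((o for o in objs if o.kind == "bar"), key=lambda o: o.x)
    return bar_list(bars, objs)

# Objects listed in arbitrary order; only their positions carry the structure.
objs = [Obj("text", 30, 50, "Q2"), Obj("bar", 10, 20, "b1"),
        Obj("bar", 30, 5, "b2"), Obj("text", 10, 50, "Q1")]
chart = parse_bars(objs)
```

The input list is deliberately scrambled: the recovered ordering comes from where the objects sit, not from the order in which they were created.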

Additional articles in the book describe other graphical/computational tools for doing visual linguistics. Some of the tools can be used to study human text graphic activity over time. In Mapping Design Information, the focus is on design ideation over days and weeks as recorded in an electronic design notebook. And in Measuring Text Graphic Activity, the focus is on visual performance over seconds and minutes; here is an automatically generated graph of attention shifting during image construction:


Visual Grammars, the big payoff: presenting Dave the visual agent

Visual Agents are software entities which assist people in performing graphical tasks. One useful and interesting graphical task is to make a text and graphic record of a group meeting. In A Visual Agent for Performance Graphics[I], a visual agent named "Dave" [5] is described. Dave acts as a whiteboard assistant for group graphics, helping a person to graphically record the conversation and concepts of a working group on a large display. Here is Dave's response when the user draws two connecting lines.


Visual agents in vmacs have complete access to user actions and the state of the visual world in the midst of text graphic performance. Thanks to the visual grammar for Sibtran mentioned earlier, Dave is able to recognize certain patterns of text and graphics. And when the human does create such a pattern, Dave then generates a text graphic response and this object is displayed on the screen. The response is appropriate both to the general type of visual pattern which triggered it, and to specific elements in each individual triggering pattern. The human then looks at Dave's response, incorporates it (or not) into the text graphic stream of visual recording, and continues working. And so on and so on.

When two entities are trading text graphics back and forth in this manner, we call it a "text graphic dialog;" in this case the dialog is between a human and a visually adept machine.
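The recognize-and-respond cycle described above can be sketched in Python. Everything here is hypothetical: the real agent recognizes patterns through the visual grammar, whereas this toy `recognize` hard-codes a single "text, straight arrow, text" pattern, and the reply text is invented for illustration rather than taken from Dave's actual responses.

```python
def recognize(objects):
    """Return ("straight-arrow", source, target) for a text, arrow, text run."""
    for i in range(len(objects) - 2):
        a, arrow, b = objects[i:i + 3]
        if a[0] == "text" and arrow[0] == "straight-arrow" and b[0] == "text":
            return ("straight-arrow", a[1], b[1])
    return None

# One responder per pattern type; each also receives the specific elements.
RESPONDERS = {
    # made-up reply, standing in for whatever the real agent would draw
    "straight-arrow": lambda src, dst: ("text", f"{src} leads to {dst}?"),
}

def agent_step(objects):
    """Called after each user action; returns a response object or None."""
    match = recognize(objects)
    if match is None:
        return None
    kind, *elements = match
    return RESPONDERS[kind](*elements)

display = [("text", "budget")]
first = agent_step(display)          # nothing recognizable yet

display += [("straight-arrow", ""), ("text", "hiring")]
response = agent_step(display)       # the pattern is complete, so the agent replies
```

The response depends on the pattern type (which responder runs) and on the triggering elements (what the responder is given), and the human remains free to keep or discard it.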


[1] In fact, let's leave it as a PAM programming exercise for the reader (i.e. to write algorithms for distinguishing between text and graphics in different visual contexts, a task for which PAM is well suited by design).

A further example which confounds most ways of distinguishing text from graphics (and vice versa):



[2] Amongst other features, PAM, like LISP, has both an interpreter and a compiler which are in sync, so that the entire language is available interactively, for executing visual objects as well as textual commands. And of course PAM can easily write programs which write programs, a capability paramount in visual language processing (for VLs both designed and natural).


[3] Many lay claim to the term "visual linguistics", and I do not wish to dispute with them. All I know is that after being exposed to the concept of visual language from the graphic design point of view in the early 70's, and then computational linguistics in the mid 70's, I came up with the term for myself in that same decade, and first used it in print at the beginning of the next decade (A Structure from Manipulation for Text Graphic Objects, ACM SIGGRAPH 1980).


[4] The problem with procedurally directed parsing is that knowledge about the syntax of each language is embedded implicitly in procedures, making it hard to understand and modify. In Visual Grammars for Visual Languages, a spatial parser is described that utilizes context free grammars which are both visual and machine readable. The parser takes two inputs: a region of image space and a visual grammar. The parser employs the grammar in recovering the structure for the elements of the graphic communication object lying within the region. One advantage of a visual grammar is that it makes the syntactic features of the visual language it portrays graphically explicit and obvious (the visual linguist can literally "draw" the grammar using the indigenous text graphic symbols of the language being parsed). Grammars also increase modularity: by parameterizing one parser with different grammars, it is easy to change the behavior of the parser to handle new visual languages.


[5] I feel obligated to mention that "Dave" the visual agent has no relation to David Sibbet (whom I would never call Dave). My viz agent is in fact named after the affable and finally capable Kevin Kline character in the eponymous movie. And I didn't even know Dave Gray at the time when I built the agent.


[6] Preface to the 1978 manuscript: Structure and Manipulation in Text Graphic Images, a phenomenological approach

"Presented in this book is a way of thinking about graphic images and manipulating them. This way of thinking is called PAM, which stands for PAttern Manipulation.

"If you are not a computer scientist, then you can think of PAM as a formal notation for describing graphic objects and manipulations of them, just as algebra is a notation system for describing numbers and manipulations of them. Non computer people might use PAM as a precise way of talking about: text graphics images (in order to design them); visual languages; human text graphic activity (for anthropology, linguistics and psychology); and models of text graphic understanding.

"If you are a computer scientist, then you can consider PAM to be a machine independent description of how to think about graphic objects and manipulations of them. PAM is a simple and powerful system for describing algorithmic devices like: graphics editors (handPAM, the Electric Blackboard); evaluation functions for graphic forms (evaluate, the PAM interpreter); graphic programming environments, graphic text editors, circuit diagramming aids and architectural drafting systems.

"In acknowledgement, this book could never have been written without the inspiration of John McCarthy's LISP and the LISP editor concepts of Peter Deutsch and Warren Teitelman."


[7] To programmers of the McCarthyesque persuasion, I simply say that the heart of the project is to make graphics do LISP. This is completely different than making LISP do graphics (which usually involves using LISP to control output to a screen via side effects).

Instead, making graphics do LISP generalizes "Computing with Symbolic Expressions" to "Computing with Text Graphic Forms," where graphics on the screen are first class objects and can direct the processing of other graphics. The system is called PAM for "PAttern Manipulating" (even as LISP stands for "LISt Processing").

Many consequences follow naturally from the generalization and are described in this book. Lisp is good for processing textual languages, both programming and natural; hence PAM is good for defining and implementing visual programming languages, and also facilitates spatial parsing (via visual grammars) for natural graphic languages (like found on blackboards at day's end, or walls in the inner city).

Frankly, once the crucial leap (textual symbols to text graphic forms) has been made, and the spider web notation concocted to show structure, then most of the programmatic consequences are pretty obvious. The graphics structure editor, vmacs, is based on the Teitelman/Deutsch program structure editor. Spatial parsing of visual languages just uses a routine backtracking strategy for a context free grammar where the rules serve as spatial search templates. And this VennLISP version of labels is a minor hack on the vizeval interpreter defined in Computing with Text Graphic Forms.




[A] Computing with Text Graphic Forms, proceedings of the first ever LISP Conference at Stanford University, August 1980.

As for the rest, almost every chapter is also a paper of the same name. See those chapters for citations.

[B] The one exception is the Viz Literals chapter, which is as yet unpublished and instead has its own web site (which I guess counts as published these days).

[C] Visual Grammars for Visual Languages, proceedings of AAAI-87, the conference of the American Association for Artificial Intelligence, Seattle, Washington, July 1987.


Computing with Text Graphic Forms
Viz Literals
Visual Grammars for Visual Languages
A Structure from Manipulation for Text Graphic Objects
A Performing Medium for Working Group Graphics
The Visual Telefacilitation Project at PGC
Visual Languages for Cooperation
On the Syntax of Diagrams
Executable Graphics
Computing for the Right Brain
Mapping Design Information
Measuring Text Graphic Activity
A Visual Agent for Performance Graphics
Author Bio

Fred Lakin, cabin on the ridge, Santa Cruz, CA, February 2010
