
Domain-Specific Languages

by Paul Morrow and Michael Alexander
Prolog Development Center, Atlanta, GA

1 Introduction

Modern-day general-purpose programming languages (BASIC, C/C++, Java, Pascal, etc.) are high-powered tools for software development. Yet today's top programmers often choose special-purpose, Domain-Specific Languages (DSLs), such as HTML or SQL, for large portions of their applications. Their reasoning is simply that using a language designed specifically for a particular task appreciably increases programmer productivity and the reliability of the resulting applications.

But if DSLs offer significant productivity and reliability benefits, why don't programmers create new DSLs as a natural part of application development? One reason is that many developers believe that creating programming languages is a necessarily difficult and time-consuming task, and therefore only makes sense when considerable reuse is anticipated. Others simply aren't familiar with advanced language design and implementation techniques. But fortunately, because of their restricted and focused nature, DSLs are relatively easy to design and implement.

In this paper we look at why DSLs are important and consider some popular (and not-so-popular) examples of them. We then argue that creating DSLs should be part of basic software engineering, and describe a framework for implementing declarative DSLs.  So read on and discover how your development strategy can benefit from this exciting technology!
 

2 What are DSLs and why should we use them?

A DSL is a programming language designed to support applications in a narrow subject area (domain). As such, a DSL contains constructs that directly represent concepts in the application domain, thereby raising the level of abstraction. And when programmers work at high levels of abstraction, they are more productive, typically write less code, and their programs are generally more reliable, verifiable, and maintainable.

To appreciate this point, imagine having to write an assembly language program that calculates and displays the result of a mathematical expression such as

14*2.33/4

It would require many lines of code and, upon examination, it would not be immediately apparent that the program is correct. If instead you used a higher level language, like BASIC, the program would become simply

print 14*2.33/4

But now suppose that a requirement of the application domain is that, anytime the result of an expression is printed, it must also be accompanied by the expression itself, the time at which the expression was evaluated, and the name of the user running the program, each separated by a comma. In BASIC this would become

print 14*2.33/4; ","; "14*2.33/4,"; TIME$; ","; USER$

which is certainly an error-prone and verbose way to do it (and assumes that TIME$ and USER$ are variables which somehow contain the current time and user respectively). If instead a DSL were crafted for applications of this sort, the statement could once again be simply

print 14*2.33/4

but would have the required semantics. Of course this is a contrived example, but one which hopefully captures your imagination as to the potential power of DSL technology.
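To make the idea concrete, here is a minimal sketch in Python of how such a one-statement DSL might be processed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the comma-separated output format, and the sample time and user values are all hypothetical, and the time and user are passed in as parameters so the result is reproducible.

```python
import ast
import operator

# Arithmetic operators this sketch accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression built only from numbers and + - * /."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def run_print(statement, now, user):
    """Run one 'print <expr>' statement and return the full output line."""
    keyword, _, expr_text = statement.strip().partition(" ")
    expr_text = expr_text.strip()
    if keyword != "print" or not expr_text:
        raise ValueError("expected: print <expression>")
    value = _evaluate(ast.parse(expr_text, mode="eval").body)
    # The domain rule lives here, once: value, expression, time, user.
    return f"{value:g},{expr_text},{now},{user}"


# Hypothetical time and user values, supplied by the caller.
print(run_print("print 14*2.33/4", now="10:15:00", user="alice"))
```

The point of the sketch is that the domain requirement is written once, inside the language processor, so every program in the DSL gets it without repeating it.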

There are many commercially available DSLs in use today. Two of the more popular examples are HTML (document formatting) and SQL (relational databases).

Using HTML, the programmer is freed from low-level details normally associated with document layout such as font management and text justification; the HTML language processor handles these details (and many others).

With SQL, productivity is improved in that tedium such as opening and closing files and maintaining/using indexes is handled automatically. It’s also worth noting that SQL programs are so small (relative to the same program written in a general purpose language) that it becomes feasible to transmit the program source code from a client machine to the server for remote execution. Doing so allows the program to execute on the machine(s) where the data resides, thereby considerably improving the performance of the program.
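As a small illustration of the SQL point, the following Python sketch uses the standard `sqlite3` module with an in-memory database. The table, its columns, and the rows are hypothetical and exist only to show how little the query itself has to say.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 45.5)],
)
conn.execute("CREATE INDEX idx_customer ON orders (customer)")

# One declarative statement: no file handling and no explicit index lookups.
query = (
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
)
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

The query is a short string, which is what makes it practical to ship the program text to wherever the data lives.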

DSLs have proven themselves as powerful technology and clearly deserve a place in every programmer’s repertoire.
 

3 The Perfect DSL for Every Application

If you take the previous section’s discussion of abstraction to its natural conclusion, it becomes clear that the best DSL in which to develop a given application is one specifically designed for that application domain. But what do you do when this ideal DSL doesn’t already exist (which will typically be the case)? You build it. That’s right, you design and implement a language perfectly suited for the domain.

Before you dismiss this proposal as outrageous, consider the advantages of this approach to application development. In addition to the productivity and reliability benefits we’ve already talked about, your domain-experts (your customers) can examine application code and verify that it is correct, even if they’re not programmers. In fact, they can actually write and maintain significant portions of the application code themselves!

And your customers will save money and time over the long run because of the ease with which you or they can make changes to their applications. Remember that programs always have more than one version. Every application undergoes numerous changes throughout its lifetime, even before the first official version is released. If prior to developing an application, you first develop a DSL for the application’s domain, changes to the application will be much easier to make.

We have successfully used this technique a number of times in our consulting practice.

In one project from the pre-Windows era, the goal was to develop an application that produced correct and efficient overlay scripts for Plink86™ (a popular DOS overlaying linker of that era). Our solution was to implement a DSL which the programmer would use to describe various properties of the application to be overlaid (e.g. the files which comprise the application, the functions which contain hidden calls to other functions, and those functions which should always be resident in memory). The DSL language processor would analyze the application to be overlaid, produce an optimized overlay script, and then invoke the overlay linker, resulting in an overlaid executable that was safe and efficient.

In another project, the setting was a Unix/Oracle environment which hosted the information system software of a large Air Force R&D Laboratory. The goal was to devise a means of standardizing reports generated by the system (e.g. giving them all a common look and layout) and of improving programmer productivity. The solution was a report description DSL whose language processor managed the details of constructing each report, including all database access and formatting. Hundreds of reports were written in this DSL, most by junior programmers, where a single programmer could produce several reports per day.

We’re currently developing a personality testing system, a Windows 95/98 application. The program presents questions to the end-user, collects and tabulates the responses, and then presents the personality assessment. DSL technology is a particularly nice fit for this application, which we’ll later illustrate as we develop a DSL for a simplified example patterned after this system.

4 So what’s the problem? Why isn’t everyone doing it?

Some developers think it’s generally too costly and time-consuming to develop languages, and that it only makes sense when considerable re-use is anticipated. Others are simply unfamiliar with advanced language design and implementation techniques.

Design

Granted, good language design can be challenging. But the same is true of good application design. For some domains it’s difficult; for others, trivial. Although a thorough treatment of language design is beyond the scope of this paper, the key idea behind creating an effective DSL is to identify the variant concepts of the application domain, and those invariant concepts that require extensive discovery and experimentation to clarify. You then define language constructs which directly represent these concepts. Often, these constructs will take the form of parameterizations of some underlying mechanism, such that the resulting DSL has a declarative flavor (as opposed to a procedural one). In our experience, domain experts find declarative languages considerably easier to use than procedural ones.

In Section 6 we show an example of designing a simple DSL.

Implementation

Several strategies are effective at simplifying DSL implementation. One is to use a parser generator and a high-level, high-performance language especially suited for language implementation (such as Visual Prolog --- see the sidebar at the end of this paper). Another strategy is to develop a generic DSL framework and then use this framework for each new DSL. In the next section we present one such framework we’ve recently developed.
 

5 A Framework For Implementing Declarative DSLs

We have developed a DSL framework (DSLX) which simplifies the implementation of declarative DSLs. To accomplish this, the framework requires that every DSL share the same base syntax, one strongly resembling core XML. With this requirement in place, all DSLs can share a common infrastructure: for example, a parser need be implemented only once, and end-user training and documentation are minimized. Additionally, since DSLX syntax is essentially XML, each new DSL is simple to parse and simple to teach (a non-programmer could probably learn how to read it within a few minutes).
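The benefit of a shared base syntax can be sketched in a few lines of Python. This is not DSLX itself: it uses the standard `xml.etree.ElementTree` parser as a stand-in, and the two sample languages (a report DSL and a quiz DSL) and their tag names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_document(source):
    """Parse any XML-like DSL text into (tag, attributes, text, children) tuples."""
    def convert(element):
        text = (element.text or "").strip()
        children = [convert(child) for child in element]
        return (element.tag, dict(element.attrib), text, children)
    return convert(ET.fromstring(source))


# Two unrelated, hypothetical DSLs that share the same base syntax.
report_source = '<report title="Monthly Totals"><column name="customer"/></report>'
quiz_source = '<quiz><question>Do deadlines bother you?</question></quiz>'

for source in (report_source, quiz_source):
    tag, attributes, text, children = parse_document(source)
    print(tag, attributes, [child[0] for child in children])
```

One parsing function serves both languages; only the code that interprets the resulting tree differs from one DSL to the next.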

An application developed with the DSLX framework has four primary components:

A Driver which initiates processing and controls the application. The Driver also provides answers to questions asked by the framework (e.g. in order to display a form to the user, the framework might need to know which window handle should be the parent of the form).
A Document Manager which supports the creation and editing of, and access to, DSLX documents (DSLX programs are also referred to as "documents"). The Document Manager can also create new documents by parsing DSLX source code and can convert documents to textual form.
A Dispatcher which routes each DSLX program to the appropriate Document Processor, and
One or more Document Processors responsible for "executing" (processing) DSLX programs (documents).

Figure 1 is an illustration of the generic DSLX framework and shows communication between the various components.
 


Figure 1: DSLX Framework. The framework is comprised of a Driver, a Document Manager, a Dispatcher, and one or more Document Processors. A) Driver interacts with Document Manager to create a new document, optionally passing DSLX source-text to be parsed. Document Manager returns a DocId to Driver. B) Driver passes the DocId plus the address of a callback function to Dispatcher. The callback function will be used by the framework to obtain additional info and to post statuses. C) Dispatcher forwards the DocId and callback address to the appropriate Document Processor. D) Document Processor interacts with Document Manager to read and/or modify a document or to create a new document. E) Document Processor calls Driver’s callback as necessary. F) Document Processor contacts Dispatcher to obtain the services of other Document Processors.
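The message flow in Figure 1 can be approximated by the following Python sketch. Everything in it is an assumption made for illustration: the class and method names, the choice to route documents by their root tag, and the toy "greeting" language are hypothetical and are not taken from the DSLX implementation.

```python
import itertools
import xml.etree.ElementTree as ET


class DocumentManager:
    """Stores parsed documents and hands out document ids (step A)."""

    def __init__(self):
        self._documents = {}
        self._ids = itertools.count(1)

    def create(self, source_text):
        doc_id = next(self._ids)
        self._documents[doc_id] = ET.fromstring(source_text)
        return doc_id

    def root(self, doc_id):
        return self._documents[doc_id]

    def to_text(self, doc_id):
        return ET.tostring(self._documents[doc_id], encoding="unicode")


class Dispatcher:
    """Routes a document to the processor registered for it (steps B and C)."""

    def __init__(self, manager):
        self._manager = manager
        self._processors = {}

    def register(self, root_tag, processor):
        self._processors[root_tag] = processor

    def dispatch(self, doc_id, callback):
        # Assumption: the root tag identifies the language of the document.
        processor = self._processors[self._manager.root(doc_id).tag]
        return processor(self._manager, self, doc_id, callback)


def greeting_processor(manager, dispatcher, doc_id, callback):
    """Toy Document Processor: reads the document (D), calls the Driver (E)."""
    root = manager.root(doc_id)
    name = callback("who")  # ask the Driver for additional information
    callback("status", f"{root.text.strip()}, {name}!")  # post a status


# The Driver: owns the callback and starts processing.
messages = []


def driver_callback(kind, payload=None):
    if kind == "who":
        return "world"
    messages.append(payload)
    return None


manager = DocumentManager()
dispatcher = Dispatcher(manager)
dispatcher.register("greeting", greeting_processor)
doc_id = manager.create("<greeting>Hello</greeting>")
dispatcher.dispatch(doc_id, driver_callback)
print(messages)
```

The processor receives the dispatcher as well, so it could hand a sub-document to another processor (step F), although this toy example does not do so.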

To develop an application using the framework, you would generally do the following:

  1. Study your application domain and define a DSL within the boundaries of DSLX syntax.
  2. Create a Document Processor for the language.
  3. Register the DSL and Document Processor with the framework.
  4. Create the application’s Driver.
  5. Write a program in the new DSL.

In the following section, we illustrate the use of the DSLX framework as we implement a simple personality testing system.
 

6 Using the DSLX Framework: Creating a Personality Testing System

Suppose that your customer wants to develop an application that assesses the degree to which someone can tolerate stress. The application should ask the user a series of multiple-choice questions, then score the answers and present the assessment. So three obvious variant concepts in this domain are questions, answers, and assessments. And to tie answers to assessments, each answer could have a score associated with it, where each assessment would be keyed by a range of scores (low to high). This analysis leads to a simple DSL, where Figure 2 shows a possible document in this language.
 


Figure 2: Sample DSL Statements of Personality Testing System. Each question has two parts, the text of the question itself (possibly containing special formatting information, e.g. <emphasize>) and each possible answer. Each possible answer has answer text and a score. Each assessment has a low and high range and the text of the assessment (which also may contain special formatting instructions).
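Since the figure is an image, here is a hypothetical document with the shape the caption describes, together with a few lines of Python that read it. The element and attribute names (`test`, `text`, `answer`, `score`, `assessment`, `low`, `high`) and the question wording are guesses made for this sketch; only `question` and `emphasize` are named in the caption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag and attribute names are assumptions.
STRESS_TEST = """
<test>
  <question>
    <text>How often do you feel <emphasize>rushed</emphasize> at work?</text>
    <answer score="0">Rarely</answer>
    <answer score="10">Sometimes</answer>
    <answer score="20">Almost always</answer>
  </question>
  <assessment low="0" high="9">You appear to handle stress well.</assessment>
  <assessment low="10" high="20">You may be <emphasize>under strain</emphasize>.</assessment>
</test>
"""

document = ET.fromstring(STRESS_TEST.strip())

# Each question: its text (formatting tags flattened) and (answer, score) pairs.
questions = []
for question in document.findall("question"):
    text = "".join(question.find("text").itertext())
    answers = [(a.text, int(a.get("score"))) for a in question.findall("answer")]
    questions.append((text, answers))

# Each assessment: the score range it covers.
ranges = [(int(a.get("low")), int(a.get("high")))
          for a in document.findall("assessment")]

print(questions)
print(ranges)
```

Note that the document says nothing about fonts, numbering, or layout; it records only the variant concepts of the domain.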

Now let’s briefly look at how the framework might be used to construct this system (please refer to Figure 1). The Driver would be responsible for the overall user-interface and flow of the application. So it might be coded to load a file containing DSL source-text and then pass this text to the Document Manager. The Document Manager would parse the source-text, create a document (an abstract syntax tree) representing the source-text, and then return the document id (DocId) of the newly created document to the Driver (A of Figure 1).

The Driver could then pass this DocId and a callback function (which responds to messages from the framework) to the Dispatcher for processing. The Dispatcher, in turn, would pass the DocId and callback to the appropriate Document Processor, namely the one we create for processing documents written in our new DSL (B and C of Figure 1).

Our language’s Document Processor would be responsible for three functions: 1) numbering, formatting, and presenting each question to the user, 2) collecting and scoring the answers, and 3) presenting the assessment to the user. To simplify the Document Processor’s construction, it could use an embedded HTML viewer for all I/O with the user.

So to present a question to the user, the Document Processor would first retrieve a question from the document by interacting with the Document Manager via its API (D of Figure 1). It would then write HTML code that represents the formatted question and pass this HTML to the HTML viewer (see Figure 3).
 


Figure 3: Stress Test Document Processor presents questions to user. Compare the contents of the center window with the <question> statements in Figure 2. Note that by having the Document Processor make most formatting decisions, the DSL can be greatly simplified and the high level of abstraction preserved.
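The question-to-HTML step might look roughly like the Python sketch below. It is a simplified stand-in for the real Document Processor: the document is read with `xml.etree.ElementTree` rather than through the Document Manager API, the tag names other than `question` and `emphasize` are assumptions, and the radio-button markup is just one possible formatting choice.

```python
import html
import xml.etree.ElementTree as ET


def inline_html(element):
    """Render an element's text, mapping <emphasize> to <em> and escaping the rest."""
    parts = [html.escape(element.text or "")]
    for child in element:
        inner = inline_html(child)
        parts.append(f"<em>{inner}</em>" if child.tag == "emphasize" else inner)
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


def question_html(number, question):
    """Number and format one question element as an HTML fragment."""
    lines = [f"<p>{number}. {inline_html(question.find('text'))}</p>"]
    for answer in question.findall("answer"):
        label = html.escape(answer.text or "")
        lines.append(
            f'<label><input type="radio" name="q{number}" '
            f'value="{answer.get("score")}"> {label}</label><br>'
        )
    return "\n".join(lines)


# Hypothetical question in the assumed document shape.
question = ET.fromstring(
    "<question><text>Do you feel <emphasize>rushed</emphasize>?</text>"
    '<answer score="0">No</answer><answer score="10">Yes</answer></question>'
)
print(question_html(1, question))
```

Numbering, escaping, and control layout all live in the processor, which is why the DSL document itself can stay so small.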

When the user has finished answering all of the questions, the Document Processor would score the test and then search the document (again via interacting with the Document Manager) for the assessment that matches the score. The text associated with the assessment would be retrieved, converted to HTML, and presented to the user via the HTML viewer (see Figure 4).
 


Figure 4: Stress Test Document Processor displays assessment to user. After totaling the user’s score (60 in this case), the Document Processor searched the possible assessments for the one in the proper range (refer to Figure 2). It then formatted and displayed the assessment.
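Scoring and assessment lookup can be sketched as follows. Again this is an illustration under assumptions: the tag and attribute names, the sample scores, and the assessment wording are hypothetical, and the document is accessed directly instead of through the Document Manager.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-question test in the assumed document shape.
DOCUMENT = ET.fromstring(
    "<test>"
    '<question><answer score="0">No</answer><answer score="30">Yes</answer></question>'
    '<question><answer score="0">No</answer><answer score="30">Yes</answer></question>'
    '<assessment low="0" high="29">Low stress.</assessment>'
    '<assessment low="30" high="60">High stress.</assessment>'
    "</test>"
)


def score_test(document, chosen):
    """Sum the scores; chosen[i] is the index of the answer picked for question i."""
    total = 0
    for question, choice in zip(document.findall("question"), chosen):
        total += int(question.findall("answer")[choice].get("score"))
    return total


def find_assessment(document, total):
    """Return the text of the assessment whose low..high range contains total."""
    for assessment in document.findall("assessment"):
        if int(assessment.get("low")) <= total <= int(assessment.get("high")):
            return "".join(assessment.itertext()).strip()
    return None  # no range matched; a real processor would report this


total = score_test(DOCUMENT, [1, 1])
print(total, find_assessment(DOCUMENT, total))
```

Because the ranges are data in the document, a domain expert can retune the assessments without touching the processor.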

 

7 Conclusion

All programmers use programming languages. The best programmers maximize their productivity as well as the reliability and maintainability of their applications by choosing the right language for each job. The best language will always be a DSL specifically designed for the programming task at hand. But unfortunately, the ideal DSL doesn’t always exist. When it doesn’t, we believe that developers should consider creating it, right then and there.

You know what your customers want. They want projects completed on time and within budget. But they also want reliability, maintainability, and extensibility. You can give them these things by taking a DSL approach to application development. It’s a different way of thinking about software engineering, but we believe it’s the best way. Give DSLs a try in your next project, and you may never build an application with purely conventional languages again!
 


 

Sidebar: Language Implementation in Visual Prolog 

If you’re trying to implement a language processor using tools that aren’t right for the job, you can expect a frustrating and time-consuming experience (anyone who’s hand-coded a substantial parser in C will testify to this). But by using tools suited to language implementation, you’ll likely find this a straightforward and even enjoyable activity.

For example, using Visual Prolog and its parser generator tool, implementing a language processor would require four basic steps:

  1. Create a Backus-Naur Form (BNF) representation of the language’s grammar.
  2. Convert the BNF into the syntax of the parser generator, which is simply an extended form of BNF (note that the parser generator input language is itself a DSL).
  3. Execute the parser generator to produce the actual parser.
  4. Add the surrounding application code, including that which will "execute" the term produced by the parser.

As a simplified example of this process, suppose you were developing a language that could accept statements like

print 12 + 7.3 * 5

where the semantics (meaning) of such statements would be to evaluate the mathematical expression and then display the result. The BNF for this simple language might be

<stmt> ::= print <expr>
<expr> ::= <expr> + <expr>
         | <expr> - <expr>
         | <expr> * <expr>
         | <expr> / <expr>
         | <number>

which would allow for any expression involving the addition, subtraction, multiplication, and division of numeric literals (note that <number> is not defined and is assumed to represent any numeric literal). Converting this BNF into parser generator input would result in

productions
   STMT = print EXPR      -> print(EXPR)

   EXPR = EXPR plus EXPR  -> add(EXPR,EXPR),
          EXPR minus EXPR -> subtract(EXPR,EXPR)
          --
          EXPR mult EXPR  -> multiply(EXPR,EXPR),
          EXPR div EXPR   -> divide(EXPR,EXPR)
          --
          number(REAL)    -> number(REAL)

which specifies not only the legal statements of the language and how precedence should be handled (i.e. that multiplication and division should be performed before addition and subtraction), but also the Prolog terms that would be created when the various patterns are encountered in the source text.

For example, given the above parser generator input, if the source text was

print 12 + 7.3 * 5

then the term

print(add(number(12.0),multiply(number(7.3),number(5.0))))

would be returned by the generated parser (for those of you who are unfamiliar with Prolog, the term is displayed here using the syntax of a Prolog literal). As another example, the term returned for the source text

print 13.7

would be simply

print(number(13.7))
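For readers without access to the parser generator, the same mapping from source text to terms can be imitated with a small hand-written parser. The Python sketch below is a substitute technique (recursive descent with one function call per precedence level), not the output of the Visual Prolog parser generator, and it represents Prolog terms as nested tuples; all names in it are hypothetical.

```python
import re

# Tokens: the print keyword, unsigned numbers, and the four operators.
TOKEN = re.compile(r"\s*(print\b|\d+(?:\.\d+)?|[-+*/])")

# Precedence levels, lowest first, mirroring the groups separated by "--".
LEVELS = [
    {"+": "add", "-": "subtract"},
    {"*": "multiply", "/": "divide"},
]


def tokenize(text):
    """Split source text into a list of token strings."""
    tokens, position = [], 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise SyntaxError(f"unexpected input at column {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def parse_expr(tokens, level=0):
    """Parse one precedence level; operators associate to the left."""
    if level == len(LEVELS):
        if not tokens or not tokens[0][0].isdigit():
            raise SyntaxError("expected a number")
        return ("number", float(tokens.pop(0)))
    left = parse_expr(tokens, level + 1)
    while tokens and tokens[0] in LEVELS[level]:
        functor = LEVELS[level][tokens.pop(0)]
        left = (functor, left, parse_expr(tokens, level + 1))
    return left


def parse_stmt(text):
    """Parse 'print <expr>' into a ('print', expression) term."""
    tokens = tokenize(text)
    if not tokens or tokens.pop(0) != "print":
        raise SyntaxError("expected 'print'")
    term = ("print", parse_expr(tokens))
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return term


print(parse_stmt("print 12 + 7.3 * 5"))
print(parse_stmt("print 13.7"))
```

The hand-written version is several times longer than the grammar it implements, which is the argument for using a parser generator in the first place.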

Once the parser has finished, and your language processor now has a term that represents the source text, the language processor would execute the term. For our sample language, the following Prolog predicates could serve this purpose:

% Execute a statement: evaluate its expression and display the result.
execute_stmt(print(Expr)) :-
   evaluate_expr(Expr, Result),
   write("The result of the expression is ", Result).

% Binary operators: evaluate both operands, then combine the results.
evaluate_expr(add(Expr1, Expr2), Result) :-
   evaluate_expr(Expr1, Result1),
   evaluate_expr(Expr2, Result2),
   Result = Result1 + Result2.

evaluate_expr(subtract(Expr1, Expr2), Result) :-
   evaluate_expr(Expr1, Result1),
   evaluate_expr(Expr2, Result2),
   Result = Result1 - Result2.

evaluate_expr(multiply(Expr1, Expr2), Result) :-
   evaluate_expr(Expr1, Result1),
   evaluate_expr(Expr2, Result2),
   Result = Result1 * Result2.

evaluate_expr(divide(Expr1, Expr2), Result) :-
   evaluate_expr(Expr1, Result1),
   evaluate_expr(Expr2, Result2),
   Result = Result1 / Result2.

% Base case: a numeric literal evaluates to itself.
evaluate_expr(number(Number), Number).

You can hopefully see from this example that the code which recursively walks this term (which represents the semantics of the parsed input language statement) could easily be undertaking the tasks of any particular domain, whether as an interpreter, as illustrated here, or as a pre-processor emitting code statements in some more generic language.
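The pre-processor alternative can be sketched in the same spirit. In the Python example below, terms are written as nested tuples standing in for Prolog terms, and the walker emits a line of Python source instead of computing a value; the function names and the choice of Python as the target language are assumptions made for the sketch.

```python
# Operator symbol emitted for each functor of the term.
SYMBOLS = {"add": "+", "subtract": "-", "multiply": "*", "divide": "/"}


def emit_expr(term):
    """Translate an expression term into fully parenthesized source text."""
    functor = term[0]
    if functor == "number":
        return repr(term[1])
    left, right = term[1], term[2]
    return f"({emit_expr(left)} {SYMBOLS[functor]} {emit_expr(right)})"


def emit_stmt(term):
    """Translate a ('print', expression) term into one target-language statement."""
    functor, expr = term
    if functor != "print":
        raise ValueError(f"unknown statement: {functor}")
    return f"print({emit_expr(expr)})"


# The term for: print 12 + 7.3 * 5
term = ("print",
        ("add", ("number", 12.0),
                ("multiply", ("number", 7.3), ("number", 5.0))))
print(emit_stmt(term))
```

The walk has the same recursive shape as the interpreter; only the action taken at each node changes, from computing a value to producing text.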

 

For more information on DSL technology, see www.pdcAtlanta.com/html/dsls.htm

Paul Morrow (paul@pdcAtlanta.com) and Michael Alexander (michael@pdcAtlanta.com) are software consultants and the directors of the Prolog Development Center / Atlanta.

A version of this paper appears in the Jan/Feb '99 issue of PCAI Magazine.

Copyright © 1998, Prolog Development Center / Atlanta