Building a Simple and Performant DSL in Scala

Scala provides different language constructs by which a wide variety of DSLs can be made. Implicits, operators as name of functions, macros, higher order functions etc. are examples of those constructs. However, building a DSL is not a trivial task especially when performance matters. In this blog post, I am going to talk about a simple DSL that I built in Scala to convert objects to a sequence of bytes. As a use case, the resulting byte array can be used to create a hash (e.g. MD5 or SHA) for a piece of data. The DSL’s aim is to be concise and efficient.
Here is a Java way to create a byte array out of a sequence of different values:

final ByteArrayOutputStream baos = new ByteArrayOutputStream();
final DataOutputStream dos = new DataOutputStream(baos);



final byte[] result = baos.toByteArray();

By running this code, result will contain 10 bytes, out of which 5 bytes to represent str, 4 bytes for i and one single byte for b. In standard UTF-8 encoding, str will encode to 3 bytes but writeUTF creates modified UTF-8 where it starts with preceding two bytes stating the length of the byte sequence. As it is Java, the code has lots of boilerplate but it is quite performant. Obviously, if we come up with a concise DSL to eliminate the boilerplates while preserving the performance, we will achieve our goal. So let’s define the desired DSL syntax first.

Usually APIs become easier to use if they provide meaningful method chaining. Moreover, Scala lets us use operators as the name of functions. This means that we can define DSLs which support operation chaining. I would like to chain ~ operator to create an array of byte representing all the values that formed the chain. Therefore, the above code can be concisely implemented as follows:

val result: Array[Byte] = "DSL" ~ 32 ~ true

To preserve performance, we can keep using DataOutputStream underneath. Thus, we need to create its instance together with an instance of ByteArrayOutputStream as its underlying output stream in the beginning of the flow. Obviously, the DataOutputStream instance should be closed and the bytes should be retrieved from the ByteArrayOutputStream instance at the end of the flow. For simplicity, I add unary operation ~| to our DSLs to show the end of the flow:

val result = "DSL" ~ 32 ~ true ~|

Now we have to find out how it is possible to build this DSL. First, let’s try to break our DSL into small pieces. Here are couple of observations:

  1. Since we want to be able to invoke ~ operator on any object, we have to extend all types to support ~.
  2. At the beginning of the chain, ~ is an operation defined on String and accepts instances of any type as its argument. As mentioned in the previous point, this operator is not just defined on String but on all types. In our example, the first ~ invocation can be desugared to "DSL".~(32).
  3. This operation produces an instance of a type which:
    • Supports ~ operation which accepts instances of any type as its argument. In our example, it accepts true.
    • Carries and mutates DataOutputStream and ByteArrayOutputStream created at the beginning of the flow.
    • Supports operation ~| which closes DataOutputStream instance and retrieves bytes from ByteArrayOutputStream instance.

As seen, I differentiated between the first ~ and the rest. The first one is defined on all types and produces an instance of a special type (let’s say Container). However, other ~ operators, are defined on Container and produce Container.

First, I start with defining Container. As mentioned before, Container is responsible for carrying and mutating DataOutputStream and ByteArrayOutputStream. Moreover, it should support ~ and ~|. The implementation of ~ will come later. The complete source code can be found here.

case class Container(dos: DataOutputStream, baos: ByteArrayOutputStream) {
    // TODO: def ~

    def ~| : Array[Byte] = {

object Container {
    def apply() = {
        val baos = new ByteArrayOutputStream()
        val dos = new DataOutputStream(baos)
        new Container(dos, baos)

Now, we need to provide a mechanism to abstract writing different values into DataOutputStream. Different DataOutputStream methods should be invoked to write different data types into the underlying output stream (e.g. writeInt, writeBoolean etc.). Scala type classes are a perfect way to abstract these differences while keeping extensibility. Take a look at the following code snippet:

trait ByteSequenceRepr[A] {
    def writer: (Container, A) => Unit

    def toByteSequence(arg: A): Container = {
        val baos = new ByteArrayOutputStream()
        val dos = new DataOutputStream(baos)
        toByteSequence(Container(dos, baos), arg)

    def toByteSequence(container: Container, arg: A): Container = {
        writer(container, arg)

ByteSequenceRepr is the actual type class which accepts a type parameter. It provides a writer higher order function which yields a function to write instances of type A into DataOutputStream embedded in the Container instance. Additionally, our type class has two other functions which actually mutate (or create and mutate) the Container instance using available writer. Later, I will explain how this type class is going to help us.

Now, let’s implement the missing ~ operator on Container case class:


def ~[A](arg: A)(implicit bsr: ByteSequenceRepr[A]) = bsr.toByteSequence(this, arg)

~ accepts an argument of type A and it needs an instance of ByteSequenceRepr of type A available in the scope. The logic is quite simple; using the available ByteSeqeunceRepr, the given instance of A is written into DataOutputStream carried by Container instance. Of course, you may rewrite this function using context bound and implicitly mechanism.
To make the life of DSL’s users even easier, we can provide default ByteSequenceRepr implicits for primitive data types. I created LowPriorityDefaultByteSequenceReprImplicits trait and put different ByteSequenceRepr there (the complete version is here):

trait LowPriorityDefaultByteSequenceReprImplicits {
    implicit val intToByteSequence: ByteSequenceRepr[Int] =
    new ByteSequenceRepr[Int] {
        val writer: (Container, Int) => Unit = _.dos.writeInt(_)

    implicit val stringToByteSequence: ByteSequenceRepr[String] =
    new ByteSequenceRepr[String] {
        val writer: (Container, String) => Unit = _.dos.writeChars(_)

    implicit val booleanToByteSequence: ByteSequenceRepr[Boolean] =
    new ByteSequenceRepr[Boolean] {
        val writer: (Container, Boolean) => Unit = _.dos.writeBoolean(_)

object ByteSequenceRepr extends LowPriorityDefaultByteSequenceReprImplicits

Implicits have been defined in ByteSequenceRepr companion object. In this way, the default implicits will have the lowest priority when they are being looked up. Thus, for example, if the user wants to provide a new implementation for string ByteSequenceRepr, she/he just needs to add a new implicit in the scope and that one will have the higher priority than the default one. Here, you can find a detailed explanation of implicits finding rules.

As mentioned before, in addition to Container which supports ~ and ~| operator, all data types should also support them. Implicit function and implicit class are two mechanisms in Scala to extend the existing APIs without introducing new inherited data types. It is known as “Pimp my library” pattern:

object Implicits {
    implicit class WithTilde[A](val left: A) extends AnyVal {
        def ~[B](right: B)(implicit seqA: ByteSequenceRepr[A], seqB: ByteSequenceRepr[B]): Container = {
            val container = seqA.toByteSequence(left)
            seqB.toByteSequence(container, right)

        def ~|(implicit seqA: ByteSequenceRepr[A]): Array[Byte] = seqA.toByteSequence(left).~|

Let’s have another look to our example:

val result = "DSL" ~ 32 ~ true ~|

The first invocation of ~ is on "DSL" and the argument of this call is 32. Therefore, first, the Container instance should be created having "DSL" written into its DataOutputStream field. Then it should be mutated by writing 32 in it. This happens by calling two different overloads of toByteSequence on available ByteSequenceRepr instances in WithTilde ~ implementation. ~ yields a Container instance, so all the subsequent ~ invocations (i.e. ~ true) will be done on that resulting container. WithTilde implicit class extends AnyVal so a new instance of WithTilde class is not created every time that the conversion needed. Moreover Container instance is mutated and passed through instead of being re-created every time so the heap size is not increased drastically.

Actually, we are done. By importing Implicits._, we can benefit from our concise DSL. We can extend it easily for any composite data type as well. You just need to provide a ByteSequenceRepr instance of that type in the scope:

case class Point(x: Int, y: Int)

implicit val pointToByteSequence: ByteSequenceRepr[Point] =
new ByteSequenceRepr[Point] {
    val writer: (Container, Point) => Unit = { (container, point) =>

val result = Point(12, 15) ~ "Hello" ~ 127 ~ 123L ~ true ~|

Our DSL has a very small memory usage overhead comparing to the pure java approach due to intermediate Container objects that being created. However, since for each chain, we instantiate Container just once, it is negligible.

We can go even further by using shapeless to define a general ByteSequenceRepr to convert any case class to a sequence of bytes. The basic idea is that shapeless provides HList data type to model heterogenous lists. Additionally, shapeless supports a mechanism to implicitly convert any case class to a HList. So, if you provide an implicit ByteSequenceRepr for HList, you can use it to implicitly convert any case class to an array of bytes. By having that, you do not need to provide an implicit ByteSequenceRepr for Point case class in the above example. Of course, it is not without cost and because of those intermediate implicit conversions, the memory consumption will be increased. The complete source code of using shapeless can be also found here. I am not going more into details of shapeless but you may get the basic idea by reading this.


Although the DSL is concise and performant, it is not fold friendly. If there is a list containing different objects and the goal is to create a single byte array out of all of them, we have to do it as follows:

val list = List("DSL", 32, true)

val result = list.foldLeft(Container())(_ ~ _) ~|

So it leaks some underlying structures (Container) which is not ideal.


Building a custom DSL should be done carefully especially from the memory management point of view. If you are using different internal DSLs in your server side application, under the load, the sum of the memory usage of them may negatively impact the server performance. Moreover, although we need to keep immutability, sometimes to improve performance, we may use a mutable data structure underneath but that should be encapsulated properly.