Transforming legacy C code into EM

CoreMark® has emerged as the premier industry benchmark for measuring CPU performance within embedded systems. With the benchmark managed through EEMBC, virtually every MCU vendor has certified and published CoreMark scores for a broad portfolio of their processors. Running the benchmark code also serves as a "typical workload" used when characterizing the active power consumption [ μW / MHz ] of a particular MCU.

The workload introduced by CoreMark encompasses four algorithms reflecting the variety of software functions often implemented within embedded application programs:


list processing	find and remove elements, generalized sorting
matrix manipulation	add and multiply by a scalar, vector, or matrix
state machine	scan a string for a variety of numeric formats
cyclic redundancy check	checksum over a sequence of 16 / 32-bit values

Besides adding to the workload, CoreMark uses the cyclic redundancy check algorithm to validate the final results of running the benchmark program – comparing a checksum over the list elements used in the list processing algorithm against an expected value. CoreMark also checksums the matrix data produced by the matrix manipulation algorithm as well as the state machine transitions encountered by the state machine algorithm.

You'll find the CoreMark sources on GitHub, together with instructions for building / running the benchmark program. To ensure the integrity of the benchmark, you cannot modify any of its (portable) C source files – with the exception of core_portme.[ch], used to adapt CoreMark to a particular hardware platform.

Needless to say, your choice of C compiler along with specific options for controlling program optimization remain on the table. While primarily intended for comparing different MCUs, CoreMark also provides a known codebase useful for "apples-to-apples" comparisons between different compilers [GCC, IAR, Keil, LLVM] targeting the same MCU.

CoreMark – a "typical" C program in more ways than one

We sense that very few software practitioners have actually studied the CoreMark source files themselves. As long as "someone else" can actually port / build / run the benchmark on the MCU of interest, good enough !!

In our humble opinion, the CoreMark sources would not serve as the best textbook example of well-crafted C code: insufficient separation of concerns, excessive coupling among compilation units, plus other deficiencies.

Said another way, CoreMark typifies the design / implementation of much of the legacy embedded C code we've encountered for decades within industry and academia alike. But therein lies an opportunity to showcase EM.

CoreMark ⇒ EM•Mark

In reality, none of the official CoreMark sources (written in C) will survive their transformation into EM•Mark – a new codebase (re-)written entirely in EM. At the same time, applying the same CoreMark algorithms to the same input data must yield the same results in EM.

The input data used by EM•Mark (like CoreMark) ultimately derives from a handful of seed variables, statically-initialized with prescribed values. Both EM and C declare these seed variables volatile: the integrity of the benchmark requires that the underlying compiler cannot know their initial values, which would otherwise invite overly-aggressive code optimizations.

At the same time, the CoreMark sources do make use of C preprocessor #define directives to efficiently propagate constants and small (inline) functions during compilation. EM•Mark not only achieves the same effect automatically via whole-program optimization, but also leverages the full power of EM meta-programming to initialize internal data structures at build-time – resulting in a far more compact program image at run-time.

If necessary, review the material on program configuration and compilation to fully appreciate the opportunities that EM affords for build-time optimization.

High-level design

The EM•Mark sources (found in the em.coremark package within the em.bench bundle) consist of ten EM modules and two EM interfaces, organized as follows:

The ActiveRunnerP and SleepyRunnerP programs on top of this hierarchy both execute the same core benchmark algorithms, albeit in two very different contexts:

ActiveRunnerP performs multiple benchmark iterations, much like the legacy CoreMark program

SleepyRunnerP performs a single benchmark iteration, awakening every second from deep-sleep

The CoreBench module (imported by both of these programs) coordinates both configuration as well as execution of the list processing, matrix manipulation, and state machine algorithms; we'll have more to say about its implementation in a little while.

To capture behavioral commonality between CoreBench and the algorithm modules it uses internally [ ListBench, MatrixBench, StateBench ], our EM•Mark design introduces the abstract em.coremark/BenchAlgI interface:

em.coremark/BenchAlgI.em
package em.coremark

import Utils

interface BenchAlgI

    config memSize: uint16

    function dump()
    function kind(): Utils.Kind
    function print()
    function run(arg: uarg_t = 0): Utils.sum_t
    function setup()

end

Of the handful of functions specified by this interface, two of these play a central role in the implementation of each benchmark algorithm:

BenchAlgI.setup, which initializes the algorithm's input data using volatile seed variables

BenchAlgI.run, which executes one pass of the benchmark algorithm and returns a CRC value

Taking a quick peek inside CoreBench, you'll notice how this module's implementation of the BenchAlgI interface simply delegates to the other algorithm modules – which in turn implement the same interface:

em.coremark/CoreBench.em [exc]
def em$construct()
    Utils.bindSeedH(1, 0x0)
    Utils.bindSeedH(2, 0x0)
    Utils.bindSeedH(3, 0x66)
end

def dump()
    ListBench.dump()
    MatrixBench.dump()
    StateBench.dump()
end

def kind()
    return Utils.Kind.FINAL
end

def print()
    ListBench.print()
    MatrixBench.print()
    StateBench.print()
end

def run(arg)
    auto crc = ListBench.run(1)
    Utils.setCrc(Utils.Kind.FINAL, Crc.add16(<int16>crc, Utils.getCrc(Utils.Kind.FINAL)))
    crc = ListBench.run(-1)
    Utils.setCrc(Utils.Kind.FINAL, Crc.add16(<int16>crc, Utils.getCrc(Utils.Kind.FINAL)))
    Utils.bindCrc(Utils.Kind.LIST, Utils.getCrc(Utils.Kind.FINAL))
    return Utils.getCrc(Utils.Kind.FINAL)
end

def setup()
    ListBench.setup()
    MatrixBench.setup()
    StateBench.setup()
end

CoreBench also uses public get / set functions provided by the Utils module to fetch / store designated CRC and seed values.
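The delegation pattern seen in CoreBench can also be modeled outside EM. The Python sketch below is only an illustration: the names BenchAlg, StubAlg, and Composite are hypothetical stand-ins (not part of EM•Mark), and the stub records calls rather than running a real algorithm.

```python
from abc import ABC, abstractmethod


class BenchAlg(ABC):
    """Hypothetical Python stand-in for the BenchAlgI interface."""

    @abstractmethod
    def setup(self) -> None: ...

    @abstractmethod
    def run(self, arg: int = 0) -> int: ...


class StubAlg(BenchAlg):
    """Records calls instead of running a real benchmark algorithm."""

    def __init__(self, name: str, log: list):
        self.name, self.log = name, log

    def setup(self) -> None:
        self.log.append(f"{self.name}.setup")

    def run(self, arg: int = 0) -> int:
        self.log.append(f"{self.name}.run({arg})")
        return 0


class Composite(BenchAlg):
    """Implements the same interface by delegating to its children."""

    def __init__(self, children: list):
        self.children = children

    def setup(self) -> None:
        for child in self.children:
            child.setup()

    def run(self, arg: int = 0) -> int:
        # Like CoreBench.run, drive only the first child: once with +1, once with -1.
        first = self.children[0]
        first.run(1)
        return first.run(-1)


log = []
core = Composite([StubAlg(name, log) for name in ("list", "matrix", "state")])
core.setup()
core.run()
```

Because the composite satisfies the same interface as its children, a caller such as a runner program never needs to know whether it holds one algorithm or a coordinator of several.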

more code ahead – feel free to scroll down to the Summary

Each of the benchmark algorithms will call the Crc.add16 or Crc.addU32 functions to fold a new data value into a particular checksum. Looking at the implementation of the Crc module, both of these function definitions ultimately call Crc.update – a private function that effectively mimics the crcu8 routine found in the legacy CoreMark source code:

core_util.c
ee_u16
crcu8(ee_u8 data, ee_u16 crc)
{
    ee_u8 i = 0, x16 = 0, carry = 0;

    for (i = 0; i < 8; i++)
    {
        x16 = (ee_u8)((data & 1) ^ ((ee_u8)crc & 1));
        data >>= 1;

        if (x16 == 1)
        {
            crc ^= 0x4002;
            carry = 1;
        }
        else
            carry = 0;
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}
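To make the bit manipulation in crcu8 easier to follow, here is a Python model of the same routine next to a compact form that shifts first and then XORs with 0xA001 (the reflected CRC-16 polynomial). The helpers add16 and add_u32 are assumptions: they fold bytes low-first in the manner of the legacy crcu16 / crcu32 routines, and may not match the actual Crc module in every detail.

```python
def crcu8(data: int, crc: int) -> int:
    """Bit-for-bit port of the legacy routine: fold one byte into a 16-bit CRC."""
    for _ in range(8):
        x16 = (data & 1) ^ (crc & 1)
        data >>= 1
        if x16:
            crc ^= 0x4002
        crc >>= 1
        if x16:
            crc |= 0x8000
        else:
            crc &= 0x7FFF
    return crc


def crcu8_compact(data: int, crc: int) -> int:
    """Equivalent form: shift right, then XOR with 0xA001 if the feedback bit was set."""
    for _ in range(8):
        feedback = (data ^ crc) & 1
        data >>= 1
        crc >>= 1
        if feedback:
            crc ^= 0xA001
    return crc


def add16(value: int, crc: int) -> int:
    """Fold a 16-bit value, low byte first (assumed to mirror legacy crcu16)."""
    crc = crcu8(value & 0xFF, crc)
    return crcu8((value >> 8) & 0xFF, crc)


def add_u32(value: int, crc: int) -> int:
    """Fold a 32-bit value as two 16-bit halves, low half first (assumed)."""
    crc = add16(value & 0xFFFF, crc)
    return add16((value >> 16) & 0xFFFF, crc)
```

The two byte-level functions agree because XORing with 0x4002 before the shift and then setting bit 15 amounts to XORing with 0x2001 and 0x8000 after the shift, which together give 0xA001.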

Finally, CoreBench defines a pair of config params [ TOTAL_DATA_SIZE, NUM_ALGS ] used to bind the BenchAlgI.memSize parameter associated with the other algorithms; refer to CoreBench.em$configure (shown later under Benchmark configuration) for further details. Initialized to values tracking the legacy CoreMark code, CoreBench assigns ⌊2000/3⌋ = 666 bytes per algorithm.

We'll have more to say about CoreBench.em$configure after we explore the three benchmark algorithms in more detail.

Matrix manipulation

Pivoting to the simplest of the three benchmark algorithms administered by CoreBench, the MatrixBench module implements each (public) function specified by the BenchAlgI interface. Most of the private functions defined inside the module [ addVal, mulVec, clip, etc. ] correspond to legacy C functions / macros found in core_matrix.c.

Internally, MatrixBench operates upon three matrices [ matA, matB, matC ] dimensioned at build-time by the module's em$construct function – which uses the BenchAlgI.memSize parameter (bound previously in CoreBench.em$configure) when calculating a value for dimN:

em.coremark/MatrixBench.em [exc]
module MatrixBench: BenchAlgI

private:

    type matdat_t: int16
    type matres_t: int32

    config dimN: uint8

    var matA: matdat_t[]
    var matB: matdat_t[]
    var matC: matres_t[]

em.coremark/MatrixBench.em [exc]
def em$construct()
    auto i = 0
    auto j = 0
    while j < memSize
        i += 1
        j = i * i * 2 * 4
    end
    dimN = i - 1
    matA.length = matB.length = matC.length = dimN * dimN
end
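As a worked example of the sizing loop above, the following Python sketch repeats the same arithmetic for the memSize value that CoreBench binds. The function name matrix_dim is hypothetical, and the storage check assumes the element widths shown in the private declarations (int16 for matA / matB, int32 for matC).

```python
def matrix_dim(mem_size: int) -> int:
    """Largest N for which N * N * 2 * 4 stays strictly below mem_size."""
    i = 0
    j = 0
    while j < mem_size:
        i += 1
        j = i * i * 2 * 4
    return i - 1


mem_size = 2000 // 3           # 666 bytes per algorithm
dim_n = matrix_dim(mem_size)   # 9, since 9 * 9 * 8 = 648 but 10 * 10 * 8 = 800

# Storage for the three matrices: two int16 inputs plus one int32 result.
storage = dim_n * dim_n * (2 + 2 + 4)   # 648 bytes
```

With the default configuration the three 9 × 9 matrices therefore occupy 648 of the 666 bytes allotted to MatrixBench.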

The MatrixBench.setup function initializes "input" matrices [ matA, matB ] at run-time, using values derived from two of the volatile seed variables prescribed by legacy CoreMark:

em.coremark/MatrixBench.em [exc]
def setup()
    auto s32 = <uint32>Utils.getSeed(1) | (<uint32>Utils.getSeed(2) << 16)
    auto sd = <matdat_t>s32
    sd = 1 if sd == 0
    auto order = <matdat_t>1
    for auto i = 0; i < dimN; i++
        for auto j = 0; j < dimN; j++
            sd = <int16>((order * sd) % 65536)
            auto val = <matdat_t>(sd + order)
            val = clip(val, false)
            matB[i * dimN + j] = val
            val += order
            val = clip(val, true)
            matA[i * dimN + j] = val
            order += 1
        end
    end
end

MatrixBench.run finally executes the benchmark algorithm itself – calling a sequence of private matrix manipulation functions and then returning a checksum that captures intermediate results of these operations:

em.coremark/MatrixBench.em [exc]
def run(arg)
    auto crc = <Crc.sum_t>0
    auto val = <matdat_t>arg
    auto clipval = enlarge(val)
    #
    addVal(val)
    mulVal(val)
    crc = Crc.add16(sumDat(clipval), crc)
    #
    mulVec()
    crc = Crc.add16(sumDat(clipval), crc)
    #
    mulMat()
    crc = Crc.add16(sumDat(clipval), crc)
    #
    mulMatBix()
    crc = Crc.add16(sumDat(clipval), crc)
    #
    addVal(-val)
    return Crc.add16(<int16>crc, Utils.getCrc(Utils.Kind.FINAL))
end

Once again, the [EM] implementations of private functions like addVal and mulMat track their [C] counterparts found in the CoreMark core_matrix.c source file.

State machine

The StateBench module – which also conforms to the BenchAlgI interface – scans an internal array [ memBuf ] for text matching a variety of numeric formats. Similar to what we've seen in MatrixBench, the build-time em$construct function sizes memBuf as well as initializes some private config parameters used as run-time constants:

em.coremark/StateBench.em [exc]
    config intPat: string[4] = [
        "5012", "1234", "-874", "+122"
    ]
    config fltPat: string[4] = [
        "35.54400", ".1234500", "-110.700", "+0.64400"
    ]
    config sciPat: string[4] = [
        "5.500e+3", "-.123e-2", "-87e+832", "+0.6e-12"
    ]
    config errPat: string[4] = [
        "T0.3e-1F", "-T.T++Tq", "1T3.4e4z", "34.0e-T^"
    ]

    config intPatLen: uint16
    config fltPatLen: uint16
    config sciPatLen: uint16
    config errPatLen: uint16

    var memBuf: char[]

em.coremark/StateBench.em [exc]
def em$construct()
    memBuf.length = memSize
    intPatLen = intPat[0].length
    fltPatLen = fltPat[0].length
    sciPatLen = sciPat[0].length
    errPatLen = errPat[0].length
end

The StateBench.setup function uses the xxxPat and xxxPatLen config parameters in combination with a local seed variable to initialize the memBuf characters at run-time:

em.coremark/StateBench.em [exc]
def setup()
    auto seed = Utils.getSeed(1)
    auto p = &memBuf[0]
    auto total = 0
    auto pat = ""
    auto plen = 0
    while (total + plen + 1) < (memSize - 1)
        if plen
            for auto i = 0; i < plen; i++
                *p++ = pat[i]
            end
            *p++ = ','
            total += plen + 1
        end
        switch ++seed & 0x7
        case 0
        case 1
        case 2
            pat  = intPat[(seed >> 3) & 0x3]
            plen = intPatLen
            break
        case 3
        case 4
            pat  = fltPat[(seed >> 3) & 0x3]
            plen = fltPatLen
            break
        case 5
        case 6
            pat  = sciPat[(seed >> 3) & 0x3]
            plen = sciPatLen
            break
        case 7
            pat  = errPat[(seed >> 3) & 0x3]
            plen = errPatLen
            break
        end
    end
end
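The fill loop above can be modeled in a few lines of Python. This is a sketch under stated assumptions: build_buffer is a hypothetical name, the result is returned as a string rather than written through a pointer, and integer overflow of the seed is ignored because only a few dozen increments occur.

```python
INT_PAT = ["5012", "1234", "-874", "+122"]
FLT_PAT = ["35.54400", ".1234500", "-110.700", "+0.64400"]
SCI_PAT = ["5.500e+3", "-.123e-2", "-87e+832", "+0.6e-12"]
ERR_PAT = ["T0.3e-1F", "-T.T++Tq", "1T3.4e4z", "34.0e-T^"]


def build_buffer(seed: int, mem_size: int) -> str:
    """Emit comma-terminated patterns until the next one would no longer fit."""
    out = []
    total = 0
    pat = ""
    while total + len(pat) + 1 < mem_size - 1:
        if pat:
            out.append(pat + ",")
            total += len(pat) + 1
        seed += 1                      # mirrors the pre-increment in the switch
        kind = seed & 0x7              # low 3 bits pick the pattern family
        index = (seed >> 3) & 0x3      # next 2 bits pick one of four patterns
        if kind <= 2:
            pat = INT_PAT[index]
        elif kind <= 4:
            pat = FLT_PAT[index]
        elif kind <= 6:
            pat = SCI_PAT[index]
        else:
            pat = ERR_PAT[index]
    return "".join(out)


text = build_buffer(seed=0, mem_size=666)
```

Three of every eight seed values select an integer pattern, two select a float, two select scientific notation, and one selects a deliberately malformed string, so the scanner meets valid and invalid inputs in a fixed ratio.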

Details aside, StateBench.run calls a private scan function which in turn drives the algorithm's state machine; run also calls a private scramble function to "corrupt" memBuf contents ahead of the next scanning cycle:

em.coremark/StateBench.em [exc]
def run(arg)
    arg = 0x22 if arg < 0x22
    var finalCnt: uint32[NUM_STATES]
    var transCnt: uint32[NUM_STATES]
    for auto i = 0; i < NUM_STATES; i++
        finalCnt[i] = transCnt[i] = 0
    end
    scan(finalCnt, transCnt)
    scramble(Utils.getSeed(1), arg)
    scan(finalCnt, transCnt)
    scramble(Utils.getSeed(2), arg)
    auto crc = Utils.getCrc(Utils.Kind.FINAL)
    for auto i = 0; i < NUM_STATES; i++
        crc = Crc.addU32(finalCnt[i], crc)
        crc = Crc.addU32(transCnt[i], crc)
    end
    return crc
end

def scan(finalCnt, transCnt)
    for auto str = &memBuf[0]; *str;
        auto state = nextState(&str, transCnt)
        finalCnt[ord(state)] += 1
    end
end

def scramble(seed, step)
    for auto str = &memBuf[0]; str < &memBuf[memSize]; str += <uint16>step
        *str ^= <uint8>seed if *str != ','
    end
end
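The effect of scramble is easiest to see on a tiny buffer. The Python sketch below models the same stride-and-XOR loop on a bytearray; the step of 3 is chosen only to keep the example short (run never passes a step below 0x22).

```python
def scramble(buf: bytearray, seed: int, step: int) -> None:
    """XOR every step-th byte with the low 8 bits of seed, skipping commas."""
    for i in range(0, len(buf), step):
        if buf[i] != ord(","):
            buf[i] ^= seed & 0xFF


buf = bytearray(b"5012,1234,")
scramble(buf, seed=0x01, step=3)      # touches indexes 0, 3, 6 and 9

untouched = bytearray(b"5012,1234,")
scramble(untouched, seed=0x00, step=3)  # XOR with zero changes nothing
```

Here indexes 0, 3, and 6 flip their lowest bit while the comma at index 9 is preserved, giving "4013,1334,". Note that a seed of zero leaves the buffer as it was, since XOR with zero is the identity.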

The crc returned by StateBench.run effectively summarizes the number of transitory and final states encountered when scanning.

even more code ahead – feel free to scroll down to the Summary

List processing

Unlike its peer benchmark algorithms, the ListBench module introduces some new design elements into the EM•Mark hierarchy depicted earlier:

the ComparatorI abstraction, used by ListBench to generalize its internal implementation of list sorting through a function-valued parameter that compares element values

the ValComparator module, an implementation of ComparatorI which invokes the other benchmark algorithms (through a proxy) in a data-dependent fashion

The ComparatorI interface names just a single function [ compare ]; the ListBench module in turn specifies the signature of this function through a public type [ Comparator ], a design pattern similar to a Java @FunctionalInterface annotation or a C# delegate object:

em.coremark/ComparatorI.em
package em.coremark

import ListBench

interface ComparatorI

    function compare: ListBench.Comparator

end

em.coremark/ListBench.em [exc]
module ListBench: BenchAlgI

    type Data: struct
        val: int16
        idx: int16
    end

    type Comparator: function(a: Data&, b: Data&): int32

    config idxCompare: Comparator
    config valCompare: Comparator

CoreBench.em$configure (which we'll examine shortly) performs build-time binding of conformant Comparator functions to the pair of ListBench config parameters declared above. But first, let's look at some private declarations within the ListBench module:

em.coremark/ListBench.em [exc]
private:

    type Elem: struct
        next: Elem&
        data: Data&
    end

    function find(list: Elem&, data: Data&): Elem&
    function pr(list: Elem&, name: string)
    function remove(item: Elem&): Elem&
    function reverse(list: Elem&): Elem&
    function sort(list: Elem&, cmp: Comparator): Elem&
    function unremove(removed: Elem&, modified: Elem&)

    config maxElems: uint16

    var curHead: Elem&

end

The Elem struct supports the conventional representation of a singly-linked list, with the ListBench private functions manipulating references to objects of this type. The maxElems parameter effectively sizes the pool of Elem objects, while the curHead variable references a particular Elem object that presently anchors the list.

Similar to the other BenchAlgI modules we've seen, ListBench cannot fully initialize its internal data structures until setup fetches a volatile seed at run-time. Nevertheless, we still can perform a sizeable amount of build-time initialization within em$construct:

em.coremark/ListBench.em [exc]
def em$construct()
    auto itemSize = 16 + sizeof<Data>
    maxElems = Math.round(memSize / itemSize) - 3
    curHead = new<Elem>
    curHead.data = new<Data>
    auto p = curHead
    for auto i = 0; i < maxElems - 1; i++
        auto q = p.next = new<Elem>
        q.data = new<Data>
        p = q
    end
    p.data = new<Data>
    p.next = null
end

Like all EM config params, maxElems behaves like a var at build-time but like a const at run-time; and the value assigned by em$construct will itself depend on other build-time parameters and variables [ itemSize, memSize ]. In theory, initialization of maxElems could have occurred at run-time – and with EM code that looks virtually identical to what we see here.

But by executing this EM code at build-time, we'll enjoy higher levels of performance at run-time.

Taking this facet of EM one step further (namely, that the EM language serves as its own meta-language), em$construct "wires up" a singly-linked chain of newly allocated / initialized Elem objects anchored by the curHead variable – a programming idiom you've learned in Data Structures 101. Notice how each Elem.data field similarly references a newly-allocated (but uninitialized) Data object.
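For readers who want to check the sizing arithmetic and the shape of the resulting chain, here is a Python model of what em$construct builds. It is a sketch with labeled assumptions: sizeof<Data> is taken to be 4 bytes (two int16 fields), Math.round is approximated by Python's round, and build_list is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Data:
    val: int = 0
    idx: int = 0


@dataclass
class Elem:
    data: Data
    next: Optional["Elem"] = None


def build_list(mem_size: int, data_size: int = 4):
    """Pre-allocate a singly-linked chain, as em$construct does at build-time."""
    item_size = 16 + data_size                   # assumed: sizeof<Data> == 4
    max_elems = round(mem_size / item_size) - 3
    head = Elem(Data())
    p = head
    for _ in range(max_elems - 1):
        p.next = Elem(Data())                    # each element owns a fresh Data
        p = p.next
    return head, max_elems


head, max_elems = build_list(666)

count = 0
node = head
while node:
    count += 1
    node = node.next
```

With memSize at 666 the item size is 20 bytes, so round(33.3) - 3 yields a chain of 30 elements. In EM this structure already exists in the program image, whereas the Python model (like legacy C) has to construct it when the program runs.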

Turning now to ListBench.setup, the pseudo-random values assigned to each element's e.data.val and e.data.idx fields originate with one of the volatile seed variables prescribed by CoreMark. Before returning, the private sort function (which we'll visit shortly) re-orders the list elements by comparing their e.data.idx fields:

em.coremark/ListBench.em [exc]
def setup()
    auto seed = Utils.getSeed(1)
    auto ki = 1
    auto kd = maxElems - 3
    auto e = curHead
    e.data.idx = 0
    e.data.val = 0x8080
    for e = e.next; e.next; e = e.next
        auto pat = <uint16>(seed ^ kd) & 0xf
        auto dat = (pat << 3) | (kd & 0x7)
        e.data.val = <int16>((dat << 8) | dat)
        kd -= 1
        if ki < (maxElems / 5)
            e.data.idx = ki++
        else
            pat = <uint16>(seed ^ ki++)
            e.data.idx = <int16>(0x3fff & (((ki & 0x7) << 8) | pat))
        end
    end
    e.data.idx = 0x7fff
    e.data.val = 0xffff
    %%[c+]
    %%[c]

Finally, the following implementation of ListBench.run calls many private functions [ find, remove, reverse, … ] to continually rearrange the list elements; ListBench.run also uses another volatile seed as well as calls sort with two different Comparator functions:

em.coremark/ListBench.em [exc]
def run(arg)
    auto list = curHead
    auto finderIdx = <int16>arg
    auto findCnt = Utils.getSeed(3)
    auto found = <uint16>0
    auto missed = <uint16>0
    auto retval = <Crc.sum_t>0
    var data: Data
    data.idx = finderIdx
    for auto i = 0; i < findCnt; i++
        data.val = <int16>(i & 0xff)
        auto elem = find(list, &data)
        list = reverse(list)
        if elem == null
            missed += 1
            retval += <uint16>(list.next.data.val >> 8) & 0x1
        else
            found += 1
            if <uint16>elem.data.val & 0x1
                retval += (<uint16>(elem.data.val >> 9)) & 0x1
            end
            if elem.next != null
                auto tmp = elem.next
                elem.next = tmp.next
                tmp.next = list.next
                list.next = tmp
            end
        end
        data.idx += 1 if data.idx >= 0
    end
    retval += found * 4 - missed
    list = sort(list, valCompare) if finderIdx > 0
    auto remover = remove(list.next)
    auto finder = find(list, &data)
    finder = list.next if !finder
    while finder
        retval = Crc.add16(list.data.val, retval)
        finder = finder.next
    end
    unremove(remover, list.next)
    list = sort(list, idxCompare)
    for auto e = list.next; e; e = e.next
        retval = Crc.add16(list.data.val, retval)
    end
    return retval
end

Refer to ListBench for the definitions of the internal functions called by ListBench.run .

Generalized sorting

As already illustrated, ListBench.sort accepts a cmp argument of type Comparator – invoked when merging Data objects from a pair of sorted sub-lists. The implementation seen here (including the inline comments) mimics the core_list_mergesort function found in the legacy core_list_join.c source file.

em.coremark/ListBench.em [exc]
    %%[c-]
end

def sort(list, cmp)
    auto insize = <int32>1
    var q: Elem&
    var e: Elem&
    for ;;
        auto p = list
        auto tail = list = null
        auto nmerges = <int32>0  # count number of merges we do in this pass
        while p
            nmerges++  # there exists a merge to be done
            # step `insize' places along from p
            q = p
            auto psize = 0
            for auto i = 0; i < insize; i++
                psize++
                q = q.next
                break if !q
            end
            # if q hasn't fallen off end, we have two lists to merge
            auto qsize = insize
            # now we have two lists; merge them
            while psize > 0 || (qsize > 0 && q)
                # decide whether next element of merge comes from p or q
                if psize == 0
                    # p is empty; e must come from q
                    e = q
                    q = q.next
                    qsize--
                elif qsize == 0 || !q
                    # q is empty; e must come from p.
                    e = p
                    p = p.next
                    psize--
                elif cmp(p.data, q.data) <= 0
                    # First element of p is lower (or same); e must come from p.
                    e = p
                    p = p.next
                    psize--
                else
                    # First element of q is lower; e must come from q.
                    e = q
                    q = q.next
                    qsize--
                end
                # add the next element to the merged list
                if tail
                    tail.next = e
                else
                    list = e
                end
                tail = e
            end
            # now p has stepped `insize' places along, and q has too
            p = q
        end
        tail.next = null
        # If we have done only one merge, we're finished
        break if nmerges <= 1  # allow for nmerges==0, the empty list case
        # Otherwise repeat, merging lists twice the size
        insize *= 2

Looking first at the IdxComparator module, you couldn't imagine a simpler implementation of its ComparatorI.compare function – which returns the signed difference of the idx fields after scrambling the val fields:

em.coremark/IdxComparator.em [exc]
module IdxComparator: ComparatorI

end

def compare(a, b)
    a.val = <int16>((<uint16>a.val & 0xff00) | (0x00ff & <uint16>(a.val >> 8)))
    b.val = <int16>((<uint16>b.val & 0xff00) | (0x00ff & <uint16>(b.val >> 8)))
    return a.idx - b.idx
end
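The byte shuffling in compare is compact enough to model directly. In this Python sketch the val field is treated as an unsigned 16-bit pattern, and copy_high_byte and compare_idx are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Data:
    val: int   # modeled as an unsigned 16-bit pattern
    idx: int


def copy_high_byte(val: int) -> int:
    """Keep the high byte and overwrite the low byte with a copy of it."""
    return (val & 0xFF00) | ((val >> 8) & 0x00FF)


def compare_idx(a: Data, b: Data) -> int:
    a.val = copy_high_byte(a.val)
    b.val = copy_high_byte(b.val)
    return a.idx - b.idx


a = Data(val=0x1234, idx=5)
b = Data(val=0xABCD, idx=2)
order = compare_idx(a, b)
```

After the call a.val holds 0x1212 and b.val holds 0xABAB, while order is 3. Since ListBench.setup stores each val as the same byte repeated twice, this copy appears to restore the low byte to its initial contents after another comparator has overwritten it.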

Turning now to the ValComparator module, you couldn't imagine a more convoluted implementation of ComparatorI.compare – which returns the signed difference of values computed by the private calc function, the twin of calc_func found in the legacy core_list_join.c source file:

em.coremark/ValComparator.em [exc]
module ValComparator: ComparatorI

    proxy Bench0: BenchAlgI
    proxy Bench1: BenchAlgI

private:

    function calc(pval: int16*): int16

end

def calc(pval)
    auto val = <uint16>*pval
    auto optype = <uint8>(val >> 7) & 1
    return <int16>(val & 0x007f) if optype
    auto flag = val & 0x7
    auto vtype = (val >> 3) & 0xf
    vtype |= vtype << 4
    var ret: uint16
    switch flag
    case 0
        ret = Bench0.run(<uarg_t>vtype)
        Utils.bindCrc(Bench0.kind(), ret)
        break
    case 1
        ret = Bench1.run(<uarg_t>vtype)
        Utils.bindCrc(Bench1.kind(), ret)
        break
    default
        ret = val
        break
    end
    auto newcrc = Crc.add16(<int16>ret, Utils.getCrc(Utils.Kind.FINAL))
    Utils.setCrc(Utils.Kind.FINAL, Crc.add16(<int16>ret, Utils.getCrc(Utils.Kind.FINAL)))
    ret &= 0x007f
    *pval = <int16>((val & 0xff00) | 0x0080 | ret)   ## cache the result
    return <int16>ret
end

def compare(a, b)
    auto val1 = calc(&a.val)
    auto val2 = calc(&b.val)
    return val1 - val2
end

Besides scrambling the contents of a val field reference passed as its argument, calc actually runs other benchmark algorithms via a pair of BenchAlgI proxies [ Bench0, Bench1 ] .
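The bit fields that calc decodes, and the way it caches a result inside the value itself, can be traced with a small Python model. This is a sketch only: run0 and run1 are hypothetical callables standing in for the Bench0 / Bench1 proxies, the function returns the updated value instead of writing through a pointer, and the checksum folding is omitted.

```python
def calc(val: int, run0, run1):
    """Return (result, new_val) for a 16-bit val."""
    if (val >> 7) & 1:                 # bit 7 set: a result is already cached
        return val & 0x007F, val
    flag = val & 0x7                   # low 3 bits choose what to run
    vtype = (val >> 3) & 0xF           # next 4 bits form the argument...
    vtype |= vtype << 4                # ...duplicated into both nibbles
    if flag == 0:
        ret = run0(vtype)
    elif flag == 1:
        ret = run1(vtype)
    else:
        ret = val
    # (the EM code also folds ret into the FINAL checksum at this point)
    ret &= 0x007F
    return ret, (val & 0xFF00) | 0x0080 | ret   # cache result, mark bit 7


calls = []
run0 = lambda arg: calls.append(("bench0", arg)) or 0x0000
run1 = lambda arg: calls.append(("bench1", arg)) or 0x1234

first, cached_val = calc(0x2929, run0, run1)      # flag 1, argument 0x55
second, same_val = calc(cached_val, run0, run1)   # served from the cache
```

The first call dispatches to run1 with argument 0x55 and returns 0x34, leaving 0x29B4 as the new value; the second call finds bit 7 set and returns 0x34 without running anything. Only values whose low three bits are 0 or 1 trigger a nested benchmark run, which is what makes the sort's workload data-dependent.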

Benchmark configuration

Having visited most of the individual modules found in the EM•Mark design hierarchy, let's return to CoreBench and review its build-time configuration functions:

em.coremark/CoreBench.em [exc]
module CoreBench: BenchAlgI

    config TOTAL_DATA_SIZE: uint16 = 2000
    config NUM_ALGS: uint8 = 3

end

def em$configure()
    memSize = Math.floor(TOTAL_DATA_SIZE / NUM_ALGS)
    ListBench.idxCompare ?= IdxComparator.compare
    ListBench.valCompare ?= ValComparator.compare
    ListBench.memSize ?= memSize
    MatrixBench.memSize ?= memSize
    StateBench.memSize ?= memSize
    ValComparator.Bench0 ?= StateBench
    ValComparator.Bench1 ?= MatrixBench
end

def em$construct()
    Utils.bindSeedH(1, 0x0)
    Utils.bindSeedH(2, 0x0)
    Utils.bindSeedH(3, 0x66)
end

In addition to calculating and assigning the memSize config parameter for each of the benchmarks, CoreBench.em$configure binds a pair of Comparator functions to ListBench as well as binds the StateBench and MatrixBench modules to the ValComparator proxies.

CoreBench.em$construct completes build-time configuration by binding a prescribed set of values to the volatile seed variables accessed at run-time by the individual benchmarks.

Summary and next steps

Whether you've arrived here by studying (or skipping !!) all of that EM code, let's summarize some key takeaways from the exercise of transforming CoreMark into EM•Mark :

The CoreMark source code – written in C with "plenty of room for improvement" – typifies much of the legacy software targeting resource-constrained MCUs.

The high-level design of EM•Mark (depicted here) showcases many aspects of the EM language – separation of concerns, client-supplier decoupling, build-time configuration, etc.

The ActiveRunnerP and SleepyRunnerP programs can run on any MCU for which an em$distro package exists – making EM•Mark ideal for benchmarking MCU performance.

Besides embodying a higher level of programming, EM•Mark also outperforms legacy CoreMark.

To prove our claim about programming in EM, let's move on to the EM•Mark results and allow the numbers to speak for themselves.