MEMOIR

This is the developer manual for the MEMOIR compiler infrastructure. As the code repository is actively under development and subject to change so to is this documentation.

This book is organized as follows:

  • MEMOIR : An overview of the MEMOIR intermediate representation.
  • Compiler : An overview of the compiler implementation.
  • API : An overview of the language APIs used to write MEMOIR programs.

Doxygen

The Doxygen is available for detailed documentation of the MEMOIR compiler.

Publications

In addition to this documentation, more information can be found in the CGO'24 paper. If you build upon or use MEMOIR we kindly ask that you cite that paper:

@inproceedings(MEMOIR:McMichen:2024,
    title={Representing Data Collections in an SSA Form},
    author={McMichen, Tommy and Greiner, Nathan and Zhong, Peter and Sossai, Federico and Patel, Atmn and Campanoni, Simone},
    booktitle={International Symposium on Code Generation and Optimization, 2024. CGO 2024.},
    year={2024},
}

Installation

Prerequisites

MEMOIR relies on LLVM 9.0.0 If you do not already have it installed and on your PATH do the following:

curl -sL https://releases.llvm.org/9.0.0/clang+llvm-9.0.0-x86_64-pc-linux-gnu.tar.xz | tar xJ
mv clang+llvm-9.0.0-x86_64-pc-linux-gnu llvm-9.0.0
export PATH=$(realpath ./llvm-9.0.0):$PATH

If you want LLVM 9.0.0 to always be on your PATH, add it to your ~/.bashrc.

MEMOIR

git clone git@github.com:arcana-lab/memoir.git
cd memoir
make

Following installation, MEMOIR will be installed under memoir/install. Run source enable in the memoir/ directory to add the MEMOIR toolchain to your PATH.

Writing a Program

In this section, we will be going over a few example C++ programs using the cmemoir collections API.

Example: Sequences

Let's start with a basic statistics program.

To get started, we need to include the cmemoir headers.

#include <cmemoir/cmemoir.h>

using namespace memoir;

Now, let's make our main function, where we will read the arguments into a MEMOIR sequence.

To start off with, we need to allocate a sequence. Sequences can be allocated with the memoir_allocate_sequence function. The first argument to the function is the type of elements within the sequence and the second is the initial length of the sequence, for example:

#include <cmemoir/cmemoir.h>

using namespace memoir;

int main(int argc, char *argv[]) {
    collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
}

Now let's populate the sequence! To write to the sequence, we use the function memoir_index_write(T, v, s, i). T is the shorthand element type, v is the value to write to s[i]. With this, we can update our main function:

#include <cmemoir/cmemoir.h>
#include <stdlib.h>

using namespace memoir;

int main(int argc, char *argv[]) {
    collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
    
    for (int i = 0; i < argc - 1; ++i) {
        memoir_index_write(i32, atoi(argv[i+1]), input, i);
    }
}

Let's go ahead and use the sequence for something now! To do that, we can create a new function to compute the sum of the sequence:

int sum(collection_ref input) {
    int32_t result = 0;
    for (int i = 0; i < memoir_size(input); ++i) {
        result += memoir_index_read(i32, input, i);
    }
    return result;
}

In sum we use two new MEMOIR functions: memoir_index_read and memoir_size. memoir_index_read(T, s, i) reads s[i], which has element type T. memoir_size(s) returns the size of the sequence as a size_t.

Now we can call our sum function in main:

#include <stdio.h>
...
int main(int argc, char *argv[]) {
    collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
    
    for (int i = 0; i < argc - 1; ++i) {
        memoir_index_write(i32, atoi(argv[i+1]), input, i);
    }
    
    int result = sum(input);
    
    printf("sum = %d\n", result);
    
    return 0;
}

Compiling your program

Compiling C/C++ programs using MEMOIR collections is straightforward. The memoir-clang and memoir-clang++ command-line tools provide similar functionality to clang/clang++. NOTE: These should not be considered drop-in replacements.

To compile the program you just wrote, simply run:

memoir-clang++ my_program.cpp -o my_program

The result of this command is my_program which can be executed, for example, if you run:

./my_program 123 123 321

The following will be printed on your command line:

sum = 567

Congratulations! You have now written and ran your first program in MEMOIR!

Example: Associative Array

Let's extend our statistics program to provide a histogram!

First off, we will need to create a new associative array, which can be done with memoir_allocate_assoc_array(K, V), where K is the key rtype and V is the value type:

collection_ref hist(collection_ref input) {
    collection_ref histogram = memoir_allocate_assoc_array(memoir_i32_t, memoir_u32_t);
}

Following this, we need to populate it with values from the input sequence.

collection_ref hist(collection_ref input) {
    collection_ref histogram = memoir_allocate_assoc_array(memoir_i32_t, memoir_u32_t);
    
    for (int i = 0; i < memoir_size(input); ++i) {
        int32_t value = memoir_index_read(i32, input, i);
        if (memoir_assoc_has(histogram, value)) {
            uint32_t current = memoir_assoc_read(u32, histogram, value);
            memoir_assoc_write(u32, current + 1, histogram, value);
        } else {
            memoir_assoc_insert(histogram, value);
            memoir_assoc_write(u32, 1, histogram, value);
        }
    }
    
    return histogram;
}

Note that we must explicitly insert keys into the associative array before writing to it, it is undefined behavior to write a value to a key which doesn't exist in the associative array.

Now, let's use the new hist function and see how to iterate over it:

int main(...) {
    collection_ref input = ...;
    
    { ... }
    
    collection_ref histogram = hist(input);
    
    collection_ref keys = memoir_assoc_keys(histogram);
    for (int i = 0; i < memoir_size(keys); ++i) {
        int32_t key = memoir_index_read(i32, keys, i);
        uint32_t count = memoir_assoc_read(u32, histogram, key);
        printf("%d -> %ld\n", key, count);
    }
    
    return 0;
}

To iterate over the associative array, we need to collect its keys with the memoir_keys function. memoir_assoc_keys(a) returns a sequence of the keys present in a. Note that their is no guarantee on the order of keys in the resultant.

If we recompile our program:

memoir-clang++ my_program.cpp -o my_program

And run it:

./my_program 123 123 321

You should get the following result:

sum = 567
123 -> 2
321 -> 1

Writing a Pass

In this section, we will be going over a few example MEMOIR passes, including analysis and transformation.

MEMOIR is built atop the LLVM compiler infrastructure, we will assume some knowledge of writing LLVM passes. If you need more information about LLVM you can start with their writing an LLVM pass guide.

Additionally, MEMOIR makes use of NOELLE abstractions, for more information about NOELLE, see their repository.

Example: Counting MEMOIR Instructions

Let's start off with the simplest example, count MEMOIR instructions in a program.

For this example, you will need to include the following header:

#include <memoir/ir/Instructions.hpp>

This header includes the definition of MEMOIR instructions and methods to analyze them within an LLVM pass.

Let's implement our runOnModule method:

bool runOnModule(llvm::Module &M) override {
    // We will keep track of MEMOIR instructions with this:
    uint32_t memoir_inst_count = 0;

    // Analyze the program.
    for (llvm::Function &F : M) {
        for (llvm::BasicBlock &BB : F) {
            for (llvm::Instruction &I : BB) {
                // Try to convert the LLVM Instruction to a MEMOIR Instruction.
                // NOTE: as is equivalent to LLVM's dyn_cast
                llvm::memoir::MemOIRInst *MI = dyn_cast_into<llvm::memoir::MemOIRInst>(&I);
                
                // If @MI is null, then @I is NOT a MEMOIR instruction.
                if (MI == nullptr) {
                    continue;
                }
                
                // Otherwise, @I is a MEMOIR instruction! Let's count it:
                ++memoir_inst_count;
            }
        }
    }
    
    // Print the number of MEMOIR instructions that we found.
    llvm::errs() << memoir_inst_count << "\n";
    
    // We did not modify the program, so we return false.
    return false;
}

The main difference between this pass and your run-of-the-mill LLVM pass is the use of llvm::memoir::MemOIRInst via the as function. dyn_cast_into is identical to LLVM's dyn_cast, but is necessary when converting from an LLVM instruction to a MEMOIR instruction and vice versa. For more information about MEMOIR instructions, see instructions.

Congratulations! You wrote your first MEMOIR pass!

Example: Counting Types of MEMOIR Instructions

So far our pass isn't much to write home about, but it's a great start! Let's kick it up a notch by updating the pass to count the type of MEMOIR instructions in our program.

For this, we can use a visitor. Add the following includes to your program:

#include <memoir/ir/InstVisitor.hpp>
#include <map>

The MEMOIR InstVisitor gives us similar functionality to LLVM's InstVisitor, in fact, anything you can do in an LLVM InstVisitor can be done in MEMOIR's! We can start by definining out visitor class:

class MyVisitor : public llvm::memoir::InstVisitor<MyVisitor, void> {
    // In order for the wrapper to work, we need to declare our parent classes as friends.
    friend class llvm::memoir::InstVisitor<MyVisitor, void>;
    friend class llvm::InstVisitor<MyVisitor, void>;
}

Great! Now that we have our boilerplate, let's fill it in:

class MyVisitor : ... {
    ...
    
public:

    // We will store the results of our analysis here:
    std::map<std::string, uint32_t> instruction_counts;

    // We _always_ need to implement visitInstruction!
    void visitInstruction(llvm::Instruction &I) {
        // Do nothing.
        return;
    }
    
    // Count all access instructions (read, write, get) together:
    void visitAccessInst(llvm::memoir::AccessInst &I) {
        this->instruction_counts["access"]++;
        return;
    }
    
    // Let's do the same for allocation instructions:
    void visitAllocInst(llvm::memoir::AllocInst &I) {
        this->instruction_counts["alloc"]++;
        return;
    }
    
    // Put everything else into an "other" bucket.
    // NOTE: since visitAllocInst and visitAccessInst are 
    //       implemented, visitMemOIRInst will _never_ be
    //       passed an AllocInst nor an AccessInst.
    void visitMemOIRInst(llvm::memoir::MemOIRInst &I) {
        this->instruction_counts["other"]++;
        return;
    }

Now we can use the visitor in our runOnModule pass:

bool runOnModule(llvm::Module &M) override {
    // Initialize our visitor:
    MyVisitor visitor;

    // Analyze the program.
    for (llvm::Function &F : M) {
        for (llvm::BasicBlock &BB : F) {
            for (llvm::Instruction &I : BB) {
                visitor.visit(I);
            }
        }
    }
    
    // Print the results of our visitor:
    for (const auto &[type, count] : visitor.instruction_counts) {
        llvm::errs() << type << " -> " << count << "\n";
    }
    
    // We did not modify the program, so we return false.
    return false;
}

Now this is starting to look like a compiler pass! As an exercise, try adding a couple more counters like InsertInst and RemoveInst!

MEMOIR

A compiler intermediate representation for data collections

This chapter is currently under construction, see the CGO'24 paper for information about MEMOIR.

Compiler

In this chapter we will go over the structure of the MEMOIR compiler infrastructure.

LLVM

MEMOIR is built using LLVM compiler infrastructure. For information about the LLVM IR, see the LLVM Language Reference Manual. For information about how to use the LLVM compiler API, see the LLVM Doxygen.

NOELLE

The MEMOIR compiler infrastructure makes use of NOELLE abstractions and utilities. For information about NOELLE, see the NOELLE Paper. For guides on how to use NOELLE, Simone Campanoni has put together YouTube videos explaining different NOELLE features, they can be found in his Advanced Topics in Compilers materials. Lastly, for detailed information about the NOELLE source code, you can reference the GitHub repository.

Instructions

MEMOIR instructions can be operated on within the compiler with the MemOIRInst class. In this section, we will detail what you are provided with in MemOIRInst and how to obtain them from within the LLVM compiler infrastructure.

Bitcode Representation

All MEMOIR instructions are implemented as quasi-intrinsics, this allows us to extend the LLVM compiler without having to make changes to its internals. There are various benefits and drawbacks to this approach, but that will not be the focus of this section. When inspecting a LLVM+MEMOIR bitcode, you can easily spot MEMOIR instructions by their function prefix memoir__<INSTRUCTION>.

Dynamic Casting

To retrieve a MemOIRInst from an llvm::Instruction, use the into function:

llvm::Instruction *llvm_inst = ...;
MemOIRInst *memoir_inst = into<MemOIRInst>(llvm_inst);

This behaves similarly to LLVM's dyn_cast, attempting to cast the llvm_inst to MemOIRInst, returning NULL on failure. The primary difference between the two is that, while (void *)dyn_cast<T>(i) == (void *)i, (void *)into<T>(i) != (void *)i.

When you have a MemOIRInst the LLVM RTTI functions isa, dyn_cast, cast, etc. can be used to convert between subclasses of MemOIRInst. An overview of the subclasses are below. For more information, see memoir/ir/Instructions.hpp in your install directory or under the top-level compiler/ directory of the repository.

Additional Information

For more information, see the Doxygen.

API

cmemoir

The cmemoir library provides a simple interface for MEMOIR collections. See runtime/api/c/cmemoir.h for more detailed information.