MEMOIR
This is the developer manual for the MEMOIR compiler infrastructure. As the code repository is actively under development and subject to change so to is this documentation.
This book is organized as follows:
- MEMOIR : An overview of the MEMOIR intermediate representation.
- Compiler : An overview of the compiler implementation.
- API : An overview of the language APIs used to write MEMOIR programs.
Doxygen
The Doxygen is available for detailed documentation of the MEMOIR compiler.
Publications
In addition to this documentation, more information can be found in the CGO'24 paper. If you build upon or use MEMOIR we kindly ask that you cite that paper:
@inproceedings(MEMOIR:McMichen:2024,
title={Representing Data Collections in an SSA Form},
author={McMichen, Tommy and Greiner, Nathan and Zhong, Peter and Sossai, Federico and Patel, Atmn and Campanoni, Simone},
booktitle={International Symposium on Code Generation and Optimization, 2024. CGO 2024.},
year={2024},
}
Installation
Prerequisites
MEMOIR relies on LLVM 9.0.0
If you do not already have it installed and on your PATH
do the following:
curl -sL https://releases.llvm.org/9.0.0/clang+llvm-9.0.0-x86_64-pc-linux-gnu.tar.xz | tar xJ
mv clang+llvm-9.0.0-x86_64-pc-linux-gnu llvm-9.0.0
export PATH=$(realpath ./llvm-9.0.0):$PATH
If you want LLVM 9.0.0 to always be on your PATH
, add it to your ~/.bashrc
.
MEMOIR
git clone git@github.com:arcana-lab/memoir.git
cd memoir
make
Following installation, MEMOIR will be installed under memoir/install
.
Run source enable
in the memoir/
directory to add the MEMOIR toolchain to your PATH
.
Writing a Program
In this section, we will be going over a few example C++ programs using the cmemoir
collections API.
Example: Sequences
Let's start with a basic statistics program.
To get started, we need to include the cmemoir
headers.
#include <cmemoir/cmemoir.h>
using namespace memoir;
Now, let's make our main
function, where we will read the arguments into a MEMOIR sequence.
To start off with, we need to allocate a sequence.
Sequences can be allocated with the memoir_allocate_sequence
function.
The first argument to the function is the type of elements within the sequence and the second is the initial length of the sequence, for example:
#include <cmemoir/cmemoir.h>
using namespace memoir;
int main(int argc, char *argv[]) {
collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
}
Now let's populate the sequence!
To write to the sequence, we use the function memoir_index_write(T, v, s, i)
.
T
is the shorthand element type, v
is the value to write to s[i]
.
With this, we can update our main
function:
#include <cmemoir/cmemoir.h>
#include <stdlib.h>
using namespace memoir;
int main(int argc, char *argv[]) {
collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
for (int i = 0; i < argc - 1; ++i) {
memoir_index_write(i32, atoi(argv[i+1]), input, i);
}
}
Let's go ahead and use the sequence for something now! To do that, we can create a new function to compute the sum of the sequence:
int sum(collection_ref input) {
int32_t result = 0;
for (int i = 0; i < memoir_size(input); ++i) {
result += memoir_index_read(i32, input, i);
}
return result;
}
In sum
we use two new MEMOIR functions: memoir_index_read
and memoir_size
.
memoir_index_read(T, s, i)
reads s[i]
, which has element type T
.
memoir_size(s)
returns the size of the sequence as a size_t
.
Now we can call our sum
function in main
:
#include <stdio.h>
...
int main(int argc, char *argv[]) {
collection_ref input = memoir_allocate_sequence(memoir_i32_t, argc - 1);
for (int i = 0; i < argc - 1; ++i) {
memoir_index_write(i32, atoi(argv[i+1]), input, i);
}
int result = sum(input);
printf("sum = %d\n", result);
return 0;
}
Compiling your program
Compiling C/C++ programs using MEMOIR collections is straightforward.
The memoir-clang
and memoir-clang++
command-line tools provide similar functionality to clang
/clang++
.
NOTE: These should not be considered drop-in replacements.
To compile the program you just wrote, simply run:
memoir-clang++ my_program.cpp -o my_program
The result of this command is my_program
which can be executed, for example, if you run:
./my_program 123 123 321
The following will be printed on your command line:
sum = 567
Congratulations! You have now written and ran your first program in MEMOIR!
Example: Associative Array
Let's extend our statistics program to provide a histogram!
First off, we will need to create a new associative array, which can be done with memoir_allocate_assoc_array(K, V)
, where K
is the key rtype and V
is the value type:
collection_ref hist(collection_ref input) {
collection_ref histogram = memoir_allocate_assoc_array(memoir_i32_t, memoir_u32_t);
}
Following this, we need to populate it with values from the input sequence.
collection_ref hist(collection_ref input) {
collection_ref histogram = memoir_allocate_assoc_array(memoir_i32_t, memoir_u32_t);
for (int i = 0; i < memoir_size(input); ++i) {
int32_t value = memoir_index_read(i32, input, i);
if (memoir_assoc_has(histogram, value)) {
uint32_t current = memoir_assoc_read(u32, histogram, value);
memoir_assoc_write(u32, current + 1, histogram, value);
} else {
memoir_assoc_insert(histogram, value);
memoir_assoc_write(u32, 1, histogram, value);
}
}
return histogram;
}
Note that we must explicitly insert keys into the associative array before writing to it, it is undefined behavior to write a value to a key which doesn't exist in the associative array.
Now, let's use the new hist
function and see how to iterate over it:
int main(...) {
collection_ref input = ...;
{ ... }
collection_ref histogram = hist(input);
collection_ref keys = memoir_assoc_keys(histogram);
for (int i = 0; i < memoir_size(keys); ++i) {
int32_t key = memoir_index_read(i32, keys, i);
uint32_t count = memoir_assoc_read(u32, histogram, key);
printf("%d -> %ld\n", key, count);
}
return 0;
}
To iterate over the associative array, we need to collect its keys with the memoir_keys
function.
memoir_assoc_keys(a)
returns a sequence of the keys present in a
.
Note that their is no guarantee on the order of keys in the resultant.
If we recompile our program:
memoir-clang++ my_program.cpp -o my_program
And run it:
./my_program 123 123 321
You should get the following result:
sum = 567
123 -> 2
321 -> 1
Writing a Pass
In this section, we will be going over a few example MEMOIR passes, including analysis and transformation.
MEMOIR is built atop the LLVM compiler infrastructure, we will assume some knowledge of writing LLVM passes. If you need more information about LLVM you can start with their writing an LLVM pass guide.
Additionally, MEMOIR makes use of NOELLE abstractions, for more information about NOELLE, see their repository.
Example: Counting MEMOIR Instructions
Let's start off with the simplest example, count MEMOIR instructions in a program.
For this example, you will need to include the following header:
#include <memoir/ir/Instructions.hpp>
This header includes the definition of MEMOIR instructions and methods to analyze them within an LLVM pass.
Let's implement our runOnModule
method:
bool runOnModule(llvm::Module &M) override {
// We will keep track of MEMOIR instructions with this:
uint32_t memoir_inst_count = 0;
// Analyze the program.
for (llvm::Function &F : M) {
for (llvm::BasicBlock &BB : F) {
for (llvm::Instruction &I : BB) {
// Try to convert the LLVM Instruction to a MEMOIR Instruction.
// NOTE: as is equivalent to LLVM's dyn_cast
llvm::memoir::MemOIRInst *MI = dyn_cast_into<llvm::memoir::MemOIRInst>(&I);
// If @MI is null, then @I is NOT a MEMOIR instruction.
if (MI == nullptr) {
continue;
}
// Otherwise, @I is a MEMOIR instruction! Let's count it:
++memoir_inst_count;
}
}
}
// Print the number of MEMOIR instructions that we found.
llvm::errs() << memoir_inst_count << "\n";
// We did not modify the program, so we return false.
return false;
}
The main difference between this pass and your run-of-the-mill LLVM pass is the use of llvm::memoir::MemOIRInst
via the as
function.
dyn_cast_into
is identical to LLVM's dyn_cast
, but is necessary when converting from an LLVM instruction to a MEMOIR instruction and vice versa.
For more information about MEMOIR instructions, see instructions.
Congratulations! You wrote your first MEMOIR pass!
Example: Counting Types of MEMOIR Instructions
So far our pass isn't much to write home about, but it's a great start! Let's kick it up a notch by updating the pass to count the type of MEMOIR instructions in our program.
For this, we can use a visitor. Add the following includes to your program:
#include <memoir/ir/InstVisitor.hpp>
#include <map>
The MEMOIR InstVisitor
gives us similar functionality to LLVM's InstVisitor
, in fact, anything you can do in an LLVM InstVisitor
can be done in MEMOIR's!
We can start by definining out visitor class:
class MyVisitor : public llvm::memoir::InstVisitor<MyVisitor, void> {
// In order for the wrapper to work, we need to declare our parent classes as friends.
friend class llvm::memoir::InstVisitor<MyVisitor, void>;
friend class llvm::InstVisitor<MyVisitor, void>;
}
Great! Now that we have our boilerplate, let's fill it in:
class MyVisitor : ... {
...
public:
// We will store the results of our analysis here:
std::map<std::string, uint32_t> instruction_counts;
// We _always_ need to implement visitInstruction!
void visitInstruction(llvm::Instruction &I) {
// Do nothing.
return;
}
// Count all access instructions (read, write, get) together:
void visitAccessInst(llvm::memoir::AccessInst &I) {
this->instruction_counts["access"]++;
return;
}
// Let's do the same for allocation instructions:
void visitAllocInst(llvm::memoir::AllocInst &I) {
this->instruction_counts["alloc"]++;
return;
}
// Put everything else into an "other" bucket.
// NOTE: since visitAllocInst and visitAccessInst are
// implemented, visitMemOIRInst will _never_ be
// passed an AllocInst nor an AccessInst.
void visitMemOIRInst(llvm::memoir::MemOIRInst &I) {
this->instruction_counts["other"]++;
return;
}
Now we can use the visitor in our runOnModule
pass:
bool runOnModule(llvm::Module &M) override {
// Initialize our visitor:
MyVisitor visitor;
// Analyze the program.
for (llvm::Function &F : M) {
for (llvm::BasicBlock &BB : F) {
for (llvm::Instruction &I : BB) {
visitor.visit(I);
}
}
}
// Print the results of our visitor:
for (const auto &[type, count] : visitor.instruction_counts) {
llvm::errs() << type << " -> " << count << "\n";
}
// We did not modify the program, so we return false.
return false;
}
Now this is starting to look like a compiler pass!
As an exercise, try adding a couple more counters like InsertInst
and RemoveInst
!
MEMOIR
A compiler intermediate representation for data collections
This chapter is currently under construction, see the CGO'24 paper for information about MEMOIR.
Compiler
In this chapter we will go over the structure of the MEMOIR compiler infrastructure.
LLVM
MEMOIR is built using LLVM compiler infrastructure. For information about the LLVM IR, see the LLVM Language Reference Manual. For information about how to use the LLVM compiler API, see the LLVM Doxygen.
NOELLE
The MEMOIR compiler infrastructure makes use of NOELLE abstractions and utilities. For information about NOELLE, see the NOELLE Paper. For guides on how to use NOELLE, Simone Campanoni has put together YouTube videos explaining different NOELLE features, they can be found in his Advanced Topics in Compilers materials. Lastly, for detailed information about the NOELLE source code, you can reference the GitHub repository.
Instructions
MEMOIR instructions can be operated on within the compiler with the MemOIRInst
class.
In this section, we will detail what you are provided with in MemOIRInst
and how to obtain them from within the LLVM compiler infrastructure.
Bitcode Representation
All MEMOIR instructions are implemented as quasi-intrinsics, this allows us to extend the LLVM compiler without having to make changes to its internals.
There are various benefits and drawbacks to this approach, but that will not be the focus of this section.
When inspecting a LLVM+MEMOIR bitcode, you can easily spot MEMOIR instructions by their function prefix memoir__<INSTRUCTION>
.
Dynamic Casting
To retrieve a MemOIRInst
from an llvm::Instruction
, use the into
function:
llvm::Instruction *llvm_inst = ...;
MemOIRInst *memoir_inst = into<MemOIRInst>(llvm_inst);
This behaves similarly to LLVM's dyn_cast
, attempting to cast the llvm_inst
to MemOIRInst
, returning NULL
on failure.
The primary difference between the two is that, while (void *)dyn_cast<T>(i) == (void *)i
, (void *)into<T>(i) != (void *)i
.
When you have a MemOIRInst
the LLVM RTTI functions isa
, dyn_cast
, cast
, etc. can be used to convert between subclasses of MemOIRInst
.
An overview of the subclasses are below.
For more information, see memoir/ir/Instructions.hpp
in your install directory or under the top-level compiler/
directory of the repository.
Additional Information
For more information, see the Doxygen.
API
cmemoir
The cmemoir
library provides a simple interface for MEMOIR collections.
See runtime/api/c/cmemoir.h
for more detailed information.