Compiling to VampIR through Geb¶
Note
The compilation described is currently not supported by the newest version of Juvix and Geb. Changes are forthcoming.
Before, we discussed the standard normalization approach to compiling Juvix programs to VampIR. An alternative backend is provided by the Geb project currently implemented in Lisp providing a categorical view of the needed translations.
The post will be devoted to discussing the core ideas of Geb, the practical steps to take in order to use the corresponding pipeline, as well as the current limitations the backend faces.
We assume basic familiarity with categorical concepts and VampIR.
Geb Pipeline¶
Geb is a developing language providing a useful interface for specifying categorical constructs and helping define compilers in terms of functors. The approach is inspired by a classical observation that instead of dealing with 'good' compilation procedures one can instead specify functors between appropriate categories representing the languages one initially considers.
In the current state of development, Geb is a fairly simple category in which
one can do "basic arithmetic." It is a category freely spanned by
terminal/initial objects and finite (co)products with additional constructs for
objects representing n
-bit natural numbers. The pipeline itself is a string of
functors between categories:
which are all implemented in Lisp. The morphisms in the final category have a fairly direct interpretation as VampIR programs.
Here, Lambda stands for an extension of STLC with natural numbers types of different bit-width and some arithmetic, while Seq\(\mathbb N\) is a category of finite natural number sequences with morphism $$f: (x_1,\dots,x_n) \to (y_1,\dots,y_m) $$ a VampIR function which takes in \(n\) inputs with \(i\)-th input of bit-length \(x_i\) and spits out an \(m\)-tuple with \(i\)-th outputs of bit-length \(y_i\).
The Practice¶
In this section, we describe all the practical steps one needs to take in order to compile Juvix code to VampIR using the Geb binary.
From Juvix to Lambda¶
The first step to the compilation is getting Juvix code to compile to Lambda. The translation can be done inside of Juvix itself.
Let us create an example using one of the VampIR tests in the codebase. We
create a file NotZero.juvix
with body
--8<-- [start:NotZero]
module NotZero;
import Stdlib.Data.Bool open;
import Stdlib.Data.Nat open;
import Stdlib.Data.Nat.Ord open;
import Stdlib.Trait open;
-- We need to name the module exactly the same name as the file
intToBool (x : Nat) : Bool := not (x == 0);
-- We need to include a function named main
-- for the compilation to succeed
main : Nat -> Bool := intToBool;
--8<-- [end:NotZero]
and we want to compile it to code that is readable by the Lisp implementation of Geb. We run
juvix compile -t geb NotZero.juvix
which generates a NonZero.lisp
file with the following code:
(defpackage #:NotZero
(:shadowing-import-from :geb.lambda.spec #:pair #:right #:left)
(:shadowing-import-from :geb.spec #:case)
(:use #:common-lisp #:geb.lambda.trans #:geb.lambda.main #:geb.lambda.spec #:geb))
(in-package :NonZero)
(defparameter *entry*
(app (lamb (list (fun-type
int
(coprod
so1
so1)))
(lamb (list int)
(case-on
(app (index 1)
(list (index 0)))
(bit-choice 0)
(bit-choice 1))))
(list (lamb (list int)
(case-on
(lamb-eq
(index 0)
(bit-choice 0))
(right
so1
(unit))
(left
so1
(unit)))))))
The file generated will be used by the Geb binary where *entry*
is a
parameter for a term to be compiled. The Lambda code produced is similar to STLC
with certain variations. In the above code, we have:
- Unit type is
so1
- Unique element of the Unit type is
unit
- Coproduct type of
a
andb
iscoprod a b
left
andright
are appropriate sum-type injections- Function type from
a
tob
isfun-type a b
int
a stand-in for a type of 24-bit numberslamb
is lambda abstractionapp
is function applicationcase-on
is sum type inductionlamb-eq
predicate for checking number equalityindex n
is ann
-th DeBruijn indexbit-choice n
is a constructor for numbern
From Lambda to VampIR¶
After generating the file, one needs to go into the Geb repository, make a
binary, creating geb.image
file inside the build
folder and inside it run
./geb.image -i "NonZero.lisp" -e "NonZero:*entry*" -l -p -o "NonZero.pir"
This will generate a NonZero.pir
file in build
with the inserted Lambda code
compiled to VampIR through Geb.
One can remove -o "NonZero.pir"
in order for the terminal to print the
relevant code instead of putting it in a separate file.
In the build directory type
./geb.image -i "NonZero.lisp" -e "NonZero:*entry*" -l -p
Which returns
(def entry x1 x2 = {
(1 - (iszero x1))
};)
Note that the last variable in the function is unused. This will be the case for any compiled function and is due to a compilation side-effect when we go from Lambda to Geb.
For further instructions on the use of the binary, consult the Geb documentation.
Interactive Mode¶
If one is interested in using more features than the direct compilation to VampIR, one may use the Geb codebase interactively.
To do so, load the system as per the Geb
documentation, then go a package preferably using
all the relevant lambda code such as geb.lambda.trans
by typing
CL-USER> (in-package geb.lambda.trans)
Here one can compile the relevant code to several targets. to-circuit
takes a
lambda term in an empty context and a name.
If we want to compile the identity function on natural numbers (lamb (list int)
(index 0))
with name id
we type
TRANS> (to-circuit (lamb (list int) (index 0)) :id)
Moreover, we can compile this not to a circuit but to Geb in a nil context (which we need to specify explicitly), e.g. by typing
TRANS> (to-cat nil (lamb (list int) (index 0)))
or to Seq\(\mathbb N\) in the nil context by
TRANS> (to-seqn (lamb (list int) (index 0)))
For further utility function discription, please consult the relevant documentation.
The Theory¶
This section is devoted to describing the core ideas of how the compilation occurs step-by-step.
From Juvix to Lambda¶
The transformation of Juvix to Lambda is a pretty easy one compared to the normalization approach partially due to the limitations we will discuss later.
The Juvix code gets transformed by
- Transforming type names and rescuing needed type signatures, e.g.
Nat
becomes primitiveInt
- Unrolling recursion
- Lifting out let-statements
- Inlining type synonyms
- Presenting finite indictive datatypes using sum/product type structures
Afterwards, a direct translation to STLC follows.
From Lambda to Geb¶
The compilation of Lambda to Geb is actually a canonical one. Geb is secretly a
category equivalent to FinSet and in particular, is a Cartesian Closed
Category. By a classical result of Lambek-Scott, there is a unique compilation
of STLC to Geb preserving STLC structure. All other datatypes are compiled
primitively: e.g. plus
in Lambda gets compiled to the primitive nat-plus
.
From Geb to Seq\(\mathbb N\)¶
The transformation to the finite-sequence category representing VampIR
operations is a bit tricky, yet the point is that each object in Geb has an
intuitive notion of bit-width. The object of n
-bit numbers has n
-width,
while bool
, the coproduct of two terminal objects, has width 1
. Products
should just append the widths of its parts while coproducts should mark the left
bit entries with 0
while right ones with 1
to differentiate between the two.
Hence all the objects get encoded in this manner with morphisms following a
similar fashion.
From Seq\(\mathbb N\) to VampIR¶
The compilation to VampIR is a straightforward one as the morphisms in Seq\(\mathbb N\) have been chosen in such a way that the category can be directly seen as a model of VampIR. For more detailed info on the compilation, consult the recent report on the pipeline by the Juvix team.
Limitations¶
After describing the Geb backend, we also need to mention current limitations and challenges that the pipeline faces.
Limited Interpretation Power¶
Currently, not all Juvix programs can be interpreted in Geb. As we have mentioned, Geb is just a variation of FinSet, a fairly weak category. We currently do not have support for interpreting any polymorphic type or any infinite datatype.
Moreover, as there are no primitive function types in Geb, we also do not have a way to make internal function definitions via constraints in VampIR the way the normalizer compilation does.
However, these issues are to be addressed in a newer version of Geb, which will significantly enhance the expressive power of the language.
Heap Exhaustion for Functions of Natural Number¶
While Geb seems like a good candidate to do "categorical arithmetic" in and
hence to compile to arithmetic circuits, the fact that the primitive
constructions are very minimal serves as a reason for an exponential blow-up. An
object representing n
-bit natural number nat-wdth n
is analyzed as a 2^n
coproduct of terminal objects. So even if we have primitives for natural
numbers, for curry
and uncurry
operations the functions get analyzed
bit-by-bit, causing a blow-up.
If one compiles complicated functions, even working with 2-bit numbers can cause a heap exhaust.
However, there are forthcoming enhancements to the pipeline. Future optimizations of Lambda will include a "smart" version of beta-reduction allowing for the removal of unnecessary lambda and function application occurences. This will effectively eliminate all blow-up problems having to do with compiling relevant Juvix code.