Building Canrun: A statically typed logic programming library for Rust
Update 2 Jan, 2023: Many aspects of this library have been redesigned, especially relating to the domain stuff: Simplifying a toy logic programming library
Canrun is a new logic programming library for Rust with static types and constraints.
In my initial post I mentioned going through a few fundamental revisions before settling on the current approach. Here I'll try to do a quick recap for two reasons: 1) I think it's neat, and 2) with luck I'll snag the attention of someone with deeper experience and a willingness to share some tips.
My first successful attempt was actually based on an article by Tom Stuart titled "Hello, declarative world"I'm sorry to say that despite The Reasoned Schemer's status as the classic book on miniKanren, it never really clicked for me. Similarly, most of the other resources I've found on were either too deep or too toylike. This is not a critism or complaint! Logic programming is a niche, and I'm very grateful to those who have shared their work., which described a way to implement μKanren in RubyI actually used TypeScript for some of the very earliest prototyping due to its familiarity and relative malleability. So I learned about a logic programming approach that originated in Lisp from a Rubyist, started coding in TypeScript and finally built the thing in Rust. 'cause why not?. I recommend reading that if you find yourself confused about any of the "why" I gloss over here.
Starting Out
The core moving part is the State
. In its most basic form, it consists of a mapping between "logic variables" (we'll call them LVar
) and "logic values" (or Val<T>
). Note that a Val<T>
can contain an LVar
or a resolved T
.
Note: These code samples are heavily abridged. Many aspects including lifetime annotations have been elided for the sake of clarity.
pub struct State<T> {
values: HashMap<LVar, Val<T>>,
}
pub enum Val<T> {
Var(LVar),
Resolved(Rc<T>),
}
The public API essentially provides a way to "attempt" the addition of new bindings (through unification) and resolve existing bindings.
impl State<T> {
pub fn unify(&self, a: &Val<T>, b: &Val<T>) -> StateIter<T>;
pub fn resolve(&self, val: &LVar) -> Option<Val<T>>;
}
A key characteristic of this model is that .unify(...)
returns a new State
instead of mutating Self
.
The return type of .unify(...)
is an alias:
pub type StateIter<T> = Box<dyn Iterator<Item = State<T>>>;
A Goal
is a struct that simply returns an Iterator
of potential states (yes, I said "simply returns an Iterator
"... what's so funny?).
pub enum Goal<T> {
Unify {a: Val<T>, b: Val<T>},
Either { a: Box<Goal<T>>, b: Box<Goal<T>> },
// ... truncated
}
impl Goal<T> {
fn run<T: CanT>(&self, state: &State<T>) -> StateIter<T> {
match self {
Unify {a, b} => {
Box::new(once(state.unify(a, b)))
},
Either {a, b} => {
let a_iter = a.run(state);
let b_iter = b.run(state);
Box::new(a_iter.interleave(b_iter))
}
// ... truncated
}
}
}
These goals can be combined arbitrarily to create more complicated goals.
let goal = Either {
a: Unify { a: x, b: 1},
b: Both {
a: Unify { a: x, b: 2 },
a: Unify { a: y, b: 1 },
},
};
A goal can return as many new states as needed. It worked! Well, eventually. After a small war of attrition against the borrow checker, I finally managed get the streams flowing. Many clones were sacrificed.
With a few helper functions and some trait driven type coercion, the API wasn't too bad!
let state = State::new();
let goal: Goal<i32> = either(
unify(x, 1),
both(unify(x, 2 ), unify(y, 1 )),
);
goal.run(state) // <- returns an iterator of potential states
Smooth Sailing?
I pushed forward, implementing the other types of goals and learning more about ownership, how and when to use enums vs trait objects and more. But I couldn't shake the feeling that I was fighting more than just the borrow checker.
On top of excessive cloning, I was also struggling with a lack of type safety. My initial plan was for the T
type parameter to be a user defined enum.
enum MyType {
Number(i32),
Word(String),
}
// This compiles, but can never succeed!
let goal: Goal<MyType> = unify(
MyType::Number(42),
MyType::Word("42"),
);
Essentially, all relations in the "logic world" were dynamically typed.
I needed a new approach.
Rust is Different
An interesting thing about Rust is that it pulls so much from functional programming, but trying to use a "pure immutable" style as in a typical garbage collected language can get really uncomfortable. This sort of pain is usually a sign that something needs to change.
My new mutable State
adds a .fork(...)
operation and tweaks .unify(...)
:
impl State<T> {
pub fn unify(mut self, a: &Val<T>, b: &Val<T>) -> Option<Self>;
pub fn fork(mut self, fork: Rc<dyn Fork<T>>) -> Option<Self>;
}
pub trait Fork<T> {
fn run(&self, state: State<T>) -> StateIter<T>;
}
Note that both functions take a mut self
and return an Option<Self>
. The .unify(...)
function eagerly attempts to reconcile bindings with those already contained in the state. If unification fails, the entire state is now invalid. We can bail right away with None
and avoid processing any additional updates. Used with the ?
try operator, this can actually be quite smooth.
State::new()
.unify(x, 1)? // <- returns Some(state)
.unify(x, 2)? // <- returns None
The simplest Goal
that would use Fork
is Either
, where you'll get zero or more results states for both sides:
let goal = either(
unify(x, 1),
unify(x, 2),
);
By deferring evaluation of the Fork
objects, we do as much work to disprove as many goals as possible before we eventually split into an arbitrary number of result states.
The Fork
trait's .run(...)
is not invoked until we query for results. At that point, we recurse our way through the list of Fork
items as a queue, branching out at each iteration.
fn iter_forks(mut self) -> StateIter<'a, D> {
let fork = self.forks.pop_front();
match fork {
None => Box::new(once(self)),
Some(fork) => Box::new(fork.run(self).flat_map(State::iter_forks)),
}
}
Note that iter_forks(..)
does not clone the State
. The struct that implements Fork
may clone the State
at the last moment, but only if actually needed.
In a nutshell:
.unify(...)
eagerly converges, constraining theState
into zero or one possible outcomes..fork(...)
lazily diverges, splitting aState
into zero or more alternate possibilities.
Typing the Domain
Statically typed logic programming seems to be a small niche within the niche that is regular logic programming.The most substantial prior art for statically typed logic programming I found was Mercury and OCanren. As is typical, I did not spend nearly enough time trying to glean insight before I set out on my own.
After a walking down a few dead ends, I settled on an approached based on procedural macros.
domain! {
pub MyDomain {
i32,
String
}
}
The domain!
macro can create a struct with associated functions that enable it to store and retrieve collections of values compatible with a user defined domain by typeThis is not quite the same as something like AnyMap, which depends on types having a 'static
lifetime for reasons I don't claim to fully understand.. The generated struct looks (roughly) like this:
pub struct MyDomain {
t0: HashMap<LVar<i32>, Val<i32>>,
t1: HashMap<LVar<String>, Val<String>>,
}
With pseudo-privateMacro generated code is odd. Since it exists inside the the user's module, normal visibility tools do not really work for the macro author. My actual implementation of the domain!
macro has some additional indirection that has a secondary effect of providing a bit of privacy. Ultimately I just had to slap #[doc(hidden)
all over it and ask nicely to be left alone.] accessors that can be used to retreive the inner containers:
impl<'a> DomainType<'a, i32> for MyDomain {
fn values_as_ref(&self) -> &HashMap<LVar<i32>, Val<i32>> {
&self.t0
}
fn values_as_mut(&mut self) -> &mut HashMap<LVar<i32>, Val<i32>> {
&mut self.t0
}
}
The State
struct is now parameterized with a Domain
type, to which it delegates management of the individual Val<T>
containers (some of this is actually spread into other traits/impls, but this is essentially what happens):
impl <D: Domain> State<D> {
pub fn resolve_val<T>(&self, val: &Val<T>) -> &Val<T>
where
D: DomainType<T>,
{
match val {
Val::Var(var) => {
let resolved = self.domain.values_as_ref().get(var);
match resolved {
// We found another Var, try to resolve deeper
Some(found) => self.resolve_val(found),
// We didn't find a binding, return the Var
None => val,
}
}
// This isn't a Var, just return it
value => value,
}
}
}
Goals are also parameterized by Domain
:
// Compiles!
let goal = Goal<MyDomain> = unify(x, 1);
// Does not compile :)
let goal = Goal<MyDomain> = unify(x, vec![1, 2]);
While the macro approach feels a bit idiosyncratic, it does have a few really nice properties. Most notably: explicitly defining all of the types that should be valid in the logic domain greatly increases the helpfulness of compiler errors.
Takeaways
- In Rust (as elsewhere), pain is an indication that you may want to reconsider your approach.
- Studying prior art is important, but take its original context into account to avoid imitating badly. A different context might require different choices.
- David Tolnay's quote and syn libraries make my rather raw proc macro way nicer than it has any business being.
Onward!
As time allows, I'll dig into constraints, the UnifyIn
/Query
/ReifyIn
traits, collections and more!