Higher-kinded polymorphism — the abstraction over a type
constructor to be later supplied with arguments — is often needed,
for expressing generic operations over collections or embedding typed
DSLs, particularly in tagless-final style. Typically, the abstracted type
constructor is not arbitrary, but must implement a particular
interface (e.g., an abstract sequence) — so-called bounded
polymorphism.
OCaml does not support higher-kinded polymorphism directly: OCaml type
variables range over types rather than type constructors, and type
constructors may not appear in type expressions without being applied
to the right number of arguments. Nevertheless, higher-kinded
polymorphism is expressible in OCaml — in fact, in several, more or
less cumbersome ways. The less cumbersome ways are particularly less
known, and kept being rediscovered. This page summarizes the different
ways of expressing, and occasionally avoiding, higher-kinded
polymorphism. They are collected from academic papers and messages on
the caml-list spread over the years — and adjusted to fit the story
and differently explained.
- Introduction
- Why higher-kinded polymorphism is not supported directly in OCaml
- Higher-kinded functions as Functors
- Yallop and White’s Lightweight higher-kinded polymorphism
- Sidestepping higher-kinded polymorphism
- Algebras
- What’s in a higher-kinded type name
- Conclusions
Introduction
- “Polymorphism abstracts types, just as functions abstract
values. Higher-kinded polymorphism takes things a step further,
abstracting both types and types constructors, just as higher-order
functions abstract both first-order values and functions.” — write
Yallop and White (FLOPS 2014).This remarkably concise summary is worth expounding upon, to
demonstrate how (bounded) higher-kinded polymorphism tends to
arise. The example introduced here is used all throughout the page.Summing up numbers frequently occurs in practice; abstracting from
concrete numbers leads to a function — an operation that can be
uniformly performed on any collection (list) of numbers:let rec sumi : int list -> int = function [] -> 0 | h::t -> h + sumi t
We may further abstract over
0
and the operation
+
, which itself is a function (a parameterized value, so to
speak). The result is a higher-order function:let rec foldi (f: int->int->int) (z: int) : int list -> int = function [] -> z | h::t -> f h (foldi f z t)
Folding over a list, say, of floating-point numbers proceeds similarly, so we
may abstract yet again — this time not over values but over the type
int
, replacing it with a type variable:let rec fold (f: 'a->'a->'a) (z: 'a) : 'a list -> 'a = function [] -> z | h::t -> f h (fold f z t)
thus giving us the polymorphic function: the function that describes an
operation performed over lists of various types, uniformly.
The operationf
and the valuez
can be collected into a
parameterized recordtype 'a monoid = {op: 'a->'a->'a; unit: 'a}
The earlier
fold
then takes the formlet rec foldm (m: 'a monoid) : 'a list -> 'a = function [] -> m.unit | h::t -> m.op h (foldm m t)
When using
foldm
on a concrete list of the typet list
,
the type variable'a
gets
instantiated to the typet
of the elements of this list. The type is not
completely arbitrary, however: there must exist the value
t monoid
, to be passed tofoldm
as the argument. We say the type
t
must (at least) implement/support the'a monoid
interface; thet monoid
value is then the witness thatt
indeed does so. Hence the
polymorphism infoldm
is bounded.Exercise: if
'a monoid
really describes a monoid,
op x unit = x
holds. Write a more optimal version offoldm
(and
its subsequent variants) taking advantage of this identity.A file, a string, an array, a sequence — all can be folded over in the
same way. Any collection is foldable so long as it supports the
deconstruction operation, which tells if the collection is empty, or
gives its element and the rest of the sequence. One is tempted to
abstract again — this time not over a mere type likeint
orint list
, but over a type constructor such aslist
, and introducetype ('a,'F) seq = {decon: 'a 'F -> ('a * 'a 'F) option}
This is a hypothetical OCaml: the type variable ‘F (with the
upper-case name) is to be instantiated not with types but one-argument
type constructors: technically, one says it has the higher-kind
* -> *
rather than the ordinary kind*
of types and ordinary type variables
such as'a
. The recordseq
is, hence, higher-kind polymorphic. The
functionfoldm
then generalizes tolet rec folds (m: 'a monoid) (s: ('a,'F) seq) : 'a 'F -> 'a = fun c -> match s.decon c with None -> m.unit | Some (h,t) -> m.op h (folds m s t)
Again,
'F
is instantiated not with just any type constructor, but only that
for which we can find the value('a,'F) seq
; thusfolds
exhibits
bounded higher-kinded polymorphism.Alas, higher-kind type variables are not possible in OCaml. The
next section explains why. The following sections tell what we can do in
OCaml instead. There are several alternatives. In some, the end result
ends up looking almost exactly as the above imagined
higher-kind–polymorphic code.
Why higher-kinded polymorphism is not supported directly in OCaml
- Higher-kind type variables are not supported in OCaml. Yallop and
White’s FLOPS 2014 paper (Sec. 1.1) explains why: in a word, type
aliasing. For completeness, we recount their explanation here, with
modifications.Consider the following two modules:
module Tree = struct type 'a t = Leaf | Branch of 'a t * 'a * 'a t end module TreeA : dcont = struct type 'a t = ('a * 'a) Tree.t end
Here,
'a Tree.t
is a data type: a fresh type, distinct from
all other existing types. On the other hand,'a TreeA.t
is an alias:
as its declaration says, it is equal to an existing type, viz.('a * 'a) Tree.t
.Suppose OCaml had higher-kind
* -> *
type variables, such as'F
hypothesized in the previous section. Type checking is, in the end,
solving/checking type equalities, such as'a 'F = 'b 'G
. If
higher-kind type variables ranged only over data type constructors,
the solution is easy:'a = 'b
and'F = 'G
: a data type is fresh, hence
equal only to itself. This is the situation in Haskell. To ensure that
only data type constructors can be substituted for higher-kind type
variables, a Haskell compiler keeps track of type aliases, even across
module boundaries. Module system in Haskell is rather simple, so such
tracking is unproblematic.Module system of ML is, in contrast, sophisticated. It has functors,
signatures, etc., and extensively relies on type aliases, for example:module F(T: sig type 'a t val empty: 'a t end) = struct type 'a ft = 'a T.t end
If we preclude substitution of type aliases for higher-kind type
variables, we severely restrict expressiveness. For example,'a ft
above is a type alias; henceF(TRee).ft
cannot be substituted for a
higher-kind type variable, even though one may feelF(TRee).ft
is
the same asTree.t
, which is substitutable.On the other hand, if we allow type aliases to be substituted for
higher-kind type variables, the equivalence of'a 'F = 'b 'G
and'a = 'b, 'F = 'G
breaks down. Indeed, consider(int*int) 'F = int 'G
. This equation now has the solution:'F = Tree.t
and'G = TreeA.t
. Parameterized type aliases like'a TreeA.t
are type
functions, and type expressions likeint TreeA.t
are applications of
those functions, expanding to the right-hand-side of the alias
declaration with'a
substituted forint
. Thus, with type aliases,
the type equality problem becomes the higher-order unification problem,
which is not decidable. - References
- Jeremy Yallop and Leo White: Lightweight higher-kinded
polymorphism. FLOPS 2014.
Higher-kinded functions as Functors
- Although OCaml does not support higher-kind type variables, higher-kinded
polymorphism is not out of the question. There are other ways of
parameterizing by a type constructor: the module system (functor)
abstraction is the first to come to mind.
It is however rather verbose and cumbersome. Let us see.We now re-write the hypothetical higher-kind–polymorphic OCaml code
at the end of [Introduction] in the real OCaml — by raising the level, so to
speak, from term-level to module-level. The hypothetical recordtype ('a,'F) seq = {decon: 'a 'F -> ('a * 'a 'F) option}
becomes the module signature
module type seq_i = sig type 'a t (* sequence type *) val decon : 'a t -> ('a * 'a t) option end
which represents the higher-kind type variable
'F
, not supported in
OCaml, with an ordinary type constructort
(type
constant). Different implementations ofseq_i
(see, e.g.,ListS
below) instantiate'a t
in their own ways; hence
t
does in effect act like a variable. The hypothetical
higher-kind–polymorphic functionlet rec folds (m: 'a monoid) (s: ('a,'F) seq) : 'a 'F -> 'a = fun c -> match s.decon c with None -> m.unit | Some (h,t) -> m.op h (folds m s t)
becomes the functor, parameterized by the
seq_i
signature:module FoldS(S:seq_i) = struct let rec fold (m: 'a monoid) : 'a S.t -> 'a = fun c -> match S.decon c with None -> m.unit | Some (h,t) -> m.op h (fold m t) end
We got what we wanted: abstraction over a sequence. To use
it to define other higher-kinded polymorphic functions, such as
sums
to sum up a sequence, we also
need functors. Functors are infectious, one may say.module SumS(S:seq_i) = struct open S open FoldS(S) let sum : int t -> int = fold monoid_plus end
Finally, an example of instantiating and using
higher-kind–polymorphic functions: summing a list. First we need an
instance ofseq_i
for a list: the witness that a list is a sequence.module ListS = struct type 'a t = 'a list let decon = function [] -> None | h::t -> Some (h,t) end
which we pass to the
SumS
functor:let 6 = let module M = SumS(ListS) in M.sum [1;2;3]
The accompanying code shows another example: using the same
SumS
to
sum up an array, which also can be made a sequence.Thus in this approach, all higher-kind–polymorphic functions are
functors, which leads to verbosity, awkwardness and boilerplate. For
example, we cannot even write aSumS
application asSumS(ListS).sum [1;2;3]
; we have to use the verbose expression above. - References
- HKPoly_seq.ml [11K]
The complete code with tests and other examples
Yallop and White’s Lightweight higher-kinded polymorphism
- Perhaps surprisingly, higher-kinded polymorphism can always be reduced
to the ordinary polymorphism, as Yallop and White’s FLOPS 2014 paper
cleverly demonstrated. They explained their approach as
defunctionalization. Here we recap it and explain in a different way.Consider the type
'a list
again. It is a parameterized type:'a
is
the type of elements, andlist
is the name of the collection: `the
base name’, so to speak. The combination of the element type and the
base name can be expressed differently, for example, as
('a,list_name) app
, where('a,'b) app
is some fixed type, and
list_name
is the ordinary type that tells the base name. The fact
that the two representations are equivalent is witnessed by the
bijection:inj: 'a list -> ('a,list_name) app prj: ('a,list_name) app -> 'a list
Here is a way to implement it. First, we introduce the dedicated
`pairing’ data type. It is extensible, to let us define as many
pairings as needed.type ('a,'b) app = ..
For
'a list
, we have:type list_name type ('a,'b) app += List_name : 'a list -> ('a,list_name) app
In this case the bijection
'a list <-> ('a,list_name) app
is:let inj x = List_name x and let prj (List_name x) = x
and the two functions are indeed inverses of each other.
Exercise: Actually, that the above
inj
andprj
are
the inverses of each other is not as straightforward. It requires a
side-condition, which is satisfied in our case. State it.In this new representation of the polymorphic list as
('a,list_name) app
, the base namelist_name
is the ordinary (kind*
)
type. Abstraction over it is straightforward: replacing with a
type variable. The base-name-polymorphism is, hence, the ordinary
polymorphism. We can then write the desired sequence-polymorphic
folds
almost literally as the hypothetical code at the end of [Introduction]:type ('a,'n) seq = {decon: ('a,'n) app -> ('a * ('a,'n) app) option} let rec folds (m: 'a monoid) (s: ('a,'n) seq) : ('a,'n) app -> 'a = fun c -> match s.decon c with None -> m.unit | Some (h,t) -> m.op h (folds m s t)
Instead of
'a 'F
we write('a,'n) app
. That’s it. Usingfolds
in
other higher-kinded functions is straightforward, as if it were
a regular polymorphic function (which it actually is):let sums s c = folds monoid_plus s c (* val sums : (int, 'a) seq -> (int, 'a) app -> int =
*) Type annotations are not necessary: the type inference works. Here is a
usage example, summing a list:let list_seq : ('a,list_name) seq = {decon = fun (List_name l) -> match l with [] -> None | h::t -> Some (h,List_name t)} let 6 = sums list_seq (List_name [1;2;3])
There is still a bit of awkwardness remains: the user have to think up
the base name likelist_name
and the tag likeList_name
, and
ensure uniqueness. Yallop and White automate using the module system,
see the code accompanying this page, or Yallop and White’s paper (and
the Opam package `higher’).We shall return to Yallop and White’s approach later on this page,
with another perspective and implementation. - References
- Jeremy Yallop and Leo White: Lightweight higher-kinded
polymorphism. FLOPS 2014.HKPoly_seq.ml [11K]
The complete code with tests and other examples
Sidestepping higher-kinded polymorphism
- At times, higher-kinded polymorphism can be avoided altogether: upon
close inspection it may turn out that the problem at hand does not
actually require higher-kinded polymorphism. In fact, our running
example is such a problem.Let us examine the sequence interface,
parameterized both by the type of the sequence elements and the sequence
itself. The definition that first comes to mind, which cannot be written
as such in OCaml, is (from Introduction):type ('a,'F) seq = {decon: 'a 'F -> ('a * 'a 'F) option}
It has a peculiarity: the sole operation
decon
consumes and produces
sequences of the same type'a 'F
(i.e., the same sort of sequence
with the elements of the same type). That is,'F
always occurs as the type'a 'F
, where'a
isseq
‘s parameter:'a
and
'F
do not vary independently. Therefore, there is actually
no higher-kinded polymorphism here. The sequence interface can be written
simply astype ('a,'t) seq = {decon: 't -> ('a * 't) option}
with
folds
taking exactly the desired form:let rec folds (m: 'a monoid) (s: ('a,'t) seq) : 't -> 'a = fun c -> match s.decon c with None -> m.unit | Some (h,t) -> m.op h (folds m s t)
It is the ordinary polymorphic function. There is no problem in using
it to define other such sequence-polymorphic functions, e.g.:let sums s c = folds monoid_plus s c (* val folds : 'a monoid -> ('a, 't) seq -> 't -> 'a =
*) and applying it, say, to a list:
let list_seq : ('a,'a list) seq = {decon = function [] -> None | h::t -> Some (h,t)} let 6 = sums list_seq [1;2;3]
Exercise: Consider the interface of collections that may be
`mapped’, in the hypothetical OCaml with higher-kind type variables:type ('a,'b,'F) ftor = {map: ('a->'b) -> ('a 'F -> 'b 'F)}
Now
'F
is applied to different types. Can this interface be still
expressed using the ordinary polymorphism, or higher-kinded
polymorphism is really needed here?Looking very closely at the higher-kinded polymorphic interface
('a,'F) seq
and the ordinary polymorphic('a,'t) seq
, one may
notice that the latter is larger. The higher-kinded interface
describes only polymorphic sequences such as'a list
, whereas
('a,'t) seq
applies also to files, strings, buffers, etc. Such an
enlargement is welcome here: we can apply the samefolds
to
sequences whose structure is optimized for the type of their
elements. In Haskell terms,('a,'t) seq
corresponds to `data
families’, a later Haskell extension. Here is an
example, of applyingfolds
to a string, which is not a polymorphic
sequence:let string_seq : (char,int*string) seq = {decon = fun (i,s) -> if i >= String.length s || i < 0 then None else Some (s.[i], (i+1, s))} let 'c' = folds monoid_maxchar string_seq (0,"bca")
We can hence use
folds
with any collection, polymorphic or not, for
which there is an implementation of the('a,'t) seq
interface. We have
encountered the old, and very useful trick: enlarging the type but
restricting the set of its values by having to be able to define `witnesses'
such as('a,'t) seq
.Exercise: Yallop and White's approach can also deal with
non-polymorphic collections. Use it to implementstring_seq
. - References
- HKPoly_seq.ml [11K]
The complete code with tests and other examples
Algebras
- The running example, the
seq
interface, was about deconstruction of
sequences: technically, about co-algebras. Let us now turn to
construction: building of values using a fixed set of operations,
which can be considered an embedded DSL. The abstraction over a DSL
implementation gives rise to polymorphism. If the embedded
DSL is typed, the polymorphism becomes higher-kinded -- as commonly seen
in DSL embeddings in tagless-final style.Here we briefly recount how the higher-kinded polymorphism arises in DSL
embeddings, and how it can be hidden away. The key idea is initial
algebra, which is, by definition, the abstraction over any concrete
algebra of the same signature, i.e., the abstraction over DSL implementations.Our running example in this section is a simple programming language
with integers and booleans: a dialect of the language used in Chap. 3
of Pierce's `Types and Programming Languages' (TAPL). Here is the familiar
tagless-final embedding in OCaml. The grammar of the language is
represented as an OCaml signature:module type sym = sig type 'a repr val int : int -> int repr val add : int repr -> int repr -> int repr val iszero : int repr -> bool repr val if_ : bool repr -> 'a repr -> 'a repr -> 'a repr end
The language is typed; therefore, the type
'a repr
, which represents DSL
terms, is indexed by the term's type: anint
or abool
. The
signaturesym
also defines the type system of the DSL: almost like
in TAPL, but with the typing rules written in a
vertical-space--economic way.Here is a sample term of the DSL:
module SymEx1(I:sym) = struct open I let t1 = add (add (int 1) (int 2)) (int 3) (* intermediate binding *) let res = if_ (iszero t1) (int 0) (add t1 (int 1)) end
It is written as a functor parameterized by
sym
: a DSL
implementation is abstracted out. The term is polymorphic over
sym
and, hence, may be evaluated in any implementation of the DSL. Since
sym
contains a higher-kinded typerepr
, the polymorphism is
higher-kinded.The just presented (tagless-final) DSL embedding followed the
approach described in [Higher-kinded functions as Functors]. Let us move away from functors
to ordinary terms. Actually, we never quite escape functors, but we
hide them in terms, relying on first-class modules.
As we have seen, a DSL term of the typeint
such asSymEx1
is the functorfunctor (I:sym) -> sig val res : int I.repr end
To abstract over
int
, we wrap it into a modulemodule type symF = sig type a module Term(I:sym) : sig val res : a I.repr end end
which can then be turned into ordinary polymorphic type:
type 'a sym_term = (module (symF with type a = 'a))
which lets us represent the functor
SymEx1
as an ordinary OCaml value:let sym_ex1 : _ sym_term = (module struct type a = int module Term = SymEx1 end)
Here, the type annotation is needed. However, we let the type of the term
to be_
, as a schematic variable. OCaml infers it asint
.
If we have an implementation ofsym
, say, moduleR
, we can use it
to run the example (and obtain thesym_ex1
's value inR
's interpretation):let _ = let module N = (val sym_ex1) in let module M = N.Term(R) in M.res
The type
'a sym_term
can itself implement thesym
signature, in
a `tautological' sort of way:module SymSelf : (sym with type 'a repr = 'a sym_term) = struct type 'a repr = 'a sym_term let int : int -> int repr = fun n -> let module M(I:sym) = struct let res = I.int n end in (module struct type a = int module Term = M end) let add : int repr -> int repr -> int repr = fun (module E1) (module E2) -> let module M(I:sym) = struct module E1T = E1.Term(I) module E2T = E2.Term(I) let res = I.add (E1T.res) (E2T.res) end in (module struct type a = int module Term = M end) ... end
That was a mouthful. But writing
sym
DSL terms becomes much
easier, with no functors and no type annotations. The earliersym_ex1
can now be written aslet sym_ex1 = let open SymSelf in let t1 = add (add (int 1) (int 2)) (int 3) in (* intermediate binding *) if_ (iszero t1) (int 0) (add t1 (int 1))
It can be evaluated in
R
or other implementation as shown before.Technically,
SymSelf
is the initial algebra: an implementation of
the DSL that can be mapped to any other implementation, and in a
unique way. That means its terms likesym_ex1
can be evaluated in
anysym
DSL implementation: they are polymorphic over DSL implementation.On the down-side, we have
SymSelf
, which is the epitome of
boilerplate: utterly trivial and voluminous code that has to be
written. On the up side, writing DSL terms cannot be easier: no type
annotations, no functors, no implementation passing -- and no overt
polymorphism, higher-kind or even the ordinary kind. Still, the terms
can be evaluated in any implementation of the DSL.Exercise: Apply Yallop and White's method to this DSL
example. Hint: the first example in Yallop and White's paper, monad
representations, is an example of a DSL embedding in tagless-final
style. - References
- HKPoly_tf.ml [13K]
The complete code with tests and detailed development
The initial algebra construction using first-class functors,
in the case of one-sorted algebras (corresponding to untyped DSLs)Stephen Dolan: phantom type. Message on the caml-list
posted on Mon, 27 Apr 2015 12:51:11 +0100
What's in a higher-kinded type name
- We now look back at Yallop and White's approach of reducing
higher-kinded polymorphism to the ordinary polymorphism, from a
different perspective. It gives if not a new insight, at least new
implementations.A polymorphic type like
'a list
represents a family of types,
indexed by a type (of list elements, in this example). A higher-kinded
type abstraction such as'a 'F
with the hypothetical (in OCaml)
higher-kind type variable'F
is the abstraction over a family name,
so to speak, while still keeping track of the index. Here is another
way of accomplishing such an abstraction.Consider the existential type
exists a. a list
(realizable in OCaml
in several ways, although not in the shown notation. We will keep the
notation for clarity). The existential is now the ordinary, rank*
type and can be abstracted in a type variable, e.g.,'d
. The `family
name' is, hence, the family type with the hidden index. We have
lost track of the index, however. Therefore, we tack it back, ending
up with the type('a,'d) hk
. Thus(t,exists a. a list) hk
is meant
to be the same ast list
(for any typet
).There is a problem however:
('a,'d) hk
is a much bigger type. We
need the condition that in(t,exists a. a list) hk
, the indext
is
exactly the one that we hid in the existential quantification -- we
need dependent pairs, not supported in OCaml. Remember the old trick,
however: we may have a bigger type so long as we control the producers
of its values and ensure only the values satisfying the condition are
built. To be concrete, we must make certain that the only way to
produce('a,'d) hk
values is by using functions likeinj: 'a list -> ('a, exists a. a list) hk
that expose the same index they hide.
At some point the type checker will demand a
proof: when implementing the
inverse mapping('a, exists a. a list) hk -> 'a list
and extracting
the list out of the existential. There are several ways of going about
the proof.The simplest is to give our word -- that the condition always holds for
all('a,'d) hk
values actually produced, and we have a proof of that
on some piece of paper or in a.v
file. This leads to the exceptionally
simple implementation, which does nothing at all
(all of its operations are the identity).module HK : sig type ('a,'d) hk (* abstract *) module MakeHK : functor (S: sig type 'a t end) -> sig type anyt (* also abstract *) val inj : 'a S.t -> ('a,anyt) hk val prj : ('a,anyt) hk -> 'a S.t end end = struct type ('a,'d) hk = 'd module MakeHK(S:sig type 'a t end) = struct type anyt = Obj.t let inj : 'a S.t -> ('a,anyt) hk = Obj.repr let prj : ('a,anyt) hk -> 'a S.t = Obj.obj end end
The accompanying code shows a different, also quite simple
implementation without anyObj
magic.After enriching the
sym
signature of the DSL from the previous
section with fake higher-kinded types:module type sym_hk = sig include sym include module type of HK.MakeHK(struct type 'a t = 'a repr end) end
we can write the earlier
SymEx1
example as a function (a term)
rather than a functor:let sym_ex1 (type d) (module I:(sym_hk with type anyt=d)) : (_,d) HK.hk = let open I in let t1 = add (add (int 1) (int 2)) (int 3) |> inj in (* intermediate term *) let res = if_ (iszero t1) (int 0) (add t1 (int 1)) in inj res
It can be evaluated simply as
sym_ex1 (module RHK) |> RHK.prj
where
RHK
is a module implementingsym_hk
. Incidentally, ifSHK
is another module implementingsym_hk
and we attempt
sym_ex1 (module RHK) |> SHK.prj
, we discover that
(int,RHK.anyt) bk
and(int,SHK.anyt) bk
are actually different
types. AlthoughHK
does not do anything (at runtime), it does
maintain safety and soundness. - References
- HKPoly_tf.ml [13K]
The complete code with tests and detailed development
Conclusions
- We have surveyed various ways of abstracting over a type
constructor -- or, writing interface-parameterized terms
when the interface involves a polymorphic type. Even if the
language does not support type-constructor--polymorphism directly,
such interface parameterization can still be realized, as:- interface abstraction as a functor abstraction
- reducing higher-kinded polymorphism to ordinary polymorphism,
by establishing a bijection between type constructors and
ordinary types - hiding the polymorphism over DSL implementations
behind initial algebra
(if the interface is algebraic, as often happens in tagless-final
DSL embeddings) - and in some problems, higher-kinded polymorphism is not actually
needed, on close inspection