The Basic Principles of Rust Modules
This is apparently quite a contentious opinion, but after having worked with Rust's modules for a while, I generally like them. However, it's not contentious at all to say that they are complicated and hard to understand for someone looking at them for the first time. So, let's talk about Rust modules, building up some principles for understanding them as we go.
Important note: this blog post represents the state of Rust modules as of the time of writing. There are changes in store, and the module system as used in the Rust of 2019 is gearing up to be pretty different![1]
Part One: Modules
Before I get to actual code snippets, let's talk about the first two basic principles:
- Every named item in Rust (functions, types, values, modules) can be made public by prefixing it with the pub keyword.
- Every compilation unit (a library or an executable, what Rust calls a crate) contains a 'root' module. For a library, this root module is defined in lib.rs, and for an executable, it's defined in main.rs.
Let's say I make a fresh Cargo project that's a library. I get a file at src/lib.rs. Every item defined in that file lives in the root module, so let's write a simple function here:
// in src/lib.rs
pub fn hello() {
    println!("Hello!");
}
Inside of this module, we can refer to this function using the unqualified name hello, but we can also use a qualified name, which represents a path through our module hierarchy to get to the name. In this case, hello is exposed through the root module, so the qualified name for hello is ::hello, which leads us to another principle:
- The root of the current compilation unit's module hierarchy is indicated in qualified paths with an initial ::.
So let's make a new module inside of the root module. Submodules are always indicated with the mod keyword, but (as we'll see) this keyword can be deployed in different ways depending on where we want to implement the module. For now, we'll define the contents of the module inside the same source file, in a curly-brace-delimited block headed by pub mod my_module_name:
// in src/lib.rs
pub mod english {
    // #1
    pub fn hello() {
        println!("Hello!");
    }
}

// #2
pub fn hello() {
    ::english::hello();
}
At this point, the root module contains two public items: ::hello, a function, and ::english, a module. In turn, the module ::english contains a public item: a function ::english::hello. In this case, I'm using fully qualified names for disambiguation: this is useful here because I have two different functions, both named hello. Without explicit qualification, the name hello will refer to different things depending on the module scope: when referenced inside the english module, hello refers to the function that I've indicated with the comment // #1, and outside of the english module, hello refers to the function I've indicated with the comment // #2.
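To see that scope-dependent lookup in action, here is a small sketch in which the two functions return distinguishable strings instead of printing. The return values and the helper names inner_call and outer_call are my own additions for illustration, and the paths are deliberately written without a leading :: so the sketch doesn't depend on how the root is spelled.

```rust
pub mod english {
    // #1
    pub fn hello() -> &'static str {
        "english::hello"
    }

    // Inside `english`, the unqualified name `hello` resolves to #1.
    pub fn inner_call() -> &'static str {
        hello()
    }
}

// #2
pub fn hello() -> &'static str {
    "root hello"
}

// In the root module, the unqualified name `hello` resolves to #2.
pub fn outer_call() -> &'static str {
    hello()
}
```

Calling english::inner_call() and outer_call() gives back two different strings even though both bodies are just hello().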
While there is never 'ambiguity' from the point of view of the Rust compiler, having the same name in different scopes can be confusing for us as readers and writers of code. We can always be explicit and use fully qualified names to alleviate any confusion. For example, we can (although I don't know why you would) write the following code:
// in src/lib.rs
pub mod english {
    pub fn hello() {
        println!("Hello!");
    }

    pub fn outer_hello() {
        ::hello();
    }
}

pub fn hello() {
    ::english::hello();
}
In this example, we've added the new function ::english::outer_hello, which invokes ::hello using a fully qualified path, which in turn invokes ::english::hello using a fully qualified path.
So we've discovered a new principle:
- Name lookup in expressions is relative to the module in which the expression appears unless the name is fully qualified.
There are also ways of traversing the module hierarchy with relative names. The above example can be rewritten using relative names like so:
// in src/lib.rs
pub mod english {
    pub fn hello() {
        println!("Hello!");
    }

    pub fn outer_hello() {
        super::hello();
    }
}

pub fn hello() {
    english::hello();
}
The expression english::hello() inside of the function ::hello is in the root module, and therefore the name is looked up relative to the root module. The declaration of ::english::outer_hello, however, is relative to the module ::english, so if we want a relative reference to ::hello, we need to first move up a module level with the super keyword in order to access the item we want in the parent module. There's also a self keyword, which allows us to look up names relative to the current module. That keyword is redundant here (we're already looking up names relative to the current module by default!) but we'll find a use for it later on.
- You can use super and self in qualified names to traverse the module hierarchy in a relative way.
(Note also that super and self only work at the beginning of a path: that means we can't write a perverse relative path like ::english::super::english::hello, not that you'd want to write that anyway!)
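One thing that restriction doesn't rule out is repeating super at the start of a path to climb more than one level. A minimal sketch, assuming a hypothetical plurals submodule nested inside english (the function names and return values are also made up for illustration):

```rust
pub fn hello() -> &'static str {
    "Hello!"
}

pub mod english {
    pub fn hello() -> &'static str {
        "Hello from english!"
    }

    pub mod plurals {
        // One `super` climbs from `plurals` to `english`.
        pub fn parent_hello() -> &'static str {
            super::hello()
        }

        // Two leading `super`s climb all the way to the root module.
        pub fn root_hello() -> &'static str {
            super::super::hello()
        }

        // `self` names the current module, so this simply calls `parent_hello`.
        pub fn same_as_parent_hello() -> &'static str {
            self::parent_hello()
        }
    }
}
```

In every one of these paths, the super and self segments come first, and ordinary names follow.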
There are also two other ways of creating a module, but all of them use the mod keyword in some way. So far, we've defined a module using a curly-brace-delimited block in a single file, but we can also create a module by placing the definitions of items in that module into a separate file with the same name as the module. In these examples, because our module is called english, we can put declarations into the file src/english.rs, so our two source files will look like
// in src/lib.rs
pub mod english;

pub fn hello() {
    self::english::hello();
}
and
// in src/english.rs
pub fn hello() {
    println!("Hello!");
}

pub fn call_outer_hello() {
    super::hello();
}
Notice that we've kept the pub mod english declaration in the root module, but we've removed the body of that declaration. At the same time, the contents of src/english.rs are (modulo indentation, the comment, and my renaming of outer_hello to call_outer_hello) identical to the contents of the english module before, and similarly none of the rest of the code had to change at all. This reorganization is purely for our benefit: the code does not care whether we use an inline module or an external module.
There's a third way we could have organized our code, as well: instead of creating a file named after our module, we could have created a directory named after our module, and included a file called mod.rs inside that directory. Again, we could have equivalently written these modules:
// in src/lib.rs
pub mod english;

pub fn hello() {
    self::english::hello();
}
and
// in src/english/mod.rs
pub fn hello() {
    println!("Hello!");
}

pub fn call_outer_hello() {
    super::hello();
}
Notice that even less had to change this time: the contents of the files are identical to before, and only the on-disk organization has changed! Again, from Rust's point of view, the tree of modules and items we've created here is identical. This final on-disk organization method is more convenient if we want to create other nested submodules that are contained in their own files: if we wanted to have a module ::english::plurals, then we could define the module ::english in the file src/english/mod.rs, and then define the module ::english::plurals in src/english/plurals.rs[2]. If we used either of the other two on-disk organization methods, then we would either have to write nested module blocks like
pub mod english {
    pub mod plurals {
        // ...
    }
}
Or we would have to have a pub mod plurals { ... } block inside of the source file src/english.rs.
However, like I said, all of these choices are mostly important to us as programmers structuring a project on-disk and maintaining it as a repository, but from Rust's point of view, they define an identical namespace. In summary:
- Modules can be defined in three ways (using lexical pub mod my_module { ... } blocks, using a src/my_module.rs file, or using a src/my_module/mod.rs file), but the choice of approach is immaterial to the namespace structure being constructed.
This can be a powerful way of growing a project: a module might start life as one or two types and functions inside a source file, and as they grow, they can get moved first into another file, and then into a directory that itself contains several more files, and this can happen transparently, without any user of the code (inside or outside my crate) having to be aware of it.
Now we can move on to
Part Two: Extern Crates
The extern crate declaration does two things at once: it specifies a dependency on an external named crate, and it includes that crate's root module as a named module in the current module's namespace.
Let's build some randomness into our example code. We will use the rand crate to get some random numbers, and use them to choose from a list of greetings.
// in src/lib.rs
// importing an external crate in the root
extern crate rand;

// our statically-chosen list of greetings
static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    // choose a random valid index
    let index = ::rand::random::<usize>() % GREETINGS.len();
    // and print the chosen greeting
    println!("{}", GREETINGS[index]);
}
Notice that the library functions in the rand crate are now available as though they were defined in a module called rand in the root of our crate. By default, the name of the imported module is the same as the name of the crate, but we can change that! If we want a different module structure, we can use the as keyword to rename the imported module to a different name. In this case, let's call it my_rand:
// in src/lib.rs
// importing an external crate in the root, with renaming
extern crate rand as my_rand;

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    let index = ::my_rand::random::<usize>() % GREETINGS.len();
    println!("{}", GREETINGS[index]);
}
Now, we've separated out the specification of the external name of the crate we want (the rand crate) from the name we're giving its root module in our internal module hierarchy (::my_rand).
Unlike in many programming languages, we can declare extern crate dependencies anywhere in the module hierarchy we want! So, for example, let's import rand underneath a new module. Notice that we need to include a pub qualifier to make sure the module it creates is accessible outside of the deps module. (Remember that all names brought into scope must be made explicitly public with the pub keyword!)
// in src/lib.rs
// importing an external crate in a submodule
pub mod deps {
    pub extern crate rand as my_rand;
}

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    let index = ::deps::my_rand::random::<usize>() % ::GREETINGS.len();
    println!("{}", ::GREETINGS[index]);
}
This is a pretty stark departure from how most languages think of external dependencies: usually, they just get placed in some kind of root namespace, and you might be able to import them under a different name (like with Python's import module as name syntax). In Rust, you have full control over your own module hierarchy, and so you can insert the root module of an external crate wherever you'd like.
- The extern crate declaration specifies a dependency on a named external crate and imports that crate's root module as a named module in the current module scope.
- The name of a module created by extern crate defaults to the name of the crate, but can be explicitly specified with the as keyword.
As an aside: why would you want to do this? Well, one reason is that you might have a lot of external dependencies, and it would be nice to manage them. Imagine that you're writing a video game, and therefore depend on a lot of libraries for things like graphics and physics and so forth. You could therefore imagine wanting to both organize and rename your dependencies in order to manage them, something along the lines of:
pub mod graphics {
    pub extern crate opengl as gl;
    pub extern crate sdl;
}

pub mod img {
    pub extern crate png;
    pub extern crate libjpeg as jpeg;
}
Now you can organize your dependencies in a way that's convenient and structured!
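Here is a compilable sketch of mounting a crate under a submodule and reaching it by that path. Real dependencies need entries in Cargo.toml, so as a stand-in I'm assuming the core crate, which ships with the compiler; the names deps, my_core, and larger are hypothetical.

```rust
pub mod deps {
    // `core` stands in for a real dependency such as `rand`;
    // `as my_core` picks the name of the module it is mounted as.
    pub extern crate core as my_core;
}

pub fn larger(a: u32, b: u32) -> u32 {
    // The crate's root module is reachable at the path where we mounted it.
    deps::my_core::cmp::max(a, b)
}
```

The call site never mentions core directly: it only knows about the name we chose inside our own hierarchy.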
Part Three: Uses
The last thing we haven't talked about is the use declaration, which is given a qualified path and creates a new local name for the item named by that path.
- The use keyword pulls in names from another scope into the current one.
We often use this with functions or types defined by external crates, so we can pull them into scope with more convenient local names, but it works with any named item, including modules.
It's not uncommon to see declarations like use std::fs;, which allows the module ::std::fs to be accessed locally as fs without the std:: prefix. This brings a module, not a function or type, into the local scope.
// in src/lib.rs
extern crate rand as my_rand;
// ...but pulling the random function into scope here
use my_rand::random;

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    // we can now use random unqualified
    let index = random::<usize>() % GREETINGS.len();
    println!("{}", GREETINGS[index]);
}
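The same mechanism works when the thing being pulled in is a whole module rather than a function. A minimal sketch, using std::cmp so that it compiles without any external crate (the function name smaller is hypothetical):

```rust
// Bring the module `std::cmp` into scope under the local name `cmp`.
use std::cmp;

pub fn smaller(a: i32, b: i32) -> i32 {
    // `cmp` is a module, so we still qualify the function with it.
    cmp::min(a, b)
}
```

Importing the module instead of the function keeps a little context at the call site: cmp::min says more than a bare min would.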
Confusingly, use declarations work differently from expressions, because names in use declarations are always relative to the root. Consider the example below: because the extern crate rand as my_rand was declared in the root module, the fully qualified name of the random function is ::my_rand::random, but when I use that name inside the english module, I give it a relative path as though I'm looking up the symbol from the root.
// in src/lib.rs
extern crate rand as my_rand;

pub mod english {
    use my_rand::random;

    static GREETINGS: &[&'static str] = &[
        "Hello!", "Heya!", "Greetings!",
    ];

    pub fn hello() {
        let index = random::<usize>() % GREETINGS.len();
        println!("{}", GREETINGS[index]);
    }
}

pub fn hello() {
    english::hello();
}
If I do want to use a name that's relative to the current module, I can use the self keyword in the path, which starts me instead from the module containing the use declaration. In the below example, we've moved the extern crate line inside of the module, so now the my_rand module lives in ::english::my_rand, and we then use an explicit relative path to pull in the function we want.
// in src/lib.rs
pub mod english {
    extern crate rand as my_rand;
    use self::my_rand::random;

    static GREETINGS: &[&'static str] = &[
        "Hello!",
        "Heya!",
        "Greetings!",
    ];

    pub fn hello() {
        let index = random::<usize>() % GREETINGS.len();
        println!("{}", GREETINGS[index]);
    }
}

pub fn hello() {
    english::hello();
}
So our last principle is:
- Name lookup in use declarations is relative to the root module unless the name is explicitly made relative with the self keyword.
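As a sketch of explicitly relative use paths that needs no external crate: both self and super can start a use path. The plurals submodule and the function names below are hypothetical, invented only to give the paths something to point at.

```rust
pub fn greeting() -> &'static str {
    "Hello"
}

pub mod english {
    // `super::` starts from the parent of this module (here, the root).
    use super::greeting;
    // `self::` starts from the module containing the `use` declaration.
    use self::plurals::suffix;

    pub mod plurals {
        pub fn suffix() -> &'static str {
            ", everyone!"
        }
    }

    pub fn hello_everyone() -> String {
        // Both names are now usable unqualified inside `english`.
        format!("{}{}", greeting(), suffix())
    }
}
```

Because both paths are anchored to the module that contains them, moving the whole english block deeper into the hierarchy would only require adjusting the super path.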
I also can use the pub keyword on uses to pull names into a module while simultaneously exposing those names outside of that module. Sometimes I like to do this to carefully namespace a subset of external names I plan on using, allowing me to group several libraries together, or pick-and-choose items from separate but related compilation units into a single module for local use.
pub mod io {
    pub use std::fs::File;
    pub use std::io::*;
}
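To show what using such a facade module looks like from the outside, here is a minimal sketch. The module name collections and the function count_words are hypothetical; the re-exported types are the standard library's own.

```rust
// A facade module that re-exports a hand-picked set of names.
pub mod collections {
    pub use std::collections::BTreeMap;
    pub use std::collections::HashMap;
}

// Code outside the facade reaches the types through our own path.
pub fn count_words(text: &str) -> collections::HashMap<&str, usize> {
    let mut counts = collections::HashMap::new();
    for word in text.split_whitespace() {
        // Insert a zero the first time a word is seen, then increment.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

If I later wanted to swap HashMap for a different map type, the re-export is the one place I'd start, although callers that name the concrete type in signatures would still need a look.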
All The Principles In One Place
To summarize, I'm going to pull all the principles from the above text into a single list:
- Every named item in Rust (functions, types, values, modules) can be made public by prefixing it with the pub keyword.
- Every compilation unit (a library or an executable, what Rust calls a crate) contains a 'root' module. For a library, this root module is defined in lib.rs, and for an executable, it's defined in main.rs.
- The root of the module hierarchy is indicated in qualified paths with an initial ::.
- Name lookup in expressions is relative to the module in which the expression appears unless the name is fully qualified.
- You can use super and self in qualified names to traverse the module hierarchy in a relative way.
- Modules can be defined in three ways (using lexical pub mod my_module { ... } blocks, using a src/my_module.rs file, or using a src/my_module/mod.rs file), but the choice of approach is immaterial to the namespace structure being constructed.
- The extern crate declaration specifies a dependency on a named external crate and imports that crate's root module as a named module in the current module scope.
- The name of a module created by extern crate defaults to the name of the crate, but can be explicitly specified with the as keyword.
- The use keyword pulls in names from another scope into the current one.
- Name lookup in use declarations is relative to the root module unless the name is explicitly made relative with the self keyword.
This is a lot to take in at once, and it can be confusingly different from other languages, but the biggest difference is that your module tree is always explicit. The drawback is that it's verbose and sometimes offers flexibility that can be unnecessary or confusing, but the advantage is that what's happening is always explicit in the code, and also that you can express exactly the hierarchy of names and namespaces that you want.
[1] Is it a pun to say 'gearing up' when talking about Rust?
[2] Or, in src/english/plurals/mod.rs, of course!