This is part of the series Advent of Code 2022.
Day 7 - Advent of Code 2022

The challenge of today, day 7, deals with an issue of the device the elves gave you earlier. In particular a system update fails due to no space left on the device. Time to have a loot at the file system for something that can be deleted.

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

The input consists of the commands cd, ls that start with the $ prompt symbol. The cd command is to change a directory while the ls will list files & directories. Files also show their size. The complete output describes a file tree with plain files & directories, which themselves can contain files & directories.

The expected filesystem from the given input looks like:

- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)

On top there is the root directory / with directories a & d. Directory a itself has directory e in it. They contain files of different sizes.

Part 1

The task is to find all directories (including their subtrees) with a total size of at most 100.000. The calculate the sum of their total sizes. For examplet he outermost directory has a total of 48.381.165 because it contains all other directories & files, while directory a has a size of 94853.

This seems like a good opportunity to have another go with the peg parser crate to read in all the lines to convert them into usable types. This crate was also used on day 5. There are four different types of input lines, cd command, ls command, a file with name and size & a dir with a name. To parse these lines first let's add an enum that can represent any of these types, quite lazily named Line.

#[derive(Debug, PartialEq)]
enum Line {
    Cd(String),
    Ls,
    Dir(String),
    File(u64, String),
}

Then the parser will read in the input line by line using the follow parser grammar.

peg::parser! {
    grammar line_parser() for str {
        rule dots() -> String
            = ".." { String::from("..") }

        rule root() -> String
            = "/" { String::from("/") }

        rule label() -> String
            = l:$(['a'..='z']+) { l.to_string() }

        rule cd() -> Line
            = "$ cd " l:(root() / dots() / label()) { Line::Cd(l.into()) }

        rule ls() -> Line
            = "$ ls" { Line::Ls }

        rule filename() -> String
            = l:$(['a'..='z']+['.']?['a'..='z']*) { l.to_string() }

        rule file() -> Line
            = n:$(['0'..='9']+) " " l:filename() { Line::File(n.parse::<u64>().unwrap(), l.into()) }

        rule dir() -> Line
            = "dir " l:label() { Line::Dir(l.to_string()) }

        pub(crate) rule line() -> Line
            = cd() / ls() / file() / dir()
    }
}

That's quite a lot, most of it is mainly to support multiple different representations of the cd command as it accepts the root /, .. to move to the parent directory or a name (e.g. a) that we could change into.

Still it's a bit much to parse & to avoid any issues later let's add a test to check a fair number of possible lines.

#[cfg(test)]
mod tests {
    use crate::*;

    #[test]
    fn check_line_parser() {
        assert_eq!(Ok(Line::Cd("/".into())), line_parser::line("$ cd /"));
        assert_eq!(Ok(Line::Ls), line_parser::line("$ ls"));
        assert_eq!(Ok(Line::Dir("a".into())), line_parser::line("dir a"));
        assert_eq!(
            Ok(Line::File(14_848_514, "b.txt".into())),
            line_parser::line("14848514 b.txt")
        );
        assert_eq!(Ok(Line::File(584, "i".into())), line_parser::line("584 i"));
        assert_eq!(Ok(Line::Cd("..".into())), line_parser::line("$ cd .."));
    }
}

Now all lines from the input file should be able to get converted into one of the Line variants. The next step is to somehow parse these lines and traverse the file hierarchy appropriately. There are basically two options here, one is to calculate the directory sizes by traversing the input once, where the input can be considered as a kind of a stream. The other option is to convert the input into data structures that represent the filesystem first and then use a traversal algorithm to find the information.

Personally I prefer the second option because the new types will reflect the underlying concepts, e.g. a file or directory entry. One aspect that is quite difficult to figure out sometimes is how to name these concepts. In this case it's probably easier, as most people who have used a computer before will have some understanding of what a file or directory is. Files and directories are organized in a file system, an entry can be either. For example in Rust there is a type to reference an entry inside a directory named DirEntry. By calling the DirEntry::metadata method it can be determined if the entry references a file or a directory.

We will use the commands cd & ls to build the file hierarchy. To get started we first lay out a few basic types that represents this hierarchy later.

#[derive(Debug)]
struct FileEntry {
    name: String,
    size: u64,
}

impl FileEntry {
    pub fn new(name: &str, size: u64) -> Self {
        Self {
            name: name.to_string(),
            size,
        }
    }

    pub fn size(&self) -> u64 {
        self.size
    }
}

#[derive(Debug)]
struct DirEntry {
    name: String,
    entries: BTreeMap<String, Entry>,
}

impl DirEntry {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            entries: BTreeMap::new(),
        }
    }

    pub fn size(&self) -> u64 {
        self.entries.iter().map(|(_, entry)| entry.size()).sum()
    }
}

#[derive(Debug)]
enum Entry {
    File(FileEntry),
    Directory(DirEntry),
}

impl Entry {
    pub fn root() -> Self {
        Entry::Directory(DirEntry::new("/"))
    }
}

So far this is pretty basic. A FileEntry holds the filename and size. A DirEntry holds a name and list of entries to hold either files or directories. The Entry type is the common "interface" to represent either. The entries field in DirEntry maps names to its entries. These structs will get new methods to add entries or to fetch information from them later.

Let's continue with the parse function to read in all input lines.

fn parse(input: &str) -> Entry {
    let lines = input
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .filter_map(|line| line_parser::line(line).ok())
        .collect::<Vec<_>>();

    // We will kind of cheat here, the first line in the input is the root directory
    let mut root = Entry::root();
    // TODO build the hierarchy

    root
}

This reads in all input and converts them into Line variants. Now we need a way to transform these into a file system. We will use Entry::root as a starting point (& can ignore the first line from the input then). The new function is named build_filesystem.

fn build_filesystem<'a>(
    parent: &mut Entry,
    lines: &mut impl Iterator<Item = &'a Line>,
) -> anyhow::Result<()> {
    loop {
        match lines.next() {
            Some(Line::Cd(dir)) => match dir.as_ref() {
                ".." => return Ok(()),
                dir => {
                    let entry = parent
                        .get_dir(dir)
                        .ok_or_else(|| anyhow!("Directory '{}' not found in parent", dir))?;
                    build_filesystem(entry, lines)?;
                }
            },
            Some(Line::Ls) => (),
            Some(Line::Dir(dir)) => {
                parent.add_dir(dir)?;
            }
            Some(Line::File(size, name)) => {
                parent.add_file(name, *size)?;
            }
            None => break,
        };
    }
    Ok(())
}

Some of the methods have not been implemented yet, they will be added shortly. We also need to add the anyhow crate to the Cargo.toml in the #[dependencies] block.

The function takes in its first parameter a parent Entry as mutable reference in order to be able to modify it, e.g. to add a file or directory. The second parameter lines is also a mutable reference to an iterator over &'a Line. This parameter acts as a cursor into the list of lines. The main idea for both parameters is that it allows us to call the build_filesystem function recursively and passing them in further. While traversing the file system by following the Line statements, we manage to build the hierarchy & are able to modify the right Entry. The match block inside the loop checks each new line and executes the associated logic. For example a line with a directory name, e.g. dir a, will add a directory with its name to the current Entry. Even though this type can be either a File or a Directory the calls to add_dir & add_file will only work the Directory variant.

There are possibly better solutions to handle this traversal. Methods on Entry will forward to the appropriate inner variant or will fail if the wrong variant is found, for example when trying to add a new directory to the File variant. Some basic error handling is in place by using anyhow::Result as the return type. In cases something is not expected an error will be raised. The anyhow crate is ideal for these situations, it provides a flexible Error type that allows for idiomatic error handling in Rust.

When the next Line changes the directory, either it will leave the current iteration and return to the parent Entry (..) or it changes directory, then the directory by the name is passed to a recursive call to build_filesystem along with the lines iterator. In case all lines are handled the loop will return.

The missing methods add_dir & add_file will be added to Entry, they are forwarded to the DirEntry value of the Directory variant. Both DirEntry and Entry now are given with these methods.

impl DirEntry {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            entries: BTreeMap::new(),
        }
    }

    pub fn add_dir(&mut self, name: &str) {
        self.entries.insert(name.to_string(), Entry::dir(name));
    }

    pub fn add_file(&mut self, name: &str, size: u64) {
        self.entries
            .insert(name.to_string(), Entry::file(name, size));
    }

    pub fn size(&self) -> u64 {
        self.entries.iter().map(|(_, entry)| entry.size()).sum()
    }
}

impl Entry {
    pub fn root() -> Self {
        Entry::Directory(DirEntry::new("/"))
    }

    pub fn add_dir(&mut self, name: &str) -> anyhow::Result<()> {
        match self {
            Entry::Directory(dir) => dir.add_dir(name),
            Entry::File(_) => return Err(anyhow!("Not a Dir")),
        };
        Ok(())
    }

    pub fn add_file(&mut self, name: &str, size: u64) -> anyhow::Result<()> {
        match self {
            Entry::Directory(dir) => dir.add_file(name, size),
            Entry::File(_) => return Err(anyhow!("{:?} not a Dir", self)),
        }
        Ok(())
    }

    pub fn dir(name: &str) -> Self {
        Self::Directory(DirEntry::new(name))
    }

    pub fn file(name: &str, size: u64) -> Self {
        Self::File(FileEntry::new(name, size))
    }

    pub fn size(&self) -> u64 {
        match self {
            Entry::Directory(dir) => dir.size(),
            Entry::File(file) => file.size(),
        }
    }
}

Both Entry::add_dir & Entry::add_file forward the call to DirEntry of raise an Err using the anyhow! macro. This macro is quite flexible & allows string formatitng like format!. The DirEntry contains a BTreeMap of the entries in the file system, it is the basic structure that makes up the file hierarchy. It maps a name (either a directory or file) to an Entry. Using enums like this is quite common, otherwise it would require a lot more effort to manage & store separate types. An enum can act as a common interface. For example the method Entry::size calls the size method on both variants, either on FileEntry or DirEntry.

That is even more code so it's even more urgent to have a test to check a few properties given by the sample input. We will also complete the parse method while we are at it.

fn parse(input: &str) -> Entry {
    let lines = input
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .filter_map(|line| line_parser::line(line).ok())
        .collect::<Vec<_>>();

    let mut root = Entry::root();
    // we skipt the `$ cd /` line, it's the root of the file system
    build_filesystem(&mut root, &mut lines.iter().skip(1)).expect("Failed to build file hierarchy");
    root
}

#[cfg(test)]
mod tests {
    const INPUT: &str = r#"
        $ cd /
        $ ls
        dir a
        14848514 b.txt
        8504156 c.dat
        dir d
        $ cd a
        $ ls
        dir e
        29116 f
        2557 g
        62596 h.lst
        $ cd e
        $ ls
        584 i
        $ cd ..
        $ cd ..
        $ cd d
        $ ls
        4060174 j
        8033020 d.log
        5626152 d.ext
        7214296 k
    "#;

    #[test]
    fn check_tree_sizes() {
        let mut entry = parse(INPUT);
        assert_eq!(94853, entry.get_dir("a").unwrap().size());
        assert_eq!(48_381_165, entry.size());
    }
}

The test check_tree_sizes builds the file system hierarchy first and then checks a couple of expected sizes. This sets up the file system hierarchy, but we are still not done yet. The task is to find all directories that have a total size smaller than 100.000. It is best to use a traversal algorithm that returns all sizes of the directories.

The function needs to traverse the file system by directories, therefore we will take a similar approach as we did with the parameters of the build_filesystem function. A new Vec is allocated that will hold directory sizes. For this we will expand Entry by adding a traversal function named sizes.

impl Entry {
    pub fn sizes(&self, sizes: &mut Vec<u64>) {
        match self {
            Entry::Directory(dir) => {
                sizes.push(dir.size());
                for (_, entry) in dir.entries.iter() {
                    entry.sizes(sizes);
                }
            }
            Entry::File(_) => (),
        }
    }
}

This new method takes the sizes parameter and for each found directory will add a size to it. If the current entry is a DirEntry the size call will itself recursively calculate the total size of its subtree. All directory sizes are returned, but only a few will match the criteria. This will be done in the part1 function now.

fn part1(root: &Entry) -> u64 {
    let mut sizes = Vec::new();
    root.sizes(&mut sizes);
    sizes.iter().filter(|&&size| size < 100_0000).sum()
}

#[cfg(test)]
mod tests {
    #[test]
    fn check_part1() {
        let entry = parse(INPUT);
        assert_eq!(95437, part1(&entry));
    }
}

fn main() {
    let entry = parse(include_str!("input.txt"));
    println!("Part 1: {}", part1(&entry));
}

Thankfully this is quite terse code for once. First all sizes are determined given the root Entry then an iterator is applied on them to find all that are smaller than 100.000. Finally their values are summed up and returned.

Running the program outputs the first solution.

Part 2

The second part is slightly more complicated (of course). This time you need to choose a single directory for deletion. The total disk space is 70.000.000 in size, to run the update an unused space of 30.000.000 is required (the unit itself does not seem to matter). The task is to find the directory that will free up just enough space to fit the requirement of running the update. From all possible candidates the one that deletes the least is preferred.

In the example above the root directory contains a size of 48.381.165, which means the current unused space is of size 21.618.835. To fit the update requirement a directory has to be found that deletes at least 8.381.165 in size.

As in part one we get all directory sizes first, but this time we need to change the criteria for picking a single directory. First we need the total size of our file system built from the input file, the root directory. Then we need to determine the required space a directory needs to at least free up. Lastly all those candidates are determined and the one with the smallest size is our candidate.

fn part2(root: &Entry) -> u64 {
    let mut sizes = Vec::new();
    root.sizes(&mut sizes);

    let available_space = 70_000_000 - root.size();
    let required_space = 30_000_000 - available_space;

    sizes
        .into_iter()
        .filter(|&size| size > required_space)
        .min()
        .expect("No directory found")
}

fn main() {
    let entry = parse(include_str!("input.txt"));
    println!("Part 1: {}", part1(&entry));
    println!("Part 2: {}", part2(&entry));
}

#[cfg(test)]
mod tests {
    #[test]
    fn check_part2() {
        let entry = parse(INPUT);
        assert_eq!(24.933.642, part2(&entry));
    }
}

The algorithm first calculates the required space a directory would free up in space. Then all directories that would free up this amount of space are filtered and finally the min value is returned. This value is the lowest size that would sufficiently free up enough space to allow the update to run.

A test checks the expected value for the sample input. Running the program again outputs the second solution.