This is part of the series Advent of Code 2022.
Day 6 - Advent of Code 2022

The Advent Of Code challenge of day 6 is a shorter one. One of the elves gives you a handheld device with a lot features, among them a communication system. Unfortunately it's malfunctioning but the elves have confidence in you being able to fix it. The signal is a series of seemingly-random characters that it receives one at a time.

A single input line consisting of lower case letters represents a signal

bvwbjplbgvbhsrlpgdmjqwftvncz

To fix the communicator a subroutine is required that detects a start of packet in the given input datastream. A start of packet is indicated by a sequence of four letters that are all different.

Given the input sequence the subroutine needs to check each subsequence of four contiguous letters.

bvwbjplbgvbhsrlpgdmjqwftvncz

bvwb
 vwbj
  wbjp
   bjpl
    jplb
     plbg
      [..]

The subroutine needs to identify the first position where the four most recently received characters are all different (or unique), which in the sample above happens in subsequence vwbj. The task is to determine the first position of the start of packet marker. The given input file only contains a single line.

Part 1

We set up the project as before, add a scaffold to the main.rs. Today no additional crates will be required.

fn part1(line: &str) -> usize {
    todo!()
}

fn main() {
    let lines = include_str!("input.txt");
    println!("Part 1: {}", part1(&lines));
}

#[cfg(test)]
mod tests {
    use crate::*;

    #[test]
    fn check_part1() {
        assert_eq!(7, part1("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
        assert_eq!(5, part1("bvwbjplbgvbhsrlpgdmjqwftvncz"));
        assert_eq!(6, part1("nppdvjthqldpwncqszvftbrmjlhg"));
        assert_eq!(10, part1("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg"));
        assert_eq!(11, part1("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw"));
    }
}

This time there is nothing to parse, the content of the input file is loaded via include_str! from the input.txt file. The function part1 is prepared to receive the input string (&str) and returns the start of packet position (usize). The test module has a test to check all the given sample sequences.

Let's see what needs to be done now. The given string consists of lower case letters only. There are different options to iterate in groups of four along the sequence of letters. For example one option is to pick itertools::tuple_windows on the input to iterate over tuples of four elements. For this to work it makes sense to convert the given input string into a Iterator over char using chars. We could write a for loop to access all four elements using itertools::tuple_windows like in the following example:

fn part1(line: &str) -> usize {
    for (index, (c1, c2, c3, c4)) in input.chars().tuple_windows().enumerate() {
        // check all chars
    }
}

Rust's type inference detects that the call to tuple_windows will result in a tuple of four char entries, because the itertools crate supports a set of tuples with of different lengths. The enumerate call is to provide an index, the position into the list of chars.

A better option is to use Rust's slice primitive, a dynamically sized view into a contiguous sequence. It can be applied to our use case in the way where the given input string is transformed into a Vec<char>. Vec organizes its data contiguously in memory. Vec provides an interface to get a slice over its data.

For example the following code

let array = [1, 2, 3, 4, 5];
let slice = &[..2];

The first line creates a new fixed sized array named array. This can be easily borrowed as a slice. The second line creates a slice (named slice) that has a view of the first two elements [1, 2]. A slice is kind of a reference into data & does not have ownership. A slice can at most provide view on the whole array.

Fortunately for us Vec provides a method windows that fits our need. It returns an iterator over contiguous windows of a given length, where the windows partially overlap. This method is available because Vec implements the Deref trait that converts it to a slice. The method windows itself is implemented on the slice type. A good example on how this works can be found in the Rustonomicon book in chapter 9.

We have a way to iterate over windows of 4 letters, how can we determine that all letters are unique? Again there are multiple options to do this, for example a double nested for loop could iterate over all pairs of letters and compare them and if these are equal continue to the next window, otherwise return the index at which this occurred.

Another option is to use the HashSet type. One important characteristic is that is not necessarily emphasized in the documentation is that all items in a HashSet are unique. Inserting the same item again will not add a new item to the HashSet, in fact the insert returns a bool to indicate if the item was inserted or not. We will use this property and convert the sequence of four letters into a HashSet. When they all are unique the size of the HashSet matches the size of the window. A smaller size means the list contains duplicates.

One version using iterators can look like.

fn marker_pos(datastream: &str, distinct: usize) -> Option<usize> {
    let chars = datastream.chars().collect::<Vec<_>>();
    chars
        .windows(distinct)
        .enumerate()
        .map(|(index, seq)| (index, seq.iter().collect::<HashSet<_>>()))
        .find(|(_, seq)| seq.len() == distinct)
        .map(|(index, _)| index + distinct)
}

fn part1(line: &str) -> usize {
    marker_pos(line, 4).expect("Failed to find start of packet")
}

fn main() {
    let lines = include_str!("input.txt");
    println!("Part 1: {}", part1(&lines));
}

The marker_pos function takes the input, converts into a Vec<char> on the first line. Then a call to windows will generate the iterator we pass over, each iteration yields a slice of length four (distinct). The enumerate call here also yields the index to find the position the start of packet marker is found. map will insert all entries from the given slice (&[char]) into a HashSet. The find checks the condition of having all unique characters and returns true on the first occurrence. Lastly the final map will take the index and calculates the final position.

This seems still a bit verbose & there is something to eliminate from the iterator. As we are interested in the index position of the start of packet the position method is sufficient. It will return the index at which the given position evalutes to true, the case where the HashSet has exactly the size of distinct letters. Then there is no need to call enumerate to also pass the index in the iterator chain. The updated version looks then:

fn marker_pos(datastream: &str, distinct: usize) -> Option<usize> {
    let chars = datastream.chars().collect::<Vec<_>>();
    chars
        .windows(distinct)
        .map(|seq| seq.iter().collect::<HashSet<_>>())
        .position(|seq| seq.len() == distinct)
        .map(|pos| pos + distinct)
}

The tests should now pass & running the program should provide the first answer.

Part 2

Thankfully the second part is using the same algorithm but instead of checking a sequence of four characters for uniqueness, sequences of 14 characters need to be checked. Luckily for us by chance the function already has a parameter named distinct to specify the length of the window.

Therefore implementing the second version in part2 should be easy now.

fn part2(line: &str) -> usize {
    marker_pos(line, 14).expect("Failed to find start of packet")
}

fn main() {
    let lines = include_str!("input.txt");
    println!("Part 1: {}", part1(&lines));
    println!("Part 2: {}", part2(&lines));
}

#[cfg(test)]
mod tests {
    //

    #[test]
    fn check_part2() {
        assert_eq!(19, part2("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
        assert_eq!(23, part2("bvwbjplbgvbhsrlpgdmjqwftvncz"));
        assert_eq!(23, part2("nppdvjthqldpwncqszvftbrmjlhg"));
        assert_eq!(29, part2("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg"));
        assert_eq!(26, part2("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw"));
    }
}

The test checks the updated behavior for part 2, the part2 function works the same as part1 but passes a different size of characters to the marker_pos function. Tests should pass & running the program should output the second solution.