The Advent Of Code challenge of day 6 is a shorter one. One of the elves gives you a handheld device with a lot features, among them a communication system. Unfortunately it's malfunctioning but the elves have confidence in you being able to fix it. The signal is a series of seemingly-random characters that it receives one at a time.
A single input line consisting of lower case letters represents a signal
bvwbjplbgvbhsrlpgdmjqwftvncz
To fix the communicator a subroutine is required that detects a start of packet in the given input datastream. A start of packet is indicated by a sequence of four letters that are all different.
Given the input sequence the subroutine needs to check each subsequence of four contiguous letters.
bvwbjplbgvbhsrlpgdmjqwftvncz
bvwb
vwbj
wbjp
bjpl
jplb
plbg
[..]
The subroutine needs to identify the first position where the four most recently received characters are all different (or unique),
which in the sample above happens in subsequence vwbj
.
The task is to determine the first position of the start of packet marker. The given input file only contains a single line.
Part 1
We set up the project as before, add a scaffold to the main.rs
. Today no additional crates will be required.
fn part1(line: &str) -> usize {
todo!()
}
fn main() {
let lines = include_str!("input.txt");
println!("Part 1: {}", part1(&lines));
}
#[cfg(test)]
mod tests {
use crate::*;
#[test]
fn check_part1() {
assert_eq!(7, part1("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
assert_eq!(5, part1("bvwbjplbgvbhsrlpgdmjqwftvncz"));
assert_eq!(6, part1("nppdvjthqldpwncqszvftbrmjlhg"));
assert_eq!(10, part1("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg"));
assert_eq!(11, part1("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw"));
}
}
This time there is nothing to parse, the content of the input file is loaded via include_str!
from the input.txt
file.
The function part1
is prepared to receive the input string (&str
) and returns the start of packet position (usize
).
The test module has a test to check all the given sample sequences.
Let's see what needs to be done now. The given string consists of lower case letters only. There are different options to
iterate in groups of four along the sequence of letters. For example one option is to pick itertools::tuple_windows
on the input to iterate over tuples of four elements. For this
to work it makes sense to convert the given input string into a Iterator over char
using chars
.
We could write a for
loop to access all four elements using itertools::tuple_windows
like in the following example:
fn part1(line: &str) -> usize {
for (index, (c1, c2, c3, c4)) in input.chars().tuple_windows().enumerate() {
// check all chars
}
}
Rust's type inference detects that the call to tuple_windows
will result in a tuple of four char
entries, because the itertools
crate supports a set of tuples with of different lengths. The enumerate
call is to provide an index, the position into the list of char
s.
A better option is to use Rust's slice
primitive, a dynamically sized view into a contiguous sequence. It can be applied to our use case in the way where the given input string is transformed into a Vec<char>
. Vec
organizes its data contiguously in memory. Vec
provides an interface to get a slice
over its data.
For example the following code
let array = [1, 2, 3, 4, 5];
let slice = &[..2];
The first line creates a new fixed sized array named array
. This can be easily borrowed as a slice
. The second line creates a slice (named slice
) that has a view of the first two elements [1, 2]
. A slice is kind of a reference into data & does not have ownership. A slice can at most provide view on the whole array.
Fortunately for us Vec
provides a method windows
that fits our need. It returns an iterator over contiguous windows of a given length, where the windows partially overlap. This method is available because Vec
implements the Deref
trait that converts it to a slice
. The method windows
itself is implemented on the slice
type. A good example on how this works can be found in the Rustonomicon book in chapter 9.
We have a way to iterate over windows of 4 letters, how can we determine that all letters are unique? Again there are multiple options to do this, for example a double nested for
loop could iterate over all pairs of letters and compare them and if these are equal continue to the next window, otherwise return the index at which this occurred.
Another option is to use the HashSet
type. One important characteristic is that is not necessarily emphasized in the documentation is that all items in a HashSet
are unique. Inserting the same item again will not add a new item to the HashSet
, in fact the insert
returns a bool
to indicate if the item was inserted or not.
We will use this property and convert the sequence of four letters into a HashSet
. When they all are unique the size of the HashSet
matches the size of the window. A smaller size means the list contains duplicates.
One version using iterators can look like.
fn marker_pos(datastream: &str, distinct: usize) -> Option<usize> {
let chars = datastream.chars().collect::<Vec<_>>();
chars
.windows(distinct)
.enumerate()
.map(|(index, seq)| (index, seq.iter().collect::<HashSet<_>>()))
.find(|(_, seq)| seq.len() == distinct)
.map(|(index, _)| index + distinct)
}
fn part1(line: &str) -> usize {
marker_pos(line, 4).expect("Failed to find start of packet")
}
fn main() {
let lines = include_str!("input.txt");
println!("Part 1: {}", part1(&lines));
}
The marker_pos
function takes the input, converts into a Vec<char>
on the first line.
Then a call to windows
will generate the iterator we pass over, each iteration yields a slice of length four (distinct
). The enumerate
call here also yields the index to find the position the start of packet marker is found.
map
will insert all entries from the given slice (&[char]
) into a HashSet
. The find
checks the condition of having all unique characters and returns true
on the first occurrence. Lastly the final map
will take the index and calculates the final position.
This seems still a bit verbose & there is something to eliminate from the iterator.
As we are interested in the index position of the start of packet the position
method is sufficient. It will return the index at which the given position evalutes to true
, the case where the HashSet
has exactly the size of distinct letters. Then there is no need to call enumerate
to also pass the index in the iterator chain. The updated version looks then:
fn marker_pos(datastream: &str, distinct: usize) -> Option<usize> {
let chars = datastream.chars().collect::<Vec<_>>();
chars
.windows(distinct)
.map(|seq| seq.iter().collect::<HashSet<_>>())
.position(|seq| seq.len() == distinct)
.map(|pos| pos + distinct)
}
The tests should now pass & running the program should provide the first answer.
Part 2
Thankfully the second part is using the same algorithm but instead of checking a sequence of four characters for uniqueness, sequences of 14 characters need to be checked. Luckily for us by chance the function already has a parameter named distinct
to specify the length of the window.
Therefore implementing the second version in part2
should be easy now.
fn part2(line: &str) -> usize {
marker_pos(line, 14).expect("Failed to find start of packet")
}
fn main() {
let lines = include_str!("input.txt");
println!("Part 1: {}", part1(&lines));
println!("Part 2: {}", part2(&lines));
}
#[cfg(test)]
mod tests {
//
#[test]
fn check_part2() {
assert_eq!(19, part2("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
assert_eq!(23, part2("bvwbjplbgvbhsrlpgdmjqwftvncz"));
assert_eq!(23, part2("nppdvjthqldpwncqszvftbrmjlhg"));
assert_eq!(29, part2("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg"));
assert_eq!(26, part2("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw"));
}
}
The test checks the updated behavior for part 2, the part2
function works the same as part1
but passes a different size of characters to the marker_pos
function. Tests should pass & running the program should output the second solution.