Advent of Code 2020 (Day 3&4)

04 Dec 2020

Day 3

Anddd we’re off! This time, we’re travelling down the slope at a certain angle, but need to account for the trees in our path. We’re given a map showing open squares (.) and trees (#), as such:

    ..##.......
    #...#...#..
    .#....#..#.
    ..#.#...#.#
    .#...##..#.
    ..#.##.....
    .#.#.#....#
    .#........#
    #.##...#...
    #...##....#
    .#..#...#.#

Of course, it’s a magical island with infinite width - the same pattern repeats to the right over and over.

    ..##.........##.........##.........##.........##.........##.......  --->
    #...#...#..#...#...#..#...#...#..#...#...#..#...#...#..#...#...#..
    .#....#..#..#....#..#..#....#..#..#....#..#..#....#..#..#....#..#.
    ..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#
    .#...##..#..#...##..#..#...##..#..#...##..#..#...##..#..#...##..#.
    ..#.##.......#.##.......#.##.......#.##.......#.##.......#.##.....  --->
    .#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#
    .#........#.#........#.#........#.#........#.#........#.#........#
    #.##...#...#.##...#...#.##...#...#.##...#...#.##...#...#.##...#...
    #...##....##...##....##...##....##...##....##...##....##...##....#
    .#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#  --->

Our task is to slide down the slope and count the number of trees we encounter along the way! But first, we import the data from our file:

# import data
with open("input.txt", 'r') as file:
    arr = file.read().splitlines()

arr[0:3] # print first 3 lines

['...#.....#.......##......#.....',
 '...#..................#........',
 '....##....#.......#............']

Part 1

The toboggan can only follow a few specific slopes; count all the trees you would encounter for the slope **right 3, down 1**.

Righty, seems easy enough! We just need to keep track of our position and move 3 squares to the right on each line. If we run past the end of a line, we can “wrap around” this magic island by taking the position modulo (%) the line length: the remainder is our “wrapped around” position.

position = 0
count = 0

for line in arr:
    if line[position] == "#":
        count += 1
    
    # wrap around if we hit the end of the line
    position = (position + 3)%len(line)
    
print(count) 

We simply count the number of trees we encounter and get our answer!

Ans: 220
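
As a quick sanity check, here is a minimal sketch that runs the same loop on the small example map from the top of this post instead of input.txt. The names sample_map and count_trees are my own, purely for illustration. It also checks that indexing with the modulus lands on the same square as indexing into a row that has been physically repeated, which is what the “infinite width” picture shows.

```python
# Example map from the puzzle description (11 columns wide).
sample_map = [
    "..##.......",
    "#...#...#..",
    ".#....#..#.",
    "..#.#...#.#",
    ".#...##..#.",
    "..#.##.....",
    ".#.#.#....#",
    ".#........#",
    "#.##...#...",
    "#...##....#",
    ".#..#...#.#",
]

def count_trees(rows, right=3):
    """Count trees hit when moving `right` squares per row, wrapping with %."""
    position = 0
    count = 0
    for line in rows:
        if line[position] == "#":
            count += 1
        position = (position + right) % len(line)
    return count

sample_count = count_trees(sample_map)

# Wrapping with % is equivalent to looking up the un-wrapped column in a
# row that has been repeated enough times to be wide enough.
row = sample_map[4]
unwrapped_column = 12  # column reached on row 4 without wrapping (4 * 3)
same_square = row[unwrapped_column % len(row)] == (row * 2)[unwrapped_column]

print(sample_count, same_square)
```

On the example map this finds 7 trees, which matches the puzzle description.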

Part 2

Time to check the rest of the slopes - you need to minimize the probability of a sudden arboreal stop, after all.

Tl;dr Count the trees we hit on each of the slopes below, then multiply the counts together.

Right 1, down 1.
Right 3, down 1. (This is the slope that we just checked.)
Right 5, down 1.
Right 7, down 1.
Right 1, down 2.

Alrighty! Most of the slopes are trivial (i.e. adding 1/5/7 to our position instead of 3), but the last slope is a little more interesting. With a slope of 1 right and 2 down, we only land on every other line. That’s not bad though - we can just add a line of code that keeps every nth line when we go n down, as such:

def checkSlope(right, down):
    # keeps every line if down = 1, or every nth line (starting from the first) if down = n
    arr_selected = arr[::down]
    position = 0
    count = 0

    for line in arr_selected:
        if line[position] == "#":
            count += 1
        # wrap around if we hit the end of the line
        position = (position + right)%len(line)
    return count
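
If the [::down] slice looks unfamiliar, here is a tiny sketch of what it does. The list of row labels is made up, just standing in for the real map.

```python
# Made-up row labels standing in for the lines of the map.
rows = ["row0", "row1", "row2", "row3", "row4"]

down_1 = rows[::1]  # a step of 1 keeps every row
down_2 = rows[::2]  # a step of 2 keeps rows 0, 2 and 4: the ones we land on

print(down_1)
print(down_2)
```

The slice always starts from the first row, which is where the toboggan starts too.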

With that done, we just need to run our function per given slope, to get our answer!

slopes=[[3,1],[1,1],[5,1],[7,1],[1,2]]

prod = 1
for slope in slopes:
    treeNum = checkSlope(slope[0],slope[1])
    print("Right: ", slope[0],"Down: ", slope[1], "Trees: ", treeNum)
    prod *=treeNum   

print("Ans:", prod)

Right:  3 Down:  1 Trees:  220
Right:  1 Down:  1 Trees:  70
Right:  5 Down:  1 Trees:  63
Right:  7 Down:  1 Trees:  76
Right:  1 Down:  2 Trees:  29
Ans: 2138320800
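
To double check the logic on something small, here is a sketch that runs all five slopes on the example map from the start of the post. It is a slightly different formulation from checkSlope (it computes the row and column directly from the step number), and the names trees_on_slope and sample_map are hypothetical. It assumes Python 3.8+ for math.prod.

```python
from math import prod

# Example map from the puzzle description.
sample_map = [
    "..##.......",
    "#...#...#..",
    ".#....#..#.",
    "..#.#...#.#",
    ".#...##..#.",
    "..#.##.....",
    ".#.#.#....#",
    ".#........#",
    "#.##...#...",
    "#...##....#",
    ".#..#...#.#",
]

def trees_on_slope(rows, right, down):
    """Visit rows 0, down, 2*down, ... and move `right` columns per step."""
    width = len(rows[0])
    visited_rows = range(0, len(rows), down)
    return sum(
        rows[row][(step * right) % width] == "#"
        for step, row in enumerate(visited_rows)
    )

slopes = [(1, 1), (3, 1), (5, 1), (7, 1), (1, 2)]
sample_counts = [trees_on_slope(sample_map, right, down) for right, down in slopes]
sample_product = prod(sample_counts)

print(sample_counts, sample_product)
```

For the example map the counts are 2, 7, 3, 4 and 2, giving a product of 336.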

Day 4

Passport Processing

Ooooo this one is interesting. We arrive at the airport and (as per usual) we’re needed to help with the long lines at the passport scanners! (Okay, but ngl I’m getting throwbacks to those 3h long lines at TSA and… shudders). The automatic passport scanners are slow because they’re having trouble detecting which passports have all required fields. The expected fields are as follows:

byr (Birth Year)
iyr (Issue Year)
eyr (Expiration Year)
hgt (Height)
hcl (Hair Color)
ecl (Eye Color)
pid (Passport ID)
cid (Country ID)

Passport data is validated in batch files (your puzzle input). Each passport is represented as a sequence of key:value pairs separated by spaces or newlines. Passports are separated by blank lines. For example, in our input file, we have our first 3 entries:

eyr:2033
hgt:177cm pid:173cm
ecl:utc byr:2029 hcl:#efcc98 iyr:2023

pid:337605855 cid:249 byr:1952 hgt:155cm
ecl:grn iyr:2017 eyr:2026 hcl:#866857

cid:242 iyr:2011 pid:953198122 eyr:2029 ecl:blu hcl:#888785

Alrighty, let’s import these data! This was a little bit more convoluted than our import from our previous days - I used a couple of split operations to create a dictionary of key:value pairs per passport, and threw that into our best friend, a dataframe.

import pandas as pd 
import re

arr = []
# import data
with open("input.txt", 'r') as file:
    curDict = dict()
    for line in file:
        if line.strip() == "":
            # blank line: the current passport is complete, so store it and start a new one
            arr.append(curDict)
            curDict = dict()
        else:
            # parse the "key:value" pairs on this line and add them to the current passport
            for pair in line.split():
                key, value = pair.split(':')
                curDict[key] = value

# the file may not end with a blank line, so store the last passport too
if curDict:
    arr.append(curDict)

# Creates DataFrame. 
df = pd.DataFrame(arr) 
df.tail(3)

     eyr   hgt    pid       ecl      byr   hcl      iyr   cid
284  2039  72cm   #d80d30   #cc4aff  2025  5307c9   2025  224
285  2027  184    58151347  NaN      2015  98fb9d   2029  NaN
286  2021  183cm  164cm     xry      2019  #18171d  2013  187
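
For comparison, here is a minimal sketch of another way to do the parsing: read the whole batch as one string, split it on blank lines to get one chunk per passport, then split each chunk on any whitespace. The function name parse_passports is hypothetical, and the sample text is just the three entries shown earlier. It assumes blank lines are exactly empty (plain "\n\n" separators). One nice side effect is that the last passport is picked up even when the text does not end with a blank line.

```python
def parse_passports(text):
    """Turn the raw batch text into a list of {field: value} dictionaries."""
    passports = []
    for chunk in text.strip().split("\n\n"):
        # str.split() with no argument splits on spaces and newlines alike
        fields = dict(pair.split(":", 1) for pair in chunk.split())
        passports.append(fields)
    return passports

sample_text = """eyr:2033
hgt:177cm pid:173cm
ecl:utc byr:2029 hcl:#efcc98 iyr:2023

pid:337605855 cid:249 byr:1952 hgt:155cm
ecl:grn iyr:2017 eyr:2026 hcl:#866857

cid:242 iyr:2011 pid:953198122 eyr:2029 ecl:blu hcl:#888785
"""

sample_passports = parse_passports(sample_text)
print(len(sample_passports))
print(sample_passports[2])
```

The resulting list of dictionaries can be handed to pd.DataFrame in the same way as arr.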

Part 1

Our first task is to check the validity of passports, specifically whether they have all the required fields. We are to treat cid as optional, but all other fields are necessary. In our batch file, how many passports are valid?

Passports are valid if they contain all fields except the country ID. Count all valid passports.

Thanks to the way we loaded the data into a dataframe, this is easy: a missing field shows up as NaN, so we simply flag any row that has a null in any column other than cid. And thus, we have:

df['valid_p1']=~(df.drop(['cid'], axis=1)).isnull().any(axis=1)
df['valid_p1'].value_counts()

True     219
False     68
Name: valid_p1, dtype: int64

Ans: 219
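
The same check also works without pandas. Below is a small sketch, assuming each passport is a plain dictionary of fields; REQUIRED_FIELDS and has_required_fields are names I made up for illustration, and the data is the three sample entries from earlier.

```python
REQUIRED_FIELDS = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}  # cid is optional

def has_required_fields(passport):
    """True when every required key is present in the passport dictionary."""
    return REQUIRED_FIELDS.issubset(passport)

sample_passports = [
    {"eyr": "2033", "hgt": "177cm", "pid": "173cm", "ecl": "utc",
     "byr": "2029", "hcl": "#efcc98", "iyr": "2023"},
    {"pid": "337605855", "cid": "249", "byr": "1952", "hgt": "155cm",
     "ecl": "grn", "iyr": "2017", "eyr": "2026", "hcl": "#866857"},
    {"cid": "242", "iyr": "2011", "pid": "953198122", "eyr": "2029",
     "ecl": "blu", "hcl": "#888785"},
]

results = [has_required_fields(p) for p in sample_passports]
print(results)
```

The first two entries have every required field (the first one simply has no cid), while the third is missing byr and hgt.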

Part 2

The line is moving more quickly now, but you overhear airport security talking about how passports with invalid data are getting through. Better add some data validation, quick!

Tl;dr caught doing lazy validation, time to fix our mistakes! We now have stricter rules for each field.

byr (Birth Year) - four digits; at least 1920 and at most 2002.
iyr (Issue Year) - four digits; at least 2010 and at most 2020.
eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
hgt (Height) - a number followed by either cm or in:
- If cm, the number must be at least 150 and at most 193.
- If in, the number must be at least 59 and at most 76.
hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
pid (Passport ID) - a nine-digit number, including leading zeroes.
cid (Country ID) - ignored, missing or not.

That’s more like it! For each field, we now have a specific format we want to match - sounds like a task perfect for RegEx.

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

Here are some examples of how to use RegEx from the w3schools page:

Character	Description	Example
[]	A set of characters	“[a-m]”
\	Signals a special sequence (can also be used to escape special characters)	“\d”
.	Any character (except newline character)	“he..o”
^	Starts with	“^hello”
$	Ends with	“world$”
*	Zero or more occurrences	“aix*”
+	One or more occurrences	“aix+”
{}	Exactly the specified number of occurrences	“al{2}”
|	Either or	“falls|stays”

Alrighty, let’s get right into it! We can start with the easier fields, like byr. We need byr to be a string that consists of 4 digits and is between 1920 and 2002.

# byr (Birth Year) - four digits; at least 1920 and at most 2002.
byr_valid = re.findall("(19[2-9][0-9]|200[0-2])", byr)

Let me break down the above pattern: the first half (before |) asks RegEx to match the format 19(any digit between 2 and 9)(any digit between 0 and 9), i.e. any number between 1920 and 1999. The second half does the same thing, but for 2000-2002!
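
One caveat worth seeing in action: re.findall looks for the pattern anywhere in the string, so the pattern above does not by itself enforce “four digits”. The sketch below compares it with re.fullmatch, which only accepts the value when the whole string matches. The helper names are hypothetical, and the test values are made up.

```python
import re

BYR_PATTERN = r"19[2-9][0-9]|200[0-2]"

def byr_matches_somewhere(value):
    """The unanchored check: exactly one match anywhere in the string."""
    return len(re.findall("(" + BYR_PATTERN + ")", value)) == 1

def byr_matches_fully(value):
    """Stricter check: the whole string must be a year from 1920 to 2002."""
    return re.fullmatch(BYR_PATTERN, value) is not None

for value in ["1919", "1920", "1999", "2002", "2003", "19200"]:
    print(value, byr_matches_somewhere(value), byr_matches_fully(value))
```

The two checks agree on the ordinary years, but the five-character value "19200" slips through the unanchored version because it contains "1920".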

Even the more complicated looking fields like hgt aren’t too bad - we just need to add more “either” fields using |!

# hgt (Height) - a number followed by either cm or in.
hgt_valid = re.findall("(1[5-8][0-9]cm|19[0-3]cm|59in|6[0-9]in|7[0-6]in)", hgt)

Here, we’re checking if the height falls between 150-189cm, or 190-193cm, or 59in, or 60-69in, or 70-76in. Not too bad once we get used to it!
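
If spelling out numeric ranges as character classes feels fiddly, an alternative is to let RegEx only split the value into a number and a unit, and do the range check with ordinary comparisons. This is a sketch of that idea rather than the approach used in this post, and hgt_is_valid is a made-up name.

```python
import re

def hgt_is_valid(value):
    """Check a height such as '170cm' or '65in' against the allowed ranges."""
    match = re.fullmatch(r"(\d+)(cm|in)", value)
    if match is None:
        return False  # no unit, or something other than digits followed by a unit
    number = int(match.group(1))
    unit = match.group(2)
    if unit == "cm":
        return 150 <= number <= 193
    return 59 <= number <= 76  # unit can only be "in" here

for value in ["150cm", "193cm", "194cm", "58in", "59in", "76in", "77in", "184"]:
    print(value, hgt_is_valid(value))
```

The trade-off is a few more lines of code in exchange for ranges that are readable at a glance.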

The hardest field to check is the pid field - here, we want to make sure the field is a nine-digit number, including leading zeroes.

# pid (Passport ID) - a nine-digit number, including leading zeroes.
pid_valid = re.findall(r"^\d{9}$", pid)

The ^ metacharacter anchors the match to the start of the string, \d is a special sequence that matches a single digit (for our input, the equivalent of [0-9]), {9} asks for exactly 9 of them, and the $ “ends with” metacharacter anchors the match to the end of the string (it also matches just before a trailing newline, if there is one). This end anchor is especially important to make sure the value ends after those 9 digits and nothing more - otherwise, we’ll end up accepting 10 digit PIDs!
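
Here is a small sketch of what the anchors buy us, using a made-up ten-digit value. It also shows one subtlety of $ in Python: it matches at the very end of the string or just before a single trailing newline, whereas re.fullmatch insists on the whole string.

```python
import re

ten_digits = "0123456789"

without_anchors = re.findall(r"\d{9}", ten_digits)  # finds the first 9 digits
with_anchors = re.findall(r"^\d{9}$", ten_digits)   # finds nothing

# $ also matches just before a single trailing newline; re.fullmatch does not.
with_newline = "123456789\n"
dollar_accepts_newline = re.findall(r"^\d{9}$", with_newline) == ["123456789"]
fullmatch_accepts_newline = re.fullmatch(r"\d{9}", with_newline) is not None

print(without_anchors, with_anchors)
print(dollar_accepts_newline, fullmatch_accepts_newline)
```

Since our values were stripped of newlines when we parsed the file, either form is fine for this puzzle.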

Whew! With that all done, we can finish up our little passport validity function and output our answer.

def isPassportValid(eyr,hgt,pid,ecl,byr,hcl,iyr):
    validity = False

    if not(pd.isna(eyr) or pd.isna(hgt) or pd.isna(pid) or pd.isna(ecl) or pd.isna(byr) or pd.isna(hcl) or pd.isna(iyr)):
        passport_valid = []
        # every pattern is anchored with ^ and $ so that values with extra characters are rejected

        # byr (Birth Year) - four digits; at least 1920 and at most 2002.
        passport_valid.append(len(re.findall(r"^(19[2-9][0-9]|200[0-2])$", byr)) == 1)

        # iyr (Issue Year) - four digits; at least 2010 and at most 2020.
        passport_valid.append(len(re.findall(r"^(201[0-9]|2020)$", iyr)) == 1)

        # eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
        passport_valid.append(len(re.findall(r"^(202[0-9]|2030)$", eyr)) == 1)

        # hgt (Height) - a number followed by either cm or in:
            # If cm, the number must be at least 150 and at most 193.
            # If in, the number must be at least 59 and at most 76.
        passport_valid.append(len(re.findall(r"^(1[5-8][0-9]cm|19[0-3]cm|59in|6[0-9]in|7[0-6]in)$", hgt)) == 1)

        # hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
        passport_valid.append(len(re.findall(r"^#[a-f0-9]{6}$", hcl)) == 1)

        # ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
        passport_valid.append(len(re.findall(r"^(amb|blu|brn|gry|grn|hzl|oth)$", ecl)) == 1)

        # pid (Passport ID) - a nine-digit number, including leading zeroes.
        passport_valid.append(len(re.findall(r"^\d{9}$", pid)) == 1)

        # the passport is valid only if every single check passed
        validity = all(passport_valid)

    return validity

df['valid_p2']=df.apply(lambda x: isPassportValid(x['eyr'], x['hgt'], x['pid'], x['ecl'], x['byr'], x['hcl'], x['iyr']), axis=1)
df['valid_p2'].value_counts()

False    160
True     127
Name: valid_p2, dtype: int64
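
Looking back, the checks could also be written as a table of one pattern per field. The sketch below is one possible way to do that with plain dictionaries and re.fullmatch. FIELD_PATTERNS and is_passport_valid are hypothetical names, and the data is the three sample entries from earlier in the post.

```python
import re

# One pattern per required field; re.fullmatch must match the whole value.
FIELD_PATTERNS = {
    "byr": r"19[2-9][0-9]|200[0-2]",
    "iyr": r"201[0-9]|2020",
    "eyr": r"202[0-9]|2030",
    "hgt": r"1[5-8][0-9]cm|19[0-3]cm|59in|6[0-9]in|7[0-6]in",
    "hcl": r"#[0-9a-f]{6}",
    "ecl": r"amb|blu|brn|gry|grn|hzl|oth",
    "pid": r"[0-9]{9}",
}

def is_passport_valid(passport):
    """True when every required field is present and matches its pattern."""
    return all(
        field in passport and re.fullmatch(pattern, passport[field]) is not None
        for field, pattern in FIELD_PATTERNS.items()
    )

sample_passports = [
    {"eyr": "2033", "hgt": "177cm", "pid": "173cm", "ecl": "utc",
     "byr": "2029", "hcl": "#efcc98", "iyr": "2023"},
    {"pid": "337605855", "cid": "249", "byr": "1952", "hgt": "155cm",
     "ecl": "grn", "iyr": "2017", "eyr": "2026", "hcl": "#866857"},
    {"cid": "242", "iyr": "2011", "pid": "953198122", "eyr": "2029",
     "ecl": "blu", "hcl": "#888785"},
]

sample_validity = [is_passport_valid(p) for p in sample_passports]
print(sample_validity)
```

Only the second sample entry passes: the first has out-of-range years (among other problems) and the third is missing fields.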

Since I was new to RegEx, I had trouble debugging this one, and using this RegEx debugging tool (highly recommended!!!) helped me to figure out exactly what was wrong. It’s a really useful tool that I’ll be sure to use in the future - very very cool!

Thoughts after day 3 & 4: mhmmm text parsing, not too hot. but also, a newfound love for regex!