Containers are data types intended to hold a collection of data.
List: ordered, mutable, allows duplicates
Tuple: ordered, immutable (read-only, can’t be modified), allows duplicates; provides just two methods, count and index
Set: unordered, mutable, no duplicates
Dictionary: mutable, no duplicate keys; keeps insertion order since Python 3.7, but items are looked up by key, not by position
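The properties of the four containers can be checked directly. The following is only a small sketch; the variable names and sample values are made up for illustration.

```python
# List: ordered, mutable, allows duplicates
items = [3, 1, 3]
items.append(2)                 # lists can grow in place

# Tuple: ordered, immutable, allows duplicates
point = (3, 1, 3)
threes = point.count(3)         # how many times 3 appears
where_one = point.index(1)      # position of the first 1
tuple_is_read_only = False
try:
    point[0] = 9                # item assignment is not allowed
except TypeError:
    tuple_is_read_only = True

# Set: unordered, mutable, duplicates are dropped
unique = set(items)

# Dictionary: mutable, keys are unique
ages = {"ann": 30}
ages["ann"] = 31                # reusing a key replaces the old value
```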
Packing and unpacking values
# I am interested only in what is at the head and tail of the list
l = [1, 2, 3, 4, 5, 6, 7, 20]
head, *body, tail = l
print(head, tail)
1 20
Formatted String
Formatted String Literals
Introduced in Python 3.6, f-strings offer several benefits over the older .format() string method.
name = 'Antonio'
# Using the old .format() method:
print('His name is {}.'.format(name))
# Using f-strings:
print(f'His name is {name}.')
Add the !r conversion to get the repr() of the value, which for a string includes the quotes:
print(f'His name is {name!r}')
His name is 'Antonio'
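Be careful not to let quotation marks in the replacement fields conflict with the quoting used in the outer string. Before Python 3.12, reusing the outer quote inside a replacement field is a syntax error:
d = {'first': 123, 'second': 456}
# Wrong (SyntaxError before Python 3.12): the inner quotes end the outer string
# print(f'Address: {d['first']} Main Street')
# Right: use different quotes inside and outside
print(f"Address: {d['first']} Main Street")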
Minimum Widths, Alignment and Padding
To set the alignment, use the character < for left-align, ^ for center, > for right.
To set padding, precede the alignment character with the padding character (- and . are common choices).
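A short sketch of these format specifiers in f-strings follows. The sample value and the width of 7 are arbitrary choices for illustration.

```python
name = "cat"

left = f"{name:<7}"      # left-align in a field 7 characters wide
center = f"{name:^7}"    # center in the field
right = f"{name:>7}"     # right-align in the field

# A padding character goes immediately before the alignment character.
dashed = f"{name:-^7}"
dotted = f"{name:.>7}"

print(repr(left), repr(center), repr(right), dashed, dotted)
```

The number after the alignment character is the minimum width; values longer than that are printed in full rather than truncated.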
Text Files
The %%writefile cell magic creates a file from IPython. It is specific to IPython and Jupyter notebooks, and it must be the first line of its cell. Alternatively, quickly create the file using a text editor.
%%writefile test.txt
Hello, this is first line
This is second line
To append to an existing file instead of overwriting it, pass the -a flag:
%%writefile -a test.txt
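Outside a notebook, the same result can be had with the built-in open() function. This is a minimal sketch, not part of the original notes: the temporary directory and the appended line are assumptions, used so the example does not overwrite a real test.txt.

```python
import os
import tempfile

# Hypothetical location: a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Mode "w" creates the file or overwrites it, like %%writefile.
with open(path, "w") as f:
    f.write("Hello, this is first line\n")
    f.write("This is second line\n")

# Mode "a" appends to the end, like %%writefile -a.
with open(path, "a") as f:
    f.write("This is an appended line\n")  # made-up content for illustration

# Read the file back as a list of lines without trailing newlines.
with open(path) as f:
    lines = f.read().splitlines()
```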
PDF Files
using PyPDF2
pip install PyPDF2
Reading PDFs
import PyPDF2
# PyPDF2 3.0+ names; older releases used PdfFileReader(f) and getPage(0)
f = open('Magna-carta-translation.pdf', 'rb')
pdf_reader = PyPDF2.PdfReader(f)
page_one = pdf_reader.pages[0]
Suppose we want to do two things: find phone numbers, and also quickly extract their area code (the first three digits). Groups handle this. Wrapping parts of a pattern in parentheses lets us match the whole pattern and later pull out each part separately.
import re
# `text` is assumed to be a string containing a phone number, here 614-292-5800
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
results = re.search(phone_pattern, text)
# The entire result
results.group()
'614-292-5800'
# Can then also call by group position.
# Remember, groups are delimited by parentheses ().
# Group numbering starts at 1; passing in 0 returns the entire match.
results.group(1)
'614'
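For a version that runs on its own, here is a small sketch. The sample sentence is hypothetical; only the phone number is taken from the output shown above.

```python
import re

# Hypothetical sample text containing the number from the output above.
text = "My phone number is 614-292-5800."

phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
match = phone_pattern.search(text)

whole_number = match.group()      # same as match.group(0): the entire match
area_code = match.group(1)        # first parenthesized group
all_parts = match.groups()        # every group as a tuple
```

Calling groups() is convenient when all parts are needed at once, since the result can be unpacked into separate variables.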
Additional Regex Syntax
Or operator |
Use the pipe operator | to match either one of two alternatives. For example:
re.search(r"man|woman","This man was here.")
The Wildcard Character
Use a “wildcard” as a placeholder that matches any single character in that position (except a newline, by default). A simple period . serves as the wildcard. For example:
re.findall(r".at", "The cat in the hat sat here.")
# ['cat', 'hat', 'sat']
re.findall(r"...at", "The bat went splat")
# ['e bat', 'splat']
\S: any non-whitespace character
+: occurs one or more times
# One or more non-whitespace characters, ending with 'at'
re.findall(r'\S+at', "The bat went splat")
# ['bat', 'splat']
Starts With and Ends With
We can use ^ to signal "starts with" and $ to signal "ends with". Both apply to the entire string, not to individual words.
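A brief sketch of both anchors follows; the sample sentences are made up for illustration.

```python
import re

# ^ anchors the pattern to the start of the string.
starts_with_digit = re.findall(r"^\d", "1 is the loneliest number")

# $ anchors the pattern to the end of the string.
ends_with_digit = re.findall(r"\d$", "This ends with a number 2")

# A digit in the middle is at neither end, so neither anchor matches it.
middle_only = re.findall(r"^\d|\d$", "The 3 is in the middle")
```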
Exclusion
To exclude characters, we can use the ^ symbol in conjunction with a set of brackets []. Anything inside the brackets is excluded. For example:
phrase = "there are 3 numbers 34 inside 5 this sentence."
re.findall(r'[^\d]+', phrase)
# ['there are ', ' numbers ', ' inside ', ' this sentence.']
We can use this to remove punctuation from a sentence.
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'
clean = ' '.join(re.findall('[^!.? ]+', test_phrase))
# 'This is a string But it has punctuation How can we remove it'
Brackets for Grouping
As shown above, brackets define a set of characters that are allowed at a position. For example, to find hyphenated words:
text = 'Only find the hyphen-words in this sentence. But you do not know how long-ish they are'
re.findall(r'[\w]+-[\w]+', text)
# ['hyphen-words', 'long-ish']
Parentheses for Multiple Options
If we have multiple options for matching, we can use parentheses to list out these options. For example:
# Find words that start with cat and end with one of these options: 'fish', 'nap', or 'claw'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"
re.search(r'cat(fish|nap|claw)', text)
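To see how the pattern behaves on all three sentences, the search can be run in a loop. This is a small sketch; the list and the variable names `texts` and `found` are introduced here for illustration.

```python
import re

pattern = re.compile(r"cat(fish|nap|claw)")

texts = [
    "Hello, would you like some catfish?",
    "Hello, would you like to take a catnap?",
    "Hello, have you seen this caterpillar?",
]

found = []
for sentence in texts:
    match = pattern.search(sentence)
    # search() returns None when nothing matches, so check before calling group()
    found.append(match.group() if match else None)
```

The third sentence yields None because "caterpillar" starts with cat but does not continue with any of the listed options.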