Dictionaries

A dictionary is a Python data structure that can store data as key-value pairs. The syntax is:

    dict1 = {key1: value1, key2: value2, key3: value3, ...}

Keys can be strings, numbers or any other immutable (hashable) object, such as tuples; values can be anything: strings, numbers, lists, arrays, etc. Keys must be unique: if you assign to the same key twice, the second value replaces the first.

rocks_dict = {"basalt": 1, "granite": 2,
              "marl": 3, "gneiss": 4, 
              "shale": 5}
print(rocks_dict)
{'basalt': 1, 'granite': 2, 'marl': 3, 'gneiss': 4, 'shale': 5}
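
The uniqueness rule described above can be checked directly. The following is a minimal sketch; the dictionary name `porosity` and its values are made up for illustration and are not part of the course data. The key "shale" is set twice, and only the last value is kept:

```python
# Hypothetical example data: the key "shale" appears twice in the literal
porosity = {"shale": 0.10, "sandstone": 0.25, "shale": 0.15}

# Only one "shale" entry survives, and it holds the last value assigned
print(porosity)       # {'shale': 0.15, 'sandstone': 0.25}
print(len(porosity))  # 2
```

Note that the key keeps the position where it was first inserted; only its value is replaced.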

We can access and modify values based on their key:

# Access value with key 'basalt'
print(rocks_dict["basalt"])

# Create a new key 'sandstone' with value 6
rocks_dict["sandstone"] = 6
print(rocks_dict)

# Add another key/value pair to the dictionary
rocks_dict.update({"schist": 7})
print(rocks_dict)

# Remove the 'sandstone' entry with del
del rocks_dict["sandstone"]
print(rocks_dict)

# Remove the 'schist' entry with pop (which also returns the removed value)
rocks_dict.pop("schist")
print(rocks_dict)
1
{'basalt': 1, 'granite': 2, 'marl': 3, 'gneiss': 4, 'shale': 5, 'sandstone': 6}
{'basalt': 1, 'granite': 2, 'marl': 3, 'gneiss': 4, 'shale': 5, 'sandstone': 6, 'schist': 7}
{'basalt': 1, 'granite': 2, 'marl': 3, 'gneiss': 4, 'shale': 5, 'schist': 7}
{'basalt': 1, 'granite': 2, 'marl': 3, 'gneiss': 4, 'shale': 5}
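
Accessing a key that does not exist with square brackets raises a `KeyError`. As a small sketch of two related built-in dictionary methods, `get()` returns a fallback value instead of raising, and `pop()` hands back the value it removes. The variable names `missing` and `removed` are chosen here for illustration only:

```python
rocks_dict = {"basalt": 1, "granite": 2, "marl": 3, "gneiss": 4, "shale": 5}

# Indexing a missing key with [] raises KeyError
try:
    rocks_dict["sandstone"]
except KeyError as err:
    print("Missing key:", err)

# get() returns the given default (here 0) instead of raising
missing = rocks_dict.get("sandstone", 0)
print(missing)   # 0

# pop() removes the entry and returns its value
removed = rocks_dict.pop("shale")
print(removed)   # 5
print(rocks_dict)
```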

We can also search and iterate over keys:

# Search if key 'granite' exists
if "granite" in rocks_dict:
    print(rocks_dict["granite"])
    
# Iterate over keys in rocks_dict
for key in rocks_dict:
    print(key, rocks_dict[key])
2
basalt 1
granite 2
marl 3
gneiss 4
shale 5
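
The loop above looks up each value through its key. As an alternative sketch, the built-in methods `items()`, `keys()` and `values()` give access to the pairs, keys and values directly. The variable names `names`, `codes` and `total` are illustrative:

```python
rocks_dict = {"basalt": 1, "granite": 2, "marl": 3, "gneiss": 4, "shale": 5}

# Iterate over key-value pairs without a second lookup
for name, code in rocks_dict.items():
    print(name, code)

# Collect keys and values as lists
names = list(rocks_dict.keys())
codes = list(rocks_dict.values())

# Values can be passed to any function that accepts an iterable
total = sum(rocks_dict.values())
print(total)  # 15
```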

Exercises


  • Countries per continent: This question is very similar to the one in the File Handling exercises. Change the following code so that the result is a dictionary:

from pandas import read_csv

df = read_csv('Data\\CountryContinent.csv')

continents = df['Continent_Name'].unique()  # list of continent names from the file
res = [[continent, 0] for continent in continents]  # initial list, not counted yet

for index, row in df.iterrows():
    if row["Three_Letter_Country_Code"] != "nan":

        for j in range(len(res)):
            if row["Continent_Name"] == res[j][0]:  # find correct continent
                res[j][1] += 1  # increase country count by 1

print(res)
[['Asia', 58], ['Europe', 57], ['Antarctica', 5], ['Africa', 58], ['Oceania', 27], ['North America', 43], ['South America', 14]]
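
As a hint, the counting pattern can be sketched on a toy example. The list `pairs` below is hypothetical and only stands in for rows of the CSV file; it is not the course data, and this is one possible approach rather than the intended solution:

```python
# Hypothetical (country, continent) pairs standing in for rows of the CSV file
pairs = [
    ("Norway", "Europe"),
    ("Kenya", "Africa"),
    ("France", "Europe"),
    ("Peru", "South America"),
]

counts = {}
for country, continent in pairs:
    # get() returns 0 the first time a continent is seen
    counts[continent] = counts.get(continent, 0) + 1

print(counts)  # {'Europe': 2, 'Africa': 1, 'South America': 1}
```

Using the continent name as the key removes the need for the inner loop that searches the list for the matching continent.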