Summary statistics
Pandas has built-in functions that can calculate simple statistics:
df.describe()
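For numeric data, this returns the count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median) and 75th percentiles. For strings, it returns count, unique, top (the most common value) and freq (how often that value occurs). Older pandas releases summarised timestamps the same way as strings; recent versions treat them like numeric data.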
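As a minimal sketch of the two kinds of output, the example below calls `describe()` on a small made-up DataFrame. The values and the variable names `toy`, `numeric_summary` and `text_summary` are illustrative assumptions and are not taken from the earthquake catalogue.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two kinds of summary
toy = pd.DataFrame({
    "mag": [4.1, 5.0, 4.4, 6.2],
    "region": ["NORTH ISLAND", "SOUTH ISLAND", "NORTH ISLAND", "NORTH ISLAND"],
})

# By default, describe() on a DataFrame summarises the numeric columns only
numeric_summary = toy.describe()

# Calling describe() on a string column gives count, unique, top and freq
text_summary = toy["region"].describe()

print(numeric_summary)
print(text_summary)
```

Calling `describe()` on a single column is one straightforward way to get the string summary, because the default DataFrame call leaves non-numeric columns out when numeric ones are present.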
Let’s load New Zealand earthquake data:
import pandas as pd
nz_eqs = pd.read_csv("../../geosciences/data/nz_largest_eq_since_1970.csv")
nz_eqs.head(4)
| | year | month | day | utc_time | mag | lat | lon | depth_km | region | iris_id | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009 | 7 | 15 | 09:22:31 | 7.8 | -45.8339 | 166.6363 | 20.9 | OFF W. COAST OF S. ISLAND, N.Z. | 2871698 | 1247649751 |
| 1 | 2016 | 11 | 13 | 11:02:59 | 7.8 | -42.7245 | 173.0647 | 22.0 | SOUTH ISLAND, NEW ZEALAND | 5197722 | 1479034979 |
| 2 | 2003 | 8 | 21 | 12:12:47 | 7.2 | -45.0875 | 167.0892 | 6.8 | SOUTH ISLAND, NEW ZEALAND | 1628007 | 1061467967 |
| 3 | 2001 | 8 | 21 | 06:52:06 | 7.1 | -36.8010 | -179.7230 | 33.5 | EAST OF NORTH ISLAND, N.Z. | 1169374 | 998376726 |
nz_eqs.describe()
| | year | month | day | mag | lat | lon | depth_km | iris_id | timestamp |
|---|---|---|---|---|---|---|---|---|---|
| count | 25000.000000 | 25000.000000 | 25000.000000 | 25000.000000 | 25000.000000 | 25000.000000 | 25000.000000 | 2.500000e+04 | 2.500000e+04 |
| mean | 1993.862160 | 6.408760 | 15.384160 | 4.270952 | -38.939428 | 130.757907 | 94.014232 | 2.285625e+06 | 7.684759e+08 |
| std | 12.733297 | 3.512482 | 8.814035 | 0.356037 | 3.278140 | 117.371409 | 94.284137 | 2.562292e+06 | 4.021305e+08 |
| min | 1970.000000 | 1.000000 | 1.000000 | 3.900000 | -47.952400 | -179.999000 | 0.000000 | 1.034600e+04 | 2.629760e+05 |
| 25% | 1984.000000 | 3.000000 | 8.000000 | 4.000000 | -40.537000 | 169.977150 | 12.000000 | 4.046555e+05 | 4.698424e+08 |
| 50% | 1995.000000 | 7.000000 | 15.000000 | 4.200000 | -38.063050 | 175.867700 | 42.000000 | 1.608522e+06 | 7.920289e+08 |
| 75% | 2003.000000 | 9.000000 | 23.000000 | 4.400000 | -36.864300 | 177.507000 | 170.100000 | 3.059155e+06 | 1.061565e+09 |
| max | 2020.000000 | 12.000000 | 31.000000 | 7.800000 | -33.608600 | 180.000000 | 665.100000 | 1.124420e+07 | 1.590893e+09 |
Pandas also has a separate built-in function for each of these statistics. For example, to get the mean magnitude we can simply call:
nz_eqs["mag"].mean()
4.27095199999966
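The other statistics follow the same pattern. The sketch below uses a short, made-up Series of magnitudes (the values and the name `mags` are assumptions chosen for illustration, not catalogue data) to show `median()`, `std()`, `quantile()` and `agg()`.

```python
import pandas as pd

# Hypothetical magnitudes, used only to illustrate the individual methods
mags = pd.Series([3.9, 4.0, 4.2, 4.4, 7.8], name="mag")

median_mag = mags.median()   # middle value of the sorted data
std_mag = mags.std()         # sample standard deviation (ddof=1 by default)
q90 = mags.quantile(0.9)     # 90th percentile, linearly interpolated by default

# agg() computes several statistics in one call and returns them as a Series
summary = mags.agg(["min", "median", "max"])

print(median_mag, std_mag, q90)
print(summary)
```

On the earthquake data, the same calls would be made on a column such as `nz_eqs["mag"]`.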
For a full tutorial, see the Descriptive Statistics chapter in the pandas documentation.
References
The notebook was compiled based on: