Summary statistics


Pandas has a built-in method that calculates simple summary statistics for a DataFrame:

df.describe()

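For numeric data, this returns the count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median) and 75th percentiles. For strings, it returns count, unique, top (the most common value) and freq (how often that value occurs). How timestamp columns are summarised depends on the pandas version.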
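As a minimal sketch of the difference between numeric and string summaries, the snippet below builds a small DataFrame by hand. Its values are copied from the first four catalogue rows shown later in this section; the names `sample`, `numeric_summary` and `text_summary` are illustrative only.

```python
import pandas as pd

# Four rows re-typed from the earthquake catalogue preview (illustrative subset)
sample = pd.DataFrame({
    "mag": [7.8, 7.8, 7.2, 7.1],
    "region": [
        "OFF W. COAST OF S. ISLAND, N.Z.",
        "SOUTH ISLAND, NEW ZEALAND",
        "SOUTH ISLAND, NEW ZEALAND",
        "EAST OF NORTH ISLAND, N.Z.",
    ],
})

# On a mixed DataFrame, describe() summarises only the numeric columns by default
numeric_summary = sample.describe()

# Calling describe() on a string column gives count, unique, top and freq
text_summary = sample["region"].describe()

print(numeric_summary)
print(text_summary)
```

Note that the string column is silently left out of `sample.describe()`, so it has to be summarised separately.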

Let’s load New Zealand earthquake data:

import pandas as pd
nz_eqs = pd.read_csv("../../geosciences/data/nz_largest_eq_since_1970.csv")
nz_eqs.head(4)

	year	month	day	utc_time	mag	lat	lon	depth_km	region	iris_id	timestamp
0	2009	7	15	09:22:31	7.8	-45.8339	166.6363	20.9	OFF W. COAST OF S. ISLAND, N.Z.	2871698	1247649751
1	2016	11	13	11:02:59	7.8	-42.7245	173.0647	22.0	SOUTH ISLAND, NEW ZEALAND	5197722	1479034979
2	2003	8	21	12:12:47	7.2	-45.0875	167.0892	6.8	SOUTH ISLAND, NEW ZEALAND	1628007	1061467967
3	2001	8	21	06:52:06	7.1	-36.8010	-179.7230	33.5	EAST OF NORTH ISLAND, N.Z.	1169374	998376726

nz_eqs.describe()

	year	month	day	mag	lat	lon	depth_km	iris_id	timestamp
count	25000.000000	25000.000000	25000.000000	25000.000000	25000.000000	25000.000000	25000.000000	2.500000e+04	2.500000e+04
mean	1993.862160	6.408760	15.384160	4.270952	-38.939428	130.757907	94.014232	2.285625e+06	7.684759e+08
std	12.733297	3.512482	8.814035	0.356037	3.278140	117.371409	94.284137	2.562292e+06	4.021305e+08
min	1970.000000	1.000000	1.000000	3.900000	-47.952400	-179.999000	0.000000	1.034600e+04	2.629760e+05
25%	1984.000000	3.000000	8.000000	4.000000	-40.537000	169.977150	12.000000	4.046555e+05	4.698424e+08
50%	1995.000000	7.000000	15.000000	4.200000	-38.063050	175.867700	42.000000	1.608522e+06	7.920289e+08
75%	2003.000000	9.000000	23.000000	4.400000	-36.864300	177.507000	170.100000	3.059155e+06	1.061565e+09
max	2020.000000	12.000000	31.000000	7.800000	-33.608600	180.000000	665.100000	1.124420e+07	1.590893e+09
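Statistics for identifier-like columns such as `iris_id` are rarely meaningful, so it can help to describe only the columns of interest. The sketch below assumes a small hand-built DataFrame (four rows copied from the `nz_eqs.head(4)` output above, not the full catalogue) and uses the `percentiles` argument of `describe()` to request different percentiles.

```python
import pandas as pd

# Illustrative subset: values re-typed from the nz_eqs.head(4) output
sample = pd.DataFrame({
    "mag": [7.8, 7.8, 7.2, 7.1],
    "depth_km": [20.9, 22.0, 6.8, 33.5],
    "iris_id": [2871698, 5197722, 1628007, 1169374],
})

# Select the physically meaningful columns, then ask for the
# 5th and 95th percentiles (the median is always included)
summary = sample[["mag", "depth_km"]].describe(percentiles=[0.05, 0.95])

print(summary)
```

The same pattern should apply to the full `nz_eqs` DataFrame, for example `nz_eqs[["mag", "depth_km"]].describe()`.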

Pandas also provides a separate method for each of these statistics. For example, to get the mean magnitude we can call:

nz_eqs["mag"].mean()

4.27095199999966
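The other statistics reported by `describe()` have their own methods too. The sketch below collects a few of them for a short, hand-typed Series (the four magnitudes from the `nz_eqs.head(4)` output, so the numbers differ from the full-catalogue results); the names `mag` and `stats` are illustrative.

```python
import pandas as pd

# Magnitudes of the four largest events, re-typed from the preview table
mag = pd.Series([7.8, 7.8, 7.2, 7.1], name="mag")

stats = {
    "count": mag.count(),       # number of non-missing values
    "mean": mag.mean(),
    "median": mag.median(),     # same as the 50% row of describe()
    "std": mag.std(),           # sample standard deviation (ddof=1 by default)
    "min": mag.min(),
    "max": mag.max(),
    "q25": mag.quantile(0.25),  # 25th percentile
    "q75": mag.quantile(0.75),  # 75th percentile
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

On the real data, the same calls can be made on a column, for example `nz_eqs["mag"].median()` or `nz_eqs["depth_km"].quantile(0.75)`.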

For a full tutorial, see the Descriptive Statistics chapter in the pandas documentation.

References

The notebook was compiled based on:

ESE Jupyter Material
