# pyskim: Quick summary statistics for dataframes

When starting a new data analysis project, familiarizing yourself with the data set is crucial (alongside obtaining domain-specific knowledge). This typically involves a series of somewhat repetitive steps such as value counts, histograms, and scatterplots.

[pyskim](https://github.com/kpj/pyskim) helps you get there as quickly and comfortably as possible. Simply locate your CSV file (or a file using whichever delimiter you prefer) and call `pyskim` from the command line:

```bash
$ pyskim iris.csv
── Data Summary ────────────────────────────────────────────────────────────────────────────────────
type                 value
-----------------  -------
Number of rows         150
Number of columns        5
──────────────────────────────────────────────────
Column type frequency:
           Count
-------  -------
float64        4
string         1
── Variable type: number ───────────────────────────────────────────────────────────────────────────
    name            na_count    mean     sd    p0    p25    p50    p75    p100  hist
--  ------------  ----------  ------  -----  ----  -----  -----  -----  ------  ----------
 0  sepal_length           0    5.84  0.828   4.3    5.1    5.8    6.4     7.9  ▂▆▃▇▄▇▅▁▁▁
 1  sepal_width            0    3.06  0.436     2    2.8      3    3.3     4.4  ▁▁▄▅▇▆▂▂▁▁
 2  petal_length           0    3.76   1.77     1    1.6   4.35    5.1     6.9  ▇▃▁▁▂▅▆▄▃▁
 3  petal_width            0     1.2  0.762   0.1    0.3    1.3    1.8     2.5  ▇▂▁▂▂▆▁▄▂▃
── Variable type: string ───────────────────────────────────────────────────────────────────────────
    name               na_count    n_unique  top_counts
--  ---------------  ----------  ----------  -----------------------------------------
 0  species                   0           3  versicolor: 50, setosa: 50, virginica: 50
```

The output shows the most relevant dataframe properties at a glance, followed by statistics for each column. Which statistics are computed depends on the column's datatype, and this can be customized to handle your own datatypes.
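
To make the idea of type-dependent statistics concrete, here is a minimal sketch using only the Python standard library. It is not pyskim's implementation or API: the `summarize` function, its type-inference rule (a column counts as numeric if every non-empty value parses as a float), and the returned dictionary layout are all assumptions made for illustration.

```python
import csv
import io
import statistics
from collections import Counter


def _is_number(text):
    """Return True if the text parses as a float (hypothetical inference rule)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(csv_text):
    """Return {column name: stats}, choosing the statistics by inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # Empty cells (and cells missing from short rows) count as NA.
        present = [v for v in values if v not in ("", None)]
        stats = {"na_count": len(values) - len(present)}
        if present and all(_is_number(v) for v in present):
            numbers = [float(v) for v in present]
            stats.update(
                type="number",
                mean=statistics.mean(numbers),
                # The sample standard deviation needs at least two values.
                sd=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
                p0=min(numbers),
                p50=statistics.median(numbers),
                p100=max(numbers),
            )
        else:
            counts = Counter(present)
            stats.update(
                type="string",
                n_unique=len(counts),
                top_counts=counts.most_common(3),
            )
        summary[name] = stats
    return summary
```

Calling `summarize(open("iris.csv").read())` would return one entry per column, with numeric columns carrying `mean`, `sd`, and percentile-style fields, and all other columns carrying `n_unique` and `top_counts`. Supporting a further datatype in this sketch would mean adding another branch with its own statistics.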