{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", " \n", "___" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Groupby\n", "\n", "The groupby method allows you to group rows of data together and call aggregate functions" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "# Create dataframe\n", "data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],\n", " 'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],\n", " 'Sales':[200,120,340,124,243,350]}" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CompanyPersonSales
0GOOGSam200
1GOOGCharlie120
2MSFTAmy340
3MSFTVanessa124
4FBCarl243
5FBSarah350
\n", "
" ], "text/plain": [ " Company Person Sales\n", "0 GOOG Sam 200\n", "1 GOOG Charlie 120\n", "2 MSFT Amy 340\n", "3 MSFT Vanessa 124\n", "4 FB Carl 243\n", "5 FB Sarah 350" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Now you can use the .groupby() method to group rows together based off of a column name. For instance let's group based off of Company. This will create a DataFrameGroupBy object:**" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Company')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can save this object as a new variable:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "by_comp = df.groupby(\"Company\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then call aggregate methods off the object:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
Company
FB296.5
GOOG160.0
MSFT232.0
\n", "
" ], "text/plain": [ " Sales\n", "Company \n", "FB 296.5\n", "GOOG 160.0\n", "MSFT 232.0" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.mean()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
Company
FB296.5
GOOG160.0
MSFT232.0
\n", "
" ], "text/plain": [ " Sales\n", "Company \n", "FB 296.5\n", "GOOG 160.0\n", "MSFT 232.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Company').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More examples of aggregate methods:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
Company
FB75.660426
GOOG56.568542
MSFT152.735065
\n", "
" ], "text/plain": [ " Sales\n", "Company \n", "FB 75.660426\n", "GOOG 56.568542\n", "MSFT 152.735065" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.std()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PersonSales
Company
FBCarl243
GOOGCharlie120
MSFTAmy124
\n", "
" ], "text/plain": [ " Person Sales\n", "Company \n", "FB Carl 243\n", "GOOG Charlie 120\n", "MSFT Amy 124" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.min()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PersonSales
Company
FBSarah350
GOOGSam200
MSFTVanessa340
\n", "
" ], "text/plain": [ " Person Sales\n", "Company \n", "FB Sarah 350\n", "GOOG Sam 200\n", "MSFT Vanessa 340" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.max()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PersonSales
Company
FB22
GOOG22
MSFT22
\n", "
" ], "text/plain": [ " Person Sales\n", "Company \n", "FB 2 2\n", "GOOG 2 2\n", "MSFT 2 2" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.count()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
Company
FBcount2.000000
mean296.500000
std75.660426
min243.000000
25%269.750000
50%296.500000
75%323.250000
max350.000000
GOOGcount2.000000
mean160.000000
std56.568542
min120.000000
25%140.000000
50%160.000000
75%180.000000
max200.000000
MSFTcount2.000000
mean232.000000
std152.735065
min124.000000
25%178.000000
50%232.000000
75%286.000000
max340.000000
\n", "
" ], "text/plain": [ " Sales\n", "Company \n", "FB count 2.000000\n", " mean 296.500000\n", " std 75.660426\n", " min 243.000000\n", " 25% 269.750000\n", " 50% 296.500000\n", " 75% 323.250000\n", " max 350.000000\n", "GOOG count 2.000000\n", " mean 160.000000\n", " std 56.568542\n", " min 120.000000\n", " 25% 140.000000\n", " 50% 160.000000\n", " 75% 180.000000\n", " max 200.000000\n", "MSFT count 2.000000\n", " mean 232.000000\n", " std 152.735065\n", " min 124.000000\n", " 25% 178.000000\n", " 50% 232.000000\n", " 75% 286.000000\n", " max 340.000000" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.describe()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CompanyFBGOOGMSFT
countmeanstdmin25%50%75%maxcountmean...75%maxcountmeanstdmin25%50%75%max
Sales2.0296.575.660426243.0269.75296.5323.25350.02.0160.0...180.0200.02.0232.0152.735065124.0178.0232.0286.0340.0
\n", "

1 rows × 24 columns

\n", "
" ], "text/plain": [ "Company FB GOOG \\\n", " count mean std min 25% 50% 75% max count \n", "Sales 2.0 296.5 75.660426 243.0 269.75 296.5 323.25 350.0 2.0 \n", "\n", "Company ... MSFT \\\n", " mean ... 75% max count mean std min 25% \n", "Sales 160.0 ... 180.0 200.0 2.0 232.0 152.735065 124.0 178.0 \n", "\n", "Company \n", " 50% 75% max \n", "Sales 232.0 286.0 340.0 \n", "\n", "[1 rows x 24 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.describe().transpose()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
Sales2.0160.056.568542120.0140.0160.0180.0200.0
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "Sales 2.0 160.0 56.568542 120.0 140.0 160.0 180.0 200.0" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_comp.describe().transpose()['GOOG']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Great Job!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }