{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Groupby\n", "\n", "La méthode de regroupement par groupe vous permet de regrouper des lignes de données et d'appeler des fonctions d'agrégation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "# Création dataframe\n", "data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],\n", " 'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],\n", " 'Sales':[200,120,340,124,243,350]}" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Company Person Sales\n", "0 GOOG Sam 200\n", "1 GOOG Charlie 120\n", "2 MSFT Amy 340\n", "3 MSFT Vanessa 124\n", "4 FB Carl 243\n", "5 FB Sarah 350" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
CompanyPersonSales
0GOOGSam200
1GOOGCharlie120
2MSFTAmy340
3MSFTVanessa124
4FBCarl243
5FBSarah350
\n
" }, "metadata": {}, "execution_count": 3 } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Maintenant vous pouvez utiliser la méthode .groupby() pour regrouper les lignes en fonction d'un nom de colonne. Par exemple, groupons les personnes par société. Ceci créera un objet DataFrameGroupBy :**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 4 } ], "source": [ "df.groupby('Company')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vous pouvez sauvegarder cet objet comme nouvelle variable :" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "by_comp = df.groupby(\"Company\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Et ensuite, appelez les méthodes d'agrégation sur l'objet :" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Sales\n", "Company \n", "FB 296.5\n", "GOOG 160.0\n", "MSFT 232.0" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Sales
Company
FB296.5
GOOG160.0
MSFT232.0
\n
" }, "metadata": {}, "execution_count": 6 } ], "source": [ "by_comp.mean()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Sales\n", "Company \n", "FB 296.5\n", "GOOG 160.0\n", "MSFT 232.0" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Sales
Company
FB296.5
GOOG160.0
MSFT232.0
\n
" }, "metadata": {}, "execution_count": 7 } ], "source": [ "df.groupby('Company').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Autres exemples de méthodes d'agrégation :" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Sales\n", "Company \n", "FB 75.660426\n", "GOOG 56.568542\n", "MSFT 152.735065" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Sales
Company
FB75.660426
GOOG56.568542
MSFT152.735065
\n
" }, "metadata": {}, "execution_count": 8 } ], "source": [ "by_comp.std()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Person Sales\n", "Company \n", "FB Carl 243\n", "GOOG Charlie 120\n", "MSFT Amy 124" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
PersonSales
Company
FBCarl243
GOOGCharlie120
MSFTAmy124
\n
" }, "metadata": {}, "execution_count": 9 } ], "source": [ "by_comp.min()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Person Sales\n", "Company \n", "FB Sarah 350\n", "GOOG Sam 200\n", "MSFT Vanessa 340" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
PersonSales
Company
FBSarah350
GOOGSam200
MSFTVanessa340
\n
" }, "metadata": {}, "execution_count": 10 } ], "source": [ "by_comp.max()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Person Sales\n", "Company \n", "FB 2 2\n", "GOOG 2 2\n", "MSFT 2 2" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
PersonSales
Company
FB22
GOOG22
MSFT22
\n
" }, "metadata": {}, "execution_count": 11 } ], "source": [ "by_comp.count()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Sales \n", " count mean std min 25% 50% 75% max\n", "Company \n", "FB 2.0 296.5 75.660426 243.0 269.75 296.5 323.25 350.0\n", "GOOG 2.0 160.0 56.568542 120.0 140.00 160.0 180.00 200.0\n", "MSFT 2.0 232.0 152.735065 124.0 178.00 232.0 286.00 340.0" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Sales
countmeanstdmin25%50%75%max
Company
FB2.0296.575.660426243.0269.75296.5323.25350.0
GOOG2.0160.056.568542120.0140.00160.0180.00200.0
MSFT2.0232.0152.735065124.0178.00232.0286.00340.0
\n
" }, "metadata": {}, "execution_count": 12 } ], "source": [ "by_comp.describe()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Company FB GOOG MSFT\n", "Sales count 2.000000 2.000000 2.000000\n", " mean 296.500000 160.000000 232.000000\n", " std 75.660426 56.568542 152.735065\n", " min 243.000000 120.000000 124.000000\n", " 25% 269.750000 140.000000 178.000000\n", " 50% 296.500000 160.000000 232.000000\n", " 75% 323.250000 180.000000 286.000000\n", " max 350.000000 200.000000 340.000000" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
CompanyFBGOOGMSFT
Salescount2.0000002.0000002.000000
mean296.500000160.000000232.000000
std75.66042656.568542152.735065
min243.000000120.000000124.000000
25%269.750000140.000000178.000000
50%296.500000160.000000232.000000
75%323.250000180.000000286.000000
max350.000000200.000000340.000000
\n
" }, "metadata": {}, "execution_count": 13 } ], "source": [ "by_comp.describe().transpose()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Sales count 2.000000\n", " mean 160.000000\n", " std 56.568542\n", " min 120.000000\n", " 25% 140.000000\n", " 50% 160.000000\n", " 75% 180.000000\n", " max 200.000000\n", "Name: GOOG, dtype: float64" ] }, "metadata": {}, "execution_count": 14 } ], "source": [ "by_comp.describe().transpose()['GOOG']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bon travail!" ] } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3.7.9 64-bit ('pyfinance': conda)", "metadata": { "interpreter": { "hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904" } } }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9-final" } }, "nbformat": 4, "nbformat_minor": 1 }