{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# DataFrames\n", "\n", "Les DataFrames sont le point central de pandas et sont directement inspirés par le langage de programmation R. Nous pouvons considérer un DataFrame comme un ensemble d'objets Series assemblés et qui partagent le même index. Utilisons pandas pour explorer ce sujet !" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from numpy.random import randn\n", "np.random.seed(101)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 4 } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sélection et Indexation\n", "\n", "Apprenons les différentes méthodes pour récupérer des données à partir d'une DataFrame" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "A 2.706850\n", "B 0.651118\n", "C -2.018168\n", "D 0.188695\n", "E 0.190794\n", "Name: W, dtype: float64" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "df['W']" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W Z\n", "A 2.706850 0.503826\n", "B 0.651118 0.605965\n", "C -2.018168 -0.589001\n", "D 0.188695 0.955057\n", "E 0.190794 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WZ
A2.7068500.503826
B0.6511180.605965
C-2.018168-0.589001
D0.1886950.955057
E0.1907940.683509
\n
" }, "metadata": {}, "execution_count": 6 } ], "source": [ "# Passer une liste de noms de colonnes\n", "df[['W','Z']]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "A 2.706850\n", "B 0.651118\n", "C -2.018168\n", "D 0.188695\n", "E 0.190794\n", "Name: W, dtype: float64" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "# Syntaxe SQL (Non Recommandée!)\n", "df.W" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les colonnes d'un DataFrame sont juste des séries" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "pandas.core.series.Series" ] }, "metadata": {}, "execution_count": 8 } ], "source": [ "type(df['W'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Création d'une nouvelle colonne:**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "df['new'] = df['W'] + df['Y']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z new\n", "A 2.706850 0.628133 0.907969 0.503826 3.614819\n", "B 0.651118 -0.319318 -0.848077 0.605965 -0.196959\n", "C -2.018168 0.740122 0.528813 -0.589001 -1.489355\n", "D 0.188695 -0.758872 -0.933237 0.955057 -0.744542\n", "E 0.190794 1.978757 2.605967 0.683509 2.796762" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZnew
A2.7068500.6281330.9079690.5038263.614819
B0.651118-0.319318-0.8480770.605965-0.196959
C-2.0181680.7401220.528813-0.589001-1.489355
D0.188695-0.758872-0.9332370.955057-0.744542
E0.1907941.9787572.6059670.6835092.796762
\n
" }, "metadata": {}, "execution_count": 10 } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Supression d'une colonne**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 11 } ], "source": [ "df.drop('new',axis=1)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z new\n", "A 2.706850 0.628133 0.907969 0.503826 3.614819\n", "B 0.651118 -0.319318 -0.848077 0.605965 -0.196959\n", "C -2.018168 0.740122 0.528813 -0.589001 -1.489355\n", "D 0.188695 -0.758872 -0.933237 0.955057 -0.744542\n", "E 0.190794 1.978757 2.605967 0.683509 2.796762" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZnew
A2.7068500.6281330.9079690.5038263.614819
B0.651118-0.319318-0.8480770.605965-0.196959
C-2.0181680.7401220.528813-0.589001-1.489355
D0.188695-0.758872-0.9332370.955057-0.744542
E0.1907941.9787572.6059670.6835092.796762
\n
" }, "metadata": {}, "execution_count": 12 } ], "source": [ "# Pas de remplacement sauf si spécifié!\n", "df" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "df.drop('new',axis=1,inplace=True)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 14 } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On peut aussi supprimer une ligne de cette façon:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
\n
" }, "metadata": {}, "execution_count": 15 } ], "source": [ "df.drop('E',axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sélection de lignes**" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "W 2.706850\n", "X 0.628133\n", "Y 0.907969\n", "Z 0.503826\n", "Name: A, dtype: float64" ] }, "metadata": {}, "execution_count": 16 } ], "source": [ "df.loc['A']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ou sélectionner en fonction de la position au lieu de l'étiquette " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "W -2.018168\n", "X 0.740122\n", "Y 0.528813\n", "Z -0.589001\n", "Name: C, dtype: float64" ] }, "metadata": {}, "execution_count": 17 } ], "source": [ "df.iloc[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sélection d'un sous-ensemble de lignes et de colonnes**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "-0.8480769834036315" ] }, "metadata": {}, "execution_count": 18 } ], "source": [ "df.loc['B','Y']" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W Y\n", "A 2.706850 0.907969\n", "B 0.651118 -0.848077" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WY
A2.7068500.907969
B0.651118-0.848077
\n
" }, "metadata": {}, "execution_count": 19 } ], "source": [ "df.loc[['A','B'],['W','Y']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sélection conditionnelle\n", "\n", "Une caractéristique importante de pandas est la sélection conditionnelle à l'aide des crochets, très similaire à celle de numpy :" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 20 } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A True True True True\n", "B True False False True\n", "C False True True False\n", "D True False False True\n", "E True True True True" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
ATrueTrueTrueTrue
BTrueFalseFalseTrue
CFalseTrueTrueFalse
DTrueFalseFalseTrue
ETrueTrueTrueTrue
\n
" }, "metadata": {}, "execution_count": 21 } ], "source": [ "df>0" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 NaN NaN 0.605965\n", "C NaN 0.740122 0.528813 NaN\n", "D 0.188695 NaN NaN 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118NaNNaN0.605965
CNaN0.7401220.528813NaN
D0.188695NaNNaN0.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 22 } ], "source": [ "df[df>0]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 23 } ], "source": [ "df[df['W']>0]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "A 0.907969\n", "B -0.848077\n", "D -0.933237\n", "E 2.605967\n", "Name: Y, dtype: float64" ] }, "metadata": {}, "execution_count": 24 } ], "source": [ "df[df['W']>0]['Y']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Y X\n", "A 0.907969 0.628133\n", "B -0.848077 -0.319318\n", "D -0.933237 -0.758872\n", "E 2.605967 1.978757" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
YX
A0.9079690.628133
B-0.848077-0.319318
D-0.933237-0.758872
E2.6059671.978757
\n
" }, "metadata": {}, "execution_count": 25 } ], "source": [ "df[df['W']>0][['Y','X']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pour 2 conditions, vous pouvez utiliser | et & avec des parenthèses:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 26 } ], "source": [ "df[(df['W']>0) & (df['Y'] > 1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plus de détails sur l'Index\n", "\n", "Discutons d'autres caractéristiques de l'indexation, y compris la réinitialisation de l'index ou la définition d'une autre fonction. Nous parlerons aussi de la hiérarchie des indices !" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "A 2.706850 0.628133 0.907969 0.503826\n", "B 0.651118 -0.319318 -0.848077 0.605965\n", "C -2.018168 0.740122 0.528813 -0.589001\n", "D 0.188695 -0.758872 -0.933237 0.955057\n", "E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 27 } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " index W X Y Z\n", "0 A 2.706850 0.628133 0.907969 0.503826\n", "1 B 0.651118 -0.319318 -0.848077 0.605965\n", "2 C -2.018168 0.740122 0.528813 -0.589001\n", "3 D 0.188695 -0.758872 -0.933237 0.955057\n", "4 E 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
indexWXYZ
0A2.7068500.6281330.9079690.503826
1B0.651118-0.319318-0.8480770.605965
2C-2.0181680.7401220.528813-0.589001
3D0.188695-0.758872-0.9332370.955057
4E0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 28 } ], "source": [ "# Réinitialisation de l'indice par défaut 0,1...n\n", "df.reset_index()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "newind = 'CA NY WY OR CO'.split()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df['States'] = newind" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z States\n", "A 2.706850 0.628133 0.907969 0.503826 CA\n", "B 0.651118 -0.319318 -0.848077 0.605965 NY\n", "C -2.018168 0.740122 0.528813 -0.589001 WY\n", "D 0.188695 -0.758872 -0.933237 0.955057 OR\n", "E 0.190794 1.978757 2.605967 0.683509 CO" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZStates
A2.7068500.6281330.9079690.503826CA
B0.651118-0.319318-0.8480770.605965NY
C-2.0181680.7401220.528813-0.589001WY
D0.188695-0.758872-0.9332370.955057OR
E0.1907941.9787572.6059670.683509CO
\n
" }, "metadata": {}, "execution_count": 31 } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "States \n", "CA 2.706850 0.628133 0.907969 0.503826\n", "NY 0.651118 -0.319318 -0.848077 0.605965\n", "WY -2.018168 0.740122 0.528813 -0.589001\n", "OR 0.188695 -0.758872 -0.933237 0.955057\n", "CO 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
States
CA2.7068500.6281330.9079690.503826
NY0.651118-0.319318-0.8480770.605965
WY-2.0181680.7401220.528813-0.589001
OR0.188695-0.758872-0.9332370.955057
CO0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 32 } ], "source": [ "df.set_index('States')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z States\n", "A 2.706850 0.628133 0.907969 0.503826 CA\n", "B 0.651118 -0.319318 -0.848077 0.605965 NY\n", "C -2.018168 0.740122 0.528813 -0.589001 WY\n", "D 0.188695 -0.758872 -0.933237 0.955057 OR\n", "E 0.190794 1.978757 2.605967 0.683509 CO" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZStates
A2.7068500.6281330.9079690.503826CA
B0.651118-0.319318-0.8480770.605965NY
C-2.0181680.7401220.528813-0.589001WY
D0.188695-0.758872-0.9332370.955057OR
E0.1907941.9787572.6059670.683509CO
\n
" }, "metadata": {}, "execution_count": 33 } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df.set_index('States',inplace=True)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " W X Y Z\n", "States \n", "CA 2.706850 0.628133 0.907969 0.503826\n", "NY 0.651118 -0.319318 -0.848077 0.605965\n", "WY -2.018168 0.740122 0.528813 -0.589001\n", "OR 0.188695 -0.758872 -0.933237 0.955057\n", "CO 0.190794 1.978757 2.605967 0.683509" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
WXYZ
States
CA2.7068500.6281330.9079690.503826
NY0.651118-0.319318-0.8480770.605965
WY-2.0181680.7401220.528813-0.589001
OR0.188695-0.758872-0.9332370.955057
CO0.1907941.9787572.6059670.683509
\n
" }, "metadata": {}, "execution_count": 35 } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-index et hiérarchie des indices\n", "\n", "Voyons comment travailler avec un Multi-Index, nous allons d'abord créer un exemple rapide de ce à quoi ressemblerait un DataFrame Multi-Indexé :" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Niveaux d'Index\n", "outside = ['G1','G1','G1','G2','G2','G2']\n", "inside = [1,2,3,1,2,3]\n", "hier_index = list(zip(outside,inside))\n", "hier_index = pd.MultiIndex.from_tuples(hier_index)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "MultiIndex([('G1', 1),\n", " ('G1', 2),\n", " ('G1', 3),\n", " ('G2', 1),\n", " ('G2', 2),\n", " ('G2', 3)],\n", " )" ] }, "metadata": {}, "execution_count": 37 } ], "source": [ "hier_index" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " A B\n", "G1 1 0.302665 1.693723\n", " 2 -1.706086 -1.159119\n", " 3 -0.134841 0.390528\n", "G2 1 0.166905 0.184502\n", " 2 0.807706 0.072960\n", " 3 0.638787 0.329646" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
AB
G110.3026651.693723
2-1.706086-1.159119
3-0.1348410.390528
G210.1669050.184502
20.8077060.072960
30.6387870.329646
\n
" }, "metadata": {}, "execution_count": 38 } ], "source": [ "df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maintenant, montrons comment indexer ceci ! Pour la hiérarchie d'index nous utilisons df.loc[], si c'était sur l'axe des colonnes, vous n'utiliseriez que la notation normale entre crochets df[]. L'appel d'un niveau de l'index retourne un sous-dataframe :" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " A B\n", "1 0.302665 1.693723\n", "2 -1.706086 -1.159119\n", "3 -0.134841 0.390528" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
AB
10.3026651.693723
2-1.706086-1.159119
3-0.1348410.390528
\n
" }, "metadata": {}, "execution_count": 39 } ], "source": [ "df.loc['G1']" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "A 0.302665\n", "B 1.693723\n", "Name: 1, dtype: float64" ] }, "metadata": {}, "execution_count": 40 } ], "source": [ "df.loc['G1'].loc[1]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "FrozenList([None, None])" ] }, "metadata": {}, "execution_count": 41 } ], "source": [ "df.index.names" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "df.index.names = ['Group','Num']" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " A B\n", "Group Num \n", "G1 1 0.302665 1.693723\n", " 2 -1.706086 -1.159119\n", " 3 -0.134841 0.390528\n", "G2 1 0.166905 0.184502\n", " 2 0.807706 0.072960\n", " 3 0.638787 0.329646" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
AB
GroupNum
G110.3026651.693723
2-1.706086-1.159119
3-0.1348410.390528
G210.1669050.184502
20.8077060.072960
30.6387870.329646
\n
" }, "metadata": {}, "execution_count": 43 } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " A B\n", "Num \n", "1 0.302665 1.693723\n", "2 -1.706086 -1.159119\n", "3 -0.134841 0.390528" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
AB
Num
10.3026651.693723
2-1.706086-1.159119
3-0.1348410.390528
\n
" }, "metadata": {}, "execution_count": 44 } ], "source": [ "df.xs('G1')" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "A 0.302665\n", "B 1.693723\n", "Name: (G1, 1), dtype: float64" ] }, "metadata": {}, "execution_count": 45 } ], "source": [ "df.xs(['G1',1])" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " A B\n", "Group \n", "G1 0.302665 1.693723\n", "G2 0.166905 0.184502" ], "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
AB
Group
G10.3026651.693723
G20.1669050.184502
\n
" }, "metadata": {}, "execution_count": 46 } ], "source": [ "df.xs(1,level='Num')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bon travail!" ] } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3.7.9 64-bit ('pyfinance': conda)", "metadata": { "interpreter": { "hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904" } } }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9-final" } }, "nbformat": 4, "nbformat_minor": 1 }