{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# DataFrames\n",
"\n",
"DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects put together so that they share the same index. Let's use pandas to explore this topic!"
]
},
|
||
|
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from numpy.random import randn\n",
"# Seed the generator so the random values below are reproducible\n",
"np.random.seed(101)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# 5x4 table of standard-normal samples, rows labelled A-E and columns W-Z\n",
"df = pd.DataFrame(randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 4
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selection and Indexing\n",
"\n",
"Let's learn the various methods for retrieving data from a DataFrame."
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"A 2.706850\n",
|
||
|
"B 0.651118\n",
|
||
|
"C -2.018168\n",
|
||
|
"D 0.188695\n",
|
||
|
"E 0.190794\n",
|
||
|
"Name: W, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 5
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df['W']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W Z\n",
|
||
|
"A 2.706850 0.503826\n",
|
||
|
"B 0.651118 0.605965\n",
|
||
|
"C -2.018168 -0.589001\n",
|
||
|
"D 0.188695 0.955057\n",
|
||
|
"E 0.190794 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 6
|
||
|
}
|
||
|
],
|
||
|
"source": [
"# Pass a list of column names to get a DataFrame back\n",
"df[['W','Z']]"
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"A 2.706850\n",
|
||
|
"B 0.651118\n",
|
||
|
"C -2.018168\n",
|
||
|
"D 0.188695\n",
|
||
|
"E 0.190794\n",
|
||
|
"Name: W, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 7
|
||
|
}
|
||
|
],
|
||
|
"source": [
"# Attribute (dot) access also works, but is not recommended: a column name can clash with a DataFrame method\n",
"df.W"
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"DataFrame columns are just Series:"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"pandas.core.series.Series"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 8
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"type(df['W'])"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Creating a new column:**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df['new'] = df['W'] + df['Y']"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z new\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826 3.614819\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965 -0.196959\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001 -1.489355\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057 -0.744542\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509 2.796762"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n <th>new</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n <td>3.614819</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n <td>-0.196959</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n <td>-1.489355</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n <td>-0.744542</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n <td>2.796762</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 10
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Removing a column:**"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 11
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.drop('new',axis=1)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z new\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826 3.614819\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965 -0.196959\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001 -1.489355\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057 -0.744542\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509 2.796762"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n <th>new</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n <td>3.614819</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n <td>-0.196959</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n <td>-1.489355</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n <td>-0.744542</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n <td>2.796762</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 12
|
||
|
}
|
||
|
],
|
||
|
"source": [
"# drop() returned a new DataFrame; df itself is unchanged unless inplace=True is specified\n",
"df"
]
|
||
|
},
|
||
|
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# inplace=True modifies df directly and returns None\n",
"df.drop('new',axis=1,inplace=True)"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 14
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rows can be dropped the same way, using axis=0:"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 15
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.drop('E',axis=0)"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Selecting rows**"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"W 2.706850\n",
|
||
|
"X 0.628133\n",
|
||
|
"Y 0.907969\n",
|
||
|
"Z 0.503826\n",
|
||
|
"Name: A, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 16
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.loc['A']"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or select by integer position instead of by label:"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"W -2.018168\n",
|
||
|
"X 0.740122\n",
|
||
|
"Y 0.528813\n",
|
||
|
"Z -0.589001\n",
|
||
|
"Name: C, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 17
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.iloc[2]"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Selecting a subset of rows and columns**"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"-0.8480769834036315"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 18
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.loc['B','Y']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W Y\n",
|
||
|
"A 2.706850 0.907969\n",
|
||
|
"B 0.651118 -0.848077"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>Y</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.907969</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.848077</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 19
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.loc[['A','B'],['W','Y']]"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conditional selection\n",
"\n",
"An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 20
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A True True True True\n",
|
||
|
"B True False False True\n",
|
||
|
"C False True True False\n",
|
||
|
"D True False False True\n",
|
||
|
"E True True True True"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>B</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>C</th>\n <td>False</td>\n <td>True</td>\n <td>True</td>\n <td>False</td>\n </tr>\n <tr>\n <th>D</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>E</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 21
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df>0"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 NaN NaN 0.605965\n",
|
||
|
"C NaN 0.740122 0.528813 NaN\n",
|
||
|
"D 0.188695 NaN NaN 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>NaN</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 22
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df[df>0]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 23
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df[df['W']>0]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"A 0.907969\n",
|
||
|
"B -0.848077\n",
|
||
|
"D -0.933237\n",
|
||
|
"E 2.605967\n",
|
||
|
"Name: Y, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 24
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df[df['W']>0]['Y']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 25,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" Y X\n",
|
||
|
"A 0.907969 0.628133\n",
|
||
|
"B -0.848077 -0.319318\n",
|
||
|
"D -0.933237 -0.758872\n",
|
||
|
"E 2.605967 1.978757"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Y</th>\n <th>X</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>0.907969</td>\n <td>0.628133</td>\n </tr>\n <tr>\n <th>B</th>\n <td>-0.848077</td>\n <td>-0.319318</td>\n </tr>\n <tr>\n <th>D</th>\n <td>-0.933237</td>\n <td>-0.758872</td>\n </tr>\n <tr>\n <th>E</th>\n <td>2.605967</td>\n <td>1.978757</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 25
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df[df['W']>0][['Y','X']]"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To combine two conditions, use | (or) and & (and), wrapping each condition in parentheses. Python's `and` and `or` keywords do not work element-wise on Series."
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 26,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 26
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df[(df['W']>0) & (df['Y'] > 1)]"
|
||
|
]
|
||
|
},
|
||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More Index Details\n",
"\n",
"Let's look at some more features of indexing, including resetting the index and setting it to the values of another column. We will also talk about index hierarchy!"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 27,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 27
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 28,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" index W X Y Z\n",
|
||
|
"0 A 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"1 B 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"2 C -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"3 D 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"4 E 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>index</th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A</td>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>1</th>\n <td>B</td>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>2</th>\n <td>C</td>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>3</th>\n <td>D</td>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>4</th>\n <td>E</td>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 28
|
||
|
}
|
||
|
],
|
||
|
"source": [
"# Reset to the default integer index 0, 1, ..., n-1; the old index becomes a column\n",
"df.reset_index()"
]
|
||
|
},
|
||
|
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"newind = 'CA NY WY OR CO'.split()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"df['States'] = newind"
]
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 31,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z States\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826 CA\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965 NY\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001 WY\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057 OR\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509 CO"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n <th>States</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n <td>CA</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n <td>NY</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n <td>WY</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n <td>OR</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n <td>CO</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 31
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 32,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"States \n",
|
||
|
"CA 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"NY 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"WY -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"OR 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"CO 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n <tr>\n <th>States</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>CA</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>NY</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>WY</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>OR</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>CO</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 32
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.set_index('States')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 33,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z States\n",
|
||
|
"A 2.706850 0.628133 0.907969 0.503826 CA\n",
|
||
|
"B 0.651118 -0.319318 -0.848077 0.605965 NY\n",
|
||
|
"C -2.018168 0.740122 0.528813 -0.589001 WY\n",
|
||
|
"D 0.188695 -0.758872 -0.933237 0.955057 OR\n",
|
||
|
"E 0.190794 1.978757 2.605967 0.683509 CO"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n <th>States</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n <td>CA</td>\n </tr>\n <tr>\n <th>B</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n <td>NY</td>\n </tr>\n <tr>\n <th>C</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n <td>WY</td>\n </tr>\n <tr>\n <th>D</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n <td>OR</td>\n </tr>\n <tr>\n <th>E</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n <td>CO</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 33
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 34,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df.set_index('States',inplace=True)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 35,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" W X Y Z\n",
|
||
|
"States \n",
|
||
|
"CA 2.706850 0.628133 0.907969 0.503826\n",
|
||
|
"NY 0.651118 -0.319318 -0.848077 0.605965\n",
|
||
|
"WY -2.018168 0.740122 0.528813 -0.589001\n",
|
||
|
"OR 0.188695 -0.758872 -0.933237 0.955057\n",
|
||
|
"CO 0.190794 1.978757 2.605967 0.683509"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>W</th>\n <th>X</th>\n <th>Y</th>\n <th>Z</th>\n </tr>\n <tr>\n <th>States</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>CA</th>\n <td>2.706850</td>\n <td>0.628133</td>\n <td>0.907969</td>\n <td>0.503826</td>\n </tr>\n <tr>\n <th>NY</th>\n <td>0.651118</td>\n <td>-0.319318</td>\n <td>-0.848077</td>\n <td>0.605965</td>\n </tr>\n <tr>\n <th>WY</th>\n <td>-2.018168</td>\n <td>0.740122</td>\n <td>0.528813</td>\n <td>-0.589001</td>\n </tr>\n <tr>\n <th>OR</th>\n <td>0.188695</td>\n <td>-0.758872</td>\n <td>-0.933237</td>\n <td>0.955057</td>\n </tr>\n <tr>\n <th>CO</th>\n <td>0.190794</td>\n <td>1.978757</td>\n <td>2.605967</td>\n <td>0.683509</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 35
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
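The cells above show that `set_index` returns a new DataFrame and only modifies `df` when `inplace=True` is passed. As a minimal sketch on a small made-up frame (the name `toy` and its values are assumptions for illustration, not the notebook's random data), reassigning the result is an alternative to `inplace=True`, and `reset_index` moves the index back into a column:

```python
import pandas as pd

# Hypothetical toy data, not the notebook's random values
toy = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "States": ["CA", "NY", "WY"]},
    index=["A", "B", "C"],
)

# set_index returns a new DataFrame; the original is left untouched
by_state = toy.set_index("States")
print(toy.index.tolist())       # original labels are unchanged
print(by_state.index.tolist())  # the States column is now the index

# reset_index moves the index back into a column and restores 0..n-1
restored = by_state.reset_index()
print(restored.columns.tolist())
```

Note that `set_index` replaces the previous labels (`A`, `B`, `C` here); they are not recoverable from `by_state` unless they were saved as a column first.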
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Multi-index et hiérarchie des indices\n",
|
||
|
"\n",
|
||
|
"Voyons comment travailler avec un Multi-Index, nous allons d'abord créer un exemple rapide de ce à quoi ressemblerait un DataFrame Multi-Indexé :"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 36,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# Niveaux d'Index\n",
|
||
|
"outside = ['G1','G1','G1','G2','G2','G2']\n",
|
||
|
"inside = [1,2,3,1,2,3]\n",
|
||
|
"hier_index = list(zip(outside,inside))\n",
|
||
|
"hier_index = pd.MultiIndex.from_tuples(hier_index)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 37,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"MultiIndex([('G1', 1),\n",
|
||
|
" ('G1', 2),\n",
|
||
|
" ('G1', 3),\n",
|
||
|
" ('G2', 1),\n",
|
||
|
" ('G2', 2),\n",
|
||
|
" ('G2', 3)],\n",
|
||
|
" )"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 37
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"hier_index"
|
||
|
]
|
||
|
},
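Besides `zip` followed by `MultiIndex.from_tuples`, pandas offers other constructors for the same index. The sketch below reuses the `outside` and `inside` lists from above; the variable names `idx_arrays` and `idx_product` are illustrative, and the level names `Group` and `Num` are simply set at construction time here:

```python
import pandas as pd

outside = ["G1", "G1", "G1", "G2", "G2", "G2"]
inside = [1, 2, 3, 1, 2, 3]

# Same index as zip + from_tuples, built directly from the two lists
idx_arrays = pd.MultiIndex.from_arrays([outside, inside], names=["Group", "Num"])

# Cartesian product of the distinct labels of each level
idx_product = pd.MultiIndex.from_product(
    [["G1", "G2"], [1, 2, 3]], names=["Group", "Num"]
)

print(idx_arrays.tolist() == idx_product.tolist())
print(list(idx_arrays.names))
```

`from_product` is convenient when every outer label is paired with every inner label, as is the case in this example.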
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 38,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B\n",
|
||
|
"G1 1 0.302665 1.693723\n",
|
||
|
" 2 -1.706086 -1.159119\n",
|
||
|
" 3 -0.134841 0.390528\n",
|
||
|
"G2 1 0.166905 0.184502\n",
|
||
|
" 2 0.807706 0.072960\n",
|
||
|
" 3 0.638787 0.329646"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>A</th>\n <th>B</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"3\" valign=\"top\">G1</th>\n <th>1</th>\n <td>0.302665</td>\n <td>1.693723</td>\n </tr>\n <tr>\n <th>2</th>\n <td>-1.706086</td>\n <td>-1.159119</td>\n </tr>\n <tr>\n <th>3</th>\n <td>-0.134841</td>\n <td>0.390528</td>\n </tr>\n <tr>\n <th rowspan=\"3\" valign=\"top\">G2</th>\n <th>1</th>\n <td>0.166905</td>\n <td>0.184502</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0.807706</td>\n <td>0.072960</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0.638787</td>\n <td>0.329646</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 38
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])\n",
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Maintenant, montrons comment indexer ceci ! Pour la hiérarchie d'index nous utilisons df.loc[], si c'était sur l'axe des colonnes, vous n'utiliseriez que la notation normale entre crochets df[]. L'appel d'un niveau de l'index retourne un sous-dataframe :"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 39,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B\n",
|
||
|
"1 0.302665 1.693723\n",
|
||
|
"2 -1.706086 -1.159119\n",
|
||
|
"3 -0.134841 0.390528"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>0.302665</td>\n <td>1.693723</td>\n </tr>\n <tr>\n <th>2</th>\n <td>-1.706086</td>\n <td>-1.159119</td>\n </tr>\n <tr>\n <th>3</th>\n <td>-0.134841</td>\n <td>0.390528</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 39
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.loc['G1']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 40,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"A 0.302665\n",
|
||
|
"B 1.693723\n",
|
||
|
"Name: 1, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 40
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.loc['G1'].loc[1]"
|
||
|
]
|
||
|
},
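The chained call `df.loc['G1'].loc[1]` performs two separate lookups. A single lookup with a tuple key selects the same row. The following is a sketch on a hypothetical frame `demo` filled with the integers 0 to 11 (an assumption made so the result is predictable, unlike the random values above):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["G1", "G2"], [1, 2, 3]])
# Hypothetical deterministic values 0..11 instead of random numbers
demo = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["A", "B"])

chained = demo.loc["G1"].loc[1]   # two lookups, as in the cell above
direct = demo.loc[("G1", 1)]      # one lookup with a tuple key
cell = demo.loc[("G1", 1), "B"]   # row tuple plus a column label

print(direct.tolist())
print(cell)
```

The tuple form is generally the safer choice when assigning values, since a chained lookup may operate on a temporary copy.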
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 41,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"FrozenList([None, None])"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 41
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.index.names"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 42,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df.index.names = ['Group','Num']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 43,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B\n",
|
||
|
"Group Num \n",
|
||
|
"G1 1 0.302665 1.693723\n",
|
||
|
" 2 -1.706086 -1.159119\n",
|
||
|
" 3 -0.134841 0.390528\n",
|
||
|
"G2 1 0.166905 0.184502\n",
|
||
|
" 2 0.807706 0.072960\n",
|
||
|
" 3 0.638787 0.329646"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>A</th>\n <th>B</th>\n </tr>\n <tr>\n <th>Group</th>\n <th>Num</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"3\" valign=\"top\">G1</th>\n <th>1</th>\n <td>0.302665</td>\n <td>1.693723</td>\n </tr>\n <tr>\n <th>2</th>\n <td>-1.706086</td>\n <td>-1.159119</td>\n </tr>\n <tr>\n <th>3</th>\n <td>-0.134841</td>\n <td>0.390528</td>\n </tr>\n <tr>\n <th rowspan=\"3\" valign=\"top\">G2</th>\n <th>1</th>\n <td>0.166905</td>\n <td>0.184502</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0.807706</td>\n <td>0.072960</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0.638787</td>\n <td>0.329646</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 43
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 44,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B\n",
|
||
|
"Num \n",
|
||
|
"1 0.302665 1.693723\n",
|
||
|
"2 -1.706086 -1.159119\n",
|
||
|
"3 -0.134841 0.390528"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n </tr>\n <tr>\n <th>Num</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>0.302665</td>\n <td>1.693723</td>\n </tr>\n <tr>\n <th>2</th>\n <td>-1.706086</td>\n <td>-1.159119</td>\n </tr>\n <tr>\n <th>3</th>\n <td>-0.134841</td>\n <td>0.390528</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 44
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.xs('G1')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 45,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"A 0.302665\n",
|
||
|
"B 1.693723\n",
|
||
|
"Name: (G1, 1), dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 45
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.xs(['G1',1])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 46,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B\n",
|
||
|
"Group \n",
|
||
|
"G1 0.302665 1.693723\n",
|
||
|
"G2 0.166905 0.184502"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n </tr>\n <tr>\n <th>Group</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>G1</th>\n <td>0.302665</td>\n <td>1.693723</td>\n </tr>\n <tr>\n <th>G2</th>\n <td>0.166905</td>\n <td>0.184502</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 46
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.xs(1,level='Num')"
|
||
|
]
|
||
|
},
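By default `xs` drops the level it selects on, which is why only `Group` remains in the result above. As a sketch on a hypothetical deterministic frame `demo` (values 0 to 11, chosen for illustration), `drop_level=False` keeps both levels, and an equivalent selection can be written with `.loc` and a slice over the outer level:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["G1", "G2"], [1, 2, 3]], names=["Group", "Num"]
)
demo = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["A", "B"])

# Default: the selected level is dropped from the result
dropped = demo.xs(1, level="Num")
# drop_level=False keeps both levels in the result's index
kept = demo.xs(1, level="Num", drop_level=False)
# Same rows selected with .loc and a full slice over the outer level
via_loc = demo.loc[(slice(None), 1), :]

print(dropped.index.tolist())
print(kept.index.tolist())
print(via_loc.index.tolist())
```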
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Bon travail!"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"name": "python3",
|
||
|
"display_name": "Python 3.7.9 64-bit ('pyfinance': conda)",
|
||
|
"metadata": {
|
||
|
"interpreter": {
|
||
|
"hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.7.9-final"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 1
|
||
|
}
|