576 lines
31 KiB
Plaintext
576 lines
31 KiB
Plaintext
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"source": [
|
||
|
"# Fusionner, Joindre et Concaténer\n",
|
||
|
"\n",
|
||
|
"Il y a 3 façons principales de combiner des DataFrames ensemble : Fusion, assemblage et concaténation. Dans ce notebook, nous discuterons de ces 3 méthodes avec des exemples.\n",
|
||
|
"\n",
|
||
|
"____"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Exemple DataFrames"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import pandas as pd"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n",
|
||
|
" 'B': ['B0', 'B1', 'B2', 'B3'],\n",
|
||
|
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
|
||
|
" 'D': ['D0', 'D1', 'D2', 'D3']},\n",
|
||
|
" index=[0, 1, 2, 3])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n",
|
||
|
" 'B': ['B4', 'B5', 'B6', 'B7'],\n",
|
||
|
" 'C': ['C4', 'C5', 'C6', 'C7'],\n",
|
||
|
" 'D': ['D4', 'D5', 'D6', 'D7']},\n",
|
||
|
" index=[4, 5, 6, 7]) "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n",
|
||
|
" 'B': ['B8', 'B9', 'B10', 'B11'],\n",
|
||
|
" 'C': ['C8', 'C9', 'C10', 'C11'],\n",
|
||
|
" 'D': ['D8', 'D9', 'D10', 'D11']},\n",
|
||
|
" index=[8, 9, 10, 11])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"0 A0 B0 C0 D0\n",
|
||
|
"1 A1 B1 C1 D1\n",
|
||
|
"2 A2 B2 C2 D2\n",
|
||
|
"3 A3 B3 C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 5
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df1"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"4 A4 B4 C4 D4\n",
|
||
|
"5 A5 B5 C5 D5\n",
|
||
|
"6 A6 B6 C6 D6\n",
|
||
|
"7 A7 B7 C7 D7"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>4</th>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n </tr>\n <tr>\n <th>5</th>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n </tr>\n <tr>\n <th>6</th>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n </tr>\n <tr>\n <th>7</th>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 6
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df2"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"8 A8 B8 C8 D8\n",
|
||
|
"9 A9 B9 C9 D9\n",
|
||
|
"10 A10 B10 C10 D10\n",
|
||
|
"11 A11 B11 C11 D11"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>8</th>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 7
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df3"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Concaténation\n",
|
||
|
"\n",
|
||
|
"La concaténation colle fondamentalement les DataFrames ensemble. Gardez à l'esprit que les dimensions doivent correspondre à l'axe sur lequel vous concaténez. Vous pouvez utiliser **pd.concat** et passer dans une liste de DataFrames à concaténer ensemble :"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"0 A0 B0 C0 D0\n",
|
||
|
"1 A1 B1 C1 D1\n",
|
||
|
"2 A2 B2 C2 D2\n",
|
||
|
"3 A3 B3 C3 D3\n",
|
||
|
"4 A4 B4 C4 D4\n",
|
||
|
"5 A5 B5 C5 D5\n",
|
||
|
"6 A6 B6 C6 D6\n",
|
||
|
"7 A7 B7 C7 D7\n",
|
||
|
"8 A8 B8 C8 D8\n",
|
||
|
"9 A9 B9 C9 D9\n",
|
||
|
"10 A10 B10 C10 D10\n",
|
||
|
"11 A11 B11 C11 D11"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n </tr>\n <tr>\n <th>5</th>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n </tr>\n <tr>\n <th>6</th>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n </tr>\n <tr>\n <th>7</th>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n </tr>\n <tr>\n <th>8</th>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 8
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.concat([df1,df2,df3])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 9,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D A B C D A B C D\n",
|
||
|
"0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN\n",
|
||
|
"1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN\n",
|
||
|
"2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN\n",
|
||
|
"3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN\n",
|
||
|
"4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN\n",
|
||
|
"5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN\n",
|
||
|
"6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN\n",
|
||
|
"7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN\n",
|
||
|
"8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8\n",
|
||
|
"9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9\n",
|
||
|
"10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10\n",
|
||
|
"11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>7</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>8</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 9
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.concat([df1,df2,df3],axis=1)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"_____\n",
|
||
|
"## Exemple DataFrames"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n",
|
||
|
" 'A': ['A0', 'A1', 'A2', 'A3'],\n",
|
||
|
" 'B': ['B0', 'B1', 'B2', 'B3']})\n",
|
||
|
" \n",
|
||
|
"right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n",
|
||
|
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
|
||
|
" 'D': ['D0', 'D1', 'D2', 'D3']}) "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key A B\n",
|
||
|
"0 K0 A0 B0\n",
|
||
|
"1 K1 A1 B1\n",
|
||
|
"2 K2 A2 B2\n",
|
||
|
"3 K3 A3 B3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>A</th>\n <th>B</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>A2</td>\n <td>B2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>A3</td>\n <td>B3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 11
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"left"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key C D\n",
|
||
|
"0 K0 C0 D0\n",
|
||
|
"1 K1 C1 D1\n",
|
||
|
"2 K2 C2 D2\n",
|
||
|
"3 K3 C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 12
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"right"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"___"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Fusion\n",
|
||
|
"\n",
|
||
|
"La fonction **merge** vous permet de fusionner des DataFrames ensemble en utilisant une logique similaire à celle de la fusion de Tables SQL. Par exemple :"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key A B C D\n",
|
||
|
"0 K0 A0 B0 C0 D0\n",
|
||
|
"1 K1 A1 B1 C1 D1\n",
|
||
|
"2 K2 A2 B2 C2 D2\n",
|
||
|
"3 K3 A3 B3 C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 13
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.merge(left,right,how='inner',on='key')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Exemple plus compliqué :"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n",
|
||
|
" 'key2': ['K0', 'K1', 'K0', 'K1'],\n",
|
||
|
" 'A': ['A0', 'A1', 'A2', 'A3'],\n",
|
||
|
" 'B': ['B0', 'B1', 'B2', 'B3']})\n",
|
||
|
" \n",
|
||
|
"right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n",
|
||
|
" 'key2': ['K0', 'K0', 'K0', 'K0'],\n",
|
||
|
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
|
||
|
" 'D': ['D0', 'D1', 'D2', 'D3']})"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key1 key2 A B C D\n",
|
||
|
"0 K0 K0 A0 B0 C0 D0\n",
|
||
|
"1 K1 K0 A2 B2 C1 D1\n",
|
||
|
"2 K1 K0 A2 B2 C2 D2"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 15
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.merge(left, right, on=['key1', 'key2'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key1 key2 A B C D\n",
|
||
|
"0 K0 K0 A0 B0 C0 D0\n",
|
||
|
"1 K0 K1 A1 B1 NaN NaN\n",
|
||
|
"2 K1 K0 A2 B2 C1 D1\n",
|
||
|
"3 K1 K0 A2 B2 C2 D2\n",
|
||
|
"4 K2 K1 A3 B3 NaN NaN\n",
|
||
|
"5 K2 K0 NaN NaN C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K0</td>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>4</th>\n <td>K2</td>\n <td>K1</td>\n <td>A3</td>\n <td>B3</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>K2</td>\n <td>K0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 16
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.merge(left, right, how='outer', on=['key1', 'key2'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key1 key2 A B C D\n",
|
||
|
"0 K0 K0 A0 B0 C0 D0\n",
|
||
|
"1 K1 K0 A2 B2 C1 D1\n",
|
||
|
"2 K1 K0 A2 B2 C2 D2\n",
|
||
|
"3 K2 K0 NaN NaN C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K2</td>\n <td>K0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 17
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.merge(left, right, how='right', on=['key1', 'key2'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" key1 key2 A B C D\n",
|
||
|
"0 K0 K0 A0 B0 C0 D0\n",
|
||
|
"1 K0 K1 A1 B1 NaN NaN\n",
|
||
|
"2 K1 K0 A2 B2 C1 D1\n",
|
||
|
"3 K1 K0 A2 B2 C2 D2\n",
|
||
|
"4 K2 K1 A3 B3 NaN NaN"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K0</td>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>4</th>\n <td>K2</td>\n <td>K1</td>\n <td>A3</td>\n <td>B3</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 18
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"pd.merge(left, right, how='left', on=['key1', 'key2'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Assemblage\n",
|
||
|
"Join est une méthode pratique pour combiner les colonnes de deux DataFrames potentiellement indexées différemment résultant en un seul DataFrame."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n",
|
||
|
" 'B': ['B0', 'B1', 'B2']},\n",
|
||
|
" index=['K0', 'K1', 'K2']) \n",
|
||
|
"\n",
|
||
|
"right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n",
|
||
|
" 'D': ['D0', 'D2', 'D3']},\n",
|
||
|
" index=['K0', 'K2', 'K3'])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"K0 A0 B0 C0 D0\n",
|
||
|
"K1 A1 B1 NaN NaN\n",
|
||
|
"K2 A2 B2 C2 D2"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>K0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>K1</th>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>K2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 20
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"left.join(right)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"output_type": "execute_result",
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
" A B C D\n",
|
||
|
"K0 A0 B0 C0 D0\n",
|
||
|
"K1 A1 B1 NaN NaN\n",
|
||
|
"K2 A2 B2 C2 D2\n",
|
||
|
"K3 NaN NaN C3 D3"
|
||
|
],
|
||
|
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>K0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>K1</th>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>K2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>K3</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"execution_count": 21
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"left.join(right, how='outer')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Bon travail!"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"name": "python3",
|
||
|
"display_name": "Python 3.7.9 64-bit ('pyfinance': conda)",
|
||
|
"metadata": {
|
||
|
"interpreter": {
|
||
|
"hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.7.9-final"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 1
|
||
|
}
|