python-pour-finance/03-Pandas/06-Merge-Join-et-Concat.ipynb

576 lines
31 KiB
Plaintext
Raw Normal View History

2023-08-21 15:12:19 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Fusionner, Joindre et Concaténer\n",
"\n",
"Il y a 3 façons principales de combiner des DataFrames ensemble : Fusion, assemblage et concaténation. Dans ce notebook, nous discuterons de ces 3 méthodes avec des exemples.\n",
"\n",
"____"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exemple DataFrames"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n",
" 'B': ['B0', 'B1', 'B2', 'B3'],\n",
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
" 'D': ['D0', 'D1', 'D2', 'D3']},\n",
" index=[0, 1, 2, 3])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n",
" 'B': ['B4', 'B5', 'B6', 'B7'],\n",
" 'C': ['C4', 'C5', 'C6', 'C7'],\n",
" 'D': ['D4', 'D5', 'D6', 'D7']},\n",
" index=[4, 5, 6, 7]) "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n",
" 'B': ['B8', 'B9', 'B10', 'B11'],\n",
" 'C': ['C8', 'C9', 'C10', 'C11'],\n",
" 'D': ['D8', 'D9', 'D10', 'D11']},\n",
" index=[8, 9, 10, 11])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"0 A0 B0 C0 D0\n",
"1 A1 B1 C1 D1\n",
"2 A2 B2 C2 D2\n",
"3 A3 B3 C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"4 A4 B4 C4 D4\n",
"5 A5 B5 C5 D5\n",
"6 A6 B6 C6 D6\n",
"7 A7 B7 C7 D7"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>4</th>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n </tr>\n <tr>\n <th>5</th>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n </tr>\n <tr>\n <th>6</th>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n </tr>\n <tr>\n <th>7</th>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 6
}
],
"source": [
"df2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"8 A8 B8 C8 D8\n",
"9 A9 B9 C9 D9\n",
"10 A10 B10 C10 D10\n",
"11 A11 B11 C11 D11"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>8</th>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"df3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concaténation\n",
"\n",
"La concaténation colle fondamentalement les DataFrames ensemble. Gardez à l'esprit que les dimensions doivent correspondre à l'axe sur lequel vous concaténez. Vous pouvez utiliser **pd.concat** et passer dans une liste de DataFrames à concaténer ensemble :"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"0 A0 B0 C0 D0\n",
"1 A1 B1 C1 D1\n",
"2 A2 B2 C2 D2\n",
"3 A3 B3 C3 D3\n",
"4 A4 B4 C4 D4\n",
"5 A5 B5 C5 D5\n",
"6 A6 B6 C6 D6\n",
"7 A7 B7 C7 D7\n",
"8 A8 B8 C8 D8\n",
"9 A9 B9 C9 D9\n",
"10 A10 B10 C10 D10\n",
"11 A11 B11 C11 D11"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n </tr>\n <tr>\n <th>5</th>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n </tr>\n <tr>\n <th>6</th>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n </tr>\n <tr>\n <th>7</th>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n </tr>\n <tr>\n <th>8</th>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 8
}
],
"source": [
"pd.concat([df1,df2,df3])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D A B C D A B C D\n",
"0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN\n",
"5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN\n",
"7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN\n",
"8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8\n",
"9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9\n",
"10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10\n",
"11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A4</td>\n <td>B4</td>\n <td>C4</td>\n <td>D4</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A5</td>\n <td>B5</td>\n <td>C5</td>\n <td>D5</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A6</td>\n <td>B6</td>\n <td>C6</td>\n <td>D6</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>7</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A7</td>\n <td>B7</td>\n <td>C7</td>\n <td>D7</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>8</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A8</td>\n <td>B8</td>\n <td>C8</td>\n <td>D8</td>\n </tr>\n <tr>\n <th>9</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A9</td>\n <td>B9</td>\n <td>C9</td>\n <td>D9</td>\n </tr>\n <tr>\n <th>10</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A10</td>\n <td>B10</td>\n <td>C10</td>\n <td>D10</td>\n </tr>\n <tr>\n <th>11</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>A11</td>\n <td>B11</td>\n <td>C11</td>\n <td>D11</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 9
}
],
"source": [
"pd.concat([df1,df2,df3],axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_____\n",
"## Exemple DataFrames"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n",
" 'A': ['A0', 'A1', 'A2', 'A3'],\n",
" 'B': ['B0', 'B1', 'B2', 'B3']})\n",
" \n",
"right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n",
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
" 'D': ['D0', 'D1', 'D2', 'D3']}) "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key A B\n",
"0 K0 A0 B0\n",
"1 K1 A1 B1\n",
"2 K2 A2 B2\n",
"3 K3 A3 B3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>A</th>\n <th>B</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>A2</td>\n <td>B2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>A3</td>\n <td>B3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 11
}
],
"source": [
"left"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key C D\n",
"0 K0 C0 D0\n",
"1 K1 C1 D1\n",
"2 K2 C2 D2\n",
"3 K3 C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"right"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fusion\n",
"\n",
"La fonction **merge** vous permet de fusionner des DataFrames ensemble en utilisant une logique similaire à celle de la fusion de Tables SQL. Par exemple :"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key A B C D\n",
"0 K0 A0 B0 C0 D0\n",
"1 K1 A1 B1 C1 D1\n",
"2 K2 A2 B2 C2 D2\n",
"3 K3 A3 B3 C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K2</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K3</td>\n <td>A3</td>\n <td>B3</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 13
}
],
"source": [
"pd.merge(left,right,how='inner',on='key')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exemple plus compliqué :"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n",
" 'key2': ['K0', 'K1', 'K0', 'K1'],\n",
" 'A': ['A0', 'A1', 'A2', 'A3'],\n",
" 'B': ['B0', 'B1', 'B2', 'B3']})\n",
" \n",
"right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n",
" 'key2': ['K0', 'K0', 'K0', 'K0'],\n",
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
" 'D': ['D0', 'D1', 'D2', 'D3']})"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key1 key2 A B C D\n",
"0 K0 K0 A0 B0 C0 D0\n",
"1 K1 K0 A2 B2 C1 D1\n",
"2 K1 K0 A2 B2 C2 D2"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 15
}
],
"source": [
"pd.merge(left, right, on=['key1', 'key2'])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key1 key2 A B C D\n",
"0 K0 K0 A0 B0 C0 D0\n",
"1 K0 K1 A1 B1 NaN NaN\n",
"2 K1 K0 A2 B2 C1 D1\n",
"3 K1 K0 A2 B2 C2 D2\n",
"4 K2 K1 A3 B3 NaN NaN\n",
"5 K2 K0 NaN NaN C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K0</td>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>4</th>\n <td>K2</td>\n <td>K1</td>\n <td>A3</td>\n <td>B3</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>K2</td>\n <td>K0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 16
}
],
"source": [
"pd.merge(left, right, how='outer', on=['key1', 'key2'])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key1 key2 A B C D\n",
"0 K0 K0 A0 B0 C0 D0\n",
"1 K1 K0 A2 B2 C1 D1\n",
"2 K1 K0 A2 B2 C2 D2\n",
"3 K2 K0 NaN NaN C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K2</td>\n <td>K0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 17
}
],
"source": [
"pd.merge(left, right, how='right', on=['key1', 'key2'])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" key1 key2 A B C D\n",
"0 K0 K0 A0 B0 C0 D0\n",
"1 K0 K1 A1 B1 NaN NaN\n",
"2 K1 K0 A2 B2 C1 D1\n",
"3 K1 K0 A2 B2 C2 D2\n",
"4 K2 K1 A3 B3 NaN NaN"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>key1</th>\n <th>key2</th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>K0</td>\n <td>K0</td>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K0</td>\n <td>K1</td>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C1</td>\n <td>D1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>K1</td>\n <td>K0</td>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>4</th>\n <td>K2</td>\n <td>K1</td>\n <td>A3</td>\n <td>B3</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 18
}
],
"source": [
"pd.merge(left, right, how='left', on=['key1', 'key2'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assemblage\n",
"Join est une méthode pratique pour combiner les colonnes de deux DataFrames potentiellement indexées différemment résultant en un seul DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n",
" 'B': ['B0', 'B1', 'B2']},\n",
" index=['K0', 'K1', 'K2']) \n",
"\n",
"right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n",
" 'D': ['D0', 'D2', 'D3']},\n",
" index=['K0', 'K2', 'K3'])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"K0 A0 B0 C0 D0\n",
"K1 A1 B1 NaN NaN\n",
"K2 A2 B2 C2 D2"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>K0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>K1</th>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>K2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 20
}
],
"source": [
"left.join(right)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" A B C D\n",
"K0 A0 B0 C0 D0\n",
"K1 A1 B1 NaN NaN\n",
"K2 A2 B2 C2 D2\n",
"K3 NaN NaN C3 D3"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>A</th>\n <th>B</th>\n <th>C</th>\n <th>D</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>K0</th>\n <td>A0</td>\n <td>B0</td>\n <td>C0</td>\n <td>D0</td>\n </tr>\n <tr>\n <th>K1</th>\n <td>A1</td>\n <td>B1</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>K2</th>\n <td>A2</td>\n <td>B2</td>\n <td>C2</td>\n <td>D2</td>\n </tr>\n <tr>\n <th>K3</th>\n <td>NaN</td>\n <td>NaN</td>\n <td>C3</td>\n <td>D3</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 21
}
],
"source": [
"left.join(right, how='outer')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bon travail!"
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7.9 64-bit ('pyfinance': conda)",
"metadata": {
"interpreter": {
"hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904"
}
}
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9-final"
}
},
"nbformat": 4,
"nbformat_minor": 1
}