python-pour-finance/03-Pandas/08-Data-Input-et-Output.ipynb

261 lines
12 KiB
Plaintext
Raw Permalink Normal View History

2023-08-21 15:12:19 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Nous nous contenterons soit de lire les fichiers csv directement, soit d'utiliser pandas-datareader ou quandl. Considérez ce notebook comme un bref aperçu de ce qui est possible de faire avec pandas (nous ne travaillerons pas avec des fichiers SQL ou excel dans ce cours).**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Input et Output\n",
"\n",
"Ce notebook est le code de référence pour obtenir des entrées et des sorties, pandas peut lire une variété de types de fichiers en utilisant ses méthodes pd.read_. Jetons un coup d'oeil aux types de données les plus courants:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## CSV\n",
"\n",
"### CSV Input"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" a b c d\n",
"0 0 1 2 3\n",
"1 4 5 6 7\n",
"2 8 9 10 11\n",
"3 12 13 14 15"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>a</th>\n <th>b</th>\n <th>c</th>\n <th>d</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>1</td>\n <td>2</td>\n <td>3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>4</td>\n <td>5</td>\n <td>6</td>\n <td>7</td>\n </tr>\n <tr>\n <th>2</th>\n <td>8</td>\n <td>9</td>\n <td>10</td>\n <td>11</td>\n </tr>\n <tr>\n <th>3</th>\n <td>12</td>\n <td>13</td>\n <td>14</td>\n <td>15</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 2
}
],
"source": [
"df = pd.read_csv('example.csv')\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV Output"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('example',index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Excel\n",
"Pandas peut lire et écrire des fichiers excel, gardez à l'esprit que ceci n'importe que des données. Pas de formules ni d'images, le fait d'avoir des images ou des macros peut provoquer le crash de cette méthode read_excel. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excel Input"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Unnamed: 0 a b c d\n",
"0 0 0 1 2 3\n",
"1 1 4 5 6 7\n",
"2 2 8 9 10 11\n",
"3 3 12 13 14 15"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>a</th>\n <th>b</th>\n <th>c</th>\n <th>d</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>2</td>\n <td>3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>4</td>\n <td>5</td>\n <td>6</td>\n <td>7</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2</td>\n <td>8</td>\n <td>9</td>\n <td>10</td>\n <td>11</td>\n </tr>\n <tr>\n <th>3</th>\n <td>3</td>\n <td>12</td>\n <td>13</td>\n <td>14</td>\n <td>15</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excel Output"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## HTML\n",
"\n",
"Vous aurez besoin d'installer htmllib5, lxml et BeautifulSoup4. Dans votre terminal/command prompt tapez:\n",
"\n",
" conda install lxml\n",
" conda install html5lib\n",
" conda install BeautifulSoup4\n",
"\n",
"Puis redémarrez Jupyter Notebook.\n",
"(ou utilisez pip install si vous n'utilisez pas la distribution Anaconda)\n",
"\n",
"Pandas peut lire du HTML. Par exemple:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HTML Input\n",
"\n",
"La fonction pandas read_html lit les balises table d'une page web et retourne une liste d'objets DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Bank Name City ST CERT \\\n",
"0 Almena State Bank Almena KS 15426 \n",
"1 First City Bank of Florida Fort Walton Beach FL 16748 \n",
"2 The First State Bank Barboursville WV 14361 \n",
"3 Ericson State Bank Ericson NE 18265 \n",
"4 City National Bank of New Jersey Newark NJ 21111 \n",
".. ... ... .. ... \n",
"558 Superior Bank, FSB Hinsdale IL 32646 \n",
"559 Malta National Bank Malta OH 6629 \n",
"560 First Alliance Bank & Trust Co. Manchester NH 34264 \n",
"561 National State Bank of Metropolis Metropolis IL 3815 \n",
"562 Bank of Honolulu Honolulu HI 21029 \n",
"\n",
" Acquiring Institution Closing Date \n",
"0 Equity Bank October 23, 2020 \n",
"1 United Fidelity Bank, fsb October 16, 2020 \n",
"2 MVB Bank, Inc. April 3, 2020 \n",
"3 Farmers and Merchants Bank February 14, 2020 \n",
"4 Industrial Bank November 1, 2019 \n",
".. ... ... \n",
"558 Superior Federal, FSB July 27, 2001 \n",
"559 North Valley Bank May 3, 2001 \n",
"560 Southern New Hampshire Bank & Trust February 2, 2001 \n",
"561 Banterra Bank of Marion December 14, 2000 \n",
"562 Bank of the Orient October 13, 2000 \n",
"\n",
"[563 rows x 6 columns]"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Bank Name</th>\n <th>City</th>\n <th>ST</th>\n <th>CERT</th>\n <th>Acquiring Institution</th>\n <th>Closing Date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Almena State Bank</td>\n <td>Almena</td>\n <td>KS</td>\n <td>15426</td>\n <td>Equity Bank</td>\n <td>October 23, 2020</td>\n </tr>\n <tr>\n <th>1</th>\n <td>First City Bank of Florida</td>\n <td>Fort Walton Beach</td>\n <td>FL</td>\n <td>16748</td>\n <td>United Fidelity Bank, fsb</td>\n <td>October 16, 2020</td>\n </tr>\n <tr>\n <th>2</th>\n <td>The First State Bank</td>\n <td>Barboursville</td>\n <td>WV</td>\n <td>14361</td>\n <td>MVB Bank, Inc.</td>\n <td>April 3, 2020</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Ericson State Bank</td>\n <td>Ericson</td>\n <td>NE</td>\n <td>18265</td>\n <td>Farmers and Merchants Bank</td>\n <td>February 14, 2020</td>\n </tr>\n <tr>\n <th>4</th>\n <td>City National Bank of New Jersey</td>\n <td>Newark</td>\n <td>NJ</td>\n <td>21111</td>\n <td>Industrial Bank</td>\n <td>November 1, 2019</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>558</th>\n <td>Superior Bank, FSB</td>\n <td>Hinsdale</td>\n <td>IL</td>\n <td>32646</td>\n <td>Superior Federal, FSB</td>\n <td>July 27, 2001</td>\n </tr>\n <tr>\n <th>559</th>\n <td>Malta National Bank</td>\n <td>Malta</td>\n <td>OH</td>\n <td>6629</td>\n <td>North Valley Bank</td>\n <td>May 3, 2001</td>\n </tr>\n <tr>\n <th>560</th>\n <td>First Alliance Bank &amp; Trust Co.</td>\n <td>Manchester</td>\n <td>NH</td>\n <td>34264</td>\n <td>Southern New Hampshire Bank &amp; Trust</td>\n <td>February 2, 2001</td>\n </tr>\n <tr>\n <th>561</th>\n <td>National State Bank of Metropolis</td>\n <td>Metropolis</td>\n <td>IL</td>\n <td>3815</td>\n <td>Banterra Bank of Marion</td>\n <td>December 14, 2000</td>\n </tr>\n <tr>\n <th>562</th>\n <td>Bank of Honolulu</td>\n <td>Honolulu</td>\n <td>HI</td>\n <td>21029</td>\n <td>Bank of the Orient</td>\n <td>October 13, 2000</td>\n </tr>\n </tbody>\n</table>\n<p>563 rows × 6 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"df[0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"____"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bon travail!"
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7.9 64-bit ('pyfinance': conda)",
"metadata": {
"interpreter": {
"hash": "e89404a230d8800c54ad520c7b67d1bd9bb833a07b37dd3e521a178a3dc34904"
}
}
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9-final"
}
},
"nbformat": 4,
"nbformat_minor": 1
}