{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction à Statsmodels\n",
"\n",
"Statsmodels est un module Python qui fournit des classes et des fonctions pour l'estimation de nombreux modèles statistiques différents, ainsi que pour la réalisation de tests statistiques et l'exploration de données statistiques. Une liste exhaustive de statistiques sur les résultats est disponible pour chaque estimateur. Les résultats sont testés par rapport aux progiciels statistiques existants pour s'assurer qu'ils sont corrects. Le paquet est publié sous la licence Open Source Modified BSD (3-clause). La documentation en ligne est hébergée sur statsmodels.org.\n",
"\n",
"La raison pour laquelle nous l'aborderons dans ce cours est qu'il pourrait vous être très utile plus tard lorsque vous discuterez des données de séries temporelles (typiques de l'analyse financière quantitative).\n",
"\n",
"Prenons un exemple très simple d'utilisation de statsmodels !"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import statsmodels.api as sm"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = sm.datasets.macrodata.load_pandas().data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"::\n",
" Number of Observations - 203\n",
"\n",
" Number of Variables - 14\n",
"\n",
" Variable name definitions::\n",
"\n",
" year - 1959q1 - 2009q3\n",
" quarter - 1-4\n",
" realgdp - Real gross domestic product (Bil. of chained 2005 US$,\n",
" seasonally adjusted annual rate)\n",
" realcons - Real personal consumption expenditures (Bil. of chained\n",
" 2005 US$, seasonally adjusted annual rate)\n",
" realinv - Real gross private domestic investment (Bil. of chained\n",
" 2005 US$, seasonally adjusted annual rate)\n",
" realgovt - Real federal consumption expenditures & gross investment\n",
" (Bil. of chained 2005 US$, seasonally adjusted annual rate)\n",
" realdpi - Real private disposable income (Bil. of chained 2005\n",
" US$, seasonally adjusted annual rate)\n",
" cpi - End of the quarter consumer price index for all urban\n",
" consumers: all items (1982-84 = 100, seasonally adjusted).\n",
" m1 - End of the quarter M1 nominal money stock (Seasonally\n",
" adjusted)\n",
" tbilrate - Quarterly monthly average of the monthly 3-month\n",
" treasury bill: secondary market rate\n",
" unemp - Seasonally adjusted unemployment rate (%)\n",
" pop - End of the quarter total population: all ages incl. armed\n",
" forces over seas\n",
" infl - Inflation rate (ln(cpi_{t}/cpi_{t-1}) * 400)\n",
" realint - Real interest rate (tbilrate - infl)\n",
"\n"
]
}
],
"source": [
"print(sm.datasets.macrodata.NOTE)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>quarter</th>\n",
" <th>realgdp</th>\n",
" <th>realcons</th>\n",
" <th>realinv</th>\n",
" <th>realgovt</th>\n",
" <th>realdpi</th>\n",
" <th>cpi</th>\n",
" <th>m1</th>\n",
" <th>tbilrate</th>\n",
" <th>unemp</th>\n",
" <th>pop</th>\n",
" <th>infl</th>\n",
" <th>realint</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1959.0</td>\n",
" <td>1.0</td>\n",
" <td>2710.349</td>\n",
" <td>1707.4</td>\n",
" <td>286.898</td>\n",
" <td>470.045</td>\n",
" <td>1886.9</td>\n",
" <td>28.98</td>\n",
" <td>139.7</td>\n",
" <td>2.82</td>\n",
" <td>5.8</td>\n",
" <td>177.146</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1959.0</td>\n",
" <td>2.0</td>\n",
" <td>2778.801</td>\n",
" <td>1733.7</td>\n",
" <td>310.859</td>\n",
" <td>481.301</td>\n",
" <td>1919.7</td>\n",
" <td>29.15</td>\n",
" <td>141.7</td>\n",
" <td>3.08</td>\n",
" <td>5.1</td>\n",
" <td>177.830</td>\n",
" <td>2.34</td>\n",
" <td>0.74</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1959.0</td>\n",
" <td>3.0</td>\n",
" <td>2775.488</td>\n",
" <td>1751.8</td>\n",
" <td>289.226</td>\n",
" <td>491.260</td>\n",
" <td>1916.4</td>\n",
" <td>29.35</td>\n",
" <td>140.5</td>\n",
" <td>3.82</td>\n",
" <td>5.3</td>\n",
" <td>178.657</td>\n",
" <td>2.74</td>\n",
" <td>1.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1959.0</td>\n",
" <td>4.0</td>\n",
" <td>2785.204</td>\n",
" <td>1753.7</td>\n",
" <td>299.356</td>\n",
" <td>484.052</td>\n",
" <td>1931.3</td>\n",
" <td>29.37</td>\n",
" <td>140.0</td>\n",
" <td>4.33</td>\n",
" <td>5.6</td>\n",
" <td>179.386</td>\n",
" <td>0.27</td>\n",
" <td>4.06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1960.0</td>\n",
" <td>1.0</td>\n",
" <td>2847.699</td>\n",
" <td>1770.5</td>\n",
" <td>331.722</td>\n",
" <td>462.199</td>\n",
" <td>1955.5</td>\n",
" <td>29.54</td>\n",
" <td>139.6</td>\n",
" <td>3.50</td>\n",
" <td>5.2</td>\n",
" <td>180.007</td>\n",
" <td>2.31</td>\n",
" <td>1.19</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year quarter realgdp realcons realinv realgovt realdpi cpi \\\n",
"0 1959.0 1.0 2710.349 1707.4 286.898 470.045 1886.9 28.98 \n",
"1 1959.0 2.0 2778.801 1733.7 310.859 481.301 1919.7 29.15 \n",
"2 1959.0 3.0 2775.488 1751.8 289.226 491.260 1916.4 29.35 \n",
"3 1959.0 4.0 2785.204 1753.7 299.356 484.052 1931.3 29.37 \n",
"4 1960.0 1.0 2847.699 1770.5 331.722 462.199 1955.5 29.54 \n",
"\n",
" m1 tbilrate unemp pop infl realint \n",
"0 139.7 2.82 5.8 177.146 0.00 0.00 \n",
"1 141.7 3.08 5.1 177.830 2.34 0.74 \n",
"2 140.5 3.82 5.3 178.657 2.74 1.09 \n",
"3 140.0 4.33 5.6 179.386 0.27 4.06 \n",
"4 139.6 3.50 5.2 180.007 2.31 1.19 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df.index = index"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>quarter</th>\n",
" <th>realgdp</th>\n",
" <th>realcons</th>\n",
" <th>realinv</th>\n",
" <th>realgovt</th>\n",
" <th>realdpi</th>\n",
" <th>cpi</th>\n",
" <th>m1</th>\n",
" <th>tbilrate</th>\n",
" <th>unemp</th>\n",
" <th>pop</th>\n",
" <th>infl</th>\n",
" <th>realint</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1959-03-31</th>\n",
" <td>1959.0</td>\n",
" <td>1.0</td>\n",
" <td>2710.349</td>\n",
" <td>1707.4</td>\n",
" <td>286.898</td>\n",
" <td>470.045</td>\n",
" <td>1886.9</td>\n",
" <td>28.98</td>\n",
" <td>139.7</td>\n",
" <td>2.82</td>\n",
" <td>5.8</td>\n",
" <td>177.146</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959-06-30</th>\n",
" <td>1959.0</td>\n",
" <td>2.0</td>\n",
" <td>2778.801</td>\n",
" <td>1733.7</td>\n",
" <td>310.859</td>\n",
" <td>481.301</td>\n",
" <td>1919.7</td>\n",
" <td>29.15</td>\n",
" <td>141.7</td>\n",
" <td>3.08</td>\n",
" <td>5.1</td>\n",
" <td>177.830</td>\n",
" <td>2.34</td>\n",
" <td>0.74</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959-09-30</th>\n",
" <td>1959.0</td>\n",
" <td>3.0</td>\n",
" <td>2775.488</td>\n",
" <td>1751.8</td>\n",
" <td>289.226</td>\n",
" <td>491.260</td>\n",
" <td>1916.4</td>\n",
" <td>29.35</td>\n",
" <td>140.5</td>\n",
" <td>3.82</td>\n",
" <td>5.3</td>\n",
" <td>178.657</td>\n",
" <td>2.74</td>\n",
" <td>1.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959-12-31</th>\n",
" <td>1959.0</td>\n",
" <td>4.0</td>\n",
" <td>2785.204</td>\n",
" <td>1753.7</td>\n",
" <td>299.356</td>\n",
" <td>484.052</td>\n",
" <td>1931.3</td>\n",
" <td>29.37</td>\n",
" <td>140.0</td>\n",
" <td>4.33</td>\n",
" <td>5.6</td>\n",
" <td>179.386</td>\n",
" <td>0.27</td>\n",
" <td>4.06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1960-03-31</th>\n",
" <td>1960.0</td>\n",
" <td>1.0</td>\n",
" <td>2847.699</td>\n",
" <td>1770.5</td>\n",
" <td>331.722</td>\n",
" <td>462.199</td>\n",
" <td>1955.5</td>\n",
" <td>29.54</td>\n",
" <td>139.6</td>\n",
" <td>3.50</td>\n",
" <td>5.2</td>\n",
" <td>180.007</td>\n",
" <td>2.31</td>\n",
" <td>1.19</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year quarter realgdp realcons realinv realgovt realdpi \\\n",
"1959-03-31 1959.0 1.0 2710.349 1707.4 286.898 470.045 1886.9 \n",
"1959-06-30 1959.0 2.0 2778.801 1733.7 310.859 481.301 1919.7 \n",
"1959-09-30 1959.0 3.0 2775.488 1751.8 289.226 491.260 1916.4 \n",
"1959-12-31 1959.0 4.0 2785.204 1753.7 299.356 484.052 1931.3 \n",
"1960-03-31 1960.0 1.0 2847.699 1770.5 331.722 462.199 1955.5 \n",
"\n",
" cpi m1 tbilrate unemp pop infl realint \n",
"1959-03-31 28.98 139.7 2.82 5.8 177.146 0.00 0.00 \n",
"1959-06-30 29.15 141.7 3.08 5.1 177.830 2.34 0.74 \n",
"1959-09-30 29.35 140.5 3.82 5.3 178.657 2.74 1.09 \n",
"1959-12-31 29.37 140.0 4.33 5.6 179.386 0.27 4.06 \n",
"1960-03-31 29.54 139.6 3.50 5.2 180.007 2.31 1.19 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'REAL GDP')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df['realgdp'].plot()\n",
"plt.ylabel(\"REAL GDP\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Utiliser Statsmodels pour obtenir la tendance\n",
"Le filtre Hodrick-Prescott sépare une série temporelle y_t en une composante de tendance τ_t et en une composante de cycle ζt\n",
"\n",
"$y_t = \\tau_t + \\zeta_t$\n",
"\n",
"Les composantes sont déterminées en minimisant la fonction de perte quadratique suivante:\n",
"\n",
"$\\min_{\\\\{ \\tau_{t}\\\\} }\\sum_{t}^{T}\\zeta_{t}^{2}+\\lambda\\sum_{t=1}^{T}\\left[\\left(\\tau_{t}-\\tau_{t-1}\\right)-\\left(\\tau_{t-1}-\\tau_{t-2}\\right)\\right]^{2}$"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Décomposition du tuple\n",
"gdp_cycle, gdp_trend = sm.tsa.filters.hpfilter(df.realgdp)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1959-03-31 39.511915\n",
"1959-06-30 80.088532\n",
"1959-09-30 48.875455\n",
"1959-12-31 30.591933\n",
"1960-03-31 64.882667\n",
" ... \n",
"2008-09-30 102.018455\n",
"2008-12-31 -107.269472\n",
"2009-03-31 -349.047706\n",
"2009-06-30 -397.557073\n",
"2009-09-30 -333.115243\n",
"Name: realgdp, Length: 203, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdp_cycle"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(gdp_cycle)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"df[\"trend\"] = gdp_trend"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1c1a790c90>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df[['trend','realgdp']].plot()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1c1a8de890>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df[['trend','realgdp']][\"2000-03-31\":].plot(figsize=(12,8))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bon Travail!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}