{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Time Resampling\n",
"\n",
"Let's learn how to resample time series data! This will be useful later in the course."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Grab the data\n",
"# A faster alternative that also parses the dates and sets the index in one step:\n",
"# df = pd.read_csv('time_data/walmart_stock.csv', index_col='Date', parse_dates=True)\n",
"df = pd.read_csv('time_data/walmart_stock.csv')"
]
},
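A shortcut like the commented-out `read_csv(..., index_col='Date')` line only yields a real `DatetimeIndex` if the dates are parsed as well. A minimal sketch, using a few rows copied from the `df.head()` output as an in-memory stand-in for the CSV file (the `csv_text` string is just for illustration):

```python
import io

import pandas as pd

# In-memory stand-in for 'time_data/walmart_stock.csv' (first rows of df.head())
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,59.970001,61.060001,59.869999,60.330002,12668800,52.619235
2012-01-04,60.209999,60.349998,59.470001,59.709999,9593300,52.078475
2012-01-05,59.349998,59.619999,58.369999,59.419998,12768200,51.825539
"""

# index_col alone: the index holds the dates as plain strings
df_str = pd.read_csv(io.StringIO(csv_text), index_col='Date')
print(isinstance(df_str.index, pd.DatetimeIndex))  # False

# index_col + parse_dates=True: the index becomes a DatetimeIndex in one step
df_dt = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)
print(isinstance(df_dt.index, pd.DatetimeIndex))  # True
```

With `parse_dates=True`, the separate conversion and `set_index` cells that follow become unnecessary.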
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Date Open High Low Close Volume Adj Close\n",
"0 2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235\n",
"1 2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475\n",
"2 2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539\n",
"3 2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220\n",
"4 2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215"
]
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a date index from the Date column"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"df['Date'] = pd.to_datetime(df['Date'])"
]
},
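Applying `pd.to_datetime` element by element works, but the function also accepts a whole Series and parses it in a single vectorized call, which is much faster on long columns. A small sketch with three dates taken from the data:

```python
import pandas as pd

dates = pd.Series(['2012-01-03', '2012-01-04', '2012-01-05'])

# Element-wise: pd.to_datetime is called once per value (slow on large columns)
parsed_apply = dates.apply(pd.to_datetime)

# Vectorized: one call parses the whole column
parsed_vec = pd.to_datetime(dates)

print(pd.api.types.is_datetime64_any_dtype(parsed_vec))  # True
print((parsed_apply == parsed_vec).all())                # True
```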
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Date Open High Low Close Volume Adj Close\n",
"0 2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235\n",
"1 2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475\n",
"2 2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539\n",
"3 2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220\n",
"4 2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215"
]
},
"metadata": {},
"execution_count": 6
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df.set_index('Date',inplace=True)"
]
},
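Why set the index at all? `resample()` needs a datetime-like index, or a datetime column named through its `on=` parameter. A sketch on a hypothetical three-row frame (not the Walmart data) showing both routes:

```python
import pandas as pd

# Hypothetical toy frame with the dates still in an ordinary column
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-10']),
    'Close': [60.0, 59.0, 61.0],
})

# With the default RangeIndex there is no time axis to group on
try:
    toy.resample('W')
except TypeError as err:
    print('TypeError:', err)

# Option 1: make Date the index, as this notebook does
weekly_a = toy.set_index('Date').resample('W')['Close'].mean()

# Option 2: keep the index and point resample at the column with on=
weekly_b = toy.resample('W', on='Date')['Close'].mean()

print(weekly_a.equals(weekly_b))  # True
```

Both routes give weekly bins ending on Sunday: 59.5 for the week ending 2012-01-08 and 61.0 for the week ending 2012-01-15.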
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Open High Low Close Volume Adj Close\n",
"Date \n",
"2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235\n",
"2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475\n",
"2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539\n",
"2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220\n",
"2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215"
]
},
"metadata": {},
"execution_count": 8
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## resample()\n",
"\n",
"A common operation on time series data is regrouping the rows based on the time series index, for example turning daily rows into monthly or yearly ones. Let's see how to use the resample() method.\n",
"\n",
"#### All the frequency aliases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Alias | Description |\n",
"|---|---|\n",
"| B | business day frequency |\n",
"| C | custom business day frequency (experimental) |\n",
"| D | calendar day frequency |\n",
"| W | weekly frequency |\n",
"| M | month end frequency |\n",
"| SM | semi-month end frequency (15th and end of month) |\n",
"| BM | business month end frequency |\n",
"| CBM | custom business month end frequency |\n",
"| MS | month start frequency |\n",
"| SMS | semi-month start frequency (1st and 15th) |\n",
"| BMS | business month start frequency |\n",
"| CBMS | custom business month start frequency |\n",
"| Q | quarter end frequency |\n",
"| BQ | business quarter end frequency |\n",
"| QS | quarter start frequency |\n",
"| BQS | business quarter start frequency |\n",
"| A | year end frequency |\n",
"| BA | business year end frequency |\n",
"| AS | year start frequency |\n",
"| BAS | business year start frequency |\n",
"| BH | business hour frequency |\n",
"| H | hourly frequency |\n",
"| T, min | minutely frequency |\n",
"| S | secondly frequency |\n",
"| L, ms | milliseconds |\n",
"| U, us | microseconds |\n",
"| N | nanoseconds |\n",
"\n",
"Note: pandas 2.2 deprecated several of these aliases in favour of new spellings, for example `M` → `ME`, `Q` → `QE`, `A` → `YE`, `H` → `h`, `T` → `min` and `S` → `s`."
]
},
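The alias also decides how each bin is labelled. A sketch with a hypothetical series holding the value 1 on every day of January and February 2012, so each bin's sum is simply its number of days. It sticks to `MS` and `W`, which are unaffected by the pandas 2.2 renames:

```python
import pandas as pd

# Hypothetical daily series: value 1 on every calendar day of Jan-Feb 2012
daily = pd.Series(1, index=pd.date_range('2012-01-01', '2012-02-29', freq='D'))

# 'MS' (month start): each bin is labelled with the first day of its month
monthly = daily.resample('MS').sum()
print(monthly.tolist())  # [31, 29]  (2012 is a leap year)

# 'W' (weekly, anchored on Sunday by default): each bin is labelled with its Sunday
weekly = daily.resample('W').sum()
print(weekly.index[0])   # 2012-01-01 00:00:00
```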
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"DatetimeIndex(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06',\n",
" '2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',\n",
" '2012-01-13', '2012-01-17',\n",
" ...\n",
" '2016-12-16', '2016-12-19', '2016-12-20', '2016-12-21',\n",
" '2016-12-22', '2016-12-23', '2016-12-27', '2016-12-28',\n",
" '2016-12-29', '2016-12-30'],\n",
" dtype='datetime64[ns]', name='Date', length=1258, freq=None)"
]
},
"metadata": {},
"execution_count": 9
}
],
"source": [
"# Our index\n",
"df.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You need to call **resample** with the **rule** parameter and then call an aggregation function. Because resampling merges several rows into one, you need a mathematical rule that says how to combine them (mean, sum, count, etc.)."
]
},
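To make that concrete: `resample()` by itself only returns a lazy `Resampler` object describing the bins, and the aggregation method does the actual work. A sketch on hypothetical prices (not the Walmart data), including per-column rules through `agg()`:

```python
import pandas as pd

# Hypothetical prices for four trading days spread over two weeks
prices = pd.DataFrame(
    {'Close': [60.0, 59.0, 61.0, 63.0], 'Volume': [100, 200, 300, 400]},
    index=pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-10', '2012-01-11']),
)

# resample() alone only defines the weekly bins; nothing is computed yet
resampler = prices.resample('W')
print(type(resampler).__name__)

# The aggregation decides how rows inside each bin are combined
print(resampler.mean())
print(resampler.count())

# A different rule per column: average closing price, total volume
summary = resampler.agg({'Close': 'mean', 'Volume': 'sum'})
print(summary)
```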
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Open High Low Close Volume \\\n",
"Date \n",
"2012-12-31 67.158680 67.602120 66.786520 67.215120 9.239015e+06 \n",
"2013-12-31 75.264048 75.729405 74.843055 75.320516 6.951496e+06 \n",
"2014-12-31 77.274524 77.740040 76.864405 77.327381 6.515612e+06 \n",
"2015-12-31 72.569405 73.064167 72.034802 72.491111 9.040769e+06 \n",
"2016-12-31 69.481349 70.019643 69.023492 69.547063 9.371645e+06 \n",
"\n",
" Adj Close \n",
"Date \n",
"2012-12-31 59.389349 \n",
"2013-12-31 68.147179 \n",
"2014-12-31 71.709712 \n",
"2015-12-31 68.831426 \n",
"2016-12-31 68.054229 "
]
},
"metadata": {},
"execution_count": 10
}
],
"source": [
"# Yearly means ('A' = year end; pandas 2.2+ prefers the alias 'YE')\n",
"df.resample(rule='A').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom resampling\n",
"\n",
"You can also write your own resampling function and pass it to `apply()`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def first_day(entry):\n",
" \"\"\"\n",
" Return the first entry of the period,\n",
" whatever the sampling rate.\n",
" \"\"\"\n",
" # entry holds one column's values for one bin; .iloc makes the access positional\n",
" return entry.iloc[0]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Open High Low Close Volume Adj Close\n",
"Date \n",
"2012-12-31 59.970001 61.060001 59.869999 60.330002 12668800 52.619235\n",
"2013-12-31 68.930000 69.239998 68.449997 69.239998 10390800 61.879708\n",
"2014-12-31 78.720001 79.470001 78.500000 78.910004 6878000 72.254228\n",
"2015-12-31 86.269997 86.720001 85.550003 85.900002 4501800 80.624861\n",
"2016-12-31 60.500000 61.490002 60.360001 61.459999 11989200 59.289713"
]
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"df.resample(rule='A').apply(first_day)"
]
},
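For this particular rule pandas already ships a built-in, `Resampler.first()`. One difference worth knowing: `first()` skips missing values by default, while `first_day` returns whatever sits in the first row. A sketch comparing the two on hypothetical two-month data (not the Walmart data):

```python
import pandas as pd

def first_day(entry):
    """Return the first entry of the period, whatever the sampling rate."""
    # entry holds one column's values for one bin; .iloc makes access positional
    return entry.iloc[0]

# Hypothetical toy data spanning two months
toy = pd.DataFrame(
    {'Close': [60.0, 59.0, 61.0, 63.0]},
    index=pd.to_datetime(['2012-01-03', '2012-01-04', '2012-02-01', '2012-02-02']),
)

custom = toy.resample('MS').apply(first_day)  # user-defined rule
builtin = toy.resample('MS').first()          # built-in equivalent

print(custom['Close'].tolist())   # [60.0, 61.0]
print(builtin['Close'].tolist())  # [60.0, 61.0]
```

A custom function pays off when no built-in matches; note that `first_day` as written would raise an `IndexError` on an empty bin.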
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Yearly Mean Close Price for Walmart')"
]
},
"metadata": {},
"execution_count": 13
},
{
"output_type": "display_data",
"data": {
"text/plain": "