python-pour-finance/02-NumPy/3-Numpy-Indexing-et-Selecti...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NumPy Indexing et Selection\n",
    "\n",
    "Dans cette session, nous discuterons de la façon de sélectionner des éléments ou des groupes d'éléments à partir d'un tableau."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Création d'un tableau simple\n",
    "arr = np.arange(0,11)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# afficher\n",
    "arr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexation et sélection\n",
    "La façon la plus simple de choisir un ou plusieurs éléments d'un tableau ressemble beaucoup aux listes python :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Obtenir une valeur à un index\n",
    "arr[8]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Obtenir des valeurs dans une plage d'entiers\n",
    "arr[1:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 3, 4])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Obtenir des valeurs dans une plage d'entiers\n",
    "arr[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting\n",
    "\n",
    "Les tableaux Numpy diffèrent d'une liste Python normale en raison de leur capacité à diffuser :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Réglage d'une valeur avec plage d'indice (Broadcasting)\n",
    "arr[0:5]=100\n",
    "\n",
    "# afficher\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Réinitialiser le tableau, nous verrons pourquoi \n",
    "# j'ai dû le réinitialiser plus bas.\n",
    "arr = np.arange(0,11)\n",
    "\n",
    "# afficher\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 3, 4, 5])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remarques importantes sur les tranches (slices)\n",
    "slice_of_arr = arr[0:6]\n",
    "\n",
    "# afficher tranche\n",
    "slice_of_arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([99, 99, 99, 99, 99, 99])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Changer de tranche\n",
    "slice_of_arr[:] = 99\n",
    "\n",
    "# Afficher de nouveau\n",
    "slice_of_arr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notez maintenant que les changements se produisent aussi dans notre tableau d'origine !"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Les données ne sont pas copiées, c'est une vue du tableau d'origine ! Cela permet d'éviter les problèmes de mémoire !"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pour en obtenir une copie, il faut être explicite\n",
    "arr_copy = arr.copy()\n",
    "\n",
    "arr_copy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexation d'un tableau 2D (matrices)\n",
    "\n",
    "Le format général est le suivant **arr_2d[row][col]** ou **arr_2d[row,col]**. Je recommande habituellement d'utiliser la notation par virgule pour plus de clarté."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 5, 10, 15],\n",
       "       [20, 25, 30],\n",
       "       [35, 40, 45]])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))\n",
    "\n",
    "# afficher\n",
    "arr_2d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([20, 25, 30])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Indexer une ligne\n",
    "arr_2d[1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Le format est arr_2d[row][col] ou arr_2d[row,col]\n",
    "\n",
    "# Obtenir la valeur d'un élément individuel\n",
    "arr_2d[1][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Obtenir la valeur d'un élément individuel\n",
    "arr_2d[1,0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[10, 15],\n",
       "       [25, 30]])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# découpage en tranches d'un tableau 2D\n",
    "\n",
    "# Forme (2,2) du coin supérieur droit\n",
    "arr_2d[:2,1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([35, 40, 45])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Forme de la rangée du bas\n",
    "arr_2d[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([35, 40, 45])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Forme de la rangée du bas\n",
    "arr_2d[2,:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plus d'aide pour l'indexation\n",
    "L'indexation d'une matrice 2d peut être un peu confuse au début, surtout lorsque vous commencez à ajouter une taille de pas. Essayez la recherche d'images sur google \"NumPy indexing\" pour trouver des images utiles, comme celle-ci :\n",
    "\n",
    "<img src= 'https://s2.qwant.com/thumbr/0x380/e/f/c01bff98594466ebc0010d2898208f123a92d43849fb809db543c82f2038c9/Numpy1.jpg?u=https%3A%2F%2Fmedia.geeksforgeeks.org%2Fwp-content%2Fuploads%2FNumpy1.jpg&q=0&b=1&p=0&a=1' width=500/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sélection conditionnelle\n",
    "\n",
    "C'est un concept très fondamental qui se réglera directement par pandas plus tard, assurez-vous de bien comprendre cette partie !\n",
    "\n",
    "Passons brièvement en revue la façon d'utiliser les parenthèses pour la sélection basée sur des opérateurs de comparaison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr = np.arange(1,11)\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr > 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "bool_arr = arr>4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bool_arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr[bool_arr]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3,  4,  5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr[arr>2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3,  4,  5,  6,  7,  8,  9, 10])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = 2\n",
    "arr[arr>x]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bon travail!\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}