python-pour-finance/02-NumPy/3-Numpy-Indexing-et-Selecti...

636 lines
12 KiB
Plaintext
Raw Permalink Normal View History

2023-08-21 15:12:19 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NumPy Indexing et Selection\n",
"\n",
"Dans cette session, nous discuterons de la façon de sélectionner des éléments ou des groupes d'éléments à partir d'un tableau."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Création d'un tableau simple\n",
"arr = np.arange(0,11)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# afficher\n",
"arr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexation et sélection\n",
"La façon la plus simple de choisir un ou plusieurs éléments d'un tableau ressemble beaucoup aux listes python :"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Obtenir une valeur à un index\n",
"arr[8]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Obtenir des valeurs dans une plage d'entiers\n",
"arr[1:5]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Obtenir des valeurs dans une plage d'entiers\n",
"arr[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Broadcasting\n",
"\n",
"Les tableaux Numpy diffèrent d'une liste Python normale en raison de leur capacité à diffuser :"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Réglage d'une valeur avec plage d'indice (Broadcasting)\n",
"arr[0:5]=100\n",
"\n",
"# afficher\n",
"arr"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Réinitialiser le tableau, nous verrons pourquoi \n",
"# j'ai dû le réinitialiser plus bas.\n",
"arr = np.arange(0,11)\n",
"\n",
"# afficher\n",
"arr"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remarques importantes sur les tranches (slices)\n",
"slice_of_arr = arr[0:6]\n",
"\n",
"# afficher tranche\n",
"slice_of_arr"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([99, 99, 99, 99, 99, 99])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Changer de tranche\n",
"slice_of_arr[:] = 99\n",
"\n",
"# Afficher de nouveau\n",
"slice_of_arr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notez maintenant que les changements se produisent aussi dans notre tableau d'origine !"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Les données ne sont pas copiées, c'est une vue du tableau d'origine ! Cela permet d'éviter les problèmes de mémoire !"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pour en obtenir une copie, il faut être explicite\n",
"arr_copy = arr.copy()\n",
"\n",
"arr_copy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexation d'un tableau 2D (matrices)\n",
"\n",
"Le format général est le suivant **arr_2d[row][col]** ou **arr_2d[row,col]**. Je recommande habituellement d'utiliser la notation par virgule pour plus de clarté."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 5, 10, 15],\n",
" [20, 25, 30],\n",
" [35, 40, 45]])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))\n",
"\n",
"# afficher\n",
"arr_2d"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([20, 25, 30])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Indexer une ligne\n",
"arr_2d[1]\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Le format est arr_2d[row][col] ou arr_2d[row,col]\n",
"\n",
"# Obtenir la valeur d'un élément individuel\n",
"arr_2d[1][0]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Obtenir la valeur d'un élément individuel\n",
"arr_2d[1,0]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 15],\n",
" [25, 30]])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# découpage en tranches d'un tableau 2D\n",
"\n",
"# Forme (2,2) du coin supérieur droit\n",
"arr_2d[:2,1:]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([35, 40, 45])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Forme de la rangée du bas\n",
"arr_2d[2]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([35, 40, 45])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Forme de la rangée du bas\n",
"arr_2d[2,:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plus d'aide pour l'indexation\n",
"L'indexation d'une matrice 2d peut être un peu confuse au début, surtout lorsque vous commencez à ajouter une taille de pas. Essayez la recherche d'images sur google \"NumPy indexing\" pour trouver des images utiles, comme celle-ci :\n",
"\n",
"<img src= 'https://s2.qwant.com/thumbr/0x380/e/f/c01bff98594466ebc0010d2898208f123a92d43849fb809db543c82f2038c9/Numpy1.jpg?u=https%3A%2F%2Fmedia.geeksforgeeks.org%2Fwp-content%2Fuploads%2FNumpy1.jpg&q=0&b=1&p=0&a=1' width=500/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sélection conditionnelle\n",
"\n",
"C'est un concept très fondamental qui se réglera directement par pandas plus tard, assurez-vous de bien comprendre cette partie !\n",
"\n",
"Passons brièvement en revue la façon d'utiliser les parenthèses pour la sélection basée sur des opérateurs de comparaison."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr = np.arange(1,11)\n",
"arr"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, True, True, True, True, True, True], dtype=bool)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr > 4"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bool_arr = arr>4"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, True, True, True, True, True, True], dtype=bool)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bool_arr"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr[bool_arr]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr[arr>2]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 2\n",
"arr[arr>x]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bon travail!\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}