| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "___\n",
 | |
|     "\n",
 | |
|     "<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
 | |
|     "___\n",
 | |
|     "# Series"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.\n",
 | |
|     "\n",
 | |
|     "A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.\n",
 | |
|     "\n",
 | |
|     "Let's explore this concept through some examples:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 2,
 | |
|    "metadata": {
 | |
|     "collapsed": true
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "import numpy as np\n",
 | |
|     "import pandas as pd"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### Creating a Series\n",
 | |
|     "\n",
 | |
|     "You can convert a list, NumPy array, or dictionary to a Series:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 3,
 | |
|    "metadata": {
 | |
|     "collapsed": true
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "labels = ['a','b','c']\n",
 | |
|     "my_list = [10,20,30]\n",
 | |
|     "arr = np.array([10,20,30])\n",
 | |
|     "d = {'a':10,'b':20,'c':30}"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "**Using Lists**"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 4,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "0    10\n",
 | |
|        "1    20\n",
 | |
|        "2    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 4,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(data=my_list)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 5,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "a    10\n",
 | |
|        "b    20\n",
 | |
|        "c    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 5,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(data=my_list,index=labels)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 6,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "a    10\n",
 | |
|        "b    20\n",
 | |
|        "c    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 6,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(my_list,labels)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "**NumPy Arrays**"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 7,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "0    10\n",
 | |
|        "1    20\n",
 | |
|        "2    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 7,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(arr)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 8,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "a    10\n",
 | |
|        "b    20\n",
 | |
|        "c    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 8,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(arr,labels)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "**Dictionary**"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 9,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "a    10\n",
 | |
|        "b    20\n",
 | |
|        "c    30\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 9,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(d)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### Data in a Series\n",
 | |
|     "\n",
 | |
|     "A pandas Series can hold a variety of object types:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 10,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "0    a\n",
 | |
|        "1    b\n",
 | |
|        "2    c\n",
 | |
|        "dtype: object"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 10,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "pd.Series(data=labels)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 11,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "0      <built-in function sum>\n",
 | |
|        "1    <built-in function print>\n",
 | |
|        "2      <built-in function len>\n",
 | |
|        "dtype: object"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 11,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "# Even functions (although you are unlikely to need this)\n",
 | |
|     "pd.Series([sum,print,len])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Using an Index\n",
 | |
|     "\n",
 | |
|     "The key to using a Series is understanding its index. pandas uses these index labels or numbers to allow fast lookups of information (much like a hash table or dictionary).\n",
 | |
|     "\n",
 | |
|     "Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 12,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 13,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "USA        1\n",
 | |
|        "Germany    2\n",
 | |
|        "USSR       3\n",
 | |
|        "Japan      4\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 13,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "ser1"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 14,
 | |
|    "metadata": {
 | |
|     "collapsed": true
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 15,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "USA        1\n",
 | |
|        "Germany    2\n",
 | |
|        "Italy      5\n",
 | |
|        "Japan      4\n",
 | |
|        "dtype: int64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 15,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "ser2"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 16,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "1"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 16,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "ser1['USA']"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "source": [
 | |
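|     "Operations between Series are also aligned by index label. A label that appears in only one of the two Series produces NaN, and the result is converted to float:"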
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 17,
 | |
|    "metadata": {
 | |
|     "collapsed": false
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "Germany    4.0\n",
 | |
|        "Italy      NaN\n",
 | |
|        "Japan      8.0\n",
 | |
|        "USA        2.0\n",
 | |
|        "USSR       NaN\n",
 | |
|        "dtype: float64"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 17,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "ser1 + ser2"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Let's stop here for now and move on to DataFrames, which will expand on the concept of Series!\n",
 | |
|     "# Great Job!"
 | |
|    ]
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "Python 3",
 | |
|    "language": "python",
 | |
|    "name": "python3"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.5.1"
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 0
 | |
| }
 |