📚 node [[pandas]]

📓 garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/Pandas.md by @KGBicheno

pandas

Go back to the [[AI Glossary]]

A column-oriented data analysis API. Many machine learning frameworks, including TensorFlow, support pandas data structures as input. See the pandas documentation for details.

📓 garden/KGBicheno/Programming/Python/Week4/Pandas.md by @KGBicheno

Pandas

Go to the [[Python Week 4 Main Page]] or the [[Python - Main Page]] Also see the [[Programming Main Page]] or the [[Main AI Page]]

Also see [[Econometrics Main Page]]

A popular library for data analysis

Pandas comes with Conda, otherwise install wil Pip.

import pandas as pd <-- means you can use pd instead of pandas whenever you have to type it out. For example:

import pandas as pd
csv_path = 'file1.csv'
df1 = pd.read_csv(csv_path) # this is easier than always writing pandas.read_csv
xlxs_path = 'file1.xlxs'
df2 = pd.read_excel(file1.xlxs) # you can also read excel files
df2.head() # returns the first 5 rows of the dataframe

Dataframes

A pandas construct with colums and rows used for working with data.

You can cast a dictionary as a dataframe with the keys corresponding to the column labels and the values are lists corresponding to the rows.

A graphic representing a dictionary being cast as a dataframe

E.g.

songs = { "Album" : [ "Thriller", "Back in Black", "The Dark Side of the Moon", "The Bodyguard", "Bat Out of Hell"], "Released" : [1982, 1980, 1973, 1992, 1977], "Length" : ["00:42:19", "00:42:11", "00:42:49", "00:57:44", "00:46:33"]}

songs_frame = pd.DataFrame(songs)

|Index|Artist |Album |Released|Length |Genres |Millions sold|Claimed Sales (millions)|Released |Soundtrack|Rating| |-----|---|-|--|-|---|-||---|----|| |0 |Michael Jackson|Thriller |1982 |0:42:19|pop, rock, R&B |46 |65 |30-Nov-82|NaN |10 | |1 |AC/DC |Back in Black |1980 |0:42:11|hard rock |26.1 |50 |25-Jul-80|NaN |9.5 | |2 |Pink Floyd |The Dark Side of the Moon |1973 |0:42:49|progressive rock |24.2 |45 |01-Mar-73|NaN |9 | |3 |Whitney Houston|The Bodyguard |1992 |0:57:44|R&B, soul, pop |27.4 |44 |17-Nov-92|Y |8.5 | |4 |Meat Loaf |Bat Out of Hell |1977 |0:46:33|hard rock, progressive rock|20.6 |43 |21-Oct-77|NaN |8 | |5 |Eagles |Their Greatest Hits (1971-1975)|1976 |0:43:08|rock, soft rock, folk rock |32.2 |42 |17-Feb-76|NaN |7.5 | |6 |Bee Gees |Saturday Night Fever |1977 |1:15:54|disco |20.6 |40 |15-Nov-77|Y |7 | |7 |Fleetwood Mac |Rumours |1977 |0:40:01|soft rock |27.9 |40 |04-Feb-77|NaN |6.5 |

Making new dataframes from existing frames

A new frame can be made from an existing one. Say you wanted a new frame with only the album and release date from the previous datafame, you could say:

df_release = songs_frame[['Album', 'Released']]

And you'd get a new dataframe with only those two columns returned.

Album	Released
Thriller	1982
Back in Black	1980
The Dark Side of the Moon	1973
The Bodyguard	1992
Bat Out of Hell	1977
Their Greatest Hits (1971-1975)	1976
Saturday Night Fever	1977
Rumours	1977

Accessing values in a dataframe

There are two main ways to access data from a dataframe, loc, and iloc.

loc

loc relies on column labels and and row position to find the values you wish to extract. For example, if I wanted to know the release date of Bat Out of Hell, I could use df.loc[1, 'Released'] --> 21-Oct-77

You can use the normal slicing syntax to return multiple rows. If I wanted to know all 8 artists recorded on the table:

df.loc[0:7, 'Artist'] --> ['Michael Jackson', 'AC/DC', 'Pink Floyd', 'Whitney Houston', 'Meat Loaf', 'Eagles', 'Bee Gees', 'Fleetwood Mac'']

iloc

iloc is integer based and simply treats the dataframe as an x,y coordinate system starting at 0,0 with data retrieved by entering the coordinates of the cell or cells you wish to extract.

df.iloc[3,6] --> 27.4 (found in the Millions sold column) df.iloc[0.0] --> "Michael Jackson"

Note: Headings are not included in the coordinate indexes.

Again, you can use slicing notation to retrieve multiple values.

Creating a new dataframe with 'loc' slicing

You can slice a dataframe and assign the values to a new dataframe using loc as well.

E.g.

three_artists = df.loc[0:3, ['Artist', 'Album', 'Released']

This would create three_artists, a dataframe looking like this:

|Artist|Album |Released| ||---|--| |Michael Jackson|Thriller |1982 | |AC/DC |Back in Black |1980 | |Pink Floyd|The Dark Side of the Moon|1973 |

But what if I wanted all the data in those three columns? Simply leave the integers out of the position argument and leave the colon bare.

three_artists = df.loc[ : , ['Artist', 'Album', 'Released']]

Giving us:

|Artist|Album |Released| ||---|--| |Michael Jackson|Thriller |1982 | |AC/DC |Back in Black |1980 | |Pink Floyd|The Dark Side of the Moon|1973 | |Whitney Houston|The Bodyguard |1992 | |Meat Loaf|Bat Out of Hell|1977 | |Eagles|Their Greatest Hits (1971-1975)|1976 | |Bee Gees|Saturday Night Fever|1977 | |Fleetwood Mac|Rumours |1977 |

You can also slice multiple columns.

df1 = df.loc[0:7, 'Artist':'Length']

|Artist|Album|Released |Length| ||-----|-|| |Michael Jackson|Thriller|1982 |0:42:19| |AC/DC |Back in Black|1980 |0:42:11| |Pink Floyd|The Dark Side of the Moon|1973 |0:42:49| |Whitney Houston|The Bodyguard|1992 |0:57:44| |Meat Loaf|Bat Out of Hell|1977 |0:46:33| |Eagles|Their Greatest Hits (1971-1975)|1976 |0:43:08| |Bee Gees|Saturday Night Fever|1977 |1:15:54| |Fleetwood Mac|Rumours|1977 |0:40:01|

Keep in mind you can simply create a new dataframe by ommitting the first argument entirely, using:

df2 = df1[['a', 'b', 'c']]

Creating a new dataframe with iloc slicing

We can do the same as the above by using the coordinate-only system of the iloc method.

two_artists = df.iloc[0:2, 0:3]

This new dataframe now looks like this:

|Artist|Album |Released| ||---|--| |Michael Jackson|Thriller |1982 | |AC/DC |Back in Black |1980 |

There is so much more to Pandas

Pandas is an overwhelmingly powerful tool that forms the basis of most data science done in Python - and this page has barely begun to cover it.

#ToDo Go through the Pandas User Guide and study each page as if it were a part of this course. It will become hugely useful when dealing with the bonkers amounts of data and metadata coming through M.E.T.A, and E.M.I.L.Y.

Here's the User Guide

📖 stoas

public document at doc.anagora.org/pandas
video call at meet.jit.si/pandas

⥱ context

← back
hpi
main ai page
programming main page
programming misc
python
python main page
python week 4 main page
typing
week4 pandas data lab

↑ pushing here
(none)

↓ pulling this
(none)

→ forward
a b c
album released
ai glossary
econometrics main page
main ai page
programming main page
python main page
python week 4 main page

⥅ related node [[filtering and saving data with pandas]]

⥅ related node [[week4 pandas data lab]]

🔎 full text search for 'pandas'

📚 Node	📕 Resource	🧍 User
1614141998	garden/bmann/_logs/1614141998.md	@bmann
100 reasons	garden/evan/100 reasons.md	@evan
programming-misc	garden/karlicoss/programming/programming-misc.md	@karlicoss
python	garden/karlicoss/programming/python/python.md	@karlicoss
typing	garden/karlicoss/programming/python/typing.md	@karlicoss
dashboard	garden/karlicoss/projects/dashboard.md	@karlicoss
hpi	garden/karlicoss/projects/hpi.md	@karlicoss
timeline	garden/karlicoss/projects/timeline.md	@karlicoss
qs-misc	garden/karlicoss/quantifiedself/qs-misc.md	@karlicoss
think	garden/karlicoss/think.md	@karlicoss
AI Glossary	garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/AI Glossary.md	@KGBicheno
Dataframe	garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/Dataframe.md	@KGBicheno
Numpy	garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/Numpy.md	@KGBicheno
Pandas	garden/KGBicheno/Artificial Intelligence/Introduction to AI/Week 3 - Introduction/Definitions/Pandas.md	@KGBicheno
Main AI Page	garden/KGBicheno/Artificial Intelligence/Main AI Page.md	@KGBicheno
Programming Main Page	garden/KGBicheno/Programming/Programming Main Page.md	@KGBicheno
Python - Main Page	garden/KGBicheno/Programming/Python/Python - Main Page.md	@KGBicheno
Filtering and Saving Data with Pandas	garden/KGBicheno/Programming/Python/Week4/Filtering and Saving Data with Pandas.md	@KGBicheno
Pandas	garden/KGBicheno/Programming/Python/Week4/Pandas.md	@KGBicheno
Python IBM API Assignment Lab	garden/KGBicheno/Programming/Python/Week4/Python IBM API Assignment Lab.md	@KGBicheno
Python Week 4 Main Page	garden/KGBicheno/Programming/Python/Week4/Python Week 4 Main Page.md	@KGBicheno
Simple APIs in Python	garden/KGBicheno/Programming/Python/Week4/Simple APIs in Python.md	@KGBicheno
Week4 Pandas Data Lab	garden/KGBicheno/Programming/Python/Week4/Week4 Pandas Data Lab.md	@KGBicheno