📚 node [[filtering and saving data with pandas]]

Saving Data with Pandas

You can save a dataframe into different formats.

Dataframe Methods

The 'unique()' method

Given the column 'Released' from the following dataframe:

| Index | Artist | Album | Released | Length | Genres | Millions sold |
|---|---|---|---|---|---|---|
| 0 | Michael Jackson | Thriller | 1982 | 0:42:19 | pop, rock, R&B | 46 |
| 1 | AC/DC | Back in Black | 1980 | 0:42:11 | hard rock | 26.1 |
| 2 | Pink Floyd | The Dark Side of the Moon | 1973 | 0:42:49 | progressive rock | 24.2 |
| 3 | Whitney Houston | The Bodyguard | 1992 | 0:57:44 | R&B, soul, pop | 27.4 |
| 4 | Meat Loaf | Bat Out of Hell | 1977 | 0:46:33 | hard rock, progressive rock | 20.6 |
| 5 | Eagles | Their Greatest Hits (1971-1975) | 1976 | 0:43:08 | rock, soft rock, folk rock | 32.2 |
| 6 | Bee Gees | Saturday Night Fever | 1977 | 1:15:54 | disco | 20.6 |
| 7 | Fleetwood Mac | Rumours | 1977 | 0:40:01 | soft rock | 27.9 |

To get the unique years in the 'Released' column, we use the following syntax (note the parentheses, since unique() is a method call):

year_list = df['Released'].unique()

print(year_list) --> [1982 1980 1973 1992 1977 1976]
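
As a minimal, runnable sketch of the call above: the dataframe here is rebuilt by hand from the table and keeps only two of its columns, which is an assumption made to keep the example short.

```python
import pandas as pd

# Only the columns needed for this example; values copied from the table above.
df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd", "Whitney Houston",
               "Meat Loaf", "Eagles", "Bee Gees", "Fleetwood Mac"],
    "Released": [1982, 1980, 1973, 1992, 1977, 1976, 1977, 1977],
})

# unique() returns a NumPy array of the distinct values,
# in order of first appearance (it does not sort them).
year_list = df["Released"].unique()
print(year_list)

# nunique() counts the distinct values instead of listing them.
year_count = df["Released"].nunique()
print(year_count)
```

Because the result is a NumPy array rather than a Python list, call year_list.tolist() if a plain list is needed.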

Inequalities on entire columns (Filtering part 1)

Say we want to return only the rows for albums released in 1980 or later. This takes a single line of code, but it is easier to understand as two steps.

First --> df['Released']>=1980 returns a Series of booleans, one per row, that is True where the inequality holds and False where it does not.

Index Released
0 True
1 True
2 False
3 True
4 False
5 False
6 False
7 False
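
The first step can be tried on its own. This is a small sketch with a hand-built dataframe; the variable name mask is just an illustrative choice, not something pandas requires.

```python
import pandas as pd

# Two columns from the albums table are enough to show the comparison.
df = pd.DataFrame({
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon",
              "The Bodyguard", "Bat Out of Hell",
              "Their Greatest Hits (1971-1975)", "Saturday Night Fever",
              "Rumours"],
    "Released": [1982, 1980, 1973, 1992, 1977, 1976, 1977, 1977],
})

# Comparing a column with a number is applied row by row
# and yields a boolean Series with the same index as df.
mask = df["Released"] >= 1980
print(mask)

# True counts as 1, so sum() gives the number of matching rows.
matching_rows = int(mask.sum())
print(matching_rows)
```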

Create a new dataframe on inequalities (Filtering part 2)

If we then want to create a new dataframe that only has the rows for which the inequality is ==True==, we use the following syntax:

df1 = df[df['Released']>=1980]

Now df1 is a dataframe that looks like this (the rows keep their original index labels, so the index goes 0, 1, 3):

| Index | Artist | Album | Released | Length | Genres | Millions sold |
|---|---|---|---|---|---|---|
| 0 | Michael Jackson | Thriller | 1982 | 0:42:19 | pop, rock, R&B | 46 |
| 1 | AC/DC | Back in Black | 1980 | 0:42:11 | hard rock | 26.1 |
| 3 | Whitney Houston | The Bodyguard | 1992 | 0:57:44 | R&B, soul, pop | 27.4 |
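
Both steps together, as a runnable sketch on a hand-built copy of the data. Boolean filtering keeps each row's original index label; the reset_index(drop=True) call is an optional extra, shown in case consecutive numbering is wanted, and df1_renumbered is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd", "Whitney Houston",
               "Meat Loaf", "Eagles", "Bee Gees", "Fleetwood Mac"],
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon",
              "The Bodyguard", "Bat Out of Hell",
              "Their Greatest Hits (1971-1975)", "Saturday Night Fever",
              "Rumours"],
    "Released": [1982, 1980, 1973, 1992, 1977, 1976, 1977, 1977],
})

# Indexing df with the boolean Series keeps only the rows marked True.
df1 = df[df["Released"] >= 1980]
print(df1)

# The surviving rows keep their original labels (0, 1, 3).
print(df1.index.tolist())

# Optional: renumber the rows from 0 and discard the old labels.
df1_renumbered = df1.reset_index(drop=True)
print(df1_renumbered.index.tolist())
```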

Saving dataframes to CSV files

To save the new df1 dataframe to a CSV file, we use the to_csv() method.

df1.to_csv('new_songs.csv')

This will create a new CSV file named new_songs.csv in the working directory with our data stored in it. By default, the index is written as the first column of the file.
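
The following sketch saves a small filtered dataframe and reads it back to check the result. It writes into a temporary directory rather than the working directory, and passes index=False to leave the index out of the file; both are choices made for this example, not requirements.

```python
import os
import tempfile

import pandas as pd

# A hand-built stand-in for df1, with the index labels left over from filtering.
df1 = pd.DataFrame(
    {
        "Artist": ["Michael Jackson", "AC/DC", "Whitney Houston"],
        "Album": ["Thriller", "Back in Black", "The Bodyguard"],
        "Released": [1982, 1980, 1992],
    },
    index=[0, 1, 3],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "new_songs.csv")

    # index=False skips the index column, so only the data columns are saved.
    df1.to_csv(path, index=False)

    # Read the file back to confirm what was written.
    restored = pd.read_csv(path)

print(restored)
```

Without index=False, the saved file would contain an extra unnamed first column holding 0, 1 and 3.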

Pandas can save data in many other file formats too, for example JSON with to_json() and Excel with to_excel().
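
As one example of another format, here is a sketch using to_json(). The orient="records" setting is just one of several layouts pandas supports, chosen here because it gives one JSON object per row. When no file path is given, to_json() returns the JSON as a string.

```python
import json

import pandas as pd

# A hand-built stand-in for the filtered dataframe.
df1 = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Whitney Houston"],
    "Album": ["Thriller", "Back in Black", "The Bodyguard"],
    "Released": [1982, 1980, 1992],
})

# With no path argument, to_json() returns the JSON text instead of writing a file.
json_text = df1.to_json(orient="records")
print(json_text)

# Parse it with the standard library to confirm it is valid JSON.
records = json.loads(json_text)
print(records[0]["Album"])
```

Passing a file name as the first argument, as with to_csv(), writes the JSON to disk instead.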
