Create a dataframe: three ways
Dataframes on-the-fly
At times it is necessary to create dataframes within a program. Learn three ways to create them.
This article assumes you are comfortable with the basics of Python dictionaries and the Python Pandas library.
Pandas is a powerful Python library for data analysis and manipulation. Dataframes are usually created by reading datasets from disk. However, there are cases where your program generates a lot of data that you want to capture for further manipulation or for writing to disk.
Let us look at three ways we can create dataframes on-the-fly.
Every entry in a dictionary has two parts, a key and a value. When you build a table from a dictionary, you can do it in two ways: either treat the keys as columns or treat the keys as rows.
Pandas dataframe from dict with keys as columns
import pandas as pd

# Dictionary storing the data: each key becomes a column
data = {
    "Subjects": ["Physics", "Chemistry", "Maths", "Csc"],
    "Marks": [87, 67, 90, 76]
}
# Build the dataframe from the dict and print it
df = pd.DataFrame.from_dict(data)
print(df)
Output:
    Subjects  Marks
0    Physics     87
1  Chemistry     67
2      Maths     90
3        Csc     76
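As a quick check of the keys-as-columns behaviour, the sketch below builds the same table twice, once with `from_dict` and once by passing the dict straight to the `pd.DataFrame` constructor, and compares the results. The names `df_a` and `df_b` are only illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

data = {
    "Subjects": ["Physics", "Chemistry", "Maths", "Csc"],
    "Marks": [87, 67, 90, 76],
}

# Two routes to the same table: the keys become the column labels in both
df_a = pd.DataFrame.from_dict(data)
df_b = pd.DataFrame(data)

print(df_a.equals(df_b))     # True
print(list(df_a.columns))    # ['Subjects', 'Marks']
print(df_a.shape)            # (4, 2): four rows, two columns
```

For the default keys-as-columns case the two calls are interchangeable; `from_dict` mainly earns its place when you need the `orient` argument.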
Pandas dataframe from dict with keys as rows
# Each key becomes a row label; each value is the list of cells in that row
data = {
    "Physics": [87],
    "Chemistry": [67],
    "Maths": [90],
    "Csc": [76]
}
df2 = pd.DataFrame.from_dict(data, orient='index')
print(df2)
Output:
            0
Physics    87
Chemistry  67
Maths      90
Csc        76
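In the output above the single column is labelled 0 because no column name was supplied. A minimal sketch of one way to name it, assuming you want to call the column "Marks" (a name chosen here for illustration), is to pass `columns` together with `orient='index'`:

```python
import pandas as pd

data = {
    "Physics": [87],
    "Chemistry": [67],
    "Maths": [90],
    "Csc": [76],
}

# Keys become row labels; `columns` names the cells in each row
df2 = pd.DataFrame.from_dict(data, orient="index", columns=["Marks"])
print(df2)

# Rows can now be looked up by subject and column by name
print(df2.loc["Maths", "Marks"])  # 90
```

With named rows and columns, `loc` lookups read much like dictionary access.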
Pandas dataframe from array of arrays
In this approach each inner array becomes a row. Since the rows and columns are not given names, pandas labels them with integers starting at 0, and we can refer to any cell by its row and column number.
import numpy as np

# Create an array of arrays: 6 rows of 5 random integers in [0, 100)
outer_arr = [np.random.randint(0, 100, size=5) for i in range(6)]
# Create a dataframe out of this array
df = pd.DataFrame(outer_arr)
print(df)
# Your dataframe will be different on account of the random numbers
Output:
    0   1   2   3   4
0   9  98  24  75   8
1  63   7  25  69  95
2  49  59  89  12  10
3  84  99  86  67  61
4  73  45  53  19  95
5   5  87  84  74  35
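To illustrate working with row and column numbers, here is a small sketch that reads one cell by position with `iloc` and then assigns column names afterwards. It uses a seeded generator (`np.random.default_rng(0)`) instead of `np.random.randint` so that repeated runs give the same table. The seed and the column names "a" to "e" are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)
outer_arr = [rng.integers(0, 100, size=5) for _ in range(6)]
df = pd.DataFrame(outer_arr)

# Default labels are integer positions for both rows and columns
print(list(df.index))    # [0, 1, 2, 3, 4, 5]
print(list(df.columns))  # [0, 1, 2, 3, 4]

# Position-based access: row 2, column 3 matches the source array
cell = df.iloc[2, 3]
print(cell == outer_arr[2][3])  # True

# Optionally name the columns afterwards (hypothetical names)
df.columns = ["a", "b", "c", "d", "e"]
print(df["c"].tolist() == [row[2] for row in outer_arr])  # True
```

Naming the columns after construction keeps the array-building code unchanged while making later lookups easier to read.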