Tutorial by Topics: dataframe



DataFrame is a data structure provided by the pandas library, alongside Series and Panel (Panel has since been removed from pandas). It is a two-dimensional structure and can be compared to a table of rows and columns.

Each row can be identified by an integer index (0 to N-1) or by a label set explicitly when the DataFrame object is created. Each column can have its own type and is identified by a label.

This topic covers various ways to construct a DataFrame object, for example from NumPy arrays, from a list of tuples, or from a dictionary.
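As a minimal sketch of the three construction routes named above, the following builds small DataFrames with pandas. The variable names, column labels and sample values are made up for illustration and do not come from the topic itself.

```python
import numpy as np
import pandas as pd

# From a NumPy array: rows come from the array, column labels are passed explicitly.
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From a list of tuples: each tuple becomes one row.
df_from_tuples = pd.DataFrame([(1, "x"), (2, "y")], columns=["id", "code"])

# From a dictionary: keys become column labels, values become the column data.
# Row labels are set explicitly here instead of the default 0..N-1 integers.
df_from_dict = pd.DataFrame(
    {"id": [1, 2], "code": ["x", "y"]},
    index=["r1", "r2"],
)
```

When no `index` argument is given, as in the first two cases, pandas falls back to the default integer row index.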

A DataFrame is an abstraction of data organized in rows and typed columns. It is similar to a table in a relational, SQL-based database. Although it became just a type alias for Dataset[Row] in Spark 2.0, it is still widely used and is useful for complex processing pipelines that rely on its schema flexibility and SQL-based operations.

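This topic covers accessing rows in a DataFrame with the indexer objects .ix, .loc and .iloc, and how this differs from selecting rows with a boolean mask. Note that .ix was deprecated and removed in pandas 1.0; .loc and .iloc cover its uses.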
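The difference between label-based, position-based and mask-based row access can be sketched as follows. The sample data and variable names are assumptions chosen for illustration, and .ix is left out because it no longer exists in current pandas releases.

```python
import pandas as pd

# Hypothetical sample data with explicit row labels.
df = pd.DataFrame(
    {"name": ["ann", "bob", "cy"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

# .loc selects by index label.
by_label = df.loc["b"]

# .iloc selects by integer position (0-based), regardless of the labels.
by_position = df.iloc[1]

# A boolean mask selects every row for which the condition is True.
mask = df["age"] > 30
older = df[mask]
```

Here `by_label` and `by_position` refer to the same row, while `older` keeps the rows labelled "a" and "c". The indexers address rows by where they are; the mask addresses them by what they contain.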

Aggregation is one of the most common uses of R. There are several ways to aggregate data in R, which this topic illustrates.

