pandas IO tools: Reading a CSV file into a DataFrame


Example

The following example reads a file named data_file.csv with these contents:

File:

index,header1,header2,header3
1,str_data,12,1.4
3,str_data,22,42.33
4,str_data,2,3.44
2,str_data,43,43.34

7, str_data, 25, 23.32

Code:

import pandas as pd
pd.read_csv('data_file.csv')

Output:

   index    header1  header2  header3
0      1   str_data       12     1.40
1      3   str_data       22    42.33
2      4   str_data        2     3.44
3      2   str_data       43    43.34
4      7   str_data       25    23.32
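
To try the call above without creating a file, the same content can be passed to read_csv from memory. This is only a minimal sketch: io.StringIO and the variable names are illustrative and not part of the original example, and skipinitialspace=True is an optional extra, shown because the last row of the file has spaces after its commas.

```python
import io

import pandas as pd

# Same content as data_file.csv, held in memory so no file is needed.
csv_text = """index,header1,header2,header3
1,str_data,12,1.4
3,str_data,22,42.33
4,str_data,2,3.44
2,str_data,43,43.34

7, str_data, 25, 23.32
"""

# Default call: the blank line is skipped; the last row's header1
# keeps the leading space that follows the comma.
df = pd.read_csv(io.StringIO(csv_text))

# Optional: skipinitialspace=True drops spaces that follow a delimiter.
df_clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(df.shape)
print(df_clean["header1"].tolist())
```

read_csv accepts a file-like object as well as a path, which is what makes the in-memory version possible.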

Some useful arguments:

  • sep The field delimiter; the default is a comma (,). Use this option if the file uses a different delimiter, for instance pd.read_csv('data_file.csv', sep=';')

  • index_col With index_col=n (n an integer, counted from 0), pandas uses column n as the index of the DataFrame. For the example above:

    pd.read_csv('data_file.csv', index_col=0)
    

    Output:

              header1  header2  header3
    index
     1       str_data       12     1.40
     3       str_data       22    42.33
     4       str_data        2     3.44
     2       str_data       43    43.34
     7       str_data       25    23.32
    
  • skip_blank_lines By default blank lines are skipped. Use skip_blank_lines=False to keep them; each blank line then becomes a row of NaN values. Because NaN is a floating-point value, integer columns that receive it (here the index and header2) are converted to float.

    pd.read_csv('data_file.csv', index_col=0, skip_blank_lines=False)
    

    Output:

              header1  header2  header3
    index
    1.0      str_data     12.0     1.40
    3.0      str_data     22.0    42.33
    4.0      str_data      2.0     3.44
    2.0      str_data     43.0    43.34
    NaN           NaN      NaN      NaN
    7.0      str_data     25.0    23.32
    
  • parse_dates Use this option to parse the listed columns as dates.

    File (f.csv):

    date_begin;date_end;header3;header4;header5
    1/1/2017;1/10/2017;str_data;1001;123,45
    2/1/2017;2/10/2017;str_data;1001;67,89
    3/1/2017;3/10/2017;str_data;1001;0
    

    Code to parse columns 0 and 1 as dates:

    pd.read_csv('f.csv', sep=';', parse_dates=[0,1])
    

    Output:

      date_begin   date_end   header3  header4 header5
    0 2017-01-01 2017-01-10  str_data     1001  123,45
    1 2017-02-01 2017-02-10  str_data     1001   67,89
    2 2017-03-01 2017-03-10  str_data     1001       0
    

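    By default, the date format is inferred, and an ambiguous date such as 1/10/2017 is read month first, as in the output above. To specify the format yourself, for instance day first, pass a parser function. Note that date_parser is deprecated since pandas 2.0 in favour of the date_format argument (here date_format='%d/%m/%Y').

    from datetime import datetime
    dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y')
    pd.read_csv('f.csv', sep=';', parse_dates=[0, 1], date_parser=dateparse)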
    

    Output:

      date_begin   date_end   header3  header4 header5
    0 2017-01-01 2017-10-01  str_data     1001  123,45
    1 2017-01-02 2017-10-02  str_data     1001   67,89
    2 2017-01-03 2017-10-03  str_data     1001       0   
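
For day-first dates like the ones above, a custom parser function is not strictly required. The sketch below uses the dayfirst argument of read_csv instead, and also sets decimal=',' so that header5 (which uses a decimal comma) is read as numbers rather than strings. Both arguments exist in read_csv, but combining them here is an illustration and not part of the original example; the in-memory io.StringIO source and the variable names are assumptions too.

```python
import io

import pandas as pd

# Same content as f.csv, held in memory so no file is needed.
csv_text = """date_begin;date_end;header3;header4;header5
1/1/2017;1/10/2017;str_data;1001;123,45
2/1/2017;2/10/2017;str_data;1001;67,89
3/1/2017;3/10/2017;str_data;1001;0
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=';',
    parse_dates=[0, 1],  # parse columns 0 and 1 as dates
    dayfirst=True,       # read 1/10/2017 as 1 October, not 10 January
    decimal=',',         # treat "123,45" as the number 123.45
)

print(df.dtypes)
print(df)
```

With pandas 2.0 or later, date_format='%d/%m/%Y' could be passed instead of dayfirst=True when every date follows exactly that format.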
    

More information on the function's parameters can be found in the official documentation.
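
The effect of index_col and skip_blank_lines on the column types can be checked directly. The following is a minimal sketch using a shortened, in-memory version of the first example file; io.StringIO and the names skipped and kept are illustrative assumptions.

```python
import io

import pandas as pd

# Shortened version of data_file.csv with one blank line in the middle.
csv_text = """index,header1,header2,header3
1,str_data,12,1.4
3,str_data,22,42.33

7,str_data,25,23.32
"""

# Default: the blank line is skipped, so header2 stays an integer column.
skipped = pd.read_csv(io.StringIO(csv_text), index_col=0)

# skip_blank_lines=False: the blank line becomes a row of NaN values,
# which forces the index and header2 to floating point.
kept = pd.read_csv(io.StringIO(csv_text), index_col=0, skip_blank_lines=False)

print(len(skipped), len(kept))
print(skipped["header2"].dtype, kept["header2"].dtype)
```

If the blank rows are only needed temporarily, kept.dropna(how='all') removes the all-NaN rows again, although the float types remain.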