File IO with numpy: Reading CSV files


Example

Three main functions are available (descriptions taken from the numpy documentation):

fromfile - A highly efficient way of reading binary data with a known data-type, as well as parsing simply formatted text files. Data written using the tofile method can be read using this function.

genfromtxt - Load data from a text file, with missing values handled as specified. Each line past the first skip_header lines is split at the delimiter character, and characters following the comments character are discarded.

loadtxt - Load data from a text file. Each row in the text file must have the same number of values.

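genfromtxt is not a wrapper around loadtxt; it is a separate, more flexible reader. loadtxt is aimed at simply formatted files with no gaps, while genfromtxt is usually the most straightforward to use on imperfect input because it has many parameters for dealing with headers, comments, and missing values.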
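The difference in how the two readers treat missing values can be sketched as follows. The CSV content and the fill value of -1 are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io
import numpy as np

# Hypothetical CSV content: the second row has an empty middle field.
text = "1,2,3\n4,,6\n"

# genfromtxt accepts the gap; filling_values chooses what to put there
# (without it, the missing float would come back as nan).
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-1)

# loadtxt cannot convert the empty field and raises ValueError.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

print(filled)
print("loadtxt failed:", loadtxt_failed)
```

If the input is known to be complete and regular, loadtxt is a reasonable choice; genfromtxt is the safer default when gaps are possible.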

Consistent number of columns, consistent data type (numerical or string):

Given an input file, myfile.csv, with the contents:

#descriptive text line to skip
1.0, 2, 3
4, 5.5, 6

import numpy as np
# genfromtxt uses skip_header (skiprows is the loadtxt parameter)
np.genfromtxt('path/to/myfile.csv', delimiter=',', skip_header=1)

gives an array:

array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])
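The same read can be tried without creating a file on disk. The sketch below feeds the file contents through io.StringIO (an assumption made only to keep the example self-contained) and also shows the equivalent loadtxt call, where the parameter is named skiprows rather than skip_header.

```python
import io
import numpy as np

# Same contents as myfile.csv above, held in memory instead of on disk.
content = "#descriptive text line to skip\n1.0, 2, 3\n4, 5.5, 6\n"

# genfromtxt: header lines are skipped with skip_header.
a = np.genfromtxt(io.StringIO(content), delimiter=",", skip_header=1)

# loadtxt: the equivalent parameter is called skiprows.
b = np.loadtxt(io.StringIO(content), delimiter=",", skiprows=1)

same = np.array_equal(a, b)
print(a)
print("identical results:", same)
```

Because the first line starts with #, both functions would also discard it as a comment by default, so skipping it explicitly mainly matters for header lines that do not start with the comment character.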

Consistent number of columns, mixed data type (across columns):

Given an input file with the contents:

1   2.0000  buckle_my_shoe
3   4.0000  margery_door

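import numpy as np
np.genfromtxt('filename', dtype=None)

gives an array:

array([(1, 2.0, 'buckle_my_shoe'), (3, 4.0, 'margery_door')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S14')])

Note that dtype=None makes genfromtxt infer a type for each column and return a structured array with one named field per column (f0, f1, f2), not a plain 2-D array. The exact integer and string dtypes shown depend on the platform and the numpy version.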
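Once the mixed-type data is loaded, columns are accessed by field name rather than by column index. A minimal sketch, again using io.StringIO in place of a real file; passing encoding="utf-8" is an assumption made here so that the text column comes back as str rather than bytes regardless of numpy version.

```python
import io
import numpy as np

# Same contents as the mixed-type file above, held in memory.
content = "1   2.0000  buckle_my_shoe\n3   4.0000  margery_door\n"

# dtype=None infers one type per column; the default delimiter is whitespace.
data = np.genfromtxt(io.StringIO(content), dtype=None, encoding="utf-8")

names = data.dtype.names   # auto-generated field names
first_col = data["f0"]     # integer column
second_col = data["f1"]    # float column
labels = data["f2"]        # text column

print(names)
print(first_col, second_col, labels)
```

Each record is also available by position, so data[0] gives the whole first row as a single record.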

Inconsistent number of columns:

file: 1 2 3 4 5 6 7 8 9 10 11 22 13 14 15 16 17 18 19 20 21 22 23 24

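To read all values into a single flat (1-D) array:

import numpy as np
# sep=" " matches any run of whitespace, including line breaks,
# so rows of different lengths are simply concatenated
result = np.fromfile(path_to_file, dtype=float, sep=" ", count=-1)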
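Flattening discards the row boundaries. If they matter, one option is to parse the file line by line into a list of 1-D arrays. The sketch below is one possible approach, not the only one; the ragged contents and the temporary file are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Hypothetical whitespace-separated file whose rows have different lengths.
ragged = "1 2 3\n4 5 6 7 8\n9 10\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(ragged)
    path = fh.name

try:
    # Flat read: every value in file order, row structure lost.
    flat = np.fromfile(path, dtype=float, sep=" ")

    # Row-preserving read: one 1-D array per non-empty line.
    with open(path) as fh:
        rows = [np.array(line.split(), dtype=float) for line in fh if line.strip()]
finally:
    os.remove(path)

print(flat)
print([len(r) for r in rows])
```

A list of arrays is used because a regular 2-D numpy array requires every row to have the same length.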