Many people don't make use of file.path
when making path to a file. But if you are working across Windows, Mac and Linux machines it's usually good practice to use it for making paths instead of paste
.
FilePath <- file.path(AVariableWithFullProjectPath,"SomeSubfolder","SomeFileName.txt.gz")
Data <- as.matrix(read.table(FilePath, header=FALSE, sep ="\t"))
Generally this is sufficient for most people.
Sometimes it happens the matrix dimensions are so large that procedure of memory allocation must be taken into account while reading in the matrix, which means reading in the matrix line by line.
Take the previous example, In this case FilePath
contains a file of dimension 8970 8970
with 79% of the cells containing non-zero values.
system.time(expr=Data<-as.matrix(read.table(file=FilePath,header=FALSE,sep=" ") ))
system.time
says 267 seconds were taken to read the file.
user system elapsed
265.563 1.949 267.563
Similarly this file can be read line by line,
FilePath <- "SomeFile"
connection<- gzfile(FilePath,open="r")
TableList <- list()
Counter <- 1
system.time(expr= while ( length( Vector<-as.matrix(scan(file=connection, sep=" ", nlines=1, quiet=TRUE)) ) > 0 ) {
TableList[[Counter]]<-Vector
Counter<-Counter+1
})
user system elapsed
165.976 0.060 165.941
close(connection)
system.time(expr=(Data <- do.call(rbind,TableList)))
user system elapsed
0.477 0.088 0.565
There's also the futile.matrix
package which implements a read.matrix
method, the code itself will reveal itself to be the same thing as described in example 1.