TFRecord is TensorFlow's native binary format for storing data (tensors). To read a TFRecord file, you can use code similar to the CSV example:
import tensorflow as tf
filename_queue = tf.train.string_input_producer(["file.tfrecord"], num_epochs=1)
reader = tf.TFRecordReader()
key, serialized_example = reader.read(filename_queue)
Then you need to parse the examples from the serialized_example queue. You can do this either with tf.parse_example, which requires prior batching but is faster, or with tf.parse_single_example:
batch = tf.train.batch([serialized_example], batch_size=100)
parsed_batch = tf.parse_example(batch, features={
    # dtype must be passed by keyword here, since it follows the keyword argument shape
    "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
    "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})
tf.train.batch joins consecutive values of the given tensors of shape [x, y, z] into tensors of shape [batch_size, x, y, z].
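The shape change can be illustrated outside TensorFlow. The following is only a sketch that uses NumPy's np.stack as a stand-in for the batching step; the sizes (x, y, z = 2, 3, 4 and a batch of 3) are arbitrary assumptions, and no queue is involved:

```python
import numpy as np

# Three hypothetical "examples", each of shape [x, y, z] = [2, 3, 4].
examples = [np.full((2, 3, 4), i, dtype=np.float32) for i in range(3)]

# Stacking along a new leading axis mirrors what batching does to the shape:
# [x, y, z] becomes [batch_size, x, y, z].
batch = np.stack(examples, axis=0)

print(batch.shape)  # (3, 2, 3, 4)
```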
The features dict maps feature names to TensorFlow's feature definitions. You use tf.parse_single_example in a similar way:
parsed_example = tf.parse_single_example(serialized_example, {
    # dtype must be passed by keyword here, since it follows the keyword argument shape
    "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
    "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})
tf.parse_example and tf.parse_single_example return a dictionary mapping feature names to tensors holding the values. To batch examples coming from tf.parse_single_example, extract the tensors from the dict and use tf.train.batch as before:
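# list(...) gives tf.train.batch a real list (needed on Python 3, where values() is a view)
parsed_batch = dict(zip(parsed_example.keys(),
                        tf.train.batch(list(parsed_example.values()), batch_size=100)))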
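The dict(zip(...)) pattern relies on keys() and values() iterating in the same order, so each feature name is paired again with its own batched tensor. Here is a minimal sketch of just that pattern in plain Python; fake_batch is a hypothetical stand-in for tf.train.batch, and the strings stand in for tensors:

```python
# Hypothetical stand-in for tf.train.batch: takes a list of items and
# returns a list of the same length, in the same order.
def fake_batch(tensors, batch_size):
    return ["batched(%s, %d)" % (t, batch_size) for t in tensors]

# Plain strings stand in for the tensors returned by tf.parse_single_example.
parsed_example = {
    "feature_name_1": "tensor_1",
    "feature_name_2": "tensor_2",
}

# keys() and values() iterate in the same order, so zip pairs each
# feature name with its batched counterpart.
parsed_batch = dict(zip(parsed_example.keys(),
                        fake_batch(list(parsed_example.values()), batch_size=100)))

print(parsed_batch["feature_name_1"])  # batched(tensor_1, 100)
```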
You read the data as before, passing the list of all tensors to evaluate to sess.run:
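with tf.Session() as sess:
    # num_epochs is tracked in a local variable, so local variables must be initialized
    sess.run(tf.initialize_local_variables())
    tf.train.start_queue_runners(sess=sess)
    try:
        while True:
            data_batch = sess.run(list(parsed_batch.values()))
            # process data
    except tf.errors.OutOfRangeError:
        # raised once the input queue is exhausted after num_epochs
        pass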