Consider the following code to copy one file to another:
import java.io.*;

public class FileCopy {
    public static void main(String[] args) throws Exception {
        try (InputStream is = new FileInputStream(args[0]);
             OutputStream os = new FileOutputStream(args[1])) {
            int octet;
            while ((octet = is.read()) != -1) {
                os.write(octet);
            }
        }
    }
}
(We have deliberately omitted the normal argument checking, error reporting and so on, because they are not relevant to the point of this example.)
If you compile the above code and use it to copy a huge file, you will notice that it is very slow. In fact, it will be at least a couple of orders of magnitude slower than the standard OS file copy utilities.
The primary reason that the example above is slow (in the large file case) is that it is performing one-byte reads and one-byte writes on unbuffered byte streams. The simple way to improve performance is to wrap the streams with buffered streams. For example:
import java.io.*;

public class FileCopy {
    public static void main(String[] args) throws Exception {
        try (InputStream is = new BufferedInputStream(
                 new FileInputStream(args[0]));
             OutputStream os = new BufferedOutputStream(
                 new FileOutputStream(args[1]))) {
            int octet;
            while ((octet = is.read()) != -1) {
                os.write(octet);
            }
        }
    }
}
These small changes will improve the data copy rate by at least a couple of orders of magnitude, depending on various platform-related factors. The buffered stream wrappers cause the data to be read and written in larger chunks; each wrapper maintains an internal buffer implemented as a byte array.
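You can observe the difference yourself with a rough timing harness along the following lines. This is a sketch, not a rigorous benchmark: the class name is ours, it uses Java 11's `OutputStream.nullOutputStream()` to discard the output, and absolute numbers will vary widely with the platform and OS cache state.

```java
import java.io.*;

public class CopyTiming {
    // Times a one-byte-at-a-time copy between the given streams, in nanoseconds.
    static long timeCopy(InputStream is, OutputStream os) throws IOException {
        long start = System.nanoTime();
        int octet;
        while ((octet = is.read()) != -1) {
            os.write(octet);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Create a temporary test file (256 KB of zero bytes).
        File tmp = File.createTempFile("timing", ".bin");
        tmp.deleteOnExit();
        try (OutputStream os = new FileOutputStream(tmp)) {
            os.write(new byte[256 * 1024]);
        }

        // One syscall per byte read.
        try (InputStream is = new FileInputStream(tmp)) {
            long ns = timeCopy(is, OutputStream.nullOutputStream());
            System.out.println("unbuffered: " + ns / 1_000_000 + " ms");
        }
        // One syscall per buffer-full (8 KB by default).
        try (InputStream is = new BufferedInputStream(new FileInputStream(tmp))) {
            long ns = timeCopy(is, OutputStream.nullOutputStream());
            System.out.println("buffered:   " + ns / 1_000_000 + " ms");
        }
    }
}
```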
With `is`, data is read from the file into the buffer a few kilobytes at a time. When `read()` is called, the implementation will typically return a byte from the buffer. It only reads from the underlying input stream when the buffer has been emptied.

The behavior of `os` is analogous. Calls to `os.write(int)` write single bytes into the buffer. Data is only written to the output stream when the buffer is full, or when `os` is flushed or closed.
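The effect of the output buffer is easy to demonstrate: a byte written to a `BufferedOutputStream` does not reach the file until the stream is flushed or closed. The following sketch (the class and method names are ours) checks the file size before and after a flush.

```java
import java.io.*;

public class FlushDemo {
    // Returns the file's size before and after flushing a single buffered write.
    static long[] sizesAroundFlush() throws IOException {
        File tmp = File.createTempFile("flushdemo", ".bin");
        tmp.deleteOnExit();
        try (OutputStream os = new BufferedOutputStream(new FileOutputStream(tmp))) {
            os.write(42);               // lands in the in-memory buffer only
            long before = tmp.length(); // 0: nothing has reached the file yet
            os.flush();                 // pushes the buffer to the FileOutputStream
            long after = tmp.length();  // 1: the byte is now in the file
            return new long[] { before, after };
        }
    }

    public static void main(String[] args) throws IOException {
        long[] s = sizesAroundFlush();
        System.out.println("before flush: " + s[0] + " bytes, after flush: " + s[1] + " bytes");
    }
}
```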
As you should be aware, Java I/O provides different APIs for reading and writing binary and text data. `InputStream` and `OutputStream` are the base APIs for stream-based binary I/O; `Reader` and `Writer` are the base APIs for stream-based text I/O. For text I/O, `BufferedReader` and `BufferedWriter` are the equivalents of `BufferedInputStream` and `BufferedOutputStream`.
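For completeness, here is a buffered text copy using those Reader/Writer classes. This is a sketch under simplifying assumptions (the class and method names are ours, and `FileReader`/`FileWriter` use the platform default charset); note that `readLine()` strips line terminators and `newLine()` writes the platform separator, so the copy normalizes line endings rather than preserving the file byte for byte.

```java
import java.io.*;

public class TextCopy {
    // Copies a text file line by line using buffered character streams.
    static void copyText(File src, File dst) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(src));
             BufferedWriter out = new BufferedWriter(new FileWriter(dst))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // writes the platform line separator
            }
        }
    }
}
```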
The real reason that buffered streams help performance has to do with the way an application talks to the operating system:

A Java method call in a Java application, or a native procedure call in the JVM's native runtime libraries, is fast. It typically takes a couple of machine instructions and has minimal performance impact.
By contrast, JVM runtime calls to the operating system are not fast. They involve something known as a "syscall". The typical pattern for a syscall such as `read` involves switching the processor into kernel mode, validating the request's parameters, copying data between kernel space and user space, and switching back to user mode to return the result.
As you can imagine, performing a single syscall can take thousands of machine instructions: conservatively, at least two orders of magnitude longer than a regular method call, and probably three or more.
Given this, the reason that buffered streams make a big difference is that they drastically reduce the number of syscalls. Instead of doing a syscall for each `read()` call, the buffered input stream reads a large amount of data into its buffer as required. Most `read()` calls on the buffered stream do some simple bounds checking and return a `byte` that was read previously. Similar reasoning applies in the output stream case, and also in the character stream cases.
(Some people think that buffered I/O performance comes from the mismatch between the read request size and the size of a disk block, disk rotational latency and things like that. In fact, a modern OS uses a number of strategies to ensure that the application typically doesn't need to wait for the disk. This is not the real explanation.)
Buffered streams are not always a win, however. They definitely help if your application is going to do lots of "small" reads or writes. But if your application only needs to perform large reads or writes to or from a large `byte[]` or `char[]`, then buffered streams will give you no real benefit. Indeed, there might even be a (tiny) performance penalty.
This is still not the fastest way to copy a file in Java, however. When you use Java's stream-based APIs to copy a file, you incur the cost of at least one extra memory-to-memory copy of the data. It is possible to avoid this if you use the NIO `ByteBuffer` and `Channel` APIs.
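As a sketch of the channel-based approach (the class and method names are ours): `FileChannel.transferTo` asks the operating system to move the data between the files directly, so it can avoid the extra user-space copy that the stream-based loop performs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {
    // Copies a file using FileChannel.transferTo, which lets the OS move data
    // directly between the files where it can.
    static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                                                StandardOpenOption.CREATE,
                                                StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may transfer fewer bytes than requested, so loop.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }
}
```

(For simply copying whole files, `java.nio.file.Files.copy` provides the same service as a one-liner.)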