Examples
- In InnoDB, having a long PRIMARY KEY (either a single column with a
lengthy value, or several columns that form a long composite value)
wastes a lot of disk space. The primary key value for a row is
duplicated in all the secondary index records that point to the same
row. If your primary key is long, create an AUTO_INCREMENT column as
the primary key instead.
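A rough sketch of the difference (table and column names here are
illustrative, not taken from the text above): the second definition
keeps secondary index records small by pointing to a short surrogate
key.

    -- Long composite primary key: (url, visited_at) is copied into every
    -- secondary index record, so idx_visited stores the full URL again.
    CREATE TABLE page_visits_wide_pk (
        url        VARCHAR(180) NOT NULL,
        visited_at DATETIME     NOT NULL,
        note       TEXT,
        PRIMARY KEY (url, visited_at),
        KEY idx_visited (visited_at)
    ) ENGINE=InnoDB;

    -- Short AUTO_INCREMENT primary key: secondary index records carry only
    -- the 4-byte id, and the long natural key is still enforced, here with
    -- a UNIQUE key.
    CREATE TABLE page_visits_narrow_pk (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        url        VARCHAR(180) NOT NULL,
        visited_at DATETIME     NOT NULL,
        note       TEXT,
        PRIMARY KEY (id),
        UNIQUE KEY uk_url_visited (url, visited_at),
        KEY idx_visited (visited_at)
    ) ENGINE=InnoDB;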
- Use the VARCHAR data type instead of CHAR to store variable-length
strings or for columns with many NULL values. A CHAR(N) column
always takes N characters to store data, even if the string is
shorter or its value is NULL. Smaller tables fit better in the
buffer pool and reduce disk I/O.
When using COMPACT row format (the default InnoDB format) and
variable-length character sets, such as utf8 or sjis, CHAR(N) columns
occupy a variable amount of space, but still at least N bytes.
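A minimal sketch of that guideline (names are illustrative): reserve
CHAR for values that really are fixed-length, and use VARCHAR for
columns whose values vary in length or are often NULL.

    CREATE TABLE customer_notes (
        id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        country_code CHAR(2)      NOT NULL,  -- always exactly 2 characters: CHAR fits
        note         VARCHAR(500) NULL       -- variable length and often NULL: VARCHAR
                                             -- avoids reserving space for 500 characters
    ) ENGINE=InnoDB;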
- For tables that are big, or contain lots of repetitive text or
numeric data, consider using COMPRESSED row format. Less disk I/O is
required to bring data into the buffer pool, or to perform full
table scans. Before making a permanent decision, measure the amount
of compression you can achieve by using COMPRESSED versus COMPACT
row format.
Caveat: Benchmarks rarely show better than 2:1 compression, and
COMPRESSED adds significant buffer pool overhead, because the buffer
pool may hold both the compressed and uncompressed copies of a page.
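One way to run that measurement, sketched with illustrative table
names (it assumes innodb_file_per_table is enabled, as COMPRESSED
requires): build a compressed copy of an existing table and compare
the reported sizes.

    -- Build a compressed copy of an existing table.
    CREATE TABLE orders_compressed LIKE orders;
    ALTER TABLE orders_compressed ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
    INSERT INTO orders_compressed SELECT * FROM orders;

    -- Compare the approximate data and index sizes of the two tables.
    SELECT table_name, data_length, index_length
    FROM information_schema.TABLES
    WHERE table_schema = DATABASE()
      AND table_name IN ('orders', 'orders_compressed');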
- Once your data reaches a stable size, or a growing table has
increased by tens or some hundreds of megabytes, consider using the
OPTIMIZE TABLE statement to reorganize the table and compact any
wasted space. The reorganized tables require less disk I/O to
perform full table scans. This is a straightforward technique that
can improve performance when other techniques such as improving
index usage or tuning application code are not practical.
Caveat: Regardless of table size, OPTIMIZE TABLE is rarely worth
performing: it is costly, and it seldom improves the table enough to
justify that cost. InnoDB is reasonably good at keeping its B+Trees
free of large amounts of wasted space.
OPTIMIZE TABLE copies the data part of the table and rebuilds the
indexes. The benefits come from improved packing of data within
indexes, and reduced fragmentation within the tablespaces and on disk.
The benefits vary depending on the data in each table. You may find
that there are significant gains for some and not for others, or that
the gains decrease over time until you next optimize the table. This
operation can be slow if the table is large or if the indexes being
rebuilt do not fit into the buffer pool. The first run after adding a
lot of data to a table is often much slower than later runs.
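A sketch of how this might look in practice (the table name is
illustrative). DATA_FREE gives only a rough idea of reclaimable space,
and for InnoDB the OPTIMIZE TABLE statement is mapped internally to a
table recreate plus analyze.

    -- Check allocated versus free space before deciding whether a
    -- rebuild is worth the cost.
    SELECT table_name, data_length, index_length, data_free
    FROM information_schema.TABLES
    WHERE table_schema = DATABASE()
      AND table_name = 'event_log';

    -- Rebuild the table and its indexes, compacting wasted space.
    OPTIMIZE TABLE event_log;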