Lua Avoiding gaps in tables used as arrays

Defining our terms

By array here we mean a Lua table used as a sequence. For example:

``````-- Create a table to store the types of pets we like.
local pets = {"dogs", "cats", "birds"}
``````

We're using this table as a sequence: a group of items keyed by integers. Many languages call this an array, and so will we. But strictly speaking, there's no such thing as an array in Lua. There are just tables, some of which are array-like, some of which are hash-like (or dictionary-like, if you prefer), and some of which are mixed.

An important point about our `pets` array is that is has no gaps. The first item, `pets[1]`, is the string "dogs", the second item, `pets[2]`, is the string "cats", and the last item, `pets[3]`, is "birds". Lua's standard library and most modules written for Lua assume 1 as the first index for sequences. A gapless array therefore has items from `1..n` without missing any numbers in the sequence. (In the limiting case, `n = 1`, and the array has only one item in it.)

Lua provides the built-in function `ipairs` to iterate over such tables.

``````-- Iterate over our pet types.
for idx, pet in ipairs(pets) do
print("Item at position " .. idx .. " is " .. pet .. ".")
end
``````

This would print "Item at position 1 is dogs.", "Item at position 2 is cats.", "Item at position 3 is birds."

But what happens if we do the following?

``````local pets = {"dogs", "cats", "birds"}
pets[12] = "goldfish"
for idx, pet in ipairs(pets) do
print("Item at position " .. idx .. " is " .. pet .. ".")
end
``````

An array such as this second example is a sparse array. There are gaps in the sequence. This array looks like this:

``````{"dogs", "cats", "birds", nil, nil, nil, nil, nil, nil, nil, nil, "goldfish"}
-- 1        2       3      4    5    6    7    8    9    10   11       12
``````

The nil values do not take up any aditional memory; internally lua only saves the values `[1] = "dogs"`, `[2] = "cats"`, `[3] = "birtds"` and `[12] = "goldfish"`

To answer the immediate question, `ipairs` will stop after birds; "goldfish" at `pets[12]` will never be reached unless we adjust our code. This is because `ipairs` iterates from `1..n-1` where `n` is the position of the first `nil` found. Lua defines `table[length-of-table + 1]` to be `nil`. So in a proper sequence, iteration stops when Lua tries to get, say, the fourth item in a three-item array.

When?

The two most common places for issues to arise with sparse arrays are (i) when trying to determine the length of the array and (ii) when trying to iterate over the array. In particular:

• When using the `#` length operator since the length operator stops counting at the first `nil` found.
• When using the `ipairs()` function since as mentioned above it stops iterating at the first `nil` found.
• When using the `table.unpack()` function since this method stops unpacking at the first `nil` found.
• When using other functions that (directly or indirectly) access any of the above.

To avoid this problem, it is important to write your code so that if you expect a table to be an array, you don't introduce gaps. Gaps can be introduced in several ways:

• If you add something to an array at the wrong position.
• If you insert a `nil` value into an array.
• If you remove values from an array.

You might think, "But I would never do any of those things." Well, not intentionally, but here's a concrete example of how things could go wrong. Imagine that you want to write a filter method for Lua like Ruby's `select` and Perl's `grep`. The method will accept a test function and an array. It iterates over the array, calling the test method on each item in turn. If the item passes, then that item gets added to a results array which is returned at the end of the method. The following is a buggy implementation:

``````local filter = function (fun, t)
local res = {}
for idx, item in ipairs(t) do
if fun(item) then
res[idx] = item
end
end

return res
end
``````

The problem is that when the function returns `false`, we skip a number in the sequence. Imagine calling `filter(isodd, {1,2,3,4,5,6,7,8,9,10})`: there will be gaps in the returned table every time there's an even number in the array passed to `filter`.

Here's a fixed implementation:

``````local filter = function (fun, t)
local res = {}
for _, item in ipairs(t) do
if fun(item) then
res[#res + 1] = item
end
end

return res
end
``````

Tips

1. Use standard functions: `table.insert(<table>, <value>)` always appends to the end of the array. `table[#table + 1] = value` is a short hand for this. `table.remove(<table>, <index>)` will move all following values back to fill the gap (which can also make it slow).
2. Check for `nil` values before inserting, avoiding things like `table.pack(function_call())`, which might sneak `nil` values into our table.
3. Check for `nil` values after inserting, and if necessary filling the gap by shifting all consecutive values.
4. If possible, use placeholder values. For example, change `nil` for `0` or some other placeholder value.
5. If leaving gaps is unavoidable, this should be propperly documented (commented).
6. Write a `__len()` metamethod and use the `#` operator.

Example for 6.:

``````tab = {"john", "sansa", "daenerys", [10] = "the imp"}
print(#tab) --> prints 3
setmetatable(tab, {__len = function() return 10 end})
-- __len needs to be a function, otherwise it could just be 10
print(#tab) --> prints 10
for i=1, #tab do print(i, tab[i]) end
--> prints:
-- 1 john
-- 2 sansa
-- 3 daenerys
-- 4 nil
-- ...
-- 10 the imp

for key, value in ipairs(tab) do print(key, value) end
--> this only prints '1 john \n 2 sansa \n 3 daenerys'
``````

Another alternative is to use the `pairs()` function and filter out the non-integer indices:

``````for key in pairs(tab) do
if type(key) == "number" then
print(key, tab[key]
end
end
-- note: this does not remove float indices
-- does not iterate in order
``````