Lua Avoiding gaps in tables used as arrays


Example

Defining our terms

By array here we mean a Lua table used as a sequence. For example:

-- Create a table to store the types of pets we like.
local pets = {"dogs", "cats", "birds"}

We're using this table as a sequence: a group of items keyed by integers. Many languages call this an array, and so will we. But strictly speaking, there's no such thing as an array in Lua. There are just tables, some of which are array-like, some of which are hash-like (or dictionary-like, if you prefer), and some of which are mixed.

An important point about our pets array is that is has no gaps. The first item, pets[1], is the string "dogs", the second item, pets[2], is the string "cats", and the last item, pets[3], is "birds". Lua's standard library and most modules written for Lua assume 1 as the first index for sequences. A gapless array therefore has items from 1..n without missing any numbers in the sequence. (In the limiting case, n = 1, and the array has only one item in it.)

Lua provides the built-in function ipairs to iterate over such tables.

-- Iterate over our pet types.
for idx, pet in ipairs(pets) do
  print("Item at position " .. idx .. " is " .. pet .. ".")
end

This would print "Item at position 1 is dogs.", "Item at position 2 is cats.", "Item at position 3 is birds."

But what happens if we do the following?

local pets = {"dogs", "cats", "birds"}
pets[12] = "goldfish"
for idx, pet in ipairs(pets) do
  print("Item at position " .. idx .. " is " .. pet .. ".")
end

An array such as this second example is a sparse array. There are gaps in the sequence. This array looks like this:

{"dogs", "cats", "birds", nil, nil, nil, nil, nil, nil, nil, nil, "goldfish"}
-- 1        2       3      4    5    6    7    8    9    10   11       12     

The nil values do not take up any aditional memory; internally lua only saves the values [1] = "dogs", [2] = "cats", [3] = "birtds" and [12] = "goldfish"

To answer the immediate question, ipairs will stop after birds; "goldfish" at pets[12] will never be reached unless we adjust our code. This is because ipairs iterates from 1..n-1 where n is the position of the first nil found. Lua defines table[length-of-table + 1] to be nil. So in a proper sequence, iteration stops when Lua tries to get, say, the fourth item in a three-item array.

When?

The two most common places for issues to arise with sparse arrays are (i) when trying to determine the length of the array and (ii) when trying to iterate over the array. In particular:

  • When using the # length operator since the length operator stops counting at the first nil found.
  • When using the ipairs() function since as mentioned above it stops iterating at the first nil found.
  • When using the table.unpack() function since this method stops unpacking at the first nil found.
  • When using other functions that (directly or indirectly) access any of the above.

To avoid this problem, it is important to write your code so that if you expect a table to be an array, you don't introduce gaps. Gaps can be introduced in several ways:

  • If you add something to an array at the wrong position.
  • If you insert a nil value into an array.
  • If you remove values from an array.

You might think, "But I would never do any of those things." Well, not intentionally, but here's a concrete example of how things could go wrong. Imagine that you want to write a filter method for Lua like Ruby's select and Perl's grep. The method will accept a test function and an array. It iterates over the array, calling the test method on each item in turn. If the item passes, then that item gets added to a results array which is returned at the end of the method. The following is a buggy implementation:

local filter = function (fun, t)
  local res = {}
  for idx, item in ipairs(t) do
    if fun(item) then
      res[idx] = item
    end
  end

  return res
end

The problem is that when the function returns false, we skip a number in the sequence. Imagine calling filter(isodd, {1,2,3,4,5,6,7,8,9,10}): there will be gaps in the returned table every time there's an even number in the array passed to filter.

Here's a fixed implementation:

local filter = function (fun, t)
  local res = {}
  for _, item in ipairs(t) do
    if fun(item) then
      res[#res + 1] = item
    end
  end

  return res
end

Tips

  1. Use standard functions: table.insert(<table>, <value>) always appends to the end of the array. table[#table + 1] = value is a short hand for this. table.remove(<table>, <index>) will move all following values back to fill the gap (which can also make it slow).
  2. Check for nil values before inserting, avoiding things like table.pack(function_call()), which might sneak nil values into our table.
  3. Check for nil values after inserting, and if necessary filling the gap by shifting all consecutive values.
  4. If possible, use placeholder values. For example, change nil for 0 or some other placeholder value.
  5. If leaving gaps is unavoidable, this should be propperly documented (commented).
  6. Write a __len() metamethod and use the # operator.

Example for 6.:

tab = {"john", "sansa", "daenerys", [10] = "the imp"}
print(#tab) --> prints 3
setmetatable(tab, {__len = function() return 10 end})
-- __len needs to be a function, otherwise it could just be 10
print(#tab) --> prints 10
for i=1, #tab do print(i, tab[i]) end
--> prints:
-- 1 john
-- 2 sansa
-- 3 daenerys
-- 4 nil
-- ...
-- 10 the imp

for key, value in ipairs(tab) do print(key, value) end
--> this only prints '1 john \n 2 sansa \n 3 daenerys'

Another alternative is to use the pairs() function and filter out the non-integer indices:

for key in pairs(tab) do
    if type(key) == "number" then
        print(key, tab[key]
    end
end
-- note: this does not remove float indices
-- does not iterate in order