C Language Strings Tokenisation: strtok(), strtok_r() and strtok_s()

Example

The function strtok breaks a string into smaller strings, or tokens, using a set of delimiters.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int toknum = 0;
    char src[] = "Hello,, world!";
    const char delimiters[] = ", !";
    char *token = strtok(src, delimiters);
    while (token != NULL)
    {
        printf("%d: [%s]\n", ++toknum, token);
        token = strtok(NULL, delimiters);
    }
    /* source is now "Hello\0, world\0\0" */
}

Output:

1: [Hello]
2: [world]

The string of delimiters may contain one or more delimiters and different delimiter strings may be used with each call to strtok.

Calls to strtok to continue tokenizing the same source string should not pass the source string again, but instead pass NULL as the first argument. If the same source string is passed then the first token will instead be re-tokenized. That is, given the same delimiters, strtok would simply return the first token again.

Note that as strtok does not allocate new memory for the tokens, it modifies the source string. That is, in the above example, the string src will be manipulated to produce the tokens that are referenced by the pointer returned by the calls to strtok. This means that the source string cannot be const (so it can't be a string literal). It also means that the identity of the delimiting byte is lost (i.e. in the example the "," and "!" are effectively deleted from the source string and you cannot tell which delimiter character matched).

Note also that multiple consecutive delimiters in the source string are treated as one; in the example, the second comma is ignored.

strtok is neither thread safe nor re-entrant because it keeps its position in the string being parsed in internal static state between calls. This means that if a function calls strtok, no function that it calls while it is using strtok can also use strtok, and it cannot be called by any function that is itself using strtok.

An example that demonstrates the problems caused by the fact that strtok is not re-entrant is as follows:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    char *first = strtok(src, ",");

    do
    {
        char *part;
        /* Nested calls to strtok do not work as desired */
        printf("[%s]\n", first);
        part = strtok(first, ".");
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok(NULL, ".");
        }
    } while ((first = strtok(NULL, ",")) != NULL);
}

Output:

[1.2]
 [1]
 [2]

The expected operation is that the outer do while loop should create three tokens consisting of each decimal number string ("1.2", "3.5", "4.2"), for each of which the strtok calls for the inner loop should split it into separate digit strings ("1", "2", "3", "5", "4", "2").

However, because strtok is not re-entrant, this does not occur. Instead the first strtok correctly creates the "1.2\0" token, and the inner loop correctly creates the tokens "1" and "2". But then the strtok in the outer loop is at the end of the string used by the inner loop, and returns NULL immediately. The second and third substrings of the src array are not analyzed at all.

C11

The standard C library does not contain a thread-safe or re-entrant version, but some other libraries do, such as POSIX's strtok_r. Note that MSVC provides its own strtok_s, which is thread-safe and re-entrant and takes the same three arguments as strtok_r.

C11

C11 has an optional part, Annex K, that offers a thread-safe and re-entrant version named strtok_s. You can test for the feature with __STDC_LIB_EXT1__. This optional part is not widely supported.

The strtok_s function differs from the POSIX strtok_r function by guarding against storing outside of the string being tokenized, and by checking runtime constraints. In correctly written programs, though, strtok_s and strtok_r behave the same.

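Using strtok_s with the example now yields the correct response. Note that the Annex K strtok_s takes an additional rsize_t * argument that holds the number of characters remaining to be scanned; the three-argument strtok_s provided by MSVC does not have it:

/* you have to announce that you want to use Annex K */
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <string.h>

#ifndef __STDC_LIB_EXT1__
# error "we need strtok_s from Annex K"
#endif

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    rsize_t srcmax = sizeof src;   /* characters left in the outer scan */
    char *next = NULL;
    char *first = strtok_s(src, &srcmax, ",", &next);

    do
    {
        char *part;
        char *posn;
        rsize_t partmax = strlen(first) + 1;   /* characters left in the inner scan */

        printf("[%s]\n", first);
        part = strtok_s(first, &partmax, ".", &posn);
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok_s(NULL, &partmax, ".", &posn);
        }
    }
    while ((first = strtok_s(NULL, &srcmax, ",", &next)) != NULL);
}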

And the output will be:

[1.2]
 [1]
 [2]
[3.5]
 [3]
 [5]
[4.2]
 [4]
 [2]

