The function strtok breaks a string into smaller strings, or tokens, using a set of delimiters.
#include <stdio.h>
#include <string.h>

int main(void)
{
    int toknum = 0;
    char src[] = "Hello,, world!";
    const char delimiters[] = ", !";
    char *token = strtok(src, delimiters);
    while (token != NULL)
    {
        printf("%d: [%s]\n", ++toknum, token);
        token = strtok(NULL, delimiters);
    }
    /* source is now "Hello\0, world\0\0" */
}
Output:
1: [Hello]
2: [world]
The delimiter string may contain one or more delimiter characters, and a different delimiter string may be used with each call to strtok.
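To illustrate switching delimiter sets between calls, here is a minimal sketch that parses a string of the assumed form key=value;key=value in place. The function name parse_pairs and the input format are hypothetical, chosen only for this example: each key is ended by "=" and each value by ";".

```c
#include <string.h>

/* Hypothetical helper: parses "key=value;key=value..." in place,
   alternating the delimiter set between "=" (end of a key) and ";"
   (end of a value). Stores up to max pairs and returns how many. */
size_t parse_pairs(char *s, char *keys[], char *values[], size_t max)
{
    size_t n = 0;
    char *key = strtok(s, "=");

    while (key != NULL && n < max)
    {
        char *value = strtok(NULL, ";");   /* different delimiter set */
        if (value == NULL)
            break;                         /* key without a value */
        keys[n] = key;
        values[n] = value;
        n++;
        key = strtok(NULL, "=");           /* back to the key delimiter */
    }
    return n;
}
```

Because the space is not in either delimiter set, a value such as "John Smith" stays in one piece.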
Calls to strtok that continue tokenizing the same source string should not pass the source string again, but instead pass NULL as the first argument. If the source string is passed again, tokenizing starts over from its beginning; because the end of the first token has already been overwritten with a null character, strtok with the same delimiters simply returns the first token again.
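The following sketch checks this behavior. The helper name restart_returns_first is hypothetical; it calls strtok once normally and then passes the source string again instead of NULL, and reports whether both calls returned a pointer to the same first token.

```c
#include <string.h>

/* Hypothetical helper: returns nonzero if passing the source string
   to strtok a second time (instead of NULL) yields the first token
   again, i.e. the same pointer into s. */
int restart_returns_first(char *s, const char *delims)
{
    char *a = strtok(s, delims);   /* first token; its end is now '\0' */
    char *b = strtok(s, delims);   /* source passed again, not NULL */
    return a != NULL && a == b;
}
```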
Note that because strtok does not allocate new memory for the tokens, it modifies the source string in place. In the above example, the delimiter that ends each token in src is overwritten with a null character, and the pointers returned by strtok point into src. This means that the source string must be modifiable: it cannot be const, and it must not be a string literal, because modifying a string literal is undefined behavior. It also means that the identity of the delimiting byte is lost (i.e. in the example the "," and "!" are effectively deleted from the source string and you cannot tell which delimiter character matched).
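Since strtok needs writable memory, a read-only string such as a string literal can be tokenized by first copying it into a modifiable buffer. The sketch below assumes heap allocation with malloc is acceptable; the name count_tokens_const is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in a read-only string by
   tokenizing a private, modifiable copy. The input is never written.
   Returns 0 if the copy cannot be allocated. */
size_t count_tokens_const(const char *text, const char *delims)
{
    size_t len = strlen(text) + 1;     /* include the terminator */
    char *copy = malloc(len);
    size_t n = 0;

    if (copy == NULL)
        return 0;
    memcpy(copy, text, len);
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        n++;
    free(copy);                        /* tokens pointed into copy */
    return n;
}
```

Any token pointers obtained this way are only valid until the copy is freed, which is why this sketch only counts them.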
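Note also that a run of consecutive delimiters in the source string is treated as a single separator, and leading and trailing delimiters are skipped, so strtok never returns an empty token; in the example, the second comma and the following space are skipped.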
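A small sketch showing that runs of delimiters, including leading and trailing ones, do not produce empty tokens. The helper name split and its fixed-size output array are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical helper: tokenizes s in place with strtok and stores up
   to max token pointers in out[]. Returns the number stored. */
size_t split(char *s, const char *delims, char *out[], size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(s, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

With the input ",,a,,b,," and the delimiter ",", only "a" and "b" are returned. If empty fields are significant, as in many CSV-style formats, strtok alone cannot detect them.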
strtok is neither thread-safe nor re-entrant, because it remembers its position in the string between calls in internal static state. This means that if a function calls strtok, no function that it calls while it is using strtok can also use strtok, and it cannot be called by any function that is itself using strtok.
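To see where that shared state comes from, here is a simplified, hypothetical strtok-like function named my_strtok, written with strspn and strcspn. It is only a sketch of the idea, not the code of any particular C library.

```c
#include <string.h>

/* Simplified sketch of a strtok-like function (hypothetical name).
   The single static pointer is shared by every caller, which is
   exactly what makes this style of interface non-re-entrant. */
char *my_strtok(char *s, const char *delims)
{
    static char *saved = NULL;   /* position shared by ALL callers */
    char *tok;

    if (s == NULL)
        s = saved;               /* continue where the last call stopped */
    if (s == NULL)
        return NULL;             /* no tokenizing sequence in progress */

    s += strspn(s, delims);      /* skip leading delimiters */
    if (*s == '\0')
    {
        saved = s;
        return NULL;             /* only delimiters were left */
    }

    tok = s;
    s += strcspn(s, delims);     /* advance to the end of the token */
    if (*s != '\0')
        *s++ = '\0';             /* overwrite the delimiter in place */
    saved = s;
    return tok;
}
```

Any nested sequence of calls overwrites saved, so the outer sequence loses its place.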
The following example demonstrates the problem caused by strtok not being re-entrant:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    char *first = strtok(src, ",");

    do
    {
        char *part;
        /* Nested calls to strtok do not work as desired */
        printf("[%s]\n", first);
        part = strtok(first, ".");
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok(NULL, ".");
        }
    } while ((first = strtok(NULL, ",")) != NULL);

    return 0;
}
Output:
[1.2]
 [1]
 [2]
The intended behavior is that the outer do while loop creates three tokens, one for each decimal number string ("1.2", "3.5", "4.2"), and that the strtok calls in the inner loop split each of these into separate digit strings ("1", "2", "3", "5", "4", "2").
However, because strtok is not re-entrant, this does not happen. The first strtok call correctly creates the "1.2\0" token, and the inner loop correctly creates the tokens "1" and "2". But the inner calls have replaced strtok's saved position with one inside "1.2", so the strtok call in the outer loop resumes at the end of the string used by the inner loop and returns NULL immediately. The second and third substrings of the src array are not analyzed at all.
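The standard C library does not require a re-entrant version of strtok (apart from the optional Annex K described below), but other standards and libraries provide one, such as POSIX strtok_r. On MSVC, the equivalent is strtok_s, which is thread-safe; it takes the same three arguments as strtok_r and is not the same function as the Annex K strtok_s.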
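As a sketch of how strtok_r fixes the nested example, each loop below keeps its own save pointer, so the inner loop no longer disturbs the outer one. This assumes a POSIX system; the _POSIX_C_SOURCE definition is there to expose strtok_r under strict compiler modes, and the helper name split_nested is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r in strict modes */
#include <string.h>

/* Hypothetical helper: splits s on ',' and then each piece on '.',
   storing up to max innermost parts in out[]. Returns how many were
   stored. The two loops use separate save pointers. */
size_t split_nested(char *s, char *out[], size_t max)
{
    size_t n = 0;
    char *outer_save;
    char *inner_save;

    for (char *num = strtok_r(s, ",", &outer_save); num != NULL;
         num = strtok_r(NULL, ",", &outer_save))
    {
        for (char *part = strtok_r(num, ".", &inner_save); part != NULL;
             part = strtok_r(NULL, ".", &inner_save))
        {
            if (n == max)
                return n;         /* output array is full */
            out[n++] = part;
        }
    }
    return n;
}
```

For "1.2,3.5,4.2" this stores "1", "2", "3", "5", "4", "2". On MSVC the three-argument strtok_s can be used in the same positions.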
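C11 has an optional part, Annex K, that offers a thread-safe and re-entrant version named strtok_s. An implementation that provides it defines the __STDC_LIB_EXT1__ macro, and a program must define __STDC_WANT_LIB_EXT1__ as 1 before including <string.h> to get the declarations. This optional part is not widely supported.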
The Annex K strtok_s function differs from the POSIX strtok_r function by guarding against storing outside of the string being tokenized (it takes an extra s1max argument holding the number of characters that may still be searched) and by checking runtime constraints. In correctly written programs, though, strtok_s and strtok_r behave the same.
Using strtok_s in the example now yields the correct result:
/* you have to announce that you want to use Annex K */
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <string.h>

#ifndef __STDC_LIB_EXT1__
# error "we need strtok_s from Annex K"
#endif

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    rsize_t srcmax = sizeof src;   /* characters left to search in src */
    char *next = NULL;
    char *first = strtok_s(src, &srcmax, ",", &next);

    do
    {
        char *part;
        char *posn;
        rsize_t partmax = strlen(first) + 1;   /* search only this token */

        printf("[%s]\n", first);
        part = strtok_s(first, &partmax, ".", &posn);
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok_s(NULL, &partmax, ".", &posn);
        }
    } while ((first = strtok_s(NULL, &srcmax, ",", &next)) != NULL);

    return 0;
}
And the output will be:
[1.2]
 [1]
 [2]
[3.5]
 [3]
 [5]
[4.2]
 [4]
 [2]