Concretization one level deeper: Implement your own strtok();

Hileamlak Yitayew
5 min read · Nov 1, 2020

Programming is a fascinating field. If you are getting into it with no prior experience, it can be daunting, but even after months or years, even once you feel comfortable, you will probably still have many black boxes. You might know the C language but not assembly, or not how your code is compiled; you simply take for granted the magical box that accepts your C code and hands back a working binary. Most of the time that is the right thing to do, because after all we are mortals and we can't learn everything in one lifetime. Have you ever imagined how much knowledge is out there? That, however, shouldn't stop us from digging one step deeper into whatever field we are working on, whether because we have the time and the curiosity to learn or because we are forced to. I prefer the former.

Without further ado, let's dig deep into C's strtok() function. It is declared in string.h, and it is used for dividing a string into tokens based on a delimiter (a pattern). Let's see how one might use the library version, and then we will move on to making one ourselves.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[] = "hello; hello2; hello3;";
    char *tokenized = NULL;

    /* the first call takes the string; later calls take NULL */
    tokenized = strtok(a, ";");
    while (tokenized != NULL)
    {
        printf("Next token →%s\n", tokenized);
        tokenized = strtok(NULL, ";");
    }
    return (0);
}

This C code, compiled and run, gives the following output.

Next token →hello
Next token → hello2
Next token → hello3

What is interesting here is how this affects our a variable: if you print “a” after the while loop, you will see that its value appears to have changed to “hello”, because strtok overwrote the first delimiter with a null terminator. Another thing: if you run valgrind on your compiled program, you can see that strtok itself allocated no memory for the tokens (any allocation that does show up comes from elsewhere, such as a buffer printf may allocate for stdout). The final interesting observation is that even when you pass strtok a NULL as the char pointer, it remembers where it stopped in “a” and returns the second token.
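To see the first observation byte by byte, here is a small sketch. The helper show_bytes is hypothetical (it is not part of the article's code or of any library); it copies a buffer into a second one while spelling every null terminator as the two visible characters \0.

```c
#include <stdio.h>
#include <string.h>

/*
 * show_bytes - hypothetical helper, for illustration only.
 * Copies len bytes of buf into out, writing each '\0' byte as the two
 * visible characters '\' and '0'. out needs room for 2 * len + 1 bytes.
 */
void show_bytes(const char *buf, size_t len, char *out)
{
    size_t i, j = 0;

    for (i = 0; i < len; i++)
    {
        if (buf[i] == '\0')
        {
            out[j++] = '\\';
            out[j++] = '0';
        }
        else
            out[j++] = buf[i];
    }
    out[j] = '\0';
}

/*
 * Usage sketch:
 *     char a[] = "hello; hello2; hello3;";
 *     char out[2 * sizeof(a) + 1];
 *     char *t = strtok(a, ";");
 *
 *     while (t != NULL)
 *         t = strtok(NULL, ";");
 *     show_bytes(a, sizeof(a), out);
 *     puts(out);    prints: hello\0 hello2\0 hello3\0\0
 */
```

Every “;” has been replaced in place by a null terminator, which is why printing a shows only “hello” and why no new memory is needed to hold the tokens.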

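Now, with all that in mind, let's start constructing our _strtok function. Ours will be a little harder than the standard strtok on the delimiter side: the standard function treats every character of its delimiter argument as a separate one-character delimiter, while ours will match the delimiter as a whole string. If you learn to do this, you will be capable of writing the default one too.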
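The standard strtok treats its second argument as a set of single-character delimiters, whereas the function built in this article matches the delimiter as a whole string. A small sketch of the difference follows; the helper names count_set_tokens and count_whole_pieces are hypothetical and exist only for this comparison.

```c
#include <stddef.h>
#include <string.h>

/*
 * count_set_tokens - hypothetical helper: counts tokens the way the
 * standard strtok sees them, where every character of delims is a
 * delimiter on its own. Note that strtok modifies s.
 */
size_t count_set_tokens(char *s, const char *delims)
{
    size_t n = 0;
    char *t = strtok(s, delims);

    while (t != NULL)
    {
        n++;
        t = strtok(NULL, delims);
    }
    return (n);
}

/*
 * count_whole_pieces - hypothetical helper: counts the pieces you get
 * when delim must match as a whole string (empty pieces included).
 * It only reads s.
 */
size_t count_whole_pieces(const char *s, const char *delim)
{
    size_t n = 1, len = strlen(delim);
    const char *hit;

    if (len == 0)
        return (1);
    while ((hit = strstr(s, delim)) != NULL)
    {
        n++;
        s = hit + len;
    }
    return (n);
}

/*
 * Usage sketch with the string "ahebhc" and the delimiter "he":
 *     count_whole_pieces -> 2   ("a" and "bhc")
 *     count_set_tokens   -> 3   ("a", "b" and "c")
 */
```

With “he” as the delimiter, the standard function also splits at the lone “h” near the end, while a whole-string match does not.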

/**
 * _strtok - tokenizes a string according to a certain delimiter.
 * It doesn't create a new string to hold the tokens but rather
 * remembers what the previous string was when faced with a
 * NULL pointer.
 * For example, if you have a string str = "hello; now; bo",
 * when _strtok is called for the first time (_strtok(str, ";")), it
 * will return "hello", and when it is called for the second time
 * (_strtok(NULL, ";")) it will return " now" (with the leading space)
 * @str: the string to be tokenized
 * @delimeter: the delimiter to separate tokens
 * Return: a character pointer to the current delimited token
 */
char *_strtok(char *str, const char *delimeter)
{
}

First things first: always document your prototypes like this, so you have a guide to where you start and where you are going.

Then, since we need a variable that remembers the previous value of our string between calls, we have to declare a static variable. We will also need an iterator that loops through our string looking for the delimiter, so let's have that too.

static char *save;
int i = 0;
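If static locals are new to you, here is a tiny sketch of what the keyword buys us. The function remember is hypothetical and only illustrates the mechanism: a static local is initialized once (to NULL here) and keeps its value from one call to the next.

```c
#include <stddef.h>

/*
 * remember - hypothetical example of a static local variable.
 * A non-NULL argument is stored; a NULL argument leaves the stored
 * pointer alone. Either way, the stored pointer is returned.
 */
const char *remember(const char *s)
{
    static const char *saved; /* starts as NULL, survives between calls */

    if (s != NULL)
        saved = s;
    return (saved);
}

/*
 * Usage sketch:
 *     remember(NULL);    returns NULL, nothing stored yet
 *     remember("abc");   stores and returns the pointer to "abc"
 *     remember(NULL);    still returns the pointer to "abc"
 */
```

The price of this convenience is that there is only one saved pointer for the whole program, so two tokenizing loops cannot be interleaved. The standard strtok has the same limitation.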

Now let's iterate through our string and look for the delimiter.

while (_strcmp(str + i, delimeter) != 1 && *(str + i) != '\0')
    i++;

So what the hell does that while loop check for? It advances i until either a delimiter starts at the ith position of the string or the string reaches its end. Why use another function, _strcmp, rather than just writing *(str + i) != *delimeter? Well, the delimiter could be more than one char in length. For example, “he” is a valid delimiter, and that can be checked by neither the == operator nor the standard strcmp function, which compares whole strings. So where is the _strcmp function? Here it is.

/**
 * _strcmp - a special compare function that checks whether sub
 * is a prefix of fstring. For example, if fstring is "hello",
 * then "h", "he", ... up to "hello" itself are all prefixes, and
 * in such a case 1 will be returned
 * @fstring: the string to look in
 * @sub: the candidate prefix
 * Return: 1 on success and -1 on failure
 */
int _strcmp(char *fstring, const char *sub)
{
    if (!fstring || !sub)
        return (-1);
    /* a prefix can never be longer than the string itself */
    if (strlen(fstring) < strlen(sub))
        return (-1);
    while (*sub && *fstring)
    {
        if (*sub != *fstring)
            return (-1);
        sub++, fstring++;
    }
    return (1);
}
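As a side note, if you are not trying to avoid the standard library, the same prefix test can be sketched with strncmp. The name starts_with is hypothetical; it mirrors the 1 and -1 return convention of _strcmp above so the two are interchangeable in the loop.

```c
#include <stddef.h>
#include <string.h>

/*
 * starts_with - hypothetical alternative to _strcmp built on strncmp.
 * Returns 1 if s begins with prefix, otherwise -1.
 * strncmp stops at the end of s, so a prefix longer than s never matches.
 */
int starts_with(const char *s, const char *prefix)
{
    if (s == NULL || prefix == NULL)
        return (-1);
    return (strncmp(s, prefix, strlen(prefix)) == 0 ? 1 : -1);
}

/*
 * Usage sketch:
 *     starts_with("hello", "he")    returns  1
 *     starts_with("hello", "lo")    returns -1
 *     starts_with("he", "hello")    returns -1
 */
```

Like _strcmp, this version reports a match for an empty prefix, so the tokenizer should only be handed a non-empty delimiter.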

Now that we've got that out of the way, let's continue.

/*
 * if the while loop stopped because the iterator reached the end,
 * we return the string itself
 */
if (*(str + i) == '\0')
{
    /* nothing is left over, so later calls with NULL must find save empty */
    save = str + i;
    return (str);
}
/* otherwise we first point the static variable just past the delimiter */
save = str + i + strlen(delimeter);
/* now save is initialized and it points to the start of the next token */
/* then we put a null terminator over the first character of the delimiter inside the string */
*(str + i) = '\0';
/* so now, if someone prints their initial string after this function call, the characters beyond the first token will be invisible, even though the string hasn't actually shrunk. Pretty sweet, right? */
/* finally, hand back the first token */
return (str);

This concludes the first half of the job, but it isn't the only task. Our function must also be able to return the next token when called with a NULL pointer as its string, so let's work on that.

char *_new = NULL;
/* first check if str is NULL (or empty) */
if (!str || !*str)
{
    /* then make sure save isn't empty; otherwise there are no more tokens */
    if (!save || !*save)
        return (NULL);
    /*
     * now we do the same search as before, but this time on save,
     * since save is the static variable holding the rest of the
     * string from the previous call
     */
    while (_strcmp(save + i, delimeter) != 1 && *(save + i) != '\0')
        i++;
    _new = save;
    if (*(save + i) == '\0')
    {
        /* last token: park save at the end so the next call returns NULL */
        save = save + i;
        return (_new);
    }
    *(save + i) = '\0';
    save = save + i + strlen(delimeter);
    return (_new);
}

Congratulations, you have written a working strtok function with the help of the strlen function (which we could have written ourselves in about 10 lines of code) and your curiosity. If you want the complete code in one file, here is a GitHub link.
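For readers who want everything in one place, here is one way the fragments above could be assembled. Treat it as a sketch, not necessarily the code behind the link: it merges the two halves into a single scan over save, it assumes the delimiter is a non-empty string, and it moves save to the end of the string once the last token has been returned so that the following call returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* _strcmp - prefix check: 1 if sub is a prefix of fstring, else -1 */
int _strcmp(char *fstring, const char *sub)
{
    if (!fstring || !sub)
        return (-1);
    if (strlen(fstring) < strlen(sub))
        return (-1);
    while (*sub && *fstring)
    {
        if (*sub != *fstring)
            return (-1);
        sub++, fstring++;
    }
    return (1);
}

/*
 * _strtok - assembled sketch of the tokenizer described above.
 * Assumption: delimeter is a non-empty string.
 */
char *_strtok(char *str, const char *delimeter)
{
    static char *save;
    char *token;
    int i = 0;

    /* a non-empty str starts a new scan; NULL or "" continues the old one */
    if (str != NULL && *str != '\0')
        save = str;
    if (save == NULL || *save == '\0')
        return (NULL);
    token = save;
    /* walk forward until a delimiter starts here or the string ends */
    while (*(save + i) != '\0' && _strcmp(save + i, delimeter) != 1)
        i++;
    if (*(save + i) == '\0')
        save = save + i; /* last token: the next call will return NULL */
    else
    {
        *(save + i) = '\0'; /* cut the token off in place */
        save = save + i + strlen(delimeter);
    }
    return (token);
}

/*
 * Usage sketch, with the two-character delimiter "; ":
 *     char a[] = "hello; hello2; hello3";
 *     _strtok(a, "; ");       returns "hello"
 *     _strtok(NULL, "; ");    returns "hello2"
 *     _strtok(NULL, "; ");    returns "hello3"
 *     _strtok(NULL, "; ");    returns NULL
 */
```

Merging the two branches keeps the search loop in one place, at the cost of looking a little less like the step-by-step version built in the article.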

This piece of code is part of a larger C project that aims to write a simple shell using as few standard library functions as possible. If you are into more of this concretization, stay tuned. Also, don't forget to comment, insult, support, and say whatever you want! Looking forward to seeing you again!

Hileamlak Yitayew

CS@Harvard| Founder@Oban| Curiosity| Tech| Entrepreneurship