Neil's C++ Stuff



C++ Strings Tutorial

I only uploaded this unfinished version because I want to show that I have something, even if it is little. And what does size matter anyway! It's adequate! Anyway, about this document: only the first section is finished. But as you can see, I have plans for the remaining ones.

Quick Description & Reacquaintance

A string, quite simply, is a bunch of characters stuck together, one after the other, that usually form some kind of word or statement. That's why they call it a string: it's a string of characters. And strings themselves are actually character arrays (char []) (don't look so surprised).

One difference, however, is that strings require an extra subscript for a NULL character, which is a character that basically symbolizes the END of the string. In our case this is zero (0). You can think of NULL in this situation like a cap on a bottle to keep the liquid from spilling out. The other picture I'll be using is the knot at the end of a string that keeps the beads from falling off. In hopes of preventing you from being utterly confused, I'm going to explain the true nature of character variables.

A character variable (or char as we have come to know them) is actually much like an integer in that it can store and be used as a numeric value. The difference is that since a character is only 1 byte, it stores far fewer values. Truth be known, a character can only store 256 different values. Yes, 256 different values. On most compilers a plain char is signed, so the values it can store range from -128 to +127 (on compilers where plain char is unsigned, the range is 0 to 255 instead). Assuming the signed kind, the following is OK:

char c = -128;
c = 127;

But this next line is NOT correct. When I said 256 values I did not mean that you can store the value 256 in a character, because you can't, ever:

char c = 256;

What we're actually doing when we use the single quotes or apostrophes (') is having the compiler put in the number version of the ASCII character we specify. I can already hear the "huh wuh"'s so I'll explain further (please be seated). Say we do the following:

char c = 'A';

That is actually telling the compiler to look up the ASCII code for A and stick it into c. If we knew what the ASCII code was for A we could have put that instead, but it makes programs harder to understand when someone else is going through them. If you want to know what the ASCII code for something is you could use something like the following (which tells us the code for A):

char c = 'A';
cout << "The ASCII Code for " << c << " is " << (int)c;

In fact, try this out. By putting the (int) in front of c we specifically say we want to print the number and not the ASCII character. When putting a numeric value into a character variable the same rules apply as when you're putting a numeric value into an integer. Just remember not to put values larger than 127 or less than -128 into chars. Also, do not put any kind of quotation marks around the numbers. The following are wrong:

char c = '107'; // WRONG
char c = "107"; // WRONG

Parentheses are of course perfectly legal around anything, so the following would be okay:

char c = (107);
char c = ((((107))));

Pretty strange looking, but that will work just fine. Make sure you understand the use of character variables well before moving on.
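If you'd like the "a char is just a small number" idea in one place, here is a small sketch. The helper names codeOf and distinctCharValues are made up for this example, and the actual numbers assume an ASCII system with 1-byte (8-bit) characters.

```cpp
#include <climits>

// Hypothetical helper: return the number stored inside a char.
int codeOf(char c)
{
  return static_cast<int>(c);
}

// Hypothetical helper: how many different values fit in a char.
// CHAR_MIN and CHAR_MAX come from <climits>.
int distinctCharValues()
{
  return CHAR_MAX - CHAR_MIN + 1;
}
```

On an ASCII machine codeOf('A') gives 65 and codeOf('k') gives 107, and with an 8-bit char distinctCharValues() gives 256 no matter whether your compiler treats char as signed or unsigned.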

String Like Beads

A string in essence now is a bunch of characters sitting back to back against each other. Kind of like beads. We know that to represent the alphabetic character A that we put it in single quotes (apostrophes): 'A'. And by reading the above section you will know that this simply represents the ASCII value of A (which is 65).

The characters in your string are back to back, one after the other. So putting an actual word in the string is done a character at a time from left to right, just like typing (unless you're a bloody nut case escaped from the booby hatch). Say we represent characters with beads:

[Image: a smiley-face bead]

And we want to make a series of beads (characters) to represent the word "bead". Pretty original, I should make movies. Anyway, when you type the word you type 'b', then you type 'e', and so on and so forth. This is exactly the way you put characters into a string, one at a time, left to right:

'b' 'e' 'a' 'd' '\0'

Now you may ask, what the heck is that last slash-'oh' on there for? First, it's not an O as in oh. It's a slash-zero or NULL character. We can represent a NULL character as either '\0' or just plain 0. Note, one is surrounded by single quotes and contains a backslash while the other is a single numeral. Recall that surrounding something in apostrophes (sounds religious) means to have the compiler look up the ASCII value for the character you put there. The value in this case is zero. Ahhhhh, yeah now you understand, right, RIGHT?! Just like 65 is the number for 'A', 0 is the number for '\0'. Why is there a backslash in front? This means it's an escape sequence (like '\n' or '\t'). We couldn't very well use '0' because that ISN'T zero, that's an ASCII zero character (which has the value 48). So, in the end we must use the backslash zero. *phew* I'll be using '\0' because it's much more common.

Getting back to the underlying concept ... a NULL character is only slightly different than a NULL pointer. If you don't know how to use pointers then skip to the next paragraph; otherwise read on. A NULL pointer is a pointer variable that points to nothing, void, nada, zilch (I've watched Assassins too many times). In the past everyone assumed that a NULL pointer pointed to memory address 0x0, but these days I hear strict coders everywhere preaching that that is no longer necessarily the case. Either way, I try not to confuse a NULL character with a NULL pointer.

The NULL character tells us where the end of the string is. When you're hopping along, letter to letter, left to right you need to know where to stop. Otherwise you "fall off the edge" so to speak. Compilers don't say that, they scream, "array bounds overflow" if you're lucky. If you're not lucky and your program compiles but has one of these, pray to the computer gods.

See, otherwise the computer wouldn't know where to stop when it's peeking viciously at your itty bitty string. It would keep reading letter by letter until it did come to a NULL character, which may be much further down the line. In any case, it's the lid on the bottle, the stopper on the test tube, the plug in the butt, the latch on the toolbox, the finish line, getting the picture? It closes off the string.

Why is a NULL character zero? This I can only explain if you know how conditions work. A condition is TRUE if the value is not zero. Therefore a condition is FALSE if the value is zero. This is why we end with a zero. Because every number other than zero is basically TRUE while a NULL character is FALSE.

When you print a string to the screen it sifts through each individual character, placing each one on the screen one at a time. When it reaches the NULL character (zero) it stops (before printing that character). This is an important concept: it has to dance through EACH piece separately. Remember that arrays are made up of several pieces that basically live under one roof. Arrays cannot be dealt with as one solid piece at a time. Chant this back to me, "Arrays cannot be used directly as a whole, they must be processed one part at a time".

That is why we use for loops to skim through each subscript. Nothing is different with strings. They may be glunked together to form a word/sentence/paragraph/story, but in the nitty of the gritty they are still just pieces of the puzzle.
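To tie the last few paragraphs together, here is a minimal sketch of "hop from bead to bead until you hit the NULL character". The function name countBeads is made up for this example; the standard library's strlen() does the same job.

```cpp
#include <cstddef>

// Hypothetical helper: count the characters in a string by hopping from
// bead to bead until we hit the NULL character.
std::size_t countBeads(const char text[])
{
  std::size_t count = 0;

  // text[count] is "true" for every character except '\0' (zero),
  // so the loop stops by itself at the end of the string.
  while (text[count])
    count++;

  return count;
}
```

Calling countBeads("bead") gives 4: the NULL character ends the loop but is not counted.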

C++ Strings Declaration

From this section on you will learn the actual C++ syntax of string usage. I'll start with the declaring of a string, which is extremely simple. Thus far in the tutorial we learned that strings are actually character arrays (char name[size]). Therefore, logically speaking (captain), we would declare a string in the same way we would declare a character array. I know you're thinking, "no way man! NO WAY MAN! that's just too crazy daddio!". But it's true, I swear.

What you need to know when you declare a string is what to name it and how long the string inside will be. For the length, I suggest (as in command!) the maximum number of characters you will store at a time plus 1. Most compilers these days get really freaked out if you don't add one to the size. Why? Because we need that extra one for our NULL character. If you're at a loss for explanation then either you haven't read the above section or you're drinking too much Pepsi.

So anyway, we want to declare a string called name which will later contain the name "Leo". With our beads this would have to look like:

'L' 'e' 'o' '\0'

Now kiddies, count the beads. Ooh ooh, I know I know! There's FOUR beads! Correct! So, when we declare our character array or string, we'll type it:

char name[4];

When all else fails, count the beads. It's just like when you're balancing your checkbook and for some reason you can't figure out what -103 dollars minus 15 is. Well, it's not exactly like that, but you either count on your fingers or pull out your handy-dandy pocket calculator.

You don't necessarily have to make the length of the string (size of the character array) exactly the length (plus one) of the characters you wish to store in it. But you must make it at least that. I call this the minimum or necessary length. In our "Leo" example the string must be declared as at LEAST 4, but it could have been more:

char name[4000];

That of course is ludicrous, but there may be times when you should declare it longer than the string will actually be. This allows you to, later on in your program, store a string longer than your original.

Let's examine this a bit further, just so we're clear. We declare a string to hold up to 10 characters:

[Image: a row of 10 empty character beads]

But we only need three (3) of these to store the letters 'L', 'e', and 'o'; and then one more for the NULL character '\0'. But that's okay. This is how it would look after being filled with our letters:

'L' 'e' 'o' '\0' [10]

We still only use three letters and stop before using the fourth (thanks to our friend the NULL character). The rest are untouched, but are available. So if we needed to use them we could. The only thing you'd have to remember to do is move the NULL character back to the end of your string's contents. For example the following would still be a string of three letters to the compiler because of the first NULL character's interference:

'L' 'e' 'o' '\0' 'L' 'i' 'o' 'n' '\0' [10]

Anything that reads the string stops at the first NULL character it finds, so putting multiple ones in is useless (at this point) and disruptive. Because of the way the above is, your program will never find the letters past the first '\0'. This would have been the correct way:

'L' 'e' 'o' 'L' 'i' 'o' 'n' '\0' [10]
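If you want to check the "first NULL character wins" rule for yourself, here is a rough sketch. The two function names are made up for this example, strlen() comes from the standard <cstring> header, and the curly-brace way of filling the array is explained in the next section.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: build the "broken" bead layout from above and
// report how long the program thinks the string is.
std::size_t lengthWithEarlyNull()
{
  char beads[10] = { 'L', 'e', 'o', '\0', 'L', 'i', 'o', 'n', '\0' };

  return std::strlen(beads);   // counts up to the FIRST '\0' only
}

// Hypothetical helper: the corrected layout with a single cap at the end.
std::size_t lengthWithOneNull()
{
  char beads[10] = { 'L', 'e', 'o', 'L', 'i', 'o', 'n', '\0' };

  return std::strlen(beads);
}
```

The first function reports 3 (just "Leo") while the second reports 7 ("LeoLion"), even though both arrays are 10 beads long.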

The only way we can proceed now is to learn how to actually put these characters into our string! On to the next section!

Filling C++ Strings

You already know how to put a single number into an array:

array[subscript] = value;

And we know that a string is just a character array. To put a single character value into one part we'd probably write something like:

string[subscript] = 'letter';

Need an example? As I suspected. Above we had made a string to hold the name "Leo". To put the 'L' in as the first letter, we'd stick that character in the first part of the array (subscript zero):

name[0] = 'L';

And we could go down the line and put in all the letters, including the NULL character, like this:

name[0] = 'L';
name[1] = 'e';
name[2] = 'o';
name[3] = '\0';

That is, of course, after we have declared the array. This is also a pain in the arse if you haven't noticed. What if we wanted to make a string for each of the fifty states and then put the letters in for each? Can you say, "Oh ******* *!"? We could initialize the array with a bunch of characters:

char name[4] = { 'L', 'e', 'o', '\0' };

But that is also a hassle. There actually is a better way (thank god). We first surround the entire statement (one character after another in order) with double quotation marks, like so:

"Leo\0"

And then second we make our string variable "equal" this upon initialization:

char name[4] = "Leo\0";

This is almost totally correct, but it won't get past the compiler. A C++ compiler will complain that the initializer string is too long for the array. This is because when you perform the initialization of a string like this, it automagically puts in the NULL character at the end for you, so "Leo\0" needs five subscripts, not four. So to make this work ever so well, it'd be:

char name[4] = "Leo";

So, you'd follow a format like this:

char string[length] = "content";

This definitely makes a lot of things faster and simpler. However, you can ONLY fill a string like this when you first initialize it. The following will not work:

char name[4];
name = "Leo"; // FAIL

Yes, you may be thinking, "But NEIL *nasal voice*, it LOOKS fine, what's WRONG about it?". *ack!* Let me tell you buddy boy, a C++ compiler will refuse it outright with an error about assigning to an array. In the next paragraph I attempt to explain why, but it deals with memory so you can skip it if you want. Just plug it into your brain that it does not work!

In declaring your string as char name[4]; we set aside a place in memory with 4 bytes (one for each separate character). When you write "Leo" the compiler sets aside another place in memory to store that, which is also 4 characters. An array name always stands for its own block of memory and can't be made to stand for a different one, so name = "Leo" has nothing it could legally do: the = operator does not copy the values from the constant string into the variable. (If name were a char pointer instead of an array, the assignment would compile, but it would only make the pointer point at the constant string. It still would not copy anything.) Either way, this causes problems!

A list of characters inside the double quotation marks is called a constant string (the official name is a string literal). Constant strings can be used almost exactly like a string variable (character array) because they represent the same thing:

"Leo" is equivalent to 'L' 'e' 'o' '\0'

What you can't do with constant strings is change their value. The word constant means "staying the same" or "unchanging". But of course it'd be almost pure folly to try and change a constant string anyway.

Remember that when you initialize a string variable using a constant string (words held between double quotes) you don't need to put in the NULL character because constant strings come with 'em built in. However, when you initialize one character at a time you must put the NULL character in.
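Here is a quick sketch that checks both claims: the two ways of initializing give the same string, and a constant string carries its own NULL character. The function names are made up for this example; strcmp() and sizeof are standard.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: fill one array from a constant string and one
// character by character, then check that they hold the same string.
bool bothWaysMatch()
{
  char fromLiteral[4] = "Leo";                    // '\0' added for us
  char byHand[4]      = { 'L', 'e', 'o', '\0' };  // '\0' typed ourselves

  return std::strcmp(fromLiteral, byHand) == 0;   // 0 means "identical"
}

// Hypothetical helper: number of bytes the constant string "Leo"
// occupies, including its built-in NULL character.
std::size_t bytesInLeo()
{
  return sizeof("Leo");
}
```

bothWaysMatch() comes back true, and bytesInLeo() gives 4: three letters plus the NULL character you never had to type.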

At this point you only know how to initialize a string variable to a whole constant string. The only way to change the data in the string after initializing would be to access each subscript individually. Like in the following example:

/* -example 1--------------------------------------------------------------- */

int main()
{
  char name[4] = "Leo";

  name[0] = 'C';
  name[1] = 'a';
  name[2] = 't';
}

/* ------------------------------------------------------------------------- */

In the above we create a string variable (name - character array) that can hold three alphanumeric symbols and a NULL character. We initialize it with the value "Leo". Later on (one line down) we wanted to put the value "Cat" into our string. We could NOT have said:

name = "Cat";

So instead we had to reference each subscript individually one by one. Yes, I can almost feel your look of disgust right now. And the age old comment comes to mind, "There is a better way!". But anyway, notice that I didn't put in the NULL character when I was changing the value to "Cat". I didn't have to because I already know that the NULL character was in subscript 3 and that aspect didn't change cuz we're storing another three-letter word.

Here's another example (same thing basically) where we had to put in the NULL character because the string's value became longer:

/* -example 2--------------------------------------------------------------- */

int main()
{
  char name[5] = "Leo";

  name[0] = 'L';
  name[1] = 'i';
  name[2] = 'o';
  name[3] = 'n';
  name[4] = '\0';
}

/* ------------------------------------------------------------------------- */

Here we had to replace the NULL character because we overwrote it. In the beginning when we initialize name to "Leo" the NULL character is in subscript 3. But then later on we overwrite subscript 3 with 'n' which leaves the string wide open, cap-less, lid-less, etc. So we add another one on, just beyond the last letter. Also notice I made the character array 5 long. This was necessary because although I only need to store "Leo" at first, later on I needed to store "Lion".

C++ Strings Output/Input

Before going any further you will now learn how to output your strings to the screen via cout and fill them with user input a la cin. This process is actually really easy and I anticipate a short section here.

You already know how to cout a single character, correct? You simply cout followed by the character variable. Outputting strings works exactly the same way, just make sure you have the NULL character in there or you may see some really weird output.

/* -example 3--------------------------------------------outputting-strings- */

#include <iostream>

using namespace std;

int main()
{
  char name[5] = "Leo";

  cout << "1st value = " << name << endl;

  name[0] = 'L';
  name[1] = 'i';
  name[2] = 'o';
  name[3] = 'n';
  name[4] = '\0';

  cout << "2nd value = " << name << endl;
}

/* ------------------------------------------------------------------------- */

Don't stop cheering, it really is that easy. Just cout with the name of the string. Wow, and you thought that hot pockets were easy to make!

Now on to the input. This gets a little bit more complicated. Now, don't get scared because it's not altogether that tough. There are, however, a few quirks you should be aware of.

The basic syntax of getting user input via cin is the same as cout. You cin followed by the name of the string:

/* -example 4----------------------------------------------inputing-strings- */

#include <iostream>

using namespace std;

int main()
{
  char name[21];

  cout << "Type in your first name (20 characters or less please)" << endl
       << "-->";

  cin >> name;

  cout << endl << "So your name is " << name
       << "?  That's a pretty name, how 'bout we go out sometime?" << endl;
}

/* ------------------------------------------------------------------------- */

There are several points this program makes. First, make sure your user doesn't type in more characters than your array can handle (bad things happen). Second, it only asks for the user's first name. When you try to type in spaces with a standard cin statement it fargles things up because it thinks that every word (block of letters separated by spaces) belongs in a separate string.

You ask, "Is there a better way?" To which I reply, "Need you even ask!".

If we use cin's get() function we can prevent the user from sticking too many characters in and allow the input of spaces! We call this function with two parameters, the first being the name of your string, the second being the number of characters you will allow the user to type in, plus one. We add one to the second parameter because the function thinks what you're passing it is the length of the array (including the NULL character).

So we could have compiled the same program above, but replaced ...

cin >> name;

... with ...

cin.get(name, 21);

Which breaks down like this:

cin.get    call console input (keyboard) function: get
(
  name,    first parameter: put whatever the user types in into this variable
  21       second parameter: only allow the user to type in a maximum of twenty characters
);

I encourage you to try modifying the above example program with your newfound knowledge.

What happens when the user types in more characters than allowed? A truncation occurs. Only the allowed number of characters are put into the string; the rest are left sitting in the input buffer, where the next input statement will find them. To elaborate, if we called the function like this:

cin.get(name, 21);

And then we typed in the following:

0123456789abcdefghijklmnopqrstuvwxyz

This is what would be put into the string:

0123456789abcdefghij

In conclusion I seriously recommend that you use get() when getting user input and sticking it into a string variable! Otherwise you may get some less than desirable results! Also remember that when you're getting input and output to the whole array you do not use a subscript! You only use the name of the string. When you put in a subscript value the compiler will think you're trying to input/output that single part of the array.

String-Filling Functions

Because there is no way to deal with strings as a whole (you must deal with individual subscripts) there are some really nifty functions for their manipulation (filling, clearing, etc.). You've already used one of them: get(), but this is only for getting user input. There are other ones you can use when you include the string.h header file. There are a great many of these functions, but I'm only going to cover the following:

strset()

This function merely sets the whole string (up until the NULL character) to a specific character. We can use this to blank out a string, to fill it with zeros, etc. Just remember that it stops filling when it reaches the NULL character. See, I told you it's like a cap. Calling this function is relatively painless. The first argument (parameter) you pass is the string variable you want to fill and the second argument is the character you wish to use in filling the string. One warning first: strset() is not part of standard C++. It comes with some compilers (Borland's and Microsoft's, for example, where it may be spelled _strset), so if your compiler has never heard of it, that's why.

For example, going back to our "Leo" example. If we had wanted to fill it (name) with blanks, we could do this:

strset(name, ' ');

And if that example didn't completely quench you, here's another!

/* -example 5------------------------------------------------------strset()- */

#include <iostream>
#include <string.h>

using namespace std;

int main()
{
  char name[21] = "Leo LionHeart";

  cout << "value = [" << name << "]" << endl;

  strset ( name , ' ');

  cout << "value = [" << name << "]" << endl;

  strset ( name , '0');

  cout << "value = [" << name << "]" << endl;
}

/* ------------------------------------------------------------------------- */

Pretty self-explanatory. After compiling and running the above program your output will look something like:

value = [Leo LionHeart]
value = [             ]
value = [0000000000000]

memset()

The previous function we learned of is very nifty and yet at the same time limited. I normally use memset() over strset() because you get to exercise much more control. At the same time, be sure to use it responsibly and correctly or you can find yourself in serious trouble. strset() is 99.9% dummy-proof, but memset() is more powerful.

It takes three parameters. The first is the string variable you want to fill. The second is the character you want to use. And the last is the number of characters you want to fill. You have to explicitly tell it the number of characters to fill. You could have it fill just one or ten thousand. This is where the responsibility comes into play. Always make sure that if you overwrite the NULL character that you put it back.

Enough talk, examples EXAMPLES! See, I can go on forever trying to describe but until you actually see for yourself it just doesn't make complete sense:

/* -example 6------------------------------------------------------memset()- */

#include <iostream>
#include <string.h>

using namespace std;

int main()
{
  char name[11] = "Leo";

  cout << "value = [" << name << "]" << endl;

  // set the first three characters to underscores ('_')
  memset ( name , '_', 3);

  cout << "value = [" << name << "]" << endl;

  // since we clear OVER the NULL character, we have to replace it
  // again at the end of the string
  memset ( name, ' ', 10);
  name[10] = '\0';

  cout << "value = [" << name << "]" << endl;
}

/* ------------------------------------------------------------------------- */

Now, I am going to waste another 10k of web space to show you how this would look with our good ol' beads:

char name[11] = "Leo";

'L' 'e' 'o' '\0' [11]

memset ( name , '_', 3 );

'_' '_' '_' '\0' [11]

memset ( name , ' ', 10 );

' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' [11]

name[10] = '\0';

' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '\0' [11]
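Since "fill everything, then put the cap back" always comes as a pair, here is a sketch that bundles the two steps so the second one can't be forgotten. The name fillAndCap and its capacity parameter are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: fill a whole character array with one character
// and put the NULL character back in the last subscript.
// capacity is the declared size of the array (at least 1).
void fillAndCap(char buffer[], std::size_t capacity, char fill)
{
  std::memset(buffer, fill, capacity - 1);  // subscripts 0 .. capacity-2
  buffer[capacity - 1] = '\0';              // the cap goes in the last one
}
```

For char name[11] you could call fillAndCap(name, sizeof name, '_'), which leaves ten underscores followed by the NULL character.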

strcpy()

At this point you should be learned in the area of filling a string with any particular character. Starting with this function you'll learn how to fill the string with the value of another! Wheeeeeee!

It's called copying, in this case, "string copying". Hence the shortened name strcpy. In just a tidbit of knowledge you may be wondering why all the old functions have shortened names like this. Back in the olden golden days, some systems only paid attention to the first six characters of a function name (wow, isn't that so interesting). But who cares, cuz you can make them really long these days. Anyway ...

This function takes two parameters. The first being the string you are going to "copy" the value into. The second being the value that you are going to be copying. Speaking in analogical terms, the first parameter is the toaster, and the second is the toast. What this function simply does is replace the value in the first string with the value of the second. It "fills" it with the value of the second parameter.

For example, if we had a string filled with blanks and we wanted to copy a name into it: we'd put the blank string as the first parameter, and the name as the second:

strcpy(string, "name");

Let's try this out by making a string and copying the name "Leo LionHeart" into the string:

/* -example 7------------------------------------------------------strcpy()- */

#include <iostream>
#include <string.h>

using namespace std;

int main()
{
  char name[21];

  memset ( name, ' ', 20);
  name[20] = '\0';

  cout << "value = [" << name << "]" << endl;

  strcpy ( name, "Leo LionHeart" );

  cout << "value = [" << name << "]" << endl;
}

/* ------------------------------------------------------------------------- */

Your output after compiling and executing the above program will look like this:

value = [                    ]
value = [Leo LionHeart]

Notice how there are no blanks after the name "Leo LionHeart". Remember earlier when I mentioned that constant strings carry their own NULL character? When you copy a constant string into your variable it also copies the NULL character. In this scenario our array is filled with twenty blanks. Next we copy 13 characters ("Leo LionHeart") as well as one (1) more character for the NULL character. So the subscripts 14 to 19 are still blanks but they are not seen anymore because they are beyond the new NULL character which is at 13.
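One thing strcpy() will not do for you is check that the destination is big enough; it just keeps copying until it has copied the NULL character. Here is a sketch of a cautious wrapper. The name copyIfFits and its capacity parameter are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy text into dest only if it fits, '\0' included.
// capacity is the declared size of dest.  Returns true when the copy happened.
bool copyIfFits(char dest[], std::size_t capacity, const char text[])
{
  if (std::strlen(text) + 1 > capacity)
    return false;            // too long: leave dest untouched

  std::strcpy(dest, text);   // copies the characters and the '\0'
  return true;
}
```

With char name[21], copyIfFits(name, 21, "Leo LionHeart") copies the name and returns true. With a char small[4], trying to copy "Lion" returns false and leaves small alone, because "Lion" needs five subscripts.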

memcpy()

This function is to strcpy() what memset() is to strset(). It's more powerful than strcpy() and you have to exercise responsibility or you can wind up saying some foul things. It takes three parameters (like memset()!). The first is the string you're going to copy into, the second is the value you're copying, and the third is the number of characters you want to copy.

The difference here is that strcpy() will copy all of the characters (including the NULL character) from the source into the destination. memcpy() on the other hand will only copy the number of characters that you specify. Take our previous example, and replace the strcpy() with a memcpy(). Or look at this:

/* -example 8------------------------------------------------------memcpy()- */

#include <iostream>
#include <string.h>

using namespace std;

int main()
{
  char name[21];

  memset ( name, ' ', 20);
  name[20] = '\0';

  cout << "value = [" << name << "]" << endl;

  memcpy ( name, "Leo LionHeart" , 3 );

  cout << "value = [" << name << "]" << endl;
}

/* ------------------------------------------------------------------------- */

The output from this will be:

value = [                    ]
value = [Leo                 ]

Why? Because in our memcpy() we only said to copy 3 characters. See it breaks down like so:

memcpy             Call the function.
(
  name             This is the destination. The characters from the source string will be copied to this variable.
  ,
  "Leo LionHeart"  This is the source. The characters we copy will come from this constant string.
  ,
  3                The number of characters to copy. This tells the program to copy the first THREE characters from the beginning of the source to the beginning of the destination.
);
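Notice that memcpy() never adds a NULL character on its own; in the example the blanks and the old cap at subscript 20 were simply left in place. If you want the copied piece to stand as a string by itself, you have to cap it. Here is a sketch of that; the name copyPrefix is made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy the first count characters of text into dest
// and cap the result.  dest must have room for count + 1 characters and
// text must be at least count characters long.
void copyPrefix(char dest[], const char text[], std::size_t count)
{
  std::memcpy(dest, text, count);  // raw copy, no '\0' is added
  dest[count] = '\0';              // so we add the cap ourselves
}
```

Calling copyPrefix(name, "Leo LionHeart", 3) leaves name holding exactly "Leo", with no trailing blanks showing.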

memcpy Example Breakdown

Still confused? Worry not, I now present the first of my program break downs, statement by statement! Okay not precisely statement by statement, but you get the idea.

#include <iostream>
#include <string.h>

using namespace std;

Includes the header file named "iostream" into your program. For our purposes we need it for cout. The second one includes the header file named "string.h". We'll be needing this for our memset and memcpy routines. The using line lets us write cout and endl without putting std:: in front of them.

int main()
{

Begin the main program function. Nuf said!

char name[21];

Create a string (character array) that can store 20 characters (subscripts 0 to 19) and a NULL character (subscript 20).

memset ( name , ' ' , 20 );

Fill the first twenty (20) subscripts (0 to 19) of the array we called name, with blanks (' ').

name[20] = '\0';

Cap off the string with a NULL character to prevent array bounds overflow and so we know where the array (string) ends.

cout << "value = [" << name << "]" << endl;

Output the contents of the string in a nice way so we know it looks all perty.

memcpy ( name, "Leo LionHeart" , 3 );

Copy the first three characters from "Leo LionHeart" to the first three subscripts of name.

cout << "value = [" << name << "]" << endl;

Again, output the contents of the string. Then end the program.

Now, if that doesn't detail memcpy() then I don't know what can.

Common Pitfalls

Many people have tried to access the subscript beyond the last usable one in hopes of using the entire array at a time. Let me tell you, this does not work and will cause complications. For example:

char name[20];
cout << "Enter your name: ";
cin >> name[20];

What the above will actually do is put the first character you type into subscript 20 of name, which doesn't exist (the usable subscripts are 0 to 19)! So if you're lucky nothing will happen, but sometimes you may hit something sensitive and blow up your program (metaphorically speaking of course).


Contact

Comments or questions? Email me at webmaster@neilstuff.com.