C String Question

Strings in C can be printed to stdout using the printf function with the %s specifier. If strings are arrays of characters, shouldn't the name of the string be the address of the first element in the char array (as with an array of integers)? When one uses

printf("The value of char h is %s", h)

after

char h[]="Hello world!"

has been defined, printf does not print the address of the first element in the array h, but the string itself. I assume there are two distinct conventions: in the case of an array of integers, the name of the array is the address of the first element (a constant pointer to the array, because it cannot point to something else apart from that array), whereas in the case of an array of characters, the name of the array is not the address of the first element in the array? Or is it because printf with %s expects as argument a pointer to the string (the address of the array of characters, at which the string is located)?

When an array h is used in an expression such as a function argument, its value is the address of element 0, no matter what type the array is. %s takes the address and interprets it as the start of a sequence of characters to print until a null character is seen. %s never prints an address; if you want that, use %p.

So these three printfs generate the same output:

char h[]="Hello world";
printf("%s", h);
printf("%s", &h[0]);
char *hh = "Hello world";
printf("%s", hh);

This will print ello world:

printf("%s", &h[1]);

Right. I see if in your example the value of pointer hh is incremented

hh++

, it will point to the next element in the char array, so

printf("%s", &h[1])

before incrementation will be the same as

printf("%s",&h[0])

or

printf("%s",h)

after the incrementation. It basically has to do with the interpretation that %s gives to the address which is passed to it.

I think you get the idea but &h[1] in sample 2 is obviously not the same value as &h[0] in sample 3. Not sure what you mean to write in sample 2.

Yes, it was a mistake. The address operator is used in relation with the first array, not with the pointer to array - no need to put it in front of the pointer, as it is already an address.

I noticed that char *hh cannot be used to change the individual characters in the array (with

*(hh+1)='E'

for example), as it gives segmentation fault.

That’s because in ANSI C, a string constant such as “Hello world” in the statement below

char *hh = "Hello world";

is an immutable array. Compilers typically put such read-only objects in a protected memory segment, so that modification causes a segmentation fault. However in this statement:

char h[] = "Hello world";

the string constant is an initialiser of a mutable array so you can modify the elements.

Also have a look at the const qualifier in variable declarations.

Incidentally hh[1] has the same effect as *(hh+1) in your statement.

The behavior when trying to modify a string literal is undefined. gcc gives you a segmentation fault and another compiler could format your hard drive. gcc gives a segmentation fault because the string is stored in the .rodata (read-only data) section of the ELF file. All the data in that section is protected at a very low level as a security measure.

Use 'const char *hh = "Hello world";' instead of 'char *hh = "Hello world";' to avoid surprises. But anyway 'const char h[]="Hello world!"' is better since you don't use memory for a pointer.

I found this out by messing around with subscripts, which I first thought to be used only with "standard" arrays. :)

So 'const char *hh = "Hello world";' would be a pointer to a string literal (I cannot modify the value that it points to, namely the individual characters, but I can still change the location stored in the pointer). OK.

Consider these two declarations:

char *hh = "Hello world";
const char *hh = "Hello world";

In the first case you are pointing a char * at a string literal. In C the literal has type char[12] (it is const char[12] in C++), so the compiler accepts this without complaint unless you ask for a warning (gcc has -Wwrite-strings), even though modifying the string is undefined. This is a common legacy use, so warning by default would cause a flood of warnings. In the second case, if you try to modify the string via hh, you will get a compiler error, which is better than a runtime error.

In both cases you can reassign hh to point somewhere else. If you wanted an immutable pointer, you would write:

char *const hh = "Hello world";

and of course you can have both the pointer and the target immutable:

const char *const hh = "Hello world";

When the compiler interprets a string (let’s say “Hello world!”), it takes it as a pointer to an array of char, right? So in the new assignment

hh="Hi!"

, hh will get the address at which character ‘H’ is stored. If “Hi!” is of type const char[4] (could also be written as {‘H’,‘i’,’!’,’\0’}), then it can be passed to a function as a pointer because it is interpreted as an address to the first character. I hope I am not erring.

When the compiler interprets a string (let’s say “Hello world!”), it takes it as a pointer to an array of char, right?

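Strictly, it is an array of char that converts to a pointer to its first character in most expressions. The characters must be treated as read-only, so in practice think of it as a pointer to const char.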

If “Hi!” is of type const char[4] (could also be written as {‘H’,‘i’,’!’,’\0’}), then it can be passed to a function as a pointer because it is interpreted as an address to the first character.

Yes. However, strictly speaking, the formal parameter that it corresponds to should also be declared const char *; otherwise you may get a warning from the compiler, and you are liable to get a segmentation fault if you try to modify the read-only string in the function via the pointer.

I guess another type of array of char would be an array of pointers to char, that’s why “const char” is needed. An “array of char”, not an “array of const char”, might imply that the characters can be modified? Isn’t the string literal always an array of const char?

Const is orthogonal to data type and arrays. Put const aside for the time being if you are having difficulty understanding the equivalence between arrays and pointers in C.

There is no difference in C between a pointer to a single char and a pointer to the start of an array of char, i.e. this is legal:

char c;
char *cp;
cp = &c;
cp = "A string";

You could have an array of pointers to char, but it would be a great waste of space if you were to use each pointer to point to only one char. It would also be inefficient to output. So the common use of an array of pointers to char is as an array of strings. E.g.:

char *messages[] = {
    "Hello world",
    "I like C",
    "Bye for now",
};

printf("%s", messages[0]);

I’m just striving to get a clear picture. Anyway, I only said that a string literal is already a constant pointer ('cause it points to a certain array of char, which you said it was precisely an array of constant char, so the location of the array cannot be changed). Thank you.

A string literal has different meanings depending on where it appears in the program. It can appear in an initializer of an array for example, and it may not correspond to an address there.

When it appears as an rvalue, e.g. the RHS of an assignment or an argument of a function, it converts to a pointer to its first char, which should be treated as a pointer to const char because the content of the string must not be modified. Since it is a literal and not associated with a name, the address at which it is stored is constant and there is no syntax to change that. (However, you cannot assume that multiple string literals in the program with the same content are at distinct addresses. The compiler is entitled to fold them into a single address in memory.)

If you understand the equivalence between pointers and arrays, then you should be able to predict the value of c after this statement:

char c = "Hello world"[1];

A const pointer variable is something else. It is a pointer variable which is not allowed to be changed after initialization. The object it points to may or may not be const also. When it is, then you use the declaration syntax: const type *const name.

Sure. One last thing. If the string “Hello!” points to the first character ‘H’ (its value is the address of ‘H’), and it is a pointer to a constant char, is the address at which “Hello!” is located the same address which it points to? I know that a pointer usually is located at a different address from the address which it points to. But every time I want to print with %p the address of the string, it prints the address that it points to (the address of ‘H’). What is the address of the pointer “Hello!” (not the address where ‘H’ is located, i.e. the value of the pointer “Hello!”, but the address at which this address is stored)? Are they the same?

Things become clearer when you use the correct terminology. The literal “Hello!”, when it appears in a program as an rvalue, has the value of the address of the first char. There is no additional pointer variable. So %p will print the starting address of the string.

A special note on strings: HowStuffWorks “The Basics of C Programming”