Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Differentiate String Literal from Char Array

I want to write some function that takes a string literal - and only a string literal:

template <size_t N>
void foo(const char (&str)[N]);

Unfortunately, that is too expansive and will match any array of char, whether or not it is a true string literal. It's impossible to tell the two apart at compile time without requiring the caller to wrap the literal or array, but at run time the two arrays will be in entirely different places in memory:

foo("Hello"); // at 0x400f81

const char msg[] = {'1', '2', '3'};
foo(msg); // at 0x7fff3552767f

Is there a way to know where in memory the string data could live so that I could at least assert that the function takes a string literal only? (Using gcc 4.7.3, but really a solution for any compiler would be great).
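To make the over-matching concrete, here is a minimal sketch (C++11). The helper name `array_extent` is hypothetical and not part of the question; it only borrows the parameter shape of `foo` to show that both kinds of argument bind to the same template.

```cpp
#include <cstddef>

// Hypothetical helper with the same parameter shape as foo() above:
// it accepts any array of const char and reports its extent N.
template <std::size_t N>
constexpr std::size_t array_extent(const char (&)[N])
{
    return N;
}

// A named array that is not a string literal (no terminating null).
const char msg[] = {'1', '2', '3'};

// Both calls compile; the template cannot tell the two apart.
static_assert(array_extent("Hello") == 6, "literal: 5 chars plus terminator");
static_assert(array_extent(msg) == 3, "plain char array: 3 elements");
```

The only visible difference is the extent, and that says nothing about whether the argument was written as a literal.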


1 Reply


You seem to assume that a necessary trait of a "true string literal" is that the compiler bakes it into the static storage of the executable.

This is not actually true. The C and C++ standards guarantee us that a string literal shall have static storage duration, so it must exist for the life of the program, but if a compiler can arrange this without placing the literal in static storage, it is free to do so, and some compilers sometimes do.
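As a small sketch of what the standard does promise, the following shows the lifetime guarantee only. The function names are hypothetical. Nothing here depends on, or reveals, which section the literal ends up in.

```cpp
#include <cstring>

// Returning a pointer to a string literal is well defined: the literal
// has static storage duration, so it outlives the call.
const char* literal_text()
{
    return "Hello";
}

// Hypothetical check: the literal is still readable after the function
// that produced the pointer has returned.
bool literal_outlives_call()
{
    const char* p = literal_text();
    return std::strcmp(p, "Hello") == 0;
}
```

Doing the same with a pointer to a local `char` array would be undefined behaviour, which is the distinction the standard draws. Where the bytes physically live is left to the implementation.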

However, it's clear that the property you want to test, for a given string literal, is whether it is in fact in static storage. And since the language standards do not guarantee that it is, there can't be any solution to your problem founded solely on portable C/C++.

Whether a given string literal is in fact in static storage is the question of whether the address of the string literal lies within one of the address ranges that get assigned to linkage sections that qualify as static storage, in the nomenclature of your particular toolchain, when your program is built by that toolchain.

So the solution I suggest is that you enable your program to know the address ranges of those of its own linkage sections that qualify as static storage, and then it can test whether a given string literal is in static storage by obvious code.

Here is an illustration of this solution for a toy C++ project, prog, built with the GNU/Linux x86_64 toolchain (C++98 or better will do, and the approach is only slightly more fiddly for C). In this setting, we link in ELF format, and the linkage sections we will deem static storage are .bss (0-initialized static data), .rodata (read-only static data) and .data (read/write static data).

Here are our source files:

section_bounds.h

#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif

section_bounds.cpp

// Assign either placeholder or pre-defined values to 
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const 
    section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const 
    section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const 
    section_rodata_start = RODATA_START;
extern unsigned long const 
    section_rodata_size = RODATA_SIZE;
extern unsigned long const 
    section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const 
    section_data_start = DATA_START;
extern unsigned long const 
    section_data_size = DATA_SIZE;
extern unsigned long const 
    section_data_end = section_data_start + section_data_size;

cstr_storage_triage.h

#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H

// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);

#endif

cstr_storage_triage.cpp

#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>

using namespace std;

void cstr_storage_triage(const char *s)
{
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";
    } else if (addr >= section_data_start && addr < section_data_end) {
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }
}

main.cpp

// Demonstrate storage classification of various arrays of char 

#include "cstr_storage_triage.h"

static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";  

int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}

Here is our makefile:

.PHONY: all clean

SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp 
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map

ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
    -DBSS_START=$(BSS_START) \
    -DBSS_SIZE=$(BSS_SIZE) \
    -DRODATA_START=$(RODATA_START) \
    -DRODATA_SIZE=$(RODATA_SIZE) \
    -DDATA_START=$(DATA_START) \
    -DDATA_SIZE=$(DATA_SIZE)
endif

all: $(TARG)

clean:
    rm -f $(OBJS) $(MAP_FILE) $(TARG)

ifndef AGAIN
$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif

Here is what make looks like:

$ make
g++    -c -o main.o main.cpp
g++    -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++    -c -o section_bounds.o section_bounds.cpp
g++ -o prog  -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o 
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++  -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0
 -DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a
  -c -o section_bounds.o section_bounds.cpp
g++ -o prog  main.o cstr_storage_triage.o section_bounds.o

And lastly, what prog does:

$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data
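The output can be checked by hand against the -D values that make passed to the compiler. The sketch below does that arithmetic at compile time (C++11). The names `SectionRange`, `contains`, `Storage` and `classify` are hypothetical; the numbers are copied from the make and prog output shown above, and the test order mirrors cstr_storage_triage.

```cpp
#include <cstdint>

// Half-open address range [start, start + size), as in section_bounds.cpp.
struct SectionRange {
    std::uint64_t start;
    std::uint64_t size;
};

constexpr bool contains(SectionRange r, std::uint64_t addr)
{
    return addr >= r.start && addr < r.start + r.size;
}

// Start and size values copied from the -D options in the make output.
constexpr SectionRange kBss    = {0x6020c0, 0x118};
constexpr SectionRange kRodata = {0x400bf0, 0x120};
constexpr SectionRange kData   = {0x602070, 0x3a};

enum class Storage { Bss, Rodata, Data, Other };

// Same order of tests as cstr_storage_triage: .bss, .rodata, .data, else other.
constexpr Storage classify(std::uint64_t addr)
{
    return contains(kBss, addr)    ? Storage::Bss
         : contains(kRodata, addr) ? Storage::Rodata
         : contains(kData, addr)   ? Storage::Data
                                   : Storage::Other;
}

// Addresses copied from the prog output.
static_assert(classify(0x6021d1) == Storage::Bss, "in_bss");
static_assert(classify(0x400bf4) == Storage::Rodata, "in_rodata");
static_assert(classify(0x602090) == Storage::Data, "in_rwdata");
static_assert(classify(0x7fffa1b053a0) == Storage::Other, "on_stack");
static_assert(classify(0x400c0d) == Storage::Rodata, "the bare literal");
```

For example, .rodata spans 0x400bf0 up to but not including 0x400d10 (0x400bf0 + 0x120), so both 0x400bf4 and 0x400c0d fall inside it.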

If it's obvious how this works, you need read no further.

The program will compile and link even before we know the addresses and sizes of its static storage sections. It would need to, wouldn't it!? In that case, the global section_* variables that ought to hold these values all get built with place-holder values.

When make is run, the recipes:

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1

and

$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

are operative, because AGAIN is undefined. They tell make that in order to build prog it must first build the linker map file of prog, as per the second recipe, and then re-timestamp section_bounds.cpp. After that, make is to call itself again, with AGAIN defined as 1.

Executing the makefile again, with AGAIN defined, make now finds that it must compute all the variables:

BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE

For each static storage section S, it computes S_BOUNDS by grepping the linker map file for the line that reports the address and size of S. From that line, it assigns the 2nd word ( = the section address) to S_START, and the 3rd word ( = the size of the section) to S_SIZE. All the section delimiting values are then appended, via -D options, to the CPPFLAGS that will automatically be passed to compilations.
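The word-splitting that make does with $(word 2,...) and $(word 3,...) can be sketched in C++ as follows. This is an illustration only and not part of the build. The names `SectionBounds` and `parse_map_line` are hypothetical, and the sample line in the usage note assumes a map-file line laid out as name, address, size; its exact spacing is an assumption, while the address and size come from the make output above.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

struct SectionBounds {
    std::string name;           // word 1, e.g. ".rodata"
    unsigned long long start;   // word 2, the section address
    unsigned long long size;    // word 3, the section size
    bool ok;                    // false if the line did not parse
};

// Split a map-file line into whitespace-separated words and read the
// 2nd and 3rd words as hexadecimal numbers (a leading "0x" is accepted).
SectionBounds parse_map_line(const std::string& line)
{
    SectionBounds b = {"", 0, 0, false};
    std::string start_word, size_word;
    std::istringstream in(line);
    if (in >> b.name >> start_word >> size_word) {
        try {
            b.start = std::stoull(start_word, nullptr, 16);
            b.size = std::stoull(size_word, nullptr, 16);
            b.ok = true;
        } catch (const std::exception&) {
            // Not a hexadecimal number: leave ok == false.
        }
    }
    return b;
}
```

Calling `parse_map_line(".rodata  0x0000000000400bf0  0x120")` would yield a start of 0x400bf0 and a size of 0x120, the same values that appear as -DRODATA_START and -DRODATA_SIZE in the second compilation.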

Because AGAIN is defined, the operative recipe for $(TARG) is now the customary:

$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)

But we touched section_bounds.cpp in the parent make; so it has to be recompiled, and therefore prog has to be relinked. This time, when section_bounds.cpp is compiled, all the section-delimiting macros:

BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE

will have pre-defined values and will not assume their place-holder values.

And those predefined values will be correct because the second linkage adds no symbols to the linkage and removes none, and does not alter the size or storage class of any symbol. It just assigns different values to symbols that were present in the first linkage. Consequently, the addresses and sizes of the static storage sections will be unaltered and are now known to your program.

