c++ - Strings. TCHAR LPWCS LPCTSTR CString. Whats what here, simple quick

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

TCHAR szExeFileName[MAX_PATH]; 
GetModuleFileName(NULL, szExeFileName, MAX_PATH);

CString tmp;
lstrcpy(szExeFileName, tmp);
CString out;
out.Format("\nInstall32 at %s\n", tmp);
TRACE(tmp);

Error (At the Format):

error C2664: 'void ATL::CStringT<BaseType,StringTraits>::Format(const wchar_t 
*,...)' : cannot convert parameter 1 from 'const char [15]' to 'const wchar_t

I'd just like to get the path that this program was launched from and copy it into a CString so I can use it elsewhere. For now I am just trying to see the path by TRACE'ing it out. But strings, chars, char arrays: I can never get them all straight. Could someone give me a pointer?

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:56:08+0000

The accepted answer addresses the problem. But the question also asked for a better understanding of the differences among all the character types on Windows.

Encodings

A char on Windows (and virtually all other systems) is a single byte. A byte is typically interpreted as either an unsigned value [0..255] or a signed value [-128..127]. (Older C++ standards guarantee a signed range of only [-127..127], but most implementations give [-128..127]. C++20 requires two's complement, which guarantees the larger range.)
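Here's a minimal sketch of those facts in portable C++. The names kPlainCharIsSigned and to_unsigned_byte are mine, purely for illustration. Whether plain char is signed is up to the compiler, so code that treats a char as a byte value usually casts through unsigned char first.

```cpp
#include <climits>
#include <type_traits>

// sizeof(char) is 1 by definition; CHAR_BIT says how many bits that byte has.
static_assert(sizeof(char) == 1, "char is always one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Implementation-defined: true on some compilers/targets, false on others.
constexpr bool kPlainCharIsSigned = std::is_signed<char>::value;

// Hypothetical helper: view a char as a byte value in [0..255],
// regardless of whether plain char is signed.
constexpr unsigned to_unsigned_byte(char c) {
    return static_cast<unsigned char>(c);
}
```

With this, to_unsigned_byte gives the same answer for a byte like 150 whether the compiler stores it as 150 or as -106.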

ASCII is a character mapping for values in the range [0..127] to particular characters, so you can store an ASCII character in either a signed byte or an unsigned byte, and thus it will always fit in a char.

But ASCII doesn't have all the characters necessary for most languages, so the character sets were often extended by using the rest of the values available in a byte to represent the additional characters needed for certain languages (or families of languages). So, while the values [0..127] almost always mean the same thing, values like 150 can only be interpreted in the context of a particular encoding. For single-byte alphabets, these encodings are called code pages.
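To make the "150 depends on the code page" point concrete, here is a tiny hand-written lookup for that one byte in three encodings. The enum and function names are hypothetical, and this covers only a single byte; on Windows you would normally let MultiByteToWideChar do the real conversion with the right code page.

```cpp
// Hypothetical names; only byte 150 (0x96) is covered, as an illustration.
enum class CodePage { Windows1252, Cp437, Latin1 };

// Returns the Unicode code point that byte 150 stands for in each encoding.
constexpr char32_t decode_byte_150(CodePage cp) {
    return cp == CodePage::Windows1252 ? 0x2013   // EN DASH
         : cp == CodePage::Cp437       ? 0x00FB   // LATIN SMALL LETTER U WITH CIRCUMFLEX
         :                               0x0096;  // ISO-8859-1: a C1 control code
}
```

Same byte, three different answers, which is why a document in one of these encodings is unreadable if you guess the wrong code page.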

Code pages helped, but they didn't solve all the problems. You always had to know which code page a particular document used in order to interpret it correctly. Furthermore, you typically couldn't write a single document that used different languages.

Also, some languages have more than 256 characters, so there was no way to map one char to one character. This led to the development of multi-byte character encodings, where [0..127] is still ASCII, but some of the other values are "escapes" that mean you have to look at some number of following chars to figure out what character you really had. (It's best to think of multi-byte as variable-byte, as some characters require only one byte while others require two or more.) Multi-byte works, but it's a pain to code for.
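As a rough illustration of why multi-byte is a pain, here is a sketch that counts characters (not bytes) in a Shift-JIS string. Shift-JIS is my choice of example, not something the question mentions; its lead-byte ranges are 0x81-0x9F and 0xE0-0xFC. The function names are made up, and the code assumes C++14 or later for the constexpr loop.

```cpp
#include <cstddef>

// Lead bytes in Shift-JIS announce that one more byte belongs to this character.
constexpr bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters in a NUL-terminated Shift-JIS string.
// Sketch only: a lead byte at the very end of the string counts as one character.
constexpr std::size_t sjis_char_count(const char* s) {
    std::size_t count = 0;
    while (*s != '\0') {
        if (is_sjis_lead_byte(static_cast<unsigned char>(*s)) && s[1] != '\0') {
            s += 2;  // lead byte plus its trail byte form one character
        } else {
            s += 1;  // plain single-byte character
        }
        ++count;
    }
    return count;
}
```

For example, the four bytes 'A', 0x82, 0xA0, 'B' hold three characters, so strlen and the character count disagree. Every loop that walks such a string has to know about lead bytes.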

Meanwhile, memory was becoming more plentiful, so a bunch of organizations got together and created Unicode, with the goal of making a universal mapping of values to characters (for appropriately vague definitions of "characters"). Initially, it was believed that all characters (or at least all the ones anyone would ever use) would fit into 16-bit values, which was nice because you wouldn't have to deal with multi-byte encodings--you'd just use two bytes per character instead of one. About this time, Microsoft decided to adopt Unicode as the internal representation for text in Windows.

WCHAR

So Windows has a type called WCHAR, a two-byte value that represents a "Unicode" "character". I'm using quotation marks here because Unicode evolved past the original two-byte encoding, so what Windows calls "Unicode" isn't really Unicode today--it's actually a particular encoding of Unicode called UTF-16. And a "character" is not as simple a concept in Unicode as it was in ASCII, because, in some languages, characters combine or otherwise influence adjacent characters in interesting ways.
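The reason UTF-16 is not "one 16-bit value per character" is surrogate pairs: code points above U+FFFF take two 16-bit units. Below is a minimal sketch of the encoding step, using char16_t so it compiles anywhere. The struct and function names are hypothetical, and it does no validation of the input.

```cpp
// Hypothetical holder for the one or two UTF-16 code units of a code point.
struct Utf16Units {
    char16_t units[2];
    int count;
};

// Encodes one Unicode code point as UTF-16.
// Sketch only: assumes cp is a valid scalar value (not a surrogate, <= 0x10FFFF).
constexpr Utf16Units encode_utf16(char32_t cp) {
    return cp < 0x10000
        ? Utf16Units{{static_cast<char16_t>(cp), 0}, 1}
        : Utf16Units{{static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
                      static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))},
                     2};
}
```

So U+0041 is the single unit 0x0041, while U+1F600 becomes the pair 0xD83D 0xDE00, which occupies two WCHARs on Windows.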

Newer versions of Windows used these 16-bit WCHAR values for text internally, but there was a lot of code out there still written for single-byte code pages, and even some for multi-byte encodings. Those programs still used chars rather than WCHARs. And many of these programs had to work with people using older versions of Windows that still used chars internally as well as newer ones that use WCHAR. So a technique using C macros and typedefs was devised so that you could mostly write your code one way and--at compile time--choose to have it use either char or WCHAR.

TCHAR

To accomplish this flexibility, you use a TCHAR for a "text character". In some header file (often <tchar.h>), TCHAR would be typedef'ed to either char or WCHAR, depending on the compile time environment. Windows headers adopted conventions like this:

LPTSTR is a (long) pointer to a string of TCHARs.
LPWSTR is a (long) pointer to a string of WCHARs.
LPSTR is a (long) pointer to a string of chars.

(The L for "long" is a leftover from 16-bit days, when we had long, far, and near pointers. Those are all obsolete today, but the L prefix tends to remain.)
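A simplified sketch of how such typedefs can be set up follows. The DEMO_ names and the DEMO_UNICODE switch are stand-ins I made up so this compiles without the Windows headers; the real definitions live in the SDK and differ in detail. I've added the const flavour too, since LPCTSTR from the question title is just LPTSTR with a C for "const".

```cpp
#include <type_traits>

// DEMO_UNICODE is a hypothetical switch standing in for the real build setting.
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#else
typedef char DEMO_TCHAR;
#endif

typedef char*             DEMO_LPSTR;    // pointer to a string of chars
typedef wchar_t*          DEMO_LPWSTR;   // pointer to a string of wide chars
typedef DEMO_TCHAR*       DEMO_LPTSTR;   // follows whichever DEMO_TCHAR is
typedef const DEMO_TCHAR* DEMO_LPCTSTR;  // same, but pointing at const text

// Whichever mode is chosen, the T flavour is identical to exactly one of the others.
static_assert(std::is_same<DEMO_LPTSTR, DEMO_LPSTR>::value ||
              std::is_same<DEMO_LPTSTR, DEMO_LPWSTR>::value,
              "LPTSTR is either LPSTR or LPWSTR");
```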

Most of the Windows API functions that take and return strings were actually replaced with two versions: the A version (for "ANSI" characters) and the W version (for wide characters). (Again, historical legacy shows in these. The code pages scheme was often called ANSI code pages, though I've never been clear if they were actually ruled by ANSI standards.)

So when you call a Windows API like this:

SetWindowText(hwnd, lptszTitle);

what you're really doing is invoking a preprocessor macro that expands to either SetWindowTextA or SetWindowTextW. It should be consistent with however TCHAR is defined. That is, if you want strings of chars, you'll get the A version, and if you want strings of WCHARs, you get the W version.
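The mechanism can be sketched with a made-up function pair. DemoLengthA, DemoLengthW, and DEMO_UNICODE are hypothetical, not real Windows names; the point is only that the unsuffixed name is a macro that picks one of two real functions at compile time.

```cpp
#include <cstddef>

// Two real functions, one per character type (hypothetical stand-ins for an A/W pair).
constexpr std::size_t DemoLengthA(const char* s) {
    return *s ? 1 + DemoLengthA(s + 1) : 0;
}
constexpr std::size_t DemoLengthW(const wchar_t* s) {
    return *s ? 1 + DemoLengthW(s + 1) : 0;
}

// The name most code uses is only a macro that resolves to one of them.
#ifdef DEMO_UNICODE
#define DemoLength DemoLengthW
#else
#define DemoLength DemoLengthA
#endif
```

Because the choice is made by the preprocessor, the compiler only ever sees the A or W name, which is why error messages mention the suffixed function rather than the one you typed.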

But it's a little more complicated because of string literals. If you write this:

SetWindowText(hwnd, "Hello World");  // works only in "ANSI" mode

then that will only compile if you're targeting the char version, because "Hello World" is a string of chars, so it's only compatible with the SetWindowTextA version. If you wanted the WCHAR version, you'd have to write:

SetWindowText(hwnd, L"Hello World");  // only works in "Unicode" mode

The L here means you want wide characters. (The L actually stands for long, but it's a different sense of long than the long pointers above.) When the compiler sees the L prefix on the string, it knows that string should be encoded as a series of wchar_ts rather than chars.

(Compilers targeting Windows use a two-byte value for wchar_t, which happens to be identical to how Windows defines WCHAR. Compilers targeting other systems often use a four-byte value for wchar_t, which is what it really takes to hold a single Unicode code point.)
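You can ask the compiler directly what each kind of literal is. This small portable check shows that the L prefix changes the element type of the array, and that the size of wchar_t depends on the platform. The constant name is mine.

```cpp
#include <cstddef>
#include <type_traits>

// A plain literal is an array of char; an L-prefixed one is an array of wchar_t.
static_assert(std::is_same<decltype("Hi"), const char(&)[3]>::value,
              "narrow literal: 2 characters + terminator");
static_assert(std::is_same<decltype(L"Hi"), const wchar_t(&)[3]>::value,
              "wide literal: 2 characters + terminator");

// Typically 2 on Windows compilers and 4 on most others.
constexpr std::size_t kWideCharBytes = sizeof(wchar_t);
```

That type mismatch is exactly what the C2664 error in the question is complaining about: a char array was passed where a const wchar_t pointer was expected.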

So if you want code that can compile either way, you need another macro to wrap the string literals. There are two to choose from: _T() and TEXT(). They work exactly the same way. The first comes from the compiler's library and the second from the OS's libraries. So you write your code like this:

SetWindowText(hwnd, TEXT("Hello World"));  // compiles in either mode

If you're targeting chars, the macro is a no-op that just returns the regular string literal. If you're targeting WCHARs, the macro prepends the L.
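Here is a simplified sketch of how such a macro can be written, again with hypothetical DEMO_ names and a DEMO_UNICODE switch rather than the real headers. The extra macro level is there so that an argument which is itself a macro gets expanded before the L is pasted on.

```cpp
// DEMO_UNICODE is a hypothetical switch standing in for the real build setting.
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT_IMPL(s) L##s   // paste an L prefix onto the literal
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT_IMPL(s) s      // no-op: leave the literal alone
#endif

// Indirection so macro arguments are expanded before token pasting.
#define DEMO_TEXT(s) DEMO_TEXT_IMPL(s)

// Compiles in either mode because the literal always matches DEMO_TCHAR.
constexpr const DEMO_TCHAR* kGreeting = DEMO_TEXT("Hello World");
```

Applied to the question, wrapping the format string in _T() or TEXT() is what makes a call like CString::Format accept it in both build modes.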

So how do you tell the compiler that you want to target WCHAR? You define UNICODE and _UNICODE. The former is for the Windows APIs and the latter is for the compiler libraries. Make sure you never define one without the other.
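If you want the compiler to enforce the "never one without the other" rule, a guard like the following in a common header is one option. This is a suggestion of mine rather than anything the Windows headers require, and kWideBuild is a made-up name.

```cpp
// Fail the build if only one of the two macros is defined.
#if defined(UNICODE) != defined(_UNICODE)
#error "Define both UNICODE and _UNICODE, or neither."
#endif

// Past the guard, checking either macro tells you the mode.
#if defined(UNICODE)
constexpr bool kWideBuild = true;   // TCHAR-style code targets WCHAR
#else
constexpr bool kWideBuild = false;  // TCHAR-style code targets char
#endif
```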
