<wchar.h>
Wide Character HandlingFunction | Description |
---|---|
btowc() |
Convert a single byte character to a wide character |
fgetwc() |
Get a wide character from a wide stream |
fgetws() |
Read a wide string from a wide stream |
fputwc() |
Write a wide character to a wide stream |
fputws() |
Write a wide string to a wide stream |
fwide() |
Get or set the orientation of the stream |
fwprintf() |
Formatted wide output to a wide stream |
fwscanf() |
Formatted wide input from a wide stream |
getwchar() |
Get a wide character from stdin |
getwc() |
Get a wide character from stdin |
mbrlen() |
Compute the number of bytes in a multibyte character restartably |
mbrtowc() |
Convert multibyte to wide characters restartably |
mbsinit() |
Test if an mbstate_t is in the initial conversion state |
mbsrtowcs() |
Convert a multibyte string to a wide character string restartably |
putwchar() |
Write a wide character to stdout |
putwc() |
Write a wide character to stdout |
swprintf() |
Formatted wide output to a wide string |
swscanf() |
Formatted wide input from a wide string |
ungetwc() |
Pushes a wide character back into the input stream |
vfwprintf() |
Variadic formatted wide output to a wide stream |
vfwscanf() |
Variadic formatted wide input from a wide stream |
vswprintf() |
Variadic formatted wide output to a wide string |
vswscanf() |
Variadic formatted wide input from a wide string |
vwprintf() |
Variadic formatted wide output |
vwscanf() |
Variadic formatted wide input |
wcscat() |
Concatenate wide strings dangerously |
wcschr() |
Find a wide character in a wide string |
wcscmp() |
Compare wide strings |
wcscoll() |
Compare two wide strings accounting for locale |
wcscpy() |
Copy a wide string dangerously |
wcscspn() |
Count characters not from a start at the front of a wide string |
wcsftime() |
Formatted date and time output |
wcslen() |
Returns the length of a wide string |
wcsncat() |
Concatenate wide strings more safely |
wcsncmp() |
Compare wide strings, length limited |
wcsncpy() |
Copy a wide string more safely |
wcspbrk() |
Search a wide string for one of a set of wide characters |
wcsrchr() |
Find a wide character in a wide string from the end |
wcsrtombs() |
Convert a wide character string to a multibyte string restartably |
wcsspn() |
Count characters from a set at the front of a wide string |
wcsstr() |
Find a wide string in another wide string |
wcstod() |
Convert a wide string to a double |
wcstof() |
Convert a wide string to a float |
wcstok() |
Tokenize a wide string |
wcstold() |
Convert a wide string to a long double |
wcstoll() |
Convert a wide string to a long long |
wcstol() |
Convert a wide string to a long |
wcstoull() |
Convert a wide string to an unsigned long long |
wcstoul() |
Convert a wide string to an unsigned long |
wcsxfrm() |
Transform a wide string for comparing based on locale |
wctob() |
Convert a wide character to a single byte character |
wctombr() |
Convert wide to multibyte characters restartably |
wmemcmp() |
Compare wide characters in memory |
wmemcpy() |
Copy wide character memory |
wmemmove() |
Copy wide character memory, potentially overlapping |
wprintf() |
Formatted wide output |
wscanf() |
Formatted wide input |
These are the wide character variants of the functions found in <stdio.h>
.
Remember that you can’t mix-and-match multibyte output functions (like printf()
) with wide character output functions (like wprintf()
). The output stream has an orientation to either multibyte or wide that gets set on the first I/O call to that stream. (Or it can be set with fwide()
.)
So choose one or the other and stick with it.
And you can specify wide character constants and string literals by prefixing L
to the front of it:
wchar_t *s = L"Hello, world!";
wchar_t c = L'B';
This header also introduces a type wint_t
that is used by the character I/O functions. It’s a type that can hold any single wide character, but also the macro WEOF
to indicate wide end-of-file.
Finally, a note on the “restartable” functions that are included here. When conversion is happening, some encodings require C to keep track of some state about the progress of the conversion so far.
For a lot of the functions, C uses an internal variable for the state that is shared between function calls. The problem is if you’re writing multithreaded code, this state might get trampled by other threads.
To avoid this, each thread needs to maintain its own state in a variable of the opaque type mbstate_t
. And the “restartable” functions allow you to pass in this state so that each thread can use their own.
wprintf()
, fwprintf()
, swprintf()
Formatted output with a wide string
#include <stdio.h> // For fwprintf()
#include <wchar.h>
int wprintf(const wchar_t * restrict format, ...);
int fwprintf(FILE * restrict stream, const wchar_t * restrict format, ...);
int swprintf(wchar_t * restrict s, size_t n,
const wchar_t * restrict format, ...);
These are the wide versions of printf()
, fprintf()](#man-printf), and [
sprintf()`.
See those pages for exact substantial usage.
These are the same except the format
string is a wide character string instead of a multibyte string.
And that swprintf()
is analogous to snprintf()
in that they both take the size of the destination array as an argument.
And one more thing: the precision specified for a %s
specifier corresponds to the number of wide characters printed, not the number of bytes. If you know of other difference, let me know.
Returns the number of wide characters outputted, or -1
if there’s an error.
#include <stdio.h>
#include <wchar.h>
int main(void)
{
char *mbs = "multibyte";
wchar_t *ws = L"wide";
wprintf(L"We're all wide for %s and %ls!\n", mbs, ws);
double pi = 3.14159265358979;
wprintf(L"pi = %f\n", pi);
}
Output:
We're all wide for multibyte and wide! pi = 3.141593
wscanf()
fwscanf()
swscanf()
Scan a wide stream or wide string for formatted input
#include <stdio.h> // for fwscanf()
#include <wchar.h>
int wscanf(const wchar_t * restrict format, ...);
int fwscanf(FILE * restrict stream, const wchar_t * restrict format, ...);
int swscanf(const wchar_t * restrict s, const wchar_t * restrict format, ...);
These are the wide variants of scanf()
, fscanf()
, and sscanf()
.
See the scanf()
page for all the details.
Returns the number of items successfully scanned, or EOF
on some kind of input failure.
#include <stdio.h>
#include <wchar.h>
int main(void)
{
int quantity;
wchar_t item[100];
wprintf(L"Enter \"quantity: item\"\n");
if (wscanf(L"%d:%99ls", &quantity, item) != 2)
wprintf(L"Malformed input!\n");
else
wprintf(L"You entered: %d %ls\n", quantity, item);
}
Output (input of 12: apples
):
Enter "quantity: item"
12: apples You entered: 12 apples
vwprintf()
vfwprintf()
vswprintf()
wprintf()
variants using variable argument lists (va_list
)
#include <stdio.h> // For vfwprintf()
#include <stdarg.h>
#include <wchar.h>
int vwprintf(const wchar_t * restrict format, va_list arg);
int vswprintf(wchar_t * restrict s, size_t n,
const wchar_t * restrict format, va_list arg);
int vfwprintf(FILE * restrict stream, const wchar_t * restrict format,
va_list arg);
These functions are the wide character variants of the vprintf()
, functions. You can refer to that reference page for more details.
Returns the number of wide characters stored, or a negative value on error.
In this example, we make our own version of wprintf()
called wlogger()
that timestamps output. Notice how the calls to wlogger()
have all the bells and whistles of wprintf()
.
#include <stdarg.h>
#include <wchar.h>
#include <time.h>
int wlogger(wchar_t *format, ...)
{
va_list va;
time_t now_secs = time(NULL);
struct tm *now = gmtime(&now_secs);
// Output timestamp in format "YYYY-MM-DD hh:mm:ss : "
wprintf(L"%04d-%02d-%02d %02d:%02d:%02d : ",
now->tm_year + 1900, now->tm_mon + 1, now->tm_mday,
now->tm_hour, now->tm_min, now->tm_sec);
va_start(va, format);
int result = vwprintf(format, va);
va_end(va);
wprintf(L"\n");
return result;
}
int main(void)
{
int x = 12;
float y = 3.2;
wlogger(L"Hello!");
wlogger(L"x = %d and y = %.2f", x, y);
}
Output:
2021-03-30 04:25:49 : Hello! 2021-03-30 04:25:49 : x = 12 and y = 3.20
vwscanf()
, vfwscanf()
, vswscanf()
wscanf()
variants using variable argument lists (va_list
)
#include <stdio.h> // For vfwscanf()
#include <stdarg.h>
#include <wchar.h>
int vwscanf(const wchar_t * restrict format, va_list arg);
int vfwscanf(FILE * restrict stream, const wchar_t * restrict format,
va_list arg);
int vswscanf(const wchar_t * restrict s, const wchar_t * restrict format,
va_list arg);
These are the wide counterparts to the vscanf()
collection of functions. See their reference page for details.
Returns the number of items successfully scanned, or EOF
on some kind of input failure.
I have to admit I was wracking my brain to think of when you’d ever want to use this. The best example I could find was one on Stack Overflow77 that error-checks the return value from scanf()
against the expected. A variant of that is shown below.
#include <stdarg.h>
#include <wchar.h>
#include <assert.h>
int error_check_wscanf(int expected_count, wchar_t *format, ...)
{
va_list va;
va_start(va, format);
int count = vwscanf(format, va);
va_end(va);
// This line will crash the program if the condition is false:
assert(count == expected_count);
return count;
}
int main(void)
{
int a, b;
float c;
error_check_wscanf(3, L"%d, %d/%f", &a, &b, &c);
error_check_wscanf(2, L"%d", &a);
}
getwc()
fgetwc()
getwchar()
Get a wide character from an input stream
#include <stdio.h> // For getwc() and fgetwc()
#include <wchar.h>
wint_t getwchar(void);
wint_t getwc(FILE *stream);
wint_t fgetwc(FILE *stream);
These are the wide variants of fgetc()
.
fgetwc()
and getwc()
are identical except that getwc()
might be implemented as a macro and is allowed to evaluate stream
multiple times.
getwchar()
is identical to getwc()
with stream
set to stdin
.
I don’t know why you’d ever use getwc()
instead of fgetwc()
, but if anyone knows, drop me a line.
Returns the next wide character in the input stream. Return WEOF
on end-of-file or error.
If an I/O error occurs, the error flag is also set on the stream.
If an invalid byte sequence is encountered, errno
is set to ILSEQ
.
Reads all the characters from a file, outputting only the letter ’b’s it finds in the file:
#include <stdio.h>
#include <wchar.h>
int main(void)
{
FILE *fp;
wint_t c;
fp = fopen("datafile.txt", "r"); // error check this!
// this while-statement assigns into c, and then checks against EOF:
while((c = fgetc(fp)) != WEOF)
if (c == L'b')
fputwc(c, stdout);
fclose(fp);
}
fgetws()
Read a wide string from a file
#include <stdio.h>
#include <wchar.h>
wchar_t *fgetws(wchar_t * restrict s, int n, FILE * restrict stream);
This is the wide version of fgets()
. See its reference page for details.
A wide NUL
character is used to terminate the string.
Returns s
on success, or a NULL
pointer on end-of-file or error.
The following example reads lines from a file and prepends them with numbers:
#include <stdio.h>
#include <wchar.h>
#define BUF_SIZE 1024
int main(void)
{
FILE *fp;
wchar_t buf[BUF_SIZE];
fp = fopen("textfile.txt", "r"); // error check this!
int line_count = 0;
while ((fgetws(buf, BUF_SIZE, fp)) != NULL)
wprintf(L"%04d: %ls", ++line_count, buf);
fclose(fp);
}
Example output for a file with these lines in them (without the prepended numbers):
0001: line 1
0002: line 2
0003: something 0004: line 4
putwchar()
putwc()
fputwc()
Write a single wide character to the console or to a file
#include <stdio.h> // For putwc() and fputwc()
#include <wchar.h>
wint_t putwchar(wchar_t c);
wint_t putwc(wchar_t c, FILE *stream);
wint_t fputwc(wchar_t c, FILE *stream);
These are the wide character equivalents to the ‘fputc()’ group of functions. You can find more information ‘in that reference section’.
fputwc()
and putwc()
are identical except that putwc()
might be implemented as a macro and is allowed to evaluate stream
multiple times.
putwchar()
is identical to putwc()
with stream
set to stdin
.
I don’t know why you’d ever use putwc()
instead of fputwc()
, but if anyone knows, drop me a line.
Returns the wide character written, or WEOF
on error.
If it’s an I/O error, the error flag will be set for the stream.
If it’s an encoding error, errno
will be set to EILSEQ
.
Read all characters from a file, outputting only the letter ’b’s it finds in the file:
#include <stdio.h>
#include <wchar.h>
int main(void)
{
FILE *fp;
wint_t c;
fp = fopen("datafile.txt", "r"); // error check this!
// this while-statement assigns into c, and then checks against EOF:
while((c = fgetc(fp)) != WEOF)
if (c == L'b')
fputwc(c, stdout);
fclose(fp);
}
fputws()
Write a wide string to a file
#include <stdio.h>
#include <wchar.h>
int fputws(const wchar_t * restrict s, FILE * restrict stream);
This is the wide version of fputs()
.
Pass in a wide string and an output stream, and it will so be written.
Returns a non-negative value on success, or EOF
on error.
fwide()
Get or set the orientation of the stream
#include <stdio.h>
#include <wchar.h>
int fwide(FILE *stream, int mode);
Streams can be either wide-oriented (meaning the wide functions are in use) or byte-oriented (that the regular multibyte functions are in use). Or, before an orientation is chosen, unoriented.
There are two ways to set the orientation of an unoriented stream:
Implicitly: just use a function like printf()
(byte oriented) or wprintf()
(wide oriented), and the orientation will be set.
Explicitly: use this function to set it.
You can set the orientation for the stream by passing different numbers to mode
:
mode |
Description |
---|---|
0 |
Do not alter the orientation |
-1 |
Set stream to byte-oriented |
1 |
Set stream to wide-oriented |
(I said -1
and 1
there, but really it could be any positive or negative number.)
Most people choose the wide or byte functions (printf()
or wprintf()
) and just start using them and never use fwide()
to set the orientation.
And once the orientation is set, you can’t change it. So you can’t use fwide()
for that, either.
So what can you use it for?
You can test to see what orientation a stream is in by passing 0
as the mode
and checking the return value.
Returns greater than zero if the stream is wide-oriented.
Returns less than zero if the stream is byte-oriented.
Returns zero if the stream is unoriented.
Example setting to byte-oriented:
#include <stdio.h>
#include <wchar.h>
int main(void)
{
printf("Hello world!\n"); // Implicitly set to byte
int mode = fwide(stdout, 0);
printf("Stream is %s-oriented\n", mode < 0? "byte": "wide");
}
Output:
Hello world! Stream is byte-oriented
Example setting to wide-oriented:
#include <stdio.h>
#include <wchar.h>
int main(void)
{
wprintf(L"Hello world!\n"); // Implicitly set to wide
int mode = fwide(stdout, 0);
wprintf(L"Stream is %ls-oriented\n", mode < 0? L"byte": L"wide");
}
Output:
Hello world! Stream is wide-oriented
ungetwc()
Pushes a wide character back into the input stream
#include <stdio.h>
#include <wchar.h>
wint_t ungetwc(wint_t c, FILE *stream);
This is the wide character variant of ungetc()
.
It performs the reverse operation of fgetwc()
, pushing a character back on the input stream.
The spec guarantees you can do this one time in a row. You can probably do it more times, but it’s up to the implementation. If you do too many calls without an intervening read, an error could be returned.
Setting the file position discards any characters pushed by ungetwc()
without being subsequently read.
The end-of-file flag is cleared after a successful call.
Returns the value of the pushed character on success, or WEOF
on failure.
This example reads a piece of punctuation, then everything after it up to the next piece of punctuation. It returns the leading punctuation, and stores the rest in a string.
#include <stdio.h>
#include <wctype.h>
#include <wchar.h>
wint_t read_punctstring(FILE *fp, wchar_t *s)
{
wint_t origpunct, c;
origpunct = fgetwc(fp);
if (origpunct == WEOF) // return EOF on end-of-file
return WEOF;
while (c = fgetwc(fp), !iswpunct(c) && c != WEOF)
*s++ = c; // save it in the string
*s = L'\0'; // nul-terminate the string
// if we read punctuation last, ungetc it so we can fgetc it next
// time:
if (iswpunct(c))
ungetwc(c, fp);
return origpunct;
}
int main(void)
{
wchar_t s[128];
wint_t c;
while ((c = read_punctstring(stdin, s)) != WEOF) {
wprintf(L"%lc: %ls\n", c, s);
}
}
Sample Input:
!foo#bar*baz
Sample output:
!: foo
#: bar *: baz
wcstod()
wcstof()
wcstold()
Convert a wide string to a floating point number
#include <wchar.h>
double wcstod(const wchar_t * restrict nptr, wchar_t ** restrict endptr);
float wcstof(const wchar_t * restrict nptr, wchar_t ** restrict endptr);
long double wcstold(const wchar_t * restrict nptr, wchar_t ** restrict endptr);
These are the wide counterparts to the strtod()
family of functions. See their reference pages for details.
Returns the string converted to a floating point value.
Returns 0
if there’s no valid number in the string.
On overflow, returns an apporpriately-signed HUGE_VAL
, HUGE_VALF
. or HUGE_VALL
depending on the return type, and errno
is set to ERANGE
.
On underflow, returns a number no greater than the smallest normalized positive number, appropriately signed. The implemention might set errno
to ERANGE
.
#include <wchar.h>
int main(void)
{
wchar_t *inp = L" 123.4567beej";
wchar_t *badchar;
double val = wcstod(inp, &badchar);
wprintf(L"Converted string to %f\n", val);
wprintf(L"Encountered bad characters: %ls\n", badchar);
val = wcstod(L"987.654321beej", NULL);
wprintf(L"Ignoring bad chars: %f\n", val);
val = wcstod(L"11.2233", &badchar);
if (*badchar == L'\0')
wprintf(L"No bad chars: %f\n", val);
else
wprintf(L"Found bad chars: %f, %ls\n", val, badchar);
}
Output:
Converted string to 123.456700
Encountered bad characters: beej
Ignoring bad chars: 987.654321 No bad chars: 11.223300
wcstol()
wcstoll()
wcstoul()
wcstoull()
Convert a wide string to an integer value
#include <wchar.h>
long int wcstol(const wchar_t * restrict nptr,
wchar_t ** restrict endptr, int base);
long long int wcstoll(const wchar_t * restrict nptr,
wchar_t ** restrict endptr, int base);
unsigned long int wcstoul(const wchar_t * restrict nptr,
wchar_t ** restrict endptr, int base);
unsigned long long int wcstoull(const wchar_t * restrict nptr,
wchar_t ** restrict endptr, int base);
These are the wide counterparts to the strtol()
family of functions, so see their reference pages for the details.
Returns the integer value of the string.
If nothing can be found, 0
is returned.
If the result is out of range, the value returned is one of LONG_MIN
, LONG_MAX
, LLONG_MIN
, LLONG_MAX
, ULONG_MAX
or ULLONG_MAX
, as appropriate. And errno
is set to ERANGE
.
#include <wchar.h>
int main(void)
{
// All output in decimal (base 10)
wprintf(L"%ld\n", wcstol(L"123", NULL, 0)); // 123
wprintf(L"%ld\n", wcstol(L"123", NULL, 10)); // 123
wprintf(L"%ld\n", wcstol(L"101010", NULL, 2)); // binary, 42
wprintf(L"%ld\n", wcstol(L"123", NULL, 8)); // octal, 83
wprintf(L"%ld\n", wcstol(L"123", NULL, 16)); // hex, 291
wprintf(L"%ld\n", wcstol(L"0123", NULL, 0)); // octal, 83
wprintf(L"%ld\n", wcstol(L"0x123", NULL, 0)); // hex, 291
wchar_t *badchar;
long int x = wcstol(L" 1234beej", &badchar, 0);
wprintf(L"Value is %ld\n", x); // Value is 1234
wprintf(L"Bad chars at \"%ls\"\n", badchar); // Bad chars at "beej"
}
Output:
123
123
42
83
291
83
291
Value is 1234 Bad chars at "beej"
wcstod()
, strtol()
, errno
, wcstoimax()
, wcstoumax()
wcscpy()
wcsncpy()
Copy a wide string
#include <wchar.h>
wchar_t *wcscpy(wchar_t * restrict s1, const wchar_t * restrict s2);
wchar_t *wcsncpy(wchar_t * restrict s1,
const wchar_t * restrict s2, size_t n);
These are the wide versions of strcpy()
and strncpy()
.
They’ll copy a string up to a wide NUL. Or, in the case of the safer wcsncpy()
, until then or until n
wide characters are copied.
If the string in s1
is shorter than n
, wcsncpy()
will pad s2
with wide NUL characters until the n
th wide character is reached.
Even though wcsncpy()
is safer because it will never overrun the end of s2
(assuming you set n
correctly), it’s still unsafe a NUL is not found in s1
in the first n
characters. In that case, s2
will not be NUL-terminated. Always make sure n
is greater than the string length of s1
!
Returns s1
.
#include <wchar.h>
int main(void)
{
wchar_t *s1 = L"Hello!";
wchar_t s2[10];
wcsncpy(s2, s1, 10);
wprintf(L"\"%ls\"\n", s2); // "Hello!"
}
wmemcpy()
, wmemmove()
strcpy()
, strncpy()
wmemcpy()
wmemmove()
Copy wide characters
#include <wchar.h>
wchar_t *wmemcpy(wchar_t * restrict s1,
const wchar_t * restrict s2, size_t n);
wchar_t *wmemmove(wchar_t *s1, const wchar_t *s2, size_t n);
These are the wide versions of memcpy()
and memmove()
.
They copy n
wide characters from s2
to s1
.
They’re the same except that wmemmove()
is guaranteed to work with overlapping memory regions, and wmemcpy()
is not.
Both functions return the pointer s1
.
#include <wchar.h>
int main(void)
{
wchar_t s[100] = L"Goats";
wchar_t t[100];
wmemcpy(t, s, 6); // Copy non-overlapping memory
wmemmove(s + 2, s, 6); // Copy overlapping memory
wprintf(L"s is \"%ls\"\n", s);
wprintf(L"t is \"%ls\"\n", t);
}
Output:
s is "GoGoats" t is "Goats"
wcscpy()
, wcsncpy()
, memcpy()
, memmove()
wcscat()
wcsncat()
Concatenate wide strings
#include <wchar.h>
wchar_t *wcscat(wchar_t * restrict s1, const wchar_t * restrict s2);
wchar_t *wcsncat(wchar_t * restrict s1,
const wchar_t * restrict s2, size_t n);
These are the wide variants of strcat()
and strncat()
.
They concatenate s2
onto the end of s1
.
They’re the same except wcsncat()
gives you the option to limit the number of wide characters appended.
Note that wcsncat()
always adds a NUL terminator to the end, even if n
characters were appended. So be sure to leave room for that.
Both functions return the pointer s1
.
#include <wchar.h>
int main(void)
{
wchar_t dest[30] = L"Hello";
wchar_t *src = L", World!";
wchar_t numbers[] = L"12345678";
wprintf(L"dest before strcat: \"%ls\"\n", dest); // "Hello"
wcscat(dest, src);
wprintf(L"dest after strcat: \"%ls\"\n", dest); // "Hello, world!"
wcsncat(dest, numbers, 3); // strcat first 3 chars of numbers
wprintf(L"dest after strncat: \"%ls\"\n", dest); // "Hello, world!123"
}
wcscmp()
, wcsncmp()
, wmemcmp()
Compare wide strings or memory
#include <wchar.h>
int wcscmp(const wchar_t *s1, const wchar_t *s2);
int wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n);
int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n);
These are the wide variants of memcmp()
, strcmp()
, and strncmp()
.
wcscmp()
and wcsncmp()
both compare strings until a NUL character.
wcsncmp()
also has the additional restriction that it will only compare the first n
characters.
wmemcmp()
is like wcsncmp()
except it won’t stop at a NUL.
The comparison is done against the character value (which might (or might not) be its Unicode code point).
Returns zero if both regions are equal.
Returns a negative number if the region pointed to by s1
is less than s2
.
Returns a positive number if the region pointed to by s1
is greater than s2
.
#include <wchar.h>
int main(void)
{
wchar_t *s1 = L"Muffin";
wchar_t *s2 = L"Muffin Sandwich";
wchar_t *s3 = L"Muffin";
wprintf(L"%d\n", wcscmp(L"Biscuits", L"Kittens")); // <0 since 'B' < 'K'
wprintf(L"%d\n", wcscmp(L"Kittens", L"Biscuits")); // >0 since 'K' > 'B'
if (wcscmp(s1, s2) == 0)
wprintf(L"This won't get printed because the strings differ\n");
if (wcscmp(s1, s3) == 0)
wprintf(L"This will print because s1 and s3 are the same\n");
// this is a little weird...but if the strings are the same, it'll
// return zero, which can also be thought of as "false". Not-false
// is "true", so (!wcscmp()) will be true if the strings are the
// same. yes, it's odd, but you see this all the time in the wild
// so you might as well get used to it:
if (!wcscmp(s1, s3))
wprintf(L"The strings are the same!\n");
if (!wcsncmp(s1, s2, 6))
wprintf(L"The first 6 characters of s1 and s2 are the same\n");
}
Output:
-1
1
This will print because s1 and s3 are the same
The strings are the same! The first 6 characters of s1 and s2 are the same
wcscoll()
, memcmp()
, strcmp()
, strncmp()
wcscoll()
Compare two wide strings accounting for locale
#include <wchar.h>
int wcscoll(const wchar_t *s1, const wchar_t *s2);
This is the wide version of strcoll()
. See that reference page
for details.
This is slower than wcscmp()
, so only use it if you need the locale-specific compare.
Returns zero if both regions are equal in this locale.
Returns a negative number if the region pointed to by s1
is less than s2
in this locale.
Returns a positive number if the region pointed to by s1
is greater than s2
in this locale.
#include <wchar.h>
#include <locale.h>
int main(void)
{
setlocale(LC_ALL, "");
// If your source character set doesn't support "é" in a string
// you can replace it with `\u00e9`, the Unicode code point
// for "é".
wprintf(L"%d\n", wcscmp(L"é", L"f")); // Reports é > f, yuck.
wprintf(L"%d\n", wcscoll(L"é", L"f")); // Reports é < f, yay!
}
wcscmp()
, wcsxfrm()
, strcoll()
wcsxfrm()
Transform a wide string for comparing based on locale
#include <wchar.h>
size_t wcsxfrm(wchar_t * restrict s1,
const wchar_t * restrict s2, size_t n);
This is the wide variant of strxfrm()
. See that reference page for details.
Returns the length of the transformed wide string in wide characters.
If the return value is greater than n
, all bets are off for the result in s1
.
#include <wchar.h>
#include <locale.h>
#include <stdlib.h>
// Transform a string for comparison, returning a malloc'd
// result
wchar_t *get_xfrm_str(wchar_t *s)
{
int len = wcsxfrm(NULL, s, 0) + 1;
wchar_t *d = malloc(len * sizeof(wchar_t));
wcsxfrm(d, s, len);
return d;
}
// Does half the work of a regular wcscoll() because the second
// string arrives already transformed.
int half_wcscoll(wchar_t *s1, wchar_t *s2_transformed)
{
wchar_t *s1_transformed = get_xfrm_str(s1);
int result = wcscmp(s1_transformed, s2_transformed);
free(s1_transformed);
return result;
}
int main(void)
{
setlocale(LC_ALL, "");
// Pre-transform the string to compare against
wchar_t *s = get_xfrm_str(L"éfg");
// Repeatedly compare against "éfg"
wprintf(L"%d\n", half_wcscoll(L"fgh", s)); // "fgh" > "éfg"
wprintf(L"%d\n", half_wcscoll(L"àbc", s)); // "àbc" < "éfg"
wprintf(L"%d\n", half_wcscoll(L"ĥij", s)); // "ĥij" > "éfg"
free(s);
}
Output:
1
-1 1
wcscmp()
, wcscoll()
, strxfrm()
wcschr()
wcsrchr()
Find a wide character in a wide string
#include <wchar.h>
wchar_t *wcschr(const wchar_t *s, wchar_t c);
wchar_t *wcsrchr(const wchar_t *s, wchar_t c);
wchar_t *wmemchr(const wchar_t *s, wchar_t c, size_t n);
These are the wide equivalents to strchr()
, strrchr()
, and memchr()
.
They search for wide characters in a wide string from the front (wcschr()
), the end (wcsrchr()
) or for an arbitrary number of wide characters (wmemchr()
).
All three functions return a pointer to the wide character found, or NULL
if the character, sadly, isn’t found.
#include <wchar.h>
int main(void)
{
// "Hello, world!"
// ^ ^ ^
// A B C
wchar_t *str = L"Hello, world!";
wchar_t *p;
p = wcschr(str, ','); // p now points at position A
p = wcsrchr(str, 'o'); // p now points at position B
p = wmemchr(str, '!', 13); // p now points at position C
// repeatedly find all occurrences of the letter 'B'
str = L"A BIG BROWN BAT BIT BEEJ";
for(p = wcschr(str, 'B'); p != NULL; p = wcschr(p + 1, 'B')) {
wprintf(L"Found a 'B' here: %ls\n", p);
}
}
Output:
Found a 'B' here: BIG BROWN BAT BIT BEEJ
Found a 'B' here: BROWN BAT BIT BEEJ
Found a 'B' here: BAT BIT BEEJ
Found a 'B' here: BIT BEEJ Found a 'B' here: BEEJ
wcsspn()
wcscspn()
Return the length of a wide string consisting entirely of a set of wide characters, or of not a set of wide characters
#include <wchar.h>
size_t wcsspn(const wchar_t *s1, const wchar_t *s2);
size_t wcscspn(const wchar_t *s1, const wchar_t *s2);
The are the wide character counterparts to [strspn()
] (#man-strspn)and strcspn()
.
They compute the length of the string pointed to by s1
consisting entirely of the characters found in s2
. Or, in the case of wcscspn()
, the characters not found in s2
.
The length of the string pointed to by s1
consisting solely of the characters in s2
(in the case of wcsspn()
) or of the characters not in s2
(in th ecase of wcscspn()
).
#include <wchar.h>
int main(void)
{
wchar_t str1[] = L"a banana";
wchar_t str2[] = L"the bolivian navy on maneuvers in the south pacific";
int n;
// how many letters in str1 until we reach something that's not a vowel?
n = wcsspn(str1, L"aeiou");
wprintf(L"%d\n", n); // n == 1, just "a"
// how many letters in str1 until we reach something that's not a, b,
// or space?
n = wcsspn(str1, L"ab ");
wprintf(L"%d\n", n); // n == 4, "a ba"
// how many letters in str2 before we get a "y"?
n = wcscspn(str2, L"y");
wprintf(L"%d\n", n); // n = 16, "the bolivian nav"
}
wcspbrk()
Search a wide string for one of a set of wide characters
#include <wchar.h>
wchar_t *wcspbrk(const wchar_t *s1, const wchar_t *s2);
This is the wide character variant of strpbrk()
.
It finds the first occurrance of any of a set of wide characters in a wide string.
Returns a pointer to the first character in the string s1
that exists in the string s2
.
Or NULL
if none of the characters in s2
can be found in s1
.
#include <wchar.h>
int main(void)
{
// p points here after wcspbrk
// v
wchar_t *s1 = L"Hello, world!";
wchar_t *s2 = L"dow!"; // Match any of these chars
wchar_t *p = wcspbrk(s1, s2); // p points to the o
wprintf(L"%ls\n", p); // "o, world!"
}
wcschr()
, wmemchr()
, strpbrk()
wcsstr()
Find a wide string in another wide string
#include <wchar.h>
wchar_t *wcsstr(const wchar_t *s1, const wchar_t *s2);
This is the wide variant of strstr()
.
It locates a substring in a string.
Returns a pointer to the location in s1
that contains s2
.
Or NULL
if s2
cannot be found in s1
.
#include <wchar.h>
int main(void)
{
wchar_t *str = L"The quick brown fox jumped over the lazy dogs.";
wchar_t *p;
p = wcsstr(str, L"lazy");
wprintf(L"%ls\n", p == NULL? L"null": p); // "lazy dogs."
// p is NULL after this, since the string "wombat" isn't in str:
p = wcsstr(str, L"wombat");
wprintf(L"%ls\n", p == NULL? L"null": p); // "null"
}
wcschr()
, wcsrchr()
, wcsspn()
, wcscspn()
, strstr()
wcstok()
Tokenize a wide string
#include <wchar.h>
wchar_t *wcstok(wchar_t * restrict s1, const wchar_t * restrict s2,
wchar_t ** restrict ptr);
This is the wide version of strtok()
.
And, like that one, it modifies the string s1
. So make a copy of it first if you want to preserve the original.
One key difference is that wcstok()
can be threadsafe because you pass in the pointer ptr
to the current state of the transformation. This gets initializers for you when s1
is initially passed in as non-NULL
. (Subsequent calls with a NULL
s1
cause the state to update.)
#include <wchar.h>
int main(void)
{
// break up the string into a series of space or
// punctuation-separated words
wchar_t str[] = L"Where is my bacon, dude?";
wchar_t *token;
wchar_t *state;
// Note that the following if-do-while construct is very very
// very very very common to see when using strtok().
// grab the first token (making sure there is a first token!)
if ((token = wcstok(str, L".,?! ", &state)) != NULL) {
do {
wprintf(L"Word: \"%ls\"\n", token);
// now, the while continuation condition grabs the
// next token (by passing NULL as the first param)
// and continues if the token's not NULL:
} while ((token = wcstok(NULL, L".,?! ", &state)) != NULL);
}
}
Output:
Word: "Where"
Word: "is"
Word: "my"
Word: "bacon" Word: "dude"
wcslen()
Returns the length of a wide string
#include <wchar.h>
size_t wcslen(const wchar_t *s);
This is the wide counterpart to strlen()
.
Returns the number of wide characters before the wide NUL terminator.
#include <wchar.h>
int main(void)
{
wchar_t *s = L"Hello, world!"; // 13 characters
// prints "The string is 13 characters long.":
wprintf(L"The string is %zu characters long.\n", wcslen(s));
}
wcsftime()
Formatted date and time output
#include <time.h>
#include <wchar.h>
size_t wcsftime(wchar_t * restrict s, size_t maxsize,
const wchar_t * restrict format,
const struct tm * restrict timeptr);
This is the wide equivalent to strftime()
. See that reference page for details.
maxsize
here refers to the maximum number of wide characters that can be in the result string.
If successful, returns the number of wide characters written.
If not successful because the result couldn’t fit in the space alloted, 0
is returned and the contents of the string could be anything.
#include <wchar.h>
#include <time.h>
#define BUFSIZE 128
int main(void)
{
wchar_t s[BUFSIZE];
time_t now = time(NULL);
// %c: print date as per current locale
wcsftime(s, BUFSIZE, L"%c", localtime(&now));
wprintf(L"%ls\n", s); // Sun Feb 28 22:29:00 2021
// %A: full weekday name
// %B: full month name
// %d: day of the month
wcsftime(s, BUFSIZE, L"%A, %B %d", localtime(&now));
wprintf(L"%ls\n", s); // Sunday, February 28
// %I: hour (12 hour clock)
// %M: minute
// %S: second
// %p: AM or PM
wcsftime(s, BUFSIZE, L"It's %I:%M:%S %p", localtime(&now));
wprintf(L"%ls\n", s); // It's 10:29:00 PM
// %F: ISO 8601 yyyy-mm-dd
// %T: ISO 8601 hh:mm:ss
// %z: ISO 8601 time zone offset
wcsftime(s, BUFSIZE, L"ISO 8601: %FT%T%z", localtime(&now));
wprintf(L"%ls\n", s); // ISO 8601: 2021-02-28T22:29:00-0800
}
btowc()
wctob()
Convert a single byte character to a wide character
#include <wchar.h>
wint_t btowc(int c);
int wctob(wint_t c);
These functions convert between single byte characters and wide characters, and vice-versa.
Even though int
s are involved, don’t let this mislead you; they’re effectively converted to unsigned char
s internally.
The characters in the basic character set are guaranteed to be a single byte.
btowc()
returns the single-byte character as a wide character. Returns WEOF
if EOF
is passed in, or if the byte doesn’t correspond to a valid wide character.
wctob()
returns the wide character as a single-byte character. Returns EOF
if WEOF
is passed in, or if the wide character doesn’t correspond to a value single-byte character.
See mbtowc()
and wctomb()
for multibyte to wide character conversion.
#include <wchar.h>
int main(void)
{
wint_t wc = btowc('B'); // Convert single byte to wide char
wprintf(L"Wide character: %lc\n", wc);
unsigned char c = wctob(wc); // Convert back to single byte
wprintf(L"Single-byte character: %c\n", c);
}
Output:
Wide character: B Single-byte character: B
mbsinit()
Test if an mbstate_t
is in the initial conversion state
#include <wchar.h>
int mbsinit(const mbstate_t *ps);
For a given conversion state in a mbstate_t
variable, this function determines if it’s in the initial conversion state.
Returns non-zero if the value pointed to by ps
is in the initial conversion state, or if ps
is NULL
.
Returns 0
if the value pointed to by ps
is not in the initial conversion state.
For me, this example doesn’t do anything exciting, saying that the mbstate_t
variable is always in the initial state. Yay.
But if have a stateful encoding like 2022-JP, try messing around with this to see if you can get into an intermediate state.
This program has a bit of code at the top that reports if your locale’s encoding requires any state.
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <stdlib.h> // For mbtowc()
#include <wchar.h>
int main(void)
{
mbstate_t state;
wchar_t wc[128];
setlocale(LC_ALL, "");
int is_state_dependent = mbtowc(NULL, NULL, 0);
wprintf(L"Is encoding state dependent? %d\n", is_state_dependent);
memset(&state, 0, sizeof state); // Set to initial state
wprintf(L"In initial conversion state? %d\n", mbsinit(&state));
mbrtowc(wc, "B", 5, &state);
wprintf(L"In initial conversion state? %d\n", mbsinit(&state));
}
mbtowc()
, wctomb()
, mbrtowc()
, wcrtomb()
mbrlen()
Compute the number of bytes in a multibyte character, restartably
#include <wchar.h>
size_t mbrlen(const char * restrict s, size_t n, mbstate_t * restrict ps);
This is the restartable version of mblen()
.
It inspects at most n
bytes of the string s
to see how many bytes in this character.
The conversion state is stored in ps
.
This function doesn’t have the functionality of mblen()
that allowed you to query if this character encoding was stateful and to reset the internal state.
Returns the number of bytes required for this multibyte character.
Returns (size_t)(-1)
if the data in s
is not a valid multibyte character.
Returns (size_t)(-2)
if the data is s
is a valid but not complete multibyte character.
If your character set doesn’t support the Euro symbol “€”, substitute the Unicode escape sequence \u20ac
, below.
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <wchar.h>
int main(void)
{
mbstate_t state;
int len;
setlocale(LC_ALL, "");
memset(&state, 0, sizeof state); // Set to initial state
len = mbrlen("B", 5, &state);
wprintf(L"Length of 'B' is %d byte(s)\n", len);
len = mbrlen("€", 5, &state);
wprintf(L"Length of '€' is %d byte(s)\n", len);
}
Output:
Length of 'B' is 1 byte(s) Length of '€' is 3 byte(s)
mbrtowc()
Convert multibyte to wide characters restartably
#include <wchar.h>
size_t mbrtowc(wchar_t * restrict pwc, const char * restrict s,
size_t n, mbstate_t * restrict ps);
This is the restartable counterpart to mbtowc()
.
It converts individual characters from multibyte to wide, tracking the conversion state in the variable pointed to by ps
.
At most n
bytes are inspected for conversion to a wide character.
These two variants are identical and cause the state pointed to by ps
to be set to the initial conversion state:
(NULL, NULL, 0, &state);
mbrtowc(NULL, "", 1, &state); mbrtowc
Also, if you’re just interested in the length in bytes of the multibyte character, you can pass NULL
for pwc
and nothing will be stored for the wide character:
int len = mbrtowc(NULL, "€", 5, &state);
This function doesn’t have the functionality of mbtowc()
that allowed you to query if this character encoding was stateful and to reset the internal state.
On success, returns a positive number corresponding to the number of bytes in the multibyte character.
Returns 0
if the character encoded is a wide NUL character.
Returns (size_t)(-1)
if the data in s
is not a valid multibyte character.
Returns (size_t)(-2)
if the data is s
is a valid but not complete multibyte character.
If your character set doesn’t support the Euro symbol “€”, substitute the Unicode escape sequence \u20ac
, below.
#include <string.h> // For memset()
#include <stdlib.h> // For mbtowc()
#include <locale.h> // For setlocale()
#include <wchar.h>
int main(void)
{
mbstate_t state;
memset(&state, 0, sizeof state);
setlocale(LC_ALL, "");
wprintf(L"State dependency: %d\n", mbtowc(NULL, NULL, 0));
wchar_t wc;
int bytes;
bytes = mbrtowc(&wc, "€", 5, &state);
wprintf(L"L'%lc' takes %d bytes as multibyte char '€'\n", wc, bytes);
}
Output on my system:
State dependency: 0 L'€' takes 3 bytes as multibyte char '€'
wcrtomb()
Convert wide to multibyte characters restartably
#include <wchar.h>
size_t wcrtomb(char * restrict s, wchar_t wc, mbstate_t * restrict ps);
This is the restartable counterpart to wctomb()
.
It converts individual characters from wide to multibyte, tracking the conversion state in the variable pointed to by ps
.
The destination array s
should be at least MB_CUR_MAX
78 bytes in size—you won’t get anything bigger back from this function.
Note that the values in this result array won’t be NUL-terminated.
If you pass a wide NUL character in, the result will contain any bytes needed to restore the conversion state to its initial state followed by a NUL character, and the state pointed to by ps
will be reset to its initial state:
// Reset state
(mb, L'\0', &state) wcrtomb
If you don’t care about the results (i.e. you’re just interested in resetting the state or getting the return value), you can do this by passing NULL
for s
:
(NULL, L'\0', &state); // Reset state
wcrtomb
int byte_count = wctomb(NULL, "X", &state); // Count bytes in 'X'
This function doesn’t have the functionality of wctomb()
that allowed you to query if this character encoding was stateful and to reset the internal state.
On success, returns the number of bytes needed to encode this wide character in the current locale.
If the input is an invalid wide character, errno
will be set to EILSEQ
and the function returns (size_t)(-1)
. If this happens, all bets are off for the conversion state, so you might as well reset it.
If your character set doesn’t support the Euro symbol “€”, substitute the Unicode escape sequence \u20ac
, below.
#include <string.h> // For memset()
#include <stdlib.h> // For mbtowc()
#include <locale.h> // For setlocale()
#include <wchar.h>
int main(void)
{
mbstate_t state;
memset(&state, 0, sizeof state);
setlocale(LC_ALL, "");
wprintf(L"State dependency: %d\n", mbtowc(NULL, NULL, 0));
char mb[10] = {0};
int bytes = wcrtomb(mb, L'€', &state);
wprintf(L"L'€' takes %d bytes as multibyte char '%s'\n", bytes, mb);
}
mbsrtowcs()
Convert a multibyte string to a wide character string restartably
#include <wchar.h>
size_t mbsrtowcs(wchar_t * restrict dst, const char ** restrict src,
size_t len, mbstate_t * restrict ps);
This is the restartable version of mbstowcs()
.
It converts a multibyte string to a wide character string.
The result is put in the buffer pointed to by dst
, and the pointer src
is updated to indicate how much of the string was consumed (unless dst
is NULL
).
At most len
wide characters will be stored.
This also takes a pointer to its own mbstate_t
variable in ps
for holding the conversion state.
You can set dst
to NULL
if you only care about the return value. This could be useful for getting the number of characters in a multibyte string.
In the normal case, the src
string will be consumed up to the NUL character, and the results will be stored in the dst
buffer, including the wide NUL character. In this case, the pointer pointed to by src
will be set to NULL
. And the conversion state will be set to the initial conversion state.
If things go wrong because the source string isn’t a valid sequence of characters, conversion will stop and the pointer pointed to by src
will be set to the address just after the last successfully-translated multibyte character.
If successful, returns the number of characters converted, not including any NUL terminator.
If the multibyte sequence is invalid, the function returns (size_t)(-1)
and errno
is set to EILSEQ
.
Here we’ll convert the string “€5 ± π” into a wide character string:
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <wchar.h>
#define WIDE_STR_SIZE 10
int main(void)
{
const char *mbs = "€5 ± π"; // That's the exact price range
wchar_t wcs[WIDE_STR_SIZE];
setlocale(LC_ALL, "");
mbstate_t state;
memset(&state, 0, sizeof state);
size_t count = mbsrtowcs(wcs, &mbs, WIDE_STR_SIZE, &state);
wprintf(L"Wide string L\"%ls\" is %d characters\n", wcs, count);
}
Output:
Wide string L"€5 ± π" is 6 characters
Here’s another example of using mbsrtowcs()
to get the length in characters of a multibyte string even if the string is full of multibyte characters. This is in contrast to strlen()
, which returns the total number of bytes in the string.
#include <stdio.h> // For printf()
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <stdint.h> // For SIZE_MAX
#include <wchar.h>
size_t mbstrlen(const char *mbs)
{
mbstate_t state;
memset(&state, 0, sizeof state);
return mbsrtowcs(NULL, &mbs, SIZE_MAX, &state);
}
int main(void)
{
setlocale(LC_ALL, "");
char *mbs = "€5 ± π"; // That's the exact price range
printf("\"%s\" is %zu characters...\n", mbs, mbstrlen(mbs));
printf("but it's %zu bytes!\n", strlen(mbs));
}
Output on my system:
"€5 ± π" is 6 characters... but it's 10 bytes!
mbrtowc()
, mbstowcs()
, wcsrtombs()
, strlen()
, errno
wcsrtombs()
Convert a wide character string to a multibyte string restartably
#include <wchar.h>
size_t wcsrtombs(char * restrict dst, const wchar_t ** restrict src,
size_t len, mbstate_t * restrict ps);
If you have a wide character string, you can convert it to a multibyte character string in the current locale using this function.
At most len
bytes of data will be stored in the buffer pointed to by dst
. Conversion will stop just after the NUL terminator is copied, or len
bytes get copied, or some other error occurs.
If dst
is a NULL
pointer, no result is stored. You might do this if you’re just interested in the return value (nominally the number of bytes this would use in a multibyte string, not including the NUL terminator).
If dst
is not a NULL
pointer, the pointer pointed to by src
will get modified to indicate how much of the data was copied. If it contains NULL
at the end, it means everything went well. In this case, the state ps
will be set to the initial conversion state.
If len
was reached or an error occurred, it’ll point one address past dst+len
.
If everything goes well, returns the number of bytes needed for the multibyte string, not counting the NUL terminator.
If any character in the string doesn’t correspond to a valid multibyte character in the currently locale, it returns (size_t)(-1)
and EILSEQ
is stored in errno
.
Here we’ll convert the wide string “€5 ± π” into a multibyte character string:
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <wchar.h>
#define MB_STR_SIZE 20
int main(void)
{
const wchar_t *wcs = L"€5 ± π"; // That's the exact price range
char mbs[MB_STR_SIZE];
setlocale(LC_ALL, "");
mbstate_t state;
memset(&state, 0, sizeof state);
size_t count = wcsrtombs(mbs, &wcs, MB_STR_SIZE, &state);
wprintf(L"Multibyte string \"%s\" is %d bytes\n", mbs, count);
}
Here’s another example helper function that malloc()
s just enough memory to hold the converted string, then returns the result. (Which must later be freed, of course, to prevent leaking memory.)
#include <stdlib.h> // For malloc()
#include <locale.h> // For setlocale()
#include <string.h> // For memset()
#include <stdint.h> // For SIZE_MAX
#include <wchar.h>
char *get_mb_string(const wchar_t *wcs)
{
setlocale(LC_ALL, "");
mbstate_t state;
memset(&state, 0, sizeof state);
// Need a copy of this because wcsrtombs changes it
const wchar_t *p = wcs;
// Compute the number of bytes needed to hold the result
size_t bytes_needed = wcsrtombs(NULL, &p, SIZE_MAX, &state);
// If we didn't get a good full conversion, forget it
if (bytes_needed == (size_t)(-1))
return NULL;
// Allocate space for result
char *mbs = malloc(bytes_needed + 1); // +1 for NUL terminator
// Set conversion state to initial state
memset(&state, 0, sizeof state);
// Convert and store result
wcsrtombs(mbs, &wcs, bytes_needed + 1, &state);
// Make sure things went well
if (wcs != NULL) {
free(mbs);
return NULL;
}
// Success!
return mbs;
}
int main(void)
{
char *mbs = get_mb_string(L"€5 ± π");
wprintf(L"Multibyte result: \"%s\"\n", mbs);
free(mbs);
}
wcrtomb()
, wcstombs()
, mbsrtowcs()
, errno