.deEX
.ift .ft5
.nf
..
.deEE
.ft1
.fi
..
.TH RUNE 3
.SH NAME
runetochar, chartorune, runelen, runenlen, fullrune, utfecpy, utflen, utfnlen, utfrune, utfrrune, utfutf \- rune/UTF conversion
.SH SYNOPSIS
.ta \w'\fLchar*xx'u
.B #include <utf.h>
.PP
.B
int runetochar(char *s, Rune *r)
.PP
.B
int chartorune(Rune *r, char *s)
.PP
.B
int runelen(long r)
.PP
.B
int runenlen(Rune *r, int n)
.PP
.B
int fullrune(char *s, int n)
.PP
.B
char* utfecpy(char *s1, char *es1, char *s2)
.PP
.B
int utflen(char *s)
.PP
.B
int utfnlen(char *s, long n)
.PP
.B
char* utfrune(char *s, long c)
.PP
.B
char* utfrrune(char *s, long c)
.PP
.B
char* utfutf(char *s1, char *s2)
.SH DESCRIPTION
These routines convert between a
.SM UTF
byte stream and runes.
.PP
.I Runetochar
copies one rune at
.I r
to at most
.B UTFmax
bytes starting at
.I s
and returns the number of bytes copied.
.BR UTFmax ,
defined as
.B 3
in
.BR <utf.h> ,
is the maximum number of bytes required to represent a rune.
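.PP
As an illustration of the encoding step, the following is a minimal sketch, not the library source.
It assumes a 16-bit rune and the three-byte limit stated above; the names
SketchRune, SketchUTFmax and sketch_runetochar are hypothetical and exist only for this example.
.EX
```c
/* Hypothetical stand-ins for this sketch; the real types come from <utf.h>. */
typedef unsigned short SketchRune;
enum { SketchUTFmax = 3 };

/* Encode one rune into s (room for SketchUTFmax bytes); return bytes written. */
int
sketch_runetochar(char *s, const SketchRune *r)
{
    unsigned long c = *r;

    if (c < 0x80) {                 /* 1 byte: 0xxxxxxx */
        s[0] = (char)c;
        return 1;
    }
    if (c < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        s[0] = (char)(0xC0 | (c >> 6));
        s[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    s[0] = (char)(0xE0 | (c >> 12));
    s[1] = (char)(0x80 | ((c >> 6) & 0x3F));
    s[2] = (char)(0x80 | (c & 0x3F));
    return 3;
}
```
.EE
.PP
With this sketch, the rune 0x20AC is written as the three bytes 0xE2 0x82 0xAC and the return value is 3.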
.PP
.I Chartorune
copies at most
.B UTFmax
bytes starting at
.I s
to one rune at
.I r
and returns the number of bytes copied.
If the input is not exactly in
.SM UTF
format,
.I chartorune
stores 0x80 at
.I r
and returns 1.
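.PP
The decoding rule, including the error case, might look like the sketch below.
It is an illustration under the assumption of a 16-bit rune and NUL-terminated input, not the library source;
SketchRune, SketchRuneerror and sketch_chartorune are hypothetical names.
.EX
```c
/* Hypothetical stand-ins for this sketch; the real types come from <utf.h>. */
typedef unsigned short SketchRune;
enum { SketchRuneerror = 0x80 };    /* value stored for malformed input */

/* Decode one rune from NUL-terminated s into *r; return bytes consumed. */
int
sketch_chartorune(SketchRune *r, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long c;

    if (p[0] < 0x80) {                          /* 1-byte sequence */
        *r = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
        c = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        if (c >= 0x80) {                        /* reject overlong forms */
            *r = (SketchRune)c;
            return 2;
        }
    } else if ((p[0] & 0xF0) == 0xE0
            && (p[1] & 0xC0) == 0x80
            && (p[2] & 0xC0) == 0x80) {
        c = ((unsigned long)(p[0] & 0x0F) << 12)
          | ((unsigned long)(p[1] & 0x3F) << 6)
          | (p[2] & 0x3F);
        if (c >= 0x800) {                       /* reject overlong forms */
            *r = (SketchRune)c;
            return 3;
        }
    }
    *r = SketchRuneerror;                       /* malformed: consume 1 byte */
    return 1;
}
```
.EE
.PP
With this sketch, the bytes 0xC3 0xA9 decode to rune 0xE9 with a return value of 2,
while a lone 0xFF byte yields 0x80 and a return value of 1.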
.PP
.I Runelen
returns the number of bytes
required to convert
.I r
into
.SM UTF.
.PP
.I Runenlen
returns the number of bytes
required to convert the
.I n
runes pointed to by
.I r
into
.SM UTF.
.PP
.I Fullrune
returns 1 if the string
.I s
of length
.I n
is long enough to be decoded by
.I chartorune
and 0 otherwise.
This does not guarantee that the string
contains a legal
.SM UTF
encoding.
This routine is used by programs that
obtain input a byte at
a time and need to know when a full rune
has arrived.
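.PP
The length check can be sketched as a test on the first byte alone.
This is an illustrative sketch assuming sequences of at most three bytes, not the library source;
sketch_fullrune is a hypothetical name.
.EX
```c
/* Return 1 if the n bytes at s are enough to attempt decoding one rune. */
int
sketch_fullrune(const char *s, int n)
{
    unsigned char c;

    if (n <= 0)
        return 0;               /* nothing buffered yet */
    c = (unsigned char)s[0];
    if (c < 0x80)
        return 1;               /* one-byte rune is always complete */
    if (c < 0xE0)
        return n >= 2;          /* bytes below 0xE0 need a second byte */
    return n >= 3;              /* otherwise wait for three bytes */
}
```
.EE
.PP
A caller reading one byte at a time would append each byte to a buffer
and attempt decoding only once the check returns 1.
As with the routine it imitates, a result of 1 says nothing about whether the bytes are well formed.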
.PP
The following routines are analogous to the
corresponding string routines with
.B utf
substituted for
.B str
and
.B rune
substituted for
.BR chr .
.PP
.I Utfecpy
copies UTF sequences until a null sequence has been copied, but writes no
sequences beyond
.IR es1 .
If any sequences are copied,
.I s1
is terminated by a null sequence, and a pointer to that sequence is returned.
Otherwise, the original
.I s1
is returned.
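.PP
One way to obtain a bounded copy that never splits a multi-byte sequence is sketched below.
It is an assumption-laden illustration, not the library source: it treats
.I es1
as one past the last writable byte, and sketch_utfecpy is a hypothetical name.
.EX
```c
/* Copy from into [to, e) without splitting a sequence; return the NUL. */
char *
sketch_utfecpy(char *to, char *e, const char *from)
{
    char *p = to;

    if (to >= e)
        return to;                  /* no room: nothing is copied */
    while (p < e && (*p = *from++) != 0)
        p++;
    if (p == e) {                   /* buffer filled before the NUL arrived */
        p = e - 1;
        /* If the cut lands on a continuation byte, back up to its lead. */
        while (p > to && ((unsigned char)*p & 0xC0) == 0x80)
            p--;
        *p = 0;
    }
    return p;
}
```
.EE
.PP
Copying a string that ends in a three-byte sequence into a buffer with room for only part of that sequence
drops the whole sequence in this sketch, so the destination always holds complete sequences followed by a NUL.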
.PP
.I Utflen
returns the number of runes that
are represented by the
.SM UTF
string
.IR s .
.PP
.I Utfnlen
returns the number of complete runes that
are represented by the first
.I n
bytes of
.SM UTF
string
.IR s .
If the last few bytes of the string contain an incompletely coded rune,
.I utfnlen
will not count them; in this way, it differs from
.IR utflen ,
which includes every byte of the string.
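.PP
The treatment of an incomplete trailing rune can be sketched as follows.
The sketch assumes well-formed input with sequences of at most three bytes and does not validate continuation bytes;
it is not the library source, and sketch_seqlen and sketch_utfnlen are hypothetical names.
.EX
```c
/* Sequence length implied by a first byte (1 for ASCII or stray bytes). */
static int
sketch_seqlen(unsigned char c)
{
    if (c < 0xC0)
        return 1;
    if (c < 0xE0)
        return 2;
    return 3;
}

/* Count complete runes in the first n bytes of s, stopping at a NUL. */
int
sketch_utfnlen(const char *s, long n)
{
    long i = 0;
    int count = 0;

    while (i < n && s[i] != 0) {
        int len = sketch_seqlen((unsigned char)s[i]);

        if (n - i < len)
            break;              /* incomplete trailing rune: not counted */
        i += len;
        count++;
    }
    return count;
}
```
.EE
.PP
For a string holding the letter a followed by the two-byte sequence 0xC3 0xA9,
this sketch returns 2 when n is 3 and 1 when n is 2, since the second rune is then cut short.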
.PP
.I Utfrune
.RI ( utfrrune )
returns a pointer to the first (last)
occurrence of rune
.I c
in the
.SM UTF
string
.IR s ,
or 0 if
.I c
does not occur in the string.
The NUL byte terminating a string is considered to
be part of the string
.IR s .
.PP
.I Utfutf
returns a pointer to the first occurrence of
the
.SM UTF
string
.I s2
as a
.SM UTF
substring of
.IR s1 ,
or 0 if there is none.
If
.I s2
is the null string,
.I utfutf
returns
.IR s1 .
.SH SOURCE
.B http://swtch.com/plan9port/unix
.SH SEE ALSO
.IR utf (7),
.IR tcs (1)