.TH SIMD-VITERBI 3
.SH NAME
create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk,
chainback_viterbi27, delete_viterbi27,
create_viterbi29, set_viterbi29_polynomial, init_viterbi29, update_viterbi29_blk,
chainback_viterbi29, delete_viterbi29,
create_viterbi39, set_viterbi39_polynomial, init_viterbi39, update_viterbi39_blk,
chainback_viterbi39, delete_viterbi39,
create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk,
chainback_viterbi615, delete_viterbi615 \- IA32 and PowerPC SIMD-assisted Viterbi decoders
.SH SYNOPSIS
.nf
.ft B
#include "fec.h"
void *create_viterbi27(int blocklen);
void set_viterbi27_polynomial(int polys[2]);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi29(int blocklen);
void set_viterbi29_polynomial(int polys[2]);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi39(int blocklen);
void set_viterbi39_polynomial(int polys[3]);
int init_viterbi39(void *vp,int starting_state);
int update_viterbi39_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi39(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi39(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi615(int blocklen);
void set_viterbi615_polynomial(int polys[6]);
int init_viterbi615(void *vp,int starting_state);
int update_viterbi615_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi615(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi615(void *vp);
.fi
.SH DESCRIPTION
These functions implement high-performance Viterbi decoders for four
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27"), a rate 1/2 k=9 code ("viterbi29"),
a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615").
The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available, to improve
decoding speed.

On the IA32 there are three different SIMD instruction sets. The first
and most common is MMX, introduced on later Intel Pentiums and then on
the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc.). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most
recently, SSE2 was introduced in the Intel Pentium 4, and has been
adopted by more recent AMD CPUs. The presence of SSE2 implies the
existence of SSE, which in turn implies MMX.

Altivec is the PowerPC SIMD instruction set. It is roughly comparable
to SSE2. Altivec was introduced to the general public in the Apple
Macintosh G4; it is also present in the G5. Altivec is actually a
Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it
"VMX". All three names refer to the same thing.

When built for the IA32 or PPC architectures, the functions
automatically use the most powerful SIMD instruction set available. If
no SIMD instructions are available, or if the library is built for a
non-IA32, non-PPC machine, a portable C version is executed
instead.

.SH USAGE
Four versions of each function are provided, one for each code.
In the following discussion, change "viterbi" to "viterbi27", "viterbi29", "viterbi39"
or "viterbi615" as desired.

Before Viterbi decoding can begin, an instance must first be created with
\fBcreate_viterbi()\fR. This function creates and returns a pointer to
an internal control structure containing the path metrics and the branch
decisions. \fBcreate_viterbi()\fR takes one argument that gives the
length of the data block in bits. You \fImust not\fR attempt to
decode a block longer than the length given to \fBcreate_viterbi()\fR.

Before decoding a new frame,
\fBinit_viterbi()\fR must be called to reset the decoder state.
It accepts the instance pointer returned by
\fBcreate_viterbi()\fR and the initial starting state of the
convolutional encoder (usually 0). If the initial starting state is unknown or
incorrect, the decoder will still function, but the decoded data may be
incorrect at the start of the block.

Blocks of received symbols are processed with calls to
\fBupdate_viterbi_blk()\fR. The \fInbits\fR parameter specifies the
number of \fIdata bits\fR (not channel symbols) represented by the
\fIsyms\fR buffer. (For rate 1/2 codes, the number of symbols in
\fIsyms\fR is twice \fInbits\fR; for the rate 1/3 code it is three times
\fInbits\fR, and so on.)
Each symbol is expected to range
from 0 through 255, with 0 corresponding to a "strong 0" and 255
corresponding to a "strong 1". The caller is responsible for
determining the proper pairing of input symbols (commonly known as
decoder symbol phasing).
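
A demodulator usually produces signed soft decisions rather than values in
the 0 through 255 range. The sketch below shows one way to quantize them;
\fBsoft_to_symbol()\fR is a hypothetical helper that is not part of the
library, and it assumes that a positive input means "1" and that the
caller chooses the \fIscale\fR factor to suit its signal level.
.sp
.nf
.ft B
```c
/* Hypothetical helper, not part of libfec: quantize a signed soft
 * decision x (assumed: positive means "1") to the 0..255 range expected
 * by update_viterbi*_blk(). 0 = strong 0, 255 = strong 1. */
unsigned char soft_to_symbol(double x, double scale)
{
    double v = 127.5 + x * scale;       /* center the range on 127.5 */

    if (v < 0.0)
        v = 0.0;                        /* clip to a strong 0 */
    if (v > 255.0)
        v = 255.0;                      /* clip to a strong 1 */
    return (unsigned char)(v + 0.5);    /* round to the nearest level */
}
```
.ft R
.fi
.sp
With this mapping an input of 0 becomes 128, a nearly neutral symbol.
If the demodulator uses the opposite sign convention, negate its output
before calling the helper.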

At the end of the block, the data is recovered with a call to
\fBchainback_viterbi()\fR. The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high-order bit of
the first byte in the buffer. If the frame is not an integral number
of bytes long, the low-order bits of the last byte in the frame will
be unused.
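
Because the output is packed most significant bit first, a caller that
needs the bits individually has to index from the high-order end of each
byte. A minimal sketch follows; \fBdecoded_bytes()\fR and
\fBdecoded_bit()\fR are hypothetical helper names, not library functions.
.sp
.nf
.ft B
```c
#include <stddef.h>

/* Hypothetical: bytes needed for the chainback_viterbi*() output buffer. */
size_t decoded_bytes(unsigned int nbits)
{
    return (nbits + 7) / 8;             /* round up to whole bytes */
}

/* Hypothetical: fetch data bit i (0 = first bit of the frame) from a
 * big-endian buffer, where bit 0 is the high-order bit of data[0]. */
int decoded_bit(const unsigned char *data, unsigned int i)
{
    return (data[i / 8] >> (7 - i % 8)) & 1;
}
```
.ft R
.fi
.sp
For example, a 1000-bit frame needs a 125-byte buffer, while a 1001-bit
frame needs 126 bytes, of which only the high-order bit of the last byte
is used.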

Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to \fBchainback_viterbi()\fR. The tail is
always one bit shorter than the constraint length of the code, so the k=7
code uses 6 tail bits (12 tail symbols), the rate 1/2 k=9 code uses 8 tail bits
(16 tail symbols), the rate 1/3 k=9 code uses 8 tail bits (24 tail symbols)
and the k=15 code uses 14 tail bits (84 tail symbols).

The tail bits are not included in the length arguments to
\fBcreate_viterbi()\fR and \fBchainback_viterbi()\fR. For example, if
the block contains 1000 user bits, then this would be the length
parameter given to \fBcreate_viterbi27()\fR and
\fBchainback_viterbi27()\fR, and \fBupdate_viterbi27_blk()\fR would be called
with a total of 2012 symbols (1006 bits, counting the tail), the last 12
encoded symbols representing the tail bits.
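
The buffer sizes implied by the two preceding paragraphs follow directly
from the constraint length k and the number of symbols per data bit n
(the reciprocal of the code rate): the tail adds k-1 bits, and every bit,
data or tail, produces n symbols. In this sketch the code_params table and
the helper names are hypothetical; only the k and rate values come from
this page.
.sp
.nf
.ft B
```c
/* Hypothetical table of the four codes described on this page. */
struct code_params {
    const char *name;
    unsigned int k;     /* constraint length */
    unsigned int n;     /* symbols per data bit (1/rate) */
};

static const struct code_params codes[] = {
    { "viterbi27", 7, 2 },
    { "viterbi29", 9, 2 },
    { "viterbi39", 9, 3 },
    { "viterbi615", 15, 6 },
};

/* Symbols produced by the k-1 tail bits. */
unsigned int tail_symbols(const struct code_params *c)
{
    return (c->k - 1) * c->n;
}

/* Total symbols passed to update_viterbi*_blk() for nbits user bits. */
unsigned int frame_symbols(const struct code_params *c, unsigned int nbits)
{
    return (nbits + c->k - 1) * c->n;
}
```
.ft R
.fi
.sp
For the viterbi27 example, frame_symbols(&codes[0], 1000) evaluates to
2012, matching the count given above.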

After the call to \fBchainback_viterbi()\fR, the decoder may be reset
with a call to \fBinit_viterbi()\fR and another block can be decoded.
Alternatively, \fBdelete_viterbi()\fR can be called to free all resources
used by the Viterbi decoder.
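
The create, init, update, chainback and delete sequence can be illustrated
with a self-contained toy decoder for the rate 1/2 k=7 code. This is a
plain, unoptimized sketch written for this page, not the library's portable
or SIMD code: the toy27_* names are hypothetical, the polynomial values
0x6d and 0x4f are assumed to match V27POLYA and V27POLYB in fec.h (check
your copy), and the state layout (newest input bit in the least significant
bit of the shift register) is an assumption. Branch metrics are the
absolute distance between each received symbol and the expected 0 or 255,
so lower path metrics are better.
.sp
.nf
.ft B
```c
#include <stdlib.h>
#include <string.h>

/* Toy r=1/2 k=7 decoder for illustration only; NOT libfec's code. */
#define K 7                 /* constraint length */
#define NSTATES 64          /* 2^(K-1) encoder states */
#define POLYA 0x6d          /* assumed to equal V27POLYA in fec.h */
#define POLYB 0x4f          /* assumed to equal V27POLYB in fec.h */

static int parity(unsigned x)
{
    int p = 0;
    for (; x != 0; x >>= 1) p ^= (int)(x & 1u);
    return p;
}

/* Encode nbits big-endian data bits plus K-1 zero tail bits into
 * 2*(nbits+K-1) hard symbols (0 or 255). */
void toy27_encode(const unsigned char *in, int nbits, unsigned char *syms)
{
    unsigned r = 0;                 /* shift register, newest bit in bit 0 */
    for (int i = 0; i < nbits + K - 1; i++) {
        unsigned bit = i < nbits ? (unsigned)(in[i / 8] >> (7 - i % 8)) & 1u : 0u;
        r = ((r << 1) | bit) & 0x7f;
        *syms++ = parity(r & POLYA) ? 255 : 0;
        *syms++ = parity(r & POLYB) ? 255 : 0;
    }
}

struct toy27 {
    int blocklen, step;
    unsigned metric[NSTATES];       /* path metrics; lower is better */
    unsigned char *dec;             /* one decision per state per step */
};

struct toy27 *toy27_create(int blocklen)
{
    struct toy27 *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    t->blocklen = blocklen;
    t->dec = malloc((size_t)(blocklen + K - 1) * NSTATES);  /* room for tail */
    if (t->dec == NULL) { free(t); return NULL; }
    return t;
}

void toy27_init(struct toy27 *t, int starting_state)
{
    for (int s = 0; s < NSTATES; s++)
        t->metric[s] = 1000000;     /* effectively "impossible" */
    t->metric[starting_state & (NSTATES - 1)] = 0;
    t->step = 0;
}

/* Add-compare-select over nbits trellis steps (2 symbols per step). */
int toy27_update(struct toy27 *t, const unsigned char *syms, int nbits)
{
    if (t->step + nbits > t->blocklen + K - 1)
        return -1;                  /* more than create() allowed for */
    for (int i = 0; i < nbits; i++, syms += 2) {
        unsigned next[NSTATES], min = ~0u;
        unsigned char *d = t->dec + (size_t)t->step++ * NSTATES;
        for (unsigned s = 0; s < NSTATES; s++) {
            unsigned best = ~0u;
            for (unsigned x = 0; x < 2; x++) {  /* x = bit shifted out */
                unsigned r = s | x << (K - 1);  /* full 7-bit register */
                int ea = parity(r & POLYA) ? 255 : 0;
                int eb = parity(r & POLYB) ? 255 : 0;
                unsigned m = t->metric[(s >> 1) | x << (K - 2)]
                    + (unsigned)abs(syms[0] - ea) + (unsigned)abs(syms[1] - eb);
                if (m < best) { best = m; d[s] = (unsigned char)x; }
            }
            next[s] = best;
            if (best < min) min = best;
        }
        for (int s = 0; s < NSTATES; s++)
            t->metric[s] = next[s] - min;   /* renormalize */
    }
    return 0;
}

/* Trace back from endstate, writing the first nbits bits big-endian. */
int toy27_chainback(struct toy27 *t, unsigned char *data, int nbits, int endstate)
{
    unsigned s = (unsigned)endstate & (NSTATES - 1);
    if (nbits > t->step) return -1;
    memset(data, 0, (size_t)(nbits + 7) / 8);
    for (int i = t->step - 1; i >= 0; i--) {
        if (i < nbits && (s & 1u))  /* newest bit is the state's LSB */
            data[i / 8] |= (unsigned char)(0x80 >> (i % 8));
        s = (s >> 1) | (unsigned)t->dec[(size_t)i * NSTATES + s] << (K - 2);
    }
    return 0;
}

void toy27_delete(struct toy27 *t)
{
    if (t != NULL) { free(t->dec); free(t); }
}
```
.ft R
.fi
.sp
A caller would encode a 1000-bit frame with \fBtoy27_encode()\fR, then
call \fBtoy27_create(1000)\fR, \fBtoy27_init(t, 0)\fR,
\fBtoy27_update(t, syms, 1006)\fR and \fBtoy27_chainback(t, data, 1000, 0)\fR,
mirroring the library calls described above, and finally
\fBtoy27_delete(t)\fR to release the memory.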

The \fBset_viterbi_polynomial()\fR function allows the use of code generator
polynomials other than the defaults. Although only one set of polynomials is
generally used with each code, there are different conventions as to their
order and symbol polarity, and this function simplifies their use.

The default polynomials for the viterbi27 routines
are those of the NASA-JPL convention \fIwithout\fR symbol inversion.
The NASA-JPL convention normally inverts the first symbol.
The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second.
.sp
To set the NASA-JPL convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { -V27POLYA,V27POLYB };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
and to set the CCSDS convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { V27POLYB,-V27POLYA };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
The default polynomials for the viterbi615 routines
are those used by the Cassini spacecraft \fIwithout\fR
symbol inversion. Mars Pathfinder (MPF) and STEREO
swap the third and fourth polynomials.
Both conventions invert the
first, third and fifth symbols. Refer to fec.h for the polynomial constant
definitions; as the examples show, negating a polynomial in the array
inverts the corresponding symbol.
.sp
To set the Cassini convention with symbol inversion, do the following:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYC,V615POLYD,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi
.sp
and to set the MPF/STEREO convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYD,V615POLYC,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi

For performance reasons, calling this function changes the code
generator polynomials for \fIall\fR instances of the corresponding Viterbi decoder,
including those already created.
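
The sign convention used in the polys[] arrays can be modeled with a few
lines of C. This is a sketch of the idea only, not the library's internal
encoder: \fBencode_symbol()\fR is a hypothetical name, and the register
layout (newest input bit in bit 0) is an assumption.
.sp
.nf
.ft B
```c
/* Hypothetical sketch: one output symbol of a convolutional encoder.
 * reg  = encoder shift register, newest input bit in bit 0 (assumed)
 * poly = generator polynomial; a negative value inverts the symbol,
 *        as in the polys[] arrays passed to set_viterbi*_polynomial(). */
unsigned char encode_symbol(unsigned int reg, int poly)
{
    unsigned int x = reg & (unsigned int)(poly < 0 ? -poly : poly);
    int p = 0;

    while (x != 0) {            /* parity of the tapped register bits */
        p ^= (int)(x & 1u);
        x >>= 1;
    }
    if (poly < 0)
        p ^= 1;                 /* negative polynomial: invert symbol */
    return p ? 255 : 0;         /* hard symbol on the 0..255 scale */
}
```
.ft R
.fi
.sp
For example, with an all-zero register the plain polynomial yields
symbol 0, while its negated form yields 255.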

.SH ERROR PERFORMANCE
These decoders have all been extensively tested and found to provide
performance consistent with that expected for soft-decision Viterbi
decoding with 8-bit symbols.

Due to internal differences, the implementations
vary slightly in error performance. In
general, the portable C versions exhibit the best error performance
because they use full-sized branch metrics, and the MMX versions
exhibit the worst because they use 8-bit branch metrics with modulo
comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and
r=1/2 k=9 codes use unsigned
8-bit branch metrics, and are almost as good as the C versions. The
r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD
versions.
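
The "modulo comparisons" mentioned above refer to a general technique for
comparing metrics that are allowed to wrap around in a narrow integer. A
minimal sketch of that technique for 8-bit values follows; it is not the
library's MMX code, and metric_less_mod256() is a hypothetical name. The
comparison is only meaningful while the true spread between the values
being compared stays below 128.
.sp
.nf
.ft B
```c
#include <stdint.h>

/* Hypothetical sketch of a modulo comparison on wrapping 8-bit metrics.
 * Returns 1 if a is "less than" b when both may have wrapped past 255,
 * assuming their true difference is smaller than 128. */
int metric_less_mod256(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a - b);    /* a - b reduced modulo 256 */

    return diff >= 128;                 /* top bit set: a - b is negative */
}
```
.ft R
.fi
.sp
For example, 250 compares as less than 5, because 5 is treated as 261
that has wrapped past 255.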

.SH DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS
Calling the functions listed above automatically calls the appropriate
version of the function depending on the CPU type and available SIMD
instructions. A particular version can also be called directly by
appending the appropriate suffix to the function name. The available
suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX,
SSE, SSE2, Altivec and portable versions, respectively. For example,
the SSE2 version of the \fBupdate_viterbi27_blk()\fR function can be invoked
as \fBupdate_viterbi27_blk_sse2()\fR.

Naturally, the _av functions are only available on the PowerPC, and the
_mmx, _sse and _sse2 versions are only available on the IA32. Calling
a SIMD-enabled function on a CPU that doesn't support the appropriate
set of instructions will result in an illegal instruction exception.
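
When calling a suffixed version directly, a program should confirm CPU
support first. The sketch below shows one way to do that with the GCC and
Clang builtin __builtin_cpu_supports(), a compiler feature that is not
necessarily how the library performs its own selection; decode_sse2() and
decode_port() are hypothetical stand-ins for two versions of one routine,
such as update_viterbi27_blk_sse2() and update_viterbi27_blk_port().
.sp
.nf
.ft B
```c
#include <stddef.h>

/* Hypothetical stand-ins for two versions of one routine. */
static const char *decode_sse2(void) { return "sse2"; }
static const char *decode_port(void) { return "port"; }

typedef const char *(*decode_fn)(void);

/* Choose the SSE2 version only if the running CPU supports it, so the
 * program never executes an instruction the CPU would trap on. */
decode_fn select_decoder(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return decode_sse2;
#else
    (void)decode_sse2;          /* not usable on this architecture */
#endif
    return decode_port;
}
```
.ft R
.fi
.sp
On IA32 hardware without SSE2, and on other architectures, the sketch
falls back to the portable routine, much as the unsuffixed functions do
automatically.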

.SH RETURN VALUES
\fBcreate_viterbi()\fR returns a pointer to the structure containing
the decoder state.
\fBinit_viterbi()\fR, \fBupdate_viterbi_blk()\fR and \fBchainback_viterbi()\fR
return -1 on error, 0 otherwise.

.SH AUTHOR & COPYRIGHT
Phil Karn, KA9Q (karn@ka9q.net)

.SH LICENSE
This software may be used under the terms of the GNU Lesser General Public License (LGPL).