| .TH SIMD-VITERBI 3 |
| .SH NAME |
| create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk, |
| chainback_viterbi27, delete_viterbi27, |
| create_viterbi29, set_viterbi29_polynomial, init_viterbi29, update_viterbi29_blk, |
| chainback_viterbi29, delete_viterbi29, |
| create_viterbi39, set_viterbi39_polynomial, init_viterbi39, update_viterbi39_blk, |
| chainback_viterbi39, delete_viterbi39, |
| create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk, |
| chainback_viterbi615, delete_viterbi615 -\ IA32 and PowerPC SIMD-assisted Viterbi decoders |
| .SH SYNOPSIS |
| .nf |
| .ft B |
| #include "fec.h" |
| void *create_viterbi27(int blocklen); |
| void set_viterbi27_polynomial(int polys[2]); |
| int init_viterbi27(void *vp,int starting_state); |
| int update_viterbi27_blk(void *vp,unsigned char syms[],int nbits); |
| int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate); |
| void delete_viterbi27(void *vp); |
| .fi |
| .sp |
| .nf |
| .ft B |
| void *create_viterbi29(int blocklen); |
| void set_viterbi29_polynomial(int polys[2]); |
| int init_viterbi29(void *vp,int starting_state); |
| int update_viterbi29_blk(void *vp,unsigned char syms[],int nbits); |
| int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate); |
| void delete_viterbi29(void *vp); |
| .fi |
| .sp |
| .nf |
| .ft B |
| void *create_viterbi39(int blocklen); |
| void set_viterbi39_polynomial(int polys[3]); |
| int init_viterbi39(void *vp,int starting_state); |
| int update_viterbi39_blk(void *vp,unsigned char syms[],int nbits); |
| int chainback_viterbi39(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate); |
| void delete_viterbi39(void *vp); |
| .fi |
| .sp |
| .nf |
| .ft B |
| void *create_viterbi615(int blocklen); |
| void set_viterbi615_polynomial(int polys[6]); |
| int init_viterbi615(void *vp,int starting_state); |
| int update_viterbi615_blk(void *vp,unsigned char syms[],int nbits); |
| int chainback_viterbi615(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate); |
| void delete_viterbi615(void *vp); |
| .fi |
| .SH DESCRIPTION |
| These functions implement high-performance Viterbi decoders for four |
| convolutional codes: a rate 1/2 constraint length 7 (k=7) code |
| ("viterbi27"), a rate 1/2 k=9 code ("viterbi29"), |
| a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615"). |
| The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available, to improve |
| decoding speed. |
| |
| On the IA32 there are three different SIMD instruction sets. The first |
| and most common is MMX, introduced on the later Intel Pentiums (the |
| "Pentium MMX") and then on the Intel Pentium II and most Intel clones |
| (AMD K6, Transmeta Crusoe, etc.). SSE was introduced on the Pentium III |
| and later implemented in the AMD Athlon 4 (AMD calls it "3DNow! |
| Professional"). Most recently, SSE2 was introduced in the Intel |
| Pentium 4, and has been adopted by more recent AMD CPUs. The presence |
| of SSE2 implies the existence of SSE, which in turn implies MMX. |
| |
| Altivec is the PowerPC SIMD instruction set. It is roughly comparable |
| to SSE2. Altivec was introduced to the general public in the Apple |
| Macintosh G4; it is also present in the G5. Altivec is actually a |
| Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it |
| "VMX". All refer to the same thing. |
| |
| When built for the IA32 or PPC architectures, the functions |
| automatically use the most powerful SIMD instruction set available. If |
| no SIMD instructions are available, or if the library is built for a |
| non-IA32, non-PPC machine, a portable C version is executed |
| instead. |
| |
| .SH USAGE |
| Four versions of each function are provided, one for each code. |
| In the following discussion, substitute "viterbi27", "viterbi29", "viterbi39" |
| or "viterbi615" for "viterbi" as desired. |
| |
| Before Viterbi decoding can begin, an instance must first be created with |
| \fBcreate_viterbi()\fR. This function creates and returns a pointer to |
| an internal control structure |
| containing the path metrics and the branch |
| decisions. \fBcreate_viterbi()\fR takes one argument that gives the |
| length of the data block in bits. You \fImust not\fR attempt to |
| decode a block longer than the length given to \fBcreate_viterbi()\fR. |
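| .sp |
| The length limit matters because the decoder keeps its branch decisions |
| for the whole block until chainback. As a rough sizing aid, the sketch |
| below assumes the usual Viterbi layout of one decision bit per encoder state |
| for each data and tail bit; the actual internal structure may differ (for |
| example, by padding for SIMD alignment), and \fBdecision_bytes()\fR is a |
| hypothetical helper, not part of the library. |

```c
#include <stddef.h>

/* Hypothetical helper: approximate decision storage, in bytes, for a
 * code of constraint length k and a block of blocklen data bits.
 * Assumes one decision bit per state (2^(k-1) states) for each of the
 * blocklen + (k-1) trellis steps, tail included. */
size_t decision_bytes(int k, int blocklen)
{
    size_t states = (size_t)1 << (k - 1);               /* encoder states */
    size_t steps = (size_t)blocklen + (size_t)(k - 1);  /* data + tail bits */
    return steps * states / 8;                          /* bits to bytes */
}
```

| Under these assumptions, a 1000-bit block needs about 8 KB of decisions for |
| the k=7 code but about 2 MB for the k=15 code. |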
| |
| Before decoding a new frame, |
| \fBinit_viterbi()\fR must be called to reset the decoder state. |
| It accepts the instance pointer returned by |
| \fBcreate_viterbi()\fR and the initial starting state of the |
| convolutional encoder (usually 0). If the initial starting state is unknown or |
| incorrect, the decoder will still function but the decoded data may be |
| incorrect at the start of the block. |
| |
| Blocks of received symbols are processed with calls to |
| \fBupdate_viterbi_blk()\fR. The \fBnbits\fR parameter specifies the |
| number of \fIdata bits\fR (not channel symbols) represented by the |
| \fBsyms\fR buffer. (For rate 1/2 codes, the number of symbols in |
| \fBsyms\fR is twice \fInbits\fR, and so on.) |
| Each symbol is expected to range |
| from 0 through 255, with 0 corresponding to a "strong 0" and 255 |
| corresponding to a "strong 1". The caller is responsible for |
| determining the proper pairing of input symbols (commonly known as |
| decoder symbol phasing). |
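| .sp |
| A demodulator often produces signed soft samples rather than 0..255 values. |
| A minimal conversion sketch follows. \fBsoft_symbol()\fR is hypothetical, |
| and it assumes that a positive sample means a transmitted 1 with nominal |
| magnitude \fIamplitude\fR (which must be positive). Reverse the sign of the |
| sample if your demodulator uses the opposite polarity. |

```c
/* Hypothetical helper: map a received soft sample to the 0..255 range
 * expected by update_viterbi*_blk(). Assumes +amplitude is a nominal
 * "1" and -amplitude a nominal "0"; amplitude must be > 0. */
unsigned char soft_symbol(float sample, float amplitude)
{
    float v = 127.5f + 127.5f * (sample / amplitude);
    if (v < 0.0f)
        v = 0.0f;                      /* clip to a strong 0 */
    if (v > 255.0f)
        v = 255.0f;                    /* clip to a strong 1 */
    return (unsigned char)(v + 0.5f);  /* round to nearest */
}
```

| Samples beyond the nominal amplitude are clipped to 0 or 255, and a sample |
| of exactly 0 maps to 128, next to the 127.5 midpoint that carries no information. |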
| |
| At the end of the block, the data is recovered with a call to |
| \fBchainback_viterbi()\fR. The arguments are the pointer to the |
| decoder instance, a pointer to a user-supplied buffer into which the |
| decoded data is to be written, the number of data bits (not bytes) |
| that are to be decoded, and the terminal state of the convolutional |
| encoder at the end of the frame (usually 0). If the terminal state is |
| incorrect or unknown, the decoded data bits at the end of the frame |
| may be unreliable. The decoded data is written in big-endian order, |
| i.e., the first bit in the frame is written into the high order bit of |
| the first byte in the buffer. If the frame is not an integral number |
| of bytes long, the low order bits of the last byte in the frame will |
| be unused. |
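| .sp |
| Reading individual bits back out of the output buffer follows directly |
| from this big-endian layout. The helpers below are hypothetical sketches, |
| not library functions. |

```c
/* Hypothetical helper: return bit i (0 = first bit of the frame) from a
 * buffer filled by chainback_viterbi*(), which stores the first bit in
 * the high-order bit of the first byte. */
int decoded_bit(const unsigned char *data, unsigned int i)
{
    return (data[i / 8] >> (7 - i % 8)) & 1;
}

/* Hypothetical helper: output buffer size, in bytes, for nbits bits. */
unsigned int decoded_bytes(unsigned int nbits)
{
    return (nbits + 7) / 8;
}
```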
| |
| Note that the decoders assume the use of a tail, i.e., the encoding |
| and transmission of a sufficient number of padding bits beyond the end |
| of the user data to force the convolutional encoder into the known |
| terminal state given to \fBchainback_viterbi()\fR. The tail is |
| always one bit shorter than the constraint length of the code, so the k=7 |
| code uses 6 tail bits (12 tail symbols), the rate 1/2 k=9 code uses 8 tail bits |
| (16 tail symbols), the rate 1/3 k=9 code uses 8 tail bits (24 tail symbols) |
| and the k=15 code uses 14 tail bits (84 tail symbols). |
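| .sp |
| The transmitter side of the tail can be sketched with a generic rate 1/n |
| encoder that appends k-1 zero bits. This is an illustration under stated |
| assumptions, not code from the library: the newest bit is assumed to enter |
| the low-order end of the shift register, each symbol is the parity of the |
| register ANDed with a (positive) polynomial, and symbols are emitted as 0 or |
| 255. Check these conventions against the polynomial definitions in fec.h |
| before relying on them. \fBencode_with_tail()\fR is a hypothetical name. |

```c
/* Parity (XOR of all bits) of x. */
static int parity(unsigned int x)
{
    int p = 0;
    while (x) {
        p ^= x & 1;
        x >>= 1;
    }
    return p;
}

/* Hypothetical rate 1/n encoder with a tail of k-1 zero bits.
 * Assumptions: newest bit enters the low-order end of the register,
 * polys[] are positive, data is big-endian, symbols are 0 or 255.
 * syms must hold (nbits + k - 1) * n bytes.
 * Returns the final encoder state (the low k-1 register bits). */
unsigned int encode_with_tail(const unsigned char *data, int nbits,
                              int k, const int *polys, int n,
                              unsigned char *syms)
{
    unsigned int sr = 0;                  /* encoder shift register */
    unsigned int mask = (1u << k) - 1;    /* keep the last k bits */
    for (int i = 0; i < nbits + k - 1; i++) {
        /* user bits first, then k-1 zero tail bits */
        int bit = i < nbits ? (data[i / 8] >> (7 - i % 8)) & 1 : 0;
        sr = ((sr << 1) | (unsigned int)bit) & mask;
        for (int j = 0; j < n; j++)
            *syms++ = parity(sr & (unsigned int)polys[j]) ? 255 : 0;
    }
    return sr & (mask >> 1);
}
```

| The returned state is 0 whenever the tail has been sent, which is the |
| \fIendstate\fR that would then be passed to \fBchainback_viterbi()\fR. |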
| |
| The tail bits are not included in the length arguments to |
| \fBcreate_viterbi()\fR and \fBchainback_viterbi()\fR. For example, if |
| the block contains 1000 user bits, then this would be the length |
| parameter given to \fBcreate_viterbi27()\fR and |
| \fBchainback_viterbi27()\fR, and \fBupdate_viterbi27_blk()\fR would be called |
| with a total of 2012 symbols (an \fInbits\fR total of 1006), the last 12 of |
| which represent the tail bits. |
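| .sp |
| The same arithmetic applies to every code: the symbol count is the number |
| of user bits plus k-1 tail bits, multiplied by n for a rate 1/n code. A |
| trivial (hypothetical) helper makes the rule explicit. |

```c
/* Hypothetical helper: channel symbols consumed by update_viterbi*_blk()
 * for nbits user bits with a rate 1/n code of constraint length k. */
long total_symbols(long nbits, int k, int n)
{
    return (nbits + k - 1) * n;  /* (data + tail bits) * symbols per bit */
}
```

| For the 1000-bit example this gives 2012 symbols for viterbi27, 2016 for |
| viterbi29, 3024 for viterbi39 and 6084 for viterbi615. |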
| |
| After the call to \fBchainback_viterbi()\fR, the decoder may be reset |
| with a call to \fBinit_viterbi()\fR and another block can be decoded. |
| Alternatively, \fBdelete_viterbi()\fR can be called to free all resources |
| used by the Viterbi decoder. |
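| .sp |
| To make the create, init, update and chainback sequence concrete, the |
| following self-contained sketch implements a plain soft-decision Viterbi |
| decoder for a rate 1/2 k=7 code with hypothetical \fBtoy_*\fR functions |
| that loosely mirror the library calls. It is not the library's |
| implementation: it uses full-width unsigned path metrics without |
| renormalization and stores one decision bit per state per step. It also |
| assumes the polynomial values 0x6d and 0x4f with the newest bit at the |
| low-order end of the encoder register; compare them with V27POLYA and |
| V27POLYB in fec.h. |

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_K 7
#define TOY_NSTATES 64   /* 2^(K-1) encoder states */
#define TOY_POLYA 0x6d   /* assumption: same value as V27POLYA */
#define TOY_POLYB 0x4f   /* assumption: same value as V27POLYB */

int toy_parity(unsigned int x)
{
    int p = 0;
    while (x) { p ^= x & 1; x >>= 1; }
    return p;
}

struct toy_v27 {
    unsigned int metric[TOY_NSTATES]; /* path metric per state */
    uint64_t *decisions;              /* 1 decision bit per state per step */
    int maxbits;                      /* data bits + tail bits */
    int t;                            /* trellis steps processed so far */
};

/* Like create_viterbi27(): room for blocklen data bits plus the tail. */
struct toy_v27 *toy_create(int blocklen)
{
    struct toy_v27 *vp = malloc(sizeof *vp);
    if (vp == NULL) return NULL;
    vp->maxbits = blocklen + TOY_K - 1;
    vp->decisions = calloc((size_t)vp->maxbits, sizeof *vp->decisions);
    if (vp->decisions == NULL) { free(vp); return NULL; }
    vp->t = 0;
    return vp;
}

/* Like init_viterbi27(): heavily penalize all but the starting state. */
void toy_init(struct toy_v27 *vp, int starting_state)
{
    for (int s = 0; s < TOY_NSTATES; s++)
        vp->metric[s] = (s == (starting_state & (TOY_NSTATES - 1))) ? 0 : 100000;
    vp->t = 0;
}

/* Like update_viterbi27_blk(): consume 2*nbits soft symbols (0..255). */
int toy_update(struct toy_v27 *vp, const unsigned char *syms, int nbits)
{
    if (vp->t + nbits > vp->maxbits) return -1;  /* block too long */
    for (int i = 0; i < nbits; i++, syms += 2, vp->t++) {
        unsigned int next[TOY_NSTATES];
        uint64_t dec = 0;
        for (int ns = 0; ns < TOY_NSTATES; ns++) {
            unsigned int best = UINT_MAX;
            int bestx = 0;
            for (int x = 0; x < 2; x++) {          /* oldest register bit */
                int r = (x << (TOY_K - 1)) | ns;   /* full 7-bit register */
                int e0 = toy_parity(r & TOY_POLYA) ? 255 : 0;
                int e1 = toy_parity(r & TOY_POLYB) ? 255 : 0;
                unsigned int m = vp->metric[r >> 1] /* predecessor state */
                    + abs(syms[0] - e0) + abs(syms[1] - e1);
                if (m < best) { best = m; bestx = x; }
            }
            next[ns] = best;
            if (bestx) dec |= (uint64_t)1 << ns;
        }
        memcpy(vp->metric, next, sizeof next);
        vp->decisions[vp->t] = dec;
    }
    return 0;
}

/* Like chainback_viterbi27(): trace back from endstate, big-endian output. */
int toy_chainback(struct toy_v27 *vp, unsigned char *data, int nbits, int endstate)
{
    if (nbits + TOY_K - 1 != vp->t) return -1;  /* expects data + tail */
    int state = endstate & (TOY_NSTATES - 1);
    memset(data, 0, (size_t)(nbits + 7) / 8);
    for (int t = vp->t - 1; t >= 0; t--) {
        if (t < nbits && (state & 1))           /* bit shifted in at step t */
            data[t / 8] |= (unsigned char)(0x80 >> (t % 8));
        int x = (int)((vp->decisions[t] >> state) & 1);
        state = (state >> 1) | (x << (TOY_K - 2));
    }
    return 0;
}

void toy_delete(struct toy_v27 *vp)
{
    if (vp) { free(vp->decisions); free(vp); }
}
```

| Decoding a frame follows the same order as with the library: |
| \fBtoy_create()\fR once, then \fBtoy_init()\fR with the starting state, one or |
| more \fBtoy_update()\fR calls totaling the data plus tail bits, and |
| \fBtoy_chainback()\fR with the terminal state (usually 0). |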
| |
| The \fBset_viterbi_polynomial()\fR function allows the use of code generator |
| polynomials other than the defaults. Although only one set of polynomials is generally |
| used with each code, there are different conventions as to their order and |
| symbol polarity, and this function simplifies their use. A negative polynomial |
| value inverts the corresponding output symbol, as the examples below show. |
| |
| The default polynomials for the viterbi27 routines |
| are those of the NASA-JPL convention \fIwithout\fR symbol inversion. |
| The NASA-JPL convention normally inverts the first symbol. |
| The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second. |
| .sp |
| To set the NASA-JPL convention with symbol inversion: |
| .sp |
| .nf |
| .ft B |
| int polys[2] = { -V27POLYA,V27POLYB }; |
| set_viterbi27_polynomial(polys); |
| .ft R |
| .fi |
| .sp |
| and to set the CCSDS convention with symbol inversion: |
| .sp |
| .nf |
| .ft B |
| int polys[2] = { V27POLYB,-V27POLYA }; |
| set_viterbi27_polynomial(polys); |
| .ft R |
| .fi |
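| .sp |
| Because the polynomial setting applies to every decoder instance, a |
| program that must handle both conventions at once could instead leave the |
| defaults in place and rearrange CCSDS symbols before decoding. The sketch |
| below assumes, as the examples above suggest, that a negated polynomial |
| inverts its symbol (s becomes 255 - s); \fBccsds_to_default()\fR is a |
| hypothetical helper. |

```c
#include <stddef.h>

/* Hypothetical helper: rewrite a CCSDS-convention r=1/2 symbol stream
 * (polys { V27POLYB, -V27POLYA }) into the default order and polarity
 * (polys { V27POLYA, V27POLYB }), assuming a negative polynomial means
 * the symbol is inverted (s -> 255 - s). nsyms should be even. */
void ccsds_to_default(unsigned char *syms, size_t nsyms)
{
    for (size_t i = 0; i + 1 < nsyms; i += 2) {
        unsigned char b = syms[i];                             /* POLYB symbol */
        unsigned char a = (unsigned char)(255 - syms[i + 1]);  /* undo inverted POLYA */
        syms[i] = a;
        syms[i + 1] = b;
    }
}
```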
| .sp |
| The default polynomials for the viterbi615 routines |
| are those used by the Cassini spacecraft \fIwithout\fR |
| symbol inversion. Mars Pathfinder (MPF) and STEREO |
| swap the third and fourth polynomials. |
| Both conventions invert the |
| first, third and fifth symbols. Refer to fec.h for the polynomial constant definitions. |
| .sp |
| To set the Cassini convention with symbol inversion: |
| .sp |
| .nf |
| .ft B |
| int polys[6] = { -V615POLYA,V615POLYB,-V615POLYC,V615POLYD,-V615POLYE,V615POLYF }; |
| set_viterbi615_polynomial(polys); |
| .ft R |
| .fi |
| .sp |
| and to set the MPF/STEREO convention with symbol inversion: |
| .sp |
| .nf |
| .ft B |
| int polys[6] = { -V615POLYA,V615POLYB,-V615POLYD,V615POLYC,-V615POLYE,V615POLYF }; |
| set_viterbi615_polynomial(polys); |
| .ft R |
| .fi |
| |
| For performance reasons, calling this function changes the code |
| generator polynomials for \fIall\fR instances of the corresponding Viterbi decoder, |
| including those already created. |
| |
| .SH ERROR PERFORMANCE |
| These decoders have all been extensively tested and found to provide |
| performance consistent with that expected for soft-decision Viterbi |
| decoding with 8-bit symbols. |
| |
| Due to internal differences, the implementations |
| vary slightly in error performance. In |
| general, the portable C versions exhibit the best error performance |
| because they use full-sized branch metrics, and the MMX versions |
| exhibit the worst because they use 8-bit branch metrics with modulo |
| comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and |
| r=1/2 k=9 codes use unsigned |
| 8-bit branch metrics, and are almost as good as the C versions. The |
| r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD |
| versions. |
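| .sp |
| For readers unfamiliar with modulo comparisons, the sketch below shows the |
| general technique for 8-bit metrics; it is not the library's MMX code. It |
| is valid only while the two metrics being compared stay within 127 of |
| each other, and it relies on the usual two's complement conversion to |
| \fBint8_t\fR. \fBmetric_less()\fR is a hypothetical name. |

```c
#include <stdint.h>

/* Modulo ("wraparound") comparison of two 8-bit metrics: the sign of
 * their difference taken modulo 256 tells which one is smaller, even
 * after one of them has wrapped past 255, provided |a - b| < 128. */
int metric_less(uint8_t a, uint8_t b)
{
    return (int8_t)(uint8_t)(a - b) < 0;
}
```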
| |
| .SH DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS |
| Calling the functions listed above automatically calls the appropriate |
| version of the function depending on the CPU type and available SIMD |
| instructions. A particular version can also be called directly by |
| appending the appropriate suffix to the function name. The available |
| suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX, |
| SSE, SSE2, Altivec and portable versions, respectively. For example, |
| the SSE2 version of the \fBupdate_viterbi27_blk()\fR function can be invoked |
| as \fBupdate_viterbi27_blk_sse2()\fR. |
| |
| Naturally, the _av functions are only available on the PowerPC and the |
| _mmx, _sse and _sse2 versions are only available on the IA32. Calling |
| a SIMD-enabled function on a CPU that doesn't support the appropriate |
| set of instructions will result in an illegal instruction exception. |
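| .sp |
| A program that calls a suffixed version directly can avoid that exception |
| by checking the CPU first. The sketch below uses the GCC and Clang builtin |
| \fB__builtin_cpu_supports()\fR, which is specific to those compilers; it |
| covers only IA32/x86-64 and returns "_port" elsewhere (Altivec detection |
| is not shown). \fBbest_viterbi_suffix()\fR is a hypothetical helper. |

```c
/* Hypothetical helper: pick the most capable suffix this CPU supports,
 * preferring SSE2, then SSE, then MMX, then the portable version.
 * Only x86 detection is implemented; other targets get "_port". */
const char *best_viterbi_suffix(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  /* needed if called before constructors run */
    if (__builtin_cpu_supports("sse2"))
        return "_sse2";
    if (__builtin_cpu_supports("sse"))
        return "_sse";
    if (__builtin_cpu_supports("mmx"))
        return "_mmx";
#endif
    return "_port";
}
```

| The returned suffix could then be used to choose, for example, between |
| \fBupdate_viterbi27_blk_sse2()\fR and \fBupdate_viterbi27_blk_port()\fR. |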
| |
| .SH RETURN VALUES |
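| \fBcreate_viterbi()\fR returns a pointer to the structure containing |
| the decoder state. |
| \fBinit_viterbi()\fR, \fBupdate_viterbi_blk()\fR and \fBchainback_viterbi()\fR |
| return -1 on error, 0 otherwise. \fBset_viterbi_polynomial()\fR and |
| \fBdelete_viterbi()\fR return no value. |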
| |
| .SH AUTHOR & COPYRIGHT |
| Phil Karn, KA9Q (karn@ka9q.net) |
| |
| .SH LICENSE |
| This software may be used under the terms of the GNU Lesser General Public License (LGPL). |
| |
| |