.TH SIMD-VITERBI 3
.SH NAME
create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk,
chainback_viterbi27, delete_viterbi27,
create_viterbi29, set_viterbi29_polynomial, init_viterbi29, update_viterbi29_blk,
chainback_viterbi29, delete_viterbi29,
create_viterbi39, set_viterbi39_polynomial, init_viterbi39, update_viterbi39_blk,
chainback_viterbi39, delete_viterbi39,
create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk,
chainback_viterbi615, delete_viterbi615 \- IA32 SIMD-assisted Viterbi decoders
.SH SYNOPSIS
.nf
.ft B
#include "fec.h"
void *create_viterbi27(int blocklen);
void set_viterbi27_polynomial(int polys[2]);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi29(int blocklen);
void set_viterbi29_polynomial(int polys[2]);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi39(int blocklen);
void set_viterbi39_polynomial(int polys[3]);
int init_viterbi39(void *vp,int starting_state);
int update_viterbi39_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi39(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi39(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi615(int blocklen);
void set_viterbi615_polynomial(int polys[6]);
int init_viterbi615(void *vp,int starting_state);
int update_viterbi615_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi615(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi615(void *vp);
.fi
.SH DESCRIPTION
These functions implement high performance Viterbi decoders for four
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27"), a rate 1/2 k=9 code ("viterbi29"),
a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615").
The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available, to improve
decoding speed.

On the IA32 there are three different SIMD instruction sets. The first
and most common is MMX, introduced on later Intel Pentiums and then on
the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most
recently, SSE2 was introduced in the Intel Pentium 4, and has been
adopted by more recent AMD CPUs. The presence of SSE2 implies the
existence of SSE, which in turn implies MMX.

Altivec is the PowerPC SIMD instruction set. It is roughly comparable
to SSE2. Altivec was introduced to the general public in the Apple
Macintosh G4; it is also present in the G5. Altivec is actually a
Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it
"VMX". All refer to the same thing.

When built for the IA32 or PPC architectures, the functions
automatically use the most powerful SIMD instruction set available. If
no SIMD instructions are available, or if the library is built for a
non-IA32, non-PPC machine, a portable C version is executed
instead.

.SH USAGE
Four versions of each function are provided, one for each code.
In the following discussion, change "viterbi" to "viterbi27", "viterbi29", "viterbi39"
or "viterbi615" as desired.

Before Viterbi decoding can begin, an instance must first be created with
\fBcreate_viterbi()\fR. This function creates and returns a pointer to
an internal control structure
containing the path metrics and the branch
decisions. \fBcreate_viterbi()\fR takes one argument that gives the
length of the data block in bits. You \fImust not\fR attempt to
decode a block longer than the length given to \fBcreate_viterbi()\fR.

Before decoding a new frame,
\fBinit_viterbi()\fR must be called to reset the decoder state.
It accepts the instance pointer returned by
\fBcreate_viterbi()\fR and the initial starting state of the
convolutional encoder (usually 0). If the initial starting state is unknown or
incorrect, the decoder will still function but the decoded data may be
incorrect at the start of the block.

Blocks of received symbols are processed with calls to
\fBupdate_viterbi_blk()\fR. The \fBnbits\fR parameter specifies the
number of \fIdata bits\fR (not channel symbols) represented by the
\fBsyms\fR buffer. (For rate 1/2 codes, the number of symbols in
\fBsyms\fR is twice \fInbits\fR, and so on.)
Each symbol is expected to range
from 0 through 255, with 0 corresponding to a "strong 0" and 255
corresponding to a "strong 1". The caller is responsible for
determining the proper pairing of input symbols (commonly known as
decoder symbol phasing).
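.sp
As a rough illustration of the symbol format described above, the following C sketch
maps a demodulator soft sample to the 0 through 255 range. The function name
quantize_symbol() is hypothetical and is not part of fec.h. The sketch assumes a
nominal sample amplitude of +/-1.0, with positive samples standing for a transmitted 1;
a real receiver may need a different scaling or polarity.
.sp
.nf
```c
/* Map a soft sample to an 8-bit decoder symbol.
   Assumption (not from fec.h): -1.0 is a strong 0 and +1.0 is a strong 1. */
unsigned char quantize_symbol(double sample)
{
    double scaled = 127.5 + 127.5 * sample;   /* -1.0 -> 0.0, +1.0 -> 255.0 */

    if (scaled < 0.0)
        scaled = 0.0;                         /* clamp samples below the range */
    if (scaled > 255.0)
        scaled = 255.0;                       /* clamp samples above the range */
    return (unsigned char)(scaled + 0.5);     /* round to the nearest integer */
}
```
.fi
.sp
With this mapping a sample of 0.0 becomes 128, close to the midpoint of the range, so it
carries almost no information about the bit value.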

At the end of the block, the data is recovered with a call to
\fBchainback_viterbi()\fR. The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high order bit of
the first byte in the buffer. If the frame is not an integral number
of bytes long, the low order bits of the last byte in the frame will
be unused.
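.sp
The big-endian bit order described above can be sketched with two small helpers.
Both names, decoded_bytes() and get_decoded_bit(), are hypothetical and are not
provided by fec.h; they only show how a caller might size and read the output buffer.
.sp
.nf
```c
#include <stddef.h>

/* Bytes needed to hold nbits decoded bits (a partial last byte counts). */
size_t decoded_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}

/* Return decoded bit number index, where index 0 is the first bit of the
   frame and is stored in the high order bit of data[0]. */
int get_decoded_bit(const unsigned char *data, size_t index)
{
    return (data[index / 8] >> (7 - (index % 8))) & 1;
}
```
.fi
.sp
For a 1000-bit frame, decoded_bytes(1000) gives 125, so a 125-byte buffer would be
passed to the chainback function.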

Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to \fBchainback_viterbi()\fR. The tail is
always one bit less than the constraint length of the code, so the k=7
code uses 6 tail bits (12 tail symbols), the k=9 codes use 8 tail bits
(16 tail symbols at rate 1/2, 24 at rate 1/3) and the k=15 code uses
14 tail bits (84 tail symbols).

The tail bits are not included in the length arguments to
\fBcreate_viterbi()\fR and \fBchainback_viterbi()\fR. For example, if
the block contains 1000 user bits, then this would be the length
parameter given to \fBcreate_viterbi27()\fR and
\fBchainback_viterbi27()\fR, and \fBupdate_viterbi27_blk()\fR would be called
with a total of 2012 symbols, the last 12 of which
represent the tail bits.
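.sp
The symbol counts quoted above follow from one line of arithmetic, sketched here in C.
The helper name frame_symbols() is hypothetical and is not part of fec.h; k is the
constraint length and rate_inv is the number of channel symbols per bit (2 for a rate
1/2 code, 3 for rate 1/3, 6 for rate 1/6).
.sp
.nf
```c
/* Channel symbols in one frame: user bits plus k-1 tail bits,
   each bit encoded as rate_inv symbols. */
unsigned int frame_symbols(unsigned int user_bits, unsigned int k,
                           unsigned int rate_inv)
{
    unsigned int tail_bits = k - 1;

    return (user_bits + tail_bits) * rate_inv;
}
```
.fi
.sp
For the example above, frame_symbols(1000, 7, 2) gives 2012. Because the nbits argument
of the update function counts bits rather than symbols, a buffer of that size would
presumably be passed with nbits set to 1006.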

After the call to \fBchainback_viterbi()\fR, the decoder may be reset
with a call to \fBinit_viterbi()\fR and another block can be decoded.
Alternatively, \fBdelete_viterbi()\fR can be called to free all resources
used by the Viterbi decoder.

The \fBset_viterbi_polynomial()\fR function allows the use of code generator
polynomials other than the defaults. Although only one set of polynomials is generally
used with each code, there are different conventions as to their order and
symbol polarity, and these functions simplify their use.

The default polynomials for the viterbi27 routines
are those of the NASA-JPL convention \fIwithout\fR symbol inversion.
The NASA-JPL convention normally inverts the first symbol.
The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second.
.sp
To set the NASA-JPL convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { -V27POLYA,V27POLYB };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
and to set the CCSDS convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { V27POLYB,-V27POLYA };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
The default polynomials for the viterbi615 routines
are those used by the Cassini spacecraft \fIwithout\fR
symbol inversion. Mars Pathfinder (MPF) and STEREO
swap the third and fourth polynomials.
Both conventions invert the
first, third and fifth symbols. Refer to fec.h for the polynomial constant definitions.
.sp
To set the Cassini convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYC,V615POLYD,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi
.sp
and to set the MPF/STEREO convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYD,V615POLYC,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi

For performance reasons, calling this function changes the code
generator polynomials for \fIall\fR instances of the corresponding Viterbi decoder,
including those already created.
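.sp
The sign convention used in the polynomial examples above, where a negated constant
requests an inverted symbol, can be pictured with one step of a reference encoder.
This is only a sketch: parity() and encode_symbol() are hypothetical helper names that
are not part of fec.h, and the library's own internals may differ.
.sp
.nf
```c
#include <stdlib.h>

/* Parity (XOR of all bits) of x: 1 if an odd number of bits are set. */
int parity(unsigned int x)
{
    int p = 0;

    while (x) {
        p ^= 1;
        x &= x - 1;        /* clear the lowest set bit */
    }
    return p;
}

/* One encoder output symbol (0 or 1) for shift register contents sr.
   The magnitude of poly selects the taps; a negative poly inverts the symbol. */
int encode_symbol(unsigned int sr, int poly)
{
    int bit = parity(sr & (unsigned int)abs(poly));

    return poly < 0 ? !bit : bit;
}
```
.fi
.sp
With an illustrative tap mask of 0x6d (an assumed value, see fec.h for the real
constants), encode_symbol(0x01, 0x6d) yields 1 and encode_symbol(0x01, -0x6d) yields 0.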

.SH ERROR PERFORMANCE
These decoders have all been extensively tested and found to provide
performance consistent with that expected for soft-decision Viterbi
decoding with 8-bit symbols.

Due to internal differences, the implementations
vary slightly in error performance. In
general, the portable C versions exhibit the best error performance
because they use full-sized branch metrics, and the MMX versions
exhibit the worst because they use 8-bit branch metrics with modulo
comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and
r=1/2 k=9 codes use unsigned
8-bit branch metrics, and are almost as good as the C versions. The
r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD
versions.

.SH DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS
Calling the functions listed above automatically calls the appropriate
version of the function depending on the CPU type and available SIMD
instructions. A particular version can also be called directly by
appending the appropriate suffix to the function name. The available
suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX,
SSE, SSE2, Altivec and portable versions, respectively. For example,
the SSE2 version of the update_viterbi27_blk() function can be invoked
as update_viterbi27_blk_sse2().

Naturally, the _av functions are only available on the PowerPC and the
_mmx, _sse and _sse2 versions are only available on IA32. Calling
a SIMD-enabled function on a CPU that does not support the appropriate
set of instructions will result in an illegal instruction exception.

.SH RETURN VALUES
\fBcreate_viterbi()\fR returns a pointer to the structure containing
the decoder state.
The other functions that return a value (\fBinit_viterbi()\fR,
\fBupdate_viterbi_blk()\fR and \fBchainback_viterbi()\fR) return -1 on error, 0 otherwise.

.SH AUTHOR & COPYRIGHT
Phil Karn, KA9Q (karn@ka9q.net)

.SH LICENSE
This software may be used under the terms of the GNU Lesser General Public License (LGPL).