

Generating Implementations of Error Correcting Codes using Kansas Lava

Andy Gill, Tristan Bull, Andrew Farmer, Garrin Kimmel, Ed Komp, Erik Perrins University of Kansas



### Information and Telecommunication Technology Center (ITTC)





# Center collaboration based round focus areas or labs

• Faculty are associated with one or more labs.

Labs for Bioinformatics, Communications and Networks, Computer Systems, e-Learning, Intelligent Systems, Information Assurance, Radar and Remote Sensing.

This project is a collaboration between three labs.





# **HFEC Project**

Forward Error Correction (FEC) codes are part of the migration path in future aeronautical telemetry standards for DoD/ NASA test ranges

Two candidate FEC codes have been selected

 A serially concatenated convolutional code (SCCC) developed at KU

 A low-density parity check (LDPC) code developed at NASA's Jet Propulsion Laboratory (JPL)

Both codes have an information block size of 4096 bits and a rate of 2/3

Hardware prototypes of these systems are needed as the next step in the evolution of the standard Block Diagram of Prototype LDPC Implementation

### Encode



### HFEC Game Plan

We want to generate circuits for implementing LDPC!

- Interesting, practical problem.
- Based on well understood math.
- Real world constraints and requirements.

### Current workflow is

- Implement prototype of transmit / receive in MATLAB,
- $igodoldsymbol{O}$  then re-implement in VHDL,
- O then re-re-implement in VHDL (once requirements are better understood).

### **Research Questions**

- Can we use use functional programming to complement and support the developments being made in MATLAB?
- Can we build a functional program that allow the tradeoffs which require re-implementation to be avoided?
  - Can we gain a stronger assurance of the relationship between the specification and implementation?



# Binary phase-shift keying (BPSK)

### symbol '0' symbol '1'

 $igodoldsymbol{\Theta}$  BPSK encoding picks two phases for the binary symbols 0 and 1



# Binary phase-shift keying (BPSK)



BPSK encoding picks two phases for the binary symbols 0 and 1
 Always a possibility of the received symbol being wrong
 Probability density function changes with different signal to noise ratios



# Binary phase-shift keying (BPSK)



BPSK encoding picks two phases for the binary symbols 0 and 1
 Always a possibility of the received symbol being wrong
 Probability density function changes with different signal to noise ratios











Identity Matrix

# $\begin{array}{c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ \end{array}$

# 1010100100



Parity Generation

# LDPC decoding 1010100100

Original Code





LDPC decoding 0011101011 0101110101 1101001110

Single

parity

check







# LDPC decoding 1010100100

1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0



0

 $\mathbf{O}$ 





Single parity check 

 $\mathbf{0}$ 



### LDPC belief propagation What does a point need for a successful parity check?



### LDPC belief propagation What does a point need for // a successful parity check?



### LDPC belief propagation 1 What does a point need for a successful parity check?

If codeword is 1 the parity of activated vertical siblings should sum to 1

 $\begin{array}{c} 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 \end{array}$ 



### LDPC belief propagation <sup>0</sup> What does a point need for a successful parity check?

 $1 0 \sqrt[4]{111} 1 \sqrt[4]{10} 1 \sqrt[4]{10} 1 (1)$  0 0 1 1 1 0 1 0 1 (1)

If codeword is 1 the parity of activated vertical siblings should sum to 1

If codeword is 0 the parity of activated vertical siblings should sum to 0



### LDPC belief propagation 0 What does a point need for a successful parity check?

If codeword is 1 the parity of activated vertical siblings should sum to 1

If codeword is 0 the parity of activated vertical siblings should sum to 0

Depending on how certain you are about your input, you can make suggestions to your siblings.

 $\begin{array}{c} 1 & 0 & \forall & \forall & 1 & \forall & 1 & \forall & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 1 \end{array}$ 

















## LDPC iterates over the codeword



## LDPC Specification

Algorithm 15.2 Iterative Log Likelihood Decoding Algorithm for Binary LDPC Codes

**Input:** A, the received vector **r**, the maximum # of iterations L, and the channel reliability  $L_c$ . **Initialization:** Set  $\eta_{m,n}^{[0]} = 0$  for all (m, n) with A(m, n) = 1. Set  $\lambda_n^{[0]} = L_c r_n$ Set the loop counter l = 1. **Check node update**: For each (m, n) with A(m, n) = 1: Compute

$$\eta_{m,n}^{[l]} = -2 \tanh^{-1} \left( \prod_{j \in \mathcal{N}_{m,n}} \tanh \left( -\frac{\lambda_j^{[l-1]} - \eta_{m,j}^{[l-1]}}{2} \right) \right)$$
(15.33)

**Bit node update:** For n = 1, 2, ..., N: Compute

$$\lambda_n^{[l]} = L_c r_n + \sum_{m \in \mathcal{M}_n} \eta_{m,n}^{[l]}$$
(15.34)

Make a tentative decision: Set  $\hat{c}_n = 1$  if  $\lambda_n[l] > 0$ , else set  $\hat{c}_n = 0$ . If  $A\hat{c} = 0$ , then **Stop**. Otherwise, if #iterations < L, loop to **Check node update** Otherwise, declare a decoding failure and **Stop**.

From Error Correction Coding: Mathematical Methods and Algorithms, Tood K. Moon, Wiley-Interscience.



### LDPC In Haskell

```
loop options lc n a@(A a rref aRows aCols) ne lam orig lam
                | BitMatrix.cardinality ans == 0 = return (Just c hat)
                | n > iterations options = return Nothing
                | otherwise = loop options lc (succ n) a ne' lam' orig_lam
  where
            c hat :: Matrix x U1
            c hat = (\ c -> if c > 0 then 1 else 0) <$> lam
            ans :: BitMatrix (y, X1)
            ans = a rref `BitMatrix.mm` BitMatrix.fromMatrix (M.unitColumn c hat)
            ne' :: SM.Matrix (y,x) a
            ne' = SM.fromAssocList 0
                   [ ((m,n),
                      -2 * (atanh (product
                                  [tanh ((-((lam ! j) - (ne SM.! (m,j))))/2)
                                           | j <- BitMatrix.toList (aRows ! m)</pre>
                                           , j /= n
                                           ])))
                   (m,n) <- BitMatrix.toList a rref ]</pre>
            lam' :: Matrix x a
            lam' = forAll \ (orig lam ! n)
                     + sum [ ne' SM.! (m,n)
                            | m <- BitMatrix.toList (aCols ! n) ]</pre>
```

### Lava

### Lava is an Embedded Domain Specific Language (EDSL) for describing hardware level concerns

### • Haskell acts as the host language

- Lava is a Library in Haskell
- Lava programs are Haskell programs
- Haskell programs are not necessarily Lava programs

### halfAdder

```
:: (Signal Bool, Signal Bool)
-> (Signal Bool, Signal Bool)
halfAdder (a,b) = (carry,sum)
where carry = and2 (a,b)
sum = xor2 (a,b)
```

### library IEEE; use IEEE.STD\_LOGIC\_1164.ALL; use IEEE.STD\_LOGIC\_ARITH.ALL; use IEEE.STD\_LOGIC\_UNSIGNED.ALL; use work.all;

```
architecture str of half is
signal sig_o0_4 : std_logic;
signal sig_o0_2 : std_logic;
begin
    sig_o0_4 <= i0 XOR i1;
    sig_o0_2 <= i0 AND i1;
    o0 <= sig_o0_2;
    o1 <= sig_o0_4;
end architecture;
```



# Specifications and Implementations

We want to link together our Haskell executable specification with our Lava implementations

Both

Haskell

- Ease of test generation and debugging
- Stepping stones provide placeholders for assurance arguments
- Possibility of a future design methodology

### **Specification**

 Runs on Haskell RTS
 Big step functions
 No concept of clock cycles
 Haskell recursion for control flow
 Mutable state is global

VHDL / ModelSim / FPGAs execution platform
 Fine grain execution (in efficient implementations)
 It is all about the cycles
 Control logic direct datums
 Mutable state is local



Implementation

### Kansas Lava

Started as a teaching tool for FP class
 Used to generate binaries for orbital simulation in summer 2009
 Grown into a primary FP research platform at KU
 We want to address the range of computations that can be mapped over functors

| Classical<br>Haskell                   | Hawk                                | Kansas<br>Lava                          |
|----------------------------------------|-------------------------------------|-----------------------------------------|
| Execution                              | Execution                           | Synthesis & Execution                   |
| Value-based<br>computation model       | Signal-based<br>computation model   | Signal-based<br>computation model       |
| Arbitrary computation<br>over functors | Arbitrary computation over functors | restricted computation<br>over functors |
|                                        |                                     | THE UNIVERSITY OF <b>KANSAS</b>         |

### Family of Kansas Lava Functors


#### Kansas Lava Example

sumMatrix :: (...) => Seq (Matrix x a) -> Seq a
sumMatrix = foldb (+)

- . M.toList
- . unpack
- Primitives like "+" are defined over Seq.
- Kansas Lava programs use functions like unpack to move values so that primitives can act.



Refactor specification in Haskell to architecture, reflecting where computation should take place

- Computation is placed by architecture
- Sub-components should be synthesizable in hardware
- This is the push stage

#### Refine architecture to use Kansas Lava types

- sub-components are joined to make larger synthesizable components
- O This is the pull stage

#### The result is a synthesizable circuit

- This circuit reflects the chosen architecture
- This circuit implements the specification.

#### Spec

Spec



• Refactor specification in Haskell to architecture, reflecting where computation should take place

- Computation is placed by architecture
- Sub-components should be synthesizable in hardware
- This is the push stage

#### Refine architecture to use Kansas Lava types

- sub-components are joined to make larger synthesizable components
- This is the pull stage

#### The result is a synthesizable circuit

- This circuit reflects the chosen architecture
- This circuit implements the specification.



#### Arch



Refactor specification in Haskell to architecture, reflecting where computation should take place

- Computation is placed by architecture
- Sub-components should be synthesizable in hardware
- This is the push stage
- Refine architecture to use Kansas Lava types
  - sub-components are joined to make larger synthesizable components
  - This is the pull stage
  - The result is a synthesizable circuit
    - This circuit reflects the chosen architecture
    - This circuit implements the specification.





















# Divide and Conquer LDPC



# Divide and Conquer LDPC

1110011001 101011010 0011101011 0101110101 110101110 Dividing LDPC horizontally requires performing an addition to combine answers

addition is associative



# Divide and Conquer LDPC

1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1 0 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 0 1 0 1

Dividing LDPC horizontally requires performing an addition to combine answers

addition is associative

Dividing LPDC vertically is <u>much</u> more challenging There was considerable sharing between vertical siblings



#### min\* – Common Implementation Trick $\bigcirc$ tanh and tanh<sup>-1</sup> are tricky in hardware Common implementation trick is to use the min\* function instead. min\* x y = signum(x) \* signum(y) \* (min (abs x) (abs y)) This costs about 0.2dB min\* when quantized is associative $\bigcirc$ fold min<sup>\*</sup> (...) replaces tanh<sup>-1</sup>( $\Pi$ ...) After **Before** $f_i = -1 * (lam_i - chk_i)$ $f_i = tanh(-0.5 * (lam_i - chk_i))$ $g_i = -0.75 * fold min* (f(...))$ $g_i = -2 * \tanh^{-1}(\prod f(...))$





- - f f f - f - f f

## - - g g g - g - g g













 1
 1
 0
 0
 1
 1
 0
 0
 1

 1
 0
 1
 0
 1
 1
 0
 1
 1
 0
 1

 0
 0
 1
 1
 0
 1
 0
 1
 1
 0

 0
 0
 1
 1
 0
 1
 0
 1
 1
 1

 0
 1
 0
 1
 1
 0
 1
 0
 1
 1

 1
 0
 1
 1
 1
 0
 1
 0
 1
 1
 0
 1

 1
 0
 1
 0
 0
 1
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 <

In JPL LDPC code the matrix is built from many rotated identity matrixes (and empty matrixes)



 1
 1
 0
 0
 1
 1
 0
 0
 1

 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1

 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1
 0
 1
 1</td

In JPL LDPC code the matrix is built from many rotated identity matrixes (and empty matrixes)













### Type of Cell Architecture

:: ([Stream (Array y b)], [Stream (Array x (Maybe b))])
-> ([Stream (Array x (Maybe b))], [Stream (Array y b)])



# Dimensions of a Channel

Our channels are a composition of functors
 Consider our neighbor sharing channel

| [Str          | eam (Arra    | ly x (Maybe            | è b))]   |
|---------------|--------------|------------------------|----------|
| List          | . Stream     | . Array <sub>x</sub> . | Maybe    |
|               |              |                        |          |
| List          | Stream       | Arrayx                 | Maybe    |
| Many<br>Cells | Over<br>Time | Collection             | Optional |



## Specification to Architecture (1)

Fission into driver and execution unit
 Idea is execution unit will become our hardware entity
 Driver can be used to (automatically) generate test vectors.

loop :: Matrix y a -> IO (Maybe (Matrix y U1))





## Specification to Architecture (1)

Fission into driver and execution unit
 Idea is execution unit will become our hardware entity
 Driver can be used to (automatically) generate test vectors.

loop :: Matrix y a -> IO (Maybe (Matrix y U1))



# Specification to Architecture (2)

• First, push memory into computation.







## Specification to Architecture (2)

solve0 :: (SM.Matrix (x,y) a, Matrix y a)
-> (SM.Matrix (x,y) a, Matrix y a)

 $\bigcirc$  First, push memory into computation.

solve1 :: Stream (Matrix y a)
 -> Stream (Matrix y a)



## Specification to Architecture (3)

solve0 :: (SM.Matrix (x,y) a, Matrix y a)
-> (SM.Matrix (x,y) a, Matrix y a)

Push memory into computation (introduce Stream)
 Generalize the size of the solution (List of (Stream of) Array)
 Accept and send data to neighbors (extra in and out arguments)

:: ([Stream (Array y b)], [Stream (Array x (Maybe b))])
-> ([Stream (Array x (Maybe b))], [Stream (Array y b)])



## Specification to Architecture (4)

:: ([Stream (Array y b)], [Stream (Array x (Maybe b))])
-> ([Stream (Array x (Maybe b))], [Stream (Array y b)])

Divide and conquer in both vertical and horizontal
 Zero "cells" are trivial to handle
 Our chosen "cell" size calls the earlier solution
 We are ready to start the implementation





# Architecture to Implementation

List . Stream . Array<sub>x</sub> . Maybe



#### List . Maybe . Seq . Pipesz



### Architecture to Implementation




List

Stream

Array<sub>x</sub>

Maybe



















## **Conclusion and Status**



The methodology of communing functors guided our rewriting towards the types we wanted to use. • We used the worker/wrapper transformation to manual rewrite the types of the functions each time. igodot This should be possible to automate. igodown We could focus on the meaning of values under control logic. Our model (and mix\* model) matched exactly the published bit error rate curves, and implementation came in exactly where expected. We have working simulations of the complete LDPC in Modelsim. No FPGA version of LDPC (yet) We want to make our system more flexible; exploring design decisions.

