Novel Systolic Architectures for Realization Type-IV Discrete Sine Transform Using Recursive Algorithm

M.N.Murty¹, S.S.Nayak², B.Padhy³ and B. Joga Rao⁴

¹(Retired Reader in Physics, Flat No.503, 54-11-3/14, Isukhathota, Visakhapatnam -530022, Andhra Pradesh, India.)
²(Department of Physics, Centurion University, Parlakhemundi - 761200, Odisha, India.)
³(Department of Physics, Khallikote Autonomous College, Berhampur - 760001, Odisha, India.)
⁴(Department of Mathematics, Gayatri Vidya Parishad College for Degree and PG courses, Visakhapatnam -530017, Andhra Pradesh, India.)

Abstract: In this paper, a novel recursive algorithm for realizing type-IV discrete sine transform (DST-IV) is proposed. Two linear systolic architectures for realization of this algorithm are presented in this paper. The recursive algorithms apply to arbitrary length algorithms and are appropriate for VLSI implementation.

Keywords: Discrete cosine transform, discrete sine transform, recursive algorithm, systolic architecture.

Date of Submission: 26-02-2020 Date of Acceptance: 09-03-2020

I. Introduction

Discrete transforms play a significant role in digital signal processing. Discrete cosine transform (DCT) and discrete sine transform (DST) are used as key functions in many signal and image processing applications. There are eight types of DCT and DST. Of these, the DCT-II, DCT-IV, and DST-IV have gained popularity. The DCT and DST transform of types I, II, III and IV, form a group of so-called “even” sinusoidal transforms. Much less known is group of so-called “odd” sinusoidal transforms: DCT and DST of types V, VI, VII and VIII.

The DST was first introduced to the signal processing by Jain [1], and several versions of this original DST were later developed by Kekre et al. [2], Jain [3] and Wang et al. [4]. Ever since the introduction of the first version of the DST, the different DSTs have found wide applications in several areas in Digital signal processing (DSP), such as image processing [1,5,6], adaptive digital filtering [7] and interpolation [8]. The performance of DST can be compared to that of the DCT and it may therefore be considered as a viable alternative to the DCT. For images with high correlation, the DCT yields better results; however, for images with a low correlation of coefficients, the DST yields lower bit rates [9]. Yip and Rao [10] have proven that for large sequence length (N ≥ 32) and low correlation coefficient (ρ< 0.6), the DST performs even better than the DCT.

In this paper, a recursive algorithm for computation of type-IV discrete sine transform (DST-IV) is presented and two linear systolic architectures were proposed for realization of this recursive algorithm. The recursive algorithms apply to arbitrary length algorithms and are appropriate for VLSI implementation. The systolic architecture has the following characteristics:

- A massive and non-centralised parallelism
- Local communications
- Synchronous evaluation

The systolic arrays are used in the design and implementation of high performance digital signal processing equipment. Systolic architectures are established as the most popular and dominant class of VLSI structures due to the simplicity of their processing elements (PEs), modularity of their structure, regular and nearest neighbour interconnections between the PEs, high level of pipelinability, small chip area and lower dissipation. In the systolic architectures, the desired data are pumped rhythmically in regular intervals across the PEs for yielding high throughput by fully pipelined processing. The systolic array concept can also be exploited at bit level in the design of individual chips. The highly regular structure of systolic circuits renders them comparatively easy to design and test.

The rest of the paper is organized as follows. The proposed recursive algorithm for DST-IV is presented in Section-II. Two linear systolic architectures for realization of DST-IV are given in Section-III. Conclusion is given in Section-IV.
II. Proposed Recursive Algorithm for DST-IV

The one dimensional type-IV discrete sine transform (DST-IV) for input sequence \( \{x(n); n = 1, 2, ..., N\} \) is defined by

\[
Y(k) = \frac{\sqrt{2}}{\sqrt{N}} \sum_{n=1}^{N} x(n) \sin \left( \frac{(2k-1)(2n-1)\pi}{4N} \right) \\
\text{for } k = 1, 2, ..., N
\]

(1)

The \( Y(k) \) values represent the transformed data. Without loss of generality, the scale factor \( \sqrt{2/N} \) in (1) may be ignored in the rest of the paper. After omitting the scale factor, (1) can be written as

\[
Y(k) = \sum_{n=1}^{N} x(n) \sin \left( \frac{(2k-1)(2n-1)\pi}{4N} \right) \\
\text{for } k = 1, 2, ..., N
\]

(2)

The above expression can be expressed as

\[
Y(k) = \text{Im} \left[ \sum_{n=1}^{N} x(n) e^{i\left(\frac{(2k-1)(2n-1)\pi}{4N}\right)} \right]
\]

(3)

Where, \( \text{Im} \) denotes ‘Imaginary Part of’ and \( i = \sqrt{-1} \). Eqn. (3) can be written as

\[
Y(k) = \text{Im} \left[ \sum_{n=1}^{N} x(n) \left\{ e^{i\left(\frac{(2k-1)n\pi}{2N}\right)} \right\} \right]
\]

\[
\Rightarrow Y(k) = \text{Im} \left[ \alpha_k P(k) \right]
\]

(4)

Where,

\[
\alpha_k = e^{-i\left(\frac{(2k-1)\pi}{2}\right)}
\]

(5)

and

\[
P(k) = \sum_{n=1}^{N} x(n) e^{i\left(\frac{(2k-1)n\pi}{2N}\right)}
\]

(6)

Now, solving the above expression for \( N = 4 \), we get

\[
P(k) = \sum_{n=1}^{4} x(n) e^{i\left(\frac{(2k-1)n\pi}{2}\right)}
\]

\[
= \sum_{n=1}^{4} x(n) \beta_k^N
\]

\[
= x(4)\beta_k^4 + x(3)\beta_k^3 + x(2)\beta_k^2 + x(1)\beta_k
\]

\[
\Rightarrow P(k) = \left( (x(4)\beta_k + x(3))\beta_k + x(2)\beta_k + x(1) \right) \beta_k
\]

(7)

Where,

\[
\beta_k = e^{i\left(\frac{(2k-1)\pi}{2}\right)}
\]

(8)

Hence, in general (7) & (8) can be written as

\[
P(k) = \left( (x(N)\beta_k + x(N-1))\beta_k + x(N-2)\beta_k + \cdots + x(1) \right) \beta_k
\]

(9)

Where,

\[
\beta_k = e^{i\left(\frac{(2k-1)\pi}{2}\right)}
\]

(10)

The recursive relation (9) is called Horner’s rule. The DST-IV can be realized using (9) and (10) in (4).

III. Proposed Systolic Architectures for Realization of DST-IV

(a) **Systolic linear array architecture -I:** The structure of linear -1 is shown in Fig.1. It consists of \( N \) locally connected identical processing elements (PEs). The function of \( k^{th} \) PE is shown in Fig.2. The \( k^{th} \) PE computes \( P(k) \) after \( N \) identical recursions in \( N \) successive time-steps and stores it in its accumulator \( t \). During the next time-step, the accumulator content \( P(E) \) of \( k^{th} \) PE is multiplied with \( \alpha_k \) to yield the DST-IV component \( Y(k) \) as given in (4). Both \( \alpha_k \) and \( \beta_k \) are stored in \( k^{th} \) PE.

Each PE of linear array -1 performs an addition followed by a multiplication. The duration of each time-step is, therefore, the time required for computing an addition and a multiplication. The duration of a time-step will depend on the hardware used in a PE for implementing an addition and a multiplication. In the linear array -1, \( k = 1 \) for \( 1^{st} \) PE, \( k = 2 \) for \( 2^{nd} \) PE, \( k = 3 \) for \( 3^{rd} \) PE and \( k = N \) for \( N^{th} \) PE. The first output \( Y(1) \) is obtained after \( (N+1) \) time-steps from \( 1^{st} \) PE and the rest \( (N-1) \) outputs \( Y(2), Y(3), \ldots, Y(N) \) are obtained from \( 2^{nd}, 3^{rd}, \ldots, N^{th} \) PE respectively in the subsequent \( (N-1) \) time-steps. The duration of a time-step \( T \) is given by \( T = T_\lambda + T_m \), where \( T_\lambda \) and \( T_m \) are the times required for performing an addition and a multiplication respectively.

For example, if \( N = 4 \), the \( k^{th} \) PE \( (k = 1, 2, 3 \text{ and } 4) \) of linear array -1 computes \( P(k) \) given by (7) in 4 time-steps and the output component \( Y(k) \) given by (4) in 5th time-step. The output components \( Y(1), Y(2), Y(3) \) and \( Y(4) \) are obtained one after the other from \( 1^{st} \) PE, \( 2^{nd} \) PE, \( 3^{rd} \) PE and \( 4^{th} \) PE respectively.
Novel Systolic Architectures for Realization Type-IV Discrete Sine Transform Using Recursive...

**Fig.1.** Systolic linear array architecture-1 for computing N-point DST-IV.

\[
\alpha_k = e^{-i(2k-1)\pi/N} \quad \text{and} \quad \beta_k = e^{i(2k-1)\pi/2N} \quad \text{for} \quad k = 1, 2, \ldots, N
\]

If total No. of counts = \( N + 1 \), then

- \( \text{Count} = 0 \)
- \( t \leftarrow X_{in} \)
- \( \text{Count} \leftarrow \text{Count} + 1 \)

end if.

If \( 1 \leq \text{count} \leq N \), then

- \( t \leftarrow t \beta_k + X_{in} \)
- \( X_{out} \leftarrow X_{in} \)
- \( \text{Count} \leftarrow \text{Count} + 1 \)

end if.

**Fig.2.** Function of \( k^{th} \) PE of the systolic architecture shown in Fig.1.

(b) **Systolic linear array architecture -2:** The structure of linear array -2 is shown in Fig.3. It consists of \( N \) locally connected identical PEs (1st PE to Nth PE). The Nth PE is connected to the last (\( N + 1 \))th PE. The function of each PE (1st PE to Nth PE) is shown in Fig.4. The first recursion for computing \( P(k) \) according to (9) is implemented in the 1st PE. The successive \((N - 1)\) recursions are implemented in the succeeding \((N - 1)\) PE to Nth PE). The last (\( N+1 \))th PE multiplies the output \( P(k) \) from Nth PE with \( ak \) to yield the \( k^{th} \) DST-IV component \( Y(k) \). The first output \( P(1) \) from Nth PE is obtained after \( N \) time-steps and the rest \( (N-1) \) outputs \( P(2) \) to \( P(N) \) are obtained in subsequent \((N-1)\) time-steps.
\[ \alpha_k = e^{-i(2k-1)\pi / 4N} \quad \text{and} \quad \beta_k = e^{i(2k-1)\pi / 2N} \quad \text{for} \quad k = 1, 2, \ldots, N \]

Fig.3. Systolic linear array architecture-2 for computing N-point DST-IV.

For continuous processing in every time-step:
\[
R \leftarrow (Q + X)U \\
V \leftarrow U
\]

Fig.4. Function of each PE, except (N+1)th PE, of the systolic architecture-2 shown in Fig.3.

When \( k = 1 \), we obtain
- in case of 1st PE
  \[
  R = [0 + x(N)]\beta_1 = x(N)\beta_1 \\
  V = \beta_1
  \]
- in case of 2nd PE
  \[
  R = [x(N)\beta_1 + x(N-1)]\beta_1 \\
  V = \beta_1
  \]
- in case of 3rd PE
  \[
  R = [(x(N)\beta_1 + x(N-1))\beta_1 + x(N-2)]\beta_1 \\
  V = \beta_1
  \]
  and so on.
- in case of Nth PE
  \[
  R = P(1) = \left( (x(N)\beta_1 + x(N-1))\beta_1 + x(N-2) \right)\beta_1 + \cdots + x(1) \beta_1
  \]
The proposed systolic architectures require $N$ multiplications and $(N-1)$ additions for realization of DST-IV of length $N$. During each clock cycle, each PE performs one multiplication and one addition. The duration of one cycle period is $T = T_M + T_A$, where $T_M$ and $T_A$ are the times involved in performing one multiplication and one addition respectively. In both the systolic linear array architectures, the successive DST-IV output components $Y(1), Y(2), Y(3), \ldots, Y(N)$ are obtained in every $N$ time-steps.

Table 1 gives the comparison of the computation complexities of the proposed architectures with related works [11] to [14] found in literature. The number of multipliers of the proposed structures is less than those of [11], [12], [13] and [14]. The number of adders of the proposed structures is less than those of [11], [12], [13] and [14].

Table 1: Comparison of computation complexities

<table>
<thead>
<tr>
<th>Structure</th>
<th>Multipliers</th>
<th>Adders</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed</td>
<td>$N$</td>
<td>$N-1$</td>
</tr>
<tr>
<td>[11]</td>
<td>$(1/2)N \log_2 N$</td>
<td>$(3/2)N \log_2 N - N + 1$</td>
</tr>
<tr>
<td>[12]</td>
<td>$(1/2)N \log_2 N$</td>
<td>$(3/2)N \log_2 N - N + 1$</td>
</tr>
<tr>
<td>[13]</td>
<td>$N$</td>
<td>$2N$</td>
</tr>
<tr>
<td>[14]</td>
<td>$N - 1$</td>
<td>$N + 1$</td>
</tr>
</tbody>
</table>

IV. Conclusion

This paper presents two linear systolic architectures for realization of the recursive algorithm for type-IV discrete sine transform. The recursive algorithms apply to arbitrary length algorithms and are appropriate for VLSI implementation. The advantages are that the number of multipliers and adders in the proposed architectures are less than some other structures found in literature.

References
