7.4. Quantization of Filter Coefficients

The finite numerical resolution of digital number representations has impact on the properties of filters, as already discussed for non-recursive filters. The quantization of coefficients, state variables, algebraic operations and signals plays an important role in the design of recursive filters. Compared to non-recursive filters, the impact of quantization is often more prominent due to the feedback. Severe degradations from the desired characteristics and instability are potential consequences of a finite word length in practical implementations.

A recursive filter of order \(N \geq 2\) can be decomposed into second-order sections (SOS). Due to the grouping of poles/zeros to filter coefficients with a limited amplitude range, a realization by cascaded SOS is favorable in practice. We therefore limit our investigation of quantization effects to SOS. The transfer function of a SOS is given as

\[H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\]

This can be split into a non-recursive part and a recursive part. The quantization effects of non-recursive filters have already been discussed. We therefore focus here on the recursive part given by the transfer function

\[H(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}\]

This section investigates the consequences of quantization in recursive filters. As for non-recursive filters, we first take a look at the quantization of filter coefficients. The structure used for the realization of the filter has impact on the quantization effects. We begin with the direct form followed by the coupled form, as example for an alternative structure.

7.4.1. Direct Form

Above transfer function of the recursive part of a SOS can be rewritten in terms of its complex conjugate poles \(z_{\infty}\) and \(z_{\infty}^*\) as

\[H(z) = \frac{1}{(z-z_{\infty}) (z-z_{\infty}^*)} = \frac{z^{-2}}{ 1 \underbrace{- 2 r \cos(\varphi)}_{a_1} \; z^{-1} + \underbrace{r^2}_{a_2} \; z^{-2} }\]

where \(r = |z_{\infty}|\) and \(\varphi = \arg \{z_{\infty}\}\) denote the absolute value and phase of the pole \(z_{\infty}\), respectively. Let’s assume a linear uniform quantization of the coefficients \(a_1\) and \(a_2\) with quantization step \(Q\). Discarding clipping, the following relations for the locations of the poles can be found

for \(n \in \mathbb{N}_0\) and \(m \in \mathbb{Z}\). Quantization of the filter coefficients \(a_1\) and \(a_2\) into a finite number of amplitude values leads to a finite number of pole locations. In the \(z\)-plane the possible pole locations are given by the intersections of

  • circles whose radii \(r_n\) are given by \(r_n = \sqrt{n \cdot Q}\) with
  • equidistant vertical lines which intersect the horizontal axis at \(\frac{1}{2} m \cdot Q\).

The finite number of pole locations may lead to deviations from a desired filter characteristic since a desired pole location is moved to the next possible pole location. The filter may even get unstable, when poles are moved outside the unit circle. For illustration, the resulting pole locations for a SOS realized in direct form are computed and plotted.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
import scipy.signal as sig
import itertools


def compute_pole_locations(Q):
    a1 = np.arange(-2, 2+Q, Q)
    a2 = np.arange(0, 1+Q, Q)

    p = np.asarray([np.roots([1, n, m]) for (n,m) in itertools.product(a1, a2)])
    p = p[np.imag(p)!=0]

    return p

def plot_pole_locations(p, Q):
    ax = plt.gca()
    for n in np.arange(np.ceil(2/Q)+1):
        circle = Circle((0,0), radius=np.sqrt(n*Q), fill=False, color='black', ls='solid', alpha=0.05)
        ax.add_patch(circle)
        ax.axvline(.5*n*Q, color='0.95')
        ax.axvline(-.5*n*Q, color='0.95')

    unit_circle = Circle((0,0), radius=1, fill=False, color='red', ls='solid')
    ax.add_patch(unit_circle)

    plt.plot(np.real(p), np.imag(p), 'b.', ms = 4)
    plt.xlabel(r'Re{$z$}')
    plt.ylabel(r'Im{$z$}')
    plt.axis([-1.1, 1.1, -1.1, 1.1])

# compute and plot pole locations
for w in [5,6]:
    Q = 2/(2**(w-1))  # quantization stepsize
    plt.figure(figsize=(5, 5))
    p = compute_pole_locations(Q)
    plot_pole_locations(p, Q)
    plt.title(r'Direct form coefficient quantization to $w=%d$ bits'%w)
../_images/recursive_filters_quantization_of_coefficients_2_0.png
../_images/recursive_filters_quantization_of_coefficients_2_1.png

Exercise

  • What consequences has the distribution of pole locations on the desired characteristics of a filter for e.g. low/high frequencies?

7.4.2. Coupled Form

Besides the quantization step \(Q\), the pole distribution depends also on the topology of the filter. In order to gain a different distribution of pole locations after quantization, one has to derive structures where the coefficients of the multipliers are given by other values than the direct form coefficients \(a_1\) and \(a_2\).

One of these alternative structures is the coupled form (also known as Gold & Rader structure)

Coupled form second order section

Coupled form second order section

where \(\Re\{z_\infty\} = r \cdot \cos \varphi\) and \(\Im\{z_\infty\} = r \cdot \sin \varphi\) denote the real- and imaginary part of the complex pole \(z_\infty\), respectively. Analysis of the structure reveals its difference equation as

and its transfer function as

\[H(z) = \frac{\Im\{z_\infty\} \; z^{-1}}{ 1 - 2 \Re\{z_\infty\} \; z^{-1} + (\Re\{z_\infty\}^2 + \Im\{z_\infty\}^2) \; z^{-2} }\]

Note that the numerator of the transfer function differs from the recursive only SOS given above. However, this can be considered in the design of the transfer function of a general SOS.

The real- and imaginary part of the pole \(z_\infty\) occur directly as coefficients for the multipliers in the coupled form. Quantization of these coefficients results therefore in a Cartesian grid of possible pole locations in the \(z\)-plane. This is illustrated in the following.

In [2]:
def compute_pole_locations(w):
    Q = 1/(2**(w-1))  # quantization stepsize
    a1 = np.arange(-1, 1+Q, Q)
    a2 = np.arange(-1, 1+Q, Q)

    p = np.asarray([n+1j*m for (n,m) in itertools.product(a1, a2) if n**2+m**2 <= 1])

    return p

def plot_pole_locations(p):
    ax = plt.gca()

    unit_circle = Circle((0,0), radius=1, fill=False, color='red', ls='solid')
    ax.add_patch(unit_circle)

    plt.plot(np.real(p), np.imag(p), 'b.', ms = 4)
    plt.xlabel(r'Re{$z$}')
    plt.ylabel(r'Im{$z$}')
    plt.axis([-1.1, 1.1, -1.1, 1.1])


# compute and plot pole locations
for w in [5,6]:
    plt.figure(figsize=(5, 5))
    p = compute_pole_locations(w)
    plot_pole_locations(p)
    plt.title(r'Coupled form coefficient quantization to $w=%d$ bits'%w)
../_images/recursive_filters_quantization_of_coefficients_5_0.png
../_images/recursive_filters_quantization_of_coefficients_5_1.png

Excercise

  • What is the benefit of this representation in comparison to the direct from discussed in the previous section?

7.4.3. Example

The following example illustrates the effects of coefficient quantization for a recursive Butterworth filter realized in cascaded SOSs in transposed direct form II.

In [3]:
w = 12  # wordlength of filter coefficients
N = 5  # order of filter

def uniform_midtread_quantizer(x, w, xmin=1):
    # quantization step
    Q = xmin/(2**(w-1))
    # limiter
    x = np.copy(x)
    idx = np.where(x <= -xmin)
    x[idx] = -1
    idx = np.where(x > xmin - Q)
    x[idx] = 1 - Q
    # linear uniform quantization
    xQ = Q * np.floor(x/Q + 1/2)

    return xQ

# coefficients of recursive filter
b, a = sig.butter(N, 0.2, 'low')
# decomposition into SOS
sos = sig.tf2sos(b, a, pairing='nearest')
sos = sos/np.amax(np.abs(sos))
# quantization of SOS coefficients
sosq = uniform_midtread_quantizer(sos, w, xmin=2)
# compute overall transfer function of (quantized) filter
H = np.ones(512)
Hq = np.ones(512)
for n in range(sos.shape[0]):
    Om, Hn = sig.freqz(sos[n, 0:3], sos[n, 3:6])
    H = H * Hn
    Om, Hn = sig.freqz(sosq[n, 0:3], sosq[n, 3:6])
    Hq = Hq * Hn


# plot magnitude responses
plt.figure(figsize=(10, 3))
plt.plot(Om, 20 * np.log10(abs(H)), label='continuous')
plt.plot(Om, 20 * np.log10(abs(Hq)), label='quantized')
plt.title('Magnitude response')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$|H(e^{j \Omega})|$ in dB')
plt.legend(loc=3)
plt.grid()
# plot phase responses
plt.figure(figsize=(10, 3))
plt.plot(Om, np.unwrap(np.angle(H)), label='continuous')
plt.plot(Om, np.unwrap(np.angle(Hq)), label='quantized')
plt.title('Phase')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\varphi (\Omega)$ in rad')
plt.legend(loc=3)
plt.grid()
../_images/recursive_filters_quantization_of_coefficients_8_0.png
../_images/recursive_filters_quantization_of_coefficients_8_1.png

Exercise

  • Decrease the word length w of the filter. What happens? At what word length does the filter become unstable?
  • Increase the order N of the filter for a fixed word length w. What happens?