## Model Predictive Control

### QP Solver - Vibration Control of a Cantilevered Beam

The quadratic programming solver is of key interest in MPC algorithms, as its computational complexity has historically been a barrier to more widespread deployment. Here we revisit an earlier DSP demonstation of MPC on a cantilevered beam, and study how customised microelectonics may be used to provide a substantially improved design. These improvements allow for improved feasibility of more complex MPC systems.

The previous implementation (discussed here) used a low cost ADSP-21262 DSP device. It was programmed in assembly language, to demonstate effective control using an 18 state model and 12 step prediciton horizon, at a 5kHz sampling rate. The power rating of this DSP is approximately 600mW.

Whilst a significant achievement, this also represents the limits of the DSP's capabilities. For more demanding applications, such as those with a longer horizon, higher dimensions, or faster sampling rate, a better solution is required. One option is to use a more capable (but more expensive) DSP. The other, which we consider, is the use of custom microelectronic circuits.

While DSPs are themselves mass produced custom circuits, they are very generalised to meet multiple applications, and are consequently inefficient. In particular, the cheaper devices only provide sequential, single threaded, operation. Other DSPs may be more powerful, but are also more expensive and have higher power rating.

FPGAs and ASICs offer a high level of circuit customisation, allowing the design to be optimised to suit the specific application. They also allow the use of custom numeric systems, and provided the abilities to implement low power designs, if required.

FPGAs are a mid-point between ASICs and DSPs. They contain pre-defined blocks that are joined together in the "programming" process to form the circuit. While they have a low prototyping cost, they can have relatively large per-unit costs and the circuit complexity can be limited by the FPGA's resources and internal architecture. However, they provide a good first step at determining and demonstrating feasibility.

ASICs offer the ultimate in circuit customisation. They allow for fully customised circuits, with the main limited determined by the economics associated with the size and power rating of the design. While they have a very high prototyping cost, they can be extremely cheap when mass produced and represent the ideal milestone to demonstrating the full potential of a design.

However, building these custom circuits also presents a new set of challenges and goals. In particular, we require a design that has

The key issue to observe is that almost every step requires some sort of numerical manipulation that is traditionally considered non-trivial in hardware circuits. Operations such as divisions, searching of candidate constraints, and vector-matrix manipulation, typically require significant processing time in a DSP solutions.

In particular, a key part of the active set alogorithm is the maintainence of an upper triangular matrix, that represents the current set of active constraints. This requires repeated Givens rotations, an operation that traditionally involves division by square roots and adjustment of the (possibly large) affected matrices. It is consequently one of the most significant bottlenecks in the algorithm.

Each block contains a number of parallel elements to allow fast simultaneous computation of various quantities. For any given problem specification, the number and size of each of these computational blocks can be scaled to suit the dimensions of the problem. For example, for the above-mentioned beam control problem, we use 12 parallel elements in each array. However, it would also be possible to use just 6 elements for a smaller, but slower, implementation.

A key optimisation in the design is the use of parrallelism within calulation blocks, and also effectively multithreading the processor by performing independent operations simultaneously. For further details on the parallelism exploited, refer to our ECC'09 publication

To study the effects of numerical precision, we developed the c4Hardware and c4HDL C++ libraries to provide bit accurate emulation of hardware performance. This was used to create a MEX file that replaces the original MATLAB simulation to exactly emulate the performance of the proposed hardware.

The following plot demonstrates the performance of the original MATLAB script, when a distubance occurs on the beam :

The upper plot shows the output of the plant (the sensors on the beam) and the lower plot represnts the control input with a simple set of constraints. We see that the control hits the constraints after the initiall disturbance, but stays within them. Using these results as a reference of ideal performance, we can then test the behaviour of the restricted precision c4Hardware model. As we see from the following plot, when using as little a 6 mantissa bits, there is very little discernable difference from the full precision MATLAB script.

The important implication of this result is that using a high prescision implementation, such as a DSP, is wasteful since it achieves little performance benefit. In particular, increasing the number of bits used in the numerical representation increases the size and power consumption of a circuit, as well as negatively impacting potential speed. This suggests that using custom circuits, such as FPGAs and ASICs, provides scope for a much more efficient result.

Implementation in an Altera Stratix III FPGA suggests a worst case speed of 25us per iteration, representing a circuit at least 8 times faster than the DSP. The design is also generic, so it can also be readily applied to other FPGA devices.

The design may also be used to generate an ASIC design, as demonstrated by the following layout plot:

While this design is indicative only, it provides a strong indication of the possibilities of an ASIC implementaton. Using a 130nm CMOS technology, the above ASIC can potentially provide at least a doubling of speed over the FPGA (16x the DSP) with a core area of less than 2mm

The optimisation of the design is a work in progress, but current results indicate substantial advantages in using custom ciruits over a generic DSP.

Our conservative results suggest that the proposed FPGA and ASIC circuits will be capable of speeds up to 16 times faster than the existing DSP, allowing scope for implementation of more complex systems. For example, it is now possible to use longer prediction horizons, faster sampling rates, and high dimension systems.

These designs have been successfully verified against the bit-accurate simulation model, and work is underway to demonstrate them as a proof-of-concept on physical hardware.

### Previous DSP Implementation

The previous implementation (discussed here) used a low cost ADSP-21262 DSP device. It was programmed in assembly language, to demonstate effective control using an 18 state model and 12 step prediciton horizon, at a 5kHz sampling rate. The power rating of this DSP is approximately 600mW.

Whilst a significant achievement, this also represents the limits of the DSP's capabilities. For more demanding applications, such as those with a longer horizon, higher dimensions, or faster sampling rate, a better solution is required. One option is to use a more capable (but more expensive) DSP. The other, which we consider, is the use of custom microelectronic circuits.

### FPGAs and ASICs

The two types of custom circuits that we consider are Field Programmable Gate Arrays (FPGAs) and Application Specific Integerated Circuits (ASICs).While DSPs are themselves mass produced custom circuits, they are very generalised to meet multiple applications, and are consequently inefficient. In particular, the cheaper devices only provide sequential, single threaded, operation. Other DSPs may be more powerful, but are also more expensive and have higher power rating.

FPGAs and ASICs offer a high level of circuit customisation, allowing the design to be optimised to suit the specific application. They also allow the use of custom numeric systems, and provided the abilities to implement low power designs, if required.

FPGAs are a mid-point between ASICs and DSPs. They contain pre-defined blocks that are joined together in the "programming" process to form the circuit. While they have a low prototyping cost, they can have relatively large per-unit costs and the circuit complexity can be limited by the FPGA's resources and internal architecture. However, they provide a good first step at determining and demonstrating feasibility.

ASICs offer the ultimate in circuit customisation. They allow for fully customised circuits, with the main limited determined by the economics associated with the size and power rating of the design. While they have a very high prototyping cost, they can be extremely cheap when mass produced and represent the ideal milestone to demonstrating the full potential of a design.

However, building these custom circuits also presents a new set of challenges and goals. In particular, we require a design that has

- A flexible architecture, to avoid a complete re-design for each problem specification
- Out-perform exisiting DSP solutions, otherwise there is no advantage
- Keep the design cost effective, which equates to a small power efficient circuit

### The Active Set Algorithm

For this proof of concept exercise, we consider the active set method as a suitable option for use in the quadratic programming solver.The key issue to observe is that almost every step requires some sort of numerical manipulation that is traditionally considered non-trivial in hardware circuits. Operations such as divisions, searching of candidate constraints, and vector-matrix manipulation, typically require significant processing time in a DSP solutions.

In particular, a key part of the active set alogorithm is the maintainence of an upper triangular matrix, that represents the current set of active constraints. This requires repeated Givens rotations, an operation that traditionally involves division by square roots and adjustment of the (possibly large) affected matrices. It is consequently one of the most significant bottlenecks in the algorithm.

### General Architecture

Despite the computational complexities, careful observation of the algorithm allows it to be represented in a form that can be implemented by a very general architecture. We consider a device that contains just a few computational blocks, as follows :Each block contains a number of parallel elements to allow fast simultaneous computation of various quantities. For any given problem specification, the number and size of each of these computational blocks can be scaled to suit the dimensions of the problem. For example, for the above-mentioned beam control problem, we use 12 parallel elements in each array. However, it would also be possible to use just 6 elements for a smaller, but slower, implementation.

A key optimisation in the design is the use of parrallelism within calulation blocks, and also effectively multithreading the processor by performing independent operations simultaneously. For further details on the parallelism exploited, refer to our ECC'09 publication

### Limited Precision

An important decision in the hardware design is the choice of numeric system and numerical prescision. For this application, we choose a customised floating point representation.To study the effects of numerical precision, we developed the c4Hardware and c4HDL C++ libraries to provide bit accurate emulation of hardware performance. This was used to create a MEX file that replaces the original MATLAB simulation to exactly emulate the performance of the proposed hardware.

The following plot demonstrates the performance of the original MATLAB script, when a distubance occurs on the beam :

The upper plot shows the output of the plant (the sensors on the beam) and the lower plot represnts the control input with a simple set of constraints. We see that the control hits the constraints after the initiall disturbance, but stays within them. Using these results as a reference of ideal performance, we can then test the behaviour of the restricted precision c4Hardware model. As we see from the following plot, when using as little a 6 mantissa bits, there is very little discernable difference from the full precision MATLAB script.

The important implication of this result is that using a high prescision implementation, such as a DSP, is wasteful since it achieves little performance benefit. In particular, increasing the number of bits used in the numerical representation increases the size and power consumption of a circuit, as well as negatively impacting potential speed. This suggests that using custom circuits, such as FPGAs and ASICs, provides scope for a much more efficient result.

### Results

We applied the above architecture to the cantilevered beam problem to compare against the existing DSP solution. The original DSP solution required up to 200us to process each sample and produce an appropriate control output.Implementation in an Altera Stratix III FPGA suggests a worst case speed of 25us per iteration, representing a circuit at least 8 times faster than the DSP. The design is also generic, so it can also be readily applied to other FPGA devices.

The design may also be used to generate an ASIC design, as demonstrated by the following layout plot:

While this design is indicative only, it provides a strong indication of the possibilities of an ASIC implementaton. Using a 130nm CMOS technology, the above ASIC can potentially provide at least a doubling of speed over the FPGA (16x the DSP) with a core area of less than 2mm

^{2}(die area of 4mm^{2}). This contrasts with microprocessors used in desktop computers, which can exceed 100mm^{2}.The optimisation of the design is a work in progress, but current results indicate substantial advantages in using custom ciruits over a generic DSP.

### Summary

While a DSP implementation has previously demonstrated MPC for the cantilevered beam problem, it was somewhat limited in its capabilities. A bit-accurate simulation model using the c4Hardware library demonstrated that only a very small numerical precision is required, suggesting that custom circuits could provide a much more efficient implementation.Our conservative results suggest that the proposed FPGA and ASIC circuits will be capable of speeds up to 16 times faster than the existing DSP, allowing scope for implementation of more complex systems. For example, it is now possible to use longer prediction horizons, faster sampling rates, and high dimension systems.

These designs have been successfully verified against the bit-accurate simulation model, and work is underway to demonstrate them as a proof-of-concept on physical hardware.