The development direction of digital signal processor

The digital signal processor has evolved from the dedicated signal processors of the 1970s to today's VLSI array processors, and its application areas have expanded from early low-frequency signal processing such as speech and sonar to today's high-data-rate signal processing such as radar, images, and video. Thanks to floating-point arithmetic and parallel processing techniques, processing capability has improved enormously. Digital signal processors will continue to develop along two directions, higher processing speed and higher arithmetic precision; architecturally, dataflow structures and artificial neural network structures may well become the basic structural modes of the next generation of digital signal processors.

Arithmetic format

DSPs use a variety of arithmetic formats. The vast majority of DSP processors use fixed-point arithmetic, where numbers are represented either as integers or as fractions between -1.0 and +1.0. Some processors use floating-point arithmetic, where data is represented as a mantissa plus an exponent: mantissa × 2^exponent.
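As a rough illustration of the two formats, the following Python sketch encodes a fraction in the common 16-bit Q15 fixed-point format and splits a float into its mantissa and exponent (the helper names are ours, not from any DSP toolchain):

```python
import math

def float_to_q15(x: float) -> int:
    """Encode a fraction in [-1.0, 1.0) as a 16-bit Q15 fixed-point integer."""
    return max(-32768, min(32767, round(x * 32768)))

def q15_to_float(q: int) -> float:
    """Decode a Q15 integer back to a fraction."""
    return q / 32768.0

def to_float_parts(x: float):
    """Split a float into mantissa and exponent: x = mantissa * 2**exponent."""
    mantissa, exponent = math.frexp(x)   # mantissa in [0.5, 1.0)
    return mantissa, exponent

q = float_to_q15(0.5)          # 16384, i.e. 0.5 * 2**15
assert q15_to_float(q) == 0.5
m, e = to_float_parts(6.0)     # 0.75 * 2**3
```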

Floating-point arithmetic is more complex than fixed-point arithmetic, but it provides a much larger dynamic range for data (dynamic range being the ratio of the largest to the smallest representable number). With floating-point DSPs the design engineer largely does not have to worry about dynamic range and precision. Floating-point DSPs are easier to program than fixed-point DSPs, but their cost and power consumption are higher.

Because of their lower cost and power consumption, fixed-point DSPs are generally used in high-volume products; programmers and algorithm designers determine the required dynamic range and precision through analysis or simulation. Floating-point DSPs are worth considering when ease of development, wide dynamic range, and high precision are needed.

Floating-point arithmetic can also be implemented in software on a fixed-point DSP, but such routines consume a great deal of processor time and are rarely used. A more effective approach is "block floating point", in which a block of data values sharing a single exponent but having different mantissas is processed as a unit. Block floating point is usually implemented in software.
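A minimal sketch of the block floating-point idea, assuming 16-bit signed mantissas (the function names are illustrative):

```python
import math

def block_float_encode(values, mant_bits=16):
    """Return (mantissas, shared_exponent) with value ~= mantissa * 2**exponent.

    The exponent is shared by the whole block and chosen so the largest
    value fits in the signed mantissa range.
    """
    peak = max(abs(v) for v in values)
    limit = 2 ** (mant_bits - 1) - 1            # 32767 for 16-bit mantissas
    # Smallest exponent e such that peak / 2**e fits within the limit.
    exponent = max(0, math.ceil(math.log2(peak / limit))) if peak else 0
    mantissas = [round(v / 2 ** exponent) for v in values]
    return mantissas, exponent

def block_float_decode(mantissas, exponent):
    """Rebuild the block's values from mantissas and the shared exponent."""
    return [m * 2 ** exponent for m in mantissas]

# 100000 exceeds 16-bit range, so the whole block shares exponent 2:
mants, exp = block_float_encode([100000, -50000, 25000])
```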

Data width

All floating-point DSPs use a 32-bit word width, while fixed-point DSPs typically use 16 bits; there are also 24-bit and 20-bit DSPs, such as Motorola's DSP563XX series and Zoran's ZR3800X series. Because word width strongly affects a DSP's package size, pin count, and required memory, it directly affects device cost: the wider the word, the larger the package, the more pins, the larger the memory requirement, and the higher the cost. Subject to meeting the design requirements, choose the smallest word width possible to reduce cost.

When choosing between fixed point and floating point, the trade-off between word width and development complexity should be weighed. For example, a 16-bit fixed-point DSP can also implement 32-bit double-precision arithmetic by combining instructions (although double-precision arithmetic is of course much slower than single precision). If single precision meets most of the computational requirements and only a small amount of code needs double precision, this approach is workable; but if most of the computation requires high precision, a processor with a wider word is needed.
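The combined-instruction idea can be sketched in Python: a wide multiply built from 16×16-bit partial products, roughly what a programmer would code on a 16-bit device (purely illustrative; a real DSP would use a short multi-instruction assembly sequence):

```python
def mul32_from_16(a: int, b: int) -> int:
    """Multiply two 32-bit unsigned values using only 16x16 partial products."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    # Four 16x16 partial products, shifted into place and summed --
    # each would be one single-cycle multiply on a 16-bit DSP.
    result = (a_lo * b_lo
              + ((a_lo * b_hi) << 16)
              + ((a_hi * b_lo) << 16)
              + ((a_hi * b_hi) << 32))
    return result & 0xFFFFFFFFFFFFFFFF   # keep the 64-bit product

assert mul32_from_16(123456, 654321) == 123456 * 654321
```

Four multiplies plus shifts and adds in place of one: this is the "much slower than single precision" cost the text describes.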

Note that most DSP devices have the same width for the instruction word and the data word, but some do not; for example, ADI's (Analog Devices Inc.) ADSP-21XX series uses a 16-bit data word and a 24-bit instruction word.

Processing speed

The key question is whether the processor meets the speed requirements of the design. There are a number of ways to measure a processor's speed. The most basic is the instruction cycle, the time the processor takes to execute its fastest instruction. Taking the reciprocal of the instruction cycle, dividing by one million, and multiplying by the number of instructions executed per cycle gives the processor's peak rate in millions of instructions per second (MIPS).
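The rate calculation above can be written out directly (the cycle times and issue widths in the examples are hypothetical):

```python
def peak_mips(instruction_cycle_ns: float, instructions_per_cycle: int = 1) -> float:
    """Peak MIPS = (1 / instruction cycle) * instructions per cycle / 1e6."""
    cycles_per_second = 1e9 / instruction_cycle_ns   # reciprocal of the cycle time
    return cycles_per_second * instructions_per_cycle / 1e6

# A 10 ns instruction cycle, one instruction per cycle:
assert peak_mips(10.0) == 100.0
# A VLIW device issuing 4 instructions per 5 ns cycle:
assert peak_mips(5.0, 4) == 800.0
```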

But instruction execution time alone does not indicate a processor's true performance. Different processors accomplish different amounts of work per instruction, so simply comparing instruction execution times does not fairly distinguish their performance. Some newer DSPs use very long instruction word (VLIW) architectures, in which multiple instructions are issued in a single cycle and each instruction accomplishes less than an instruction on a traditional DSP, so comparing MIPS figures between VLIW and conventional DSP devices can be misleading.

Even comparing MIPS figures among traditional DSPs is somewhat one-sided. For example, some processors can shift several bits in a single instruction, while others can shift only one bit per instruction; some DSPs can move data in parallel with an unrelated ALU instruction (loading operands while the instruction executes), while others support only data moves related to the executing ALU instruction; and some newer DSPs allow two MACs to be specified within a single instruction. A MIPS comparison alone therefore does not accurately capture processor performance.

One way to address these issues is to use a basic operation, rather than an instruction, as the yardstick for comparing processors. The operation commonly used is the MAC, but MAC time alone does not provide enough information to compare DSP performance: in the vast majority of DSPs the MAC executes in a single instruction cycle, making MAC time equal to the instruction cycle time, and, as noted above, some DSPs accomplish more in a single MAC cycle than others. MAC time also says nothing about the performance of other operations, such as looping, that appear in virtually all applications.

The most general approach is to define a standard set of routines and compare their execution speed on different DSPs. Such routines may be the "core" function of an algorithm, such as an FIR or IIR filter, or the whole or part of an application (e.g., a speech coder). Figure 1 shows the performance of several DSP devices tested using BDTI's tools.

When comparing DSP processor speeds, be careful with advertised MOPS (millions of operations per second) and MFLOPS (millions of floating-point operations per second) figures, because vendors differ in what they count as an "operation", so the metrics mean different things. For example, some processors can perform a floating-point multiply and a floating-point add simultaneously, and therefore advertise an MFLOPS rating twice their MIPS rating.

Also, when comparing processor clock rates, note that a DSP's input clock may be the same as its instruction rate, or two to four times the instruction rate, and this varies from processor to processor. In addition, many DSPs contain clock multipliers or phase-locked loops that generate the high-frequency clock signals needed on-chip from an external low-frequency clock.

Practical applications

Speech processing: speech coding, speech synthesis, speech recognition, speech enhancement, voicemail, and speech storage.

Image/Graphics: 2D and 3D graphics processing, image compression and transmission, image recognition, animation, robot vision, multimedia, electronic maps, image enhancement, etc.

Military: classified communications, radar processing, sonar processing, navigation, global positioning, frequency-hopping radio, search and counter-search, etc.

Instrumentation: spectrum analysis, function generation, data acquisition, seismic processing, etc.

Automation: control, deep space operations, autopilot, robot control, disk control, etc.

Medical: hearing aid, ultrasound equipment, diagnostic tools, patient monitoring, electrocardiogram, etc.

Home appliances: digital audio, digital TV, video phone, music synthesis, tone control, toys and games, etc.

Examples of biomedical signal processing:

CT: computed tomography. (Godfrey Hounsfield of EMI in the UK, inventor of the cranial CT scanner, won the Nobel Prize for it.)

CAT: computerized axial tomography, an X-ray spatial-reconstruction device. It made possible whole-body scanning, three-dimensional imaging of cardiac activity, locating brain tumors and foreign bodies, and reconstructing images of the human torso; electrocardiogram analysis is another example.

Memory Management

A DSP's performance is strongly affected by its memory subsystem. As mentioned earlier, the MAC and a number of other signal-processing functions are fundamental to a DSP device's signal-processing capability, and performing a fast MAC requires reading one instruction word and two data words from memory in each instruction cycle. There are several ways to achieve this, including multi-ported memories (which allow multiple memory accesses per instruction cycle), separate instruction and data memories (the "Harvard" architecture and its derivatives), and instruction caches (which allow instructions to be read from the cache instead of from memory, freeing the memory for data reads). Figures 2 and 3 show the difference between the Harvard memory structure and the "von Neumann" structure used by many microcontrollers.

Also note the amount of memory space supported. The primary target market for many fixed-point DSPs is embedded applications, where memory is generally small, so these devices have small-to-medium on-chip memory (roughly 4K to 64K words) and a narrow external data bus. In addition, most fixed-point DSPs have address buses of 16 bits or less, limiting the amount of addressable external memory.

Some floating-point DSPs have very little, or even no, on-chip memory, but a wide external data bus. For example, TI's TMS320C30 has only 6K words of on-chip memory, a 24-bit external bus, and a 13-bit external address bus, while ADI's ADSP-21060 has 4 Mbit of on-chip memory that can be partitioned between program memory and data memory in a variety of ways.

The choice of DSP should therefore be based on the application's requirements for memory size and the external bus.

Type characteristics

There are significant differences between DSP processors and general-purpose processors (GPPs) such as Intel's Pentium or the PowerPC. These differences arise because the architecture and instruction set of a DSP are designed specifically for signal processing, which gives DSPs the following characteristics.

- Hardware multiply-accumulate operations (MAC)

To perform operations such as signal filtering efficiently, the processor must perform multiplications efficiently.

GPPs were not originally designed for multiplication-intensive work; the first major technical improvement that distinguished DSPs from early GPPs was the addition of specialized hardware and an explicit MAC instruction capable of a single-cycle multiply-accumulate.
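As a sketch of why the MAC is fundamental: each output of an N-tap FIR filter is N multiply-accumulate operations, shown here in plain Python as a stand-in for what a DSP's hardware does at one MAC per cycle:

```python
def fir_output(coeffs, samples):
    """One FIR filter output: y = sum(h[k] * x[n-k]), one MAC per tap."""
    acc = 0
    for h, x in zip(coeffs, samples):
        acc += h * x           # the multiply-accumulate primitive
    return acc

# A 3-tap example: 1*4 + 2*5 + 3*6
assert fir_output([1, 2, 3], [4, 5, 6]) == 32
```

On a DSP with a hardware MAC this whole inner loop runs at one tap per instruction cycle; on an early GPP each multiply alone could take many cycles.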

- Harvard architecture

Traditional GPPs use a von Neumann memory architecture, in which a single memory space is connected to the processor core through two buses (an address bus and a data bus); this cannot satisfy the MAC's requirement of four memory accesses in a single instruction cycle. DSPs generally use the Harvard architecture, in which there are two memory spaces, program memory and data memory, each connected to the processor core by its own set of buses, allowing two simultaneous memory accesses; this arrangement doubles the processor's memory bandwidth. In Harvard architectures, still greater bandwidth is sometimes obtained by adding a second data memory and bus. Modern high-performance GPPs typically have two on-chip caches, one holding data and one holding instructions. In theory, this dual on-chip cache plus bus arrangement is equivalent to the Harvard architecture; however, GPPs use control logic to decide which data and instruction words reside in the on-chip cache, a process usually invisible to the programmer, whereas in a DSP the programmer can explicitly control which data and instructions are stored in on-chip memory or cache.

- Zero-overhead loop control

A common characteristic of DSP algorithms is that most of the processing time is spent executing a small number of instructions inside a relatively small loop. As a result, most DSP processors have specialized hardware for zero-overhead looping: the processor executes a set of instructions in a loop without spending time testing the loop counter, because the hardware performs the loop branch and the decrement of the counter. Some DSPs also implement high-speed single-instruction loops through a one-instruction cache.

- Special addressing modes

DSPs often include dedicated address generators that produce the special addressing patterns required by signal-processing algorithms, such as circular addressing and bit-reversed addressing. Circular addressing supports the streaming FIR-filter delay line, and bit-reversed addressing supports the FFT.
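Both addressing patterns are easy to sketch in Python (illustrative helpers, not any vendor's API); on a DSP the address generator produces these sequences in hardware, with no extra instructions:

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i (FFT input reordering)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

def advance(ptr: int, buf_len: int) -> int:
    """Circular addressing: the data pointer wraps modulo the buffer length."""
    return (ptr + 1) % buf_len

# 8-point FFT (3 address bits): input order 0..7 maps to bit-reversed addresses.
order = [bit_reverse(i, 3) for i in range(8)]
assert order == [0, 4, 2, 6, 1, 5, 3, 7]

# Circular buffer of length 8: the pointer after slot 7 is slot 0.
assert advance(7, 8) == 0
```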

- Predictability of execution time

Most DSP applications have hard real-time requirements: all processing must complete within a specified time in every case. This real-time constraint requires the program designer to determine exactly how much time each sample needs, or at least how much time is consumed in the worst case. The way a DSP executes a program is visible to and predictable by the programmer, so it is easy to estimate the execution time of each job. For high-performance GPPs, however, predicting execution time becomes complex and difficult because of the large, dynamically managed instruction and data caches.

- Rich peripherals

DSPs provide peripherals such as DMA controllers, serial ports, link ports, timers, and more.