What are the roles of the front-side bus and the cache, and how can a CPU's actual speed be judged from its parameters?
1.1.1 Main Frequency

The main frequency, also known as the clock frequency and measured in MHz, indicates the CPU's computing speed. Main frequency of the CPU = external frequency × multiplier factor. Many people believe that the main frequency determines the CPU's operating speed; this is not only one-sided, but for servers the view is even further off the mark. To date there is no definitive formula establishing a numerical relationship between main frequency and actual computing speed, and even the two major processor manufacturers, Intel and AMD, disagree strongly on this point. From the development trend of Intel's products we can see that Intel places great emphasis on raising the main frequency, whereas other processor manufacturers take a different approach: someone once compared a fast 1 GHz Transmeta processor and found its running efficiency roughly equivalent to that of a 2 GHz Intel processor.
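
As a quick illustration of the formula above, here is a small Python sketch (the figures are made up for illustration and do not describe any particular CPU):

    # main frequency = external frequency x multiplier factor
    external_frequency_mhz = 200   # assumed external (base) frequency in MHz
    multiplier = 14                # assumed multiplier factor
    main_frequency_mhz = external_frequency_mhz * multiplier
    print(main_frequency_mhz)      # 2800 MHz, i.e. a 2.8 GHz main frequency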

So the CPU's main frequency is not directly tied to its actual computing power; the main frequency merely represents the oscillation speed of the digital pulse signal inside the CPU. Even among Intel's own processor products we can find examples: a 1 GHz Itanium chip can perform almost as fast as a 2.66 GHz Xeon/Opteron, and a 1.5 GHz Itanium 2 is about as fast as a 4 GHz Xeon/Opteron. A CPU's computing speed also depends on other performance metrics such as its pipeline.

Of course, the main frequency is related to actual computing speed, but it can only be said to be one aspect of CPU performance; it does not represent the CPU's overall performance.

1.1.2 External Frequency

The external frequency is the CPU's base frequency, also measured in MHz. The CPU's external frequency determines the operating speed of the whole motherboard. To be clear, in desktop computers what we call overclocking is overclocking the CPU's external frequency (in general the CPU's multiplier is locked), and this point is easy to understand. For server CPUs, however, overclocking is absolutely out of the question. As mentioned earlier, the CPU's external frequency determines the operating speed of the motherboard and the two run synchronously; if a server CPU is overclocked by changing the external frequency, asynchronous operation results (many desktop motherboards do support asynchronous operation), which makes the entire server system unstable.

In the vast majority of current computer systems, the external frequency is also the speed at which memory and motherboard run synchronously; in this sense the CPU's external frequency can be understood as being connected directly to memory, so the two operate in sync. The external frequency is easily confused with the front-side bus (FSB) frequency; the difference between the two is explained in the front-side bus section below.

1.1.3 Front Side Bus (FSB) Frequency

The front-side bus (FSB) frequency (i.e. the bus frequency) directly affects the speed of data exchange between the CPU and memory. There is a formula: data bandwidth = (bus frequency × data width) / 8, and the maximum data-transfer bandwidth depends on the width and the transfer frequency of the data transmitted simultaneously. For example, the current 64-bit Xeon Nocona has an 800 MHz front-side bus, so according to the formula its maximum data-transfer bandwidth is 6.4 GB/s.
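
The bandwidth formula can be checked with a short Python sketch; the values below simply re-use the Xeon Nocona example (800 MHz bus, 64-bit data width):

    # data bandwidth (MB/s) = bus frequency (MHz) x data width (bits) / 8
    fsb_frequency_mhz = 800        # front-side bus frequency from the example
    data_width_bits = 64           # 64-bit wide data path
    bandwidth_mb_per_s = fsb_frequency_mhz * data_width_bits / 8
    print(bandwidth_mb_per_s)      # 6400.0 MB/s, i.e. 6.4 GB/s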

The difference between the external frequency and the front-side bus (FSB) frequency: the front-side bus speed refers to the speed of data transmission, whereas the external frequency is the speed at which the CPU and the motherboard run synchronously.

In fact, with the advent of the HyperTransport architecture, this notion of a front-side bus (FSB) frequency has changed. We know that the IA-32 architecture must have three important building blocks: the Memory Controller Hub (MCH), the I/O Controller Hub, and the PCI Hub. Intel's very typical Intel 7501 and Intel 7505 chipsets, tailored for dual Xeon processors, contain an MCH that provides the CPU with a 533 MHz front-side bus; paired with DDR memory, the front-side bus bandwidth can reach 4.3 GB/s. However, the ever-increasing performance of processors also creates many problems for the system architecture. The HyperTransport architecture not only solves these problems but also raises the bus bandwidth more effectively. The flexible HyperTransport I/O bus architecture of the AMD Opteron processor, for example, allows it to integrate the memory controller, so the processor exchanges data directly with memory instead of going through the system bus to the chipset. In that case, there is no meaningful front-side bus (FSB) frequency to speak of on AMD Opteron processors.

1.1.4 CPU bits and word length

Bit: digital circuits and computer technology use binary, in which the only codes are "0" and "1"; each "0" or "1" is one "bit" in the CPU.

Word length: the number of bits of binary data that the CPU can process at one time (in one operation) is called the word length in computer technology. A CPU that can handle data with an 8-bit word length is therefore usually called an 8-bit CPU, and likewise a 32-bit CPU can process 32 bits of binary data at a time. Difference between byte and word length: since common English characters can be represented with 8 binary bits, 8 bits are usually called a byte. The word length is not fixed and differs between CPUs: an 8-bit CPU can handle only one byte at a time, a 32-bit CPU can handle four bytes at a time, and by the same reasoning a 64-bit CPU can handle eight bytes at a time.
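
The word-length-to-byte relationship described above can be summed up in a couple of lines of Python:

    # bytes handled per operation = word length in bits / 8
    for word_length_bits in (8, 32, 64):
        print(f"A {word_length_bits}-bit CPU handles {word_length_bits // 8} byte(s) at a time")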

1.1.5 Multiplier Factor

The multiplier factor is the ratio between the CPU's main frequency and its external frequency. At the same external frequency, the higher the multiplier, the higher the CPU frequency. In reality, however, a high multiplier at the same external frequency has little value by itself. Because the speed of data transfer between the CPU and the system is limited, a CPU that chases a high main frequency purely through a high multiplier runs into an obvious "bottleneck" effect: the maximum speed at which the CPU can get data from the system cannot keep up with the CPU's computing speed. In general, except for engineering samples, Intel's CPUs are multiplier-locked, whereas AMD's were previously not locked.

1.1.6 Cache

The size of the cache is another important CPU indicator, and the structure and size of the cache have a very large impact on CPU speed. The cache inside the CPU runs at an extremely high frequency, generally the same frequency as the processor itself, and is far more efficient than system memory or the hard disk. In practice the CPU often needs to read the same block of data repeatedly, and a larger cache significantly raises the hit rate for data read inside the CPU, so that the CPU does not have to go to memory or the hard disk, which improves system performance. However, because of CPU die size and cost considerations, caches are all very small.
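
One simple way to see why the hit rate matters is the average-access-time model sketched below; this is only a rough illustration, and the latency figures are assumptions rather than measurements of any real CPU:

    # average access time = hit_rate * cache_latency + (1 - hit_rate) * memory_latency
    cache_latency_ns = 1.0         # assumed cache access time
    memory_latency_ns = 60.0       # assumed main-memory access time
    for hit_rate in (0.90, 0.95, 0.99):
        avg_ns = hit_rate * cache_latency_ns + (1 - hit_rate) * memory_latency_ns
        print(f"hit rate {hit_rate:.0%}: average access time {avg_ns:.2f} ns")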

L1 Cache is the first level of CPU cache, divided into a data cache and an instruction cache. The capacity and structure of the built-in L1 cache have a considerable impact on CPU performance. However, cache memory is built from static RAM, which has a relatively complex structure, so within the constraint that the CPU die area cannot be too large, the L1 cache cannot be made very big. Server CPUs typically have an L1 cache of 32 to 256 KB.

L2 Cache is the second level of CPU cache, which comes in on-chip and off-chip varieties. The on-chip L2 cache runs at the same speed as the main frequency, while the off-chip L2 cache runs at only half the main frequency. L2 cache capacity also affects CPU performance, and the principle is the bigger, the better. The largest L2 cache on CPUs for home use is currently 512 KB, while the L2 cache of CPUs for servers and workstations ranges from 256 KB to 1 MB, with some as high as 2 MB or 3 MB.

L3 Cache comes in two kinds: early implementations were external, whereas nowadays it is built in. Its actual role is to further reduce memory latency and improve processor performance for calculations on large volumes of data. Reducing memory latency and improving the ability to process large amounts of data are both very helpful for gaming, and in the server space adding L3 cache still delivers a noticeable performance boost. For example, a configuration with a larger L3 cache uses physical memory more efficiently, so its comparatively slow disk I/O subsystem can service more data requests. Processors with larger L3 caches also provide more efficient file-system caching behaviour and shorter message and processor queue lengths.

In fact, the earliest L3 cache appeared in AMD's K6-III processor, where it was limited by the manufacturing process of the time and was not integrated on the die but placed on the motherboard. That L3 cache could only run synchronously with the system bus frequency, so it was not much faster than main memory. The L3 cache was later adopted in Intel's Itanium processors for the server market. Intel also planned an Itanium 2 with a 9 MB L3 cache, and later a dual-core Itanium 2 with a 24 MB L3 cache.

Basically, though, the L3 cache is not that critical to processor performance. For example, a Xeon MP with 1 MB of L3 cache is still no match for the Opteron, which shows that an increase in front-side bus bandwidth brings a more effective performance improvement than an increase in cache.

1.1.7 CPU Extended Instruction Set

CPUs rely on instructions to perform computation and to control the system, and each CPU is designed with a set of instructions matching its hardware circuitry. The power of its instructions is also an important indicator of a CPU, and the instruction set is one of the most effective tools for improving microprocessor efficiency. In terms of today's mainstream architectures, instruction sets can be divided into complex instruction sets and reduced instruction sets; in terms of concrete applications, Intel's MMX (MultiMedia eXtensions), SSE, SSE2 (Streaming SIMD Extensions 2) and SSE3, and AMD's 3DNow!, are all CPU extended instruction sets, which enhance the CPU's multimedia, graphics and Internet processing capabilities. We usually refer to these extended instruction sets as the CPU's "instruction set". SSE3 is also currently the smallest such instruction set: MMX contains 57 instructions, SSE contains 50, SSE2 contains 144, and SSE3 contains 13. SSE3 is also the newest instruction set; Intel's Prescott processors already support it, AMD added SSE3 support in subsequent dual-core processors, and Transmeta's processors also support this instruction set.

1.1.8 CPU Core and I/O Voltage

Starting from the 586 CPUs, the CPU operating voltage has been divided into two kinds, core voltage and I/O voltage, and usually the core voltage is less than or equal to the I/O voltage. The core voltage depends on the CPU production process: generally, the smaller the process, the lower the core operating voltage. The I/O voltage is generally in the range of 1.6 to 5 V. Low voltages help solve the problems of excessive power consumption and heat.

1.1.9 Manufacturing Process

The micron (now nanometre) figure of the manufacturing process refers to the distance between circuits inside the IC. The trend in manufacturing processes is towards ever higher density, and a higher-density IC design means that more complex circuitry can fit into an IC of the same size. The main processes today are 180 nm, 130 nm and 90 nm, and a 65 nm manufacturing process has recently been officially announced.

1.1.10 Instruction Set

(1) CISC Instruction Set

The CISC instruction set, also known as the complex instruction set (CISC is an abbreviation of Complex Instruction Set Computer). In a CISC microprocessor the individual instructions of a program are executed serially in order, and the individual operations within each instruction are also executed serially in order. The advantage of serial execution is simple control, but the utilization of the computer's components is low and execution is slow. CISC is in fact the x86 series (i.e. IA-32 architecture) of CPUs produced by Intel and the compatible CPUs from AMD, VIA and others. Even the newly emerging X86-64 (also called AMD64) belongs to the CISC category.

To understand what an instruction set is, we start with today's x86-architecture CPUs. The x86 instruction set was developed by Intel specifically for its first 16-bit CPU (the i8086); the CPU in the world's first PC, launched by IBM in 1981, was the i8088 (a simplified version of the i8086), which also used x86 instructions. At the same time, an X87 chip was added to the computer to improve floating-point data processing, and later the x86 instruction set and the X87 instruction set were collectively referred to as the x86 instruction set.

Although Intel, with the continuous development of CPU technology, has successively developed the newer i80386 and i80486, then the earlier PII Xeon, PIII Xeon and Pentium III, and finally today's Pentium 4 series and Xeon (excluding Xeon Nocona), in order to ensure that computers can continue to run the various applications developed in the past and thus protect and inherit that rich software resource, all of the CPUs Intel produces still use the x86 instruction set, so its CPUs still belong to the x86 series. Since the Intel x86 series and its compatible CPUs (such as the AMD Athlon MP) all use the x86 instruction set, they form today's huge line-up of x86-series and compatible CPUs. x86 CPUs currently come mainly in two camps: Intel's server CPUs and AMD's server CPUs.

(2) RISC instruction set

RISC is the abbreviation of "Reduced Instruction Set Computing". It was developed on the basis of the CISC instruction system: tests of CISC machines showed that the frequency of use of different instructions varies considerably, and the most commonly used are relatively simple instructions that account for only about 20% of the instruction set but for about 80% of the instructions appearing in a program. A complex instruction system inevitably increases the complexity of the microprocessor, making development time long and costs high, and complex instructions require complex operations, which inevitably reduce the speed of the computer. For these reasons, the RISC CPU was born in the 1980s. Compared with a CISC CPU, a RISC CPU not only streamlines the instruction system but also adopts superscalar and super-pipelined structures, greatly increasing its parallel processing capability. The RISC instruction set is the direction of development for high-performance CPUs. Compared with the traditional complex instruction set (CISC), RISC has a uniform instruction format, fewer instruction types and fewer addressing modes, and of course much higher processing speed. At present CPUs with this instruction system are commonly used in mid-range and high-end servers; high-end servers in particular all use RISC-based CPUs. The RISC instruction system is better suited to UNIX, the operating system of high-end servers, and Linux is also a UNIX-like operating system. RISC CPUs are incompatible with Intel and AMD CPUs in both software and hardware.

Currently, the CPUs that use RISC instructions in mid-range and high-end servers are mainly of the following types: PowerPC processors, SPARC processors, PA-RISC processors, MIPS processors, and Alpha processors.

(3) IA-64

There has been much debate over whether EPIC (Explicitly Parallel Instruction Computing) is the successor to the RISC and CISC systems. Judged by the EPIC system alone, it looks more like an important step in the evolution of Intel's processors towards the RISC system. In theory, under the same host configuration, a CPU designed for the EPIC system handles Windows application software far better than UNIX-based application software.

Intel's server CPU with EPIC technology is the Itanium (development codename Merced). It is a 64-bit processor and the first in the IA-64 family. Microsoft has also developed an operating system codenamed Win64 to support it in software. After the x86 instruction set, Intel turned to the more advanced 64-bit microprocessor; Intel did this because it wanted to get rid of the huge x86 architecture and introduce an energetic and powerful instruction set, and so the IA-64 architecture, using the EPIC instruction set, was born. In many ways IA-64 is a great improvement over x86: it breaks through many limitations of the traditional IA-32 architecture and achieves breakthrough improvements in data processing capability, system stability, security, availability and manageability.

The biggest drawback of IA-64 microprocessors is their lack of x86 compatibility. To let IA-64 processors (Itanium, Itanium 2 ......) better run software from both generations, Intel introduced an x86-to-IA-64 decoder on them, so that x86 instructions could be translated into IA-64 instructions. This decoder is not the most efficient decoder, nor is it the best way to run x86 code (the best way is to run x86 code directly on an x86 processor); as a result, the performance of Itanium and Itanium 2 when running x86 applications is very poor. This became the root cause of X86-64.

(4) X86-64 (AMD64 / EM64T)

Designed by AMD, it can handle 64-bit integer arithmetic while remaining compatible with the X86-32 architecture. It supports 64-bit logical addressing, with the option of converting to 32-bit addressing; data-manipulation instructions default to 32-bit and 8-bit, with the option of converting to 64-bit and 16-bit; and it supports general-purpose registers, so that for 32-bit arithmetic operations the result has to be extended to a full 64 bits. In this way there is a distinction between "direct execution" and "converted execution" of instructions, and the instruction field is either 8 or 32 bits, which avoids excessively long fields.

The creation of x86-64 (also called AMD64) was not without reason. The 32-bit addressing space of x86 processors is limited to 4 GB of memory, and IA-64 processors are not compatible with x86. AMD took its customers' needs into account and strengthened the x86 instruction set to support 64-bit modes of operation, which is why AMD calls this structure x86-64. Technically, for 64-bit operation AMD introduced the new R8-R15 general-purpose registers in the x86-64 architecture as an extension of the original x86 processor registers, though these registers are not fully used in a 32-bit environment. The original registers such as EAX and EBX have also been extended from 32 to 64 bits, and eight new registers have been added to the SSE unit to provide support for SSE2. This increase in the number of registers brings a performance increase. At the same time, to support both 32- and 64-bit code and registers, the x86-64 architecture allows the processor to operate in two modes, Long Mode and Legacy Mode, with Long Mode divided into two sub-modes (64-bit mode and Compatibility mode). The standard has been introduced in AMD's Opteron server processors.

Intel also introduced 64-bit EM64T technology this year; before it was officially named EM64T it was known as IA-32E, the name of Intel's 64-bit extension technology, used to distinguish it from the X86 instruction set. Intel's EM64T supports a 64-bit sub-mode and is similar to AMD's X86-64 technology: it adopts 64-bit linear flat addressing, adds eight new general-purpose registers (GPRs), and adds eight registers to support SSE instructions. Like AMD's, Intel's 64-bit technology is compatible with IA-32; IA-32E is used only when running under a 64-bit operating system and consists of two sub-modes, a 64-bit sub-mode and a 32-bit sub-mode, and like AMD64 it is backwards compatible. Intel's EM64T will be fully compatible with AMD's X86-64 technology. Some 64-bit technology has now been added to Nocona processors, and Intel's Pentium 4E processors also support 64-bit technology.

It should be noted that both are 64-bit microprocessor architectures compatible with the x86 instruction set, but there are still some differences between EM64T and AMD64: for example, the NX bit of AMD64 processors is not available in Intel's processors.

1.1.11 Hyperpipelining and superscalar

Before explaining hyper-pipelining and superscalar, we need to understand the pipeline. Pipelines were first used by Intel in its 486 chip. A pipeline works like an assembly line in industrial production. In the CPU, an instruction-processing pipeline is formed from 5 or 6 circuit units with different functions, and an x86 instruction is split into 5 or 6 steps that these units execute in turn, making it possible to complete one instruction per CPU clock cycle and thus increasing the CPU's operating speed. Each of the classic Pentium's integer pipelines is divided into four stages, namely instruction prefetch, decode, execute and write-back, while the floating-point pipeline is divided into eight stages.
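
A back-of-the-envelope Python sketch shows why pipelining raises throughput; this is an idealised model that ignores stalls and branch penalties, and the stage count is just an assumption:

    # Ideal pipeline: N instructions finish in (stages + N - 1) cycles instead of stages * N
    stages = 5                     # assumed number of pipeline stages
    instructions = 100
    unpipelined_cycles = stages * instructions
    pipelined_cycles = stages + instructions - 1
    print(unpipelined_cycles, pipelined_cycles)   # 500 cycles vs 104 cycles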

Superscalar means executing multiple instructions simultaneously by building in multiple pipelines; in essence it trades space for time. Hyper-pipelining, on the other hand, completes one or even more operations within a single machine cycle by refining the pipeline stages and raising the main frequency; in essence it trades time for space. For example, the Pentium 4's pipeline is as long as 20 stages. The more stages the pipeline is divided into, the faster each instruction step completes, so the design can run at a higher CPU frequency; but an overly long pipeline also brings side effects, and the CPU's actual computing speed is likely to fall below what its main frequency suggests. Intel's Pentium 4 ran into exactly this situation: although its main frequency can exceed 1.4 GHz, its computing performance falls well short of what that frequency implies.

1.1.12 Packaging

CPU packaging is a protective measure that uses specific materials to encapsulate the CPU die or module and prevent damage; the CPU must be packaged before it can be delivered to users. The packaging method depends on the CPU's mounting form and device integration design. Broadly speaking, CPUs installed in a Socket usually use PGA (Pin Grid Array) packaging, while CPUs installed in Slot x slots all use SEC (Single Edge Contact Cartridge) packaging. There are also PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array) packaging technologies. Because of increasingly fierce market competition, the current direction of CPU packaging technology is towards saving costs.

1.1.13 CPU Interface Type

The interface type refers to the type of CPU socket a cooler is designed for. Each type of CPU has a different form factor and heat output, and the size and layout of its socket differ as well, so coolers generally cannot be mixed across them. For example, a cooler for the AMD Athlon XP cannot be used on a Socket 478 Intel Pentium 4, and vice versa. Of course, some coolers now come with two (or more) types of mounting kits so that they can support different types of CPUs.

The main interface types today are Intel's Socket 478, Socket 775, Socket 603/604 for servers, and Socket 771, and AMD's Socket 462, Socket 754, Socket 939, Socket AM2 (940 pins), and so on.