VLIW mode can't keep up with the times
AMD has stuck with VLIW since the R600. VLIW (very long instruction word) bundles several instructions into one very long instruction so that the GPU's arithmetic units can start executing them together, which removes much of the scheduling overhead and many wait cycles and thereby improves efficiency. However, VLIW shows some hard-to-predict logic flaws when many stream processors run in parallel, and on a massively parallel processor such as a GPU this hurts the overall throughput of thousands of stream processors.
In addition, DirectX 11 requires a GPU architecture to offer more flexibility and general-purpose computing capability than raw throughput alone. VLIW delivers high throughput but little flexibility, which makes it look rather clumsy.
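As a rough illustration of why bundling is efficient for independent work but wasteful for dependent work, the sketch below packs instructions into fixed-width bundles. It is a toy model, not AMD's compiler: the bundle width of 4 slots, the `pack_bundles` and `utilization` helpers, and the two sample instruction streams are all assumptions made for the example.

```python
# Toy VLIW packer: an illustrative sketch, not AMD's actual scheduler.
SLOTS = 4  # assumed bundle width, chosen only for this example


def pack_bundles(deps, slots=SLOTS):
    """Greedily pack instructions into bundles.

    deps[i] lists the earlier instructions whose results instruction i needs.
    An instruction can only be placed in a bundle that comes strictly after
    every bundle holding one of its dependencies.
    """
    bundles = []    # each bundle is a list of instruction indices
    placed_in = {}  # instruction index -> index of the bundle holding it
    for i, needs in enumerate(deps):
        # Earliest legal bundle: one past the latest dependency.
        b = max((placed_in[d] + 1 for d in needs), default=0)
        # Skip bundles that are already full.
        while b < len(bundles) and len(bundles[b]) >= slots:
            b += 1
        # Grow the bundle list if we ran off the end.
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(i)
        placed_in[i] = b
    return bundles


def utilization(bundles, slots=SLOTS):
    """Fraction of issue slots that hold a real instruction."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * slots)


independent = [[] for _ in range(8)]    # 8 unrelated operations
chain = [[]] + [[i] for i in range(7)]  # each operation needs the previous one

print(utilization(pack_bundles(independent)))  # 1.0: two full bundles
print(utilization(pack_bundles(chain)))        # 0.25: eight bundles, one slot used each
```

In this model the hardware does the same amount of issuing per bundle either way, so a dependent chain leaves three of every four slots idle. That gap between peak and achieved throughput is the kind of inflexibility described above.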
The compute unit (CU) - the basic unit of the GPU
AMD defines the GCN architecture as "Non-VLIW ISA with Scalar + Vector Units", that is, a non-VLIW instruction set architecture built from scalar and vector units. The GPU's building blocks are no longer SIMD arrays but "Compute Units" (CUs).
Take the HD7970 graphics card as an example. It has 2048 stream processors divided into 32 compute units, each of which acts as a computing hub. When the chip is under high load, every compute unit can issue and execute instructions at the same time, which raises architecture utilization and throughput and suits multi-threaded, multi-task parallel workloads. Each of the 32 compute units contains 64 stream processors organized into 4 vector units, and each vector unit is paired with a 64 KB vector register file. Each compute unit also has a data register and some auxiliary function modules, which together form a complete computing hub. With the new instruction set, each compute unit can receive and execute instructions independently. With many compute units working in parallel, utilization and instruction throughput are higher than with VLIW.
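The figures above can be checked with a little arithmetic. The sketch below only restates the numbers quoted for the HD7970 and derives the per-unit values from them; the `GcnLayout` class and its property names are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GcnLayout:
    """HD7970 figures as quoted in the article (class name is hypothetical)."""
    stream_processors: int = 2048
    compute_units: int = 32
    vector_units_per_cu: int = 4
    vector_register_kb: int = 64  # per vector unit

    @property
    def sps_per_cu(self) -> int:
        # Stream processors inside one compute unit.
        return self.stream_processors // self.compute_units

    @property
    def lanes_per_vector_unit(self) -> int:
        # Stream processors inside one vector unit.
        return self.sps_per_cu // self.vector_units_per_cu

    @property
    def vector_register_kb_per_cu(self) -> int:
        # Vector register capacity of one compute unit.
        return self.vector_units_per_cu * self.vector_register_kb

    @property
    def vector_register_kb_total(self) -> int:
        # Vector register capacity across the whole chip.
        return self.compute_units * self.vector_register_kb_per_cu


hd7970 = GcnLayout()
print(hd7970.sps_per_cu)                 # 64
print(hd7970.lanes_per_vector_unit)      # 16
print(hd7970.vector_register_kb_per_cu)  # 256
print(hd7970.vector_register_kb_total)   # 8192
```

So each vector unit works on 16 stream processors, and the vector registers alone add up to 256 KB per compute unit, or 8 MB across the chip.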
In recent years, the drawbacks of raising GPU performance simply by piling on stream processors have become obvious: register port conflicts, complicated instruction scheduling, and so on. The more stream processors a chip has, the harder it is to reach theoretical peak performance. The GCN architecture is organized more cleanly and directly. Its 32 parallel compute units specifically address register port conflicts and, with their greater flexibility, eliminate many potential logical deadlocks, so chip performance is more stable and actual performance is closer to the theoretical prediction.
Cache design - an important condition for general-purpose computing
Cache design may have little impact on graphics workloads, but it is very important for general-purpose computing. Because GCN has both vector and scalar units, its cache has been redesigned as a multi-level hierarchy that is fairly large and complex. Each compute unit has its own registers, a local data share, and a 16 KB read/write L1 data cache with a bandwidth of 64 bytes per clock cycle. Every four compute units share a 16 KB instruction cache and a 32 KB L1 scalar data cache, all of which connect to the L2 cache.
The L2 cache is also read/write and has a total capacity of 768 KB. It is divided into six groups, one per memory controller; each group has a capacity of 128 KB and a bandwidth of 64 bytes per clock cycle. The global data share assists synchronization between different compute units.
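The cache figures can be cross-checked the same way. The sketch below is a back-of-the-envelope calculation under stated assumptions: it reads the 64 bytes per clock as a per-cache (L1) and per-group (L2) figure, it takes the count of 32 compute units from the HD7970 description, and the 925 MHz clock in the usage line is an assumption that does not come from the article. The constant and function names are hypothetical.

```python
# Figures quoted in the article.
COMPUTE_UNITS = 32
L1_KB_PER_CU = 16
L1_BYTES_PER_CLOCK = 64   # assumed to be per compute unit
CUS_PER_SHARED_GROUP = 4  # CUs sharing one instruction cache and scalar cache
L2_GROUPS = 6
L2_KB_PER_GROUP = 128
L2_BYTES_PER_CLOCK = 64   # assumed to be per L2 group


def l2_total_kb() -> int:
    """Total L2 capacity implied by the per-group size."""
    return L2_GROUPS * L2_KB_PER_GROUP


def l1_total_kb() -> int:
    """Aggregate L1 data cache capacity across all compute units."""
    return COMPUTE_UNITS * L1_KB_PER_CU


def shared_cache_groups() -> int:
    """Number of instruction/scalar cache groups on the chip."""
    return COMPUTE_UNITS // CUS_PER_SHARED_GROUP


def aggregate_gb_per_s(units: int, bytes_per_clock: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s if every unit moves bytes_per_clock each cycle."""
    return units * bytes_per_clock * clock_mhz * 1e6 / 1e9


print(l2_total_kb())          # 768, matching the quoted total
print(l1_total_kb())          # 512
print(shared_cache_groups())  # 8

# Usage with an assumed 925 MHz core clock (not a figure from the article).
clock_mhz = 925.0
print(aggregate_gb_per_s(L2_GROUPS, L2_BYTES_PER_CLOCK, clock_mhz))
print(aggregate_gb_per_s(COMPUTE_UNITS, L1_BYTES_PER_CLOCK, clock_mhz))
```

The six 128 KB groups do add up to the quoted 768 KB. The bandwidth results are theoretical peaks under the assumptions above, not measured values.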
General-purpose computing also requires software support
When the GCN architecture was announced, AMD set out its grand vision that graphics is computation and computation is graphics. Taking the HD7970 as an example again, AMD's official data for common general-purpose workloads such as ray-traced rendering, encryption, and Fourier transforms shows that no improvement over the previous generation is smaller than 50%, and some, such as AES256, exceed 3x.
For general-purpose computing, software platform support is also important. The GCN architecture is highly programmable and will support C, C++, and other high-level programming languages. AMD is also looking for partners to showcase the GPU's general-purpose computing power. It is currently working with WinZip on features for version 16.5, which will use OpenCL to speed up compression, decompression, and AES encryption.
Seen from the perspective of PC evolution, the trend of fusing the GPU into the CPU is hard to stop. As a result, keeping a performance advantage of several times, or even dozens of times, over integrated graphics will be an important guarantee of the discrete GPU's continued existence. In addition, since GPU microarchitecture shifted to stream processors, general-purpose GPU computing has received increasing attention. But whether for graphics or general-purpose computing, high performance remains the ultimate goal. With the HD7000 series, therefore, AMD's GCN architecture marks a change of mindset: the ideas of high performance and high-performance computing have officially returned, and general-purpose computing has become as important as graphics. This is the future of GPU chips.