Inside XboxONE: HSA·104 - hUMA & RAM MultiPools

Posted: 30 Apr 2016, 12:49
by BiG Porras
In the previous article we looked at the UMA and NUMA models, and at how hUMA provides unified virtual memory with a single pool of RAM by synchronizing both MMUs.

Image
ORIGINAL ARTICLE BY: f5inet
viewtopic.php?f=20&t=1368


As we have explained in previous articles, a unified view of memory for all CUs is a requirement for a system to be HSA-compatible. All AMD APUs from Kaveri onwards are HSA-compatible. This includes the home consoles PlayStation 4 and Xbox One, as both incorporate derivatives of AMD Kaveri.

However, for demanding graphics workloads, standard AMD Kaveri or Carrizo APUs are insufficient for the job: they simply lack the necessary raw power.

Image

In these extreme cases, where the 6 or 8 additional GCN TCUs fitted alongside the 4 LCUs are not enough to handle the workload, the ideal solution would be to add a dedicated graphics card with a few more TCUs onto which the compute work can be offloaded.

Image

However, done that way, we would be turning an HSA system into a NUMA system, losing all the advantages HSA gives us in terms of reduced latency, a unified language, and ease of programming.

Image

HSA is expandable:

It really does not have to be this way. The HSA requirement regarding hUMA is that the pool of VIRTUAL memory must be UNIFIED, and it is here where the MMUs once again make their star entrance.

Image

If both the CPU and the dedicated GPU are HSA-compatible, the MMUs of both devices can talk to each other and establish a unified virtual memory grouping both RAM pools, despite them being physically separate. This makes it possible to simulate a hUMA architecture, and thus the HSA architecture extends to the discrete GPU the machine has mounted.
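The idea of two synchronized MMUs presenting two physical pools as one virtual address space can be sketched in a few lines. This is a toy model with hypothetical names, not real MMU hardware or any HSA API: a single page table, shared by both MMUs, maps each virtual page to whichever physical pool happens to hold it.

```python
# Toy model (hypothetical names) of a unified virtual address space
# spanning two physical RAM pools, as seen by two synchronized MMUs.

PAGE_SIZE = 4096

class UnifiedPageTable:
    """Shared by the CPU MMU and the GPU MMU: one virtual space, many pools."""
    def __init__(self):
        self.entries = {}  # virtual page number -> (pool name, physical page)

    def map(self, vpn, pool, ppn):
        self.entries[vpn] = (pool, ppn)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pool, ppn = self.entries[vpn]
        return pool, ppn * PAGE_SIZE + offset

table = UnifiedPageTable()
table.map(0, "DDR3", 10)   # virtual page 0 lives in the CPU's DDR3 pool
table.map(1, "GDDR5", 2)   # virtual page 1 lives in the GPU's GDDR5 pool

# CPU and GPU use the same translation, regardless of which pool backs the page:
print(table.translate(100))            # -> ('DDR3', 41060)
print(table.translate(PAGE_SIZE + 8))  # -> ('GDDR5', 8200)
```

The point of the sketch is that neither device needs to know, at the virtual-address level, which pool a page lives in; that is exactly the illusion the synchronized MMUs provide.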

Ruling your own RAM is not the same as doing it through a consulate:

However, all is not as nice as it looks. Having a discrete GPU access memory controlled by the CPU's MMU over PCI Express is technically possible, but it remains limited by the bandwidth of the bus. In the case of the image above, the TCUs operate on the CPU's RAM at a rate of between 8 and 16 GB/s, far below the rate at which their dedicated GDDR5 serves them.
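A back-of-the-envelope calculation shows how large that gap is. The 8-16 GB/s PCIe figure is from the article; the 176 GB/s GDDR5 figure is an assumption of mine, in the ballpark of consoles of that era, used only for illustration.

```python
# Rough transfer-time comparison: PCIe-limited access to the CPU's RAM
# vs. the GPU's dedicated GDDR5 bus. 176 GB/s is an assumed figure.

GB = 1e9
buffer_bytes = 1 * GB  # stream 1 GB of texture/compute data

def transfer_ms(n_bytes, bandwidth_gbs):
    """Time in milliseconds to move n_bytes at the given bandwidth (GB/s)."""
    return n_bytes / (bandwidth_gbs * GB) * 1000

print(f"PCIe  @  16 GB/s: {transfer_ms(buffer_bytes, 16):.1f} ms")   # 62.5 ms
print(f"GDDR5 @ 176 GB/s: {transfer_ms(buffer_bytes, 176):.1f} ms")  # ~5.7 ms
```

An order of magnitude of difference per buffer, which is exactly why the TCUs "pale" when fed over the bus.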

We also said, when talking about CUs, that the TCUs are specifically designed to shine with large bandwidth. When bandwidth is tight, the TCUs, rather than shine, pale.

Sometimes we are coherent, and sometimes not:

Therefore, HSA offers various ways to maintain performance when TCUs access slower memories. First, the HSA memory model establishes different 'segments' which can be assigned to different pools. That subject is a long one, amply deserving of its own treatment, and will be covered in a later article.

For now, let's talk about coherence.



An access to RAM is called Coherent when all the MMUs participating in the HSA system are notified of that access, and non-coherent when the only MMU aware of the access is the MMU on which the accessing device depends.

A non-coherent access lets you use all the available bus bandwidth, and is the preferred access mode when you want maximum performance.

However, we were saying that one of the requirements of HSA is to guarantee coherence across the whole pool (or pools) of RAM in the HSA system, so the preferred mode of access to RAM for the HSA system is the coherent one, in which all the system's MMUs are aware of the accesses the various CUs make to RAM.

Since these coherent accesses are much more costly from a bureaucratic point of view, HSA allows mixing coherent and non-coherent accesses to get the most out of each of the CUs. That said, under rules we will see later, when we discuss the HSA memory model.
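The bureaucratic cost is easy to see in a toy model. This sketch uses hypothetical names, not any real HSA interface: a coherent access broadcasts a notification to every MMU in the system, while a non-coherent access is recorded only by the device's own MMU.

```python
# Toy model (hypothetical names) of coherent vs. non-coherent RAM accesses.

class MMU:
    def __init__(self, name, system):
        self.name = name
        self.seen = []        # accesses this MMU has been told about
        self.system = system  # all MMUs participating in the HSA system
        system.append(self)

    def access(self, addr, coherent):
        if coherent:
            # Bureaucratic: every MMU in the system is notified.
            for mmu in self.system:
                mmu.seen.append((self.name, addr))
        else:
            # Cheap: only the local MMU knows about the access.
            self.seen.append((self.name, addr))

mmus = []
cpu_mmu = MMU("CPU", mmus)
gpu_mmu = MMU("GPU", mmus)

gpu_mmu.access(0x1000, coherent=True)   # the CPU MMU also learns of this one
gpu_mmu.access(0x2000, coherent=False)  # only the GPU MMU knows

print(cpu_mmu.seen)  # -> [('GPU', 4096)]
print(gpu_mmu.seen)  # -> [('GPU', 4096), ('GPU', 8192)]
```

Each coherent access costs one notification per MMU, which is why non-coherent accesses are preferred when raw throughput matters.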

For now, stay with this idea: HSA lets you use several different pools of RAM, of various kinds (DDR, GDDR, eDRAM, ESRAM), provided that the MMUs governing those RAM pools are synchronized with each other and unify the heterogeneous pools into a single virtual address space. That is what hUMA has meant from the very first hour.