SemiAnalysis and SkyJuice have teamed up to analyze the A14 die shot from ICmasters. Our previous analysis of the A14 delved into why Apple and TSMC have deviated from previous generations when comparing theoretical logic transistor density to real-world utilized transistor density.
We theorized this was due to the lackluster SRAM scaling that is becoming an industry trend across all new leading-edge nodes. With the following analysis, we can confirm this is the case. Disclaimer: this type of analysis on a top-metal die photo with 14+ metal layers is not an exact science, but we can draw some interesting conclusions regardless.
Apple’s A14 comprises 2 Firestorm big cores, 4 Icestorm small cores, and a 4-core GPU. This core configuration is similar to the A13’s, so all generational performance increases in CPU and GPU come directly from architectural changes and clock speeds. The A14’s Icestorm little cores have had a large rework, with the L1i growing from 96KB in the A13’s Thunder core to 128KB. The L1d has also grown from 48KB to 64KB. The NPU has doubled to 16 cores versus the 8 in the A13. Using our annotation above, we can also estimate how large each IP block is.
As expected, most IP blocks shrink a fair amount despite the architectural changes. The NPU is larger as a result of the doubled core count. The Icestorm core has not shrunk, owing to its larger architectural rework. As expected, the LPDDR4x PHY has not shrunk either, as analog and I/O circuitry scales poorly on new nodes. The 16MB of system level cache is of particular interest given its capacity is unchanged from last year’s A13.
As it is mostly composed of SRAM cells, the architectural changes should not skew the area comparison much. Despite TSMC’s claimed 1.35x SRAM density improvement from N7 to N5, Apple’s 16MB system cache has only shrunk 1.19x. The A14’s cache is also shaped differently, being narrower and longer than the A13’s. Given this change in aspect ratio, there may be a rework of how this cache operates.
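A quick back-of-the-envelope check puts the gap in perspective. This sketch uses only the two scaling factors quoted above (no die measurements are assumed) to compare the area TSMC’s claim implies against the area Apple actually achieved:

```python
# Area ratios implied by the scaling factors quoted in the text.
# The only inputs are the 1.35x claim and the 1.19x observation.
claimed_sram_shrink = 1.35   # TSMC's N7 -> N5 SRAM density claim
observed_shrink = 1.19       # measured on Apple's 16MB system cache

expected_area_ratio = 1 / claimed_sram_shrink   # area vs. A13 if the claim held
observed_area_ratio = 1 / observed_shrink       # area vs. A13 actually seen

# Fraction of the promised area savings that was actually realized
realized = (1 - observed_area_ratio) / (1 - expected_area_ratio)

print(f"expected area: {expected_area_ratio:.2f}x of A13")
print(f"observed area: {observed_area_ratio:.2f}x of A13")
print(f"share of theoretical savings realized: {realized:.0%}")
```

In other words, the cache occupies roughly 84% of its A13 footprint where a full 1.35x shrink would have taken it to roughly 74%, so only around three-fifths of the theoretical area savings showed up on the die.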
The main conclusion to draw from this lackluster decrease in area for the 16MB LLC is a far-reaching one for the industry. Apple is not the only company that has relied on ballooning on-die memory to deliver large performance and power benefits.
AMD, Intel, Nvidia, and various AI startups like Graphcore have also used increased last-level cache sizes as a crutch to assist with architectural gains on CPUs, GPUs, and AI ASICs. The easy days of ballooning on-die memory to combat DRAM’s poor latency are behind us, with TSMC’s N5 only bringing a 1.35x theoretical SRAM shrink and the future N3 bringing a paltry 1.2x. As discussed in the previous article, 3D integration will give us a bit more oomph, but architects will need to get much smarter and scrappier about how to improve performance and power. Two potential avenues to pursue are moving to more complex/efficient cache designs, or bringing alternative memories such as NRAM, FeRAM, or MRAM to market. Moore’s Law is slowly suffocating with transistor costs rising and SRAM scaling dying. Red Alert!
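Compounding the two theoretical figures quoted above shows just how little headroom is left. Even in the best case, assuming both claimed shrinks are fully realized (which, as the A14’s cache shows, they rarely are), two full node jumps buy surprisingly little SRAM area:

```python
# Compounding the theoretical SRAM density gains quoted in the text.
n7_to_n5 = 1.35   # TSMC's claimed N7 -> N5 SRAM shrink
n5_to_n3 = 1.20   # TSMC's claimed N5 -> N3 SRAM shrink

cumulative = n7_to_n5 * n5_to_n3   # density gain across two full nodes
best_case_area = 1 / cumulative    # area of a same-capacity cache on N3

print(f"cumulative SRAM density gain N7 -> N3: {cumulative:.2f}x")
print(f"best-case area of a same-capacity cache on N3: {best_case_area:.0%} of N7")
```

A fixed-capacity cache still occupies over 60% of its N7 area two nodes later, which is why simply growing the LLC every generation is no longer a viable strategy.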
Clients and employees of SemiAnalysis may hold positions in companies referenced in this article.