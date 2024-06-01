sleepeeg3 said: Not really sure what this article is about (other than semi design), but thanks for keeping the news flowing, erek.

It basically breaks down the growing issue that CPUs have grown disgustingly large. Not physically large, but the number of components crammed in there has increased complexity exponentially, and that complexity is a problem at datacenter scale. Even if each transistor has only a 1 in a trillion chance of failing per year, across a datacenter's worth of CPUs that is still going to happen to dozens of CPUs per year.

What do those failures look like? What do they do to the output? Does it get corrupted slightly, and how does that corruption get detected and corrected? Does the CPU fail completely, and what redundancies do you have built in place?

Now that CPUs are adopting chiplets or tiles or silos or stacks, how are potential failures on a single one of those handled? How does that chip handle it internally? How do adjacent chips verify and shape input? We're approaching a point where a CPU or a GPU package might have 50 or more individual chips making it up. Each of them needs specialized error detection and error correction, and that needs its own team specialized in that chip and its functions.
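
To put rough numbers on the "1 in a trillion" point, here is a back-of-the-envelope sketch. It reads that figure as a per-transistor, per-year failure probability and assumes failures are independent, which is my interpretation and not something the article states. The transistor count and fleet size are made-up round numbers for illustration, and the function names are hypothetical.

```python
import math


def cpu_failure_probability(p_transistor_per_year: float, transistors_per_cpu: float) -> float:
    """Chance that at least one transistor in a single CPU fails within a year.

    Assumes every transistor fails independently with the same probability.
    Uses log1p/expm1 so the tiny per-transistor probability is not lost to rounding.
    """
    log_all_survive = transistors_per_cpu * math.log1p(-p_transistor_per_year)
    return -math.expm1(log_all_survive)


def expected_failing_cpus(p_transistor_per_year: float,
                          transistors_per_cpu: float,
                          cpus_in_fleet: int) -> float:
    """Expected number of CPUs in the fleet with at least one failed transistor per year."""
    return cpus_in_fleet * cpu_failure_probability(p_transistor_per_year, transistors_per_cpu)


if __name__ == "__main__":
    # All three inputs are illustrative assumptions, not measured figures.
    p = 1e-12            # per-transistor, per-year failure probability
    transistors = 1e10   # hypothetical transistor count for one CPU
    fleet = 5000         # hypothetical number of CPUs in the datacenter

    print(f"per-CPU chance per year: {cpu_failure_probability(p, transistors):.4%}")
    print(f"expected affected CPUs per year: {expected_failing_cpus(p, transistors, fleet):.1f}")
```

With those placeholder inputs each CPU has about a 1% chance per year of having at least one failed transistor, which works out to roughly 50 affected CPUs across the fleet. Swap in your own numbers; the point is only that a tiny per-transistor rate stops being negligible once you multiply by transistor count and fleet size.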