Date: 2023-11-30

h/t RedGamingTech

It is here: a multi-GCD chip. The holy grail of GPUs.

In one implementation, chiplets A to N are coupled to a single index buffer stored in memory via a communication link. In one implementation, when a draw call is initiated on the GPU, chiplets A-N are notified of the draw call and of the location and size of the index buffer, which includes the indices corresponding to one or more graphics objects of the draw call. The index buffer can be stored in any number and type of memory devices accessible by chiplets A-N. In one implementation, the index buffer includes a list of pointers to vertices of graphics primitives that make up the graphics object(s). The graphics primitives can be, but are not limited to, points, lines, triangles, rectangles, patches, and so on.

In response to receiving the notification of the initiation of the draw call, each chiplet A-N calculates which indices to process from the index buffer. Each chiplet A-N then fetches and processes its indices independently and in parallel with the other chiplets, which fetch and process their own portions of the index buffer. In one implementation, chiplets A-N fetch indices one primitive group at a time in a round-robin fashion, so portions of the index buffer are interleaved across chiplets A-N. This allows chiplets A-N to process different portions of a draw call independently and in parallel with each other. This distributed geometry processing scheme relies on each chiplet A-N determining which portion(s) of the draw call to process on its own, without a central distributor dispensing work to the chiplets. In other words, each chiplet knows where in the index buffer the previous chiplet left off and where the next chiplet will pick up again.

Referring now to FIG., a block diagram of another implementation of a chiplet GPU is shown.
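The round-robin split described above can be sketched as follows. This is a minimal Python sketch, not AMD's implementation. The function name `chiplet_index_ranges`, the `indices_per_group` parameter, and the assumption that a primitive group is a fixed number of indices are all hypothetical. The group size is also assumed to be a multiple of the primitive size, so that no triangle straddles two chiplets. The sketch shows the property the excerpt relies on: every chiplet runs the same arithmetic on the draw call's index count, so each one knows which groups it owns, and therefore where its neighbours start and stop, without asking a central distributor.

```python
from typing import List, Tuple


def chiplet_index_ranges(
    chiplet_id: int,
    num_chiplets: int,
    index_count: int,
    indices_per_group: int,
) -> List[Tuple[int, int]]:
    """Return the [start, end) index-buffer ranges one chiplet fetches.

    Hypothetical model: primitive groups are dealt out round-robin, so
    group g belongs to chiplet g % num_chiplets. Every chiplet evaluates
    this same function on the same draw-call parameters, so no central
    work distributor is needed.
    """
    if not 0 <= chiplet_id < num_chiplets:
        raise ValueError("chiplet_id out of range")
    if indices_per_group <= 0:
        raise ValueError("indices_per_group must be positive")

    num_groups = -(-index_count // indices_per_group)  # ceiling division
    ranges = []
    # Start at this chiplet's first group and skip over the groups owned
    # by the other chiplets.
    for group in range(chiplet_id, num_groups, num_chiplets):
        start = group * indices_per_group
        end = min(start + indices_per_group, index_count)  # last group may be short
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Hypothetical draw: 30 indices (10 triangles), 6 indices (2 triangles)
    # per primitive group, spread over 4 chiplets.
    for chiplet in range(4):
        print(chiplet, chiplet_index_ranges(chiplet, 4, 30, 6))
    # 0 [(0, 6), (24, 30)]
    # 1 [(6, 12)]
    # 2 [(12, 18)]
    # 3 [(18, 24)]
```

Group g always lands on chiplet g mod N, so the chiplets' ranges together cover the index buffer exactly once with no overlap. That is why no handoff or central dispatcher is needed in this model.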
As shown in FIG., chiplet GPU includes chiplets A-N, which are representative of any number of chiplets. In one implementation, in order to keep chiplets A-N synchronized when processing draw calls, chiplets A-N utilize a state management scheme. For example, in this implementation, for a given draw call, each command processor A-N generates a state ID A-P, respectively, for each corresponding pipeline. The pipeline refers to the various graphics processing stages implemented by each chiplet A-N.

It should be emphasized that the above-described implementations are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.