A simple ring is not very effective beyond a few tens of nodes. Instead, richer meshes of switching elements are used, and the elements can have a higher radix, such as 4x4.
There are a number of well-known switch wiring schemes, with names such as Benes, Clos, Shuffle, Delta, Torus, Mesh, Express-Mesh and Butterfly. These vary in complexity and contention ratio.
Note that even a full crossbar (any input to any output in unit time), which is very costly, still suffers from output port contention, so it is rarely justified on performance grounds. However, its uniform access delays make it easy to provide sequential consistency (see my Comparative Architecture notes).
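The effect of output port contention on a crossbar can be estimated with a short calculation. This is a minimal sketch under an assumed traffic model that is not taken from these notes: every input issues one request per cycle to an independently and uniformly chosen output, and each output accepts one request per cycle. The function name is hypothetical.

```python
def expected_crossbar_utilisation(n_ports):
    """Expected fraction of outputs that are busy in one cycle.

    Assumed model (illustrative only): each of the n_ports inputs picks
    one output uniformly at random, and an output is busy if at least
    one input picked it.
    """
    # Probability that a given output is picked by no input at all.
    p_idle = (1.0 - 1.0 / n_ports) ** n_ports
    return 1.0 - p_idle


if __name__ == "__main__":
    for n in (2, 4, 8, 64):
        print(n, round(expected_crossbar_utilisation(n), 3))
```

Under this model the utilisation falls towards 1 - 1/e (about 63%) as the port count grows, even though the crossbar itself never blocks.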
The illustration uses two-by-two switch elements to connect eight devices in three stages. Using a higher radix (e.g. 4) is common.
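The stage and element counts for such a network follow from the radix. The sketch below assumes a Delta-style multistage network whose port count is an exact power of the radix; the function names are hypothetical, and counting crosspoints as radix squared per element is a rough cost measure, not a figure from these notes.

```python
def stage_count(n_ports, radix):
    """Number of stages needed so that any input can reach any output."""
    stages = 0
    reach = 1
    while reach < n_ports:
        reach *= radix
        stages += 1
    if reach != n_ports:
        raise ValueError("n_ports must be an exact power of radix")
    return stages


def element_count(n_ports, radix):
    """Total switch elements: n_ports/radix per stage, times the stages."""
    return (n_ports // radix) * stage_count(n_ports, radix)


def crosspoints(n_ports, radix):
    """Rough cost: each radix-by-radix element holds radix**2 crosspoints."""
    return element_count(n_ports, radix) * radix * radix


if __name__ == "__main__":
    # Eight devices with 2x2 elements: three stages of four elements.
    print(stage_count(8, 2), element_count(8, 2), crosspoints(8, 2))
    # A full crossbar for comparison needs n_ports**2 crosspoints.
    print(8 * 8)
```

For eight ports the saving over a full crossbar is small (48 against 64 crosspoints), but for 64 ports with radix-4 elements it is 768 against 4096.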
Problem: typically we will not need quite as many initiators as targets. Peak throughput equals the number of ports, but the fabric may block and there may be receiver contention.
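Fabric blocking is distinct from receiver contention: two transfers to different targets can still collide on an internal link. The sketch below illustrates this, assuming an Omega (shuffle-exchange) wiring with destination-tag routing, which may differ from the wiring illustrated; all names are hypothetical.

```python
def omega_route(src, dst, n_bits):
    """Return the link position a transfer occupies after each stage."""
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the address left by one.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The 2x2 element steers by one destination bit, MSB first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def blocks(transfers, n_bits):
    """True if any two transfers need the same link after some stage."""
    paths = [omega_route(s, d, n_bits) for s, d in transfers]
    for stage in range(n_bits):
        links = [p[stage] for p in paths]
        if len(set(links)) != len(links):
            return True
    return False


if __name__ == "__main__":
    # Different sources and different targets, yet the fabric blocks.
    print(blocks([(0, 0), (4, 1)], 3))
    # These two transfers pass through disjoint links.
    print(blocks([(0, 0), (1, 1)], 3))
```

In the first case initiators 0 and 4 share a first-stage element and both need its upper output, so one must wait although targets 0 and 1 are both free.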
Such a network can be overly complex at small scale, but it scales up well.
See Network On Chip Synthesis Tool: Mullins NetGen Network Generator. »RDM NoC Notes