RAMs have a small number of ports, but when RTL arrays are held in RAM it is easy to write RTL expressions that require many operations on the contents of a RAM in a single clock cycle, even from within a single thread. For instance, we might need three operations on a RAM to implement
A[x] <= A[y + A[z]]
These are a read of A[z], a read of A[y + A[z]] and a write of A[x]. Because RTL is a very low-level language, it typically requires the user to schedule port use manually. (However, some current FPGA tools do a certain amount of scheduling for the user.)
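As a minimal sketch of such a manual schedule for the example above, the following Python models a single-ported RAM as a list and steps a hand-written three-state controller through one RAM access per clock cycle. The state names and the holding registers `h_addr` and `h_data` are hypothetical, and the one-access-per-cycle timing is an assumption rather than a property of any particular RAM.

```python
# Cycle-by-cycle sketch of A[x] <= A[y + A[z]] on a RAM with ONE port.
# Assumptions (hypothetical): the RAM is a Python list, each port access
# takes one clock cycle, and h_addr/h_data are illustrative holding registers.

def run_single_port(ram, x, y, z):
    """Execute A[x] <= A[y + A[z]] using at most one RAM access per cycle.

    Returns the number of clock cycles used.
    """
    state = "READ_Z"
    h_addr = h_data = None  # holding registers carry values between cycles
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "READ_Z":           # cycle 1: the port reads A[z]
            h_addr = y + ram[z]
            state = "READ_INDIRECT"
        elif state == "READ_INDIRECT":  # cycle 2: the port reads A[y + A[z]]
            h_data = ram[h_addr]
            state = "WRITE_X"
        elif state == "WRITE_X":        # cycle 3: the port writes A[x]
            ram[x] = h_data
            state = "DONE"
    return cycles
```

With `ram = [5, 0, 7, 1, 9, 2]`, calling `run_single_port(ram, 0, 1, 3)` takes three cycles and leaves `ram[0]` equal to 7.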
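Multipliers and floating-point units also typically present structural hazards, since they are usually too costly to replicate for every operator in the source and so must be shared.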
To overcome such hazards automatically, stalls and holding registers must be inserted. The programmer's model of the design is stalled while ports are re-used in the time domain, with the extra clock cycles used to copy data to and from the holding registers.
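One way to picture the automatic approach is a simple list scheduler. The sketch below is hypothetical: it assumes every RAM operation occupies one port for one clock cycle and can only issue in a cycle after the operations whose results it needs. It places operations in program order on the first cycle with a free port and reports how many cycles the step occupies, so any count above one is time during which the programmer's model is stalled.

```python
# Hypothetical list scheduler for RAM port operations.
# Assumptions: each operation uses one port for one cycle, and an operation
# may issue only in a cycle after all operations it depends on.

def schedule(ops, ports):
    """Schedule ops onto `ports` RAM ports.

    ops: list of (name, deps) in program order, where deps names earlier ops.
    Returns (issue, cycles): issue maps each op name to its issue cycle and
    cycles is the total number of clock cycles the step occupies.
    """
    issue = {}
    for name, deps in ops:
        # Earliest cycle allowed by data dependencies.
        cycle = max((issue[d] + 1 for d in deps), default=0)
        # Stall (move later) while every port is already busy in that cycle.
        while sum(1 for c in issue.values() if c == cycle) >= ports:
            cycle += 1
        issue[name] = cycle
    cycles = max(issue.values()) + 1 if issue else 0
    return issue, cycles
```

For A[x] <= A[y + A[z]] the three operations form a dependency chain, so three cycles are needed whatever the port count. For a hypothetical A[x] <= A[y] + A[z] the two reads are independent: one port gives three cycles and two ports give two. Any value read in one cycle and consumed in a later one is what must sit in a holding register.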
A component that is not fully pipelined cannot start a new operation on every clock cycle. Instead, it has handshake wires that start it and tell the client logic when it is ready.
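The handshake can be sketched with a cycle-level Python model. The unit below is a hypothetical iterative shift-and-add multiplier that consumes one multiplier bit per clock cycle, so it is busy for `width` cycles per product and ignores `start` while busy; the class, method and signal names are illustrative assumptions, not a real component's interface.

```python
# Hypothetical non-fully-pipelined multiplier with a start/done handshake.
# Assumption: iterative shift-and-add, one bit of b per clock cycle, so the
# unit is busy for `width` cycles and cannot accept a new start meanwhile.

class SeqMultiplier:
    def __init__(self, width=8):
        self.width = width
        self.busy = False
        self.done = False
        self.result = 0

    def tick(self, start=False, a=0, b=0):
        """Advance one clock cycle. `start` is ignored while busy."""
        self.done = False  # done is a one-cycle pulse
        if start and not self.busy:
            self.busy = True
            self._acc, self._a, self._b = 0, a, b
            self._count = self.width
            return
        if self.busy:
            if self._b & 1:
                self._acc += self._a
            self._a <<= 1
            self._b >>= 1
            self._count -= 1
            if self._count == 0:
                self.busy = False
                self.done = True
                self.result = self._acc


def multiply(unit, a, b):
    """Client logic: pulse start, then stall until done is asserted.

    Returns (product, cycles from the start cycle to the done cycle inclusive).
    """
    unit.tick(start=True, a=a, b=b)
    cycles = 1
    while not unit.done:
        unit.tick()
        cycles += 1
    return unit.result, cycles
```

For example, `multiply(SeqMultiplier(8), 13, 11)` returns `(143, 9)`: one cycle to start and eight cycles of work, during which the client is stalled.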