A: 100 100 97 94 91 88 86 86 85 85 85 85 82 75 74 72 70 B: 69 66 D: 42 F: 23
A more rigorous answer would be to devise a Markov model which incorporates the fact that users back off when the network is too congested (e.g. a model in which the down rate is larger than C/f, or where the up rate is smaller than λ), find its equilibrium distribution, and then compute the mean number of active flows. Some students took this approach.
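To illustrate one such model (the specific back-off rule here is my own, purely for illustration): suppose the up rate is λ/(n+1), so arrivals are discouraged as congestion grows, and the down rate is a constant μ. Detailed balance then gives a Poisson(λ/μ) equilibrium distribution.

```python
def backoff_equilibrium_mean(lam, mu, N=200):
    """Equilibrium mean of a birth-death chain with up rate lam/(n+1)
    (users back off as congestion grows) and constant down rate mu,
    truncated at state N. Detailed balance gives weights
    w_n = (lam/mu)^n / n!, i.e. a Poisson(lam/mu) distribution."""
    w, wn = [1.0], 1.0
    for n in range(1, N + 1):
        wn *= lam / (n * mu)   # w_n = w_{n-1} * (lam/n) / mu
        w.append(wn)
    Z = sum(w)
    return sum(n * x for n, x in enumerate(w)) / Z
```

Since the equilibrium is Poisson(λ/μ), the mean is λ/μ, which gives a quick sanity check on the code.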
The instructions for how to simulate a Markov process are given on page 11 of the first section of the notes [pdf]. Keep a variable called n, the number of active flows. Generate two exponential random variables, X1 with rate λ and X2 with rate min(C,An)/f. Record the fact that the number of active flows remains n for a period of time min(X1,X2). Let the new state be either n+1 or n-1, depending on whether X1<X2 or not. (Mathematically it should never be the case that X1=X2, and programmatically it is extremely unlikely, so it doesn't matter which state you jump to in that case. Also, when n=0 the down rate is zero, so there is no X2 to generate: just wait X1 and jump to n=1.) Repeat.
You need your simulator to record the mean number of active flows. This should be weighted by time, not by number of visits to each state. For example, if I make 1000 visits to state n=0 and 1000 visits to state n=1, and I spend 800 seconds of my time in state n=0 and only 200 seconds in state n=1, then the mean number of active flows is 0.2, not 0.5. To find the mean number of active flows, you therefore need to keep an accumulator sumnt of n*(time spent in state n); increment this accumulator after every transition. Finally, report sumnt/(total simulated time). Many students spotted this; some of you used the clunkier solution of keeping an array which records every state visited and the time spent in it.
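The steps above can be sketched as follows (function and variable names are my own; note the guard for n=0, where the down rate is zero and only an arrival can happen):

```python
import random

def simulate(lam, C, A, f, T=200000.0, seed=1):
    """Simulate the number of active flows: up rate lam, down rate
    min(C, A*n)/f. Returns the time-weighted mean number of active
    flows, i.e. sumnt / (total simulated time)."""
    rng = random.Random(seed)
    n, t, sumnt = 0, 0.0, 0.0
    while t < T:
        x1 = rng.expovariate(lam)                    # time to next arrival
        if n == 0:
            dwell, n_next = x1, 1                    # down rate is zero at n=0
        else:
            x2 = rng.expovariate(min(C, A * n) / f)  # time to next departure
            dwell = min(x1, x2)
            n_next = n + 1 if x1 < x2 else n - 1
        sumnt += n * dwell    # accumulate n * (time spent in state n)
        t += dwell
        n = n_next
    return sumnt / t
```

As a sanity check: when A is much larger than C the down rate is C/f in every state n≥1, so this is an M/M/1 queue with mean ρ/(1-ρ) where ρ=λf/C.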
Some students were confused about the difference between simulated time and real time. They programmed their simulator to sleep for min(X1,X2), or set timers. What would you have done if I had asked you to simulate events occurring over geological timescales? What do you think the climate change modellers do, when they're running their code for twenty years of simulated time? There is no need for the computer to slow down; all you need to do is record how long the simulated system stays in each state, and the simulator itself doesn't need to wait. (The timer-based programs still work, so you still get the marks.)
One student programmed a discrete-time simulation. This requires clever thought, to turn a continuous-time Markov process into a discrete-time system, and it's unnecessary. (The discretized programs still work, so you still get the marks.)
I asked you to simulate a range of parameters. Therefore you have to report the results for more than just one set of parameters.
The most effective way to proceed is to choose a starting set of parameters, then systematically vary each parameter in turn, and plot a graph of what you get (including confidence intervals). That way, you get a feel for how the mean number of flows depends on each of the parameters. The question is all about A, the access link speed, so this is the parameter you should focus on the most. (But see also my comments about Q4.)
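For the confidence intervals, one simple approach (my own sketch, not the only valid one) is to run several independent replications of the simulator with different seeds, and report a normal-approximation interval for the mean:

```python
import statistics

def normal_ci(estimates, z=1.96):
    """Approximate 95% confidence interval from a list of estimates
    obtained from independent simulation replications."""
    m = statistics.mean(estimates)
    se = statistics.stdev(estimates) / len(estimates) ** 0.5
    return m - z * se, m + z * se
```

Run the simulator, say, ten times per parameter setting, then plot the midpoint with error bars given by the interval endpoints.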
Some of you spotted issues about instability. If ρ>1 then Prof Theodore's model is unstable, i.e. the number of active flows increases without bound. The same thing happens here (as you should expect, since Ciscelia's model results in more flows than Prof Theodore's model). Therefore, you should note this, and you should restrict your attention to situations where ρ<1.
Everyone gave reasonable starting parameters. Some answers were very impressive: you had obviously thought carefully and done some research to come up with sensible parameter choices for real-world scenarios.
Some of you made a good attempt at solving the equations algebraically. It's impossible to produce a closed-form solution, but you can at least come up with an expression which involves only a finite sum. Ciscelia's model is something in between the Erlang link and Prof Theodore's processor-sharing model, for both of which you know the formula for the equilibrium distribution, so it would have been good to plot those two formulae next to your own results. No one did.
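Here is my own sketch of that finite-sum expression, assuming the birth-death chain with up rate λ and down rate min(C, An)/f. Beyond n* = ⌈C/A⌉ the down rate is constant C/f, so the tail of the equilibrium distribution is geometric with ratio ρ = λf/C, and both the normalising constant and the mean reduce to a finite sum plus a geometric remainder:

```python
import math

def equilibrium_mean(lam, C, A, f):
    """Mean number of active flows for the birth-death chain with up rate
    lam and down rate min(C, A*n)/f. Detailed balance gives weights
    w_n = prod_{k=1..n} lam*f/min(C, A*k); for n >= nstar = ceil(C/A)
    the factor is constant rho = lam*f/C, so the tail is geometric."""
    rho = lam * f / C
    assert rho < 1, "unstable if rho >= 1"
    nstar = math.ceil(C / A)
    w = [1.0]
    for n in range(1, nstar + 1):
        w.append(w[-1] * lam * f / min(C, A * n))
    # normalising constant: finite part plus geometric tail from nstar on
    Z = sum(w[:nstar]) + w[nstar] / (1 - rho)
    # mean: finite part plus sum_{n>=nstar} n * w[nstar] * rho^(n-nstar)
    tail = w[nstar] * (nstar / (1 - rho) + rho / (1 - rho) ** 2)
    return (sum(n * w[n] for n in range(nstar)) + tail) / Z
```

When A ≥ C this collapses to nstar = 1 and recovers the M/M/1 mean ρ/(1-ρ), which is a useful check against the processor-sharing formula.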
Many of you used my suggested numerical technique, of truncating the state space and using e.g. the Excel spreadsheet I provided for computing equilibrium distributions. If you do this, you should show me that you've executed the procedure for sufficiently large N: don't just do it for N=1 and 2 and stop there. Some of you looked at the equilibrium distribution to check that you had made N large enough (i.e. that most of the probability mass is on states smaller than N), which was very good.
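A minimal version of that truncation procedure (my own sketch, assuming the up rate λ and down rate min(C, An)/f from the model): compute the unnormalised detailed-balance weights up to N, normalise, and check that the mass at state N is negligible before trusting the answer.

```python
def truncated_equilibrium(lam, C, A, f, N):
    """Equilibrium distribution of the birth-death chain truncated at
    state N: pi_n is proportional to prod_{k=1..n} lam*f/min(C, A*k)."""
    w = [1.0]
    for n in range(1, N + 1):
        w.append(w[-1] * lam * f / min(C, A * n))
    Z = sum(w)
    return [x / Z for x in w]
```

If the last entry of the returned distribution is not tiny, increase N and repeat; once it is, the mean number of active flows is sum(n * pi[n]).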
No one thought hard about how the parameters depend on each other. It looks like there are four parameters, but in fact some of them are superfluous: the transition rates are λ up and min(C,An)/f down, so the equilibrium distribution depends only on the rate ratios λf/min(C,An), i.e. only on ρ=λf/C and the ratio C/A.
Therefore, you should just have chosen any sensible ρ (not too close to 1, since you're told that Prof Theodore's model predicts a small number of flows; somewhere around 50% might be reasonable for a core Internet link), and found the ratio C/A which results in 5000 flows.
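To see the redundancy concretely, here is a quick check (my own sketch, using the truncated-equilibrium calculation): two parameter sets with the same ρ = λf/C and the same C/A give the same mean number of active flows, even though λ, f, C and A all differ.

```python
def mean_flows(lam, C, A, f, N=500):
    """Mean number of active flows via the truncated equilibrium
    distribution (N is assumed large enough for the tail to be tiny)."""
    w = [1.0]
    for n in range(1, N + 1):
        w.append(w[-1] * lam * f / min(C, A * n))
    return sum(n * x for n, x in enumerate(w)) / sum(w)

# Both settings have rho = 0.5 and C/A = 10, so the means agree:
m1 = mean_flows(0.5, 1.0, 0.1, 1.0)
m2 = mean_flows(50.0, 200.0, 20.0, 2.0)
```

This is why it's enough to pick a sensible ρ and then tune the single ratio C/A.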
What you need to do is write a more complicated simulator. It should keep track of the residual filesize of every active flow. Generate a random variable X1 as above, the time at which the next new flow will arrive. Also calculate Y2, the soonest time at which one of the active flows will finish: work out the transmission rate that each flow is getting, look at all the residual filesizes, and find which flow finishes soonest. Subtract the amount of data that can be transmitted in time min(X1,Y2) from each of the active flows. Finally, if X1<Y2 then start up a new flow with filesize drawn from the Pareto distribution; otherwise terminate the flow which has just finished. Repeat.
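A sketch of such a simulator (names and structure are my own; I assume the flows share capacity C equally, each capped at the access speed A, so every flow gets rate min(C/n, A)):

```python
import random

def simulate_flows(lam, C, A, alpha, xmin, T=200000.0, seed=1):
    """Flow-level simulator: track the residual filesize of every active
    flow; file sizes are Pareto(alpha, xmin), drawn by inverse transform.
    Returns the time-weighted mean number of active flows."""
    rng = random.Random(seed)
    residual = []                 # residual filesize of each active flow
    t, sumnt = 0.0, 0.0
    while t < T:
        x1 = rng.expovariate(lam)                # time to next arrival
        if residual:
            rate = min(C / len(residual), A)     # per-flow transmission rate
            y2 = min(residual) / rate            # soonest flow completion
        else:
            rate, y2 = 0.0, float("inf")
        dt = min(x1, y2)
        sumnt += len(residual) * dt              # time-weighted flow count
        t += dt
        residual = [r - rate * dt for r in residual]   # drain all flows
        if x1 < y2:
            # new arrival: Pareto filesize by inverse transform
            residual.append(xmin * (1.0 - rng.random()) ** (-1.0 / alpha))
        else:
            residual.remove(min(residual))       # terminate the finished flow
    return sumnt / t
```

As a sanity check: when A ≥ C this is M/G/1 processor sharing, whose mean number of active flows is ρ/(1-ρ) regardless of the filesize distribution (with ρ = λ × mean filesize / C, and mean filesize α·xmin/(α-1) for the Pareto).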
This is a very subtle point, and I was pleased that some of you spotted it. Since it is so subtle, this question does not contribute very many marks to your overall grade.