University of Cambridge Computer Laboratory
Student Projects (2008)

NetOS

This page collects Part II project suggestions from the Network and Operating Systems part of the Systems Research Group. Each suggestion gives a contact e-mail address; please get in touch if you want more information about a project.

Note: there are a number of stumbling blocks in Part II project selection, proposal generation and execution. Some useful guidance from a CST alumnus (and current NetOS PhD student) is here.

  • Various web2.0ish, distributed (social) networksy projects.

    Contact: Nishanth Sastry

    Last update: 8/Sep/08

    The general theme is post-Web2.0, distributed or (social) networksy stuff. I am willing to oversee one, or at most two, of the projects below, based on interest. All other things being equal, I will choose to oversee the projects that interest me most (more asterisks below => more interest; the possibility of answering research questions => more interest).

    Secretary Bot**
    In the dark ages before the Internet, a human secretary would guard access to important employees, organise their days, take phone calls on their behalf, remind them of important meetings, etc. These days, people are deluged by an enormous amount of information in the form of email, instant messaging, phone etc. [1], and face great difficulty in juggling multiple tasks [2].

    The goal of the project is to build a secretary bot that mediates on behalf of the user and manages their attention. Instant messages should only be let through if the user is not working on something important. The user need not be notified of unimportant emails whilst engaged in a complex task. Phone calls can be forwarded directly to voice mail. At the same time, important interrupts should be allowed through, and the user should be reminded of, and prepared for, imminent meetings and appointments.

    An example implementation might use XMPP to interface with Google Talk, use the Google Calendar API to store and retrieve appointments, and use the Twitter API to send reminders to mobile phones even when the user is away from their computer. Interaction with the bot could be in a language resembling natural English, to make it seem even more secretary-like. By integrating tightly with the OS you choose, you can detect which application (task) the user is focusing on, and determine whether it is an interruptible moment or not.
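
    For a flavour of the core mediation logic, here is a minimal sketch in Python; the sender lists, application names and per-channel policies are illustrative assumptions (a real bot would pull them from the calendar and the OS):

        import datetime

        IMPORTANT_SENDERS = {"boss@example.com", "family@example.com"}   # assumed
        FOCUS_APPS = {"emacs", "matlab"}          # apps taken to imply deep work

        def interruptible(active_app, next_meeting_start, now=None):
            """Crude policy: no interrupts during focus apps or just before a meeting."""
            now = now or datetime.datetime.now()
            if active_app in FOCUS_APPS:
                return False
            if next_meeting_start and next_meeting_start - now < datetime.timedelta(minutes=5):
                return False
            return True

        def handle_message(sender, channel, active_app, next_meeting_start):
            """Decide the fate of an incoming message on a given channel."""
            if sender in IMPORTANT_SENDERS:
                return "deliver"                  # important interrupts go through
            if not interruptible(active_app, next_meeting_start):
                return {"im": "defer", "email": "queue", "phone": "voicemail"}[channel]
            return "deliver"

        print(handle_message("stranger@example.com", "im", "emacs", None))  # defer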

    [1] Models of Attention in Computing and Communication
    [2] A diary study of task switching and interruptions

    Skills and interests: HCI, NLP, Web 2.0 APIs

    Challenge level: Simple. A safe grade should be easy to get. A high grade is not out of reach, but depends entirely on your creativity in coming up with useful secretary-like features, and on the level of integration achieved with the user's task flow.

    Distributed Twitter***
    Twitter is a popular micro-blogging site that sends user status updates ('tweets') to friends who subscribe to such feeds. Twitter uses SMS and instant messaging services to distribute these updates. The goal of this project is to instead use a pocket switched network [1] to transmit tweets. Implement an Android app that sends tweets to people nearby, who then pass them on to other people they meet later, and so on, until all the intended recipients of a tweet receive it. Evaluate the idea using traces of real human contacts, which will be made available (a minimal sketch of such a trace-driven evaluation follows the extensions below).

    Extension 1: A person with many free SMSes in their monthly plan could act as a gateway, transmitting nearby people's tweets as SMSes.
    Extension 2: Hashtags have developed as a means to provide context [2]. For instance, users can subscribe to a bus stop (e.g. JJ Thomson Ave) and be alerted that their bus is about to arrive via tweets from other passengers at previous bus stops. Ditto for current traffic conditions in major cities. Use presence at a bus stop to detect what contexts other users are interested in, and develop a method to automatically decide which contexts to subscribe to.
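
    As a flavour of the trace-driven evaluation, here is a minimal epidemic-forwarding simulation in Python; the trace format (time, node, node) is an illustrative assumption, not the format of the real traces:

        def simulate(contacts, source, tweet_time):
            """contacts: time-sorted list of (t, a, b) meetings; returns node -> delivery time."""
            have = {source: tweet_time}
            for t, a, b in contacts:
                if t < tweet_time:
                    continue
                if a in have and b not in have:     # on contact, any holder
                    have[b] = t                     # passes the tweet on
                elif b in have and a not in have:
                    have[a] = t
            return have

        contacts = [(1, "alice", "bob"), (5, "bob", "carol"), (9, "carol", "dave")]
        print(simulate(contacts, "alice", 0))   # everyone reached by t=9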

    [1] Pocket Switched Networks: Real-world mobility and its consequences for opportunistic forwarding
    [2] Usefulness of Micro-blogging (see section on Twitter as Message Bus)

    Skills and interests: Web 2.0 APIs, Mobile platforms

    Challenge level: Moderately simple.

    Spreading your Aura*****
    How can friends of your friends discover what you are good at, what your interests are and so forth? What about friends of friends of friends? What about friends of friends of friends of friends?...

    [1] proposes a way to distribute expertise information via e-mail in a social network using a modification of Bloom filters [2]. This can easily be extended to other kinds of keyword information as well. I call this spreading your aura. In this project, you will investigate how well an aura can be spread in a social network, by mining friendship data from an online social network such as MySpace, Facebook, or Orkut (APIs are available to help do this).
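
    For reference, a minimal Bloom filter (the data structure that [1] modifies) looks like this in Python; the sizes and hash choice are illustrative:

        import hashlib

        class BloomFilter:
            def __init__(self, m=1024, k=4):      # m bits, k hashes (illustrative)
                self.m, self.k, self.bits = m, k, 0
            def _positions(self, item):
                for i in range(self.k):
                    digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
                    yield int(digest, 16) % self.m
            def add(self, item):
                for pos in self._positions(item):
                    self.bits |= 1 << pos
            def __contains__(self, item):
                return all(self.bits & (1 << pos) for pos in self._positions(item))

        aura = BloomFilter()
        for keyword in ["haskell", "sailing", "bloom-filters"]:
            aura.add(keyword)
        print("sailing" in aura, "cooking" in aura)   # True False (barring false positives)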

    [1] Disseminating Expertise in Social Networks
    [2] Bloom Filter

    Skills and interests: Implementing algorithms and cool data structures

    Challenge level: Moderately simple. With proper evaluation, this could be converted into a research paper.

    Folksonomy-based routing****
    [1] suggests a way to use people's interests to route data in a pocket switched network. User interests are captured by a folksonomy, and relations between interests can be found by forming a bipartite graph with the folksonomy tags on one side and the tagged objects on the other. Related tags are then used to make routing decisions. The goal of the project is to implement and evaluate this idea.
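
    As a sketch of one ingredient, tag relatedness can be derived from the bipartite graph by comparing the sets of objects two tags annotate; the cosine measure below is an illustrative choice, not necessarily the one used in [1]:

        from math import sqrt

        def tag_similarity(tagged):
            """tagged: tag -> set of object ids it annotates."""
            sims, tags = {}, list(tagged)
            for i, t1 in enumerate(tags):
                for t2 in tags[i + 1:]:
                    shared = len(tagged[t1] & tagged[t2])
                    if shared:
                        sims[(t1, t2)] = shared / sqrt(len(tagged[t1]) * len(tagged[t2]))
            return sims

        tagged = {"bus": {1, 2, 3}, "traffic": {2, 3, 4}, "music": {5}}
        print(tag_similarity(tagged))   # bus and traffic related via shared objects 2 and 3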

    [1] Folksonomy-based Reasoning in Pocket Switched Networks

    Skills and interests: Social Networks, routing, graph algorithms

    Challenge level: Moderately hard - most of the difficulty is in coming up with a convincing evaluation methodology for validating your solution. This could become a research paper.

    Statistical characterisation of online social networks****
    User-generated content and the social web revolution have thrown up numerous sites where social interaction is part of the added value. Examples: YouTube, Flickr, Twitter, Overstock, Jaman, Amazon, Netflix (LoveFilm).
    Analyse and characterise one of them. See [1,2] for illustrations of characterisation. The folksonomies embedded in many of these sites have not been characterised and offer an interesting opportunity.
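
    A typical first step is the degree distribution of the social graph, computed from a crawled edge list; a minimal Python sketch (the edge-list format is assumed):

        from collections import Counter

        def degree_distribution(edge_lines):
            """edge_lines: iterable of "user_a user_b" strings (format assumed)."""
            degree = Counter()
            for line in edge_lines:
                a, b = line.split()
                degree[a] += 1
                degree[b] += 1
            return Counter(degree.values())       # degree -> number of users

        edges = ["alice bob", "alice carol", "bob carol", "alice dave"]
        for d, n in sorted(degree_distribution(edges).items()):
            print(d, n)   # a heavy tail here would suggest a power law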

    [1] I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content System
    [2] Delivery Properties of Human Social Networks

    Skills and interests: statistical skills, scripting (Perl, Python, etc.)

    Challenge level: Hard - the difficulty is in finding interesting things in the characterisation. If done well, there is a lot of scope to do great research. For instance, [1] won the best paper award at the 2007 Internet Measurement Conference.

    Automatic code splitting between server and client for web apps****
    Web apps have a reputation of being easier to write than systems programs, but a great deal of difficulty arises because different parts of the code are in different languages and execute at different times (e.g. JavaScript on the client, and JSP on the server). When writing the code, it is hard to keep in mind that some of it will not execute in a linear fashion. This complex information flow is the source of many bugs and security holes.

    This project would take an existing web application framework and add a preprocessor that splits code into server-side and client-side executables. [1] is an excellent first attempt at doing this, and solves some of the most difficult problems ([1] won the SOSP 2007 best paper award). The contribution of this project would be to make the approach practical by embedding it into a widely used web app framework (e.g. JSP, Rails, Django).

    [1] Secure Web Applications via Automatic Partitioning (the PPT may be a good intro)

    Skills and interests: Experience writing web apps, strong programming skills, knowledge of security

    Challenge level: Very Hard!! If successful (and sufficiently improved from [1]), this will be an outstanding piece of research.

  • Social Network based Spam Filtering

    Contact: Jon Crowcroft

    There are a lot of social networks out there - we have a project characterising them. These are not just the obvious online systems like Facebook, Bebo, LinkedIn, etc., but also the networks implicit in human activities such as sports clubs, work and family, with contacts either face-to-face or via communications (e-mail, IM, SMS, etc.).

    It is conjectured (and has been shown by several research groups) that while spam emanates from hijacked machines and even hijacked accounts, it has at least one telling characteristic: it is distributed to people who have little social relationship with the sender.

    This project is to build a system based on work in the Haggle project, which characterises social graphs by co-location, and on work in the spam-filtering world based on affiliation and communication graphs. Public data sets on co-location, on affiliation (via the web) and on e-mail (believe it or not, Enron's trial evidence!) can be used to test a tool. It could loosely be integrated with the University's own deployed system (SpamAssassin), and can be readily evaluated with a number of corpuses of spam. Note that spam filtering is relatively compute-intensive, and algorithms to (re-)compute social graphs over large datasets can be costly, so some thought needs to be put into the implementation, which makes for an interesting as well as useful project. [Note: a social net is a "temporal graph" with nodes representing people and edges representing relationships - friends of friends (of friends) are 2 (3) hops away. An edge relationship could be a meeting (or a minimum number and duration of meetings), a family relationship, a communications event, or co-authoring a paper (or 1B project) with someone.]
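
    The core graph test is cheap to sketch: how many hops separate sender and recipient in the social graph? A minimal Python illustration (building the graph from the co-location/e-mail datasets is assumed done elsewhere):

        from collections import deque

        def hops(graph, src, dst, limit=4):
            """BFS hop count in an undirected social graph; None if beyond limit."""
            seen, frontier = {src}, deque([(src, 0)])
            while frontier:
                node, d = frontier.popleft()
                if node == dst:
                    return d
                if d < limit:
                    for nbr in graph.get(node, ()):
                        if nbr not in seen:
                            seen.add(nbr)
                            frontier.append((nbr, d + 1))
            return None

        graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"], "spammer": []}
        print(hops(graph, "alice", "carol"))    # 2 hops: plausibly legitimate
        print(hops(graph, "spammer", "carol"))  # None: no social path, spam-like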

    This project is moderate, rather than easy or difficult. There could be some help from Computing Service folks, but it is also nicely achievable in a standalone way.

    References (papers can be found via Google Scholar):

  • Social Networks Project
  • Haggle Project
  • Spamassassin
  • Spam!, by LF Cranor and BA LaMacchia, Communications of the ACM, 1998
  • Personal Email Networks: An Effective Anti-Spam Tool, by PO Boykin and V Roychowdhury
  • SybilGuard: Defending Against Sybil Attacks via Social Networks, by H Yu, M Kaminsky, PB Gibbons and A Flaxman
  • Re: Reliable Email, by S Garriss, M Kaminsky, MJ Freedman, B Karp et al.
  • Collaborative Blog Spam Filtering Using Adaptive Percolation Search, by S Han, Y Ahn, S Moon and H Jeong
  • MailRank: Using Ranking for Spam Detection, by PA Chirita, J Diederich and W Nejdl
  • Collaborative Spam Filtering Using E-Mail Networks, by JS Kong, BA Rezaei, N Sarshar and VP Roychowdhury

    An interesting add-on is to see whether the different social networks (characterised by different edge types, where a type is face-to-face, e-mail, telephone call, affiliation, etc.) are correlated with each other, or causal of each other - and which is more effective as a spam predictor.

    Along the way, various interesting statistics could be gathered - for example, people who have more "friends" tend (allegedly) to be more linguistically capable, so one could look at the content of their communication and estimate their vocabulary size.

  • IDRM - Inter-Domain Routing Protocol for MANETs

    Contact: Jon Crowcroft

    We have a whole slew of routing algorithms for mobile ad hoc networks (MANETs). MANETs are next year's big thing, when radio devices start to interconnect in-car entertainment systems. Then we'll have lots of separate MANETs and we'll need to glue them together. We have a spec for just such a protocol. We also have a simulated version, courtesy of IBM Research, but we don't have a real implementation.

    This project is to implement and evaluate IDRM on a bunch of Windows smart phones (with WiFi and other radios), laptops and fixed wireless nodes.

    One challenge in this is to devise a suitable evaluation scenario.

    A postdoc who was associated with the work last year will be around to help out, as well as the proposer (see contact above!).

  • Peer-to-peer online Netflix (LoveFilm)

    Contact: Jon Crowcroft

    Net2netflix (name subject to change)

    Discussion document v0 
     

    The solution has two parts:

    1. delivery network: The plan is to use an off-the-shelf structured p2p solution. Siddhu is looking at different approaches (BitTorrent, VidTorrent, ZIGZAG for the delivery tree, etc.) and evaluating the pros and cons of each one. An initial report is to be discussed around the end of September or so.
    2. superpeer overlay: where most of our "new stuff" will come in to play, allowing
      1. the deployment (push model) of superpeers on chosen xenoservers
      2. the selection of "movie lists" by clients
      3. admission control based on available resources, movies, and the probability of meeting the delivery guarantee for a given movie list
      4. recommendations of alternative movies etc
      5. prediction of delivery time based on available copies of each movie in the list and the estimated bandwidth to/from the nodes
      6. management and monitoring of downloads, including the superpeer "stepping in" when it looks like a guarantee will not be met
     

    The operation of the system could be summarised as follows:

    • a client contacts the closest (or a close enough) superpeer, providing a list of movies he or she wishes to view in the next few days

        challenge 1: how does a client find the closest superpeer? 

    • the superpeer looks in its own cache of movies as well as in the peers around it, to see which movies (and how many copies of each one) are available in the client's vicinity

        challenge 2: how do superpeers know which files are available in nearby peers?  

    • the superpeer combines its own statistics with client-side measurements (?) to come up with a prediction of the probability of a successful download within a given time for the first few movies in the list, based on
      • availability of movie copies in the superpeer's vicinity
      • availability of the movie in the superpeer's cache
      • bandwidth availability between the client and the peers that hold the movie
      • bandwidth availability between the superpeer and the client
      • ?
     

        challenge 3: what is the probability distribution of delivery time vs. number of copies available (and the available bandwidth by each one) 
         

    • based on the feasibility of transmitting one of the selected movies to the client in the required time, and a "probability threshold", the superpeer's admission control system accepts or denies the client's movie-list request (a toy version of this decision is sketched after this list)
     
    • the superpeer monitors all downloads, and if a download looks like it is not going to meet the delivery guarantee (in terms of time), it steps in, allowing the movie to be downloaded directly from its own cache (at a relatively predictable and stable throughput). This costs the superpeer resources and is a "fallback" plan.
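
    A toy version of the admission decision, under purely illustrative modelling assumptions (aggregate bandwidth over copies, a capped-linear success probability):

        def admit(movie_size_mb, copy_bandwidths_mbps, deadline_s, threshold=0.9):
            """Accept iff the estimated in-time probability clears the threshold."""
            agg = sum(copy_bandwidths_mbps) / 8.0         # aggregate MB/s from all copies
            if agg == 0:
                return False
            est_time_s = movie_size_mb / agg
            p_success = min(1.0, deadline_s / (2.0 * est_time_s))   # toy model
            return p_success >= threshold

        # 700 MB movie, three peers at 2 Mbps each, 48-hour deadline
        print(admit(700, [2, 2, 2], 48 * 3600))   # True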

      Other implementation issues

      could use PlanetLab for a start -- latency and throughput are very stable
      eric: there is a good chance we can do fairly OK

         challenge 4:  work around firewalls?

  • NetFPGA Storage Device

    Contact: Andrew Moore

    Network Attached Storage is not a new thing; commodity devices that implement NAS are increasingly common. Ever wondered how they work? What makes them hard and expensive (or cheap and easy)?

    The aim of this project is to construct a simple NAS device using a dedicated development board based upon the NetFPGA platform.

    You will be provided with a development system: a Linux box with a NetFPGA board and development tools. The NetFPGA board is equipped with SATA connectors for attaching a disk system.

    This project is very flexible and permits a student to make specific decisions about the co-design of operating-system and hardware components.

    A network-attached disk may be a tiny PC, implementing an entire NFS/CIFS system in software and accessing attached disks in the same manner as a server. Or, at the other end of the scale, to achieve higher performance or use fewer components, a network-attached disk may use a much more limited general-purpose CPU, implementing more of the performance-critical tasks in FPGA hardware.

    You will be expected to implement at least one file-system protocol, e.g., NFSv2 over UDP, and demonstrate its functionality.

    This project is based upon the NetFPGA platform - a board specifically intended to allow the development of software/FPGA projects for networking.

    Confidence with a hardware compiler language (e.g. Verilog) is mandatory; familiarity with C will also help greatly.

    This project is advanced and should not be undertaken by a weak student.

  • Hardware (High-Speed) network-application identification

    Contact: Andrew Moore

    An active piece of work in the SRG has been on how to identify applications from their network traffic...

    This project would look to create a (high-speed, hardware) implementation of what is currently a software prototype.

    This project incorporates all the classic problems of hardware/software co-design with decisions about code and algorithm placement being critical in order to achieve best performance.

    This project is based upon the NetFPGA platform - a board specifically intended to allow the development of software/FPGA projects for networking.

    Familiarity with a hardware compiler language (e.g. Verilog) is mandatory; familiarity with C/C++ will help a lot.

    This project is advanced and should not be undertaken by a weak student.

  • NetFPGA implementation of flow-aware packet sampling mechanism

    Contact: Andrew Moore

    The aim of this project is to implement a flow-aware packet sampling mechanism using a dedicated development board based upon the NetFPGA platform.

    The mechanism is already designed and a proof-of-concept implementation written in C++ exists.
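
    The flavour of flow-aware sampling is easy to sketch: hash the 5-tuple and keep every packet of the flows whose hash falls in a sampled range, so sampled flows are captured in their entirety (cf. trajectory sampling). A Python illustration of the idea only; the designed mechanism will differ in its details:

        import zlib

        SAMPLE_ONE_IN = 16    # keep roughly 1 in 16 flows

        def sample_packet(src_ip, dst_ip, src_port, dst_port, proto):
            """Flow-aware: all packets of a flow hash to the same decision."""
            key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
            return zlib.crc32(key) % SAMPLE_ONE_IN == 0

        print(sample_packet("10.0.0.1", "10.0.0.2", 12345, 80, "tcp"))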

    You will be provided a development system, a Linux box with NetFPGA board and development tools.

    You will be expected to implement the sampling mechanism and demonstrate its functionality. This project is based upon the NetFPGA platform - a board specifically intended to allow the development of software/FPGA projects for networking.

    Familiarity with a hardware compiler language (e.g. Verilog) is mandatory; familiarity with C/C++ will help a lot.

    This project is advanced and should not be undertaken by a weak student.

  • Build a better Network monitor

    Contact: Andrew Moore

    Network monitoring hardware can be very expensive - requiring both high-bandwidth I/O and high-reliability, high-resolution clocks.

    The specific aim of this project is to build a network monitoring system based around the NetFPGA card. The NetFPGA platform enables the development of novel and interesting software/FPGA projects for networking. With four 1Gbps ports and onboard SATA controllers, this system is very capable.

    This includes examining the limits on network hardware's ability to maintain accurate, high-reliability time-stamping. One aspect of this project is to implement an existing time-stamp solution that seems to be in common use in the network-capture community but has previously required specialist hardware.

    This project is based upon the NetFPGA platform - a board specifically intended to allow the development of software/FPGA projects for networking.

    Comfort with a hardware compiler language (e.g. Verilog) is mandatory; familiarity with C/C++ will also help.

  • NetFPGA Bright Idea

    Contact: Andrew Moore

    The NetFPGA platform enables the development of novel and interesting software/FPGA projects for networking. With four 1Gbps ports, this system is very capable.

    What is your bright idea?

    This project is based upon the NetFPGA platform - a board specifically intended to allow the development of software/FPGA projects for networking.

    Familiarity with a hardware compiler language (e.g. Verilog) is mandatory.

  • NetFPGA implementation of MOOSE

    Contact: Malcolm Scott

    MOOSE is an extension to Ethernet being developed in the lab which adds hierarchical MAC addressing and address rewriting to improve its scalability. An implementation of a MOOSE switch on a NetFPGA board (see above) would be an interesting project. This should be implemented in a similar way to fast, high-end Ethernet switches, i.e. as a true hardware switching fabric, not a software switch.
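
    To give the flavour of the rewriting involved, here is an illustrative (software) model of hierarchical source-address rewriting; the address format and field widths are assumptions for the sketch, not the MOOSE specification:

        class MooseStyleSwitch:
            def __init__(self, prefix):           # e.g. a 3-byte switch identifier
                self.prefix, self.next_id, self.table = prefix, 1, {}
            def rewrite_source(self, real_mac):
                """Give each directly attached host a prefix-local hierarchical address."""
                if real_mac not in self.table:
                    self.table[real_mac] = f"{self.prefix}:{self.next_id:06x}"
                    self.next_id += 1
                return self.table[real_mac]

        sw = MooseStyleSwitch("02:aa:bb")
        print(sw.rewrite_source("00:16:17:2b:3c:4d"))   # 02:aa:bb:000001
        # remote switches can then forward on the 02:aa:bb prefix alone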

    This would suit a fairly strong student with a good knowledge of Verilog, Linux tools and network protocols. Switch fabric design will be taught in Digital Communication II but you are likely to need to read up on this before the course.

  • Simulation of MOOSE

    Contact: Malcolm Scott

    MOOSE is an extension to Ethernet being developed in the lab which adds hierarchical MAC addressing and address rewriting to improve its scalability. A study of MOOSE in simulation on a large network does not yet exist, and would be a useful resource.

    It is very strongly recommended to make use of an existing widely-used simulation tool; writing your own introduces considerable extra requirements for validation and testing and will take much longer to produce credible results. However, it will be necessary to adapt the tool you choose in order to add an implementation of MOOSE.

    This is likely to require fluency in C or other languages depending on your choice of simulator.

  • BitTorrent for Mobile Content Sharing

    Contact: Cecilia Mascolo

    We have developed an approach to content sharing which allows users to share the content of their Bluetooth devices with peers they meet, for instance on public transport, in coffee shops, etc.

    This project expands the tool with some ideas taken from BitTorrent: i.e., download sessions need not transfer full files but could allow the download of chunks.
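
    The BitTorrent-flavoured part is easy to sketch: peers advertise bitmaps of the chunks they hold, and a downloader requests the rarest chunk it is missing. A Python illustration, with the chunk indexing and peer interface assumed:

        def next_chunk_to_request(my_chunks, peer_bitmaps):
            """my_chunks: set of chunk indices held; peer_bitmaps: peer -> set of indices."""
            counts = {}
            for chunks in peer_bitmaps.values():
                for c in chunks - my_chunks:
                    counts[c] = counts.get(c, 0) + 1
            return min(counts, key=counts.get) if counts else None   # rarest-first

        peers = {"p1": {0, 1, 2}, "p2": {1, 2, 3}}
        print(next_chunk_to_request({1}, peers))   # 0 (held by only one nearby peer)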

    The project will involve developing this extension both as a mobile phone implementation and for the OMNeT++ simulator, on which large-scale performance can be tested.

    Documentation on this project is available upon request (other extensions, such as trust-related issues, can also be added).

  • Application of the Cannon and Fox-Otto algorithms to efficient parallel route planning methods

    Lab Contact: Andrew Moore

    Originator: Jagdish Modi, email: at89@dial.pipex.com

    Special Resources: None

    Various investigations have been made into parallel algorithms for fastest-route determination, in particular (for example [1]) for journeys between London Underground stations, based on the available travel information. These involve a sequence of operations on route matrices. For efficient computation it is useful both to avoid large sparse areas and to minimize data movements.

    One approach could be to modify the Cannon or Fox-Otto algorithms, which work on sub-blocks and are very space-efficient. The use of Cannon's algorithm, for example, avoids the need for a single processor to collect and redistribute the data before and after each operation, and is therefore efficient in both time and space. The Fox-Otto algorithm is similar: for a matrix product A*B, each sub-block of matrix A is broadcast across a row of processors and multiplied with the resident sub-block of matrix B; the B blocks are then rotated upwards and another row broadcast is performed.
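
    A sequential simulation of the Fox-Otto block scheme (with numpy standing in for the processor mesh) gives the flavour; for route planning the same skeleton applies with (+, min) in place of (*, +) on the route matrices:

        import numpy as np

        def fox_otto(A, B, q):
            n = A.shape[0]; s = n // q            # block size (assume q divides n)
            blk = lambda M, i, j: M[i*s:(i+1)*s, j*s:(j+1)*s]
            C = np.zeros_like(A)
            for step in range(q):
                for i in range(q):
                    a = blk(A, i, (i + step) % q)       # block broadcast along row i
                    for j in range(q):
                        b = blk(B, (i + step) % q, j)   # B block at (i, j) after
                        blk(C, i, j)[...] += a @ b      # `step` upward rotations
            return C

        A, B = np.random.rand(4, 4), np.random.rand(4, 4)
        print(np.allclose(fox_otto(A, B, 2), A @ B))    # True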

    Developments might include:

      novel ways of using an analogue of matrix squaring instead of successive multiplications, to reduce the total number of steps, and incorporating the path specification for all paths simultaneously

      to investigate different data layout algorithms taking account of the processor topology, and their effect on the overall performance; a layout algorithm that takes an arbitrary topology and returns an optimal data layout would be of interest

    Parallel computers have as a core structure a set of interconnected processors. The optimal choice of interconnection for a particular problem may not always be available; in practice the interconnection usually takes the form of a planar two-dimensional lattice. In preparing the parallel algorithm, it will be interesting to explore the features of the two-dimensional graph arising from the connections of stations (nodes) and the travel times between stations (edges), and to compare this with the lattice of processor connectivity. If the student is mathematically inclined, it may be worthwhile to investigate certain properties possessed by the two-dimensional graph, such as its degree of symmetry and homogeneity, as well as the extent to which it can be embedded onto a lattice.

    In addition it may be useful to implement the algorithm on a random topology of interconnected processors, evaluate the cost component, and then to compare this with the cost on a lattice connectivity, so as to determine the extent to which interconnection between processors plays an important role. In this case it would be instructive to find out what other attributes of the parallel machine have a bearing on the overall cost.

    References:

      [1] Reynolds M, Improved Parallel Route Planning for the London Underground, Computer Science Tripos Part II dissertation, Computer Laboratory, 2008.
      [2] Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, second edition, 2003.
      [3] Modi J, Parallel Algorithms and Matrix Computation, OUP, 1988.
      [4] Grama, Gupta, Karypis and Kumar, Introduction to Parallel Computing, second edition, Pearson Education, 2005.
      [5] Akpan O, Efficient Parallel Implementation of the Fox Algorithm, Computer Science Department, Bowie State University, 2003.

  • Optimising object placement in a client-server distributed system

    Contact: Stephen Kell

    The goal of this project is to develop a client-server distributed system which monitors and optimises itself with respect to the placement of its working storage and/or computation. In other words, it shifts computation (threads) and state (objects) from client to server or vice-versa, in order to improve overall performance. It must do this across variations in latency and throughput of the link between client and server, and amid disparity of computational resources available at each. If you've ever tried using a thin-client protocol (say an X11 application, or even a web application) over a high-latency link, you will understand why this could be useful.

    Usually the partitioning of computation and storage between the client and server locations is fixed, even though users typically don't care where computation happens, nor where working storage lives. (They do care, if often only a little, about where persistent storage happens.) Performance of the system depends mostly on the locations of computation and working storage, both because of the differences in available resources, and because of the available communication throughput and latency (and jitter, etc.).

    To take on this project, you will need a very strong set of practical skills, and should already have ideas about how you would go about implementing such a system. For example, you might take an X11 client and server, and build a runtime monitoring and migration service which optimises the system by moving state (objects) or computations (threads; harder!) between client and server. One option would then be to reengineer the communication protocol to use an object middleware like Java RMI, and extend the JVM to support migration of objects (i.e. atomically replacing a local object with a stub, which hands off to a migrated copy of the original object on a remote system). However, you should expect to bring your own ideas to the problem. Experimental evaluation is fairly straightforward with this project. There are also (if you do it well!) lots of interesting algorithms involved. Difficulty: hard!
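
    As a toy illustration of the stub idea (in Python rather than on the JVM), a local object can be replaced by a proxy that forwards all access to the migrated copy; here the "remote" object is just another local object, by assumption:

        class Stub:
            """Forwards all attribute access to a (conceptually remote) object."""
            def __init__(self, remote):
                object.__setattr__(self, "_remote", remote)
            def __getattr__(self, name):          # reads and method lookups
                return getattr(object.__getattribute__(self, "_remote"), name)
            def __setattr__(self, name, value):   # writes
                setattr(object.__getattribute__(self, "_remote"), name, value)

        class Counter:
            def __init__(self): self.n = 0
            def bump(self): self.n += 1

        migrated = Counter()        # imagine this copy now lives on the server
        local = Stub(migrated)      # client code keeps using `local` unchanged
        local.bump()
        print(local.n, migrated.n)  # 1 1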

  • A self-organising file system

    Contact: Stephen Kell

    If you're anything like me, your file system tends to build up collections of assorted downloads which ideally you'd like to sort through and file somewhere appropriate in a directory structure. You probably also have some files which you misfiled for various reasons. It'd be neat if a tool could learn the rules underlying our individual filing system, and then automatically file stuff as it comes in, and point out exceptions to the rules caused by misfiling.

    There are several potentially interesting aspects to this: learning rules based on a manually-classified training set; expressing rules in some sort of logic; implementing interesting predicates in that logic by analysing files in a semantically aware way. Although I picked the file system as the target domain (since it's what I find hardest to keep organised!), you could possibly do something similar with mail, bookmarks, and so on. However, the idea is to go beyond simple syntactic rules on available metadata: you should aim to support reasonably complex rules and sophisticated semantic discrimination, and this probably makes file systems the most suitable domain. For example, in my file system I make several distinctions that aren't captured simply by file name or type: distinctions among written articles based on what they're about, distinctions among music based on whether it's from a record I own (i.e. has a corresponding entry in my cddb cache), and so on. You might support these and other kinds of semantic awareness using NLP techniques (using named entity recognition, perhaps) or image recognition techniques, or music recognition techniques, and so on.
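
    A skeleton of the rule-driven filing might look as follows; the predicates shown (including the cddb-cache check) are stand-ins for the kind of semantic discrimination described above:

        import os

        def is_music(path):
            return os.path.splitext(path)[1] in {".mp3", ".flac", ".ogg"}

        def in_cddb_cache(path, cddb_titles):
            """Stand-in predicate: does the file match a record I own?"""
            return os.path.basename(path).rsplit(".", 1)[0] in cddb_titles

        def classify(path, cddb_titles):
            rules = [   # (predicate, destination), checked in order
                (lambda p: is_music(p) and in_cddb_cache(p, cddb_titles), "music/owned"),
                (is_music, "music/other"),
                (lambda p: p.endswith(".pdf"), "articles/unsorted"),
            ]
            for pred, dest in rules:
                if pred(path):
                    return dest
            return "misc"           # no rule fired: flag for manual filing

        print(classify("downloads/Blue Train.flac", {"Blue Train"}))   # music/owned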

    To take on this project, you should really be someone who has a need for such a system! In other words you should have your own large file system to work with, and most importantly, some non-trivial filing policies to implement. Given these, a thorough evaluation of the system will involve measuring the accuracy of classification achieved by your system, and should be pretty easy. Overall difficulty: moderate.

  • Visualisation of software

    Contact: Stephen Kell

    Visualisation of software is useful for a number of reasons: comprehension, prototyping, debugging, profiling and so forth. Various projects are possible which would implement and evaluate a tool for visualising software in some particular way, aimed towards one or other of these goals.

    Specific example problems include: visualising dynamic structure of programs at the object level (most likely for debugging); visualising static structure at the module level (for comprehension or prototyping); visualising communication patterns within execution traces (for debugging or profiling). The hardest part in all cases is making the visuals comprehensible to the viewer: usually there will be too much data, and/or too dense an interconnection structure, for these to be useful without further processing. For example, in the case of static graph-based abstractions, such as call graphs or linkage graphs, usually the number of edges is a problem, since it makes the layout completely incomprehensible; there are many interesting techniques for clustering nodes, grouping related edges and eliminating uninformative ones. Three-dimensional visualisations are also a very interesting avenue, which has been less well-explored and can possibly allow more of the raw complexity to be usefully retained.

    Evaluation of the effectiveness of a visualisation is slightly tricky, but there are a number of reasonable proxy metrics such as reduction in size (number of nodes/edges in the graph), reduction in edge crossings (in a two-dimensional graph layout), accuracy of approximation (i.e. how well your edge grouping/clustering has preserved the information in the original graph). Difficulty: depends on what specific project you choose, but probably moderate to hard.

  • A quantitative analysis of qualitative paging algorithms

    Contact: Ripduman Sohan

    The past decade has witnessed the introduction of a number of different page replacement algorithms (2Q, ARC, MQ, etc.). Although these have been implemented in simulation and in bespoke OSes, no real implementations exist for general-purpose OSes. Furthermore, no clear data points exist comparing the algorithms against each other in a systematic manner.

    This project may be approached along several dimensions:

    • A basic approach would implement a number of common algorithms in a simple software framework and quantitatively compare them against run-time traces obtained from real-world data sets (a toy version of such a framework is sketched after this list).
    • A more advanced project would attempt to validate the current Linux paging algorithm (which uses heuristics to move pages to the front of the LRU list) against an emulator where all memory accesses can be tracked, to determine opportunities for improvement.
    • The most advanced project would attempt to change the Linux paging subsystem to create a framework in which the paging algorithm in use can be changed dynamically, much as the disk scheduler can.
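
    The basic framework is cheap to sketch: replay a page-reference trace through pluggable replacement policies and count faults. Only LRU is shown below; 2Q, ARC, MQ, etc. would slot in as further policies:

        from collections import OrderedDict

        class LRU:
            def __init__(self, frames):
                self.frames, self.mem = frames, OrderedDict()
            def access(self, page):
                """Returns True on a page fault."""
                if page in self.mem:
                    self.mem.move_to_end(page)
                    return False
                if len(self.mem) >= self.frames:
                    self.mem.popitem(last=False)  # evict least recently used
                self.mem[page] = True
                return True

        def faults(policy, trace):
            return sum(policy.access(p) for p in trace)

        trace = [1, 2, 3, 1, 4, 1, 2, 5]
        print(faults(LRU(3), trace))    # 6 faults on this toy trace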

    Prereq Knowledge: Linux, C, kernel knowledge useful.

  • Distributed shared memory over ethernet

    Contact: Ripduman Sohan

    The goal of this project is to provide a simple load-store abstraction that processes can use for inter-host communication. Ideally, a process would "export" a region of its address space, and processes running on remote hosts could "import" this region and simply read and write to it, giving a simple, straightforward communication mechanism. The kernel does all the heavy lifting by intercepting and synchronising access to the shared memory region. A user-level illustration of the export/import interface follows.
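
    This sketch does the interception explicitly in user space (the project proper would do it in the kernel so that plain loads and stores just work); the wire protocol and port are assumptions:

        import socket, struct

        def recv_exact(conn, n):
            buf = b""
            while len(buf) < n:
                chunk = conn.recv(n - len(buf))
                if not chunk:
                    raise ConnectionError("peer closed")
                buf += chunk
            return buf

        def serve_region(size=4096, port=5000):
            """Export `size` bytes; serve READ(0)/WRITE(1) requests from one importer."""
            region = bytearray(size)
            srv = socket.socket()
            srv.bind(("", port)); srv.listen(1)
            conn, _ = srv.accept()
            while True:
                op, off, length = struct.unpack("!BII", recv_exact(conn, 9))
                if op == 0:
                    conn.sendall(bytes(region[off:off + length]))
                else:
                    region[off:off + length] = recv_exact(conn, length)

        class RemoteRegion:
            """Importer side: explicit read/write in place of intercepted loads/stores."""
            def __init__(self, host, port=5000):
                self.conn = socket.create_connection((host, port))
            def read(self, off, length):
                self.conn.sendall(struct.pack("!BII", 0, off, length))
                return recv_exact(self.conn, length)
            def write(self, off, data):
                self.conn.sendall(struct.pack("!BII", 1, off, len(data)))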

    Prereq Knowledge: Linux, C, kernel knowledge useful.

    A previous version of this project (2008) won the Data Connection project prize!

  • User-space process checkpointing (save/restore)

    Contact: Ripduman Sohan

    I would very much like to be able to checkpoint a process to disk and restore its state at some later time. Previous projects have achieved this but all have required some form of advanced kernel modification. It should be possible to checkpoint entirely in user-space for at least a subset of processes out there (with all processes being possible with minor kernel modifications).

    Prereq knowledge: C, knowledge of Linux process management useful.