Systems Research Group – NetOS

ACS Projects (2012—2013)

NetOS

This page collects together various ACS project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address given; please get in touch if you want more information about the project.

Under construction: please keep checking back, as more ideas will hopefully be added to this page during the coming weeks.

Cross-cultural Interactions
Contact: Christos Efstratiou (with Daniele Quercia)

Face-to-face interactions within the workplace can be influenced by a wide range of factors, including spacial variations (e.g., location of a person's desk) and cultural differences (e.g., a person's country of origin). In July 2012, we collected co-location information of 60 colleagues in the Computer Lab building. Participants were asked to wear active RFID tags (http://www.openbeacon.org/) that were able to record face-to-face encounters in the building. From this dataset, it is possible to identify social interactions and corresponding interaction strength (defined as contact frequency). The aim of this project is to identify how cultural differences among participants have affected their interactions. Cultural differences will be encoded using Hofstede's five dimensions, which nicely correspond to five publicly-available numbers for each nation of origin. The work will consist in selecting the appropriate network metrics, extracting them from our face-to-face network, and correlating their values with the five cultural dimensions. Desirable skills include willingness to learn data mining techniques and engage with basic stats.

[1] G. Hofstede, G. J. Hofstede, M. Minkov. Cultures and Organizations: Software of the Mind.
Trust and Influence on Facebook
Contact: Jon Crowcroft

Two possible projects
1. A quick comparison of the two different clustering tricks - CNR based on interaction data, and the MPI one based on homophily - one could also build some more tools for this (like the MPI ones) see this CNR paper on Ego Nets and discovering clusters of users via their interactions, and the MPI paper on same thing but via the "likeness" of users.
To make this into a nice Masters level project, one could then use the nested clusters (Dunbar's circles, for which, see Roberts, S., Dunbar, R.: Communication in social networks: Effects of kinship, network size, and emotional closeness. Personal Relationships 18(3) (2011) 439-452) to build facebook privacy setting defaults (presumably, as you go out each level, you decrease trust) - then one could do a user study to see how well the privacy settings matched their understanding of trust - coupled with the myPersonality facebook app one could control for various quirks of users (extrovert, introvert etc)
One step more might be to link privacy and circle setting failures to unfriending behaviours....
2. A longer project on intervention in social nets to "flatten" the double impact that high popularity users get from having greater degree, centrality, but also greater influence inherently (which is why they get greater degree and centrality - so their opinion is getting double counted) - there's a paper about this in the area sample biases in medical work somewhere here...but not on opinion dynamics - so it might be interesting to see if one can get a fairer measure of opinion somehow...
this could be done by slowing down tweets or randomly dropping tweets (to some randomly selected subset of followers) from popular users - this is interesting ethically and technically:)
See this paper on Encouraging moderation: Clues from a simple model of ideological conflict
some more details
Originator:Andrea Passarella [Javascript required]
and the Recognition project
Content Based Steering for Efficient Data Delivery in 10G NICs
Contact: Ripduman Sohan

The coarse grained coupling between the NIC and Endhost interface in current high speed NIC designs can result in lowered application performance. In particular, the mapping of NIC receive queues to CPUs can only be done at a very high (per connection) level. This makes it very hard to arrange the system so data is always received on the same CPU on which it is consumed resulting in higher application data processing latency (due to CPU cache misses) and lower throughput performance (due to CPU stalls). Furthermore, mapping a single connection to one or more CPUs can result in overloading of those CPUs (network processing overhead). While recent technologies such as Receive Side Scaling and FlowDirector try to mitigate this issue by spreading load over CPUs and trying to automatically deliver data to the "right" CPUs they are very simple mechanisms and therefore not very effective.

In this project you would be exploring the idea of "content based data steering". The core hypothesis of this work is that if the NIC possesses some simple application logic it can efficiently steer packets to the correct CPUs reducing latency and increasing performance. The idea is that you would examine the feasibility of the approach and design and implement the idea on the NetFPGA platform. Ideally the system would be based on a BPF-like language that allows applications to specify steering policies to the NIC coupled with minor application changes required to leverage the architecture.
A Framework For High-Speed Endhost NIC Evaluation
Contact: Ripduman Sohan

Benchmarking high speed (10G) endhost NICs can be a complicated, error prone and tedious task. There can be significant differences in performance based on the hardware configuration, kernel, driver and software versions involved. Moreover, there are OS, application and device specific runes which often get overlooked or misconfigured impacting performance. Finally, most devices benchmarks are usually irreproducible as the entirety of the configuration is unknown.

In this project you will attempt to create an extensible open-source high-speed end host NIC evaluation framework that makes it simple and fast to obtain reproducible benchmarks over a wide range of 10G NICs and applications. You will then proceed to test it by benchmarking a number of production 10G NICs for the purposes of quantitative comparision.
Towards Memory Mapped Socket Buffers
Contact: Ripduman Sohan

For data intensive network communication the memory copy between kernel and user space can amount to up to 40% of network processing overheads [1]. Various solutions have been proposed to mitigate this issue over the years. In particular, various implementations of hardware and software based zero-copy networking [2, 3] has been investigated as has user-level networking [4, 5]. Recent attempts have also attempted at exporting the NIC DMA queue into user-space for the purposes of performance and flexibility [6]. However, all these solutions require extensive application and/or specialised hardware.

In this project you will attempt to mitigate the kernel-user memory copy issue by extending the sockets interface to allow applications to memory map the socket send and receive buffers into their process address space for the purposes of zero-copy communciation between the application and the kernel. In conjunction with a lightweight application-kernel notification mechanism it is our hypothesis that this architecture will reduce the memory copy overhead for a number of data intensive network applications.
1. Wimpy Nodes with 10GbE: Leveraging One-Sided Operations in Soft RDMA to Boost Memcached
2. Zero Copy Direct Protocol Over Infiniband - Preliminary Implementation and Performance Analysis
3. PF_RING
4. Openonload
5. ATM And Fast Ethernet Network Interfaces For User-level Communication
6. Netmap: A Novel Framework For Fast Packet I/O
Geographic Analysis of Public Transport Fare Spending
Contact: Neal Lathia
In [1], we examined the relationship between people's mobility and their fare purchasing habits and found that people often make the incorrect choice regarding which ticket will best suit their travel needs. Similarly, in [2] we found a weak correlation between people's mobility patterns and the social deprivation of where they live. This project aims to investigate the connection between these two findings: is overspending happening in socially deprived neighbourhoods? What is the geographic distribution of spending (and overspending)? Examining these questions will involve analysing large samples of Oyster card mobility and fare purchase data, correlating these with social deprivation scores, and defining metrics and visualisations that can be used to show the results.
[1] N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers' Spending on Public Transport.
In ACM SIGKDD 2011 Conference on Knowledge Discovery and Data Mining. San Diego, California, USA. August 21-24, 2011.
[2] N. Lathia, D. Quercia, J. Crowcroft. The Hidden Image of the City: Sensing Community Well-Being from Urban Mobility
In Pervasive 2012. Newcastle, UK. June 18-22, 2012.
Close to Home: Inferring Peoples' Home Locations from their Public Transport Usage
Contact: Neal Lathia
A number of analyses of mobility and transport habits assume that people's mobility will center around the stations that is closest to their home location (for example [1]). In doing so, they only consider proximity (rather than, say, convenience) when assigning people to home locations. This project aims to examine and quantify the extent that this assumption is true: we have a dataset of Oyster card mobility patterns alongside the registered post-code of the card holder. Can the mobility data be used to predict London residents from non-residents? For those who reside inside London, how accurately can public transport usage pinpoint where users live?
[1] N. Lathia, D. Quercia, J. Crowcroft. The Hidden Image of the City: Sensing Community Well-Being from Urban Mobility
In Pervasive 2012. Newcastle, UK. June 18-22, 2012.
[claimed] Pulse of the World's (Shared) Bicycles
Contact: Neal Lathia
Recent research has been using data mining techniques (clustering and regression prediction) in order to measure and characterise bicycle sharing schemes, for example, using data from Barcelona [1] and London [2] (for a further list of papers, see here [3]). However, each individual paper focuses on an individual city. We have collected large data sets from over 10 shared bicycle systems around the world, which each vary in size and spatial layout. The goal of this project would be to perform a cross-city analysis of the dynamics of bike sharing systems, as characterised by the data. How similar is the layout of cities' stations activity patterns? Are station capacity prediction algorithms equally accurate across the world?
[1] Jon Froehlich, Joachim Neumann, Nuria Oliver. Sensing and Predicting the Pulse of the City Through Shared Bicycling
Proceedings of IJCAI 2009
[2] N. Lathia, S. Ahmed, L. Capra. Measuring the Impact of Opening the London Shared Bicycle Scheme to Casual Users.
In Elsevier Transportation Research Part C, January 2012.
[claimed] Musical Social Networks and Personality-based Recommendation
Contact: Neal Lathia (with Cecilia Mascolo and Jason Rentfrow)
Is the music that people say they like the music that they actually listen to? To address that question and to examine questions about social influence and musical tastes, we have a unique data set of just over 2,500 users of Facebook and Last.FM. For each person, we know which music they publicly claim to like (from their Facebook likes), what music they actually listen to (on Last.FM), and their network of friends. Additionally, we have their responses to a musical-preference questionnaire, as well as other psychological surveys. Students interested in network analysis, social influence, and psychology would be well-suited for this project. Furthermore, recent research in recommender systems indicates that psychological data could be very useful for improving personalisation and recommendation systems. The second goal of this project would be to build from the analysis above to understand whether, to what extent, and how psychological survey data can be incorporated into collaborative filtering algorithms. This step of the project would be suited to students who are interested in learning about machine learning, personalisation, and recommender systems.
The Language of Influence
Contact:Diarmuid O Seaghdha (with Daniele Quercia)
While macro-level studies of Twitter have concluded that "Twitter is not a social network but a news media" (Kwak et al., 2010), micro-level analyses of users and user-user interactions suggest that the Twittersphere functions as a set of real communities according to well-established sociological definitions of community (Cheng et al., 2011; Quercia et al., 2011). Twitter facilitates conversations between users, each of whom has a particular degree of status online (which may or may not reflect real-life status). This project would study social power relationships on Twitter, with a focus on linguistic indicators. Notions of influence on Twitter have been studied before (Liu et al, 2010; Danescu-Niculescu-Mizil et al, 2011) but there are still many potential aspects to investigate. Recently, Bramsen et al. (2011) looked at identifying power hierarchies as manifested in email data; here the sociolinguistic motivation would be similar but the problem would be quite different (and novel!). The NLP techniques involved would start with basic lexical, topic-model and sentiment analysis, but could extend to sophisticated dialogue act modelling (Ritter et al., 2010) and discourse effects such as accommodation (Danescu-Niculescu-Mizil et al, 2011; Danescu-Niculescu-Mizil et al, 2012) and coherence.
- Philip Bramsen, Martha Escobar-Molano, Ami Patel and Rafael Alonso. 2011. Extracting Social Power Relationships from Natural Language. In Proceedings of ACL 2011.
- Justin Cheng, Daniel Romero, Brendan Meeder and Jon Kleinberg. 2011. Predicting Reciprocity in Social Networks. In Proceedings of SocialCom 2011.
- Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. 2011. Mark My Words! Linguistic Style Accommodation in Social Media. In Proceedings of WWW 2011.
- Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of WWW 2012.
- Haewoon Kwak, Changhyun Lee, Hosung Park and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of WWW 2010.
- Lu Liu, Jie Tang, Jiawei Han, Meng, Jiang and Shiqiang Yang. 2010. Mining Topic-level Influence in Heterogeneous Networks. In Proceedings of CIKM 2010.
- Daniele Quercia, Jonathan Ellis, Licia Capra and Jon Crowcroft. 2011. In the Mood for Being Influential on Twitter. In Proceedings of SocialCom 2011.
- Alan Ritter, Colin Cherry and Bill Dolan. 2010. Unsupervised Modeling of Twitter Conversations. In Proceedings of NAACL 2010.
Nudging Music Serendipity
Contact:Neal Lathia (with Daniele Quercia)
We have recently proposed a recommendation framework named Auralist that produces recommendations that are not only reasonably accurate, but are also diverse, novel and serendipitous [1]. We run a user study and found that Auralist improves overall recommendation satisfaction, but serendipitous recommendations are not always found to be 'very useful', at least at first sight. It would be thus useful to nudge people in clicking on serendipitous recommendations. This project is about designing a new music recommender system that is based on new 'nudging' ideas in behavioural economics [2].
- [1] Auralist: Introducing Serendipity into Music Recommendation
- [2] Nudge: Improving Decisions About Health, Wealth, and Happiness

Polypath Transport Protocols
Contact: Anil Madhavapeddy or Toby Moncaster
Cloud computing has become a central part of modern business. Services such as Amazon's EC2 offer users "Infrastructure as a service" -- the ability to lease virtual servers to run their business applications on. These virtual servers are actually virtual machines running on top of a hypervisor (such as Xen or VMWare) running on a physical server within a data-centre. Applications that talk between multiple such VMs usually use TCP (or a related variant such as DCTCP) as they don't know where the machines are physically located. Work such as [1] shows that TCP is rather slow for many tasks (but it always works). Shared memory (shmem) type transports are very much quicker for inter-process communications, however they can only be used between VMs residing on the same physical core.

We have been working on a new approach that builds on multipath TCP [2] and offers the ability to set up shmem links between VMs regardless of their physical location. The key aspect is that it always maintains an underlying TCP flow for resilience. This will mean that the connection can survive live migration of the VM between physical cores. Currently this work is still in its early stages. We have completed the basic protocol design, but have only completed limited implementation and evaluation to date. This makes it a very open-ended project that is targeted for publication in a major conference (such as SIGCOMM or Infocom). As a minimum we need a more complete implementation of the protocol. This could simply build directly on the mechanisms of MPTCP (in which case it should take a relatively modest amount of effort), but a student might also want to significantly enhance the implementation and improve the protocol design to address issues such as how to deal with short flows (where control traffic can dominate the actual data) and the issue of dealing with hardware offload without sacrificing the performance benefits it offers [3]. One key aspect of this approach is that it should offer transparent integration with existing system libraries such as OpenSSL. We also intend to integrate it with name services such as DNSSEC to provide a mechanism to identify which hosts offer support for the protocol.

[1] The case for reconfigurable I/O channels. S. Smith, A. Madhavapeddy, C. Smowton, M. Schwarzkopf, R. Mortier, R.M. Watson, S. Hand. In ReSoLVE12, Mrch 2012.
[2] The IETF Multipath TCP Working Group Homepage
[3] How had can it be? Designing and implementing a deployable Multipath TCP. C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, F. Duchene, O. Bonaventure, M. Handley. In SIGCOMM'12
Scheduling on speed: doping processes for fairness
Contact: Malte Schwarzkopf
Operating system schedulers currently enforce a notion of fairness by making sure that processes receive an equal share of CPU time (modulo their priority level). However, modern CPUs are deeply hierarchical systems with many levels of shared micro-architectural resources, and co-scheduling processes can (and often does) lead to performance interference between them [4, 5].

Recent Intel and AMD architectures support forms of "dynamic overlocking", by which they can dynamically increase the clock frequency of one or several cores on a chip, with differences as high as 500 MHz [2, 3]. However, to remain within the overall TDP limits, the processor may only apply this "doping" to either a subset of cores, or for a part of the time. The operating system can control the boosting of cores' clock by setting ACPIi power states for them, where the highest power state corresponds to the highest clock frequency. The control exterted by the OS is, however, limited by throttling mechanisms on the CPU: if it gets too hot, it will force the frequency down to a lower level.

The aims of this project would be to (i) monitor unfairness arising from performance interference (e.g. cache misses, memory bus contention), meaning that processes cannot make full use of their CPU time, and (ii) compensate for such unfairness by regularly giving the disadvantaged process a "shot of dope", so that it can achieve more during its allocated time. This would be a similar approach to that described in [1], except that, instead of increasing the time the process is scheduled for, the frequency of the core it runs on is increased. In further work, you could evaluate how this is best balanced between different cores in order to avoid any being throttled; and look into trading off the cost of migrating a process to a different core against the benefit of getting doped.

This project, if completed well and with good performance results, is likely to lead to a conference or workshop publication.

[1] Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler. A. Fedorova, M. Seltzer, M.D. Smith. In Proceedings of PACT 2007.
[2] Intel TurboBoost, Wikipedia
[3] AMD Bulldozer turbo mode, Wikipedia
[4] Resource-Conscious Scheduling for Energy Efficiency on Multi-Core Processors, A. Merkel, J. Stoess, F. Bellosa. In Proceedings of EuroSys 2010.
[5] Dynamic Classification of Program Memory Behaviors in CMPs, Y. Xie, G. Loh. In Proceedings of CMP-MSI 2008.
Raspberry-BSP: Cheap and Safe Bulk Synchronous Processing on Wimpy Nodes
Contact: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph Processing, Data Parallel, Bulk Synchronous Processing

This project is inspired by FAWN [1][2] and aims to replicate the benefit of an array of low power processors backed by persistent storage to graph mining. The aim is to build a software stack for Raspberry-PI that allows a set of such devices, each equipped with either an SD card or a USB flash drive to act as a node for Bulk Synchronous Processing of graphs (see Pregel [3] for an example of bulk synchronous processing). All the necessary data structures will reside on Flash with the small 256MB RAM on the Raspberry-PI acting as a scratchpad. This means that a failure of the Raspberry-PI unit can be tolerated by simply swapping the flash memory over to another good unit. However this will need careful integration of a 2-phase commit protocol with bulk synchronous processing (BSP). One of the other goals of this project is to integrate a prefetching algorithm we have developed for graphs stored on flash drives with the software stack built for this project. The final aim will be to show that when processing time, energy consumption and cost are taken together this solution is competitive to running BSP on a cluster of PCs.
References:
[1] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, SOSP, 2009. http://www.cs.cmu.edu/~fawnproj/papers/fawn-sosp2009.pdf
[2] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, Communications of the ACM, July 2011.
[3] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.
Graph Compression
Contact: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph Compression

This project explores graph compression mechanisms as part of a project looking into high performance semi-external memory graph processing (see [1] to get an idea of semi-external memory approaches). The graph compression work will build on an in-house graph representation format that we have developed that allows succinct representation of graphs that show hierarchical structure. The project will explore ways to improve the representation yielding smaller graphs on disk that are less costly to traverse. A key element of the representation scheme is a recursive graph partitioning step that minimises the number of edges between partitions. This is a rich space for exploration of suitable algorithms. We are concerned primarily with experimentally evaluating IO costs on large graphs and measuring the compression performance. However a student with a theoretical background might consider designing algorithms with provable bounds on compression performance, which would be a big plus. If time allows you can also implement efficient transformation tool based on the developed graph compression algorithm using parallel processing tools (e.g. map/reduce).
References:
[1] R. Pearce, M. Gokhale and N. Amato: Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE High Performance Computing, Networking, Storage and Analysis, 2010. http://dl.acm.org/citation.cfm?id=1884675
Prefetcher for Graph Traversal
Contact: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph processing, Graph structure

This project takes part of our ongoing project on high performance semi-external memory graph processing (see [1] to get an idea of semi-external memory approaches). This project will build on an in-house prefetching algorithm we have developed for mining large graphs stored on Solid State Drives. A solid state drive provides the ability to service many random reads in parallel, provided that the requesting application is able to deliver requests at a rate that keeps the drive sufficiently busy. Thus pre-fetching multiple pages based on the current request in an intelligent manner can greatly improve performance. Our current pre-fetcher implementations are aligned with the graph traversal approach, (that is the graph iterator being employed). The challenge in this project is to design a prefetcher that is algorithm-agnostic. The starting point for this project will be the existing infrastructure and data sets and some initial ideas from the supervisors. Further to this, the prefetcher implementation could be extended to utilise multiple storage devices in parallel. The aim in this setting would be to out-perform standard storage parallelisation techniques such as hardware or software RAID. The performance of this approach would also depend on the method of dividing graph data between available storage devices.
References:
[1] R. Pearce, M. Gokhale and N. Amato: Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE High Performance Computing, Networking, Storage and Analysis, 2010. http://dl.acm.org/citation.cfm?id=1884675
Optimising I/O in Scale-Free Graph Processing
Contact: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph processing, I/O optimisation

An issue with real-world graphs is that these typically exhibit scale-free characteristics (a degree distribution following a power law). This usually means that such graphs contain a large number of very poorly connected vertices, and a small number of very well-connected (high degree) vertices. Our work has found that processing such data is challenging, especially where access to secondary storage is required. Related work in this area has sought to employ various static optimisations, for example to deal with a high-degree vertex. Another approach may be to actively measure the graph, either in a pre-processing stage or during processing, to provide some guidance on how the underlying graph storage system can be tuned to fit the graph's structure. A system employing such features could be evaluated in a range of different scenarios: the type of graphs to be analysed, the processing algorithm, availability of main memory, and across a range of storage media.
Integration of Data-flow programming with Overlay/Crowd through Task Farming
Contact: Eiko Yoneki (with Karthik Nilakant)
Keywords: Data flow programming, Crowd, Haggle

MapReduce [5] and Dryad/LINQ [6] have facilitated the exploitation of multiple machines in a computer cloud. In order to provide more expressive coordination language for different classes of computation, including iterative and recursive algorithms, a project called Ciel [7],Skywriting [1] has been developed in our systems research group, applying a purely-functional script language for describing distributed computations. This project investigates extending Ciel/Skywriting including an overlay or crowd that represents a series of distributed or mobile nodes. Thus, a worker invoked by Ciel could be a crowd of mobile phones that form opportunistic networks, i.e. a Haggle opportunistic network platform. This requires an extension to the current task spawning architecture in Ciel to add capability to create a new master task for the crowd. In an opportunistic network environment, tasks will be executed in a delay-tolerant manner, and this aspect should be integrated in the design of Ciel engine for crowd. As an option if time allows, you could exploit the social structure to execute tasks in the crowd. In general, mobile networks formed by smart phones are time dependent and dynamic with high node mobility. Skywriting programming can be exploited in such dynamic network environments, which is powerful for cooperative task farming. For example, the map function could be exploited based on time-space graph of a social network to achieve an efficient distributed computing. The logical network topology could be obtained from various sources such as online social networks (e.g. Facebook, Orkut), and a social graph could be transformed to a task graph, indicating the order of operations. Efficient task graph generation based on a given distributed computation is powerful. Mobile networks offer substantial aggregate bandwidth and processing power. The intersection between networking and programming languages is becoming popular. The proposed project aims at building a new paradigm, where networking and programming are well integrated, with special emphasis on mobile networks. The prototype of Ciel engine has been implemented in Haggle opportunistic platform [3], and you can start building your project on top of the prototype. The evaluation can be done in both using Android devices and Xen-based Haggle simulator.
References:
[1] D. Murray and S. Hand, Scripting the cloud with Skywriting, USENIX HotCloud, 2010.
[2] D. Murray, E. Yoneki, J. Crowcroft and S. Hand, The Case for Crowd Computing. ACM MOBIHELD, 2010.
[3] EU FP6 Haggle platform http://code.google.com/p/haggle/
[4] J. Crowcroft, E. Yoneki, et al. Promoting Tolerance for Delay Tolerant Network Research. ACM CCR, 38(5):63-68, 2008.
[5] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters. USENIX OSDI, 2004.
[6] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K.Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. OSDI, 2008.
[7] D. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy and S. Hand. Ciel: a universal execution engine for distributed data-flow computing, NSDI, 2011.
Sync-Ball
Contact: Eiko Yoneki (with Karthik Nilakant)
Keywords: Mobile Agent, Opportunistic networks, Haggle

In this project, you will build function/mechanism for migrating data objects over the Haggle platform [1]. Haggle is built over a "layerless" networking architecture that incorporates event-driven and asynchronous operations, which reside in the device as a kernel component. Functional components implement the functional logic and interact only directly with the kernel (see [2] for further details about the Haggle implementation). A data object is an object defined in Haggle as an entity which can move between Haggle equipped devices. Each data object contains metadata, consisting of name and attribute pairs. Metadata operations like search or forwarding depend on the size and complexity of the network. Individual nodes in Haggle (e.g. smart phones) are powerful enough to perform substantial amounts of computing. An example of computation would be to pass around a "Sync-Ball" from node to node as they meet, and into which any node may stick the result of some computation. The main task of the "Sync Ball" project is to design a data object which is able to operate at a Haggle node, collect/append data, and then migrate to other devices. Haggle already has an Android platform implementation, and Sync-Ball can be developed over it. A challenge is that the data object should be designed for multiple purposes and simple programming languages for such data object is desired or design the function as part of database like operation. For example, you can design a migrating (distributed) Excel, where computations are a set of formulae each computed over a column, and each node may append zero or more rows when it has the ball. If time allows, you can add parallel operation of data objects and writing also runtime to support it. It might also be interesting to consider issues of data integrity and availability (through redundancy). Potential applications for Sync-Ball:
- Synchronisation among devices: information to be synchronised is in the data object, devices receives/updates the data then sent out to the other devices.
- Twitter entries, which are accumulated through the migration within Haggle name space until the data object is sent out to the Internet based twitter.
- Environmental monitoring object: dispatch sync-ball gathering Co_2 values from various sensor devices in the area.
References:
[1] EU FP6 Haggle Project
[2] Erik Nordstrom, Per Gunningberg, and Christian Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report Uppsala University, 2009-003.
[3] E. Yoneki, I. Baltopoulos and J. Crowcroft " D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.
[4] Xerox PARC's Bayou Project.
Building Naming Scheme in NDN over V2V Communication
Contact: Eiko Yoneki
Keywords: Named Data Networking, Content Centric Networking, Naming

This project explores an emerging new Internet architecture called "Named Data Networking" (NDN) originated by the project Content Centric Networking (CCN) by Van Jacobson in PARC. NDN provides network storage, mobility, and multi-path forwarding. In NDN, communication is driven by the data consumer who initiates sending out a packet, which contains the interest of the consumer by a hierarchically-structured name. One Interest retrieves one data packet, and a sequence of Interests is sent to retrieve the sequence of individually-named Data packets that make up a large piece of content such as a video. The naming in the NDN architecture is not mature. In this project, you will explore naming schemes in NDN. The semantics of names is opaque to the network but depends on the application. The structure of names (e.g. hierarchically structured with an arbitrary number of components), discovery, and namespace navigation should be investigated and the vehicle-to-vehicle communication environment is targeted. You can exploit location information (e.g. latitude and longitude by GPS or any geographical information that can be obtained from the map) to navigate the data flow depending on the consumers' interests. The meaning of names can be determined by local and global distribution requirements, and the content producer and consumer application conventions, as embodied in individual router's routing tables and application-level transport libraries. You will use CCNx open source environment http://www.ccnx.org/software-download-information-request/download-releases/. If time allows, you can add a comparison study with another content centric networking paradigm Haggle from naming aspects.
References:
[1] Named Data Networking: http://www.named-data.org/ndn-proj.pdf
[2] CCNx: http://www.ccnx.org/
[3] NDN: http://www.named-data.org/
[4] Haggle: http://code.google.com/p/haggle/
Decentralised Community Detection in Opportunistic Networks
Contact: Eiko Yoneki
Keywords: Community Detection, Opportunistic Networks, Haggle

BUBBLE RAP introduced in [3] is a forwarding algorithm for Delay tolerant networks, which uses information of community and graph centrality. It consists of RANK and LABEL strategies. RANK delegates messages to high centrality nodes, while LABEL selects a next hop from any member of the destination community as a relay. BUBBLE works in a decentralised world without access to infrastructure. Therefore the distributed detection and dissemination of node popularity and node communities, and the use of these for forwarding decisions are crucial. This project aims at implementing distributed community detection in Haggle framework [1] and deploying BUBBLE forwarding algorithm. Integration work will be carried out using Xen based Haggle simulator including generation of a few different types of networks (e.g. small world, and emulation of real world trace). Performance testing in terms of packet delivery compared to prior schemes should be demonstrated. A brief description of the work items is listed below.
- Implement distributed community detection
- Implement BUBBLE forwarding algorithm [3] in Haggle
- Implement network generator (e.g. small world, inter-contact power-law etc.) within Xen Haggle simulator
- Integrate all above
- Performance testing in different network setting
References:
[1] EU FP6 Haggle Project http://code.google.com/p/haggle/
[2] P. Hui, E.Yoneki, S. Chan, and J. Crowcroft. Distributed community detection in delay tolerant networks. Proc. MobiArch, 2007.
[3] P. Hui, J. Crowcroft, and E. Yoneki. BUBBLE Rap: Social Based Forwarding in Delay Tolerant Networks. MobiHoc, 2008
[4] I. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards real time community detection in large networks. Phys. Rev. E, 79(6), 2009.
NetFPGA 10G Xen interface
Contact: Andrew W. Moore
Originator/Supervisor: Andrew W. Moore
Keywords: NetFPGA, Virtual Machines, Xen,

The NetFPGA 10G project is an exciting upgrade for the NetFPGA platform providing an interface card that is enabled with 4 x 10Gbps interfaces. Xen is the Cambridge-grown super success story of Virtualization. This project will be to deliver a NetFPGA10G Xen device driver and while on that 'journey' explore how rudimentary decisions in design and function-placement affect the ability of the driver providing optimal performance once integrated into Xen. While a clear pathway has been considered candidates are encouraged to explore the solution space independently where plausible.
References:
1. Xen: http://xen.org/
2. NetFPGA: http: //www.netfpga.org/
An Information Centric Network search engine architecture
Contact: Andrea Lo Pumo

Goal: design a search engine architecture for an Information Centric Network.

In today's Internet, search engines can be seen as "universal front-ends" to access publicly available content, whether the content is located in the web or in alternative media networks. For example, through means such as ed2k links or .torrent metafiles[2-4] contents in alternative media networks can be searched via standard web search engines (e.g. "ubuntu filetype:torrent" in Google). The main premise of Content Centric Networks is to push at the network layer content distribution functionalities. For example, the NDN[1] architecture can be thought as a network layer implementation of a pure p2p overlay network. However, even though search engines are a fundamental part of a network whose usages are centred on content, NDN and other ICN networks do not provide network primitives to build a search engine.
The idea of the project is to explore how one could embed in ICN architectures, NDN especially, some primitives that allow to exploit the design of the ICN network in order to ease the construction of search engines. The exploration could be focused on the "crawling and indexing" part of a search engine. In NDN there could be no need to have crawling at all. One could think of developing a protocol that let publishers join to a public and distributed index, similar to what publishers currently do with site-maps[5]. Each publisher will index its own contents and then share the index or push it to specialised nodes. An organisation willing to provide search services will retrieve the whole index, or part of it, will convert the index in the format it prefers and run on its own machines a search engine. A starting point for indexing algorithms and code could be the distributed p2p search engine YaCy[6].
In short, if NDN is pushing p2p systems down at network layer, the idea proposed here is to push search engines as well, exploiting as much as possible what the network offers.
[1] NDN project, http://www.named-data.net/
[2] Ed2k links, http://en.wikipedia.org/wiki/Ed2k_URI_scheme
[3] .torrent metafile, http://en.wikipedia.org/wiki/Torrent_file
[4] Magnet links, http://en.wikipedia.org/wiki/Magnet:
[5] Sitemaps, http://en.wikipedia.org/wiki/Sitemaps
[6] YaCy search engine, http://en.wikipedia.org/wiki/YaCy
Delay-Tolerant NDN
Contact: Eiko Yoneki (with Karthik Nilakant)
Keywords: Named Data Networking (NDN), Content Centric Networking (CCN), DTN

The CCNx project is currently under active development. It implements the content-centric networking framework presented in [1,2,3]. One of the observations of the original paper was that the underlying store-and-forward methodology lends itself particularly well to delay or disruption-prone networks. However, CCNx does not currently perform routing for such a network (relying instead on network broadcast, or static user-defined routes). In contrast, the Haggle project provides PROPHET-based probabilistic routing. To provide this type of functionality for CCNx, a new control model would need to be written. In this project, desirable attributes of such a module would be the ability to switch between multiple protocols, and the ability to tune the module programmatically will be investigated. Once the module has been completed, it can be put to the test by performing performance evaluations against Haggle or other frameworks.
References:
[1] Named Data Networking: http://www.named-data.org/ndn-proj.pdf
[2] CCNx: http://www.ccnx.org/
[3] NDN: http://www.named-data.org/
[4] Haggle: http://code.google.com/p/haggle/
Modularity Preserved Synthetic Dynamic Network Generator
Contact: Eiko Yoneki
Keywords: Clustering, Network Topology Generation

tRandom networks, scale-free networks, or small-world networks have been used for simulation of epidemic spread and social network analysis, among other applications. Such networks represent human contact networks or social networks, and in most cases a single network view is used in an aggregated form. Real human contact networks are dynamic, and such a single network topology may not present the dynamic feature of networks. We have used time series of network traces from the real world to simulate the dynamic contact networks in our previous research on routing in delay-tolerant networks. However, those network traces are an instance in a specific environment. Ideally creating synthetic network topology in time series will help to evaluate a large-scale and complex network structure. Although several attempts have been made to achieve building synthetic networks, it is still a challenge to generate the topology of a real dynamic network. In this project, you would focus on generating a dynamic network topology that preserves modularity. Thus, you will pursue a rewiring strategy without losing the module/cluster architecture of the network. You can start your investigation using our prototype Randomise-Matrix algorithm, which preserves the single level of modules and may be extended to a more hierarchical structure. Using online social networks such as Twitter data for validating the algorithm is desirable.
References:
[1] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: simple building blocks of complex networks, Science. 2002 25;298(5594):824-7.
Eye Tracking for Behavioural Study
Contact: Eiko Yoneki
Keywords: Behaviour Study and Model, Perception, Cognitive Science

In this project you would use eye tracking device to extract unconscious behaviour of people when they use online social network tools. Example questions to solve are how people make a decision to re-tweet in Twitter or post a wall comments n Facebook. We will provide an eye tracking experiment device, which can automatically record the web browser and monitor's information from the participant's eye movement. You would design the experimental setting up and goal of the experiment. Our on-going photo rating experiments in our EU FP7 RECOGNITION project can be used as an experimental material to complement the extracting data. You can also write your experimental application or online social networks (e.g. Facebook) to extract unconscious social behaviour in comparison to feedback information (e.g. how the other people rated, what is the average of rating) from the participant's social clique. As an extension, you could write an economic gaming application to capture the participant's behaviour on decision making for competitive tasks. The work is based on the EU FP7 RECOGNITION project http://www.recognition-project.eu/. This role is suitable for the student, who is interested in data analytics and experimental work.
References:
[1] Roman Bednarik , Markku Tukiainen, An eye-tracking methodology for characterizing program comprehension processes, Proceedings of the 2006 symposium on Eye tracking research and applications, March 27-29, 2006, San Diego, California
[2] Hayes, T. R., Petrov, A. A., and Sederberg, P. B. (2011) A novel method for analyzing sequential eye movements reveals strategic inuence on Raven's Advanced Progressive Matrices. Journal of Vision, 11, 2011. (http://www.journalofvision.org/11/10/10/).
Modelling Spread of Rumours between Communities
Contact: Eiko Yoneki
Keywords: Opinion Dynamics, Community

Our previous work [1] reported a preliminary investigation on interactions between networked social communities using the Ising model to analyse the spread of rumours. The inner opinion of a given community is forced to change through the introduction of a unique external source, and we analyse how the other communities react to this change. We model two conceptual external sources: namely, Strong-belief and Propaganda, by an infinitely strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the community changes independently from other communities, while in the latter case also according to interactions with other communities. We applied our model to synthetic networks as well as various real world data ranging from human physical contact networks to online social networks. The experimental results of applying the Ising model using real world data clearly demonstrate two distinct scenarios of phase transitions. This project aims at validation of the above model to examine if such Strong-belief and Propaganda will be exhibited in real situations. Thus, the project work requires deploying a series of the experiments, e.g. an online and mobile phone based software to share opinions on specific topics within the given time, and monitor the change of individual opinion in time series by controlling information available to the individual. This will capture the dynamics of opinion change and investigate validation of our Ising model. The experiments can start off with the simplest condition and along our development of extending work such as coexisting two opinions (i.e. different opinions coexisting in the society), the software should be also extending for appropriate validation. The Ising model for understanding interactions of communities and subsequently modelling rumour propagation is a unique approach that will give significant input for modelling information propagation in social networks. The result of validation can be compared with voting model.
References:
[1] M. Ostillia, E. Yoneki, I. Leung, J. Mendes, P. Lio, and J. Crowcroft: "Statistical mechanics of rumour spreading in network communities", International Conference on Computational Science, ICCS, 2010.
[2] M. Ostilli, J. F. F. Mendes, "Communication and correlation among communities". Phys. Rev. E 80 (011142); and M. Ostilli, J. F. F. Mendes, "Small-world of communities: communication and correlation of the meta-network". J. Stat. Mech. L08004 (2009).
Impact of social influence on information propagation in social networks
Contact: Eiko Yoneki (with Mehdi Hosseini)
Keywords: Information Diffusion, Twitter, Behaviour Model

Social networks like Facebook and Twitter have brought new entertainment experiences. People or companies publish information in a social network through the network's topology and by sharing them with their neighbours. The published information can either be quickly ignored or widely spread in the network. The extent to which information is propagated highly depends on social influences that the members have on each other. This project investigates the impact of social influences on social networks. Specifically, by monitoring members' activity, this project will develop (i) data-driven models that explain various behaviours and influences happening in a social network, (ii) metrics that quantify various forms of social influences e.g. conformity, reactance, peer pressure, sales and marking, (iii) methods that identify influential members and their characteristics, (iv) gamification experiments for collecting the required data. In Twitter, for instance, a simple type of influence is the ability to get others to retweet a message. Proposed solutions should answer questions such as: (i) what is the probability that a member will be influenced by a Twitter message and retweet it? (ii) What is the type of influence? (iii) Who are the influential members? In this project, you will work on Twitter or Facebook data. You need to know a programming language, e.g. Java, C, Python, for collecting data and one programming language for statistical computing, e.g. R, Matlab, Java.
References:
[1] Cha, Haddadi, Benevenuto, Gummadi: Measuring User Influence on Twitter: The Million Follower Fallacy. ICWSM 2010.
[2] Evan T.R. Rosenman: Retweets, but Not Just Retweets: Quantifying and Predicting Influence on Twitter. Thesis, Harvard University.
[3] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.
[4] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.
[5] Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.
[6] Sastry, Yoneki and Crowcroft: Buzztraq: Predicting geographical access patterns of social cascades using social networks. EuroSys workshop on Social Network Systems, 2009.
Trust Relationships and Content Filtering
Contact: Eiko Yoneki (with Arman Idani)
Keywords: Social Trust, Content Filtering

Online social networks have become extremely popular in recent years. One of the challenges that people are now facing is to keep up with the amount of content that is available to them on their preferred social networks (status updates, tweets, photos, news, etc). They have increased to a point which is nearly impossible for the average user to review all of them. In this project you would work on automatic filtering of unwanted content. One of the ways to achieve the goal is through the use of trust relationships. Trust relationships simply mean the extent that a user on a network trust other users or services. By monitoring the relationship between members you will investigate which parameters contribute towards trust between different users and how they can be quantified. The references below are good starting points. The solution should be able to answer research questions such as: (i) How do users choose which content, and from which other user, to care about? (ii) Can a quantifiable measurement of trust relationship help to predict if a content is of no desire to the user? (iii) What makes a user trustworthy? In this project you will work on Facebook or Twitter data. You need to know a programming language, e.g. Java, C, C#, Python, for collecting data and one programming language for statistical computing, e.g. R, MATLAB , Java.
References:
[1] Yarden Katz and Jennifer Golbeck. 2006. Social network-based trust in prioritized default logic. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1345-1350.
[2] Ugur Kuter and Jennifer Golbeck. 2010. Using probabilistic confidence models for trust inference in Web-based social networks. ACM Trans. Internet Technol. 10, 2, Article 8 (June 2010), 23 pages.
Trust Relationships and Propagation of Trust in Online Social Communities
Contact: Eiko Yoneki (with Arman Idani)
Keywords: Social Trust, Community, Information Propagation

Communities in social networks are groups of tightly coupled nodes (users). In communities, users normally have a lot of mutual friends with whom they might interact differently. The extent that a user trusts other users, trust relationship, is usually measured by monitoring their interaction. In this project you would investigate trust relationships in online social communities and its relationship to the properties of the community, an how trust propagates among the members of a community. References below are good starting points. The solution should be able to answer research questions such as: (i) How does trust relationships within a community differs from trust relationships outside of the community? (ii) To what extent does the trust relationship among members of a community depend on its properties? (iii) How a user's role in the community correlate with their trustworthiness, is there a relationship? In this project you will work on Facebook or Twitter data. You need to know a programming language, e.g. Java, C, C#, Python, for collecting data and one programming language for social network analysis and statistical computing, e.g. R, MATLAB, Python.
References:
[1] Yarden Katz and Jennifer Golbeck. 2006. Social network-based trust in prioritized default logic. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1345-1350.
[2] R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. 2004. Propagation of trust and distrust. In Proceedings of the 13th international conference on World Wide Web (WWW '04). ACM, New York, NY, USA, 403-412.
[3] Christian Borgs, Jennifer Chayes, Adam Tauman Kalai, Azarakhsh Malekian, and Moshe Tennenholtz. 2010. A novel approach to propagating distrust. In Proceedings of the 6th international conference on Internet and network economics (WINE'10), Amin Saberi (Ed.). Springer-Verlag, Berlin, Heidelberg, 87-105.

Department of Computer Science and Technology

ACS Projects (2012—2013)

NetOS