Elastic provisioning for data streams

Research topic and goals

Dynamic data streams produced by sensing and experimental devices, as well as by social networks, give scientists an unprecedented opportunity to explore a variety of environmental and social phenomena. One of the main challenges is that such streams and their computational requirements are volatile: sensors or social networks may generate data at highly variable rates, and an application’s processing time may change significantly from one stage to the next. Cloud computing is a promising platform for coping with such volatility because it allows computational resources to be allocated on demand, for short periods of time, and at an acceptable cost. At the same time, using clouds for this purpose is challenging because an application’s performance may vary widely depending on the hosting infrastructure, which requires us to pay special attention to how and where we schedule resources. In our research, we first aim to characterize the implications of using different instance offerings of the Chameleon cloud to run similar applications. We then aim to identify key features of applications that handle dynamic data streams. Based on these features, we will produce statistical models that allow us to predict an application’s computing needs and to elastically provision (and de-provision) the required resources.
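To make the idea of prediction-driven elastic provisioning concrete, the following is a minimal sketch and not the project's actual model. It assumes a simple exponentially weighted moving average as the predictor and a fixed per-instance processing capacity; the function names, the smoothing factor, and the instance bounds are all hypothetical and chosen only for illustration.

```python
import math


def ewma_forecast(rates, alpha=0.5):
    """Forecast the next arrival rate from past observations.

    Assumption: an exponentially weighted moving average stands in for
    the statistical model; `alpha` is an illustrative smoothing factor.
    """
    if not rates:
        raise ValueError("need at least one observed rate")
    forecast = rates[0]
    for rate in rates[1:]:
        forecast = alpha * rate + (1 - alpha) * forecast
    return forecast


def instances_needed(predicted_rate, per_instance_capacity,
                     min_instances=1, max_instances=16):
    """Number of instances required to absorb the predicted rate.

    Assumption: every instance processes `per_instance_capacity`
    records per second; the bounds are hypothetical quota limits.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    wanted = math.ceil(predicted_rate / per_instance_capacity)
    return max(min_instances, min(max_instances, wanted))


def scaling_decision(current_instances, rates, per_instance_capacity,
                     alpha=0.5):
    """Positive result: provision that many instances.
    Negative result: de-provision that many instances."""
    predicted = ewma_forecast(rates, alpha)
    target = instances_needed(predicted, per_instance_capacity)
    return target - current_instances
```

A positive return value from `scaling_decision` means instances should be added and a negative one means instances can be released. A real predictor would replace `ewma_forecast` with a model fitted to the application's measured behavior.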

Results for 2015/2016

We carried out a set of experiments using an application that relies on input from social networks, notably geo-located tweets, to discover correlations between users’ work and home locations. To assess the impact of running the same application in offerings from different providers, we executed the various stages of our use case application on two flavors of Chameleon cloud instances, namely bare-metal and KVM. We also analyzed specific configuration parameters, such as data block size, replication factor, and parallel processing, with a view to statistically modeling the application’s performance on a given infrastructure. Finally, we looked into the gains obtained by accounting for data proximity when scheduling a resource in a multi-site environment. A poster presenting the results of our research was shown at Supercomputing 2015 (Pineda-Morales et al. 2015).
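As an illustration of what accounting for data proximity can mean in a multi-site setting, the sketch below picks the site that would need the least input data transferred to it. It is a simplified assumption, not the scheduler used in the experiments: the function names, the block and site identifiers, and the sizes are hypothetical, and the cost model counts only bytes moved.

```python
def bytes_to_transfer(site, block_locations, block_sizes):
    """Bytes that must be copied to `site` before a task can run there.

    block_locations maps a block id to the set of sites holding a replica;
    block_sizes maps a block id to its size in bytes.
    """
    return sum(
        size
        for block, size in block_sizes.items()
        if site not in block_locations[block]
    )


def pick_site(sites, block_locations, block_sizes):
    """Choose the site with the smallest transfer volume.

    Ties are broken by site name so the result is deterministic.
    """
    return min(
        sites,
        key=lambda s: (bytes_to_transfer(s, block_locations, block_sizes), s),
    )


# Hypothetical example: three input blocks replicated across two sites.
sites = ["site_a", "site_b"]
block_locations = {
    "b1": {"site_a"},
    "b2": {"site_a", "site_b"},
    "b3": {"site_b"},
}
block_sizes = {"b1": 64, "b2": 64, "b3": 128}
chosen = pick_site(sites, block_locations, block_sizes)
```

In this made-up example, running at `site_a` would require moving block `b3` while running at `site_b` would require moving only the smaller block `b1`, so `site_b` is chosen. A higher replication factor increases the chance that some site already holds all inputs, which is one way the configuration parameters mentioned above interact with placement.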

Visits and meetings

Luis Pineda Morales (INRIA) was hosted for a summer internship at Argonne National Laboratory from May to August 2015.

Impact and publications

  1. Pineda-Morales, Luis, Balaji Subramaniam, Kate Keahey, Gabriel Antoniu, Alexandru Costan, Shaowen Wang, Anand Padmanabhan, and Aiman Soliman. 2015. “Scaling Smart Appliances for Spatial Data Synthesis.” SC15 - ACM/IEEE International Conference in Supercomputing. https://hal.inria.fr/hal-01241718.
    @misc{PinedaEtAl2015,
      author = {Pineda-Morales, Luis and Subramaniam, Balaji and Keahey, Kate and Antoniu, Gabriel and Costan, Alexandru and Wang, Shaowen and Padmanabhan, Anand and Soliman, Aiman},
      hal_id = {hal-01241718},
      hal_version = {v1},
      howpublished = {{SC15 - ACM/IEEE International Conference in Supercomputing}},
      keywords = {spatial data ;  cloud computing ;  elastic provisioning},
      month = nov,
      note = {Poster},
      pdf = {https://hal.inria.fr/hal-01241718/file/Pineda-Morales_SC.pdf},
      title = {{Scaling Smart Appliances for Spatial Data Synthesis}},
      url = {https://hal.inria.fr/hal-01241718},
      year = {2015}
    }
    

Future plans

Next steps include experimenting with the Phantom auto-scaling service (Keahey et al. 2012) on the Chameleon cloud to elastically provision resources for our use case application.

References

  1. Keahey, Kate, Patrick Armstrong, John Bresnahan, David LaBissoniere, and Pierre Riteau. 2012. “Infrastructure Outsourcing in Multi-Cloud Environment.” In Proceedings of the 2012 Workshop on Cloud Services, Federation, and the 8th Open Cirrus Summit, 33–38. FederatedClouds ’12. ACM. doi:10.1145/2378975.2378984.
    @inproceedings{KeaheyEtAl2012,
      acmid = {2378984},
      author = {Keahey, Kate and Armstrong, Patrick and Bresnahan, John and LaBissoniere, David and Riteau, Pierre},
      booktitle = {Proceedings of the 2012 Workshop on Cloud Services, Federation, and the 8th Open Cirrus Summit},
      doi = {10.1145/2378975.2378984},
      isbn = {978-1-4503-1754-2},
      keywords = {cloud computing, infrastructure-as-a-service, nimbus, platform-as-a-service},
      location = {San Jose, California, USA},
      numpages = {6},
      pages = {33--38},
      publisher = {ACM},
      series = {FederatedClouds '12},
      title = {Infrastructure Outsourcing in Multi-cloud Environment},
      url = {http://doi.acm.org/10.1145/2378975.2378984},
      year = {2012}
    }