All PRObE-related publications are listed on this page. To help us meet our reporting requirements to our sponsors, we ask users to submit references to their related publications so that we can list them here. You can read the full PRObE Publication Policy for further information about your reporting requirements. Please send your references to us via email.

  • Journal Articles
    • Speed Up Big Data Analytics by Unveiling the Storage Distribution of Sub-datasets. J. Wang, X. Zhang, J. Yin, H. Wu, D. Han. IEEE Transactions on Big Data. 2016. ISSN 2332-7790. [PDF]
    • An Innovative Approach to Bridge a Skill Gap and Grow a Workforce Pipeline: The Computer System, Cluster, and Networking Summer Institute. Carolyn Connor, Andree Jacobson, Amanda Bonnie, Gary Grider. In Journal of Education in Systems Administration (JESA), Vol. 2, No. 1, 2016. [URL]
    • Load-balanced and locality-aware scheduling for data-intensive workloads at extreme scales. Ke Wang, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Tonglin Li, Michael Lang, Ioan Raicu. Concurrency and Computation: Practice and Experience. 2015. pp. 1-29. [PDF]
    • Automated Data Partitioning for Highly Scalable and Strongly Consistent Transactions. A. Turcu, R. Palmieri, B. Ravindran and S. Hirve. IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS). vol.PP, no.99, pp.1,1, 2015.
    • NSF’s PRObE CFP: 1000 Nodes for Systems Research Experiments. Garth Gibson, Gary Grider, Andree Jacobson, and Wyatt Lloyd. USENIX Login Magazine, June 2013 Issue [URL]
    • Don't Settle for Eventual Consistency. Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen. 2014. Communications of the ACM. Volume 57, Issue 5. May 1, 2014 Issue. pp. 61-68. [URL]
  • Conference Proceedings
    • Proteus: Agile ML Elasticity through Tiered Reliability in Dynamic Resource Markets.  Aaron Harlap, Alexey Tumanov, Andrew Chung, Greg Ganger, Phil Gibbons.  ACM European Conference on Computer Systems, 2017 (EuroSys'17), 23rd-26th April, 2017, Belgrade, Serbia. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-16-102. May 2016. [PDF]
    • FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. Anuj Kalia, Michael Kaminsky, David G. Andersen. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). October 2016. [PDF]
    • Sapprox: Enabling Efficient and Accurate Approximations on Sub-datasets with Distribution-aware Online Sampling. Xuhong Zhang, Jun Wang, Jiangling Yin. 2017. Accepted to VLDB 2017. Munich, Germany. August 28 - September 1, 2017. [PDF]
    • Addressing the Straggler Problem for Iterative Convergent Parallel ML. Aaron Harlap, Henggang Cui, Wei Dai, Jinliang Wei, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing. 2016. ACM Symposium on Cloud Computing (SoCC'16). October 5-7, 2016. Santa Clara, CA. [PDF]
    • The SNOW Theorem and Latency-Optimal Read-Only Transactions. Haonan Lu, Christopher Hodsdon, Khiem Ngo, Shuai Mu, Wyatt Lloyd. In Proc. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), October 2016. [PDF]
    • Design Guidelines for High Performance RDMA Systems. Anuj Kalia, Michael Kaminsky, David G. Andersen. 2016 USENIX Annual Technical Conference. June 22-24, 2016. Denver, CO. [PDF]
    • STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning. Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth Gibson, Eric P. Xing. ACM European Conference on Computer Systems, 2016. EuroSys’16. April 18-21, 2016, London, UK. [PDF]
    • TetriSched: Global Rescheduling with Adaptive Plan-ahead in Dynamic Heterogeneous Clusters. Alexey Tumanov, Timothy Zhu, Jun Woo Park, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger. ACM European Conference on Computer Systems, 2016. EuroSys’16. April 2016, London, UK. [PDF]
    • GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server. Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, and Eric P. Xing. EuroSys'16. ACM European Conference on Computer Systems. April 2016. London, UK.
    • Experiences in using OS-level virtualization for block I/O. Huang, D., Wang, J., Liu, Q., Yin, J., Zhang, X., & Chen, X. November 2015. In Proceedings of the 10th Parallel Data Storage Workshop (pp. 13-18). ACM.
    • Achieving up to Zero Communication Delay in BSP-based Graph Processing via Vertex Categorization. Zuhong Zhang, Ruijun Wang, Xunchao Chen, Jun Wang, Tyler Lukasiewicz, Dezhi Han. IEEE International Parallel & Distributed Processing Symposium. 2015. 
    • Finding Schools of Fish in the Ocean: A Sub-dataset Locality-aware Method for Accelerating Data Analytics. Jun Wang, Jiangling Yin, Jian Zhou, Xuhong Zhang, Tyler Lukasiewicz, Dan Huang, Xunchao Chen, and Ruijun Wang, University of Central Florida. Submitted to ACM Symposium on Cloud Computing 2015 (SoCC'15).
    • Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems. Jiangling Yin, Jun Wang, Jian Zhou, Tyler Lukasiewicz, Dan Huang and Junyao Zhang, University of Central Florida. Accepted to 29th IEEE International Parallel & Distributed Processing Symposium. 2015. [PDF]
    • Optimize Parallel Data Access in Big Data Processing. Jiangling Yin and Jun Wang, University of Central Florida. Accepted to 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing: Doctoral Symposium Program.
    • Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing.  Ke Wang, Xiaobing Zhou, Kan Qiao, Michael Lang, Benjamin McClelland, Ioan Raicu. 2015. ACM HPDC. [PDF]
    • Simba: Tunable End-to-End Data Consistency for Mobile Apps. Dorian Perkins, Nitin Agrawal, Akshat Aranya, Curtis Yu, Younghwan Go, Harsha Madhyastha, and Cristian Ungureanu. To appear in Proceedings of the 10th European Conference on Computer Systems (EuroSys 15), Bordeaux, France, April 21-24, 2015. [PDF]
    • Deep Fried Convnets. Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang. International Conference on Learning Representations. May 7-9, 2015. San Diego, CA. [PDF]
    • IndexFS: Scaling File System Metadata Performance. Kai Ren, Qing Zheng, Garth Gibson. Supercomputing 2014. November 16-21, 2014. New Orleans. [PDF]. Highlight: Won SC14 Best Paper Award! 
    • On Model Parallelization and Scheduling Strategies for Distributed Machine Learning. Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A Gibson, Eric P Xing. Neural Information Processing Systems (NIPS 2014). December 9-14, 2014. Quebec, Canada. [PDF]
    • Archie: A Speculative Replicated Transactional System. Sachin Hirve, Roberto Palmieri and Binoy Ravindran. ACM/IFIP/USENIX 15th International Middleware Conference. MIDDLEWARE 2014. December 8-12, 2014. Bordeaux, France. [PDF]
    • Sebo: Selective Bulk Analysis Optimization in Big Data Processing. Jiangling Yin and Jun Wang, University of Central Florida. Accepted to Supercomputing Frontiers 2015 Programme. March 17 - 20, 2015. Biopolis, Singapore.
    • Exploring the Design Tradeoffs for Extreme-Scale High-Performance Computing System Software. K Wang, A Kulkarni, M Lang, D Arnold, I Raicu. 2015. IEEE Transactions on Parallel and Distributed Systems, MANUSCRIPT ID 1. [PDF]
    • Overcoming Hadoop Scaling Limitations through Distributed Task Execution. K. Wang, N. Liu, I. Sadooghi, X. Yang, X. Zhou, M. Lang, X.-H. Sun and I. Raicu. In Proc. of the IEEE International Conference on Cluster Computing 2015 (Cluster’15), Chicago, IL, USA, Sept. 2015. [PDF]
    • Raising the Bar for Using GPUs in Software Packet Processing. Anuj Kalia, Dong Zhou, Michael Kaminsky, David G. Andersen. To appear in Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI'15), Oakland, CA, May 2015. [PDF]
    • Reducing File System Tail Latencies with Chopper. He J., Nguyen D., Arpaci-Dusseau A., Arpaci-Dusseau R. 2015. 13th USENIX Conference on File and Storage Technologies (FAST 15). Santa Clara, CA. [PDF]. 
    • Exploiting iterative-ness for parallel ML computations. Henggang Cui, Alexey Tumanov, Jinliang Wei, Lianghong Xu, Wei Dai, Jesse Haber-Kucharsky, Qirong Ho, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing. 2014 ACM Symposium on Cloud Computing (SOCC 2014), Nov 3-5, Seattle, WA. [PDF]
    • BatchFS: Scaling the File System Control Plane with Client-Funded Metadata Servers. Qing Zheng, Kai Ren and Garth Gibson. 9th Parallel Data Storage Workshop (PDSW), held at Supercomputing 2014. New Orleans. [PDF]
    • Optimizing Load Balancing and Data-Locality with Data-Aware Scheduling. Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu. 2014 IEEE International Conference on Big Data. October 27-30. Washington DC. [PDF]
    • Scaling Distributed Machine Learning with the Parameter Server. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su. 11th USENIX Symposium on Operating Systems Design and Implementation. October 6-8, 2014. Broomfield, CO. [PDF]
    • Extracting More Concurrency from Distributed Transactions. Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, Jinyang Li. 11th USENIX Symposium on Operating Systems Design and Implementation. October 6-8, 2014. Broomfield, CO. [PDF]
    • SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems. Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi, Jeffrey Lukman, Haryadi Gunawi. 11th USENIX Symposium on Operating Systems Design and Implementation. October 6-8, 2014. Broomfield, CO. [PDF]
    • Efficient Mini-batch Training for Stochastic Optimization. Mu Li, Tong Zhang, Yuqiang Chen, Alex Smola. KDD 2014. 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York City. August 24-27, 2014. [PDF]
    • Using RDMA Efficiently for Key-Value Services. Anuj Kalia, Michael Kaminsky, David Andersen. ACM SIGCOMM 2014. Chicago, Illinois, August 17-22, 2014. [PDF]
    • ScalScheduling: A Scalable Scheduling Architecture for MPI-based Interactive Analysis Programs. Jiangling Yin, Andrew Foran, Xuhong Zhang and Jun Wang. The 23rd International Conference on Computer Communications and Networks (ICCCN 2014). Shanghai, China, August 4-7, 2014. [PDF]
    • SLAM: Scalable Locality-Aware Middleware for I/O in Scientific Analysis and Visualization. Jiangling Yin, Jun Wang, Wuchun Feng, Xuhong Zhang, Junyao Zhang. The 23rd International Symposium on High Performance Distributed Computing (ACM HPDC2014). Vancouver, Canada. June 23-27, 2014. [PDF]
    • Exploiting Bounded Staleness to Speed up Big Data Analytics. Henggang Cui, James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Abhimanu Kumar, Jinliang Wei, Wei Dai, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing. 2014 USENIX Annual Technical Conference (ATC'14). June 19-20, 2014. Philadelphia, PA. [PDF]
    • The Energy Efficiency of Database Replication Protocols. Nicolas Schiper, Fernando Pedone, and Robbert van Renesse. The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2014). Atlanta, GA. June 2014. [PDF]
    • Next Generation Job Management Systems for Extreme-Scale Ensemble Computing. Ke Wang, Xiaobing Zhou, Hao Chen, Michael Lang, Ioan Raicu. 2014. International Symposium on  High-Performance Parallel and Distributed Computing (HPDC). Vancouver, Canada. June 23-27, 2014. [PDF]
    • Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store. Christopher Mitchell, Yifeng Geng, Jinyang Li. USENIX Annual Technical Conference 2013. San Jose, CA. June 26-28, 2013.  [PDF]
    • PARROT: A Practical Runtime for Deterministic, Stable and Reliable Threads. Heming Cui, Jiri Simsa, Yi-Hong Lin, Hao Li, Ben Blum, Xinan Xu, Junfeng Yang, Garth A. Gibson. 24th ACM Symposium on Operating Systems Principles (SOSP'13), Nov 4-6, 2013, Farmington, PA. [URL]
    • Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale Systems Services. Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, Ioan Raicu.  IEEE/ACM Supercomputing/SC 2013. [PDF]
    • Sprinkler - Reliable Broadcast for Geographically Dispersed Datacenters. Haoyan Geng and Robbert van Renesse. International Middleware Conference (Middleware). Beijing, China. December 2013. [PDF]
    • Leveraging Sharding in the Design of Scalable Replication Protocols. Hussam Abu-Libdeh, Robbert van Renesse, and Ymir Vigfusson. Symposium on Cloud Computing (SoCC). Farmington, PA. October 2013. [PDF]
    • DL-MPI: Enabling Data Locality Computation for MPI-based Data-Intensive Application. Jiangling Yin, Andrew Foran, and Jun Wang. In the 2013 IEEE International Conference on Big Data (BigData 2013), Oct 6-9, 2013, Santa Clara, CA, USA [PDF]
    • Stronger semantics for low-latency geo-replicated storage. Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2013). pp. 313-328. Lombard, IL, April 2-5, 2013. USENIX Association. [PDF]
    • TABLEFS: Enhancing Metadata Efficiency in the Local File System. Kai Ren, Garth Gibson. 2013. USENIX Annual Technical Conference. June 26-28, 2013. San Jose, CA. [URL]
  • Workshops
    • Scalability Bugs: When 100-Node Testing is Not Enough. Haryadi Gunawi, Huan Ke, Tanakorn Leesatapornwongsa, Jeffrey Lukman, Cesar Stuardo, and Riza Suminto. To appear in the 16th Workshop on Hot Topics in Operating Systems (HotOS), 2017 [PDF]
    • Next Generation HPC Workforce Development: The Computer System, Cluster, and Networking Summer Institute. Carolyn Connor, Andree Jacobson, Amanda Bonnie, Gary Grider. Workshop on Education for High Performance Computing (EduHPC'16), held in conjunction with SC'16. Salt Lake City, Utah. Nov 14, 2016. [URL]
    • Recruiting and Professional Development of HPC Systems Professionals. The Inaugural HPC Systems Professional workshop (HPCSYSPROS’16) at SuperComputing’16. Nov 14, 2016. Salt Lake City, UT. [URL]
    • DeltaFS: Exascale File Systems Scale Better Without Dedicated Servers. Q Zheng, K Ren, G Gibson, BW Settlemyer, G Grider. Supercomputing 2015. Parallel Data Storage Workshop. Austin, Texas. [PDF]
    • Caveat-Scriptor: Write Anywhere Shingled Disks. Saurabh Kadekodi, Swapnil Pimpale, Garth A. Gibson. HotStorage'15. USENIX Workshop. July 6-7, 2015. Santa Clara, CA. [PDF]
    • Speculative Client Execution in Deferred Update Replication. B. Arun, S. Hirve, R. Palmieri, S. Peluso and B. Ravindran. ACM/IFIP/USENIX 9th Middleware for Next Generation Internet Computing (MW4NG). Workshop of the ACM/IFIP/USENIX 15th International Middleware Conference (Middleware 2014). December 8-12, 2014, Bordeaux, France.
    • Building Scalable Multimedia Search Engine Using Infiniband. Qi Chen, Yisheng Liao, Christopher Mitchell, Jinyang Li, Zhen Xiao. HotCloud '14. 6th USENIX Workshop on Hot Topics in Cloud Computing. Philadelphia, PA. June 17-18, 2014.
    • Achieving Data-Aware Load Balancing through Distributed Queues and Key/Value Stores. Ke Wang, Ioan Raicu. 3rd Annual Greater Chicago Area System Research Workshop (GCASR) Chicago, Illinois, 2014.
    • A Way Forward: Enabling Operating System Innovation in the Cloud. Dan Schatzberg, James Cadden, Orran Krieger, and Jonathan Appavoo. To appear in Proceedings of the 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '14). Philadelphia, PA. 2014.
    • Automatic File System Metadata Exploration. Jun He, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau. Presented at WISDOM Spring Workshop. University of Wisconsin-Madison. May 2014.
    • High-Performance Key-Value Storage. JJ Wang, Jon Morton, Jordan Friendshuh, Allen Liu, Remzi Arpaci-Dusseau. Presented at WISDOM Spring Workshop. University of Wisconsin-Madison. May 2014.
    • Parameter Server for Distributed Machine Learning. Mu Li, Li Zhou, Zichao Yang, Aaron Li, Fei Xia, David Andersen and Alex Smola. Big Learning Workshop, NIPS 2013. Lake Tahoe, Nevada. December 5, 2013. [PDF]
    • Structuring PLFS for Extensibility. Chuck Cranor, Milo Polte, Garth A. Gibson. Proc. of the Eighth Parallel Data Storage Workshop (PDSW13), co-located with the Int. Conference for High Performance Computing, Networking, Storage and Analysis (SC13), Denver, CO, November 2013. [URL]
    • Distributed Delayed Proximal Gradient Methods. Mu Li, David Andersen, and Alex Smola. Optimization Workshop, NIPS 2013. Lake Tahoe, December 9, 2013. [PDF]
    • SDAFT: A novel scalable data access framework for parallel BLAST. Jiangling Yin, Junyao Zhang, Jun Wang, and Wu-chun Feng. In the 2013 International Workshop on Data-Intensive Scalable Computing Systems (DISCS-2013) held in conjunction with the 2013 ACM/IEEE Supercomputing Conference (SC'13), Nov 18, Denver, CO, USA. [PDF]
    • Cooperative client caching strategies for social and web applications. Stavros Nikolaou, Robbert van Renesse, and Nicolas Schiper. Large-Scale Distributed Systems and Middleware (LADIS). Farmington, PA. November 2013. [PDF]
  • Presentations
    • PRObE Update: 1000 Node Systems Availability & Path Forward. Andree Jacobson and Garth Gibson. Birds-of-a-Feather session at the USENIX Conference on File and Storage Technologies (FAST'14). Santa Clara, CA. Feb 18, 2014. [URL]
    • New Mexico Consortium and PRObE. Andree Jacobson. University of New Mexico Computer Science Department Colloquium. Apr 15, 2014. [URL]
    • PRObE - Building a supercomputer from scratch. Andree Jacobson. Presentation for 2014 New Mexico Supercomputing Challenge students. Los Alamos, New Mexico. April 21, 2014.
    • New Mexico Consortium and PRObE. Andree Jacobson and Katharine Chartrand. New Mexico State University Computer Science Department Colloquium. Aug 21, 2013 [URL]
    • Availability of PRObE - 1000 nodes for systems research experiments. Garth Gibson (chair), Andree Jacobson, Nitin Agrawal, Jonathan Appavoo, Wyatt Lloyd, Jun Wang. Birds-of-a-Feather session at the Symposium on Operating Systems Principles (SOSP'13). Nemacolin Woodlands Resort, Pennsylvania, Nov 5, 2013. [URL]
    • Introduction & Tutorial for PRObE. Andree Jacobson. Booth presentation at SuperComputing Conference - SC'13. Denver, Colorado, Nov 20-21, 2013 [PDF]
    • Availability of NSF PRObE Cluster Resources. Garth Gibson and Andree Jacobson. Birds-of-a-Feather session. USENIX Conference on File and Storage Technologies (FAST'13). San Jose, CA. February 13th, 2013. [URL]
    • PRObE: A 1000 Node Facility for Systems Infrastructure Researchers. Garth Gibson (CMU) and Andree Jacobson (NMC). USENIX Operating Systems Design and Implementation conference (OSDI'12). Tue Oct 9, 2012, Los Angeles, CA. [URL]
    • PRObE: A 1000 Node Facility for Systems Infrastructure Researchers. Garth Gibson (CMU), Gary Grider (LANL), Andree Jacobson (NMC). Birds-of-a-Feather session. SuperComputing conference (SC'12). Nov 14th, 2012. Salt Lake City, UT. [URL]
    • NSF PRObE: A Community Facility for Systems Testing at Scale. Garth Gibson. SOSP'11 WIP (Tues Oct 25, 2011, Cascais, Portugal). [URL]
    • PRObE. Robert Ricci. GENI Engineering Conference WIP, July 9-11, 2012, Boston, MA [URL]
    • NSF PRObE Community Testing-at-Scale Facility. Garth Gibson and Andree Jacobson. FAST'12 (Wednesday Feb 15, 2012, San Jose, CA) [URL]
    • Developing the Rarest of the Rare: Low-level Systems Infrastructure Skills at Scale. Gary Grider (LANL), Garth Gibson (CMU), Rob Ross (ANL), Karsten Schwan (GaTech). SC'11 Panel Discussion. Nov 17, 2011, Seattle, WA [URL]
    • PRObE: A 1000+ Node Systems Research Testbed to be Available in 2011. Garth Gibson (CMU), Gary Grider (LANL), Andree Jacobson (NMC), and Robert Ricci (University of Utah). Birds-of-a-Feather session. USENIX Conference on File and Storage Technologies (FAST'11). Feb 16, 2011. San Jose, CA [URL]
  • Posters
    • JamaisVu: Robust Scheduling with Auto-Estimated Job Runtimes. Alexey Tumanov, Angela Jiang, Jun Woo Park, Michael A. Kozuch, Gregory R. Ganger. OSDI’16. Savannah, GA. November 2-4, 2016 [PDF]
    • Visualizing File Transfer Agents for Increased Throughput on a Single Host. Amanda Bonnie, Zach Fuerst, Thomas Stitt. To appear at Supercomputing 2015. Austin, Texas. November 15-20, 2015. [HTML]
    • I/O Monitoring in a Hadoop Cluster. Carson L Wiens, Joshua M. C. Long, Joel R. Ornstein. Supercomputing 2014. New Orleans. November 16-21, 2014. [HTML] [PDF]
    • SLAM: Scalable Locality-Aware Middleware for I/O in Scientific Analysis and Visualization. Jun Wang. NSF CyberBridges Workshop 2014 (Poster Session). June 1-3, 2014. Arlington, VA. [PDF]
    • PRObE Overview. Andree Jacobson. Presented at SC'13, Denver, Colorado, Nov 19-23, 2013 [PDF]
    • Enabling Data-Intensive HPC Analytics for interdisciplinary community. Jun Wang. NSF CyberBridges Workshop 2013 (Poster Session). July 15-16, 2013. Arlington, VA.
    • Disk-Failure Injection Framework for Fault-Tolerant Systems Research. Yathindra Naik, Mike Hibler, Eric Eide, Robert Ricci. Poster session presented at the USENIX Conference on File and Storage Technologies (FAST) 2012. Feb 14-17, 2012. San Jose, CA. [PDF]
  • Technical Reports
    • Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics. Jinliang Wei, Wei Dai, Aurick Qiao, Qirong Ho, Henggang Cui, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing. Parallel Data Laboratory. Carnegie Mellon University. Technical Report CMU-PDL-15-105. April 2015. [PDF]
    • SMPFRAME: A Distributed Framework for Scheduled Model Parallel Machine Learning. Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth Gibson, Eric Xing. Parallel Data Laboratory. Carnegie Mellon University. Technical Report CMU-PDL-15-103. May 2015. [PDF]
    • ShardFS vs. IndexFS: Replication vs. Caching Strategies for Distributed Metadata Management in Cloud Storage Systems. Lin Xiao, Kai Ren, Qing Zheng, Garth Gibson. Parallel Data Laboratory. Carnegie Mellon University. Technical Report CMU-PDL-15-104. April 2015. [PDF]
    • Extending CloudKon to Support HPC Job Scheduling. Isha Kapur, Karthik Belgodu, Pankaj Purandare, Iman Sadooghi, Ioan Raicu. Illinois Institute of Technology. Department of Computer Science. Technical Report. 2013. [PDF]
    • Exploring Distributed Resource Allocation Techniques in the SLURM Job Management System. Xiaobing Zhou, Hao Chen, Ke Wang, Michael Lang, Ioan Raicu. Illinois Institute of Technology. Department of Computer Science. Technical Report. 2013. [PDF]
    • Tetrisched: Space-Time Scheduling for Heterogeneous Datacenters. Alexey Tumanov, Timothy Zhu, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger, Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-13-112, December, 2013. Carnegie Mellon University, Pittsburgh, PA 15213 [PDF]
    • Exploring Data Compression in Distributed File Systems. Dongfang Zhao, Ioan Raicu. 2013. Illinois Institute of Technology. Department of Computer Science. IIT CS Technical Report. [PDF]
  • Whitepapers
    • SkyeFS: Distributed Directories using Giga+ and PVFS. Anthony Chivetta, Swapnil Patil & Garth Gibson. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-104, May 2012. Abstract / PDF [398K]
    • HPC Computation on Hadoop Storage with PLFS. Chuck Cranor, Milo Polte, Garth Gibson. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-115. Nov. 2012. Abstract / PDF [170K]
  • Online publications
    • Structure Aware Dynamic Scheduler For Parallel Machine Learning. Seunghak Lee, Jin Kyu Kim, Qirong Ho, Garth Gibson, Eric Xing, Cornell arXiv.org, Dec 2013. [PDF]
    • Tierra Encantada Charter School Tours PRObE Supercomputer Lab. Shannan Yeager and Carrie Talus. May 2014. NMC. [URL]
    • LANL Gives UNM Large Scale Computer System for Science and Engineering Research. Shannan Yeager and Carrie Talus. February 2014. NMC. [URL]
    • UNM Gains Supercomputer from the New Mexico Consortium. Kim Delker. UNM Press Release. February 2014. [URL]
    • PRObE Provides Access to Large Scale Supercomputers. Carol A. Clark. Los Alamos Daily Times. October 7, 2013. [URL]
    • Consistency, Availability, and Geo-Replicated Storage. Mike Freedman. Princeton SNS group blog. [URL]
    • Caring about Causality - now in Cassandra. Dave Anderson. Dave's Data blog. [URL]
    • High-School, College Students Instrumental in Assembling PRObE Supercomputer. Katy Bowman, Cogito.org, May 2013 [URL]
    • PRObE Project - Call for proposals. Garth Gibson, PRObE Website, May 2013 [URL]
    • NSF Supports Unique, High Performance Supercomputer Center, NSF Press Release 12-204, Oct 24, 2012. [URL]
  • Code repositories
    • wlloyd/eiger on github [URL]
  • Graduate Dissertations
    • Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning. Henggang Cui. Electrical & Computer Engineering, Carnegie Mellon University. Pittsburgh, PA. May 2017. [PDF]
    • Scalable Resource Management System Software for Extreme-Scale Distributed Systems. Ke Wang. Illinois Institute of Technology. Department of Computer Science.  Chicago, Illinois. July 2015. [PDF]
    • Trading Freshness for Performance in Distributed Systems. James Cipar. 2014. Carnegie Mellon University. School of Computer Science. December 2014. [PDF]
    • On Fault-tolerant and High Performance Replicated Transactional Systems. Sachin Hirve. Preliminary Examination Proposal, Virginia Polytechnic Institute and State University, Doctor of Philosophy, Computer Engineering, October 17, 2014, Blacksburg, Virginia. [PDF]
    • Modeling Large Social Networks in Context. Qirong Ho. School of Computer Science. Carnegie Mellon University. CMU-ML-14-100. July 2014. [PDF]
    • Stronger Consistency and Semantics for Low-Latency Geo-Replicated Storage. Wyatt Lloyd. 2013. Princeton University. June 2013. [PDF]
    • Paxos-based directory update for geo-replicated cloud storage. Srivathsava Rangarajan. 2014. Masters Thesis, School of Electrical and Computer Engineering, Purdue University (expected July 2014)
    • Systematic and Scalable Testing of Concurrent Programs. Jiří Šimša. 2013. PhD. Thesis. Technical Report CMU-CS-13-133. Carnegie Mellon University. December 2013. [PDF]
  • Submitted/Works In Progress
    • A Way Forward: Enabling Operating System Innovation in the Cloud. To appear in proceedings of  the 6th USENIX Workshop on Hot Topics in Cloud Computing, 2014, PA
    • EbbRT: Elastic Building Block Runtime. Submitted to OSDI 2014.
    • Scale-outing I/O in Scientific Analysis and Visualization. Submitted to ACM/IEEE SC2014.
    • A Unified Storage Framework for Hybrid Scientific Workflow. Submitted to ACM/IEEE SC2014.
    • G-SD: Achieving Scalable Reverse Lookup using Group-based Shifted Declustering Layout in Large-scale File Systems. Pending review with IEEE Transactions on Cloud Computing.
    • SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST. Pending review with Parallel Computing.
    • Distributed Convolutional Neural Networks across GPU clusters. Under submission.
    • RoCoCo: Extracting More Concurrency in Distributed Transactions. Under submission.
    • Fast B-tree Storage for RDMA-Enabled Networks. Under submission.
    • Scaling Distributed Machine Learning with the Parameter Server. Submitted into OSDI 2014.
    • Scalable cost-effective partial replication for causally consistent geo-replicated cloud storage. Technical Report. In preparation.
    • HiperTM: High-Performance, Fault-Tolerant Transactional Memory. Submitted to the Journal of Theoretical Computer Science, Elsevier. [PDF]
    • Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion, submitted to ACM/IEEE SC 2014.
    • Scheduling Big Models for Parallel Machine Learning, submitted to Neural Information Processing Systems (NIPS 2014).
  • Miscellaneous
    • Repurposing Supercomputers - What Happens on the "Other Side". Andree Jacobson. CIO Review. February 20, 2016. [PDF]
    • What is PRObE. Andree Jacobson. Techapp for SC'13. Denver, Colorado. Nov 19-23, 2013. [PDF]