Email: (@ yuemmawang (. google com))

Google Scholar Profile

(Emma reserves the copyright of all the photos on this website.)

Professional Experience

Staff Software Engineer, Google

April 2023 -- Present

  • Optimizing the performance of large models.

Senior Software Engineer, Google

April 2021 -- April 2023

  • Optimized the performance of large models.

Software Engineer, Google

Nov 2019 -- April 2021

  • Sped up fleetwide ML workloads with an automatic optimization system.

Research Assistant, Harvard University

Sep 2013 -- Sep 2019

  • Conducted deep, systematic performance analysis of machine learning workloads and extracted architectural insights to optimize those workloads.
  • (See my dissertation.)

Research Intern, Facebook

Mentors: Xiaodong Wang and Carole-Jean Wu

Nov 2018 -- Feb 2019

  • Compared performance across different deep learning frameworks and identified the sources of the performance differences in depth.
  • Extracted insights from the analysis results to optimize Caffe2.

Software Engineering Intern, Google Platforms

Mentor: Hui Huang

May -- Aug 2018

  • Benchmarked the 3rd-generation Tensor Processing Unit (TPU v3) with state-of-the-art deep learning workloads.
  • Predicted potential bottlenecks of Cloud TPU v3.
  • Quantified the impact of NUMA-aware allocation for Cloud TPU v3.
  • Shared a Silver Perfy Award at Google in 2019 Q1.

Software Engineering Intern, Google Brain

Mentor: Cliff Young

Sep -- Dec 2017

  • Benchmarked the 2nd-generation Tensor Processing Unit (TPU v2) with state-of-the-art deep learning workloads and analyzed its bottlenecks.
  • Quantified performance scalability and speedup of Cloud TPU v2.

Parallel Computing Intern, Intel Labs

Mentor: Victor Lee

July 2015 -- Jan 2016

  • Developed a set of tools to characterize CPU workloads and extract platform-independent features, including memory locality, memory footprint, and branch entropy.

Research Assistant, Shanghai Jiao Tong University

Mentor: Prof. Bo Yuan

Sep 2011 -- July 2013

  • Optimized a Bayesian network learning algorithm and implemented it on GPU.
  • Achieved a 143× speedup on GPU over CPU.
  • Applied this method to networks of up to 125 nodes.

Research Assistant, Shanghai Jiao Tong University

Mentor: Prof. Xiaoyao Liang

June 2012 -- July 2013

  • Designed a hardware profile-guided scheduler for green datacenters.
  • Reduced datacenter energy cost by up to 54% while maintaining balanced processor utilization.

Publications

Sameer Kumar, Yu Emma Wang, Cliff Young, James Bradbury, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing, "Exploring the limits of Concurrency in ML Training on Google TPUs." MLSys (2021).

Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks, "Exploiting Parallelism Opportunities with Deep Learning Frameworks." arXiv preprint arXiv:1908.04705 (2019).

Yu Emma Wang, Gu-Yeon Wei, David Brooks, "A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms." MLSys (2020).

(The arXiv version of the above paper) Yu Emma Wang, Gu-Yeon Wei, David Brooks, "Benchmarking TPU, GPU and CPU for Deep Learning." arXiv preprint arXiv:1907.10701 (2019).

Yu Emma Wang, Yuhao Zhu, Glenn G. Ko, Brandon Reagen, Gu-Yeon Wei, and David Brooks. "Demystifying Bayesian Inference Workloads." IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 177-189. IEEE, 2019.

Yu Emma Wang, Victor Lee, Gu-Yeon Wei, and David Brooks. "Predicting New Workload or CPU Performance by Analyzing Public Datasets." ACM Transactions on Architecture and Code Optimization (TACO). vol. 15, no. 4 (2019): 53:1–53:21.

Yu Emma Wang, Weikang Qian, Shuchang Zhang, Xiaoyao Liang, and Bo Yuan. "A Learning Algorithm for Bayesian Networks and Its Efficient Implementation on GPU," IEEE Transactions on Parallel and Distributed Systems. vol. 27, no. 1 (2016): 17–30.

Weichao Tang, Yu Emma Wang, Haopeng Liu, Tao Zhang, Chao Li, and Xiaoyao Liang. "Exploring Hardware Profile-Guided Green Datacenter Scheduling." International Conference on Parallel Processing (ICPP), pp. 11-20. 2015.

Dissertation

Yu Emma Wang. "Performance Analysis for Machine Learning Applications." PhD Dissertation, Harvard University, Nov 2019.

Open-Source Software

Feel free to download our software and use it in your projects. If you do, please cite the corresponding papers.

ParaDnn

ParaDnn is a tool that enables systematic performance analysis for deep learning platforms.

Mille Crepe Bench

Mille Crepe Bench is a multi-layer performance analysis tool for deep learning frameworks.

BayesSuite

BayesSuite is a Bayesian inference benchmark suite based on Stan.

BN-GPU

BN-GPU is a GPU implementation of a Bayesian network learning algorithm.

Professional Service

Technical Program Committee

  • Machine Learning and Systems Rising Stars 2024
  • Conference on Machine Learning and Systems (MLSys) 2024
  • ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2024
  • ACM/IEEE Supercomputing Conference (SC) 2023
  • MLBench workshop in MLSys'23
  • Conference on Machine Learning and Systems (MLSys) 2023
  • ACM/IEEE Supercomputing Conference (SC) 2022
  • Conference on Machine Learning and Systems (MLSys) 2022
  • MLBench workshop in MLSys'21

Journal reviews

  • IEEE Computer Architecture Letters (CAL)
  • ACM Transactions on Architecture and Code Optimization (TACO)

Talks

Demystifying Bayesian Inference Workloads

  • ISPASS, Madison, WI, March 2019.
  • ADA Symposium, Ann Arbor, MI, April 2019.

A Systematic Methodology for Analysis of Deep Learning Platforms

  • Google, Aug 2018.
  • Google, Aug 2018.
  • Google, Sep 2018. (No, the three lines are not typos.)
  • Facebook, Sep 2018.
  • ADA Center, Dec 2018.
  • IBM, March 2019.
  • Micron, May 2019.
  • MLSys, March 2020.

Photography

I enjoy food, traveling, photography, and interacting with people. Here are some samples of the results.

For more photos, please see my 500px page. Copyright reserved :)