Yu (Emma) Wang

I am a staff software engineer at Google. I work on analyzing and optimizing the performance of Machine Learning workloads, especially large models. I obtained my Ph.D from Harvard University with the Harvard Architecture, Circuits, and Compiler Group in 2019. My advisors are Prof. David Brooks and Prof. Gu-Yeon Wei. My dissertation committee also includes Prof. Vijay Reddi and Dr. Cliff Young. I received my Bachelor of Science degree in Computer Science from Shanghai Jiao Tong University in 2013.

Professional Experience

Staff Software Engineer, Google

April 2023 -- Present

Optimizing the performance of large models.

Senior Software Engineer, Google

April 2021 -- April 2023

Optimized the performance of large models.

Software Engineer, Google

Nov 2019 -- April 2021

Speeded up fleetwide ML workloads with an automatic optimization system.

Research Assistant, Harvard University

Sep 2013 -- Sep 2019

Conducted deep and systematic performance analysis for machine learning workloads and extracted architectural insights to optimize those workloads.
(See my dissertation.)

Research Intern, Facebook

Mentors: Xiaodong Wang and Carole-Jean Wu

Nov 2018 -- Feb 2019

Performed performance comparison across different deep learning frameworks and identified the source of performance difference in depth.
Extracted insights to optimize Caffe2 from the analysis results.

Software Engineering Intern, Google Platforms

Mentor: Hui Huang

May -- Aug 2018

Benchmarked 3rd generation of Tensor Processing Units (TPU v3) with state-of-the-art deep learning workloads.
Predicted potential bottlenecks of Cloud TPU v3.
Quantified the impact of NUMA-aware allocation for Cloud TPU v3.
Shared Silver Perfy Award in 2019 Q1 at Google.

Software Engineering Intern, Google Brain

Mentor: Cliff Young

Sep -- Dec 2017

Benchmarked 2nd generation of Tensor Processing Units (TPU v2) with state-of-the-art deep learning workloads and analyzed their bottlenecks.
Quantified performance scalability and speedup of Cloud TPU v2.

Parallel Computing Intern, Intel Labs

Mentor: Victor Lee

July 2015 -- Jan 2016

Developed a set of tools to characterize CPU workloads, extracted platform independent features including memory locality, memory footprint, and branch entropy.

Research Assistant, Shanghai Jiao Tong University

Mentor: Prof. Bo Yuan

Sep 2011 -- July 2013

Optimized a Bayesian network learning algorithm and implemented on GPU.
Achieved a 143× speedup on GPU over CPU.
Applied this method to networks of up to 125 nodes.

Research Assistant, Shanghai Jiao Tong University

Mentor: Prof. Xiaoyao Liang

June 2012 -- July 2013

Designed a hardware profile-guided scheduler for green datacenters.
Reduce the datacenter energy cost up to 54% while maintaining fairly balanced processor utilization.

Publications

Sameer Kumar, Yu Emma Wang, Cliff Young, James Bradbury, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing, "Exploring the limits of Concurrency in ML Training on Google TPUs." MLSys (2021).

Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks, "Exploiting Parallelism Opportunities with Deep Learning Frameworks." arXiv preprint arXiv:1908.04705 (2019).

Yu Emma Wang, Gu-Yeon Wei, David Brooks, "A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms." MLSys (2020).

(The arXiv version of the above paper) Yu Emma Wang, Gu-Yeon Wei, David Brooks, "Benchmarking TPU, GPU and CPU for Deep Learning." arXiv preprint arXiv:1907.10701 (2019).

Yu Emma Wang, Yuhao Zhu, Glenn G. Ko, Brandon Reagen, Gu-Yeon Wei, and David Brooks. "Demystifying Bayesian Inference Workloads." IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 177-189. IEEE, 2019.

Yu Emma Wang, Victor Lee, Gu-Yeon Wei, and David Brooks. "Predicting New Workload or CPU Performance by Analyzing Public Datasets." ACM Transactions on Architecture and Code Optimization (TACO). vol. 15, no. 4 (2019): 53:1–53:21.

Yu Emma Wang, Weikang Qian, Shuchang Zhang, Xiaoyao Liang, and Bo Yuan. "A Learning Algorithm for Bayesian Networks and Its Efficient Implementation on GPU," IEEE Transactions on Parallel and Distributed Systems. vol. 27, no. 1 (2016): 17–30.

Weichao Tang, Yu Emma Wang, Haopeng Liu, Tao Zhang, Chao Li, and Xiaoyao Liang. "Exploring Hardware Profile-Guided Green Datacenter Scheduling." International Conference on Parallel Processing (ICPP), pp. 11-20. 2015.

Open-Source Software

Feel free to download our software and use in your project. If you do, please cite our corresponding papers.

Technical Program Committee

Machine Learning and Systems Rising Stars 2024
Conference on Machine Learning and Systems (MLSys) 2024
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2024
ACM/IEEE Supercomputing Conference (SC) 2023
MLBench workshop in MLSys'23
Conference on Machine Learning and Systems (MLSys) 2023
ACM/IEEE Supercomputing Conference (SC) 2022
Conference on Machine Learning and Systems (MLSys) 2022
MLBench workshop in MLSys'21

Journal reviews

IEEE Computer Architecture Letters (CAL)
ACM Transactions on Architecture and Code Optimization (TACO)

Talks

Demystify Bayesian Inference Workloads

ISPASS, Madison, WI, March 2019.
ADA Symposium, Ann Arbor, MI, April 2019.

A Systematic Methodology for Analysis of Deep Learning Platforms

Google, Aug 2018.
Google, Aug 2018.
Google, Sep 2018. (No, the three lines are not typos.)
Facebook, Sep 2018.
ADA Center, Dec 2018.
IBM, March 2019.
Micron, May 2019.
MLSys, March 2020.

Yu (Emma) Wang

Professional Experience

Publications

Dissertation

Open-Source Software

ParaDnn

Mille Crepe Bench

BayesSuite

BN-GPU

Professional Service

Talks

Photography

Guggenheim Museum

Monterey Aquarium

Izakaya

Dessert in Lauduree

A Sushi Restaurant

An Italian Restaurant

Duck Confit