WSDM'22 Workshop FL4P-WSDM: Federated Learning for Private Web Search and Data Mining

    Many popular web-based services and data mining applications now rely on machine learning (ML) and artificial intelligence (AI) to perform effectively. This is made possible by the huge volume of data constantly generated on various devices, such as PCs, laptops and smartphones.

    Centralized ML and AI pose significant challenges due to regulatory and privacy concerns in real-world use cases. Privacy has traditionally been viewed as an essential human right. Legislative efforts on data privacy protection have been increasing, e.g., the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

    Federated learning (FL) is a new paradigm in machine learning that was first introduced by Google in 2017. It aims to address the challenges above by training a global model using distributed data, without the need for the data to be shared with or transferred to any central facility. Despite the clear advantages, many technical challenges remain to be solved, such as fairness, statistical heterogeneity of data, communication efficiency and network robustness.
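The idea of training a global model while the data stays on each device can be illustrated with a toy federated averaging loop. This is only a minimal sketch under stated assumptions, not the method of any particular system or talk at this workshop: the function names (`local_update`, `federated_averaging`, `make_client`), the one-dimensional linear model, and the synthetic data are all hypothetical and chosen for illustration. Each simulated client trains on its own data and sends back only model parameters, which the server averages weighted by local dataset size.

```python
import random


def local_update(w, b, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on its own (x, y) pairs.

    Only the updated parameters leave the client; `data` is never shared.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x
            grad_b += 2.0 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def federated_averaging(clients, rounds=200):
    """Server loop: broadcast the global model, collect local models, average them."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        results = [(local_update(w, b, data), len(data)) for data in clients]
        # Weighted average of client parameters, proportional to local data size.
        w = sum(params[0] * n for params, n in results) / total
        b = sum(params[1] * n for params, n in results) / total
    return w, b


def make_client(rng, n, lo, hi):
    """Hypothetical synthetic client: y = 3x + 1 plus small noise, x drawn from [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)
# Clients see different input ranges and dataset sizes (a simple form of heterogeneity).
clients = [
    make_client(rng, 20, -1.0, 0.0),
    make_client(rng, 30, 0.0, 1.0),
    make_client(rng, 50, -1.0, 1.0),
]
w, b = federated_averaging(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

In this sketch the server only ever handles parameter pairs `(w, b)`. Practical deployments add client sampling, secure aggregation, compression and other mechanisms that are omitted here.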

    The workshop targets the above and other relevant issues, aiming to create a platform for people from academia and industry to share their insights and recent results.

    Topics of interest include, but are not limited to, the following:

FL algorithm related issues, e.g. adversarial attack, communication compression, algorithm explainability/interpretability, data/device heterogeneity, optimization algorithm advances, personalization, fairness, resource efficiency, and so on;
FL and collaborative ML applications, such as advertising, query analysis and processing, web healthcare, search engines, log mining, recommender systems, blockchain, social networks, and others;
Other data privacy preservation techniques, such as differential privacy, secure multi-party computation, data/model distillation, data anonymization, etc.;
Social and operational challenges and legislative issues concerning privacy in web search and data mining;
Datasets and open-source tools for federated and privacy-preserving web search and data mining.

Time	Speaker	Title
9:00-9:05 (MST)	Opening remarks
9:05-10:05	Keynote 1 (Yang Liu)	Towards Private, Efficient and Robust Cross-Silo Federated Learning
10:05-11:05	Keynote 2 (Ameet Talwalkar)	Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing
11:05-11:15	Student talk 1 (Liwei Che)	FedTriNet: A Pseudo Labeling Method with Three Players for Federated Semi-supervised Learning
11:15-11:25	Student talk 2 (Qi Chang)	Federated Synthetic Learning from Multi-institutional and Heterogeneous Medical Data
11:25-11:35	Break
11:35-12:35	Keynote 3 (Salman Avestimehr and Chaoyang He)	FedML: Social, Secure, Scalable, and Efficient Edge-Cloud Platform for Federated Learning
12:35-13:35	Keynote 4 (Heiko Ludwig)	Federated Learning for the Enterprise - Addressing Organizational and Regulatory Boundaries for Machine Learning
13:35-13:45	Student talk 3 (Kai Zhang)	Sword and Shield for Data Privacy in Federated Representation Learning
13:45-13:55	Student talk 4 (Pengfei Guo)	Multi-Institutional Collaborations for Improving Deep Learning-Based Magnetic Resonance Image Reconstruction Using Federated Learning
13:55-14:00	Closing remarks

Keynote Abstracts

Yang Liu, Tsinghua University

Towards Private, Efficient and Robust Cross-Silo Federated Learning

In this talk, I will introduce the concept of cross-silo federated learning (FL) and recent advances in it. Cross-silo FL enables organizations (e.g., financial or medical institutions) to collaboratively train a machine learning model without sharing privacy-sensitive data. Applying cross-silo FL to real-world systems still faces major challenges, including privacy protection, model complexity and performance, and computation and communication efficiency in heterogeneous environments. We discuss new research advances for tackling these challenges. To this end, we introduce a new paradigm that takes advantage of a powerful server model to boost privacy, robustness and efficiency in a unified FL framework, where knowledge, instead of model parameters/gradients, is communicated and accumulated.

Ameet Talwalkar, Carnegie Mellon University

Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing

Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely-used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget.

Salman Avestimehr and Chaoyang He, FedML Inc. & University of Southern California

FedML: Social, Secure, Scalable, and Efficient Edge-Cloud Platform for Federated Learning

We provide an overview of the FedML platform (https://fedml.ai), which consists of (1) cutting-edge federated learning algorithms; (2) a lightweight and cross-platform Edge AI SDK for deployment over GPUs, smartphones, and IoT devices; (3) a user-friendly MLOps platform to simplify collaboration and real-world deployment; and (4) platform-supported vertical solutions across a broad range of industries. We highlight several key design features and their algorithmic innovations. We also present a tutorial on using FedML.

Heiko Ludwig, IBM Almaden Research Center

Federated Learning for the Enterprise - Addressing Organizational and Regulatory Boundaries for Machine Learning

Increasing use of machine learning and other uses of data have led to increased regulation around the world. Under particular scrutiny is the gathering of large sets of person-identifying data. Regulation of data locality and use, as well as liability risk exposure, are leading to increased consideration of federated learning in enterprise use cases. Federated learning also provides increased opportunity for collaboration between companies on non-competitive models. This talk will review some use cases arising in enterprise federated learning, discuss some important issues that arise, and present some recent approaches to solving them.

Speakers

Ameet Talwalkar

Assistant Professor
Carnegie Mellon University

Dr. Ameet Talwalkar, Assistant Professor, Carnegie Mellon University

Dr. Ameet Talwalkar is an assistant professor in the Machine Learning Department at Carnegie Mellon University. He also co-founded and served as Chief Scientist at Determined AI until its recent acquisition by HPE. His primary interest is in the field of statistical machine learning. His current work is motivated by the goal of democratizing machine learning, with a focus on topics related to automation, interpretability, and distributed learning. He also helped to create the MLSys conference, and currently serves as President of the MLSys Board. More information about Ameet can be found on his personal homepage: Ameet Talwalkar.

Chaoyang He

Co-founder & CTO
FedML Inc.

Mr. Chaoyang He, FedML Inc. & University of Southern California

Mr. Chaoyang He is Co-founder & CTO at FedML Inc. and a PhD candidate at USC. Previously, he was an R&D Team Manager and Staff Software Engineer at Tencent (2014-2018), a Team Leader and Senior Software Engineer at Baidu (2012-2014), and a Software Engineer at Huawei (2011-2012). His research focuses on distributed/federated machine learning algorithms, systems, and applications. He has received a number of awards in academia and industry, including the Amazon ML Fellowship (2021-2022), Qualcomm Innovation Fellowship (2021-2022), Tencent Outstanding Staff Award (2015-2016), WeChat Special Award for Innovation (2016), Baidu LBS Group Star Awards (2013), and Huawei Golden Network Award (2012). During his PhD studies, he has published papers at ICML, NeurIPS, CVPR, MLSys, and AAAI, among others. Besides pure research, he also has R&D experience with Internet products and businesses such as Tencent Cloud, Tencent WeChat Automotive / AI in Car, Tencent Games, Tencent Maps, Baidu Maps, and Huawei Smartphone. More information about Chaoyang can be found on his personal homepage: Chaoyang He.

Heiko Ludwig

Principal Research Staff Member and Senior Manager
IBM Almaden Research Center

Dr. Heiko Ludwig, IBM Almaden Research Center

Dr. Heiko Ludwig is Principal Research Staff Member and Senior Manager with IBM’s Almaden Research Center in San Jose, CA, USA. Leading the AI Platforms research group, Heiko is currently working on topics related to computational platforms for AI, more specifically for machine learning, from Cloud to IoT. This includes federated machine learning and inference along with machine learning privacy and security. The results of this work contribute to various IBM lines of business. He is an ACM Distinguished Engineer and has published more than 100 refereed articles, conference papers, and book chapters as well as technical reports. He is a managing editor of the International Journal of Cooperative Information Systems and has been involved in about 150 program committees of conferences and workshops in the field, where he has also given a number of keynote speeches and served as PC Co-Chair and General Co-Chair. Prior to the Almaden Research Center, he held different positions at IBM in the TJ Watson Research Center, the Zurich Research Laboratory, and IBM’s South American Delivery Centers in Argentina and Brazil. He holds a PhD in information systems (Wirtschaftsinformatik) from Otto-Friedrich University Bamberg, Germany. More information about Heiko can be found on his personal homepage: Heiko Ludwig.

Salman Avestimehr

Dean's Professor, University of Southern California
CEO, FedML Inc.

Prof. Salman Avestimehr, FedML Inc. & University of Southern California

Dr. Salman Avestimehr is the CEO at FedML Inc., a Dean's Professor, the inaugural director of the USC-Amazon Center on Secure and Trusted Machine Learning (Trusted AI), and the director of the Information Theory and Machine Learning (vITAL) research lab at the Electrical and Computer Engineering Department of the University of Southern California. He is also an Amazon Scholar at Alexa AI. His research interests include information theory, large-scale distributed computing and machine learning, secure and private computing/learning, and federated learning. Dr. Avestimehr has received a number of awards for his research, including the James L. Massey Research & Teaching Award from the IEEE Information Theory Society, an Information Theory Society and Communication Society Joint Paper Award, a Presidential Early Career Award for Scientists and Engineers (PECASE) from the White House (President Obama), a Young Investigator Program (YIP) award from the U.S. Air Force Office of Scientific Research, a National Science Foundation CAREER award, the David J. Sakrison Memorial Prize, and several best paper awards at conferences. He has been an Associate Editor for IEEE Transactions on Information Theory and a General Co-Chair of the 2020 International Symposium on Information Theory (ISIT). He is a Fellow of the IEEE. More information about Salman can be found on his personal homepage: Salman Avestimehr.

Yang Liu

Associate Professor
Tsinghua University

Prof. Yang Liu, Tsinghua University

Dr. Yang Liu is Associate Professor at the Institute for AI Industry Research (AIR) of Tsinghua University, China. Before joining Tsinghua University, she was Principal Researcher and Research Team Lead at WeBank. Her research interests include federated learning, machine learning, multi-agent systems, statistical mechanics and AI industrial applications. She received her PhD from Princeton University. She holds more than 20 patents and more than 100 patent applications. Her research has been published in well-known international conferences and journals including Nature, AAAI, IJCAI, USENIX, and ACM TIST, receiving more than 3000 citations overall. She coauthored Federated Learning [6], the first book on federated learning. She also serves as a guest/associate editor for IEEE Intelligent Systems and ACM TIST. She has co-chaired multiple workshops at IJCAI, AAAI and NeurIPS. Her research work has been recognized with multiple awards, such as the AAAI Innovation Award and the CCF Technology Award. More information about Yang can be found on her personal homepage: Yang Liu.