Projects & Talks
Privately Learning from Graphs with Applications in Fine-tuning Large Language Models
Existing privacy-preserving methods such as DP-SGD assume that per-sample gradients are decoupled, which makes them unsuited to relational learning, where training samples are inherently coupled. We propose a privacy-preserving relational learning pipeline that decouples the dependencies among sampled relations during training, ensuring differential privacy through a tailored application of DP-SGD. We apply this method to fine-tune LLMs (e.g., BERT, Llama2) on sensitive graph data and address the associated computational challenges. The results show significant improvements on relational learning tasks while maintaining strong privacy guarantees during training.
Yin, H., Wei, R., Chien, E., & Li, P. (2024). Privately Learning from Graphs with Applications in Fine-tuning Large Language Models. In Workshop on Statistical Frontiers in LLMs and Foundation Models @ NeurIPS 2024. [PDF(arXiv), Code, Poster]
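For readers unfamiliar with DP-SGD, the core update can be sketched as below. This is a minimal illustration of standard DP-SGD (per-sample clipping plus Gaussian noise), not the pipeline from the paper: it assumes the relation-decoupling step has already produced one independent gradient per sampled relation, and the function names and parameter values are hypothetical.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update on a batch of per-sample gradients.

    Assumes each gradient comes from one decoupled relation, so that after
    clipping every sample's influence on the summed gradient is bounded by max_norm.
    """
    summed = [0.0] * len(params)
    for grad in per_sample_grads:
        for i, g in enumerate(clip_gradient(grad, max_norm)):
            summed[i] += g
    sigma = noise_multiplier * max_norm  # noise scale calibrated to the clipping norm
    batch_size = len(per_sample_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
new_params = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]], lr=0.1,
                         max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

If relations were not decoupled, one training sample could contribute to several gradients in the batch, and the per-sample sensitivity bound that the clipping step relies on would no longer hold.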
Learning Scalable Structural Representations for Link Prediction with Bloom Signatures
Bloom signatures are hashing-based compact encodings of node neighborhoods, used to augment the message-passing framework for structural link representations. GNNs with Bloom signatures are provably more expressive than vanilla MPNNs and more scalable than existing edge-wise models. A neural network that takes Bloom signatures as input can estimate any neighborhood overlap-based heuristic with guaranteed accuracy.
Zhang, T.*, Yin, H.*, Wei, R., Li, P., & Shrivastava, A. (2024). Learning Scalable Structural Representations for Link Prediction with Bloom Signatures. In Proceedings of the ACM Web Conference 2024. [Link, PDF(arXiv), Code]
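To make the idea of estimating neighborhood overlap from hashed encodings concrete, here is a sketch using the classical closed-form Bloom filter cardinality estimator rather than the learned estimator from the paper. The filter size `m`, hash count `k`, and SHA-256-based hashing are illustrative assumptions.

```python
import hashlib
import math


def bloom_signature(neighbors, m, k):
    """Encode a neighbor set as an m-bit Bloom filter (stored as an int) using k hash functions."""
    bits = 0
    for v in neighbors:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{v}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def estimate_size(bits, m, k):
    """Estimate set cardinality from the number of set bits: n ~ -(m / k) * ln(1 - X / m)."""
    x = bin(bits).count("1")
    if x >= m:
        return float("inf")  # filter saturated; no finite estimate
    return -(m / k) * math.log(1 - x / m)


def estimate_common_neighbors(sig_u, sig_v, m, k):
    """|N(u) & N(v)| ~ |N(u)| + |N(v)| - |N(u) | N(v)|; the union's filter is the bitwise OR."""
    return (estimate_size(sig_u, m, k) + estimate_size(sig_v, m, k)
            - estimate_size(sig_u | sig_v, m, k))


sig_u = bloom_signature(range(0, 60), m=4096, k=3)
sig_v = bloom_signature(range(30, 90), m=4096, k=3)
overlap = estimate_common_neighbors(sig_u, sig_v, m=4096, k=3)  # true overlap is 30
```

Because each signature is computed once per node, an overlap-based link score for any pair costs only a bitwise OR and a popcount, which is the scalability argument in miniature.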
SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning
SUREL+ is a novel set-based computation framework for scaling subgraph-based graph representation learning (SGRL) to industry-level graphs, and the first to successfully deploy SGRL on a billion-edge graph (twitter-2010). SUREL+ replaces costly subgraph extraction with node set sampling: the union of sampled node sets, formed by online joining, acts as a proxy for the query-induced subgraph when making predictions for the given queries.
Yin, H., Zhang, M., Wang, J., & Li, P. (2023). SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning. In Proceedings of the VLDB Endowment 16 (11): 2939-2948. [Link, PDF(arXiv), Code].
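The query-time join can be sketched as follows. This assumes each node's sampled set has already been computed once and stored as a sparse map from sampled node to a scalar encoding (for example a landing probability); the function name, the encoding choice, and the toy data are hypothetical and not taken from the SUREL+ code.

```python
def join_node_sets(node_sets, u, v):
    """Join the sampled node sets of u and v into a proxy for the subgraph induced by query (u, v).

    node_sets[x] maps each sampled node w to an encoding of w relative to x
    (here a float; this choice is an assumption). Returns
    {w: (encoding w.r.t. u, encoding w.r.t. v)} over the union of both sets,
    using 0.0 when w was not sampled from one endpoint.
    """
    set_u, set_v = node_sets[u], node_sets[v]
    return {w: (set_u.get(w, 0.0), set_v.get(w, 0.0)) for w in set_u.keys() | set_v.keys()}


node_sets = {
    "a": {"a": 1.0, "b": 0.5, "c": 0.25},  # hypothetical pre-sampled set for node "a"
    "d": {"d": 1.0, "c": 0.5},             # hypothetical pre-sampled set for node "d"
}
joined = join_node_sets(node_sets, "a", "d")
```

Since the per-node sets are reused across every query that touches the node, the per-query cost is a set union rather than a fresh subgraph extraction.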
Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning
SUREL is a novel framework for efficient Subgraph-based Graph Representation Learning by co-designing the learning algorithm and its system support. It adopts the walk-based decomposition of subgraphs and reuses the walks to form subgraphs, substantially reducing the redundancy of subgraph extraction and enabling parallel computation.
Yin, H., Zhang, M., Wang, Y., Wang, J., & Li, P. (2022). Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning. In Proceedings of the VLDB Endowment 15 (11): 2788-2796. [Link, PDF(arXiv), Code].
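The walk-reuse idea can be illustrated with a small sketch: walks are sampled once per node and cached, each node's position in those walks is summarized as per-step landing counts, and a query subgraph is formed by joining the cached walks of its two endpoints. The helper names and the count-based encoding are assumptions for illustration, not the SUREL implementation.

```python
import random


def sample_walks(adj, start, num_walks, length, rng):
    """Sample num_walks random walks of up to `length` steps from `start`."""
    walks = []
    for _ in range(num_walks):
        walk = [start]
        for _ in range(length):
            neighbors = adj[walk[-1]]
            if not neighbors:  # dead end: stop this walk early
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


def relative_positions(walks, length):
    """Count, for every visited node, how often it appears at each step 0..length."""
    counts = {}
    for walk in walks:
        for step, node in enumerate(walk):
            counts.setdefault(node, [0] * (length + 1))[step] += 1
    return counts


def join_query(cache, u, v, length):
    """Reuse the cached walks of u and v: the union of visited nodes forms the query
    subgraph, and each node carries its step counts relative to both endpoints."""
    pos_u = relative_positions(cache[u], length)
    pos_v = relative_positions(cache[v], length)
    return {w: (pos_u.get(w, [0] * (length + 1)), pos_v.get(w, [0] * (length + 1)))
            for w in pos_u.keys() | pos_v.keys()}


adj = {0: [1], 1: [2], 2: [0], 3: [0]}  # toy directed graph with one out-neighbor per node
rng = random.Random(0)
cache = {node: sample_walks(adj, node, num_walks=2, length=2, rng=rng) for node in adj}
subgraph = join_query(cache, 0, 3, length=2)
```

Here node 2 is reached only from node 0, so its encoding is `([0, 0, 2], [0, 0, 0])`; nodes shared by both endpoints' walks get non-zero counts on both sides.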
Revisiting Graph Neural Networks and Distance Encoding From a Practical View
Reviewed GNNs with the Distance Encoding (DE) technique for learning on graphs: 1) categorized the labels in node classification tasks into community type and structure type; 2) investigated how DE makes GNNs suitable for tasks such as node classification and link prediction; 3) designed eight variants to identify the mechanism GNNs adopt to predict the two types of node labels under different graph settings.
Yin, H., Wang, Y., & Li, P. (2020). Revisiting Graph Neural Networks and Distance Encoding From a Practical View. In Proceedings of the 35th AAAI Conference on Artificial Intelligence DLG Workshop. [Link, PDF(arXiv), Code].
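One common instantiation of DE is the truncated shortest-path distance from each node to every node in a target set (for link prediction, the two endpoints). The sketch below computes that feature with BFS; the truncation value `max_dist` and the function names are illustrative assumptions.

```python
from collections import deque


def shortest_path_distances(adj, source, max_dist):
    """BFS distances from source; nodes farther than max_dist get max_dist + 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:  # do not expand beyond the truncation radius
            continue
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {n: dist.get(n, max_dist + 1) for n in adj}


def distance_encoding(adj, target_set, max_dist=3):
    """DE feature per node: a tuple of its truncated distances to each target node."""
    per_target = [shortest_path_distances(adj, t, max_dist) for t in target_set]
    return {n: tuple(d[n] for d in per_target) for n in adj}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path graph 0-1-2-3-4
de = distance_encoding(path, target_set=(0, 4), max_dist=3)
```

These tuples are appended to the node features before message passing, letting the GNN distinguish nodes by their position relative to the targets.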
Graph-Structured Sequence Modeling through Spatio-Temporal U-Network
Designed a novel multi-scale architecture, Spatio-Temporal U-Net (ST-UNet), for graph-structured time series modeling. In this U-shaped network, a pair of sampling operations is proposed over both space and time: pooling (ST-Pool) and unpooling (ST-Unpool). To better localize representations from the input, higher-level features from the pooling path are concatenated with the upsampled output. The final output of ST-UNet can be used to predict node attributes or the state of the entire graph over the next few time steps.
Yu, B.*, Yin, H.*, & Zhu, Z. (2019). ST-UNet: A Spatio-Temporal U-Network for Graph-structured Time Series Modeling. arXiv preprint arXiv:1903.05631. [PDF(arXiv)].
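A toy version of the pooling/unpooling pair and the U-Net skip connection is sketched below. It assumes a fixed node clustering and plain mean pooling over nodes and time steps, which is a simplification for illustration, not the ST-Pool/ST-Unpool operators defined in the paper; all names are hypothetical.

```python
def st_pool(x, clusters, time_stride):
    """x[t][i]: scalar signal of node i at time t. Averages nodes within each cluster (space)
    and every `time_stride` consecutive steps (time). Assumes every cluster id 0..K-1 is used."""
    num_clusters = max(clusters) + 1
    spatial = []
    for frame in x:
        sums = [0.0] * num_clusters
        counts = [0] * num_clusters
        for node, value in enumerate(frame):
            sums[clusters[node]] += value
            counts[clusters[node]] += 1
        spatial.append([s / c for s, c in zip(sums, counts)])
    pooled = []
    for start in range(0, len(spatial), time_stride):
        window = spatial[start:start + time_stride]
        pooled.append([sum(column) / len(window) for column in zip(*window)])
    return pooled


def st_unpool(y, clusters, time_stride, num_steps):
    """Copy each coarse value back to its member nodes and repeat it over time_stride steps."""
    return [[y[t // time_stride][c] for c in clusters] for t in range(num_steps)]


def skip_concat(encoder, decoder):
    """U-Net skip connection: pair each node's encoder-path feature with its upsampled feature."""
    return [[[e, d] for e, d in zip(enc_frame, dec_frame)]
            for enc_frame, dec_frame in zip(encoder, decoder)]


x = [[1.0, 3.0, 5.0, 7.0], [3.0, 5.0, 7.0, 9.0]]  # 2 time steps, 4 nodes
clusters = [0, 0, 1, 1]                            # hypothetical fixed node clustering
coarse = st_pool(x, clusters, time_stride=2)       # [[3.0, 7.0]]
restored = st_unpool(coarse, clusters, time_stride=2, num_steps=2)
features = skip_concat(x, restored)
```

The skip connection restores node- and step-level detail that the coarse path has averaged away.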
Machine Learning Attacks to Location Privacy
Developed an adversary model that uses machine learning to learn from the geographical data collected from users of location-based services and from the corresponding privacy mechanism, and then attacks users' location privacy.
Completed as part of the international internship program at LIX, École Polytechnique & Inria Saclay.
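The general shape of such an attack can be sketched with a toy setup: a grid world, a simple random-offset obfuscation mechanism, and an adversary that learns from (true, reported) pairs which true cell most often produced each report. The mechanism, grid, and frequency-count learner are all illustrative assumptions standing in for the project's actual mechanism and model.

```python
import random
from collections import Counter, defaultdict


def perturb(loc, grid_size, radius, rng):
    """Toy obfuscation mechanism: shift each coordinate by a random offset in
    [-radius, radius], clamped to the grid."""
    x, y = loc
    return (min(grid_size - 1, max(0, x + rng.randint(-radius, radius))),
            min(grid_size - 1, max(0, y + rng.randint(-radius, radius))))


def train_adversary(pairs):
    """Learn, from (true, reported) pairs, the most frequent true cell behind each report."""
    counts = defaultdict(Counter)
    for true_loc, reported in pairs:
        counts[reported][true_loc] += 1
    return {reported: c.most_common(1)[0][0] for reported, c in counts.items()}


def attack(model, reported):
    """Guess the true location; fall back to the report itself for unseen reports."""
    return model.get(reported, reported)


rng = random.Random(0)
homes = [(2, 2)] * 90 + [(7, 7)] * 10  # hypothetical population with a strong location prior
training = [(h, perturb(h, 10, 1, rng)) for h in homes]
model = train_adversary(training)
guess = attack(model, perturb((2, 2), 10, 1, rng))
```

The attack exploits the prior: when users concentrate in a few places, knowing the mechanism lets the adversary undo much of the noise.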
Traffic Prediction with Deep Spatial Temporal Neural Nets
Designed a fully integrated convolutional neural network to precisely model the topology of the road network and accurately forecast future traffic conditions (speed, flow, or volume) across the network from spatio-temporal series in the mid- and long-term.
Yu, B.*, Yin, H.*, & Zhu, Z. (2018). Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (pp. 3634-3640). [Link, PDF (arXiv), Code, Slides].
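The spatial-temporal building block can be illustrated with a stripped-down sketch: a first-order graph convolution with the renormalized adjacency $D^{-1/2}(A + I)D^{-1/2}$, sandwiched between two 1-D temporal convolutions applied to every node. This omits the gated linear units, Chebyshev filters, nonlinearities, and multi-channel weights of the actual model; scalar features and the helper names are simplifying assumptions.

```python
import math


def normalized_adjacency(adj_matrix):
    """Renormalized adjacency D^-1/2 (A + I) D^-1/2 used by first-order graph convolutions."""
    n = len(adj_matrix)
    a_hat = [[adj_matrix[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(a_norm, signal, weight):
    """Spatial step: mix each node's scalar feature with its neighbors', then scale by weight."""
    n = len(signal)
    return [weight * sum(a_norm[i][j] * signal[j] for j in range(n)) for i in range(n)]


def temporal_conv(series, kernel):
    """Temporal step: 'valid' 1-D convolution (deep-learning sense) along time for each node.
    series[t][node]; output length is len(series) - len(kernel) + 1."""
    k = len(kernel)
    return [[sum(kernel[s] * series[t + s][node] for s in range(k))
             for node in range(len(series[0]))]
            for t in range(len(series) - k + 1)]


def st_block(a_norm, series, kernel, weight):
    """Temporal -> spatial -> temporal sandwich (no gating or nonlinearity in this sketch)."""
    h = temporal_conv(series, kernel)
    h = [graph_conv(a_norm, frame, weight) for frame in h]
    return temporal_conv(h, kernel)


a_norm = normalized_adjacency([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # 3-node road segment chain
series = [[float(t + i) for i in range(3)] for t in range(6)]     # 6 time steps, 3 nodes
out = st_block(a_norm, series, kernel=[0.5, 0.5], weight=1.0)     # 6 -> 5 -> 4 steps
```

Using convolutions in both dimensions (rather than recurrence along time) is what allows the whole block to be computed in parallel over time steps.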
Neural Artist - Style Transfer in Short Videos
Applied Conditional GANs and Fast Style Transfer to convert short videos into customized styles (e.g., Van Gogh's The Starry Night) through DNN-based texture abstraction, with a redesigned loss function to reduce flicker between rendered frames.
This project [Link] was awarded the 'Most Technical Difficulty Award' at Schlumberger HackPKU 2017.
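A flicker-aware objective of this kind is often written as a weighted sum of a content term, a Gram-matrix style term, and a temporal term on consecutive output frames. The sketch below shows that structure under stated assumptions: the weights are arbitrary, the temporal term compares frames directly without optical-flow warping, and features are tiny nested lists instead of network activations. It is not the loss used in the project.

```python
def gram_matrix(features):
    """features[c][p] is channel c at pixel p; returns the C x C channel correlations averaged over pixels."""
    num_pixels = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / num_pixels for fj in features] for fi in features]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def video_style_loss(content, stylized, style_gram, prev_frame, cur_frame,
                     alpha=1.0, beta=10.0, gamma=1.0):
    """Weighted sum of content, style (Gram matrix) and temporal (flicker) terms.
    The weights and the unwarped frame difference are illustrative assumptions."""
    content_term = sum(mse(c, s) for c, s in zip(content, stylized)) / len(content)
    gram = gram_matrix(stylized)
    style_term = sum(mse(g, t) for g, t in zip(gram, style_gram)) / len(gram)
    temporal_term = mse(prev_frame, cur_frame)  # penalizes changes between consecutive outputs
    return alpha * content_term + beta * style_term + gamma * temporal_term


frame_feats = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2-channel, 2-pixel feature map
loss = video_style_loss(frame_feats, frame_feats, gram_matrix(frame_feats),
                        prev_frame=[0.0, 0.0], cur_frame=[1.0, 1.0])  # 1.0: only the temporal term is non-zero
```

Raising `gamma` trades stylization strength for temporal stability, which is the balance a flicker-reducing loss has to strike.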
Hotspot Prediction Based on Temporal Trajectory and Social Attributes
Proposed a location-vector embedding framework (Loc2vec) to predict geographic hotspots in a given area and explore their semantic meaning, based on temporal trajectories formed by linking users' check-ins from location-based social networks in chronological order.
Yin, H., & Liu, Y. (2017). Semantic analysis of spatial temporal trajectory in LBSNs. (in Chinese) SCIENTIA SINICA Informationis, 47(8), 1051-1065. [Link, PDF]
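The trajectory-to-training-data step can be sketched in the style of word2vec's skip-gram: check-ins are grouped per user and sorted by time, and each location is paired with the locations visited within a small window around it. The record format, window size, and function names are assumptions; the actual Loc2vec training details are not reproduced here.

```python
from collections import defaultdict


def build_trajectories(checkins):
    """Group (user, timestamp, location) check-ins by user and sort each group chronologically."""
    by_user = defaultdict(list)
    for user, ts, loc in checkins:
        by_user[user].append((ts, loc))
    return {user: [loc for _, loc in sorted(events)] for user, events in by_user.items()}


def skipgram_pairs(trajectory, window):
    """(target, context) pairs for every location within `window` steps of the target."""
    pairs = []
    for i, target in enumerate(trajectory):
        lo, hi = max(0, i - window), min(len(trajectory), i + window + 1)
        pairs.extend((target, trajectory[j]) for j in range(lo, hi) if j != i)
    return pairs


checkins = [("u1", 3, "cafe"), ("u1", 1, "home"), ("u1", 2, "gym")]  # hypothetical check-ins
trajectories = build_trajectories(checkins)           # {"u1": ["home", "gym", "cafe"]}
pairs = skipgram_pairs(trajectories["u1"], window=1)
```

Training an embedding on these pairs places locations that tend to be visited close together in time near each other in vector space.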