1 Introduction

In recent years, graph representation learning and cell profiling have emerged as powerful tools for understanding biological systems and identifying novel therapeutic strategies [1]–[5]. Combining them makes it possible to exploit rich, high-dimensional data within the framework of network medicine and to gain insight into the relationships among chemical compounds, diseases, proteins, and genes. This research focuses on applying graph representation learning to cell profiling data in order to uncover latent correlations that can advance drug discovery.

Graphs, or networks, have become a cornerstone of biomedical research because they provide a way to represent complex biological systems and associations [6], [7]. They can capture intricate relationships among entities, such as molecular interactions, protein-protein relations, and gene-disease connections. The volume of biological data and associations they hold makes them a compelling substrate for deep learning techniques, specifically graph representation learning. Their usefulness extends beyond biology, since insights can be carried over from other complex networks, such as the World Wide Web and social networks [2].

Graphs map biological entities to a set of nodes and links, with nodes representing the components of a biological system and links representing the interactions between them. Graph representation learning has emerged as a powerful approach to analyzing these networks [8], [9]. It transforms nodes and edges into vectors in a lower-dimensional space, known as embeddings. Once the graph’s structure has been mapped into this space, standard machine-learning techniques can be applied to the data.

The application of graph representation learning in biomedical research has progressed substantially in recent years. Machine learning models such as graph neural networks (GNNs) have been developed for applications ranging from modeling molecular interactions to recommendation systems. On biological and biomedical data, these techniques have shown promise in predicting protein-protein interactions, understanding gene-disease associations, and discovering new drug targets [10].

Cell profiling, on the other hand, has emerged as a complementary strategy that provides a high-resolution view of biological systems at a cellular level. It involves the comprehensive analysis of cells in terms of their physiological, morphological, and molecular characteristics, allowing for the identification of phenotypic changes associated with disease states or drug responses. By producing high-dimensional data, cell profiling captures the complexity of cellular behaviours and responses, paving the way for discovering novel biomarkers and therapeutic targets [11].

1.1 Graphs

Networks, represented as graphs, can describe many biological entities and associations, which makes them highly effective tools in biomedical research. A graph represents the components of a biological system as nodes (vertices) and the interactions or relations between them as links (edges). Graphs can be categorized into several models, such as scale-free, random, and hierarchical networks, each with distinct architectural features. These models can be analyzed mathematically through their topology and dynamics: size-dependent descriptors such as the degree, path length, and clustering coefficient quantify connectivity, navigability, and local interconnectedness, respectively [6].
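The three descriptors named above can be made concrete on a small example. The following is a minimal sketch in plain Python; the four-node toy network and all function names are illustrative assumptions, not taken from the cited work.

```python
from collections import deque

# Hypothetical toy network: an undirected graph stored as an adjacency dictionary.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def degree(g, node):
    """Number of links attached to a node (connectivity)."""
    return len(g[node])


def clustering_coefficient(g, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in g[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(g, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(g):
    """Mean shortest-path length over all pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in g:
        for t, d in shortest_path_lengths(g, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


print(degree(graph, "C"))                  # 3
print(clustering_coefficient(graph, "C"))  # 1 of 3 neighbour pairs is linked
print(average_path_length(graph))          # 8 / 6 for this toy graph
```

In this toy network, node C has the highest degree but a low clustering coefficient, because only one of its three neighbour pairs (A and B) is connected.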

1.2 Graph representation learning

Graph representation learning has recently gained prominence as an approach for extracting meaningful information from complex networks. It supports downstream tasks such as node classification, link prediction, and graph classification, all of which require learning and encoding the inherent structure and features of graph data [12].

Summary on network basics

Graph embedding methods are central to graph representation learning. These methods aim to represent nodes, edges, or entire graphs as continuous low-dimensional vectors while preserving the underlying graph structure, so that standard machine-learning techniques can then be applied to the data. Both unsupervised and supervised learning paradigms have been used to derive these embeddings. Unsupervised methods such as DeepWalk [13] and node2vec [14] use graph connectivity patterns to learn latent feature representations, while methods like GraphSAGE [15] and Graph Convolutional Networks (GCN) can additionally use node features and labels to guide the learning process [16], [17].
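To illustrate how connectivity patterns become training data for an unsupervised embedding method, the sketch below generates uniform random walks and the (centre, context) pairs on which a skip-gram model would then be trained, which is the general recipe DeepWalk follows. It is a simplified illustration only: the toy graph, function names, and parameter values are assumptions, the skip-gram training step itself is omitted, and node2vec would replace the uniform neighbour choice with a biased one.

```python
import random

# Hypothetical toy graph as adjacency lists (lists keep neighbour order fixed).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def random_walk(g, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = g[walk[-1]]
        if not nbrs:  # dead end: stop early
            break
        walk.append(rng.choice(nbrs))
    return walk


def generate_walks(g, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(g)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(g, node, length, rng))
    return walks


def context_pairs(walks, window):
    """(centre, context) pairs within `window` steps of each other on a walk."""
    pairs = []
    for walk in walks:
        for i, centre in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            pairs.extend((centre, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


walks = generate_walks(graph, walks_per_node=2, length=5)
pairs = context_pairs(walks, window=2)
print(len(walks), len(pairs))
```

Nodes that co-occur often within the window end up with similar embeddings once a skip-gram model is fitted to these pairs, which is how graph proximity is carried over into the vector space.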

Recent developments in graph neural networks (GNNs) have introduced sophisticated message-passing techniques, such as those employed by the GraphSAGE and MPNN (Message Passing Neural Network) frameworks [18]. These techniques learn node and edge representations by propagating and aggregating information from local neighborhoods. Frameworks such as ChebyNet, GAT, and Recurrent Multi-Graph Neural Networks further extend neural networks to non-Euclidean domains, enabling them to capture the intrinsic geometry and topology of the graph [19]–[21].
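The idea of propagating and aggregating information from local neighborhoods can be sketched in a few lines. The example below performs message-passing rounds with a plain mean aggregator; it deliberately omits the learned weight matrices and nonlinearities that GraphSAGE or MPNN would apply, and the path graph, feature values, and function name are illustrative assumptions.

```python
# Hypothetical path graph A - B - C with a 2-dimensional feature per node.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}
features = {
    "A": [1.0, 0.0],
    "B": [0.0, 0.0],
    "C": [0.0, 1.0],
}


def message_passing_round(g, feats):
    """One round: each node averages its own and its neighbours' feature vectors."""
    updated = {}
    for node, nbrs in g.items():
        group = [feats[node]] + [feats[n] for n in nbrs]
        dim = len(feats[node])
        updated[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return updated


h1 = message_passing_round(graph, features)  # information from 1-hop neighbours
h2 = message_passing_round(graph, h1)        # information from 2-hop neighbours

print(h1["A"])  # second component is still 0.0: A has not yet "seen" C
print(h2["A"])  # second component is now positive: C's signal arrived via B
```

Stacking rounds widens each node's receptive field by one hop per round, which is why the number of layers in a message-passing network determines how far structural information can travel.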

Among autoencoder-based approaches, models such as SDNE, DNGR, and VGAE have proved effective at learning low-dimensional embeddings from the graph’s structure without supervision [22]–[24]. These models are trained to reconstruct the original graph from the learned embeddings, in some cases building on ideas from matrix factorization and skip-gram models. Autoencoders can also be combined with graph regularization and manifold learning methods, such as Isomap, MDS, and LLE, to further improve the quality of the learned representations [25]–[27].
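The reconstruction objective can be illustrated with an inner-product decoder in the style used by VGAE, where the probability of an edge is the sigmoid of the dot product of the two node embeddings. The sketch below only scores given embeddings; the encoder, the variational (KL) term, and any training loop are omitted, and the embeddings and function names are hand-picked assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_edge_probability(z_u, z_v):
    """Inner-product decoder: sigmoid of the dot product of two embeddings."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))


def reconstruction_loss(embeddings, edges):
    """Mean binary cross-entropy between decoded and observed adjacency."""
    nodes = sorted(embeddings)
    edge_set = {frozenset(e) for e in edges}
    loss, count = 0.0, 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = decode_edge_probability(embeddings[u], embeddings[v])
            y = 1.0 if frozenset((u, v)) in edge_set else 0.0
            loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            count += 1
    return loss / count


# Hypothetical graph with a single edge A - B and an unconnected node C.
edges = [("A", "B")]

# Embeddings that place A and B together and C far away ...
good = {"A": [2.0, 0.0], "B": [2.0, 0.0], "C": [-2.0, 0.0]}
# ... versus embeddings that wrongly place A next to C.
bad = {"A": [2.0, 0.0], "B": [-2.0, 0.0], "C": [2.0, 0.0]}

print(reconstruction_loss(good, edges))
print(reconstruction_loss(bad, edges))
```

Embeddings that reproduce the observed links receive a lower loss, so minimizing this quantity pushes connected nodes together and unconnected nodes apart in the embedding space.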

Graph generation models like GCPN [28], JT-VAE [29], and GraphRNN [30] have also been developed for generating graphs with desirable properties or learning latent graph spaces. In applications like drug discovery, these models can be particularly effective for generating new chemical structures with specific characteristics.

Graph representation learning methods

Graph representation learning provides a powerful alternative to conventional deep learning techniques when dealing with complex relational data. Unlike architectures such as fully connected networks and CNNs, which expect fixed-size inputs, it is designed to capture relationships within inputs of varying size and structure. The approach is versatile: it handles various data types and can incorporate rich, multimodal information about nodes and edges. Its outputs are also unaffected by node order or labeling, because the models are invariant to graph isomorphism. This property particularly benefits graph-structured data, where node numbering is arbitrary.
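Invariance to node order can be checked directly on a toy example. The sketch below compares a sum-based graph readout, which aggregates over neighbours and nodes without reference to their labels, against a naive fixed-order flattening of the node features. The graph, feature values, relabeling, and function names are all illustrative assumptions.

```python
# Hypothetical path graph A - B - C with one scalar feature per node.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
features = {"A": 1, "B": 2, "C": 4}


def graph_readout(g, feats):
    """Sum over nodes of (own feature + sum of neighbour features)."""
    return sum(feats[n] + sum(feats[m] for m in g[n]) for n in g)


def flatten(feats):
    """Order-dependent input: features listed in sorted-label order."""
    return [feats[n] for n in sorted(feats)]


def relabel(g, feats, mapping):
    """Rename every node; the structure and feature values stay the same."""
    new_g = {mapping[n]: [mapping[m] for m in nbrs] for n, nbrs in g.items()}
    new_f = {mapping[n]: x for n, x in feats.items()}
    return new_g, new_f


# An arbitrary renaming of the same graph.
mapping = {"A": "n2", "B": "n0", "C": "n1"}
graph2, features2 = relabel(graph, features, mapping)

print(graph_readout(graph, features), graph_readout(graph2, features2))  # equal
print(flatten(features), flatten(features2))                             # differ
```

The readout is identical for both labelings, whereas the flattened vector changes, which is the kind of arbitrary variation a fixed-order input would force a conventional model to cope with.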

Furthermore, graph representation learning is efficient due to the sparse and local nature of graph data. Techniques like GraphSAGE and GCNs leverage this to perform efficient operations, even on large-scale graphs. In contrast, conventional deep learning methods might demand dense representations or significant memory resources, especially when dealing with high-dimensional data.

Graph representation learning has shown considerable potential for improving our understanding of complex networks. By combining graph theory, network diffusion, topological data analysis, and manifold learning, researchers continue to develop new approaches for analyzing and modeling graph data. As the field advances, graph-structured data is likely to become easier to interpret and exploit.

1.3 Cell profiling

Cell profiling is a powerful method in drug discovery that analyzes the cellular changes induced by various compounds. It relies on high-content microscopy imaging techniques such as Cell Painting, in which cells are stained with multiplexed fluorescent dyes and imaged to observe the effects of different substances [31].

Machine learning (ML) plays an instrumental role in cell profiling. It helps decipher the multidimensional profiles generated from image-based features, enabling researchers to identify patterns and biological activity relevant to drug discovery [32], [33]. ML has recently been incorporated into image-based profiling to improve the understanding of disease mechanisms, predict drug activity and toxicity, and elucidate mechanisms of action [34].

Graph representation learning is particularly promising in this context. It excels at capturing the intricate relationships between various entities, such as proteins or compounds, and their interactions, which is pertinent in cell profiling. This technique can deal with diverse data types and incorporate rich information for a comprehensive understanding of the underlying relationships, making it a compelling choice for improving the efficiency and accuracy of drug discovery processes.
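One simple way to bring profiling data into graph form, shown here purely as an illustrative assumption and not as the method of this work, is to link compounds whose morphological profiles are similar. The sketch below connects any two profiles whose cosine similarity reaches a threshold; the compound names, feature values, threshold, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_graph(profiles, threshold):
    """Undirected graph linking profiles with cosine similarity >= threshold."""
    names = sorted(profiles)
    g = {name: [] for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if cosine_similarity(profiles[u], profiles[v]) >= threshold:
                g[u].append(v)
                g[v].append(u)
    return g


# Hypothetical, already-normalized morphological profiles (3 features each).
profiles = {
    "compound_1": [1.0, 0.9, -0.2],
    "compound_2": [0.9, 1.0, -0.1],
    "compound_3": [-0.8, 0.1, 1.0],
}

graph = similarity_graph(profiles, threshold=0.8)
print(graph)
```

Here the two compounds with near-identical profiles become neighbours while the third stays isolated; the resulting adjacency structure could then serve as input to the embedding and message-passing methods described earlier.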

References

[1]
M. M. Li, K. Huang, and M. Zitnik, “Graph representation learning in biomedicine,” arXiv preprint arXiv:2104.04883, 2021.
[2]
M. M. Li, K. Huang, and M. Zitnik, “Graph representation learning in biomedicine and healthcare,” Nature Biomedical Engineering, pp. 1–17, 2022.
[5]
D. C. Swinney and J. A. Lee, “Recent advances in phenotypic drug discovery,” F1000Research, vol. 9, 2020.
[6]
A.-L. Barabási, N. Gulbahce, and J. Loscalzo, “Network medicine: A network-based approach to human disease,” Nature reviews genetics, vol. 12, no. 1, pp. 56–68, 2011.
[7]
R. M. Piro, “Network medicine: Linking disorders,” Human genetics, vol. 131, pp. 1811–1820, 2012.
[8]
H.-C. Yi, Z.-H. You, D.-S. Huang, and C. K. Kwoh, “Graph representation learning in bioinformatics: Trends, methods and applications,” Briefings in Bioinformatics, vol. 23, no. 1, p. bbab340, 2022.
[9]
Y.-H. Feng and S.-W. Zhang, “Prediction of drug-drug interaction using an attention-based graph neural network on drug molecular graphs,” Molecules, vol. 27, no. 9, p. 3004, 2022.
[10]
T. Pawson and R. Linding, “Network medicine,” FEBS letters, vol. 582, no. 8, pp. 1266–1270, 2008.
[11]
J. C. Caicedo et al., “Data-analysis strategies for image-based cell profiling,” Nature methods, vol. 14, no. 9, pp. 849–863, 2017.
[12]
F. Chen, Y.-C. Wang, B. Wang, and C.-C. J. Kuo, “Graph representation learning: A survey,” APSIPA Transactions on Signal and Information Processing, vol. 9, p. e15, 2020.
[13]
B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, 2014, pp. 701–710.
[14]
A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 855–864.
[15]
W. W. Lo, S. Layeghy, M. Sarhan, M. Gallagher, and M. Portmann, “E-GraphSAGE: A graph neural network based intrusion detection system for IoT,” in NOMS 2022-2022 IEEE/IFIP network operations and management symposium, 2022, pp. 1–9.
[16]
W. L. Hamilton, R. Ying, and J. Leskovec, “Representation learning on graphs: Methods and applications,” arXiv preprint arXiv:1709.05584, 2017.
[17]
R. van den Berg, T. N. Kipf, and M. Welling, “Graph convolutional matrix completion,” arXiv preprint arXiv:1706.02263, 2017.
[18]
J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in International conference on machine learning, 2017, pp. 1263–1272.
[19]
M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” Advances in neural information processing systems, vol. 29, 2016.
[21]
F. Monti, M. Bronstein, and X. Bresson, “Geometric matrix completion with recurrent multi-graph neural networks,” Advances in neural information processing systems, vol. 30, 2017.
[22]
Z. Wang, J. Li, Z. Liu, and J. Tang, “Text-enhanced representation learning for knowledge graph,” in Proceedings of international joint conference on artificial intelligence (IJCAI), 2016, pp. 4–17.
[24]
S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning graph representations,” in Proceedings of the AAAI conference on artificial intelligence, 2016, vol. 30.
[25]
A. Majumdar, “Graph structured autoencoder,” Neural Networks, vol. 106, pp. 271–280, 2018.
[27]
H. Qu, L. Li, Z. Li, and J. Zheng, “Supervised discriminant isomap with maximum margin graph regularization for dimensionality reduction,” Expert Systems with Applications, vol. 180, p. 115055, 2021.
[28]
C. Shi, M. Xu, Z. Zhu, W. Zhang, M. Zhang, and J. Tang, “GraphAF: A flow-based autoregressive model for molecular graph generation,” arXiv preprint arXiv:2001.09382, 2020.
[29]
W. Jin, R. Barzilay, and T. Jaakkola, “Junction tree variational autoencoder for molecular graph generation,” in International conference on machine learning, 2018, pp. 2323–2332.
[30]
J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, “GraphRNN: Generating realistic graphs with deep auto-regressive models,” in International conference on machine learning, 2018, pp. 5708–5717.
[31]
J. Rietdijk, T. Aggarwal, P. Georgieva, M. Lapins, J. Carreras-Puigvert, and O. Spjuth, “Morphological profiling of environmental chemicals enables efficient and untargeted exploration of combination effects,” Science of The Total Environment, vol. 832, p. 155058, 2022.
[32]
G. Tian, P. J. Harrison, A. P. Sreenivasan, J. Carreras-Puigvert, and O. Spjuth, “Combining molecular and cell painting image data for mechanism of action prediction,” Artificial Intelligence in the Life Sciences, vol. 3, p. 100060, 2023.
[33]
A. Mullard, “Machine learning brings cell imaging promises into focus,” Nature Reviews Drug Discovery, vol. 18, no. 9, pp. 653–656, 2019.
[34]
C. Scheeder, F. Heigwer, and M. Boutros, “Machine learning and image-based profiling in drug discovery,” Current Opinion in Systems Biology, vol. 10, pp. 43–52, 2018, doi: https://doi.org/10.1016/j.coisb.2018.05.004. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2452310018300027