|
1. INTRODUCTION
With the development of social networks, a person may have accounts on several platforms. Identifying these accounts across platforms is useful for cross-domain recommendation, link prediction [1], network dynamics [2], cyberspace security, and other research. This gives rise to the user identity alignment problem, defined as linking users with the same identity across different social platforms. User alignment is also known as user identification, anchor link prediction (ALP), profile linkage, and user identity linkage (UIL). Its purpose is to link the accounts on different social network platforms that belong to the same natural person [3].

Early work relied on network structure [4]: users with similar network structures were assumed to belong to the same natural person. Although prediction based only on following and follower relationships is feasible [5], it discards the content that users generate online. It also presupposes that the networks are consistent; since some platforms are blogs and others are video sites, network consistency cannot be fully guaranteed. Driven by the development of network embedding techniques [6-9], more studies now use network embedding methods. Some scholars have analyzed users' writing styles on social platforms [10] and treated the users with the most similar writing styles as the same natural person. Others perform user alignment by analyzing users' timestamps and location information [11]. Others have combined user behavior and network structure into user node features [12] to mitigate this loss of content, or have fused user-generated text with the network structure [13] for network alignment. Attention mechanisms have also been used for user alignment [14].
However, some of these approaches lack a unified model framework, and some do not take the global structure into account. All of them work with homogeneous networks, that is, networks with only one type of node and edge. Users in modern social networks, however, generate a large amount of content. On Twitter and Facebook, for example, there are relationships not only between users but also between users and tweets, so the whole network has more than one type of node and edge. Some scholars have also considered the user alignment problem on heterogeneous networks by reducing the dimensionality of multiple types of node features and fusing them [15]. Although this method adapts to heterogeneous networks to some extent, it loses global information.

To address these problems, this paper proposes MGUIL, a new approach to the user alignment problem on heterogeneous networks. The method combines a multi-layer graph attention mechanism with the idea of meta-paths [6,16]. The first GAT layer fuses user-generated content into the user node. Its output, which carries the features of the original content and of the user node, serves as the input to the second GAT layer, where all the meta-path fusion vectors are fused according to the network structure. The global information is obtained in this way. The same process is applied to the other network, and finally the two sets of low-dimensional node vectors are aligned. Notably, feature extraction for the second network reuses the parameters trained on the first network, which ensures that the high-dimensional nodes of both networks are mapped into the same low-dimensional space. The contributions of this paper are summarized as follows:
2. PROBLEM DEFINITION
This section defines the heterogeneous network and introduces two new vectors used in the text: the meta-path fusion vector and the global fusion vector. Finally, user identity alignment is defined.

2.1 Heterogeneous networks
A heterogeneous network is a network with more than one type of node and edge. It can be represented as $G = (V, E, T)$, where $V$ is the set of nodes, $E$ is the set of edges, and $T$ is the set of all types in the network. As an example, the network in Figure 1 is given as $G_s = (V_s, E_s, T_s)$, where $T_s = \{t_1, t_2, \cdots, t_p\}$ represents the $p$ different node types, and

2.2 Meta-path fusion vector
A meta-path is a path containing a sequence of relations, such as the relation A-P-V in Figure 1, which reads: author A publishes a paper P in journal V. The meta-path fusion vector proposed in this paper uses an attention mechanism to fuse the features of all nodes on a meta-path into the first node (usually the user node in social networks) in order to capture local information. Given a node

2.3 Global fusion vector
Given a user node

2.4 User identity alignment
Given two heterogeneous social networks $G_s = (V_s, E_s, T_s)$ and $G_g = (V_g, E_g, T_g)$ that are also known to have anchor links

3. MODELS
In this paper, we propose a unified framework, MGUIL, to solve the user identity alignment problem in heterogeneous networks. It uses a two-layer graph attention mechanism to fuse the meta-path vector and the global vector associated with each user node, respectively, and obtains the final combined representation of each user node. This section presents how the original node features are turned into the final combined representation through two layers of GAT.

3.1 Meta-path fusion vector
The meta-path fusion vector is generated by the first layer of GAT.
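The first layer operates on the nodes reached from a user along a meta-path. As a minimal sketch of how such meta-path instances could be collected, the example below assumes a toy graph with the user, tweet, and location node types of the Twitter-Foursquare data. The function name `metapath_instances`, the edge list, and the user-tweet-location meta-path are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def metapath_instances(node_type, edges, start, metapath):
    """Return all node sequences that begin at `start` and follow `metapath`.

    node_type: dict mapping node id -> type label
    edges:     iterable of undirected (u, v) pairs
    metapath:  sequence of type labels, e.g. ("user", "tweet", "location")
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # The start node must have the first type of the meta-path.
    if node_type[start] != metapath[0]:
        return []

    paths = [[start]]
    for wanted in metapath[1:]:
        # Extend every partial path by the neighbours of the required type.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(adjacency[path[-1]])
            if node_type[nxt] == wanted
        ]
    return paths


# Hypothetical toy network: two users, two tweets, one location.
node_type = {
    "u1": "user", "u2": "user",
    "t1": "tweet", "t2": "tweet",
    "l1": "location",
}
edges = [("u1", "t1"), ("u1", "t2"), ("t1", "l1"), ("u1", "u2"), ("u2", "t2")]

instances = metapath_instances(
    node_type, edges, "u1", ("user", "tweet", "location")
)
print(instances)
```

In this toy graph only tweet `t1` is attached to a location, so `u1` has a single user-tweet-location instance. The features of the nodes on such instances are what an attention layer would then fuse into the user node.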
Here we take the network $G_s$ as an example. For the feature vector of each node, as shown in Figure 2, a linear transformation with weight matrix $W \in \mathbb{R}^{f \times f'}$ is first applied, which turns the initial feature vector into a higher-dimensional vector. The attention factors between the nodes on the meta-path are then calculated. After that, the feature fusion operation can be performed: each feature on the meta-path is fused according to its attention factor following equation (3), including self-attention. To enhance the fusion of relevant features, a multi-head attention mechanism is used for the meta-path fusion vector, where $K$ is the number of attention heads.

3.2 Global fusion vector
After the first GAT layer, we obtain for each user the meta-path fusion vector, which fuses all of the user's own features. The second layer focuses on fusing the features between user-type nodes. Since the meta-path fusion vector already carries all of the user's information, the second layer amounts to a global feature fusion. The influence between user-type nodes is given by equation (5), and the final attention coefficient by equation (6). The final global fusion vector is obtained by weighting the node vectors by the calculated attention coefficients according to equation (7).

To ensure that the extracted features represent both local and global information well, the outputs of the first and second GAT layers are combined into the combined vector, as shown in Figure 2. After performing the same operation on $G_s$ and $G_g$, we can map the nodes of the two networks into a low-dimensional space and then perform user identity alignment in that space.
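The attention-weighted fusion described above can be sketched in NumPy. This is an illustration only: it assumes the standard graph attention formulation of [6] (a shared linear map, a LeakyReLU-scored attention vector, and a softmax over the nodes being fused), omits the output nonlinearity, and uses random weights. The names `attention_fuse` and `multi_head_fuse` and the toy dimensions are hypothetical, and the paper's own equations may differ in detail.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def attention_fuse(h, W, a):
    """Fuse the rows of h into row 0 with one GAT-style attention head.

    h: (n, f) features; row 0 is the user node, rows 1.. are the other
       nodes on its meta-path (row 0 itself is included: self-attention).
    W: (f, f_out) shared linear transformation.
    a: (2 * f_out,) attention vector.
    Returns the fused (f_out,) vector and the (n,) attention coefficients.
    """
    z = h @ W                                   # transformed features, (n, f_out)
    target = np.repeat(z[:1], len(z), axis=0)   # user node paired with every node
    scores = leaky_relu(np.concatenate([target, z], axis=1) @ a)
    scores = scores - scores.max()              # for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over the n nodes
    return alpha @ z, alpha


def multi_head_fuse(h, Ws, As):
    """Concatenate the outputs of K independent attention heads."""
    return np.concatenate([attention_fuse(h, W, a)[0] for W, a in zip(Ws, As)])


rng = np.random.default_rng(0)
f, f_out, K = 8, 4, 3                 # toy sizes, not the paper's settings
h = rng.normal(size=(5, f))           # one user node plus four meta-path nodes
Ws = [rng.normal(size=(f, f_out)) for _ in range(K)]
As = [rng.normal(size=2 * f_out) for _ in range(K)]

fused, alpha = attention_fuse(h, Ws[0], As[0])
meta_path_vector = multi_head_fuse(h, Ws, As)
print(alpha.sum(), meta_path_vector.shape)
```

With K heads concatenated, the fused vector has K times the per-head output dimension. A second layer of the same form, applied over user-type neighbours instead of meta-path nodes, would play the role of the global fusion.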
It is worth noting that in some cases the node types in the two networks do not coincide. In that case, one should align to the network with more node types and initialize the feature vectors of the missing node types in the other network to $f$-dimensional all-zero vectors.

3.3 User alignment model
Based on the above two operations, the high-dimensional nodes of the two networks can be mapped to the same low-dimensional space. At this point, we can determine whether two final combined vectors belong to the same natural person based on their similarity or distance. The already existing anchor links are used as supervision. Here $d$ is a distance function; the Chebyshev distance is used in this paper to calculate the distance between the meta-path fusion vectors and between the global fusion vectors of the two nodes, respectively.

4. EXPERIMENT
4.1 Datasets
Twitter-Foursquare is a pair of heterogeneous networks whose node types include users, tweets, and geographic locations [17, 18]. Foursquare is a platform that encourages mobile phone users to share information such as their current location with others. The details of this dataset are listed in Table 1.

Table 1. Twitter-Foursquare dataset.
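The distance-based matching of Section 3.3 can be illustrated with a short sketch. It assumes that each user is already represented by a single embedding vector and simply ranks candidate users in the other network by Chebyshev distance. The helper names `chebyshev` and `rank_candidates` and the toy vectors are hypothetical, and the sketch does not reproduce how the paper weights the meta-path and global distances.

```python
import numpy as np


def chebyshev(x, y):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return float(np.max(np.abs(np.asarray(x) - np.asarray(y))))


def rank_candidates(source_vec, target_vecs):
    """Rank target users by Chebyshev distance to one source user.

    Returns the candidate indices, nearest first, and their distances.
    """
    dists = np.max(np.abs(target_vecs - source_vec), axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]


# Toy embeddings: one user in the first network, three in the second.
source = np.array([0.2, 0.5, 0.1])
targets = np.array([
    [0.9, 0.5, 0.1],
    [0.25, 0.45, 0.1],
    [0.2, 0.1, 0.1],
])

order, dists = rank_candidates(source, targets)
print(order.tolist(), dists.round(2).tolist())
```

The nearest candidate (index 1 here) would be the predicted counterpart, and the full ranking is what ranking-based metrics such as Precision@k are computed from.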
4.2 Baselines
To evaluate the performance of the proposed MGUIL, we compare our framework with the following state-of-the-art methods:
4.3 Comparison of experimental results
In the experiments, the hyperparameters of the proposed MGUIL are set to w = 0.6, K = 3, f' = 256, and f'' = 128. The baseline methods are configured as in their original papers. Table 2 shows the performance of each method under the evaluation metrics Precision@k (P@k) and MAP [19].

Table 2. Twitter-Foursquare dataset.
Note: * marks the best-performing method.
From Table 2, it can be found that:
4.4 Hyperparameter setting experiment
After the comparison with the baselines, we vary the hyperparameters of MGUIL to evaluate their effect on the results and to find the best settings. w and 𝜆 are hyperparameters that balance the influence of the meta-path fusion vector and the global fusion vector on the final combined vector. Figure 3a shows that MGUIL performs best at w = 0.6, which indicates that there is a balance point between the two vectors and that the meta-path fusion vector contributes somewhat more to the combined vector than the global fusion vector.

Figure 3. Effect of the balance factor w, the number of attention heads K, and the embedding dimension on the results.

Since the multi-head attention mechanism is introduced in the first GAT layer, the number of attention heads K also affects the final result. As Figure 3b shows, the model performs best at K = 4. A small K may lead to incomplete feature fusion that fails to extract deeper information, while a large K may introduce too much noise during fusion and reduce the accuracy of the features. The choice of embedding dimension determines the complexity of the latent space. In this paper, we choose 128 as the final dimension, and Figure 3c shows that better results are obtained at 128 dimensions.

5. SUMMARY
In this paper, we propose MGUIL, a model for user identity alignment in heterogeneous networks. It uses a two-layer attention mechanism: the first layer fuses all the features of the user nodes themselves, and the second layer fuses the global network structure through the following relationships between users.
Finally, the results of the two GAT layers are combined and fed into the supervised identity alignment model, which uses known anchor nodes to find the pair of combined vectors with the smallest distance in the low-dimensional embedding space. We test the model on real online social platforms, and the results are ahead of existing methods.

ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61572445).

REFERENCES
[1] Zhang, J., Yu, P. S. and Zhou, Z., “Meta-path based multi-network collective link prediction,” KDD '14, 1286–1295 (2014).
[2] Zafarani, R. and Liu, H., “Users joining multiple sites: friendship and popularity variations across sites,” Information Fusion 28, 83–89 (2016). https://doi.org/10.1016/j.inffus.2015.07.002
[3] Chen, B. and Chen, X., “A survey on user alignment across social networks,” Journal of Xihua University 40, 11–26 (2021).
[4] Wang, D., Cui, P. and Zhu, W., “Structural deep network embedding,” KDD '16, 1225–1234 (2016).
[5] Zafarani, R. and Liu, H., “Connecting corresponding identities across communities,” International AAAI Conf. on Web and Social Media, 354–357 (2009).
[6] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR '18, 1–12 (2017).
[7] Perozzi, B., Al-Rfou, R. and Skiena, S., “Deepwalk: Online learning of social representations,” KDD '14, 701–710 (2014).
[8] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J. and Mei, Q., “LINE: Large-scale information network embedding,” WWW '15, 1067–1077 (2015).
[9] Chu, X., Fan, X., Yao, D., Zhu, Z., Huang, J. and Bi, J., “Cross-network embedding for multi-network alignment,” WWW '19, 273–284 (2019).
[10] Goga, O., Lei, H., Friedland, G., Sommer, R. and Teixeira, R., “Exploiting innocuous activity for correlating users across sites,” WWW '13, 447–458 (2013).
[11] Riederer, C., Kim, Y., Chaintreau, A., Korula, N. and Lattanzi, S., “Linking users across domains with location data: Theory and validation,” WWW '16, 707–719 (2016).
[12] Liu, L., Cheung, W. K., Li, X. and Liao, L., “Aligning users across social networks using network embedding,” IJCAI '16, 1774–1780 (2016).
[13] Liu, S., Wang, S., Zhu, F., Zhang, J. and Krishnan, R., “HYDRA: Large-scale social identity linkage via heterogeneous behavior modeling,” SIGMOD '14 Int. Conf. on Management of Data, 51–62 (2014).
[14] Li, X., Shang, Y. and Cao, Y., “Type-aware anchor link prediction across heterogeneous networks based on graph attention network,” AAAI Conf. on Artificial Intelligence, 147–155 (2020). https://doi.org/10.1609/aaai.v34i01.5345
[15] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P. and Yu, P. S., “Heterogeneous graph attention network,” WWW '19, 2022–2032 (2019).
[16] Dong, Y., Chawla, N. V. and Swami, A., “Metapath2vec: Scalable representation learning for heterogeneous networks,” KDD '17, 135–144 (2017).
[17] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR '18, 1–12 (2017).
[18] Zhang, J. and Yu, P., “Integrated anchor and social link predictions across social networks,” IJCAI '15, 2125–2132 (2015).
[19] Zhou, F., Liu, L., Zhang, K., Trajcevski, G., Wu, J. and Zhong, T., “Deeplink: A deep learning approach for user identity linkage,” in IEEE Conf. on Computer Communications, 1313–1321 (2018).
|