[CS224w] Subgraphs and Motifs

728x90

Subgraphs and Mofits

Introduction and Motivation

subgraph는 network의 block과도 같다. subgraph는 network의 특징이나 구별하는데 매우 도움이 된다.

특히 많은 도메인에서 반복되는 구조는 그래프에서 어떤 기능이나 행동(function or behavior)을 결정짓는다.

Definition: Subgraph

주어진 그래프 $G=(V, E)$에 대하여 2가지 방법으로 subgraph $G'=(V', E')$ 를 정의할 수 있다.

Def 1. Node-induced subgraph

$V' \subset V$
$E' = \{ (u,v) \in E \ | \ u, v \in V' \}$
$G'$ is the subgraph of $G$ induced by $V'$
induced subgraph 라고도 불린다.

Def 2. Edge-induced subgraph

$E' \in E$
$V' = \{ v \in V \ | \ (v, u) \in E' \text{ for some } u \}$
non-induced subgraph 또는 단순히 subgraph라고 부른다. (induced-subgraph라고는 하지 않는다)

이 두가지 정의는 결국 $V' \subset V$ and $E' \in E$이며 original graph $G$의 부분집합이어야 한다.

참고로 $V'$와 $E'$가 원래 그래프가 아니라 완전히 다른 그래프의 부분집합이면 contained in이라는 표현을 사용한다.

$G_1$ is "contained in" $G2$, not as s subgraph

Graph Isomorphism and Subgraph Isomorphism

Graph isomorphism problem

두 그래프 $G_1=(V_1, E_1)$과 $G_2=(V_2, E_2)$가 동일하면(itentical) isomorphic 하다고 한다.

현재까지는 graph isomorphism algorithm 이 NP-hard인지, polynomial algorithm이 발견되었는지 알려지지않았다.

Subgraph isomorphism

$G_2$의 일부 subgraph가 $G_1$와 isomorphic하면, "$G_2$ is subgraph-isomorphic to $G_1$"이라 한다.

참고로 이 문제는 NP-hard임이 알려져있다.

아래 그림의 경우, $A-B-C$와 $X-Y-Z$가 일치하므로 subgraph isomorphic하다.

Network Motifs

반복되고, 중요한 상호연결 패턴 (recurring, significant patterns of interconnections)

Pattern: small (node-induced) subgraph
Recurring: 자주 반복되는 (Q. how to define frequency?)
Significant: (랜덤하게 생성된 그래프에 비하여) 기대보다 더 반복되는. (Q. how to define random graphs?)

Subgraph Frequency

$G_Q$는 small graph라 하고, $G_T$를 데이터셋에 있는 target graph라 하자.

Graph-level Subgraph Frequency

target graph의 node subset $V_T$ is isomorphic to $G_Q$

Node-level Subgraph Frequency

$v$는 $G_Q$의 node일 때, anchor라 부른다.

$u \in G_T$일 때, $G_T$ is isomorphic to $G_Q$ and isomorphism maps node $u$ to $v$.

Node-level Subgraph Frequency Definition

$(G_Q, v)$를 node-anchored subgraph라 부른다.

outlier에 강하다. (robust to outliers)

Motif Significance

Key Idea: random graph보다 real graph에서 더 자주 나타나는 motif는 더 중요한 functional significance를 가질 것이다.

랜덤그래프로는 다음의 2가지가 가능하다.

Erdos-Renyi (ER) random graph

$G_{n,p}$는 $n$개의 node를 갖는 임의의 그래프에 edge $(u,v)$가 생길 확률이 $p$인 랜덤그래프이다.

Configuration model

node degree sequence $k_1, k_2, \dots, k_N$을 바탕으로 랜덤하게 edge를 생성하는 것이다.

이때, double edge와 self-loop는 마지막 단계에서 나타내지 않는다.

Network Significance Profile (SP)

$Z_i$는 Motif $i$의 statistical significance를 잘 capture한다.

\[ Z_i = \cfrac{N_i^{\text{real}} - \bar{N}_i^{\text{rand}} }{\text{std}(N_i^{\text{rand}})} \]

$N_i^{\text{real}}$은 실제 그래프 $G^{\text{real}}$의 $i$번째 motif의 개수

$\bar{N}_i^{\text{rand}}$은 랜덤 그래프 $G^{\text{random}}$의 $i$번째 motif의 기댓값(평균값)

\[ SP_i = \cfrac{Z_i}{\sqrt{\sum_j Z_j^2}} \]

이때 Network SP는 $Z$-score를 정규화한 벡터이다. $SP$의 차원은 motif의 개수이다.

$SP$가 negative라면 uner-representation이고, $SP$가 positive 값이라면 over-representatio이다.

실제 다양한 domain에 대하여 SP를 구하면 위와 같다.

같은(혹은 비슷한) domain끼리는 비슷한 SP 분포를 갖는 것을 알 수 있다.

즉, SP 분포가 다르다면, 해당 두 그래프들은 서로 다른 domain일 것으로 추측할 수 있다.

Summary

Subgraph와 motif는 그래프의 building block이다.
- Subgraph isomorphism은 NP-hard이다
어떤 motif가 frequent/significant인지 확인하는 것은 해당 domain의 고유한 특징을 파악하는데 도움을 준다.
motif의 significance를 평가할 때 random graph를 사용할 수 있다.

728x90

'스터디 > 데이터사이언스' 카테고리의 다른 글

[CS246] Sampling from a Data Stream (0)	2023.12.04
[CS246] Mining Data Streams (1)	2023.12.03
[Community Detection] Girvan-Newman (GN) Algorithm (0)	2023.11.14
[CS246] Word2Vec (0)	2023.11.10
[CS246] TrustRank vs. LinkFarms (0)	2023.11.06

궁금한게많은joon

[CS224w] Subgraphs and Motifs

Subgraphs and Mofits

Introduction and Motivation

Definition: Subgraph

Graph Isomorphism and Subgraph Isomorphism

Network Motifs

Subgraph Frequency

Motif Significance

Network Significance Profile (SP)

Summary

'스터디 > 데이터사이언스' 카테고리의 다른 글

티스토리툴바

[CS224w] Subgraphs and Motifs

Subgraphs and Mofits

Introduction and Motivation

Definition: Subgraph

Graph Isomorphism and Subgraph Isomorphism

Network Motifs

Subgraph Frequency

Motif Significance

Network Significance Profile (SP)

Summary

'스터디 > 데이터사이언스' 카테고리의 다른 글

관련글

티스토리툴바