SOCIAL MEDIA AND WEB ANALYTICS
LECTURE 1: SOCIAL NETWORK ANALYSIS
Introduction to SNA
Social Networks are collections of human or non-human objects, which may or may not be
connected. The objects are called actors, nodes or vertices. The connections between two nodes are
called social relationships, ties or edges. These can be directed or undirected. A soccer match is
usually directed for example. There are different types of social networks.
A unipartite network consists of only one type of node, for example all the nodes are students. A
bipartite network consists of two types of nodes, for example nodes of students and nodes of
different group assignments. These can then be linked and show which students share the same
group assignments as with other students. The bipartite network can be expanded to an x amount of
different nodes, this is then called a multipartite network/graph.
A network has five characteristics, captured by the Tukey’s five-
number summary:
- Size: The number of elements in the network
- Density: The proportion of observed ties in a network,
compared to the maximum amount of ties
This is a ratio that can range from 0 to 1. The calculation for
density differs depending on whether it is directed or undirected
L 2L
D= U=
k∗( k−1) k∗(k−1)
L is the amount of ties and k the number of nodes.
- Components: A subgroup in which all nodes are connected
- Diameter: Gives the degree of compactness of the network, it is the longest of all shortest
paths, across all pair of nodes. (SLIDES)
- Clustering coefficient: Clustering is the tendency to from closed triangles, the clustering
coefficient, also called transitivity, is the percentage of closed triangles over all open and
closed triangles
Full and restricted networks
Say A, B, C, D and E, talk about their own friends. There is
restricted information in this network, because we do not know if
there are any connections between Russ, Robert, John… These
restricted networks occur a lot in SMWA. To get the full network
information, we need to make an adjacency matrix A /
sociomatrix. This matrix provides insights into the connections
between different elements based on the information stored in the rows. You multiply the original
network matrix X with the transpose of matrix X. The adjacency matrix is easy to interpret, however it
can become very large and very sparse
1
,The way networks are plotted is usually done with force directed placement algorithms. This just
means that the vertices are placed/drawn close to each other, but not too close to ensure readability.
The thickness of edges can be used to denote the strength of each relationship.
Network Characteristics
Global Network Characteristics
Density or connectance tells you how dens your network is.
Denoted as the letter p, with M being number of edges and N
being the number of nodes:
Transitivity is the number of triangles divided by the number of
triplets that are connected with at least 2 edges. If the transitivity is
higher than the density, then there is a strong local connectivity.
Local (Individual) Characteristics
Geodesic is another name for the shortest path between two
nodes.
Degree d of a vertex v is the number of edges neighboring to this
vertex.
Maximum degree = the number of Nodes (N) – 1. The normalized degree is the degree of the node
divided by the maximum degree.
If a vertex has a large degree, this is called a hub.
Closeness c is the distance to every other vertex in the network. With g
being the geodesic path between vertex x and i.
Betweenness b of a vertex v is the fraction of shortest paths between any
pair of vertices, s and t, that pass through v. (VB SLIDES)
Graph theoretic center is the vertex that has the smallest maximum
distance to all other vertices. Usually the most central node in the network.
Network Clustering
Network clustering is the partitioning of a network into densely connected groups, these groups only
share a few other edges with other groups. Communities is a subgraph with a high number and more
intensive relationships among the members than a random subgraph.
Graph partitioning splits the network into a fixed number of clusters, usually done by an algorithm.
Common algorithm is the Girvan-Newman algorithm. It is a method for community detection in
networks. It systematically removes edges from the network, resembling hierarchical clustering.
Algorithm:
1. Calculate the betweenness of all edges in the full network
2. Remove the edge with the highest betweenness
3. Recalculate the betweenness of the affected edges
4. Repeat 1 and 2 till there are no remaining edges
2
, In community mining, a drawback of cluster partitioning, and clustering in general, is the necessity to
predefine clusters. However, evaluating the quality of these clusters poses a challenge. To address
this, we employ two main measures:
- Trace: This involves tracing the evolution of the clustering process to understand how the
communities form and evolve over time.
- Modularity (Q): This metric quantifies the quality of the clustering solution by assessing the
density of connections within clusters compared to connections between clusters. Higher
modularity values indicate better-defined communities within the network.
Homophily
Homophily, a concept rooted in sociology, encapsulates the idea that "birds of a feather flock
together." Essentially, individuals tend to associate more closely with others who are similar to
themselves.
A homophilic network is characterized by nodes that share a certain attribute being more connected
to other nodes with the same attribute. This concept is exemplified by situations where individuals
with similar characteristics or behaviors tend to form connections with each other. For instance, in
marketing, if churners (customers likely to discontinue their service) are more connected to other
churners within the network, it indicates a homophilic pattern.
In contrast, a heterophilic network would demonstrate connections between nodes with different
attributes.
Dyadicity is a metric used to assess the tendency of nodes with similar characteristics to connect with
each other within a network. It compares the actual number of edges between nodes of the same
label to the expected number if connections were distributed randomly. (SLIDES)
REPRESENTATION LEARNING
Representation Learning involves the automatic creation of feature representation of the data.
Representation Learning helps to get rid of the tedious work that is feature creation with the help of
automatically learning features.
It starts of with converting the graph into a low dimensional feature vector, which incorporates the
relational and structural characteristics of the network in question. The mapping aims to ensure that
nodes with similar attributes or characteristics have embeddings (vector representations) that are
positioned closely together in the vector space. Similarity can be measured by the means of the dot
product of vectors. The encoding of the vectors follow a certain procedure:
1. Define an encoder function: This function maps nodes to a d-dimensional vector space.
The encoder assigns each node a unique vector representation.
Shallow encoding represents the simplest and most commonly used encoder in graph representation
learning. This method employs a straightforward embedding lookup matrix, analogous to a dictionary.
In this setup, each node is associated with a unique vector representation through the embedding
lookup matrix, denoted as 𝒁, with d rows (representing the embedding size) and V columns
(corresponding to the total number of nodes). The indicator vector, 𝒗, selects the appropriate column
from 𝒁 for each node.
The primary objective in shallow encoding is to optimize the embedding lookup matrix 𝒁 during the
learning process. This optimization aims to ensure that the resulting embeddings capture meaningful
3
The benefits of buying summaries with Stuvia:
Guaranteed quality through customer reviews
Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.
Quick and easy check-out
You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.
Focus on what matters
Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!
Frequently asked questions
What do I get when I buy this document?
You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.
Satisfaction guarantee: how does it work?
Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller LLEO. Stuvia facilitates payment to the seller.
Will I be stuck with a subscription?
No, you only buy these notes for $7.68. You're not tied to anything after your purchase.