Samenvatting

Samenvatting - Social Media and Web Analytics (F000799)

0 keer verkocht

Instelling
Universiteit Gent (UGent)

Samenvatting van het vak Social Media and Web Analytics gegeven door professor Matthias Bogaert, jaar . Samengevat op basis van slides + lessen.

[Meer zien]

Voorbeeld 3 van de 21 pagina's

Bekijk voorbeeld

Geupload op 25 maart 2024
Aantal pagina's 21
Geschreven in 2023/2024
Type Samenvatting

€7,16

In winkelwagen

Opslaan

100% tevredenheidsgarantie
Direct beschikbaar na je betaling
Lees online óf als PDF
Geen vaste maandelijkse kosten

SOCIAL MEDIA AND WEB ANALYTICS
LECTURE 1: SOCIAL NETWORK ANALYSIS
Introduction to SNA

Social Networks are collections of human or non-human objects, which may or may not be
connected. The objects are called actors, nodes or vertices. The connections between two nodes are
called social relationships, ties or edges. These can be directed or undirected. A soccer match is
usually directed for example. There are different types of social networks.

A unipartite network consists of only one type of node, for example all the nodes are students. A
bipartite network consists of two types of nodes, for example nodes of students and nodes of
different group assignments. These can then be linked and show which students share the same
group assignments as with other students. The bipartite network can be expanded to an x amount of
different nodes, this is then called a multipartite network/graph.

A network has five characteristics, captured by the Tukey’s five-
number summary:

- Size: The number of elements in the network
- Density: The proportion of observed ties in a network,
compared to the maximum amount of ties

This is a ratio that can range from 0 to 1. The calculation for
density differs depending on whether it is directed or undirected

L 2L
D= U=
k∗( k−1) k∗(k−1)
L is the amount of ties and k the number of nodes.

- Components: A subgroup in which all nodes are connected
- Diameter: Gives the degree of compactness of the network, it is the longest of all shortest
paths, across all pair of nodes. (SLIDES)
- Clustering coefficient: Clustering is the tendency to from closed triangles, the clustering
coefficient, also called transitivity, is the percentage of closed triangles over all open and
closed triangles

Full and restricted networks

Say A, B, C, D and E, talk about their own friends. There is
restricted information in this network, because we do not know if
there are any connections between Russ, Robert, John… These
restricted networks occur a lot in SMWA. To get the full network
information, we need to make an adjacency matrix A /
sociomatrix. This matrix provides insights into the connections
between different elements based on the information stored in the rows. You multiply the original
network matrix X with the transpose of matrix X. The adjacency matrix is easy to interpret, however it
can become very large and very sparse

1

,The way networks are plotted is usually done with force directed placement algorithms. This just
means that the vertices are placed/drawn close to each other, but not too close to ensure readability.
The thickness of edges can be used to denote the strength of each relationship.

Network Characteristics

Global Network Characteristics

Density or connectance tells you how dens your network is.
Denoted as the letter p, with M being number of edges and N
being the number of nodes:

Transitivity is the number of triangles divided by the number of
triplets that are connected with at least 2 edges. If the transitivity is
higher than the density, then there is a strong local connectivity.

Local (Individual) Characteristics

Geodesic is another name for the shortest path between two
nodes.

Degree d of a vertex v is the number of edges neighboring to this
vertex.

Maximum degree = the number of Nodes (N) – 1. The normalized degree is the degree of the node
divided by the maximum degree.

If a vertex has a large degree, this is called a hub.

Closeness c is the distance to every other vertex in the network. With g
being the geodesic path between vertex x and i.

Betweenness b of a vertex v is the fraction of shortest paths between any
pair of vertices, s and t, that pass through v. (VB SLIDES)

Graph theoretic center is the vertex that has the smallest maximum
distance to all other vertices. Usually the most central node in the network.

Network Clustering

Network clustering is the partitioning of a network into densely connected groups, these groups only
share a few other edges with other groups. Communities is a subgraph with a high number and more
intensive relationships among the members than a random subgraph.

Graph partitioning splits the network into a fixed number of clusters, usually done by an algorithm.
Common algorithm is the Girvan-Newman algorithm. It is a method for community detection in
networks. It systematically removes edges from the network, resembling hierarchical clustering.

Algorithm:

1. Calculate the betweenness of all edges in the full network
2. Remove the edge with the highest betweenness
3. Recalculate the betweenness of the affected edges
4. Repeat 1 and 2 till there are no remaining edges

2

, In community mining, a drawback of cluster partitioning, and clustering in general, is the necessity to
predefine clusters. However, evaluating the quality of these clusters poses a challenge. To address
this, we employ two main measures:

- Trace: This involves tracing the evolution of the clustering process to understand how the
communities form and evolve over time.
- Modularity (Q): This metric quantifies the quality of the clustering solution by assessing the
density of connections within clusters compared to connections between clusters. Higher
modularity values indicate better-defined communities within the network.

Homophily

Homophily, a concept rooted in sociology, encapsulates the idea that "birds of a feather flock
together." Essentially, individuals tend to associate more closely with others who are similar to
themselves.

A homophilic network is characterized by nodes that share a certain attribute being more connected
to other nodes with the same attribute. This concept is exemplified by situations where individuals
with similar characteristics or behaviors tend to form connections with each other. For instance, in
marketing, if churners (customers likely to discontinue their service) are more connected to other
churners within the network, it indicates a homophilic pattern.

In contrast, a heterophilic network would demonstrate connections between nodes with different
attributes.

Dyadicity is a metric used to assess the tendency of nodes with similar characteristics to connect with
each other within a network. It compares the actual number of edges between nodes of the same
label to the expected number if connections were distributed randomly. (SLIDES)

REPRESENTATION LEARNING
Representation Learning involves the automatic creation of feature representation of the data.
Representation Learning helps to get rid of the tedious work that is feature creation with the help of
automatically learning features.

It starts of with converting the graph into a low dimensional feature vector, which incorporates the
relational and structural characteristics of the network in question. The mapping aims to ensure that
nodes with similar attributes or characteristics have embeddings (vector representations) that are
positioned closely together in the vector space. Similarity can be measured by the means of the dot
product of vectors. The encoding of the vectors follow a certain procedure:

1. Define an encoder function: This function maps nodes to a d-dimensional vector space.
The encoder assigns each node a unique vector representation.

Shallow encoding represents the simplest and most commonly used encoder in graph representation
learning. This method employs a straightforward embedding lookup matrix, analogous to a dictionary.
In this setup, each node is associated with a unique vector representation through the embedding
lookup matrix, denoted as 𝒁, with d rows (representing the embedding size) and V columns
(corresponding to the total number of nodes). The indicator vector, 𝒗, selects the appropriate column
from 𝒁 for each node.

The primary objective in shallow encoding is to optimize the embedding lookup matrix 𝒁 during the
learning process. This optimization aims to ensure that the resulting embeddings capture meaningful

3

Dit zijn jouw voordelen als je samenvattingen koopt bij Stuvia:

Bewezen kwaliteit door reviews

Studenten hebben al meer dan 850.000 samenvattingen beoordeeld. Zo weet jij zeker dat je de beste keuze maakt!

In een paar klikken geregeld

Geen gedoe — betaal gewoon eenmalig met iDeal, Bancontact of creditcard en je bent klaar. Geen abonnement nodig.

Focus op de essentie

Studenten maken samenvattingen voor studenten. Dat betekent: actuele inhoud waar jij écht wat aan hebt. Geen overbodige details!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper LLEO. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €7,16. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 65646 samenvattingen verkocht

Opgericht in 2010, al 15 jaar dé plek om samenvattingen te kopen

Start met verkopen