Estimation of Global Network Statistics from Incomplete Data [arxiv]

Catherine A. Bliss, Christopher M. Danforth, Peter Sheridan Dodds

Simulated networks

We generate unweighted, undirected networks with N=2×10^5 nodes and average degree k_avg=10 according to four known topologies:

  • Erdős-Rényi random graphs with a Poisson degree distribution
  • Scale-free random graphs with a power-law degree distribution: each new node attaches d=5 new links upon entering the network
  • Small world networks: rewiring probability, p=0.1
  • Range dependent networks, λ=0.9, α=1.
We created these networks using the CONTEST Toolbox for Matlab.
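
The relationship between N, k_avg and the number of edges can be checked with a small sketch. The Python below is not the CONTEST code used for the networks above; it is an illustrative stand-in that samples a fixed-edge-count random graph (the G(n, m) variant of the Erdős-Rényi model) at a reduced size. The function names, the seed and the scaled-down N=2000 are assumptions made for the example. An undirected network with N nodes and average degree k_avg has M = N·k_avg/2 edges, so the full-size networks have roughly 10^6 edges.

```python
import random


def gnm_random_graph(n, m, seed=None):
    """Sample m distinct undirected edges (no self-loops) among n nodes."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u = rng.randrange(n)
        v = rng.randrange(n)
        if u != v:
            # Store each undirected edge once, smaller label first.
            edges.add((min(u, v), max(u, v)))
    return edges


def average_degree(n, edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * len(edges) / n


# Hypothetical scaled-down setting: N = 2000 instead of 2 x 10^5.
N = 2000
K_AVG = 10
M = N * K_AVG // 2  # number of edges needed for average degree K_AVG

edges = gnm_random_graph(N, M, seed=0)
avg = average_degree(N, edges)
```

The same bookkeeping applies to the scale-free case: if each arriving node adds d=5 links, the network ends with about N·d edges, which again gives an average degree of about 2d=10.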

Empirical networks

We examine six well-known empirical data sets:
  • C. elegans nematode neural network available from www-personal.umich.edu/~mejn/netdata. Please cite: D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
  • Airline route map from the OAG database available from J. Bagrow. Please cite: O. Woolley-Meza, D. Grady, C. Thiemann, J. Bagrow and D. Brockmann, PLoS ONE 8(8): e69829 (2013).
  • Zachary's karate club available from www-personal.umich.edu/~mejn/netdata. Please cite: W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
  • Dolphin social network available from www-personal.umich.edu/~mejn/netdata. Please cite: D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396-405 (2003).
  • Condensed matter author collaboration network available from www-personal.umich.edu/~mejn/netdata. Please cite: M. E. J. Newman, Proc. Natl. Acad. Sci. 98, 404-409 (2001).
  • Power grid network representing the topology of the Western States Power Grid of the U.S.A., available from https://wiki.gephi.org/index.php/Datasets. Please cite: D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).

Twitter reply networks

We apply our techniques to Twitter reply networks. These networks are constructed from tweets we collected via Twitter's gardenhose API service between September 9, 2008 and November 17, 2008. Each network is weighted and directed: the (i,j) entry of the adjacency matrix is the number of replies directed from node i to node j. Note that node labels are not consistent from week to week. For example, the individual represented by node 1 in Week 1 is not the same individual represented by node 1 in Weeks 2, 3 and so forth. Each network is presented as a Matlab (.mat) file: