Volume 1, Issue 5, April 26, 1996

Submitted March 5, 1996
Resubmitted March 29, 1996
Accepted April 3, 1996
Publication Date: April 26, 1996

AN ALGORITHM TO GENERATE CONNECTED GRAPHS*

      John Skvoretz
      University of South Carolina

ABSTRACT

   Connected graphs hold special interest for network exchange 
researchers because all experimentally-studied networks are connected 
graphs. Competing theoretical proposals differ in the predictions they 
make for some sets of connected graphs, but not for other sets. Yet 
researchers do not have at their disposal any catalogue of connected 
graphs for even relatively small-sized groups. In this article, I 
review the theoretical basis for an algorithm that generates all 
connected graphs of a particular size and provide a pseudo-code version 
of the algorithm itself. I then address some considerations regarding 
its use in the network exchange field.

                             [page 43] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 44]

INTRODUCTION

   All of the exchange networks studied experimentally in the field of 
network exchange research are connected graphs, wherein the nodes are 
positions and the lines are exchange relations (i.e., relations in 
which agreement on the terms of exchange produces mutual benefit). 
Further, all positions are connected to one another either directly or 
indirectly. This connectedness of the set of positions is essential 
since the very nature of network exchange research focuses on how 
indirect connections affect the relative power of actors who are 
directly connected to one another. The graphs vary in size, with six 
positions being the largest for which published data exist (Skvoretz 
and Willer 1993).

   Network exchange research presently harbors several contending 
theories of power distribution. Evaluating these theories involves many 
considerations, including parsimony in the assumptive base, breadth of 
coverage, predictive precision, and embeddedness in accepted 
explanatory frameworks (e.g., rational choice theory). One important 
consideration is empirical accuracy: does one proposal make more 
accurate predictions than the others? For many graphs, the contenders 
make very similar predictions and so they cannot be used to assess 
accuracy. Rather, research must use those graphs (if any) for which the 
alternatives make different predictions. Currently, discovery of such 
candidates is a hit-or-miss proposition as there exists no accessible 
catalogue of connected graphs with which to systematically search for 
"critical" networks.

   In addition to the evaluation of empirical accuracy, a catalogue of 
connected graphs could aid in the development of particular approaches, 
as can be exemplified by Lovaglia, Skvoretz, Markovsky, and Willer 
(1995). These authors use a series of "thought experiments" to refine 
the Graph Power Index (GPI) approach first proposed by Markovsky, 
Willer, and Patton (1988): Refinements are proposed to solve 
problematic networks and then evaluated by inspecting the predictions 
offered for other networks of similar size or configuration. Only those 
refinements that offer sensible predictions are retained for 
empirically-based evaluation. Hence, a catalogue of connected graphs is 
needed grist for the "thought experiment" mill.

                             [page 44] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 45]

ALGORITHMS FOR GENERATING GRAPHS

   The problem of generating all connected graphs for a set p of points 
is a combinatorial problem. One procedure for generating combinatorial 
configurations of a certain type is called the "classical method" by 
Read (1978). In this procedure, there are two lists: one of old 
configurations and one of new configurations. Potential candidates for 
the new list are created by performing an operation on an old 
configuration, e.g., adding a node and an edge to one of the points in 
the old configuration. The candidate is added to the new list only if 
it is not already present. Doing this, however, requires checking the 
candidate against each of the items already present on the new list, a 
time-consuming process insofar as there are usually many isomorphic 
versions of any item on the new list, and each of these versions must 
be detected and discarded. For instance, the three networks in Figure 1 
are isomorphic (i.e., they exhibit the same structure). Therefore, on 
any list of connected graphs of size 4, only one of them can appear. As 
the size of the graph and the number of edges grow, the combinatorial 
possibilities explode.

        A - B        B - C        A - C
         \ /          \ /          \ /
          C            A            B
          |            |            |
          D            D            D

      Figure 1.  Three isomorphic graphs.

   Read (1978) proposes a general algorithm that does not require 
looking back over the new list to detect and discard duplicates. In 
this procedure, as each candidate is produced it can be immediately 
tested for inclusion on the new list by simply inspecting the 
configuration itself. Hence, Read titles his paper "Every One a Winner" 
because the candidate is added to the new list only if it is a winner. 
A principle advantage of this algorithm is that the list of new 
elements can be stored externally since searches of it are never 
required.

   Read's "orderly" algorithm may be described as follows. We must 
first specify which of all possible isomorphic configurations will 
represent its class in the list of configurations. This particular 
configuration is called the "canonical configuration." The center graph 
in Figure 1 is the canonical configuration for its isomorphism class. 
Canonicity is defined by representing the upper right triangle of the 
network's adjacency matrix as a binary integer, concatenating the first 
row of this triangle to the second, both to the third, and so on. The 
canonical configuration for the class is the one that produces the 
largest binary integer or code. The codes for the networks in Figure 1 
are 110101, 111100, and 110110, the second one being the largest.

                             [page 45] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 46]

   Second, we must specify an ordering of the list of canonical 
configurations. The binary integers corresponding to each canonical 
configuration form a logical basis for ordering the list. In fact, we 
arrange the list in descending order by these integers.

   Finally, we must define an augmenting operation by which one element 
from an old list can produce an ordered sequence of potential elements 
on a new list in the order they would appear on the new list if they 
were found to be acceptable candidates. The operation we use takes the 
binary code of a canonical configuration of p nodes and q edges and 
produces potential candidates for the list of p nodes and q+1 edges as 
follows. The binary code of the input configuration is scanned from 
right to left to find the rightmost non-zero element. If the code is 
111100, this is the fourth element. Then the candidate configurations 
are created by changing in turn each of the trailing 0's to 1, 
proceeding from left to right. Thus, from 111100 we obtain two 
candidates, 111110 and 111101.

   The algorithm involves processing an old list of canonical 
configurations, taking each element in turn, and creating potential 
candidates for the new list. For each candidate, the algorithm must 
discover if it is a canonical configuration by inspecting permutations 
of rows and columns in the underlying adjacency matrix for an 
equivalent configuration with a larger code value. If this value ever 
equals or exceeds the code value of the last configuration added to the 
new list, the candidate is not a canonical configuration and processing 
can move on to the next candidate. If the configuration is canonical 
and its code value is less than the new list minimum, it is a "winner," 
allowing it to be added directly to the new list and the new list 
minimum to be updated.

   The program that actually generates all connected graphs of a given 
size is presented in pseudo-code form in Table 1. Note that it 
generates all graphs whether connected or not and then retains only 
those that are connected. The algorithm is not effective if its 
operation is confined to just connected graphs. There are connected 
graphs with q+1 edges whose code number is greater than other connected 
graphs with q+1 edges but whose production from the ordered list of 
canonical connected graphs with q edges will occur after the production 
of those graphs with lower code numbers. Consequently, they will never 
be added to the new list. However, they will be produced before these 
other graphs if the augmentation operation applies over all graphs, 
connected or not.

                             [page 46] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 47]

input nodes
initialize file "g00" 
        with bit-string of 0's {length =[nodes(nodes-1)]/2}
for edges = 0 to [nodes(nodes-1)]/2 - 1
  open file "g{nodes}{edges}" for input as #1
  open file "g{nodes}{edges+1}" for output as #2
  open file "g{nodes}{edges+1}c" for output as #3
  initialize current minimum code value
  while not end-of-file #1
    read string from #1
    find rightmost non-zero element of string
    for each trailing zero
      change to 1 and create new adjacency matrix
      for each permutation of rows and columns 
        compare binary number of permuted adjacency matrix with current
                minimum and if greater, try next trailing zero
        compare binary number of permuted adjacency matrix with current
                maximum and if greater, save as new maximum
    set current minimum =3D maximum
    write maximum to #2
    if connected, write maximum to #3
  wend
close #1, #2, #3
end

Table 1: Pseudo-code for an orderly algorithm 
         to generate connected graphs

    Even though the orderly algorithm is relatively more efficient than 
the "classical" approach, it still becomes time consuming as the number 
of nodes increases. On a 90mhz Pentium it will take about 2 hours to 
generate all the connected graphs over 7 nodes, where one finds 853 
non-isomorphic connected graphs. Over 8 nodes, there are 11,118 
connected graphs. Consequently, generating the catalogue of these 
graphs takes literally days of CPU time. Thus, it is unreasonable for 
each researcher to generate his or her own catalogue of connected 
graphs. Instead, the catalogue, once generated, should be made widely 
accessible. Consequently, I supply an Appendix which catalogues all 
connected graphs from size 2 to size 7 and can supply upon request the 
catalogue of size 8 connected graphs. A Sun SparcStation is currently 
at work (on a part-time basis since December) generating all size 9 
connected graphs.

                             [page 47] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 48]

CONCLUSION

   In addition to advancing theoretical inquiry, the catalogue of 
connected graphs can standardize nomenclature for experimental 
networks. For instance, the graph in Figure 2 is called "KITE" in the 
work of Markovsky, Skvoretz, Willer and colleagues (Markovsky, 
Skvoretz, Willer, Lovaglia, and Erger 1993; Skvoretz and Willer 1993) 
and "HOURGLASS" in the work of Bienenstock and Bonacich (1993). The 
catalogue provides a canonical name for this graph: namely, the number 
of nodes, followed by the number of edges, followed by its location in 
the canonical ordering. Thus the graph in Figure 2 has the canonical 
name "p5e6.2" because it is the second graph in the canonical listing 
of all connected graphs with 5 nodes or points and 6 edges.

                    B - C
                     \ /
                      A
                     / \
                    D - E

      Figure 2: The connected graph p5e6.2


   The catalogue also provides a canonical way of labeling the nodes of 
an entry. One simply assigns letters to nodes in alphabetical order and 
then builds the adjacency matrix as specified by the bit string, thus 
standardizing how graphs are drawn for research reports.

   Both uses of the catalogue become more salient as the size of the 
networks investigated becomes larger. Investigators from different 
approaches can more easily communicate with one another over 
problematic graphs since, in switching to a standardized naming system, 
we avoid "pet" names that reflect accidental circumstances surrounding 
an approach's discovery. 

  Thus, the actual use of the catalogue appears to depend on (a) 
maintaining theoretical heterogeneity in exchange network research, (b) 
extending empirical and theoretical investigation to relatively large 
networks (say, 7, 8 and 9 node networks) and (c) making accessible an 
organized catalogue of connected graphs.

  The track record of the network exchange research field over the last 
10 years suggests condition (a) will hold for the foreseeable future. 
The present paper guarantees the satisfaction of condition (c). And the 
internal dynamic of theoretical development via conjecture and 
refutation pushes the explanatory envelope in the direction consistent 
with condition (b). Only empirical extension may lag for the usual 
budgetary reasons.

                             [page 48] 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                             [page 49]

ENDNOTE

*Reviewers provided helpful comments and suggestions. Address 
correspondence to: skvoretz-john@sc.edu

REFERENCES

Bienenstock, E.J. and P. Bonacich. 1993. "Game-Theory Models for 
Exchange Networks: Experimental Results." Sociological Perspectives 36: 
117-35.

Lovaglia, M., J. Skvoretz, B. Markovsky, and D. Willer. 1995. 
"Assessing Fundamental Power Differences in Exchange Networks: 
Iterative GPI." Current Research in Social Psychology 1(2): 8-15, 
http://www.uiowa.edu/~grpproc.

Markovsky, B., J. Skvoretz, D. Willer, M. Lovaglia and J. Erger. 1993. 
"The Seeds of Weak Power: An Extension of Network Exchange Theory." 
American Sociological Review 58: 197-209.

Markovsky, B., D. Willer, and T. Patton. 1988. "Power Relations in 
Exchange Networks." American Sociological Review 53: 220-36.

Read, R.C. 1978. "Every One a Winner or How to Avoid Isomorphism Search 
When Cataloguing Combinatorial Configurations." Annals of Discrete 
Mathematics 2: 107-120.

Skvoretz, J. and D. Willer. 1993. "Exclusion and Power: A Test of Four 
Theories of Power in Exchange Networks." American Sociological Review 
58: 801-18.

AUTHOR BIOGRAPHY

John Skvoretz is a Carolina Distinguished Professor of Sociology at the 
University of South Carolina. Current research projects concern formal 
models of social networks, status and participation in discussion 
groups, power in exchange networks, and theoretical studies of action 
structures. He is the editor of Connections, the bulletin of the 
International Network for Social Network Analysis and an Associate 
Editor of the Journal of Mathematical Sociology.
Appendix
Back to Journal Cover Page