Core User Guide: Key Concepts

Terminology

Throughout the library we use the terms

  • node as a synonym for vertex and
  • edge as a generic term for arc, line or hyperedge
  • element to cover nodes and edges.

When dealing elements, we distinguish between

  • outer elements that you create and pass to the graph to be included. The types of the outer elements correspond to the type parameters of your graph, so outer nodes have the type N, outer edges have the type E[N].
  • inner elements that are created by the graph on addition internally. They wrap the outer elements to enrich them by contextual information. For instance an inner node also knows about the neighbours of the outer node it wraps. Inner elements have path-dependent types: Given a graph g, inner nodes have the type g.NodeT while inner edges have the type g.EdgeT. For a mnemonic remember that we call them "inner" elements because their type is an inner class.

For brevity, we call a graph

  • generic if its edge type is a class with a parametrized type for nodes like DiEdge[Int]. DiEdge is generic with respect to the type of nodes it may connect.
  • typed if its edge type is a class with no type parameter like Flight. Flight is typed with respect to the type of nodes it connects, namely Airport.

Mutable and immutable graphs

Both a mutable and an immutable implementation are at your disposal.

Unlike with standard collections it is not meaningful to use the immutable implementation by default because, unfortunately, its underlying data structure is not yet persistent. This means you need to carefully consider the run-time characteristics of the immutable implementation along with best practices.

Types of graphs

Graphs are defined by

trait Graph[N, E <: Edge[N]]
  • N   the type of nodes
  • E   the type of edges.

Based on the four edge categories

  • directed
  • undirected
  • directed hyperedge
  • hyperedge

we support

  • directed graphs
  • undirected graphs
  • directed hypergraphs
  • hypergraphs
  • mixed graphs with arbitrary sets of edge types
  • multigraphs with arbitrary multiplicity between nodes, and also
  • graphs with loops.

While nodes are labeled by nature, edges may be labeled or not.

You control the type of the graph by determining the edge type. You can use the provided edge classes to create generic graphs. However, it is often beneficial to roll out your own ADT of edges to constrain edge usage with respect to which nodes they are allowed to connect and what kind of labels they have. This will make your graphs more type safe, hence the term typed graph.

Graphs are not constrained other than by the type parameters N and E[N]. This means that, currently, you yourself need to ensure that you graph which is, for instance, supposed to represent a DAG, is really a DAG.

Relation to property graphs

Graph for Scala is superior to the popular Property Graph Model in that

  1. you can not only associate nodes and edges with properties but, by rolling out your specific ADT of edges, you have full control over which nodes may be connected by what kind of edges. Of course, every edge type in your ADT, usually represented by a case class, will go with its distinct set of "properties".
  2. In contrast to representing properties by generic maps of the type Map[String, Any], your graph becomes type safe thanks to the ability to accept your specific edge classes with properly typed fields.

As an example, you can properly represent RDF with Graph for Scala while the Property Graph model is less suitable for this purpose.

Edge classes

While all possible non-labeled as well as weighted edge classes are available out of the box, for labeled edges you need to define your edge class yourself. This is pretty straightforward based on the provided abstract classes and mix ins. You will find plenty of examples paving your way.

Weighted edges are just specific labeled ones so why did we provide them? We did so because

  • with weights having the label type Double you now have ready-made edge classes for this use case
  • they also serve as an example how to fine tune edge classes.

To achieve light-weight, unified syntax, it's a good idea to also add infix constructor and extractor support to your edge classes. Mix ins and examples help you to also get this right in minutes.

Finally you might wonder why there is no direct support for labeled edges. The reason is that

  • apart from weights, label types depend on your use case. It would be awkward to expect you to use label: Any instead of facilitating the definition of your edge classes with properly named and typed fields fitting your needs.
  • If we added a third type parameter for edge labels, what is also an option, the graph still would not know about different labels for different edges in a mixed graph on type level. A type parameter for labels would only help in case of graphs that contain edges with a single type of label.
  • When having a label type parameter, it would also be necessary to add new editing methods or change the signature of the existing graph editing methods to allow for passing labels. This would increase complexity.

Edge Equality

Since the edge set of a graph is represented by a hash set internally, edge equality determines if a specific edge may be added to the graph. Edge equality is implemented in a transparent way, so you only need to take care of it when dealing with multigraphs.

Take a look at the following examples to get a basic idea about edge equality:

Comparing with yields
1 ~ 2 == 2 ~ 1 true
1 ~> 2 == 2 ~> 1 false
1 ~ 2 == 1 ~> 2 false

Hyperedge equality

Looking at the endpoints of a hyperedge respectively the sources or targets of a directed hyperedge, equality is more weird. For instance, would you say that HyperEdge(1, 2, 2) is equal to HyperEdge(2, 2, 1) or not?

It turns out that, depending on the use case, both propositions may be desirable to hold. For this reason Graph for Scala supports two kinds of hyperedge equality semantics:

  • With the default bag semantics, the above hyperedges are equal.
  • When using the ordered semantics, they are distinct.

Some more examples:

Comparing with having semantics of yields
HyperEdge(1, 2, 2) == HyperEdge(2, 2, 1) bag true
HyperEdge(1, 2, 2) == HyperEdge(2, 2, 1) ordered false
DiHyperEdge(sources = Several(1, 2), targets = Several(2, 3, 2)) == DiHyperEdge(sources = Several(1, 2), targets = Several(2, 2, 3)) bag true
DiHyperEdge(sources = Several(1, 2), targets = Several(2, 3, 2)) == DiHyperEdge(sources = Several(1, 2), targets = Several(2, 2, 3)) ordered false
DiHyperEdge(sources = Several(1, 2), targets = Several(2, 3, 2)) == DiHyperEdge(sources = Several(2, 3, 2), targets = Several(1, 2)) bag false
DiHyperEdge(sources = Several(1, 2), targets = Several(2, 3, 2)) == DiHyperEdge(sources = Several(2, 3, 2), targets = Several(1, 2)) ordered false