Gephi: Converting Site Link Structure into Dynamic Visualization

One popular digital humanities tool is Gephi. You can use it to quickly convert warcbase output into a dynamic, date-ordered visualization that shows how link structures have changed over time.

We have an in-depth walkthrough here. If you want to experiment with already generated data, you can find it here.

This video shows you how to take our data, generated by warcbase, and run Gephi on it.

Step One: Generate GDF Format Output

You can write directly to a Gephi-readable format by using our WriteGDF function. Here is an example script:

import org.warcbase.spark.matchbox.{ExtractDomain, ExtractLinks, RecordLoader, WriteGDF}
import org.warcbase.spark.rdd.RecordRDD._

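// Build ((crawl date, source domain, target domain), count) records:
// extract the links from each valid page, strip any leading "www.", drop links
// with an empty domain, count identical triples, and keep those seen more than 5 times.
val links = RecordLoader.loadArchives("/collections/webarchives/CanadianPoliticalParties/arc/", sc)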
  .keepValidPages()
  .map(r => (r.getCrawlDate, ExtractLinks(r.getUrl, r.getContentString)))
  .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
  .filter(r => r._2 != "" && r._3 != "")
  .countItems()
  .filter(r => r._2 > 5)

WriteGDF(links, "all-links.gdf")

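The resulting all-links.gdf file can be imported directly into Gephi. The script above timestamps each link with getCrawlDate; you may want to use getCrawlMonth instead, which fits the yyyymm date format used in Step Two.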
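To make the counting and filtering steps of the script easier to follow, here is a minimal sketch of the same logic using plain Scala collections, with no Spark or warcbase involved. The object name LinkCountSketch, the function names, and the sample values are made up for illustration; it only approximates what countItems does and does not reproduce any ordering of the results.

```scala
// Minimal sketch with plain Scala collections (no Spark, no warcbase).
// LinkCountSketch, stripWww and countLinks are hypothetical names for illustration.
object LinkCountSketch {
  // Same regex as the script above: drop leading whitespace and a leading "www."
  def stripWww(domain: String): String = domain.replaceAll("^\\s*www\\.", "")

  // Count identical (date, source, target) triples and keep those
  // that occur more than `min` times.
  def countLinks(links: Seq[(String, String, String)], min: Int): Map[(String, String, String), Int] =
    links
      .map { case (date, src, dst) => (date, stripWww(src), stripWww(dst)) }
      .filter { case (_, src, dst) => src != "" && dst != "" }
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size }
      .filter { case (_, count) => count > min }
}
```

For example, calling LinkCountSketch.countLinks with two links from example.org to example.com in the same month (one written with a "www." prefix) and min = 1 yields a single entry with a count of 2, because both links normalize to the same triple.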

Step Two: Import into Gephi

Next, load the file into Gephi. Install Gephi and make sure you are running Gephi 0.9, as it offers the widest compatibility across systems.

Open the GDF file that you just generated when prompted by the new project window (in our example, all-links.gdf). Click OK.
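For orientation when opening the file, a GDF file is plain text with a node section introduced by a nodedef> header and an edge section introduced by an edgedef> header. The sketch below builds such a file from counted link records. It is not the WriteGDF implementation: the object name GdfSketch and the weight and crawldate column names are assumptions chosen for illustration, and the columns in your own file may differ.

```scala
// Sketch of a GDF serializer. GdfSketch is a hypothetical name, and the
// weight/crawldate columns are illustrative assumptions that may differ
// from what WriteGDF actually emits.
object GdfSketch {
  def toGdf(edges: Seq[((String, String, String), Int)]): String = {
    // Every domain that appears as a source or a target becomes a node.
    val nodes = edges.flatMap { case ((_, src, dst), _) => Seq(src, dst) }.distinct.sorted
    // One line per edge: source, target, link count, crawl date.
    val edgeLines = edges.map { case ((date, src, dst), count) => s"$src,$dst,$count,$date" }
    (Seq("nodedef>name VARCHAR") ++ nodes ++
      Seq("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE,crawldate VARCHAR") ++ edgeLines)
      .mkString("\n")
  }
}
```

Opening your own all-links.gdf in a text editor and comparing it against this general shape can help diagnose import problems before you reach the Data Laboratory.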

Now open the 'Data Laboratory' panel and select the 'edges' table so that it looks like this:

[Screenshot: the 'edges' table in the Data Laboratory]

Click the 'Merge Columns' button and configure the merge as shown here:

[Screenshot: the 'Merge Columns' settings]

Make sure to parse dates as yyyymm.
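If you are unsure whether your date values fit the yyyymm pattern, the following sketch shows one way to check or convert them in Scala before importing. It assumes that getCrawlDate produces eight-digit yyyymmdd strings and getCrawlMonth produces six-digit yyyymm strings; the object name CrawlMonthSketch and both helper functions are hypothetical and not part of warcbase.

```scala
import java.time.YearMonth
import java.time.format.DateTimeFormatter

// Hypothetical helpers for illustration; not part of warcbase or Gephi.
object CrawlMonthSketch {
  // java.time pattern corresponding to yyyymm (four-digit year, two-digit month).
  private val monthFormat = DateTimeFormatter.ofPattern("yyyyMM")

  // Reduce a yyyymmdd crawl date to yyyymm by keeping the first six characters.
  // Strings that are already six characters long pass through unchanged.
  def toCrawlMonth(crawlDate: String): String = crawlDate.take(6)

  // Parse a yyyymm string; throws DateTimeParseException on malformed input.
  def parseCrawlMonth(value: String): YearMonth = YearMonth.parse(value, monthFormat)
}
```

A value that parseCrawlMonth rejects, such as a month of 13, is a sign that the date column needs attention before you merge columns in Gephi.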

The final step is to click 'nodes' in the upper left, click 'copy data to other column', select 'id', and copy it to 'label'.

[Screenshot: copying the 'id' column to 'label' in the 'nodes' table]

Step Three: Explore your Data

You can now enable a dynamic time slider at the bottom of your Gephi window.

While a full tutorial on Gephi visualization is outside the scope of this document, the following steps will help spread out the linked nodes. Select the 'Overview' tab to see a preview of the node structure. Under 'Layout', choose the 'Force Atlas' layout and run it to move the nodes apart.