Saturday, June 28, 2014

Learning to Graph with D3

I've been meaning to play with D3 for quite some time. Maybe the only thing stopping me was the seemingly overwhelming number of features available in the library. There's a definite learning curve involved in getting into the right mindset to use D3. However, once you get there, it's actually really easy to use and, surprisingly, does a lot of the heavy lifting for you. As you navigate through the documentation, you need to separate the basics from the advanced features. If you need a static graph, you only need to learn a subset of the library to be successful. Once you get that foundation in place, you can build up to transitions, interactivity, and layouts.

For my first foray into the world of D3, I chose to keep things as simple as possible and focus on deconstructing the multivariate example graph to isolate the basic components of a graph. I was really interested in what was actually happening at each step and how the pieces fit together. Below, I'll iteratively build the example one feature at a time to get to the final product. I consider each step a basic component of any graph, so you should be able to take the pieces and assemble them to rapidly start your own "first" graph.

To start, you need to include D3 on your page. The library is hosted on the D3 site, so, for testing, it's really easy to get up and running:

  <script src="http://d3js.org/d3.v3.min.js"></script>


Define the Plot Area

Next, we need a canvas on which to draw our graph. Based on the example, the setup is probably the same across most graphs. You define a total size and add some margins to reserve space for axis labels and titles. The remaining area is the actual plot space:


var margin = {top: 20, right: 20, bottom: 30, left: 50},
    width = 600 - margin.left - margin.right,
    height = 400 - margin.top - margin.bottom;

var svg = d3.select("body").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .style("background-color", "green")
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

svg.append( "rect" )
   .attr( "width", width)
   .attr( "height", height)
   .attr( "fill", "blue");



Which yields the following:



I filled each region to clearly identify the different spaces. The overall SVG canvas is the green and blue space. The plotting space is actually the blue rectangle. All our data will be mapped into this coordinate space to render the graph.
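The margin arithmetic can be checked on its own. This is a minimal sketch in plain JavaScript (no D3 required); plotSize is a hypothetical helper name used only for illustration, not part of D3:

```javascript
// Hypothetical helper: compute the inner plot size from an outer size and margins.
function plotSize(outerWidth, outerHeight, margin) {
  return {
    width: outerWidth - margin.left - margin.right,
    height: outerHeight - margin.top - margin.bottom
  };
}

var margin = {top: 20, right: 20, bottom: 30, left: 50};
var size = plotSize(600, 400, margin);

console.log(size.width, size.height); // 530 350
```

So the blue rectangle above is 530 by 350 pixels, offset 50 pixels from the left and 20 pixels from the top of the green canvas.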

Define Coordinate Mapping Functions

The next step is to define how our data will be mapped into the plotting space. D3 offers several types of scales that can be utilized to map from the data domain into the plot range. The terminology is important because you use specific functions on the scale classes to define how to map between the data and plot spaces:


var x = d3.scale.linear()
    .domain([1, 10])
    .range([0, width]);

var y = d3.scale.linear()
    .domain([5, 25])
    .range([height, 0]);



Here, I've defined a mapping function for my x-axis and y-axis. For simplicity, I've manually set the input domains for x and y. The next question to answer is: what are x and y? They are both functions and objects. Most helpers in D3 generate a function and attach methods to it; those methods close over the function's internal state and offer a way to customize its behavior. The end result lets you very naturally define different helper functions that behave in a consistent way. There are several types of scale functions in D3 that offer different methods of mapping data into the plot area. When you call x above with 3, it returns roughly 117.8 (two ninths of the 530-pixel plot width), which means our point should be plotted about 117.8 pixels right of the left edge of that blue rectangle.
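To make the "function and object" idea concrete, here is a rough sketch of the pattern in plain JavaScript. It is not D3's implementation, and linearScale is a hypothetical name; it only mimics the getter/setter style of domain and range, assuming the 530-pixel plot width from earlier:

```javascript
// Hypothetical stand-in for d3.scale.linear(); illustrates the pattern only.
function linearScale() {
  var domain = [0, 1],
      range = [0, 1];

  // The generated function maps a data value into the plot range.
  function scale(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  }

  // Methods attached to the function read or update its private state.
  scale.domain = function (d) {
    if (!arguments.length) return domain;
    domain = d;
    return scale; // returning the function allows chaining
  };

  scale.range = function (r) {
    if (!arguments.length) return range;
    range = r;
    return scale;
  };

  return scale;
}

var x = linearScale().domain([1, 10]).range([0, 530]);

console.log(typeof x);    // "function"
console.log(x(1), x(10)); // 0 530
console.log(x(3));        // roughly 117.78
```

Because each setter returns the scale function itself, the calls chain the same way the D3 snippets above do.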

Another important thing to note is how the y function specifies height as the lower bound and zero as the upper bound. This is necessary because SVG plots increasing y values downward on the page but our graphs plot increasing values up on the graph. The flip creates an inverse relationship so our graph plots as expected. To see the difference, let's quickly define a scatter plot:


var valX = [1, 3, 7, 9],
    valY = [7, 11, 20, 25];



Now, using the [height, 0] relationship,



And now, flipping it to [0, height] but using the same X/Y values:
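The effect of the flip can also be seen in the numbers rather than the plots. The sketch below uses a hypothetical mapLinear helper in place of a D3 scale and assumes the 350-pixel plot height defined earlier:

```javascript
// Hypothetical helper: linear interpolation from a domain into a range.
function mapLinear(value, domain, range) {
  var t = (value - domain[0]) / (domain[1] - domain[0]);
  return range[0] + t * (range[1] - range[0]);
}

var height = 350,
    valY = [7, 11, 20, 25];

// Range [height, 0]: larger values get smaller pixel offsets (higher on the page).
var flipped = valY.map(function (v) {
  return mapLinear(v, [5, 25], [height, 0]);
});

// Range [0, height]: larger values get larger pixel offsets (lower on the page).
var unflipped = valY.map(function (v) {
  return mapLinear(v, [5, 25], [0, height]);
});

console.log(flipped);   // [315, 245, 87.5, 0]
console.log(unflipped); // [35, 105, 262.5, 350]
```

With the [height, 0] range, the largest value (25) lands at pixel offset 0, the top of the plot area, which is what a conventional graph expects.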



Typically, you won't hard code values into the scale's domain. Instead, you'll calculate the domain from the data dynamically. However, you can't actually do that until the data is loaded. I've seen the definition of the X/Y scales split in two: the scale is created before the data loads, with its range set from the predefined plot area, and then, once the data is loaded, the definition is completed by computing the domain:

  
var x = d3.scale.linear()
    .range([0, width]);

var y = d3.scale.linear()
    .range([height, 0]);

 ...

d3.tsv( "data.tsv", function( error, data ) {

/*
  pretend data looks like this:

  var data = [
    { x: 1, y: 7  },
    { x: 3, y: 11 },
    { x: 7, y: 20 }, 
    ...
  ];
*/

  // Finish defining the scales based on the loaded dataset
  x.domain(d3.extent(data, function(d) { return d.x; }));
  y.domain(d3.extent(data, function(d) { return d.y; }));

  ...
});
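For reference, d3.extent returns a two-element [min, max] array. A rough plain-JavaScript equivalent, using a hypothetical extent function and the pretend data from the comment above, might look like this (it is simplified and does not skip undefined or NaN values the way the library version does):

```javascript
// Hypothetical stand-in for d3.extent: returns [min, max] of the accessed values.
function extent(data, accessor) {
  var values = data.map(accessor);
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}

var data = [
  { x: 1, y: 7  },
  { x: 3, y: 11 },
  { x: 7, y: 20 }
];

console.log(extent(data, function (d) { return d.x; })); // [1, 7]
console.log(extent(data, function (d) { return d.y; })); // [7, 20]
```

That pair is exactly the shape the scale's domain method expects.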


Being new, I kept running into examples that left out the original definitions of the X/Y functions and only showed the part that set the domain. While it may make sense to split these since the data is more likely to change than the size of your plot, keeping them together for the purpose of learning made more sense to me so I could easily see the whole definition in one spot.

Now, if you've been following along with the example I was using as my starting point, you'll notice the x-axis is a time scale because the input data has a date component. The data format in the file is also a bit different since it contains two y-axis components. The setup for this data is very similar to the above case:

  
/*
Pretend data looks like this:

date high low
20120701 62.2 56.3
20120702 58.8 54.0
20120703 60.4 52.3
20120704 58.8 53.6

Once loaded with d3.tsv and converted (the loader hands back every field as a
string), you'll have an array of objects:

  var data = [
    { date: new Date(2012, 6, 1), high: 62.2, low: 56.3 },
    { date: new Date(2012, 6, 2), high: 58.8, low: 54.0 },
    { date: new Date(2012, 6, 3), high: 60.4, low: 52.3 },
    ...
  ];
*/

var x = d3.time.scale()
    .domain(d3.extent(data, function(d) { return d.date; }))
    .range([0, width]);

var y = d3.scale.linear()
    .domain([d3.min(data, function(d) { return d.low; }), d3.max(data, function(d) { return d.high; })])
    .range([height, 0]);
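One detail worth spelling out: a TSV loader delivers every field as a string, so the dates and temperatures need converting before the scales can use them. D3 v3 offers d3.time.format for the date parsing; the hand-written parseDate below is a hypothetical stand-in so the sketch runs without the library:

```javascript
// Hypothetical helper: turn a "YYYYMMDD" string into a local-time Date.
function parseDate(s) {
  return new Date(+s.slice(0, 4), +s.slice(4, 6) - 1, +s.slice(6, 8));
}

// Rows as they would arrive from a TSV loader: every value is a string.
var rows = [
  { date: "20120701", high: "62.2", low: "56.3" },
  { date: "20120702", high: "58.8", low: "54.0" }
];

rows.forEach(function (d) {
  d.date = parseDate(d.date);
  d.high = +d.high; // unary plus coerces the string to a number
  d.low = +d.low;
});

console.log(rows[0].date.getMonth()); // 6 (July; months are zero-based)
console.log(rows[1].low);             // 54
```

After this step, the date field holds real Date objects for the time scale and the high/low fields hold numbers for the linear scale.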



We'll use those x/y functions next to draw the area on the graph.

Drawing Things

So far, we've not drawn anything useful on our graph. Everything has been about setting things up so we can draw something. The great thing about D3 is that you can use it to draw anything in the DOM. It's not limited to SVG or the SVG helper functions defined in the library. In the examples above, I just used SVG circle primitives to create the points on the graphs, and used D3 to map the points onto the graph via the X/Y scaling functions I defined. The purpose of the SVG classes included in D3 is to provide some additional features related to processing the data set. Not only do they make drawing the shapes easier, they also offer options to add interpolation and to project data onto a polar coordinate system.

The example I was following used an svg.area class to plot the data onto the graph. The area will take the entire data set and build one path object to add to the SVG canvas. The area is composed by defining an area function and describing how to calculate each point in the path by iterating over each item in the data set. Our data contains a date for the x-axis and the low/high values for two different y-axis coordinates. Each of these values must pass through our X/Y scale functions to map from the data space into the plot space:

  
var area = d3.svg.area()
    .x(function(d) { return x(d.date); })
    .y0(function(d) { return y(d.low); })
    .y1(function(d) { return y(d.high); });



If you just take the array of data points defined above and pass this to the area function defined above,

  // See what area returns ....
  area( data )


you'll get a string representing a SVG path:

  M0,347.65886287625415L6.496350364963504,323.5785953177258L12.992700729927009,323.5785953177258L19.489051094890513
  ...


That's exactly what we want to draw on our graph, so let's plug that into an svg.append call to create the path and generate the plot:

  
  svg.append("path")
     .style("fill", "rgb(127,201,127)")
     .attr("d", area(data) );
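For a sense of what a path string like that encodes, here is a simplified sketch of building an area outline by hand. It is not D3's implementation and handles only straight-line segments; areaPath is a hypothetical name, and the points are assumed to be in pixel coordinates already (that is, already passed through the x and y scales):

```javascript
// Hypothetical helper: trace the top edge left to right, then the
// bottom edge right to left, and close the shape.
function areaPath(points) {
  var top = points.map(function (p) { return p.x + "," + p.y1; });
  var bottom = points.slice().reverse().map(function (p) {
    return p.x + "," + p.y0;
  });
  // M = move to, L = line to, Z = close path
  return "M" + top.join("L") + "L" + bottom.join("L") + "Z";
}

var points = [
  { x: 0,  y0: 10, y1: 5 },
  { x: 10, y0: 12, y1: 4 }
];

console.log(areaPath(points)); // "M0,5L10,4L10,12L0,10Z"
```

The single closed outline is why one path element is enough to fill the whole band between the low and high values.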





Adding Axis Labels

A shape with no context provides no meaning, so it's time to add some labels. Another set of helper functions makes it easy to generate the labels:

  
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom");

var yAxis = d3.svg.axis()
    .scale(y)
    .orient("left");

  svg.append("g")
      .call(xAxis);

  svg.append("g")
      .call(yAxis);



And results in this graph:



I left out all styling to see exactly what I'm getting from the axis functions. Notice how the axes are added to the graph canvas: the call method is used on the selection, with the axis function passed in. The call method invokes the passed-in function with the current selection as its first argument. In this case, the axis function draws a bunch of SVG objects into the SVG group appended just before invoking call. So unlike the area function above, which returned a string that was set as the definition of a path object, the axis functions generate multiple SVG nodes and append them to the canvas.
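The call pattern itself is small enough to sketch in plain JavaScript. Everything here is a hypothetical miniature, not D3's code: makeSelection and fakeAxis are made-up names, and unlike a real D3 selection, this append returns the same object rather than the new child:

```javascript
// Hypothetical miniature "selection" with a D3-style call method.
function makeSelection(name) {
  var selection = {
    name: name,
    children: [],
    append: function (tag) {
      selection.children.push(tag);
      return selection;
    },
    // call invokes fn with this selection as the first argument,
    // then returns the selection so chaining can continue.
    call: function (fn) {
      fn(selection);
      return selection;
    }
  };
  return selection;
}

// Stand-in for an axis generator: it adds several nodes to whatever it is given.
function fakeAxis(selection) {
  selection.append("g.tick").append("g.tick").append("path.domain");
}

var group = makeSelection("g").call(fakeAxis);

console.log(group.children); // ["g.tick", "g.tick", "path.domain"]
```

The function passed to call does its work through side effects on the selection, which is the same division of labor as the axis example above.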

Since there are only some basic controls on how the labels are generated, you'll probably want to style them after D3 generates them for you. First, you might want the x-axis on the bottom. This requires a transform to move it to the bottom of the graph:

  svg.append("g")
      .attr("transform", "translate(0," + height + ")")
      .call(xAxis);


Next, you can target the SVG content generated by D3 and style it to your liking. Classes are added to the content generated by the axis function so you can define styles to target those elements. This is a snippet of the markup generated by the xAxis I defined above:

<g transform="translate(0,450)">

<g class="tick" style="opacity: 1;" transform="translate(38.68613138686132,0)">
<line y2="6" x2="0"/>
<text y="9" x="0" dy=".71em" style="text-anchor: middle;">May 27</text>
</g>

<g class="tick" style="opacity: 1;" transform="translate(92.84671532846716,0)">
<line y2="6" x2="0"/>
<text y="9" x="0" dy=".71em" style="text-anchor: middle;">Jun 10</text>
</g>

<path class="domain" d="M0,6V0H530V6"/>
</g>


You can see there are elements for each tick mark, the tick label, and the axis line. All of these can be styled by tag or class:

.axis path,
.axis line {
  fill: none;
  stroke: #000;
  shape-rendering: crispEdges;
}


The .axis class was defined in the original example, and I left it intact since it makes sense to further identify the specific groups that hold the axis content and which axis they represent:

  svg.append("g")
      .attr("class", "x axis")
      .attr("transform", "translate(0," + height + ")")
      .call(xAxis);

  svg.append("g")
      .attr("class", "y axis")
      .call(yAxis);


With those changes, our graph now looks like this:



One additional tweak I made was to define how to calculate which ticks to render. To make the example fit inside the bounds of my blog post, I had to reduce the width, which resulted in the x-axis labels getting jumbled together. I originally was just going to rotate them, but figured I'd learn how that aspect of the axis function worked:


var xAxis = d3.svg.axis()
    .scale(x)
    .ticks(d3.time.weeks, 2)
    .orient("bottom");



Since this is a time scale, we use one of the handy time helper functions, which lets you describe the interval and step between ticks (here, every two weeks). Technically, the ticks function on the axis just stores its arguments and passes them through to the equivalent scale.ticks when the axis is drawn. Calling scale.ticks directly shows the values the axis will use:


var x = d3.time.scale()
    .domain(d3.extent(data, function(d) { return d.date; }))
    .range([0, width]);

// Returns an array of dates, one every two weeks; the scale itself is unchanged
var tickValues = x.ticks(d3.time.weeks, 2);



Note that scale.ticks is not a chainable setter like domain and range: it returns an array of dates rather than the scale, so it can't be folded into the scale definition. Configuring the ticks on the axis also feels more natural, since this is the function generating the visual representation of that data.

The only elements missing from the graph now are the axis titles and a graph title. There's no library function to do this work, but you can use D3 to select, append, and position SVG text objects. I left those out since they really just fall under normal DOM manipulation akin to what you might do in jQuery. The only real difference is how you position the SVG elements. From the example, this is how the temperature title was added:

  svg.append("text")
      .attr("transform", "rotate(-90)")
      .attr("y", 6)
      .attr("dy", ".71em")
      .style("text-anchor", "end")
      .text("Temperature (ºF)");


Nothing D3 specific here, other than that the svg variable comes from the d3.select("body").append("svg") chain earlier in the code (so it actually refers to the translated g group inside the SVG element).

So that covers my first dive into D3 and how I broke apart each piece to learn the basics of each component. The example I worked with uses what seem to be the basic building blocks of any static graph. Granted, there are many types of graphs and many types of scales, but when you construct one with D3, you'll start with your scales for each axis, one or more drawing functions, and labeling helpers, and then shape your data into the form required by those different tools. The latter part is only possible if you understand what all the former pieces expect as inputs and, subsequently, the form of the generated output. After that, it's a matter of mixing and matching the various options to compose a graph.