Friday, May 24, 2013

Transforming Results from the Yahoo Contacts API to a Backbone Collection

Continuing along with my last post about the nice features of Underscore, I decided to document my transformation of the Yahoo Social API contacts result set. As flexible as it may be for enabling syncing of copies of a user's address book, it's not especially usable in a Backbone collection. Here's an example contact from the response body:



{
  "uri" : "...",
  "id" : 132,
  "isConnection" : false,
  "created" : "2013-05-21T15:05:16Z",
  "updated" : "2013-05-21T15:05:56Z",
  "categories" : [ { "name" : "Work" } ],
  "fields" : [
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ ],
      "id" : 228,
      "type" : "nickname",
      "updated" : "2013-05-21T15:05:16Z",
      "uri" : "...",
      "value" : "Frank"
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ ],
      "id" : 229,
      "type" : "email",
      "updated" : "2013-05-21T15:05:16Z",
      "uri" : "...",
      "value" : "frankyb@yahoo.com"
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ ],
      "id" : 230,
      "type" : "email",
      "updated" : "2013-05-21T15:05:16Z",
      "uri" : "...",
      "value" : "peasandcarrots@gmail.com"
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ "MOBILE" ],
      "id" : 231,
      "type" : "phone",
      "updated" : "2013-05-21T15:05:16Z",
      "uri" : "...",
      "value" : "333-555-1212"
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ "HOME" ],
      "id" : 232,
      "type" : "phone",
      "updated" : "2013-05-21T15:05:16Z",
      "uri" : "...",
      "value" : "333-444-1212"
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:16Z",
      "editedBy" : "OWNER",
      "flags" : [ ],
      "id" : 227,
      "type" : "name",
      "updated" : "2013-05-21T15:05:56Z",
      "uri" : "...",
      "value" : {
        "familyName" : "Haberhorn",
        "familyNameSound" : "",
        "givenName" : "Franklin",
        "givenNameSound" : "",
        "middleName" : "",
        "prefix" : "",
        "suffix" : ""
      }
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:56Z",
      "editedBy" : "OWNER",
      "flags" : [ ],
      "id" : 233,
      "type" : "address",
      "updated" : "2013-05-21T15:05:56Z",
      "uri" : "...",
      "value" : {
        "city" : "Lake Mary",
        "country" : "United States",
        "countryCode" : "US",
        "postalCode" : "30555",
        "stateOrProvince" : "MI",
        "street" : "PO Box 4657848"
      }
    },
    {
      "categories" : [ ],
      "created" : "2013-05-21T15:05:56Z",
      "editedBy" : "OWNER",
      "flags" : [ "WORK" ],
      "id" : 234,
      "type" : "address",
      "updated" : "2013-05-21T15:05:56Z",
      "uri" : "...",
      "value" : {
        "city" : "Orlando",
        "country" : "United States",
        "countryCode" : "US",
        "postalCode" : "30555",
        "stateOrProvince" : "FL",
        "street" : "555 Rose Ct"
      }
    }
  ]
}




I purposely created a contact with multiple addresses, phone numbers, and emails to help visualize and test the mapping. My goal is to take that object and turn it into the following:





{
  "uri" : "...",
  "id" : 132,
  "created" : "2013-05-21T15:05:16Z",
  "updated" : "2013-05-21T15:05:56Z",
  "isConnection" : false,
  "categories" : [ "Work" ],
  "address" : [
    {
      "flags" : [ ],
      "value" : {
        "city" : "Lake Mary",
        "country" : "United States",
        "countryCode" : "US",
        "postalCode" : "30555",
        "stateOrProvince" : "MI",
        "street" : "PO Box 4657848"
      }
    },
    {
      "flags" : [ "WORK" ],
      "value" : {
        "city" : "Orlando",
        "country" : "United States",
        "countryCode" : "US",
        "postalCode" : "30555",
        "stateOrProvince" : "FL",
        "street" : "555 Rose Ct"
      }
    }
  ],
  "email" : [
    {
      "flags" : [ ],
      "value" : "frankyb@yahoo.com"
    },
    {
      "flags" : [ ],
      "value" : "peasandcarrots@gmail.com"
    }
  ],
  "name" : {
    "familyName" : "Haberhorn",
    "familyNameSound" : "",
    "givenName" : "Franklin",
    "givenNameSound" : "",
    "middleName" : "",
    "prefix" : "",
    "suffix" : ""
  },
  "nickname" : "Frank",
  "phone" : [
    {
      "flags" : [ "MOBILE" ],
      "value" : "333-555-1212"
    },
    {
      "flags" : [ "HOME" ],
      "value" : "333-444-1212"
    }
  ]
}





I want each field type to become a key in the object, referencing the assigned value. Where a field repeats with different values, I want the single key to reference an array of values. Downstream, my model will know how to map this object into several child collections that maintain the multiple items and set a primary version to use in lists and searches. Most likely, I'll add another layer to flatten/rename things to better match my model. For now, though, I'm just focusing on a rough normalization of the data.

The following function will transform the parsed JSON response into the desired format:



function transform( data ) {

  return (
    _.map(data.contacts.contact, function (rec) {

      return (

        /*
         * For each contact, build a new object hash from the
         * data returned by the API
         */
        _.extend(

          /* Start with the top-level keys; leave the fields and categories for the next two steps */
          _.omit(rec, "fields", "categories"),

          /*
           * Handle each object in the fields array.
           * Since the "type" field can repeat for different values,
           * they can't just be plucked out of the collection. Instead,
           * they need to be grouped and mapped into either an array of
           * values or just the single value.
           */
          _.chain(rec.fields)

            /*
             * Results in an object whose keys are the unique set of types
             * (field names) in the fields array, and each key's value is an
             * array of the original field objects.
             * output: { type1: [ { field_object } ], type2: [ { field_object }, { field_object }, ... ], ... }
             */
            .groupBy(function (f) { return f.type; })

            /*
             * Build key/value array pairs, detecting multiple values as output
             * by the groupBy. If there is more than one, the value of the key
             * will be an array of hashes with the flags (which sometimes
             * distinguish multiple values) and the actual value. If only one,
             * the key will hold the value directly (flat, no array).
             * output: [ [ type, value ], [ type, [ { flags: [ ... ], value: ... }, ... ] ], ... ]
             */
            .map(function (v, k) {
              return [ k, v.length > 1 ? _.map(v, function ( o ) { return _.pick(o, "flags", "value"); }) : v[0].value ];
            })

            /* Take the array of key/value pairs and create the object */
            .object()
            .value(),

          /* Add a key with an array of category names */
          { categories: _.pluck(rec.categories, "name") })
      );
    })
  );
}




Now you can call this from jQuery's deferred.done() after making the request. For testing, I saved a call to the API into a local file and used $.get() to retrieve it. The result is stored in a global variable so I could easily inspect it in Firebug:



$(function() {

  $.get('yahoo-contacts.json')
    .done(function (data) {
      YAHOO_CONTACTS = transform(JSON.parse(data));
    });
});



To use the data in a Backbone collection, the transform call is probably best placed in the Collection.parse method. Simply returning the result from transform will cause Backbone to pass each normalized object to Model.parse as it instantiates each model. The model can then define its own parse method to perform further processing:



var Contact = Backbone.Model.extend({

  parse: function ( data ) {
    // More work ...

    return data;
  }

});

var YahooContacts = Backbone.Collection.extend({

  model: Contact,
  parse: transform,
  url: 'yahoo-contacts.json'

});

/*
 * Now load all the data ...
 */
$(function() {
  list = new YahooContacts();
  list.fetch();
});
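
As a rough illustration of that further processing, parse might flatten the name and pick a primary email for display. This is only a sketch; displayName and primaryEmail are naming assumptions of mine, not part of the API response:

var Contact = Backbone.Model.extend({

  parse: function ( data ) {
    // Hypothetical derived attribute: a friendly display name built from
    // the name field, falling back to the nickname
    data.displayName = data.name ?
      data.name.givenName + ' ' + data.name.familyName :
      (data.nickname || '');

    // Hypothetical derived attribute: treat the first email as primary.
    // Remember the transform leaves a lone email as a flat string.
    data.primaryEmail = _.isArray(data.email) ? data.email[0].value : (data.email || '');

    return data;
  }

});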



In my case, I'm trying to normalize several sources of contact data into one standard model representation. Different collections are defined for each source endpoint, and all of them use the same model for storing the actual dataset. All the presentation logic is unaffected and can display contacts from any source. Since I'm only providing read-only views of the data, maintaining the original format is not a concern; I won't be sending anything back.
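
Sketching that multi-source idea, a second endpoint might reuse the same model like this (GoogleContacts, transformGoogle, and the URL are hypothetical; each source supplies its own normalization into the common shape):

var GoogleContacts = Backbone.Collection.extend({

  model: Contact,               // same model as YahooContacts
  parse: transformGoogle,       // hypothetical transform for this source
  url: 'google-contacts.json'   // hypothetical local test file

});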

Monday, May 20, 2013

Writing More Functional Javascript with UnderscoreJS

I scrolled through some of my code recently and realized that not once in the 500 lines I wrote did I use a for loop. In fact, I rarely do any longer. Between jQuery's each/map and the plethora of functions available in Underscore, there's just no need to use loops to iterate over objects and arrays. In general, code becomes easier to write and read when discrete functions are used to apply transforms to data contained in arrays and object hashes. However, it's also possible to get carried away and attempt to do everything through these functions when a single iterator would probably suffice.

Generally, the balance comes down to simplicity. Converting an object hash to a string suitable for a URL query string seems like a reasonable transform to hand to Underscore's library of functions. Compared to its more imperative counterpart, which would require several intermediate variables to track state, this version neatly tucks those details away inside the map() function:


queryString: function (params) {
  return _.map(params, function (v, k) { return k + '=' + encodeURIComponent(v); }).join('&');
}
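
Treated as a standalone function, a quick check of the output looks like this:

queryString({ q: 'red shoes', page: 2 });
// => "q=red%20shoes&page=2"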


Granted, internally, Underscore may revert to using a for loop to iterate over the object, but that's an implementation detail you don't need to worry about.

To keep it simple, the above function doesn't account for converting an array into a repeating set of key/value pairs. In fact, the reverse process, which does account for this scenario, is a little more difficult to write as a chained set of Underscore calls. Here's a version that uses a for loop and almost no Underscore (just _.isArray) to parse a query string into an object hash:


function parseQuery( query ) {

  var pairs = (query || "").split('&'),
      obj = {},
      i, p;

  for ( i = 0; i < pairs.length; i++ ) {

    p = pairs[i].split('=');

    p[1] = decodeURIComponent(p[1] || '');

    if ( obj[p[0]] ) {
      // Repeated key: promote to an array (or append if already one)
      obj[p[0]] = _.isArray(obj[p[0]]) ? obj[p[0]].concat([p[1]]) : [obj[p[0]], p[1]];
    } else {
      obj[p[0]] = p[1];
    }
  }

  return obj;

}
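
For example, a repeated key is collapsed into an array while single keys stay flat:

parseQuery('a=1&b=2&a=3');
// => { a: ["1", "3"], b: "2" }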


There seems to be little room for leveraging much of Underscore to accomplish this task. The most obvious change is to replace the for loop with each():


function parseQuery2( query ) {

  var obj = {};

  _.each((query || "").split('&'),

    function (pair) {

      var p = pair.split('=');

      p[1] = decodeURIComponent(p[1] || '');

      if ( obj[p[0]] ) {
        obj[p[0]] = _.isArray(obj[p[0]]) ? obj[p[0]].concat([p[1]]) : [obj[p[0]], p[1]];
      } else {
        obj[p[0]] = p[1];
      }

    });

  return obj;
}


And if you're really bothered by the if/else needed to detect the repeating key, you could just use an array for everything and unwrap the single values later with a call to map:


function parseQuery2( query ) {

  var obj = {};

  _.each((query || "").split('&'),
    function (pair) {

      var p = pair.split('=');

      p[1] = decodeURIComponent(p[1] || '');

      obj[p[0]] = (obj[p[0]] || []).concat([p[1]]);

    });

  // Pull single values out of their arrays ( ie { a: ["1"] } => { a: "1" } )
  return (
    _.chain(obj)
      .map(function (v, k) { return [k, v.length == 1 ? _.first(v) : v]; })
      .object()
      .value()
  );
}



Whether that's better than just sticking with the if/else might be a matter of opinion. It seems like introducing two more loops, even if they do run in linear time, just adds more work that could otherwise be accomplished in a single iterator.
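
For completeness, here's a fully chained variant built around groupBy. It's just a sketch, and arguably no clearer than the single-loop version:

function parseQuery3( query ) {

  return (
    _.chain((query || "").split('&'))
      .map(function (pair) {
        var p = pair.split('=');
        return [ p[0], decodeURIComponent(p[1] || '') ];
      })
      // Group the [key, value] pairs by key to collect repeats
      .groupBy(function (p) { return p[0]; })
      // Unwrap single values; keep repeats as an array of values
      .map(function (v, k) {
        return [ k, v.length > 1 ? _.pluck(v, 1) : v[0][1] ];
      })
      .object()
      .value()
  );
}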

I've found that transforms that proceed through distinct steps which can be chained together are really good candidates for Underscore's library. Consider a situation where you need to modify the names of the keys in an object. Modifying the values is quite straightforward, but going the other way is a bit harder. Take the following simple object:



{
  pre_a: "1",
  pre_b: "2",
  xpre_c: "3"
}




We don't want the pre/xpre prefix, so we need a way to rebuild the object with new keys. An easy way to achieve this with Underscore is to convert the object to an array of pairs, perform the replacement, and then convert back to an object:



function modifyKeys( obj ) {

  return (
    _.chain(obj || {})
      .pairs()
      .map(function (p) { return [ p[0].replace(/^x?pre_/, ''), p[1] ]; })
      .object()
      .value()
  );
}



The intermediate output from each step using the example object would look like this:



pairs()  => [ ["pre_a", "1"], ["pre_b", "2"], ["xpre_c", "3"] ]
map()    => [ ["a", "1"], ["b", "2"], ["c", "3"] ]
object() => { a: "1", b: "2", c: "3" }



Thinking functionally may not always come easily; however, becoming proficient with the concepts and the library of functions provided by Underscore can significantly reduce the effort needed to manage object and array transformations. Need and experience generally lead to better comprehension of which problems can best leverage a given solution. If you haven't tried it yet, take a look at your code, pick a loop iterating over an object or array, and see if an Underscore function or two can reduce the amount of code required to accomplish the manipulation.

Sunday, May 12, 2013

RSpec: Simple Tests to Learn its Behavior

Few people would argue against putting some level of testing in place to verify that the functionality you build works as expected. Size and complexity will typically dictate the testing philosophy used on a project. I tend to balance the effort required to build and maintain automated tests against the effort required to manually retest a module when a change is introduced. Fortunately, RSpec has a lot of power available for enabling reuse, readability, and, most importantly, brevity - all good things for developers trying to meet tight deadlines. Generally, I attempt to keep things as simple as possible by carefully orchestrating the basic let()/it() combination with good use of context to build most of my tests. Granted, when I wrote my first test, I was definitely fighting the expected philosophy of RSpec. Once I started to understand the behavior of each of its components, the pieces began falling into place and devising examples became a lot easier. Here are a few of the trials I used to gain more clarity around how the different components work together.




Using context to control scope



Context creates scope for the examples, such that let() definitions in other contexts can't interact with the current context:


describe "let() scope" do

# Visible to both context blocks
let(:var1) { 3 }

context "block 1" do
# Only visible to this block
let(:var2) { 5 }

# Error
specify { var3.should eq(7) }
# Ok
specify { var2.should eq(5) }
specify { var1.should eq(3) }
end

context "block 2" do
# Only visible to this block
let(:var3) { 7 }

# Error
specify { var2.should eq(5) }
# Ok
specify { var3.should eq(7) }
specify { var1.should eq(3) }
end

end



Running this test results in these failures:



1) let() scope block 1
   Failure/Error: specify { var3.should eq(7) }
   NameError:
     undefined local variable or method `var3' for #<RSpec::Core::ExampleGroup::Nested_1::Nested_1:0xb6b78ef8>
   # ./spec/test_spec.rb:14

2) let() scope block 2
   Failure/Error: specify { var2.should eq(5) }
   NameError:
     undefined local variable or method `var2' for #<RSpec::Core::ExampleGroup::Nested_1::Nested_2:0xb6b763d8>
   # ./spec/test_spec.rb:25




Redefine let() value



Define the value once and redefine it as necessary. A context block can work with the default or override it for that context without affecting other contexts:


describe "let() redefined" do

let(:var1) { 3 }

context "block 1" do
let(:var1) { 5 }

specify { var1.should eq(5) }
end

context "block 2" do
let(:var1) { 7 }

specify { var1.should eq(7) }
end

# Unaffected by block 1 and 2
context "block 3" do
specify { var1.should eq(3) }
end
end



Cascading usage of let() values



Use previously defined let() values to define other values in a given context. This enables reuse and readability throughout the different contexts and examples:


describe "let() reuse" do

let(:var1) { 3 }

context "block 1" do
let(:var2) { var1 + 5 }

specify { var2.should eq(8) }
end

context "block 2" do
let(:var2) { var1 + 7 }

specify { var2.should eq(10) }
end

end



Be aware of recursion



Values created by let() are really just (memoized) methods that are called when referenced. Trying to redefine a value by referencing itself will cause a recursion error:


describe "let() recusive" do

let(:var1) { 3 }

context "block 1" do
# Fail
let(:var1) { var1 + 5 }

specify { var1.should eq(8) }
end

end


You'll see something like this. In more complex tests, these can be hard to find - until it completely clicked what let() was really doing, I didn't entirely realize what I had done:


3) let() recursive block 1
   Failure/Error: let(:var1) { var1 + 5 }
   SystemStackError:
     stack level too deep




Obviously, these are quite simple. However, I keep a little scratch pad of simple examples that I can tinker with to see the behavior before trying an idea in a larger, more complex test script. Once you start adding in a database environment, application state, etc., it becomes difficult to stay true to the simple principles available in the test suite. Sometimes going back to basics allows you to see things more clearly and apply them in other contexts.

Tuesday, May 7, 2013

Adventures in OAuth for Securing REST API Services

So you want to build a REST API that the rest of the world can use. Maybe you have an internal application that can benefit from integrating with your external customers' and vendors' systems. Even if the services you plan to build have no outside users, you may have internal groups that need to consume these resources. Clearly, securing these resources is a top priority, and using an open standard has many benefits over creating something proprietary. However, once you take a cursory glance at OAuth, you realize there's a lot more to it than some token exchanges and signatures. In fact, there is more than one version floating around, with several different flows depending on the type of entity trying to consume the service. Feeling overwhelmed, you wonder if maybe just hacking together something that works for your specific solution would be adequate. But deep down you know it will come back to bite you if you go down that path. And after all, you're a software engineer - this should be in your wheelhouse, right? So you hunker down and start reading...

Taking a Step Back


After reading a lot of articles, code, RFCs, and specific implementations (Yahoo, Google, Twitter, to name a few), I decided to try to decompose the problem into a few pieces:

  • Access Authorization - This is concerned with controlling access to the resources provided by the API. There are generally two levels of authorization:

    • Client - This level identifies the party authorized to access your services. The only context is the client; no user-based data is available at this level of access. The purpose of this control is to be able to attach policies to the client for call limits, usage metering, and restricting the available services.

    • User/Owner - This level creates a context for access to user-based data. The purpose of this control is to ensure clients can only access user data that the owner has explicitly granted access to. The process of granting/revoking access is not concerned with verifying the user's identity.



  • User Authentication - This is the process of verifying that the user is actually the owner of the data they want to access. Various strategies can be employed to verify the data owner's identity. Because it is typically involved in the authorization process, some level of coupling is required to properly transition between the services and change state.

  • Request Verification - This area is concerned with ensuring that requests are authentic. The goal is to thwart attacks that might grant access to resources to a party other than the one that should have it. This process can be difficult to implement since both the client and server libraries must agree on the strategy, and it puts more burden on the server to maintain some state about prior requests.


A Closer Look


The next step is to look at existing standards to see how each problem is addressed. Since OAuth is the predominant standard in use, it's a good starting point. You can argue over the specifics of how version 1 or 2 does things better or worse, but my primary concern is to look at how each area is addressed and try to establish a common denominator among solutions.

Authorization


Here's a very typical flow for authorizing access to protected owner data. The primary goal is not to reveal anything about the owner except what's available through the scope defined in the final access key. The combination of consumer/owner defines the relationship and level of access. All versions of OAuth define an authorization flow to gain access to a user's protected data. While there are clear differences between flows, the main take-away is that it all boils down to three main steps: Initialize -> Authorize -> Finalize.

OAuth 1.0 Authentication Flow
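
To make the three steps concrete, here's a rough browser-side sketch of the OAuth 1.0a shape of the flow. The endpoints are hypothetical, callbackParams stands in for the parsed callback query string, responses are assumed to be JSON for brevity (real providers return form-encoded strings), and request signing is omitted entirely (see Verification below):

// Step 1: Initialize - obtain a temporary request token
$.post('https://provider.example/oauth/request_token', {
  oauth_callback: 'https://app.example/callback'
}).done(function (tmp) {
  // Step 2: Authorize - send the owner to the provider to grant access
  window.location = 'https://provider.example/oauth/authorize?oauth_token=' + tmp.oauth_token;
});

// Step 3: Finalize - back at the callback URL, exchange the now-authorized
// request token (plus the verifier the provider appended) for an access token
$.post('https://provider.example/oauth/access_token', {
  oauth_token: callbackParams.oauth_token,
  oauth_verifier: callbackParams.oauth_verifier
}).done(function (access) {
  // access.oauth_token / access.oauth_token_secret now scope all API calls
});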

Authentication


There's nothing new about validating a user's identity. Just about every application does it, and various strategies exist for the task. What you end up with depends on the sensitivity of the data you're protecting. The details of how you protect accounts, manage passwords, and control access rights should be neatly tucked away in this service. It does, however, need to be aware of an authorization request and properly affirm that authentication was successful so the process can continue.

Verification


It is important to verify the authenticity of each request made to your API. Assuming that all inbound requests are original and unmanipulated is naive. OAuth 1.0 dedicates a large portion of the specification to constructing verifiable requests, and the burden is on the service provider to implement the strategies it details. Even if you choose another authorization solution (like OAuth 2), adding a signature and nonce/timestamp to each request makes a lot of sense. Granted, there is definitely some complexity, and there are performance concerns with adding this layer of validation, but all the effort spent securing the authorization and validating the user's credentials seems pointless if you don't check that the requests being processed are even valid.
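
As a sketch of what's involved, here is the core of the OAuth 1.0 HMAC-SHA1 scheme in JavaScript (using Node's crypto module purely for illustration). A real implementation must use strict RFC 3986 percent-encoding, handle duplicate parameter names, and include all of the oauth_* parameters; encodeURIComponent is close to the required encoding but not exact:

var crypto = require('crypto');

// Signature base string: METHOD & encoded-URL & encoded-normalized-params
function signatureBaseString(method, url, params) {
  var normalized = Object.keys(params).sort().map(function (k) {
    return encodeURIComponent(k) + '=' + encodeURIComponent(params[k]);
  }).join('&');

  return [ method.toUpperCase(), encodeURIComponent(url), encodeURIComponent(normalized) ].join('&');
}

// Sign with the consumer secret and the (possibly empty) token secret
function sign(baseString, consumerSecret, tokenSecret) {
  var key = encodeURIComponent(consumerSecret) + '&' + encodeURIComponent(tokenSecret || '');
  return crypto.createHmac('sha1', key).update(baseString).digest('base64');
}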

Putting It All Together


All three of these areas contribute to an overall solution for securing REST API services. Depending on the version of OAuth you intend to use, the standard will address some, but not all, of these areas. The challenge is to architect a solution that properly distinguishes each mechanism and does not overly couple the components to each other. Additionally, since user interaction is implied in several of these areas, a presentation layer is necessary to enable user input. In a pure services solution, these components will need to be properly separated to preserve the distinction between the control logic and the presentation.

After segmenting the solution into these buckets, I'm still faced with building both server-side and client-side components to implement the different strategies. In most of the solutions I've reviewed, the presentation is mixed in with the services logic. This has made it difficult to find anything that can be readily plugged into my environment without compromising the architecture I'm trying to achieve. My goal is to have this solution sit as an initial layer in front of both the client and server logic so it's mostly transparent to the rest of the application, while providing a very simple interface that can be used to verify the system is in the correct state for the given context (public vs. private access, access scope, etc.). Ensuring that services are distinct from presentation is a top priority. As a starting point, I've identified the following major components that need to be built or integrated off the shelf:

Client-side


Browser-based, using a BackboneJS framework to consume the REST API services, render the presentation, and manage user interaction.

  • OAuth Authorization Flow Controller - ensures all necessary access tokens are maintained, publishes events, and intercepts requests to add the necessary verification elements (see the sketch after this list).

  • OAuth Adapters - implementation details for specific versions of OAuth and variations found in specific provider's solutions.

  • URI and Verification Utilities - helper libraries that implement the low-level processing required in an OAuth-based solution.

  • Authentication Controller - implements a login/authorization page for use with both external and internal consumers
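
A minimal sketch of the interception piece, assuming jQuery handles the transport and a hypothetical oauthController object holds the current token (the header details vary by OAuth version and provider):

// Attach authorization details to every outgoing API request
$.ajaxPrefilter(function (options, originalOptions, jqXHR) {
  var token = oauthController.currentToken();  // hypothetical token store

  if (token) {
    jqXHR.setRequestHeader('Authorization', 'Bearer ' + token.access_token);
  }
});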


Server-side


Use a Ruby-based stack, with two distinct servers - one to deliver assets to the browser-based client and the other to implement the REST API services.

  • OAuth Provider - expose services to implement the authorization flow

  • Identity/Authentication - expose services to provide user identity verification and information

  • OAuth Verification - Rack middleware to verify request authenticity and set up the user context for all downstream handlers

  • Authorization Library - common interface for all services to interact with to manage state and provide information


All of these points are discussions in themselves, and different solutions exist to address each of them. At this juncture, this is just a rough sketch of the direction that seems to make sense. The ultimate solution depends on the tools and libraries that currently exist and on finding an appropriate balance in how coupled the different parts of the solution need to be to retain maintainability and flexibility.

Until then, the adventure continues...