Sunday, June 8, 2014

Searching for Patterns when Developing Browser Extensions

Browser plugin development has become as easy as writing a plain vanilla web application. The main difference is that an extension is installed on the user's computer and has a lot more access to local resources. As such, many libraries and design patterns used in browser-based web applications can be reused in extension development. There are some quirks to learn related to scope and security. More importantly, you'll have to rewrite portions of your extension to deploy it in different browsers (if they even support an HTML/JS-based plugin). Both Chrome and Firefox offer APIs supporting HTML/JS-based plugins, which is a good enough reason for me to explore what use cases I can find for them. I started with Chrome because Firefox has more installation requirements and, in my opinion, more development hurdles when you want to test your creation. Chrome simply requires you to make a directory and tell it to load the extension (and reload it after every change). That's the kind of simplicity I expect in my development environment.

Maybe the biggest learning curve when building an extension for the first time is deciphering the terminology for the different execution contexts available in the application. On top of that, you have to identify what functionality exists in each execution scope, since that affects how to organize the extension's logic and which techniques from traditional web application development can be adopted. Using the diagram on the left as a guide, you can see there are really only two scopes to manage in an extension. One is the extension context and the other is the page context. Each browser platform names these differently, but the purpose of each is essentially the same. The challenge is decomposing several key components and identifying their role within each context. I've named a few pieces that I think make sense and that I would personally find useful when building an extension.

As I worked on my extension, it became apparent that a small layer of abstraction would be useful. It would facilitate interaction between the different contexts, normalize the creation of UI components, and provide a consistent mechanism for managing communication between scopes. Since I was starting in Chrome with aspirations of also deploying to Firefox, I'd like to make as much of my extension as possible reusable across browser extension platforms. Chrome and Firefox have many similarities in how they structure an extension, isolate contexts, handle security, and expose API capabilities. However, there are nuances, and the more of those that can be abstracted, the easier it will be to port between browsers.

The goal in this discussion is to investigate how to break up the extension into logical chunks and stub out a few key code blocks to help create some concrete examples of what parts of a final solution might look like. In subsequent posts, I'll build off of those pieces to create a simple library to encapsulate these core pieces and demonstrate how to use these components to build a simple extension.

Extension Context

Code that runs in the extension context has access to the full browser API but has no access to the page DOM. Chrome calls these background pages, which have been confusingly split into "persistent" background pages and event pages. The responsibilities placed on this area of the program include providing access to the browser API, activating the extension, managing extension-level UI pages, and storing/retrieving global settings/data. Visually, you can add an icon to the toolbar, which enables user interaction at the browser level regardless of the site or content displayed in the browser.

Extension Controller

Early on, it was apparent that the primary role of the "background page" was to act as a main messaging hub and maintain application state. As such, all the other components will ask for services or information through the browser's messaging services (in Chrome it's part of the runtime API) using a request/response pattern:

chrome.runtime.onMessage.addListener(
   function( request, sender, sendResponse ) {
      // request has two keys - topic and data
      // topic is a string, data is a hash relevant to the topic
      // which the handler will understand
      switch ( request.topic ) {

         case 'subject.scope':
            // placeholder response; a real handler would build this from request.data
            sendResponse({ foo: 'bar' });
            break;

         // ... cases for other topics
      }
   }
);


This chunk of code implements the line labeled "A" in the above diagram. The important thing to note in the code is the structure of the message. The request is an object containing a "topic" key and a related "data" key. The topic is a string structured with both a subject and a scope to help avoid collisions. You might use topics like "user.data" to fetch data about the current user, "user.login" to start the authentication process, or "site.monitor" to register a callback when certain conditions are met using the Chrome events API. In any case, it's an opinionated message structure that is not inherently enforced by the browser API, so we want something that can help keep the requests consistent.
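One way to get that consistency is to route every request through a small dispatcher instead of a growing switch statement. The following is only a sketch: createRouter, on, and dispatch are hypothetical names of my own, and the "user.data" handler returns placeholder data. The only real API call is chrome.runtime.onMessage.addListener, and it is guarded so the snippet also runs outside an extension.

```javascript
// Hypothetical helper (not part of the Chrome API): a router that enforces
// the { topic, data } message shape and the "subject.scope" topic format.
function createRouter() {
  var handlers = {};
  var TOPIC_PATTERN = /^[a-z]+\.[a-z]+$/i;

  return {
    // Register a handler for one topic; reject topics that break the convention.
    on: function (topic, handler) {
      if (!TOPIC_PATTERN.test(topic)) {
        throw new Error('Topic must look like "subject.scope": ' + topic);
      }
      handlers[topic] = handler;
    },

    // Has the same signature as a chrome.runtime.onMessage listener.
    dispatch: function (request, sender, sendResponse) {
      var known = request && typeof request.topic === 'string' &&
          Object.prototype.hasOwnProperty.call(handlers, request.topic);
      if (!known) {
        sendResponse({ error: 'unknown topic' });
        return false;
      }
      handlers[request.topic](request.data || {}, sender, sendResponse);
      // Returning true tells Chrome the response may be sent asynchronously.
      return true;
    }
  };
}

var router = createRouter();

router.on('user.data', function (data, sender, sendResponse) {
  sendResponse({ name: 'example user' }); // placeholder data for illustration
});

// Only wire the router up when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(router.dispatch);
}
```

Unknown topics get an explicit error response rather than silence, which makes typos in topic names much easier to spot from the calling side.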

Popup

A popup in the extension context is a UI component that can run any web page content inside a browser-provided window outside the context of the page. These pages are useful for collecting settings or information required by the extension to customize the user experience. Technically, a popup is not even required. You can load these pages into a new tab and offer a complete web application inside the browser without connecting to a server to download the content. Your extension could run as a locally installed web application activated by clicking a button on the browser toolbar. Granted, distributing updates to your application requires more effort, but it's an interesting concept for building client-side web apps.

Activating your popup or page can happen in one of two ways. Either it can be automatically triggered based on the manifest.json configuration:

  "browser_action": {
      "default_title": "My Bookmarks",
      "default_icon": "icon.png",
      "default_popup": "popup.html"
  },


Or, you can manually open a page in a tab or popup from the background page:

chrome.browserAction.onClicked.addListener(function(tab) {
  var manager_url = chrome.extension.getURL("manager.html");
  focusOrCreateTab(manager_url);
});


The former is obviously easier and works well if you only have one page that needs to be displayed. The latter example offers significantly more control and comes in handy if different pages should be displayed under different conditions. I'd guess that most extensions can implement one popup page to enable extension level user interaction and that's why there is a section to configure it in the manifest. In the extension I'll be building, there is only one popup page to collect some user specific data. Everything else operates inside the page context.
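The focusOrCreateTab helper called in the second example is not part of the Chrome API, so it has to be defined somewhere in the extension. A minimal sketch of what such a helper might do follows. It assumes a second, optional tabs argument (my addition, so the function can be exercised without a browser) that behaves like chrome.tabs, and it assumes the extension is allowed to read tab URLs. It only activates the tab within its window and does not bring that window to the front.

```javascript
// Sketch of a focusOrCreateTab helper. `tabs` is an assumed, optional
// parameter that defaults to chrome.tabs when running inside Chrome.
function focusOrCreateTab(url, tabs) {
  tabs = tabs || chrome.tabs;

  // An empty query object asks for every open tab.
  tabs.query({}, function (openTabs) {
    for (var i = 0; i < openTabs.length; i++) {
      if (openTabs[i].url === url) {
        // The page is already open, so activate that tab instead of duplicating it.
        tabs.update(openTabs[i].id, { active: true });
        return;
      }
    }
    // No matching tab was found, so open a new one.
    tabs.create({ url: url });
  });
}
```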

Page Context

Code running inside the page scope can access the DOM of the currently loaded page. Anything you would do in a normal web page to query and manipulate the DOM, you can do in this scope. The biggest difference is that all the code loaded in this context is isolated from any code running on the page. If jQuery is loaded on the current site, your extension code can't use it. You must load your own copy of jQuery. The only thing shared between the page loaded in the browser window and the extension is the DOM.

Since you can manipulate the DOM, you can inject content into the current site. While this may seem great, in practice it may not be the best approach. Anything injected into the DOM is subject to the current styling of the page, which may produce undesirable results in the injected markup. Vice versa, any style sheets injected into the page may adversely affect the page and disrupt the functionality of the current site. As such, only minor changes should be made, and maybe only when you know exactly how the site reacts to those changes. More elaborate UI elements need to be isolated from the current page.

Based on these considerations, I've broken the page context into three pieces. First, a controller manages all the components in the page context and any communication with the extension context. Second, a monitoring agent helps identify interesting events on the current site that the extension may want to use to trigger an action. Finally, an iframe container builds the foundation for UI components that will be instantiated to allow user interaction within the page.

Page Controller

This part of the extension manages any logic required in the page context. Its primary purpose is to monitor events from page content detectors, manage the life cycle of the UI frames, and bridge requests to the extension context. From a framework perspective, the UI frame component is broken into two pieces. On the controller side, there is a portion that sends and receives messages from the iframe window and handles life-cycle events. These parts will be mirrored on the iframe side of the library to handle the same activities from the frame's perspective. Using messages, we can communicate between windows:

   
   // Receive messages from parent controller
   window.addEventListener( 'message', ... )

   // Send messages to parent controller
   // This is inside an abstraction layer to normalize
   // identifying each iframe to ensure proper message
   // routing
   function notify( topic, data ) {
      var win = this.$iframe[0].contentWindow,
          message = JSON.stringify({ target: this.cid, topic: topic, data: data });

      win.postMessage( message, this.$iframe[0].src );
   };

   // Send message to extension context for processing
   function request( topic, data ){
      chrome.runtime.sendMessage( { topic: topic, data: data }, function( response ) {
         ...
      });
   };


Support for passing structured objects through this API has varied between browsers, so I serialize all the data to a string when sending and deserialize it when receiving. I chose a message structure similar to the one used in the extension context to keep things consistent. The library can wrap this logic so that basic JavaScript object hashes can be passed between the main window and its iframe children. A window's message listener hears messages from every frame, so it's necessary to identify which one should respond to the event. Since the main window side of the iframe view will be an object instance representing the iframe, it can identify itself and creatively set the URL on the iframe so the frame can know its identity and respond accordingly. The cid above accomplishes this along with the src in the postMessage call, which restricts delivery to a frame loaded from that URL's origin. Between the two mechanisms, you can keep communication between the two windows reasonably secure.
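The serialize/deserialize step and the identity check can live in a few small functions on the controller side. This is a sketch under my own assumptions: the names encodeMessage, decodeMessage, and acceptMessage are hypothetical, and the event argument only needs the origin and data properties that a real message event provides.

```javascript
// Serialize an envelope addressed to one frame, identified by its cid.
function encodeMessage(target, topic, data) {
  return JSON.stringify({ target: target, topic: topic, data: data });
}

// Parse a raw message; return null for anything that is not a valid envelope.
function decodeMessage(raw) {
  var msg;
  if (typeof raw !== 'string') {
    return null;
  }
  try {
    msg = JSON.parse(raw);
  } catch (e) {
    return null; // not JSON, probably a message meant for the page itself
  }
  if (!msg || typeof msg.topic !== 'string') {
    return null;
  }
  return msg;
}

// Accept a message only if it came from the expected origin and is
// addressed to the given cid. Returns the decoded envelope or null.
function acceptMessage(event, expectedOrigin, cid) {
  if (event.origin !== expectedOrigin) {
    return null;
  }
  var msg = decodeMessage(event.data);
  return (msg && msg.target === cid) ? msg : null;
}
```

Ignoring anything that fails to parse matters here because the host page may use postMessage for its own purposes, and those messages will reach the same listener.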

UI Frame

The iframe component isolates the extension's styling from the site's content. Since it's its own window as well, you can run anything inside it to create the view displayed to the user. If you want to build something fancy, you can use Backbone, Angular, or any other framework you'd like. At a minimum, you will want to add a little logic to wrap the messaging between the main window and the iframe window to ensure consistent communication:

   
   // Receive messages from parent controller
   window.addEventListener( 'message', ... )

   // Send messages to parent controller
   function notify( topic, data ) {
      window.parent.postMessage( JSON.stringify({ source: window.location.href, topic: topic, data: data }), '*' );
   }
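The receiving half of the frame can be sketched in the same spirit. I am assuming here that the controller passes the frame's identity in the URL fragment as "#cid=..."; that format, along with the names cidFromUrl and createFrameReceiver, is hypothetical. The receiver ignores anything that is not valid JSON or is addressed to a different frame. It does not check event.origin, because the parent page can be any site.

```javascript
// Read this frame's identity from its own URL (assumed "#cid=..." format).
function cidFromUrl(href) {
  var match = /#cid=([^&]+)/.exec(href);
  return match ? decodeURIComponent(match[1]) : null;
}

// Build a 'message' event listener that only reacts to envelopes addressed
// to this frame. `handlers` maps topic strings to callbacks.
function createFrameReceiver(cid, handlers) {
  return function (event) {
    var msg;
    try {
      msg = JSON.parse(event.data);
    } catch (e) {
      return false; // not one of our JSON envelopes
    }
    if (!msg || msg.target !== cid) {
      return false; // addressed to a different frame
    }
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.topic)) {
      return false; // no handler registered for this topic
    }
    handlers[msg.topic](msg.data);
    return true;
  };
}

// Wire the receiver up only when running inside a real window.
if (typeof window !== 'undefined' && window.addEventListener) {
  window.addEventListener('message', createFrameReceiver(
    cidFromUrl(window.location.href),
    { /* topic handlers go here */ }
  ));
}
```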


The only other issue to be aware of when running code in the iframe is working around the extension's Content Security Policy. When I tried to render a Backbone view in an iframe I created, I had issues because the Underscore micro-template compiles templates with the Function constructor, which the policy treats like eval. To enable this feature, you have to add this line to your manifest file:

   "content_security_policy": "script-src 'self' 'unsafe-eval'; object-src 'self'"


Monitor/Detect

The final piece of the puzzle, which may not be important to every plugin, is watching the target page for interesting changes. Since most sites dynamically load and generate content, it's not good enough to wait for the page to load and check it for certain content. For instance, if you're creating an extension to perform actions with images found on a page, those images may only load as the user scrolls the page. If it's a single-page app, the page loads once and everything else renders inside that page dynamically. Instead of binding to the load event, you have to bind to a mutation event. However, these events fire often and can easily bog down or crash the page if not used wisely. It was fortuitous that Addy Osmani wrote an article about DOM mutation observers because, before I considered that approach, I was manually binding/unbinding and throttling the event:

function monitorChange() {
   $( document.body ).bind( 'DOMSubtreeModified', detectContent );
}

function unmonitorChange() {
   $( document.body ).unbind( 'DOMSubtreeModified', detectContent );
}

// Runs at most once every 500ms, on the leading edge only
var detectContent = _.throttle(
      function() {

          var realChanges = 0;

          // Stop listening so our own DOM changes don't retrigger the handler
          unmonitorChange();

          console.log( 'tree changed' );
          // ...
          /* Find changes and do something, which may modify the DOM */
          // ...

          monitorChange();
      },
      500,
      { trailing: false }
   );

detectContent();



Using the observer avoids both of those workarounds and even provides detail about the changes made. I'd still like to abstract this part slightly so that selectors can trigger different actions when matching content is among the changes:

Detect.monitor({
   'insert img': function() {
      ...
   },

   'remove img': function() {
      ...
   }
});

Integrating that back into the page context's controller enables it to act on changes of interest and perform an appropriate action. Since the mutation observer API doesn't provide a robust query selector for exactly what to observe, this layer can provide that capability and dispatch an appropriate subset of targeted events.
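To make the idea concrete, here is a rough sketch of how such a layer might sit on top of MutationObserver. Everything about Detect is still hypothetical at this point. I am assuming rule keys have the form "action selector", with "insert" and "remove" as the only actions, and that each handler receives an array of the matching elements. The filtering logic is kept separate from the observer so it can be exercised without a browser.

```javascript
// Split a rule key such as "insert img" into its action and selector.
function parseRule(rule) {
  var space = rule.indexOf(' ');
  return { action: rule.slice(0, space), selector: rule.slice(space + 1) };
}

// Collect elements among `nodes`, or nested inside them, matching `selector`.
function matching(nodes, selector) {
  var found = [];
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.nodeType !== 1) { continue; } // skip text and comment nodes
    if (node.matches(selector)) { found.push(node); }
    var inner = node.querySelectorAll(selector);
    for (var j = 0; j < inner.length; j++) { found.push(inner[j]); }
  }
  return found;
}

// Run each rule against each mutation record and call its handler on a hit.
function dispatchMutations(records, rules) {
  Object.keys(rules).forEach(function (key) {
    var rule = parseRule(key);
    records.forEach(function (record) {
      var nodes = [];
      if (rule.action === 'insert') { nodes = record.addedNodes; }
      if (rule.action === 'remove') { nodes = record.removedNodes; }
      var hits = matching(nodes, rule.selector);
      if (hits.length) { rules[key](hits); }
    });
  });
}

// Hypothetical Detect object wiring the dispatcher to a real observer.
var Detect = {
  monitor: function (rules, root) {
    var observer = new MutationObserver(function (records) {
      dispatchMutations(records, rules);
    });
    // childList + subtree reports nodes added or removed anywhere below root.
    observer.observe(root || document.body, { childList: true, subtree: true });
    return observer;
  }
};
```

Returning the observer from monitor lets the page controller call disconnect() on it when the extension no longer needs to watch the page.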

Next Steps

So far I've only made a broad outline of the types of components I'd like to have when building a browser extension. Now it's time to flesh those pieces out so something useful can be built with them. I'm still a little early in my research and will definitely refine these concepts a bit. But after cobbling together a simple plugin, these were the main themes I saw emerging in my work.