While It’s Compiling: Skills Matter interviews Rob Witoff

For an insight into this week’s Big Data eXchange at Skills Matter, we caught up with keynote speaker Rob Witoff. Previously the first ever Data Scientist at NASA, Witoff recently took on a new position as Head of Data Science at Coinbase in San Francisco, the biggest Bitcoin company in the world. Passionate about finding new ways of looking at the world through the data generated today, he told us about working with technology world-leaders at NASA, the storytelling abilities that are vital for his job, and his excitement about being part of today’s fast-moving Bitcoin space.

Rob Witoff

Some might feel overwhelmed by the amounts of data that you have to analyse and understand for your job. What is it you love about it?

What I love about this job is that there’s a surplus of information that has been collected all around the world, and I think in that information is a lot of truth. There’s a lot that you can empirically understand and take a look at, sometimes for the first time, by being the first or helping other people to be the first to explore all this information that we’ve collected. So I love the ability to find new insights and new ways to look at the world around us.

Can you list the obvious reasons why mining insight out of the data we produce is so important?

It’s important for a few reasons. Initially, because we want to be able to learn from the mistakes and the exploration we’ve done in the past… And the data that we’ve collected comes in many forms. A lot of information that was previously meaningless to us, we can now take a look at again with more modern tools, and we can learn something from what we’ve already uncovered. It’s really important in my job today at a startup because we are building a tool for consumers, and with millions of them, it’s hard for any one of us to really know what’s happening out there. Taking a look at the data and empowering all the experts here to quantify their assertions lets us run more effectively as a company.

When did the importance of bridging the gap between data and insight become clear? When did people start understanding its value?

I think that value is found individually. There’s nothing new on the surface about analysing the data we’ve been collecting. I don’t think there’s a professional business or person who hasn’t been actively looking at some part of their data for the last several decades. I think what’s new today is when people can see information they’ve already looked at be presented in a new way. That usually has some visualisation element to it, but it often is merging multiple data sets that are personally related to either you or your career or your job or your interests. And when you see those new connections start to form and say “Now that I’ve learned that from my data, I can really change the way I’m doing something or I can work more efficiently”, that’s when it starts to hit. I think that journey typically will start with some kind of data visualisation that you can immediately connect with.

What are the skills a good data scientist needs?

You need to be creative, you need to have a wide breadth of technical skills, and you need to be good at telling a story. It’s not just the data that we’re looking at – we’ve been looking at data for a long time – but it’s your ability to look at it in a new way, combine it with new insights and help convey it to someone in a way that they’ve never seen before.

You were the first IT data scientist ever to be hired at NASA. Why did they hire you, and what did you do there?

The IT data science programme started when our chief technology officer at the time, Tom Soderstrom, was taking a look at what’s new in technology, what’s coming next. He noticed this trend of a lot of information that we had been collecting – and I think that NASA has one of the most interesting treasure troves of data anywhere in the world – and he then saw this new influx of technology that’s coming from startups, from an open source community and from other companies that are being built solely around data today. And he realised that there’s this opportunity to take the data that we already have, take these tools that are becoming available and bring them together to provide a new service to all of the experts around NASA. Walking around (at NASA), you come to appreciate that almost everyone you meet is a world leader, an expert, or deeply skilled in something that is hard for anyone outside of their domain to really appreciate. And so when we were able to provide more data, more interesting data, different kinds of data to these experts, along with the tools to really drive insights from it, there was a tremendous opportunity and I really think we provided a lot of value. It wasn’t about one individual problem or one individual technology, it was about bridging that gap between all the data we had and all the expertise that we had inside the organisation.

Why did you make the switch to Coinbase after less than 2 years?

I’ve always been quick to point out that working at NASA was absolutely a dream job for me. The ability to work around such expertise on such large problems, but still with a team that could really have an impact there, was absolutely phenomenal and I’m really proud of everything we did there. But looking at my background before I joined NASA, I was in startups. I had started a company in Silicon Valley and was really interested in what we could create from scratch. We brought a lot of that startup feel into NASA, really valuing the ability to work fast and work with agility. And in this space, I started to see a new trend of entire industries being built upon the data that was now being shared with the world. I think Bitcoin was one example of an entire ecosystem that’s really built upon an open data system. So with my background in startups, moving towards an industry built around open data and working with a fast-paced Silicon Valley team (at Coinbase) let me work in a completely brand new field. It’s still to be determined where the Bitcoin space directs itself, but I’m excited to be a part of that direction.

While It’s Compiling is a continuing series of interviews with experts across a range of bleeding-edge technologies and practices, exclusive to Skills Matter. Be sure to subscribe to this blog for future interviews, or follow us on Twitter.

Find out who we’ll be interviewing next, and get a chance to put your questions forward with the hashtag #whileitscompiling.


Five top Javascript Skillscasts from the Skills Matter archives


Gary Short: End to End Javascript (login required)

Gary Short, End to end Javascript

From the 2012 Progressive .NET Tutorials in London, Gary Short, Technical Evangelist for Developer Express, looks at how Javascript as a language was beginning to infiltrate all three layers of application development – from jQuery in the front end, through NodeJS in the middle tier, to Map/Reduce functions on back-end databases like CouchDB.

Gary looks at some of the gotchas within the language before going on to build an exemplar 3-tier application in Javascript. By the end of the session you’ll have a much better appreciation for the language and the power it has on all application tiers.

In the Brain of Carlos Ble: Behaviour Driven RIAs with Javascript

Carlos Ble came along to Skills Matter a year ago for an In The Brain talk looking at a whole range of topics: why JavaScript is getting so popular, Executable Specifications with Cucumber for the RIA, the Passive View Pattern, event oriented programming and how to test-drive it, test driving promises, technical debt on purpose and the role of the “server side”.

Tamas Piros: An Introduction to AngularJS

In this talk organised by the Mean Stack user group, Tamas Piros came along to Skills Matter HQ with a presentation suitable for everyone, from the experienced to the brand-new: an introduction to AngularJS describing basics such as data-binding, expressions and directives.

Jonathan Fielding: Building better experiences with Responsive Javascript

From the archives of another Mean Stack meetup, Jonathan Fielding goes beyond media queries, showing what CSS can offer before moving into the realms of responsive Javascript.

Damjan Vujnovic: Backbone JS Unit tests

While most of us will agree that unit tests are an important aspect of development, actually implementing them seems to be a different story. Damjan takes you through your first steps with unit testing in Javascript in this mini workshop, hosted by the Backbone.js London user group.

FullStack – the Node and JavaScript Conference


This year sees us host our first ever FullStack Conference, bringing together the best in the worlds of NodeJS, Javascript and hackable electronics.

Join world leaders such as Douglas Crockford (creator of JSON), Cian O’Maidin (founder and CEO at nearForm) and Karolina Szczur (Smashing Magazine) for two days jam-packed with talks, demos, and coding.

Book your ticket now!


Build a Search Engine for Node.js Modules using Microservices (Part 2)

This is the second part of a three part guest post by Richard Rodger, CTO at nearForm & a leading Node.js specialist. Richard will be speaking at FullStack, the Node and JavaScript Conference on 23 October.

Following on from last week’s post in which Richard took us through his experience of building a search engine using microservices, here you will go through the steps to test your microservices and deploy them, in this example using Docker.

The final part will be available next week, after Richard’s In The Brain talk on 22 September on building a Node.js search engine using the microservice architecture.

Testing the Microservice

Now that you have some code, you need to test it. Actually, that’s where you should have started. No Test-Driven Development brownie points for you!

Testing microservices is nice and simple. You just confirm that message patterns generate the results you want. There are no complicated call flows, object hierarchy tear-ups, dependency injections or any of that nonsense.

The test code uses the mocha test framework. To execute the tests, run

$ npm test

Do this in the project folder. You may need to run npm install to get the module dependencies if you have not done so already.

First you need to create a Seneca instance to execute the actions you want to test:

  var si = require('seneca')({log:'silent'})

Create a new Seneca instance and store it in the variable si. This will be used for multiple tests. This code keeps logging to a minimum ({log:'silent'}), so that you get clean test result output. If you want to see all the logs when you run a test, try this:

$ SENECA_LOG=all npm test

For more details on Seneca logging, including filtering, read the logging documentation here.

The next line tells Seneca to load the seneca-jsonfile-store plugin. This is a Seneca plugin that provides a data store that saves data entities to disk as JSON files. The plugin provides implementations of the data message patterns, such as role:entity,cmd:save, and so forth. The folder option specifies the folder path where the JSON documents should be stored on disk.

Finally, load your own plugin, npm.js, which defines the messages of the nodezoo-npm microservice. It may sound fancy having “plugins”, but they are really just collections of message patterns, plus a way to make logging and option handling a bit more organised. If you want to know all the details, there’s some Seneca plugin documentation.

Let’s test each message pattern. Here’s the role:npm,cmd:extract pattern:

  it('extract', function( fin ) {
    si.act( 'role:npm,cmd:extract', {data:test_data}, function( err, out ) {
      if( err ) return fin( err )

      // verify that the expected properties were extracted
      assert.ok( out.name )
      assert.ok( out.version )
      fin()
    })
  })

The seneca.act method submits a message into the Seneca system for processing. If a microservice inside the current Node.js process matches the message pattern, then it gets the message and performs its action. Otherwise the message is sent out onto the network – maybe some other microservice knows how to deal with it.
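That local-first dispatch can be pictured with a toy sketch in plain JavaScript. This is a simplified illustration for intuition only, not Seneca’s real routing code: if a registered action’s pattern matches the message, it handles it; otherwise the message would go out onto the network.

```javascript
// A toy dispatcher illustrating the local-first behaviour of seneca.act.
// Simplified sketch for intuition only, not Seneca's real routing code.
var actions = []

// register a local action for a message pattern
function add( pattern, action ) {
  actions.push( { pattern: pattern, action: action } )
}

// a message matches if it has all of the pattern's name/value pairs
function matches( pattern, msg ) {
  return Object.keys( pattern ).every( function( k ) {
    return msg[k] === pattern[k]
  })
}

function act( msg, done ) {
  var local = null
  actions.forEach( function( a ) {
    if( matches( a.pattern, msg ) ) local = a
  })

  // a local microservice matched: it gets the message
  if( local ) return local.action( msg, done )

  // otherwise the message would be sent out onto the network
  done( new Error( 'no local action: message goes to the network' ) )
}

add( { role: 'npm', cmd: 'extract' }, function( msg, done ) {
  done( null, { handled: true } )
})
```

The real framework adds pattern specificity, transports and logging on top, but the shape of the decision is the same.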

In this case everything is local, as you would wish for a unit test. The variable test_data contains a test JSON document. The test case verifies that the expected properties are extracted correctly.

The seneca.act method lets you specify the message contents in more than one way as a convenience. You can submit an object directly or you can use a string of abbreviated JSON or both. The following lines are all equivalent:

si.act( { role: 'npm', cmd:'extract', data:test_data}, ... )
si.act( 'role:npm,cmd:extract,data:{...}', ... )
si.act( 'role:npm,cmd:extract', {data:test_data}, ... )

Here’s the test case for the role:npm,cmd:query action. It’s pretty much the same:

  it('query', function( fin ) {
    si.act( 'role:npm,cmd:query', {name:'underscore'}, function( err, out ) {
      if( err ) return fin( err )
      fin()
    })
  })

Finally you need to test the role:npm,cmd:get action. In this case, you want to delete any old data first to ensure a test that triggers the role:npm,cmd:query action. This is the important use-case to cover when a visitor is looking for information on a module you haven’t retrieved from npmjs.org yet.

  it('get', function( fin ) {
    // remove any stored data first, so that role:npm,cmd:get has to
    // trigger the role:npm,cmd:query action
    si.make('npm').load$( 'underscore', function( err, out ) {
      if( err ) return fin( err )

      if( out ) out.remove$( out.id, do_get )
      else do_get()

      function do_get() {
        si.act( 'role:npm,cmd:get', {name:'underscore'}, function( err, out ) {
          if( err ) return fin( err )
          fin()
        })
      }
    })
  })
The do_get function is just a convenience to handle the callbacks.

Running Microservices

In production you’ll run microservices using a deployment system such as Docker or tools that build on Docker, perhaps one microservice per instance, or some other variant. In any case you’ll definitely automate your deployment. We’ll talk more about deployment in a later part of this series.

On your local development machine it’s a different story. Sometimes you just want to run a single micro-instance in isolation. This lets you test message behaviour and debug without complications.

For the nodezoo-npm service there’s a script in the srvs folder of the repository that does this for you: npm-dev.js. Here’s the code:

var seneca = require('seneca')()

seneca
  .use( 'jsonfile-store' )
  .use( require( '../npm.js' ) ) // the plugin defining the role:npm patterns
  .listen()

When you run microservices it’s good to separate the business logic from the infrastructure setup. This tiny script runs the microservice in a simplistic mode. It’s very similar to the test setup code, except that it includes a call to the seneca.listen() method.

The listen method exposes the microservice on the network. You can send any of the role:npm messages to this web server and get a response. Try it now! Start the microservice in one terminal:

$ node srvs/npm-dev.js

And issue an HTTP request using curl in another terminal:

$ curl -d '{"role":"npm","cmd":"get","name":"underscore"}' http://localhost:9001/act

How does the micro-server know to listen on port 9001? And how does the seneca-jsonfile-store know where to save its data? Seneca looks for a file called seneca.options.js in the current folder. This file can define options for plugins. Here’s what it looks like:

module.exports = {
  'jsonfile-store': {
    folder: __dirname+'/data'
  },
  transport: {
    web: {
      port: 9001
    }
  }
}
If you want to see everything the microservice is doing, you can run it with all logging enabled. Be prepared, it’s a lot.

$ node srvs/npm-dev.js --seneca.log.all

Although the default message transport is HTTP, there are many others, such as direct TCP, redis publish/subscribe, and message queues. To find out more about message transportation, see the seneca-transport plugin.

There’s also an npm-prod.js script for running in production. We’ll talk about that in a later part of this series. The srvs folder for this microservice only contains two service scripts. You can write more to cover different deployment scenarios. You might decide to use a proper database instead. You might decide to use a message queue for message transport. Write a script for each case as you need it. And there’s nothing special about the srvs folder either. Microservices can run anywhere.

Next Time

In part three, you’ll build microservices to query Github, to aggregate the module information, and to display it via a website. If you’re feeling brave go ahead and play with the code for these services, it’s already up on Github!

Don’t forget, Richard will be speaking at Skills Matter HQ in London next week, covering the topics in these posts. Register for your free place here! The concluding part of this post will be published here next week, so be sure to come back then!

FullStack: the Node and Javascript Conference


Join Richard Rodger, along with many other world-leading experts and hundreds of Javascript and Node enthusiasts at our first ever FullStack Conference. Come along to experience two days jam-packed with talks, demos, and coding!

This Week at Skills Matter: 15 – 19 September

In the Brain of Allan Kelly


We’re delighted to welcome Allan Kelly tonight, as he joins us for his In The Brain talk exploring whether Agile practices can work outside of software development. Allan is a founding consultant with Software Strategy who work with companies to implement and deepen Agile methods. Tonight he will look at examples of Agile practices beyond software to see what lessons we can garner for knowledge workers in general.


Swift London return for their September talks covering Core Data, UX, UICollectionView, APIs and Swift pedagogy. Featuring five speakers this week, these talks will cover a wide variety of topics for anyone interested in the Swift language, at any level from beginner to expert. Register your spot now to join this friendly and very active group!

Join software developer and architect Goswin Rothenthal at this week’s F#unctional Londoners meetup, as he explores F# scripting for 3D geometry in the construction industry. Goswin’s custom software models are capable of integrating the entire design process, from the initial geometry setout all the way down to the 2D information or machine code required for manufacturing. He recently used this process at Waagner Biro for the cladding of the Louvre Abu Dhabi, and before that on the façades of the COD-Macao and CityLife-Milano towers at Zaha Hadid.


Join an adventure in F1 data, hear about the state of the Pentaho market and create Pentaho Sparks Apps with the Pentaho London user group on Thursday, as they’re joined by three exciting and experienced speakers – Nelson Sousa, Dan Keeley and Diethard Steiner.

Also on Thursday, the London Software Craftsmanship Community look at how to name things, with a solution to the hardest problem in programming by Peter Hilton. Adam Kosiński will then present a Swiss army knife for readable tests, in a talk about ideas and patterns for producing clean, readable and developer-friendly tests.

Build a Search Engine for Node.js Modules using Microservices (Part 1)

Richard Rodger

This is a guest post from Richard Rodger, CTO at nearForm & a leading Node.js specialist. He is a technology entrepreneur and wrote the seminal book on Node: Beginning Mobile Application Development in the Cloud. This is the first of a three-part blog, with the second to follow next week and the last post after his In The Brain talk at Skills Matter on the 22 September, building a Node.js search engine using the microservice architecture.

Richard will also be speaking at FullStack, the Node and JavaScript Conference on 23 October.

In this post, Richard takes us through his experience of building a search engine using microservices, showing how a microservice architecture can isolate features into separate components that can be independently developed, deployed and maintained. Part one will introduce you to microservices and start you building your components and modules. Next week, we’ll cover testing the microservice!

Finding the right Node.js module is hard. With over 90,000 modules (and counting!) it can be tricky to pick the right one. You’ll often come across multiple modules that solve your problem. They are often incomplete, abandoned or just badly documented. It’s hard to make the right choice and you have to rely on rules of thumb. When was the last commit? How many stars? How credible is the author?

It would be great if there was a search engine for Node.js modules that ranked them using these heuristics. I built one in late 2012 that tried to do just that. It was a miserable failure! The quality of the results was pretty poor – see for yourself at nodezoo.com. Search engines are harder than they look, let me tell you. Recently I decided to try again. This time, I wanted to use a microservice architecture. This lets me iterate many aspects of the engine, so that it can get better over time.
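As a flavour of what ranking by such heuristics might look like, here is a toy scoring function. The field names and weights are invented for illustration; they are not nodezoo’s actual formula.

```javascript
// Toy ranking heuristic for a Node.js module.
// Field names and weights are illustrative, not nodezoo's real formula.
function score( mod ) {
  var DAY = 1000 * 60 * 60 * 24
  var daysSinceCommit = ( Date.now() - mod.lastCommit ) / DAY

  // freshness decays linearly over a year; stars get diminishing returns
  var freshness  = Math.max( 0, 1 - daysSinceCommit / 365 )
  var popularity = Math.log( 1 + mod.stars )

  return freshness + popularity
}
```

A recently committed, well-starred module scores higher than a stale one with the same stars; the log keeps star-count from drowning out everything else.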

This blog post is the first in a series covering nodezoo version 2.0. I’ll also be talking about this at my upcoming In The Brain talk on September 22nd. In this blog post you’ll see how a microservice architecture can be used to isolate the features of the search engine into separate components that can be independently developed, deployed, and maintained – developer heaven!

Let’s focus on the information page for each module. After you get your search results, you should be able to click through to a page that shows you all the useful metadata about a module, such as the number of Github stars, so that you can compare it with other modules. So today we’ll build out one of the services needed to collect that information and display it.

Why Microservices?

The search engine will use a microservice architecture. Let’s take a closer look at what that means. Microservices are small, independently running processes. Instead of a single large, monolithic application running all your functionality, you break it apart into many separate pieces that communicate with each other over the network.

There are quite a few informal “definitions” of microservices out there:

…developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API
James Lewis & Martin Fowler

…a system that is built up of small, lightweight services, where each performs a single function. The services are arranged in independently deployable groups and communicate with each other via a well defined interface and work together to provide business value.
David Morgantini

…a simple application that sits around the 10-100 lines-of-code mark
James Hughes

And even a “ManifestNO”
Russell Miles

Of course, I have my own definition! It’s really the effect on humans that matters most. Microservices are just more fun. I prefer:

An independent component that can be rewritten in one week.

This gets to the heart of microservices. They are disposable code. You can throw away what you have and start again with very little impact. That makes it safe to experiment, safe to build up technical debt and safe to refactor.

What about all that network traffic, won’t it slow things down to a crawl? In practice, no. We’ve not seen this in the production systems that we have built. It comes down to two factors. First, inter-service communication grows logarithmically, not exponentially. Most services don’t need to talk to most other services.

Second, responding to any message has a relatively shallow dependency-based traversal of the service network. Huh? Microservices can work in parallel. Mostly they don’t need to wait for each other. Even when they do, there’s almost never a big traffic jam of services waiting in a chain.

In fact a microservice architecture has a very nice characteristic. Out of the box you get the capacity to handle lots and lots of users. You can scale easily by duplicating services, which by design have no shared state. Yes, it does make each individual request a little slower. How bad! In contrast, traditional monolithic approaches give you the opposite: better performance for one user, terrible performance for many. Throughput is what counts if you want to have lots of users, and microservices make a very cheap trade-off in terms of pure performance. Here’s the obligatory chart showing how this works. Hint: you want to be in the green box. The red one gets you fired.


One more thing. It’s easy to accuse microservices of falling afoul of the Fallacies of Distributed Computing. Essentially pretending that the network is just like making a local function call. That’s precisely wrong. Microservices embrace the network. You are naturally led to deal with the effects of the network when you use microservices because you can’t avoid them. You have to assume that every time you talk to another component in the system you will incur network traffic. It enforces a productive discipline and clarity of mind. Architecture astronauts are quickly and severely punished.

The Search Engine

You won’t build a search engine in one blog post. Let’s look at the module information page. This collects data from multiple sources and displays it all together. Let’s take a bias towards the Availability aspect of the CAP Theorem. Whatever else happens, we want to display a web page to the user.

We’ll keep things simple. Each Node.js module has description data stored by the NPM registry. Let’s get a subset of that information and display it. Most Node.js modules also have a Github repository. It’s useful to know how many stars and forks a module has as that gives some measure of how good it is. Down the road, we might pull information from other sources, like the module author’s blog or a Google search.

So we have a set of information sources, mostly external, certainly unreliable, and definitely slow! Whatever else happens we’ve decided that we’ll always display the information page even if we have no information or only some of it. In microservice terms, you publish a message looking for information and you display whatever you get back within a certain time-frame (or some other cut-off condition). This scatter-gather pattern is explained in detail by Fred George in this video.
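A stripped-down sketch of that scatter-gather idea in plain JavaScript, using a timer as the cut-off condition. The source functions here are stand-ins for real microservices; no framework is involved.

```javascript
// Scatter a query to several sources, gather whatever answers arrive
// before the cut-off, and render with what we have. Illustrative sketch only.
function gather( sources, cutoffMs, render ) {
  var results = []
  var finished = false

  var timer = setTimeout( finish, cutoffMs )

  sources.forEach( function( source ) {
    source( function( err, info ) {
      if( !err && !finished ) results.push( info )
      if( results.length === sources.length ) finish()
    })
  })

  function finish() {
    if( finished ) return
    finished = true
    clearTimeout( timer )
    render( results )  // the page renders even if some sources never answered
  }
}
```

If every source answers early, the page renders early; if a source is slow or down, the cut-off fires and the page renders with partial information, which is exactly the Availability bias described above.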

This acceptance of faulty behavior, this anticipation that hard choices will have to be made and results will be imperfect, is a key part of the microservice philosophy. We aim to reduce uncertainty and the frequency of failure but we never seek to eliminate it. Solutions which cannot tolerate component failure are inherently fragile. We aim to be anti-fragile.

The Architecture

There needs to be a microservice for the information page and then one for each data source. For the moment, that means one for NPM and one for Github. And we’ll also need a web server! Let’s use KrakenJS because it’s pretty cool and has easy server-side templates.

Let’s call the information page microservice nodezoo-info, the data source ones nodezoo-npm & nodezoo-github, and the KrakenJS process can be nodezoo-web. That gives you four Node.js processes to run at a minimum, or not – you’ll see in a later part of this series how to reduce this to one process when developing and debugging locally.

Now, the nodezoo-web service delivers the information page. Let’s use the URL path /info/MODULE. The controller that builds that page should ask the nodezoo-info service for the details of the module. The nodezoo-info service should publish a message to any interested services that care to provide information about modules. The nodezoo-npm and nodezoo-github services should listen for that message and respond if they can.

Sounds nice and neat, but there’s a complication. Isn’t there always! You need the Github repository URL to query Github. That information is in NPM. And yet, the Github service is not allowed to know about the NPM service. For the moment let’s assume we can have Schrödinger services that both do and do not know about other services. That allows the Github service to ask the NPM service for the repository URL. To be continued…

This is so simple that you don’t even need an architecture diagram to understand it. What? It would be “unprofessional” not to provide one you say? But microservice deployments are necessarily dynamic and fluid, changing minute-by-minute to meet the demands of user load and new business requirements. The whole point is that there is no Architecture (capital A). The best mental model is to suspend disbelief and assume all services can see all messages, ignoring those they don’t care about. Message transport is just an optimization problem.

Oh, all right then.


The lovely diagram shows you something else. Each microservice owns its own data. There is no shared database. What about consistency you say? Well think about our use case. Node.js modules are not updated that frequently. We can publish messages when they are, and any services that need to update, can do so at their own pace. But module data will be out-of-date and inconsistent in some cases! Well, shucks. That’s by design. We are not going for 100% here. We’re going for fault tolerance.

Think about the error level you find acceptable, and put remedial actions in place to achieve that. Perhaps you publish the update message multiple times over the following 24 hours. It’s wasteful of resources, but covers a lot of cases where services might be down temporarily. Perhaps you have a cross-check service that randomly samples the services to see how consistent they are. Perhaps you have a set of data services and you outsource the problem. This is the style of thinking you need to adopt. Assume things will break.

Now there are many use-cases where this cavalier approach to data is not going to fly. But there are many others where it is perfectly fine. Microservices make it easy to avoid over-engineering when you don’t have to. They can all talk to the same ACID-compliant Enterprise Database Solution if you really care that much. But then you have to install and maintain the damn thing. Expensive.

The Code

Let’s look at the NPM service first as that’s pretty simple. The code in this article uses the Seneca microservice framework. That means it’s concise, you can focus on the business logic, and you don’t need to worry about message transportation. Some may balk at the idea of using a framework for microservices. After three years of building systems this way, it makes sense for us to put our collected experience into a reusable package. But you don’t need permission to go bare bones if that’s what you want to do. The basic principles are exactly the same.

The NPM service needs to do three things:

  • return data about a module, if it has that data;
  • query the NPM registry for data about a module;
  • extract relevant data from the large NPM JSON document returned for each module.

Seneca defines microservices as collections of one or more message patterns. If a message matches the pattern, then the microservice executes it. Messages are considered to be plain old JSON documents. The patterns are simple name and value pairs. If the JSON document has matching top-level properties, then it’s a match. More specific matches win. That is, a pattern with more name and value pairs beats a pattern with fewer. And that’s it.

The pattern matching is deliberately simple so that you can simulate it inside your own head. There is no fancy stuff like regular expressions or dot-notation for sub-properties. The Principle of Least Power is very – well – powerful. If you need to get fancy, match as much as you can and do the rest in code.
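Those two rules fit in a few lines. Here is a toy matcher to make them concrete; it is not Seneca’s actual implementation, just the rules as stated.

```javascript
// Toy pattern matcher: a message matches a pattern if all of the pattern's
// name/value pairs appear in the message; the most specific match wins.
function findPattern( patterns, msg ) {
  var best = null
  patterns.forEach( function( p ) {
    var keys = Object.keys( p )
    var all = keys.every( function( k ) { return msg[k] === p[k] } )
    if( all && ( !best || keys.length > Object.keys( best ).length ) ) best = p
  })
  return best
}
```

Given the patterns role:npm and role:npm,cmd:extract, a message with both properties goes to the second (more specific) pattern, while role:npm,cmd:query falls back to the first.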

As noted above, all the code for this article is open-sourced on Github. It’s available under the MIT license, so you are free to reuse it in your own projects, open-source or commercial.

Here’s the message action that extracts data from the NPM JSON document.

// excerpt from: nodezoo-npm/npm.js

  seneca.add(
    { role:'npm', cmd:'extract',
      data: { required$:true, object$:true } },
    cmd_extract )

  function cmd_extract( args, done ) {
    var seneca  = this

    var data       = args.data
    var dist_tags  = data['dist-tags'] || {}
    var latest     = ((data.versions||{})[dist_tags.latest]) || {}
    var repository = latest.repository || {}

    var out = {
      name:    data._id,
      version: dist_tags.latest,
      giturl:  repository.url
    }

    done( null, out )
  }

The message action pattern is role:npm, cmd:extract. This is completely arbitrary. The role property namespaces the pattern, and the cmd property names what it does. Not all patterns are commands, some are just announcements for general interest, as you’ll see in later parts.

The implementation of the pattern, the action that it triggers, is placed in the cmd_extract function. This is just good code organization. It also means you can list all the patterns at the top of the source code file, which gives you a little bit of built-in documentation.

The seneca.add method adds a new message pattern. You supply the pattern and its implementing function. Optionally you can specify the arguments you need. These are additional properties that should be present in the message JSON document. In this case, a data property must be present, and its value must be an object. That is, the data property should contain the NPM JSON result. For more details of the argument validation format, see the parambulator Node.js module.
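To see what those validation rules mean in practice, here is a toy checker, a hypothetical stand-in for the real parambulator module, that applies just the required$, object$ and string$ rules used in this article. The function name and error texts are invented for illustration.

```javascript
// Check a message against a small subset of parambulator-style rules.
// Returns an array of error strings (empty means the message is valid).
function checkArgs(rules, msg) {
  var errors = []
  Object.keys(rules).forEach(function (name) {
    var rule = rules[name]
    var val = msg[name]
    if (rule.required$ && undefined === val) {
      errors.push(name + ' is required')
      return
    }
    if (undefined === val) return
    if (rule.object$ && (typeof val !== 'object' || null === val)) {
      errors.push(name + ' must be an object')
    }
    if (rule.string$ && typeof val !== 'string') {
      errors.push(name + ' must be a string')
    }
  })
  return errors
}

console.log(checkArgs({ data: { required$: true, object$: true } }, {}))
// [ 'data is required' ]
```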

The action implementation follows the standard Node.js callback pattern:

function some_action( args, done ) {
  // context-specific seneca instance – gives you nice logs!
  var seneca = this

  // ... do the work of the action ...

  // there is no error, so the first argument is null
  done( null, { some:'result' } )
}
The args object is the message document. This is what you act on. The done function is the callback to call when you are finished. If there was an error, you should pass an Error object as the first argument to done. Otherwise, pass null, and then your result as the second. Returning a result is not required. Some messages are fire-and-forget, and there is no result as such. Always call done though, even if it’s just done(), as you need to signal that you’re finished processing.
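Here is a minimal, self-contained action in that callback style, plain Node.js with no Seneca required. The action itself (double_action) and its argument are made up for illustration: error first, result second, and done is always called.

```javascript
// A standalone action following the error-first callback convention.
function double_action(args, done) {
  if (typeof args.n !== 'number') {
    // something went wrong: pass an Error as the first argument
    return done(new Error('n must be a number'))
  }
  // no error: first argument is null, result is second
  done(null, { answer: args.n * 2 })
}

double_action({ n: 21 }, function (err, out) {
  if (err) throw err
  console.log(out.answer) // 42
})
```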

The extraction logic is nothing special; it just picks values out of an object data structure. To see the full structure, try fetching a module document from the registry yourself, for example https://registry.npmjs.org/express.
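The defensive defaults are worth seeing in isolation. Below, the same extraction style runs against an invented, minimal document shaped like an NPM registry response: the || {} fallbacks mean a document with missing sections yields undefined fields rather than a TypeError.

```javascript
// Pick out name, latest version, and git URL, tolerating missing sections.
function extract(data) {
  var dist_tags  = data['dist-tags'] || {}
  var latest     = ((data.versions || {})[dist_tags.latest]) || {}
  var repository = latest.repository || {}

  return {
    name:    data._id,
    version: dist_tags.latest,
    giturl:  repository.url
  }
}

// An invented sample shaped like a registry document:
var sample = {
  _id: 'express',
  'dist-tags': { latest: '4.0.0' },
  versions: {
    '4.0.0': { repository: { url: 'git://github.com/expressjs/express.git' } }
  }
}

console.log(extract(sample))
// { name: 'express', version: '4.0.0', giturl: 'git://github.com/expressjs/express.git' }
```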

Data Storage

Microservices do need to store data occasionally. If you want to be puritanical, then data storage should also be a microservice! Data operations are just more messages. A load message maybe, a save message, a remove message, and so forth. Purity is overrated. Seneca does do things this way, but it provides a nice and easy ActiveRecord-style interface for data entities, so you don’t have to worry about it.

This gives you all sorts of lovely side-benefits. Choose this database! No, that one! Change database in mid-project. Who cares! Need to cache data in redis or memcached? Intercept the messages! No need for the business logic code to know anything about it. Need special case validation or data munging? Intercept and do what you want!

I’ll stop now.
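One last indulgence: the caching interception mentioned above can be sketched in plain JavaScript. In Seneca you would override the entity message patterns; this hypothetical wrapper just shows the concept of intercepting a load without the caller knowing.

```javascript
// Wrap a load function in a cache layer; the caller's code is unchanged.
function cached(load) {
  var cache = {}
  return function (id, done) {
    if (cache[id]) return done(null, cache[id])
    load(id, function (err, data) {
      if (err) return done(err)
      cache[id] = data
      done(null, data)
    })
  }
}

// A stand-in for a slow database load:
var calls = 0
function slowLoad(id, done) {
  calls++
  done(null, { id: id })
}

var load = cached(slowLoad)
load('red', function () {})
load('red', function () {})
console.log(calls) // 1 (the second load hit the cache)
```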

Here’s what the API looks like:

  • Create a new data entity: var color = seneca.make$("color")
  • Save some data: color.name = "red"; color.save$( callback )
  • Load some data: color.load$( {name:"red"}, callback )

For more details, see the Seneca Data Entities documentation.

For the moment, you’ll store all data as plain text JSON files on disk. Yes, plain old JSON. On disk. No database needed. Later on, in production, we can worry about LevelDB or Mongo or Postgres (no code changes required! I promise). For now, let’s make local development and debugging super easy.

OK, let’s respond to a request for information about a Node.js module:

  seneca.add(
    { role:'npm', cmd:'get',
      name: { required$:true, string$:true },
    },
    cmd_get )

  function cmd_get( args, done ) {
    var seneca  = this
    var npm_ent = seneca.make$('npm')

    var npm_name = args.name

    npm_ent.load$( npm_name, function(err,npm){
      if( err ) return done(err);

      // npm is not null! there was data! return it!
      if( npm ) {
        return done(null,npm);
      }

      // query the npmjs registry as we
      // don't know about this module yet
      else {
        seneca.act( {role:'npm', cmd:'query', name:npm_name}, done )
      }
    })
  }

This implements the role:npm,cmd:get pattern. You need to provide a name property in your message. Speaking of which, here’s an example message that will work:

  {
    "role": "npm",
    "cmd":  "get",
    "name": "express"
  }

The code attempts to load a module description with the given name. The npm entity stores this data. If it is found, well then just return it! Job done. The caller gets back a JSON document with data fields as per the role:npm,cmd:extract action.

If the module was not found in the data store, then you need to query the NPM registry. Submit a new message with the pattern role:npm,cmd:query. That’s the action that does that job. Return whatever it returns.

Finally, let’s look at the query action. This needs to make an HTTP call out to the registry.npmjs.org site to pull down the module JSON description. The best module to do this is request.

  seneca.add(
    { role:'npm', cmd:'query',
      name: { required$:true, string$:true },
    },
    cmd_query )

  function cmd_query( args, done ) {
    var seneca  = this
    var npm_ent = seneca.make$('npm')

    var npm_name = args.name

    var url = options.registry+npm_name
    request.get( url, function(err,res,body){
      if(err) return done(err);

      var data = JSON.parse(body)

      seneca.act( {role:'npm', cmd:'extract', data:data}, function(err,data){
        if(err) return done(err)

        npm_ent.load$(npm_name, function(err,npm){
          if( err ) return done(err);

          // already have this module
          // update it with .data$ and then save with .save$
          if( npm ) {
            return npm.data$(data).save$(done);
          }

          // a new module! manually set the id, and then
          // make a new unsaved entity with .make$,
          // set the data and save
          else {
            data.id$ = npm_name
            npm_ent.make$(data).save$(done);
          }
        })
      })
    })
  }

This action constructs the query URL and performs an HTTP GET against it. The resulting document is then parsed by the role:npm,cmd:extract action to pull out the properties of interest. The extracted data is merged into any existing data on that module, or else saved as a new data entity. The data.id$ = npm_name line is where you manually set the identifier you want to use for a new data entity.

And that’s all this microservice does. 112 lines of code. Not bad.

Check back next week for part two, when we’ll cover testing the microservice. Can’t wait until then? Let us know your thoughts, ideas and experiences with microservices below, or give us a shout on Twitter!