Wednesday, October 15, 2014

MDX: Converting Second of Day to Standard Time Notation

Had a bit of fun with Pentaho Analyzer recently. In the release of Pentaho 5.2, we have introduced the ability to define filters across a range of time, which is really handy when your dataset is millions of records per second.

My use case included keying our time dimension on seconds per day, which results in 86,400 (60 seconds * 60 minutes * 24 hours) unique records; one to represent each unique second in a day. While this is great for simplifying query predicates, it does not help the usability or intuitiveness of the analysis report you are presenting to the user. For instance, who would intuitively understand that 56725 represents 15:45:25 in time?

So I came up with this user-defined calculation that will convert seconds in a day to standard time notation. Would love to hear from anyone who can optimize this:)  This is a valid MDX calculation that Mondrian will process. Since I needed to know the minimum and maximum second per hour in the display, I used the second of day number as a measure.

Format(Int([event scnd of day min]/3600), "00:") || 
  Format(Int(([event scnd of day min] - 
        (Int([event scnd of day min]/3600))*3600)/60), "00:") ||
    Format([event scnd of day min] - 
          ((Int([event scnd of day min]/3600)*3600) + 
          (Int(([event scnd of day min] - 
          (Int([event scnd of day min]/3600)*3600))/60)*60)), "00")
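
If you want to sanity-check the arithmetic outside of Mondrian, here is a minimal Python sketch of the same hours / minutes / seconds breakdown. It is not part of the Analyzer setup; the function name second_of_day_to_time is made up for illustration, and the range check is my own assumption about valid input.

```python
def second_of_day_to_time(second_of_day: int) -> str:
    """Format a second-of-day value (0..86399) as HH:MM:SS.

    Hypothetical helper for checking the MDX arithmetic by hand.
    """
    if not 0 <= second_of_day < 86400:
        raise ValueError("second_of_day must be between 0 and 86399")
    # Whole hours, then whatever is left over within the hour.
    hours, remainder = divmod(second_of_day, 3600)
    # Whole minutes, then the leftover seconds.
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(second_of_day_to_time(56725))  # 15:45:25
```

The two divmod calls do the same work as the nested Int(.../3600) and Int(.../60) expressions in the MDX, just without repeating the hour calculation three times.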

Here's what it looks like in Analyzer.  The columns Minimum & Maximum Second of Hour have the calculation applied to them. Note the time filter range in the filter panel. Super sweet.

Sunday, September 28, 2014

Hello Docker.

Docker. Hmmmm. I really want to love it. Everybody else loves it, so I should, right? I think maybe some of the "shiny" isn't so bright using DOCKER ON MY MAC. That said, Chris J. over at Viget wrote this blog post that single-handedly walked me through each Mac-Docker gotcha with zero pain. Total stand-up guy, IMHO.

If you are not familiar, Docker plays favorites with Linux-based operating systems, and requires a virtual machine wrapper called boot2docker in order to run on a Mac or Windows OS. Not a huge hurdle, but it definitely feels heavier and a bit more maintenance intensive ... two of the core pain points in traditional virtual environment deployments that Docker proposes to alleviate.

Beyond that silliness, there is a whole lot more *nix-based scripting than I expected. Somehow I thought the Dockerfile language would be richer, accommodating more decision-based caching. You know, something like cache this command but not this one. As I looked around and read a few comments from the Docker enthusiasts and Docker folks-proper, it seems there is a great desire to keep the Dockerfile and its DSL ... well ... simple. Limited? Is that a matter of perspective? I can appreciate simple I guess, but I still want to do hard stuff ... and thus I am pushed to the *nix script environment. This may just be a matter of stuffing myself into these new Docker jeans and waiting for them to stretch for comfort:)

One blessed moment of triumph I would like to share: I was able to write a Dockerfile that would accommodate pulling source from a private Github repository using SSH. This is NOT a difficult Docker exercise. This is a persnickety SSH exercise:) The Docker container needs to register the private SSH key that will pair with the public key that you have registered at Github. At least that is the approach I took. Please do let me know if there are easier / better / more secure alternatives.

So, the solution. The first few steps, I'm going to assume you know how to do, or can find guidance. They are not related to the container setup.

I'm going to tell you right up front that my solution does have a weakness (requirement?) that may not be altogether comfortable, and Github downright poo-poos it. In order to get the container to load without human intervention, you need to leave off the passphrase when you generate your SSH keys (Gretchen ducks.).  I planned to revisit this thorn, but just simply ran out of time. Would love to hear alternatives to this small snafu. Anyway, if you're still in the game,  then read on...

Here are the steps you should follow to get this container up and running.

  1. Generate a pair of SSH keys for Github, and register your public key at
  2. Create a folder for your Docker project.
  3. Place your private SSH key file (id_rsa) in your Docker project folder.
  4. Create your Dockerfile, following the example below.
  5. Build your image, and run your container.
  6. Profit:)

The Dockerfile

FROM gmoran/my-env
MAINTAINER Gretchen Moran

RUN mkdir -p /root/.ssh

# Add this file ... this should be your private GitHub key ...
ADD id_rsa /root/.ssh/id_rsa

RUN touch /root/.ssh/known_hosts
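# ssh-keyscan needs a host to scan; github.com is assumed here, since the key is for GitHub
RUN sudo ssh-keyscan -t rsa -p 22 github.com >> /root/.ssh/known_hosts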

Running as root User

I am referencing the root user for this example, since that is the default user that Docker will use when you run the container. If you would like a bit more protection, you can create a user, and have the container run as that user by adding the following instruction to your Dockerfile ...

USER pentaho

I created the 'pentaho' user as part of a Dockerfile used in the base image gmoran/my-env. IMPORTANT: Note that gmoran/my-env also downloads the OpenSSH daemon and starts it as part of the CMD Dockerfile command.

Adding the id_rsa File

The id_rsa file is the private SSH key generated as part of the first step in this process. You can find it in the directory you specified on creation, or in your ~/.ssh directory.

There are a number of ways to add this key to the container. I chose the simplest ... copy it to the container user's ~/.ssh directory. OpenSSH will look for this key first when attempting to authenticate our Github request.

Adding to the known_hosts File

We add Github's host key to the known_hosts file to avoid the nasty warning, and the prompt to add it, at runtime.

In my thrashing on this, I did find several posts in the ether that recommended disabling StrictHostKeyChecking, which hypothetically produces the same end result as manufacturing/mod'ing the known_hosts file. This could however leave this poor container vulnerable, so I chose the known_hosts route.

At the End of the Day ...

So at the end of the day, when I thought I would be honing my Docker skills, I actually came away with a stronger set of Unix scripting skills. Good for me all in all. I am excited about what Docker will become, and I do find the cache to be enough sugar to keep me drinking the Docker kool-aid.

I should say I appreciate not actually having to struggle with Docker. It is a nice, easy, straightforward tool with very few surprises (we won't talk about CMD versus ENTRYPOINT). Any time-consuming tasks in this adventure were directly related to my very intentional avoidance of shell scripting, which I now probably have a tiny bit more appreciation for as well.

In the words of the guy I like the most today, Chris Jones ... Good Guy Docker :) 

Tuesday, April 15, 2014

Pentaho Analytics with MongoDB

I love technology partnerships. They make our lives as technologists easier by introducing the cross sections of functionality that lie just under the surface of the products, easily missed by the casual observer. When companies partner to bring whole solutions to the market, ideally consumers get more power, less maintenance, better support and lower TCO.

Pentaho recognizes these benefits, and works hard to partner with technology companies that understand the value proposition of business analytics and big data. The folks over at MongoDB are rock stars with great vision in these spaces, so it was natural for Pentaho and MongoDB to partner up.

My colleague Bo Borland has written Pentaho Analytics with MongoDB,  a book that fast tracks the reader to all the goodness at your fingertips when partnering Pentaho Analytics and MongoDB for your analytics solutions.  He gets right to the point,  so be ready to roll up your sleeves and dig into the products right from page 1 (or nearly so).  This book is designed for technology ninjas that may have a bit of MongoDB and/or Pentaho background. In a nutshell, reading the book is a straight shot to trying out all of the integration points between the MongoDB database and the Pentaho suite of products.

You can get a copy of Pentaho Analytics with MongoDB here.  Also continue to visit the Pentaho wiki, as these products move fast.

Friday, March 07, 2014

Pentaho's Women in Tech: In Good Company

I was honored this week to be included in a blog series that showcases just a few of the great women I work with, in celebration of International Women's Day on March 8.

Check out the series. I think you'll find the common theme in the interviews interesting and inspiring. Pass on the links if you have girls in your life who could be interested in pursuing technology as a career.