Saturday, March 31, 2012

Artificial intelligence has failed my inbox

Today we have artificial intelligence infused into many aspects of our technological life. Siri on the iPhone can understand our verbal commands. IVR - or Interactive Voice Response - is ubiquitous nowadays. Machine learning algorithms are applied to our financial history to tell banks how to maximize our debt. Marketeers use advanced algorithms to maximize the one-way flow of cash from our wallets to their companies. Evolutionary Computation algorithms are currently being applied to the mechanical design of airplane jet engines - maximizing the flow of compressed air through their multiple stages. And the applications grow daily.

The promise of A.I. is finally coming true. The cyber-augmented future, the intelligent machine, the avatar of many a story and movie is a reality today. Vernor Vinge's Singularity is approaching, some would say (although Paul Allen thinks differently!).

Yet, our most pervasive interface into this world, this Metaverse (to steal from Neal Stephenson), this collective intelligence, is still a throw-back to the late 1960s - and it is far from being intelligent. The venerable email and its inseparable companion, the inbox, are still very passive and dumb pieces of software.

Today I watched a talk by Neal Stephenson titled "Getting big stuff done" in which he laments our current inability to tackle the big challenges that are present in our lives today. He describes, as only he can, the leaps in technological advance we took from the early 1900s until about 1968 and the significant underachievement in the latter part of the century. He goes on to challenge us to think big. It was a very interesting, if somewhat unnerving, observation (we are afraid of big projects, moonshot-type projects, that is!).

But today, I wasn't thinking about big problems (almost never do!). Today, I was perplexed by a far more mundane problem.

My inbox (at work) has over 40,000 messages (this doesn't include my "sent" messages, which are part of the problem I am lamenting). Sure, I have some folders to which messages matching simple rules are routed. But that is it. My email inbox knows nothing about my reading habits. It doesn't really know what is important and what is not. It doesn't know which messages can be safely ignored, or which should result in an instant warning sent to me (think of a text message delivered to your phone). It doesn't know which messages are related to a project. It doesn't know which messages are simply noise, office chatter; messages which are best destined for the bit bucket that is the trash bin. It doesn't know which messages contain information that should be stored for later recall (think about "how-to" type of messages from your colleagues or replies from a Microsoft tech about an obscure bug your organization encountered and which is likely to pop up again). My inbox, your inbox, most of our inboxes, are dumb and very passive recipients of information.

The A.I. revolution has, by and large, left the email inbox behind. It has failed the inbox.

A quick Google search yields many research papers on this subject. Yet, that research has not materialized into a viable email agent that learns from my email reading, organizational, and replying habits. We have applied A.I. to far more complex problems. Today we have companies dealing with Big Data, looking for relational patterns to combat terrorism. Text mining and document classification are no longer just research topics. Many companies (!) deal with this problem as their main line of business. But our inboxes remain clogged with useless messages, messages that flood us every minute and which are not worth the time spent to determine how to triage them.

It is time for our inboxes to grow some brains. I suggest that we have enough algorithmic and computing power in our hands (literally, think dual-core iPhone and Android devices) to support a far more intelligent interaction with emails.

I would like my email program to...

  • Learn from my manual categorization of messages (supervised learning) and apply that knowledge as its level of accuracy increases.
  • Learn about the association of messages and recipients or senders. Learn to advise me when a certain recipient should be added to an email thread. Think of your email system reacting to a message sent to you with a question: "In the past, so and so has answered this question for you. Would you like to forward this message to so and so?"
  • Learn to triage my messages based on urgency and time specifications. Think along the lines of: "You have received 10 messages on this topic during the last two days and this one appears to specify a deadline"
  • Review my draft replies and fix my mistakes. And no, I don't mean spelling or grammatical errors. I mean emotional "errors". Think along the lines of: "Last time you replied to this person with similar words, a "nasty" exchange ensued"!!! Imagine how much time and aggravation this would save!
  • Answer simple questions (or create a draft). Imagine a question from a colleague, a question that you might have already answered on a previous exchange with someone else. Why would you have to manually reply? Why wouldn't your email agent/avatar draft a reply with the answer?

I know these are hard problems to solve. But I believe the current state of A.I. can provide reasonable answers to them. Learning algorithms, especially when supervised, can build excellent stimulus-response systems. If we can have cars drive themselves, airplanes land automatically, and our participation in the free market maximized by algorithms, then we must be capable of creating a smart email system.
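To make the first item on my wish list a bit more concrete, here is a minimal sketch of what "learning from my manual categorization" could look like. It is a toy multinomial naive Bayes classifier written with only the Python standard library. The class name FolderClassifier, the folder names, and the sample messages are all made up for illustration; a real email agent would need far more (headers, threads, senders, feedback on wrong guesses).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class FolderClassifier:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.message_counts = Counter()          # folder -> messages filed
        self.vocabulary = set()

    def learn(self, text, folder):
        """Record one manual filing decision made by the user."""
        words = tokenize(text)
        self.word_counts[folder].update(words)
        self.message_counts[folder] += 1
        self.vocabulary.update(words)

    def suggest(self, text):
        """Return the most probable folder, or None before any training."""
        if not self.message_counts:
            return None
        total_messages = sum(self.message_counts.values())
        words = tokenize(text)
        best_folder, best_score = None, float("-inf")
        for folder, seen in self.message_counts.items():
            counts = self.word_counts[folder]
            # The extra +1 reserves probability mass for never-seen words.
            denominator = sum(counts.values()) + len(self.vocabulary) + 1
            score = math.log(seen / total_messages)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


# Hypothetical training data: each call mimics me dragging a message to a folder.
classifier = FolderClassifier()
classifier.learn("Build server is down, deploy blocked", "urgent")
classifier.learn("Production outage, please respond now", "urgent")
classifier.learn("Cake in the kitchen for Bob's birthday", "chatter")
classifier.learn("Anyone up for lunch on Friday?", "chatter")

print(classifier.suggest("Server outage blocking the deploy"))
```

The point of the sketch is only that every manual filing is a free training example. An agent built this way could stay silent until its suggestions agree with my own choices often enough, and only then start filing on its own.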

Perhaps we still have "small stuff" to be done!

UPDATE: Well, maybe not so small when Y Combinator considers this an investment opportunity!

Thursday, March 22, 2012

Importing Files Into MongoDB GridFS With Python

Importing files into MongoDB GridFS is a trivial task with Python. In this post I will illustrate the necessary steps to accomplish this. In subsequent entries I will describe efficient ways to query those files and associate them with other collections in a MongoDB database.

I assume you have familiarity with MongoDB and GridFS. In the absence of that I recommend you start with the GridFS documentation.


The MongoDB Python driver must be present. I installed it on a Mac OS X (Lion) box with easy_install. Please note that there are other installation options.

$ easy_install pymongo

With the driver installed, there are only two concepts to illustrate: reading a file from the file system -and- using the Python MongoDB driver to store it in GridFS.


To open a file for reading you can use the open(file, filemode) function which returns a file object.

# 'rb' reads raw bytes, which binary files (such as images) require
file = open("my_file_name", 'rb')

Using the GridFS store in MongoDB is also fairly simple. The general steps are: (1) open a connection to the server, (2) get the target database, (3) initialize a GridFS object with the database reference, and (4) invoke the GridFS.put() function to store the file.

Step 1 - Connect to the server using the Python Mongo driver. This illustrates connecting to your local development instance on the default port used by MongoDB.

import pymongo
connection = pymongo.Connection("localhost", 27017)

Step 2 - Obtain a reference to the database on which the file(s) will be stored using the GridFS API. Note that a MongoDB instance holds one or more databases, each with one or more collections.

db = connection.yourdatabase

Step 3 - Create a GridFS object using a reference to the database on which to store the file(s).

import gridfs
gridFs = gridfs.GridFS(db)

All that's left now is to invoke the "put" function to store the file. This function takes the data to store (here, the open file object) followed by optional keyword arguments, which are used by the GridIn class to assign attributes to the stored file or to specify other storage characteristics. For more details see the PyMongo documentation.

file_id = gridFs.put(file, filename="my_file_name")

In this case we have passed the "filename" keyword to let GridFS know that we want the file to be stored with this file name ("my_file_name").

The put function returns the "_id" of the newly created file. This can be used to associate GridFS files with documents in other collections.

Closing up

I used this approach to import a large number of TIFF files. With Python's simple and succinct syntax, along with the elegant PyMongo driver implementation, this task was accomplished with 30 lines of code, including error handling.
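For reference, a bulk import along these lines could be sketched as follows. This is not my original script, just a minimal illustration under a few assumptions: the function names import_directory and open_gridfs are made up, the list of extensions is a guess at what "TIFF files" means on disk, and the connection helper uses pymongo.MongoClient, which newer PyMongo releases provide in place of pymongo.Connection (Connection was removed in PyMongo 3.0). Only file system errors are handled here; driver errors are left to propagate.

```python
import os


def import_directory(fs, directory, extensions=(".tif", ".tiff")):
    """Store every matching file in `directory` through fs.put().

    `fs` is any object with a GridFS-style put(data, **kwargs) method.
    Returns a dict mapping each stored file name to the id put() returned.
    """
    stored = {}
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(extensions):
            continue  # not one of the file types we want to import
        path = os.path.join(directory, name)
        try:
            # Binary mode: GridFS stores raw bytes.
            with open(path, "rb") as handle:
                stored[name] = fs.put(handle, filename=name)
        except OSError as error:
            # Unreadable file (or a directory with a matching name): skip it.
            print("Skipping %s: %s" % (path, error))
    return stored


def open_gridfs(host="localhost", port=27017, database="yourdatabase"):
    """Return a GridFS handle for `database` (defaults are placeholders)."""
    # Imported lazily so import_directory() can be tried without the driver.
    import gridfs
    import pymongo

    client = pymongo.MongoClient(host, port)
    return gridfs.GridFS(client[database])


# Typical use against a running MongoDB instance (path is hypothetical):
#   ids = import_directory(open_gridfs(), "/path/to/tiff/files")
```

Passing the GridFS object in as a parameter keeps the directory walk separate from the database connection, which makes it easy to dry-run the import with a stand-in object before touching a real database.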

In subsequent posts I will describe how to associate files with existing documents in a different collection in an efficient manner.

Tuesday, March 20, 2012

Ruby meetup

I attended a meetup of the Miami Ruby Brigade last night. It has been a while since I've been to a developers' gathering. Two interesting observations.

First, this was a really interesting and smart bunch of people. Now, I say that's interesting because coming from the insular Microsoft Universe one has no idea of how much creativity and energy exists in other communities.

Second, this meetup was about Patterns, as in the Gang of Four (G.o.F) Patterns. And that is also interesting and surprising. The mid and late nineties were a Pattern-rich time for me and many developers in the Microsoft Universe. The ATL or Active Template Library used by many of us had interesting patterns (e.g., the Upside Down Inheritance). Similarly, MFC had its fair share of them. But, about 10 years ago, Patterns and their vocabulary seem to have disappeared from the typical Microsoft developer's vernacular. Whether an artifact of the .Net/ASP.NET framework and the prescribed implementation "anti-patterns" advocated by pundits or caused by the effect of cosmic rays, I can't really explain this disappearing act. The fact is that aside from the venerable Singleton, I have not heard about Patterns for quite a while. Hence, it was fascinating to see a community of developers who not only care about them, but embrace them.

I am looking forward to the next meetup.

Tuesday, March 13, 2012

Debugging Ruby on Rails on Mac OS X

If you have installed the latest version of Ruby (1.9.3 p125 as of 3/13/2012) on Mac OS X and need to debug a Rails application you will find little out-of-the-box support. In this entry I enumerate the steps required to enable debugging and illustrate how to invoke the debugger.

Note: The default version of Ruby on Mac OS X Lion (10.7.3) is 1.8.7. For many this is a significant drawback as many of the language enhancements in 1.9.x cannot be leveraged. This article assumes you have already installed the newer version. If that's not the case, you will find detailed instructions here.

Alright, to the important stuff now. The first step is to apply the debug patch to your current installation of Ruby.

$ rvm reinstall 1.9.3-p125 --patch debug --force-autoconf  

Now edit the Gemfile in your Rails application and add the following lines.

gem 'ruby-debug19', :require => false
gem 'ruby-debug-base19', :git => '', :require => false

Now run bundle config:

$ bundle config build.ruby-debug-base19 --with-ruby-include=$rvm_path/src/ruby-1.9.3-p125/

We are not done yet. Next we need to install linecache19 as follows:

$ gem install linecache19

Finally, we do:

$ bundle install

Now we should be ready to debug our Rails application. The first step is to bootstrap the debugger in the application. This is achieved by placing the debugger directive in the source code at the point where execution should break:

def some_function
  debugger  # execution breaks here when the server runs with --debugger
  respond_to do |format|

Then, launch your application with the debugger option:

$ rails s --debugger

At this point, execution should break on any thread that reaches the debugger directive, and you should be able to invoke the debugger commands. A description of the debugger commands can be found here.

Happy debugging.