I assume you are familiar with MongoDB and GridFS. If not, I recommend starting with the GridFS documentation.
Prerequisites
The MongoDB Python driver must be present. I installed it on a Mac OS X (Lion) box with easy_install. Please note that there are other installation options.
$ easy_install pymongo
With the driver installed, there are only two concepts to illustrate: reading a file from the file system, and using the Python MongoDB driver to store it in GridFS.
Details
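To open a file for reading you can use the open(file, filemode) function, which returns a file object. For binary files such as images, use mode 'rb' rather than 'r' so that the bytes are read unchanged.
file = open("my_file_name", 'rb')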
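The call above leaves closing the file to the caller. Here is a minimal sketch that reads the contents in binary mode and closes the file automatically. The helper name read_file_bytes is my own, used only for illustration:

```python
def read_file_bytes(path):
    """Return the full contents of the file at `path` as bytes."""
    # 'rb' avoids newline translation and text decoding, either of which
    # could corrupt binary data such as images.
    with open(path, "rb") as fh:
        # The with block closes the file even if read() raises.
        return fh.read()
```

This reads the whole file into memory, which is fine for modest files. For very large files it may be preferable to hand the open file object to GridFS instead.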
Using the GridFS store in MongoDB is also fairly simple. The general steps are: (1) open a connection to the server, (2) get the target database, (3) initialize a GridFS object with the database reference, and (4) invoke the GridFS.put() function to store the file.
Step 1 - Connect to the server using the Python Mongo driver. This illustrates connecting to your local development instance on the default port used by MongoDB.
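import pymongo
# MongoClient is the connection class in current PyMongo releases;
# the older pymongo.Connection class was removed in PyMongo 3.0.
connection = pymongo.MongoClient("localhost", 27017)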
Step 2 - Obtain a reference to the database on which the file(s) will be stored using the GridFS API. Note that a MongoDB instance holds one or more databases, each with one or more collections.
db = connection.yourdatabase
Step 3 - Create a GridFS object using a reference to the database on which to store the file(s).
import gridfs
gridFs = gridfs.GridFS(db)
All that's left now is to invoke the "put" function to store the file. This function takes the data to store, plus optional keyword arguments which are used by the GridIn class to assign attributes to the stored file or to specify other storage characteristics. For more details see the PyMongo documentation.
file_id = gridFs.put( file.read(), filename="my_file_name")
In this case we have passed the "filename" keyword to let GridFS know that we want the file to be stored with this file name ("my_file_name").
The put function returns the "_id" of the newly created file. This can be used to associate GridFS files with documents in other collections.
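Putting the four steps together, a minimal sketch might look like the following. It uses MongoClient, the connection class in current PyMongo releases, and passes the open file object to put() rather than its contents. The helper names open_gridfs and store_file are hypothetical, as is the choice to use the file's base name as the stored filename:

```python
import os


def open_gridfs(db_name, host="localhost", port=27017):
    """Connect to MongoDB and return a GridFS handle for `db_name`."""
    # Imported inside the function so that store_file() below can be tried
    # without PyMongo installed; normally these sit at the top of the module.
    import gridfs
    from pymongo import MongoClient

    client = MongoClient(host, port)  # Step 1: connect to the server
    db = client[db_name]              # Step 2: get the target database
    return gridfs.GridFS(db)          # Step 3: create the GridFS object


def store_file(fs, path):
    """Store the file at `path` in `fs` and return the new file's _id."""
    with open(path, "rb") as fh:
        # Step 4: put() accepts bytes or a file-like object with read();
        # keyword arguments such as filename become attributes of the file.
        return fs.put(fh, filename=os.path.basename(path))
```

With a local MongoDB instance running, store_file(open_gridfs("yourdatabase"), "my_file_name") should return the "_id" of the stored file.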
Closing up
I used this approach to import a large number of TIFF files. With Python's simple and succinct syntax, along with the elegant PyMongo driver implementation, this task was accomplished with 30 lines of code, including error handling.
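For illustration, a batch import along these lines could be sketched as follows. This is not the script I used, only a rough outline under some assumptions: the function name import_tiffs is hypothetical, the files are assumed to sit in a single directory with a .tif or .tiff extension, and fs is any object with a GridFS-style put() method:

```python
import glob
import os


def import_tiffs(fs, directory):
    """Store every .tif/.tiff file in `directory`; return (stored, failed)."""
    stored = {}   # maps file path -> _id returned by put()
    failed = []   # list of (file path, exception) for files that failed
    paths = sorted(
        glob.glob(os.path.join(directory, "*.tif"))
        + glob.glob(os.path.join(directory, "*.tiff"))
    )
    for path in paths:
        try:
            with open(path, "rb") as fh:
                stored[path] = fs.put(fh, filename=os.path.basename(path))
        except Exception as exc:
            # Deliberately broad: record the failure and keep importing.
            failed.append((path, exc))
    return stored, failed
```

Collecting failures instead of stopping at the first one means a single unreadable file does not abort a long import, and the failed list can be retried or logged afterwards.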
In subsequent posts I will describe how to associate files with existing documents in a different collection in an efficient manner.
how to actually view those saved files?