Chris J Powell

Big Data – On the Cheap

I have worked for a lot of companies in my lifetime, and I find it humorous to think how much of an impact I could have had if I had known back then what I know today about collecting and analyzing the data those organizations had just sitting around.  Data has always been collected.  It wasn’t always in raw bits and bytes…it was once in paper form, but it was just as important.  I was reading through an Aberdeen Group research paper titled “Big Data for Small Budgets” and I had an epiphany.

One of the points that really stood out for me in the paper was that the demand for real-time or near-real-time Business Intelligence is almost the same for a Small Business as it is for a Large Enterprise.  But having worked with and for numerous small businesses in the past, I know there is just not the staff, expertise or time to actually build a solution that puts that information in front of the Business Decision Makers.  This is not a good thing, and the reality is…the solutions are there for the taking…it just takes a “leap of faith” to make it happen. The data is already being collected in the form of orders, sales and raw customer data…so why is it not being leveraged? Looking at the report and doing some side research of my own, the answer is the perception that Data Integration Tools are too expensive, so let’s take a look at how a Small Business can reap the benefits of Big Data without breaking the bank. Return on Investment is key to the success of any venture…I get that…so let’s set a modest $5,000 budget for the project and see what we can get, working through the key takeaways from the report:

  • Data Integration Tools
  • In-memory Computing
  • Unstructured Data Management Tools
  • Data Visualization Tools

Data Integration Tools: The world of Open Source is a great place to look to level the playing field and get extra bang for your buck.  I took a quick gander at two tools that may just fit the bill as effective Data Integration Tools: Pentaho Kettle (Pentaho Data Integration, or Kettle, delivers powerful Extraction, Transformation and Loading (ETL) capabilities using a groundbreaking, metadata-driven approach) and Talend Open Studio for Data Integration (which helps you get your data to the right place, in the right form, at the right time). Both of these tool sets are completely FREE, and having used Pentaho products in the past, I can say they are relatively easy to use…so the cost to the budget: $0.
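To make the ETL idea concrete, here is a toy sketch of the extract, transform and load steps that tools like Kettle and Talend wrap in a graphical interface. The data, table name and cleaning rules are all made up for illustration; it just uses Python's standard library so anyone can try it for free:

```python
import csv
import io
import sqlite3

# Toy "extract" source: raw order data as it might sit in an exported CSV,
# messy whitespace and all. In real life this would come from a file or system.
RAW_ORDERS = """order_id,customer,amount
1001,Acme Corp,250.00
1002,Globex,  99.50
1003,Acme Corp,410.25
"""

def extract(text):
    """Extract: read rows out of the raw CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert amounts to real numbers."""
    return [(int(r["order_id"]), r["customer"].strip(), float(r["amount"]))
            for r in rows]

def load(rows, conn):
    """Load: push the cleaned rows into a reporting table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_ORDERS)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(f"Total order value: {total:.2f}")
```

The real tools do the same three steps at scale, against dozens of source and target systems, with the pipeline defined as metadata instead of hand-written code.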


In-memory Computing: It is possible to move the Big Data analysis to the Cloud and live life on an Operational Budget, but using the Aberdeen Group cut-off point of 5 TB of data as the minimum standard for being classed as Big Data, Amazon Web Services is going to charge about $150 per month…so after the first year you are looking at $1,800 just to house the data, without running any analysis on it. So let’s build an effective In-Memory system that can do the processing for us. I headed over to Dell’s site and configured a PowerEdge 320 with 32 GB of RAM, 2 x 4 TB drives and a single Intel® Xeon® E5-2430 processor with 6 cores…and all this computing power still fits within our $5,000 budget.
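The rent-versus-buy math above is worth writing down. The cloud figure comes straight from the article; the server price below is a placeholder assumption (plug in whatever the vendor actually quotes you):

```python
# Back-of-the-envelope comparison: cloud storage at $150/month for 5 TB
# versus a one-time server purchase.
CLOUD_MONTHLY = 150      # from the article: ~$150/month to store 5 TB
SERVER_PRICE = 3500      # hypothetical quote for the PowerEdge build

cloud_year_one = CLOUD_MONTHLY * 12
print(f"Cloud storage, year one: ${cloud_year_one}")

# Months until the owned server is cheaper than renting storage alone:
breakeven_months = SERVER_PRICE / CLOUD_MONTHLY
print(f"Break-even on the server: ~{breakeven_months:.0f} months")
```

Even ignoring the compute you get for free with the server, a one-time hardware buy in this price range pays for itself against storage rental in around two years.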


Unstructured Data Management Tools: Leveraging a system like Apache Solr (again, an Open Source project) can enable us to work with Unstructured Data quickly, and it really shines when dealing with text searches.  Combine this with Hortonworks’ virtualized Hadoop sandbox, and we are talking a kick-ass option for starting to look at all the data in a company and coming up with some interesting insights.
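Solr itself is a server you query over HTTP, but the core idea that makes it fast on text is the inverted index: map every word to the documents containing it, then answer queries by intersecting those lists. The documents and queries below are invented purely to illustrate that idea:

```python
from collections import defaultdict

# Tiny corpus of "unstructured" business text (made-up examples).
docs = {
    1: "customer complained about late shipping",
    2: "shipping upgrade improved customer satisfaction",
    3: "invoice dispute resolved by support",
}

# Build the inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(*terms):
    """Return ids of documents containing every query term."""
    hits = [index[t.lower()] for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

print(search("customer", "shipping"))  # -> [1, 2]
```

Solr adds tokenization, stemming, relevance ranking, faceting and distributed scale on top, but a lookup-and-intersect like this is why it can search millions of documents without scanning them.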


Data Visualization Tools: When it comes to taking all that data and making it look like something a business can actually use…well, that is where life in the world of Big Data starts to get interesting.  With the horsepower of our new server behind it, we can look to the world of Open Data Tools, but that is more about applying external data sources to your method.  If you are looking at building out your own analysis…well, that is where jumping into the world of Python and building some interesting interpretations comes into play.


There is no simple answer to the question of Big Data on the cheap, but as you can see, with some willpower and by going Open Source (including the OS for the server), you can become a Big Data Powerhouse for under $5,000.  It will take some blood, sweat and tears to get there…but once there, the real-time insights you were looking for…well, the possibilities are really endless!

(Note: there will be additional expenses for training, development and programming, but for acquiring the necessary software and hardware, this is a best-effort price.)


