VLAB – Big Data & NoSQL
March 15, 2012 - The MIT/Stanford Venture Lab (http://www.vlab.org/) held its latest event on Big Data and NoSQL, the growing trend in data storage for public and private clouds. A crowd of nearly 600 packed the campus auditorium to hear several different approaches to data collection and querying from CouchBase, Oracle, 10Gen (on MongoDB), Apache (on Hadoop), and Shasta Ventures. The event was moderated by Robert Scoble of Rackspace Hosting.
The discussion opened with a short talk by CouchBase on the challenges of big data and why they require a change in database structure. The opening statistics came from IDC, which indicated that more than 1.8 trillion GB (about 1.8 zettabytes) of data was created in 2011 and that this figure will double each year. Beyond the sheer volume, the audience for that data and the points of access are growing larger and more distributed. The big data problem concerns newly collected data that comes from distributed sources, runs on distributed machines, and does not follow a structured schema. This differs from traditional business data, which is stored in a relational database that runs on a single system and uses a schema architected in advance for an extensive set of a priori queries on the content of the data; its performance depends on that schema. This sort of structure does not scale.
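To make the growth figure concrete, here is a small projection that takes the quoted IDC number and the "doubling each year" claim at face value. The constant and function names are illustrative only, and the projection is simple compounding arithmetic, not an IDC forecast.

```python
BASE_YEAR = 2011
BASE_GB = 1_800_000_000_000  # 1.8 trillion GB created in 2011 (the IDC figure quoted above)


def projected_gb(year: int) -> int:
    """Data created in `year`, assuming (as stated in the talk) it doubles every year."""
    if year < BASE_YEAR:
        raise ValueError("projection only runs forward from 2011")
    return BASE_GB * 2 ** (year - BASE_YEAR)


for year in range(BASE_YEAR, BASE_YEAR + 5):
    # Convert GB to "trillion GB" (roughly zettabytes) for readability.
    print(year, f"{projected_gb(year) / 1e12:.1f} trillion GB")
```

Under that assumption, four doublings take 1.8 trillion GB in 2011 to 28.8 trillion GB in 2015, a sixteenfold increase in four years.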
The NoSQL methodology is based on a distributed database without a predefined schema, which allows linear scale-out by adding machines. The CouchBase NoSQL solution is open source, backed by a business model that licenses the binary together with installation support and driver patches that ship before those functions are integrated into the open source code. The system is designed to be a transactional database that sits inside an application rather than recording data outside it. The keys are the ability to Focus/Simplify/Adjust for the application and the types of data involved.
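The idea of a schema-less store that scales out by adding machines can be illustrated with a toy in-memory cluster. This is a minimal sketch under stated assumptions, not CouchBase's actual design: the `Cluster` class, the fixed count of 64 virtual partitions, CRC32 key hashing, and the naive round-robin rebalance are all hypothetical choices made for illustration.

```python
import zlib

NUM_PARTITIONS = 64  # hypothetical fixed number of virtual partitions


class Cluster:
    """Toy key-document store spread over nodes; documents are plain dicts with no schema."""

    def __init__(self, nodes):
        self.nodes = {name: {} for name in nodes}  # node name -> {key: document}
        self.partition_map = {}                    # partition id -> node name
        self._rebalance()

    def _partition(self, key):
        # Hash the key to a virtual partition; the partition, not the key, is mapped to a node.
        return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

    def _node_for(self, key):
        return self.partition_map[self._partition(key)]

    def _rebalance(self):
        # Naive rebalance: gather every document, reassign partitions round-robin
        # across the current nodes, then re-store each document on its new node.
        all_docs = {}
        for store in self.nodes.values():
            all_docs.update(store)
            store.clear()
        names = sorted(self.nodes)
        self.partition_map = {p: names[p % len(names)] for p in range(NUM_PARTITIONS)}
        for key, doc in all_docs.items():
            self.put(key, doc)

    def put(self, key, doc):
        self.nodes[self._node_for(key)][key] = doc

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def add_node(self, name):
        # Scaling out: add a machine and spread the partitions over it.
        self.nodes[name] = {}
        self._rebalance()


cluster = Cluster(["node-a", "node-b", "node-c"])
# Two documents with completely different fields: no schema is declared up front.
cluster.put("user:1", {"name": "Ada", "email": "ada@example.com"})
cluster.put("post:7", {"title": "Big Data", "tags": ["nosql"]})
cluster.add_node("node-d")
print(cluster.get("user:1"), cluster.get("post:7"))
```

Mapping keys to a fixed set of partitions, rather than directly to machines, means a new node only changes the partition map. A production system would move just the partitions assigned to the new node instead of re-storing every document as this sketch does.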
Apache and 10Gen continued the discussion on the changes in the database world and why these new methods, built on the open source model, were gaining popularity, especially among content creators and providers. Oracle, however, argued that traditional business models can prevail in these applications, since Oracle sells relational databases, NoSQL databases, and open source databases at different price points for different applications and customers.
The discussion ended with a shift to the analysis these data structures make possible. The business intelligence (BI) and data scientist communities have long relied on relational databases. Web and media content, however, does not lend itself to structured queries, and the BI market for that sector requires a lot of flexibility because the queries, and their depth, are very diverse. The content industry is looking into a tiered approach that pairs structured metadata with unstructured full data, so that a single repository can both represent the information and make it searchable.
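The tiered idea of structured metadata over unstructured full content can be sketched as follows. The `Item` and `TieredRepository` names, the exact-match metadata index, and the case-insensitive substring search are all hypothetical simplifications. A real content repository would use a proper full-text index.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    metadata: dict  # structured tier: fixed, queryable fields (values must be hashable)
    body: str       # unstructured tier: full content with no schema


class TieredRepository:
    def __init__(self):
        self.items = {}  # item id -> Item
        self.index = {}  # (field, value) -> set of item ids

    def add(self, item):
        self.items[item.item_id] = item
        for field_name, value in item.metadata.items():
            self.index.setdefault((field_name, value), set()).add(item.item_id)

    def search(self, text, **filters):
        """Narrow by structured metadata first, then scan the unstructured bodies."""
        if filters:
            ids = set.intersection(
                *(self.index.get((k, v), set()) for k, v in filters.items())
            )
        else:
            ids = set(self.items)
        needle = text.lower()
        return sorted(i for i in ids if needle in self.items[i].body.lower())


repo = TieredRepository()
repo.add(Item("a1", {"type": "video", "lang": "en"}, "Keynote on NoSQL scale-out"))
repo.add(Item("a2", {"type": "article", "lang": "en"}, "Notes on NoSQL and Hadoop"))
repo.add(Item("a3", {"type": "video", "lang": "fr"}, "Table ronde NoSQL"))
print(repo.search("nosql", type="video"))
```

The cheap, structured tier cuts the candidate set down before the expensive scan of unstructured content, which is the flexibility the BI use case above calls for.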