Monday, March 7, 2016

Google BigQuery is a Big Deal: A Brief History of Database Evolution

As someone close to, active in, and passionate about the technology industry, I love sharing my knowledge with non-technical crowds from time to time. It is rewarding and refreshing to simplify difficult concepts, look back at history, and gain new perspective. I recently found myself educating the less-technical side of our team at Bimotics about the incredible power of analytical databases like Google BigQuery. What I realized is that people can better appreciate BigQuery's capabilities by first understanding how databases have evolved over time. What follows is a short evolutionary history of databases and a summary of why BigQuery really is a big deal!
Most databases since the 1970s have been designed and architected primarily for writing and capturing data. These "writing" databases typically store data in rows, which makes it simple to gather, append, and collect records as quickly as possible. SQL became the common language for querying these write-optimized relational database management systems. As data volumes have grown exponentially over the last 40 years, these write-optimized databases have started to show their age when scanning large numbers of records, making analytics on top of these architectures slow and time consuming. Big data's focus on collecting everything has further increased the size of, and strain on, these write databases.
Although work on analytical databases, ones optimized for "reading," also started in the 1970s, it was not until the 1990s that read-optimized database adoption began to grow and mature. Early approaches such as OLAP cubes and non-SQL languages like MDX, which aimed to read row-based ("write") databases more quickly, usually required processing or ingestion steps that took time and introduced latency. These approaches also made it hard for analysts to leverage "read" offerings like HP's Vertica, Cognos, Hyperion, or Microsoft's products. Eventually, "reading" databases like Sybase IQ (1995) were built around storing data in columns rather than rows, a design that optimizes the ability to query and read information quickly. The next step was to move these analytical databases to the cloud.
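To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python (not any vendor's actual implementation; the table and values are made up). It shows why an analytical query that touches one field favors columnar storage: the row store must walk every whole record, while the column store reads a single list.

```python
# Row-oriented storage: each record is kept together -- fast to append one sale.
rows = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "West", "product": "B", "revenue": 250},
    {"region": "East", "product": "C", "revenue": 175},
]

# Column-oriented storage: each field is kept together -- fast to scan one attribute.
columns = {
    "region": ["East", "West", "East"],
    "product": ["A", "B", "C"],
    "revenue": [100, 250, 175],
}

# "Total revenue" over the row store must visit every full record...
total_from_rows = sum(record["revenue"] for record in rows)

# ...while the column store reads only the one column it needs.
total_from_columns = sum(columns["revenue"])

print(total_from_rows, total_from_columns)  # both print 525
```

At real scale the difference is I/O: a columnar engine scanning one field of a billion-row table can skip the bytes of every other field entirely.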
Why Google BigQuery is a big deal
It was not until the mid-2000s that cloud databases combining columnar storage with SQL compatibility became a real solution for analysts. Google BigQuery made it to market by 2010, offering remarkable speed regardless of data size. Built on Google's Dremel technology (in production internally since 2006), this read-optimized database scales effectively without limit while keeping that performance. BigQuery is a big, big deal among analytical databases because it lets analysts query their data very quickly using SQL, whatever the size. You can ingest data very fast while queries keep running, removing the latency that early analytical databases introduced. BigQuery is available as a cloud service that requires no configuration or hardware setup, and it is auto-optimized with no indexes to manage. Finally, it is fully secure and supports complex nested structures for web data. BigQuery is indeed a big deal and the product of 40 years of database evolution!
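The point about analysts using plain SQL is easy to illustrate. The sketch below runs the style of aggregate query you would point at BigQuery, but against an in-memory SQLite table purely as a self-contained stand-in (the `page_views` table and its values are invented for illustration; in BigQuery you would run the same SQL over billions of rows).

```python
import sqlite3

# Stand-in for a cloud table: a tiny in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("US", 120), ("BR", 75), ("US", 30), ("IN", 200)],
)

# The kind of ordinary SQL aggregate an analyst runs in BigQuery.
result = conn.execute(
    "SELECT country, SUM(views) FROM page_views "
    "GROUP BY country ORDER BY SUM(views) DESC"
).fetchall()

print(result)  # [('IN', 200), ('US', 150), ('BR', 75)]
```

That familiarity is the draw: no cubes to build and no MDX to learn, just the SQL analysts already know, executed over arbitrarily large data.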
