Writing
Most databases since the 1970s have been designed and architected primarily to write and capture data. Most of these "writing" databases store data in rows, which makes it simple to append and collect new records as quickly as possible. SQL became the common language for querying these write-optimized relational database management systems. As data volumes have grown exponentially over the last 40 years, these write-optimized databases have started to show their age when analyzing large numbers of records, making analytics on top of these architectures slow and time consuming. Big data has also been focused on collecting everything, further increasing the size of and strain on these write databases.
Reading
Although the challenge of building analytical databases, or databases optimized for “reading,” also dates back to the 1970s, it was not until the 1990s that reading-optimized databases began to mature and gain adoption. Early approaches such as cubes and non-SQL languages like MDX, which focused on reading row-based (“write”) databases more quickly, usually required processing or ingestion steps that took time and introduced latency. These approaches also made it hard for analysts to take advantage of “read” offerings such as HP’s Vertica, Cognos, Hyperion or Microsoft. Eventually, “reading” databases like Sybase IQ (1995) were built around storing data in columns rather than rows, a design that optimizes querying and reading information quickly, as the sketch below illustrates. The next step was to move these analytical databases to the cloud.
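To make the row-versus-column distinction concrete, here is a minimal Python sketch (not tied to any particular engine, with made-up example data) contrasting the two layouts: appending a record is natural in the row layout, while aggregating a single column is natural in the columnar layout.

```python
# Row-oriented layout: each record is stored together, ideal for fast appends.
rows = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 75.5},
    {"order_id": 3, "customer": "A", "amount": 42.0},
]

# Column-oriented layout: each column is stored together, ideal for analytical scans.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["A", "B", "A"],
    "amount": [120.0, 75.5, 42.0],
}

# Writing (appending) a new record fits the row layout naturally.
rows.append({"order_id": 4, "customer": "C", "amount": 18.9})

# Reading (aggregating) one column fits the columnar layout naturally:
# only the "amount" values are scanned, not every field of every row.
total = sum(columns["amount"])
print(total)
```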
Why Google BigQuery is a big deal
It was not until the mid-2000s that cloud databases based on columnar storage with SQL compatibility became a real option for analysts. Google BigQuery reached the market by 2010, offering unparalleled speed regardless of data size. Based on Dremel, a query system Google has run internally since 2006 and later described in a whitepaper, this reading-optimized database scales to virtually unlimited data volumes with unparalleled performance. BigQuery is a big deal among analytical databases because it lets analysts query their data very quickly using SQL, regardless of size. You can ingest data very fast without ever pausing queries, removing the latency introduced by early analytical databases. BigQuery is available as a cloud service that requires no configuration or hardware setup. It is auto-optimized, with no indexes to manage. Finally, it is fully secure and supports complex nested structures for web data. BigQuery is indeed a big deal and the product of 40 years of database evolution!
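As a rough illustration, here is a hedged sketch of querying BigQuery with standard SQL through the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical placeholders, not anything from this post.

```python
from google.cloud import bigquery

# The client picks up credentials and the default project from the environment.
client = bigquery.Client()

# Standard SQL runs directly against the table; no indexes or tuning to manage.
sql = """
    SELECT customer, SUM(amount) AS total_spend
    FROM `my-project.analytics.orders`  -- hypothetical table name
    GROUP BY customer
    ORDER BY total_spend DESC
    LIMIT 10
"""

# Submit the query and iterate over the result rows as they stream back.
for row in client.query(sql).result():
    print(row["customer"], row["total_spend"])
```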