Data Lake vs. Data Warehouse: Which Is the Best Data Architecture?

For a business undergoing digital transformation, choosing a data architecture is a big decision, and one of the first it has to make. But given the wide range of options and the confusing terminology, it is not easy to choose a solution that meets the needs of the business without breaking its budget.

Two of the most popular options are often called "data warehouses" and "data lakes." Think of a data warehouse as a shopping center. It contains discrete "stores" that hold structured data: bits of information that are preformatted so that the database software can interact with them.

In contrast, a data lake resembles a disorganized flea market. It has "stalls," but where one stops and the next one starts is not so clear. Unlike a data warehouse, a data lake can contain both structured and unstructured data. Unstructured data, as its name implies, refers to digital information without a predefined format, such as audio, images, and videos.
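A common way to describe this contrast is "schema-on-write" versus "schema-on-read." The following is a toy sketch, not how any particular product works: the hypothetical `Warehouse` class checks every row against a fixed schema before storing it, while the hypothetical `Lake` class accepts any object as raw bytes and leaves interpretation to whoever reads it later. The schema, keys, and sample bytes are all made up for illustration.

```python
from typing import Any

# Hypothetical warehouse table schema: column name -> required Python type.
SALES_SCHEMA = {"date": str, "product": str, "amount": float}


class Warehouse:
    """Toy schema-on-write store: a row is stored only if it matches the schema."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def insert(self, row: dict[str, Any]) -> None:
        # Reject rows whose columns differ from the schema.
        if set(row) != set(self.schema):
            raise ValueError(f"columns {sorted(row)} do not match schema {sorted(self.schema)}")
        # Reject rows whose values have the wrong type.
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"column {col!r} expects {typ.__name__}")
        self.rows.append(row)


class Lake:
    """Toy schema-on-read store: any object is kept as raw bytes plus a content type."""

    def __init__(self) -> None:
        self.objects: dict[str, tuple[str, bytes]] = {}

    def put(self, key: str, content_type: str, data: bytes) -> None:
        # No validation: structure is imposed later, when someone reads the data.
        self.objects[key] = (content_type, data)


wh = Warehouse(SALES_SCHEMA)
wh.insert({"date": "2018-01-31", "product": "widget", "amount": 120.0})

lake = Lake()
lake.put("media/campaign.mp4", "video/mp4", b"\x00\x00\x00\x18ftyp")  # hypothetical video bytes
lake.put("sales/2018-01-31.json", "application/json", b'{"amount": 120.0}')

try:
    wh.insert({"file": "campaign.mp4"})  # unstructured record: rejected at write time
except ValueError as err:
    print("warehouse rejected:", err)
```

The design point is where the cost of structure is paid: the warehouse pays it up front on every write, which keeps queries simple, while the lake defers it, which is what lets it hold audio, images, and video alongside tables.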

The data marketplace complicates things even more. Unlike the first two concepts, it is not an architecture but an interface to a data lake that lets people outside the IT team, such as business analysts, access its contents. With a search function, it allows users to fish what they need out of the lake. Think of data marketplaces as personal tour guides for flea markets, showing buyers where to find the best deals.
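To illustrate the "tour guide" idea, here is a minimal sketch of a search function over a catalog describing what sits in a lake. It assumes, hypothetically, that someone has written a short description and a few tags for each lake object; the `CatalogEntry` type, the keys, and the simple all-words matching rule are illustrative choices, not features of any specific marketplace product.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    key: str                 # hypothetical location of the object in the lake
    description: str         # human-readable summary of the object
    tags: tuple[str, ...]    # hypothetical keywords attached by the data team


def search(catalog: list[CatalogEntry], query: str) -> list[str]:
    """Return lake keys whose description or tags contain every word of the query."""
    words = query.lower().split()
    hits = []
    for entry in catalog:
        text = (entry.description + " " + " ".join(entry.tags)).lower()
        if all(word in text for word in words):
            hits.append(entry.key)
    return hits


catalog = [
    CatalogEntry("sales/2018/q1.parquet", "Quarterly sales figures by product", ("sales", "structured")),
    CatalogEntry("media/campaign-video.mp4", "Marketing campaign video", ("marketing", "video")),
    CatalogEntry("clinic/notes-scan-001.png", "Scanned handwritten patient notes", ("health", "image")),
]

print(search(catalog, "marketing video"))  # ['media/campaign-video.mp4']
```

The analyst never browses raw storage; they query descriptions, and the catalog points them to the right "stall."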

Inside the data warehouse and data lake

For a business looking to analyze large but structured datasets, a data warehouse is a good option. In fact, if the company is only interested in descriptive analysis, the process of simply summarizing the data, a data warehouse may be all it needs.

Say, for example, that business leaders want to look at sales figures over a period of time, the number of inquiries about a product, or the view counts of various marketing videos. A data warehouse would be well suited to these applications because all of the associated figures are stored as structured data.
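Descriptive analysis of this kind usually amounts to grouping and summing. The sketch below, using made-up sales rows, totals sales per month; in an actual warehouse the same summary would typically be written as a SQL `GROUP BY` query, so treat this as an illustration of the idea rather than of how a warehouse executes it.

```python
from collections import defaultdict
from datetime import date

# Hypothetical structured sales rows: (day, product, amount).
sales = [
    (date(2018, 1, 5), "widget", 120.0),
    (date(2018, 1, 19), "gadget", 80.0),
    (date(2018, 2, 2), "widget", 150.0),
    (date(2018, 2, 23), "widget", 30.0),
]


def monthly_totals(rows):
    """Descriptive analysis: sum sales per (year, month), in chronological order."""
    totals = defaultdict(float)
    for day, _product, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


print(monthly_totals(sales))  # {(2018, 1): 200.0, (2018, 2): 180.0}
```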

But for most companies that embark on big data initiatives, structured data is only part of the story. Every year, companies generate a staggering amount of unstructured data. In fact, 451 Research, in collaboration with Western Digital, found that 63% of companies and service providers retain at least 25 petabytes of unstructured data. For these companies, data lakes are attractive because they can store large amounts of such data.

In addition, data lakes allow analysts to go beyond descriptive analysis and into the exciting, and highly rewarding, field of predictive and prescriptive analytics. Predictive analytics uses existing data to forecast future trends relevant to the business, such as next year's earnings.
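As a deliberately simple illustration of "using existing data to predict next year's earnings," the sketch below fits a straight-line trend by ordinary least squares and extrapolates one year ahead. The earnings figures are hypothetical, and a linear trend is only the most basic predictive model; real predictive analytics would normally use richer models and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx          # slope: average change per year
    a = mean_y - b * mean_x
    return a, b


years = [2014, 2015, 2016, 2017]
earnings = [10.0, 12.0, 14.0, 16.0]  # hypothetical earnings, in millions

a, b = fit_line(years, earnings)
forecast_2018 = a + b * 2018
print(round(forecast_2018, 2))  # 18.0
```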

Prescriptive analytics goes even further, using artificial intelligence technologies to make recommendations in response to those predictions. For predictive and prescriptive analysis, a data lake is essential. Companies often manage data lakes with software such as Apache Hadoop, a popular open-source ecosystem for distributed data storage and processing.

Before building a data lake or data warehouse, think about who will do the data analysis and what kind of data they will need. Data warehouses are often accessible only to IT teams, while data lakes can be configured for access by analysts and business sales staff.

A health care company my company recently worked with, for example, asked for a data warehouse solution. It soon became clear, however, that the company actually needed a data lake. It was interested not only in predictive modeling but also in capturing all sorts of unstructured data, such as handwritten notes.

Analysts at a health care company could process data from a data lake to predict patient outcomes. They could then add a prescriptive layer to recommend the best treatment for each patient's needs: one that minimizes costs and risks while providing the highest quality of care.
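A prescriptive layer can be thought of as turning predictions into a choice. The toy sketch below assumes a predictive model has already estimated cost, complication risk, and a quality-of-care score for each candidate treatment; it discards options below a quality threshold and picks the one with the lowest weighted cost plus risk. The treatments, numbers, threshold, and weights are all hypothetical, and real clinical decision support involves far more than a single scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost: float      # predicted cost in dollars (hypothetical)
    risk: float      # predicted probability of complication, 0..1 (hypothetical)
    quality: float   # predicted quality-of-care score, 0..1 (hypothetical)


def recommend(options, min_quality=0.8, cost_weight=1.0, risk_weight=10_000.0):
    """Among options meeting min_quality, return the one with the lowest weighted cost + risk."""
    eligible = [o for o in options if o.quality >= min_quality]
    if not eligible:
        return None
    return min(eligible, key=lambda o: cost_weight * o.cost + risk_weight * o.risk)


options = [
    Option("treatment A", cost=5_000, risk=0.10, quality=0.90),  # score 6,000
    Option("treatment B", cost=3_000, risk=0.35, quality=0.85),  # score 6,500
    Option("treatment C", cost=1_000, risk=0.05, quality=0.60),  # fails quality threshold
]

print(recommend(options).name)  # treatment A
```

The weights encode how much one unit of risk is "worth" in dollars; choosing them is a policy decision, not something the data alone can settle.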

Making the Most of the Data Lake

Given their ability to store both types of data and their relevance to future analytics needs, it is tempting to conclude that data lakes are the obvious answer. But because of their loose structure, they are sometimes described as a data "swamp" rather than a lake.

In fact, Adam Wray, CEO and chairman of the NoSQL database company Basho, describes them as "evil because they are unruly" and "incredibly costly." According to Basho's experience, "the extraction of value [from data lakes] is infinitesimal compared to the promised value."

But we should not count data lakes out yet. Data marketplaces can save the promise of data lakes by organizing them for the end user. Just as the Internet was much harder to navigate before Google, data marketplaces are unlocking the power of the data lake architecture.

In the world of analytics, there is no one-size-fits-all system. Data warehouses can give smaller businesses a taste of data analysis, while data lakes (combined with data marketplaces) can help companies dive headlong into big data. Nor are these systems mutually exclusive: if its analysis needs change, a company that chooses a warehouse can later add a lake and a marketplace.

The most important thing is to start the journey toward becoming a more data-driven company. Many executives will remember that ten years ago, data was not even discussed outside IT teams. Now, with the range of needs and analytical tools available, it is executives' turn to lead the conversation.

Asha Saxena


Asha Saxena is the CEO and President of Future Technologies Inc., a data management company. She is also the Chief Executive Officer and Chief Innovation Officer of ACULYST, which provides health care analytics services.
