As data grows and diversifies, many organizations are finding that traditional methods of managing information are becoming outdated. This report examines the performance implications of a data lake and the common characteristics of organizations that leverage the technology effectively.
The data lake analogy was conceived to bring a common, visual understanding to the benefits of distributed computing systems that can handle multiple types of data, in their native formats, with a high degree of flexibility and scalability. While the analogy may not be perfect, the goal of a data lake is well-aligned with the challenges so many companies struggle with today. According to recent Aberdeen research, the average company is seeing its data volume grow at a rate exceeding 50% per year. Additionally, these companies manage an average of 33 unique data sources for analysis. Rapid volume growth and complexity of this kind can wreak havoc on the internal efficiency of companies that depend heavily on data, and many of these companies are responding by implementing data lake technologies.