Data Modernization Archives - 91��

Data Modernization – What is the best route for your transformation journey? (Part 2)

91�� — Tue, 30 Aug 2022 05:36:46 +0000

So, you have taken the decision to go in for a data modernization exercise, which befits any forward-thinking organization. That’s the good news!

The question now is: what is the way forward? What is the most appropriate model for your organization?

The truth is that there is no one-size-fits-all solution. Over the last decade, Data Lakes grew to be the de facto model for modernization. These days, they are being supplanted by, or in many cases have been subsumed into, Data Meshes. Both models have their votaries, and both come with their own set of challenges.

Let us examine these two models in a little more detail so that you can wrap your mind around them more easily and be better positioned to choose between them.

The Data Lake

A Data Lake is a large reservoir into which raw data can be poured and stored until needed. Thanks to its flat architecture, it stores data in its native format, as binary large objects (blobs) or files. It takes in unstructured data, such as emails and documents; binary data, such as images, audio, and video; semi-structured data, such as CSV, logs, and XML; and structured data from relational databases. The extract-transform-load process happens within the Data Lake itself.
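To make the idea of storing data in its native format concrete, here is a minimal sketch in Python. It is only an illustration: the in-memory `lake` dictionary, the object keys, and the `ingest` and `read_orders` helpers are all hypothetical stand-ins for real object storage and a real processing engine. Objects are landed untouched, and structure is applied only when the data is read.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": object key -> raw bytes, stored untouched.
lake = {}


def ingest(key, raw_bytes):
    """Land an object in its native format, with no parsing or validation."""
    lake[key] = raw_bytes


def read_orders():
    """Extract and transform inside the lake, applying structure at read time."""
    rows = []
    for key, raw in lake.items():
        if key.endswith(".csv"):
            rows.extend(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
        elif key.endswith(".json"):
            rows.extend(json.loads(raw.decode("utf-8")))
        # Other formats (images, audio, video) stay in the lake as-is.
    # Transform step: normalise types after extraction.
    return [
        {"order_id": str(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
    ]


ingest("sales/2022-08.csv", b"order_id,amount\nA1,10.50\nA2,4.00\n")
ingest("web/orders.json", b'[{"order_id": "B7", "amount": 20}]')
ingest("media/logo.png", b"\x89PNG...")  # binary data, never parsed here

orders = read_orders()
```

The CSV and JSON objects end up in one uniform list of orders, while the binary object simply stays where it was poured.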

The Data Lake can, therefore, efficiently manage the high Volume, high Variety, and high Velocity of Big Data. It also significantly enhances the value of Big Data by making it available as reports, dashboards, and applications, to facilitate better visualization, advanced analytics, and machine learning. All, of course, to ultimately empower organizations with the ability to take evidence-supported business decisions with more far-reaching impact than ever before.

Because it is a single, integrated, and complete system, the Data Lake also makes application development faster and simpler, since applications can be built on one code base.

The Data Lake can reside in the cloud, on a platform such as Microsoft Azure, or on a distributed file system such as the Hadoop Distributed File System (HDFS), which can be paired with MS SQL Server.

However, the Data Lake also has its drawbacks.

As the volume of data increases and grows more complex, the central IT function becomes overloaded with requests and cannot keep pace. Individual project teams then try to bypass it and deploy quick fixes that are poorly integrated and create problems in the future.

What is worse, organizations keep pouring data into the Lake and eventually lose track of what it contains. Much valuable information can go unnoticed because data analysts lack knowledge of the data’s source domain and end up on fishing expeditions.

Many organizations have seen their Data Lakes turn into data swamps because, after a point, it entails considerable technical and organizational effort to make productive use of them.

The Data Mesh

The Data Mesh evolved in response to the many challenges that Data Lakes posed.

Unlike the Data Lake, the Data Mesh is a composite ecosystem, not a monolith. It breaks giant, monolithic enterprise data architectures into decentralized subsystems, each owned and managed by a dedicated team.

The Data Mesh facilitates the management, connection, and smooth flow of data from producers through to consumers, whether outside or within a Data Lake. In that sense, a Data Mesh may include Data Lakes.

Data Meshes can be said to have four pillars:

Decentralized Data Ownership

Data is owned by the entity that produces it, typically a function such as HR, Finance, or Marketing. Because the owners understand their data best, more value can be derived from it. Tools such as Azure Databricks are typically used to process the large data workloads.

Data as Product

Users, such as data analysts, can easily source data directly from the domain owners, who are responsible for ensuring that the data is of high quality. Conflicts between writers and readers can be reduced by using approaches like event sourcing and CQRS (Command Query Responsibility Segregation).
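For readers unfamiliar with these two patterns, the following is a minimal sketch, not a production design. The `Event` fields, the event kinds, and the HR example are hypothetical. The write side only appends events to a log (event sourcing), and the read side builds a separate query model by replaying that log (CQRS).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    employee_id: str
    kind: str  # hypothetical event types: "hired", "salary_changed"
    salary: int


class EmployeeEventStore:
    """Write side: an append-only log; past events are never edited."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events(self):
        # Return an immutable copy so readers cannot alter history.
        return tuple(self._events)


def project_current_salaries(events):
    """Read side: a query model rebuilt by replaying the log in order."""
    salaries = {}
    for event in events:
        salaries[event.employee_id] = event.salary
    return salaries


store = EmployeeEventStore()
store.append(Event("e1", "hired", 50000))
store.append(Event("e2", "hired", 60000))
store.append(Event("e1", "salary_changed", 55000))

current = project_current_salaries(store.events())
```

Because history is never overwritten, a consumer can rebuild its own view at any time without contending with the producer's writes.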

Self-serve Data Infrastructure as a Platform

Domain teams can create, transform, and consume data products autonomously.

Federated Governance

Universal standards are mandated across domains to enable smooth interoperability and flow of data.
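One way to picture how data products and federated governance fit together is a shared checklist that every domain applies to its own products. The sketch below is an assumption-laden illustration: the `DataProduct` fields, the allowed formats, and the rules in `governance_violations` are hypothetical, and a real organization would define its own standards.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide standard for delivery formats.
ALLOWED_FORMATS = {"parquet", "csv", "json"}


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str = ""
    sla_hours: int = 0
    fmt: str = "parquet"
    schema: dict = field(default_factory=dict)


def governance_violations(product):
    """Return the list of shared rules that a data product breaks."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if product.sla_hours <= 0:
        violations.append("missing SLA")
    if product.fmt not in ALLOWED_FORMATS:
        violations.append("format not allowed: " + product.fmt)
    if not product.schema:
        violations.append("missing schema")
    return violations


good = DataProduct(
    name="headcount",
    domain="hr",
    owner="hr-data-team",
    sla_hours=24,
    schema={"dept": "string", "headcount": "int"},
)
bad = DataProduct(name="ledger", domain="finance", fmt="xls")

good_result = governance_violations(good)
bad_result = governance_violations(bad)
```

Each domain keeps ownership of its product, while the same rule set is applied everywhere, which is the "federated" part of the model.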

The Data Mesh brings many benefits to the table

Flexibility and choice: Since its architecture is domain driven and distributed, you have the flexibility to choose vendors and technologies that work best for you, without getting locked onto one platform.

Greater agility, seamless collaboration, shorter project times: Since domain teams own their data, they can operate independently, making them more agile and responsive. At the same time, since the teams are cross-functional, collaboration becomes simpler and more efficient. Development accelerates and projects go live faster!

Superior quality: Since ownership is vested with domain experts, the quality of the data tends to be higher. Further, by mandating universal protocols and principles, the Data Mesh promotes the delivery of data in standardized formats for easier access.

Quick service: Data producers and data users interact based on pre-determined SLAs, which enables much faster delivery of data. Data management needs such as storage, logging, and identity management, which would otherwise slow the process down, are handled by the Data Mesh’s inbuilt capabilities.

Scalability: Being distributed in structure, the Data Mesh is also eminently scalable with minimal disruption.

So, should your company upgrade to a Data Mesh?

A Data Mesh certainly sounds like a panacea for all data ills but, like all technology solutions, it should be adopted only after due thought and diligence. Keeping the following factors in mind will help you make a better-informed decision about whether your organization needs to upgrade to a Data Mesh.

Duplication of data: Repurposing data to serve another domain’s needs may lead to data duplication. This can lead to higher storage requirements as well as increased data management costs.

Quality avoidance: The availability of multiple data products and pipelines may lead to non-compliance with governance standards. Therefore, governance principles will need to be clearly articulated, and compliance enforced through appropriate measures at the domain level.

Change management efforts: Deploying data mesh architecture and decentralized data operations will entail organization-wide change management efforts. You will need to plan to allow for business disruptions and to ensure that critical operations continue.

Choosing future-proof technologies: Teams will have to think long term when selecting technologies that will be standardized across the company, to ensure easier future upgrades with minimal disruption.

Cross-domain analytics: Reporting becomes decentralized as well, and a separate organization-wide model may need to be defined to consolidate diverse data products into one report.
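As a rough illustration of the cross-domain point, the sketch below consolidates two domain data products into one report on a shared key. The `consolidate` helper, the `dept` key, and the sample HR and Finance rows are all hypothetical; a real organization-wide model would also need agreed definitions for each shared key.

```python
def consolidate(products, key):
    """Merge rows from several domain data products on a shared key."""
    report = {}
    for domain, rows in products.items():
        for row in rows:
            merged = report.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                if column != key:
                    # Prefix each column with its domain to keep lineage visible.
                    merged[domain + "." + column] = value
    return [report[k] for k in sorted(report)]


products = {
    "hr": [
        {"dept": "sales", "headcount": 12},
        {"dept": "ops", "headcount": 7},
    ],
    "finance": [
        {"dept": "sales", "budget": 900},
        {"dept": "ops", "budget": 400},
    ],
}

report = consolidate(products, key="dept")
```

Prefixing columns with the owning domain keeps it clear which team is accountable for each figure in the combined report.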

Talk to us at 91��. We’ll undertake an assessment of your existing digital landscape, identify modernization areas, build a strategic roadmap, and define the enterprise architecture you need.

Part 1 of this blog: Modernize the Data Ecosystem to Lay the Foundation of an Insights-driven Digital Next Enterprise (Part 1)

Reference:

Zhamak Dehghani
Data Mesh Founder

Author:

Bhagaban Khatai
Data Transformation Leader


Modernize the Data Ecosystem to Lay the Foundation of an Insights-driven Digital Next Enterprise (Part 1)

91�� — Tue, 28 Jun 2022 10:43:56 +0000

Data modernization has become an urgent competitive necessity for businesses to stay ahead of the curve – anticipate market changes earlier, understand customer needs more closely, and take and implement winning decisions faster than the competition.

That said, technology leaders need to assess the pros and cons of a modernization exercise. Businesses must study the various avenues for modernization and choose the one that gives them the best cost-benefit balance. As with any change management initiative, it is disruptive and entails focused deployment of resources.

In this article, I will discuss three frameworks/platforms that we at 91�� have helped our clients use to leverage data effectively for business success.

The Data Warehouse

The Data Warehouse was probably the first enterprise-level platform to use data for business decision support. It came into its own in the Nineties and at the turn of the new Millennium. As its name implies, it organized data in structured and labelled fields that could be easily accessed, and it worked excellently.

Data-driven business intelligence, as a concept, gained massive leverage thanks to the Data Warehouse. However, like its counterpart in the real world, the Data Warehouse’s key drawback is poor scalability. It works on a pre-built schema and can take in only structured data. As a result, the data is siloed and not all data is captured.

As the three Vs of data – volume, variety, and velocity – grow, as in today’s age of Big Data, the Data Warehouse becomes unwieldy and inefficient. And data’s fourth V, veracity, suffers in consequence.

This is not to say that the Data Warehouse has outlived its utility. It still works efficiently for businesses that deal with a smaller volume and variety of data and provides excellent decision support intelligence at a relatively lower investment.

The Data Lake

The Data Warehouse’s inherent problems gave rise to the Data Lake, a platform with no hierarchical structure that is more attuned to the needs of Big Data.

A Data Lake is like a reservoir into which raw data can be poured and stored until needed. It has a flat architecture and takes in data in its native formats: emails, documents, images, audio, video, semi-structured data, such as CSV, logs, and XML, as well as structured data from relational databases.

The extract-transform-load process happens within the Lake itself and data is presented as reports, dashboards, and such, to facilitate better visualization and more accurate analytics, as well as to enable machine learning.

The Data Lake is thus capable of managing the high Volume, high Variety, and high Velocity of Big Data.

However, the Data Lake also has its drawbacks.

Once data is put into the Lake, it becomes monolithic. This limits the knowledge that data analysts can gain from it and increases the risk of valuable information going unnoticed.

Its centralized control structure stretches the IT team thin. Projects get delayed, forcing teams to resort to poorly integrated ‘quick-fix’ solutions that eventually compound problems.

Consequently, it often ends up as a huge, unmanageable data dump. Drawing any useful sense out of the Data Lake becomes a complex, expensive, and resource-intensive task.

It is in response to these problems that the concept of a Data Mesh came into being.

The Data Mesh

Unlike the Data Lake, the Data Mesh is a composite, integrated ecosystem, and not a monolith. It is composed of decentralized subsystems or domains, each managed by a dedicated team. In a sense, you can say that the Data Mesh as a whole is greater than the sum of its parts.

It thus offers several advantages over the Data Lake.

It makes domain experts owners of their data. Thus, there is no danger of valuable nuggets of information being lost or ignored.

It treats data as a product and enables a smooth and secure flow of data from producers to users, whether outside or within a Data Lake. In that sense, a Data Mesh may include Data Lakes.

It encourages cross-functional teams and empowers them to operate independently, with little or no support from a central IT function. Collaboration is more efficient, the pace of development accelerates, and projects go live much sooner.

Its decentralized approach gives you the flexibility to choose vendors and technologies that work best for you, without getting locked onto one platform.

A Data Mesh can be deployed for a broad range of needs and for diverse use cases:

Migrating applications to the cloud
Modernizing data lakes to make data more easily accessible
Integrating apps, IoT, and analytics in real-time
Streaming data pipelines within or from data lakes
Data-in-motion analytics

So which modernization solution is best for your organization?

We have been part of the Data & Analytics transformation journey for the last 15 years.

Part 2 of this blog: Data Modernization – What is the best route for your transformation journey? (Part 2)

Author:

Bhagaban Khatai
Data Transformation Leader

Reference:

Zhamak Dehghani
Data Mesh Founder

