Think of your data architecture as a home renovation project: the Bronze layer strips everything back to the basics, the Silver layer rebuilds with improved materials, and the Gold layer adds those final touches that make it liveable. But wait! Some contractors now suggest a 'Platinum' layer, which is like adding a gold-plated roof to your newly renovated home: it sounds impressive, but it may be more about impressing the neighbours than improving your living experience, adding little value to an otherwise complete renovation.

The medallion architecture is a well-established framework for organising data in a Lakehouse. It consists of three layers: Bronze, Silver, and Gold. Each layer serves a specific data processing and analytics purpose, progressively improving data quality and structure. Recently, proposals have been made to add a 'Platinum' layer to this architecture, implemented by extending the data pipeline into OneLake and Microsoft Fabric.

While the idea of adding a Platinum layer for enhanced Power BI integration and performance may seem attractive, it inadvertently reintroduces complexity, fragmenting the data estate and moving away from the unified data platform vision. Instead of optimising the architecture, routing data into Microsoft Fabric for this purpose brings back data silos, defeating the core purpose of a unified platform. This blog post will delve into these challenges and explain why adding more layers can undermine the effectiveness of modern data architectures.

The Medallion Architecture

Let's recap the medallion architecture below, which demonstrates a single, unified approach. Additional layers can be added, but within the platform; adding them outside the platform increases complexity, risk, and cost.

Medallion Architecture
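To make the recap concrete, here is a minimal, illustrative sketch of the three layers in plain Python. A real Lakehouse pipeline would use Spark DataFrames and Delta tables; the record shapes and function names here are hypothetical.

```python
# Hypothetical medallion sketch: a real pipeline would use Spark and Delta.

# Bronze: raw records landed as-is, warts and all.
bronze = [
    {"order_id": "1", "amount": "100.0", "country": "au"},
    {"order_id": "2", "amount": "bad",   "country": "AU"},  # unparseable
    {"order_id": "1", "amount": "100.0", "country": "au"},  # duplicate
]

def to_silver(rows):
    """Silver: cleansed, typed, de-duplicated records."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # drop rows that fail type coercion
        if r["order_id"] in seen:
            continue  # drop duplicate order ids
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"], "amount": amount,
                    "country": r["country"].upper()})
    return out

def to_gold(rows):
    """Gold: business-level aggregate ready for analytics."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

gold = to_gold(to_silver(bronze))
print(gold)  # {'AU': 100.0}
```

Each stage refines the previous one in place, within a single platform: no data leaves the pipeline to be re-landed somewhere else.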

The 'Platinum' Layer Proposal

The proposal to add a 'Platinum' layer outside the unified data platform aims to introduce an additional data refinement and aggregation stage, leveraging OneLake and Microsoft Fabric. However, this approach has several significant drawbacks.

Challenges and Drawbacks

It duplicates data

Adding a Platinum layer can lead to significant data duplication. Each layer in the medallion architecture already involves transformations and data storage at different stages. Introducing an external layer means additional copies of data, which can:

  • Increase Storage Costs: More layers require more storage, increasing costs. OneLake charges for storage based on the amount of data stored and the number of transactions.
  • Complicate Data Management: Managing multiple copies of data across the Gold and Platinum layers can become cumbersome and error-prone, increasing the risk of inconsistencies and data integrity issues – particularly when the layers are not part of the same platform.
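The consistency risk is easy to illustrate. In this deliberately simplified sketch (dictionaries standing in for governed tables), an exported Platinum copy silently drifts from Gold the moment the upstream pipeline updates:

```python
# Hypothetical sketch: why a copy exported to a second platform drifts.
gold = {"AU": 100.0, "NZ": 50.0}   # governed Gold table in the platform

platinum_copy = dict(gold)          # exported copy living outside the platform

gold["AU"] = 120.0                  # the pipeline updates Gold as usual

# The external copy is now stale until a second sync pipeline catches up.
stale = {k: v for k, v in platinum_copy.items() if gold[k] != v}
print(stale)  # {'AU': 100.0}
```

Within a single platform, consumers read the one governed table and this class of drift cannot occur; across two platforms, keeping the copies aligned becomes a standing operational task.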

It creates data silos

The Platinum layer will create data silos, which the unified platform architecture is designed to avoid: it aims to provide a single source of truth and seamless data access across the enterprise. Adding another layer can:

  • Fragment Data Access: Different teams might end up working with different architectures (i.e., gold and platinum), leading to fragmented data access and analysis.
  • Reduce Collaboration: Data silos can hinder collaboration between teams, as they might not have access to the same datasets or insights due to inconsistent experiences and a disjointed access control model. This is exactly what moving to a unified platform was meant to avoid.
  • Latency in Data Delivery: Additional transformations and validations required for the Platinum layer can slow down data processing pipelines, affecting the timeliness of insights.

It is more expensive

Implementing a Platinum layer in OneLake and Fabric can be more expensive:

  • Higher API Costs: OneLake APIs and other associated services can be costly, especially when dealing with large volumes of data and frequent transactions. With non-Fabric compute, you pay up to 3x for reading and writing.
  • Additional Compute Resources: More layers outside the platform require more compute resources for data processing (i.e., Fabric Data Factory, Fabric Lakehouse, Fabric Data Warehouse), leading to higher operational costs and network costs.
  • Fabric Capacity Model: While providing predictable costs, the 24/7 pricing model for Fabric capacities may not be cost-effective for organisations with variable workloads or those that don't require constant processing. This model can lead to overprovisioning and unnecessary expenses during periods of low activity, potentially making the Platinum layer approach less economically viable than more flexible, consumption-based pricing models.

It exacerbates Day 2 operational challenges

Introducing an additional layer built on a different technology exacerbates day-two operational challenges:

  • Multiplied Maintenance Touchpoints: More layers mean more configurations, monitoring, and maintenance tasks, which can strain IT resources (the team needs both Databricks and Fabric skills).
  • Fabric Capacity Sizing Conundrum: Determining the appropriate capacity size for Fabric workloads is challenging due to the variability in compute requirements across different operations and concurrency levels. Incorrect sizing leads to capacity throttling, which impacts performance. This unpredictability makes it difficult for organisations to budget and plan accurately for their data analytics needs.

It impacts performance

  • Increased Processing Time: Each additional layer introduces more data transformations, potentially increasing overall processing time and latency.
  • Real-time Data Challenges: The extra layer may impact real-time or near-real-time data processing scenarios, where speed is crucial.
  • Resource Management Complexities: Fabric's capacity model and resource allocation can lead to unexpected performance issues:
    • Throttling occurs when exceeding capacity limits, potentially degrading the user experience.
    • Bursting and smoothing features may mask underlying capacity issues, making it challenging to identify and address the root causes of performance problems.

For further information: Power BI DirectLake benchmark comparison for Databricks

Security and compliance risks are the critical enterprise readiness gap

Adding a Platinum layer in Microsoft Fabric while maintaining the Bronze, Silver, and Gold layers in Databricks introduces several security and compliance challenges:

  • Fragmented Governance: The Platinum layer in Fabric lacks integration with Databricks’ Unity Catalog, preventing a seamless data governance experience across the entire data lifecycle.
  • Inconsistent Access Control: Implementing role-based access control across two platforms may lead to security gaps and inconsistent data visibility, complicating user management and increasing the risk of unauthorized access.
  • Network Security Complexities: Fabric’s limitations in private link access for non-Fabric products could compromise the seamless security integration achieved with Databricks, especially for organizations requiring strict network isolation.
  • Data Residency Concerns: Unlike Databricks, Fabric’s restrictions on deployment to customer-managed virtual networks may pose challenges for organizations with stringent data residency requirements.
  • Limited Infrastructure Control: Fabric does not provide the level of control over the data environment that highly regulated industries require, contrasting with Databricks’ more flexible deployment options (e.g., VNET injection, private link, etc.).
  • Compliance Gaps: While Databricks offers a comprehensive set of compliance certifications, Fabric’s evolving compliance landscape may create temporary gaps in meeting industry-specific regulatory requirements.

For details: Hollow cheese security model for enterprise

Could a Platinum model be beneficial without moving data unnecessarily?

While Fabric’s Direct Lake mode aims to enhance performance, it presents significant constraints:

  • Mixing Direct Lake tables with other table types in the same model is not allowed.
  • Composite models are unsupported.
  • Certain data types and complex Delta table column types are not compatible.
  • The system may revert to DirectQuery mode, negating performance benefits.

For further information: Fabric and Direct Lake limitations

Instead of moving data and introducing these risks, you can implement a Platinum model within the existing platform to maximise efficiency and performance while minimising complexity.
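One way to picture this: Platinum-grade refinement can be expressed as a view or on-demand aggregation computed over Gold in the same platform, rather than as a second physical copy in another product. The sketch below is a hypothetical pure-Python stand-in; in practice this would be a governed SQL view over the Gold tables.

```python
# Hypothetical sketch: a 'Platinum' view computed in place over Gold,
# rather than copied into a second platform.
gold_sales = [
    {"country": "AU", "amount": 120.0},
    {"country": "AU", "amount": 30.0},
    {"country": "NZ", "amount": 50.0},
]

def platinum_view():
    """Computed on demand over Gold -- there is no second copy to sync."""
    totals = {}
    for row in gold_sales:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals

print(platinum_view())  # {'AU': 150.0, 'NZ': 50.0}
```

Because the view is derived from Gold at query time, it always reflects the current state of the pipeline and inherits the platform's existing governance and access controls.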

Conclusion

In Part 1, we've navigated the complexities and costs of adding a Platinum layer in OneLake and Fabric. From increased storage expenses to intricate data management, and from potential data silos to higher operational costs, these hurdles lead us to ask: is there a more efficient path forward?

Stay tuned as we explore innovative strategies and cutting-edge solutions that promise to streamline your data architecture, reduce costs, and enhance efficiency. Part 2 will reveal how the Databricks Data Intelligence approach can help you overcome these challenges.

Ready to Optimise Your Data Platform?

 

Let DNX Solutions guide you through a Well-Architected Review to ensure your platform is scalable, cost-effective, secure, and aligned with best practices. Our expert team will work with you to assess your current environment, provide actionable insights, and create a roadmap for future growth and innovation. Maximise the value of your data and AI initiatives today.