In part 1, we explored the challenges and drawbacks of adding a platinum layer when connecting Databricks to OneLake in Microsoft Fabric. Now, let’s dive into how the Databricks Data Intelligence approach offers compelling alternatives that streamline your data architecture, enhance performance, and maintain robust governance without an additional layer.

The Databricks Data Intelligence approach

When discussing a “platinum layer,” it is essential to use the most widely accepted definition to avoid confusion within the business intelligence (BI) community. Generally, the platinum layer is understood as a semantic layer where business metrics for BI reports are defined. This is typically done within the BI tool, but it can also be managed in a separate semantic layer that centralizes, reuses, and governs these metrics.
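To make “define once, reuse everywhere” concrete, here is a minimal Python sketch of a hypothetical metric registry: each business metric is defined in one place and compiled to SQL on demand, so every report that goes through the registry uses identical logic. The `Metric` class, the `compile_metric` helper, and the `main.gold.fact_sales` table are illustrative assumptions, not part of any product API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metric:
    """A business metric defined once in a hypothetical central semantic layer."""
    name: str
    expression: str  # SQL aggregate expression, e.g. "SUM(amount)"
    table: str       # fully qualified source table (illustrative name)


# Hypothetical central registry: one definition per metric, shared by all reports.
METRICS = {
    "total_revenue": Metric("total_revenue", "SUM(amount)", "main.gold.fact_sales"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "main.gold.fact_sales"),
}


def compile_metric(metric_name: str, group_by: Sequence[str] = ()) -> str:
    """Render a registered metric as a SQL query, optionally grouped by dimensions."""
    metric = METRICS[metric_name]  # raises KeyError for metrics that were never defined
    columns = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(columns)} FROM {metric.table}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


if __name__ == "__main__":
    # Prints: SELECT region, SUM(amount) AS total_revenue FROM main.gold.fact_sales GROUP BY region
    print(compile_metric("total_revenue", ["region"]))
```

The point of the design is that a change to a metric’s expression happens in one place rather than in every report that uses it.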

Databricks Integration with Power BI

  1. Publish to Power BI Workspace:
    • When Databricks is used with the Power BI service within Microsoft Fabric, measures created in Power BI are computed by Databricks. This makes effective use of Databricks compute without unnecessary data duplication.
  2. Open in Power BI Desktop:
    • If users create measures and perform some “last mile data modeling” in Power Query Editor in Power BI Desktop, the compute still runs on Databricks. However, extensive modeling in Power Query Editor can blur lines of responsibility and reduce efficiency.
  3. Copying Data to OneLake:
    • If gold data is copied to OneLake and the “last mile data modeling” is done in Power Query Editor in Fabric (Dataflow Gen2), the compute shifts to Fabric, which can introduce limitations and increased costs associated with OneLake and Direct Lake. This approach also gives up the benefits of Databricks’ compute power and security features.
  4. Shortcut Silver Data to OneLake:
    • The most problematic scenario is when silver data from Databricks is shortcut to OneLake and all modeling is done within Fabric and Dataflow Gen2. This bypasses the security controls defined in Databricks and results in fragile pipelines that do not scale.

Compelling Alternatives to OneLake and Direct Lake

To avoid locking gold data inside OneLake and relying on Direct Lake, consider the following alternatives:

  1. Modeling in Databricks and Publishing to Power BI Workspace:
    • Model all data into fact and dimension tables within Databricks, then use the “Publish to Power BI Workspace” feature to create a semantic model on top of Unity Catalog-governed data. This approach uses Databricks SQL for compute without copying data and still allows measures to be created in the Power BI service within Fabric, preserving both compute efficiency and governance.
  2. Using a Different Semantic Layer Tool:
    • Tools such as AtScale or Cube can provide a semantic layer that centralizes metric definitions on top of governed UC tables. These tools integrate with Databricks and provide consistent, governed metric access across BI tools.
  3. Upcoming UC Business Metrics:
    • Stay tuned for Databricks’ upcoming UC business metrics feature, which is expected to let you define and govern business metrics directly within Databricks. This should simplify integration with BI tools while maintaining robust governance and performance.
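As a sketch of the first alternative, the Python snippet below generates Databricks SQL DDL for a minimal star schema in Unity Catalog: one dimension table and one fact table, linked by primary- and foreign-key constraints. In Databricks these constraints are informational (not enforced), but they document the relationships between tables. The catalog, schema, table, and column names are hypothetical; in a Databricks notebook, each statement could be passed to `spark.sql(...)`.

```python
from typing import List


def star_schema_ddl(catalog: str = "main", schema: str = "gold") -> List[str]:
    """Return DDL for a hypothetical two-table star schema in Unity Catalog."""
    prefix = f"{catalog}.{schema}"

    # Dimension table: primary-key columns must be declared NOT NULL.
    dim_customer = f"""CREATE TABLE IF NOT EXISTS {prefix}.dim_customer (
  customer_id BIGINT NOT NULL,
  customer_name STRING,
  region STRING,
  CONSTRAINT dim_customer_pk PRIMARY KEY (customer_id)
)"""

    # Fact table: the foreign key points at the dimension's primary key.
    fact_sales = f"""CREATE TABLE IF NOT EXISTS {prefix}.fact_sales (
  order_id BIGINT NOT NULL,
  customer_id BIGINT NOT NULL,
  sale_date DATE,
  amount DECIMAL(18, 2),
  CONSTRAINT fact_sales_pk PRIMARY KEY (order_id),
  CONSTRAINT fact_sales_customer_fk FOREIGN KEY (customer_id)
    REFERENCES {prefix}.dim_customer (customer_id)
)"""

    # Dimension first, so the referenced table exists when the fact table is created.
    return [dim_customer, fact_sales]


if __name__ == "__main__":
    for statement in star_schema_ddl():
        print(statement + ";\n")
```

Keeping the dimensional model in Databricks means the semantic model published to Power BI sits directly on these governed tables rather than on a copy in OneLake.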

Advantages of Databricks Lakehouse Architecture

The Databricks Lakehouse architecture offers advanced features that make an additional “platinum” layer in OneLake redundant and potentially counterproductive. Here’s why:

  • Liquid Clustering: Automatically optimizes data layout for faster queries, adapting to changing access patterns without manual intervention. This eliminates the need for a separate layer dedicated to query performance optimization.
  • Serverless SQL: Provides instant, elastic compute resources that scale automatically, reducing costs by charging only for actual compute usage. This contrasts with Fabric’s capacity model, which can lead to overprovisioning and unnecessary expenses.
  • Direct Publishing to Power BI Workspace: Seamlessly integrates with Power BI, ensuring up-to-date data access for analytics teams without an intermediary layer.
  • Unity Catalog: Offers a centralized governance layer with fine-grained access control and data lineage, enhancing security and compliance. This eliminates the fragmented governance and inconsistent access control issues that arise when introducing a platinum layer in a separate system like OneLake.
  • Open Lakehouse: Built on open-source technologies such as Delta Lake, offering flexibility, interoperability, and cost-effectiveness, with no additional Databricks charges when non-Databricks compute engines access the data. This contrasts with OneLake, which charges premium rates for accessing data with non-Fabric compute engines.
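To ground the Liquid Clustering and Unity Catalog points, the sketch below builds Databricks SQL statements as Python strings: `ALTER TABLE ... CLUSTER BY` to declare clustering keys on an (assumed unpartitioned) Delta table, `OPTIMIZE` to cluster the data, and `GRANT` statements that give a BI group read access. The table name, clustering columns, and `bi_analysts` group are hypothetical; in a notebook, each statement could be run with `spark.sql(...)`.

```python
from typing import List, Sequence


def quote_identifier(name: str) -> str:
    """Backtick-quote a principal name, escaping any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def governance_statements(table: str, cluster_keys: Sequence[str], group: str) -> List[str]:
    """Return clustering and grant statements for a hypothetical gold table."""
    # Expects a three-part name: catalog.schema.table (raises ValueError otherwise).
    catalog, schema, _ = table.split(".")
    principal = quote_identifier(group)
    return [
        # Liquid Clustering: declare clustering keys; they can be redefined later
        # without rewriting existing data.
        f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_keys)})",
        # Cluster the data according to the declared keys.
        f"OPTIMIZE {table}",
        # Unity Catalog: a principal needs USE CATALOG and USE SCHEMA to reach the table.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON TABLE {table} TO {principal}",
    ]


if __name__ == "__main__":
    for stmt in governance_statements("main.gold.fact_sales", ["sale_date", "customer_id"], "bi_analysts"):
        print(stmt + ";")
```

Because the grants live in Unity Catalog, they apply to queries that reach the table through Databricks compute, including a Power BI semantic model connected via Databricks SQL, under whichever identity that connection uses.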

By leveraging these features, organizations can achieve the desired performance, governance, and integration benefits directly within their existing Bronze, Silver, and Gold layer framework. This approach simplifies data architecture, reduces costs, and avoids the pitfalls of introducing an additional layer in OneLake and Fabric.

Conclusion

Adding a “platinum” layer to the medallion architecture may initially seem an appealing way to improve Power BI integration and performance, but when that layer sits outside Databricks in OneLake and Microsoft Fabric, its challenges outweigh its potential benefits. The existing Bronze, Silver, and Gold layers already provide a robust framework for data processing and analytics, and an additional layer can lead to data duplication, fragmented access, increased costs, and architectural complexity. Instead, organizations should improve Power BI integration by leveraging the advanced features of the Databricks Lakehouse architecture, such as Liquid Clustering, Serverless SQL, and Unity Catalog. These capabilities make it easy for teams to share, access, and model data, streamline sharing before data reaches the semantic model, and offer superior performance, governance, and cost-effectiveness within the existing framework. By making full use of the Databricks platform, companies can achieve their data intelligence goals while maintaining simplicity, performance, and scalability, without a separate platinum layer.

Ready to Optimize Your Data Platform?

Let DNX Solutions guide you through a Well-Architected Review to ensure your platform is scalable, cost-effective, secure, and aligned with best practices. Our expert team will work with you to assess your current environment, provide actionable insights, and create a roadmap for future growth and innovation. Maximize the value of your data and AI initiatives today.