Data Dependency

The Importance of Data Dependency

Why not investing in data platforms is setting your company up for disaster.

Companies with legacy systems or workloads face one of three problems more often than not. Maybe your company has already experienced issues with time to market, bugs in production or limited task coverage due to a lack of confidence in releasing new features. These are the issues usually picked up upon by the CTO or a technical leader who recognises the need to invest in architecture to increase the quality and speed of progress. There is, however, another important underlying problem that no one seems to be talking about.

How and where are you storing your data?

Looking at a typical legacy system these days, it is likely to be a Java, .Net application using a relational database storing a huge amount of data; we have seen companies with tables containing up to 13 years of data!

When we ask customers why they keep all their data on the same database, they rarely have an answer. Often, old or irrelevant data has been retained without reasons, but simply because it has been forgotten about and ended up getting lost among the masses.

With consistently increasing amounts of data comes consistently increased response times for querying information from the database. Whilst it may not be noticable day to day, it could lead to serious consequences, such as losing valuable time and revenue whilst waiting for a backup to restore after a database outage.

It is puzzling to think that whilst we have our best minds considering so much down to the minute details, we largely ignore the way in which we store data. It seems we have a collective ‘out of sight, out of mind’ attitude.

It is extremely common to come across companies that are generating reports from a single database. But here’s the interesting part: each software is unique in terms of security and operation, meaning storage is different for each and every one. Let’s consider an ecommerce store. In this case you would want to organise your tables and data in a way that allows users to easily add items to their shopping cart, place an order, and pay. To make this possible, you would have a shopping cart table, orders table, and products table, which is what we call normalising the database – a relational database. So far, so good. Now let’s look at what happens when you want to run a report. To fully understand your ecommerce business you will want to see your data in various ways, for example, number of sales in NSW in the last seven days; average shopping cart price; average checkout amount; average shipping time frame. Each of these scenarios require data from multiple sources, but by keeping all your data on the same database you are risking the whole operation.

Just as you may lose deals if customers have to wait five seconds to add an item to their shopping cart, you also lose valuable resources while waiting an hour to generate a report – something that is not uncommon to see on legacy applications. Not to mention the direct and indirect consequences of having to wait hours to restore a backup after a database outage (that is, if you even have a backup!).

By choosing not to modernise your data, your business is perched squarely on a ticking time bomb. With a typical ratio of 15 – 20 Developers to 1 Database Administrator (DBA), the DBA is without a doubt the underdog. If the DBA’s suggestions are ignored, developers may begin to modernise their source code and adopt microservices whilst the company’s data in its entirety.

So what happens next?

They might use a database and probably they will use a database. They could use something else. They have to manage states.

Instead of having separate tables for your shopping cart, orders, and products; you now have a product microservice with its own product table, far from the customer microservice and its customer table, which is located in a different database. In addition, the shopping cart may now be in a no sequence database.

Now comes the time to run your reports, but you can no longer do a SELECT in a database and join all the tables because the tables are unreachable. Now you find yourself with a whole host of different problems and a new level of complexity.

Consider the data dimension to fully modernise your application

Now that you understand the importance of data modernisation, you need to know a few key points. To take full advantage of the cloud when modernising your architecture and workloads, you have to find out which tools the cloud has to offer. First, you need to understand that Microservices have to manage states and will likely use a database to do so, due to transactional responsibility. For example, when you create a new product for your ecommerce store, you want it to exist until you actively decide to discontinue it, so you don’t want the database to forget about it – we refer to this as durability.

Consistency is equally important; for example, when you market the product as unavailable, you do not want it to be included in new orders. This is a transactional orientation.

Now we need to understand the analytical view. In order to see how many products you are selling to students in year 8 to year 12 you need to run a correlation between the products, customers and orders. This requires you to have a way of viewing things differently. Most companies choose to build a data warehouse where they can store data in a way that enables them to slice and change the dimension they are looking at. Whilst this is not optimal for transactional operations, it is optimal for analytical operations.

That segregation is crucial. If you build that, you can keep your Microservices with multiple different databases in one state or multiple states in an architecture that is completely decoupled from an analytical data warehouse facility that enables and empowers the business to understand what is happening in the business.

This is hugely important! Operating without these analytical capabilities is like piloting a plane with no radio or navigating systems: you can keep flying but you have no idea where you are going, nor what is coming your way! This analytical capability is crucial to the business but you have to segregate that responsibility. Keeping your new modernised architecture independent from your data warehouse and analytical capability is key.

So, where do we go from here? Utilising Data Lakes

DNX assisted companies enjoying high levels of success through the adoption of data lakes. A data lake can contain structured and unstructured data as well as all the information you need from Microservices, transactional databases and other sources. If you want to include external data from the market today, such as fluctuations in oil prices – go ahead! You can input them into the data lake too! You should take care to extract and clean your data if you can before putting it into the data lake, as this will make its future journey smoother.
Once all your data is in the data lake, you can then mine relevant information and input it in your data warehouse where it can be easily consumed.

Data modernisation can save your company from impending disaster, but it is no small feat!
Most people assume it is as simple as breaking down a monolithic into microservices, but the reality is far more complex.

When planning your data modernisation you must consider reporting, architectural, technical and cultural changes, as well as transactional versus analytical responsibilities of storing stages, and their segregation. All of this becomes a part of your technological road map and shows you the way to a more secure future for your business.

If you would like to know how we have achieved this for multiple clients, and can do the same for you

At DNX Solutions, we work to bring a better cloud and application experience for digital-native companies in Australia.

Our current focus areas are AWS, Well-Architected Solutions, Containers, ECS, Kubernetes, Continuous Integration/Continuous Delivery and Service Mesh.

We are always hiring cloud engineers for our Sydney office, focusing on cloud-native concepts.

Check our open-source projects at https://github.com/DNXLabs and follow us on Twitter, Linkedin or Facebook.