3 reasons for missed ROI in data projects


Collecting and analyzing data was promised to become one of the most important and most profitable areas of business over the last decade. Yet most businesses - especially small and medium-sized companies - either fail to properly access their data or cannot deliver the right data and analytics to the right people in time.

While this situation is frustrating, it's also a risk for the entire business. Properly using business data, and basing decisions on it, can be a tremendous advantage in the marketplace for any company that successfully incorporates data into its daily business.

So, why do most companies struggle with profiting from data?

1. They don't know how to use their data and have failed to identify its value

As a business, you continually strive to increase revenue and customer satisfaction, decrease churn, and improve your business's overall health. In the best case, you would know in advance which actions will generate the desired outcome - or at least learn quickly whether your actions are driving you in the right direction. Using your business data is the best-known way to get exactly that: a fairly objective measure of your success, or lack thereof.

However, because of unpleasant experiences with failed data investments, or because they lack access to the right data, a lot of companies conclude that data is not important to them. That is a tremendous pity, as making data an integral part of your business strategy has proven successful time and time again - if done right.

So, how do you derive value from your data?

Let's get the elephant in the room out of the way: data in itself doesn't provide any value - only using it to improve your business does. First, decide on a limited, clearly described set of business factors or metrics you want to improve. The more specifically you can describe a metric, the better. Examples: "increase the number of sales started via the homepage", "decrease churn of subscription contracts" or "decrease the number of problems with my product". These are all fairly specific metrics or questions whose answers would improve the health of your business. Avoid generic, open-ended goals like "increase revenue" - that question is simply too broad to be answered. You can, however, turn such generic questions into good ones by adding detail - like "increase revenue by increasing the number of weekly subscribers to the product" or "find the approaches that historically drove revenue".

Then you want to establish a baseline - the current situation - for these questions. To measure an increase in sales started via your homepage, you first need to know how many sales currently start via your homepage.

This step is often overlooked - resulting in having no objective way of knowing whether your actions are successful.

To create this highly valuable baseline, look for data sources within your company that can provide the foundation for it. Work closely with your departments or industry experts to learn where data is available and how to integrate it. Don't overcomplicate things here: most of the time, the most basic data sources are all you need - website tracking data, email sign-ups or basic IoT data like alarms and operational data.

A nice side effect of creating the baseline: the data sources required to build it are often the same ones you need to answer your key question - two birds with one stone.

Assuming you have integrated the data (more on that in the next chapter) and created the baseline, use your data to look for a limited set of factors that influence your question or metric, positively or negatively. This requires some data science background and manual work with the data. Don't expect an automated solution to work wonders here - this step involves a certain amount of labour. But again, don't overcomplicate things: more often than not, the answer is easy to find, and a set of 3 to 5 factors is enough to sufficiently describe your problem. This step usually also involves some experimentation - create a hypothesis, implement it and test how it influences your factors and key metric.

Finally, after finding what influences your key metric, work closely with your departments or industry experts to derive actions that move these influencing factors. These actions are naturally based on the experiments you already conducted.

As an example, I recently worked with a client who already knew quite well what he wanted: generating more inbound sales via his homepage and blog rather than only via his sales force. He had already instrumented his homepage, meaning we had all the tracking data we would need. To increase inbound sales, we followed the formula above:

  1. Define the question to answer (already done)

  2. Create a baseline of how many sales already come via the homepage: first, we established a specific definition of an "inbound sale" - when does a sale count as inbound? We defined it as follows: a sale counts as an inbound sale if a potential client clicks the "Contact Sales" button on the homepage and finally signs a contract. Clients who signed up for a newsletter and were converted later did not count - we specifically wanted to target the shortest possible sale.

Since the client already tracked his homepage thoroughly, all we had to do was check the historic data for how many visitors clicked the "Contact Sales" button and converted. Easy enough.
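To make the baseline step concrete, here is a minimal sketch of the calculation on a hypothetical event log - the event names, fields and numbers are invented for illustration, not taken from the client's actual tracking setup:

```python
# Hypothetical sketch: computing the inbound-sales baseline from tracking
# events. Event names and fields are made-up examples.

def inbound_baseline(events):
    """Return (clicks, inbound_sales, conversion_rate) for the funnel."""
    clicks = {e["user"] for e in events if e["action"] == "contact_sales_click"}
    signed = {e["user"] for e in events if e["action"] == "contract_signed"}
    converted = clicks & signed  # inbound sale: clicked AND finally signed
    rate = len(converted) / len(clicks) if clicks else 0.0
    return len(clicks), len(converted), rate

# Toy tracking data
events = [
    {"user": "a", "action": "contact_sales_click"},
    {"user": "a", "action": "contract_signed"},
    {"user": "b", "action": "contact_sales_click"},
    {"user": "c", "action": "newsletter_signup"},  # does not count as inbound
    {"user": "c", "action": "contract_signed"},
]

print(inbound_baseline(events))  # → (2, 1, 0.5)
```

Note how the definition from above ("clicked AND finally signed, newsletter conversions excluded") translates directly into the set intersection - a precise metric definition makes the baseline trivial to compute.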

  3. Derive influencing factors to improve the baseline: for this step, we again used the website tracking data and did some old-school data exploration, manually and semi-automatically looking for patterns that separated users who clicked the "Contact Sales" button from those who did not. We also analyzed which of these users finally converted. Interestingly, we found that users who contacted sales were already very engaged on the website: they read a very high number of product information pages before clicking "Contact Sales" - yet the overall conversion rate was still very low. After talking with the sales department, we quickly formed a hypothesis: most people contacting sales only wanted more information rather than being ready to buy. Combined with the data-based observation that people read a lot on the website, we concluded that the homepage simply did not provide good enough informational material. We designed an experiment in which some users were served more specific product information on fewer pages. To our delight, this first experiment already proved successful: not only did we convert more people, we also increased the number of users contacting the sales team compared to the control group. My guess is that clients need to navigate fewer pages to get more information, so fewer people leave the homepage without contacting sales at all - but that hypothesis is an experiment of its own ;-)
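Evaluating such an experiment boils down to comparing conversion rates between the control and treatment groups. A hedged sketch with invented numbers (the client's real figures are not shown here):

```python
# Comparing conversion between control and treatment groups, as in the
# "fewer pages, more specific information" experiment. All numbers invented.

def conversion_rate(contacted, converted):
    """Fraction of sales contacts that ended in a signed contract."""
    return converted / contacted if contacted else 0.0

def relative_uplift(control, treatment):
    """Relative improvement of treatment conversion over control conversion."""
    c = conversion_rate(*control)
    t = conversion_rate(*treatment)
    return (t - c) / c if c else float("inf")

control = (200, 10)    # 200 contacts via the old homepage, 10 signed
treatment = (210, 21)  # treatment group saw fewer, denser information pages

print(f"control:   {conversion_rate(*control):.1%}")    # 5.0%
print(f"treatment: {conversion_rate(*treatment):.1%}")  # 10.0%
print(f"uplift:    {relative_uplift(control, treatment):.0%}")  # 100%
```

For a real decision you would also check statistical significance (e.g. a two-proportion test) before rolling the change out - the sketch only shows the bookkeeping.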

  4. Create specific actions to improve the influencing factors: knowing that one of the major problems of the client's webpage was too little information, scattered across too many sub-pages, the action was easy to define. First, we ran some smaller experiments to track which specific information we needed to convey - and then reworked the homepage accordingly.

Many companies struggle to find value in data because they don't follow a straightforward, simple enough route to data analytics. As the process above shows, using data to improve your metrics does not require complicated processes and systems - just a well-executed, educated plan and some curiosity about the data you already have.


  1. Define the question to answer
  2. Create a baseline
  3. Collect influencing factors to improve the baseline
  4. Create specific actions to improve the influencing factors

2. They've invested enormous amounts of money in data infrastructure, IoT systems and modern databases, only to end up with tons of unused data scattered across data silos

It's important not only to collect data but also to use it. One of the biggest hurdles is the overabundance of available data. Every modern system produces data; every product provides interfaces to collect it. IoT systems promise us golden eras powered by IoT data. Modern databases let you implement any data use case in no time - according to their marketing. Literature and media overwhelm us with positive examples of how data changed some company's life.

Taking all this praise into account, it's no wonder that many companies got excited and invested heavily in systems that produce data, in data infrastructure, IoT and so on. However, the excitement often ended rather quickly, once they realized that all these data systems need someone to work with them. Each system has its own data formats and APIs, most of the time you need multiple systems to answer your questions, and finally you also have to decide which data to store, and for how long.

To avoid these problems, you want to build upon a solid data strategy. It's fine if you have already used some of your data sources - it's easy enough to fold your ongoing efforts into a bigger strategic one.

A data strategy first and foremost incorporates a system for documenting your data sources. This is relevant not only for GDPR compliance but also so that your data scientists can easily find and work with any dataset.

It's advisable to integrate your most valuable data into a centralized system that serves as the main entry point for your analytics endeavors - think data warehouse or data lake. Please don't be put off by the "big" words data warehouse or data lake. With state-of-the-art tooling and a good architecture, it's easy enough to set up infrastructure that requires very little manpower to run and operate.

Using modern workflow orchestration tools like prefect.io, you can - and should - fully automate integrating your data sources and building the models your analytics needs. Cloud-based data warehouses like Google BigQuery or Snowflake provide very cheap storage for huge amounts of data, combined with remarkable scalability when you need to analyze a lot of it.
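To illustrate the idea without committing to any one tool, here is a plain-Python sketch of the extract-and-load loop that an orchestrator like prefect.io would schedule, retry and log for you. The source names, records and table names are invented stand-ins, not a real Prefect API:

```python
# Plain-Python stand-in for an orchestrated data pipeline. A real tool adds
# scheduling, retries, observability and parallelism on top of this shape.

def extract(source):
    """Pretend to pull raw records from a source system (invented samples)."""
    samples = {
        "crm": [{"customer": "acme", "segment": "enterprise"}],
        "iot": [{"device": "pump-1", "alarm_count": 3}],
    }
    return samples.get(source, [])

def load(records, table):
    """Pretend to land records in a warehouse table; returns the row count."""
    print(f"loaded {len(records)} rows into {table}")
    return len(records)

def pipeline(sources):
    """Run extract/load for every source - the part an orchestrator schedules."""
    return {src: load(extract(src), f"raw_{src}") for src in sources}

print(pipeline(["crm", "iot"]))  # → {'crm': 1, 'iot': 1}
```

The value of the orchestrator is precisely that this loop keeps running unattended - nightly, with retries on flaky APIs - instead of living in someone's notebook.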

You also want the ability to test your data - tools like Great Expectations let you define what you expect your data to look like. Having such a system in place will improve your data quality tremendously.
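The core idea behind such data tests is simple enough to sketch in a few lines: declare what a column should look like, then collect every violation per batch. The field names and bounds below are illustrative assumptions, not Great Expectations' actual API:

```python
# Minimal sketch of declarative data-quality checks in the spirit of
# Great Expectations. Column names and plausibility bounds are invented.

def expect(rows, column, predicate, description):
    """Return human-readable failures for rows violating the check."""
    return [
        f"row {i}: {description} (got {row.get(column)!r})"
        for i, row in enumerate(rows)
        if not predicate(row.get(column))
    ]

batch = [
    {"device": "pump-1", "temperature": 61.5},
    {"device": "pump-2", "temperature": None},   # missing reading
    {"device": "pump-3", "temperature": 912.0},  # implausible value
]

failures = (
    expect(batch, "temperature", lambda v: v is not None,
           "temperature missing")
    + expect(batch, "temperature",
             lambda v: v is not None and 0 <= v <= 150,
             "temperature out of range")
)

for failure in failures:
    print(failure)
```

Run on every incoming batch, checks like these surface exactly the kind of problems described later - inconsistent IoT sampling or missing CRM entries - before they silently distort an analysis.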

One of my recent clients had invested in a modern IoT system for analyzing his products. This went well for the initial months, as he found some potential improvements to his products. However, after some time the business leaders got accustomed to using data and their questions became more elaborate. Soon, the IoT data alone was not enough. Questions arose like "how can we increase the upsell volume for a specific segment of clients?". You can't answer this with an IoT system alone, as client segmentation lives in a CRM system. You also want additional customer-related data that is only available in the CRM. And finally, we needed specific product information which, in this client's case, was part of the ERP system.

The straightforward way might seem to be simply integrating the three data sources (IoT, ERP, CRM) directly with your analytics solution. This is exactly what the client first intended, but he came across several hurdles:

  • Integrating the APIs is complex in itself
  • The integration does not scale - for the next analytics question, you start integrating from zero again
  • Performance of the analytics was slow
  • Let's not talk about data security…

Given the very good analytics question from management, we went for a small data warehouse, seeing that it would help answer not only this question but many future ones as well. So we set up the excellent workflow orchestration tool prefect.io, used airbyte.io to integrate the data sources, and Snowflake on Microsoft Azure to store the data. For modelling our data we used another excellent tool: dbt.

As these systems provide excellent documentation and a great developer experience, the whole warehouse was running in less than a month - a scalable, modern foundation for any present and future data analytics. With this infrastructure in place, it was easy to combine data from the various sources; from a data science perspective, we only had to deal with one system - our warehouse database. On top of it, we invested early in data quality, using Great Expectations to define what our data should look like. This investment proved highly valuable, as we caught several problems such as inconsistent sampling rates in the IoT system and missing entries in the CRM system.

Having this centralized repository of data allows you to work with data in a totally new way. It provides ONE interface your data scientists can use. It ensures the quality of your data stays high. And it lets you combine multiple data sources without any additional integration hassle - data from a single system is often not enough to answer your complex business questions.

Having this centralized repository of data allows you to work with data in a totally new way

3. They don't base their decisions on data, due to unusable or unavailable data products

Basing decisions on data rather than gut feeling is of utmost importance - yet many decision makers cannot do so. One of the main reasons - if not the most common one - is that the data they receive is simply not useful for decision making.

Data scientists are amazingly talented at creating detailed answers from data - but they MUST NOT stop there. The final and most important step is making data usable for the end customer. This requires product thinking and identifying your stakeholders. The result of a data analytics project needs to be easy to understand, easy to use, and must follow common UX practices. Do not overload your target group: know exactly how accustomed your customers are to data, and present only the data that is really relevant.

Also, the look and feel of the data analytics result needs to be of high quality. Nice color schemes, modern data visualization and reporting are not optional. Make your end users love your analytics, not only your numbers!

Some time ago I had a client who was well ahead of the curve in data usage. They had a great data team, nice centralized infrastructure and innovative top-level management who quickly learned how to ask questions that data could answer.

However, they continually struggled to spread the fire of excitement to middle management and individual departments. After quickly analyzing the situation, we found the obvious reason: the data products delivered to these departments were simply boring. Nothing more, nothing less. The necessary (and, by the way, amazing) information was there, but it came with too much volume and too little appeal - often as PDF files on Monday morning (you love getting a 10-page PDF by e-mail on Monday morning, right?). The visualizations were plain, unstyled d3.js charts - visually not appealing.

We changed the delivery of data as follows:

  • We ran a product-style survey to identify the information that was REALLY needed
  • We introduced a modern dashboarding tool (which was then loved not only by the end customers, but also by the data scientists)
  • We built dashboards with as few as 3 metrics on them, including a description of what each metric means
  • We held bi-weekly check-ins with the departments to identify pain points with the data and adjusted the presentation accordingly. For some departments, we established "data champions" who were in constant conversation with the data team.
  • We stopped actively delivering metrics. With these clear and visually appealing dashboards, people started using them proactively. (Nice side effect: less infrastructure for automating reports!)

Only 8 weeks after introducing this new data representation, we saw a 450% (!!) increase in dashboard usage compared to the open rate of the e-mail based reports. (Note that measuring open rates is fuzzy; but since the increase in engagement is so large, the inaccuracy in open-rate measurement can be neglected.)

Treat your data - and especially the outcomes of your data analytics - like your products. You have internal (and maybe external) clients with their own needs and knowledge. Tailoring these data products to the needs of your data clients will tremendously increase the acceptance - and therefore the effectiveness - of data throughout your entire organization.

Only 8 weeks after introducing this new data representation we could see a 450% (!!) increase in dashboard usage

You can and will profit from data

Data is one of your most valuable assets - this should be the norm, but for many companies it is not. While you can still be successful without using data to your advantage, you will have an increasingly hard time in the market, as your competition will use objective, data-based decisions to win over your clients and market share.

It's time to change how you work with data, how you deliver data and how you make decisions. As an expert data concierge, I can help. My data science engineer product line is tailored to small and medium-sized businesses looking to finally profit from data. From early on, I will help you improve your business health, revenue and more by focusing on a data-based, objective approach to data engineering and analytics.

Data is no longer a liability - it is the foundation of your profit generation.

Leave me a message about how you are using data and what obstacles you encounter in your daily data business. How do you integrate your various data sources? How do you deliver your data products to your end customers, like top management? How do you handle data quality issues? And when was the last time your data or data products annoyed you?

