Storytelling with Data

class: title-slide

# ER015 - Storytelling with Data

## PVA3

### Data Products - Introduction

<br>
<br>
<br>
<br>
<br>
<br>
<br>
### HS 2024 
<br>
### Prof. Dr. Jörg Schoder
.mycontacts[
<svg aria-hidden="true" role="img" viewBox="0 0 496 512" style="height:1em;width:0.97em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> @FFHS-EconomicResearch
<svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"/></svg> @jfschoder
]

---
layout: true

<div style="position: absolute;left:400px;bottom:10px;font-size:9px">©Prof. Dr. Jörg Schoder</div>

---
name: agenda
class: left

.blockquote[Agenda]

## Data Products - Theory and Practice

* Motivation and Context

* Data Products

* Excursus: Data Product Management

---
class: left

.blockquote[Motivation and Context]

## Data: From Cost Factor to Asset

.panelset[
.panel[.panel-name[Perspectives]
<img src="data:image/png;base64,#../../img/PVA3/data_in_business_(tableau)_1.PNG" width="100%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Valuation]
<img src="data:image/png;base64,#../../img/PVA3/data_in_business_(tableau)_2.PNG" width="80%" style="display: block; margin: auto;" />
]
]
.quelle[Source: [cloudflight.io](https://www.cloudflight.io/de/download/uncategorized-download/turn-data-into-products-vom-data-scientist-zum-data-business-owner/)]

???

* Data is Oil
  * Nonsense as an analogy, because data keeps growing while oil is used up.
  * But: as with oil, the value of data only emerges through further processing (the "refinery").

---
class: left

.blockquote[Motivation and Context]

## Data as a Strategic Asset in the Innovation Process

.right-column[
.blockquote[
"[..] the ability to run fast, frugal, and scalable experiments based on high-value business hypotheses is becoming a new core competence for innovation success. As companies gather more data about their customers, channels, usage, complaints, social media, etc., we won’t just see people analyzing data with optimization in mind; we’ll be seeing machines generating “innovation hypotheses” recommending new configurations, bundles, features, pricing schemes, and business models to test."

.tr[
<a name=cite-schrage_let_2014></a>[Schrage (2014)](#bib-schrage_let_2014)
]
]

]
.left-column[
<br>
<img src="data:image/png;base64,#../../img/PVA3/schrage_(amazon).jpg" width="100%" />

]
.quelle[Image source: [amazon.de](https://www.amazon.de).]

???

* Insights from Data
  * descriptive
  * predictive
  * prescriptive/actionable
  
* Results vs. Methods

* Data-Driven vs. Data-Informed

* Data Mining vs. Data Products

---
class: left

.blockquote[Motivation and Context]

## Data Products and Data Management

.quelle[Source: [starburst.io](https://www.starburst.io/blog/data-driven-innovation/)]

???

* known data, known question
  * Example:
    * Question: Who are my most profitable customers?
    * Data: already modelled and stored in a database.
  * ultimately a classic data warehouse
  * We have a question and store the data needed to answer it in a data warehouse
  * legacy data stack
  
* known data, unknown question
  * the legacy stack is typically fine-tuned for well-defined data and analytics; it can also serve the case of new analytics on top of well-defined data
  * exploratory analysis: inspiration for new questions
  * Goal: understand and explore existing data in order to gain insights.
  * Also fairly common
  * Example:
    * Data: already available in the warehouse (or data lake). Exploratory analysis (e.g. of various KPIs, customers, etc.) leads to the
    * Question: What are the characteristics of the most profitable customers?

* unknown data, known question
  * there are concrete questions, and suitable analysis tools are known. However, the tools require data that is not yet available (or the data is new to the company and not yet "structured").
  * Example:
    * Question: How do my most profitable customers behave?
    * Data: not yet modelled or centralised.
  * Most companies do not do this, or do not do it well.
  
* unknown data, unknown question
  * can be the starting point for real innovation. Approach new/unknown data playfully and let it inspire new questions.
  * this is where a real competitive advantage may be created.
  * Example:
    * Data: various internal and external databases that are neither centralised nor modelled.
    * Question: What is the minimum we need to invest to retain a profitable customer?
  * Ideas could, for example, be presented to the marketing department, which could launch an experiment to find out whether the idea creates value.
  
* The experiment can then form the basis for strategic decisions. Once costs and benefits are known: how do we implement it permanently? Are the costs of modelling/centralising the data justified?

* Take-aways [video](https://www.starburst.io/blog/data-driven-innovation/?wvideo=coj3g4cc4h): 
  * new and changing technologies
  * needs regarding data use and data ownership have changed
  * ways of accessing data have changed (data products, APIs, web scraping, etc.)
  * companies must innovate constantly (cost-benefit; experiments as an efficient approach?)
  
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M464 256A208 208 0 1 1 48 256a208 208 0 1 1 416 0zM0 256a256 256 0 1 0 512 0A256 256 0 1 0 0 256zM294.6 135.1c-4.2-4.5-10.1-7.1-16.3-7.1C266 128 256 138 256 150.3V208H160c-17.7 0-32 14.3-32 32v32c0 17.7 14.3 32 32 32h96v57.7c0 12.3 10 22.3 22.3 22.3c6.2 0 12.1-2.6 16.3-7.1l99.9-107.1c3.5-3.8 5.5-8.7 5.5-13.8s-2-10.1-5.5-13.8L294.6 135.1z"/></svg> This means the way data is used (exploited) must change as well

* Data products and data product management are important aspects of this, which we will look at more closely today.
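
The four combinations of known/unknown data and known/unknown questions described above can be written down as a small lookup. This is only an illustrative sketch: the names `Quadrant` and `classify` are hypothetical, and the short descriptions are paraphrases of the notes, not terms from the cited source.

```python
from enum import Enum


class Quadrant(Enum):
    """The four cases from the notes; the values are short paraphrases."""
    KNOWN_DATA_KNOWN_QUESTION = "classic data warehouse reporting"
    KNOWN_DATA_UNKNOWN_QUESTION = "exploratory analysis of existing data"
    UNKNOWN_DATA_KNOWN_QUESTION = "data still has to be modelled and centralised"
    UNKNOWN_DATA_UNKNOWN_QUESTION = "playful exploration, possible source of innovation"


def classify(data_known: bool, question_known: bool) -> Quadrant:
    """Map the two yes/no dimensions to one of the four quadrants."""
    if data_known and question_known:
        return Quadrant.KNOWN_DATA_KNOWN_QUESTION
    if data_known:
        return Quadrant.KNOWN_DATA_UNKNOWN_QUESTION
    if question_known:
        return Quadrant.UNKNOWN_DATA_KNOWN_QUESTION
    return Quadrant.UNKNOWN_DATA_UNKNOWN_QUESTION


# Example from the notes: "How do my most profitable customers behave?"
# is a known question, but the data is not yet modelled or centralised.
case = classify(data_known=False, question_known=True)
print(case.name, "->", case.value)
```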

---
name: Teil1
class: inverse, middle, center

## Data Products

.blockquote[What is a data product?]

.blockquote[Characteristics of data products]

.blockquote[Types of data products]

???

* Data products
  * Definitions
  * Characteristics (interactivity...)
* Data divide
  * Data Lake, Data
  * Data products vs. data-as-a-product
* Data Product Management
  * Data mesh (nodes as data products)

---
name: Zack-Meyer-1
class: left

.blockquote[Data Products]

## Information Products vs. Data Products

.blockquote[
"We define information products broadly to include [products based on data, information and knowledge] in either electronic or printed form and sold to external markets as well as that provided by information systems departments within firms to internal 'customers'."
.tr[
<a name=cite-meyer_design_1996></a>[Meyer and Zack (1996)](#bib-meyer_design_1996)
]
]

.blockquote[
"[..] a data product is a product that facilitates an end goal through the use of data. [..] Before investing in a big effort, you need to answer one simple question: Does anyone want or need your product?"
.tr[
<a name=cite-patil_data_2012></a>[Patil (2012)](#bib-patil_data_2012), p. 1.
]
]

.blockquote["[..] any **digital** product or feature can be considered a "data product" if it **uses data** to facilitate a goal."
.tr[
[Gumara Rigol (2021).](https://towardsdatascience.com/data-as-a-product-vs-data-products-what-are-the-differences-b43ddbb0f123#:~:text=This%20means%20that%20any%20digital,on%20my%20previous%20navigation%20data.)
]
]

???

* ...address for the first time, in view of the major technological advances since the mid-1980s, the question of how "information products" are designed and developed.

* ...define "information products" broadly, as **electronic or printed products based on data, information and knowledge**.

* ...ask fundamental questions:
  * What can be learned from research on physical products? 
  * How are information products designed and produced?
  * How can information technology support this?
  * What "architecture" do information products have, and what strategic, organisational and technical implications does this architecture have for the companies concerned?

* cf. also Thomas H. Davenport and Stephan Kudyba

* Data products are ultimately similar, but most definitions differ in two respects:
  1. digital
  2. customer orientation: data is used to "customise", so to speak
  
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M464 256A208 208 0 1 1 48 256a208 208 0 1 1 416 0zM0 256a256 256 0 1 0 512 0A256 256 0 1 0 0 256zM294.6 135.1c-4.2-4.5-10.1-7.1-16.3-7.1C266 128 256 138 256 150.3V208H160c-17.7 0-32 14.3-32 32v32c0 17.7 14.3 32 32 32h96v57.7c0 12.3 10 22.3 22.3 22.3c6.2 0 12.1-2.6 16.3-7.1l99.9-107.1c3.5-3.8 5.5-8.7 5.5-13.8s-2-10.1-5.5-13.8L294.6 135.1z"/></svg> This means that **any digital product or feature** can be considered a "data product" **if it uses data to facilitate a goal**.

* Example from Gumara Rigol: **For example**, the home page of a digital newspaper can be a data product **if the news items featured in the home page I see are dynamically selected** based on my previous navigation data.

---
class: left

.blockquote[Data Products]

## What Is a Data Product?

.blockquote[
"[..] insights generated from data can be considered a data product if there are "users" willing to give back value for these insights. The user may be an external customer (e.g., a "consumer") or a user in an organization, for example, inside the company. The value given back may be in the form of a financial payment, but not necessarily [..]"
.tr[
<a name=cite-braschler_data_2019></a>[Meierhofer, Stadelmann, and Cieliebak (2019)](https://doi.org/10.1007/978-3-030-11821-1_4), p. 49.
]
]

.blockquote[
"A data product is an application or tool that uses data to help businesses improve their decisions and processes."
.tr[
[tableau.com](https://www.tableau.com/learn/whitepapers/turn-data-products-data-scientist-data-business-owner)
]
]

.blockquote[
"[..] data products are all the tools that facilitate data processing,
especially statistical analysis of data."
.tr[
<a name=cite-majchrzak_data_2023></a>[Majchrzak, Balnojan, Siwiak, Sieraczkiewicz, and Perrin (2023)](#bib-majchrzak_data_2023), p. 131.
]
]

???

* Meierhofer et al. (in Braschler et al.): *not necessarily financial*: there are other dimensions of value like
  * emotional or
  * social value (Jagdish et al. 1991), 
  * or the **collected data** (e.g., health data from wearables, search patterns, etc.).

* Majchrzak et al.: 
  * With reference to the general meaning of a product as the "result of a deliberate action", it is relatively easy to determine what is **not a data product**. 
  * Data created as a *side effect* of another activity are *not data products*. 
  * Example: if an employee creates an Excel file that is freely available, it is not a data product for the following reasons:   
    * It was not created with the intention of creating a data product, i.e. no deliberate effort was made to turn it into a data product.
    * It does not meet clearly defined user needs.

---
class: left

.blockquote[Data Products]

## Example: Google Analytics

???

* Financial tools, e.g. Bloomberg Terminal
* Companies often develop their own **internal data products** (for privacy, data integrity, and adaptability).
  * Businesses look to find data applications that are **built to fulfill a specific need**. 
  * **The more flexible** and customizable, **the more valuable** within an organization.

* Further examples (Gumara Rigol):
  * A company dashboard to visualise the main KPIs of your business. This data product is of the type decision support system and the interface to access it is a visualisation.
  * A data warehouse. This data product is a mix of raw, derived data and decision support system. The interface to access it are probably SQL queries.
  * A list of recommended restaurants nearby. Since the list is curated specifically for you, this data product is an automated decision-making one. The interface to access it is an app or website.
  * A “faster route now available” notification on Google Maps is a decision support data product (as you are the one making the decision) and its interface is a web/app.
  * A self-driving car is a data product too. Since it drives automatically, it is of the type automated decision-making. Its interface is, well, the car itself.
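
The examples above each pair a product type with an access interface. A minimal sketch of that pairing as data follows; the names `ProductType`, `DataProductExample` and `examples_of` are hypothetical, and only the types mentioned in these examples are modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ProductType(Enum):
    # Only the types that appear in the examples above.
    RAW_DATA = "raw data"
    DERIVED_DATA = "derived data"
    DECISION_SUPPORT = "decision support system"
    AUTOMATED_DECISION_MAKING = "automated decision-making"


@dataclass(frozen=True)
class DataProductExample:
    name: str
    types: Tuple[ProductType, ...]  # a product can mix several types
    interface: str                  # how users access the product


EXAMPLES = [
    DataProductExample("company KPI dashboard",
                       (ProductType.DECISION_SUPPORT,), "visualisation"),
    DataProductExample("data warehouse",
                       (ProductType.RAW_DATA, ProductType.DERIVED_DATA,
                        ProductType.DECISION_SUPPORT), "SQL queries"),
    DataProductExample("list of recommended restaurants nearby",
                       (ProductType.AUTOMATED_DECISION_MAKING,), "app or website"),
    DataProductExample("'faster route now available' notification",
                       (ProductType.DECISION_SUPPORT,), "web/app"),
    DataProductExample("self-driving car",
                       (ProductType.AUTOMATED_DECISION_MAKING,), "the car itself"),
]


def examples_of(product_type: ProductType) -> List[DataProductExample]:
    """Return all examples that include the given product type."""
    return [e for e in EXAMPLES if product_type in e.types]


for example in examples_of(ProductType.DECISION_SUPPORT):
    print(f"{example.name}: accessed via {example.interface}")
```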

---
class: left

.blockquote[Data Products]

## Types

.quelle[Source: [O'Regan (2018)](https://towardsdatascience.com/designing-data-products-b6b93edf3d23).]

???

[O'Regan](https://towardsdatascience.com/designing-data-products-b6b93edf3d23)
* **Raw data**. Starting with raw data, we are collecting and making available data as it is (perhaps we’re doing some small processing or cleansing steps). The user can then choose to use the data as appropriate, but most of the work is done on the user’s side.
* **Derived data.** In providing users with derived data, we are doing some of the processing on our side. We could, in the case of customer data, add additional attributes like assigning a customer segment to each customer, or we could add their likelihood of clicking on an ad or of buying a product from a certain category.
* **Algorithms.** Next we have algorithms, or algorithms-as-a-service. We are given some data, we run it through the algorithm — be that machine learning or otherwise — and we return information or insights. A good example would be Google Image: the user uploads a picture, and receives a set of images that are the same or similar to the one uploaded. Behind the scenes, the product extracts features, classifies the image and matches it to stored images, returning the ones that are most similar.
* **Decision support.** Here we are looking to provide information to the user to help them with decision-making but we are not taking the decision ourselves. Analytics dashboards such as Google Analytics, Flurry, or WGSN would fall into this category. We are doing most of the heavy lifting on our side; our intention is to give the user relevant information in an easy-to-digest format to allow them to take better decisions. In the case of Google Analytics, that could mean changing the editorial strategy, addressing leaks in the conversion funnel, or doubling down on a given product strategy. The important thing to remember here is as follows: while we have taken design-decisions in data collection, derivation of new data, in choosing what data to display and how to display it, the user is still tasked with interpreting the data themselves. They are in control of the decision to act (or not act) on that data.
* **Automated decision-making.** Here we outsource all of the intelligence within a given domain. Netflix product recommendations or Spotify’s Discover Weekly would be common examples. Self-driving cars or automated drones are more physical manifestations of this closed decision-loop.

We allow the algorithm to do the work and present the user with the final output (sometimes with an explanation as to why the AI chose that option, other times completely opaque).

* So far we’ve discussed functional data product types.

**Each of these data products can be presented to our users in a variety of ways** — with clear implications for their design. What are these interfaces or interactions?

* **Dashboards & visualisations.** For dashboards, and visualisations we’re **assuming some statistical literacy** or competence in dealing with numbers. In its most extreme we can do a lot of the heavy-lifting for our users and work hard to ensure that we only present the most pertinent information in an easy-to-understand format. **By choosing what information to display, we are influencing decision-making**, but it still leaves interpretation and decision-making in the hands (or minds) of the user.

* **Web elements.** For the past 5 years or so the **least technical interface** for data products that have been commonly seen by users has been web elements. More **recently, these interfaces have been broadly extended** to include voice, robotics and augmented reality, amongst others. While the design details for each of these newer interfaces are clearly distinctive, there is considerable overlap, in that they revolve around presenting the results of a decision to the user, and perhaps also communicating why or how the AI reached that decision.

* **APIs.** In the case of APIs, we assume a technical user. We should still follow good Product practices and **ensure that the API is intuitive to use, well documented**, can do what its user’s need and is desirable to work with.

---
class: left

.blockquote[Data Products]

## Data Products by Type of Analysis

.quelle[Image source: [Jung (2023).](https://towardsdatascience.com/from-data-warehouses-and-lakes-to-data-mesh-a-guide-to-enterprise-data-architecture-e2d93b2466b1)]

---
class: inverse, center, middle

## Excursus: Data Product Management

.blockquote[Data as a Project vs. Product]

.blockquote[Data Management]

???

---
name: Zack-Meyer-2
class: left

.blockquote[Data as a Project vs. Product]

## Data Product Management

* Central question of [Meyer and Zack (1996)](#bib-meyer_design_1996): What can the development of *information products* learn from classic product management?

* Development of a **five-stage process** for developing *information products*:

  1. acquisition, 
  2. refinement, 
  3. storage/retrieval, 
  4. distribution, and 
  5. presentation
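
The five stages can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the process as specified by Meyer and Zack: the sample records, the `Repository` class and all function bodies are hypothetical and only illustrate how each stage hands its result to the next.

```python
import json
from typing import Any, Dict, List

Record = Dict[str, Any]


def acquire() -> List[Record]:
    """1. Acquisition: collect raw input (hypothetical sample records)."""
    return [
        {"customer": " Alice ", "revenue": "120.5"},
        {"customer": "Bob", "revenue": "80"},
        {"customer": "", "revenue": "n/a"},
    ]


def refine(raw: List[Record]) -> List[Record]:
    """2. Refinement: clean and standardise, dropping unusable records."""
    cleaned = []
    for row in raw:
        name = str(row["customer"]).strip()
        try:
            revenue = float(row["revenue"])
        except ValueError:
            continue  # revenue is not a number
        if name:
            cleaned.append({"customer": name, "revenue": revenue})
    return cleaned


class Repository:
    """3. Storage/retrieval: an in-memory stand-in for a database."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def store(self, rows: List[Record]) -> None:
        self._rows.extend(rows)

    def retrieve(self, min_revenue: float = 0.0) -> List[Record]:
        return [r for r in self._rows if r["revenue"] >= min_revenue]


def distribute(rows: List[Record]) -> str:
    """4. Distribution: serialise for delivery, here as JSON."""
    return json.dumps(rows)


def present(payload: str) -> str:
    """5. Presentation: render the delivered data for the end user."""
    rows = json.loads(payload)
    return "\n".join(f"{r['customer']}: {r['revenue']:.2f}" for r in rows)


repo = Repository()
repo.store(refine(acquire()))
report = present(distribute(repo.retrieve()))
print(report)
```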

???

Source: Thomas H. Davenport and Stephan Kudyba

* Davenport/Kudyba add:

---
class: left

.blockquote[Data as a Project vs. Product]

## Data Products Are Teamwork

.quelle[Source: <a name=cite-schmitt_moving_2017></a>[Schmitt (2017)](#bib-schmitt_moving_2017).]

???

* This raises the question of how to organise the work

---
name: data-pm
class: left

.blockquote[Data as a Project vs. Product]

## Data Product Management

.blockquote[
"We cannot continue to treat data as a project. Instead, we must shift our perspective and treat data as a product that is accessible, visible and usable for everyone, no matter their discipline or desire."
.tr[
<a name=cite-aziza_data_2022></a>[Aziza (2022)](#bib-aziza_data_2022)
]
]
 
.blockquote[
"The product-based approach organizes teams around specific products to create end-to-end owners of everything related to that product."
.tr[
<a name=cite-wong_moving_2022></a>[Wong (2022)](#bib-wong_moving_2022)
]
]

???

* Wong (2022):
  * Under the old model, teams are organized around projects and responsible for specific sections of the company’s initiatives. Teams are regularly rearranged to meet changing needs.
  * The differences between a project- and product-based approach are night and day. Instead of splitting product knowledge across numerous teams and piecemealing tasks together, a product-based team is responsible for every aspect of the product, from its overall design to the budget, market research, and product development. A product team might consist of designers, engineers, and product managers, among other roles. They’re able to answer questions, solve problems, and improve the product because they know the ins and outs.

* Aziza (2022): 
  * We **cannot continue to treat data as a project**. Instead, we **must shift our perspective** and **treat data as a product** that is *accessible, visible and usable for everyone*, no matter their discipline or desire.
  * To build a successful product, be specific about the following details 1) who will use it 2) how much and how it will be used 3) what jobs it will help accomplish 4) what pain points it will address, and 5) how it will generate revenue.

[Jung (2023):](https://towardsdatascience.com/from-data-warehouses-and-lakes-to-data-mesh-a-guide-to-enterprise-data-architecture-e2d93b2466b1) Enterprises are moving towards increasingly powerful business intelligence (BI) tools and codeless machine learning platforms in an effort to democratise data and analytics capabilities.

The idea is many companies today possess only pockets or silos of advanced analytics skills. The game-changer lies in empowering 10,000 non-technical colleagues across the entire organisation with the right skillset and tools, providing a boost in overall productivity that surpasses the marginal benefits polishing up a specialist 20-member data science team. (Sorry to my data science pals!)

---
class: left

.blockquote[Data as a Product and Data Management]

## Data Pipelines and Data Divide

.quelle[Source: <a name=cite-dehghani_data_2022></a>[Dehghani (2022)](#bib-dehghani_data_2022), p. 20.]

---
class: left

.blockquote[Data as a Project vs. Product]

## Data Management and Agility

.blockquote[
"Everyone involved in the analytics industry is just trying to find their footing. We’re all stumbling forward together. Fail fast and fail forward."
.tr[
[Jung (2023).](https://towardsdatascience.com/from-data-warehouses-and-lakes-to-data-mesh-a-guide-to-enterprise-data-architecture-e2d93b2466b1)
]
]

.quelle[Source: [Dehghani (2022)](#bib-dehghani_data_2022), p. 30.]

???

---
class: left

.blockquote[Data as a Product]

## Characteristics

.quelle[Source: [Dehghani (2022)](#bib-dehghani_data_2022), p. 70.]

???

* Discoverable
  * in order for data as a product to be discoverable, a **search engine** is needed and users must be able to register datasets in this engine and request access to them (this will increase security, another capability explained below).
  * The **first iteration for this capability could be just a list of datasets** in your de facto internal intranet and you can iterate and build incrementally from that. Remember that processes and culture are more important than deploying the ultimate data catalogue tool too early (which can be too complex for employees to use).

* Addressable
  * Having addressable datasets makes your teams **more productive**.
    * On one side, Data Analysts and Data Scientists are **autonomous in finding and using the data** they need. 
    * On the other side, Data Engineers have far **less interruptions** from people asking where they can find data about X.

* Self-describing and interoperable
  * As we commented in the blog post where we explained Adevinta’s data mesh journey, datasets **need to contain metadata** that make them understandable and follow the same naming conventions (which will make the datasets interoperable). We found these pieces of metadata to be super useful to our Data Analysts:
    * Data location (as seen above)
    * Data provenance and data mapping
    * Sample data
    * Execution time and freshness
    * Input preconditions
    * Example notebook or SQL queries using the data set

* Trustworthy and secure
  * Checking data quality **regularly and automatically** is a must to fulfil the trustworthy characteristic of data as a product. And owners of the datasets need to react accordingly to the results of these checks.
  * Quality checks must be **done at pipeline input and output** and it doesn’t hurt to provide contextual data quality information to consumers of the data; like for example in Tableau dashboards.
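
The idea of checking quality at both pipeline input and output can be sketched as follows. This is an illustrative example only: the column names, the `transform` step and the two simple checks (non-empty dataset, no missing required values) are assumptions, not the checks used by the cited source.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_quality(rows: List[Row], required: List[str]) -> List[str]:
    """Return a list of human-readable issues; an empty list means 'passed'."""
    issues = []
    if not rows:
        issues.append("dataset is empty")
    for i, row in enumerate(rows):
        for column in required:
            if row.get(column) is None:
                issues.append(f"row {i}: missing value for '{column}'")
    return issues


def transform(rows: List[Row]) -> List[Row]:
    """Hypothetical pipeline step: derive revenue from quantity and price."""
    return [{**r, "revenue": r["quantity"] * r["unit_price"]} for r in rows]


def run_pipeline(rows: List[Row]) -> List[Row]:
    """Run the transformation with automatic checks before and after."""
    input_issues = check_quality(rows, ["quantity", "unit_price"])
    if input_issues:
        raise ValueError(f"input check failed: {input_issues}")

    result = transform(rows)

    output_issues = check_quality(result, ["revenue"])
    if output_issues:
        raise ValueError(f"output check failed: {output_issues}")
    return result


print(run_pipeline([{"quantity": 2, "unit_price": 3.0}]))
```

Returning the issues as a list, rather than failing on the first one, makes it easy to pass the same information on to data consumers as contextual quality information.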

---
class: left

.blockquote[Data as a Product]

## Data Quality and Trustworthiness

.quelle[Image source: [Kumar (2018).](https://medium.com/nyc-design/gigo-garbage-in-garbage-out-concept-for-ux-research-7e3f50695b82)]

---
class: left

.blockquote[Data Management and Architecture]

## Data Types and Data Architectures

* Structured vs. unstructured data

* Evolution of data architectures ([Dehghani (2022)](#bib-dehghani_data_2022), pp. 40ff.):

  * First generation: Data Warehouse (schema-on-write)

  * Second generation: Data Lake (schema-on-read)

???

Source: [Jung (2023).](https://towardsdatascience.com/from-data-warehouses-and-lakes-to-data-mesh-a-guide-to-enterprise-data-architecture-e2d93b2466b1)

* structured data: tables, ideally rectangular
  * typically stored in a warehouse.
  * that is, the data is brought into a defined structure and then stored (schema-on-write)
  
* unstructured data: 
  * typically stored in a data lake (a collecting basin)
  * Data lakes typically work with schema-on-read, i.e. the data is only brought into a structure when it is used.
  * Examples of unstructured data and file types:
    * text (.txt, .docx)
    * PDFs
    * images (.png, .svg)
    * audio (.mp3)
    * video (.mp4)
    * social media posts
    * emails, web pages
    * sensor data, etc.
* semi-structured data: JSON, XML
* Distributed compute & storage: Data lakes use distributed compute and distributed storage to process and store huge volumes of potentially unstructured data. This means the data is held and processed across potentially thousands of machines, known as a big data cluster. This technology took off in the 2010’s, enabled by Apache Hadoop, a collection of open-source big data software that empowered organisations to distribute huge amounts of data across many machines (HDFS distributed storage) and run SQL-like queries on tables stored across them (Hive & Spark distributed compute). Companies like Cloudera and Hortonworks later commercialised Apache software into packages that enabled easier onboarding and maintenance by organisations around the world.
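
The difference between schema-on-write (warehouse) and schema-on-read (lake) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the `Warehouse` and `Lake` classes, the `SCHEMA` and the sample records are hypothetical and stand in for real storage systems.

```python
import json
from typing import Any, Callable, Dict, List

Schema = Dict[str, Callable[[Any], Any]]

# Hypothetical schema: field name -> function that casts to the target type.
SCHEMA: Schema = {"customer_id": int, "revenue": float}


def conform(record: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Bring a record into the schema; raises if a field is missing or invalid."""
    return {field: cast(record[field]) for field, cast in schema.items()}


class Warehouse:
    """Schema-on-write: structure is enforced before anything is stored."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema
        self.rows: List[Dict[str, Any]] = []

    def write(self, record: Dict[str, Any]) -> None:
        self.rows.append(conform(record, self.schema))  # may raise here

    def read(self) -> List[Dict[str, Any]]:
        return list(self.rows)


class Lake:
    """Schema-on-read: anything is stored; structure is applied when reading."""

    def __init__(self) -> None:
        self.blobs: List[str] = []

    def write(self, raw: str) -> None:
        self.blobs.append(raw)  # no validation at write time

    def read(self, schema: Schema) -> List[Dict[str, Any]]:
        rows = []
        for blob in self.blobs:
            try:
                rows.append(conform(json.loads(blob), schema))
            except (KeyError, ValueError, TypeError):
                continue  # blob does not fit this schema; skip it
        return rows


warehouse = Warehouse(SCHEMA)
warehouse.write({"customer_id": "7", "revenue": "19.9"})

lake = Lake()
lake.write('{"customer_id": "7", "revenue": "19.9"}')
lake.write("free-text note without any structure")

print(warehouse.read())
print(lake.read(SCHEMA))
```

In the warehouse a malformed record is rejected at write time, while the lake accepts it and the cost of interpreting it is deferred to each reader.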

First generation: proprietary enterprise data warehouse and business intelligence platforms; solutions with large price tags that have left companies with equally large amounts of technical debt [in the form of] thousands of unmaintainable ETL jobs, and tables and reports that only a small group of specialised people understand, resulting in an under-realized positive impact on the business.

Second generation: big data ecosystem with a data lake as a silver bullet; complex big data ecosystem and long running batch jobs operated by a central team of hyper-specialised data engineers have created data lake monsters that at best has enabled pockets of R&D analytics; over-promised and under-realised.

Third (and current generation) data platforms: more or less similar to the previous generation, with a modern twist towards streaming for real-time data availability with architectures, unifying the batch and stream processing for data transformation, as well as fully embracing cloud-based managed services for storage, data pipeline execution engines and machine learning platforms.

---
class: left

.blockquote[Data Management and Architecture]

## "Next Generation": Data Mesh

.panelset[
.panel[.panel-name[Principles]
<img src="data:image/png;base64,#../../img/PVA3/datamesh_(datamesh_archittecture).webp" width="80%" style="display: block; margin: auto;" />

.quellePanURL[Image source: [https://www.datamesh-architecture.com/](https://www.datamesh-architecture.com/).]

]
.panel[.panel-name[Change of Perspective]
<img src="data:image/png;base64,#../../img/PVA3/Datamesh_(Dehghani_2021)_S2.PNG" width="70%" style="display: block; margin: auto;" />

.quellePanURL[Source: <a name=cite-dehghani_data_2021></a>[Dehghani (2021)](#bib-dehghani_data_2021), p. 2.]

]
]

???

* Gumara Rigol (2021):
  * One of the principles of the data mesh paradigm is to consider data as a product. Sometimes this principle has been abbreviated to “data products”, hence the confusion.
  * “Data product” is a generic concept (as explained above) and “data as a product” is a subset of all possible data products. More specifically, if we use Simon’s categories, “data as a product” belongs to the raw or derived data type of “data product”.

* Data Mesh principles of Data as a Product 
  * introduces an accountability for the domains to serve their analytical data as a product and 
  * delight the experience of data consumers; streamlining their experience discovering, understanding, trusting, and ultimately using quality data. 
  * Data as a product principle is designed to address the data quality and the age-old siloed data problem, and their unhappy data consumers.

* Implementing this approach introduces a **new architectural unit**, called **data product** quantum that will embed all the structural components needed to maintain and serve data as a product.

* The structural components include 
  * the code that maintains the data, 
  * additional information, metadata, to make data discoverable and usable, and 
  * a contract to access the data in a variety of access modes native to the data consumers.
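
The three structural components listed above (code, metadata, access contract) can be sketched as one unit. This is a hypothetical illustration, not Dehghani's specification: the `DataProduct` class, its field names and the sample customer data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataProduct:
    name: str
    build: Callable[[], List[Row]]  # the code that maintains the data
    metadata: Dict[str, str]        # makes the data discoverable and usable
    # The contract: named access modes, each turning rows into an output format.
    access_modes: Dict[str, Callable[[List[Row]], Any]] = field(default_factory=dict)

    def serve(self, mode: str) -> Any:
        """Serve the data in one of the access modes offered by the contract."""
        if mode not in self.access_modes:
            raise KeyError(f"access mode '{mode}' is not part of the contract")
        return self.access_modes[mode](self.build())


def build_segments() -> List[Row]:
    # Hypothetical derived data: customers with an assigned segment.
    return [{"customer": "A", "segment": "gold"},
            {"customer": "B", "segment": "silver"}]


def to_csv(rows: List[Row]) -> str:
    header = list(rows[0].keys())
    lines = [",".join(header)]
    lines += [",".join(str(r[h]) for h in header) for r in rows]
    return "\n".join(lines)


product = DataProduct(
    name="customer-segments",
    build=build_segments,
    metadata={"owner": "marketing domain", "freshness": "daily"},
    access_modes={"rows": lambda rows: rows, "csv": to_csv},
)

print(product.metadata["owner"])
print(product.serve("csv"))
```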

---
class: left

.blockquote[Data Management]

## Data Products in the Context of the Data Mesh

.blockquote[
"A data product is an autonomous, read-optimized, standardized data unit containing at least one dataset (domain dataset), created for satisfying user needs."
.tr[
[Majchrzak, Balnojan, Siwiak et al. (2023)](#bib-majchrzak_data_2023), p. 131.
]
]

.blockquote[
"[..] distinction between products that use data to facilitate an end goal and products whose primary objective is to use data to facilitate an end goal."
.tr[

]
]

???

todo
[Dehghani (2022)](#bib-dehghani_data_2022)

Why be so pedantic — well, my argument is that Data Products, whether they be an entire customer-facing product or a partial back-end product, have different characteristics than other technology products.

While many of the standard Product Development Rules apply — solve a customer need, learn from feedback, prioritise relentlessly, etc. — there are subtleties that can make thinking about data products somewhat different.

The definition above is used to discern whether we should be thinking about a product as we typically would or whether we need to consider aspects of product development that are more tailored to world of data.

---
class: inverse, center, middle

## Key Take-Aways

---
class: left

.blockquote[Key Take-Aways]

## Central Aspects of the Discussion

* Data are only assets if they are prepared accordingly and developed into a product.

* Data products vs. data as a product

* Data products in the context of data mesh

* Importance of the data architecture for agility

* General importance of data quality (GIGO)

---
name: EndHanks
class: center

background-size: 75%
background-image: url(data:image/png;base64,#https://media.giphy.com/media/KJ1f5iTl4Oo7u/giphy.gif)

---
class: left

## References

.ref-slide[
<a name=bib-aziza_data_2022></a>[Aziza, B.](#cite-aziza_data_2022)
(2022). _The Data Product "Value Chain"_. (Visited on Jan. 13, 2024).

<a name=bib-dehghani_data_2021></a>[Dehghani,
Z.](#cite-dehghani_data_2021) (2021). _Data Mesh Principles and Logical
Architecture_.
https://martinfowler.com/articles/data-mesh-principles.html. (Visited
on Jan. 07, 2024).

<a name=bib-dehghani_data_2022></a>[Dehghani,
Z.](#cite-dehghani_data_2022) (2022). _Data Mesh: Delivering
Data-Driven Value at Scale_. First edition. Beijing ; Boston: O'Reilly.
ISBN: 978-1-4920-9239-1.

<a name=bib-majchrzak_data_2023></a>[Majchrzak, J., S. Balnojan, M.
Siwiak, et al.](#cite-majchrzak_data_2023) (2023). _Data Mesh in
Action_. Shelter Island, NY: Manning. ISBN: 978-1-63343-997-9.

<a name=bib-braschler_data_2019></a>[Meierhofer, J., T. Stadelmann, and
M. Cieliebak](#cite-braschler_data_2019) (2019). "Data Products". In:
_Applied Data Science_. Ed. by M. Braschler, T. Stadelmann and K.
Stockinger. Cham: Springer International Publishing, pp. 47-61. ISBN:
978-3-030-11820-4 978-3-030-11821-1. DOI:
[10.1007/978-3-030-11821-1_4](https://doi.org/10.1007%2F978-3-030-11821-1_4).
(Visited on Jan. 06, 2024).

<a name=bib-meyer_design_1996></a>[Meyer, M. H. and M. H.
Zack](#cite-meyer_design_1996) (1996). "The Design and Development of
Information Products". In: _Sloan Management Review_ 37, pp. 43-59.

<a name=bib-mott_data-driven_2023></a>[Mott,
A.](#cite-mott_data-driven_2023) (2023). _Data-Driven Innovation:
Decision-making, Modern Data Stack | Definition_. (Visited on Jan. 10,
2024).

<a name=bib-oregan_designing_2018></a>[O'Regan,
S.](#cite-oregan_designing_2018) (2018). _Designing Data Products_.
https://towardsdatascience.com/designing-data-products-b6b93edf3d23.
(Visited on Jan. 13, 2024).

<a name=bib-patil_data_2012></a>[Patil, D. J.](#cite-patil_data_2012)
(2012). _Data Jujitsu: The Art of Turning Data Into Product_.
Sebastopol, CA: O'Reilly Media, Inc. ISBN: 978-1-4493-4115-2.

<a name=bib-schmitt_moving_2017></a>[Schmitt,
J.](#cite-schmitt_moving_2017) (2017). _Moving from Project to Product:
Keys to Developing Successful Data Science Products_. (Visited on Jan.
13, 2024).

<a name=bib-schrage_let_2014></a>[Schrage, M.](#cite-schrage_let_2014)
(2014). "Let Data Ask Questions, Not Just Answer Them". In: _Harvard
Business Review_. ISSN: 0017-8012. (Visited on Jan. 10, 2024).

<a name=bib-wong_moving_2022></a>[Wong, M.](#cite-wong_moving_2022)
(2022). _Moving from Project to Product: A Five-Stage Journey_.
https://blog.planview.com/moving-from-project-to-product-a-five-stage-journey/.
(Visited on Jan. 13, 2024).
]