Dan’s Data Notes — Data Sharing

Daniel Fernandez
Jun 19, 2020


Why share data?

  • Data is seen as an asset: While some businesses guard their data as a strategic advantage, many industries have realized that further enhancing that data can be more valuable than holding it back. An important first step is cataloging data and measuring its quality so it can be reused in the future.
  • Combining data increases its value: Like any other raw material, derivative data products gain value the further down the value chain they go. For example, contact lists for professionals in an industry are often readily available from public sources, but a curated list of executives’ contact details for a particular industry can sell for thousands of dollars at a time.

How can data be shared?

  • Direct Share: The complexity and sophistication of direct sharing vary significantly, and for highly valuable data assets the distribution method matters less than the data itself. Direct share usually refers to more “traditional” mechanisms, such as a file server accessed via File Transfer Protocol (FTP) or similar.
  • API-based Access: An API is itself a derivative data product. It standardizes how the data is accessed, which makes building applications on top of that data significantly easier. In a previous post, I discussed the benefits of APIs and how they can become products in their own right.
  • Real-time data sharing: Depending on the consuming application, real-time data can add complexity to retrieval, but it is powerful for building more responsive applications and user experiences.
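
To make the API-based option more concrete, here is a minimal Python sketch of what standardizing access can look like: one versioned endpoint with filtering and pagination that always answers in the same JSON shape. The `/v1/contacts` path, the `industry`, `limit` and `offset` parameters, the sample records and the `handle_request` function are hypothetical; a real service would run behind a web framework and enforce access controls.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory dataset standing in for the shared data asset.
CONTACTS = [
    {"id": 1, "name": "Ada", "industry": "finance"},
    {"id": 2, "name": "Grace", "industry": "energy"},
    {"id": 3, "name": "Linus", "industry": "finance"},
]

MAX_PAGE_SIZE = 100  # assumed cap so one request cannot pull the whole dataset


def handle_request(url: str) -> str:
    """Serve GET /v1/contacts?industry=...&limit=...&offset=... as JSON.

    The path, parameter names and response shape are illustrative
    assumptions, not any real provider's API.
    """
    parsed = urlparse(url)
    if parsed.path != "/v1/contacts":
        return json.dumps({"error": "not found"})

    params = parse_qs(parsed.query)  # maps each parameter name to a list of values
    industry = params.get("industry", [None])[0]
    limit = min(int(params.get("limit", ["2"])[0]), MAX_PAGE_SIZE)
    offset = max(int(params.get("offset", ["0"])[0]), 0)

    # Filter first, then slice out the requested page.
    rows = [r for r in CONTACTS if industry is None or r["industry"] == industry]
    page = rows[offset:offset + limit]

    # Tell the consumer where the next page starts, or None when done.
    next_offset = offset + limit if offset + limit < len(rows) else None
    return json.dumps({"data": page, "next_offset": next_offset})
```

For example, `handle_request("/v1/contacts?industry=finance&limit=1")` returns the first matching record together with `"next_offset": 1`, which the consumer passes back to fetch the next page. Because every consumer uses the same contract, applications can be built against it without knowing how the data is stored.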

Why is it possible now?

  • Cloud computing: Computing resources are now available in most geographic regions, which simplifies data distribution overall. This makes it possible to set up lower-latency distribution points for data, regardless of its format.
  • Cheap storage: The growing use of cloud infrastructure and the availability of storage appliances have greatly reduced storage costs, allowing organizations to keep more data for less. As a result, sharing data has also become much cheaper.
  • Ease-of-sharing tools: Platforms for storing and retrieving data in many formats are more readily available. Whether it’s processing applications that create derivative data products, indexing technology that makes data assets quickly searchable, or core libraries that handle the actual sharing, these technologies have made it easier for almost any organization to distribute data at scale.

What are some security implications?

  • Access control: When architecting a system that stores and redistributes data, proper access control is one of the most critical capabilities to build. Restricting access to the data is not only a legal requirement but can also have revenue implications for the data distributor.
  • De-identifying Data: Similarly, derived data products may require an additional level of anonymization so that the original data cannot be reconstructed from them. Some jurisdictions are stricter than others in this respect, but in general, earning the trust of clients and data providers is key.
  • Data traceability: Assessing data quality is a fundamental aspect of data sharing services and products. To continuously verify data quality, it’s important to track and monitor how the data is transformed at every stage and to be able to trace it back to its original state.
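
The three concerns above often meet in a single export step. The Python sketch below is one way to combine them, under stated assumptions: it checks a consumer’s grant before releasing anything, drops the name and replaces the email with a keyed HMAC pseudonym, and records SHA-256 fingerprints of the input and output so the shared copy can be traced back to its source. The `GRANTS` table, API keys, `PSEUDONYM_KEY`, record fields and function names are hypothetical. Keyed pseudonymization is weaker than full anonymization: whoever holds the key can recompute pseudonyms for known emails, and the remaining attributes may still allow re-identification, so stricter jurisdictions may call for further measures such as aggregation.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical grants: which consumer (API key) may read which datasets.
GRANTS = {"key-acme": {"contacts"}}

# Hypothetical secret for keyed pseudonymization; a real system would load it
# from a secrets store rather than hard-coding it.
PSEUDONYM_KEY = b"replace-with-a-real-secret"

LINEAGE = []  # append-only log of every transformation applied before sharing


def fingerprint(records: list) -> str:
    """Stable SHA-256 digest of a list of records (key order does not matter)."""
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def pseudonymize(value: str) -> str:
    """Keyed HMAC-SHA256 pseudonym; cannot be linked back without the key."""
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def share(api_key: str, dataset: str, records: list) -> list:
    # 1. Access control: refuse consumers without a grant for this dataset.
    if dataset not in GRANTS.get(api_key, set()):
        raise PermissionError(f"{api_key!r} may not read {dataset!r}")

    # 2. De-identification: drop the name, replace the email with a pseudonym
    #    that still lets the consumer join records about the same person.
    shared = [
        {"person": pseudonymize(r["email"]), "industry": r["industry"]}
        for r in records
    ]

    # 3. Traceability: log input and output digests so this shared copy can be
    #    matched to the exact source data it was derived from.
    LINEAGE.append({
        "dataset": dataset,
        "consumer": api_key,
        "step": "drop_name_pseudonymize_email",
        "input_sha256": fingerprint(records),
        "output_sha256": fingerprint(shared),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return shared
```

Hashing with a secret key rather than a plain SHA-256 of the email is the important design choice here: an unkeyed hash of a well-known value such as an email address can be reversed by hashing candidate addresses and comparing.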

What does it mean for business?

  • Improve Customer Satisfaction: Whether they use it for reporting or to build additional products of their own, customers across the board value easy and predictable access to data.
  • Simplify Internal Application Development: Analytics application development is highly underserved when it comes to data availability, particularly good-quality data for training more advanced analytics models. Having data readily available can accelerate the development of such applications.
  • Monetize Data: Data as a service is becoming the primary revenue driver for some organizations, while others build their business models entirely around data products. Whether it’s sharing clean data or exposing aggregated or otherwise derived data, organizations should make data products a strategic priority going forward.

Data Privacy

Handling data adequately, and assuring the privacy of personal data that is stored, analyzed or otherwise processed, is key for organizations that rely on data products. Only by maintaining high privacy and security standards will consumers and businesses entrust their data to aggregators and processors.

Written by Daniel Fernandez

Product Manager in Infosec. Cybersecurity Graduate Student. https://linktr.ee/dnlfdz
