This is Part 1 of a 2 Part series where we’ll be exploring Data Sharing. We’ll look at why we want to share data and what the common problems are. We’ll also look at some of the other options out there to see how they compare.
Last week at Microsoft Ignite, Azure Data Share quietly went GA (Generally Available). This may have been lost among all the other exciting news (Synapse anyone?) but this is huge.
There’s a recurring request I hear from customers in the data space: they want to share their data easily, securely, quickly and cheaply with their customers and/or partners. It’s obvious why. Data mixed with other data can dramatically increase the value and usefulness of the insights it yields. Imagine a simple example of mixing Facebook social data with LinkedIn career-related data. The owner of that information could easily monetize (and exploit!) the ability to understand where an individual is in their personal AND career journey.
There are a number of reasons why this has traditionally been very hard with Big Data, and they come down to the limitations of space and time. If I have a ton of data and want to share it with you:
- Do I take a snapshot at a point in time and give it to you?
- How long would the snapshot take to copy to your Azure tenancy?
- Who pays for the egress on my side and the ingress on your side?
- How would I govern (or even know) what you did with my data?
- Do I even still own that data now that you have it?
- Am I lending it to you or giving it to you?
- What are the terms and conditions around what you can do with that data?
- How do you get a refresh in a month’s time when I’ve added a whole lot more data?
And so on…
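To put rough numbers on the "space and time" questions above, here is a minimal back-of-the-envelope sketch. The function name, the 100 TB snapshot, the 1 Gbps link and the price per GB are all hypothetical values chosen for illustration, not real Azure pricing or measured throughput.

```python
def snapshot_copy_estimate(size_tb: float, bandwidth_gbps: float,
                           egress_price_per_gb: float) -> dict:
    """Rough copy time and egress cost for a one-off snapshot.

    All inputs are assumptions supplied by the caller; real transfers are
    affected by throttling, retries and protocol overhead.
    """
    size_gb = size_tb * 1000            # decimal terabytes -> gigabytes
    size_gbit = size_gb * 8             # gigabytes -> gigabits
    seconds = size_gbit / bandwidth_gbps
    return {
        "hours": seconds / 3600,
        "egress_cost": size_gb * egress_price_per_gb,
    }


# Hypothetical scenario: 100 TB over a sustained 1 Gbps link,
# at an assumed (made-up) price of 0.05 per GB of egress.
estimate = snapshot_copy_estimate(size_tb=100, bandwidth_gbps=1,
                                  egress_price_per_gb=0.05)
print(f"{estimate['hours']:.0f} hours, cost {estimate['egress_cost']:.0f}")
```

Under these assumptions the copy alone takes roughly 222 hours, a little over nine days, and that cost is paid again for every full refresh. That is the core reason copying snapshots around does not scale well.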
We all know that data is the new oil (or perhaps the new renewable energy source, to use a more up-to-date term) and therefore it needs to be guarded closely, just as an organisation would guard its code, artifacts or IP.
I like to use the example of cloning Facebook – if you cloned the app and functionality and gave everyone a chance to use your cloned version of the app and UI, it wouldn’t be successful because the value is in the data: the connections, your friends lists, past posts etc. The UI and application are just a way of wrapping up the data nicely and displaying it to the consumer. The data is the real product on Facebook (yes, you are the product – how much are you charging them to use your brand?). You get the point.
Right now there are not a lot of options out there to meet these requirements – I believe that in two years’ time this will be so built into the Big Data cloud services that it will be hard to imagine there was a time when we could not do it without major cost and effort.
Let’s summarise the requirements we need in a data sharing service:
- Easily share data across organizational boundaries with 3rd parties
- Easy way to govern sharing with centralized management
- Visibility into what data is shared and with whom
- Share both structured and unstructured data (text and multi-media)
- In-place sharing to access data at source
- PaaS – As little infrastructure/code as possible
- Support once-off snapshots and scheduled refreshes
- The ability to monetise your data in a marketplace
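To make a few of these requirements concrete, here is a minimal sketch of how a share and a central registry might be modelled. Every name in it (`ShareMode`, `DataShare`, `ShareRegistry`, the dataset and recipient strings) is hypothetical and invented for illustration; it is not the API of Azure Data Share or any other product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ShareMode(Enum):
    SNAPSHOT = "snapshot"   # data is copied to the recipient
    IN_PLACE = "in_place"   # recipient reads the data at source


@dataclass
class DataShare:
    dataset: str
    recipient: str
    mode: ShareMode
    terms_of_use: str
    # Hypothetical schedule label; None means a once-off snapshot only.
    refresh_schedule: Optional[str] = None


@dataclass
class ShareRegistry:
    """Central place to record shares, so we can see what is shared with whom."""
    shares: List[DataShare] = field(default_factory=list)

    def add(self, share: DataShare) -> None:
        self.shares.append(share)

    def recipients_of(self, dataset: str) -> List[str]:
        return [s.recipient for s in self.shares if s.dataset == dataset]

    def shared_with(self, recipient: str) -> List[str]:
        return [s.dataset for s in self.shares if s.recipient == recipient]


registry = ShareRegistry()
registry.add(DataShare("sales_2019", "partner-a", ShareMode.SNAPSHOT,
                       "Internal analytics only", refresh_schedule="daily"))
registry.add(DataShare("sales_2019", "partner-b", ShareMode.IN_PLACE,
                       "No onward sharing"))
print(registry.recipients_of("sales_2019"))
```

The point of the sketch is that the terms of use, the sharing mode and the refresh schedule live alongside the share itself, so governance and visibility come from one place rather than from ad hoc copies.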
In Part 2 we dive deep into the Azure Data Share service, look at its benefits and limitations, and summarize how it compares to other sharing solutions on the market.