Sharepoint Integration: How to Share and Ingest Data Automatically into a Data Platform

campaign creators e6n7uoEnYbA unsplash • Data and AI Analytics

Are you still sharing files with your peers on teams and maintaining them in your local system?

If yes, then perhaps you could use SharePoint to remove the hassle of organising files and managing documents efficiently.

SharePoint is a Microsoft Office web platform that allows users to share files and documents in a collaborative fashion. It is backed by Active Directory to securely allow sharing of information across multiple teams and departments.

For example, all the documents of a project such as design documents, project status reports, requirement documents and more can stay at one place in SharePoint which can be shared with different stakeholders as required.

While SharePoint is primarily used for document management, it is also capable of hosting internal websites, blogs, running polls and surveys and tracking project progress by integrating with another MS office application called Planner.

SharePoint being a Microsoft product integrates well with the Microsoft cloud platform – Azure. With SharePoint acting as a document repository, complex custom workflows can be made to suit business requirements.

Today we are going to cover the use of SharePoint to share and ingest data automatically into a Data Platform.

Architecture

There are multiple ways to integrate Sharepoint into a workflow. One of the most convenient ways is to use Logic Apps on Azure. Let’s consider a scenario where a file uploaded/modified on Sharepoint library should act as the trigger for a workflow in Azure.

What is Logic App?

Logic App is an Azure cloud-based serverless platform to design and orchestrate workflows. It abstracts some API operations in the form of connectors which when triggered perform a series of defined actions. The connectors can act as both actions and triggers. The Logic App Designer visualises the actions in a hierarchy to represent sequence and branching.

Logic App here can have a Sharepoint trigger pointed at a library. It can periodically check the library for new files created or existing files modified. The logic app can then do further validations using the metadata of the file such as:

  1. On file size so that empty files can be ignored
  2. On file name so that only a specific file name(s) can trigger the workflow and the rest can be ignored. A business could use the library folder for multiple purposes. Let’s say there’s a folder for each city in the library with different files such as weather, population, and air quality index and our workflow is only concerned with the weather. So using the filename from metadata, we can add steps to the logic app that would only proceed if filename is weather.

Afterwards, Logic App can call a resource like Azure Data Factory or Synapse for further orchestration which would load data into data lake and delta lake using Databricks. A Power BI dashboard can be created to generate reports on the data.

In the aforementioned architecture, Logic App would require a user account to connect to the SharePoint library. This account needs to be provided the right permissions in the SharePoint library. It is recommended to create a generic service account for this purpose and not tie authentication to a particular user’s account. Otherwise, should the user account be disabled for any reason the access to the SharePoint would be lost and workflow(s) would no longer function.

sharepoint architecture

The benefits of using Logic Apps on Azure:

  1. Low code solution. Logic App offers an easy to use designing interface deploying a drag-and-drop user interface. The code is generated in the background which is abstracted from developer.
  2. Very Cost-effective. Being serverless, Logic App has minimal cost as it is not running all the time. It only runs when the workflow is triggered.
  3. In-built Connectors. Azure provides connectors for hundreds of applications such as OneDrive, Dropbox, Outlook, Twitter and more.
  4. Ability to make a custom connector. Should a connector to a service not be present, Logic app allows user to make custom connector.
  5. In-built versioning. A history of different saved versions is automatically maintained by Logic App for user’s future reference. They are read-only and can be made active version in use by promoting them.

image3 1 • Data and AI Analytics

SharePoint Access and User Permissions

SharePoint can be accessed through a user account of that organisation (using your AD for security purposes) or a Service Principal (Client ID and Client Secret) that has access to the Sharepoint library.

The service principal can be used to generate an OAuth token to authenticate REST API operations. The service principal can be given to other users outside your organisation and can have a specific access restrictions and expiry date that can be managed and set by your organisation.

Implementation Use-Case

It is very common for companies to use excel files to analyse and share the data, and it is as common as having multiple files with the same of similar information across departments or having issues with versioning control of given files.

Departments receive and consolidate multiple excel files from different people, departments, or SaaS (Software as a Service) product exports in order to answer business questions and drive business decisions.

The consolidation of said files consists of an enormous number of human hours and if lucky cleverly designed macros to produce an output document that can then be reviewed and analysed, which would mean the process is at least not a human copying and pasting from one document to another.

This process is labour intensive, prone to human error, and very inflexible as one minor change in the input files can break the entire process, plus the inability to expand it for future use-cases or reporting metrics the business user might need.

Data-Driven has a Data Analytics Foundation Accelerator with SharePoint ingestion plug-in to address this problem.

Example architecture for this use case:

image2 • Data and AI Analytics

Data-Driven offers a 4-week engagement to automate your Excel or CSV files consolidation using Microsoft Azure cloud technologies, the files can be shared with you internally or come from external sources such as business partners or suppliers.

We focus on automating all the repetitive tasks while consolidating such files. The tasks can start from a simple copy-paste to a more business complex rule/logic.

As part of the PoC, Data-Driven will design, implement, and deliver an end-to-end workflow from the ingestion of the files, the rules engine with business logic and manual process to be automated to the output file either in a database, exportable CSV file, or a Power BI report to visualise the data.

The outcome is a flexible rules engine, where you can add or modify rules if new business cases arise or if the input data changes and a consolidated source of truth ready for analytics through a Power BI report. Providing the flexibility of easily extending the reporting capabilities with new dashboards.

Yash Tamakuwala

What do you think?

Subscribed! We'll let you know when we have new blogs and events...