Wednesday, November 23, 2016

The Power BI Integration Pipeline



A Power BI application is rather a solution of integration. It is a pipeline that starts from data of one or more OLTP sources, to a central storage for reporting and analysis, then to the PBI service, and finally to users with corresponding visualizations.

OLTP and OLAP

We should not use OLTP data directly for PBI. To minimize performance impact on OLTP, We need an OLAP-like (may not be cubes) database, or staging database, where data are centralized, cleansed, also transformed, ready for PBI.

SSIS

Between OLTP data sources and the staging database we need a converter + synchronizer, an ETL component. And to automate this ETL, we can use SQL Agent or other custom job that run on a defined schedule.

DirectQuery or Embeded

There are two options on how PBI uses the data. For a large dataset or frequently changing dataset, DirectQuery is preferred. In such a case, the staging database is ready to service PBI Service. However, through DirectQuery PBI can only access database in Azure Cloud. If the database is on premise, we need to set up an Enterprise Gateway.

Another option is that we push data into PBI service as embedded data. PBI has REST interface for this.

Distribution

The visualization artifacts generated by PBI are to be distributed to users. The direct way is to have the users have accounts with PBI, then PBI dashboards can be “shared” to them. A more natural way is to share through a native organization application, for example, the organization’s website. PBI allow “publish to the web”, and it provides a IFRAME html to be embedded in any web page. But this has no security. A more practical way is to publish with an access key, so only authorized users can access.

Examples




No comments: