Big Data project on Azure
- Data Dictionary
- Data Ingestion and Curation
Challenges
Build the enterprise data lake in the Azure cloud.
Only limited data sets are captured today, from 4 source systems into an on-site SQL data warehouse. Data capture needs to expand to 26 additional sources. Daily load of 14 GB.
Inconsistent metadata across systems and no enterprise data model.
Obtaining business metadata for each data point is a major challenge. Technical metadata is not associated with business metadata.
Solutions
A comprehensive data ingestion framework to ingest data from multiple sources.
Metadata collection for all these sources. Metadata tools access the different sources directly to collect metadata and store it in a repository.
Control-M is used for job scheduling and Jenkins for CI/CD.
Implemented data governance and security controls.
Automation and reuse for data ingestion and metadata collection.
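
The ingestion and metadata points above can be illustrated with a small sketch. This is not the project's actual framework. It is a minimal Python example, assuming each source can be read as rows of dictionaries and that the data dictionary is a simple mapping from column name to business definition. All names here (SourceConfig, ColumnMetadata, ingest, the sample sources "crm" and "billing") are hypothetical. It shows one reusable routine that collects technical metadata per source, attaches business metadata from the data dictionary, and stores the result in a repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ColumnMetadata:
    """Technical metadata for one column plus its business definition, if known."""
    name: str
    data_type: str
    business_term: Optional[str] = None  # filled in from the data dictionary


@dataclass
class SourceConfig:
    """Hypothetical source registration: a name and a function that yields rows."""
    name: str
    reader: Callable[[], Iterable[dict]]


def collect_technical_metadata(rows: Iterable[dict]) -> List[ColumnMetadata]:
    """Record each column name with the type of its first observed value."""
    columns: Dict[str, str] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, type(value).__name__)
    return [ColumnMetadata(name=k, data_type=v) for k, v in columns.items()]


def attach_business_metadata(
    columns: List[ColumnMetadata], dictionary: Dict[str, str]
) -> List[ColumnMetadata]:
    """Associate technical metadata with business metadata by column name."""
    for col in columns:
        col.business_term = dictionary.get(col.name)
    return columns


def ingest(
    sources: List[SourceConfig], dictionary: Dict[str, str]
) -> Dict[str, List[ColumnMetadata]]:
    """Run the same steps for every registered source; return the metadata repository."""
    repository: Dict[str, List[ColumnMetadata]] = {}
    for source in sources:
        rows = list(source.reader())
        columns = collect_technical_metadata(rows)
        repository[source.name] = attach_business_metadata(columns, dictionary)
    return repository


# Illustrative data only: two made-up sources and a made-up data dictionary.
sources = [
    SourceConfig("crm", lambda: [{"cust_id": 1, "cust_nm": "Ada"}]),
    SourceConfig("billing", lambda: [{"cust_id": 1, "amt": 9.5}]),
]
dictionary = {"cust_id": "Customer identifier", "amt": "Invoice amount"}

repo = ingest(sources, dictionary)

# Columns that still lack a business definition, as (source, column) pairs.
unmapped = [
    (name, col.name)
    for name, cols in repo.items()
    for col in cols
    if col.business_term is None
]
```

Adding a new source in this sketch means adding one SourceConfig entry, which is the kind of reuse the framework aims for. The unmapped list gives a simple way to track data points that still have no business metadata.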