qubole.com

Cloud Data Lakes – Best Practices

This is an abridged version of the article that appears on NewStack. BI tools have been the go-to for data analysts who help businesses track top-line, bottom-line and customer experience metrics. BI tools analyze small sets of relational data (a few terabytes) in a data warehouse, and these analyses require only small data scans (a few gigabytes) to execute. But businesses are now looking beyond BI to interactive, streaming and clickstream analytics,...

github

Blog post on ETL pipelines with Airflow

mdh266/AirflowETL

Users starred: 15. Users forked: 4. Users watching: 15. Updated at: 2020-02-10 21:35:32

An Example ETL Pipeline With Airflow

In this blog post I want to go over the data engineering operations called Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. You can see the source code for this project here. Extracting data...