Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a game-changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.
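As a hypothetical illustration of what CI for a data pipeline can look like (independent of AnalyticsCreator, whose internals aren't shown here), a CI job can run unit tests against pipeline transforms on every commit; the transform_orders function and its column names are invented for this sketch:

```python
# A minimal sketch of a pipeline unit test a CI job could run on each commit.
# transform_orders and the column names are hypothetical examples.
import pandas as pd

def transform_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize a raw orders extract into a consistent format."""
    out = raw.dropna(subset=["order_id"]).copy()
    out["amount"] = out["amount"].astype(float)
    return out

def test_transform_orders_is_consistent():
    raw = pd.DataFrame({"order_id": [1, None, 3], "amount": ["10.5", "2", "7"]})
    result = transform_orders(raw)
    assert result["order_id"].notna().all()      # no null keys survive
    assert result["amount"].dtype == "float64"   # consistent numeric type
```

Running such tests automatically (e.g., via a CI runner) is what makes the "efficient and reliable" claim enforceable rather than aspirational.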
Introduction: Azure Data Factory (ADF) is a cloud-based data ingestion and ETL (Extract, Transform, Load) tool. The data-driven workflow in ADF orchestrates and automates data movement and data transformation.
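A minimal sketch of triggering one of those data-driven workflows from Python, assuming the azure-identity and azure-mgmt-datafactory packages and an existing pipeline; the subscription, resource group, factory, and pipeline names are all placeholders:

```python
# Hedged sketch: kick off an existing ADF pipeline run and print its run ID.
# All resource identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="my-resource-group",
    factory_name="my-data-factory",
    pipeline_name="CopySalesData",  # hypothetical pipeline name
)
print(f"Started pipeline run: {run.run_id}")
```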
Data Science Dojo is offering Airbyte for FREE on Azure Marketplace, packaged with a pre-configured web environment so you can start the ELT process quickly rather than spending time setting up the environment. If you can’t import all your data, you may only have a partial picture of your business.
With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. OneLake, being built on Azure Data Lake Storage (ADLS), supports various data formats, including Delta, Parquet, CSV, and JSON.
Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.
Many open-source ETL tools provide a graphical interface for designing and executing data pipelines. It can be used to manipulate, store, and analyze data of any structure, and it generates Java code for the data pipelines instead of running pipeline configurations through an ETL engine.
As the world keeps progressing toward data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers on Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.
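A hedged sketch of what such a Snowpark pipeline step can look like in Python, assuming the snowflake-snowpark-python package; the connection parameters, table, and column names are placeholders:

```python
# Minimal Snowpark sketch: transformations are written in Python but pushed
# down to Snowflake for execution. All names below are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

daily_revenue = (
    session.table("RAW_ORDERS")
    .filter(col("STATUS") == "COMPLETE")
    .group_by(col("ORDER_DATE"))
    .agg(sum_(col("AMOUNT")).alias("REVENUE"))
)
daily_revenue.write.save_as_table("DAILY_REVENUE", mode="overwrite")
```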
Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure. What is Azure?
In this step-by-step guide, we will walk you through setting up a data ingestion pipeline using Azure Data Factory (ADF), Google BigQuery, and the Snowflake Data Cloud. By the end of this tutorial, you’ll have a seamless pipeline that fetches and syncs your GA4 raw events data to Snowflake efficiently.
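Once the pipeline is running, a quick sanity check from Python can confirm the sync is landing data; this sketch uses the snowflake-connector-python package, and the credentials and GA4_RAW_EVENTS table name are placeholders, not taken from the tutorial itself:

```python
# Hedged sketch: count synced rows per day to confirm the pipeline is working.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
try:
    cur = conn.cursor()
    cur.execute(
        "SELECT event_date, COUNT(*) FROM GA4_RAW_EVENTS "
        "GROUP BY event_date ORDER BY event_date DESC LIMIT 7"
    )
    for event_date, row_count in cur.fetchall():
        print(event_date, row_count)
finally:
    conn.close()
```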
Cloud Computing, APIs, and Data Engineering: NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms: Spark is still the leader for data pipelines, but other platforms are gaining ground. Knowing some SQL is also essential.
One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. In a perfect world, Microsoft would have clients push even more storage and compute to its Azure Synapse platform.
Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python. Databases and SQL: Managing and querying relational databases using SQL, as well as working with NoSQL databases like MongoDB.
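A small illustration of the Pandas/NumPy cleaning-and-analysis workflow mentioned above; the dataset is invented for the example:

```python
# Clean a tiny invented dataset, then summarize it.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Boston", "boston", None, "Austin"],
    "temp_f": [68.0, np.nan, 75.0, 101.0],
})

# Cleaning: normalize text, drop missing keys, fill gaps with the median.
df["city"] = df["city"].str.title()
df = df.dropna(subset=["city"])
df["temp_f"] = df["temp_f"].fillna(df["temp_f"].median())

# Analysis: convert units and summarize per city.
df["temp_c"] = (df["temp_f"] - 32) * 5 / 9
print(df.groupby("city")["temp_c"].mean())
```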
Its PostgreSQL foundation ensures compatibility with most SQL clients. While it shares similarities with PostgreSQL, there are key differences that must be considered during application development. Strengths: High performance with SQL support, easy integration with other AWS services, and strong security features.
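That client compatibility means a standard PostgreSQL driver can talk to the service directly; here is a minimal sketch using psycopg2, with the endpoint, database, and credentials as placeholders:

```python
# Hedged sketch: any PostgreSQL-compatible endpoint can be queried with a
# standard client like psycopg2. Connection details are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="<cluster-endpoint>",
    port=5439,
    dbname="analytics",
    user="<user>",
    password="<password>",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT table_name FROM information_schema.tables LIMIT 5;")
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```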
Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, these cloud certifications were also the most popular and appeared to have the largest effect on salaries. Many respondents acquired certifications. What about Kafka?
Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Big Data: As datasets become larger and more complex, knowing how to work with them will be key.
SQL Server – The SQL Server connector, another widely used database-type connector, provides similar functionality but is tailored for Microsoft’s SQL Server. The phData team achieved a major milestone by successfully setting up a secure end-to-end data pipeline for a substantial healthcare enterprise.
Key Skills for Data Science: A data scientist typically needs a blend of skills: Mathematics and Statistics: To understand the theoretical underpinnings of models. Programming: Often in languages like Python or R, using libraries for data manipulation, analysis, and machine learning.
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. These systems are crucial in ensuring data is readily available for analysis and reporting.
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.
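A small Matplotlib example of the kind of visualization work listed above; the data points are invented:

```python
# Plot invented monthly pipeline-throughput figures as a bar chart.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
rows_loaded = [1.2e6, 1.4e6, 1.1e6, 1.6e6]

fig, ax = plt.subplots()
ax.bar(months, rows_loaded)
ax.set_xlabel("Month")
ax.set_ylabel("Rows loaded")
ax.set_title("Pipeline throughput by month")
plt.show()
```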
The software you might use OAuth with includes Tableau, Power BI, and Sigma Computing. If so, you will need an OAuth provider like Okta, Microsoft Azure AD, Ping Identity PingFederate, or a custom OAuth 2.0 authorization server. When to use SCIM vs. phData’s Provision Tool: SCIM manages users and groups with Azure Active Directory or Okta.
We had bigger sessions on getting started with machine learning or SQL, up to advanced topics in NLP, and of course, plenty related to large language models and generative AI. You can see our photos from the event here, and be sure to follow our YouTube for virtual highlights from the conference as well.
Snowpark, offered by the Snowflake AI Data Cloud, consists of libraries and runtimes that enable secure deployment and processing of non-SQL code, such as Python, Java, and Scala. Developers can seamlessly build data pipelines, ML models, and data applications with User-Defined Functions and Stored Procedures.
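A hedged sketch of a Snowpark User-Defined Function in Python, assuming an active Snowpark `session` (created as in the earlier connection sketch); the masking logic and table/column names are invented:

```python
# Hypothetical UDF: mask the local part of an email address inside Snowflake.
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import StringType

@udf(return_type=StringType(), input_types=[StringType()], session=session)
def mask_email(email: str) -> str:
    # Keep the domain, hide the local part.
    local, _, domain = (email or "").partition("@")
    return f"***@{domain}" if domain else "***"

masked = session.table("CUSTOMERS").select(
    mask_email(col("EMAIL")).alias("EMAIL_MASKED")
)
masked.show()
```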
Microsoft Azure ML Platform: The Azure Machine Learning platform provides a collaborative workspace that supports various programming languages and frameworks. It could help you detect and prevent data pipeline failures, data drift, and anomalies. Flyte: Flyte is a platform for orchestrating ML pipelines at scale.
IBM Infosphere DataStage: IBM Infosphere DataStage is an enterprise-level ETL tool that enables users to design, develop, and run data pipelines. Key Features: Graphical Framework: Allows users to design data pipelines with ease using a graphical user interface. Read More: Advanced SQL Tips and Tricks for Data Analysts.
It does not support the ‘dvc repro’ command to reproduce its data pipeline. DVC: Released in 2017, Data Version Control (DVC for short) is an open-source tool created by Iterative. Dolt: Created in 2019, Dolt is an open-source tool for managing SQL databases that uses version control similar to Git.
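A minimal sketch of reading a DVC-tracked file from Python via the dvc package's Python API; the repository URL, file path, and revision tag are placeholders:

```python
# Hedged sketch: fetch the contents of a versioned dataset at a Git revision.
import dvc.api

data = dvc.api.read(
    "data/train.csv",                                # placeholder path
    repo="https://github.com/example/my-dvc-repo",   # placeholder repo
    rev="v1.0",                                      # placeholder tag
)
print(data[:200])
```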
Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently.
This individual is responsible for building and maintaining the infrastructure that stores and processes data; the data can be diverse, but most commonly it will be a mix of structured and unstructured data. They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable.
If using a network policy with Snowflake, be sure to add Fivetran’s IP address list, which will ensure Fivetran can connect. Azure Data Factory (ADF): Azure Data Factory is a fully managed, serverless data integration service built by Microsoft. Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.).
The shared-nothing architecture ensures that users don’t have to worry about distributing data across multiple cluster nodes. Snowflake hides user data objects and makes them accessible only through SQL queries run via the compute layer. This includes tasks such as data cleansing, enrichment, and aggregation.
Integration: Can it connect with existing systems like AWS, Azure, or Google Cloud? It supports complex data transformations and offers advanced features like data quality management and metadata management. PowerCenter is particularly favored by large organizations with extensive data integration needs.
Generative AI can be used to automate the data modeling process by generating entity-relationship diagrams or other types of data models, and it can assist in the UI design process by generating wireframes or high-fidelity mockups. There is a VSCode extension that enables its integration into traditional development pipelines.
Examples of data version control tools in ML include Dolt, LakeFS, Delta Lake, and Pachyderm, which variously offer Git-like versioning, database tooling, data lake support, data pipelines, experiment tracking, and integrations with cloud platforms and ML tools. DVC (Data Version Control) is a version control system for data and machine learning teams.
The platform enables quick, flexible, and convenient options for storing, processing, and analyzing data. The solution was built on top of Amazon Web Services and is now available on Google Cloud and Microsoft Azure. Data warehousing is a vital constituent of any business intelligence operation. What does Snowflake do?
With Snowflake’s acquisition of Streamlit in 2022, Streamlit applications can now be hosted within your Snowflake environment, eliminating the need for extensive knowledge of Docker, Kubernetes, cloud platforms like AWS, GCP, or Azure, authentication and authorization patterns, and more.
Kafka helps simplify the communication between customers and businesses, using its data pipeline to accurately record events and keep records of orders and cancellations, alerting all relevant parties in real time. Developing with Kafka has many advantages over other platforms; here are a few of its most popular benefits.
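A hedged sketch of the event-recording pattern described above, using the kafka-python package; the broker address, topic name, and event fields are invented:

```python
# Publish invented order events so downstream consumers see them in near
# real time. Broker, topic, and payload shape are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("orders", {"order_id": 42, "status": "created"})
producer.send("orders", {"order_id": 42, "status": "cancelled"})
producer.flush()  # block until the events are actually delivered
```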
Automation: Automation plays a pivotal role in streamlining ETL processes, reducing the need for manual intervention, and ensuring consistent data availability. By automating key tasks, organisations can enhance efficiency and accuracy, ultimately improving the quality of their data pipelines.
Here’s the structured equivalent of this same data in tabular form. With structured data, you can use query languages like SQL to extract and interpret information. In contrast, such traditional query languages struggle to interpret unstructured data. This text has a lot of information, but it is not structured.
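A small illustration of that point, querying structured (tabular) data with SQL via Python's built-in sqlite3 module; the table and rows are invented:

```python
# Structured data makes precise questions easy to express in SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", 36), ("Grace", 45), ("Alan", 41)],
)

for name, age in conn.execute("SELECT name, age FROM people WHERE age > 40"):
    print(name, age)
conn.close()
```

No comparable one-line query exists for a free-text paragraph, which is exactly why unstructured data needs different tooling.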
Microsoft Azure ML: Provided by Microsoft, Azure Machine Learning (ML) is a cloud-based machine learning platform that enables data scientists and developers to build, train, and deploy machine learning models at scale. Costs depend on the underlying resources (e.g., compute instances, storage) used to run Airflow and store workflow data.
Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Employers aren’t just looking for people who can program.
This two-part series will explore how data discovery, fragmented data governance, ongoing data drift, and the need for ML explainability can all be overcome with a data catalog for accurate data and metadata record keeping. The Cloud Data Migration Challenge. Data pipeline orchestration.
Support for Numerous Data Sources: Fivetran supports over 200 data sources, including popular databases, applications, and cloud platforms like Salesforce, Google Analytics, SQL Server, Snowflake, and many more. Additionally, unsupported data sources can be integrated using Fivetran’s cloud function connectors.
I have worked with customers where R and SQL were the first-class languages of their data science community. You don’t need a bigger boat: The repository curated by Jacopo Tagliabue shows how several (mostly open-source) tools can be effectively combined together to run data pipelines at scale with very small teams.
However, if the tool offers an option to write custom programming code to implement features that cannot be achieved using the drag-and-drop components, it broadens the horizon of what we can do with our data pipelines. JV_STAGING_TBL} Here is what the outline of the pipeline looks like.
Data pipeline orchestration tools are designed to automate and manage the execution of data pipelines. These tools help streamline and schedule data movement and processing tasks, ensuring efficient and reliable data flow. What are Orchestration Tools?
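A minimal sketch of what such orchestration looks like in Apache Airflow, assuming the apache-airflow 2.x package; the DAG name and the extract/load functions are placeholders:

```python
# Hedged sketch: a two-task DAG that runs extract before load, once a day.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling from source")  # placeholder for real extraction logic

def load():
    print("writing to warehouse")  # placeholder for real load logic

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # `schedule` requires Airflow 2.4+
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # dependency: extract runs first
```

The `>>` operator is the scheduling contract: the orchestrator, not the code author, decides when and where each task runs and what happens on failure.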