Azure, Data Engineering and Data Pipeline

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Most Frequently Asked Azure Data Factory Interview Questions

Analytics Vidhya

FEBRUARY 20, 2023

Introduction: Azure Data Factory (ADF) is a cloud-based data ingestion and ETL (Extract, Transform, Load) tool. The data-driven workflow in ADF orchestrates and automates data movement and data transformation.

Azure

Azure ETL Analytics

Airbyte: The ultimate workhorse for all your ELT pipelines

Data Science Dojo

JANUARY 27, 2023

Data Science Dojo is offering Airbyte for FREE on Azure Marketplace packaged with a pre-configured web environment enabling you to quickly start the ELT process rather than spending time setting up the environment. If you can’t import all your data, you may only have a partial picture of your business.

Azure

Azure Data Science Data Pipeline Data Engineering

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Data Science Dojo

SEPTEMBER 11, 2024

With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. OneLake, being built on Azure Data Lake Storage (ADLS), supports various data formats, including Delta, Parquet, CSV, and JSON. On the home page, select Synapse Data Engineering.

Power BI

Power BI Data Pipeline Data Warehouse Data Engineering

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.

Data Engineer

Data Engineer Data Engineering

Memphis: A game changer in the world of traditional messaging systems

Data Science Dojo

MARCH 9, 2023

Data Science Dojo is offering Memphis broker for FREE on Azure Marketplace preconfigured with Memphis, a platform that provides a P2P architecture, scalability, storage tiering, fault tolerance, and security, enabling real-time processing for modern applications that handle large volumes of data.

Apache Kafka

Apache Kafka Azure Data Science Data Pipeline

Revolutionize data management with Meltano CLI – The ultimate open source solution for flexible and scalable ELT

Data Science Dojo

MARCH 15, 2023

Data Science Dojo is offering Meltano CLI for FREE on Azure Marketplace preconfigured with Meltano, a platform that provides flexibility and scalability. It is designed to assist data engineers in transforming, converting, and validating data in a simplified manner while ensuring accuracy and reliability.

Azure

Azure Data Science Data Engineering

Boost your MLOps efficiency with these 6 must-have tools and platforms

Data Science Dojo

FEBRUARY 20, 2023

Google Cloud Platform is a great option for businesses that need high-performance computing, such as data science, AI, machine learning, and financial services. Microsoft Azure Machine Learning: Microsoft Azure Machine Learning is a set of tools for creating, managing, and analyzing models.

Machine Learning

Machine Learning AWS Azure

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?

Data Engineer

Data Engineer Data Engineering

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As the world moves further toward data-driven decisions, organizations need quality data produced by efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineering

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

This article explores the importance of ETL pipelines in machine learning, walks through a hands-on example of building ETL pipelines with a popular tool, and suggests the best ways for data engineers to enhance and sustain their pipelines. What is an ETL data pipeline in ML?

ETL

ETL Data Pipeline ML

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

Data engineering is a rapidly growing field, and there is a high demand for skilled data engineers. If you are a data scientist, you may be wondering if you can transition into data engineering. In this blog post, we will discuss how you can become a data engineer if you are a data scientist.

Data Engineer

Data Engineer Data Engineering

Your Complete Roadmap to Become an Azure Data Scientist

Pickl AI

SEPTEMBER 5, 2024

Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure. What is Azure?

Azure

Azure Data Scientist Data Science Machine Learning

The Data Dilemma: Exploring the Key Differences Between Data Science and Data Engineering

Pickl AI

JULY 25, 2023

Unfolding the differences between a data engineer, a data scientist, and a data analyst. Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure.

Data Engineer

Data Engineer Data Engineering

11 Open-Source Data Engineering Tools Every Pro Should Use

ODSC - Open Data Science

FEBRUARY 6, 2024

Data engineering has become an integral part of the modern tech landscape, driving advancements and efficiencies across industries. So let’s explore the world of open-source tools for data engineers, shedding light on how these resources are shaping the future of data handling, processing, and visualization.

Data Engineer

Data Engineer Data Engineering

Top NLP Skills, Frameworks, Platforms, and Languages for 2023

ODSC - Open Data Science

FEBRUARY 17, 2023

Cloud Computing, APIs, and Data Engineering: NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms: Spark is still the leader for data pipelines, but other platforms are gaining ground.

Deep Learning

Deep Learning Data Science Natural Language Processing

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Scala is worth knowing if you’re looking to branch into data engineering and work more with big data, as it’s helpful for scaling applications. Cloud Services: The only two to make multiple lists were Amazon Web Services (AWS) and Microsoft Azure.

Data Science

Data Science Data Scientist Computer Science

How Cloud Data Platforms improve Shopfloor Management

Data Science Blog

FEBRUARY 4, 2023

If the data sources are additionally expanded to include the machines of production and logistics, much more in-depth analyses for error detection and prevention as well as for optimizing the factory in its dynamic environment become possible.

Cloud Data

Cloud Data Data Science Business Intelligence

How to Optimize Power BI and Snowflake for Advanced Analytics

phData

MAY 25, 2023

One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. In a perfect world, Microsoft would have clients push even more storage and compute to its Azure Synapse platform.

Power BI

Power BI Analytics Azure

2021 Data/AI Salary Survey

O'Reilly Media

SEPTEMBER 15, 2021

Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries. Many respondents acquired certifications. What about Kafka?

AI

AI Azure AWS

A Guide to Choose the Best Data Science Bootcamp

Data Science Dojo

JULY 3, 2024

Data Engineering: Building and maintaining data pipelines, ETL (Extract, Transform, Load) processes, and data warehousing. Cloud Computing: Utilizing cloud services for data storage and processing, often covering platforms such as AWS, Azure, and Google Cloud.

Data Science

Data Science Machine Learning Data Visualization

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. For example, neptune.ai

Machine Learning

Machine Learning ML

On-Prem vs. The Cloud: Key Considerations

phData

FEBRUARY 21, 2025

The Cloud represents an iteration beyond the on-prem data warehouse, where computing resources are delivered over the Internet and are managed by a third-party provider. Examples include: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

Data Warehouse

Data Warehouse Cloud Data ETL Cloud Computing

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Consequently, AIOps is designed to harness data and insight generation capabilities to help organizations manage increasingly complex IT stacks. MLOps involves a series of steps that help ensure the seamless deployability, reproducibility, scalability and observability of ML models.

Big Data

Big Data ML

Top ETL Tools: Unveiling the Best Solutions for Data Integration

Pickl AI

JUNE 7, 2024

Integration: Airflow integrates seamlessly with other data engineering and Data Science tools like Apache Spark and Pandas. IBM InfoSphere DataStage: IBM InfoSphere DataStage is an enterprise-level ETL tool that enables users to design, develop, and run data pipelines.

ETL

ETL Data Quality Data Pipeline Data Warehouse

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

phData

FEBRUARY 14, 2023

If using a network policy with Snowflake, be sure to add Fivetran’s IP address list, which will ensure Azure Data Factory (ADF) Azure Data Factory is a fully managed, serverless data integration service built by Microsoft. Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.).

Data Warehouse

Data Warehouse Azure AWS Database

Getting Started With Snowflake: Best Practices For Launching

phData

DECEMBER 4, 2023

The software you might use OAuth with includes Tableau, Power BI, and Sigma Computing. If so, you will need an OAuth provider like Okta, Microsoft Azure AD, Ping Identity PingFederate, or a custom OAuth 2.0 authorization server. When to use SCIM vs phData's Provision Tool: SCIM manages users and groups with Azure Active Directory or Okta.

Clustering

Clustering Database SQL Data Pipeline

Getting Started With Matillion Data Productivity Cloud

phData

NOVEMBER 28, 2023

In July 2023, Matillion launched its fully SaaS platform, Data Productivity Cloud, aiming to create a future-ready, everyone-ready, and AI-ready environment that companies can easily adopt to start automating their data pipelines with code, low code, or no code at all.

Data Warehouse

Data Warehouse Data Pipeline ETL Azure

How to Setup a Project in Snowpark Using a Python IDE

phData

JULY 2, 2024

Developers can seamlessly build data pipelines, ML models, and data applications with User-Defined Functions and Stored Procedures. Move inside sfguide-data-engineering-with-snowpark-python (cd sfguide-data-engineering-with-snowpark-python). What Are Snowpark’s Differentiators?

Python

Python SQL Data Pipeline ML

Nurturing a Strong Data Science Foundation for Beginners

Mlearning.ai

JULY 11, 2023

This includes important stages such as feature engineering, model development, data pipeline construction, and data deployment. For example, when it comes to deploying projects on cloud platforms, different companies may utilize different providers like AWS, GCP, or Azure.

Data Science

Data Science Exploratory Data Analysis Azure Power BI

Training Models on Streaming Data [Practical Guide]

The MLOps Blog

FEBRUARY 5, 2023

This pipeline facilitates the smooth, automated flow of information, preventing many problems that enterprises face, such as data corruption, conflict, and duplication of data entries. A streaming data pipeline is an enhanced version that can handle millions of events in real time at scale.

Machine Learning

Machine Learning Data Pipeline Apache Kafka

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

However, there are some key differences that we need to consider. Size and complexity of the data: In machine learning, we often work with much larger data. Basically, every machine learning project needs data. Given the range of tools and data types, a separate data versioning logic will be necessary.

ML

ML Data Lakes Machine Learning

When To Use Internal vs. External Stages in Snowflake

phData

AUGUST 4, 2023

The external stage area includes Microsoft Azure Blob storage, Amazon AWS S3, and Google Cloud Storage. Amazon S3 for AWS, Azure Blob Storage for Azure, or Google Cloud Storage for GCP) to store the actual data files in micro-partitions. The data can then be processed using Snowflake’s SQL capabilities.

Database

Database Azure SQL AWS

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Pickl AI

OCTOBER 17, 2024

Automation: Automation plays a pivotal role in streamlining ETL processes, reducing the need for manual intervention, and ensuring consistent data availability. By automating key tasks, organisations can enhance efficiency and accuracy, ultimately improving the quality of their data pipelines.

ETL

ETL Data Warehouse Data Quality Data Governance

Best Practices When Developing Matillion Jobs

phData

SEPTEMBER 2, 2024

Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently.

ETL

ETL Data Warehouse SQL Database

How to Manage Unstructured Data in AI and Machine Learning Projects

DagsHub

OCTOBER 23, 2024

With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.

Machine Learning

Machine Learning Data Lakes AI

How to Build an End-to-End Energy Price Forecasting Solution with Snowflake

phData

JANUARY 31, 2024

With Snowflake’s acquisition of Streamlit in 2022, Streamlit applications can now be hosted within your Snowflake environment, eliminating the need for extensive knowledge of Docker, Kubernetes, cloud platforms like AWS, GCP, or Azure, authentication and authorization patterns, etc.

Machine Learning

Machine Learning Python Data Scientist

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Mlearning.ai

FEBRUARY 16, 2023

The platform enables quick, flexible, and convenient options for storing, processing, and analyzing data. The solution was built on top of Amazon Web Services and is now available on Google Cloud and Microsoft Azure. Simplify and Win: Experienced data engineers value simplicity. What does Snowflake do?

Data Warehouse

Data Warehouse Business Intelligence Database

What Does the Modern Data Scientist Look Like? Insights from 30,000 Job Descriptions

ODSC - Open Data Science

JANUARY 7, 2025

Scala is worth knowing if you’re looking to branch into data engineering and work more with big data, as it’s helpful for scaling applications. Knowing all three frameworks covers the most ground for aspiring data science professionals, so you cover plenty of ground knowing this group.

Data Scientist

Data Scientist Data Science Machine Learning

Gen AI 101: Technology Choices (Part 1)

phData

JULY 5, 2024

To provide an example, traditional structured data such as a user’s demographic information can be provided to an AI application to create a more personable experience. Our data engineering blog in this series explores the concept of data engineering and data stores for Gen AI applications in more detail.

AI

AI Database AWS

Managing Dataset Versions in Long-Term ML Projects

The MLOps Blog

MARCH 20, 2023

However, in scenarios where dataset versioning solutions are leveraged, there can still be various challenges experienced by ML/AI/Data teams. Data aggregation: Data sources could increase as more data points are required to train ML models. Existing data pipelines will have to be modified to accommodate new data sources.

ML

ML Machine Learning

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

Both persistent staging and data lakes involve storing large amounts of raw data. But persistent staging is typically more structured and integrated into your overall customer data pipeline. It’s not just a dumping ground for data, but a crucial step in your customer data processing workflow.

Data Modeling

Data Modeling Data Models Apache Kafka Data Lakes

What Orchestration Tools Help Data Engineers in Snowflake

phData

AUGUST 17, 2023

In the rapidly evolving landscape of data engineering, Snowflake Data Cloud has emerged as a leading cloud-based data warehousing solution, providing powerful capabilities for storing, processing, and analyzing vast amounts of data. What are Orchestration Tools?

Data Engineer

Data Engineer Data Engineering

Top 10 Python Scripts for use in Matillion for Snowflake

phData

OCTOBER 28, 2024

Modern low-code/no-code ETL tools allow data engineers and analysts to build pipelines seamlessly using a drag-and-drop and configure approach with minimal coding. Matillion ETL for Snowflake is an ELT/ETL tool that allows for the ingestion, transformation, and building of analytics for data in the Snowflake AI Data Cloud.

Python

Python ETL AWS Database
