Data Science Dojo is offering Airbyte for FREE on Azure Marketplace, packaged with a pre-configured web environment so you can start the ELT process quickly rather than spending time setting up the environment. Free to use. Conclusion: There are a ton of small services that aren't supported on traditional data pipeline platforms.
These experiences help professionals ingest data from different sources into a unified environment, pipeline the ingestion, transformation, and processing of that data, develop predictive models, and analyze the data through visualizations in interactive BI reports.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
Data engineering is a hot topic in the AI industry right now. And as data's complexity and volume grow, its importance across industries will only become more noticeable. But what exactly do data engineers do? So let's do a quick overview of the data engineer's job, and maybe you'll find a new interest.
We couldn't be more excited to announce two events that will be co-located with ODSC East in Boston this April: the Data Engineering Summit and the Ai X Innovation Summit. These two co-located events represent an opportunity to dive even deeper into the topics and trends shaping these disciplines. Register for free today!
Data engineering has become an integral part of the modern tech landscape, driving advancements and efficiencies across industries. So let's explore the world of open-source tools for data engineers, shedding light on how these resources are shaping the future of data handling, processing, and visualization.
Data engineering is a rapidly growing field, and there is a high demand for skilled data engineers. If you are a data scientist, you may be wondering if you can transition into data engineering. In this blog post, we will discuss how you can become a data engineer if you are a data scientist.
Automate and streamline our ML inference pipeline with SageMaker and Airflow: Building an inference data pipeline on large datasets is a challenge many companies face. Check Tweets Batch Inference Job Status: Create an SQS listener that reads a message from the queue when the event rule publishes it.
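The listener step above can be sketched with the standard AWS SDK pattern. To be clear, the message format, field names, and helper names below are assumptions for illustration, not the article's actual implementation:

```python
import json

def parse_job_status(body: str) -> str:
    """Extract the batch job status from a (hypothetical) EventBridge message body."""
    event = json.loads(body)
    return event.get("detail", {}).get("TransformJobStatus", "Unknown")

def poll_queue_once(sqs_client, queue_url: str):
    """Long-poll the queue once; return (status, receipt_handle) pairs.

    Pass a boto3 SQS client; long polling (WaitTimeSeconds) avoids busy-waiting.
    """
    resp = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    return [
        (parse_job_status(msg["Body"]), msg["ReceiptHandle"])
        for msg in resp.get("Messages", [])
    ]

# Pure-function demo (no AWS call is made here):
sample = json.dumps({"detail": {"TransformJobStatus": "Completed"}})
print(parse_job_status(sample))  # Completed
```

In a real Airflow DAG, `poll_queue_once` would run inside a sensor or task and delete each message after its status is recorded.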
It's great for creating modern queue-based apps with large amounts of streamed data and modern protocols, and it reduces costs and dev time for data engineers. Memphis has a simple UI, CLI, and SDKs, and offers features like automatic message retransmitting, storage tiering, and data-level observability.
We couldn't be more excited to announce the first sessions for our second annual Data Engineering Summit, co-located with ODSC East this April. Join us for 2 days of talks and panels from leading experts and data engineering pioneers. Is Gen AI a Data Engineering or Software Engineering Problem?
Additionally, imagine being a practitioner, such as a data scientist, data engineer, or machine learning engineer, who will have the daunting task of learning how to use a multitude of different tools. A feature platform should automatically process the data pipelines to calculate that feature.
Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines. Interested in attending an ODSC event?
If the question was "What's the schedule for AWS events in December?", AWS usually announces the dates for their upcoming re:Invent event around 6-9 months in advance. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
Data Engineering: Building and maintaining data pipelines, ETL (Extract, Transform, Load) processes, and data warehousing. Career Support: Some bootcamps include job placement services like resume assistance, mock interviews, networking events, and partnerships with employers to aid in job placement.
Apache Kafka and Apache Flink working together Anyone who is familiar with the stream processing ecosystem is familiar with Apache Kafka: the de-facto enterprise standard for open-source event streaming. Apache Kafka streams get data to where it needs to go, but these capabilities are not maximized when Apache Kafka is deployed in isolation.
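As a toy illustration of what a stream processor like Flink adds on top of Kafka's transport, here is a plain-Python sketch (no Kafka or Flink dependency) of a stateful tumbling-window count, the kind of aggregation Flink applies to events flowing through Kafka topics. The event names and 60-second window are arbitrary choices for the example:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per key per tumbling window.

    events: iterable of (timestamp_seconds, key) pairs, as if consumed
    from a Kafka topic.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each event to the start of its window
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

stream = [(3, "clicks"), (45, "clicks"), (61, "clicks"), (62, "views")]
print(tumbling_window_counts(stream))
# {(0, 'clicks'): 2, (60, 'clicks'): 1, (60, 'views'): 1}
```

Real Flink jobs add fault-tolerant state, event-time watermarks, and exactly-once semantics on top of this basic idea.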
This May, we're heading to Boston for ODSC East 2025, where data scientists, AI engineers, and industry leaders will gather to explore the latest advancements in AI, machine learning, and data engineering. Let's dive into the schedule and key events that will shape this year's conference.
Consequently, AIOps is designed to harness data and insight generation capabilities to help organizations manage increasingly complex IT stacks. Data characteristics and preprocessing AIOps tools handle a range of data sources and types, including system logs, performance metrics, network data and application events.
Data scientists and data engineers want full control over every aspect of their machine learning solutions and want coding interfaces so that they can use their favorite libraries and languages. At the same time, business and data analysts want to access intuitive, point-and-click tools that use automated best practices.
In the later part of this article, we will discuss its importance and see how we can use machine learning for streaming data analysis with the help of a hands-on example. What is streaming data? This will also help us observe the importance of streaming data. It can be used to collect, store, and process streaming data in real time.
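As a minimal, dependency-free sketch of streaming analysis, Welford's online algorithm maintains running statistics over an unbounded stream, so each new point can be checked without storing history. The 3-standard-deviation threshold and the sample values are illustrative choices, not from the article:

```python
import math

class StreamingStats:
    """Online mean/variance (Welford's algorithm) for anomaly flagging."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Ingest one value; return True if it looks anomalous so far."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) > 3 * std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

stats = StreamingStats()
flags = [stats.update(v) for v in [10, 11, 9, 10, 11, 10, 50]]
print(flags)  # only the final far-off value is flagged
```

The same pattern (constant memory, one pass) is what makes an approach viable on a live stream where replaying history is impossible.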
Alignment to other tools in the organization's tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. For example, neptune.ai
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl's platform is designed for data engineers, while Dagster Labs' platform is designed for data scientists. Interested in attending an ODSC event?
Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features. Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs.
By analyzing datasets, data scientists can better understand their potential use in an algorithm or machine learning model. The data science lifecycle Data science is iterative, meaning data scientists form hypotheses and experiment to see if a desired outcome can be achieved using available data.
Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. Making data engineering more systematic through principles and tools will be key to making AI algorithms work.
Find out how to weave data reliability and quality checks into the execution of your data pipelines and more. More Speakers and Sessions Announced for the 2024 Data Engineering Summit: Ranging from experimentation platforms to enhanced ETL models and more, here are some more sessions coming to the 2024 Data Engineering Summit.
In this blog, we will highlight some of the most important upcoming features and updates for those who could not attend the events, specifically around AI and developer tools. It allows data engineers familiar with Python and Pandas to run their Pandas code in a scalable and distributed manner. schemas["my_schema"].tables.create(my_table)
If you’re not familiar with DGIQ, it’s the world’s most comprehensive event dedicated to, you guessed it, data governance and information quality. This year’s DGIQ West will host tutorials, workshops, seminars, general conference sessions, and case studies for global data leaders.
In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you will be able to power ML models with your unstructured data stored in Amazon DocumentDB.
Tools such as those mentioned are critical for anyone interested in becoming a machine learning engineer. Data Engineer: Data engineers are the authors of the infrastructure that stores, processes, and manages the large volumes of data an organization has. Well then, you're in luck.
Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data. In addition to natural language, models are trained on various modalities, such as code, time-series, tabular, geospatial and IT events data.
Methods that allow our customer data models to be as dynamic and flexible as the customers they represent. In this guide, we will explore concepts like transitional modeling for customer profiles, the power of event logs for customer behavior, persistent staging for raw customer data, real-time customer data capture, and much more.
It's common to have terabytes of data in most data warehouses, and data quality monitoring is often challenging and cost-intensive due to dependencies on multiple tools, so it is eventually ignored. This results in poor credibility and data consistency over time, leading businesses to mistrust their data pipelines and processes.
Founded in 2014 by three leading cloud engineers, phData focuses on solving real-world dataengineering, operations, and advanced analytics problems with the best cloud platforms and products. Over the years, one of our primary focuses became Snowflake and migrating customers to this leading cloud data platform.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.
Understanding Fivetran: Fivetran is a user-friendly, code-free platform enabling customers to easily synchronize their data by automating extraction, transformation, and loading from many sources. Fivetran automates the time-consuming steps of the ELT process so your data engineers can focus on more impactful projects.
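One way such a validation check could look, sketched with content hashing. The function name and record format are hypothetical, not a specific tool's API:

```python
import hashlib

def find_duplicates(records):
    """Return indices of records whose content was already seen earlier."""
    seen = {}
    dupes = []
    for i, rec in enumerate(records):
        # Hash the content so equality checks stay cheap even for large blobs
        digest = hashlib.sha256(rec.encode("utf-8")).hexdigest()
        if digest in seen:
            dupes.append(i)
        else:
            seen[digest] = i
    return dupes

docs = ["report about Q1", "sensor log A", "report about Q1"]
print(find_duplicates(docs))  # [2]
```

Running a check like this at ingestion time keeps repeated entries from ever reaching downstream training data.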
While traditional roles like data scientists and machine learning engineers remain essential, new positions like large language model (LLM) engineers and prompt engineers have gained traction. LLM Engineers: With job postings far exceeding the current talent pool, this role has become one of the hottest in AI.
So, in those projects, you have more than 70% of the engineering development resources that are tied to data engineering activities. That is a mix of data engineering, feature engineering work, a mix of data transformation work writ large. It is at the level of data quality and joining tasks.
Systems and data sources are more interconnected than ever before. A broken data pipeline might bring operational systems to a halt, or it could cause executive dashboards to fail, reporting inaccurate KPIs to top management. The application of this concept to data is relatively new. Complexity leads to risk.
Data Quality Dimensions Data quality dimensions are the criteria that are used to evaluate and measure the quality of data. These include the following: Accuracy indicates how correctly data reflects the real-world entities or events it represents. Datafold is a tool focused on data observability and quality.
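A dimension like this can be scored with a simple check. The field name, valid range, and the use of range validity as a proxy for accuracy are all assumptions made for the example:

```python
def validity_score(rows, field, lo, hi):
    """Fraction of rows whose field is present and within [lo, hi]."""
    if not rows:
        return 1.0  # vacuously valid on an empty set
    valid = sum(
        1 for r in rows
        if r.get(field) is not None and lo <= r[field] <= hi
    )
    return valid / len(rows)

orders = [{"amount": 25.0}, {"amount": -3.0}, {"amount": None}, {"amount": 40.0}]
print(validity_score(orders, "amount", 0, 10_000))  # 0.5
```

Tracking a score like this over time (rather than a one-off check) is what turns a dimension into a monitorable quality metric.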
In August 2019, Data Works was acquired and Dave worked to ensure a successful transition. David: My technical background is in ETL, data extraction, dataengineering and data analytics. David, what can you tell us about your background?
Scala is worth knowing if you're looking to branch into data engineering and work with big data, as it's helpful for scaling applications. Knowing all three frameworks covers the most ground for aspiring data science professionals, so you cover plenty of ground knowing this group.
Knowing what needs to be done and in what order (the whole process and management side of data) is often overlooked, and we know keeping everyone up to date can be a bit tedious in its own way. But if you can orchestrate pipelines with dozens of steps in your sleep, you can surely take a moment to write down what you're up to, right?