Data Pipeline, Data Preparation and Data Scientist

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Data Science Dojo

SEPTEMBER 11, 2024

Let’s explore each of these components and its application in the sales domain: Synapse Data Engineering: Synapse Data Engineering provides a powerful Spark platform designed for large-scale data transformations through Lakehouse. Here, we changed the data types of columns and dealt with missing values.
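The type-casting and missing-value step mentioned in this excerpt can be sketched in plain Python. This is only an illustration, not the article's Spark code: the column names (`order_id`, `units`, `unit_price`), the sample rows, and the policy of filling missing values with a default are all assumptions made for the example.

```python
# Hypothetical raw sales rows: every value arrives as a string, some are missing.
raw_rows = [
    {"order_id": "1001", "units": "3", "unit_price": "19.99"},
    {"order_id": "1002", "units": "", "unit_price": "5.50"},
    {"order_id": "1003", "units": "2", "unit_price": None},
]

# Assumed target type and fill value for each column.
SCHEMA = {
    "order_id": (int, None),
    "units": (int, 0),
    "unit_price": (float, 0.0),
}


def clean_row(row):
    """Cast each column to its target type, filling missing values with the default."""
    cleaned = {}
    for column, (cast, default) in SCHEMA.items():
        value = row.get(column)
        if value is None or value == "":
            cleaned[column] = default
        else:
            cleaned[column] = cast(value)
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]
```

In a Lakehouse notebook the same two operations would more likely be expressed with DataFrame casts and fill functions; the sketch only shows the logic.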

Power BI

Power BI Data Pipeline Data Warehouse Data Engineering

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of data engineering and data science teams’ bandwidth and data preparation activities.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Your Complete Roadmap to Become an Azure Data Scientist

Pickl AI

SEPTEMBER 5, 2024

Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure.

Azure

Azure Data Scientist Data Science Machine Learning

Step-by-step guide: Generative AI for your business

IBM Journey to AI blog

JULY 30, 2024

Data Scientists and AI experts: Historically we have seen Data Scientists build and choose traditional ML models for their use cases. Data Scientists will typically help with training, validating, and maintaining foundation models that are optimized for data tasks. IBM watsonx.ai

AI

AI AI Data Scientist Data Preparation

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly. A data scientist team orders a new JuMa workspace in BMW’s Catalog.

ML

ML ML AWS AI

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

See also Thoughtworks’s guide to Evaluating MLOps Platforms. End-to-end MLOps platforms: End-to-end MLOps platforms provide a unified ecosystem that streamlines the entire ML workflow, from data preparation and model development to deployment and monitoring. Check out the Kubeflow documentation.

Machine Learning

Machine Learning Machine Learning ML ML

2024 Mexican Grand Prix: Formula 1 Prediction Challenge Results

Ocean Protocol

NOVEMBER 28, 2024

Introduction: The Formula 1 Prediction Challenge: 2024 Mexican Grand Prix brought together data scientists to tackle one of the most dynamic aspects of racing, pit stop strategies. Yunus focused on building a robust data pipeline, merging historical and current-season data to create a comprehensive dataset.

Cross Validation

Cross Validation Decision Trees Data Scientist Data Science

Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 21, 2024

In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.

AWS

AWS Data Preparation ML ML

Unlocking Tabular Data’s Hidden Potential

ODSC - Open Data Science

MAY 10, 2023

Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. Making data engineering more systematic through principles and tools will be key to making AI algorithms work.

Data Scientist

Data Scientist Data Science Deep Learning Deep Learning

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

It helps companies streamline and automate the end-to-end ML lifecycle, which includes data collection, model creation (built on data sources from the software development lifecycle), model deployment, model orchestration, health monitoring and data governance processes.

Big Data

Big Data Big Data ML ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

Continuous ML model retraining is one method to overcome this challenge by relearning from the most recent data. This requires not only well-designed features and ML architecture, but also data preparation and ML pipelines that can automate the retraining process.

AWS

AWS ML ML ETL

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. These systems are crucial in ensuring data is readily available for analysis and reporting.

Data Engineering

Data Engineering Data Engineer Data Engineering Data Engineering

10 Best Data Engineering Books [Beginners to Advanced]

Pickl AI

AUGUST 1, 2023

The primary goal of Data Engineering is to transform raw data into a structured and usable format that can be easily accessed, analyzed, and interpreted by Data Scientists, analysts, and other stakeholders. Future of Data Engineering: The Data Engineering market will expand from $18.2

Data Engineering

Data Engineering Data Engineer Data Engineering Data Engineering

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development. Above all, this solution offers you a native Spark way to implement an end-to-end data pipeline from Amazon Redshift to SageMaker.

ML

ML ML AWS Data Warehouse

MLOps and the evolution of data science

IBM Journey to AI blog

AUGUST 11, 2023

Because the machine learning lifecycle has many complex components that reach across multiple teams, it requires close-knit collaboration to ensure that hand-offs occur efficiently, from data preparation and model training to model deployment and monitoring. How to use ML to automate the refining process into a cyclical ML process.

Data Science

Data Science Machine Learning Machine Learning ML

Building Scalable AI Pipelines with MLOps: A Guide for Software Engineers

ODSC - Open Data Science

OCTOBER 7, 2024

Understanding the MLOps Lifecycle: The MLOps lifecycle consists of several critical stages, each with its own challenges. Data Ingestion: Collecting data from various sources and ensuring it’s available for analysis. Data Preparation: Cleaning and transforming raw data to make it usable for machine learning.
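The first two stages named in this excerpt, ingestion and preparation, can be sketched as two small functions chained together. This is a minimal illustration under stated assumptions: the in-memory CSV, the field names (`customer_id`, `age`, `plan`), and the policy of dropping incomplete rows are hypothetical and do not come from the article.

```python
import csv
import io

# Hypothetical source: an in-memory CSV stands in for a real file, API, or database.
RAW_CSV = """customer_id,age,plan
1,34,basic
2,,premium
3,51,BASIC
"""


def ingest(source_text):
    """Data ingestion: read raw records from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(source_text)))


def prepare(records):
    """Data preparation: drop rows with a missing age, cast types, normalize case."""
    prepared = []
    for record in records:
        if not record["age"]:
            continue  # assumed policy: skip incomplete rows rather than impute
        prepared.append(
            {
                "customer_id": int(record["customer_id"]),
                "age": int(record["age"]),
                "plan": record["plan"].strip().lower(),
            }
        )
    return prepared


features = prepare(ingest(RAW_CSV))
```

Keeping each stage as its own function makes it easier to test, schedule, or replace one stage without touching the other.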

Machine Learning

Machine Learning Machine Learning AI AI

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

Data scientists have to address challenges like data partitioning, load balancing, fault tolerance, and scalability. Amazon SageMaker Pipelines allows orchestrating the end-to-end ML lifecycle from data preparation and training to model deployment as automated workflows.

Machine Learning

Machine Learning Machine Learning ML ML

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. This requires building a data pipeline (using tools such as Amazon SageMaker Data Wrangler) to move data into Amazon S3.

ML

ML ML AWS Python

When his hobbies went on hiatus, this Kaggler made fighting COVID-19 with data his mission | A…

Kaggle

JULY 29, 2020

With sports (and everything else) cancelled, this data scientist decided to take on COVID-19 | A Winner’s Interview with David Mezzetti. When his hobbies went on hiatus, Kaggler David Mezzetti made fighting COVID-19 his mission. David: My technical background is in ETL, data extraction, data engineering and data analytics.

ETL

ETL Data Scientist Machine Learning Machine Learning

3 Takeaways from Gartner’s 2018 Data and Analytics Summit

DataRobot Blog

APRIL 1, 2018

These modern tools will auto-profile the data, detect joins and overlaps, and offer recommendations. With AI infused throughout, the industry is moving towards a place where data analytics is far less biased, and where citizen data scientists will have greater power and agility to accomplish more in less time.

Analytics

Analytics Analytics Data Preparation Augmented Analytics

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.

AI

AI AI Machine Learning Machine Learning

Using ChatGPT for Data Science

Pickl AI

FEBRUARY 8, 2023

Data Scientists and Data Analysts have been using ChatGPT for Data Science to generate code and answers rapidly. Data Manipulation: Data manipulation is the process of changing data to fit your project's requirements for further analysis.

Data Science

Data Science Data Scientist Machine Learning Machine Learning

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

Iguazio

JULY 31, 2023

Implementing MLOps solves the following challenges: Siloed Teams - Before MLOps, data scientists, data engineers and DevOps used to work in silos and with different tools and frameworks. By taking this step, organizations ensure they have high quality data that is available for model training, feature engineering, and analysis.

ML

ML ML Machine Learning Machine Learning

How to Choose MLOps Tools: In-Depth Guide for 2024

DagsHub

APRIL 21, 2024

A traditional machine learning (ML) pipeline is a collection of various stages that include data collection, data preparation, model training and evaluation, hyperparameter tuning (if needed), model deployment and scaling, monitoring, security and compliance, and CI/CD.

Machine Learning

Machine Learning Machine Learning ML ML

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

What’s really important in the before part is having production-grade machine learning data pipelines that can feed your model training and inference processes. And that’s really key for taking data science experiments into production. Let’s go and talk about machine learning pipelining.

SQL

SQL ML ML Python

LLMOps vs. MLOps: Understanding the Differences

Iguazio

FEBRUARY 8, 2024

Data engineers, data scientists and other data professional leaders have been racing to implement gen AI into their engineering efforts. Continuous monitoring of resources, data, and metrics. Data Pipeline - Manages and processes various data sources. LLMOps is MLOps for LLMs. What is MLOps?

ML

ML ML Data Scientist AI

How Does Snowpark Work?

phData

FEBRUARY 7, 2024

Snowpark Use Cases: Data Science. Streamlining data preparation and pre-processing: Snowpark’s Python, Java, and Scala libraries allow data scientists to use familiar tools for wrangling and cleaning data directly within Snowflake, eliminating the need for separate ETL pipelines and reducing context switching.

Python

Python ML ML SQL

How to Build an End-To-End ML Pipeline

The MLOps Blog

MAY 9, 2023

The platform typically includes components for the ML ecosystem like data management, feature stores, experiment trackers, a model registry, a testing environment, model serving, and model management. It checks the data for quality issues and detects outliers and anomalies. Pipelines can be scheduled to carry out CI, CD, or CT.
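A data-quality check of the kind described in this excerpt can be sketched with the standard library alone. The interquartile-range rule used here is one common heuristic chosen for illustration, not necessarily what any particular platform applies; the function name `quality_report`, the multiplier `k`, and the sample values are assumptions.

```python
import statistics


def quality_report(values, k=1.5):
    """Count missing entries and flag outliers using the interquartile-range rule."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        "missing": len(values) - len(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical batch of response times in milliseconds, with one gap and one spike.
report = quality_report([10, 12, 11, None, 13, 12, 11, 95, 12])
```

A pipeline step could fail or raise an alert when `report["missing"]` or the number of outliers exceeds a threshold agreed on for the dataset.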

ML

ML ML Machine Learning Machine Learning

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

With all this packaged into a well-governed platform, Snowflake continues to set the standard for data warehousing and beyond. Snowflake supports data sharing and collaboration across organizations without the need for complex data pipelines. One of the standout features of Dataiku is its focus on collaboration.

Machine Learning

Machine Learning Machine Learning Data Science ML

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly. Saurabh Gupta is a Principal Engineer at Zeta Global.

AWS

AWS Machine Learning Machine Learning ML

Introducing the DataRobot AI Cloud: A Closer Look

DataRobot

SEPTEMBER 14, 2021

We’re building a platform for all users: data scientists, analytics experts, business users, and IT. DataRobot now delivers both visual and code-centric data preparation and data pipelines, along with automated machine learning that is composable, and can be driven by hosted notebooks or a graphical user experience.

AI

AI AI Data Pipeline Data Preparation

How Fastweb fine-tuned the Mistral model using Amazon SageMaker HyperPod as a first step to build an Italian large language model

AWS Machine Learning Blog

DECEMBER 18, 2024

This strategic decision was driven by several factors. Efficient data preparation: Building a high-quality pre-training dataset is a complex task, involving assembling and preprocessing text data from various sources, including web sources and partner companies. The team opted for fine-tuning on AWS.

Clustering

Clustering AWS AI AI

Data science

Dataconomy

MARCH 19, 2025

Key disciplines involved in data science: Understanding the core disciplines within data science provides a comprehensive perspective on the field’s multifaceted nature. Overview of core disciplines: Data science encompasses several key disciplines including data engineering, data preparation, and predictive analytics.

Data Science

Data Science Citizen Data Scientist Data Scientist Machine Learning

Gen AI Trends and Scaling Strategies for 2025

Iguazio

MARCH 20, 2025

AI engineering - AI is being democratized for developers and engineers, expanding beyond the limited pool of data scientists. Quality, Scalability and Continuous Delivery: Implementing modularity with LLM, data, and API abstractions to ensure flexibility; implementing tests for models, prompts, application logic, etc.

AI

AI AI Data Pipeline Data Scientist
