In my previous articles Predictive Model Data Prep: An Art and Science and Data Prep Essentials for Automated Machine Learning, I shared foundational data preparation tips to help you succeed. by Jen Underwood.
This includes duplicate removal, missing value treatment, variable transformation, and normalization of data. Tools like Python (with pandas and NumPy), R, and ETL platforms like Apache NiFi or Talend are used for data preparation before analysis. Their insights must align with real-world goals.
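The four preparation steps named above can be sketched in pandas. This is a minimal illustration on a hypothetical toy dataset, not any specific tool's pipeline:

```python
import pandas as pd
import numpy as np

# Hypothetical toy dataset with a duplicate row and a missing value
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, 28, 28, np.nan, 51],
    "income": [52000, 61000, 61000, 47000, 88000],
})

df = df.drop_duplicates()                          # duplicate removal
df["age"] = df["age"].fillna(df["age"].median())   # missing value treatment
df["log_income"] = np.log(df["income"])            # variable transformation
# min-max normalization of income to the [0, 1] range
df["income_norm"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min()
)

print(df)
```

Each step here is a one-liner, which is why pandas is the default choice for exploratory data preparation before heavier ETL tooling enters the picture.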
Summary: This article explores the significance of ETL data in Data Management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.
Zeta’s AI innovation is powered by a proprietary machine learning operations (MLOps) system, developed in-house. Context: In early 2023, Zeta’s machine learning (ML) teams shifted from traditional vertical teams to a more dynamic horizontal structure, introducing the concept of pods comprising diverse skill sets.
Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize the LTV. In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.
To start, get to know some key terms from the demo: Snowflake, the centralized source of truth for our initial data; Magic ETL, Domo’s tool for combining and preparing data tables; ERP, a supplemental data source from Salesforce; Geographic, a supplemental data source (i.e.,
Instead, we use pre-trained deep learning models like VGG or ResNet to extract feature vectors from the images. Image retrieval search architecture: the architecture follows a typical machine learning workflow for image retrieval. Data preparation: here we use a subset of the ImageNet dataset (100 classes).
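Once feature vectors have been extracted by a pre-trained backbone, retrieval reduces to nearest-neighbor search in feature space. A minimal sketch with NumPy, using random vectors as stand-ins for real CNN features (the 512 dimensions and the gallery size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for 512-d feature vectors produced by a pre-trained CNN
gallery = rng.normal(size=(1000, 512))
# A query that is a slightly perturbed copy of gallery image 42
query = gallery[42] + 0.01 * rng.normal(size=512)

# Cosine similarity: normalize, then take dot products
gallery_n = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
scores = gallery_n @ query_n

# Indices of the 5 most similar gallery images, best first
top5 = np.argsort(scores)[::-1][:5]
print(top5)
```

In practice the gallery vectors would be precomputed once and indexed (e.g., with an approximate nearest-neighbor library) so queries stay fast as the collection grows.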
In the digital age, the abundance of textual information available on the internet, particularly on platforms like Twitter, blogs, and e-commerce websites, has led to an exponential growth in unstructured data. Text data is often unstructured, making it challenging to directly apply machine learning algorithms for sentiment analysis.
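The standard first step for making unstructured text usable by ML algorithms is to tokenize it and turn each document into a numeric vector. A minimal bag-of-words sketch over a hypothetical two-review corpus:

```python
import re
from collections import Counter

# Hypothetical product reviews (unstructured text)
docs = [
    "Great phone, love the battery life!",
    "Terrible service. I love nothing about it.",
]

def tokenize(text):
    # Lowercase and keep alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())

# Shared vocabulary across the corpus, then one count vector per document
vocab = sorted({tok for d in docs for tok in tokenize(d)})
vectors = [[Counter(tokenize(d))[w] for w in vocab] for d in docs]

print(vocab)
print(vectors)
```

Real pipelines typically add stop-word removal, TF-IDF weighting, or learned embeddings on top of this, but the core idea — text in, fixed-length numeric vectors out — is the same.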
The rules in this engine were predefined and written in SQL, which, aside from being a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data sources. TR customer data is changing faster than the business rules can evolve to reflect changing customer needs.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.
Db2 Warehouse fully supports open formats such as Parquet, Avro, ORC and Iceberg table format to share data and extract new insights across teams without duplication or additional extract, transform, load (ETL). This allows you to scale all analytics and AI workloads across the enterprise with trusted data.
More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. LLMs excel at writing code and reasoning over text, but tend not to perform as well when interacting directly with time-series data.
Alteryx’s Capabilities Data Blending: Effortlessly combine data from multiple sources. Predictive Analytics: Leverage machine learning algorithms for accurate predictions. This makes Alteryx an indispensable tool for businesses aiming to glean insights and steer their decisions based on robust data.
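Data blending of the kind described here — joining records from multiple sources on a shared key — can be illustrated with a pandas merge. This is an analogy, not Alteryx's own API, and the two sources are hypothetical:

```python
import pandas as pd

# Two hypothetical sources: a CRM export and a billing system
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "region": ["EMEA", "APAC", "AMER"]})
billing = pd.DataFrame({"customer_id": [2, 3, 4],
                        "revenue": [1200, 800, 500]})

# Blend on the shared key; a left join keeps every CRM record
# even when no billing record matches (revenue becomes NaN)
blended = crm.merge(billing, on="customer_id", how="left")
print(blended)
```

Visual tools like Alteryx wrap exactly this kind of keyed join in a drag-and-drop interface, which is what makes blending "effortless" for non-programmers.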
While both these tools are powerful on their own, their combined strength offers a comprehensive solution for data analytics. In this blog post, we will show you how to leverage KNIME’s Tableau Integration Extension and discuss the benefits of using KNIME for data preparation before visualization in Tableau.
Benefits of the SageMaker and Data Cloud Einstein Studio integration Here’s how using SageMaker with Einstein Studio in Salesforce Data Cloud can help businesses: It provides the ability to connect custom and generative AI models to Einstein Studio for various use cases, such as lead conversion, case classification, and sentiment analysis.
Dataflows represent a cloud-based technology designed for data preparation and transformation purposes. Dataflows have different connectors to retrieve data, including databases, Excel files, APIs, and other similar sources, along with data manipulations that are performed using Online Power Query Editor.
However, building advanced data-driven applications poses several challenges. First, it can be time consuming for users to learn multiple services’ development experiences. Third, configuring and governing access to appropriate users for data, code, development artifacts, and compute resources across services is a manual process.
ML operationalization summary As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker, machine learning (ML) and operations (MLOps) is the combination of people, processes, and technology to productionize ML solutions efficiently. For them, the end-to-end MLOps lifecycle and infrastructure is necessary.
is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering and hyperparameter optimization using AutoAI.
Summary: Data transformation tools streamline data processing by automating the conversion of raw data into usable formats. These tools enhance efficiency, improve data quality, and support Advanced Analytics like Machine Learning. Aggregation: Combining multiple data points into a single summary (e.g.,
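Aggregation as described here — collapsing many data points into one summary value per group — is a one-liner in pandas. A minimal sketch on hypothetical sales data:

```python
import pandas as pd

# Hypothetical raw sales records, many rows per region
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [100, 250, 80, 120, 200],
})

# Aggregation: one summary row per region with several statistics
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)
```

Dedicated transformation tools automate the same operation at scale and on a schedule, but the underlying group-and-summarize logic is identical.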
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio.
Since its introduction, we’ve helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. In this series of posts, we share lessons learned about optimizing costs in Amazon SageMaker.
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.
Ideas (Analyse Data) Excel’s Ideas feature, or Analyse Data, brings powerful AI-driven insights directly into your spreadsheets. By selecting a data range and clicking on Ideas, Excel scans your data and automatically generates summaries, trends, and visualisations.
On the client side, Snowpark consists of libraries, including the DataFrame API and native Snowpark machine learning (ML) APIs for model development (public preview) and deployment (private preview). Machine Learning: Training machine learning (ML) models can sometimes be resource-intensive.
Data Engineering emphasises the infrastructure and tools necessary for data collection, storage, and processing, while Data Engineers concentrate on the architecture, pipelines, and workflows that facilitate data access. Key components of data warehousing include: ETL Processes: ETL stands for Extract, Transform, Load.
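The Extract, Transform, Load sequence named above can be sketched end to end in a few lines of Python, using SQLite as a stand-in warehouse. The source rows and table layout are hypothetical:

```python
import sqlite3

# Extract: rows as they might arrive from a source system
raw = [("2024-01-05", "  Widget ", "19.99"),
       ("2024-01-06", "Gadget", "5.00")]

# Transform: trim text fields and cast price strings to floats
clean = [(day, name.strip(), float(price)) for day, name, price in raw]

# Load: write the cleaned rows into a warehouse-style table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, product TEXT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)

total = conn.execute("SELECT SUM(price) FROM sales").fetchone()[0]
print(total)  # ≈ 24.99
```

Production ETL tools add scheduling, retries, and lineage tracking around this skeleton, but every pipeline decomposes into these three phases.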
In this blog, we will focus on integrating Power BI within KNIME for enhanced data analytics. KNIME and Power BI: The Power of Integration The data analytics process invariably involves a crucial phase: data preparation. This phase demands meticulous customization to optimize data for analysis.
A unified data fabric also enhances data security by enabling centralised governance and compliance management across all platforms. Automated Data Integration and ETL Tools The rise of no-code and low-code tools is transforming data integration and Extract, Transform, and Load (ETL) processes.
Getting machine learning to solve some of the hardest problems in an organization is great. In this article, I will share my learnings of how successful ML platforms work in eCommerce and the best practices a team needs to follow while building one. How to set up an ML Platform in eCommerce?
Placing functions for plotting, data loading, data preparation, and implementations of evaluation metrics in plain Python modules keeps a Jupyter notebook focused on the exploratory analysis (Source: Author). Using SQL directly in Jupyter cells: there are some cases in which data is not in memory (e.g.,
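When the data is not in memory, one common pattern is to push the query down to the database and pull back only the small result. A minimal sketch using SQLite as a stand-in for a database too large to load whole (table and rows are hypothetical):

```python
import sqlite3
import pandas as pd

# Stand-in for an on-disk database too large to load into a DataFrame
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "click"), (1, "buy"), (2, "click"), (2, "click")])

# Push the aggregation down to SQL; only the summary enters memory
df = pd.read_sql("SELECT action, COUNT(*) AS n FROM events "
                 "GROUP BY action", conn)
print(df)
```

In a notebook this keeps the cell declarative and memory-light; tools like the `%%sql` cell magic wrap the same idea in less boilerplate.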
Traditionally, answering this question would involve multiple data exports, complex extract, transform, and load (ETL) processes, and careful data synchronization across systems. Users can write data to managed RMS tables using Iceberg APIs, Amazon Redshift, or Zero-ETL ingestion from supported data sources.
With sports (and everything else) cancelled, this data scientist decided to take on COVID-19 | A Winner’s Interview with David Mezzetti When his hobbies went on hiatus, Kaggler David Mezzetti made fighting COVID-19 his mission. Photo by Clay Banks on Unsplash Let’s learn about David! David, what can you tell us about your background?
Through the integrated suite of tools offered by watsonx.governance™, users can expedite the implementation of responsible, transparent and explainable AI workflows tailored to both generative AI and machinelearning models. Additionally, watsonx.governance extends its governance provisions to encompass generative AI assets.
Introduction: Machine learning (ML) will keep evolving through 2025 as businesses across industries use artificial intelligence to gain a competitive edge. Seamless AWS Integration: Works effortlessly with AWS S3 (data storage), AWS Lambda (serverless computing), and AWS Glue (ETL).
To handle the log data efficiently, raw logs were centralized into an Amazon Simple Storage Service (Amazon S3) bucket. An Amazon EventBridge schedule checked this bucket hourly for new files and triggered log transformation extract, transform, and load (ETL) pipelines built using AWS Glue and Apache Spark.
IBM Watson: A pioneer in AI-driven analytics, IBM Watson transforms enterprise operations with natural language processing, machine learning, and predictive modeling. From customer service chatbots to data-driven decision-making, Watson enables businesses to extract insights from large-scale datasets with precision.