In my previous articles Predictive Model Data Prep: An Art and Science and Data Prep Essentials for Automated Machine Learning, I shared foundational data preparation tips to help you succeed. By Jen Underwood. Read More.
Db2 Warehouse fully supports open formats such as Parquet, Avro, ORC and Iceberg table format to share data and extract new insights across teams without duplication or additional extract, transform, load (ETL). This allows you to scale all analytics and AI workloads across the enterprise with trusted data.
Continuous ML model retraining is one method to overcome this challenge by relearning from the most recent data. This requires not only well-designed features and ML architecture, but also data preparation and ML pipelines that can automate the retraining process. But there is still an engineering challenge.
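To make the idea concrete, here is a minimal sketch of a scheduled retraining step that relearns from a recent window of data. The column names, the window length, and the model choice are illustrative assumptions, not the pipeline described in the article.

```python
from datetime import datetime, timedelta

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def retrain_on_recent_data(events: pd.DataFrame, window_days: int = 30):
    """Fit a fresh model on the most recent window of labeled events."""
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    recent = events[events["event_time"] >= cutoff]

    X = recent.drop(columns=["event_time", "label"])
    y = recent["label"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    # In practice, promote the new model only if auc clears a threshold.
    return model, auc
```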
With blogs, anyone can now write and distribute an article, and with message boards, anyone can post an advertisement. Business Intelligence used to require months of effort from BI and ETL teams. You used to be able to get those standards from your colleague in the BI/ETL team. Subscribe to Alation's Blog.
With SageMaker Unified Studio notebooks, you can use Python or Spark to interactively explore and visualize data, prepare data for analytics and ML, and train ML models. With the SQL editor, you can query data lakes, databases, data warehouses, and federated data sources. Big Data Architect.
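The kind of interactive exploration and preparation described above might look like the following plain PySpark sketch; the S3 paths and column names are hypothetical placeholders rather than anything from the post.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("interactive-prep").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Quick profiling before any modeling work.
orders.printSchema()
orders.groupBy("country").agg(F.count("*").alias("n"), F.avg("amount")).show()

# Simple cleaning and feature derivation, then persist the prepared dataset.
prepared = (
    orders
    .dropna(subset=["amount", "country"])
    .withColumn("order_month", F.date_trunc("month", "order_ts"))
)
prepared.write.mode("overwrite").parquet("s3://example-bucket/prepared/orders/")
```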
In the digital age, the abundance of textual information available on the internet, particularly on platforms like Twitter, blogs, and e-commerce websites, has led to an exponential growth in unstructured data. These tools offer a wide range of functionalities to handle complex data preparation tasks efficiently.
The solution addressed in this blog solves Afri-SET’s challenge and was ranked among the top 3 winning solutions. This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the air quality data integration problem of low-cost sensors.
TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations. TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.
While both these tools are powerful on their own, their combined strength offers a comprehensive solution for data analytics. In this blog post, we will show you how to leverage KNIME’s Tableau Integration Extension and discuss the benefits of using KNIME for data preparation before visualization in Tableau.
Welcome to our AWS Redshift to the Snowflake Data Cloud migration blog! In this blog, we’ll walk you through the process of migrating your data from AWS Redshift to the Snowflake Data Cloud. One popular route is leveraging third-party ETL tools like Fivetran to ensure a smooth and successful migration.
Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment. This blog post delves into the details of this MLOps platform, exploring how the integration of these tools facilitates a more efficient and scalable approach to managing ML projects.
Benefits of the SageMaker and Data Cloud Einstein Studio integration Here’s how using SageMaker with Einstein Studio in Salesforce Data Cloud can help businesses: It provides the ability to connect custom and generative AI models to Einstein Studio for various use cases, such as lead conversion, case classification, and sentiment analysis.
Dataflows allow users to establish source connections and retrieve data, and subsequent data transformations can be conducted using the online Power Query Editor. In this blog, we will provide insights into the process of creating Dataflows and offer guidance on when to choose them to address real-world use cases effectively.
Data lakes, while useful in helping you to capture all of your data, are only the first step in extracting the value of that data. We recently announced an integration with Trifacta to seamlessly integrate the Alation Data Catalog with self-service data prep applications to help you solve this issue.
With the importance of data in various applications, there’s a need for effective solutions to organize, manage, and transfer data between systems with minimal complexity. While numerous ETL tools are available on the market, selecting the right one can be challenging.
However, most are only deployed over one data store (Hadoop or various other backends). In 2016, these will increasingly be deployed to query multiple data sources. The implication will be doing away with some (if not all) of the ETL work required to gather all of the data in one data warehouse.
Both tools serve distinct phases within the data analytics process, making their integration a highly advantageous proposition. In this blog, we will focus on integrating Power BI within KNIME for enhanced data analytics. This phase demands meticulous customization to optimize data for analysis.
In this blog, I will cover: What is watsonx.ai? What capabilities are included in watsonx.ai? Summarize: condense content such as sales conversation summaries, insurance coverage, meeting transcripts, and contract information. Generate: generate text content for a specific purpose, such as marketing campaigns, job descriptions, blogs or articles, and email drafting support.
These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.
For instance, a notebook that monitors for model data drift should have a pre-step that performs extract, transform, and load (ETL) and processing of new data, and a post-step of model refresh and retraining in case significant drift is noticed.
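A hedged sketch of that pre-step / drift check / post-step structure is below. The drift test (a two-sample Kolmogorov-Smirnov test on one feature) and the load/retrain helpers are placeholders chosen for illustration, not the article's actual implementation.

```python
from scipy.stats import ks_2samp


def run_drift_workflow(load_new_batch, load_reference, retrain_model,
                       feature: str = "score", p_threshold: float = 0.01):
    # Pre-step: ETL / processing of the newly arrived data.
    new_batch = load_new_batch()
    reference = load_reference()

    # Drift check: compare the new batch to the reference distribution.
    stat, p_value = ks_2samp(reference[feature], new_batch[feature])

    # Post-step: refresh and retrain the model only if drift is significant.
    retrained = p_value < p_threshold
    if retrained:
        retrain_model(new_batch)
    return {"ks_stat": stat, "p_value": p_value, "retrained": retrained}
```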
However, preparing raw data for ML training and evaluation is often a tedious and demanding task in terms of compute resources, time, and human effort. Data commonly needs to be integrated from different sources, and preparation must deal with missing or noisy values, outliers, and so on.
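As a small illustration of the cleaning steps mentioned above, the pandas sketch below imputes missing numeric values and clips outliers; the column handling is generic and the percentile thresholds are arbitrary assumptions.

```python
import pandas as pd


def clean_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Impute missing numeric values with each column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Clip outliers to the 1st/99th percentile to reduce the effect of noise.
    low = out[numeric_cols].quantile(0.01)
    high = out[numeric_cols].quantile(0.99)
    out[numeric_cols] = out[numeric_cols].clip(lower=low, upper=high, axis=1)
    return out
```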
The Snowflake Data Cloud is a leading cloud data platform that provides various features and services for data storage, processing, and analysis. A new feature that Snowflake offers is called Snowpark, which provides an intuitive library for querying and processing data at scale in Snowflake. What is Snowpark?
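A minimal Snowpark for Python sketch of "querying and processing data at scale in Snowflake" follows; the connection parameters and table/column names are placeholders, and the transformations are only illustrative.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# Push the filter and aggregation down to Snowflake instead of pulling rows out.
orders = session.table("ORDERS")
summary = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(avg("AMOUNT").alias("AVG_AMOUNT"))
)
summary.show()
```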
These connections are used by AWS Glue crawlers, jobs, and development endpoints to access various types of data stores. You can use these connections for both source and target data, and even reuse the same connection across multiple crawlers or extract, transform, and load (ETL) jobs.
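A hedged boto3 sketch of defining one such reusable JDBC connection is shown below; the endpoint, credentials, and VPC details are placeholders, and in practice the password would come from a secrets store rather than inline code.

```python
import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "orders-postgres",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://example-host:5432/orders",
            "USERNAME": "etl_user",
            "PASSWORD": "<resolved-from-secrets-manager-in-practice>",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)

# The same connection name can then be referenced by multiple crawlers or
# extract, transform, and load (ETL) jobs instead of repeating credentials.
```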
Placing functions for plotting, data loading, data preparation, and implementations of evaluation metrics in plain Python modules keeps a Jupyter notebook focused on the exploratory analysis (Source: Author). Using SQL directly in Jupyter cells: there are some cases in which data is not in memory (e.g.,
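The layout described above might look like the following sketch: reusable helpers live in a plain module, and the notebook only imports and calls them. The module name, file paths, and columns are hypothetical.

```python
# --- prep.py (hypothetical helper module) ---
import pandas as pd


def load_events(path: str) -> pd.DataFrame:
    """Load raw events and parse timestamps."""
    return pd.read_csv(path, parse_dates=["event_time"])


def add_session_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive simple features shared across several notebooks."""
    df = df.copy()
    df["hour"] = df["event_time"].dt.hour
    return df


# --- inside the notebook, cells stay focused on the analysis itself ---
# from prep import load_events, add_session_features
# events = add_session_features(load_events("data/events.csv"))
# events.plot(x="hour", y="value")
```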
The objective of an ML Platform is to automate repetitive tasks and streamline the processes from data preparation to model deployment and monitoring. In this section, I will talk about best practices around building the Data Processing platform. How to set up an ML Platform in eCommerce?
Traditionally, answering this question would involve multiple data exports, complex extract, transform, and load (ETL) processes, and careful data synchronization across systems. Users can write data to managed RMS tables using Iceberg APIs, Amazon Redshift, or Zero-ETL ingestion from supported data sources.
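For the Iceberg route mentioned above, the sketch below appends rows from Spark through the DataFrameWriterV2 API; the catalog, namespace, and table names are placeholders and assume an Iceberg catalog is already configured for the session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-append").getOrCreate()

new_rows = spark.createDataFrame(
    [("2024-06-01", "sensor-1", 41.7)],
    ["reading_date", "device_id", "value"],
)

# Append through the configured Iceberg catalog; the same table can then be
# read by other engines (e.g., Amazon Redshift) without a separate ETL copy.
new_rows.writeTo("glue_catalog.iot.readings").append()
```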
In August 2019, Data Works was acquired and Dave worked to ensure a successful transition. David: My technical background is in ETL, data extraction, data engineering and data analytics. An ETL process was built to take the CSV, find the corresponding text articles and load the data into a SQLite database.
IBM watsonx.data facilitates scalable analytics and AI endeavors by accommodating data from diverse sources, eliminating the need for migration or cataloging through open formats. This approach enables centralized access and sharing while minimizing extract, transform and load (ETL) processes and data duplication.
To handle the log data efficiently, raw logs were centralized into an Amazon Simple Storage Service (Amazon S3) bucket. An Amazon EventBridge schedule checked this bucket hourly for new files and triggered log transformation extract, transform, and load (ETL) pipelines built using AWS Glue and Apache Spark.
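A simplified version of such a log transformation job, written in plain PySpark rather than the Glue-specific libraries, is sketched below; the bucket, prefix, and log schema are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("log-etl").getOrCreate()

raw = spark.read.json("s3://example-log-bucket/raw/")

cleaned = (
    raw.withColumn("event_time", F.to_timestamp("timestamp"))
       .filter(F.col("status").isNotNull())
       .select("event_time", "request_id", "status", "latency_ms")
)

# Partition by date so downstream queries scan only the slices they need.
(cleaned
    .withColumn("dt", F.to_date("event_time"))
    .write.mode("append")
    .partitionBy("dt")
    .parquet("s3://example-log-bucket/curated/"))
```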
Jasper AI: Content marketing teams swear by Jasper AI for generating blog posts, social media captions, and even ad copy in seconds. These AI-powered platforms enhance decision-making, automate reporting, and simplify complex data operations.