Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide an outstanding customer experience. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data.
From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and the experience can be painful. However, efficient use of ETL pipelines in ML can make their lives much easier.
This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, in a combined architecture.
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
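To make that concrete, here is a minimal PySpark sketch of the Hive/Spark SQL integration described above; a configured Hive metastore is assumed, and the `sales.orders` database and table names are hypothetical:

```python
# Minimal sketch: querying a Hive table from Spark SQL.
# Assumes a Hive metastore is configured; database/table names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-spark-sql-example")
    .enableHiveSupport()  # lets Spark SQL see tables registered in the Hive metastore
    .getOrCreate()
)

# Hive exposes files in HDFS as tables; Spark SQL can query them directly.
df = spark.sql("SELECT order_id, amount FROM sales.orders WHERE amount > 100")
df.show()
```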
In this article we’re going to look at what an Azure Function is and how we can employ it to create a basic extract, transform, and load (ETL) pipeline with minimal code. Before we begin, let’s shed some light on what an ETL pipeline essentially is: ETL stands for extract, transform, and load.
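To give a flavor of the approach, here is a minimal sketch of an ETL pass inside an Azure Function using the Python v2 programming model; the source URL, column names, and transformation logic are hypothetical placeholders:

```python
# Minimal sketch of an ETL pass inside an Azure Function (Python v2 model).
# The source URL, columns, and output are hypothetical placeholders.
import csv
import io
import urllib.request

import azure.functions as func

app = func.FunctionApp()

@app.route(route="run-etl")  # HTTP trigger; a timer trigger would also work
def run_etl(req: func.HttpRequest) -> func.HttpResponse:
    # Extract: download a small CSV file.
    raw = urllib.request.urlopen("https://example.com/data.csv").read().decode("utf-8")

    # Transform: keep rows with a non-empty 'amount' column, normalized to float.
    rows = [
        {**row, "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(raw))
        if row.get("amount")
    ]

    # Load: a real pipeline would write to a database or blob storage here.
    return func.HttpResponse(f"Loaded {len(rows)} rows", status_code=200)
```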
ML operationalization summary: as defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker, machine learning operations (MLOps) is the combination of people, processes, and technology needed to productionize ML solutions efficiently.
Previously, he was a Data & Machine Learning Engineer at AWS, where he worked closely with customers to develop enterprise-scale data infrastructure, including data lakes, analytics dashboards, and ETL pipelines.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data is frequently kept in data lakes that can be managed by AWS Lake Formation, giving you the ability to implement fine-grained access control using a straightforward grant or revoke procedure.
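As a rough illustration of that grant/revoke model (not taken from the post itself), granting a principal SELECT on a table via boto3 looks roughly like this; the role ARN, database, and table names are hypothetical:

```python
# Sketch: granting fine-grained table access with AWS Lake Formation via boto3.
# The role ARN, database, and table names are hypothetical placeholders.
import boto3

lf = boto3.client("lakeformation")

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystRole"},
    Resource={
        "Table": {
            "DatabaseName": "customer_db",
            "Name": "transactions",
        }
    },
    Permissions=["SELECT"],  # revoke_permissions reverses this with the same shape
)
```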
These tools may have their own versioning system, which can be difficult to integrate with a broader data version control system. For instance, our data lake could contain a variety of relational and non-relational databases, files in different formats, and data stored using different cloud providers.
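For instance, DVC exposes a small Python API for reading a specific revision of a tracked file; in this sketch the repository URL, file path, and revision tag are hypothetical:

```python
# Sketch: reading a versioned data file through DVC's Python API.
# The repository URL, file path, and revision tag are hypothetical.
import dvc.api

with dvc.api.open(
    "data/training_set.csv",
    repo="https://github.com/example-org/example-repo",
    rev="v1.0",  # any Git revision: tag, branch, or commit hash
) as f:
    header = f.readline()
    print(header)
```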
Despite the challenges, Afri-SET, with limited resources, envisions a comprehensive data management solution for stakeholders seeking sensor hosting on their platform, aiming to deliver accurate data from low-cost sensors. Qiong (Jo) Zhang, PhD, is a Senior Partner Solutions Architect at AWS, specializing in AI/ML.
Db2 Warehouse fully supports open formats such as Parquet, Avro, and ORC, as well as the Iceberg table format, to share data and extract new insights across teams without duplication or additional extract, transform, load (ETL) steps. This allows you to scale all analytics and AI workloads across the enterprise with trusted data.
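To illustrate what an open format buys in practice (a generic sketch, not Db2-specific), any engine that reads Parquet can consume the same files with no export step; the file path here is hypothetical:

```python
# Sketch: consuming a shared Parquet dataset directly, with no ETL copy.
# The file path is a hypothetical placeholder.
import pandas as pd

df = pd.read_parquet("warehouse_export/transactions.parquet")  # uses pyarrow under the hood
print(df.dtypes)
```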
Although these traditional machine learning (ML) approaches might perform decently in terms of accuracy, there are several significant advantages to adopting generative AI approaches. The following table compares the generative approach (generative AI) with the discriminative approach (traditional ML) across multiple aspects.
In the following decade, the internet and mobile devices started to generate data of unforeseen volume, variety, and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data at huge volume. The data lakehouse was created to solve these problems.
We use data-specific preprocessing and ML algorithms suited to each modality to filter out noise and inconsistencies in unstructured data. NLP cleans and refines content for text data, while audio data benefits from signal processing to remove background noise. Tools like Unstructured.io
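As a toy sketch of the text-cleaning side of that preprocessing (the rules below are illustrative, not the pipeline described above):

```python
# Toy sketch: normalizing raw text before downstream NLP.
# The specific rules are illustrative, not a production pipeline.
import re

def clean_text(raw: str) -> str:
    text = raw.lower()
    text = re.sub(r"<[^>]+>", " ", text)          # strip stray HTML tags
    text = re.sub(r"[^a-z0-9\s.,!?]", " ", text)  # drop unusual symbols
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace

print(clean_text("Great <b>product</b>!!!   ★★★★★"))  # prints: great product !!!
```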
The group kicked off the session by exchanging ideas about what it means to have a modern data architecture. Atif Salam noted that as recently as a year ago, the primary focus in many organizations was on ingesting data and building data lakes.
Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects.
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Watsonx.data allows customers to augment data warehouses such as Db2 Warehouse and Netezza and optimize workloads for performance and cost.
Skills like effective verbal and written communication will help back up the numbers, while data visualization (specific frameworks in the next section) can help you tell a complete story. Data Wrangling: Data Quality, ETL, Databases, Big Data. The modern data analyst is expected to be able to source and retrieve their own data for analysis.
But before delving deeper into the technical aspects of these tools, let’s quickly understand the core components of a data pipeline, succinctly captured in the image below. [Figure: Data pipeline stages | Source: Author] What does a good data pipeline look like?
An example of an Azure Data Engineer job posting in India might be evaluated as follows: 6-8 years of experience in the IT sector; a solid understanding of ML principles and practical knowledge of statistics, algorithms, and mathematics; and strong data warehousing concepts and knowledge.
To power AI and analytics workloads across your transactional and purpose-built databases, you must ensure they can seamlessly integrate with an open data lakehouse architecture without duplication or additional extract, transform, load (ETL) processes.
Power BI Dataflows provide vital functionality that empowers users to cleanse and reshape data from various sources. These Dataflows are crucial in fostering consistency and reducing the duplication of repetitive ETL (extract, transform, load) steps, achieved by reusing transformations.
Tools such as Python’s Pandas library, Apache Spark, or specialised data cleaning software streamline these processes, ensuring data integrity before further transformation. Step 3: Data Transformation. Data transformation focuses on converting cleaned data into a format suitable for analysis and storage.
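A compact pandas sketch of those cleaning and transformation steps; the dataset and column names are hypothetical:

```python
# Sketch: cleaning then transforming a small dataset with pandas.
# Column names and rules are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "signup_date": ["2024-01-05", "2024-02-10", "2024-02-10", None],
    "plan": ["Pro", "basic", "basic", "PRO"],
})

# Cleaning: drop duplicates and rows missing required fields, normalize categories.
df = df.drop_duplicates().dropna(subset=["signup_date"])
df["plan"] = df["plan"].str.lower()

# Transformation: parse dates and derive an analysis-ready column.
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["signup_month"] = df["signup_date"].dt.to_period("M")
print(df)
```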
Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering.
If the event log is your customer’s diary, think of persistent staging as their scrapbook – a place where raw customer data is collected, organized, and kept for future reference. In traditional ETL (Extract, Transform, Load) processes in CDPs, staging areas were often temporary holding pens for data.
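A minimal sketch of the persistent-staging idea, using SQLite for portability; the schema is a hypothetical illustration, with raw events appended alongside load metadata rather than overwritten:

```python
# Sketch: a persistent staging table that appends raw events with load metadata.
# Uses SQLite for portability; the schema is a hypothetical illustration.
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS stg_events (
           loaded_at TEXT NOT NULL,   -- when the batch landed
           source    TEXT NOT NULL,   -- which system produced it
           payload   TEXT NOT NULL    -- raw event, kept verbatim
       )"""
)

def stage(events: list, source: str) -> None:
    # Append-only: history stays queryable instead of being cleaned out.
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO stg_events VALUES (?, ?, ?)",
        [(now, source, json.dumps(e)) for e in events],
    )

stage([{"event": "page_view", "user": 42}], source="web")
print(conn.execute("SELECT COUNT(*) FROM stg_events").fetchone()[0])
```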
Organizations run millions of Apache Spark applications each month to prepare, move, and process their data for analytics and machine learning (ML). During development, data engineers often spend hours sifting through log files, analyzing execution plans, and making configuration changes to resolve issues.
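For the execution-plan part of that debugging loop, Spark can print a query’s plans directly rather than forcing a dig through logs; a small self-contained sketch:

```python
# Sketch: inspecting a Spark execution plan instead of digging through log files.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("plan-debugging").getOrCreate()

df = (
    spark.range(1_000_000)
    .withColumn("bucket", F.col("id") % 10)
    .groupBy("bucket")
    .count()
)

# Shows the parsed, analyzed, optimized, and physical plans without running the job.
df.explain(mode="extended")
```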
Customers use Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning (ML), data sharing and monetization, and more. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
Let’s delve into the key components that form the backbone of a data warehouse. Source Systems: the operational databases, CRM systems, and other applications that generate the raw data feeding the data warehouse. Data Extraction, Transformation, and Loading (ETL): the workhorse of the architecture.
Generative AI empowers organizations to combine their data with the power of machine learning (ML) algorithms to generate human-like content, streamline processes, and unlock innovation. Since 2023, he has also been working on scaling the use of generative AI in all departments.
By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses and SQL databases, providing a holistic view into business performance. It uses knowledge graphs, semantics and AI/ML technology to discover patterns in various types of metadata.
There are various technologies that help operationalize and optimize the process of field trials, including data management and analytics, IoT, remote sensing, robotics, machine learning (ML), and now generative AI. Multi-source data is initially received and stored in an Amazon Simple Storage Service (Amazon S3) data lake.
His mission is to enable customers to achieve their business goals and create value with data and AI. He helps architect solutions across AI/ML applications, enterprise data platforms, data governance, and unified search in enterprises.