Machine learning (ML) helps organizations increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, and predicting late shipments, among many others.
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
OMRON's data strategy, represented on ODAP, also allowed the organization to unlock generative AI use cases focused on tangible business outcomes and enhanced productivity. When needed, the system can access an ODAP data warehouse to retrieve additional information.
Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services.
Accordingly, one of the most in-demand roles is that of the Azure Data Engineer, a position you might be interested in. The following blog will help you learn about the Azure Data Engineer job description, salary, and certification course. How to Become an Azure Data Engineer?
Data engineering is a rapidly growing field that designs and develops systems that process and manage large amounts of data. There are various architectural design patterns in data engineering that are used to solve different data-related problems.
From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier.
Data engineering in healthcare is taking a giant leap forward with rapid industrial development. Artificial Intelligence (AI) and Machine Learning (ML) are buzzwords these days with developments of ChatGPT, Bard, and Bing AI, among others. Data engineering can serve as the foundation for every data need within an organization.
We also made the case that query and reporting, provided by big data engines such as Presto, need to work with the Spark infrastructure framework to support advanced analytics and complex enterprise data decision-making. To do so, Presto and Spark need to readily work with existing and modern data warehouse infrastructures.
Amazon Lookout for Metrics is a fully managed service that uses machine learning (ML) to detect anomalies in virtually any time-series business or operational metrics—such as revenue performance, purchase transactions, and customer acquisition and retention rates—with no ML experience required. To learn more, see the documentation.
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Organizations can search for PII using methods such as keyword searches, pattern matching, data loss prevention tools, machine learning (ML), metadata analysis, data classification software, optical character recognition (OCR), document fingerprinting, and encryption.
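To make the pattern-matching approach concrete, here is a minimal sketch of a regex-based PII scan in Python. The patterns and sample text are illustrative assumptions, not a production-grade detector.

```python
import re

# Illustrative regexes for common PII shapes (assumptions, not exhaustive):
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scan_for_pii(text: str) -> dict:
    """Return the matches found for each PII category present in the text."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

print(scan_for_pii("Contact jane@example.com or call 555-867-5309."))
```

Real deployments typically combine such rules with the ML classifiers, DLP tools, and OCR mentioned above, since regexes alone miss context-dependent PII.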
You can quickly launch the familiar RStudio IDE and dial the underlying compute resources up and down without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. Data analysis and modeling can be challenging when working with large datasets in the cloud.
Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet, thanks to its compact and highly efficient columnar format. In this walkthrough, you import the Parquet data to Canvas using Athena, then use the imported data to build ML models with Canvas.
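As a minimal illustration of why Parquet suits ML workflows, the sketch below writes and reads a Parquet file with pandas. The file name and columns are hypothetical, and this stands in for (rather than reproduces) the Canvas/Athena flow described in the post.

```python
import pandas as pd

# Write a small dataset to Parquet (requires the pyarrow engine).
# "churn.parquet" and its columns are hypothetical examples.
df = pd.DataFrame({"customer_id": [1, 2, 3], "churned": [0, 1, 0]})
df.to_parquet("churn.parquet", index=False)

# Parquet is columnar, so you can load only the columns a model needs.
features = pd.read_parquet("churn.parquet", columns=["customer_id"])
print(features.head())
```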
An AI governance framework ensures the ethical, responsible and transparent use of AI and machine learning (ML). It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. The development and use of these models explain the enormous number of recent AI breakthroughs.
ETL plays a crucial role in Data Management. This process enables organisations to gather data from various sources, transform it into a usable format, and load it into data warehouses or databases for analysis. In the loading stage, the transformed data is loaded into the target destination, such as a data warehouse.
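As a concrete reference for those three stages, here is a minimal ETL sketch in Python, with sqlite3 standing in for the target data warehouse; the source file and schema are hypothetical.

```python
import csv
import sqlite3

# Extract: read raw rows from a source system ("orders.csv" is hypothetical).
with open("orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: cast types into a usable format before loading.
cleaned = [(r["order_id"], float(r["amount"])) for r in rows]

# Load: write the transformed rows into the target destination.
con = sqlite3.connect("warehouse.db")  # stand-in for a data warehouse
con.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
con.commit()
con.close()
```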
The ultimate need for vast storage spaces manifests in data warehouses: specialized systems that aggregate data coming from numerous sources for centralized management and consistency. In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently.
Modern data warehouses like Snowflake are changing how we load and transform data in our warehouse with no extra tooling or external…
Leverage the Power of MongoDB and Snowflake to Create a Data Warehouse Built for Data Science and Analytics Workflows.
Basically, every machine learning project needs data. However, there are some key differences that we need to consider, starting with the size and complexity of the data: in machine learning, we are often working with much larger datasets. Given the range of tools and data types, a separate data versioning logic will be necessary.
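One simple way to implement that versioning logic is content hashing, so each training run records exactly which bytes it saw. The sketch below is a minimal illustration under that assumption (the file name is hypothetical); dedicated tools such as DVC cover the general case.

```python
import hashlib
from pathlib import Path

def dataset_version(path: str, chunk_size: int = 1 << 20) -> str:
    """Content hash of a data file: identical bytes yield an identical version."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:12]

# Record alongside model metadata ("train.parquet" is a hypothetical file).
print("data version:", dataset_version("train.parquet"))
```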
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud.
Machine learning (ML) is only possible because of all the data we collect. However, with data coming from so many different sources, it doesn’t always come in a format that’s easy for ML models to understand. Before you can take advantage of everything ML offers, much prep work is involved.
Data has to be stored somewhere. Data warehouses are repositories for your cleaned, processed data, but what about all that unstructured data your organization is starting to notice? What is a data lake? Snowflake is a cross-cloud platform that looks to break down data silos.
Why Migrate to a Modern Data Stack? With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. Data teams can focus on delivering higher-value data tasks with better organizational visibility.
We are just weeks away from the AI Expo and Demo Hall. There you’ll hear from Ivan Nardini, Developer Relations Engineer at Google Cloud, and discover the latest advancements in AI and learn how to leverage Google Cloud’s powerful tools and infrastructure to drive innovation in your organization.
Organizations need a data lakehouse to target the data challenges that come with deploying an AI-powered knowledge management system. It provides the combination of data lake flexibility and data warehouse performance to help scale AI.
IBM® Netezza® Performance Server is a cloud-native data warehouse designed to operationalize deep analytics, data mining and BI by unifying, accessing and scaling all types of data across the hybrid cloud.
Data science solves a business problem by understanding the problem, knowing the data that’s required, and analyzing the data to help solve the real-world problem. Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on learning from what data science comes up with.
With the advent of cloud data warehouses and the ability to (seemingly) infinitely scale analytics on an organization’s data, centralizing and using that data to discover what drives customer engagement has become a top priority for executives across all industries and verticals.
EL stands for extract and load, and its primary goal is simply to move the data from one place to another, where the destination is usually a data warehouse or a data lake. The most fundamental difference between ELT and ETL is that the former first loads the data into the target storage and then processes it.
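To make that distinction concrete, here is a minimal ELT sketch: the raw data is loaded first, untransformed, and the transformation runs afterward inside the target store. sqlite3 stands in for the warehouse, and the table names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for a data warehouse

# E + L: land the raw records exactly as extracted.
con.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                [("a1", "19.99"), ("a2", "5.00")])

# T: transform *after* loading, using the target's own SQL engine.
con.execute("""
    CREATE TABLE orders AS
    SELECT order_id, CAST(amount AS REAL) AS amount
    FROM raw_orders
""")
print(con.execute("SELECT * FROM orders").fetchall())
```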
Alation has been leading the evolution of the data catalog to a platform for data intelligence. Higher data intelligence drives higher confidence in everything related to analytics and AI/ML. For example, a data steward can filter all data by “endorsed data” in a Snowflake data warehouse, tagged with “bank account”.
This includes the tools and techniques we used to streamline the ML model development and deployment processes, as well as the measures taken to monitor and maintain models in a production environment. Costs: Oftentimes, cost is the most important aspect of any ML model deployment. This includes data quality, privacy, and compliance.
It uses metadata and data management tools to organize all data assets within your organization. It synthesizes the information across your data ecosystem—from data lakes, data warehouses, and other data repositories—to empower authorized users to search for and access business-ready data for their projects and initiatives.
In our previous blog, we discussed how Fivetran and dbt scale for any data volume and workload, both small and large. Now, you might be wondering what these tools can do for your data team and the efficiency of your organization as a whole. Can these tools help reduce the time our data engineers spend fixing things?
With Snowflake, data stewards have a choice to leverage Snowflake’s governance policies. First, stewards are dependent on data warehouse admins to provide information and to create and edit enforcement policies in Snowflake. Alation’s data lineage helps organizations to secure their data in the Snowflake Data Cloud.
As companies increasingly rely on data for decision-making, poor-quality data can lead to disastrous outcomes. Even the most sophisticated ML models, neural networks, or large language models require high-quality data to learn meaningful patterns. When bad data is inputted, it inevitably leads to poor outcomes.
Without partitioning, daily data activities will cost your company a fortune, and a moment will come when the cost advantage of GCP BigQuery becomes questionable. By keeping the data in cloud storage instead of native BigQuery tables, you can reduce your storage costs while maintaining the ability to query the data.
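A sketch of that setup with the google-cloud-bigquery Python client follows; the project, dataset, bucket, and column names are assumptions for illustration, not a definitive implementation.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # assumes default credentials and project

# External table: Parquet files stay in cheaper Cloud Storage but remain
# queryable from BigQuery. All names below are hypothetical.
external = bigquery.ExternalConfig("PARQUET")
external.source_uris = ["gs://my-bucket/events/*.parquet"]

table = bigquery.Table("my-project.analytics.events_external")
table.external_data_configuration = external
client.create_table(table, exists_ok=True)

# On a date-partitioned native table, a partition filter prunes the scan,
# so the query reads (and bills for) only one day of data.
sql = """
    SELECT COUNT(*) AS n
    FROM `my-project.analytics.events`
    WHERE DATE(event_ts) = '2024-01-01'
"""
print(list(client.query(sql).result()))
```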
Active Learning represents a strategic approach that addresses the fundamental challenge of data annotation: maximizing model performance while minimizing human labeling effort. It enables efficient active learning by iteratively selecting the most valuable data points for labeling, reducing manual effort while improving model performance.
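A minimal illustration of that selection loop is uncertainty sampling: train on a small labeled seed set, then send annotators the pool points the model is least sure about. The sketch below uses scikit-learn on synthetic data; all names and sizes are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 5))        # unlabeled pool (synthetic)
X_seed = rng.normal(size=(20, 5))         # small labeled seed set
y_seed = (X_seed[:, 0] > 0).astype(int)   # toy labels for illustration

model = LogisticRegression().fit(X_seed, y_seed)

# Uncertainty sampling: probabilities closest to 0.5 mark the points the
# model is least confident about, hence the most valuable to label next.
proba = model.predict_proba(X_pool)[:, 1]
to_label = np.argsort(np.abs(proba - 0.5))[:10]
print("indices to send to annotators:", to_label)
```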
Organizations run millions of Apache Spark applications each month to prepare, move, and process their data for analytics and machine learning (ML). During development, data engineers often spend hours sifting through log files, analyzing execution plans, and making configuration changes to resolve issues.
Here’s how a composable CDP might incorporate the modeling approaches we’ve discussed: Data Storage and Processing: This is your foundation. You might choose a cloud data warehouse like the Snowflake AI Data Cloud or BigQuery. Building a composable CDP requires some serious data engineering chops.