Prerequisites: Before you begin, make sure you have the following in place: an AWS account and a role with the AWS Identity and Access Management (IAM) privileges to deploy the required resources, including IAM roles, and a provisioned or serverless Amazon Redshift data warehouse. Choose Create stack.
While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. Create dbt models in dbt Cloud.
Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations, and orchestrate these data pipelines in an overall workflow. This ensures flexibility and interoperability while using the unique capabilities of each cloud provider.
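As a hedged illustration of that pattern (every name below is invented for the sketch, not taken from any particular orchestration tool), incremental data-integration tasks can be modeled as plain functions composed into an ordered workflow:

```python
# Illustrative sketch: incremental data-integration tasks composed into a workflow.
# Each task takes the shared pipeline state, performs one operation, and returns it.

def ingest(state):
    # Stand-in for extracting rows from a source system.
    state["raw"] = [1, 2, 3]
    return state

def transform(state):
    # Incremental step: derive cleaned values from the raw rows.
    state["clean"] = [x * 10 for x in state["raw"]]
    return state

def publish(state):
    # Stand-in for loading results into a warehouse table.
    state["published"] = True
    return state

def run_workflow(tasks):
    # Orchestrate the steps in order, threading state through each one.
    state = {}
    for task in tasks:
        state = task(state)
    return state

result = run_workflow([ingest, transform, publish])
```

Real orchestrators add scheduling, retries, and dependency graphs on top of this basic shape, but the core idea of small composable steps is the same.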
For more information about distributed training with SageMaker, refer to the AWS re:Invent 2020 video Fast training and near-linear scaling with DataParallel in Amazon SageMaker and The science behind Amazon SageMaker’s distributed-training engines. In a later post, we will do a deep dive into the DNNs used by ADAS systems.
In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms lacked an easy path to migrate their AI/ML prototypes into industrialized solutions running on AWS.
Each platform offers unique capabilities tailored to varying needs, making the platform a critical decision for any Data Science project. Major Cloud Platforms for Data Science Amazon Web Services ( AWS ), Microsoft Azure, and Google Cloud Platform (GCP) dominate the cloud market with their comprehensive offerings.
We also discuss different types of ETL pipelines for ML use cases and provide real-world examples of their use to help data engineers choose the right one. What is an ETL data pipeline in ML? It is common to use the terms ETL data pipeline and data pipeline interchangeably.
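A minimal sketch of what such an ETL data pipeline does, with in-memory lists standing in for a real source system and warehouse (field names like "amount" are invented for the example):

```python
# Minimal ETL sketch: extract raw records, transform them for ML use,
# and load them into a target store. Lists stand in for real systems.

def extract(rows):
    """Pull raw records from the source (here, an in-memory list)."""
    return list(rows)

def transform(rows):
    """Clean and reshape: drop incomplete records, scale a numeric feature."""
    cleaned = [r for r in rows if r.get("amount") is not None]
    max_amount = max((r["amount"] for r in cleaned), default=0) or 1
    for r in cleaned:
        r["amount_scaled"] = r["amount"] / max_amount
    return cleaned

def load(rows, target):
    """Append transformed records to the target (a list standing in for a table)."""
    target.extend(rows)
    return len(rows)

raw = [{"id": 1, "amount": 50}, {"id": 2, "amount": None}, {"id": 3, "amount": 100}]
warehouse = []
loaded = load(transform(extract(raw)), warehouse)
```

The distinction the excerpt draws is that an ETL pipeline always includes a transform step like the one above, whereas a generic data pipeline may simply move data unchanged.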
As today's world keeps progressing toward data-driven decisions, organizations must have quality data created by efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. It provides a single web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models.
which play a crucial role in building end-to-end data pipelines, to be included in your CI/CD pipelines. End-to-End Data Pipeline Use Case & Flyway Configuration: Let's consider a scenario where you need to ingest and process inventory data on an hourly basis.
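To make this concrete, here is a sketch of how the schema for that hourly inventory ingestion might be versioned with Flyway, which applies SQL files named `V<version>__<description>.sql` in order. The file name and table definition are illustrative assumptions, not the article's actual configuration:

```sql
-- V1__create_inventory_table.sql (illustrative Flyway migration name)
-- Creates the landing table for the hourly inventory ingestion.
CREATE TABLE inventory_raw (
    item_id       INTEGER      NOT NULL,
    quantity      INTEGER      NOT NULL,
    warehouse_id  VARCHAR(16)  NOT NULL,
    loaded_at     TIMESTAMP    NOT NULL
);
```

Because Flyway tracks which versioned migrations have run, schema changes like this can be promoted through environments by the same CI/CD pipeline that deploys the ingestion jobs.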
In this post, we describe how AWS Partner Airis Solutions used Amazon Lookout for Equipment , AWS Internet of Things (IoT) services, and CloudRail sensor technologies to provide a state-of-the-art solution to address these challenges. It’s an easy way to run analytics on IoT data to gain accurate insights.
The cloud represents an iteration beyond the on-prem data warehouse: computing resources are delivered over the internet and managed by a third-party provider. Examples include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Data integrations and pipelines can also impact latency.
Fivetran enables healthcare organizations to ingest data securely and effectively from a variety of sources into their target destinations, such as Snowflake or other cloud data platforms, for further analytics or curation for sharing data with external providers or customers.
Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.
Cloud Computing Many appreciate cloud computing because of its scalability, elasticity, and ability to offer easy access to users across the globe. With the emergence of cloud hyperscalers like AWS, Google, and Microsoft, the shift to the cloud has accelerated significantly.
First, public cloud infrastructure providers like Amazon (AWS), Microsoft (Azure), and Google (GCP) began by offering more cost-effective and elastic resources for fast access to infrastructure. Instead of moving customer data to the processing engine, we move the processing engine to the data.
Source data formats can only be Parquet, JSON, or delimited text (CSV, TSV, etc.). StreamSets Data Collector: StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination.
As enterprise technology landscapes grow more complex, the role of data integration is more critical than ever before. Whether it's a cloud data warehouse or a mainframe, look for vendors with a wide range of capabilities that can adapt to your changing needs. What data governance controls do your solutions have in place?
Talend: Talend is a leading open-source ETL platform that offers comprehensive solutions for data integration, data quality, and cloud data management. It supports both batch and real-time data processing, making it highly versatile. It is well known for its data provenance and seamless data routing capabilities.
This open-source streaming platform enables the handling of high-throughput data feeds, ensuring that data pipelines are efficient, reliable, and capable of handling massive volumes of data in real time. Prefect's design is particularly suited for modern cloud-based data environments.
Python has proven proficient in setting up pipelines, maintaining data flows, and transforming data with its simple syntax and proficiency in automation. Having been built completely for and in the cloud, the Snowflake Data Cloud has become an industry leader in cloud data platforms.
Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. What Are Matillion Jobs and Why Do They Matter?
Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel. Watsonx.data supports our customers' increasing needs around hybrid cloud deployments and is available on premises and across multiple cloud providers, including IBM Cloud and Amazon Web Services (AWS).
They created each capability as modules, which can either be used independently or together to build automated data pipelines. IDF works natively on cloud platforms like AWS. In essence, Alation is acting as a foundational data fabric that Gartner describes as being required for DataOps.
To help, phData designed and implemented AI-powered data pipelines built on the Snowflake AI Data Cloud, Fivetran, and Azure to automate invoice processing. Migrations from legacy on-prem systems to cloud data platforms like Snowflake and Redshift. This is where AI truly shines.
Every company today is being asked to do more with less, and leaders need access to fresh, trusted KPIs and data-driven insights to manage their businesses, keep ahead of the competition, and provide unparalleled customer experiences. But good data, and actionable insights, are hard to get. What is Salesforce Data Cloud for Tableau?
However, if there's one thing we've learned from years of successful cloud data implementations here at phData, it's the importance of defining and implementing processes, building automation, and performing configuration, even before you create the first user account. Use with caution, and test before committing to using them.
One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. Gateways are being used as another layer of security between Snowflake or another cloud data source and Power BI users.
This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.
However, if the tool supports an option where we can write custom programming code to implement features that cannot be achieved with the drag-and-drop components, it broadens the horizon of what we can do with our data pipelines. In this example, the secret is an API key, which will be used later in the pipeline.
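A hedged sketch of that pattern in a custom-code pipeline step: the API key is read at runtime from an environment variable rather than hard-coded. The variable name PIPELINE_API_KEY is an invented example, not the tool's actual configuration:

```python
# Illustrative custom-code step: fetch an API-key secret at runtime.
# PIPELINE_API_KEY is a made-up name; a real pipeline would use whatever
# secret name its secrets manager or CI/CD system provisions.
import os

def get_api_key(env_var="PIPELINE_API_KEY"):
    # Fail fast if the secret was never provisioned for this run.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"secret {env_var} is not set")
    return key

# Stand-in for a value injected by a secrets manager or CI/CD variable:
os.environ["PIPELINE_API_KEY"] = "demo-key"
headers = {"Authorization": f"Bearer {get_api_key()}"}
```

Keeping the secret out of the pipeline definition means the same job can run against dev and prod credentials without any code change.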
In this blog post, we'll dive into the amazing advantages of using Fivetran, a powerful data integration platform that will revolutionize the way you handle your data pipelines. They established an Information Architecture for the Snowflake Data Cloud, enabling automated database and role creation.
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let's dive into the foundation of every Modern Data Stack, a cloud-based data warehouse.