AWS, Data Pipeline and SQL - Data Science Current

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to provide data protection.

AWS, Data Governance, Data Silos, SQL

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. Create dbt models in dbt Cloud.

ETL, Data Warehouse, Analytics
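
The batch extract, transform, and load steps mentioned in the excerpt above can be sketched as follows. This is a minimal illustration, not the Aurora zero-ETL or dbt Cloud setup from the article: it assumes two in-memory SQLite databases standing in for the operational database and the warehouse, and the table names `orders` and `customer_totals` are hypothetical.

```python
import sqlite3


def extract(conn):
    """Read raw order rows from the (hypothetical) operational table."""
    return conn.execute(
        "SELECT id, customer, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Sum order amounts per customer and convert cents to currency units."""
    totals = {}
    for _order_id, customer, amount_cents in rows:
        totals[customer] = totals.get(customer, 0) + amount_cents
    return [(customer, cents / 100) for customer, cents in sorted(totals.items())]


def load(conn, rows):
    """Write the aggregated rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", rows)
    conn.commit()


def run_batch(source, warehouse):
    """Run one batch: extract from source, transform, load into warehouse."""
    rows = transform(extract(source))
    load(warehouse, rows)
    return rows


# Demo data: SQLite stands in for both systems (an assumption for illustration).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 1250), (2, "bob", 400), (3, "alice", 750)],
)
warehouse = sqlite3.connect(":memory:")
result = run_batch(source, warehouse)
```

Keeping the three stages as separate functions makes each one testable on its own, which is one reason custom pipelines are often structured this way.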

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

AWS Machine Learning Blog

DECEMBER 4, 2024

SageMaker Unified Studio combines various AWS services, including Amazon Bedrock, Amazon SageMaker, Amazon Redshift, AWS Glue, Amazon Athena, and Amazon Managed Workflows for Apache Airflow (MWAA), into a comprehensive data and AI development platform.

AWS, AI, SQL

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Prerequisites: Before you begin, make sure you have the following in place. An AWS account and role with the AWS Identity and Access Management (IAM) privileges to deploy the following resources: IAM roles. A provisioned or serverless Amazon Redshift data warehouse. Basic knowledge of a SQL query editor.

Data Warehouse, Machine Learning, Cloud Data

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.

SQL, Data Lakes, Data Analyst, AWS

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

AWS Machine Learning Blog

DECEMBER 6, 2023

To overcome these limitations, we propose a solution that combines RAG with metadata and entity extraction, SQL querying, and LLM agents, as described in the following sections. Typically, these analytical operations are done on structured data, using tools such as pandas or SQL engines.

SQL, AWS, Analytics

Essential data engineering tools for 2023: Empowering for management and analysis

Data Science Dojo

JULY 6, 2023

Data engineering tools are software applications or frameworks specifically designed to facilitate the process of managing, processing, and transforming large volumes of data. Amazon Redshift: Amazon Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS).

Data Engineering, Data Engineer

The power of remote engine execution for ETL/ELT data pipelines

IBM Journey to AI blog

MAY 15, 2024

Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.

Data Pipeline, ETL, SQL, Database

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines.

AWS, Machine Learning, ML

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

AWS Machine Learning Blog

MARCH 14, 2024

Using structured data to answer questions requires a way to effectively extract data that’s relevant to a user’s query. We formulated a text-to-SQL approach whereby a user’s natural language query is converted to a SQL statement using an LLM. The SQL is run by Amazon Athena to return the relevant data.

SQL, AWS, AI
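
The text-to-SQL flow described in the excerpt above (natural language question, generated SQL, query execution) can be sketched roughly as below. This is an illustration under stated assumptions, not the PGA TOUR implementation: `fake_llm_to_sql` is a hypothetical stand-in for the LLM call, an in-memory SQLite database stands in for Amazon Athena, the `results` table and its rows are made up, and the read-only check is an added safeguard that the excerpt does not mention.

```python
import sqlite3


def fake_llm_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM: maps one known question to SQL."""
    canned = {
        "how many events did each player win?": (
            "SELECT player, COUNT(*) AS wins FROM results "
            "WHERE position = 1 GROUP BY player ORDER BY player"
        ),
    }
    return canned[question.strip().lower()]


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement (an assumed safeguard)."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(question, conn, to_sql=fake_llm_to_sql):
    """Convert the question to SQL, validate it, and run it."""
    sql = to_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


# Demo data: SQLite replaces Athena here purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (player TEXT, event TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("Player A", "Event 1", 1),
        ("Player B", "Event 1", 2),
        ("Player A", "Event 2", 1),
        ("Player B", "Event 3", 1),
    ],
)
rows = answer("How many events did each player win?", conn)
```

Passing the SQL generator in as the `to_sql` parameter keeps the execution path independent of any particular model, so a real LLM client could be swapped in without touching the validation or query code.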

University of British Columbia Cloud Innovation Centre: Prototyping generative AI solutions using AWS

Flipboard

MAY 21, 2025

This post highlights how the UBC CIC uses Amazon Web Services (AWS) to accelerate generative AI development, sharing lessons learned, tools used, and actionable insights you can apply to your projects. Security in generative AI prototyping: UBC CIC observes the shared responsibility model through the Amazon Bedrock Data protection features.

AWS, AI, Database

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.

Data Engineering, Data Engineer

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 12, 2024

Furthermore, the democratization of AI and ML through AWS and AWS Partner solutions is accelerating its adoption across all industries. For example, a health-tech company may be looking to improve patient care by predicting the probability that an elderly patient may become hospitalized by analyzing both clinical and non-clinical data.

ML, AWS, AI

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

In order to train a model using data stored outside of the three supported storage services, the data first needs to be ingested into one of these services (typically Amazon S3). This requires building a data pipeline (using tools such as Amazon SageMaker Data Wrangler) to move data into Amazon S3.

ML, AWS, Python

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline, Python, Data Engineering

Understanding ETL Tools as a Data-Centric Organization

Smart Data Collective

SEPTEMBER 8, 2021

A lot of Open-Source ETL tools house a graphical interface for executing and designing Data Pipelines. It can be used to manipulate, store, and analyze data of any structure. It generates Java code for the Data Pipelines instead of running Pipeline configurations through an ETL Engine.

ETL, Hadoop, Data Warehouse, Data Pipeline

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you’re familiar with SageMaker and writing Spark code, option B could be your choice.

ML, AWS, Data Warehouse

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

Flipboard

NOVEMBER 24, 2023

This use case highlights how large language models (LLMs) are able to become a translator between human languages (English, Spanish, Arabic, and more) and machine interpretable languages (Python, Java, Scala, SQL, and so on) along with sophisticated internal reasoning.

Database, AWS, ETL, SQL

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

phData

JUNE 14, 2023

which play a crucial role in building end-to-end data pipelines, to be included in your CI/CD pipelines. Each migration SQL script is assigned a unique sequence number to facilitate the correct order of application. Additionally, we need to incorporate Flyway variables into the Flyway configuration file.

Data Pipeline, Database, SQL, Data Engineering
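
The excerpt above notes that each migration script carries a unique sequence number that fixes the order of application. A minimal sketch of that ordering rule follows, assuming Flyway's default versioned-migration naming (`V<version>__<description>.sql`); the file names are hypothetical, and the helper functions are illustrative rather than part of Flyway itself.

```python
import re

# V, a version made of numbers joined by "." or "_", a double underscore,
# a description, then the .sql suffix.
MIGRATION_NAME = re.compile(r"^V(\d+(?:[._]\d+)*)__(\w+)\.sql$")


def parse_version(filename):
    """Return the version as a tuple of ints, e.g. 'V2.1__x.sql' -> (2, 1)."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration name: {filename!r}")
    return tuple(int(part) for part in re.split(r"[._]", match.group(1)))


def order_migrations(filenames):
    """Sort scripts by numeric version and reject duplicate versions."""
    by_version = {}
    for name in filenames:
        version = parse_version(name)
        if version in by_version:
            raise ValueError(
                f"duplicate version {version}: {by_version[version]!r} and {name!r}"
            )
        by_version[version] = name
    return [by_version[version] for version in sorted(by_version)]


# Hypothetical script names, deliberately listed out of order.
scripts = [
    "V10__add_index.sql",
    "V2__create_orders.sql",
    "V1__create_schema.sql",
    "V2.1__orders_comment.sql",
]
ordered = order_migrations(scripts)
```

Comparing versions as integer tuples rather than strings matters here: a plain string sort would place `V10` before `V2`. This sketch treats `2` and `2.0` as different versions, which may not match the tool's own comparison rules.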

Deploy generative AI agents in your contact center for voice and chat using Amazon Connect, Amazon Lex, and Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

SEPTEMBER 24, 2024

Working with the AWS Generative AI Innovation Center, DoorDash built a solution to provide Dashers with a low-latency self-service voice experience to answer frequently asked questions, reducing the need for live agent assistance, in just 2 months. You can deploy the solution in your own AWS account and try the example solution.

AWS, AI, Analytics

A Guide to Choose the Best Data Science Bootcamp

Data Science Dojo

JULY 3, 2024

Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.

Data Science, Machine Learning, Data Visualization

Top NLP Skills, Frameworks, Platforms, and Languages for 2023

ODSC - Open Data Science

FEBRUARY 17, 2023

Cloud Computing, APIs, and Data Engineering: NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms: Spark is still the leader for data pipelines but other platforms are gaining ground. Knowing some SQL is also essential.

Data Science, Deep Learning, Natural Language Processing

2021 Data/AI Salary Survey

O'Reilly Media

SEPTEMBER 15, 2021

Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries. Salaries were lower regardless of education or job title.

AI, Azure, AWS

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Big Data: As datasets become larger and more complex, knowing how to work with them will be key.

Data Science, Data Scientist, Computer Science

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.

Data Pipeline, ETL, SQL, Data Quality

Top 5 Data Warehouses to Supercharge Your Big Data Strategy

Women in Big Data

NOVEMBER 27, 2024

How to Choose a Data Warehouse for Your Big Data: Choosing a data warehouse for big data storage necessitates a thorough assessment of your unique requirements. Begin by determining your data volume, variety, and the performance expectations for querying and reporting.

Data Warehouse, Big Data, Azure

Top 5 Fivetran Connectors for Healthcare

phData

APRIL 29, 2024

SQL Server – The SQL Server connector, another widely-used database-type connector, provides similar functionality but is tailored for Microsoft’s SQL Server. The phData team achieved a major milestone by successfully setting up a secure end-to-end data pipeline for a substantial healthcare enterprise.

SQL, Data Warehouse, Azure, Cloud Data

The Data Dilemma: Exploring the Key Differences Between Data Science and Data Engineering

Pickl AI

JULY 25, 2023

Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.

Data Engineering, Data Engineer

Optimizing Matillion Workflows: A Guide to Visual Design and Best Practices

phData

APRIL 28, 2025

Intuitive Workflow Design: Workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code. WHERE d.name = 'Sales'; Matillion is designed as a no/low-code ELT tool, so let’s leave the SQL deep dive for another time and focus on making workflows as clean and intuitive as possible!

AI, SQL, ETL

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is designing, constructing, and managing systems that enable data collection, storage, and analysis. They are crucial in ensuring data is readily available for analysis and reporting.

Data Engineering, Data Engineer

Big Data vs. Data Science: Demystifying the Buzzwords

Pickl AI

APRIL 21, 2025

Key Skills for Data Science: A data scientist typically needs a blend of skills: Mathematics and Statistics: To understand the theoretical underpinnings of models. Programming: Often in languages like Python or R, using libraries for data manipulation, analysis, and machine learning.

Big Data, Data Science, Machine Learning

How to Setup a Project in Snowpark Using a Python IDE

phData

JULY 2, 2024

Snowpark, offered by the Snowflake AI Data Cloud, consists of libraries and runtimes that enable secure deployment and processing of non-SQL code, such as Python, Java, and Scala. Developers can seamlessly build data pipelines, ML models, and data applications with User-Defined Functions and Stored Procedures.

Python, SQL, Data Pipeline, ML

Top ETL Tools: Unveiling the Best Solutions for Data Integration

Pickl AI

JUNE 7, 2024

Summary: Choosing the right ETL tool is crucial for seamless data integration. Top contenders like Apache Airflow and AWS Glue offer unique features, empowering businesses with efficient workflows, high data quality, and informed decision-making capabilities. Scalability: Designed to handle large volumes of data efficiently.

ETL, Data Quality, Data Pipeline, Data Warehouse

Getting Started With Snowflake: Best Practices For Launching

phData

DECEMBER 4, 2023

If you answer “yes” to any of these questions, you will need cloud storage, such as Amazon AWS’s S3, Azure Data Lake Storage, or GCP’s Google Storage. Copy Into: When loading data into Snowflake, the very first and most important rule to follow is: do not load data with SQL inserts!

Clustering, Database, SQL, Data Pipeline

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.

Machine Learning, ML

List of ETL Tools: Explore the Top ETL Tools for 2025

Pickl AI

APRIL 9, 2025

Integration: Can it connect with existing systems like AWS, Azure, or Google Cloud? It supports complex data transformations and offers advanced features like data quality management and metadata management. PowerCenter is particularly favored by large organizations with extensive data integration needs.

ETL, Data Warehouse, AWS, Business Intelligence

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel. Watsonx.data supports our customers’ increasing needs around hybrid cloud deployments and is available on premises and across multiple cloud providers, including IBM Cloud and Amazon Web Services (AWS).

AI, Machine Learning

Best Practices When Developing Matillion Jobs

phData

SEPTEMBER 2, 2024

Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. What Are Matillion Jobs and Why Do They Matter?

ETL, Data Warehouse, SQL, Database

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

phData

FEBRUARY 14, 2023

Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector: StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination.

Data Warehouse, Azure, AWS, Database

How to Optimize Power BI and Snowflake for Advanced Analytics

phData

MAY 25, 2023

The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode. However, Snowflake runs better on Azure than it does on AWS – so even though it’s not the ideal situation, Microsoft still sees Azure consumption when organizations host Snowflake on Azure.

Power BI, Analytics, Azure

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

[Comparison table: tools Dolt, LakeFS, Delta Lake, Pachyderm; features Git-like versioning, database tool, data lake, data pipelines, experiment tracking, integration with cloud platforms, integrations with ML tools.] Examples of data version control tools in ML: DVC (Data Version Control) is a version control system for data and machine learning teams.

ML, Data Lakes, Machine Learning

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

This individual is responsible for building and maintaining the infrastructure that stores and processes data; the kinds of data can be diverse, but most commonly it will be structured and unstructured data. They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable.

Data Engineering, Data Engineer

Building an Effective OSS Management Layer for Your Data Lake

ODSC - Open Data Science

OCTOBER 13, 2024

Open table formats, such as Delta Lake or Apache Iceberg, add a crucial metadata layer over raw data files, allowing us to manage schema, enforce transactions, and even track changes to datasets over time.

Data Lakes, Database, Data Pipeline, SQL

When To Use Internal vs. External Stages in Snowflake

phData

AUGUST 4, 2023

The shared-nothing architecture ensures that users don’t have to worry about distributing data across multiple cluster nodes. Snowflake hides user data objects and makes them accessible only through SQL queries through the compute layer. This includes tasks such as data cleansing, enrichment, and aggregation.

Database, Azure, SQL, AWS
