Organizations are collecting data at an alarming pace to analyze it and derive insights for business improvements. This growing demand for data collection has made cloud data storage an unavoidable option concerning the […]. The post AWS Storage: Cost Optimization Principles appeared first on Analytics Vidhya.
Principal wanted to use existing internal FAQs, documentation, and unstructured data to build an intelligent chatbot that could provide quick access to the right information for different roles. Principal also used the AWS open source repository Lex Web UI to build a frontend chat interface with Principal branding.
Conventional ML development cycles take weeks to many months and require scarce data science and ML development skills. Business analysts’ ideas for using ML models often sit in prolonged backlogs because of the data engineering and data science teams’ bandwidth and data preparation activities.
By automating the provisioning and management of cloud resources through code, IaC brings a host of advantages to the development and maintenance of data warehouse systems in the cloud. So why use IaC for cloud data infrastructures? The post appeared first on Data Science Blog.
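For illustration only, here is a minimal sketch of what IaC for a small piece of cloud data infrastructure can look like using the AWS CDK for Python; the stack, bucket, and Glue database names are assumptions made for this example, not details from the post.

```python
# Minimal IaC sketch with AWS CDK v2 (Python): a versioned, encrypted landing bucket
# plus a Glue database. All resource names here are illustrative.
import aws_cdk as cdk
from aws_cdk import aws_glue as glue, aws_s3 as s3
from constructs import Construct


class DataPlatformStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Raw-data landing zone with versioning, encryption, and public access blocked.
        s3.Bucket(
            self,
            "RawDataBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )

        # Glue database that downstream warehouse and ETL jobs can register tables in.
        glue.CfnDatabase(
            self,
            "AnalyticsDatabase",
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(name="analytics_raw"),
        )


app = cdk.App()
DataPlatformStack(app, "DataPlatformStack")
app.synth()
```

Because the infrastructure is declared in code, it can be version-controlled, reviewed, and recreated in another account or region with a single `cdk deploy`.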
This post details how Purina used Amazon Rekognition Custom Labels , AWS Step Functions , and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. AWS CodeBuild is a fully managed continuous integration service in the cloud.
In this post, we discuss how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. For example, teams using these platforms lacked an easy way to migrate their AI/ML prototypes into industrialized solutions running on AWS.
Spark is available directly on several cloud platforms, including AWS, Azure, and Google Cloud Platform. However, Apache Spark is more than just a tool; it is the foundation on which most other tools build. Databricks: Databricks is a cloud-based data processing and analytics platform built on Apache Spark.
Data engineering has become an integral part of the modern tech landscape, driving advancements and efficiencies across industries. So let’s explore the world of open-source tools for data engineers, shedding light on how these resources are shaping the future of data handling, processing, and visualization.
The creation of this data model requires a data connection to the source system (e.g., SAP ERP), the extraction of the data and, above all, the data modeling for the event log. DATANOMIQ Data Mesh Cloud Architecture.
Length of Interview: 30–45 minutes. Interview 2: Leadership. In this interview, you will meet with the Director of the Solutions Engineering team. The discussion points in this interview will include a review of your current experience as it relates to cloud data engineering and solution engineering.
This is where the AWS suite of low-code and no-code ML services becomes an essential tool. As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets.
Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.
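To make the Pandas/NumPy point concrete, here is a tiny, self-contained cleaning-and-analysis sketch; the column names and values are invented for illustration.

```python
# Illustrative only: basic cleaning and analysis with pandas and NumPy.
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {
        "region": ["north", "south", "north", None],
        "revenue": ["1200", "950", None, "1100"],
    }
)

clean = (
    raw.assign(
        region=lambda d: d["region"].fillna("unknown"),
        revenue=lambda d: pd.to_numeric(d["revenue"], errors="coerce"),
    )
    .dropna(subset=["revenue"])
    .assign(log_revenue=lambda d: np.log(d["revenue"]))  # NumPy for elementwise math
)

# Simple analysis: average revenue per region.
print(clean.groupby("region")["revenue"].mean())
```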
The Cloud represents an iteration beyond the on-prem data warehouse, where computing resources are delivered over the Internet and are managed by a third-party provider. Examples include: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Data integrations and pipelines can also impact latency.
Data Versioning and Time Travel Open Table Formats empower users with time travel capabilities, allowing them to access previous dataset versions. Versioning also ensures a safer experimentation environment, where data scientists can test new models or hypotheses on historical data snapshots without impacting live data.
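As a concrete, hypothetical illustration, this is what time travel looks like with one popular open table format, Delta Lake on Spark; the table path and version number are made up for the example.

```python
# Sketch of open-table-format time travel using Delta Lake on PySpark.
# Assumes the delta-spark package is installed; the table path is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("time-travel-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Current state of the table.
current = spark.read.format("delta").load("/data/events")

# The same table as it existed at an earlier version (timestampAsOf also works).
snapshot = spark.read.format("delta").option("versionAsOf", 3).load("/data/events")

# A data scientist can experiment on the historical snapshot without touching live data.
print(current.count(), snapshot.count())
```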
Python is the top programming language used by data engineers in almost every industry. Python has proven effective for setting up pipelines, maintaining data flows, and transforming data thanks to its simple syntax and strength in automation. Truly a must-have tool in your data engineering arsenal!
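As a minimal sketch of why that simple syntax keeps pipelines readable, here is a toy extract-transform-load flow in plain Python; the file names, columns, and SQLite target are assumptions for the example.

```python
# Toy data pipeline: extract a CSV, transform it, load it into SQLite.
# All file, table, and column names are illustrative.
import sqlite3

import pandas as pd


def extract(path: str) -> pd.DataFrame:
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["customer_id"])
    df["order_date"] = pd.to_datetime(df["order_date"])
    return df


def load(df: pd.DataFrame, db_path: str, table: str) -> None:
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)


if __name__ == "__main__":
    load(transform(extract("orders.csv")), "warehouse.db", "orders")
```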
There are several styles of data integration. Data engineers build data pipelines, called data integration tasks or jobs, as incremental steps to perform data operations, and they orchestrate these pipelines in an overall workflow.
Reduced personnel costs are often achieved when internal data engineers are available to develop the data models in-house. Higher data readiness, because for a central data platform it is more worthwhile to connect data from less frequently used sources as well. If raw data tables must be loaded into analytics tools such as…
Matillion: Matillion is a complete ETL tool that integrates with an extensive list of pre-built data source connectors, loads data into cloud data environments such as Snowflake, and then performs transformations to make data consumable by analytics tools such as Tableau and Power BI.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities. Savings may vary depending on configurations, workloads and vendor.
Organizations must ensure their data pipelines are well designed and implemented to achieve this, especially as their engagement with cloud data platforms such as the Snowflake Data Cloud grows. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
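For a flavor of what such a pipeline can look like, here is a minimal Snowpark for Python sketch; the connection parameters and table names are placeholders rather than details from the post.

```python
# Minimal Snowpark (Python) pipeline sketch: read, transform, and persist a table.
# Connection parameters and table names are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
}

session = Session.builder.configs(connection_parameters).create()

orders = session.table("RAW.PUBLIC.ORDERS")
daily_revenue = (
    orders.filter(col("STATUS") == "COMPLETE")
    .group_by(col("ORDER_DATE"))
    .agg(sum_(col("AMOUNT")).alias("TOTAL_REVENUE"))
)

# The transformations are pushed down and run inside Snowflake;
# only this write triggers the work.
daily_revenue.write.mode("overwrite").save_as_table("ANALYTICS.PUBLIC.DAILY_REVENUE")
session.close()
```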
Through Impact Analysis, users can determine if a problem occurred with data upstream and locate the impacted data downstream. With robust data lineage, data engineers can find and fix issues fast and prevent them from recurring. Similarly, analysts gain a clear view of how data is created.
In recent years, data engineering teams working with the Snowflake Data Cloud platform have embraced the continuous integration/continuous delivery (CI/CD) software development process to develop data products and manage ETL/ELT workloads more efficiently. What Are the Benefits of a CI/CD Pipeline for Snowflake?
Best practices are a pivotal part of any software development, and data engineering is no exception. Following them ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. What Are Matillion Jobs and Why Do They Matter?
Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with Apache Iceberg data lake format. Amazon S3 is an object storage service from Amazon Web Services (AWS) that offers industry-leading scalability, data availability, security, and performance.
Accenture calls it the Intelligent Data Foundation (IDF), and it’s used by dozens of enterprises with very complex data landscapes and analytic requirements. Simply put, IDF standardizes data engineering processes. IDF works natively on cloud platforms like AWS. Take a look at Figure 1 below.
This article explores the importance of ETL pipelines in machine learning, walks through a hands-on example of building ETL pipelines with a popular tool, and suggests the best ways for data engineers to enhance and sustain their pipelines. Before delving into the technical details, let’s review some fundamental concepts.
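Before getting to the article’s own walkthrough, here is a bare-bones ETL sketch oriented toward ML feature preparation; the file paths and column names are hypothetical, and the Parquet step assumes pyarrow is installed.

```python
# Bare-bones ETL for ML feature prep: extract raw events, derive features, load to Parquet.
# File paths and column names are hypothetical.
import pandas as pd


def extract(path: str) -> pd.DataFrame:
    return pd.read_json(path, lines=True)


def transform(events: pd.DataFrame) -> pd.DataFrame:
    events["ts"] = pd.to_datetime(events["ts"])
    return (
        events.groupby("user_id")
        .agg(n_events=("event", "size"), last_seen=("ts", "max"))
        .reset_index()
    )


def load(features: pd.DataFrame, out_path: str) -> None:
    features.to_parquet(out_path, index=False)  # ready for a training job to consume


if __name__ == "__main__":
    load(transform(extract("events.jsonl")), "features.parquet")
```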
Modern business operations rely heavily on data engineering and transformation processes to turn raw data into valuable insights. Matillion, a robust ELT (Extract, Load, Transform) platform, simplifies data integration and transformation complexities with a no-code or high-code experience. What is a Matillion Job?
One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. Gateways are used as another layer of security between Snowflake (or another cloud data source) and Power BI users.
While data fabric takes a product-and-tech-centric approach, data mesh takes a completely different perspective. Data mesh inverts the common model of having a centralized team (such as a data engineering team) that manages and transforms data for wider consumption. But why is such an inversion needed?
However, if there’s one thing we’ve learned from years of successful cloud data implementations here at phData, it’s the importance of defining and implementing processes, building automation, and performing configuration… even before you create the first user account.
Modern low-code/no-code ETL tools allow data engineers and analysts to build pipelines seamlessly using a drag-and-drop and configure approach with minimal coding. Matillion ETL for Snowflake is an ELT/ETL tool that allows for the ingestion, transformation, and building of analytics for data in the Snowflake AI Data Cloud.
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with newly launched AWS database integrations with Amazon Redshift.
At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to provide data protection.
Azure’s strong focus on security, compliance, and global presence, along with hybrid cloud capabilities and cost management tools, makes it an ideal choice for industrial firms seeking to modernize, innovate, and improve efficiency.
With the birth of cloud data warehouses, data applications, and generative AI, processing large volumes of data faster and cheaper is more approachable and desired than ever. First up, let’s dive into the foundation of every Modern Data Stack, a cloud-based data warehouse.
AWS can play a key role in enabling fast implementation of these decentralized clinical trials. By exploring these AWS-powered alternatives, we aim to demonstrate how organizations can drive progress towards more environmentally friendly clinical research practices.
The workflow includes the following steps: Within the SageMaker Canvas interface, the user composes a SQL query to run against the GCP BigQuery data warehouse. Athena uses the Athena Google BigQuery connector, which uses a pre-built AWS Lambda function to enable Athena federated query capabilities.
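Outside the Canvas interface, the same kind of federated query can be issued programmatically. The sketch below uses boto3 to run a query against a hypothetical BigQuery data source catalog registered in Athena; the catalog, dataset, table, and output bucket names are all assumptions.

```python
# Sketch: run an Athena federated query against a BigQuery connector catalog via boto3.
# "bigquery_catalog", the dataset/table, and the S3 output location are hypothetical.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

start = athena.start_query_execution(
    QueryString='SELECT * FROM "bigquery_catalog"."sales_dataset"."orders" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/federated/"},
)
query_id = start["QueryExecutionId"]

# Poll until the query (and the connector's underlying Lambda function) finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} data rows from BigQuery via Athena")
```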