Cloud Data, Data Engineering and SQL

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts’ ideas to use ML models often sit in prolonged backlogs because of data engineering and data science teams’ bandwidth and data preparation activities.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Cloud Data Science 7

Data Science 101

FEBRUARY 15, 2020

Welcome to Cloud Data Science 7. This issue covers announcements around an exciting new open-source deep learning library, a new data challenge, and more. Amazon Personalize can now use 10x more item attributes: Personalize, a customizable recommendation engine, can now use 50 attributes instead of just 5. Training and Courses.

Cloud Data

Cloud Data Data Science Deep Learning Deep Learning

Why using Infrastructure as Code for developing Cloud-based Data Warehouse Systems?

Data Science Blog

SEPTEMBER 19, 2023

By automating the provisioning and management of cloud resources through code, IaC brings a host of advantages to the development and maintenance of Data Warehouse Systems in the cloud.

Data Warehouse

Data Warehouse Azure SQL Database

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Data Science Dojo

SEPTEMBER 11, 2024

These experiences support professionals across the whole workflow: ingesting data from different sources into a unified environment, pipelining the ingestion, transformation, and processing of data, developing predictive models, and analyzing the data through visualizations in interactive BI reports. In the menu bar on the left, select Workspaces.

Power BI

Power BI Data Pipeline Data Warehouse Data Engineering

Object-centric Process Mining on Data Mesh Architectures

Data Science Blog

NOVEMBER 15, 2023

Simple Data Model for a Process Mining Event Log: As part of data engineering, the data traces that indicate process activities are brought into a log-like schema. A simple event log is therefore a single table with, at minimum, a process number (case ID), a time stamp, and an activity description.
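The minimal event log described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows, the case IDs, and the `traces` helper are hypothetical and do not come from the article.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log rows: (case ID, time stamp, activity description).
EVENT_LOG = [
    ("order-1", "2023-11-01T09:00:00", "Create order"),
    ("order-2", "2023-11-01T09:05:00", "Create order"),
    ("order-1", "2023-11-01T10:30:00", "Approve order"),
    ("order-1", "2023-11-02T08:15:00", "Ship goods"),
    ("order-2", "2023-11-01T11:00:00", "Cancel order"),
]


def traces(event_log):
    """Group events by case ID and order each case's activities by time."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(events)]
        for case_id, events in by_case.items()
    }


case_traces = traces(EVENT_LOG)
print(case_traces["order-1"])
```

Grouping by case ID and sorting by time stamp yields one activity sequence per case, which is the usual starting point for process discovery.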

Data Modeling

Data Modeling Data Models Business Intelligence Business Intelligence

A Guide to Choose the Best Data Science Bootcamp

Data Science Dojo

JULY 3, 2024

Big Data Technologies: Handling and processing large datasets using tools like Hadoop, Spark, and cloud platforms such as AWS and Google Cloud. Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python.

Data Science

Data Science Machine Learning Machine Learning Data Visualization

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML ML Python

What Is Fivetran and How Much Does It Cost?

phData

MARCH 8, 2023

Fivetran is an automated data integration platform that offers a convenient solution for businesses to consolidate and sync data from disparate data sources. With over 160 data connectors available, Fivetran makes it easy to move data out of, into, and across any cloud data platform in the market.

Data Warehouse

Data Warehouse Data Engineering Data Engineering Data Engineering

What Is a Data Lakehouse?

Data Science Blog

MAY 15, 2023

Since the 1980s, data warehousing has been the primary solution for storing and processing data for business intelligence and analytics. However, as the volume and variety of data grew, managing data warehouses became increasingly difficult and expensive.

Data Warehouse

Data Warehouse Data Lakes Azure AWS

Process Mining: Is Celonis Really That Good? A Field Report.

Data Science Blog

SEPTEMBER 3, 2024

Celonis also differs from most other tools in that it tries to provide the entire process mining chain in a single, exclusively cloud-based application within one suite. Perhaps we owe that, to some extent, to Celonis as well. But other processes for other business processes as well, e.g.

Data Science

Data Science Power BI Azure Data Warehouse

The power of remote engine execution for ETL/ELT data pipelines

IBM Journey to AI blog

MAY 15, 2024

There are several styles of data integration. Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow.

Data Pipeline

Data Pipeline ETL SQL Database

Why Open Table Format Architecture is Essential for Modern Data Systems

phData

NOVEMBER 8, 2024

Data Versioning and Time Travel: Open table formats give users time travel capabilities, allowing them to access previous dataset versions. Versioning also ensures a safer experimentation environment, where data scientists can test new models or hypotheses on historical data snapshots without impacting live data.
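The idea of reading an earlier dataset version can be illustrated with a toy in-memory table. This is a conceptual sketch only: the `VersionedTable` class is hypothetical, and real open table formats track snapshots through metadata rather than by copying rows as done here.

```python
class VersionedTable:
    """Toy append-only table that keeps every committed snapshot."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        self._snapshots.append(self._snapshots[-1] + list(rows))
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return a copy of the row list at a version (latest by default)."""
        if version is None:
            version = len(self._snapshots) - 1
        return list(self._snapshots[version])


table = VersionedTable()
v1 = table.commit([{"id": 1, "amount": 10}])
v2 = table.commit([{"id": 2, "amount": 25}])

# "Time travel": read the table as it was at version v1.
old_rows = table.read(version=v1)
# Adding rows to the returned copy does not change the stored snapshots.
old_rows.append({"id": 99, "amount": 0})
print(len(table.read(v1)), len(table.read()))
```

Because `read` returns a new list, experiments that add or drop rows on a historical snapshot leave the latest version untouched. The copy is shallow, so this toy does not protect against in-place edits of individual row dictionaries.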

Data Lakes

Data Lakes Data Warehouse Database Azure

Where Does Fivetran Fit into The Modern Data Stack?

phData

JULY 17, 2023

Over the past few decades, the corporate data landscape has changed significantly. The shift from on-premises databases and spreadsheets to the modern era of cloud data warehouses and AI/LLMs has transformed what businesses can do with data. What is the Modern Data Stack? Data modeling, data cleanup, etc.

Data Warehouse

Data Warehouse Data Pipeline Cloud Data ETL

The Data Engineer’s Roadmap

Dataversity

SEPTEMBER 28, 2022

Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. The journey to becoming a successful data engineer […].

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

How to Create a dbt Custom Materialization

phData

AUGUST 1, 2024

A prime example of this is automating repetitive code performed in many models or implementing a new feature introduced in your cloud data warehouse. Scenarios Now, we need to build the SQL statements. In this case, we have to create it before loading the data. In our case, we need to set up the temporary table SQL first.

SQL

SQL Database Data Warehouse Cloud Data

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

phData

JUNE 14, 2023

In recent years, data engineering teams working with the Snowflake Data Cloud platform have embraced the continuous integration/continuous delivery (CI/CD) software development process to develop data products and manage ETL/ELT workloads more efficiently. What Are the Benefits of CI/CD Pipeline For Snowflake?

Data Pipeline

Data Pipeline Database SQL Data Engineering

Best Practices For Using Snowflake With KNIME

phData

MARCH 29, 2023

However, many analysts and other data professionals run into two common problems: they are not given direct access to their database, and they lack the SQL skills to write the queries themselves. The traditional solution to these problems is to rely on IT and data engineering teams. Only use the data you need.

Database

Database SQL Analytics Analytics

Where to Find Snowflake Training Resources

phData

MARCH 27, 2024

The SnowPro Advanced Administrator Certification targets Snowflake Administrators, Snowflake Data Cloud Administrators, Database Administrators, Cloud Infrastructure Administrators, and Cloud Data Administrators. I found the Data Engineering Simplified’s playlists particularly beneficial during my studies.

Data Engineering

Data Engineering Data Engineering Data Engineering Data Engineer

Best Practices When Developing Matillion Jobs

phData

SEPTEMBER 2, 2024

Best practices are a pivotal part of any software development, and data engineering is no exception. This ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. Below are the best practices.

ETL

ETL Data Warehouse SQL Database

Alation 2023.1: Easing Self-Service for the Modern Data Stack with Databricks and dbt Labs

Alation

APRIL 4, 2023

Profiling delivers a bird’s-eye view of the statistics of the data, such as minimum, maximum, median, and null values. This empowers users to judge data’s quality and fitness for purpose quickly. This expanded connector to Databricks Unity Catalog does just that, delivering to joint customers a comprehensive view of all cloud data.
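The profiling statistics named above can be computed for a single column with the Python standard library. This is a minimal sketch, not Alation's implementation; the `profile` function and the sample column are hypothetical.

```python
from statistics import median


def profile(values):
    """Return min, max, median, and null count for one column."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "median": median(present) if present else None,
        "nulls": len(values) - len(present),
    }


# Hypothetical column with two missing values.
order_amounts = [12.5, None, 40.0, 7.25, None, 19.0]
stats = profile(order_amounts)
print(stats)
```

A high null count or an implausible minimum or maximum is often the first hint that a column is not fit for a given purpose.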

DataOps

DataOps Data Engineering Data Engineering Data Engineering

How Does Snowpark Work?

phData

FEBRUARY 7, 2024

The Snowflake Data Cloud is a leading cloud data platform that provides various features and services for data storage, processing, and analysis. A new feature that Snowflake offers is called Snowpark, which provides an intuitive library for querying and processing data at scale in Snowflake.

Python

Python ML ML SQL

Marketing Questions phData Can Answer with Data

phData

JULY 24, 2024

Utilizing AI and machine learning (ML) models can sound like a daunting task, but it is achievable, especially with the ML engineering experts at phData by your side to guide you in your data journey. Many data engineering consulting companies can answer these questions, and you may have the in-house talent to do it yourself.

Machine Learning

Machine Learning Machine Learning Data Engineering Data Engineering

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

Organizations must ensure their data pipelines are well designed and implemented to achieve this, especially as their engagement with cloud data platforms such as the Snowflake Data Cloud grows. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineering Data Engineering

Retail & CPG Questions phData Can Answer with Data

phData

JUNE 26, 2024

Cleaning and preparing the data: Raw data typically shouldn’t be used in machine learning models because it can throw off the predictions. Data engineers can prepare the data by removing duplicates, dealing with outliers, standardizing data types and precision between data sets, and joining data sets together.
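The four preparation steps listed above can be sketched with plain Python. Everything here is an assumption for illustration: the sample rows, the `prepare` function, and the outlier rule (drop values more than ten times the median) are hypothetical, and a real pipeline would choose rules that fit its data.

```python
from statistics import median

# Hypothetical raw rows, with everything stored as strings.
raw_sales = [
    {"store_id": "1", "amount": "100.0"},
    {"store_id": "1", "amount": "100.0"},  # exact duplicate
    {"store_id": "2", "amount": "95.5"},
    {"store_id": "3", "amount": "110"},
    {"store_id": "2", "amount": "9000"},   # implausible value
]
stores = {1: "North", 2: "South", 3: "East"}


def prepare(rows, store_names, max_ratio=10.0):
    # 1. Standardize data types and precision.
    typed = [(int(r["store_id"]), round(float(r["amount"]), 2)) for r in rows]
    # 2. Remove exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(typed))
    # 3. Drop outliers: amounts above max_ratio times the median.
    mid = median(amount for _, amount in unique)
    kept = [(s, a) for s, a in unique if a <= max_ratio * mid]
    # 4. Join with the store lookup table.
    return [{"store": store_names[s], "amount": a} for s, a in kept]


clean = prepare(raw_sales, stores)
print(clean)
```

Treating identical rows as duplicates is a simplification; data with legitimate repeated transactions would need a proper key.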

Machine Learning

Machine Learning Machine Learning Data Engineering Data Engineering

Healthcare Questions phData Can Answer with Data

phData

JULY 23, 2024

This data can help healthcare providers retain their key talent and save hundreds of thousands of dollars in yearly recruiting costs. Many data engineering consulting companies could also answer these questions for you, or maybe you think your team has the talent to do it in-house. Why phData?

Machine Learning

Machine Learning Machine Learning Data Engineering Data Engineering

Why Upgrade to dbt Cloud over dbt Core?

phData

OCTOBER 12, 2022

It comes with a rather lightweight intellisense, and highlights for both SQL and Jinja use. The real power is the ability to run your models and view the outputs, or even have your SQL compiled to verify that your Jinja or SQL compiles into the correct model. Our team of data experts are happy to assist. Reach out today!

SQL

SQL Data Warehouse Data Visualization Cloud Data

Getting Started With Snowflake: Best Practices For Launching

phData

DECEMBER 4, 2023

However, if there’s one thing we’ve learned from years of successful cloud data implementations here at phData, it’s the importance of defining and implementing processes, building automation, and performing configuration, even before you create the first user account. And once again, for loading data, do not use SQL inserts.

Clustering

Clustering SQL Database Data Pipeline

Why Migrate From Netezza to Snowflake?

phData

JANUARY 4, 2023

Data Sharing: Enterprises can easily create data sharing relationships with direct, governed, and secure sharing in near-real time. With Snowflake, organizations can be data consumers, data providers, or both. Ready to Get Started in the Migration to Snowflake?

Data Warehouse

Data Warehouse SQL Database ETL

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Mlearning.ai

FEBRUARY 16, 2023

Data warehousing is a vital constituent of any business intelligence operation. Companies can build Snowflake databases quickly and use them for ad-hoc analysis by running SQL queries. Machine Learning Integration Opportunities: Organizations harness machine learning (ML) algorithms to make forecasts on the data.

Data Warehouse

Data Warehouse Business Intelligence Business Intelligence Database

How to Optimize Power BI and Snowflake for Advanced Analytics

phData

MAY 25, 2023

One big issue that contributes to this resistance is that although Snowflake is a great cloud data warehousing platform, Microsoft has a data warehousing tool of its own called Synapse. The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode.

Power BI

Power BI Analytics Analytics Azure

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

Proper data preparation leads to better model performance and more accurate predictions. SageMaker Canvas allows interactive data exploration, transformation, and preparation without writing any SQL or Python code. The following diagram shows the SageMaker Canvas data flow after adding visual transformations.

Machine Learning

Machine Learning Machine Learning Data Preparation ML

Manufacturing Questions phData Can Answer with Data

phData

JULY 18, 2024

Many data engineering consulting companies could also answer these questions for you, or maybe you think you have the talent on your team to do it in-house. Expertise Here at phData, we strive to be experts in data engineering, analytics, and machine learning. Why phData? Why should you choose phData to help?

Machine Learning

Machine Learning Machine Learning Natural Language Processing Data Engineering

Top 10 Reasons for Alation with Snowflake: Reduce Risk with Active Data Governance

Alation

SEPTEMBER 7, 2021

Organizations need to ensure that data use adheres to policies (both organizational and regulatory). In an ideal world, you’d get compliance guidance before and as you use the data. Imagine writing a SQL query or using a BI dashboard with flags & warnings on compliance best practice within your natural workflow.

Data Governance

Data Governance Data Scientist Data Quality Data Profiling

Top 5 Use Cases of phData’s Advisor Tool

phData

MARCH 29, 2024

Founded in 2014 by three leading cloud engineers, phData focuses on solving real-world data engineering, operations, and advanced analytics problems with the best cloud platforms and products. Over the years, one of our primary focuses became Snowflake and migrating customers to this leading cloud data platform.

Data Engineering

Data Engineering Data Engineering Data Engineering Data Engineer

Alation 2022.3: Alation Anywhere Connecting the Modern Data Stack

Alation

AUGUST 30, 2022

These range from data sources, including SaaS applications like Salesforce; ELT like Fivetran; cloud data warehouses like Snowflake; and data science and BI tools like Tableau. This expansive map of tools constitutes today’s modern data stack. But different users have different needs.

Data Governance

Data Governance Data Quality Tableau Data Analyst

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started? The rise of cloud computing and cloud data warehousing has catalyzed the growth of the modern data stack.

Data Warehouse

Data Warehouse ETL Tableau Cloud Data

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities. Savings may vary depending on configurations, workloads and vendor.

AI

AI AI Machine Learning Machine Learning

What is ThoughtSpot? Everything You Need to Know

phData

SEPTEMBER 4, 2024

ThoughtSpot is a cloud-based AI-powered analytics platform that uses natural language processing (NLP) or natural language query (NLQ) to quickly query results and generate visualizations without the user needing to know any SQL or table relations. Why Use ThoughtSpot?

Analytics

Analytics Analytics SQL ETL

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

phData

FEBRUARY 14, 2023

Matillion: Matillion is a complete ETL tool that integrates with an extensive list of pre-built data source connectors, loads data into cloud data environments such as Snowflake, and then performs transformations to make data consumable by analytics tools such as Tableau and Power BI.

Data Warehouse

Data Warehouse Azure AWS Database

How Alation’s Data Team Uses the Modern Data Stack to Power Insights

Alation

OCTOBER 27, 2022

Few actors in the modern data stack have inspired the same enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. But what does this mean from a practitioner perspective?

Data Analyst

Data Analyst Data Scientist Analytics Analytics

What is Identity Resolution? A Comprehensive Guide

phData

MAY 6, 2024

Another benefit of deterministic matching is that the process to build these identities is relatively simple, and tools your teams might already use, like SQL and dbt, can efficiently manage this process within your cloud data warehouse.
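Deterministic matching links records that share an exact identifier after normalization. The excerpt names SQL and dbt as the usual tools; the sketch below uses Python purely to show the logic, and the records, the choice of email as the match key, and the `resolve` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records from three source systems.
records = [
    {"source": "crm", "email": "Ana.Diaz@Example.com ", "name": "Ana Diaz"},
    {"source": "web", "email": "ana.diaz@example.com", "name": "A. Diaz"},
    {"source": "pos", "email": "bo.li@example.com", "name": "Bo Li"},
]


def normalize_email(email):
    """Trim whitespace and lowercase so equal addresses compare equal."""
    return email.strip().lower()


def resolve(rows):
    """Group source records into identities keyed by normalized email."""
    identities = defaultdict(list)
    for row in rows:
        identities[normalize_email(row["email"])].append(row["source"])
    return dict(identities)


identities = resolve(records)
print(identities)
```

In a warehouse, the same logic is typically a `GROUP BY` or join on the normalized key, which is why it maps well onto SQL models.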

Data Lakes

Data Lakes Data Warehouse SQL Cloud Data

Data Mesh Architecture and the Data Catalog

Alation

FEBRUARY 8, 2022

While data fabric takes a product-and-tech-centric approach, data mesh takes a completely different perspective. Data mesh inverts the common model of having a centralized team (such as a data engineering team), who manage and transform data for wider consumption. But why is such an inversion needed?

Data Governance

Data Governance Data Engineering Data Engineering Data Engineering
