Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: A Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.
In part one of this article, we discussed how data testing can specifically test a data object (e.g., table, column, metadata) at one particular point in the data pipeline.
Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.
Summary: Time series databases (TSDBs) are built for efficiently storing and analyzing data that changes over time. This data, often from sensors or IoT devices, is typically collected at regular intervals. Buckle up as we navigate the intricacies of storing and analyzing this dynamic data.
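To make the "regular intervals" point concrete, here is a minimal sketch (plain pandas, not tied to any particular TSDB) of sensor-style readings sampled every second and rolled up into 5-minute averages; the column name and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings sampled once per second for one hour.
idx = pd.date_range("2024-01-01", periods=3600, freq="s")
readings = pd.DataFrame(
    {"temperature_c": 20 + np.random.randn(3600) * 0.5}, index=idx
)

# Downsample to 5-minute averages, the kind of rollup a TSDB
# typically performs so long-range queries stay cheap.
five_min_avg = readings.resample("5min").mean()
print(five_min_avg.head())
```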
Based on Microsoft’s discussion of the topic, CDC makes it much easier for a data store to accept changes within a database because it propagates only the changed records instead of reloading the affected tables in full. The Second Problem – Quickly Querying Data.
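As a rough illustration of the CDC idea (the event format below is hypothetical, not Microsoft's or any specific tool's), applying a small batch of change events touches only the affected records instead of rewriting the whole table:

```python
# Target table held in memory, keyed by primary key.
target = {
    1: {"id": 1, "name": "Alice", "balance": 100},
    2: {"id": 2, "name": "Bob", "balance": 250},
}

# Hypothetical change events captured from the source database.
change_events = [
    {"op": "update", "id": 2, "row": {"id": 2, "name": "Bob", "balance": 300}},
    {"op": "insert", "id": 3, "row": {"id": 3, "name": "Cara", "balance": 50}},
    {"op": "delete", "id": 1, "row": None},
]

for event in change_events:
    if event["op"] == "delete":
        target.pop(event["id"], None)
    else:
        # Insert or update: only the changed record is written.
        target[event["id"]] = event["row"]

print(target)  # reflects the changes without reloading the whole table
```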
Introduction: The Customer Data Modeling Dilemma. You know, that thing we’ve been doing for years, trying to capture the essence of our customers in neat little profile boxes? Yeah, that one. For years, we’ve been obsessed with creating these grand, top-down customer data models.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
Leveraging Looker’s semantic layer will provide Tableau customers with trusted, governed data at every stage of their analytics journey. With its LookML modeling language, Looker provides a unique, modern approach to define governed and reusable data models to build a trusted foundation for analytics.
The 4 Gen AI Architecture Pipelines. The four pipelines are: 1. The Data Pipeline. The data pipeline is the foundation of any AI system. It is responsible for collecting and ingesting data from various external sources, processing it, and managing it.
If you ask data professionals what the most challenging part of their day-to-day work is, you will likely discover their concerns around managing different aspects of data before they graduate to the data modeling stage. This groundwork ensures that the data is accurate, consistent, and reliable.
Model versioning, lineage, and packaging: Can you version and reproduce models and experiments? Can you see the complete model lineage with data/models/experiments used downstream? Dolt: Dolt is an open-source relational database system with Git-style versioning. Is it fast and reliable enough for your workflow?
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Warehousing: Amazon Redshift, Google BigQuery, etc.
MongoDB for end-to-end AI data management MongoDB Atlas, an integrated suite of data services centered around a multi-cloud NoSQL database, enables developers to unify operational, analytical, and AI data services to streamline building AI-enriched applications.
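A minimal pymongo sketch of that unified pattern, assuming placeholder connection details and a hypothetical `shop.orders` collection: one operational write followed by an analytical aggregation over the same data.

```python
from pymongo import MongoClient

# Connection string and database/collection names are placeholders.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
orders = client["shop"]["orders"]

# Operational write...
orders.insert_one(
    {"order_id": 1001, "customer": "acme", "total": 42.50, "status": "paid"}
)

# ...and an analytical aggregation over the same collection.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "revenue": {"$sum": "$total"}}},
]
for row in orders.aggregate(pipeline):
    print(row)
```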
Over the past few decades, the corporate data landscape has changed significantly. The shift from on-premise databases and spreadsheets to the modern era of cloud data warehouses and AI/LLMs has transformed what businesses can do with data. This is where Fivetran and the Modern Data Stack come in.
DagsHub: DagsHub is a centralized GitHub-based platform that allows Machine Learning and Data Science teams to build, manage, and collaborate on their projects. In addition to versioning code, teams can also version data, models, experiments, and more. It does not support the ‘dvc repro’ command to reproduce its data pipeline.
It is the process of converting raw data into relevant and practical knowledge to help evaluate the performance of businesses, discover trends, and make well-informed choices. Data gathering, data integration, data modelling, data analysis, and data visualization are all part of business intelligence.
And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. However, there are some critical differences between the two companies.
Managing data pipelines efficiently is paramount for any organization. The Snowflake Data Cloud has introduced a groundbreaking feature that promises to simplify and supercharge this process: Snowflake Dynamic Tables. What are Snowflake Dynamic Tables?
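As a hedged sketch of what declaring one looks like (object names, warehouse, and credentials below are placeholders, executed here through the Snowflake Python connector): you state the query and a freshness target, and Snowflake manages the incremental refresh.

```python
import snowflake.connector

# Placeholder credentials and object names; adjust for your account.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="TRANSFORM_WH", database="ANALYTICS", schema="MARTS",
)

# A dynamic table declares the desired result plus a target lag;
# Snowflake keeps it refreshed behind the scenes.
conn.cursor().execute("""
    CREATE OR REPLACE DYNAMIC TABLE daily_orders
      TARGET_LAG = '10 minutes'
      WAREHOUSE = TRANSFORM_WH
    AS
      SELECT order_date, SUM(amount) AS total_amount
      FROM RAW.PUBLIC.orders
      GROUP BY order_date
""")
```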
Having gone public in 2020 with the largest software IPO in history, Snowflake continues to grow rapidly as organizations move to the cloud for their data warehousing needs. Importing data allows you to ingest a copy of the source data into an in-memory database.
Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. One might say that tabular data modeling is the original data-centric AI!
That said, dbt provides the ability to generate Data Vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way. The most important reason for using dbt in Data Vault 2.0
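To illustrate the templating idea behind those macros, here is a plain Jinja2 sketch in Python (not dbt itself, and the column and table names are invented):

```python
from jinja2 import Template

# A reusable SQL fragment parameterized the way a dbt macro would be.
sql_template = Template("""
SELECT
    {{ key_column }} AS hub_key,
    {{ load_date_column }} AS load_date
FROM {{ source_table }}
""")

print(sql_template.render(
    key_column="customer_id",
    load_date_column="loaded_at",
    source_table="staging.customers",
))
```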
Production App - Build resilient and modular production pipelines with automation, scale, testing, observability, versioning, security, risk handling, etc. Monitoring - Monitor all resources, data, model and application metrics to ensure performance. This helps cleanse the data.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up to date.
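One simple way to implement such a validation check is content hashing; the sketch below (the `incoming_media` folder is hypothetical) flags files whose bytes are identical even if their names differ.

```python
import hashlib
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """Hash file contents so identical uploads can be detected by value."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

seen: dict[str, Path] = {}
for path in Path("incoming_media").glob("*"):  # hypothetical landing folder
    if not path.is_file():
        continue
    digest = file_fingerprint(path)
    if digest in seen:
        print(f"Duplicate entry: {path} matches {seen[digest]}")
    else:
        seen[digest] = path
```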
It consolidates data from various systems, such as transactional databases, CRM platforms, and external data sources, enabling organizations to perform complex queries and derive insights.
It involves retrieving data from various sources, such as databases, spreadsheets, or even cloud storage. The goal is to collect relevant data without affecting the source system’s performance. Compatibility with Existing Systems and Data Sources: Compatibility is critical. How to drop a database in SQL Server?
Data Modeling: dbt has gradually emerged as a powerful tool that largely simplifies the process of building and handling data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in one single hub, following the best practices of software engineering.
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. This type of next-generation data store combines a data lake’s flexibility with a data warehouse’s performance and lets you scale AI workloads no matter where they reside.
Data Engineering Career: Unleashing the True Potential of Data. Problem-Solving Skills: Data engineers are required to possess strong analytical and problem-solving skills to navigate complex data challenges. Understanding these fundamentals is essential for effective problem-solving in data engineering.
This includes management vision and strategy, resource commitment, data and tech and operating model alignment, robust risk management and change management. The required architecture includes a data pipeline, ML pipeline, application pipeline and a multi-stage pipeline. Read more here.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Snowflake Database Pros: Extensive Storage Opportunities. Snowflake provides affordability, scalability, and a user-friendly interface.
Under this category, tools with pre-built connectors for popular data sources and visual tools for data transformation are better choices. Integration: How well does the tool integrate with your existing infrastructure, databases, cloud platforms, and analytics tools? What is Fivetran?
What does a modern data architecture do for your business? Modern data architectures like Data Mesh and Data Fabric aim to easily connect new data sources and accelerate the development of use-case-specific data pipelines across on-premises, hybrid, and multicloud environments.
Must-Read Blogs: Exploring the Power of Data Warehouse Functionality. Data Lakes vs. Data Warehouse: Its Significance and Relevance in the Data World. Exploring Differences: Database vs. Data Warehouse. Its clear structure and ease of use facilitate efficient data analysis and reporting.
The Data Source Tool can automate scanning DDL and profiling tables between source and target, comparing them, and then reporting findings. Aside from migrations, Data Source is also great for data quality checks and can generate data pipelines. But you still want to start building out the data model.
Data Engineer: Data engineers are the authors of the infrastructure that stores, processes, and manages the large volumes of data an organization has. The main aspect of their profession is the building and maintenance of data pipelines, which allow for data to move between sources.
DagsHub MLflow By using DagsHub’s MLflow implementation, the remote setup is done for us, eliminating the need to store experiment data locally or host the server ourselves. It additionally covers features such as live logging, experiment database, artifact storage, model registry, and deployment.
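A minimal sketch of pointing MLflow at a DagsHub-hosted tracking server; the URI pattern and credentials below are illustrative placeholders, so check your repository's settings for the exact values.

```python
import os
import mlflow

# Illustrative DagsHub tracking URI; replace <user> and <repo> with your own.
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")
os.environ.setdefault("MLFLOW_TRACKING_USERNAME", "<user>")
os.environ.setdefault("MLFLOW_TRACKING_PASSWORD", "<token>")

# Log a parameter and a metric to the remote experiment tracker.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.91)
```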
It is specially designed for monitoring highly dynamic containerized environments such as Kubernetes and provides powerful features for collecting, querying, visualizing, and alerting on time-series data. Apache Airflow Apache Airflow is an open-source workflow orchestration tool that can manage complex workflows and data pipelines.
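For readers new to Airflow, a small sketch of a two-task DAG (Airflow 2.x style; the task bodies are placeholders) shows the basic workflow definition:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from a hypothetical source")

def load():
    print("writing data to a hypothetical warehouse")

# A two-step pipeline scheduled daily.
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```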
An example of naming an intermediate sub-directory and model file name. Models: The example below illustrates that intermediate models do not need to be physically present in the target database. Downstream Models Dependent on Source: Downstream models (marts or intermediate) should not directly depend on source nodes.
It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, greatly reducing the time spent on data preparation.
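As a rough sketch of the retrieval side, assuming a Feast repository with the `driver_hourly_stats` feature view from Feast's quickstart already applied, fetching online features for inference looks like this:

```python
from feast import FeatureStore

# Assumes a feature repo in the current directory with the quickstart
# `driver_hourly_stats` feature view already registered and materialized.
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```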
Generative AI can be used to automate the data modeling process by generating entity-relationship diagrams or other types of data models, and to assist in the UI design process by generating wireframes or high-fidelity mockups. GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream’s public Bitcoin API.
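To ground the JSON-to-SQL-schema idea without invoking an LLM, here is a small deterministic Python sketch that infers column types from a sample record (a simplification of what the article describes GPT-4 doing):

```python
import json

# Map Python types parsed from JSON to plausible SQL column types.
TYPE_MAP = {str: "TEXT", int: "INTEGER", float: "DOUBLE PRECISION", bool: "BOOLEAN"}

def json_to_create_table(table: str, sample: str) -> str:
    """Build a CREATE TABLE statement from one sample JSON record."""
    record = json.loads(sample)
    cols = ",\n  ".join(
        f"{key} {TYPE_MAP.get(type(value), 'TEXT')}" for key, value in record.items()
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"

print(json_to_create_table("orders", '{"order_id": 1, "total": 9.99, "paid": true}'))
```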
First up, let’s dive into the foundation of every Modern Data Stack, a cloud-based data warehouse. Central Source of Truth for Analytics: A Cloud Data Warehouse (CDW) is a type of database that provides analytical data processing and storage capabilities within a cloud-based infrastructure.
Team composition The team comprises data pipeline engineers, ML engineers, full-stack engineers, and data scientists. Large organizations have geographically spread out data science teams that are generally not aware of what their peers are working on.