MongoDB for end-to-end AI data management: MongoDB Atlas, an integrated suite of data services centered around a multi-cloud NoSQL database, enables developers to unify operational, analytical, and AI data services to streamline building AI-enriched applications. Atlas Vector Search lets you run semantic searches over unstructured data using vector embeddings.
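As a rough illustration, here is a minimal PyMongo sketch of an Atlas Vector Search query; the connection string, database, collection, index name, embedding field, and query vector are placeholders rather than anything specified above.

```python
# Minimal sketch of an Atlas Vector Search query with PyMongo.
# Connection string, database/collection names, index name, and the
# embedding field are placeholders, not values from the article.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
collection = client["docs_db"]["articles"]

# In practice this vector comes from an embedding model and must match
# the dimensionality configured in the Atlas Search index.
query_vector = [0.12, -0.04, 0.33]

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",   # name of the Atlas Vector Search index (assumed)
            "path": "embedding",       # field holding the stored vectors (assumed)
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc)
```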
User support arrangements: consider the availability and quality of support from the provider or vendor, including documentation, tutorials, forums, and customer service. Kubeflow integrates with popular ML frameworks, supports versioning and collaboration, and simplifies the deployment and management of ML pipelines on Kubernetes clusters.
Leveraging Looker’s semantic layer will provide Tableau customers with trusted, governed data at every stage of their analytics journey. With its LookML modeling language, Looker provides a unique, modern approach to define governed and reusable data models to build a trusted foundation for analytics.
That said, dbt provides the ability to generate data vault models and also allows you to write your data transformations using SQL and reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way. The most important reason for using dbt in Data Vault 2.0
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. However, there are some critical differences between the two companies.
It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Increase trust in AI outcomes.
In addition, MLOps practices like building data pipelines, experiment tracking, versioning, artifact management, and others also need to be part of the GenAI productization process. For example, when indexing a new version of a document, it’s important to handle versioning in the ML pipeline; this helps keep the data clean.
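A minimal sketch of what that document versioning could look like, assuming a content hash is used as the version tag; `embed` and `index_client` are hypothetical stand-ins for an embedding model and a search/vector index client, neither of which is named above.

```python
# Hedged sketch: attaching version metadata to a document before (re)indexing,
# so downstream pipeline steps can detect and clean up stale entries.
# `embed` and `index_client` are hypothetical placeholders.
import hashlib
from datetime import datetime, timezone

def build_index_record(doc_id: str, text: str, embed, index_client) -> str:
    # Content hash doubles as a version identifier for this document revision.
    version = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    record = {
        "doc_id": doc_id,
        "version": version,
        "indexed_at": datetime.now(timezone.utc).isoformat(),
        "embedding": embed(text),
    }
    # Upsert keyed on doc_id so an older version of the same document is replaced.
    index_client.upsert(record)
    return version
```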
Machine Learning projects evolve rapidly, frequently introducing new data, models, and hyperparameters. Use cases in ML workflows: Hydra excels in scenarios requiring frequent parameter tuning, such as hyperparameter optimisation, multi-environment testing, and orchestrating pipelines.
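For illustration, a minimal Hydra sketch for sweeping hyperparameters across runs; the config directory `conf/`, config name, and the keys `lr` and `batch_size` are assumptions, not taken from the article.

```python
# Minimal Hydra sketch: compose a config and override/sweep it from the CLI.
# The config file conf/config.yaml and its keys (lr, batch_size) are assumed.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config", version_base=None)
def train(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # show the composed config for this run
    print(f"training with lr={cfg.lr}, batch_size={cfg.batch_size}")

if __name__ == "__main__":
    train()

# Example multirun sweep:
#   python train.py -m lr=0.001,0.01 batch_size=32,64
```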
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: in a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up to date.
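A simple sketch of such a duplicate-entry validation check using pandas; the column names are illustrative only.

```python
# Sketch of a validation check that flags duplicate entries before they
# enter a training pipeline. Column names are illustrative placeholders.
import pandas as pd

def check_duplicates(df: pd.DataFrame, key_columns: list[str]) -> pd.DataFrame:
    """Return all rows that share the same key columns with another row."""
    dupes = df[df.duplicated(subset=key_columns, keep=False)]
    if not dupes.empty:
        print(f"Found {len(dupes)} rows with duplicate keys {key_columns}")
    return dupes

records = pd.DataFrame(
    {"doc_id": ["a1", "a1", "b2"], "text": ["hello", "hello", "world"]}
)
check_duplicates(records, key_columns=["doc_id"])
```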
Even with a composite model, the same respective considerations for Import and DirectQuery hold true. For more information on composite models, check out Microsoft’s official documentation. Creating an efficient data model can be the difference between good and bad performance, especially when using DirectQuery.
Unconstrained, long, open-ended generation that may expose harmful or biased content to users, such as legal document creation. This includes management vision and strategy, resource commitment, data, tech, and operating-model alignment, robust risk management, and change management. Let’s dive into the data management pipeline.
Version control systems (VCS) play a key role in this area by offering a structured method to track changes made to models and handle versions of data and code used in these ML projects. With weak version control, teams could face problems like inconsistent data, model drift, and clashes in their code.
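One common way to version data alongside code is DVC on top of Git; below is a minimal sketch of reading a pinned dataset revision, where the repo URL, file path, and tag are placeholders.

```python
# Hedged sketch: reading a specific, versioned copy of a dataset tracked with DVC.
# Repo URL, path, and revision are placeholders, not from the article.
import dvc.api

with dvc.api.open(
    path="data/train.csv",                      # DVC-tracked file (assumed path)
    repo="https://github.com/org/ml-project",   # placeholder repository
    rev="v1.2.0",                                # git tag pinning data and code together
) as f:
    header = f.readline()
    print(header)
```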
Data Modeling: dbt has gradually emerged as a powerful tool that largely simplifies the process of building and handling data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in a single hub, following software engineering best practices.
Open-Source Community: Airflow benefits from an active open-source community and extensive documentation. IBM Infosphere DataStage: IBM Infosphere DataStage is an enterprise-level ETL tool that enables users to design, develop, and run data pipelines. Read Further: Azure Data Engineer Jobs.
It is the process of converting raw data into relevant and practical knowledge to help evaluate business performance, discover trends, and make well-informed choices. Data gathering, data integration, data modelling, data analysis, and data visualization are all part of business intelligence.
dbt offers a SQL-first transformation workflow that lets teams build data transformation pipelines while following software engineering best practices like CI/CD, modularity, and documentation. Aside from migrations, Data Source is also great for data quality checks and can generate data pipelines.
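For teams wiring dbt into CI/CD, a minimal sketch of a programmatic invocation (available in dbt-core 1.5+); the model selector is a placeholder and the project/profile configuration is assumed to already exist.

```python
# Minimal sketch: invoking dbt programmatically, e.g. from a CI/CD script.
# Requires dbt-core >= 1.5; the selector "stg_orders" is a placeholder model name.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to running `dbt run --select stg_orders` on the command line.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "stg_orders"])

if not res.success:
    raise SystemExit("dbt run failed")
```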
DagsHub: DagsHub is a centralized GitHub-based platform that allows Machine Learning and Data Science teams to build, manage, and collaborate on their projects. In addition to versioning code, teams can also version data, models, experiments, and more. It does not support the ‘dvc repro’ command to reproduce its data pipeline.
This will require investing resources in the entire AI and ML lifecycle, including building the data pipeline, scaling, automation, integrations, addressing risk and data privacy, and more. By doing so, you can ensure quality and production-ready models.
Start by making a data science strategy document. What are we working towards? How can we build up toward our vision in terms of solvable data problems and specific data products? What are the dependencies (e.g. data sources or simpler data models) of the data products we want to build?
Data Pipeline - Manages and processes various data sources. ML Pipeline - Focuses on training, validation, and deployment. Application Pipeline - Manages requests and data/model validations. Multi-Stage Pipeline - Ensures correct model behavior and incorporates feedback loops.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Therefore, you’ll be empowered to truncate and reprocess data if bugs are detected and provide an excellent raw data source for data scientists.
It is specially designed for monitoring highly dynamic containerized environments such as Kubernetes and provides powerful features for collecting, querying, visualizing, and alerting on time-series data. Apache Airflow: Apache Airflow is an open-source workflow orchestration tool that can manage complex workflows and data pipelines.
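A minimal Airflow DAG sketch chaining an extract task and a transform task; the task logic, DAG id, and daily schedule are illustrative assumptions (syntax shown for Airflow 2.4+).

```python
# Minimal Airflow 2.x DAG sketch: two Python tasks run in sequence each day.
# Task bodies, DAG id, and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw data")

def transform():
    print("cleaning and loading data")

with DAG(
    dag_id="example_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # transform runs only after extract succeeds
```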
Business Analyst: Though in many respects quite similar to data analysts, business analysts most often work with a greater focus on industries such as finance, marketing, retail, and consulting. The main aspect of their profession is building and maintaining data pipelines, which allow data to move between sources.
Figure: a typical machine learning pipeline with its various stages highlighted (Source: Author). Common types of machine learning pipelines: in line with the stages of the ML workflow (data, model, and production), an ML pipeline comprises three different pipelines that solve different workflow stages.
Below is a breakdown of the areas where the dbt project evaluator validates your project. Modeling: Direct Join to Source: models should not reference both a source and another model. Downstream Models Dependent on Source: downstream models (marts or intermediate) should not directly depend on source nodes.
It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, greatly reducing the time spent on data preparation.
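A hedged sketch of fetching online features with Feast at inference time; the feature view name, feature names, entity key, and repo path are placeholders that a real project would define in its feature repository.

```python
# Minimal Feast sketch: fetch online feature values for a single entity
# at inference time. Feature view, feature names, and entity key are assumed.
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path to the feature repo (assumed)

features = store.get_online_features(
    features=[
        "driver_stats:avg_trips_per_day",
        "driver_stats:acceptance_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```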
Functional and non-functional requirements need to be documented clearly, since the architecture design will be based on and support them. A typical SDLC has the following stages. Stage 1: Planning and requirement analysis - define requirements by gathering them from the end customer; the software development phases are then planned to deliver the software.
It leads to gaps in communicating the requirements, which are neither understood well nor documented properly. Team composition: the team comprises data pipeline engineers, ML engineers, full-stack engineers, and data scientists.
How do you get executives to understand the value of data governance? First, document your successes with good data and how they happened. Share stories of data in good times and in bad (pictures help!). We’re planning data governance that’s primarily focused on compliance, data privacy, and protection.
Transformation tools of old often lacked easy orchestration, were difficult to test and verify, required specialized knowledge of the tool, and left documentation of your transformations dependent on the willingness of the developer to write it. It should also enable easy sharing of insights across the organization.
Summary: Data engineering tools streamline data collection, storage, and processing. Learning these tools is crucial for building scalable data pipelines. Below are 20 essential tools every data engineer should know.