With their technical expertise and proficiency in programming and engineering, they bridge the gap between data science and software engineering. By recognizing these key differences, organizations can effectively allocate resources, form collaborative teams, and create synergies between machine learning engineers and data scientists.
Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. Solution overview The following diagram illustrates the solution architecture for each option.
The AI and machine learning (ML) industry has continued to grow at a rapid rate in recent years. Hidden Technical Debt in Machine Learning Systems. More money, more problems: the rise of too many ML tools (Spark, Flink, etc.), 2012 vs. 2023 (source: Matt Turck). People often believe that money is the solution to a problem.
Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction: Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.
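To make the steps concrete, here is a minimal Python sketch of such a pipeline; the file names and fields are hypothetical, and each stage is a separate function so it can be tested and re-run independently.

```python
# Minimal ETL-style pipeline sketch: extract -> transform -> load.
import csv
import json
from pathlib import Path

def extract(path: Path) -> list[dict]:
    """Collect raw records from a CSV source."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))

def transform(records: list[dict]) -> list[dict]:
    """Clean and enrich: drop incomplete rows, normalize types."""
    cleaned = []
    for r in records:
        if not r.get("user_id"):
            continue  # skip rows without a key
        r["amount"] = float(r.get("amount", 0))
        cleaned.append(r)
    return cleaned

def load(records: list[dict], dest: Path) -> None:
    """Deliver the processed records as JSON lines."""
    with dest.open("w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

if __name__ == "__main__":
    load(transform(extract(Path("raw_events.csv"))), Path("clean_events.jsonl"))
```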
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment.
This makes managing and deploying these updates across a large-scale deployment pipeline, while maintaining consistency and minimizing downtime, a significant undertaking. Generative AI applications require continuous ingestion, preprocessing, and formatting of vast amounts of data from various sources.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.
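The common pattern behind distributed training, independent of any one managed service, is data parallelism: each worker holds a model replica and gradients are averaged across workers. A minimal PyTorch DistributedDataParallel sketch (dummy model and data, not the article's SageMaker setup) launched with torchrun:

```python
# Launch with e.g.: torchrun --nproc_per_node=2 train_ddp.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("gloo")  # use "nccl" on multi-GPU hosts
    rank = dist.get_rank()
    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    for step in range(100):
        x = torch.randn(32, 10)
        y = torch.randn(32, 1)
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()  # gradients are all-reduced across workers here
        opt.step()
    if rank == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```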
A long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. An example of a long-term ML project would be a bank fraud detection system powered by ML models and algorithms for pattern recognition. 2. Ensuring and maintaining high-quality data.
Situations described above arise way too often in ML teams, and their consequences vary from a single developer’s annoyance to the team’s inability to ship their code as needed. Let’s dive into the world of monorepos, an architecture widely adopted in major tech companies like Google, and how they can enhance your ML workflows.
To get a better grip on those changes, we reviewed over 25,000 data scientist job descriptions from the past year to find out what employers are looking for in 2023. Much of what we found was to be expected, though there were definitely a few surprises. Employers aren't just looking for people who can program.
With its LookML modeling language, Looker provides a unique, modern approach to define governed and reusable data models to build a trusted foundation for analytics. Connecting directly to this semantic layer will help give customers access to critical business data in a safe, governed manner. Direct connection to Google BigQuery.
Definitions: Foundation Models, Gen AI, and LLMs Before diving into the practice of productizing LLMs, let’s review the basic definitions of GenAI elements: Foundation Models (FMs) - Large deep learning models that are pre-trained with attention mechanisms on massive datasets. This helps cleanse the data.
Data scientists and machine learning engineers need to collaborate to make sure that, together with the model, they develop robust data pipelines. These pipelines cover the entire lifecycle of an ML project, from data ingestion and preprocessing to model training, evaluation, and deployment.
These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). ML is often associated with PBAs, so we start this post with an illustrative figure. The ML paradigm is learning followed by inference. The union of advances in hardware and ML has led us to the current day.
And, as organizations progress and grow, "data drift" starts to impact data usage, models, and your business. In today's AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used.
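A minimal illustration of detecting such drift, assuming a single numeric feature and synthetic data, is a two-sample Kolmogorov-Smirnov test comparing training-era values against recent production values:

```python
# Simple univariate drift check: reference (training-time) sample vs.
# recent production data, compared with a KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-era feature
production = rng.normal(loc=0.3, scale=1.0, size=5_000)  # recent, shifted

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:
    print(f"Drift suspected (KS={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected")
```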
Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects. What is Unstructured Data?
One should really think of us at the level of doing the technical implementation work around designing, developing and operationally deploying data products and services that use ML. I’ll give you a rough guide to what we’ll talk about—in the first place, a very macro and micro view of the importance of data.
This is where ML experiment tracking comes into play! What is ML Experiment Tracking? ML experiment tracking is the process of recording, organizing, and analyzing the results of ML experiments. It helps data scientists keep track of their experiments, reproduce their results, and collaborate with others effectively.
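A bare-bones sketch of the idea, recording each run's parameters and metrics to a JSON file so results can be compared and reproduced later (dedicated trackers such as MLflow or neptune.ai add a UI, querying, and collaboration on top of this):

```python
# Minimal experiment tracker: one JSON record per run.
import json
import time
import uuid
from pathlib import Path

class RunTracker:
    def __init__(self, root: str = "runs"):
        self.run_id = uuid.uuid4().hex[:8]
        self.dir = Path(root) / self.run_id
        self.dir.mkdir(parents=True, exist_ok=True)
        self.record = {"run_id": self.run_id, "started": time.time(),
                       "params": {}, "metrics": []}

    def log_params(self, **params):
        self.record["params"].update(params)

    def log_metric(self, name: str, value: float, step: int):
        self.record["metrics"].append({"name": name, "value": value, "step": step})

    def close(self):
        (self.dir / "run.json").write_text(json.dumps(self.record, indent=2))

tracker = RunTracker()
tracker.log_params(lr=3e-4, batch_size=64, model="resnet18")
for step in range(3):
    tracker.log_metric("loss", 1.0 / (step + 1), step)
tracker.close()
```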
This includes the tools and techniques we used to streamline the ML model development and deployment processes, as well as the measures taken to monitor and maintain models in a production environment. Costs: Oftentimes, cost is the most important aspect of any ML model deployment. This includes data quality, privacy, and compliance.
Strategies for improving GPU usage include mixed-precision training, optimizing data transfer and processing, and appropriately dividing workloads between CPU and GPU. GPU and CPU metrics can be monitored using an ML experiment tracker like Neptune, enabling teams to identify bottlenecks and systematically improve training performance.
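As an illustration of the mixed-precision strategy, here is a hedged PyTorch AMP sketch with a dummy model and data; it shows the general pattern, not the article's exact setup:

```python
# Mixed-precision training loop with PyTorch automatic mixed precision.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(10):
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    opt.zero_grad(set_to_none=True)
    # Run the forward pass in reduced precision where safe;
    # master weights stay in float32.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 underflow
    scaler.step(opt)
    scaler.update()
```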
GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream's public Bitcoin API: the data would be interesting to analyze. From Data Engineering to Prompt Engineering: prompts for data analysis and BI report generation. In the BI/data analysis world, people usually need to query data (small or large).
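For the simple flat-JSON case, the schema transformation can even be done without an LLM; here is a hypothetical Python sketch that infers column types from one sample record (an LLM earns its keep on nested or ambiguous data):

```python
# Infer a CREATE TABLE statement from a sample JSON record.
import json

TYPE_MAP = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}

def json_to_create_table(record: dict, table: str) -> str:
    cols = []
    for key, value in record.items():
        sql_type = TYPE_MAP.get(type(value), "TEXT")  # fall back to TEXT
        cols.append(f"    {key} {sql_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"

sample = json.loads('{"txid": "ab12", "fee": 1423, "confirmed": true, "rate": 0.5}')
print(json_to_create_table(sample, "transactions"))
```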
In traditional machine learning, data pipelines feeding into the model have queries written with idempotency in mind, and data validation checks are performed before and after inference to confirm an expected output. If you missed the other blogs in the series, definitely check them out!
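To illustrate idempotency in a pipeline write, here is a small sketch using SQLite with made-up table and column names: re-running the load for a partition replaces it rather than duplicating rows.

```python
# Idempotent partition load: delete-then-insert inside one transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_scores (day TEXT, user_id TEXT, score REAL)")

def load_partition(day: str, rows: list[tuple]):
    with conn:  # all-or-nothing transaction
        conn.execute("DELETE FROM daily_scores WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_scores VALUES (?, ?, ?)",
            [(day, uid, s) for uid, s in rows],
        )

load_partition("2024-01-01", [("u1", 0.9), ("u2", 0.4)])
load_partition("2024-01-01", [("u1", 0.9), ("u2", 0.4)])  # safe re-run
count = conn.execute("SELECT COUNT(*) FROM daily_scores").fetchone()[0]
assert count == 2  # no duplicates after the second run
```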
As the algorithms we use have gotten more robust and we have increased our compute power through new technologies, we haven't made nearly as much progress on the data part of our jobs. Because of this, I'm always looking for ways to automate and improve our data pipelines. So why should we use data pipelines?
Fine-tune Your Own Open-Source SLMs: Devvret Rishi, CEO of Predibase, and Chloe Leung, ML solutions architect at Predibase, show how to cost-effectively customize open-source small language models (SLMs) to outperform GPT-4 on various tasks. Cloning NotebookLM with Open Weights Models: Niels Bantilan, Chief ML Engineer at Union.AI.
I am Ali Arsanjani, and I lead partner engineering for Google Cloud, specializing in the area of AI-ML, and I’m very happy to be here today with everyone. Then we’re going to talk about adapting foundation models for the enterprise and how that affects the ML lifecycle, and what we need to potentially add to the lifecycle.
It is definitely an exciting time as the open-source community enhances and builds out these frameworks, but they are still being refined with best practices and new features. Conclusion This blog has only covered the minimum technologies required to build the bare bones of a generative AI application.
For complex data pipelines, a combination of Materialized Views, Stored Procedures, and Scheduled Queries can be a better choice than relying on Scheduled Queries alone.
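As a rough illustration of the materialized-view half of that combination, assuming a BigQuery context and placeholder project, dataset, and table names, a Python sketch with the google-cloud-bigquery client:

```python
# Create a materialized view so downstream scheduled queries read
# pre-aggregated data instead of rescanning the raw table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

ddl = """
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_revenue_mv AS
SELECT DATE(order_ts) AS day, SUM(amount) AS revenue
FROM analytics.orders
GROUP BY day
"""
client.query(ddl).result()  # BigQuery keeps the view incrementally refreshed
```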
Why Migrate to a Modern Data Stack? Data teams can focus on delivering higher-value data tasks with better organizational visibility. Move Beyond One-off Analytics: The Modern Data Stack empowers you to elevate your data for advanced analytics and integration of AI/ML, enabling faster generation of actionable business insights.
I checked the AWS S3 bucket and Snowflake tables for a couple of days, and the data pipeline is working as expected. The scope of this article is quite broad: we will exercise the core steps of data science. Let's get started… Project Layout Here are the high-level steps for this project.
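The kind of spot check described might look like the following boto3 sketch (bucket and prefix are placeholders), listing the newest objects in the landing bucket to confirm files are arriving:

```python
# List the five most recently modified objects under the landing prefix.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-landing-bucket", Prefix="snowflake/stage/")
for obj in sorted(resp.get("Contents", []), key=lambda o: o["LastModified"])[-5:]:
    print(obj["LastModified"], obj["Key"], obj["Size"])
```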
Data scientists use data-driven approaches to enable AI systems to make better predictions, optimize decision-making, and uncover hidden patterns that ultimately drive innovation and improve performance across various domains. It includes techniques like supervised, unsupervised, and reinforcement learning.
There comes a time when every ML practitioner realizes that training a model in a Jupyter Notebook is just one small part of the entire project. Getting a workflow ready that takes your data from its raw form to predictions, while maintaining responsiveness and flexibility, is the real deal.
All this raw data goes into your persistent stage. Then, if you later refine your definition of what constitutes an “engaged” customer, having the raw data in persistent staging allows for easy reprocessing of historical data with the new logic. Your customer data game will never be the same.
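A toy pandas sketch of that replay idea, with a made-up events table and a hypothetical engaged_v2 definition, is shown below; because the raw events are kept immutable in the persistent stage, the new logic can simply be recomputed over full history:

```python
# Replay a new "engaged" definition over immutable raw events.
import pandas as pd

raw = pd.DataFrame({  # stand-in for the persistent staging table
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "event": ["view", "purchase", "view", "view", "view", "purchase"],
})

def engaged_v2(events: pd.DataFrame) -> pd.Series:
    """New logic: engaged = at least one purchase OR 3+ events."""
    g = events.groupby("customer_id")
    return (g["event"].apply(lambda e: (e == "purchase").any()) |
            (g.size() >= 3))

print(engaged_v2(raw))  # recomputed over full history, no backfill pain
```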
In this post, we explore how you can use Amazon Bedrock to generate high-quality categorical ground truth data, which is crucial for training machine learning (ML) models in a cost-sensitive environment. This use case, solvable through ML, can enable support teams to better understand customer needs and optimize response strategies.
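As a hedged sketch of the labeling pattern (not the post's actual prompt, labels, or model), using the Bedrock Converse API via boto3 with an illustrative model ID and label set:

```python
# Ask a Bedrock-hosted model to assign one categorical label per ticket.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
LABELS = ["billing", "technical_issue", "account_access", "other"]

def label_ticket(text: str) -> str:
    prompt = (f"Classify this support ticket into exactly one of {LABELS}. "
              f"Reply with only the label.\n\nTicket: {text}")
    resp = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"].strip()

print(label_ticket("I was charged twice for my subscription this month."))
```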
The agent can generate SQL queries from natural language questions using the database schema DDL (data definition language) and execute them against a database instance for the database tier. We use Amazon Bedrock Agents with two knowledge bases for this assistant.
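The core of that text-to-SQL pattern is prompt construction: hand the model the schema DDL plus the question and ask for one SQL statement back. A hypothetical sketch (illustrative DDL and question, not the post's):

```python
# Build a text-to-SQL prompt from schema DDL and a natural language question.
SCHEMA_DDL = """
CREATE TABLE customers (id INT PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT, total NUMERIC,
                     ordered_at DATE);
"""

def build_sql_prompt(question: str) -> str:
    return (
        "You are a SQL assistant. Using only the tables below, write one "
        "SQL query that answers the question. Return SQL only.\n\n"
        f"Schema:\n{SCHEMA_DDL}\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_sql_prompt("Total order value per region last month?")
# `prompt` would then be sent to the agent's model, and the returned SQL
# executed (ideally read-only) against the database tier.
print(prompt)
```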