Blog, Data Pipeline and ML - Data Science Current

Lakehouse Monitoring: A Unified Solution for Quality of Data and AI

Databricks

DECEMBER 12, 2023

Introduction: Databricks Lakehouse Monitoring allows you to monitor all your data pipelines, from data to features to ML models, without additional tools.

Data Pipeline

Data Pipeline ML ML AI

10 Technical Blogs for Data Scientists to Advance AI/ML Skills

DataRobot Blog

DECEMBER 6, 2022

With a goal to help data science teams learn about the application of AI and ML, DataRobot shares helpful, educational blogs based on work with the world’s most strategic companies. Explore these 10 popular blogs that help data scientists drive better data decisions.

Data Scientist

Data Scientist ML ML AI

Enhanced observability for AWS Trainium and AWS Inferentia with Datadog

AWS Machine Learning Blog

NOVEMBER 26, 2024

With the increasing use of large models, requiring a large number of accelerated compute instances, observability plays a critical role in ML operations, empowering you to improve performance, diagnose and fix failures, and optimize resource utilization. Anjali Thatte is a Product Manager at Datadog.

AWS

AWS ML ML Data Pipeline

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier.

ETL

ETL Data Pipeline ML ML

The power of remote engine execution for ETL/ELT data pipelines

IBM Journey to AI blog

MAY 15, 2024

Data engineers build data pipelines, which are called data integration tasks or jobs, as incremental steps to perform data operations and orchestrate these data pipelines in an overall workflow. Organizations can harness the full potential of their data while reducing risk and lowering costs.

Data Pipeline

Data Pipeline ETL SQL Database

The ultimate guide to the Machine Learning Model Deployment

Data Science Dojo

JULY 5, 2023

Machine Learning (ML) is a powerful tool that can be used to solve a wide variety of problems. Getting your ML model ready for action: This stage involves building and training a machine learning model using efficient machine learning algorithms. Training and validation: The next step is to train the model on a subset of the data.

Machine Learning

Machine Learning Machine Learning EDA ML

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others. Let’s learn about the services we will use to make this happen.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. Using SageMaker, you can build, train and deploy ML models.

ML

ML ML AWS AI

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or multiple feature groups.

ML

ML ML AWS Data Warehouse

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Mlearning.ai

APRIL 6, 2023

Automate and streamline our ML inference pipeline with SageMaker and Airflow. Building an inference data pipeline on large datasets is a challenge many companies face. The Batch job automatically launches an ML compute instance, deploys the model, and processes the input data in batches, producing the output predictions.

Data Pipeline

Data Pipeline ML ML AWS

Boost your MLOps efficiency with these 6 must-have tools and platforms

Data Science Dojo

FEBRUARY 20, 2023

In this blog, we’ll show you how to boost your MLOps efficiency with 6 essential tools and platforms. Machine learning (ML) is the technology that automates tasks and provides insights. It also has ML algorithms built into the platform.

Machine Learning

Machine Learning Machine Learning AWS Azure

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineering Data Engineering

Feature Platforms - A New Paradigm in Machine Learning Operations (MLOps)

IBM Data Science in Practice

MARCH 8, 2023

The AI and Machine Learning (ML) industry has continued to grow at a rapid rate over recent years. People often believe that money is the solution to a problem.

Machine Learning

Machine Learning Machine Learning ML ML

How to establish lineage transparency for your machine learning initiatives

IBM Journey to AI blog

MAY 20, 2024

Machine learning (ML) has become a critical component of many organizations’ digital transformation strategy. From predicting customer behavior to optimizing business processes, ML algorithms are increasingly being used to make decisions that impact business outcomes.

Machine Learning

Machine Learning Machine Learning Data Scientist ML

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction: Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

10 highest-paying AI jobs and careers in 2024

Data Science Dojo

APRIL 16, 2024

In this blog, we will explore the top 10 AI jobs and careers that are also the highest-paying opportunities for individuals in 2024. Machine learning (ML) engineer: potential pay range of US$82,000 to 160,000/yr. Machine learning engineers are the bridge between data science and engineering.

AI

AI AI Machine Learning Machine Learning

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

AWS Machine Learning: A Beginner’s Guide

How to Learn Machine Learning

DECEMBER 24, 2024

You can easily: store and process data using S3 and Redshift; create data pipelines with AWS Glue; deploy models through API Gateway; monitor performance with CloudWatch; and manage access control with IAM. This integrated ecosystem makes it easier to build end-to-end machine learning solutions.

Machine Learning

Machine Learning Machine Learning AWS ML

Discovering the Role of Data Science in a Cloud World

Pickl AI

DECEMBER 26, 2024

Advancements in data processing, storage, and analysis technologies power this transformation. In Data Science in a Cloud World, we explore how cloud computing has revolutionised Data Science. Key Features Tailored for Data Science: These platforms offer specialised features to enhance productivity.

Data Science

Data Science Cloud Computing Machine Learning Machine Learning

Streamlining Process Configuration in Machine Learning with Hydra

Pickl AI

NOVEMBER 29, 2024

It enhances scalability, experimentation, and reproducibility, allowing ML teams to focus on innovation. This blog highlights the importance of organised, flexible configurations in ML workflows and introduces Hydra. It also simplifies managing configuration dependencies in Deep Learning projects and large-scale data pipelines.

Machine Learning

Machine Learning Machine Learning ML ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize the LTV. In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.

AWS

AWS ML ML ETL

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.

ML

ML ML AWS Data Pipeline

Performance Benefits of Snowpark for ML Workloads

phData

MARCH 22, 2023

As companies continue to adopt machine learning (ML) in their workflows, the demand for scalable and efficient tools has increased. In this blog post, we will explore the performance benefits of Snowpark for ML workloads and how it can help businesses make better use of their data.

ML

ML ML Machine Learning Machine Learning

AIOps vs. MLOps: Harnessing big data for “smarter” ITOPs

IBM Journey to AI blog

AUGUST 12, 2024

Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and ultimately, the bottom line.

Big Data

Big Data Big Data ML ML

Supercharging Your Data Pipeline with Apache Airflow (Part 2)

Heartbeat

NOVEMBER 6, 2023

In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You might be curious how a simple tool like Apache Airflow can be powerful for managing complex data pipelines.

Data Pipeline

Data Pipeline Clean Data ETL Python

OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service

AWS Machine Learning Blog

FEBRUARY 5, 2025

In this two-part blog post series, we explore the key opportunities OfferUp embraced on their journey to boost and transform their existing search solution from traditional lexical search to modern multimodal search powered by Amazon Bedrock and Amazon OpenSearch Service.

K-nearest Neighbors

K-nearest Neighbors Machine Learning Machine Learning Database

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

AWS Machine Learning Blog

APRIL 19, 2023

Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football. After reading a few blog posts and DJL’s official documentation, we were sure DJL would provide the best solution to our problem. Business requirements: We are the US squad of the Sportradar AI department.

ML

ML ML Deep Learning Deep Learning

Managing Dataset Versions in Long-Term ML Projects

The MLOps Blog

MARCH 20, 2023

A long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. An example of a long-term ML project is a bank fraud detection system powered by ML models and algorithms for pattern recognition. Ensuring and maintaining high-quality data.

ML

ML ML Machine Learning Machine Learning

Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects

Iguazio

JULY 31, 2023

MLOps accelerates the ML model deployment process to make it more efficient and scalable. In this blog post, we detail the steps you need to take to build and run a successful MLOps pipeline. An extension of DevOps, MLOps streamlines and monitors ML workflows. MLOps pipelines support a production-first approach.

ML

ML ML Machine Learning Machine Learning

Organizing ML Monorepo With Pants

The MLOps Blog

AUGUST 4, 2023

Situations described above arise way too often in ML teams, and their consequences vary from a single developer’s annoyance to the team’s inability to ship their code as needed. Let’s dive into the world of monorepos, an architecture widely adopted in major tech companies like Google, and how they can enhance your ML workflows.

ML

ML ML Machine Learning Machine Learning

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

AWS Machine Learning Blog

AUGUST 22, 2024

This makes managing and deploying these updates across a large-scale deployment pipeline while providing consistency and minimizing downtime a significant undertaking. Generative AI applications require continuous ingestion, preprocessing, and formatting of vast amounts of data from various sources.

ML

ML ML Python AWS

What Lays Ahead in 2024? AI/ML Predictions for the New Year

Iguazio

DECEMBER 18, 2023

For data science practitioners, productization is key, just like any other AI or ML technology. However, it's important to contextualize generative AI within the broader landscape of AI and ML technologies. By thinking about the ML process in advance: preparing, managing, and versioning data, reusing components, etc.,

ML

ML ML AI AI

Designing generative AI workloads for resilience

AWS Machine Learning Blog

FEBRUARY 1, 2024

Data pipelines: In cases where you need to provide contextual data to the foundation model using the RAG pattern, you need a data pipeline that can ingest the source data, convert it to embedding vectors, and store the embedding vectors in a vector database.

AWS

AWS AI AI Database

How HR Tech Company Sense Scaled their ML Operations using Iguazio

Iguazio

JANUARY 16, 2024

Since AI is a central pillar of their value offering, Sense has invested heavily in a robust engineering organization including a large number of data and AI professionals. This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. Gennaro Frazzingaro, Head of AI/ML at Sense.

ML

ML ML DataOps Data Scientist

How Sense Uses Iguazio as a Key Component of Their ML Stack

Iguazio

JANUARY 16, 2024

This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. The AI/ML team is made up of ML engineers, data scientists and backend product engineers. With Iguazio, Sense’s data professionals can pull data, analyze it, train and run experiments.

ML

ML ML DataOps Data Scientist

Mastering ML Model Performance: Best Practices for Optimal Results

Iguazio

JUNE 25, 2023

Evaluating ML model performance is essential for ensuring the reliability, quality, accuracy and effectiveness of your ML models. In this blog post, we dive into all aspects of ML model performance: which metrics to use to measure performance, best practices that can help and where MLOps fits in.

ML

ML ML Clustering Cross Validation

Building and Scaling Gen AI Applications with Simplicity, Performance and Risk Mitigation in Mind Using Iguazio (acquired by McKinsey) and MongoDB

Iguazio

JULY 22, 2024

In this blog post, we introduce the joint MongoDB - Iguazio gen AI solution, which allows for the development and deployment of resilient and scalable gen AI applications. necessitate the storage and processing of even larger volumes of data in an operationally timely manner. Atlas Vector Search lets you search unstructured data.

AI

AI AI ML ML

Unleashing Innovation and Success: Comet.ml - The Trusted ML Platform for Enterprise Environments

Heartbeat

SEPTEMBER 18, 2023

Machine learning (ML) is a rapidly developing field, and businesses are increasingly depending on ML platforms to fuel innovation, improve efficiency, and mine data for insights.

ML

ML ML Data Scientist Machine Learning

How Did We Get to ML Model Reproducibility

The MLOps Blog

MARCH 14, 2023

When working on real-world ML projects, you come face-to-face with a series of obstacles. The ML model reproducibility problem is one of them. Instead, we tend to spend much time on data exploration, preprocessing, and modeling. This is indeed an erroneous thing to do when working on ML projects at scale.

ML

ML ML Machine Learning Machine Learning

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. Their task is to construct and oversee efficient data pipelines.

AWS

AWS ML ML Machine Learning

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.

Machine Learning

Machine Learning Machine Learning ML ML

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

APRIL 5, 2024

SageMaker geospatial capabilities make it straightforward for data scientists and machine learning (ML) engineers to build, train, and deploy models using geospatial data. Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. Outside work, he is a travel enthusiast.

Clustering

Clustering ML ML AWS

Identify cybersecurity anomalies in your Amazon Security Lake data using Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 20, 2023

A novel approach to solve this complex security analytics scenario combines the ingestion and storage of security data using Amazon Security Lake and analyzing the security data with machine learning (ML) using Amazon SageMaker. Deploy the trained ML model to a SageMaker inference endpoint.

AWS

AWS ML ML Algorithm
