With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work.
From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their lives much easier.
Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or multiple feature groups.
Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.
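To make those steps concrete, here is a minimal extract-transform-load sketch in pandas. It is only an illustration of the collection-to-delivery flow described above; the file names and columns (events_raw.csv, user_id, event_ts) are hypothetical and not taken from the article.

```python
import pandas as pd

# Extract: pull raw data from a source (a local CSV here; in practice an API,
# a database, or object storage).
raw = pd.read_csv("events_raw.csv")  # hypothetical input file

# Transform: drop incomplete rows, parse timestamps, and aggregate to daily counts.
clean = (
    raw.dropna(subset=["user_id", "event_ts"])
       .assign(event_ts=lambda df: pd.to_datetime(df["event_ts"]))
)
daily_counts = clean.groupby(clean["event_ts"].dt.date).size().rename("events")

# Deliver: write the result where downstream consumers (dashboards, ML training
# jobs) can pick it up.
daily_counts.to_csv("events_daily.csv")
```

Real pipelines add orchestration, retries, and monitoring around each of these stages, but the extract-transform-deliver shape stays the same.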
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.
Alignment to other tools in the organization’s tech stack: consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and Pandas or Apache Spark DataFrames.
Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.
Druid is specifically designed to support workflows where fast ad-hoc analytics, concurrency, and instant data visibility are core necessities. It is easy to integrate with any existing data pipelines, and it can also stream data from the most popular message buses such as Amazon Kinesis and Kafka.
It does not support the ‘dvc repro’ command to reproduce its data pipeline. Released in 2017, Data Version Control (DVC for short) is an open-source tool created by Iterative. However, these tools have functional gaps for more advanced data workflows. This can also make the learning process challenging.
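For readers unfamiliar with how DVC-style data versioning looks in practice, here is a minimal sketch using DVC's Python API. The repository URL, file path, and revision tag are placeholders, and it only shows reading a pinned dataset version, not the full `dvc repro` pipeline workflow.

```python
import dvc.api

# Read a specific version of a DVC-tracked dataset, pinned to a Git revision.
# The repo URL, path, and rev below are placeholders.
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/ml-project",
    rev="v1.2.0",
) as f:
    header = f.readline()
    print(header)
```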
Since AI is a central pillar of their value offering, Sense has invested heavily in a robust engineering organization including a large number of data and AI professionals. This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. Gennaro Frazzingaro, Head of AI/ML at Sense.
Previously, he was a Data & Machine Learning Engineer at AWS, where he worked closely with customers to develop enterprise-scale data infrastructure, including data lakes, analytics dashboards, and ETL pipelines. He specializes in designing, building, and optimizing large-scale data solutions.
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. Their task is to construct and oversee efficient data pipelines.
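As a rough sketch of how features flow out of the store for training (offline) and inference (online), the snippet below uses the sagemaker SDK and boto3. The feature group name, record identifier, and S3 output location are placeholders, and it assumes a feature group already exists and is populated.

```python
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
# Placeholder feature group name; assumes the group already exists.
customers_fg = FeatureGroup(name="customers-feature-group", sagemaker_session=session)

# Offline store: build a training dataset with an Athena query over the group.
query = customers_fg.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://example-bucket/feature-store-query-results/",  # placeholder
)
query.wait()
training_df = query.as_dataframe()

# Online store: fetch a single low-latency record at inference time.
runtime = boto3.client("sagemaker-featurestore-runtime")
record = runtime.get_record(
    FeatureGroupName="customers-feature-group",
    RecordIdentifierValueAsString="customer-123",  # placeholder identifier
)
print(record["Record"])
```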
These tools may have their own versioning system, which can be difficult to integrate with a broader data version control system. For instance, our data lake could contain a variety of relational and non-relational databases, files in different formats, and data stored using different cloud providers. Tools in this space include DVC, Git LFS, and neptune.ai.
This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. The AI/ML team is made up of ML engineers, data scientists, and backend product engineers. With Iguazio, Sense’s data professionals can pull data, analyze it, train and run experiments.
Despite the challenges, Afri-SET, with limited resources, envisions a comprehensive data management solution for stakeholders seeking sensor hosting on their platform, aiming to deliver accurate data from low-cost sensors. With AWS Glue custom connectors, it’s effortless to transfer data between Amazon S3 and other applications.
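The details of Afri-SET's connectors are not shown here, but the basic Glue pattern of landing raw sensor data in Amazon S3 and writing back a curated copy looks roughly like the sketch below. It runs inside a Glue job (the awsglue library only exists in that runtime), and the bucket paths and formats are placeholders.

```python
# Runs inside an AWS Glue job; awsglue is available only in the Glue runtime.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read raw sensor JSON from S3 (placeholder bucket/prefix).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/raw-sensor-data/"]},
    format="json",
)

# Write a curated Parquet copy back to S3 for analytics (placeholder output path).
glue_context.write_dynamic_frame.from_options(
    frame=raw,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated-sensor-data/"},
    format="parquet",
)
```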
Evaluating ML model performance is essential for ensuring the reliability, quality, accuracy and effectiveness of your ML models. In this blog post, we dive into all aspects of ML model performance: which metrics to use to measure performance, best practices that can help and where MLOps fits in. Why Evaluate Model Performance?
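As a small, self-contained illustration of the kind of metrics the post discusses, the following scikit-learn snippet trains a classifier on synthetic data and reports accuracy, F1, and ROC AUC on a held-out split. The dataset and model are stand-ins, not anything from the original article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

# A few common classification metrics; which ones matter depends on the use case.
print("accuracy:", accuracy_score(y_test, pred))
print("f1:      ", f1_score(y_test, pred))
print("roc_auc: ", roc_auc_score(y_test, proba))
```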
Its goal is to help with a quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Apache Superset GitHub | Website Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.
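Even without a dedicated tool, a quick characterization of target balance and train-versus-test feature statistics takes only a few lines of pandas, as in the sketch below; the columns and splits are synthetic placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic feature table standing in for real data.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
df["target"] = y
train, test = train_test_split(df, test_size=0.2, random_state=0)

# Target balance in each split.
print(train["target"].value_counts(normalize=True))
print(test["target"].value_counts(normalize=True))

# Per-feature summary statistics for train vs. test, side by side.
summary = pd.concat({"train": train.describe().T, "test": test.describe().T}, axis=1)
print(summary[[("train", "mean"), ("test", "mean"), ("train", "std"), ("test", "std")]])
```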
Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
A novel approach to solve this complex security analytics scenario combines the ingestion and storage of security data using Amazon Security Lake and analyzing the security data with machine learning (ML) using Amazon SageMaker. Deploy the trained ML model to a SageMaker inference endpoint.
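The deployment step mentioned above typically goes through the SageMaker Python SDK. The sketch below shows the general pattern for standing up a real-time endpoint; the container image, model artifact, IAM role, and endpoint name are all placeholders rather than values from the Security Lake solution.

```python
import sagemaker
from sagemaker.model import Model

# Placeholder container image, model artifact, and IAM role.
role = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"
model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference-image:latest",
    model_data="s3://example-bucket/models/security-model.tar.gz",
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Deploy the trained model behind a real-time SageMaker inference endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="security-findings-endpoint",  # placeholder name
)
# predictor.predict(payload) can then be called with serialized feature data.
```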
Just as a writer needs to know core skills like sentence structure, grammar, and so on, data scientists at all levels should know core data science skills like programming, computer science, algorithms, and so on. As MLOps becomes more relevant to ML, demand for strong software architecture skills will increase as well.
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses and SQL databases, providing a holistic view into business performance. It uses knowledge graphs, semantics and AI/ML technology to discover patterns in various types of metadata.
The PdMS includes AWS services to securely manage the lifecycle of edge compute devices and BHS assets, cloud data ingestion, storage, machine learning (ML) inference models, and business logic to power proactive equipment maintenance in the cloud. It’s an easy way to run analytics on IoT data to gain accurate insights.
However, applying version control to machine learning (ML) pipelines comes with unique challenges. From data prep and model training to validation and deployment, each step is intricate and interconnected, demanding a robust system to manage it all. What are the Pillars of Version Control in ML Pipelines?
An AI governance framework ensures the ethical, responsible and transparent use of AI and machine learning (ML). It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Are foundation models trustworthy?
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects. What is Unstructured Data?
Over time, we called the “thing” a data catalog, blending the Google-style, AI/ML-based relevancy with more Yahoo-style manual curation and wikis. Thus was born the data catalog. In our early days, “people” largely meant data analysts and business analysts; it has since expanded to include ML and DataOps teams, and to assets such as data pipelines to support.
And, as organizations progress and grow, “data drift” starts to impact data usage, models, and your business. In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used.
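One simple way to put numbers on data drift for a single feature is a two-sample Kolmogorov-Smirnov test comparing training-time data against recent production data, as sketched below with synthetic values; this is an illustration of the idea, not a method prescribed by the article.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Reference data from training time vs. new production data with a shifted mean.
reference = rng.normal(loc=0.0, scale=1.0, size=5000)
production = rng.normal(loc=0.4, scale=1.0, size=5000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the feature drifted.
stat, p_value = ks_2samp(reference, production)
print(f"KS statistic={stat:.3f}, p-value={p_value:.4f}")
if p_value < 0.01:
    print("Drift detected; consider retraining or investigating upstream data.")
```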
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise.
Powering a knowledge management system with a data lakehouse: organizations need a data lakehouse to target data challenges that come with deploying an AI-powered knowledge management system. It provides the combination of data lake flexibility and data warehouse performance to help to scale AI.
The repo provides starting-point predictoor bots, which gather historical CEX price data and build AI/ML models. You can gain your own edge — to earn more $ — by changing the bot as you please: more data, better feature vectors, different modeling approaches, and more. We wanted to build out a proper analytics app.
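For a flavor of what “more data, better feature vectors, different modeling approaches” can mean, here is a generic sketch that pulls hourly candles with ccxt and fits a tiny up/down classifier on lagged returns. It is not the repo's predictoor bot code; the exchange, pair, feature set, and model are all illustrative choices.

```python
import ccxt  # assumption: ccxt is used here to pull CEX price history
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Pull hourly BTC/USDT candles from a public exchange API.
ohlcv = ccxt.binance().fetch_ohlcv("BTC/USDT", timeframe="1h", limit=500)
df = pd.DataFrame(ohlcv, columns=["ts", "open", "high", "low", "close", "volume"])

# Feature vector: the last three hourly returns; target: does the next close go up?
df["ret"] = df["close"].pct_change()
for lag in (1, 2, 3):
    df[f"ret_lag{lag}"] = df["ret"].shift(lag)
df["up_next"] = (df["close"].shift(-1) > df["close"]).astype(int)
df = df.dropna()

X = df[["ret_lag1", "ret_lag2", "ret_lag3"]].values
y = df["up_next"].values
model = LogisticRegression().fit(X[:-50], y[:-50])  # hold out the last 50 candles
print("held-out accuracy:", model.score(X[-50:], y[-50:]))
```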
Data Lineage: The Snowsight UI includes lineage visualizations for tables, views, and machine learning (ML) assets to give users a bird’s-eye view of the upstream and downstream object lineage. Security and Encryption: As a cloud-based data platform, Snowflake has always prioritized customer data security.
In that sense, data modernization is synonymous with cloud migration. Modern data architectures, like cloud data warehouses and cloud data lakes, empower more people to leverage analytics for insights more efficiently. Access the resources your data applications need — no more, no less.
Organizations can unite their siloed data and securely share governed data while executing diverse analytic workloads. Snowflake’s engine provides a solution for data warehousing, data lakes, data engineering, data science, data application development, and data sharing.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Machine Learning Integration Opportunities: organizations harness machine learning (ML) algorithms to make forecasts on the data.
For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization.
I am Ali Arsanjani, and I lead partner engineering for Google Cloud, specializing in the area of AI-ML, and I’m very happy to be here today with everyone. Then we’re going to talk about adapting foundation models for the enterprise and how that affects the ML lifecycle, and what we need to potentially add to the lifecycle.
In the case of complex data pipelines, a combination of Materialized Views, Stored Procedures, and Scheduled Queries could be a better choice than relying solely on Scheduled Queries. This allows you to use tools like BigQuery to query the data before it’s migrated to a native BigQuery table.
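As a hedged illustration of offloading part of such a pipeline to a Materialized View (the dataset, table, and column names are placeholders, and this assumes default credentials for the BigQuery Python client):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes application default credentials and a project

# Placeholder dataset/table names. The materialized view precomputes the aggregation,
# so a Scheduled Query only has to handle the remaining, more complex steps.
ddl = """
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_revenue AS
SELECT DATE(order_ts) AS order_date, SUM(amount) AS revenue
FROM analytics.orders
GROUP BY order_date
"""
client.query(ddl).result()  # wait for the DDL statement to finish
```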
One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times.
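“Build once, rerun, and reuse” often starts with something as simple as packaging preprocessing and the model into a single object. The scikit-learn Pipeline sketch below, on a tiny synthetic customer table, is one minimal example of that idea; the columns and task are placeholders.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic customer table standing in for real training data.
train_df = pd.DataFrame({
    "age": [25, 40, 31, 58, 45, 36],
    "tenure_days": [30, 800, 120, 1500, 400, 90],
    "plan": ["basic", "pro", "basic", "pro", "basic", "pro"],
    "churned": [1, 0, 1, 0, 0, 1],
})
features = ["age", "tenure_days", "plan"]

# Build once: preprocessing and the model travel together as one object.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "tenure_days"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
pipeline = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
pipeline.fit(train_df[features], train_df["churned"])

# Rerun and reuse: the same fitted object scores new data, and it can be persisted
# with joblib for later retraining or batch inference.
new_df = train_df.sample(3, random_state=0)
print(pipeline.predict_proba(new_df[features])[:, 1])
```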
Organizations run millions of Apache Spark applications each month to prepare, move, and process their data for analytics and machine learning (ML). During development, data engineers often spend hours sifting through log files, analyzing execution plans, and making configuration changes to resolve issues.
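Rather than reconstructing behavior from log files after the fact, Spark can print its execution plan directly, and common knobs such as the shuffle partition count are plain session configs. A minimal PySpark sketch, with a toy DataFrame as a stand-in for real data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Session with an explicit shuffle-partition setting, one of the knobs that often
# needs tuning for large jobs.
spark = (
    SparkSession.builder.appName("etl-debug-example")
    .config("spark.sql.shuffle.partitions", "8")
    .getOrCreate()
)

df = spark.createDataFrame(
    [("a", 1), ("b", 2), ("a", 3), ("c", 4)], ["key", "value"]
)
agg = df.groupBy("key").agg(F.sum("value").alias("total"))

# Inspect the physical execution plan before running the job.
agg.explain(mode="formatted")
agg.show()
```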
Both persistent staging and data lakes involve storing large amounts of raw data. But persistent staging is typically more structured and integrated into your overall customer data pipeline. It’s not just a dumping ground for data, but a crucial step in your customer data processing workflow.