Microsoft Fabric aims to reduce unnecessary data replication, centralize storage, and create a unified environment with its unique data fabric method. Microsoft Fabric is a cutting-edge analytics platform that helps data experts and companies work together on data projects. What is Microsoft Fabric?
Data analytics helps to determine the success of a business; data-driven analytics ultimately helps to drive change. Impact Of Data-Driven Analytics. Several companies today claim to be part of the data-driven world. How Is Data-Driven Analytics Being Helpful?
As organizations steer their business strategies to become data-driven decision-making organizations, data and analytics are more crucial than ever before. The concept was first introduced back in 2016 but has gained more attention in the past few years as the amount of data has grown.
One of the key elements that builds a data fabric architecture is to weave integrated data from many different sources, transform and enrich data, and deliver it to downstream data consumers. Studies have shown that 80% of time is spent on data preparation and cleansing, leaving only 20% of time for data analytics.
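The "weaving" step the excerpt describes can be illustrated with a minimal pandas sketch: join records from two hypothetical source systems on a shared key, then enrich the unified view with a derived field for downstream consumers. All table and column names here are made up for illustration.

```python
import pandas as pd

# Two hypothetical source systems: a CRM export and a billing export.
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "name": ["Ada", "Grace", "Alan"]})
billing = pd.DataFrame({"customer_id": [1, 2],
                        "total_spend": [120.0, 75.5]})

# "Weave" the sources: join on the shared key, then enrich with a
# derived field for downstream consumers.
unified = crm.merge(billing, on="customer_id", how="left")
unified["total_spend"] = unified["total_spend"].fillna(0.0)
unified["is_active"] = unified["total_spend"] > 0

print(unified.to_dict("records"))
```

In a real data fabric the joins span many systems and formats, but the pattern (integrate on shared keys, enrich, deliver) is the same.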
Conventional ML development cycles take weeks to months and require scarce data science understanding and ML development skills. Business analysts’ ideas for using ML models often sit in prolonged backlogs because of the data engineering and data science teams’ bandwidth constraints and data preparation activities.
Alteryx and the Snowflake Data Cloud offer a potential solution to this issue and can speed up your path to Analytics. In this blog post, we will explore how Alteryx and Snowflake can accelerate your journey to Analytics by sharing use cases and best practices. What is Alteryx? What is Snowflake?
Paxata was a Silver Sponsor at the recent Gartner Data and Analytics Summit in Grapevine, Texas. Although some product solutions disrupted the operational reporting market, they require users to know the questions they need to ask their data. 2) Line of business is taking a more active role in data projects.
Whereas AIOps is a comprehensive discipline that includes a variety of analytics and AI initiatives that are aimed at optimizing IT operations, MLOps is specifically concerned with the operational aspects of ML models, promoting efficient deployment, monitoring and maintenance.
This offering enables BMW ML engineers to perform code-centric data analytics and ML, increases developer productivity by providing self-service capability and infrastructure automation, and tightly integrates with BMW’s centralized IT tooling landscape.
In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks.
This competition emphasized leveraging analytics in one of the world’s fastest and most data-intensive sports. Yunus focused on building a robust data pipeline, merging historical and current-season data to create a comprehensive dataset.
The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring. Rushikesh Jagtap is a Solutions Architect with 5+ years of experience in AWS Analytics services.
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development. Above all, this solution offers you a native Spark way to implement an end-to-end data pipeline from Amazon Redshift to SageMaker.
Data Engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. It involves developing data pipelines that efficiently transport data from various sources to storage solutions and analytical tools. ETL is vital for ensuring data quality and integrity.
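The extract-transform-load pattern mentioned above can be sketched in a few lines of plain Python. The source records, field names, and in-memory "warehouse" are purely illustrative stand-ins for real systems.

```python
# A minimal extract-transform-load sketch in plain Python; the source
# records and field names are illustrative, not from any real system.

def extract():
    # In practice this would read from an API, file, or database.
    return [{"name": " Ada ", "signup": "2023-01-05"},
            {"name": "grace", "signup": "2023-02-11"}]

def transform(rows):
    # Enforce data quality: trim whitespace, normalize casing.
    return [{"name": r["name"].strip().title(), "signup": r["signup"]}
            for r in rows]

def load(rows, target):
    # Here the "warehouse" is just a list; real pipelines write to storage.
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```

Real ETL tools add scheduling, retries, and schema enforcement on top, but the extract/transform/load separation is the core idea.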
If useful, it can be further extended to a data lake platform that uses AWS Glue (a serverless data integration service for data preparation) and Amazon Athena (a serverless and interactive analytics service) to analyze and visualize data.
This post is co-written with Suhyoung Kim, General Manager at KakaoGames Data Analytics Lab. Continuous ML model retraining is one method to overcome this challenge by relearning from the most recent data. To solve this problem, we had to design a strong data pipeline to create the ML features from the raw data and MLOps.
Efficient data transformation and processing are crucial for data analytics and generating insights. Snowflake AI Data Cloud is one of the most powerful platforms, including storage services supporting complex data. Integrating Snowflake with dbt adds another layer of automation and control to the data pipeline.
Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations, and a variety of modeling methods. (Vitaly Tsivin, EVP Business Intelligence at AMC Networks.)
See also Thoughtworks’s guide to Evaluating MLOps Platforms. End-to-end MLOps platforms provide a unified ecosystem that streamlines the entire ML workflow, from data preparation and model development to deployment and monitoring. Flyte is a platform for orchestrating ML pipelines at scale.
In order to train a model using data stored outside of the three supported storage services, the data first needs to be ingested into one of these services (typically Amazon S3). This requires building a data pipeline (using tools such as Amazon SageMaker Data Wrangler) to move data into Amazon S3.
Summary: Data transformation tools streamline data processing by automating the conversion of raw data into usable formats. These tools enhance efficiency, improve data quality, and support Advanced Analytics like Machine Learning. BI tools rely on high-quality, consistent data to generate accurate insights.
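The conversion of raw data into usable formats that the summary describes can be sketched with pandas: parse string dates into typed timestamps, cast numeric strings to floats, and drop unusable rows. The column names and values here are invented for illustration.

```python
import pandas as pd

# Raw data as a transformation tool might receive it: strings, mixed
# types, missing values (all values here are made up for illustration).
raw = pd.DataFrame({
    "order_date": ["2024-01-03", "2024-01-04", None],
    "amount": ["10.50", "7", "3.25"],
})

# Convert to typed, analysis-ready columns: drop rows with no date,
# parse dates, and cast amounts to numeric.
clean = raw.dropna(subset=["order_date"]).copy()
clean["order_date"] = pd.to_datetime(clean["order_date"])
clean["amount"] = clean["amount"].astype(float)

print(clean.dtypes)
```

Dedicated transformation tools automate exactly these kinds of type conversions and quality checks at scale, which is why downstream BI tools can rely on consistent data.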
However, for analytics warehouses, you may need to scale for usage. Knowing this, you want to have data prepared in a way that optimizes your load. Data Pipelines: "Data pipeline" means moving data in a consistent, secure, and reliable way at some frequency that meets your requirements.
With sports (and everything else) cancelled, this data scientist decided to take on COVID-19 | A Winner’s Interview with David Mezzetti When his hobbies went on hiatus, Kaggler David Mezzetti made fighting COVID-19 his mission. He previously co-founded and built Data Works into a 50+ person well-respected software services company.
By following this structured approach, businesses can consolidate data from multiple origins, ensuring a unified view for analysis and reporting. The Role of ETL in Data Warehousing and Analytics ETL plays a pivotal role in data warehousing and analytics by facilitating the smooth movement of data across different systems.
Introduction Data Science is revolutionising industries by extracting valuable insights from complex data sets, driving innovation, and enhancing decision-making. This roadmap aims to guide aspiring Azure Data Scientists through the essential steps to build a successful career.
Under this category, tools with pre-built connectors for popular data sources and visual tools for data transformation are better choices. Integration: How well does the tool integrate with your existing infrastructure, databases, cloud platforms, and analytics tools? Another way is to add the Snowflake details through Fivetran.
Snowpark Use Cases: Data Science. Streamlining data preparation and pre-processing: Snowpark’s Python, Java, and Scala libraries allow data scientists to use familiar tools for wrangling and cleaning data directly within Snowflake, eliminating the need for separate ETL pipelines and reducing context switching.
Because the machine learning lifecycle has many complex components that reach across multiple teams, it requires close-knit collaboration to ensure that hand-offs occur efficiently, from data preparation and model training to model deployment and monitoring. How to use ML to automate the refining process into a cyclical ML process.
The financial services industry (FSI) is no exception to this, and is a well-established producer and consumer of data and analytics. These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). The union of advances in hardware and ML has led us to the current day.
Data preparation, train and tune, deploy and monitor. We have data pipelines and data preparation. In the data pipeline phase—I’m just going to call out things that I think are more important than the obvious. So the basic ones: you collect, validate, and prepare data.
A traditional machine learning (ML) pipeline is a collection of various stages that include data collection, data preparation, model training and evaluation, hyperparameter tuning (if needed), model deployment and scaling, monitoring, security and compliance, and CI/CD.
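The stages listed above can be sketched as composable steps in plain Python. The "model" here is a trivial mean predictor standing in for real model fitting; the data and functions are purely illustrative.

```python
# A sketch of traditional ML pipeline stages as composable steps;
# the "model" is a trivial mean predictor, purely illustrative.

def collect_data():
    # Stand-in for reading from a feature store or raw source.
    return [1.0, 2.0, 3.0, 4.0]

def prepare_data(values):
    # Data preparation: drop obviously invalid readings.
    return [v for v in values if v >= 0]

def train(values):
    # "Training" a constant mean-value model stands in for real fitting.
    return sum(values) / len(values)

def evaluate(model, values):
    # Mean absolute error of the constant prediction.
    return sum(abs(v - model) for v in values) / len(values)

data = prepare_data(collect_data())
model = train(data)
error = evaluate(model, data)
print(model, error)
```

A production pipeline adds hyperparameter tuning, deployment, monitoring, and CI/CD around these same stage boundaries, which is why orchestration frameworks model pipelines as chains of steps like these.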
The modern data stack is defined by its ability to handle large datasets, support complex analytical workflows, and scale effortlessly as data and business needs grow. Two key technologies that have become foundational for this type of architecture are the Snowflake AI Data Cloud and Dataiku.
ZOE is a multi-agent LLM application that integrates with multiple data sources to provide a unified view of the customer, simplify analytics queries, and facilitate marketing campaign creation. Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced.
Standard Chartered Bank’s Global Head of Technology, Santhosh Mahendiran , discussed the democratization of data across 3,500+ business users in 68 countries. We look at data as an asset, regardless of whether the use case is AML/fraud or new revenue. 3) Data professionals come in all shapes and forms.
We’re building a platform for all users: data scientists, analytics experts, business users, and IT. DataRobot now delivers both visual and code-centric data preparation and data pipelines, along with automated machine learning that is composable, and can be driven by hosted notebooks or a graphical user experience.
The ability for organizations to quickly analyze data across multiple sources is crucial for maintaining a competitive advantage. SageMaker Unified Studio provides a unified experience for using data, analytics, and AI capabilities. For simplicity, we chose the SQL analytics project profile.
This strategic decision was driven by several factors. Efficient data preparation: Building a high-quality pre-training dataset is a complex task, involving assembling and preprocessing text data from various sources, including web sources and partner companies. The team opted for fine-tuning on AWS.
Data science combines various disciplines to help businesses understand their operations, customers, and markets more effectively. What is data science? Data science is an interdisciplinary field that utilizes advanced analytics techniques to extract meaningful insights from vast amounts of data.