When it comes to data, there are two main types: data lakes and data warehouses. What is a data lake? An enormous amount of raw data is stored in its original format in a data lake until it is required for analytics applications. Which one is right for your business?
With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of OneLake: Fabric features a lake-centric architecture, with a central repository known as OneLake. Now, we can save the data as delta tables to use later for sales analytics.
It’s an integral part of data analytics and plays a crucial role in data science. By utilizing algorithms and statistical models, data mining transforms raw data into actionable insights. Each stage is crucial for deriving meaningful insights from data.
All data scientists could leverage our patterns during an engagement. These patterns ensure consistency, efficiency, and collaboration among data science teams, making the MAS AI/ML modernization process smoother and scalable. We are leveraging air compressor data, but the solutions are generalizable.
Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The model was then fine-tuned with training data from the data preparation stage. The sunburst graph below is a visualization of this classification.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.
Manager, Data Science at Marubeni Power International. Data collection and ingestion: The data collection and ingestion layer connects to all upstream data sources and loads the data into the data lake. Data preparation: As with most ML use cases, data preparation plays a critical role.
Businesses require Data Scientists to perform Data Mining and derive valuable insights using different software and tools. What is Data Mining, and how is it related to Data Science? Further, data transformation is also a process that ensures consistent data sets.
Increased operational efficiency benefits – Reduced data preparation time: OLAP data preparation capabilities streamline data analysis processes, saving time and resources. IBM watsonx.data is the next-generation OLAP system that can help you make the most of your data.
ODSC West 2024 showcased a wide range of talks and workshops from leading data science, AI, and machine learning experts. This blog highlights some of the most impactful AI slides from the world’s best data science instructors, focusing on cutting-edge advancements in AI, data modeling, and deployment strategies.
No-code/low-code experience using a diagram view in the data preparation layer similar to Dataflows. Building business-focused semantic layers in the cloud (the Power BI Service) with data modeling capabilities, such as managing relationships, creating measures, defining incremental refresh, and creating and managing RLS.
The service streamlines ML development and production workflows (MLOps) across BMW by providing a cost-efficient and scalable development environment that facilitates seamless collaboration between data science and engineering teams worldwide. This results in faster experimentation and shorter idea validation cycles.
See also Thoughtworks’s guide to Evaluating MLOps Platforms. End-to-end MLOps platforms: End-to-end MLOps platforms provide a unified ecosystem that streamlines the entire ML workflow, from data preparation and model development to deployment and monitoring. Check out the Metaflow Docs. neptune.ai
Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure. Storage Solutions: Secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage.
In most cases, data catalogs are used in three distinct ways. Data Catalogs for Data Science & Engineering – Data catalogs that are primarily used for data science and engineering are typically used by very experienced data practitioners. Key Features of a Data Catalog.
By supporting open-source frameworks and tools for code-based, automated, and visual data science capabilities — all in a secure, trusted studio environment — we’re already seeing excitement from companies ready to use both foundation models and machine learning to accomplish key tasks.
It offers its users advanced machine learning, data management, and generative AI capabilities to train, validate, tune, and deploy AI systems across the business with speed, trusted data, and governance. It helps facilitate the entire data and AI lifecycle, from data preparation to model development, deployment, and monitoring.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you want to do the process in a low-code/no-code way, you can follow option C.
In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for the data to be brought into an AWS data lake for predictive modeling. Data preprocessing and feature engineering: In this section, we discuss our methods for data preparation and feature engineering.
Role of Data Engineers in the Data Ecosystem: Data Engineers play a crucial role in the data ecosystem by bridging the gap between raw data and actionable insights. They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes.
Data Engineering plays a critical role in enabling organizations to efficiently collect, store, process, and analyze large volumes of data. It is a field of expertise within the broader domain of data management and Data Science. Best Data Engineering Books for Beginners 1.
Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.
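The inference flow described above — raw request data preprocessed first, then scored by the trained model — can be sketched in plain Python. This is only an illustration of the two-stage pattern; the function names, features, and weights are hypothetical stand-ins, not the SageMaker Data Wrangler or model-container APIs.

```python
# Minimal sketch of a two-stage inference pipeline: a preprocessing step
# followed by a trained model, mirroring the preprocess -> predict flow.
# All names and values here are hypothetical.

def preprocess(record):
    """Normalize raw request fields into model features (illustrative)."""
    return [float(record["clicks"]) / 100.0, float(record["price"])]

def model_predict(features):
    """Stand-in for the trained recommender: a fixed weighted sum."""
    weights = [0.7, 0.3]
    return sum(w * f for w, f in zip(weights, features))

def inference_pipeline(record):
    # Raw data passes through preprocessing first, then to the model,
    # just as the real-time call flows through the pipeline containers.
    return model_predict(preprocess(record))

score = inference_pipeline({"clicks": "42", "price": "0.5"})
```

In the real system both stages run as separate containers behind one endpoint; the point here is only that the same transform must be applied at inference time as at training time.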
Alteryx provides organizations with an opportunity to automate access to data, analytics, data science, and process automation, all in one end-to-end platform. Its capabilities can be split into the following topics: automating inputs and outputs, data preparation, data enrichment, and data science.
Jupyter notebooks have been one of the most controversial tools in the data science community. Nevertheless, many data scientists will agree that they can be really valuable – if used well. I’ll show you best practices for using Jupyter Notebooks for exploratory data analysis.
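One such practice is keeping each notebook cell short and reproducible by factoring exploratory summaries into small functions. A minimal sketch, using only the standard library (the `summarize` helper and sample data are hypothetical, not from the article):

```python
# Notebook-friendly EDA helper: summary statistics for one numeric column,
# so each exploration cell is a single readable function call.
import statistics

def summarize(rows, column):
    """Return basic descriptive statistics for a column of dict records."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population std deviation
        "min": min(values),
        "max": max(values),
    }

data = [{"age": 31}, {"age": 35}, {"age": 42}]
stats = summarize(data, "age")  # one cell, one summary
```

Wrapping exploration in functions like this also makes it easy to move the code out of the notebook and into a tested module later.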
What’s really important in the before part is having production-grade machine learning data pipelines that can feed your model training and inference processes. And that’s really key for taking data science experiments into production. And so that’s where we got started as a cloud data warehouse.
Informatica’s AI-powered automation helps streamline data pipelines and improve operational efficiency. Common use cases include integrating data across hybrid cloud environments, managing data lakes, and enabling real-time analytics for Business Intelligence platforms.
If you answer “yes” to any of these questions, you will need cloud storage, such as AWS’s Amazon S3, Azure Data Lake Storage, or GCP’s Google Cloud Storage. Knowing this, you want to have data prepared in a way that optimizes your load. It might be tempting to have massive files and let the system sort it out. How Can phData Help?
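Instead of one massive file, a common approach is to split records into evenly sized chunks before upload so the warehouse can load them in parallel. A minimal sketch (the chunk size and record names are assumptions for illustration, not a recommendation for any specific platform):

```python
# Split a record set into evenly sized chunks prior to upload, so the
# load can be parallelized instead of bottlenecking on one huge file.

def chunk_records(records, chunk_size):
    """Yield lists of at most chunk_size records each."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]

rows = [f"row-{i}" for i in range(10)]
chunks = list(chunk_records(rows, 4))  # three chunks: 4 + 4 + 2 rows
```

In practice you would tune the chunk size to the loader's preferred file size and write each chunk to its own object in the bucket.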
The pipelines are interoperable to build a working system. Data (input) pipeline (data acquisition and feature management steps): This pipeline transports raw data from one location to another. Model/training pipeline: This pipeline trains one or more models on the training data with preset hyperparameters. Kale v0.7.0.
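The interoperability described above — the data pipeline's output feeding the training pipeline's input — can be sketched as two plain functions. Everything here (the cleaning rule, the mean-predictor "model", the `smoothing` hyperparameter) is a hypothetical stand-in to show the handoff, not Kale's actual API:

```python
# Two interoperable pipelines: the data pipeline prepares raw records,
# and the training pipeline fits a model on that output with preset
# hyperparameters. The "model" is a trivial mean predictor.

def data_pipeline(raw):
    """Transport and clean raw data (here: drop missing values)."""
    return [x for x in raw if x is not None]

def training_pipeline(training_data, hyperparams):
    """Train a model on prepared data with preset hyperparameters."""
    smoothing = hyperparams["smoothing"]
    mean = sum(training_data) / (len(training_data) + smoothing)
    return {"prediction": mean}

prepared = data_pipeline([1.0, None, 3.0, 2.0])  # -> [1.0, 3.0, 2.0]
model = training_pipeline(prepared, {"smoothing": 0})
```

The key design point is the contract between the two stages: the training pipeline only ever sees data that has passed through the data pipeline.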
This highlights the two companies’ shared vision on self-service data discovery with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. He is creating information services for his clients, an emerging use case for SSDP.