Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Data is therefore essential to the quality and performance of machine learning models. This makes data preparation for machine learning all the more critical, so that models generate reliable and accurate predictions and drive business value for the organization. Why do you need Data Preparation for Machine Learning?
One groundbreaking technology that has emerged as a game-changer is artificial intelligence (AI) for asset performance management (APM). However, embarking on the journey of implementing AI in your asset performance management strategy can be both exciting and daunting.
Nowadays, the majority of our customers are excited about large language models (LLMs) and are thinking about how generative AI could transform their business. In this post, we discuss how to operationalize generative AI applications using MLOps principles, leading to foundation model operations (FMOps).
Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. The data lake environment is required to configure an AWS Glue database table, which is used to publish an asset in the Amazon DataZone catalog.
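As a rough illustration of the Glue setup step described above, the following is a minimal boto3 sketch that registers a Glue database and table; the database, table, column, and bucket names are hypothetical placeholders, not the configuration used in the original post.

```python
import boto3

# Sketch: register a Glue database and table that could later be published
# as an asset in the Amazon DataZone catalog. All names are hypothetical.
glue = boto3.client("glue", region_name="us-east-1")

glue.create_database(
    DatabaseInput={"Name": "analytics_db", "Description": "Assets to publish in DataZone"}
)

glue.create_table(
    DatabaseName="analytics_db",
    TableInput={
        "Name": "customer_events",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "event_id", "Type": "string"},
                {"Name": "customer_id", "Type": "string"},
                {"Name": "event_ts", "Type": "timestamp"},
            ],
            "Location": "s3://example-datalake-bucket/customer_events/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
            },
        },
    },
)
```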
This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the data integration problem for low-cost sensors. Qiong (Jo) Zhang, PhD, is a Senior Partner Solutions Architect at AWS, specializing in AI/ML.
Data scientists and ML engineers require capable tooling and sufficient compute for their work. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.
Today is a revolutionary moment for artificial intelligence (AI). Suddenly, everybody is talking about generative AI: sometimes with excitement, other times with anxiety. What makes this moment different is that generative AI leverages recent advances in foundation models. AI is already driving results for business.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
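For readers who prefer code, the same stages can be expressed outside the visual interface. This is an illustrative plain-pandas sketch, not SageMaker Data Wrangler itself; the file and column names are hypothetical.

```python
import pandas as pd

# Illustrative data preparation stages in plain pandas (hypothetical dataset).
df = pd.read_csv("customers.csv")                          # data selection
df = df.dropna(subset=["age", "income"])                   # purification: drop incomplete rows
df = df[df["age"].between(18, 100)]                        # purification: remove outliers
print(df.describe())                                       # exploration: summary statistics
df["income_per_year_of_age"] = df["income"] / df["age"]    # processing: derived feature
df.to_parquet("customers_prepared.parquet")                # hand off to training
```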
Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. As AI techniques continue to evolve, innovative applications in the OLAP domain are anticipated.
Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. And although generative AI has appeared in previous events, this year we’re taking it to the next level. And although our track focuses on generative AI, many other tracks have related sessions.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. A flywheel creates a data lake (in Amazon S3) in your account where all the training and test data for all versions of the model are managed and stored.
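To make the flywheel idea concrete, here is a minimal boto3 sketch, assuming a recent boto3 version that exposes the Comprehend flywheel APIs; the flywheel name, IAM role ARN, and S3 prefix are hypothetical placeholders.

```python
import boto3

# Sketch: create a Comprehend flywheel whose data lake lives under the given
# S3 prefix. ARNs, bucket names, and the classifier name are hypothetical.
comprehend = boto3.client("comprehend", region_name="us-east-1")

response = comprehend.create_flywheel(
    FlywheelName="support-ticket-classifier",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendFlywheelRole",
    ModelType="DOCUMENT_CLASSIFIER",
    DataLakeS3Uri="s3://example-bucket/comprehend-flywheel/",
    TaskConfig={
        "LanguageCode": "en",
        "DocumentClassificationConfig": {"Mode": "MULTI_CLASS"},
    },
)
print(response["FlywheelArn"])
```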
IBM watsonx.ai is our enterprise-ready, next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune, and deploy AI models with confidence and at scale across their enterprise.
ODSC West 2024 showcased a wide range of talks and workshops from leading data science, AI, and machine learning experts. This blog highlights some of the most impactful AI slides from the world’s best data science instructors, focusing on cutting-edge advancements in AI, data modeling, and deployment strategies.
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler.
Shine a light on who or what is using specific data to speed up collaboration or reduce disruption when changes happen. Data modeling. Leverage semantic layers and physical layers to give you more options for combining data using schemas to fit your analysis. Data preparation. Virtualization and discovery.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. Sherry Ding is a Senior AI/ML Specialist Solutions Architect.
The data locations may come from the data warehouse or data lake with structured and unstructured data. The data scientist's responsibility is to move the data to a data lake or warehouse for the different data mining processes, which are supported by various data mining tools.
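One way to run the kind of SQL analysis described above from Python is the Redshift Data API; this is a simplified sketch, and the cluster identifier, database, user, and table names are hypothetical placeholders.

```python
import time

import boto3

# Sketch: run a SQL aggregation on Redshift via the Data API and fetch the
# result. Cluster, database, user, and table names are hypothetical.
rsd = boto3.client("redshift-data", region_name="us-east-1")

stmt = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="analyst",
    Sql="SELECT region, SUM(amount) AS total FROM sales GROUP BY region;",
)

# Poll until the statement finishes (production code should also handle
# FAILED/ABORTED states and paginate large results).
while rsd.describe_statement(Id=stmt["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

result = rsd.get_statement_result(Id=stmt["Id"])
print(result["Records"])
```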
This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared. If the data needs to be purged immediately from the service account, you can contact the AWS team to do so.
Figure 1 illustrates the typical metadata subjects contained in a data catalog. Figure 1 – Data Catalog Metadata Subjects. Datasets are the files and tables that data workers need to find and access. They may reside in a data lake, warehouse, master data repository, or any other shared data resource.
No-code/low-code experience using a diagram view in the data preparation layer similar to Dataflows. Building business-focused semantic layers in the cloud (the Power BI Service) with data modeling capabilities, such as managing relationships, creating measures, defining incremental refresh, and creating and managing RLS.
There are many other features and functions, but to verify that a data catalog functions as a platform, look for these five features. Intelligence: a data catalog should leverage AI/ML-driven pattern detection, including popularity, pattern matching, and provenance/impact analysis. Key Features of a Data Catalog.
Key Takeaways Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.
See also Thoughtworks's guide to Evaluating MLOps Platforms. End-to-end MLOps platforms provide a unified ecosystem that streamlines the entire ML workflow, from data preparation and model development to deployment and monitoring. Might be useful: Unlike manual, homegrown, or open-source solutions, neptune.ai
In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for the data to be brought into an AWS data lake for predictive modeling. Data preprocessing and feature engineering: In this section, we discuss our methods for data preparation and feature engineering.
Key Components of Data Engineering. Data Ingestion: Gathering data from various sources, such as databases, APIs, files, and streaming platforms, and bringing it into the data infrastructure. Data Processing: Performing computations, aggregations, and other data operations to generate valuable insights from the data.
Role of Data Engineers in the Data Ecosystem. Data Engineers play a crucial role in the data ecosystem by bridging the gap between raw data and actionable insights. They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes.
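A small Python sketch of the two components above: ingest data from an API and a CSV source, then process it into an aggregate. The endpoint URL, file name, and column names are hypothetical placeholders.

```python
import pandas as pd
import requests

# Ingestion: pull rows from a hypothetical API and read a local CSV extract.
api_rows = requests.get("https://example.com/api/orders", timeout=30).json()
orders = pd.DataFrame(api_rows)
customers = pd.read_csv("customers.csv")

# Processing: join the sources and compute revenue per customer segment.
merged = orders.merge(customers, on="customer_id", how="left")
revenue_by_segment = merged.groupby("segment", as_index=False)["order_total"].sum()
print(revenue_by_segment)
```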
Ali Arsanjani, director of cloud partner engineering at Google Cloud, presented a talk entitled “Challenges and Ethics of DLM and LLM Adoption in the Enterprise” at Snorkel AI’s recent Foundation Model Virtual Summit. Data preparation, train and tune, deploy and monitor. Hope you can all hear me well.
It supports batch and real-time data processing, making it a preferred choice for large enterprises with complex data workflows. Informatica’s AI-powered automation helps streamline data pipelines and improve operational efficiency.
Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
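The theme of the talk, combining SQL and Python in one pipeline with Snowpark, can be sketched as follows; this is not code from the presentation, and the connection parameters, table, and column names are hypothetical placeholders.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

# Sketch of a SQL + Python pipeline with Snowpark. Connection parameters and
# table/column names are hypothetical.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "ML_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()

# Build features with DataFrame operations that execute as SQL in Snowflake.
features = (
    session.table("TRANSACTIONS")
    .filter(col("AMOUNT") > 0)
    .group_by("CUSTOMER_ID")
    .agg(avg("AMOUNT").alias("AVG_AMOUNT"))
)
features.write.save_as_table("CUSTOMER_FEATURES", mode="overwrite")
session.close()
```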
Alteryx provides organizations with an opportunity to automate access to data, analytics, data science, and process automation all in one, end-to-end platform. Its capabilities can be split into the following topics: automating inputs & outputs, data preparation, data enrichment, and data science.
Storage Solutions: Secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage. Key features and benefits of Azure for Data Science include: Scalability: Easily scale resources up or down based on demand, ideal for handling large datasets and complex computations.
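As a quick illustration of staging data in Azure Blob Storage from Python, here is a sketch using the azure-storage-blob SDK; the connection string, container, and file names are hypothetical placeholders.

```python
from azure.storage.blob import BlobServiceClient

# Sketch: upload a prepared dataset to a Blob Storage container and list the
# staged files. Connection string, container, and paths are hypothetical.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("training-data")

with open("customers_prepared.parquet", "rb") as f:
    container.upload_blob(name="datasets/customers_prepared.parquet", data=f, overwrite=True)

for blob in container.list_blobs(name_starts_with="datasets/"):
    print(blob.name, blob.size)
```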
Placing functions for plotting, data loading, data preparation, and implementations of evaluation metrics in plain Python modules keeps a Jupyter notebook focused on the exploratory analysis | Source: Author Using SQL directly in Jupyter cells There are some cases in which data is not in memory (e.g.,
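A minimal sketch of that module layout, with hypothetical module, function, and column names:

```python
# prep_utils.py -- hypothetical helper module kept outside the notebook
import pandas as pd
from sklearn.metrics import mean_absolute_error


def load_data(path: str) -> pd.DataFrame:
    """Load raw data from a CSV file."""
    return pd.read_csv(path)


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and add a simple derived feature."""
    df = df.dropna()
    df["price_per_sqm"] = df["price"] / df["area_sqm"]
    return df


def evaluate(y_true, y_pred) -> float:
    """Shared evaluation metric used across experiments."""
    return mean_absolute_error(y_true, y_pred)
```

In the notebook, a single line such as `from prep_utils import load_data, prepare, evaluate` then keeps the cells focused on the exploratory analysis itself.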
Generative artificial intelligence (AI) applications built around large language models (LLMs) have demonstrated the potential to create and accelerate economic value for businesses. Many customers are looking for guidance on how to manage security, privacy, and compliance as they develop generative AI applications.
An ML platform is an environment that standardizes the technology stack for your ML/AI team and provides tools, libraries, and infrastructure for developing, deploying, and operationalizing machine learning applications. Model/training pipeline: This pipeline trains one or more models on the training data with preset hyperparameters.
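A minimal scikit-learn sketch of such a training pipeline step; the file, target column, and preset hyperparameter values are hypothetical placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Sketch: train a model on prepared data with preset hyperparameters.
PRESET_HYPERPARAMETERS = {"n_estimators": 200, "max_depth": 8, "random_state": 42}

data = pd.read_parquet("customers_prepared.parquet")
X, y = data.drop(columns=["churned"]), data["churned"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(**PRESET_HYPERPARAMETERS)
model.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```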
Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This saves time and improves performance because you don’t need to work with the entirety of the data during preparation.
This highlights the two companies’ shared vision of self-service data discovery, with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. DataRobot Data Prep.