Introduction
Data science has taken over every economic sector in recent years. To achieve maximum efficiency, companies strive to put their data to use at every stage of their operations.
With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.
Generative artificial intelligence (gen AI) is transforming the business world by creating new opportunities for innovation, productivity and efficiency. Data scientists will typically help with training, validating, and maintaining foundation models that are optimized for data tasks.
First, there’s a need for preparing the data, aka data engineering basics. Machine learning practitioners often work with data across the full stack, so they see a lot of workflow/pipeline development, data wrangling, and data preparation.
I am often asked by prospective clients to explain the artificial intelligence (AI) software process, and I have recently been asked by managers with extensive software development and data science experience who wanted to implement MLOps. Norvig, Artificial Intelligence: A Modern Approach, 4th ed.
More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. This happens only when a new data format is detected, to avoid overburdening scarce Afri-SET resources. Having a human in the loop to validate each data transformation step is optional.
We will demonstrate an example feature engineering process on an e-commerce schema and show how GraphReduce deals with the complexity of feature engineering on a relational schema. Data preparation happens at the entity level first, so errors and anomalies don’t make their way into the aggregated dataset.
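The entity-level-first pattern described above can be sketched in plain pandas (not GraphReduce itself; the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical order-line table for an e-commerce schema
order_items = pd.DataFrame({
    "order_id":    [1, 1, 2, 3, 3],
    "customer_id": [10, 10, 10, 20, 20],
    "price":       [9.99, -1.0, 25.0, 5.0, None],  # -1.0 and None are anomalies
})

# Clean at the entity (order-line) level first, so that bad rows
# never reach the aggregated customer-level features
clean = order_items[order_items["price"].notna() & (order_items["price"] > 0)]

# Only then aggregate up to the customer level
features = clean.groupby("customer_id").agg(
    total_spend=("price", "sum"),
    n_items=("order_id", "count"),
).reset_index()
print(features)
```

Because the invalid prices are filtered out at the order-line level, the customer-level aggregates never see them.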
Instead, businesses tend to rely on advanced tools and strategies—namely artificial intelligence for IT operations (AIOps) and machine learning operations (MLOps)—to turn vast quantities of data into actionable insights that can improve IT decision-making and ultimately, the bottom line.
Created by the author with DALL·E 3. Google Earth Engine for machine learning has just gotten a facelift: with all the advances in the world of artificial intelligence, Google Earth Engine, an important tool for spatial analysis, was not going to be left behind.
Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. IBM watsonx.data is the next generation OLAP system that can help you make the most of your data.
A recent PwC CEO survey unveiled that 84% of Canadian CEOs agree that artificial intelligence (AI) will significantly change their business within the next 5 years, making this technology more critical than ever. As such, an ML model is the product of an MLOps pipeline, and a pipeline is a workflow for creating one or more ML models.
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.
Thus, MLOps is the intersection of Machine Learning, DevOps, and Data Engineering (Figure 1). Figure 4: The ModelOps process [Wikipedia]. The Machine Learning Workflow: machine learning requires experimenting with a wide range of datasets, data preparation approaches, and algorithms to build a model that maximizes some target metric(s).
Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. Explore SageMaker Pipelines and open source data querying engines like PrestoDB, and build a solution using the sample implementation provided.
The vendors evaluated for this MarketScape offer various software tools needed to support end-to-end machine learning (ML) model development, including data preparation, model building and training, model operation, evaluation, deployment, and monitoring.
The Evolving AI Development Lifecycle Despite the revolutionary capabilities of LLMs, the core development lifecycle established by traditional natural language processing remains essential: Plan, Prepare Data, Engineer Model, Evaluate, Deploy, Operate, and Monitor. For instance: Data Preparation: Google Sheets.
From data preparation and model training to deployment and management, Vertex AI provides the tools and infrastructure needed to build intelligent applications. Unified ML Workflow: Vertex AI provides a simplified ML workflow, encompassing data ingestion, analysis, transformation, model training, evaluation, and deployment.
Building data literacy across your organization empowers teams to make better use of AI tools. It doesn’t seem like long ago that we thought of artificial intelligence (AI) as a futuristic concept—but today, it’s here in full swing, and organizations across sectors are working to integrate it into their core processes.
It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. Amazon ECR is a managed container registry that facilitates the storage, management, and deployment of container images.
Today’s data management and analytics products have infused artificial intelligence (AI) and machine learning (ML) algorithms into their core capabilities. These modern tools will auto-profile the data, detect joins and overlaps, and offer recommendations. DataRobot Data Prep. Sallam | Shubhangi Vashisth.
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.
Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. From deriving insights to powering generative artificial intelligence (AI)-driven applications, the ability to efficiently process and analyze large datasets is a vital capability.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
Data-centric AI, in his opinion, is based on the following principles: it’s time to focus on the data, since all the progress achieved in algorithms means it’s now worth spending more time on the data; and inconsistent data labels are common, since reasonable, well-trained people can see things differently.
Using the BMW data portal, users can request access to on-premises databases or data stored in BMW’s Cloud Data Hub, making it available in their workspace for development and experimentation, from data preparation and analysis to model training and validation.
We use a test data preparation notebook as part of this step, which is a dependency for the fine-tuning and batch inference step. When fine-tuning is complete, this notebook is run using run magic and prepares a test dataset for sample inference with the fine-tuned model.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.
You’ll gain immediate, practical skills in Python, data preparation, machine learning modeling, and retrieval-augmented generation (RAG), all leading up to AI Agents. Each course features focused, interactive sessions with hands-on notebooks and exercises, along with dedicated office hours. Learn more about the AI Mini Bootcamp here.
Machine learning (ML), a subset of artificial intelligence (AI), is an important piece of data-driven innovation. Machine learning engineers take massive datasets and use statistical methods to create algorithms that are trained to find patterns and uncover key insights in data mining projects.
Artificial intelligence platforms enable individuals to create, evaluate, implement and update machine learning (ML) and deep learning models in a more scalable way. AI platform tools enable knowledge workers to analyze data, formulate predictions and execute tasks with greater speed and precision than they can manually.
Tools like Apache NiFi, Talend, and Informatica provide user-friendly interfaces for designing workflows, integrating diverse data sources, and executing ETL processes efficiently. Choosing the right tool based on the organisation’s specific needs, such as data volume and complexity, is vital for optimising ETL efficiency.
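As a rough illustration of what such a workflow does under the hood, here is a minimal extract-transform-load pass in plain Python (the schema and values are made up for the sketch):

```python
import csv
import io
import sqlite3

# Extract: read raw CSV (an in-memory string stands in for a real source)
raw = io.StringIO("name,amount\nalice,10\nbob,not_a_number\ncarol,5\n")
rows = list(csv.DictReader(raw))

# Transform: coerce types and reject rows that fail validation
def transform(row):
    try:
        return {"name": row["name"].strip().title(), "amount": float(row["amount"])}
    except ValueError:
        return None  # malformed record

clean = [t for r in rows if (t := transform(r)) is not None]

# Load: write validated rows into a target table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (name TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (:name, :amount)", clean)
total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 15.0
```

Real tools add scheduling, lineage tracking, and error handling on top of this basic extract-transform-load pattern.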
Introduction to Containers for Data Science / Data Engineering with Michael A. Fudge: his slides introduced participants to using containers in data science and engineering workflows. Steven Pousty showcased how to transform unstructured data into a vector-based query system.
SageMaker Studio allows data scientists, ML engineers, and data engineers to prepare data, build, train, and deploy ML models on one web interface. Key concepts: Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning.
This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process. One aspect of this data preparation is feature engineering.
Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022.
For a comprehensive understanding of the practical applications, including a detailed code walkthrough from data preparation to model deployment, please join us at the ODSC APAC conference 2023. Now, let’s give you a taste of what’s in store (the GitHub code repository can be found here).
We also examined the results to gain a deeper understanding of why these prompt engineering skills and platforms are in demand for the role of Prompt Engineer, not to mention machine learning and data science roles. For prompt engineers, it can be used for the deployment and orchestration of LLM applications.
Data Preparation: Cleaning, transforming, and preparing data for analysis and modelling. Collaborating with Teams: Working with data engineers, analysts, and stakeholders to ensure data solutions meet business needs.
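The cleaning and transforming step can be illustrated with a small pandas sketch (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical raw records with missing values and inconsistent formats
df = pd.DataFrame({
    "city":    [" new york", "Boston ", None],
    "revenue": ["1,200", "850", "430"],
})

# Cleaning: drop incomplete rows, then normalize whitespace and case
df = df.dropna(subset=["city"])
df["city"] = df["city"].str.strip().str.title()

# Transforming: convert string amounts into numeric values for modelling
df["revenue"] = df["revenue"].str.replace(",", "", regex=False).astype(float)
print(df)
```

The same two moves—rejecting records that cannot be repaired and coercing the rest into consistent types—cover a large share of everyday data preparation work.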
Below, we explore five popular data transformation tools, providing an overview of their features, use cases, strengths, and limitations. Apache NiFi: Apache NiFi is an open-source data integration tool that automates the flow of data between systems.
Generative artificial intelligence (AI) applications built around large language models (LLMs) have demonstrated the potential to create and accelerate economic value for businesses. She holds an engineering degree from Thapar University, as well as a master’s degree in statistics from Texas A&M University.
The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases. ML builders can request access to data published by data engineers. Additionally, this solution uses Amazon DataZone.