Introduction: Data science has spread into every economic sector in recent years. To achieve maximum efficiency, companies strive to use data at every stage of their operations.
This post presents and compares options and recommended practices for managing Python packages and virtual environments in Amazon SageMaker Studio notebooks. Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. Define a Dockerfile.
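The "define a Dockerfile" route customizes the image itself; for lighter-weight needs, a per-kernel install is a common alternative. Here is a minimal sketch of that approach (the pinned package list is hypothetical, not from the post):

import subprocess
import sys

# Hypothetical pinned requirements for a Studio notebook environment.
REQUIREMENTS = ["pandas==2.2.2", "scikit-learn==1.5.0"]

def install(packages):
    """Install packages into the interpreter backing the current kernel."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])

install(REQUIREMENTS)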
Aspiring and experienced Data Engineers alike can benefit from a curated list of books covering essential concepts and practical techniques. These 10 Best Data Engineering Books for beginners encompass a range of topics, from foundational principles to advanced data processing methods. What is Data Engineering?
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
The solution harnesses the capabilities of generative AI, specifically Large Language Models (LLMs), to address the challenges posed by diverse sensor data and automatically generate Python functions based on various data formats. The solution only invokes the LLM for new device data file types, i.e., when code has not yet been generated for that format.
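The invoke-only-on-miss behavior can be sketched as a simple registry keyed by file type; generate_parser_with_llm and parser_registry below are hypothetical stand-ins, not the article's actual code:

# Sketch: only call the LLM when no parser exists for a file type yet.
parser_registry: dict[str, str] = {}  # file type -> generated Python source

def get_parser(file_type: str) -> str:
    if file_type not in parser_registry:
        # Cache miss: ask the LLM to write a parser for this new format.
        parser_registry[file_type] = generate_parser_with_llm(file_type)
    return parser_registry[file_type]

def generate_parser_with_llm(file_type: str) -> str:
    # Placeholder for the real LLM call described in the article.
    return f"def parse_{file_type}(raw): ...  # generated code"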
How to use Cloud Amplifier to create a new table in Snowflake and insert data: Snowflake APIs in Python allow you to manipulate and integrate your data in sophisticated, and useful, ways. Here's how we did it in the demo: we leveraged Domo's APIs to provision these data sets in Domo from dataframes in Python.
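For the Snowflake side, a minimal sketch using the snowflake-connector-python package might look like this (the connection parameters and table name are placeholders, not the demo's actual values):

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder credentials; use your own account, user, and auth settings.
conn = snowflake.connector.connect(
    account="MY_ACCOUNT", user="MY_USER", password="...",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)

# Create a new table, then bulk-insert a dataframe into it.
conn.cursor().execute(
    "CREATE TABLE IF NOT EXISTS DEMO_TABLE (ID NUMBER, NAME STRING)"
)
df = pd.DataFrame({"ID": [1, 2], "NAME": ["a", "b"]})
write_pandas(conn, df, "DEMO_TABLE")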
Additionally, these tools provide a comprehensive solution for faster workflows, enabling the following: Faster data preparation – SageMaker Canvas has over 300 built-in transformations and the ability to use natural language, which can accelerate data preparation and make data ready for model building.
First, there's a need for preparing the data, aka data engineering basics. Machine learning practitioners often work with data at the beginning of and throughout the full stack, so they see a lot of workflow/pipeline development, data wrangling, and data preparation.
Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Choose the plus sign, and for Notebook, choose Python 3.
Tapping into these schemas and pulling out machine learning-ready features can be nontrivial: one needs to know where the data entity of interest lives (e.g., customers), what its relations are and how they're connected, and then write SQL, Python, or other code to join and aggregate to a granularity of interest.
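As a small illustration of that join-and-aggregate step, here is a sketch with made-up tables, not tied to any particular schema:

import pandas as pd

# Toy relational data: customers and their orders.
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["EU", "US"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 25.0, 40.0],
})

# Join and aggregate to customer granularity to get ML-ready features.
features = (
    orders.groupby("customer_id")
    .agg(order_count=("amount", "size"), total_spend=("amount", "sum"))
    .reset_index()
    .merge(customers, on="customer_id")
)
print(features)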
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Vertex AI assimilates workflows from data science, data engineering, and machine learning to help your teams work together with a shared toolkit and grow your apps with the help of Google Cloud. Conclusion: Vertex AI is a major improvement over Google Cloud's machine learning and data science solutions.
MLOps is the intersection of Machine Learning, DevOps, and Data Engineering. Zero, "How to write better scientific code in Python," Towards Data Science, Feb. Galarnyk, "Considerations for Deploying Machine Learning Models in Production," Towards Data Science, Nov. 15, 2022. [4]
SageMaker Studio allows data scientists, ML engineers, and data engineers to prepare data, build, train, and deploy ML models on one web interface. Finally, we deploy the ONNX model along with custom inference code written in Python to Azure Functions using the Azure CLI.
It supports all stages of ML development, from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. Specifically, we demonstrate how you can customize SageMaker Distribution for geospatial workflows by extending it with open-source geospatial Python libraries.
The training data used for this pipeline is made available through PrestoDB and read into Pandas through the PrestoDB Python client. The queries used to fetch data at the training and batch inference steps are configured in the config file.
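A minimal sketch of that read path with the presto-python-client package follows; the host, catalog, and query here are placeholders, not the pipeline's actual configuration:

import pandas as pd
import prestodb

# Placeholder connection details for a PrestoDB coordinator.
conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080,
    user="ml_user", catalog="hive", schema="default",
)

cur = conn.cursor()
cur.execute("SELECT * FROM training_data LIMIT 1000")  # query from the config file
rows = cur.fetchall()

# Build a dataframe using the column names from the cursor description.
df = pd.DataFrame(rows, columns=[c[0] for c in cur.description])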
You can use this notebook job step to easily run notebooks as jobs with just a few lines of code using the Amazon SageMaker Python SDK. Data scientists currently use SageMaker Studio to interactively develop their Jupyter notebooks and then use SageMaker notebook jobs to run these notebooks as scheduled jobs.
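A minimal sketch of those few lines using the SageMaker Python SDK's NotebookJobStep is below; the notebook path, image URI, and role are placeholders, and parameter names may vary slightly across SDK versions, so check the docs for yours:

from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

# Placeholder values; substitute your own notebook, image, and role.
nb_step = NotebookJobStep(
    name="nightly-report",
    input_notebook="analysis.ipynb",
    image_uri="<sagemaker-distribution-image-uri>",
    kernel_name="python3",
    instance_type="ml.m5.xlarge",
    role="<execution-role-arn>",
)

pipeline = Pipeline(name="notebook-pipeline", steps=[nb_step])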
The DataRobot team has been working hard on new integrations that make data scientists more agile and meet the needs of enterprise IT, starting with Snowflake. We've tightened the loop between ML data prep, experimentation, and testing all the way through to putting models into production.
Snowpark is the set of libraries and runtimes in Snowflake that securely deploy and process non-SQL code, including Python, Java, and Scala. On the server side, runtimes include Python, Java, and Scala in the warehouse model or Snowpark Container Services (public preview).
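For orientation, here is a minimal Snowpark for Python sketch; the connection parameters and table name are placeholders:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder connection parameters.
params = {
    "account": "MY_ACCOUNT", "user": "MY_USER", "password": "...",
    "warehouse": "MY_WH", "database": "MY_DB", "schema": "PUBLIC",
}

session = Session.builder.configs(params).create()

# DataFrame operations are pushed down and executed inside Snowflake.
orders = session.table("ORDERS")
big_orders = orders.filter(col("AMOUNT") > 100).group_by("REGION").count()
big_orders.show()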
These procedures are designed to automate repetitive tasks, implement business logic, and perform complex data transformations, increasing the productivity and efficiency of data processing workflows. Snowflake stored procedures and dbt Hooks are essential to modern data engineering and analytics workflows.
JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R.
Mustafa Hajij introduced TopoX, a comprehensive Python suite for topological deep learning. This session demonstrated how to leverage these tools using Python and PyTorch, offering attendees practical techniques to apply in their research and projects. Introduction to Containers for Data Science / Data Engineering with Michael A.
With the introduction of EMR Serverless support for Apache Livy endpoints , SageMaker Studio users can now seamlessly integrate their Jupyter notebooks running sparkmagic kernels with the powerful data processing capabilities of EMR Serverless. In this post, we build a Docker image that includes the Python 3.11
The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring. He is passionate about helping customers to build scalable and modern data analytics solutions to gain insights from the data.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.
Alteryx provides organizations with an opportunity to automate access to data, analytics, data science, and process automation, all in one end-to-end platform. Its capabilities can be split into the following topics: automating inputs & outputs, data preparation, data enrichment, and data science.
Alignment to other tools in the organization's tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. This provides end-to-end support for data engineering and MLOps workflows.
Airflow for workflow orchestration: Airflow schedules and manages complex workflows, defining tasks and dependencies in Python code. An example directed acyclic graph (DAG) might automate data ingestion, processing, model training, and deployment tasks, ensuring that each step runs in the correct order and at the right time.
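A minimal sketch of such a DAG follows; the task bodies are placeholders, and the schedule parameter name varies slightly across Airflow versions (schedule_interval in older releases):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("ingest data")  # placeholder task body

def train():
    print("train model")  # placeholder task body

with DAG(
    dag_id="ml_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    train_task = PythonOperator(task_id="train", python_callable=train)
    ingest_task >> train_task  # run training only after ingestion succeeds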
Data Preparation: Cleaning, transforming, and preparing data for analysis and modelling. Collaborating with Teams: Working with data engineers, analysts, and stakeholders to ensure data solutions meet business needs.
We hope that the hackathon experience enhanced their exposure to real-world data and their ability to tackle real-world challenges using data engineering and machine learning skills. We look forward to future hackathons and continuing our journey in data science!
You'll gain immediate, practical skills in Python, data preparation, machine learning modeling, and retrieval-augmented generation (RAG), all leading up to AI Agents. Each course features focused, interactive sessions with hands-on notebooks and exercises, along with dedicated office hours.
In the case of professional Data Analysts, who might be engaged in performing experiments on data, standard SQL tools are required. Data Analysts need deeper knowledge of SQL to understand relational databases like Oracle, Microsoft SQL Server, and MySQL. Moreover, SQL is an important tool for conducting Data Preparation and Data Wrangling.
Because the machine learning lifecycle has many complex components that reach across multiple teams, it requires close-knit collaboration to ensure that hand-offs occur efficiently, from data preparation and model training to model deployment and monitoring.
For a comprehensive understanding of the practical applications, including a detailed code walkthrough from data preparation to model deployment, please join us at the ODSC APAC conference 2023. He is passionate about large-scale distributed systems and is an avid fan of Python.
Airflow allows you to configure, schedule, and monitor data pipelines programmatically in Python, covering all stages of the lifecycle of a typical workflow. Airflow uses DAGs (Directed Acyclic Graphs): a DAG describes how to run a workflow by defining the pipeline in Python, that is, configuration as code.
Additionally, you will work closely with cross-functional teams, translating complex data insights into actionable recommendations that can significantly impact business strategies and drive overall success. Also Read: Explore data effortlessly with Python Libraries for (Partial) EDA: Unleashing the Power of Data Exploration.
Data, Engineering, and Programming Skills. Programming: Despite the rise of no-code platforms and AI code assistance, programming skills are still essential for training and fine-tuning LLM models, scripting for data processing, and integrating models into applications. Python: Python's prominence is expected to continue.
Knowing this, you want to have data prepared in a way that optimizes your load. Snowflake Connectors: For accessing data, you'll find a slew of Snowflake connectors on the Snowflake website, for example ODBC, JDBC, and the Python Snowflake Connector. And, generally, things will be okay. Be sure to test your scenarios, though.
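As one illustration of an optimized load path via the Python connector, staging a compressed file and using COPY INTO is typically much faster than row-by-row inserts; the connection details, file path, and table name here are placeholders:

import snowflake.connector

# Placeholder connection; reuse your usual account parameters.
conn = snowflake.connector.connect(
    account="MY_ACCOUNT", user="MY_USER", password="...",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Stage a local CSV in the table stage, then bulk-load it.
cur.execute("PUT file:///tmp/events.csv @%EVENTS AUTO_COMPRESS=TRUE")
cur.execute("COPY INTO EVENTS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")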
In August 2019, Data Works was acquired and Dave worked to ensure a successful transition. David: My technical background is in ETL, data extraction, data engineering, and data analytics. All of the notebooks are in Python. Do you have any advice for those just getting started in data science?
Key disciplines involved in data science: Understanding the core disciplines within data science provides a comprehensive perspective on the field's multifaceted nature. Overview of core disciplines: Data science encompasses several key disciplines including data engineering, data preparation, and predictive analytics.
The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases. ML builders can request access to data published by data engineers. Additionally, this solution uses Amazon DataZone. It's mapped to the custom_details field.
Solution overview: To implement our RAG workflow on SageMaker JumpStart, we use a popular open source Python library known as LangChain. SageMaker JumpStart simplifies this process because the model artifacts, data, and container specifications are all pre-packaged for optimal inference.
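To make the retrieval step concrete, here is a library-agnostic sketch of the core RAG pattern; the embedding function is a toy stand-in (the post itself uses LangChain with models deployed via SageMaker JumpStart):

import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: character-frequency vector, unit-normalized.
    A real RAG system would call an embedding model endpoint instead."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

docs = ["SageMaker hosts models.", "LangChain orchestrates LLM apps."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = doc_vecs @ embed(query)  # cosine similarity on unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("What does LangChain do?")
prompt = f"Answer using this context: {context}\nQuestion: What does LangChain do?"
# The prompt would then be sent to the LLM endpoint deployed via JumpStart.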