But with the sheer amount of data continually increasing, how can a business make sense of it? The answer? Robust data pipelines. What is a data pipeline? A data pipeline is a series of processing steps that move data from its source to its destination.
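As a minimal sketch of that definition, the snippet below wires three steps (extract, transform, load) into a tiny pipeline. The CSV source, field names, and SQLite destination are placeholders for illustration, not details from any article excerpted here.

```python
import csv
import sqlite3

def extract(path):
    """Read raw rows from a source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Normalize field names and cast amounts to floats."""
    return [
        {"user_id": r["user_id"].strip(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, db_path="pipeline.db"):
    """Write the cleaned rows into a destination table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS orders (user_id TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (:user_id, :amount)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))  # source -> processing steps -> destination
```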
Data management problems can also lead to data silos: disparate collections of databases that don't communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large repository of diverse datasets all stored in their original format.
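To make "stored in their original format" concrete, here is a small sketch in which a local directory stands in for the lake's object storage; the source names and file paths are invented for illustration.

```python
import json
import shutil
from pathlib import Path

LAKE_ROOT = Path("datalake/raw")  # local stand-in for object storage

def land(source_name: str, src_file: Path) -> Path:
    """Copy a file into the lake as-is, partitioned by source, keeping its original format."""
    dest_dir = LAKE_ROOT / source_name
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src_file, dest_dir / src_file.name))

# Example: a CSV export and a JSON event dump land untouched, side by side.
Path("contacts.csv").write_text("id,email\n1,a@example.com\n")
Path("events.json").write_text(json.dumps({"event": "click", "ts": 1}))
land("crm", Path("contacts.csv"))
land("clickstream", Path("events.json"))
```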
Be sure to check out her talk, "Don't Go Over the Deep End: Building an Effective OSS Management Layer for Your Data Lake," there! Managing a data lake can often feel like being lost at sea, especially when dealing with both structured and unstructured data.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.
More than 170 tech teams used the latest cloud, machine learning, and artificial intelligence technologies to build 33 solutions. The output data is transformed to a standardized format and stored in a single location in Amazon S3 in Parquet format, a columnar and efficient storage format.
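A hedged sketch of that storage pattern: convert tabular output to Parquet with pandas (backed by pyarrow) and upload it to S3 with boto3. The bucket name, key, and columns are placeholders rather than details of the solution described above.

```python
import boto3
import pandas as pd

# Hypothetical standardized output from an upstream job.
df = pd.DataFrame({"team_id": [1, 2], "solution": ["forecasting", "routing"]})

# Write a local Parquet file (columnar, compressed); requires pyarrow or fastparquet.
df.to_parquet("output.parquet", index=False)

# Upload to the single S3 location; bucket and key are illustrative placeholders.
s3 = boto3.client("s3")
s3.upload_file("output.parquet", "example-output-bucket", "standardized/output.parquet")
```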
The field of artificial intelligence is growing rapidly, and with it the demand for professionals who have tangible experience with AI and AI-powered tools. Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. …billion in 2021 to $331.2 billion by 2026.
He specializes in large language models, cloud infrastructure, and scalable data systems, focusing on building intelligent solutions that enhance automation and data accessibility across Amazon's operations. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
AI is quickly scaling through dozens of industries as companies, non-profits, and governments discover the power of artificial intelligence. Cloudera: Cloudera is a cloud-based platform that provides businesses with the tools they need to manage and analyze data.
Not only does the role involve collecting, storing, and processing data so that it can be used for analysis and decision-making; these professionals are also responsible for building and maintaining the infrastructure that makes this possible, and much more. Think of data engineers as the architects of the data ecosystem.
Overview: Data science vs. data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models, and develop artificial intelligence (AI) applications.
Artificial intelligence (AI) adoption is still in its early stages. The Stanford Institute for Human-Centered Artificial Intelligence's Center for Research on Foundation Models (CRFM) recently outlined the many risks of foundation models, as well as opportunities. Trustworthiness is critical.
Let's demystify this using the following personas and a real-world analogy: data and ML engineers (owners and producers) lay the groundwork by feeding data into the feature store, while data scientists (consumers) extract and use that data to craft their models. Data engineers serve as architects sketching the initial blueprint.
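The division of labor can be illustrated with a toy feature store; the class, entity, and feature names below are invented for the sketch and are not any particular vendor's API.

```python
from datetime import datetime, timezone

class ToyFeatureStore:
    """Minimal key-value feature store illustrating producer/consumer roles."""
    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> (value, timestamp)

    def put(self, entity_id, name, value):
        """Producers (data and ML engineers) feed features in."""
        self._features[(entity_id, name)] = (value, datetime.now(timezone.utc))

    def get(self, entity_id, name):
        """Consumers (data scientists) read features out for model training."""
        return self._features[(entity_id, name)][0]

store = ToyFeatureStore()
store.put("user_42", "avg_order_value", 37.5)   # producer side
print(store.get("user_42", "avg_order_value"))  # consumer side -> 37.5
```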
Great Expectations provides support for different data backends such as flat file formats, SQL databases, Pandas dataframes, and Spark, and comes with built-in notification and data documentation functionality. The talk is available to watch on demand.
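As a small illustration of the Pandas backend mentioned above (using the older Pandas-dataset style of the `great_expectations` API; newer releases restructure this interface), an expectation can be declared and checked in a few lines. The column names and values are made up.

```python
import pandas as pd
import great_expectations as ge

# Wrap an ordinary DataFrame so expectation methods become available on it.
df = ge.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.0, None, 7.5],
}))

# Declare expectations against the data and inspect the validation results.
print(df.expect_column_values_to_not_be_null("order_id").success)
print(df.expect_column_values_to_be_between("amount", 0, 100).success)
```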
It also addresses the strategies and best practices for implementing a data mesh. Applying Engineering Best Practices in Data Lakes Architectures, Einat Orr, CEO and Co-Founder, Treeverse: This talk examines why agile methodology, continuous integration, continuous deployment, and production monitoring are essential for data lakes.
Customers can build new channels, offload processing, and drive business insights and intelligence with data analytics and data lakes. This enables customers to migrate with zero downtime and/or replicate DB2, IMS, and VSAM data from an on-premises mainframe to the AWS Cloud in real time.
Securing AI models and their access to data: While AI models need flexibility to access data across a hybrid infrastructure, they also need safeguarding from tampering (unintentional or otherwise) and, especially, protected access to data.
JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW's data lake on AWS), and on-premises databases.
Data ingestion meaning: At its core, data ingestion refers to the act of absorbing data from multiple sources and transporting it to a destination, such as a database, data warehouse, or data lake. Batch processing: In this method, data is collected over a period and then processed in groups or batches, as in the sketch below.
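A minimal batch-ingestion sketch, assuming a local JSON-lines source file and a SQLite destination (both placeholders), shows the collect-then-process pattern:

```python
import json
import sqlite3
from itertools import islice

def read_batches(path, batch_size=500):
    """Collect records from a JSON-lines source and yield them in fixed-size batches."""
    with open(path) as f:
        while True:
            batch = [json.loads(line) for line in islice(f, batch_size)]
            if not batch:
                break
            yield batch

con = sqlite3.connect("warehouse.db")
con.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, action TEXT)")
for batch in read_batches("events.jsonl"):
    # Each batch lands in the destination as one unit of work.
    con.executemany("INSERT INTO events VALUES (:user_id, :action)", batch)
    con.commit()
con.close()
```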
Big data isn't an abstract concept anymore; with so much data coming from social media, healthcare records, and customer records, knowing how to parse all of it is essential. This extends into big data proper, as many companies now hold significant amounts of data in large data lakes that need analyzing.
Companies once relied heavily on on-premises ETL and data lakes, but today there's a shift towards cloud-native data environments. The business case: the broadcasting company had an existing on-prem data management stack, with data pipelines operating on delays of up to 10 hours.
These systems represent data as knowledge graphs and implement graph traversal algorithms to help find content in massive datasets. These systems are not only useful across a wide range of industries; they are also fun for data engineers to work on.
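A hedged sketch of the idea: the adjacency list below is a toy knowledge graph (all entity names invented), and a breadth-first traversal collects the content reachable from a starting entity.

```python
from collections import deque

# Toy knowledge graph: entity -> related entities.
graph = {
    "customer_churn": ["retention_report", "billing_data"],
    "retention_report": ["q3_dashboard"],
    "billing_data": ["invoice_archive"],
    "q3_dashboard": [],
    "invoice_archive": [],
}

def related_content(start, graph):
    """Breadth-first traversal collecting every node reachable from start."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
    return found

print(related_content("customer_churn", graph))
```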
Focusing only on what truly matters reduces data clutter, enhances decision-making, and improves the speed at which actionable insights are generated. Streamlined data pipelines: Efficient data pipelines form the backbone of lean data management.
Watsonx.data is built on three core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources that the query engines access directly.
This individual is responsible for building and maintaining the infrastructure that stores and processes data; the data can be diverse, but most commonly it is a mix of structured and unstructured data. They'll also work with software engineers to ensure that the data infrastructure is scalable and reliable.
A novel approach to this complex security analytics scenario combines the ingestion and storage of security data using Amazon Security Lake with analysis of that data using machine learning (ML) in Amazon SageMaker.
Apache NiFi: As an open-source data integration tool, Apache NiFi enables seamless data flow and transformation across systems. Its drag-and-drop interface simplifies the design of data pipelines, making it easier for users to implement complex transformation logic.
It supports batch and real-time data processing, making it a preferred choice for large enterprises with complex data workflows. Informatica's AI-powered automation helps streamline data pipelines and improve operational efficiency.
Key features and benefits of Azure for Data Science include: Storage solutions: secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage. Scalability: easily scale resources up or down based on demand, ideal for handling large datasets and complex computations.
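For instance, uploading a dataset to Azure Blob Storage with the `azure-storage-blob` SDK looks roughly like the sketch below; the connection string, container, blob, and file names are placeholders, not details from the excerpt above.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string; in practice this comes from configuration or a secrets store.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="datasets", blob="training/features.csv")

# Push a local file to Blob Storage, replacing any existing blob with the same name.
with open("features.csv", "rb") as f:
    blob.upload_blob(f, overwrite=True)
```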
Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled "Scalable SQL + Python ML Pipelines in the Cloud" about his company's Snowpark service at Snorkel AI's Future of Data-Centric AI virtual conference in August 2022.
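In Snowpark terms, combining SQL and Python in one pipeline looks roughly like this hedged sketch; the connection parameters, table, and column names are placeholders, not details from the talk.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder credentials; real values come from secrets management.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# DataFrame operations are pushed down and executed as SQL inside Snowflake.
features = (
    session.table("RAW_EVENTS")
    .filter(col("EVENT_TYPE") == "purchase")
    .group_by("USER_ID")
    .count()
)
features.write.save_as_table("PURCHASE_COUNTS", mode="overwrite")
```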
Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.
You don't need a bigger boat: The repository curated by Jacopo Tagliabue shows how several (mostly open-source) tools can be effectively combined to run data pipelines at scale with very small teams. Solution: Data lakes and warehouses are the two key components of any data pipeline.
Artificial intelligence is disrupting many different areas of business. Powering a knowledge management system with a data lakehouse: Organizations need a data lakehouse to target the data challenges that come with deploying an AI-powered knowledge management system. A data lakehouse is a fit-for-purpose data store.
It's distributed both in the cloud and on-premises, allowing extensive use and movement across clouds, apps, and networks, as well as stores of data at rest. An architecture designed for data democratization aims to be flexible, integrated, agile, and secure to enable the use of data and artificial intelligence (AI) at scale.