With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of OneLake: Fabric features a lake-centric architecture, with a central repository known as OneLake. In this step, we changed the column data types and dealt with missing values.
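A minimal sketch of that cleanup step, assuming pandas and hypothetical column names (not taken from the original walkthrough):

import pandas as pd

# Hypothetical sales data with mixed types and gaps
df = pd.DataFrame({
    "order_id": ["1001", "1002", "1003"],
    "amount": ["19.99", None, "7.50"],
    "region": ["east", "west", None],
})

# Change column data types
df["order_id"] = df["order_id"].astype(int)
df["amount"] = pd.to_numeric(df["amount"])

# Deal with missing values
df["amount"] = df["amount"].fillna(df["amount"].median())
df["region"] = df["region"].fillna("unknown")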
Data management problems can also lead to data silos: disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large and complex database of diverse datasets all stored in their original format.
You can use an Apache Kafka cluster for seamless data movement from an on-premises hardware solution to the data lake, using cloud services such as Amazon S3. It enables you to quickly transform and load the results into Amazon S3 data lakes or JDBC data stores.
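A hedged sketch of that movement pattern, assuming the kafka-python and boto3 libraries and hypothetical topic and bucket names:

import json
from kafka import KafkaConsumer   # pip install kafka-python
import boto3

consumer = KafkaConsumer(
    "sensor-events",                      # hypothetical topic on the on-premises cluster
    bootstrap_servers="broker:9092",      # hypothetical Kafka broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 1000:                # flush to the data lake in batches
        s3.put_object(
            Bucket="my-datalake",         # hypothetical S3 bucket
            Key=f"raw/events-{message.offset}.json",
            Body=json.dumps(batch),
        )
        batch = []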
If the question was “What’s the schedule for AWS events in December?”, note that AWS usually announces the dates for its upcoming re:Invent event around 6-9 months in advance. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
Data engineering involves not only collecting, storing, and processing data so that it can be used for analysis and decision-making; these professionals are also responsible for building and maintaining the infrastructure that makes this possible, and much more. Think of data engineers as the architects of the data ecosystem.
Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines.
In this post, you will learn about the 10 best data pipeline tools, along with their pros, cons, and pricing. A typical data pipeline involves several steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
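As a rough illustration of those stages (all function names here are hypothetical placeholders, not from any specific tool):

def ingest():                       # collect raw records from a source
    return [{"id": 1, "value": " 42 "}]

def validate(records):              # drop records that fail basic checks
    return [r for r in records if r.get("value") is not None]

def transform(records):             # clean and type-cast before loading
    return [{**r, "value": int(r["value"].strip())} for r in records]

def load(records):                  # hand off to the downstream consumer (e.g., model training)
    print(f"loaded {len(records)} records")

load(transform(validate(ingest())))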
Recognizing these specific needs, Fivetran has developed a range of connectors, including dedicated connectors for applications, databases, files, and events, which can accommodate the diverse formats used by healthcare systems. Addressing these needs may pose challenges that lead to the implementation of custom solutions rather than a uniform approach.
A novel approach to this complex security analytics scenario combines the ingestion and storage of security data using Amazon Security Lake with analysis of that data through machine learning (ML) using Amazon SageMaker. Store new security logs in an S3 bucket and queue events in Amazon Simple Queue Service (Amazon SQS).
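One hedged sketch of that queueing step, assuming boto3, a hypothetical queue URL, and direct S3-to-SQS event notifications (message shapes may differ if SNS sits in between):

import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/security-logs"  # hypothetical

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    event = json.loads(msg["Body"])
    for record in event.get("Records", []):           # S3 event notification format
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        log_obj = s3.get_object(Bucket=bucket, Key=key)
        print("new security log:", key, len(log_obj["Body"].read()))
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])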
LakeFS: LakeFS is an open-source platform that provides data lake versioning and management capabilities. It sits between the data lake and cloud object storage, allowing you to version and control changes to data lakes at scale. Flyte: Flyte is a platform for orchestrating ML pipelines at scale.
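A sketch of how that versioning can surface in practice, assuming LakeFS’s S3-compatible gateway and hypothetical endpoint, repository, and branch names (repository maps to bucket, branch to key prefix):

import boto3

# Point an ordinary S3 client at the LakeFS gateway instead of AWS
lakefs = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",   # hypothetical LakeFS endpoint
    aws_access_key_id="<lakefs-key>",            # LakeFS-issued credentials
    aws_secret_access_key="<lakefs-secret>",
)

# Write to an experiment branch without touching main
lakefs.put_object(Bucket="my-repo", Key="experiment-1/data/users.parquet", Body=b"...")

# Read the production version of the same path from the main branch
obj = lakefs.get_object(Bucket="my-repo", Key="main/data/users.parquet")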
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data engineering is the practice of designing, constructing, and managing systems that enable data collection, storage, and analysis. Data engineers are crucial in ensuring data is readily available for analysis and reporting.
Flow-Based Programming: NiFi employs a flow-based programming model, allowing users to create complex data flows using simple drag-and-drop operations. This visual representation simplifies the design and management of data pipelines. Provenance Repository: This repository records all provenance events related to FlowFiles.
These systems represent data as knowledge graphs and implement graph traversal algorithms to help find content in massive datasets. These systems are not only useful for a wide range of industries but also fun for data engineers to work on.
By analyzing datasets, data scientists can better understand their potential use in an algorithm or machine learning model. The data science lifecycle: Data science is iterative, meaning data scientists form hypotheses and experiment to see if a desired outcome can be achieved using available data.
Cloudera: Cloudera is a cloud-based platform that provides businesses with the tools they need to manage and analyze data. It offers a variety of services, including data warehousing, data lakes, and machine learning. The platform includes several features that make it easy to develop and test data pipelines.
Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features. Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs.
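A hedged sketch of that audit pattern using boto3’s CloudTrail lookup_events call (the event source value is an assumption about which service is being audited):

import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent API events emitted by SageMaker (e.g., feature store reads/writes)
resp = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventSource", "AttributeValue": "sagemaker.amazonaws.com"}
    ],
    MaxResults=50,
)
for event in resp["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username"))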
Data engineers will also work with data scientists to design and implement data pipelines, ensuring steady flows and minimal issues for data teams. They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable.
Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data. In addition to natural language, models are trained on various modalities, such as code, time-series, tabular, geospatial, and IT events data.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.
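For the duplicate-entry check, a minimal sketch (standard library only; the directory layout is hypothetical) could hash raw file contents and flag collisions:

import hashlib
from pathlib import Path

seen = {}
for path in Path("raw_uploads").glob("**/*"):   # hypothetical unstructured-data directory
    if path.is_file():
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            print(f"duplicate: {path} matches {seen[digest]}")
        else:
            seen[digest] = path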
Traditional maintenance activities rely on a sizable workforce distributed across key locations along the BHS, dispatched by operators in the event of an operational fault. With this service, industrial sensors, smart meters, and OPC UA servers can be connected to an AWS data lake with just a few clicks.
We need methods that allow our customer data models to be as dynamic and flexible as the customers they represent. In this guide, we will explore concepts like transitional modeling for customer profiles, the power of event logs for customer behavior, persistent staging for raw customer data, real-time customer data capture, and much more.
Data storage flexibility is beneficial for customers who cannot (or choose not to) store their data lakes in Snowflake. Operational Risks: Identify operational risks, such as data loss or failures, in the event of an unforeseen outage or disaster.
Other features include email notifications (to let you know if a job failed or is running long), job scheduling, orchestration to ensure your data gets to Snowflake when you want it, and of course, full automation of your complete data ingestion process.
What Are the Best Third-Party Data Ingestion Tools for Snowflake? Fivetran: Fivetran is a tool dedicated to replicating applications, databases, events, and files into a high-performance data warehouse, such as Snowflake. Source data formats can only be Parquet, JSON, or delimited text (CSV, TSV, etc.).
Data pipeline orchestration. Moving/integrating data in the cloud; data exploration and quality assessment. Supports the ability to interact with the actual data and perform analysis on it. Scheduling: provides a time or event for a job to run and offers useful post-run information.
Must-Read Blogs: Exploring the Power of Data Warehouse Functionality. Data Lakes vs. Data Warehouse: Its Significance and Relevance in the Data World. Exploring Differences: Database vs Data Warehouse. Explore More: Build Data Pipelines: Comprehensive Step-by-Step Guide.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. You can use Snowflake cloud computing to store raw data in structured or VARIANT format, using various data models to meet your needs.
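A small sketch of that structured-plus-VARIANT storage, assuming the snowflake-connector-python package and hypothetical connection details:

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # hypothetical credentials
    warehouse="LOAD_WH", database="RAW", schema="LANDING",
)
cur = conn.cursor()

# Structured columns alongside a VARIANT column for semi-structured payloads
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        event_id NUMBER,
        received_at TIMESTAMP_NTZ,
        payload VARIANT
    )
""")

# Query into the semi-structured payload with path notation
cur.execute("SELECT payload:device:type, COUNT(*) FROM events GROUP BY 1")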
Storage Solutions: Secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage. Key features and benefits of Azure for Data Science include: Scalability: Easily scale resources up or down based on demand, ideal for handling large datasets and complex computations.
We have data pipelines and data preparation. So each of them may require some repositories, from a data lakehouse/analytics hub kind of thing for sharing data, to a feature store, to a model hub, to responsible AI (known sets of things that you need to guard against), to a model registry.
The service will consume the features in real time, generate predictions in near real time, such as in an event processing pipeline, and write the outputs to a prediction queue. Solution: Data lakes and warehouses are the two key components of any data pipeline, and data engineers are mostly in charge of them.
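A hedged sketch of that near-real-time flow, assuming kafka-python; the feature lookup and model call are hypothetical stand-ins, not any specific service’s API:

import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer("events", bootstrap_servers="broker:9092")
producer = KafkaProducer(bootstrap_servers="broker:9092")

def lookup_features(entity_id):      # hypothetical feature-store read
    return {"entity_id": entity_id, "txn_count_7d": 12}

def predict(features):               # hypothetical model call
    return {"entity_id": features["entity_id"], "score": 0.87}

for message in consumer:
    event = json.loads(message.value)
    features = lookup_features(event["entity_id"])
    prediction = predict(features)
    # Write the output to the prediction queue
    producer.send("predictions", json.dumps(prediction).encode("utf-8"))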
Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the 3-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. 2) When data becomes information, many (incremental) use cases surface.
In that sense, data modernization is synonymous with cloud migration. Modern data architectures, like cloud data warehouses and cloud data lakes, empower more people to leverage analytics for insights more efficiently. Access the resources your data applications need — no more, no less.