Generative artificial intelligence is the talk of the town in the technology world today, but it raises data challenges that stem primarily from how data is collected, stored, moved, and analyzed. With most AI models, training data comes from hundreds of different sources, any one of which could present problems.
Data engineers build data pipelines, also called data integration tasks or jobs, as incremental steps to perform data operations, and orchestrate these pipelines in an overall workflow. This lets organizations harness the full potential of their data while reducing risk and lowering costs.
Data pipelines: in cases where you need to provide contextual data to the foundation model using the RAG pattern, you need a data pipeline that can ingest the source data, convert it to embedding vectors, and store the embedding vectors in a vector database.
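As a minimal sketch of such a pipeline, the snippet below assumes the sentence-transformers library for the embedding step and uses a plain in-memory list as a stand-in for a real vector database; the file path and model name are illustrative only, not taken from the article.

```python
from sentence_transformers import SentenceTransformer  # assumed embedding library

# Load an embedding model (model name is illustrative)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Ingest: read source documents (path is hypothetical)
with open("docs/knowledge_base.txt", encoding="utf-8") as f:
    chunks = [line.strip() for line in f if line.strip()]

# Embed: convert each chunk to a vector
vectors = model.encode(chunks)

# Store: an in-memory list stands in for a real vector database here
vector_store = [
    {"id": i, "text": chunk, "embedding": vec.tolist()}
    for i, (chunk, vec) in enumerate(zip(chunks, vectors))
]
print(f"Ingested {len(vector_store)} chunks")
```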
Seattle-area startups that just graduated from Y Combinator's summer 2023 batch are tackling a wide range of problems, with plenty of help from artificial intelligence. Among them is Neum AI, a platform designed to help companies keep their AI applications relevant with the latest data.
Defining Cloud Computing in Data Science: cloud computing provides on-demand access to computing resources such as servers, storage, databases, and software over the Internet. For Data Science, it means deploying Analytics, Machine Learning, and Big Data solutions on cloud platforms without requiring extensive physical infrastructure.
Feature Computation Engine: users can transform batch, streaming, and real-time data into features (source: IBM Cloud Pak for Data). To productionize a machine learning system, it is necessary to process new data continuously with engines such as Spark or Flink.
Data Processing and Analysis: techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python. Databases and SQL: managing and querying relational databases using SQL, as well as working with NoSQL databases like MongoDB.
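For illustration only, here is a small Pandas/NumPy cleaning-and-aggregation sketch; the file name and column names are invented, not drawn from the article.

```python
import numpy as np
import pandas as pd

# Load raw data (file and columns are hypothetical)
df = pd.read_csv("sales_raw.csv")

# Clean: drop duplicates and fill missing amounts with the column median
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())

# Analyze: revenue per region, plus a log-scaled column via NumPy
summary = df.groupby("region")["amount"].sum()
df["log_amount"] = np.log1p(df["amount"])
print(summary)
```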
Data is the differentiator as business leaders look to sharpen their competitive edge while implementing generative AI (gen AI). Leaders feel the pressure to infuse their processes with artificial intelligence (AI) and are looking for ways to harness the insights in their data platforms to fuel this movement.
More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. The fundamental objective is to build a manufacturer-agnostic database, leveraging generative AI's ability to standardize sensor outputs, synchronize data, and facilitate precise corrections.
The project’s first phase focused on leveraging data replication offered by Precisely to enable near real-time replication of midrange data to AWS with support for heterogeneous source and target databases.
Automation: automating data pipelines and models. The Data Engineer: not everyone working on a data science project is a data scientist. Data engineers are the glue that binds the products of data scientists into a coherent and robust data pipeline.
Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Enter a user name, password, and database name; for this post, we add our restaurant data, then choose Add connection.
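Because Amazon DocumentDB is MongoDB API-compatible, a connection from Python can be sketched with pymongo as below; the endpoint, credentials, TLS bundle path, and database/collection names are placeholders, not values from the post.

```python
from pymongo import MongoClient

# Placeholder endpoint and credentials -- substitute your own cluster values
uri = (
    "mongodb://myuser:mypassword@my-cluster.us-east-1.docdb.amazonaws.com:27017/"
    "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
)
client = MongoClient(uri)

# Insert a sample restaurant document into a hypothetical database/collection
db = client["restaurants"]
db["places"].insert_one({"name": "Example Bistro", "cuisine": "French", "rating": 4.5})
print(db["places"].count_documents({}))
```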
There are many well-known libraries and platforms for data analysis, such as Pandas and Tableau, in addition to analytical databases like ClickHouse, MariaDB, Apache Druid, Apache Pinot, Google BigQuery, and Amazon Redshift. VisiData works with CSV files, Excel spreadsheets, SQL databases, and many other data sources.
AI is quickly scaling through dozens of industries as companies, non-profits, and governments are discovering the power of artificial intelligence. This can be helpful for businesses that need to track data from multiple sources, such as sales, marketing, and customer service.
Before a bank can start the process of certifying a risk model, it first needs to understand what data is being used and how it changes as it moves from a database to a model. The value of data lineage applies across all industries, but there are three key focuses when you consider it for banking use cases.
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. Amazon DynamoDB is a fast and flexible nonrelational database service for any scale. The ML model is trained from pet profiles pulled from Purina's database, assuming the primary breed label is the true label.
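As a hedged illustration of how such profile records might be written and read from Python with boto3, see the sketch below; the table name, key, and attributes are hypothetical and are not Purina's actual schema.

```python
import boto3

# Connect to DynamoDB and reference a hypothetical table of pet profiles
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("PetProfiles")

# Write a profile item, including the primary breed label used as the training label
table.put_item(Item={"pet_id": "p-123", "name": "Rex", "primary_breed": "Labrador Retriever"})

# Read it back by key
item = table.get_item(Key={"pet_id": "p-123"}).get("Item")
print(item)
```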
Generative artificial intelligence (gen AI) is transforming the business world by creating new opportunities for innovation, productivity and efficiency. Data Engineer: a data engineer sets the foundation for building any generative AI app by preparing, cleaning and validating the data required to train and deploy AI models.
Building a Dataset for Triplet Loss with Keras and TensorFlow: in today's tutorial, we take the first step toward building our real-time face recognition application, covering the project structure, our configuration file, our data pipeline, and face detection and cropping for preprocessing.
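The tutorial's actual configuration file is not reproduced here; the snippet below is only a guess at what such a config module typically contains, with invented paths and hyperparameter values.

```python
# config.py -- hypothetical configuration module for the triplet-loss dataset pipeline
import os

# Root of the face dataset and where processed crops are written (paths are invented)
DATASET_PATH = os.path.join("data", "faces")
OUTPUT_PATH = os.path.join("output", "cropped_faces")

# Image and training hyperparameters (illustrative values only)
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32
EMBEDDING_DIM = 128
```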
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization's data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage.
Overview: Data science vs data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models and develop artificial intelligence (AI) applications.
Generative artificial intelligence (generative AI) has enabled new possibilities for building intelligent systems. Given the data sources, LLMs provided tools that allowed us to build a Q&A chatbot in weeks rather than the years it might have taken previously, and with likely better performance than those earlier approaches.
JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases. Furthermore, the notebooks can be integrated into the corporate Git repositories to collaborate using version control.
Data engineering involves not only collecting, storing, and processing data so that it can be used for analysis and decision-making; data engineers are also responsible for building and maintaining the infrastructure that makes this possible, and much more. Think of data engineers as the architects of the data ecosystem.
We’re thrilled to announce our first group of Keynote Speakers, representing the groundbreaking AI companies shaking up the industry including Anthropic, Voltron Data, NVIDIA, Google DeepMind, and Microsoft. Develop the tools to build your future in AI at ODSC West. Learn more about this lineup here!
The release of ChatGPT in late 2022 introduced generative artificial intelligence to the general public and triggered a new wave of AI-oriented companies, products, and open-source projects that provide tools and frameworks to enable enterprise AI.
Implementing Face Recognition and Verification: given that we want to identify people with id-1021 to id-1024, we are given one image (or a few samples) of each person, which allows us to add each person to our face recognition database. On Lines 40 and 41, we define the path to our face database.
Artificial intelligence (AI) adoption is still in its early stages. The Stanford Institute for Human-Centered Artificial Intelligence's Center for Research on Foundation Models (CRFM) recently outlined the many risks of foundation models, as well as opportunities. Trustworthiness is critical.
The constant evolution of artificial intelligence has opened up exciting new perspectives in the field of natural language processing (NLP). Topics covered include what an LLM is and how it works, modeling tabular data distributions with GReaT, and generating datasets without training data.
In the previous tutorial of this series, we built the dataset and data pipeline for our Siamese Network based face recognition application. Specifically, we looked at an overview of triplet loss and discussed what kind of data samples are required to train our model with the triplet loss.
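As a rough sketch (not the tutorial's actual code) of what triplet sampling looks like: each anchor image needs a positive from the same identity and a negative from a different one. The identities and file names below are made up.

```python
import random

def sample_triplet(images_by_person):
    """Pick (anchor, positive, negative) paths from a dict mapping person id -> image paths."""
    # Anchor and positive come from the same identity (needs at least two images)
    person = random.choice([p for p, imgs in images_by_person.items() if len(imgs) >= 2])
    anchor, positive = random.sample(images_by_person[person], 2)

    # Negative comes from a different identity
    other = random.choice([p for p in images_by_person if p != person])
    negative = random.choice(images_by_person[other])
    return anchor, positive, negative

# Hypothetical example data
faces = {"id-1": ["a1.jpg", "a2.jpg"], "id-2": ["b1.jpg"], "id-3": ["c1.jpg", "c2.jpg"]}
print(sample_triplet(faces))
```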
This blog will cover creating customized nodes in Coalesce, what new advanced features can already be used as nodes, and how to create them as part of your data pipeline. They're essentially an entire data pipeline within itself. Snowflake even handles the orchestration and scheduling of the refresh.
Data observability is a key element of data operations (DataOps). It enables a big-picture understanding of the health of your organization's data through continuous AI/ML-enabled monitoring – detecting anomalies throughout the data pipeline and preventing data downtime.
Modin empowers practitioners to use pandas on data at scale without requiring them to change a single line of code. Modin leverages our cutting-edge academic research on dataframes, the abstraction underlying pandas, to bring the best of databases and distributed systems to dataframes. Run operations in pandas, all in Snowflake!
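Modin's documented drop-in usage is a one-line import change, as sketched below with a made-up file and column names; Snowflake's pandas-on-Snowflake integration exposes a similar modin.pandas import, though the exact setup there may differ.

```python
# Only the import changes; the rest stays ordinary pandas code
import modin.pandas as pd  # instead of: import pandas as pd

df = pd.read_csv("transactions.csv")              # hypothetical file
summary = df.groupby("category")["amount"].sum()  # same pandas API, executed in parallel
print(summary.head())
```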
Automated testing to ensure data quality. Many inefficiencies riddle a typical data pipeline, and DataOps aims to address them. DataOps encourages better collaboration between data professionals and other IT roles, and it makes processes more efficient by automating as much of the data pipeline as possible.
It’s the critical process of capturing, transforming, and loading data into a centralised repository where it can be processed, analysed, and leveraged. Data ingestion meaning: at its core, it refers to the act of absorbing data from multiple sources and transporting it to a destination, such as a database, data warehouse, or data lake.
Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. In practice, tabular data is anything but clean and uncomplicated.
Introduction: ETL plays a crucial role in Data Management. This process enables organisations to gather data from various sources, transform it into a usable format, and load it into data warehouses or databases for analysis. The goal is to retrieve the required data efficiently without overwhelming the source systems.
She’ll cover the distinct challenges that come with handling different data types and how modern tools can turn what feels like chaos into a manageable, streamlined architecture. Data lakes allow for the ingestion of vast amounts of data — regardless of type or format — without the need for a pre-defined schema.
Under this category, tools with pre-built connectors for popular data sources and visual tools for data transformation are better choices. Integration: How well does the tool integrate with your existing infrastructure, databases, cloud platforms, and analytics tools? What is Fivetran?
If you’re in the market for a data integration solution, there are many things to consider – including the flexibility of integration solutions, the availability of a strong network of service providers, and the vendor’s reputation for thought leadership in the integration space.
Well, according to Brij Kishore Pandey, it stands for Extract, Transform, Load and is a fundamental process in data engineering, ensuring data moves efficiently from raw sources to structured storage for analysis. The steps include: Extraction: data is collected from multiple sources (databases, APIs, flat files).
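A minimal sketch of those steps in Python, assuming a flat-file source and a local SQLite database as the structured destination; file, table, and column names are invented for illustration.

```python
import sqlite3
import pandas as pd

# Extract: collect raw records from a flat file (source name is hypothetical)
raw = pd.read_csv("orders_raw.csv")

# Transform: fix types and drop rows missing key fields
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
clean = raw.dropna(subset=["order_date", "customer_id"])

# Load: write the structured result into a local database for analysis
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("orders", conn, if_exists="replace", index=False)
```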
This unified schema streamlines downstream consumption and analytics because the data follows a standardized schema and new sources can be added with minimal data pipeline changes. After the security log data is stored in Amazon Security Lake, the question becomes how to analyze it.
Federated learning to save the day (and save lives): for good artificial intelligence (AI), you need good data. Legacy systems, which are frequently found in the federal domain, pose significant data processing challenges before you can derive any intelligence or merge them with newer datasets.
Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent. Watsonx.data offers built-in governance and automation to get to trusted insights within minutes, and integrations with existing databases and tools to simplify setup and user experience.