As the world becomes more interconnected and data-driven, the demand for real-time applications has never been higher. Artificial intelligence (AI) and natural language processing (NLP) technologies are evolving rapidly to manage live data streams.
GenAI can help by automatically clustering similar data points and inferring labels from unlabeled data, surfacing valuable insights from previously unusable sources. Natural language processing (NLP) is one area where traditional methods can struggle with complex text data.
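As a rough illustration of that clustering-and-labeling idea, here is a minimal sketch using scikit-learn's TF-IDF features and k-means; the sample documents, cluster count, and labeling heuristic are illustrative assumptions, not the method of any specific product.

```python
# A minimal sketch: cluster unlabeled text, then infer a rough label for each
# cluster from its highest-weight terms. All sample data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

documents = [
    "refund requested for a damaged package",
    "package arrived damaged, want a refund",
    "how do I reset my account password",
    "password reset link is not working",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Infer a rough label for each cluster from its top-weighted terms.
terms = np.array(vectorizer.get_feature_names_out())
for cluster_id in range(2):
    top_terms = terms[kmeans.cluster_centers_[cluster_id].argsort()[::-1][:3]]
    print(f"cluster {cluster_id}: {', '.join(top_terms)}")
```

In practice the TF-IDF step would be replaced by richer embeddings, but the clustering-then-labeling loop stays the same.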
Natural language processing (NLP) has been growing in awareness over the last few years, and with the popularity of ChatGPT and GPT-3 in 2022, NLP is now at the top of people's minds when it comes to AI. Data engineering platforms: Spark is still the leader for data pipelines, but other platforms are gaining ground.
Data Engineering for Large Language Models: LLMs are artificial intelligence models that are trained on massive datasets of text and code. They are used for a variety of tasks, such as natural language processing, machine translation, and summarization.
By understanding its significance, readers can grasp how it empowers advancements in AI and contributes to cutting-edge innovation in natural language processing. Its diverse content includes academic papers, web data, books, and code. Frequently Asked Questions: What is the Pile dataset?
We borrow proven techniques from the latest in NLP (natural language processing) academia to build evaluation tooling that any software engineer can use. Devs shouldn’t be neck-deep in evaluation pipelines just to test their software, so we solve that complexity for them. What’s your secret sauce?
Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team. He specializes in designing, building, and optimizing large-scale data solutions.
It uses machine learning and natural language processing technology to improve data matching. The reusability feature will help in data management and analytics, further maintaining the data pipeline. With the help of explainability, businesses can more easily understand how an output was produced.
Automation: automating data pipelines and models. First, let’s explore the key attributes of each role. The Data Scientist: data scientists have a wealth of practical expertise building AI systems for a range of applications. The Data Engineer: not everyone working on a data science project is a data scientist.
Natural language processing (NLP) engineer. Potential pay range: US$164,000 to US$267,000/yr. As the name suggests, these professionals specialize in building systems for processing human language, like large language models (LLMs).
Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines.
Amazon Kendra uses natural language processing (NLP) to understand user queries and find the most relevant documents. For our final structured and unstructured data pipeline, we observed that Anthropic’s Claude 2 on Amazon Bedrock generated better overall results.
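To make the Kendra query step concrete, here is a minimal sketch using boto3; the index ID, region, and query text are placeholders, not values from the original post.

```python
# A minimal sketch of querying an Amazon Kendra index with boto3.
# The index ID and query text below are hypothetical placeholders.
import boto3

kendra = boto3.client("kendra", region_name="us-east-1")

response = kendra.query(
    IndexId="YOUR-KENDRA-INDEX-ID",  # hypothetical index ID
    QueryText="What is our return policy for electronics?",
)

# Print the top-ranked document titles and excerpts Kendra found relevant.
for item in response.get("ResultItems", [])[:3]:
    print(item.get("DocumentTitle", {}).get("Text"))
    print(item.get("DocumentExcerpt", {}).get("Text"))
```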
Apart from supporting explanations for tabular data, Clarify also supports explainability for both computer vision (CV) and natural language processing (NLP) using the same SHAP algorithm. Solution overview: SageMaker algorithms have fixed input and output data formats.
Data Engineering: Building and maintaining data pipelines, ETL (Extract, Transform, Load) processes, and data warehousing. Artificial Intelligence: Concepts of AI include neural networks, natural language processing (NLP), and reinforcement learning.
Key use cases and/or user journeys: Identify the main business problems and the data scientist’s needs that you want to solve with ML, and choose a tool that can handle them effectively.
In this post, Reveal experts showcase how they used Amazon Comprehend in their document processing pipeline to detect and redact individual pieces of PII. Amazon Comprehend is a fully managed and continuously trained natural language processing (NLP) service that can extract insight about the content of a document or text.
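The detect-and-redact flow can be sketched in a few lines with boto3; the sample text is illustrative, and this is a simplified stand-in for Reveal's actual pipeline, not a reproduction of it.

```python
# A minimal sketch of PII detection with Amazon Comprehend, redacting each
# detected span by its character offsets. Sample text is illustrative.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "Contact Jane Doe at jane.doe@example.com or 555-0100."
result = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

# Replace detected spans with their entity type, working right to left
# so earlier offsets stay valid as the string shrinks or grows.
redacted = text
for entity in sorted(result["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
    redacted = (
        redacted[: entity["BeginOffset"]]
        + f"[{entity['Type']}]"
        + redacted[entity["EndOffset"]:]
    )
print(redacted)
```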
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. However, there are some critical differences between the two companies.
Primary activities: AIOps relies on big data-driven analytics, ML algorithms and other AI-driven techniques to continuously track and analyze ITOps data. The process includes activities such as anomaly detection, event correlation, predictive analytics, automated root cause analysis and natural language processing (NLP).
An optional CloudFormation stack to deploy a data pipeline to enable a conversation analytics dashboard. Choose an option for allowing unredacted logs for the Lambda function in the data pipeline. This allows you to control which IAM principals are allowed to decrypt the data and view it. For testing, choose yes.
Key Takeaways: Data quality ensures your data is accurate, complete, reliable, and up to date – powering AI conclusions that reduce costs and increase revenue and compliance. Data observability continuously monitors data pipelines and alerts you to errors and anomalies.
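One of the simplest observability checks is a row-count anomaly alert; here is a minimal sketch, where the thresholds and sample counts are illustrative assumptions rather than recommendations from the original piece.

```python
# A minimal sketch of one data observability check: alert when today's row
# count deviates sharply from recent history. Values are illustrative.
from statistics import mean, stdev

def check_row_count(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's count is anomalous versus recent history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

recent_counts = [10_120, 9_980, 10_300, 10_050, 10_210]
if check_row_count(recent_counts, today=4_200):
    print("ALERT: row count anomaly detected in pipeline output")
```

Production observability tools layer many such checks (freshness, schema, distribution drift) over every table, but each one reduces to a comparison like this.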
He builds machine learning pipelines and recommendation systems for product recommendations on the Detail Page. Maria Masood specializes in building data pipelines and data visualizations at AWS Commerce Platform. Outside of work, he enjoys game development and rock climbing.
Foundation models: The power of curated datasets. Foundation models, also known as “transformers,” are modern, large-scale AI models trained on large amounts of raw, unlabeled data. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence.
From state-of-the-art language models to innovative AI-driven applications, to new open-source models hoping to take away GPT’s crown, let’s take a tour of some of the most notable AI tools and top LLMs that are working to shape how 2024 concludes, and how AI will shape the future.
Cortex offers a collection of ready-to-use models for common use cases, with capabilities broken into two categories: Cortex LLM functions provide Generative AI capabilities for natural language processing, including completion (prompting), translation, summarization, sentiment analysis, and vector embeddings.
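Since these functions are exposed as SQL, they can be called from any client; below is a hedged sketch via the Snowflake Python connector. Connection parameters are placeholders, and exact function availability depends on your Snowflake region and edition.

```python
# A hedged sketch of calling Snowflake Cortex LLM functions from Python.
# Connection parameters below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
    warehouse="YOUR_WH", database="YOUR_DB", schema="PUBLIC",
)
cur = conn.cursor()

# SENTIMENT returns a score roughly in [-1, 1]; SUMMARIZE returns text.
cur.execute("SELECT SNOWFLAKE.CORTEX.SENTIMENT('The product arrived late but support was great.')")
print(cur.fetchone()[0])

cur.execute("SELECT SNOWFLAKE.CORTEX.SUMMARIZE('Long review text goes here...')")
print(cur.fetchone()[0])

cur.close()
conn.close()
```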
We developed a custom data pipeline to handle the immense volume of visual data, resulting in significant cost savings and reduced human exposure to hazardous environments. This unprecedented project in Malaysia required creating robust models to identify defects in diverse industrial equipment under varying conditions.
Data pipelines can be set up in Snowflake using stages, streams, and tasks to automate the continuous process of uploading documents, extracting information, and loading them into destination tables. Large Language Models: Snowflake Document AI is powered by a first-party Large Language Model (LLM).
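A minimal sketch of that stage → stream → task wiring follows, driven from the Snowflake Python connector; all object names (tables, warehouse, task) are illustrative assumptions, and the DDL mirrors standard Snowflake syntax for continuous ingestion.

```python
# A hedged sketch of a Snowflake stage -> stream -> task pipeline.
# All object names here are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
)
cur = conn.cursor()

# Stage: a landing area for uploaded documents.
cur.execute("CREATE STAGE IF NOT EXISTS doc_stage")

# Stream: tracks new rows in the raw table so only changes get processed.
cur.execute("CREATE STREAM IF NOT EXISTS doc_stream ON TABLE raw_docs")

# Task: runs on a schedule and moves newly arrived rows downstream.
cur.execute("""
    CREATE TASK IF NOT EXISTS load_docs_task
      WAREHOUSE = my_wh
      SCHEDULE = '5 MINUTE'
    AS
      INSERT INTO parsed_docs SELECT * FROM doc_stream
""")
cur.execute("ALTER TASK load_docs_task RESUME")  # tasks start suspended
```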
It has intuitive helpers and utilities for modalities like computer vision, natural language processing, audio, time series, and tabular data. About the authors: Fred Wu is a Senior Data Engineer at Sportradar, where he leads infrastructure, DevOps, and data engineering efforts for various NBA and NFL products.
By using the natural language processing and generation capabilities of generative AI, the chat assistant can understand user queries, retrieve relevant information from various data sources, and provide tailored, contextual responses. Mohamed Mohamud is a Partner Solutions Architect with a focus on Data Analytics.
This allows users to accomplish different natural language processing (NLP) functional tasks and take advantage of IBM-vetted pre-trained open-source foundation models. Encoder-decoder and decoder-only large language models are available in the Prompt Lab today. To bridge the tuning gap, watsonx.ai
A startup food manufacturer was using social media data to track trends and find niche markets for developing new products. The marketing team spent weeks analyzing spreadsheets of TikTok and Twitter data. The results: The manufacturer quickly released two new product lines to capitalize on growing food trends.
Once an organization has identified its AI use cases , data scientists informally explore methodologies and solutions relevant to the business’s needs in the hunt for proofs of concept. These might include—but are not limited to—deep learning, image recognition and naturallanguageprocessing.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.
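Such a duplicate-entry validation check can be sketched in a few lines of pandas; the column names and normalization rule here are illustrative assumptions.

```python
# A minimal sketch of a validation check that flags multiple entries of the
# same data, both exact and after light normalization. Data is illustrative.
import pandas as pd

df = pd.DataFrame({
    "doc_id": [1, 2, 3, 4],
    "text": ["invoice #100", "invoice #100", "receipt #7", "receipt #7 "],
})

# Exact duplicates on the raw text.
exact_dupes = df[df.duplicated(subset="text", keep=False)]

# Near duplicates after normalizing case and surrounding whitespace.
df["norm"] = df["text"].str.strip().str.lower()
near_dupes = df[df.duplicated(subset="norm", keep=False)]

print(f"{len(exact_dupes)} exact and {len(near_dupes)} normalized duplicates found")
```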
Overview of RAG: RAG solutions are inspired by representation learning and semantic search ideas that have been gradually adopted in ranking problems (for example, recommendation and search) and natural language processing (NLP) tasks since 2010. Inside the frontend/streamlit-ui folder, run bash run-streamlit-ui.sh.
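The retrieval step at the heart of RAG can be illustrated with a tiny sketch: rank passages against the query, then splice the best match into the prompt. The corpus, TF-IDF scoring (standing in for learned embeddings), and prompt format below are all illustrative assumptions.

```python
# A minimal sketch of RAG's retrieval step: score documents against a query,
# take the best match as context, and build an augmented prompt for an LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Our return window is 30 days from delivery.",
    "Shipping is free on orders over $50.",
    "Support is available 24/7 via chat.",
]
query = "How long do I have to return an item?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform([query])

# Retrieve the single most similar passage to serve as context.
scores = cosine_similarity(query_vector, doc_vectors)[0]
context = corpus[scores.argmax()]

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt would then be sent to an LLM
```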
Cutting-Edge Topics for Every Interest: Unlike many other AI bootcamps, ODSC East is designed to cover a wide range of trending topics in data science, ensuring there’s something for everyone. We also place a heavy emphasis on the biggest topics in AI like LLMs, RAG, AI agents, and other things defining today’s AI landscape.
Data professionals such as data engineers, data scientists, data analysts and data stewards benefit from these self-service data catalog tools that allow for self-service analytics, data discovery, and metadata management.
Natural Language Processing (NLP) has emerged as a dominant area, with tasks like sentiment analysis, machine translation, and chatbot development leading the way. Data Engineering: Data engineering remains integral to many data science roles, with workflow pipelines being a key focus.
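Two of those NLP tasks are easy to demonstrate with the Hugging Face transformers library; this is a generic sketch, not tied to any tool named above, and default models are downloaded on first run.

```python
# A minimal sketch of sentiment analysis and machine translation using
# Hugging Face transformers pipelines with their default models.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("The new data pipeline cut our processing time in half!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

translator = pipeline("translation_en_to_fr")
print(translator("Natural language processing is evolving rapidly."))
```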
DL is particularly effective in processing large amounts of unstructured data, such as images, audio, and text. Natural Language Processing (NLP): NLP is a branch of AI that deals with the interaction between computers and human languages.
This is achieved by using the pipeline to transfer data from a Splunk index into an S3 bucket, where it will be cataloged. The approach is shown in the following diagram. In this example, we use the service to classify whether a patient is likely to be admitted to a hospital over the next 30 days based on the combined dataset.
How to use ML to automate the refining process into a cyclical ML process. MLOps vs. AIOps: AIOps, or artificial intelligence for IT operations, uses AI capabilities, such as natural language processing and ML models, to automate and streamline operational workflows. How MLOps will be used within the organization.
The constant evolution of artificial intelligence has opened up exciting new perspectives in the field of natural language processing (NLP). At the heart of this technological revolution are Large Language Models (LLMs), deep learning models capable of understanding and generating text remarkably smoothly and accurately.
Integrating natural language processing capabilities allows for more human-like interactions, enhancing the overall fan experience. Check out this eye-opening story of how the Denver Broncos leveraged a Fan 360 approach to please fans and consistently sell out tickets with efficient data insights.
“You need to find a place to park your data. It needs to be optimized for the type of data and the format of the data you have,” he said. By optimizing every part of the data pipeline, he said, “You will, as a result, get your models to market faster.”
And with interfaces potentially ranging from SQL editors to natural language processing, self-service for all may require a chameleon or suite of solutions to suit the individual. Regardless, all variations require the foundational data to be: Discoverable. The interface adapts to the unique needs of the persona.