Generative artificial intelligence is the talk of the town in the technology world today, but it brings real data challenges, primarily due to how data is collected, stored, moved, and analyzed. With most AI models, training data comes from hundreds of different sources, any one of which could present problems.
Data engineers build data pipelines, often called data integration tasks or jobs, as incremental steps to perform data operations, and orchestrate those pipelines in an overall workflow. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run its daily workloads. Data is the foundational layer for all generative AI and ML applications.
OMRON's data strategy, represented on ODAP, also allowed the organization to unlock generative AI use cases focused on tangible business outcomes and enhanced productivity. This tool democratizes data access across the organization, enabling even nontechnical users to gain valuable insights.
Generative artificial intelligence (generative AI) has enabled new possibilities for building intelligent systems. Given the data sources, LLMs provided tools that let us build a Q&A chatbot in weeks rather than years, and likely with better performance than older approaches would have delivered.
Data Processing and Analysis: Techniques for data cleaning, manipulation, and analysis using libraries such as Pandas and NumPy in Python. Databases and SQL: Managing and querying relational databases using SQL, as well as working with NoSQL databases like MongoDB.
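As a minimal sketch of what that cleaning and manipulation work looks like in practice (the file name and column names below are hypothetical):

```python
import pandas as pd
import numpy as np

# Load a raw CSV export (hypothetical file and columns).
df = pd.read_csv("orders_raw.csv")

# Normalize column names and drop exact duplicate rows.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df = df.drop_duplicates()

# Coerce a numeric column (bad values become NaN), then impute the median.
df["order_total"] = pd.to_numeric(df["order_total"], errors="coerce")
df["order_total"] = df["order_total"].fillna(df["order_total"].median())

# Use NumPy to flag outliers beyond three standard deviations.
z = np.abs((df["order_total"] - df["order_total"].mean()) / df["order_total"].std())
df["is_outlier"] = z > 3
```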
With all this packaged into a well-governed platform, Snowflake continues to set the standard for data warehousing and beyond. Snowflake supports data sharing and collaboration across organizations without the need for complex data pipelines.
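A rough sketch of how such a share is set up, assuming the snowflake-connector-python package; the account, credentials, and object names are all placeholders:

```python
import snowflake.connector

# Connect to Snowflake (account and credentials are placeholders).
conn = snowflake.connector.connect(
    user="ANALYST", password="***", account="myorg-myaccount"
)
cur = conn.cursor()

# Expose a database to a consuming account via a share -- no pipeline needed.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = partner_account")
```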
We had bigger sessions on getting started with machine learning or SQL, up to advanced topics in NLP, and of course, plenty related to large language models and generative AI. You can see our photos from the event here, and be sure to follow us on YouTube for virtual highlights from the conference as well.
Because it runs Snowflake SQL from an easy-to-use, code-first GUI, it can take advantage of everything Snowflake offers, even if the feature is brand new. This blog will cover creating customized nodes in Coalesce, which new advanced features can already be used as nodes, and how to create them as part of your data pipeline.
Automation: Automating data pipelines and models. The most common data science languages are Python and R; SQL is also a must-have skill for acquiring and manipulating data. The Data Engineer: Not everyone working on a data science project is a data scientist.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Cloud Computing, APIs, and Data Engineering: NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms: Spark is still the leader for data pipelines, but other platforms are gaining ground. Knowing some SQL is also essential.
This use case highlights how large language models (LLMs) can act as translators between human languages (English, Spanish, Arabic, and more) and machine-interpretable languages (Python, Java, Scala, SQL, and so on), alongside sophisticated internal reasoning.
To train a model on data stored outside the three supported storage services, the data first needs to be ingested into one of them (typically Amazon S3). This requires building a data pipeline (using tools such as Amazon SageMaker Data Wrangler) to move the data into Amazon S3.
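At its simplest, that ingestion step can be a straight copy into S3 with boto3; the bucket and key below are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Upload a local training file to the bucket the training job will read from.
# Bucket name and key prefix are placeholders.
s3.upload_file(
    Filename="data/train.csv",
    Bucket="my-training-bucket",
    Key="datasets/churn/train.csv",
)
```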
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.
With the explosion of big data and advancements in computing power, organizations can now collect, store, and analyze massive amounts of data to gain valuable insights. Machine learning, a subset of artificial intelligence, enables systems to learn and improve from data without being explicitly programmed.
Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Big Data: As datasets become larger and more complex, knowing how to work with them will be key.
Great Expectations provides support for different data backends such as flat-file formats, SQL databases, and Pandas and Spark dataframes, and comes with built-in notification and data documentation functionality. VisiData works with CSV files, Excel spreadsheets, SQL databases, and many other data sources.
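A minimal sketch of the Pandas backend in action, assuming Great Expectations' classic dataframe API (newer releases use a different entry point); the data is made up:

```python
import great_expectations as ge
import pandas as pd

# Wrap an ordinary DataFrame in the classic pandas-backed GE interface.
df = ge.from_pandas(
    pd.DataFrame({"id": [1, 2, 3], "amount": [9.5, 12.0, None]})
)

# Each expectation validates immediately and reports success or failure.
result = df.expect_column_values_to_not_be_null("id")
print(result.success)  # True -- no null ids

result = df.expect_column_values_to_be_between("amount", 0, 100)
print(result.success)  # nulls are ignored by default, so this passes
```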
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent, using sophisticated artificial intelligence (AI) to personalize experiences at scale. Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced.
Overview: Data science vs. data analytics. Think of data science as the overarching umbrella covering a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models, and develop artificial intelligence (AI) applications.
As the scale and complexity of data handled by organizations increase, traditional rules-based approaches to analyzing the data alone are no longer viable. This is achieved by using a pipeline to transfer data from a Splunk index into an S3 bucket, where it will be cataloged.
That’s a problem when you’re trying to work with that data in pandas, because you have to pull the dataset into your machine’s memory, which can be slow and expensive and can lead to fatal out-of-memory issues. Ponder solves this problem by translating your pandas code to SQL that your data warehouse can understand.
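When pushing the computation down to the warehouse isn't an option, a common fallback is streaming the query result in chunks instead of loading it whole; the database file and table here are placeholders, with SQLite standing in for the warehouse:

```python
import sqlite3
import pandas as pd

# Stream a large table in fixed-size chunks rather than one giant DataFrame.
conn = sqlite3.connect("warehouse.db")

total = 0.0
for chunk in pd.read_sql_query(
    "SELECT amount FROM orders", conn, chunksize=100_000
):
    total += chunk["amount"].sum()  # aggregate incrementally per chunk

print(total)
conn.close()
```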
Create an AI-driven data and process improvement loop to continuously enhance your business operations. An increasing number of GenAI tools use large language models that automate key data engineering, governance, and master data management tasks.
Large manufacturers are starting to use computer vision artificial intelligence (AI) to detect defects more cheaply and efficiently than human inspection. Detecting product defects can be time-consuming, costly (both in paying people to catch them and in the cost of missing them), and tedious.
Databases: These are structured data collections managed by Database Management Systems (DBMS). Organisations often extract data from relational databases like MySQL, Oracle, or SQL Server to facilitate analysis. It consolidates data into a single source, facilitating accurate reporting and analytics.
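A toy extract-and-consolidate step, with SQLite standing in for the source and target databases (table and file names are hypothetical):

```python
import sqlite3
import pandas as pd

# Extract from a source system (stand-in for MySQL, Oracle, or SQL Server).
src = sqlite3.connect("crm.db")
customers = pd.read_sql_query("SELECT id, name, region FROM customers", src)

# Load into a single consolidated store used for reporting and analytics.
dwh = sqlite3.connect("reporting.db")
customers.to_sql("dim_customer", dwh, if_exists="replace", index=False)
```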
Generative AI can be used to automate the data modeling process by generating entity-relationship diagrams or other types of data models, and to assist in the UI design process by generating wireframes or high-fidelity mockups. GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream’s public Bitcoin API.
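The JSON-to-SQL-schema idea can be sketched deterministically without a model; this toy inference maps Python types to SQL types and is far simpler than what a production tool (or GPT-4) would do:

```python
import json

# Deliberately minimal type mapping; real schema inference needs more care.
TYPE_MAP = {str: "TEXT", int: "INTEGER", float: "REAL", bool: "BOOLEAN"}

def json_to_ddl(record: dict, table: str) -> str:
    """Build a CREATE TABLE statement from one flat JSON record."""
    cols = ", ".join(
        f"{key} {TYPE_MAP.get(type(value), 'TEXT')}"
        for key, value in record.items()
    )
    return f"CREATE TABLE {table} ({cols});"

sample = json.loads('{"id": 1, "email": "a@b.co", "active": true, "score": 9.3}')
print(json_to_ddl(sample, "users"))
# CREATE TABLE users (id INTEGER, email TEXT, active BOOLEAN, score REAL);
```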
This individual is responsible for building and maintaining the infrastructure that stores and processes data; the kinds of data can be diverse, most commonly spanning structured and unstructured data. They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable.
An optional CloudFormation stack to deploy a data pipeline to enable a conversation analytics dashboard. Choose an option for allowing unredacted logs for the Lambda function in the data pipeline. This lets you control which IAM principals are allowed to decrypt the data and view it. For testing, choose yes.
Real-time analytics and BI: Combine data from existing sources with new data to unlock new, faster insights without the cost and complexity of duplicating and moving data across different environments. Presto engine: Incorporates the latest performance enhancements to the Presto query engine.
It’s distributed both in the cloud and on premises, allowing extensive use and movement across clouds, apps, and networks, as well as stores of data at rest. An architecture designed for data democratization aims to be flexible, integrated, agile, and secure to enable the use of data and artificial intelligence (AI) at scale.
Automation: Automation plays a pivotal role in streamlining ETL processes, reducing the need for manual intervention and ensuring consistent data availability. By automating key tasks, organisations can enhance efficiency and accuracy, ultimately improving the quality of their data pipelines.
Open table formats, such as Delta Lake or Apache Iceberg, add a crucial metadata layer over raw data files, allowing us to manage schema, enforce transactions, and even track changes to datasets over time. Lastly, data version control systems, like lakeFS, allow for a Git-like approach to managing data.
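A small sketch of that metadata layer using the deltalake (delta-rs) Python bindings; the path and data are placeholders:

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Two writes produce two commits in the table's transaction log.
write_deltalake("/tmp/events", pd.DataFrame({"id": [1, 2], "status": ["new", "new"]}))
write_deltalake(
    "/tmp/events",
    pd.DataFrame({"id": [3], "status": ["shipped"]}),
    mode="append",
)

# Time travel: read the table as of the first commit.
print(DeltaTable("/tmp/events", version=0).to_pandas())  # original rows only

# The commit history is the change log the metadata layer maintains.
print(DeltaTable("/tmp/events").history())
```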
Furthermore, we’ve developed data encryption and governance solutions for HPCC Systems to help secure data, ensure it is accessed only by appropriate personnel, and create audit trails so that data security SLAs and regulations are met. It truly is an all-in-one data lake solution. Tell me more about ECL.
Though scripting languages such as R and Python are at the top of the list of required skills for a data analyst, Excel is still one of the most important tools. Because analysts are the most likely to communicate data insights, they’ll also need to know SQL and visualization tools such as Power BI and Tableau.
However, a master’s degree or specialised Data Science or Machine Learning courses can give you a competitive edge, offering advanced knowledge and practical experience. Essential Technical Skills Technical proficiency is at the heart of an Azure Data Scientist’s role.
A legacy data stack usually refers to the traditional relational database management system (RDBMS), which uses structured query language (SQL) to store and process data. While an RDBMS can still be used in a modern data stack, it is less common because it is not as well suited to managing big data.
Yet despite these rich capabilities, challenges still arise. The Fragmentation Challenge: With so many modular open-source libraries and frameworks now available, effectively stitching together coherent data science application workflows poses a frequent headache for practitioners.
Computer Science and Computer Engineering: Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Employers aren’t just looking for people who can program.
Support for Numerous Data Sources: Fivetran supports over 200 data sources, including popular databases, applications, and cloud platforms like Salesforce, Google Analytics, SQL Server, Snowflake, and many more. Additionally, unsupported data sources can be integrated using Fivetran’s cloud function connectors.
I have worked with customers where R and SQL were the first-class languages of their data science community. You don’t need a bigger boat: the repository curated by Jacopo Tagliabue shows how several (mostly open-source) tools can be effectively combined to run data pipelines at scale with very small teams.
The agent can generate SQL queries from natural language questions using the database schema DDL (data definition language for SQL) and execute them against a database instance for the database tier. Make sure to add a semicolon after the end of the generated SQL statement. Create, invoke, test, and deploy the agent.
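A rough sketch of that text-to-SQL step; call_llm is a hypothetical stand-in for the agent's model invocation, and SQLite stands in for the database tier:

```python
import sqlite3

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the agent's model call."""
    raise NotImplementedError

def answer_with_sql(question: str, schema_ddl: str, conn: sqlite3.Connection):
    # Ground the model in the schema DDL and request one SQL statement.
    prompt = (
        f"Database schema:\n{schema_ddl}\n\n"
        f"Write one SQL query that answers: {question}\n"
        "End the statement with a semicolon."
    )
    sql = call_llm(prompt).strip()
    if not sql.endswith(";"):
        sql += ";"  # enforce the trailing semicolon the agent expects
    return conn.execute(sql).fetchall()
```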
As shown in the preceding figure, we define our solution to include planning and reasoning with multiple tools, including: Biomarker query engine: converts natural language questions to SQL statements and executes them on an Amazon Redshift database of biomarkers. We reuse the data pipelines described in this blog post.