This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. It dives deep into how to set up data governance at scale using Amazon DataZone for a data mesh, a modern approach to data management that decentralizes data ownership and treats data as a product.
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries.
After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but much of its value remains dormant. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data. These services write their output to a data lake.
It’s been a decade since the “Big Data Era” began (and to much acclaim). Analysts asked: what if we could manage massive volumes and varieties of data? Yet the question remains: how much value have organizations derived from big data? Big Data as an Enabler of Digital Transformation.
With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work.
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Real-Time ML with Spark and SBERT, AI Coding Assistants, Data Lake Vendors, and ODSC East Highlights. Getting Up to Speed on Real-Time Machine Learning with Spark and SBERT: learn more about real-time machine learning with an approach that combines Apache Spark and SBERT. These libraries will give you a solid start.
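As a minimal sketch of the Spark-plus-SBERT approach the excerpt refers to, the following assumes PySpark and the sentence-transformers package; the model name and column names are illustrative choices, not taken from the article.

```python
# Minimal sketch: embedding text with SBERT inside a PySpark pipeline via a pandas UDF.
# Assumes pyspark and sentence-transformers are installed; column names are illustrative.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType
from sentence_transformers import SentenceTransformer

spark = SparkSession.builder.appName("sbert-embeddings").getOrCreate()

@pandas_udf(ArrayType(FloatType()))
def embed(texts: pd.Series) -> pd.Series:
    # Loaded per batch here for simplicity; cache the model per executor in practice.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(texts.tolist())
    return pd.Series([v.tolist() for v in vectors])

df = spark.createDataFrame(
    [("doc-1", "real-time ML with Spark"), ("doc-2", "sentence embeddings with SBERT")],
    ["id", "text"],
)
df.withColumn("embedding", embed("text")).show(truncate=False)
```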
Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide an outstanding customer experience. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data. Run the AWS Glue ML transform job.
Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.
To make your data management processes easier, here’s a primer on data lakes, and our picks for a few data lake vendors worth considering. What is a data lake? First, a data lake is a centralized repository that allows users or an organization to store and analyze large volumes of data.
Amazon SageMaker enables enterprises to build, train, and deploy machine learning (ML) models. Amazon SageMaker JumpStart provides pre-trained models and data to help you get started with ML. MongoDB vector data store: MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB.
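A minimal sketch of querying MongoDB Atlas Vector Search from Python with pymongo; the connection string, database, collection, index name, and field names are placeholders, and a vector index must already exist on the embedding field.

```python
# Minimal sketch of a MongoDB Atlas Vector Search query with pymongo.
# Connection string, database/collection names, index name, and fields are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
collection = client["demo_db"]["documents"]

query_vector = [0.12, -0.34, 0.56]  # normally produced by the same embedding model used at ingest

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",   # name of the Atlas vector search index
            "path": "embedding",       # field that stores the vectors
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"_id": 0, "title": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc)
```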
Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Enterprises can use no-code ML solutions to streamline their operations and optimize their decision-making without extensive administrative overhead.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.
This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. An enterprise might have the following roles involved in the ML lifecycle. This ML platform provides several key benefits.
DVC: Released in 2017, Data Version Control (DVC for short) is an open-source tool created by Iterative. DVC can be used for versioning data and models, and to track experiments and compare any data, code, parameters, models, and graphical plots of performance. DVC can efficiently handle large files and machine learning models.
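A minimal sketch of pulling a DVC-tracked dataset from Python via dvc.api; the repository URL, file path, and revision are placeholders.

```python
# Minimal sketch of reading a DVC-tracked dataset at a specific revision
# via the dvc.api Python interface. Repo URL, file path, and tag are placeholders.
import io
import pandas as pd
import dvc.api

raw = dvc.api.read(
    path="data/train.csv",                      # path tracked by DVC inside the repo
    repo="https://github.com/example/project",  # Git repo containing the .dvc metadata
    rev="v1.0",                                  # Git tag, branch, or commit to pull from
)

df = pd.read_csv(io.StringIO(raw))
print(df.shape)
```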
By running reports on historical data, a data warehouse can clarify what systems and processes are working and what methods need improvement. The data warehouse is also the base architecture for artificial intelligence and machine learning (AI/ML) solutions. Modern data warehousing technology can handle all data forms.
Alignment to other tools in the organization’s tech stack: consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and Pandas or Apache Spark DataFrames.
Just as a writer needs to know core skills like sentence structure, grammar, and so on, data scientists at all levels should know core data science skills like programming, computer science, algorithms, and so on. As MLOps becomes more relevant to ML, demand for strong software architecture skills will increase as well.
A data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes to address the challenges of today’s complex data landscape and scale AI. How does an open data lakehouse architecture support AI? All of this supports the use of AI.
He entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon. He is focused on Big Data, data lakes, streaming and batch analytics services, and generative AI technologies.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Data is frequently kept in data lakes that can be managed by AWS Lake Formation, giving you the ability to implement fine-grained access control using a straightforward grant or revoke procedure.
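A minimal sketch of the grant/revoke model mentioned above, using the boto3 Lake Formation client; the role ARN, database, and table names are placeholders.

```python
# Minimal sketch of a Lake Formation grant with boto3: give a role SELECT on one table.
# The role ARN, database name, and table name are placeholders.
import boto3

lf = boto3.client("lakeformation")

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/DataWranglerExecutionRole"},
    Resource={
        "Table": {
            "DatabaseName": "sales_db",
            "Name": "transactions",
        }
    },
    Permissions=["SELECT"],
)

# Revoking follows the same shape:
# lf.revoke_permissions(Principal=..., Resource=..., Permissions=["SELECT"])
```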
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts.
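A minimal sketch of creating a feature group and ingesting features with the SageMaker Python SDK; the feature group name, S3 location, and IAM role are placeholders.

```python
# Minimal sketch: create a SageMaker Feature Store feature group and ingest a DataFrame.
# Feature group name, S3 bucket, and IAM role are placeholders.
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"

df = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "total_orders": [12, 3],
    "event_time": [time.time(), time.time()],
})
df["customer_id"] = df["customer_id"].astype("string")  # string dtype so the type can be inferred

fg = FeatureGroup(name="customer-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)  # infer feature types from the DataFrame

fg.create(
    s3_uri="s3://example-bucket/feature-store",
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# In practice, wait until fg.describe()["FeatureGroupStatus"] == "Created" before ingesting.
fg.ingest(data_frame=df, max_workers=2, wait=True)
```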
We capitalized on the powerful tools provided by AWS to tackle this challenge and effectively navigate the complex field of machine learning (ML) and predictive analytics. SageMaker is a fully managed ML service. This was a crucial aspect in achieving agility in our operations and a seamless integration of our ML efforts.
Data storage databases: your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps.
Previously, he was a Data & Machine Learning Engineer at AWS, where he worked closely with customers to develop enterprise-scale data infrastructure, including data lakes, analytics dashboards, and ETL pipelines.
By using these capabilities, businesses can efficiently store, manage, and analyze time-series data, enabling data-driven decisions and gaining a competitive edge. If you need an automated workflow or direct ML model integration into apps, Canvas forecasting functions are accessible through APIs.
On the client side, Snowpark consists of libraries, including the DataFrame API and native Snowpark machine learning (ML) APIs for model development (public preview) and deployment (private preview). The release of Snowpark makes our customers’ lives simpler by unifying their data lake into a complete data platform.
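A minimal sketch of the client-side Snowpark DataFrame API; connection parameters and the table and column names are placeholders.

```python
# Minimal sketch of the Snowpark DataFrame API: filter and aggregate a table without
# pulling it out of Snowflake. Connection parameters and table/column names are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

orders = session.table("ORDERS")
summary = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by(col("REGION"))
          .agg(avg(col("AMOUNT")).alias("AVG_AMOUNT"))
)
summary.show()  # the query is pushed down and executed inside Snowflake
```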
Real-time Analytics & Built-in Machine Learning Models with a Single Database: Akmal Chaudhri, Senior Technical Evangelist at SingleStore, explores the importance of delivering real-time experiences in today’s big data industry and how data models and algorithms rely on powerful and versatile data infrastructure.
He entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML.
Amazon Forecast is a fully managed service that uses machine learning (ML) algorithms to deliver highly accurate time series forecasts. Initially, daily forecasts for each country are formulated through ML models. These daily predictions are subsequently broken down into hourly segments, as depicted in the following graph.
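The following is an illustrative sketch, not the method from the post, of how a daily forecast can be split into hourly segments using a fixed intraday profile; the values and weights are made up.

```python
# Illustrative sketch (not the post's actual method): splitting a daily forecast into
# hourly values using a fixed intraday profile. All numbers are placeholders.
import numpy as np
import pandas as pd

daily = pd.Series(
    {"2024-06-01": 2400.0, "2024-06-02": 2640.0},  # daily forecast per country
    name="daily_forecast",
)
daily.index = pd.to_datetime(daily.index)

# Hypothetical intraday profile: 24 weights that sum to 1 (uniform here).
profile = np.full(24, 1 / 24)

rows = []
for day, total in daily.items():
    for hour, weight in enumerate(profile):
        rows.append({"timestamp": day + pd.Timedelta(hours=hour),
                     "hourly_forecast": total * weight})

hourly = pd.DataFrame(rows)
print(hourly.head())
```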
Third, despite the broader adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with the different table names and other metadata required to create the SQL for the desired sources. About the Authors: Sanjeeb Panda is a Data and ML Engineer at Amazon.
Amazon Monitron is an end-to-end condition monitoring solution that enables you to start monitoring equipment health with the aid of machine learning (ML) in minutes, so you can implement predictive maintenance and reduce unplanned downtime. For the detailed Amazon Monitron installation guide, refer to Getting started with Amazon Monitron.
Time Series Forecasting for Managers — All Forecasts Are Wrong but Some Are Useful. Tanvir Ahmed Shaikh | Data Strategist (Director) | Genentech, Inc. Time series forecasting remains an under-appreciated technique in data science education, often overshadowed by more popular machine learning methods.
Machine learning (ML)—the artificial intelligence (AI) subfield in which machines learn from datasets and past experiences by recognizing patterns and generating predictions—is a $21 billion global industry projected to become a $209 billion industry by 2029. At Facebook Messenger, ML powers customer service chatbots.
We use data-specific preprocessing and ML algorithms suited to each modality to filter out noise and inconsistencies in unstructured data. NLP cleans and refines content for text data, while audio data benefits from signal processing to remove background noise. Tools like Unstructured.io
For organizations, building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting requirements around model performance, security, and compliance.
You’ll use MLRun, Langchain, and Milvus for this exercise and cover topics like the integration of AI/ML applications, leveraging Python SDKs, as well as building, testing, and tuning your work. In this session, we’ll demonstrate how you can fine-tune a Gen AI model, build a Gen AI application, and deploy it in 20 minutes.
Managing unstructured data is essential for the success of machine learning (ML) projects. Without structure, data is difficult to analyze and extracting meaningful insights and patterns is challenging. This article will discuss managing unstructured data for AI and ML projects. What is Unstructured Data?
Getir used Amazon Forecast , a fully managed service that uses machine learning (ML) algorithms to deliver highly accurate time series forecasts, to increase revenue by four percent and reduce waste cost by 50 percent. In this post, we describe how we used Forecast to achieve these benefits.
Data Wrangling: Data Quality, ETL, Databases, Big Data. The modern data analyst is expected to be able to source and retrieve their own data for analysis. Competence in data quality, databases, and ETL (Extract, Transform, Load) is essential.
Generative AI empowers organizations to combine their data with the power of machine learning (ML) algorithms to generate human-like content, streamline processes, and unlock innovation. After data is extracted, the job performs document chunking, data cleanup, and postprocessing.
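A minimal sketch of the document-chunking step described above; the chunk size and overlap are arbitrary choices.

```python
# Minimal sketch of the document-chunking step: split extracted text into overlapping
# word-based chunks before cleanup and indexing. Chunk size and overlap are arbitrary.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

document = "..."  # text extracted from a source document
for i, chunk in enumerate(chunk_text(document)):
    print(i, len(chunk.split()))
```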
And, as organizations progress and grow, “data drift” starts to impact data usage, models, and your business. In today’s AI/ML-driven world of data analytics, explainability needs a repository just as much as those doing the explaining need access to metadata, e.g., information about the data being used.
To fully realize data’s value, organizations in the travel industry need to dismantle data silos so that they can securely and efficiently leverage analytics across their organizations. What is big data in the travel and tourism industry? Using Alation, ARC automated the data curation and cataloging process.
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra supports a variety of document formats, such as Microsoft Word, PDF, and text from various data sources. Image captioning with GenAI: image description with GenAI involves using ML algorithms to generate textual descriptions of images.
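As a generic illustration of image captioning with an ML model, the following uses the Hugging Face image-to-text pipeline; this is an assumption for demonstration and not necessarily the approach used in the post.

```python
# Generic illustration of image captioning with an ML model via the Hugging Face
# "image-to-text" pipeline; not necessarily the approach used in the post.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

result = captioner("product-photo.jpg")  # local path or URL to an image (placeholder)
print(result[0]["generated_text"])
```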