Data quality issues continue to plague financial services organizations, resulting in costly fines, operational inefficiencies, and damage to reputations. Key Examples of Data Quality Failures — […]
Key Takeaways: Data quality is the top challenge impacting data integrity – cited as such by 64% of organizations. Data trust is impacted by data quality issues, with 67% of organizations saying they don't completely trust their data used for decision-making.
The amount of data we deal with has increased rapidly (close to 50TB, even for a small company), whereas 75% of leaders don't trust their data for business decision-making. Though these are two different stats, the common denominator may well be data quality. With new data flowing from almost every direction, there must be a yardstick or […]
Just like a skyscraper’s stability depends on a solid foundation, the accuracy and reliability of your insights rely on top-notch data quality. Enter Generative AI – a game-changing technology revolutionizing data management and utilization. Businesses must ensure their data is clean, structured, and reliable.
Faced with clinician shortages, an aging population, and stagnant health outcomes, the healthcare industry has the potential to greatly benefit from disruptive technologies.
This is my monthly check-in to share with you the people and ideas I encounter as a data evangelist with DATAVERSITY. This month we’re talking about Data Quality (DQ). (Read last month’s column here.)
So why are many technology leaders attempting to adopt GenAI technologies before ensuring their data quality can be trusted? Reliable and consistent data is the bedrock of a successful AI strategy.
This is the first in a two-part series exploring Data Quality and the ISO 25000 standard. Ripper orders a nuclear strike on the USSR. Despite efforts to recall the bombers, one plane successfully drops a […] The post Mind the Gap: Did You Know About the ISO 25000 Series Data Quality Standards? appeared first on DATAVERSITY.
In fact, it’s been more than three decades of innovation in this market, resulting in the development of thousands of data tools and a global data preparation tools market size that’s set […] The post Why Is Data Quality Still So Hard to Achieve? appeared first on DATAVERSITY.
The key to being truly data-driven is having access to accurate, complete, and reliable data. In fact, Gartner recently found that organizations believe […] The post How to Assess Data Quality Readiness for Modern Data Pipelines appeared first on DATAVERSITY.
In this blog post, we’ll share technical details on how we built this state-of-the-art money movement tracking system, and describe how teams at Stripe interact with the data quality metrics that underlie our payment processing network.
Adding linguistic techniques in SAS NLP with LLMs not only helps address quality issues in text data but, because these techniques can incorporate subject matter expertise, also gives organizations a tremendous amount of control over their corpora.
In this series of blog posts, I aim to share some key takeaways from the DGIQ + AIGov Conference 2024 held by DATAVERSITY. These takeaways include my overall professional impressions and a high-level review of the most prominent topics discussed in the conference’s core subject areas: data governance, data quality, and AI governance.
It is estimated that by 2025, 50% of digital work will be automated through these LLM models. At their core, LLMs are trained on large amounts of content and data, and the architecture […] The post RAG (Retrieval Augmented Generation) Architecture for Data Quality Assessment appeared first on DATAVERSITY.
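To make the retrieval step concrete, here is a minimal sketch of the retrieval half of a RAG pipeline, using TF-IDF cosine similarity as a stand-in for a learned embedding model; the documents, query, and prompt format are illustrative assumptions, not the architecture described in the post.

```python
# Minimal retrieval-augmented prompt assembly (sketch, not a full pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Completeness: the share of records with no missing values in required fields.",
    "Timeliness: how current the data is relative to its source system.",
    "Uniqueness: no entity is recorded more than once.",
]
query = "How do we measure missing values?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Retrieve the most similar document and prepend it to the prompt, so the
# model answers grounded in retrieved context rather than parametric memory.
scores = cosine_similarity(query_vector, doc_vectors)[0]
context = documents[scores.argmax()]
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # in a full pipeline this prompt would be sent to an LLM
```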
It advocates decentralizing data ownership to domain-oriented teams. Each team becomes responsible for its Data Products, and a self-serve data infrastructure is established. This enables scalability, agility, and improved data quality while promoting data democratization.
Business insights are only as good as the accuracy of the data on which they are built. According to Gartner, data quality matters in part because poor data quality costs organizations an average of at least $12.9 million a year.
By creating microsegments, businesses can be alerted to surprises, such as sudden deviations or emerging trends, empowering them to respond proactively and make data-driven decisions. This step allows users to analyze data quality, create metadata enrichment (MDE), or define data quality rules for the subset.
Data fabric is a cohesive architecture designed to simplify data access and management across diverse environments. In this blog, we explore how the introduction of the SQL Asset Type enhances the metadata enrichment process within the IBM Knowledge Catalog, improving data governance and consumption.
— Peter Norvig, The Unreasonable Effectiveness of Data. In ML engineering, data quality isn’t just critical: it’s foundational. Since 2011, Peter Norvig’s words have underscored the power of a data-centric approach in machine learning. Using biased or low-quality data?
In this blog, we focus on machine learning practices: the essential steps that unlock the potential of this transformative technology. By following best practices in algorithm selection, data preprocessing, model evaluation, and deployment, we unlock the true potential of machine learning and pave the way for innovation and success.
This shift not only saves time but also ensures a higher standard of data quality. Tools like BiG EVAL are leading the data quality field for all technical systems in which data is transported and transformed. Foster a Data-Driven Culture: Promote a culture where data quality is a shared responsibility.
Top reported benefits of data governance programs include improved quality of data analytics and insights (58%), improved data quality (58%), and increased collaboration (57%). Data governance is a top data integrity challenge, cited by 54% of organizations, second only to data quality (56%).
Data Sips is a new video miniseries presented by Ippon Technologies and DATAVERSITY that showcases quick conversations with industry experts from last month’s Data Governance & Information Quality (DGIQ) Conference in Washington, D.C.
This blog post explores effective strategies for gathering requirements in your data project. Whether you are a data analyst, project manager, or data engineer, these approaches will help you clarify needs, engage stakeholders, and apply requirements-gathering techniques that create a roadmap for success.
A Comprehensive Data Science Guide to Preprocessing for Success: From Missing Data to Imbalanced Datasets. “In just about any organization, the state of information quality is at the same low level” – Olson, Data Quality. Data is everywhere!
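As a rough illustration of two steps such a guide covers, here is a minimal sketch of median imputation for missing values and random oversampling for class imbalance, assuming scikit-learn; the tiny arrays are made-up examples.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.utils import resample

# Missing data: replace NaNs with the column median.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [2.0, 5.0], [3.0, 1.0]])
y = np.array([0, 0, 0, 0, 1])
X = SimpleImputer(strategy="median").fit_transform(X)

# Imbalanced classes: randomly oversample the minority class (label 1)
# until it matches the majority class size.
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
print(X_bal.shape, np.bincount(y_bal))  # balanced: (8, 2) [4 4]
```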
Drone surveyors must also know how to gather and use data properly. They will need to be aware of the potential that data can bring to entities using drones. Indiana Lee discussed these benefits in an article for Drone Blog. You will also want to know how to harvest the data that you get.
The Grubbs test works by comparing the maximum deviation of the data points from the mean relative to the standard deviation. In this blog post, we will delve into the mechanics of the Grubbs test, its application in anomaly detection, and provide a practical guide on how to implement it using real-world data.
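A minimal sketch of the two-sided Grubbs test follows, assuming SciPy for the t-distribution critical value; the sample data and the 0.05 significance level are illustrative.

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Return (is_outlier, suspect_value) for the most extreme point."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean, sd = x.mean(), x.std(ddof=1)
    # Grubbs statistic: largest absolute deviation from the mean,
    # measured in sample standard deviations.
    deviations = np.abs(x - mean)
    g = deviations.max() / sd
    # Critical value from the t-distribution at significance alpha / (2n).
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    return g > g_crit, x[deviations.argmax()]

print(grubbs_test([9.8, 10.1, 10.0, 9.9, 10.2, 14.7]))  # flags 14.7
```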
In today's business landscape, relying on accurate data is more important than ever. The phrase "garbage in, garbage out" perfectly captures the importance of data quality in achieving successful data-driven solutions.
Execution status – You can monitor the progress of training jobs, including completed tasks and failed runs. This data helps confirm that models are training smoothly and reliably. If failures increase, it may signal issues with data quality, model configurations, or resource limitations that need to be addressed.
The service, which was launched in March 2021, predates several popular AWS offerings that have anomaly detection, such as Amazon OpenSearch, Amazon CloudWatch, AWS Glue Data Quality, Amazon Redshift ML, and Amazon QuickSight. You can review the recommendations and augment rules from over 25 included data quality rules.
In an era where large language models (LLMs) are redefining AI digital interactions, accurate, high-quality, and pertinent data labeling is paramount. That means data labelers and the vendors overseeing them must seamlessly blend data quality with human expertise and ethical work practices.
This enables sales teams to interact with our internal sales enablement collateral, including sales plays and first-call decks, as well as customer references, customer- and field-facing incentive programs, and content on the AWS website, including blog posts and service documentation.
Maintaining the security and governance of data within a data warehouse is of utmost importance. Data ownership extends beyond mere possession—it involves accountability for data quality, accuracy, and appropriate use. This includes defining data formats, naming conventions, and validation rules.
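As a hedged illustration of such rules, here is a minimal sketch that checks a snake_case naming convention, an ISO date format, and key uniqueness with pandas; the convention and the sample table are assumptions, not any particular warehouse's standard.

```python
import re
import pandas as pd

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

df = pd.DataFrame({"customer_id": [1, 2, 2], "SignupDate": ["2024-01-03", "bad", None]})

# Naming convention: every column must be snake_case.
bad_names = [c for c in df.columns if not SNAKE_CASE.match(c)]

# Data format: signup dates must parse as ISO dates; the key must be unique.
dates = pd.to_datetime(df["SignupDate"], format="%Y-%m-%d", errors="coerce")
bad_dates = int(dates.isna().sum())
dup_keys = int(df["customer_id"].duplicated().sum())

print(f"non-conforming columns: {bad_names}")
print(f"unparseable/missing dates: {bad_dates}, duplicate keys: {dup_keys}")
```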
Introduction: As the demand for data professionals continues to rise, understanding data warehousing concepts becomes increasingly essential for candidates preparing for interviews in 2025. Key Takeaways: Understand the fundamental concepts of data warehousing for interviews. How Do You Ensure Data Quality in a Data Warehouse?
In my previous blog post, I defined data mapping and its importance. Here, I explore how it works, the most popular techniques, and the common challenges that crop up and that teams must overcome to ensure the integrity and accuracy of the mapped data.
In this blog, we will explore the 4 main methods to test ML models in the production phase. TensorFlow: There are three main TensorFlow frameworks for testing. TensorFlow Extended (TFX): This is designed for production pipeline testing, offering tools for data validation, model analysis, and deployment.
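As one hedged example of TFX-style data validation, here is a minimal sketch using the tensorflow-data-validation package: infer a schema from training data, then validate fresh data against it. The toy DataFrames are illustrative assumptions.

```python
import pandas as pd
import tensorflow_data_validation as tfdv

train_df = pd.DataFrame({"age": [34, 29, 41], "country": ["US", "DE", "US"]})
serving_df = pd.DataFrame({"age": [37, -5], "country": ["US", "XX"]})

# Compute summary statistics and infer a schema from the training data.
train_stats = tfdv.generate_statistics_from_dataframe(train_df)
schema = tfdv.infer_schema(train_stats)

# Validate new data against the schema; reported anomalies cover issues
# such as unexpected categorical values, missing features, and type drift.
serving_stats = tfdv.generate_statistics_from_dataframe(serving_df)
anomalies = tfdv.validate_statistics(serving_stats, schema)
print(anomalies)
```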
Monitoring – Continuous surveillance performs checks for drift in data quality, model quality, and feature attribution. Workflow A corresponds to preprocessing, data quality and feature attribution drift checks, inference, and postprocessing. Workflow B corresponds to model quality drift checks.
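A minimal sketch of one such data quality drift check follows, assuming a two-sample Kolmogorov-Smirnov test as the drift statistic; the baseline and live samples and the 0.05 threshold are illustrative, not the workflow's actual implementation.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=1000)  # training-time feature
live = rng.normal(loc=0.4, scale=1.0, size=1000)      # shifted production feature

# The KS test compares the two empirical distributions; a small p-value
# indicates the live feature no longer matches its training baseline.
stat, p_value = ks_2samp(baseline, live)
if p_value < 0.05:
    print(f"Drift detected (KS={stat:.3f}, p={p_value:.4f}); trigger a retraining review.")
else:
    print("No significant drift in this feature.")
```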
“How will we catch up when technology seems to change overnight, nearly every night?” It’s a surprisingly common […] The post The one constant in our AI future? Data appeared first on SAS Blogs.
Data quality control: Robust dataset labeling and annotation tools incorporate quality control mechanisms such as inter-annotator agreement analysis, review workflows, and data validation checks to ensure the accuracy and reliability of annotations. Data monitoring tools help track annotation quality over time.
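As a concrete example of inter-annotator agreement analysis, here is a minimal sketch using Cohen's kappa from scikit-learn; the labels and the 0.6 review threshold are illustrative assumptions.

```python
from sklearn.metrics import cohen_kappa_score

annotator_a = ["spam", "ham", "spam", "spam", "ham", "ham", "spam"]
annotator_b = ["spam", "ham", "ham",  "spam", "ham", "spam", "spam"]

# Kappa corrects raw percent agreement for agreement expected by chance:
# 1.0 is perfect agreement, 0.0 is chance-level.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")

if kappa < 0.6:  # a common rule-of-thumb floor for "substantial" agreement
    print("Low agreement: route these items to a review workflow.")
```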
Data engineering services can analyze large amounts of data and identify trends that would otherwise be missed. If you’re looking for ways to increase your profits and improve customer satisfaction, then you should consider investing in a data management solution. Big data management increases the reliability of your data.
This is the first blog in a series on RAG and fine-tuning, focusing on providing a better understanding of the two approaches. This blog will walk you through RAG and fine-tuning, unraveling how they work, why they matter, and how they’re applied to solve real-world problems.
Introduction: The Reality of Machine Learning Consider a healthcare organisation that implemented a Machine Learning model to predict patient outcomes based on historical data. However, once deployed in a real-world setting, its performance plummeted due to data quality issues and unforeseen biases.
To quickly explore the loan data, choose Get data insights and select the loan_status target column and Classification problem type. The generated Data Quality and Insight report provides key statistics, visualizations, and feature importance analyses. Now you have a balanced target column.
The ability to effectively deploy AI into production rests upon the strength of an organization’s data strategy because AI is only as strong as the data that underpins it. This flexibility enables organizations to maximize the potential of their data, regardless of infrastructure or use case.