Debugging data to build better and more fair ML applications

Snorkel AI

You can approximate your machine learning training components with simpler classifiers, for example a k-nearest neighbors classifier. Here’s one application where a 100% clean dataset still has some fairness issues, meaning that even if you clean up the whole dataset, the model could be unfair.

ML 52

Trending Sources

Basic Data Science Terms Every Data Analyst Should Know

Pickl AI

Data cleaning identifies and addresses these issues to ensure data quality and integrity. Data Analysis: This step involves applying statistical and Machine Learning techniques to analyse the cleaned data and uncover patterns, trends, and relationships.

Identifying defense coverage schemes in NFL’s Next Gen Stats

AWS Machine Learning Blog

We design a K-Nearest Neighbors (KNN) classifier to automatically identify these plays and send them for expert review. He has collaborated with the Amazon Machine Learning Solutions Lab in providing clean data for them to work with as well as providing domain knowledge about the data itself.

ML 71

[Updated] 100+ Top Data Science Interview Questions

Mlearning.ai

The following figure represents the life cycle of data science. It starts with gathering the business requirements and relevant data. Once the data is acquired, it is maintained by performing data cleaning, data warehousing, data staging, and data architecture. Why is data cleaning crucial?