DQM: Data Quality Metrics for AI components in the industry

Sabrina Chaouche; Yoann Randon; Faouzi Adjed; Nadira Boudjani; Mohamed Ibn Khedher

Conference paper. Year: 2024


Sabrina Chaouche

Role: Author
PersonId : 1349359

IRT SystemX

Yoann Randon

Role: Author
PersonId : 1420969

IRT SystemX

Faouzi Adjed

Role: Author
PersonId : 1170570
IdHAL : faouzi-adjed
ORCID : 0000-0002-0100-9352

IRT SystemX

Nadira Boudjani

Role: Author
PersonId : 1420970

IRT SystemX

Valeo Brain Division

Mohamed Ibn Khedher

Role: Author
PersonId : 1224892

IRT SystemX

Abstract

In industrial settings, measuring the quality of the data used to represent an intended domain of use and its operating conditions is both crucial and challenging. This paper presents a set of metrics addressing this data quality issue, packaged as a library named DQM (Data Quality Metrics) for Machine Learning (ML) use. The library also includes additional metrics developed specifically for industrial applications, and it is designed to assess various types of data and datasets. These metrics are used to characterize the training and evaluation datasets involved in building ML models for industrial use cases. Two categories of metrics are implemented in DQM: inherent data metrics, which evaluate the quality of a given dataset independently of the ML model (for example, statistical properties and attributes), and model-dependent metrics, which measure the quality of a dataset by considering the ML model's outputs (for example, the gap between two datasets with regard to a given ML model). DQM is used within the Confiance.ai program to evaluate datasets used for industrial purposes such as autonomous driving.
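The distinction between the two metric categories can be illustrated with a minimal sketch. It does not use or reproduce the DQM library's API: the function names `class_balance_entropy` and `accuracy_gap`, the toy model, and the toy data are all hypothetical, and the two metrics were chosen only as simple stand-ins for an inherent metric and a model-dependent one.

```python
from collections import Counter
from math import log
from typing import Callable, Sequence, Tuple

Dataset = Tuple[Sequence[float], Sequence[str]]  # (samples, labels)


def class_balance_entropy(labels: Sequence[str]) -> float:
    """Inherent metric (hypothetical): normalized Shannon entropy of the
    label distribution. It needs no model; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # a single class carries no balance information
    n = len(labels)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(len(counts))


def accuracy(model: Callable[[float], str], dataset: Dataset) -> float:
    """Fraction of samples for which the model predicts the given label."""
    samples, labels = dataset
    correct = sum(model(x) == y for x, y in zip(samples, labels))
    return correct / len(labels)


def accuracy_gap(model: Callable[[float], str], a: Dataset, b: Dataset) -> float:
    """Model-dependent metric (hypothetical): absolute difference in
    accuracy between two datasets, as seen through one model's outputs."""
    return abs(accuracy(model, a) - accuracy(model, b))


def threshold_model(x: float) -> str:
    """Toy stand-in for a trained ML model."""
    return "pos" if x >= 0.5 else "neg"


# Toy datasets (invented for illustration only).
train: Dataset = ([0.1, 0.2, 0.8, 0.9], ["neg", "neg", "pos", "pos"])
field: Dataset = ([0.4, 0.6, 0.45, 0.55], ["pos", "neg", "neg", "pos"])

balance = class_balance_entropy(train[1])        # computed from labels alone
gap = accuracy_gap(threshold_model, train, field)  # requires model outputs
print(f"balance={balance:.2f}, gap={gap:.2f}")
```

The first function can be computed before any model exists, whereas the second changes whenever the model changes, which is the practical difference between the two categories described in the abstract.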

Domains

Machine Learning [stat.ML]; Artificial Intelligence [cs.AI]; Mathematics [math]; Statistics [math.ST]

Main file

AAAI_DQM___Data_centric_author_version.pdf (1.91 MB)

Origin: Files produced by the author(s)

Contributor: Faouzi Adjed

https://hal.science/hal-04719346

Submitted on: Thursday, October 3, 2024, 10:18:29

Last modified on: Friday, October 11, 2024, 00:13:55

Dates and versions

hal-04719346 , version 1 (03-10-2024)

Identifiers

HAL Id : hal-04719346 , version 1

Cite

Sabrina Chaouche, Yoann Randon, Faouzi Adjed, Nadira Boudjani, Mohamed Ibn Khedher. DQM: Data Quality Metrics for AI components in the industry. AI Trustworthiness and Risk Assessment for Challenged Contexts workshop (ATRACC). AAAI Fall symposium, Nov 2024, Arlington, United States. ⟨hal-04719346⟩


Collections

IRT-SYSTEMX CONFIANCEAI

125 views

25 downloads
