Measuring data quality across open government datasets
Rajan Gupta, Sushmita Yadav, Avinash Prasad, Saibal K. Pal
2019, published scientific conference contribution
Description: Data quality has become the basis for any analytical operation or modelling. Poor data quality can lead to poor analytical models, which in turn can lead to poor decisions and predictions and can ultimately affect the revenue and operations of an organization. This is true for both public and private sector organizations. With the rise of e-governance, many nations and their public sector units are making use of publicly available datasets. But are these datasets reliable, and are they of good quality? This is the main research question studied in this paper. The study collected publicly available datasets from Open Government Data platforms in 8 nations around the world. More than 300 datasets, totalling roughly 3.5 million rows, were assessed on various data quality measures. The parameters studied included valid data types, correctness, completeness, statistical features, variability, comparability, duplication and the like. A script written in R was used to compute the value of each measure. Different countries were found to perform well on different parameters, and no single country scored high on all of them. Different ranges of values were found across the datasets for the various parameters, which helped in determining the overall quality of each dataset. This will help public and private sector organizations assess the quality of the datasets they intend to work on. Substantial effort and resources can be saved on advanced analytics if the quality of a dataset can be determined in advance. The proposed data quality assessment model can be applied to any private or public dataset. Different industries and organizations can set different threshold values for the parameters to benchmark their analytical processes. Both practitioners and researchers can benefit from this research work.
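The abstract names the quality measures but does not give their formulas, and the authors' R script is not reproduced here. Below is a minimal Python sketch of how a few of those measures, plus threshold benchmarking, might be computed. It assumes common textbook definitions: completeness as the share of non-empty cells, duplication as the share of exact repeat rows, type validity as the share of values that parse as an expected type, and variability as the coefficient of variation. All function names and the threshold values are hypothetical.

```python
# Hypothetical sketch, NOT the authors' R script: a few data quality measures
# named in the abstract, using common definitions chosen for illustration.
# A table is represented as a list of rows (dicts keyed by column name).
from statistics import mean, pstdev


def completeness(rows, columns):
    """Share of cells that are non-missing (None or "" count as missing)."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(1 for row in rows for col in columns if row.get(col) not in (None, ""))
    return filled / total


def duplication_rate(rows, columns):
    """Share of rows that exactly repeat an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows) if rows else 0.0


def type_validity(rows, column, expected_type):
    """Share of non-missing values in a column that parse as expected_type."""
    values = [row.get(column) for row in rows if row.get(column) not in (None, "")]
    if not values:
        return 0.0
    valid = 0
    for value in values:
        try:
            expected_type(value)
            valid += 1
        except (TypeError, ValueError):
            pass
    return valid / len(values)


def coefficient_of_variation(rows, column):
    """Variability of a numeric column: population std dev / mean (None if undefined)."""
    values = []
    for row in rows:
        try:
            values.append(float(row.get(column)))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric cells
    if len(values) < 2 or mean(values) == 0:
        return None
    return pstdev(values) / mean(values)


# Hypothetical thresholds: the abstract says each organization sets its own.
THRESHOLDS = {"completeness": 0.95, "duplication_rate": 0.01}


def passes(report, thresholds=THRESHOLDS):
    """Compare measured values against per-parameter benchmark thresholds."""
    return {
        "completeness": report["completeness"] >= thresholds["completeness"],
        "duplication_rate": report["duplication_rate"] <= thresholds["duplication_rate"],
    }


if __name__ == "__main__":
    rows = [
        {"year": "2017", "value": "10"},
        {"year": "2018", "value": ""},
        {"year": "2018", "value": ""},
        {"year": "abc", "value": "30"},
    ]
    cols = ["year", "value"]
    report = {
        "completeness": completeness(rows, cols),          # 6 of 8 cells -> 0.75
        "duplication_rate": duplication_rate(rows, cols),  # 1 of 4 rows -> 0.25
    }
    print(report, type_validity(rows, "year", int), coefficient_of_variation(rows, "value"))
    print(passes(report))
```

Keeping each measure as a separate function mirrors the abstract's point that datasets score differently on different parameters. A per-parameter pass/fail map, rather than one combined score, lets each organization apply its own thresholds.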
Keywords: data quality assessment, open government datasets, e-governance, data quality measures
Published in RUNG: 05.04.2021