Title: | Measuring data quality across open government datasets |
---|
Authors: | Gupta, Rajan (Author); Yadav, Sushmita (Author); Prasad, Avinash (Author); Pal, Saibal K. (Author) |
Files: |
This document has no files that are freely available to the public. A physical copy may be held in the organization's library; check its status via COBISS. |
---|
Language: | English |
---|
Work type: | Unknown |
---|
Typology: | 1.08 - Published Scientific Conference Contribution |
---|
Organization: | UNG - University of Nova Gorica |
---|
Abstract: | Data quality has become the basis of any analytical operation or modelling. Poor-quality data can lead to poor analytical models, which in turn can lead to poor decisions and predictions and can ultimately affect the revenue and functioning of an organization. This is true for both public and private sector organizations. With the rise of e-governance, many nations and their public sector units are making use of publicly available datasets. But are these datasets reliable and of good quality? This is the major research question studied in this paper. The study collected publicly available datasets from Open Government Data platforms across 8 nations. More than 300 datasets comprising roughly 3.5 million rows were assessed on various data quality measures. The parameters studied included valid data types, correctness, completeness, statistical features, variability, comparability, duplication and the like. A script was written in R to compute the value of each measure. Different countries were found to have advantages on different parameters; no single country scored high on all of them. The ranges observed for each parameter helped determine the overall quality of a dataset. These findings can help public and private sector organizations assess the quality of the datasets they intend to work with. Substantial effort and resources can be saved on advanced analytics if the quality of a dataset can be determined in advance. The proposed data quality assessment model can be applied to any private or public dataset. Different industries and organizations can set different threshold values for the parameters to benchmark their analytical processes. Both practitioners and researchers can benefit from this research work. |
---|
Keywords: | data quality assessment, open government datasets, e-governance, data quality measures |
---|
Year of publishing: | 2019 |
---|
Number of pages: | pp. 442-451 |
---|
PID: | 20.500.12556/RUNG-6420 |
---|
COBISS.SI-ID: | 58258435 |
---|
UDC: | 004 |
---|
NUK URN: | URN:SI:UNG:REP:W9RJVDDB |
---|
Publication date in RUNG: | 05.04.2021 |
---|