There are several dimensions along which data quality can be measured. Which dimensions are relevant for your organisation depends on the context in which the data is used and what your organisation wants to use it for. Organisations often choose the most relevant dimensions and measure data quality against those.
Below, some of the most commonly used data quality dimensions are explained:
- Accuracy: Accuracy refers to the correctness of the data. Data is considered accurate when it reflects the real-world values or attributes it is supposed to represent. Inaccurate data can lead to incorrect conclusions and decisions. For example, a source with outdated data on the age of customers is not accurate.
- Completeness: Completeness assesses the presence of all required data elements within the dataset, without missing or omitted values. Data should be comprehensive and contain all the necessary information for its intended use. Incomplete data can result in gaps and hinder effective use and analysis.
- Consistency: Consistency is about the absence of conflicts between data elements, either within a single data set or across data sets. Data should be consistent both within itself and with other related data. Inconsistencies can lead to confusion and conflicting results. For example, if a customer's address in the sales department's data differs from the address in the finance department's data, the data is not consistent.
- Timeliness: Timeliness refers to how current and relevant data is, ensuring it is up-to-date enough for decision-making. Outdated data can be misleading and result in poor decisions. For example, timely data is essential in trading.
- Validity: Validity ensures that data adheres to defined rules or standards, confirming its appropriateness for the intended purpose. These rules can concern structure or format, such as the proper coding of diagnoses and treatments in medicine.
- Uniqueness: Uniqueness assesses how free a dataset is from duplicate records. It involves identifying and eliminating repeated or redundant information, ensuring that each data entry is distinct. Maintaining uniqueness is crucial for accurate analysis, decision-making, and data integration, enhancing the overall reliability and efficiency of the dataset.
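As a rough sketch of how some of these dimensions translate into concrete checks, the snippet below computes simple completeness, uniqueness, and validity scores for a small customer dataset. The field names, sample records, and the email format rule are all hypothetical assumptions, not part of any particular tool.

```python
import re

# Hypothetical customer records; field names and values are assumptions for illustration.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None,            "age": 29},  # missing email -> incomplete
    {"id": 2, "email": "b@example.com", "age": 29},  # duplicate id  -> not unique
    {"id": 3, "email": "not-an-email",  "age": 41},  # bad format    -> invalid
]

def completeness(records, field):
    """Share of records where the field is present (not None)."""
    return sum(r[field] is not None for r in records) / len(records)

def uniqueness(records, key):
    """Share of records remaining after deduplicating on the key."""
    return len({r[key] for r in records}) / len(records)

def validity(records, field, pattern):
    """Share of non-missing values that match a format rule."""
    values = [r[field] for r in records if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

print(completeness(customers, "email"))  # 0.75 (one of four emails missing)
print(uniqueness(customers, "id"))       # 0.75 (id 2 appears twice)
print(validity(customers, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"))
```

In practice, scores like these are tracked per data element over time and compared against thresholds agreed with the data's end-users, rather than computed once.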
It is important to note that not all dimensions are equally relevant within your organisation; this differs per data element, data set and even end-user. For example, in a financial context timeliness might be very important, while in healthcare consistency may take priority.
Curious about how to start implementing data quality checks within your organisation? Check this page or get in touch with us!
Frequently asked questions:
What is data quality?
Data quality refers to the accuracy, completeness, consistency, timeliness, validity, and uniqueness of data within a dataset.
What are the six dimensions of data quality?
The six dimensions of data quality are: accuracy, completeness, consistency, timeliness, validity and uniqueness.