Clean, healthy data can be a major competitive advantage, especially for businesses that invest the appropriate time and resources into their data management strategies. In the age of Big Data, organizations that harness data effectively and promote data integrity can make better data-driven decisions, improve data quality, and reduce the risk of data loss or corruption.
SEE: Hiring Kit: Database engineer (TechRepublic Premium)
But what exactly is data integrity, and why is it important to the overall health of the business? More importantly, what can be done to maintain high data integrity standards? In this guide, we’ll discuss how data integrity works and why it’s important for your business.
- What is data integrity?
- Why is data integrity important?
- The risks associated with data integrity
- Managing data integrity through data governance
- Types of data integrity
What is data integrity?
At its most basic level, data integrity is the accuracy and consistency of data across its entire life cycle, from when it is captured and stored to when it is processed, analyzed and used.
Data integrity management means ensuring data is complete and accurate, free from errors or anomalies that could compromise data quality.
Data that has been accurately and consistently recorded and stored will retain its integrity, while data that has been distorted or corrupted cannot be trusted or relied upon for business use.
Why is data integrity important?
Data integrity is important for a number of reasons. However, its importance is best explained with a practical example.
Imagine you are a project manager running clinical trials for a revolutionary new drug that could be a game changer in the fight against cancer. You have conducted human trials over the past five years and are convinced you’re ready to move into production.
However, while going through regulatory protocols with the FDA to bring your drug to market, the agency finds data integrity issues in your trial data — some crucial quality control data is missing.
SEE: Data quality in healthcare: Current problems and possible solutions (TechRepublic)
As a result, they halt your trials. Although you may convince them to proceed with the approval process after you’ve addressed the data integrity issues, the delay will likely cost your company millions of dollars and impact public perception of your drug.
In this example, data integrity is critical to the success of your clinical trials and the ultimate product. This is just one example from the pharmaceutical industry, but this issue cuts across many sectors and data types.
Data integrity is fundamental in regulated industries, where data must be accurate, complete and verifiable at all times. Poor data integrity can cause enterprises to lose money, positive public and industrial reputations, and valuable production time.
The risks associated with data integrity
Data integrity is a complex and multifaceted issue. Data professionals must be vigilant about the various risks that can compromise data integrity and quality. These include the following:
Human error
In data management, human error is a major risk factor for data integrity. Human errors can occur when data is incorrectly input, processed or analyzed. In some industries, like finance or customer service, the reliance on multiple data sources can also lead to data integrity issues.
Misconfigurations and security errors
If data is not configured correctly — for example, if incorrect user permissions have been set — it may be more vulnerable to cybercriminals or data breaches. Likewise, if data is not appropriately secured with encryption and access controls, it can also be compromised by unauthorized individuals or programs.
Hardware failures and legacy systems
Hardware can fail, data can be accidentally deleted or overwritten, data can be corrupted during transfer and storage, and data may be unintentionally accessed or overwritten by other data users. If your organization is considering a migration to the cloud, assess data quality and integrity on these legacy systems before making the shift.
Unintended transfer errors
When data is migrated between different data systems, data may be accidentally lost or corrupted during the transfer process. This situation can be a significant data integrity risk, especially if data is shared between different teams or sources.
Malware, insider threats and cyberattacks
Data integrity can also be compromised by malware or viruses that corrupt data. It’s important to have protections in place against malicious insiders seeking to steal data and cyberattacks that target data repositories or data infrastructure.
Managing data integrity through data governance
To mitigate many data integrity risks, data managers should implement a robust data governance strategy that includes data integrity checks at every stage. This process may involve:
- Data quality assessments
- Data literacy and security training for data users
- Process improvements that reduce data errors
- Data redundancy and data backup practices to ensure data reliability
- Data encryption for data security
- Data auditing for detecting data integrity issues
- Robust cybersecurity measures
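As a minimal illustration of the first item on the list, a data quality assessment can be as simple as scanning records for missing or malformed values. The sketch below uses hypothetical field names (`id`, `email`) and stands in for what a real assessment tool would automate at scale:

```python
# Minimal data quality assessment: scan records for missing or invalid values.
# The field names ("id", "email") are hypothetical examples.

def assess_quality(records):
    """Return a list of (record index, problem) pairs found in the records."""
    issues = []
    for i, rec in enumerate(records):
        for field in ("id", "email"):
            if not rec.get(field):
                issues.append((i, f"missing {field}"))
        email = rec.get("email", "")
        if email and "@" not in email:
            issues.append((i, "malformed email"))
    return issues

sample = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "not-an-email"},
    {"id": None, "email": "bo@example.com"},
]
print(assess_quality(sample))  # → [(1, 'malformed email'), (2, 'missing id')]
```

A report like this feeds directly into the process-improvement and auditing items above: each flagged record points to a concrete error source to fix.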
Learn more about data governance best practices with our TechRepublic Premium data governance checklist for your organization.
Types of data integrity
To effectively maintain data integrity, you must understand the two main types of data integrity that exist: physical integrity and logical integrity.
Physical integrity
Ensuring data integrity through physical means is essential for data processing and retrieval to function as intended. While software-based safeguards provide a critical layer of defense, you must also protect data via physical measures to ensure that data remains unaltered and complete, even during an outage or other destructive event.
SEE: Disaster recovery and business continuity plan (TechRepublic Premium)
Natural disasters, power outages, cyberattacks, human error and storage degradation can jeopardize data’s physical integrity. Therefore, organizations must recognize the importance of incorporating software and physical security measures to guarantee data accuracy and completeness over long periods.
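One common software safeguard against silent corruption is to record a cryptographic checksum when data is stored and verify it after a transfer, restore or long period in storage. A sketch using Python's standard `hashlib` (the sample data is invented for illustration):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()

# Record the checksum when the data is first stored...
original = b"patient_id,result\n1001,negative\n"
stored_digest = checksum(original)

# ...and verify it later. An unchanged copy matches the stored digest.
retrieved = b"patient_id,result\n1001,negative\n"
print(checksum(retrieved) == stored_digest)  # → True

# Any alteration — even one character — produces a different digest.
corrupted = b"patient_id,result\n1001,positive\n"
print(checksum(corrupted) == stored_digest)  # → False
```

A checksum detects corruption but cannot repair it, which is why it is paired with the redundancy and backup practices discussed above.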
Logical integrity
In a relational database, logical integrity ensures that data remains accurate and consistent as it is used and combined in different ways. This helps to keep data safe from human error and malicious attacks.
There are four types of logical integrity that work together to ensure data is consistent and reliable:
Entity integrity defines each entity’s primary key, making sure each record in a table has a unique identifier. Requiring a distinct primary key for every record helps prevent duplicate or incomplete records, because no record can exist without this crucial element. Primary keys also serve as the anchors that other tables reference, allowing you to link information from one table to another.
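Entity integrity can be seen directly in any relational database. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `patients` table: the `PRIMARY KEY` constraint rejects a second record with a duplicate identifier.

```python
import sqlite3

# Entity integrity in practice: a PRIMARY KEY constraint guarantees every
# record has a unique identifier. The "patients" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO patients VALUES (1, 'Ana')")

try:
    conn.execute("INSERT INTO patients VALUES (1, 'Bo')")  # duplicate key
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # the database refuses the duplicate record
```

The duplicate insert never reaches the table, so the uniqueness rule is enforced by the database itself rather than by application code.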
Referential integrity ensures that records in related tables are linked correctly. For example, if you delete a record from one table, any related records in other tables are either deleted along with it or the deletion is blocked, preventing the existence of orphaned records and corrupted data.
Referential integrity also prevents users from entering data into foreign keys without verifying whether the foreign key exists in its parent table, thus avoiding linking errors between tables.
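Both behaviors can be sketched with SQLite's foreign key support (the `trials` and `results` table names are invented for illustration; note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE trials (trial_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    trial_id INTEGER REFERENCES trials(trial_id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO trials VALUES (1)")
conn.execute("INSERT INTO results VALUES (10, 1)")

# A foreign key with no matching row in the parent table is rejected.
try:
    conn.execute("INSERT INTO results VALUES (11, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Deleting the parent cascades to its child rows, so no orphans remain.
conn.execute("DELETE FROM trials WHERE trial_id = 1")
print(conn.execute("SELECT COUNT(*) FROM results").fetchone())  # → (0,)
```

`ON DELETE CASCADE` is one policy choice; the default in most databases is instead to block the delete while child records still reference the row.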
Domain integrity enforces rules about what types of data can be entered into specific fields, such as a column in a database table or a spreadsheet.
This type of integrity prevents incorrect values from being entered. There may be protections against entering text into numeric fields or numbers into alphanumeric fields, which could cause errors when running reports or queries against the dataset.
Domain integrity also checks inputs against predetermined values. This could include checking credit card numbers against the Luhn algorithm to confirm they are valid before accepting them into the system.
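The Luhn check mentioned above fits in a few lines of Python, a minimal sketch of that kind of domain validation:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number]
    # Double every second digit from the right; subtract 9 if the result > 9.
    for i in range(len(digits) - 2, -1, -2):
        digits[i] *= 2
        if digits[i] > 9:
            digits[i] -= 9
    return sum(digits) % 10 == 0

print(luhn_valid("79927398713"))  # → True (a well-known valid test number)
print(luhn_valid("79927398710"))  # → False (last digit altered)
```

The check catches single-digit typos and most transpositions before a bad number ever enters the dataset, which is exactly the point of domain integrity: reject invalid values at the boundary.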
User-defined integrity allows users to create custom rules for their databases, including limiting certain characters or words from being used in passwords. User-defined integrity could also involve setting maximum field values or blocking specific IP addresses from accessing the system altogether.
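A maximum-field-value rule like the one described above can be expressed as a `CHECK` constraint, so the database itself enforces it. A sketch using SQLite with a hypothetical `prescriptions` table and an invented 500 mg dosage limit:

```python
import sqlite3

# A user-defined rule expressed as a CHECK constraint: the hypothetical
# "dosage_mg" field may never fall outside 1–500.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prescriptions (
    rx_id INTEGER PRIMARY KEY,
    dosage_mg INTEGER CHECK (dosage_mg BETWEEN 1 AND 500))""")
conn.execute("INSERT INTO prescriptions VALUES (1, 250)")  # within the rule

try:
    conn.execute("INSERT INTO prescriptions VALUES (2, 900)")  # violates it
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Because the rule lives in the schema rather than in application code, every client of the database is held to it, not just the applications that remember to validate.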
Read next: Top data quality tools (TechRepublic)