Data could be seen as the latest form of wealth, yet an excess of something beneficial can transform a precious asset into a costly liability. The demand for information is universal. It serves many functions that enhance profits and operational effectiveness for businesses. With artificial intelligence (AI) currently the topic of widespread discussion, the desire for information is growing exponentially.
According to the AI & Information Management Report from AvePoint, a data management and governance firm, 64% of companies handle at least one petabyte of data, with 41% going beyond that, managing at least 500 petabytes. As businesses accumulate more data, it becomes increasingly challenging for their teams to comprehend the extent of their holdings, assess the associated risks, and extract value from it. In an era where data is frequently likened to oil and gold, the question arises: how much is too much?
The Costs and Risks
Data comes at a price. The cost of storing large volumes of data is a direct expense. “Unstructured data poses a significant challenge for organisations due to a common misconception, which is false, that storage is either free or nearly so, especially among those transitioning to cloud services,” explains Dana Simberkoff, the head of risk, privacy, and information security at AvePoint.
More and more, organisations are shifting their data to the cloud, yet this doesn’t imply that everything is being stored in one place. A report on cloud security from a company specialising in threat detection and response, MixMode, revealed that 78% of organisations utilise a mix of cloud services or a hybrid setup.
“Having data scattered across various locations is driving up your expenses,” notes Chris Pierson, the founder and CEO of BlackCloak, a cybersecurity firm. Companies also need to factor in the expenses related to upkeep, which might involve the time of engineers and program analysts. The expenses don’t stop at storage and upkeep. Data also carries the risk of exposure.
Hackers are always on the prowl for ways to breach company defences and exploit the data behind them. Should they succeed, and many do, companies are likely to face a series of additional costs. In 2023, the average cost of a data breach was $4.45 million, as reported in the Cost of a Data Breach Report 2023 by the Ponemon Institute and IBM Security.
Companies might not be aware of the data they possess or its location, which only increases their vulnerability. “If the data is just lying around and you no longer require it, it’s there as a massive risk ready to be taken advantage of through a cyber attack, whether that’s a breach or a privacy breach,” explains Christopher Wall, a data protection officer and a special counsel focusing on global privacy and forensics at HaystackID, a firm specialising in eDiscovery services.
Data Governance
Leaders of companies can’t determine the threshold at which data becomes excessive without understanding their existing data and its location. This necessitates the implementation of data governance. However, establishing effective data governance poses challenges; many companies embark on such endeavours with the hope of success, only to discover their efforts are in vain.
Achieving robust data governance demands input and dedication from every part of the company. Pierson suggests that there ought to be a dialogue among business divisions, the different sectors within the company, and business executives about which data is needed and wanted. It comes down to asking what data the company currently holds and how that data contributes to the growth of the business.
Data Retention and Deletion
Keeping every piece of data collected is neither economical nor wise from a risk-management standpoint. There is a tendency to cling to all the data collected over time, but eventually there comes a point where some of it should be considered for deletion.
Once a business has a handle on managing its data, leaders can begin to ask which data should be removed and when. The question of how much data is excessive comes down to weighing value against risk. Begin with the basics: What value does the company derive from the data? Does it cost more to store and secure that data than the data contributes to the organisation?
When it comes to keeping data, think about why it is being collected and how long it is needed. If the data isn’t necessary, don’t gather it. That’s the first essential principle. If you do gather it and it’s needed, use it only for its intended purpose. For numerous companies, holding redundant data leads to what Simberkoff describes as “data bloat.”
A lot of data is redundant and repetitive … if you can remove this data, it will be a significant improvement for most companies. Storing a mass of redundant data does more than just increase the risk for a company. If you’re overwhelmed with redundant, obsolete, and trivial data, it starts to diminish the value of the data you have, and that’s a problem.
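One common way to find exact duplicates in a file store is to compare content hashes. The following is a minimal sketch under stated assumptions, not a method described by the people quoted here: the function names `file_digest` and `find_duplicates` are hypothetical, and the approach only catches byte-for-byte copies, not near-duplicates or obsolete records.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files indicate redundancy.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of identical files, so all but one copy in a group is a candidate for review and removal. A report like this is a starting point for a clean-up conversation, not an automatic deletion list.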
Policies on keeping and removing data should consider the relevance of the information. Is it still needed? Does holding onto it pose risks without any benefits? Take, for instance, a business that stores people’s Social Security numbers for verification purposes. Does it need the Social Security number at all? Is it necessary to retain it? Business leaders can assess whether their companies truly require certain kinds of personal information.
Following that, they can decide which personal information is necessary to retain and for how long. Some data might be collected, utilised for its original intent, and then immediately erased. The Transportation Security Administration (TSA) is a case in point. It employs facial recognition technology for identification, and after the process is finished, it removes the images and personal details. (TSA mentions that it might keep information on passengers for up to 24 months during the testing and development stages.)
The information that needs to be deleted is highly sensitive personal data that is no longer necessary for its original purpose. Some information, however, must be kept by law. For instance, numerous financial documents must be kept for seven years under the Sarbanes-Oxley Act. Company leaders must also pay closer attention to privacy laws governing the gathering, use, and storage of data.
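A retention policy of this kind can be expressed as a simple schedule that maps each category of data to a maximum age. The sketch below is illustrative only: the category names, the `RETENTION_DAYS` table, and the `Record` type are assumptions made for this example, and real retention periods have to come from legal counsel and the applicable regulations.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule, in days (leap days are ignored for simplicity).
RETENTION_DAYS = {
    "financial_record": 7 * 365,  # the seven-year example mentioned above
    "identity_check": 0,          # delete once the original purpose is served
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    collected_on: date


def is_due_for_deletion(record: Record, today: date) -> bool:
    """Return True when a record is older than its category's retention period."""
    limit = RETENTION_DAYS.get(record.category)
    if limit is None:
        # Unknown category: leave it for human review rather than deleting it.
        return False
    return (today - record.collected_on).days > limit
```

Treating an unknown category as "do not delete" is a deliberate choice in this sketch: data without a defined retention rule is a governance gap to resolve, not something to purge automatically.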
The General Data Protection Regulation (GDPR) and privacy statutes in various countries and US states safeguard different kinds of personal information and specify how businesses may gather, preserve, and utilise this information. Additionally, the US is progressing towards a proposed national data privacy law, the American Privacy Rights Act (APRA).
Although no two regulations are the same, they all strive to give consumers certain rights over their personal information, including the ability to choose not to share their data, the right to correct inaccuracies, and the right to request deletion. The common thread among these rights is that businesses must understand the nature of the data, its location, and its usage.
As companies strive to comply with regulatory requirements, teams need to establish and uphold policies. “Every business ought to have a policy on privacy or data protection, which includes both an internal policy and an external one, serving as the company’s agreement with individuals whose data it is processing,” Wall adds. Privacy is now on the consumer’s terms.
Every procedure and policy you implement, from retention and disposition to handling data subject requests, including those for erasure, must undergo thorough auditing and monitoring for ongoing enhancement. These steps are essential for identifying areas where you can consistently improve.
Reducing costs and minimising risks are obvious advantages of eliminating unnecessary data. However, doing so can also give businesses a competitive advantage. A company that restricts access to sensitive data and implements a consistent purging and deletion process has a feature it can highlight in its marketing.
AI and Data Proliferation
As business leaders incorporate artificial intelligence into their processes, the importance of data is perceived to increase even further. AI algorithms are in constant need of data, so why not fill them to the brim with all available information? Artificial intelligence is set to reveal that there exists a point of data overload, especially when the data is of poor quality.
AI magnifies and accelerates any privacy and security issues that were already present. It can serve as either a valuable ally or a formidable adversary, depending on how it’s managed. For firms adopting retrieval-augmented generation (RAG) systems, the quality of the data they use is crucial. These systems enable a business to augment an existing language model with its own data.
Businesses are under increasing pressure to determine the threshold at which data becomes excessive. From an innovation perspective, businesses need cleaner data to fully harness the capabilities of artificial intelligence. From a security perspective, businesses must lower the risks that come with protecting information. From a privacy perspective, businesses must understand their duties to both customers and employees to avoid violating the law. Organisations, states, and the federal government have the opportunity to take a stand and develop comprehensive laws, as well as internal company policies, to secure consumer information in the United States.