
It’s been a while since I’ve posted here. I plan to do that more regularly. So, stay tuned!
Recently, TDWI published one of our Best Practices Reports on Responsible Data and Analytics. These are primary research reports that examine important topics in data and analytics.
Responsible data and analytics considers the ethical, societal, compliance, legal, and environmental ramifications of using data in a wide variety of applications and processes. Responsible data and analytics can be a strategic framework for proactively addressing broader planetary, societal, and business concerns, or a set of tactics for mitigating the risks and downsides from accidental or ill-advised uses of data and analytics.
Because data and analytics are permeating every part of our lives, this is a critical concept. Data-driven analytics applications have the potential to enrich our lives immensely. Consider machine learning applications that can diagnose disease in individuals or help improve crop yield. However, if designed or deployed irresponsibly, these applications can also wreak havoc in countless ways. For example, that same application that identifies a disease may not protect an individual’s right to privacy. Improperly trained machine learning models can result in predictions that negatively affect a person’s life and reputation. Data that is not kept private can do the same.
As part of our research, TDWI has developed its own framework for responsible data and analytics. The framework highlights the importance of ethical and trustworthy data and analytics, of maintaining data and analytics security and privacy, and of data governance principles. It supports responsible and sustainable business practices and includes closely connected principles such as fairness, explainability, transparency, and data safety. It considers the individual as well as the planet. Although our framework is still evolving, it is a good starting point for the kinds of issues that business and IT should consider when thinking about responsible data and analytics.
The framework includes factors such as:
- Ethics. Data ethics is about the right and wrong use of data. It examines the moral issues and practices related to data topics, including data curation and data sharing, as well as the ethics of AI/ML algorithms and the design of applications that use data. It is concerned with fairness and bias in data and analytics. There are many different kinds of bias, which we discuss in the report. Bias matters when building models because it can skew the outcome. It is important to understand the bias inherent in data to ensure fair predictions across all groups.
- Trustworthiness. We are concerned with trustworthy data: data that is complete, accurate, timely, reliable, relevant, and comprehensible. Organizations know that if data doesn’t have high integrity, it will impact their analysis.
- Security and privacy. Data and analytics remain secure and private through mechanisms such as access controls, encryption, and authorization. Privacy is a key responsible outcome that many enterprises are addressing in the design, operationalization, and management of data and analytics applications.
- Transparency. Our framework includes transparency for data and analytics. Transparency involves expanding visibility into the algorithmic underpinnings of machine learning and other analytics models so they are not black boxes that people can’t understand. Models being built today affect people’s lives. If a model makes decisions that impact people, those people should know how the model reached its conclusion.
- Sustainability. Data and analytics should support responsible and sustainable business practices, for example by reducing net carbon emissions from the data centers used to process data, or by adopting data analytics platforms that incorporate more energy-efficient chipsets, servers, and cloud providers in AI platforms. This should include practices such as exploring model training techniques that take less time and use less data to achieve acceptable results.
- Safety. The framework also includes data safety, in terms of risk mitigation and processes such as keeping a human in the loop.
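To make the point about fair predictions across groups a little more concrete, here is a minimal sketch of one simple check: comparing how often a model makes a positive prediction for each group. The function names, the sample data, and the choice of metric (often called a demographic parity gap) are illustrative assumptions, not something taken from the TDWI report or framework.

```python
from collections import defaultdict

def positive_rate_by_group(groups, predictions):
    """Return the share of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Largest difference in positive-prediction rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical group labels and model predictions (1 = positive outcome).
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, predictions)
gap = parity_gap(rates)
print(rates, gap)
```

A large gap does not prove a model is unfair, and a small gap does not prove it is fair, but it is a reasonable prompt to look more closely at the training data and the model.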
We were very interested in the state of responsible data and analytics as part of this research, since it is a relatively new topic. To understand the current state, we asked respondents a series of questions about data governance and data ethics, as well as the kinds of activities that their organizations consider important when managing data. Security, privacy, and accuracy had some of the highest scores in terms of importance. High-quality data, meaning data that is complete, accurate, timely, reliable, valid, relevant, and compliant, was also important. Data accuracy was reported as a priority by 71% of respondents, and the more comprehensive notion of data quality was cited by 59%. This is not surprising; TDWI Research has seen this priority consistently in other studies we’ve conducted over the years. In this survey, for example, the majority of respondents performed data quality assessments of input data used for analytics on a monthly, quarterly, or yearly basis.
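As a rough illustration of what one small piece of a data quality assessment might look like, the sketch below measures completeness: the fraction of records that have a usable value for each required field. The function name, field names, and records are hypothetical, and a real assessment would also cover accuracy, timeliness, validity, and the other dimensions mentioned above.

```python
def completeness(records, required_fields):
    """Fraction of records with a non-empty value, per required field."""
    total = len(records)
    report = {}
    for field in required_fields:
        # Treat missing keys, None, and empty strings as incomplete.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report

# Hypothetical input records for an analytics job.
records = [
    {"id": 1, "age": 34, "region": "east"},
    {"id": 2, "age": None, "region": "west"},
    {"id": 3, "age": 51, "region": ""},
    {"id": 4, "age": 29, "region": "east"},
]

report = completeness(records, ["id", "age", "region"])
print(report)
```

Running a check like this on a schedule, and tracking the results over time, is one simple way to turn a periodic assessment into something measurable.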
Although data trustworthiness and governance ranked high in importance in our survey, this was not the case for data ethics. Activities associated with ethics rated much lower than those for data governance and security. For example, avoiding unintended consequences, fairness, and avoiding bias were each rated as important data management activities by fewer than 20% of respondents. Additionally, fewer than 20% of respondents said that data engineers, analysts, data scientists, developers, and others who work with data have appropriate training or knowledge about issues such as data ethics, bias, or other responsible outcomes.
Want to learn more? Visit https://tdwi.org/research/2022/12/diq-all-best-practices-report-responsible-data-and-analytics.aspx?tc=page0.