The Next Critical Step for AI: Eliminate Data Bias

04.13.2023
Reading time: 4 minutes

Artificial Intelligence (AI) has a great capacity for good. I believe human-driven AI will probably be one of the greatest tools humanity has ever developed. But fulfilling that potential requires us to do the hard work—now. This begins with ensuring the data our systems ingest are comprehensive and free of bias. The good news is that technology can and should help.

Data Bias—A Real-World Example

The typical enterprise won’t gain much benefit from AI trained on data scraped randomly off the internet. Business value comes with AI trained on an organization’s own data, which is also where bias can creep in. Flawed data sets produce flawed AI decisions, and these can have drastic consequences:

A woman in the United States took sleeping tablets on her doctor’s advice, which followed the manufacturer’s own guidelines. The next morning, she rose and drove to work, but was pulled over and later arrested. The issue? The previous night’s medication was still in her system, so she was driving under the influence. She fought the charges in court, where it was revealed that the manufacturer’s guidelines her physician had relied on were developed using data solely from male test subjects. Because men tend to have faster metabolisms, certain medicines leave their systems far faster than they leave women’s. In this case, biased medical data led to bad medicine and a scary legal entanglement.

How to Avoid Biased Datasets

To avoid biased data, or at the very least reduce its prevalence, companies should follow two important steps. First, ingest the widest possible array of data. This includes vast amounts of their own proprietary raw data, structured and unstructured, drawn from every possible company source, such as documents, Excel files, research, financials, regulatory data, historical data and benchmarks. Second, put controls in place by meta-tagging data with contextual information.

To accelerate this process, companies need a tool that enables the data to be ingested with the necessary context applied. This has historically been the role of subject matter experts. However, processing data at scale requires a rules-based engine to classify data with the proper taxonomies and ontologies, thus providing the context behind the data, which can so often expose the bias.
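As a rough illustration of what rules-based meta-tagging can look like, here is a minimal Python sketch. The taxonomy, the keywords and the names `Record`, `RULES` and `tag_record` are all hypothetical, invented for this example; a production platform would apply far richer taxonomies and ontologies than simple keyword matching.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """A piece of ingested content plus the context tags attached to it."""
    text: str
    tags: set[str] = field(default_factory=set)


# Hypothetical taxonomy: each tag maps to the keywords that trigger it.
RULES = {
    "finance": {"revenue", "invoice", "ledger"},
    "regulatory": {"compliance", "audit", "regulation"},
    "clinical": {"dosage", "patient", "trial"},
}


def tag_record(record: Record, rules: dict[str, set[str]] = RULES) -> Record:
    """Attach every tag whose keyword set overlaps the record's words."""
    words = {w.strip(".,;:()").lower() for w in record.text.split()}
    for tag, keywords in rules.items():
        if words & keywords:
            record.tags.add(tag)
    return record


doc = tag_record(Record("Patient dosage was reviewed during the audit."))
print(sorted(doc.tags))  # ['clinical', 'regulatory']
```

Keeping the rules in plain data rather than in code means a subject matter expert can review and extend them without touching the tagging logic, which is one way to keep the process transparent.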

This process lets businesses examine not only the validity of the algorithm but also the source data used to train it. Oversight is where humans can help keep AI decision-making on track. For example, we wouldn’t teach an algorithm that 2+2=5. But that’s exactly what we’re doing if we don’t ensure the data we use for AI is clean, sensible and has the proper metadata context.

Infusing AI with internal data already shows great promise. BloombergGPT™ is reported to be trained on data that is 52% proprietary or cleaned financial data. The accompanying study found, “the BloombergGPT model outperforms existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general natural language processing benchmarks.” This is just one example, but it shows how powerful integrating internally sourced data sets can be.

AI Still Needs Humans

Regardless of where the data comes from, AI lacks the moral compass and ethical context that human decisions organically include.

To compensate for this gap, we must ask the right questions and include those rationales in our data sets. AI algorithms also need to be trained across cultures, ages and genders, as well as a host of other parameters to account for bias. The cleaner the data points used, the more sound the decision.
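One simple check along these lines is to measure how well each group is represented in a training set before using it. The sketch below is an assumption-laden illustration: the function name `representation_gaps`, the record layout and the 20% threshold are all made up for the example, and a real audit would look at many more attributes and their combinations.

```python
from collections import Counter


def representation_gaps(records, attribute, expected_groups, min_share=0.2):
    """Return the expected groups whose share of the records is below min_share.

    Groups that are missing entirely are reported with a share of 0.0.
    """
    counts = Counter(r.get(attribute) for r in records)
    total = len(records)
    gaps = {}
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        if share < min_share:
            gaps[group] = share
    return gaps


# A data set drawn only from male test subjects, as in the example earlier.
subjects = [{"sex": "male"} for _ in range(10)]
print(representation_gaps(subjects, "sex", ["male", "female"]))  # {'female': 0.0}
```

Passing the expected groups explicitly matters: a group that never appears in the data cannot be discovered by counting the data alone.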

The “wisdom of crowds” theory holds, in brief, that the more data points you combine about a particular question, the more “right” your resulting answer is. This holds even when crowd-sourced decisions are compared with those of experts. Stripped to its core, AI makes a reasonable guess based on the data it has. Accuracy, therefore, comes from aggregating the data points and balancing the wrong against the right to discern the most probable answer. But AI can’t govern itself. It takes diverse and critical thinking, weighing many factors, to ensure the decisions we get from AI are for the good of the whole rather than biased toward the few.
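The aggregation idea can be sketched with a small simulation. This is only an illustration under stated assumptions: the function `crowd_vs_individual`, the Gaussian noise model and all the numbers are invented for the example. It compares the error of the averaged estimate with the typical error of a single estimate, and the optional `bias` parameter shifts every estimate in the same direction to show what aggregation cannot correct.

```python
import random
from statistics import mean


def crowd_vs_individual(truth, n, noise, bias=0.0, seed=0):
    """Simulate n noisy estimates of `truth` and compare two error measures.

    Returns (error of the averaged estimate, average error of one estimate).
    `bias` is added to every estimate to model a shared, systematic skew.
    """
    rng = random.Random(seed)
    estimates = [truth + bias + rng.gauss(0, noise) for _ in range(n)]
    crowd_error = abs(mean(estimates) - truth)
    individual_error = mean(abs(e - truth) for e in estimates)
    return crowd_error, individual_error


unbiased = crowd_vs_individual(truth=100, n=1000, noise=10)
biased = crowd_vs_individual(truth=100, n=1000, noise=10, bias=5)
print("unbiased crowd vs individual error:", unbiased)
print("biased crowd vs individual error:  ", biased)
```

Independent errors tend to cancel when averaged, so the crowd estimate is never worse than the typical individual one. A skew shared by every data point does not cancel, no matter how many points are added, which is why volume alone is no substitute for checking the data for bias.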

A Transparent Way Forward

As the world of data grows, businesses need scalable solutions to process and manage it all. There is a limit to how much information a human brain can process, and repeatedly retaining subject matter experts is impractical. Achieving unbiased data requires an agile, transparent, rules-based data platform where data can be ingested, harmonized and curated for the AI tool. If businesses and their AI teams are to move forward responsibly, they need a replicable, scalable way to ensure AI algorithms are trained with clean, quality data. Preferably, their own proprietary data.

In my next blog post, I will look at another feature that any data platform should have to help remove data bias and add further transparency to the data: bi-temporality. That piece will look at how it can be leveraged to provide data provenance and lineage throughout the life cycle of the data.

Data Bias Survey Results

For more information on the state of data bias in business today, and to gain insight into how to avoid and address data bias in your own organization, read the highlights from our data bias survey.


Philip Miller

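Philip Miller is a Customer Success Manager at Progress | MarkLogic, where he supports customers in international standards bodies and the publishing industry. He also runs the "Digital Acceleration" customer webinar series and the Progress | MarkLogic Vision events. He is passionate about helping customers and contributes ideas internally to improve the Progress | MarkLogic data platform and make it more innovative. He has been named a top influencer in Onalytica's "Who's Who in Data Management". Outside of work, he is a father of two daughters, a dog lover and an avid learner who tries to learn something new every day.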
