July 5, 2020July 6, 2020 by Matthew Rosenquist

Teaching AI to be Evil with Unethical Data

by Matthew Rosenquist on 07-05-2020 at 2:00 pm
Categories: AI, Security

An Artificial Intelligence (AI) system is only as good as its training. For AI Machine Learning (ML) and Deep Learning (DL) frameworks, the training data sets are a crucial element that defines how the system will operate. Feed it skewed or biased information and it will create a flawed inference engine.

MIT recently removed a dataset that has been popular with AI developers. The training set, 80 Million Tiny Images, was scraped from Google in 2008 and used to train AI software to identify objects. It consists of images labeled with descriptions. During the learning phase, an AI system ingests the dataset and ‘learns’ how to classify images. The problem is that many of the images are questionable and the labels are inappropriate. For example, women are described with derogatory terms, body parts are identified with offensive slang, and racial slurs are sometimes used to label people from minority groups. Such training should never be allowed.

AI developers need vast amounts of data to train their systems. Collections are often created out of convenience, without consideration for appropriate content, copyright restrictions, compliance with licensing agreements, people’s privacy rights, or respect for society. Unfortunately, many of the available sets were haphazardly created by scraping the internet, social sites, copyrighted content, and human interactions without approval or notice.

Many of the most used training datasets have issues. A large number were created by unethically acquiring content, some contain derogatory or inflammatory information, and for others, the sample is not representative because it excludes certain groups that would benefit from inclusion.

The problem has become worse over time. Flawed datasets that were made openly available to the developer community early on became so popular that they are now considered standards. These benchmarks are used to check accuracy and performance across different AI systems and configurations.

Too few are vetted for inclusiveness, accuracy, or socially acceptable content. Using such flawed records is simply unethical because the resulting systems can be racially charged and biased, and can promote inequality.
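As a rough illustration of what basic vetting could look like, the sketch below scans a labeled dataset for blocked terms and reports groups that make up too small a share of the samples. It is only a minimal sketch under stated assumptions: the function name `audit_labels`, the `(label, group)` sample format, the blocklist, and the 5% threshold are all hypothetical and do not come from any particular dataset or tool. Simple word matching like this would miss a great deal, so it is a starting point and not a substitute for human review.

```python
from collections import Counter


def audit_labels(samples, blocked_terms, min_share=0.05):
    """Flag samples with blocked label terms and list underrepresented groups.

    samples: iterable of (label, group) pairs (hypothetical format).
    blocked_terms: words that should never appear in a label (assumed list).
    min_share: assumed minimum fraction of samples each group should have.
    """
    blocked = {term.lower() for term in blocked_terms}
    flagged = []            # indices of samples whose label contains a blocked word
    group_counts = Counter()
    total = 0

    for index, (label, group) in enumerate(samples):
        total += 1
        group_counts[group] += 1
        # Whole-word match only; a crude check that misses spelling variants.
        if set(label.lower().split()) & blocked:
            flagged.append(index)

    underrepresented = sorted(
        group for group, count in group_counts.items()
        if total and count / total < min_share
    )
    return {"flagged": flagged, "underrepresented": underrepresented}
```

A dataset maintainer might run a check like this before publishing, then manually review every flagged sample and decide whether underrepresented groups need additional data.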

We cannot have good AI if the commonly used datasets create unethical systems. All files should be vetted and both the creators and product developers held responsible. Just as chefs are held accountable for the ingredients they put into their prepared dishes, so should the AI community be held responsible for allowing poor data to result in harmful AI systems.
