Content Moderation
Flag harmful content in text, such as hate speech, sexual content or violence.
The following demonstration uses the OpenAI moderation model to analyse whether text contains content in any of these categories: hate, hate/threatening, self-harm, sexual, sexual/minors, violence and violence/graphic.
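Under the hood, a moderation request returns an overall flag plus per-category results for each input. Here is a minimal sketch of such a call using the official OpenAI Python SDK; the input string is illustrative, and this is an assumption about the demo's internals rather than its actual code:

```python
# Minimal sketch of an OpenAI moderation call (openai>=1.0 SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.moderations.create(input="Example text to check.")
result = response.results[0]

print(result.flagged)          # True if any category was triggered
print(result.categories)       # per-category booleans (hate, self_harm, ...)
print(result.category_scores)  # per-category confidence scores
```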
Instructions
- Click on an example text, or enter your own in the text box below, then click Analyse.
- Please note that the analysis is limited to the first 1000 characters and a maximum of 5 sentences.
- Select one of the content types to see whether each sentence of the text was predicted to contain that trait (a sketch of this per-sentence flow follows below).
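As an illustration of the per-sentence analysis described above, here is a hedged sketch: truncate to 1000 characters, split into at most 5 sentences, and moderate each sentence. The function name and the sentence-splitting heuristic are assumptions for illustration, not the demo's actual implementation:

```python
# Hypothetical sketch of per-sentence moderation with the demo's limits.
import re
from openai import OpenAI

client = OpenAI()

def analyse_sentences(text: str, category: str = "hate") -> list[tuple[str, bool]]:
    text = text[:1000]  # limit to the first 1000 characters
    # Naive sentence split on ., ! or ?; keep at most 5 sentences.
    sentences = re.split(r"(?<=[.!?])\s+", text)[:5]
    # The moderation endpoint accepts a list of strings in one request.
    response = client.moderations.create(input=sentences)
    # Category attribute names replace "/" with "_", e.g. hate_threatening.
    flags = [getattr(r.categories, category) for r in response.results]
    return list(zip(sentences, flags))
```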
More information about this demo
Some potential uses for this tool include profanity filtering, identifying toxic comments on student forums, and providing some protection for young people interacting online.
Things to consider
This tool is not fail-safe and is not 100% accurate at identifying these traits in text. It also struggles with non-conventional language such as slang, sarcasm and irony.
The example comment used in this demo is taken from an evaluation dataset released by OpenAI.