Twitter will prompt users to review tweets that contain potentially harmful or offensive language


Twitter is introducing a new feature in its Android and iOS apps that will prompt users to review their replies when Twitter detects potentially harmful or offensive language in them. The goal is to discourage users from posting such tweets and, as a result, make Twitter a more positive experience on the whole.

Twitter has been testing this feature for quite a while, refining the detection and identification process along the way. In their early stages, automated systems like this often struggle to differentiate between potentially offensive language, sarcasm, and friendly banter.

With the system in place, Twitter noticed several differences in user behaviour:

  • If prompted, 34% of people revised their initial reply or decided not to send it at all.
  • After being prompted once, people composed, on average, 11% fewer offensive replies in the future.
  • If prompted, people were less likely to receive offensive and harmful replies in return.

While the feature does not completely solve the problem, it is a step in the right direction: having users think twice before posting a negative tweet can make a difference. In the meantime, Twitter has promised to continue improving its detection systems to ensure a better experience for its users.