It’s almost the end of 2018 and it’s time to look back on all the new capabilities of AI. AI has definitely expanded and improved this year, but there are still areas for improvement, namely us. An ideal algorithm yields a functional AI system that can accurately act on its own. Unfortunately, that’s not always the case, and there are more disasters than you might think.
You can’t build effective algorithms if you don’t understand the data yourself. When we think of complex AI algorithms, we know data scientists spent countless hours mapping, calculating, and analyzing data in order to direct AI software. Any myopic assumptions about data will only result in disaster. We’ve pointed out before how bias in AI has disastrous consequences. We’ve even talked about the more invasive intentions people have with AI, resulting in nightmare scenarios. Remember, AI is here to help processes, not to replace our own ability to analyze data and extract insights. That means no matter what kind of algorithm we build, we need to understand the data ourselves.
An algorithm can still be functional, but it means absolutely nothing if it misses the goal. Let’s consider Tumblr, a content-sharing social media platform that recently announced it would ban all explicit content “primarily including photos, videos, or GIFs that show real-life human genitals or female-presenting nipples, and any content — including photos, videos, GIFs and illustrations — that depicts sex acts.” It makes sense that the site would want to prevent further issues with strong, graphic content that can be dangerous. But what’s the line that shouldn’t be crossed? Imagine how much data analysis is needed just to design this kind of algorithm. People have to analyze all the possible types of content and find an objective way to measure visual semantics without bias.
Is all nudity considered explicit? What about artwork? Who judges the “explicitness” or “danger” of all the content? Where in the algorithm does it specifically look for child porn and human trafficking patterns? Will it actually alleviate the problem or push the problem elsewhere? Why ban tags like “LGBTQ” and “trans” but let tags like “necrophilia” and “white power” continue to post content?
The algorithm technically works, since it does ban content, but it has started to ban anything and everything. Instead of going after the root of the problem, Tumblr’s ban is a short-sighted approach that targets the wrong bloggers on a surface level. The ban doesn’t address any of the real issues surrounding banned minority groups, while other users are free to publicly support mass shootings and murderers without consequence. It raises the question: how is Tumblr analyzing its own algorithmic capabilities in a sea of diverse data?
Know your data
Any good data scientist knows data analytics builds the foundation for AI algorithms. Issues like racism and sexism are magnified in AI because they’re present in collected data. And because data produced by humans reflects our biases, data scientists need to make sure to identify and eliminate them. Even with more clearly defined data, algorithms can still fail. Some well-intentioned police on the lookout for child porn distributors once ended up with photos of sand dunes thanks to an AI algorithm. The only thing you can do is continue to analyze data for deeper insights.
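One simple way to start knowing your data is to check who a filter gets wrong. The Python sketch below is only an illustration, not how Tumblr or any real moderation system works. The function name `false_positive_rates`, the tags, and the hand-labeled sample are all made up for this example. It counts, for each tag, how many harmless posts were flagged anyway.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return, per tag, the share of non-explicit posts that were flagged."""
    wrongly_flagged = defaultdict(int)  # harmless posts the filter flagged
    harmless = defaultdict(int)         # all harmless posts seen for the tag
    for tag, is_explicit, was_flagged in records:
        if not is_explicit:
            harmless[tag] += 1
            if was_flagged:
                wrongly_flagged[tag] += 1
    # Tags with no harmless posts are left out, so there is no division by zero.
    return {tag: wrongly_flagged[tag] / harmless[tag] for tag in harmless}


# Hypothetical audit sample: (tag, truly explicit?, flagged by the filter?)
sample = [
    ("art", False, True),
    ("art", False, False),
    ("art", True, True),
    ("landscape", False, True),
    ("landscape", False, False),
    ("landscape", False, False),
    ("landscape", False, False),
]

rates = false_positive_rates(sample)
for tag, rate in sorted(rates.items()):
    print(f"{tag}: {rate:.0%} of harmless posts flagged")
```

On this made-up sample, half of the harmless "art" posts are flagged compared with a quarter of the harmless "landscape" posts. A gap like that between groups is a cue to go back and look at the data, not proof of what caused it.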