Everyone can now mine open sources and social network information (but the government may have a new tool for doing it too).

Posted by: Akiva Miller

It was recently reported that defense giant Raytheon has developed a system called “Rapid Information Overlay Technology” (RIOT), designed to mine information from social networks, including photos and the location information associated with them. RIOT reportedly can predict behavior based on people’s online habits.



Although RIOT has not yet been sold to any client, the clear market for it is national security agencies and law enforcement. News of RIOT has already sparked strong negative reactions from rights advocates:


Meanwhile, it seems that many commercial entities are looking into technologies that would allow them to harness information from open sources, including social networks, in much more sophisticated ways than ever before. One company that provides this kind of software is ClearForest, a Thomson Reuters company. ClearForest offers a product called Calais, which allows users to “derive meaning from unstructured information, such as news articles, blog posts, research reports and more”.

See: http://www.clearforest.com/

Israeli newspaper Haaretz reports that ClearForest software is used in a variety of ways. Reuters uses it to offer its users better access to its content. Brand-monitoring services (such as Meltwater) use it to track brand reputation. Pension funds and hedge funds use it to scour the internet for information that could affect their investments. And at least one journalist uses the software to find hidden connections between government-owned enterprises and contractors who win public tenders.

http://www.haaretz.co.il/misc/1.1196500 (Sorry, it’s only in Hebrew)

Here’s what was written about ClearForest when it was bought in 2007:


The proliferation of tools to mine blogs and social media raises interesting questions about the new and potentially valuable ways in which corporations and the government can use the information that ordinary people generate. How should we react to the knowledge that our blog posts and tweets are not merely visible to anyone but can also be mined for a myriad of new purposes?

A few other sources on data mining of open sources and social media:

More on data mining for brand management:


Mining information for job applicant screening and employee monitoring (apparently, it’s not a violation of the FCRA):



Mining social media for banking and credit assessment purposes (doesn’t this possibly run afoul of the FCRA?):