26 June 2011

Google Censorship and Information Flow Hegemony; Internet Homogenisation; Epistemic Issues Regarding Search Engines

The advent of widespread internet access was originally heralded - with varying levels of techno-mania and hyperbole - as the dawn of a great new age of universal democracy, free flow of information, debate and exponential refinement of human knowledge. Excitement has since been tempered by the realities of Facebook-style banal social networking / social dislocation, proliferating crank/quack pages, information overload, general world-wide-web wankery and security issues.

Most damaging, however, has been the lack of trust in the veracity of information and gravitas of opinion offered by various small/fringe sites, and the resulting flight of the masses to sites perceived to be authoritative. Sites such as Wikipedia and online versions of mainstream newspapers and magazines fit into the latter category. There is also Google-skimming, a practice in which the consensus version of "facts" is obtained by flicking one's eyes down the text snippets extracted from the top ten sites listed by a Google search query.

Most readers of the preceding sentence probably didn't even register that they should have been surprised at the mention of only one search engine. Google has become so dominant in the search engine market (over 80% of all searches) over the course of the first decade of the 21st century that "google" is just about the only one-word verb in the English language meaning "to search the internet for information about" or "to use the internet to ascertain whether 'x' is true".

Most Google users simply type a word or several words into the query field and see what comes up. Google does offer very limited Boolean operators and some other features to refine searches, but hardly anyone is conversant with Boolean logic anymore or learns Google's own modifiers. In fact, Google offers tempting drop-down auto-complete search suggestions as you type, just in case you happen to momentarily consider thinking for yourself. In auto-complete, certain terms are censored, some of them bizarre. For a list, see here.

Google crawls the text of individual web pages, but does not capture everything on the internet, especially pages that are not linked to by other already-indexed pages. Pages it misses are relegated to the Deep Web. Google then orders its search results using a secret, constantly updated ranking algorithm. Its best-known ingredient is PageRank, which scores a page by the number and quality of links pointing to it, where a link counts for more if the linking page is itself prominent. The overall algorithm involves hundreds of factors, such as synonyms, text proximity, context, that link analysis, the location of the searcher, meta-tagging, how often people click on Google's proffered results, the user's previous searches, hand-tweaking and other undisclosed factors, possibly including commercial and political considerations.

One result of PageRank's reliance on linking is that a small number of people can perform a "Google Bomb": setting up a bunch of links that use a certain term as their link text and all point to a certain site. Once upon a time the search term "miserable failure" was Google Bombed to link to George W Bush's official biography. Google hand-tweaked things so this was no longer the case.
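To make the linking side of a Google Bomb concrete, here is a minimal sketch of the textbook PageRank calculation as originally published, not Google's current (secret) ranking system. The 0.85 damping factor is the commonly cited value, and the page names ("rival", "bio", "bomb_0" and so on) are purely hypothetical. It also ignores link text entirely, so it shows only how a handful of extra links can lift one page above another, not how a phrase like "miserable failure" gets attached to it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration (a sketch, not Google's system).

    links maps each page to the list of pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # every page gets a small baseline share (the "random jump")
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its score on, split evenly across its links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page: spread its score over every page
                share = damping * rank[page] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank


# Hypothetical mini-web: two ordinary pages link to "rival", nobody links to "bio".
web = {"page_a": ["rival"], "page_b": ["rival"], "rival": [], "bio": []}
before = pagerank(web)

# A "bomb": three throwaway pages whose only purpose is to link to "bio".
bombed = dict(web)
for i in range(3):
    bombed[f"bomb_{i}"] = ["bio"]
after = pagerank(bombed)

print("before:", round(before["rival"], 3), "rival vs", round(before["bio"], 3), "bio")
print("after: ", round(after["rival"], 3), "rival vs", round(after["bio"], 3), "bio")
```

In this toy setup "rival" outranks "bio" until the three bomb pages are added, after which the order flips. Nothing about either page's content changed; only who links to whom.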

From personal experience, I can vouch for the accusation that hand-tweaking of rankings goes on. According to Sitemeter, a few visitors from Google have spent up to a few minutes reading some of my pages, presumably for quality assurance, but two particular pages that could be considered defamatory have had multiple reads. To Google's credit, those two pages still rank on the front page for searches of the names of the two relatively prominent people they concern. On the other hand, my diatribe Against Tourism ranked #1 on Google for about five years for search terms like "against tourism" and "for and against tourism", but sometime in the past six months it was ruthlessly demoted (buried so deep in Google's results that I can't be bothered digging down to see where it's gone). And this has nothing to do with linking, because no-one had ever linked to it. As far as I can tell, none of my other pages have been effectively censored like this. I like to flatter myself that my argument was too politically and commercially uncomfortable, concise and compelling for Google's friends, despite the fact that few people actually read it! After this particular post, Google might not be kind to my whole site at all.

On the creepy/conspiracy theory side, this article documents Google's and the CIA's investment in Recorded Future, a company that scours the internet in real time to predict events like "terrorist" activity. Google Street View, Big-Brother-like in itself, harvested 600 gigabytes of data transmitted over people's unsecured wireless networks. And anyone who has ever used Gmail will know the eeriness of having the entire text of their email scanned and ads relating directly to its subject matter appearing on screen, especially when combined with the information Google harvests about the user's Google searches.

It reminds me of Maya Arulpragasam (MIA)'s tirade: "You can Google 'Sri Lanka' and it doesn't come up that all these people have been murdered or bombed, it's 'Come to Sri Lanka on vacation, there are beautiful beaches' ... you're not gonna get the truth till you hit like, page 56, and it's my and your responsibility to pass on the information that it's not easy anymore." And the lyrics to her track "The Message": "Headbone's connected to the headphones / Headphones connected to the iPhone / iPhone's connected to the internet / Connected to the Google / Connected to the Government" (from the ungoogleable album "/\/\ /\ Y /\").

And "Bulls on Parade" by Rage Against the Machine: "I walk the corner to the rubble that used to be a library / Linin' to the mind cemetery now / What we don't know keeps the contracts alive and movin' / They don't gotta burn the books they just remove 'em."

Leaving aside conspiracy theories, the current dysfunctional situation with the internet carries major epistemic, and therefore power, implications for the human race. We first need to ditch Google, and explore, while staying vigilant about, other search engines such as Scroogle.

Then we have to have the intellectual courage and drive to entertain alternative conceptualisations of the way the world works, while maintaining a critical stance towards fringe sources of information. I've always thought this, but had the point driven home to me recently while researching "Hadronic Mechanics", a thought system that may or may not be bogus. It was devised by Ananda Bosman and Ruggero Santilli. The revision history on Wikipedia's page about Santilli vividly illustrates the issue of Internet information homogeneity and suppression of information. Without any knowledge of the subject itself you can instantly tell that an orchestrated campaign to discredit Santilli is being waged on Wikipedia, especially given the number of physicists who, according to Google Scholar, have cited his work.

But, indeed, it seems to be everyone's need for consensus or some kind of ultimate citation authority that is the cause of the retarded development of a truly dynamic, critically engaged, intellectually democratic internet with wide and varied information dispersion. Instead the Wikipedians cry, "Citation needed".

The statement "a = a" cannot be contested; it's a priori and analytic. However, the attribution of insanity to someone who questions whether 1 + 1 = 2 is authoritarian if you believe (unlike most contemporary philosophers) that there is such a thing as Kantian a priori synthetic knowledge. Or if you believe in set theory, accept Gödel's incompleteness theorems, or have taken cognisance of the irreconcilability of quantum physics and the theory of relativity.

Maybe we can use the term "sanity" loosely to describe correspondence with the consensus, common-sense apprehension of reality and its unspoken assumptions. This would mean, for instance, that Newtonian physics is the sanest form of physics, with other theories possessing varying levels of sanity. No synthetic proposition in any thought system, including the current proposition, can be given the status of absolute truth; each relies on an infinite regress of assumptions. So we are reduced to making subjective assessments of each other's sanity depending on adherence to different assumptions. This is fine. In fact there is no other way. Citation not needed. Hyperlink away! ...or even not at all.

