Natural Content - The Future of SEO?
June 5th, 2008
As someone who designs and develops websites, I have become involved in another field that’s more related to marketing than to actually creating a website. This field is Search Engine Optimisation, a loose subset of Search Engine Marketing. For around five years I have worked towards standards-based development in my own free time outside of my full-time education, which has meant regular late-night readings of W3C specifications, constantly following the news on Web Development forums and generally staying on top of my game.
For those unfamiliar with Search Engine Optimisation, I’ll fill you in on one of the fundamentals of getting a good page ranking on Google, and that is good content. Google loves content, and as far as Google is concerned content is king. At the moment Google uses keywords to check the relevancy and validity of web pages, along with very extensive and complicated algorithms to decide which pages match up with which search terms. Whilst this is potentially the best way of going about it, it provides us, the webmasters of the Internet, with a problem.
That problem is black-hat SEO methods.
I’m using black-hat in its worst sense at the moment. As far as I’m concerned a lot of black-hat research into security needs to exist, as otherwise no progress would be made. I do not believe in the unwarranted destruction of websites for fun, but the compromising of one’s security is not the problem, only the vandalism of one’s possessions. That aside, these bad methods of SEO have existed for years, even though the term SEO has not been in widespread use for long. SEO analysts and consultants have found countless ways to ‘game’ Google and provide content and web pages tailored solely to getting a high rank on Google under a certain term, whether by keyword stuffing or by providing themselves with thousands of links to their own websites. The list is endless.
Now, there seems to be technology that could prevent this.
After a lot of hype in Web Development and SEO circles, an online search tool named Powerset has been unleashed. Powerset is unique in the sense that it compares the natural language of content, not related keywords. At the moment it only crawls and understands articles from Wikipedia. The idea is not that the search engine should never be used elsewhere, just that Wikipedia is a perfect testing ground in terms of rich content. Whilst this article praises the efforts of Natural Language Processing, this is in no way what Powerset is currently aiming to do. To understand this, you have to understand a bit more about Google.
In many ways, what Google does is automated and based on statistics. In simpler terms, Google does not understand a web page or its content. It is just looking for patterns that work, and these patterns are what black-hat SEO methods are targeting. All Powerset is introducing is a method of understanding each and every term within its crawling domain (currently Wikipedia) and using this data as a means of coming to conclusions about what is relevant. In many ways, you could say that it is adding knowledge to search!
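To make the statistical point a little more concrete, here is a minimal sketch in Python of a purely keyword-based score. It is a toy illustration only and in no way Google's actual algorithm, which is far more complicated and not public. The function names `tokenize` and `keyword_score`, and both sample pages, are hypothetical and exist purely for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(page_text, query):
    """Return the share of a page's words that match a query term.

    A toy statistic (assumption): it counts matching words and nothing
    else, so it has no idea what the page actually says.
    """
    words = tokenize(page_text)
    if not words:
        return 0.0
    counts = Counter(words)
    # Count every occurrence of each distinct query term on the page.
    hits = sum(counts[term] for term in set(tokenize(query)))
    return hits / len(words)


# Hypothetical pages: one written for readers, one stuffed with keywords.
honest_page = (
    "Our shop sells cheap flights to Paris and offers hotel advice "
    "for travellers."
)
stuffed_page = (
    "cheap flights cheap flights cheap flights book cheap flights "
    "cheap flights now"
)

query = "cheap flights"
honest = keyword_score(honest_page, query)
stuffed = keyword_score(stuffed_page, query)

# The stuffed page wins on this pattern-only measure.
print(stuffed > honest)
```

The stuffed page scores higher even though it tells the reader nothing, which is exactly the kind of pattern a black-hat method goes after. A search tool that tried to understand the language of each page would have no reason to reward the repetition.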
Being a new tool, despite years of development, it is obviously a bit weak compared to established engines like Google for searching websites. The quality of its searches does not do the potential of the technology behind it justice, but we can surely expect that to change in the future. To some this may not be all that impressive, but if this technology were ever to take over and produce great results it would completely change search engines and the web for the better.
Many webmasters and SEO analysts out there would rejoice if this were to take off and become popular. Whilst the statistical methods currently used by the majority of search engines are proving effective, they are in no way perfect, not in the slightest. Natural Language Processing could mean that relevancy in search engines improves tenfold and that websites are finally judged on the quality of their content, down to the actual language, rather than on what the search engine statistically believes to be the best.