Welcome to Search Kingdom


Castles, keeps, moats... No, sadly we haven't got any of those, but we do have all the first-hand knowledge you need to help your website rank well in search engine results. No hype, no false promises, just clear advice, training or direct assistance to get your website found.

Archive for the ‘SEO Tips’ Category


Getting Everything Indexed Is NOT Good

Wednesday, July 30th, 2008

Quick scenario for you…

You have a search on your site that, say… searches your stock of hamster wheels (why I chose that example, I don’t know!). You are a good (nay, great) pet store and you have hundreds of them. You notice the search could be a bit more helpful to your more discerning visitors if it broke things down a little more at the second level. You say to your web developer, “Could you tweak the search results to reorder by things like colour, size, etc.?” He says, “Sure, no problem” (he comes from America).

You now have a great search that shows all your hamster wheels and now you can click on the category and it will group them for you too. Great! Before you pat yourself on the back too hard, bear in mind a few things.

  1. Have you now created a lot more pages that the search engines will index?
  2. Will the new sorting part of the URL (e.g. ?sort=blue) be added to the new selection and cause big ranking problems for your www.petpetpetstore.com/product/search?=id245 page, which ranks really well for “blue hamster wheel” (because you now have about 10 different URLs for the same page)?
  3. Are you leaking “link juice” out to these new pages?
  4. Will the search engine robots now devote enough time to crawl the important pages when they visit?
  5. Have you just created a load of duplicate content?
  6. Should you get out more?

Apart from number 6 (you might like hamster wheels!), you have probably done all of the first five. What to do? Well…

  1. Get your database guy to dynamically add a “noindex” robots meta tag to the head of all the duplicate content product page URLs.
  2. Get your database guy to dynamically add a “noindex, nofollow” robots meta tag to the head of all the duplicate content “advanced search” URLs.
  3. Try to create a URL structure that lets all of the main pages still get indexed but keeps the advanced search ones out (putting the sort options after a # usually does the trick, because everything after the # never reaches the server and search engines treat it as the same URL).
  4. Exclude the duplicate content URLs from crawling in your robots.txt file.
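To make points 2 and 3 a bit more concrete, here is a minimal Python sketch. It assumes the sorting options arrive as query parameters with the hypothetical names sort, colour and size (your developer’s will differ), gives any sorted variant a “noindex, nofollow” robots meta tag, and shows why the # trick works: the fragment is stripped before the URL ever reaches the server.

```python
from urllib.parse import parse_qs, urldefrag, urlsplit

# Hypothetical names for the "second level" sorting options your developer added.
SORT_PARAMS = ("sort", "colour", "size")


def robots_meta_for(url: str) -> str:
    """Pick the robots meta tag for a search results URL (point 2 above).

    The plain search stays indexable; any variant carrying a sorting
    parameter is treated as a duplicate "advanced search" page.
    """
    query = parse_qs(urlsplit(url).query)
    if any(name in query for name in SORT_PARAMS):
        content = "noindex, nofollow"
    else:
        content = "index, follow"
    return f'<meta name="robots" content="{content}">'


def url_seen_by_server(url: str) -> str:
    """Drop the #fragment, which browsers never send to the server (point 3)."""
    return urldefrag(url).url


# Two sorted views written with a # collapse to one URL as far as the server is concerned.
variants = [
    "http://www.petpetpetstore.com/product/search?q=wheel#sort=blue",
    "http://www.petpetpetstore.com/product/search?q=wheel#sort=size",
]
print({url_seen_by_server(v) for v in variants})
```

The q=wheel parameter in the example URLs is also made up; the point is only that anything after the # disappears, while a ?sort= variant stays a separate URL and needs the meta tag.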

Every scenario is different here, so do some analysis first to work out the best course of action. Above is a typical approach, but each case needs to be looked at separately.

Next time I will pick an example less abstract than “hamster wheels”, promise…

The Key To Great SEO Link Building

Friday, July 25th, 2008

I, like most of us, gave up on most of my heroes as I was growing up. Don’t get me wrong, there are still people I really admire and aspire to be like, but ‘hero’ doesn’t usually come into it these days.

So following on from this, I wouldn’t really describe Eric Ward as a ‘hero’ of mine, but in the SEO link building world he comes as close as you can get.

Eric does a regular column for Search Engine Land and today he posted this article on SEO link building.

All I can say is READ IT and learn (in my opinion) the most important aspect of great SEO link building. Remember the web is a ‘real place’ and search engines are quite ‘clever’, therefore, real and intelligent link building should always be your goal.

SEO Link Juice Leakage

Friday, July 4th, 2008

Firstly, can I just state that I pretty much hate the term “link juice”; it is just so damn descriptive that you kind of have to use it. No matter how much the term “link fluid” cuts it from an “ooh er missus” perspective, it is still not right. And “link gel” just sounds like a new toothpaste. So “link juice” it is…

Anyway… “link juice” is the flow of link power that runs through your website and out of it. Controlling this juice is a pretty important part of SEO and search engines don’t mind you doing it (well, they don’t mind at the moment).

You can tell search engines which of your pages are the most important in many ways. From the priority values in your XML sitemap to the weight of your internal linking structure, all bases should be covered to ensure your best and most important pages get all the exposure they need. This work (believe it or not, some people get this wrong!) applies to both search engine robots and human visitors.
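As a quick illustration of the sitemap side, here is a minimal Python sketch that writes a sitemaps.org-style XML sitemap with a <priority> value per page. In that protocol priority runs from 0.0 to 1.0 and only ranks your pages relative to each other on your own site; the URLs and values below are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, float]) -> str:
    """Return sitemap XML for a {url: priority} mapping."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages.items():
        # The sitemap protocol only allows priorities between 0.0 and 1.0.
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority for {url} must be between 0.0 and 1.0")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: home page highest, key sales page next, an old archive page lower.
print(build_sitemap({
    "http://www.example.com/": 1.0,
    "http://www.example.com/special-offers/": 0.8,
    "http://www.example.com/archive/2006/": 0.3,
}))
```

Priority is a hint rather than an order, so it works alongside your internal linking, not instead of it.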

So, for this post, what do I mean by leakage? Well, for various reasons you might (to use a technical term) “stuff up” your structure and pages inadvertently, which could cause major “link juice” leakage. For example, you might have a page on your site called “special offers” which has been around for years and has been well used and well linked to. You decide (in a fit of pique) to call this page “today’s offers” instead (a bit more punchy and “happening”, you think), so you go into your CMS (content management system) and change the things you need to. The end result is that you now have a page that says “today’s offers” in the navigation. Cool… you think. The only problem is that your CMS created a new page, and rather than having “id=654” in the URL, it now says “id=876”. Or, if it didn’t create a new page, it might have been ever so helpful and created a new, SEO-friendly URL that swapped www.mybloodyfantsticsite.co.uk/special-offers/ for www.mybloodyfantsticsite.co.uk/todays-offers/. The end result? Search engines have now lost your “special offers” page forever (unless the old page is still live and you have just created a “duplicate content” issue… more of that in another article).

The “special offers” page in question ranked well for about twelve really good phrases (and many long-tail phrases) and had a really good incoming link profile. Now, this is not completely terminal; search engines will try to work out what happened and realise that you just swapped pages. Google will eventually work things through and minimise the damage. But that is about all. Your actions have taken something strong down the route of damage limitation. The “juice” is well and truly leaking through a big hole, and also THE VISITORS THAT WILL STILL BE COMING FROM SEARCH ENGINES (until the page is dropped) WILL NOW (probably) GET A 404 ERROR PAGE… sorry for shouting this!

So your role as an internal SEO person or an SEO consultant is to teach anyone who updates the website you are responsible for what needs to happen to safeguard against this leakage/loss. As part of this (and especially when you take over a new site), do your own leakage/loss checks, such as:

  • Do a site: command search on your site in Google and check out any pages that show up high in the list but have a title and/or description that suggests an error page/404.
  • Check your server logs for the main bad links and 404s, etc.
  • Check the Wayback Machine (http://www.archive.org/web/web.php) for changes to the site and its structure/pages
  • Ask people who have updated the site what they have done that could have affected pages/URLs/etc.
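For the server log check, a rough Python sketch that counts which paths are answering with a 404, assuming your server writes the usual Apache/NCSA common or combined log format (tweak the pattern if it doesn’t). The function name and the top-ten cut-off are just choices for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined log line, assuming the
# usual "METHOD path PROTOCOL" request, e.g.
# 192.0.2.1 - - [04/Jul/2008:10:00:00 +0100] "GET /special-offers/ HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def top_404s(lines, limit=10):
    """Return the most frequently requested paths that answered 404, busiest first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common(limit)
```

Point it at your real access log, e.g. `with open("access.log") as log: print(top_404s(log))` (the filename is hypothetical). The paths at the top of that list are the ones to fix first.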

Even though the PageRank bar in the Google Toolbar is not a good indicator of much these days, it really stands out when a page you visit is missing/blank but the toolbar meter still shows a healthy green bar indicating a PageRank of three or above (I have seen an eight before!).

What do you need to do to solve this? Well, if it is a pretty recent occurrence then I would rename the new page back to the old URL and for good measure put a 301 redirect (more of this in another post) on the new URL back to the old one.

If the change happened too long ago for this to be a good solution, you need to 301 the old URL to the new one. You should also look at the link profile for the page (use Yahoo’s Site Explorer) and contact all the really good incoming linkers to ask them to change the URL of their link. This bit is a real pain, and for some links it is impossible.
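The redirect logic itself is simple. Here is a minimal, framework-free Python sketch, assuming a hypothetical table of old paths to new paths (on Apache the same mapping would normally live in your .htaccess file instead). Known moved pages get a permanent 301; anything unknown gets a genuine 404, which is where a friendly custom error page comes in.

```python
SITE = "http://www.example.com"  # hypothetical site root

# Hypothetical table of old paths -> new paths, built up from your 404 and link checks.
MOVED_PERMANENTLY = {
    "/special-offers/": "/todays-offers/",
    "/page.php?id=654": "/page.php?id=876",
}

# Hypothetical set of pages that currently exist.
LIVE_PAGES = {"/", "/todays-offers/", "/page.php?id=876"}


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return the HTTP status code and headers for a requested path."""
    if path in LIVE_PAGES:
        return 200, {}
    if path in MOVED_PERMANENTLY:
        # 301 = Moved Permanently: browsers and robots are sent on to the new URL.
        return 301, {"Location": SITE + MOVED_PERMANENTLY[path]}
    # Anything else is a real 404; serve your custom 404 page as the body.
    return 404, {"Content-Type": "text/html"}
```

Note that the custom 404 page should still return a 404 status code; if it returns 200, search engines may keep the dead URL in their index.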

What about all the other pages and 404s my site may be leaking from? Well, yes, these are important too, but work through them in a top-down sort of way and you will solve 80-90% of the problem pretty quickly. There will be some pages (which also don’t leak much of anything) that don’t really have a current equivalent on the site, and you can deal with these in a variety of ways. My favourite option is to give the visitor a better experience by using a custom 404 error page along the lines of “oops, sorry, but to make up for our sloppiness, here are some great pages you will find scintillating”.

If you want to really tidy things up, you could always take a deeper look at what Google has indexed on your site and go through all the pages from the bottom of the list up. Here you will find some stuff you could deal with using a combination of:

  • Submitting a “remove URL” request
  • Excluding the file from crawling in your robots.txt file
  • Using the .htaccess file (or similar) to do some 301 redirecting
  • Deleting the page from your server

Make sure you get the above “right” so that it only affects the pages you want it to i.e. don’t mess up and get important pages taken out of the index for goodness sake!

In essence, the “juice” your site has is a very important part of the SEO process. Controlling it and directing it to where it needs to be should be a big part of your SEO strategy. If you work on a big and important site (usually one with a CMS), there will be lots of leakage (I guarantee this), and if you have not looked at this issue, then you should.

There will be another post soon on better direction of “link juice” by making changes to your internal link structure. This has been called many things, but the current buzzwords are “sculpting” and “creating silos”.

In the “Carry On”/”Sid James” spirit that I started this post with, my advice is to stop “link juice” leakage by “putting your finger in the dyke” today! You will be glad you did!

What is good and bad cloaking?

Wednesday, July 2nd, 2008

Nice article from Danny Sullivan on this, check it out here.

In essence, the thing to remember is: don’t treat Google differently; that is what gets you into trouble. You can still end up getting slapped by Google while doing things the “right” way, so be careful. However, if you have done everything you can to improve the experience of the visitor and not been silly (or dumb), then the penalty won’t last (you can appeal).

IP delivery or cloaking is a reasonably complex subject, so don’t go jumping in without some good advice. But most importantly, it is not “evil”, and Google has been reasonably consistent in modifying its definition to underline this.

Google can now read Flash?

Wednesday, July 2nd, 2008

Good news for all web developers who create sites in Flash: Google (and Yahoo eventually, but probably not Microsoft) are working with Adobe to make Flash sites easier to index.

Now, this is early days, and I wouldn’t rush out to get your copy of “How to Produce “Flash” Flash Websites” (not a real book just yet…) any time soon, but for all you devoted Flash developers, this may start to get you out of jail when your clients say “no one can find me”.

Google would really like to index the whole web if it could, and Flash (and JavaScript) are the biggest banes of its life in this respect. Also, because of the heavy incoming link structure for some Flash sites/sections, Google does indeed end up ranking some of them relatively highly, but usually with a fantastically descriptive listing of something like…

[FLASH] Loading… Play Video Play Video Replay Copy to clipboard URL …File Format: Shockwave Flash
Loading… Play Video Play Video Replay Copy to clipboard URL: 0ß0å0ü0È MUTE 0ß0å0ü0È MUTE.
www.reuters.com/resources/flash/includevideo.swf - Similar pages

As you can imagine, Google has never been a fan of this, as it messes up the relevance of its results pages for the searcher.

So what progress have they made so far? Well, if you do a search for Flash SWF files in Google (click here to try), you still don’t really get a great experience as a searcher. I think parity between text-based sites and Flash-based sites from an indexing perspective will be a long time coming (if it ever comes).

So, my take on this for the short and medium term is to make sure you still only use Flash as a part of your site (nice fading images, animation, etc.) and not the whole thing. If you do build the whole thing in Flash, then don’t expect to be top of the rankings any time soon. This initiative will help, but it is not a green light to carry on regardless.

Now I may move on to “Flash and accessibility”… maybe not…

IP Delivery and SEO

Friday, June 27th, 2008

IP delivery is a technique where a website/server looks at who a visitor is, or where they are located, based on their IP address. Once this is known (not always completely accurate, but good enough), the site/server can serve a page/site tailored to that visitor.

Sounds simple? Well, give or take a bit of server-side programming, it is. The main part of the process is taking the time and effort to create the individual content for the groups of users you are targeting. This work mainly fits into two categories: specific regional information for that visitor (contact details, in-country taxes, etc.) or translating words into a specific language. Google, for example, does this all the time when you type in one of its URLs (try typing in www.google.com from a UK IP address, for instance, and see where you end up; for an extra part of the test, try setting your language in Google to French… see what I mean?).
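The server-side part really is the small bit. Here is a minimal Python sketch using the standard ipaddress module, assuming a hypothetical lookup table. A real site would load its ranges from a maintained geo-IP database; the ranges below are reserved documentation ranges used as stand-ins, and the region codes and content are made up.

```python
import ipaddress

# Hypothetical lookup table: reserved documentation ranges standing in for
# the real ranges a geo-IP database would give you.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "uk",
    ipaddress.ip_network("198.51.100.0/24"): "fr",
}

# Hypothetical regional content: language and local currency.
CONTENT_BY_REGION = {
    "uk": {"language": "en-GB", "currency": "GBP"},
    "fr": {"language": "fr-FR", "currency": "EUR"},
    "default": {"language": "en", "currency": "USD"},
}


def region_for(ip: str) -> str:
    """Map a visitor's IP address to a region code, falling back to "default"."""
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        # An IPv6 address is simply never "in" an IPv4 network, so it falls through.
        if address in network:
            return region
    return "default"


def content_for(ip: str) -> dict[str, str]:
    """Pick the regional content to serve for this visitor."""
    return CONTENT_BY_REGION[region_for(ip)]
```

Whatever the lookup decides, a search engine robot coming from a given range should get exactly what a human visitor from that range would get.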

So, what is the connection between this and SEO? Well, if you treat Google as a specific user, then you can also serve specific information to it by recognising the IP addresses of its robots. This (in a Google ‘rules’ sort of way) is called cloaking. The ‘black hat’ version of this technique is where you deliver different content to fool search engines into indexing you for something ‘they’ see, but when visitors get to the actual page they get something completely different. How ‘completely different’ it is will define which side of the ‘rules’ you stay on. Think carefully before you do any of this kind of stuff for this purpose, even if you are doing it with the best of intentions.

The real area where this matters is if you have a multiple-country and multiple-language website. From a usability perspective it is really desirable to serve relevant content based upon who your visitor is. Many (many, many, many) large companies have attempted this (and still do) and get it completely wrong from a search engine perspective. Bad IP delivery will only confuse search engines and their robots. Conversely, good regional IP delivery will enhance your search engine presence in the regions you target and operate in. Also, if you still bring everyone in to the same domain, you will not be diluting your power across many different regional sites, while still serving relevant content to each visitor. However, there are pluses and minuses to each approach; you need to find the one that is right for you and your objectives.

If you are responsible for a multiple-country/language website, or still don’t know which side of the ‘rules’ you come down on from an IP delivery perspective, here is a good video from Google Webmaster Central. It is worth a watch.

In essence, look at what Google wants and is doing with IP delivery and you won’t go far wrong. There are other benefits you can incorporate from an SEO perspective, but more of those in a future post.

robots.txt

Friday, June 20th, 2008

OK, on my “post per day” quest, sometimes we need to cover in-depth topics and sometimes more basic ones. Today is one of the latter, but get the robots.txt file wrong and you could exclude your pages from all (decent) search engines.

Well, robots.txt is a file that (in most cases) should sit in the root directory of your web server, and it is a file that all the major search engine robots request before they start their collection work on your site. The best way of thinking about this file is as a note you leave for someone with directions; the main difference is that this note (mainly) contains instructions for the things you do NOT want that person to do.

A basic robots.txt file (you can create one in any text editor; there are also some automated programmes like this, and Google has a basic editor in its Webmaster Tools programme) looks like this:

User-agent: *
Disallow: /bank/details.html
Disallow: /personal/stuff.html
Disallow: /testscripts/

The user-agent part is where you specify which robots you want to give instructions to; the ‘*’ means “all spiders”. Alternatively, you can put ‘Googlebot’ (or whichever robot you mean) here if you only want that robot to follow that particular instruction. The disallow part is (as you would expect) the parts of your web server that you would rather the robots did not visit. This could be for privacy reasons (suck eggs time: you really shouldn’t put anything you don’t want people to access in an unprotected area of your web server anyway) or because you want the spider (which is short on time and patience) to skip the unimportant parts of your site and get quickly to the important stuff.

The main thing to remember about disallowing is NOT to do this:

Disallow: /

This tells robots not to crawl anything on your site at all (so only do it if that is really what you want).
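If you want to sanity-check your rules before uploading them, Python’s standard urllib.robotparser module will evaluate a robots.txt for a given robot and URL. Here is a small sketch using the example file above and the “Disallow: /” mistake; the example.com URLs and the choice of Googlebot as the user-agent are just for illustration. Bear in mind this parser does simple prefix matching, so it won’t understand every search engine’s extensions (such as wildcards).

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /bank/details.html
Disallow: /personal/stuff.html
Disallow: /testscripts/
"""

# The one to avoid: blocks every robot from the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(EXAMPLE_RULES, "http://www.example.com/products/"))     # allowed
print(can_crawl(EXAMPLE_RULES, "http://www.example.com/testscripts/"))  # disallowed
print(can_crawl(BLOCK_EVERYTHING, "http://www.example.com/products/"))  # disallowed
```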

Will disallowing pages/sections stop them getting indexed by Google, etc.? Err, no, actually. If someone on the web links to a page you have disallowed, search engines can still index its URL (often as a bare listing with no description) even though they won’t crawl it. If there are pages you truly don’t want indexed, then you need to use “noindex” in the robots meta tag for that page (and leave the page crawlable, so the robot can actually see the tag), something like this:

<meta name="ROBOTS" content="NOINDEX,FOLLOW">

This means that you don’t really want the search engine to index the page, but you are quite happy for it to follow links on the page (more of this in another post).
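To check which directives a page is actually sending (handy after a CMS change), here is a small Python sketch built on the standard html.parser module. The class and function names are hypothetical; it simply collects the comma-separated values from any robots meta tag, case-insensitively, since tags like the one above are often written in capitals.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Tag and attribute names arrive lower-cased; values keep their case.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lower-cased robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX,FOLLOW"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```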

A useful extra is that you can tell robots where your XML sitemap (again, more in another post) is, like this:

User-agent: *
Sitemap: http://www.searchkingdom.co.uk/sitemap.xml
Disallow:
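Python’s standard robots.txt parser reads that file the same way, which makes for a quick check. A small sketch (site_maps() needs Python 3.8 or later; the page URL tested is made up). Note that the empty “Disallow:” line means nothing is disallowed.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.searchkingdom.co.uk/sitemap.xml
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Sitemap URLs listed in the file (None if there are none).
print(parser.site_maps())
# An empty "Disallow:" blocks nothing, so any page may be crawled.
print(parser.can_fetch("Googlebot", "http://www.searchkingdom.co.uk/any/page.html"))
```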

Another handy thing: because all good robots request this file before crawling your site, you can look at your log files and see how many times robots.txt has been requested, which gives you an idea, over time, of how frequently those little robots are visiting your stuff.
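Here is a rough Python sketch of that log check, assuming the combined log format (which records the user-agent). It counts robots.txt requests per crawler using the user-agent names of the big three’s robots (Googlebot, Yahoo’s Slurp and Microsoft’s msnbot); anything else lands in “other”. Robots tend to re-fetch robots.txt periodically rather than before every single page, so treat the numbers as a trend rather than an exact visit count.

```python
import re
from collections import Counter

# Combined log format: ... "GET /robots.txt HTTP/1.1" 200 120 "referer" "user agent"
ROBOTS_HIT = re.compile(r'"GET /robots\.txt[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

# User-agent substrings of the main search engine crawlers.
KNOWN_BOTS = ("Googlebot", "Slurp", "msnbot")


def robots_txt_hits(lines) -> Counter:
    """Count robots.txt requests per crawler name ("other" for anything unrecognised)."""
    counts = Counter()
    for line in lines:
        match = ROBOTS_HIT.search(line)
        if not match:
            continue
        agent = match.group("agent")
        bot = next((name for name in KNOWN_BOTS if name in agent), "other")
        counts[bot] += 1
    return counts
```

Run it over a week’s logs at a time (e.g. `with open("access.log") as log: print(robots_txt_hits(log))`, with a hypothetical filename) and compare the weekly totals.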

If you are still worried or unsure about all of this, Google have a nice little checker in their Webmaster Tools programme. You can see if the file is in the right place and also see if you have made some (big and small) errors.

robots.txt won’t make or break your SEM efforts, but get it right and it will help. Get it wrong and prepare to sit around waiting until the search engines reindex your site… not nice.