How the basic search algorithm works

This is a very basic description of how a generic search engine works.

Each factor listed here is rated as either:

  1. Very high relevancy – score 10
  2. High relevancy – score 4
  3. Relevancy – score 1

In this model the page is compared to the search query, and a score is awarded for each match. The more points a page gets, the higher it appears in the search results.

  1. Key words or phrases appear in domain name or URL extension – score relevancy
  2. Key words or phrases appear in Title tag – score very high relevancy
  3. The order of the words in the Title tag matches the order of the words in the search query – score high relevancy
  4. Key words or phrases in the Metadata Description tag – score high relevancy
  5. The order of the words in the Description tag matches the order of the words in the search query – score high relevancy
  6. Key words or synonyms of key words appear in the Metadata Description tag – score relevancy
  7. Key words or phrases appear in the body content – score relevancy
    • Key words or phrases appear in Heading tag – score high relevancy
    • Key words or phrases appear in the first 200 words of the main body text – score high relevancy
    • Key words or phrases appear in emphasis or strong tags – score high relevancy
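
The scoring model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine is implemented: the names Page, score_page, has_any and in_order are made up for the example, each factor is awarded at most once per page, and a "match" simply means that at least one query word occurs in the field.

```python
import re
from dataclasses import dataclass, field

VERY_HIGH, HIGH, BASIC = 10, 4, 1  # weights from the rating scale above


@dataclass
class Page:
    """Hypothetical container for the parts of a page the model looks at."""
    url: str = ""
    title: str = ""
    description: str = ""
    headings: list[str] = field(default_factory=list)
    emphasised: list[str] = field(default_factory=list)
    body: str = ""


def words(text: str) -> list[str]:
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_any(terms: list[str], text: str) -> bool:
    """True if at least one of the terms occurs as a word in the text."""
    return bool(set(terms) & set(words(text)))


def in_order(terms: list[str], text: str) -> bool:
    """True if every term occurs in the text in the same relative order."""
    remaining = iter(words(text))
    # 'term in remaining' consumes the iterator, so order is enforced.
    return bool(terms) and all(term in remaining for term in terms)


def score_page(page: Page, query: str, synonyms=None) -> int:
    """Add up the points for every factor in the list that matches."""
    terms = words(query)
    if not terms:
        return 0
    # Assumption: synonyms is an optional dict of lower-case word -> list of words.
    related = [s for t in terms for s in (synonyms or {}).get(t, [])]
    first_200 = " ".join(words(page.body)[:200])
    checks = [
        (has_any(terms, page.url), BASIC),                    # factor 1
        (has_any(terms, page.title), VERY_HIGH),              # factor 2
        (in_order(terms, page.title), HIGH),                  # factor 3
        (has_any(terms, page.description), HIGH),             # factor 4
        (in_order(terms, page.description), HIGH),            # factor 5
        (has_any(terms + related, page.description), BASIC),  # factor 6
        (has_any(terms, page.body), BASIC),                   # factor 7
        (has_any(terms, " ".join(page.headings)), HIGH),      # heading tags
        (has_any(terms, first_200), HIGH),                    # first 200 words
        (has_any(terms, " ".join(page.emphasised)), HIGH),    # em / strong tags
    ]
    return sum(points for matched, points in checks if matched)
```

To rank a set of pages for a query, one would call score_page on each page and sort the pages by the result, highest first. Note that with a single-word query the word-order factors are satisfied whenever the word is present; a real configuration might choose to skip them in that case.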

This is a rough account of the approach one would take to configuring an on-site search engine. It does not include any of the other factors that Google and the other search engines would take into account, such as inbound links.

Note that if a word or words matching the search query appears too frequently in either the metadata or the body content, then the page may be altogether excluded for being 'spammy'. The spamming of the Keywords metadata field with repetitions of words more than likely contributed to Google either seriously reducing the weighting of the Keywords metadata, or disregarding it altogether.
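
One simple way to approximate the "too frequently" test is a keyword density check. The sketch below is an assumption about how such a check might look; the function name is_spammy and the 10% threshold are hypothetical and not taken from any search engine.

```python
import re
from collections import Counter


def is_spammy(text: str, query: str, max_density: float = 0.1) -> bool:
    """Flag text in which the query words make up too large a share of all words.

    max_density is a hypothetical threshold chosen for illustration only.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not tokens or not terms:
        return False
    counts = Counter(tokens)
    hits = sum(counts[term] for term in terms)
    return hits / len(tokens) > max_density
```

A page flagged this way would be dropped from the results before any relevancy score is calculated.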

  • Inbound link from a page with a Google PageRank of 0 – score nil
  • Inbound link from a page with a Google PageRank of 1-3 – score one
  • Inbound link from a page with a Google PageRank of 4-6 – score five
  • Inbound link from a page with a Google PageRank of 7-9 – score twenty
  • Inbound link from a page with a Google PageRank of 10 – score two hundred
  • Key words or phrases from the search query appear in the link phrase or link title text of inbound links – score bonus four
  • Domain is always available – score nil
  • Domain is infrequently not available – score minus ten
  • Domain is frequently not available – score minus fifty
  • Domain has been listed in Google index for less than one year – score nil
  • Domain has been listed in Google index for more than one year and less than two years – score five
  • Domain has been continuously listed in Google index for between two and five years – score twenty
  • Domain has been listed continuously in Google index for more than five years – score fifty
  • Page is site homepage – score five
  • Page is one click away from site homepage – score five
  • Page is two clicks away from site homepage – score two
  • Page is three clicks away from site homepage – score nil
  • Page is four or more clicks away from the homepage – score minus five
  • Page has been clicked through to from the search results none to few times relative to how often it has been returned – score nil
  • Page has been clicked through to from search results more than a few times relative to how often it has been returned – score five
  • Page has been clicked through to from search results frequently relative to how often it has been returned – score twenty
  • Page has been clicked through to from search results very frequently relative to how often it has been returned – score fifty
  • Page loads in an acceptable time for users with slower web connections – score five
  • Page loads slowly for web users with slower connections, but acceptably for users with average connections speeds – score nil
  • Page requires faster than average connection speeds to download in an acceptable time – score minus ten
  • Page has fewer than 200 characters enclosed within body content elements such as paragraph, heading, or blockquote tags – score minus twenty
  • Page has fewer than 500 characters enclosed within body content elements such as paragraph, heading, or blockquote tags – score minus five
  • Website has fewer than 5 pages – score minus one
  • Website has 6-50 pages – score nil
  • Website has 51-200 pages – score five
  • Website has 200+ pages – score twenty
  • Number of domain links to this page
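
A few of the factors in this list can be expressed as simple lookup functions. The sketch below covers only inbound links, click depth and domain age, and all function names are made up for the example. Where the list leaves a boundary open (a domain indexed for exactly two or exactly five years, for instance), the choice made in the code is my assumption.

```python
def inbound_link_score(pagerank: int, anchor_matches_query: bool = False) -> int:
    """Points for one inbound link, by the PageRank of the linking page."""
    if not 0 <= pagerank <= 10:
        raise ValueError("PageRank is expected to be between 0 and 10")
    if pagerank == 0:
        base = 0
    elif pagerank <= 3:
        base = 1
    elif pagerank <= 6:
        base = 5
    elif pagerank <= 9:
        base = 20
    else:
        base = 200
    # Assumption: the bonus of four applies whatever the PageRank tier.
    return base + (4 if anchor_matches_query else 0)


def click_depth_score(clicks_from_home: int) -> int:
    """Points by distance from the homepage (0 means the homepage itself)."""
    return {0: 5, 1: 5, 2: 2, 3: 0}.get(clicks_from_home, -5)


def domain_age_score(years_indexed: float) -> int:
    """Points by how long the domain has been continuously indexed."""
    if years_indexed < 1:
        return 0
    if years_indexed < 2:
        return 5
    if years_indexed <= 5:  # assumption: exactly five years stays in this tier
        return 20
    return 50


def off_page_score(links, clicks_from_home: int, years_indexed: float) -> int:
    """Sum the three factors; links is a list of (pagerank, anchor_matches) pairs."""
    link_points = sum(inbound_link_score(pr, match) for pr, match in links)
    return link_points + click_depth_score(clicks_from_home) + domain_age_score(years_indexed)
```

For example, a page one click from the homepage, on a domain indexed for three years, with one PageRank 5 link whose link text matches the query and one PageRank 10 link, would score 9 + 200 + 5 + 20 = 234 under these assumptions.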

There is another, I believe, likely element in Google's assessment of links:

  • One inbound link from a site tagged as spam – score nil
  • Two inbound links from a site tagged as spam – score minus one
  • Three inbound links from a site tagged as spam – score minus three
  • Four inbound links from a site tagged as spam – score minus nine

With inbound links from spam sites I sense from SEO practice, though I can't prove it, that there is a penalty for being associated with a 'bad neighbourhood'. Obviously the penalty cannot be that great, as ultimately one cannot control inbound links. However, having a large number of links from spam or link farm sites does seem to have a negative effect on Google search ranking.
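
The penalty figures listed above triple with each extra link after the second (one, three, nine). A minimal sketch of that pattern follows. The function name spam_link_penalty is made up, and both the score of nil for zero links and the cap that stops the penalty growing beyond four links are my assumptions, the cap reflecting the point that the penalty cannot be that great.

```python
def spam_link_penalty(spam_links: int, cap: int = 9) -> int:
    """Penalty for inbound links from a single site tagged as spam.

    Reproduces the listed values: 1 -> 0, 2 -> -1, 3 -> -3, 4 -> -9.
    cap is a hypothetical limit on the size of the penalty.
    """
    if spam_links < 0:
        raise ValueError("spam_links cannot be negative")
    if spam_links < 2:
        return 0
    return -min(3 ** (spam_links - 2), cap)
```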

Conversely I believe there is an SEO benefit from being a good neighbour, that works something like this:

  • Zero links to external websites – score nil
  • 1-5 links to external websites – score one
  • More than 5 links to external websites – score five

Too many external links, however, could result in a page being excluded for being 'spammy'.
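
The 'good neighbour' scoring, together with the exclusion for excessive linking, might be sketched like this. The function name and the spam_threshold of 100 links are hypothetical; I do not know what number, if any, a search engine would treat as too many.

```python
from typing import Optional


def outbound_link_score(external_links: int, spam_threshold: int = 100) -> Optional[int]:
    """Points for linking out, or None if the page should be excluded as spammy.

    spam_threshold is a made-up figure used only for illustration.
    """
    if external_links < 0:
        raise ValueError("external_links cannot be negative")
    if external_links > spam_threshold:
        return None  # page excluded rather than scored
    if external_links == 0:
        return 0
    if external_links <= 5:
        return 1
    return 5
```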

I believe that there is an optimal number of links, both in-domain navigation links and links to other websites. What is optimal depends on context, such as the size of the domain. I suggest looking at the BBC website, which has a very finely judged mix of internal navigation links and useful links to related subjects.

And again there is the factor of human review, though this is an extra factor on top of the calculations listed above.

Ross Holloway Web Consultant | UX web designer | business analyst | web content | project manager