Do Certain Websites Get Special Treatment with Regard to Google PageRank?
People on the cutting edge of SEO have noticed patterns in page rankings suggesting that certain websites get special treatment with regard to PageRank. In other words, they are evaluated in “special” ways that the algorithm as published can’t account for. These modifications are made “on the fly” or “by hand” – that is, they are custom adjustments to PageRank granted because of what the website is, rather than because it earned its position through quality inbound links and the like. Here, we’ll take a look at a few of the directories that get such a PageRank boost. The best known are Yahoo and the Open Directory Project (ODP). The immediate SEO consequence is that placement in either directory delivers a large boost to a site’s PageRank.
One way to deal algorithmically with the idea of directory placement influencing PageRank is to assign PageRank values before the iterative computation begins. As usual, it’s best to learn through example, so let’s consider a world wide web consisting of two pages. Each page links to the other, and we assign an initial PR of 1 to page A and an initial PR of 10 to page B. We’ll set d = 0.1 this time, which produces the following set of equations:
PR(A) = 0.9 + 0.1 PR(B)
PR(B) = 0.9 + 0.1 PR(A)
After a few iterations, we get:
Iteration | PR(A)   | PR(B)
0         | 1       | 10
1         | 1.9     | 1.09
2         | 1.009   | 1.0009
3         | 1.00009 | 1.000009
Clearly, the PageRank values converge to 1 as the iterations continue, which is exactly what would happen if we assigned no special initial values. Assigning arbitrary starting values therefore changes nothing, provided the algorithm runs for a sufficient number of iterations. In the real world, a large number of iterations is typically used, so we don’t need to worry about cases where too few iterations distort the results.
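To make the convergence concrete, here is a minimal Python sketch of the two-page example above – purely an illustration, not anything Google actually runs – that performs the same iteration from arbitrary starting values:

```python
# Two-page example: A and B link only to each other, d = 0.1,
# so each update is PR(X) = (1 - d) + d * PR(other page).
d = 0.1

def iterate(pr_a, pr_b, steps=10):
    """Run the PageRank iteration from the given starting values."""
    for i in range(steps):
        # Update sequentially, as in the table above.
        pr_a = (1 - d) + d * pr_b
        pr_b = (1 - d) + d * pr_a
        print(f"iteration {i + 1}: PR(A) = {pr_a:.6f}, PR(B) = {pr_b:.6f}")
    return pr_a, pr_b

# Whatever starting values we pick, both pages settle at 1.
iterate(pr_a=1, pr_b=10)
iterate(pr_a=50, pr_b=0.5)
```

Both runs converge to PR(A) = PR(B) = 1 within a few iterations; the first run reproduces the table above exactly.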
Modifying PageRank
Just because the initial assignment of a unique PageRank value has no long-term effect on a website’s PageRank doesn’t mean it’s impossible to influence PageRank with the right intervention. Page, in fact, describes one such method in his patent specifications. The basic idea is that the random surfer will “reset” at some point. Stepping back a bit, imagine you’re surfing the web, clicking links more or less at random. Eventually, out of boredom or for some other reason, you stop clicking and jump instead to some other part of the web – and some destinations are more likely than others. This is a more realistic model than the pure random surfer, and it captures the fact that someone in this state might “reset” by going to a directory like ODP or Yahoo, which is why sites indexed in these directories have consistently higher PageRanks. Let’s see what this means for the PageRank algorithm mathematically. If we add an expected value to the formula, we end up with the following expression:
PR(A) = E(A) (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
We take (1 – d) as the probability that a random surfer stops following links, which builds that aspect of the “real world” into the model, and E(A) as the probability that he or she goes to page A after ceasing the random walk. E is an expected value whose mean over all pages equals 1; in our two-page web, the E values will therefore sum to 2. This normalization keeps the overall PageRank calculation stable in the general case while still allowing pages favored as “reset” destinations, such as directories, to carry extra weight.
Suppose the probability that the surfer, after abandoning the random walk, jumps to page A is 0.1; the probability of jumping to page B is the complement, 0.9. Since our world wide web contains just these two pages and the E values must average 1, we get E(A) = 0.1 × 2 = 0.2 and E(B) = 0.9 × 2 = 1.8. With a damping factor of d = 0.5, we end up with the following equations:
PR(A) = 0.2 × 0.5 + 0.5 × PR(B)
PR(B) = 1.8 × 0.5 + 0.5 × PR(A)
Solving yields:
PR(A) = 11/15
PR(B) = 19/15
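As a quick sanity check, here is a minimal Python sketch of this two-page example – again just an illustration of the formula above, not Google’s implementation, using the assumed jump probabilities of 0.1 and 0.9:

```python
# Two-page example with the "reset" term:
#   PR(X) = E(X) * (1 - d) + d * PR(other page)
# E values come from the assumed jump probabilities 0.1 (page A) and
# 0.9 (page B), scaled by the number of pages (2) so they average 1.
d = 0.5
e_a, e_b = 0.1 * 2, 0.9 * 2     # E(A) = 0.2, E(B) = 1.8

pr_a, pr_b = 1.0, 1.0           # starting values don't matter in the long run
for _ in range(50):
    pr_a = e_a * (1 - d) + d * pr_b
    pr_b = e_b * (1 - d) + d * pr_a

print(pr_a, pr_b, pr_a + pr_b)  # ~0.7333 (11/15), ~1.2667 (19/15), sum 2
```

The iteration settles at PR(A) = 11/15 and PR(B) = 19/15, matching the solution above.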
The sum of the PageRank values, significantly, remains 2 (30/15). The fact that B is the more likely destination is reflected in its higher PageRank. The actual method by which Google incorporates this kind of data into its evaluation of websites is still up in the air. Page has suggested the use of usage data in determining the best values to assign, which allows PageRank to “fill in the blanks”, so to speak, and provide a more accurate picture of the relative importance of pages on the known web.
The inherent problem with these considerations is that the algorithm Google uses for custom rankings, if it uses one at all, is a trade secret that can never be deduced with certainty from the outside. Fortunately, thinking through the possibilities reveals a lot about how such a method, whatever its specifics, must work, which in turn gives us a toolbox for applying SEO techniques to development-side work. So far, we have seen that assigning an initial value changes little after enough iterations, but also that special values can propagate quite nicely when they are built into the formula itself. This shows that Google’s algorithm wouldn’t need to be extended drastically to incorporate the information that special rankings introduce. The takeaway? Optimize each web page for a particular keyword, and get your site listed in a directory such as Yahoo or ODP.