What Are The Effects Of Outbound Links On Your PageRank?
The whole point of PageRank is to take the link structure of the entire web into account when ranking a web page. Because of this, it’s sensible that the outbound links on a page would influence PageRank, just as inbound links do. To understand how, let’s look at an example. We’ll create a simulated “web” that consists of two websites, each with two pages: Site 1 has pages A and B, and Site 2 has pages C and D. To start, A and B link only to each other, and C and D link only to each other, so every page has a PageRank of 1.
Now let’s add a link from page A to page C and set the damping factor to d = 0.75. Because A now has two outbound links, each of them carries 0.75 / 2 = 0.375 of PR(A), which produces the following equations:
PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.75 PR(D) + 0.375 PR(A)
PR(D) = 0.25 + 0.75 PR(C)
. . . Which, in turn, yields the following PageRank values for Site 1:
PR(A) = 14/23
PR(B) = 11/23
. . . With an accumulated, or gross, PageRank for Site 1 of 25/23. For Site 2, we get:
PR(C) = 35/23
PR(D) = 32/23
. . . With an accumulated PageRank of 67/23. The total PageRank of the web is 25/23 + 67/23 = 92/23 = 4, exactly what it was before the link was added, when each of the four pages had a PageRank of 1. Site 1 has dropped from 2 to 25/23, a loss of 21/23, while Site 2 has risen from 2 to 67/23, a gain of exactly 21/23. So adding a link has no effect on the total PageRank of the web: the PageRank gained by one site equals the PageRank lost by the other. This is a sort of PR conservation law, if you will. It matters because it keeps the PageRank scheme self-consistent: links redistribute rank between pages rather than creating or destroying it.
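The equations above can be built and solved mechanically from the link structure. The sketch below assumes the standard form PR(p) = (1 − d) + d × Σ PR(t) / C(t), summed over the pages t that link to p, which is what the four equations instantiate. The link lists are read off those equations (A links to B and C, B to A, C to D, D to C), and `solve_pagerank` is a hypothetical helper that uses exact fractions so the results can be compared directly with the /23 values.

```python
from fractions import Fraction

def solve_pagerank(links, d):
    """Exactly solve PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to p."""
    pages = sorted(links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    # Augmented matrix for (I - d*M) x = (1 - d), one row per page.
    rows = [[Fraction(int(i == j)) for j in range(n)] + [1 - d] for i in range(n)]
    for source, targets in links.items():
        for target in targets:
            # Each outbound link of `source` passes d * PR(source) / C(source) to `target`.
            rows[index[target]][index[source]] -= d / len(targets)
    # Gauss-Jordan elimination; Fractions keep every result exact.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {p: rows[index[p]][n] for p in pages}

d = Fraction(3, 4)
before = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
after = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}  # A now also links to C

pr_before = solve_pagerank(before, d)
pr_after = solve_pagerank(after, d)

for page, rank in pr_after.items():
    print(page, rank)  # A 14/23, B 11/23, C 35/23, D 32/23
print("Site 1:", pr_after["A"] + pr_after["B"])  # 25/23
print("Site 2:", pr_after["C"] + pr_after["D"])  # 67/23
print("Total before:", sum(pr_before.values()), "after:", sum(pr_after.values()))  # 4 and 4
```

Solving the system directly is fine for a four-page toy web; real PageRank computations iterate instead, because the web is far too large for elimination.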
Outbound Links
We’ve already looked at the change in PageRank for a closed system of pages that gains an additional inbound link. That change is given by:
(d / (1-d)) × (PR(X) / C(X))
. . . With X the linking page, PR(X) the PageRank of X, and C(X) the number of outbound links on X. The same formula gives the PageRank lost by a previously closed system of pages when one of its pages, X, links to a page outside the system. It holds only if the page receiving the link does not link back into the system; if it does, the site regains some of its lost PageRank, an additional complication we won’t consider here. Even when a link back exists, the effect is usually small, because the rank that flows back is scaled down by the damping factor and split among the outbound links at every page it passes through. A further complication arises if the pages that X links to also link out, since that changes the PageRank values as well.
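As a quick check, we can plug the numbers from the two-site example into this formula, taking X = A, PR(A) = 14/23 (its value after linking to C) and C(A) = 2. The variable names below are just for illustration.

```python
from fractions import Fraction

d = Fraction(3, 4)        # damping factor from the example
pr_x = Fraction(14, 23)   # PR(A) after A links to C
c_x = 2                   # A's outbound links: B and C

transfer = (d / (1 - d)) * (pr_x / c_x)
site1_loss = 2 - Fraction(25, 23)   # Site 1 started at 2 (two pages at PR 1)
site2_gain = Fraction(67, 23) - 2   # Site 2 also started at 2

print(transfer, site1_loss, site2_gain)  # 21/23 21/23 21/23
```

The formula’s 21/23 matches both Site 1’s loss and Site 2’s gain, because in the example C never links back into Site 1.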
Justification for Outbound Link Effect
The reason lies in the random surfer model, which assumes that a user follows any of the outbound links on a page with equal probability, at least as far as PageRank is concerned. If a page links to an external site, there is a chance the surfer will follow that link and end up on a site completely unrelated to the host site, which makes it less likely that they will re-enter the system. A lower chance of the surfer being on the site means a lower PageRank for it, which makes quite a bit of sense.
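The random surfer can also be simulated directly. In this sketch, which assumes the usual reading of the damping factor, the surfer follows one of the current page’s outbound links (chosen uniformly) with probability d and otherwise jumps to a page chosen uniformly at random. `random_surfer` is a hypothetical helper, and the link map is the two-site example with A linking to C.

```python
import random

def random_surfer(links, d=0.75, steps=200_000, seed=1):
    """Estimate PageRank by simulating the random surfer model.

    With probability d the surfer follows a uniformly chosen outbound link;
    otherwise they jump to a uniformly chosen page.
    Assumes every page has at least one outbound link."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < d:
            page = rng.choice(links[page])
        else:
            page = rng.choice(pages)
    # Scale visit shares so the ranks sum to the number of pages, matching the
    # PR(p) = (1 - d) + d * sum(...) normalization used in the example.
    return {p: len(pages) * v / steps for p, v in visits.items()}

links = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
estimate = random_surfer(links)
print(round(estimate["A"] + estimate["B"], 2))  # time spent on Site 1, scaled
```

The Site 1 figure should land close to 25/23 ≈ 1.09: the link from A to C lets the surfer drift into Site 2, and only a random jump brings them back.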
Dangling Links
Some pages have no outbound links at all, and this has a strong effect on PageRank. If a page has no outbound links, its PageRank can’t be distributed to other pages. These pages are called ‘dangling links’, because . . . well, because they’re dangling leaves on the tree of the network. To see the effect, consider a smaller web of three pages in which A links to B and C, B links back to A, and page C links to nothing. This leads to the following set of equations:
PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.375 PR(A)
. . . Which yields the following numbers:
PR(A) = 14/23
PR(B) = 11/23
PR(C) = 11/23
This leads to an accumulated PageRank of 36/23, which is only slightly more than half the total of 3 we’d expect if page C linked to one of the other pages. According to Page and Brin, the number of dangling links is fairly high, in part because many linked-to pages have not yet been indexed by Google, so their outbound links are unknown. Left untreated, dangling links would have severe consequences for PageRank.
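The comparison can be reproduced by iterating the same PageRank equations. This is a minimal sketch: the three-page link map is read off the equations above, while `linked_back` is a hypothetical variant in which C links back to A so that no page is dangling.

```python
def pagerank(links, d=0.75, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to p.

    A page with no outbound links contributes nothing to any other page."""
    pr = dict.fromkeys(links, 1.0)
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / len(out) for t, out in links.items() if p in out)
            for p in links
        }
    return pr

dangling = {"A": ["B", "C"], "B": ["A"], "C": []}        # C links to nothing
linked_back = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}  # C links back to A

print(round(sum(pagerank(dangling).values()), 4))     # 1.5652 (36/23)
print(round(sum(pagerank(linked_back).values()), 4))  # 3.0
```

With the dangling page, the rank that reaches C simply leaks out of the system, which is why the total falls so far short of 3.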
This poses a computational problem that has to be dealt with. To limit the damage dangling links do to PageRank, they are first removed from the database under consideration. The removal is iterative, because removing one dangling page can leave a page that linked only to it with no outbound links, turning it into a new dangling page. Once these pages are removed and PageRank is calculated for the rest, PageRank can be assigned recursively to the removed pages.
Removing dangling links from the database allows us to calculate the PageRank of the rest of the web. PDF files, .doc files, and the like are all dangling links, but they don’t damage PageRank because they are removed from the database first.
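The remove-then-reassign procedure can be sketched as follows. This is a hypothetical illustration, not Google’s implementation: the function names are ours, every page is assumed to appear as a key in the link map, and when removed pages are added back we assume each linking page’s original number of outbound links is used and the ranks of the remaining pages are not recomputed.

```python
def remove_dangling(links):
    """Iteratively strip pages with no outbound links.

    Removing a page also removes the links pointing to it, which can leave
    further pages with no outbound links, so repeat until none remain.
    Returns the reduced graph and the removed pages in removal order."""
    core = {p: set(out) for p, out in links.items()}
    removed = []
    while True:
        dangling = [p for p, out in core.items() if not out]
        if not dangling:
            return core, removed
        for p in dangling:
            del core[p]
        for out in core.values():
            out.difference_update(dangling)
        removed.extend(dangling)

def pagerank(links, d=0.75, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to p."""
    pr = dict.fromkeys(links, 1.0)
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[t] / len(out) for t, out in links.items() if p in out)
              for p in links}
    return pr

def rank_with_dangling_removed(links, d=0.75):
    core, removed = remove_dangling(links)
    pr = pagerank(core, d)
    # Add removed pages back, last-removed first, so every page linking to a
    # removed page already has a rank. Assumption: use each linking page's
    # original out-degree and leave the core ranks unchanged.
    for p in reversed(removed):
        pr[p] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in pr if p in links[t])
    return pr

web = {"A": ["B", "C"], "B": ["A"], "C": []}
print(rank_with_dangling_removed(web))  # {'A': 1.0, 'B': 1.0, 'C': 0.625}
```

With C removed, A and B form a closed cycle and each gets 1.0; C is then assigned 0.25 + 0.75 × 1.0 / 2 = 0.625. These differ from the 14/23 and 11/23 computed with C in place, which is the point of removing dangling pages first: their missing outbound links no longer drain rank from the rest of the web.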
Conclusion
Here, we’ve taken a look at how outbound links affect the PageRank algorithm. It’s clear that, if inbound links affect it, outbound links should as well, and the interplay between outbound links and PageRank is a complex one: it involves identifying outbound links, evaluating them, and identifying, removing, and recursively ranking dangling links. All of this is taken care of automatically by Google’s PageRank algorithm, and with even a basic understanding of this complex algorithm, those of us on the development end of things can put it to good use.