How The Google PageRank Algorithm Works
Sergey Brin and Lawrence Page originally introduced PageRank Algorithm. This metric uses the information about the number of links to a particular webpage to determine how important that page is overall. Web pages that have a higher number of links will garner a higher page ranking, which increases the popularity of the page on internet searches with Google and drives that particular page higher in the results.
Using the PageRank Algorithm
The algorithm uses individual web page information to determine the ranking of each page, with each link to a particular page working to increase the popularity of that individual page. This is determined using:
- The PageRank of web page A: PR(A)
- The Ti pages that link to page A: PR(Ti)
- The outbound link number on page Ti: C(Ti)
- The damping factor, usually described as between 0 and 1: d
This information results in the PageRank Algorithm PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
It is important to understand how these variables influence each other in the determination of each PageRank. Page A’s value is repeatedly determined by the individual PageRanks of the pages that link to it.
Page Ti, which is a weighted number that depends on the actual number of links that are considered outbound. These outbound links, however, do not consistently affect the PageRank of the A page. If the T page has a greater number of outbound links, page A’s linking benefits from page T will be decreased.
The PageRank weighted value is added, resulting in each added inbound Page A link causing an increase in the PageRank for page A.
The damping factor is set between the values 0 and 1. This value is multiplied by the weighted PageRank calculation of all of the Ti pages. This effectively reduces the extent to which the PageRank value of one page linking to another provides benefits.
The PageRank Algorithm was justified by Brin and Page using the idea that when someone is surfing on the internet, the links they choose to click on are chosen at random, without placing any value on the actual content of the page.
Their theory relied on the PageRank value as being the driving force behind the surfer choosing that page. The potential for the internet surfer to land on one particular page over another is determined only by how many links that particular page has.
This means that the likelihood for the internet surfer to reach a certain page is due only to the value of the addition of the probabilities of that person following certain links to that page. The damping factor reduces this number, under the assumption that the internet surfer will not continue to click on links for an infinite amount of time. Instead, they will eventually get bored and move on to another random page.
The damping factor illustrates the probability that the surfer will not stop to click on particular links, which is where the value 0 to 1 comes from. If the value of d is higher, it shows that the surfer is more likely to keep clicking on links. Due to the internet surfer choosing to jump to another random page when he gets bored, this value is maintained within the algorithm as a constant (1-d). This value does not change, no matter how many inbound links there may be, which keeps the page’s PageRank value at a minimum.
PageRank Algorithm Version 2
Brin and Sergey also provided another version of the PageRank Algorithm that added a value N, which represents the total number of all web pages, wherein the PageRank of the A page is given as:
PR(A) = (1-d) / N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
While slightly different, the two algorithms are very much used in the same way. Version 2 also uses the internet surfer model, but the value of a PageRank focuses on the possibility of how often he would reach the page each time he restarted the search, for as many times are there are webpages. For example, if a page has a PageRank valuation of 2, and there are 100 pages on the web, the surfer will generally land on that particular page twice on average, if he restarts his surfing 100 times.
Even though these algorithms are basically the same, it is generally much easier to use the first version since there is no need to know the absolute value of the number of web pages currently available.
Using the PageRank Algorithm
If we were trying to determine the PageRank for a website that has 3 pages, wherein page A has links for both pages B and C; page B provides links only to the C page, and page C only has page A links. The value for the damping factor is often used as .85, but in order to keep calculations a bit more simple, we will use .5. Using this, we come up with these equations and their resultant PageRank valuation:
PR(A) = 0.5 + 0.5 PR(C) = 14/13 = 1.07692308
PR(B) = 0.5 + 0.5 (PR(A) / 2) = 10/13 = 0.76923077
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B)) = 15/13 = 1.15384615
Obviously, this particular set of pages leads to each having a PageRank of 3. Our calculations were using a simplified example, and is not indicative of the normal valuation for a 3 page website.
The web is huge, so this requires Google to use an approximate, recursive computation when determining PageRank values. This requires each page to be given a starting value to begin computation, with the actual PageRank value requiring several computations of the algorithm before being reached. The only way to get a truly beneficial, approximate value for the PageRank is to work through the iterations as many as 100 times in order to get the best approximate PageRank value in comparison to the number of pages on the entire web.