<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>{ Online Notes } &#187; netflix challenge</title>
	<atom:link href="http://www.armando.ws/tag/netflix-challenge/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.armando.ws</link>
	<description>All things Technical and Personal - Armando Padilla</description>
	<lastBuildDate>Sun, 01 Jan 2012 00:51:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Comparing Distance Formulas. Part III</title>
		<link>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/</link>
		<comments>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/#comments</comments>
		<pubDate>Wed, 13 Feb 2008 22:45:37 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[PHP Development]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[compatibility formulas]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[euclidean distance]]></category>
		<category><![CDATA[netflix challenge]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=112</guid>
		<description><![CDATA[This article will cover the Euclidean formula. I will cover the basic equation for the algorithm and provide running, working examples for both Ruby and PHP5, but will state the advantages and disadvantages of the algorithm after covering every algorithm in the article. Background &#8211; Euclidean Algorithm One of the oldest and one of the most [...]]]></description>
			<content:encoded><![CDATA[<p>This article will cover the Euclidean formula. I will cover the basic equation for the algorithm and provide running, working examples for both Ruby and PHP5, but will state the advantages and disadvantages of the algorithm after covering every algorithm in the article.</p>
<p><strong>Background &#8211; Euclidean Algorithm</strong><br />
One of the oldest and most widely used distance formulas is the Euclidean formula. The formula takes two items and compares them attribute by attribute to determine how close, or related, they are. In other words, with the formula we can take two users, run through each of their characteristics (in our case, movie ratings), and determine how similar they are.</p>
<p>But how is the value that the algorithm provides used? Once the calculation between two users is done, we are presented with a value between 0 and infinity. The larger the value, the farther apart the two users are (that is, the less similar they are). A few examples of values that may be produced are 0.56, 2.89, and 4. These are the values that will be used to determine how similar the two users are to each other.</p>
<p><strong>The Math Equation</strong></p>
<p><img src="http://upload.wikimedia.org/math/6/c/6/6c667279ab399449a0d34e364d0129f6.png" height="60" width="508" /></p>
<p>The equation above gives us both the shorthand and the expanded version. We&#8217;ll take a look at the shorthand. In shorthand, the equation takes the difference between the attributes of two objects, specified in the (Pi-Qi) portion, and squares it. For a single dimension (attribute), the equation then takes the square root and finishes. Fortunately for us, the equation allows more than one attribute: it keeps adding the squared difference for each remaining attribute until there are no additional attributes to compare, and only then takes the square root of the total.</p>
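<p>For reference, the same equation can be written out in LaTeX as follows. The symbol n (the number of attributes) and the lowercase subscripted p and q are notation assumed here for illustration:</p>

```latex
\begin{align*}
d(P, Q) &= \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2} \\
        &= \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
\end{align*}
```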
<p><strong>Calculating Distances with an Example</strong><br />
Using the notation above, let&#8217;s take a simple example from our data set to make it clear what the equation is doing. For this example we&#8217;ll use 3 users and 2 movies that all three users rated.</p>
<blockquote><p>1:<br />
30878,4,2005-12-26<br />
1481961,2,2005-05-24<br />
885013,4,2005-10-19</p>
<p>5:<br />
30878,3,2004-10-19<br />
1481961,2,2005-09-27<br />
885013,5,2005-05-15</p></blockquote>
<p>The data set used for this example has three users (ids 30878, 1481961, and 885013) rating two movies, one with id 1 and the other with id 5. User 30878 gave movie 1 a rating of 4 and movie 5 a rating of 3. User 1481961 gave movie 1 a rating of 2 and movie 5 a rating of 2. User 885013 gave movie 1 a rating of 4 and movie 5 a rating of 5.</p>
<p>To prepare the above data for the Euclidean formula, we convert each user to either a P or a Q. The goal is to determine the distance between user 30878 and each of the users 1481961 and 885013. User 30878 will be represented by the letter P, and Q will be user 1481961 for the first distance calculation. The second distance calculation simply changes Q to represent user 885013. Also, since there are only two movies (dimensions), we know the equation will be calculated over 2 dimensions.</p>
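<p>The preparation step can be sketched in PHP roughly as follows. This is only a minimal illustration: the function name buildRatingVectors is hypothetical, and it assumes the input looks exactly like the listing above (a movie id followed by a colon, then one "user,rating,date" line per rating).</p>

```php
<?php
// Hypothetical helper (not part of the article's class): turns the raw
// listing into one ratings vector per user, ordered by movie id.
function buildRatingVectors($raw)
{
    $vectors = array();
    $movieId = null;

    foreach (preg_split('/\R/', trim($raw)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // A line such as "1:" starts the block for a new movie.
        if (substr($line, -1) === ':') {
            $movieId = (int) rtrim($line, ':');
            continue;
        }
        // Every other line is assumed to be "userId,rating,date".
        list($userId, $rating) = explode(',', $line);
        $vectors[$userId][$movieId] = (int) $rating;
    }

    // Order each user's ratings by movie id so the dimensions line up.
    foreach ($vectors as &$ratings) {
        ksort($ratings);
        $ratings = array_values($ratings);
    }
    unset($ratings);

    return $vectors;
}

$raw = <<<DATA
1:
30878,4,2005-12-26
1481961,2,2005-05-24
885013,4,2005-10-19

5:
30878,3,2004-10-19
1481961,2,2005-09-27
885013,5,2005-05-15
DATA;

$vectors = buildRatingVectors($raw);
$p = $vectors[30878];   // P: array(4, 3)
$q = $vectors[1481961]; // Q for the first calculation: array(2, 2)
```

<p>Keeping the ratings in the same movie order for every user matters, because the formula compares the i-th attribute of P with the i-th attribute of Q.</p>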
<p><em>Distance between user 30878 and  1481961</em></p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/image2.jpg" title="Euclidean User Distance 1"><img src="http://www.armando.ws/wp-content/uploads/2008/02/image2.jpg" alt="Euclidean User Distance 1" border="0" /></a></p>
<p><em>Distance between user 30878 and 885013</em></p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/image.jpg" title="Euclidean User Distance 2"><img src="http://www.armando.ws/wp-content/uploads/2008/02/image.jpg" alt="Euclidean User Distance 2" border="0" /></a></p>
<p>After calculating the distance between two users, what does the final numerical value represent? It is simply the distance between them, which we read as a measure of similarity. With the ratings above, the distance between users 30878 and 1481961 is about 2.24, and the distance between users 30878 and 885013 is 2. The rule of thumb is that as the result decreases in value, the similarity between the users increases. A result of 3.5 or 2.89 would indicate less similar users, while a result of 0 or 0.2 would indicate extremely similar users.</p>
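<p>As a check on the arithmetic, here are the two calculations written out in LaTeX, using the ratings listed earlier with P as user 30878. The subscripted Q labels are introduced here purely for illustration:</p>

```latex
\begin{align*}
d(P, Q_{1481961}) &= \sqrt{(4 - 2)^2 + (3 - 2)^2} = \sqrt{4 + 1} = \sqrt{5} \approx 2.24 \\
d(P, Q_{885013})  &= \sqrt{(4 - 4)^2 + (3 - 5)^2} = \sqrt{0 + 4} = \sqrt{4} = 2
\end{align*}
```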
<p><strong>Visualizing Similarity with a Graph</strong><br />
Yet another way to view the data is graphically. By plotting the users on an X/Y graph, where the X-axis represents the ratings for movie 1 and the Y-axis represents the ratings for movie 5, we can see visually how close the users are to one another.</p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/graph6.jpg" title="Euclidean Distance Graph"><img src="http://www.armando.ws/wp-content/uploads/2008/02/graph6.jpg" alt="Euclidean Distance Graph" height="431" width="561" /></a></p>
<p>User 30878 has the coordinates (4,3), user 1481961 has the coordinates (2,2), and user 885013 has the coordinates (4,5). The graph shows how close, or similar, each user is to user 30878. User 885013, at a distance of 2, is slightly closer to 30878 than user 1481961, at a distance of about 2.24, which makes 885013 the more similar of the two.</p>
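<p>The same comparison can be made in code by computing each distance and sorting. The sketch below is an illustration only: euclideanDistance is a hypothetical standalone helper, and the coordinates are hard-coded from the paragraph above.</p>

```php
<?php
// Hypothetical standalone helper: Euclidean distance between two
// equal-length rating vectors.
function euclideanDistance(array $p, array $q)
{
    $sum = 0;
    foreach ($p as $i => $value) {
        // Squared difference for this dimension.
        $sum += pow($value - $q[$i], 2);
    }
    return sqrt($sum);
}

// Coordinates from the text: (rating for movie 1, rating for movie 5).
$target = array(4, 3); // user 30878
$others = array(
    1481961 => array(2, 2),
    885013  => array(4, 5),
);

$distances = array();
foreach ($others as $userId => $ratings) {
    $distances[$userId] = euclideanDistance($target, $ratings);
}

// Sort ascending by distance, keeping the user ids as keys, so the
// most similar user comes first.
asort($distances);

foreach ($distances as $userId => $distance) {
    printf("%d: %.2f\n", $userId, $distance);
}
```

<p>With these coordinates the script lists user 885013 first (2.00), followed by user 1481961 (2.24).</p>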
<p><strong>The Code</strong></p>
<p><em>PHP5 Class</em></p>
<blockquote><p><code>/**<br />
* Distance Formula Class<br />
*<br />
* @author Armando Padilla, mando81@prodigy.net<br />
* @copyright Armando Padilla.<br />
*<br />
*/<br />
class DistanceFormulas {</code></p>
<p>public function __construct(){}</p>
<p>/**<br />
* Euclidean distance between two equal-length attribute arrays.<br />
*/<br />
public function euclidean(array $objectXAttribute, array $objectYAttribute){</p>
<p>$distanceFromXToY = 0;<br />
for($i=0; $i&lt;count($objectXAttribute); $i++){</p>
<p>$p = $objectXAttribute[$i];<br />
$q = $objectYAttribute[$i];</p>
<p>//Add the squared difference for this attribute.<br />
$distanceFromXToY += pow(($p-$q), 2);</p>
<p>}</p>
<p>//The distance is the square root of the summed squared differences.<br />
return sqrt($distanceFromXToY);</p>
<p>}//End</p>
<p>}//End class</p></blockquote>
<p><em> PHP5 Test</em></p>
<blockquote><p>/**<br />
* Simple example using the DistanceFormulas class<br />
*<br />
* @author Armando Padilla, mando81@prodigy.net<br />
* @copyright Armando Padilla<br />
*/<br />
require_once("DistanceFormulas.php");</p>
<p>//List of users along with their ratings.<br />
//Usually you would have this in a database and have more than 3 users :-)<br />
//<br />
// rating order: "The Abyss", "Hunt for Red October", "Goonies"<br />
$profiles = array(array("username" =&gt; 'Armando', "movieratings" =&gt; array(5, 4, 4)),<br />
array("username" =&gt; 'Snoopy', "movieratings" =&gt; array(2, 2, 5)),<br />
array("username" =&gt; 'Pearl', "movieratings" =&gt; array(1, 4, 1)));</p>
<p>//Instantiate the class<br />
$DistanceFormulas = new DistanceFormulas();</p>
<p>//Let's get the distance from Armando to the other two users.<br />
$distance = "";<br />
for($i=1; $i&lt;count($profiles); $i++){</p>
<p>$distance .= "Distance from ".$profiles[0]['username']." to ".$profiles[$i]['username'].":: ";<br />
$distance .= $DistanceFormulas-&gt;euclidean($profiles[0]['movieratings'], $profiles[$i]['movieratings']);<br />
$distance .= "&lt;br&gt;";</p>
<p>}</p>
<p>//Print the results.<br />
echo $distance;</p></blockquote>
<p>Ruby Example &#8211; coming soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

