<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>{ Online Notes } &#187; Research</title>
	<atom:link href="http://www.armando.ws/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.armando.ws</link>
	<description>All things Technical and Personal - Armando Padilla</description>
	<lastBuildDate>Sat, 12 Jun 2010 05:06:35 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Scala + Lift.  First Impressions</title>
		<link>http://www.armando.ws/2009/01/scala-lift-first-impressions/</link>
		<comments>http://www.armando.ws/2009/01/scala-lift-first-impressions/#comments</comments>
		<pubDate>Fri, 23 Jan 2009 04:33:38 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=286</guid>
		<description><![CDATA[The line, &#8220;Meet the new boss, same as the old boss&#8221; comes to mind.  Scala is a relatively new functional language.  Lift is a web framework that runs on Scala.  For the last few days I&#8217;ve dabbled with both and, aside from the fun I&#8217;ve had working with [...]]]></description>
			<content:encoded><![CDATA[<p>The line, &#8220;Meet the new boss, same as the old boss&#8221; comes to mind.  Scala is a relatively new functional language.  Lift is a web framework that runs on Scala.  For the last few days I&#8217;ve dabbled with both and, aside from the fun I&#8217;ve had working with them, I have to point out that I don&#8217;t get it&#8230;</p>
<p>No, I don&#8217;t mean I don&#8217;t know how to use it.  I&#8217;ve created a few example projects already, but I just don&#8217;t get why the developers decided to change the syntax SO MUCH.  It&#8217;s like &#8230;what??? why??? just for the hell of it???  Also, one of the selling points for Scala is how well it plays with Java.  Most of you know from previous articles that I think Java is not for the web, so I don&#8217;t get why Scala would be any different.  It runs on the JVM, and Lift runs on a servlet container (e.g. Tomcat).  So all the headaches that I had with Tomcat, Java, and web applications are back&#8230;great.  So far these are my first impressions of both Scala and Lift.  I plan to create a small TO-DO list using both technologies and publish a small tutorial soon.  </p>
<p>It was either Scala/Lift or learn Ruby well.  I chose something completely out in left field.  We&#8217;ll see; heck, Scala+Lift is boasting that it&#8217;s a few times faster than Ruby on Rails.  So&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2009/01/scala-lift-first-impressions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Zend Framework &amp; Netflix &#8211; Zend_Service_Netflix</title>
		<link>http://www.armando.ws/2008/10/zend-framework-and-netflix-zend_service_netflix/</link>
		<comments>http://www.armando.ws/2008/10/zend-framework-and-netflix-zend_service_netflix/#comments</comments>
		<pubDate>Fri, 10 Oct 2008 08:24:14 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[PHP Development]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[netflix]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[Zend]]></category>
		<category><![CDATA[Zend Framework]]></category>
		<category><![CDATA[zend_service_netflix]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=255</guid>
		<description><![CDATA[I popped on the headphones, put on a Thievery Corporation track and started to read the new Netflix REST API. A few hours and a mad rush of coding later, I created a pre-pre-pre-pre-alpha release package for the Zend Framework.  It&#8217;s a rough implementation that has much to be done to it and has not been [...]]]></description>
			<content:encoded><![CDATA[<p>I popped on the headphones, put on a <a href="http://www.imeem.com/nataliewalker/music/NZR8oXug/natalie_walker_quicksand_thievery_corporation_remix/" target="_blank">Thievery Corporation track</a> and started to read the new Netflix REST API. A few hours and a mad rush of coding later, I created a pre-pre-pre-pre-alpha release package for the Zend Framework.  It&#8217;s a rough implementation that has much to be done to it and has not been fleshed out that well. It currently supports only one REST call, &#8220;/catalog/title&#8221;, but I plan to support the entire gamut of REST calls that Netflix has opened up.</p>
<p>For those interested in the <a href="http://developer.netflix.com/page">Netflix API, check this page out</a>.  It&#8217;s basically the Netflix documentation. It&#8217;s pretty straightforward and should be an easy and fun read.  I was surprised they opened up their &#8216;predicted rating&#8217; API. Nice touch <img src='http://www.armando.ws/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  I can see A LOT of things developers can do with just the prediction REST API.</p>
<p>And for those that want to check out the Zend_Service_Netflix code I currently have, here is the <a href="http://code.google.com/p/zendframeworknetflixservice/">Google Code Project Page</a> link.  </p>
<p>I plan to finish the package by Sunday evening.  Here is my attack plan.</p>
<p><strong>Attack Plan:</strong></p>
<ol>
<li>Finish the concrete classes. (ETA Friday evening) </li>
<li>Finish up the remaining REST calls (ETA Saturday evening)</li>
<li>Create tests using PHPUnit. (ETA as I go)</li>
<li>Documentation.  (ETA Sunday evening) </li>
</ol>
<div><strong>Links:</strong><br />
<a href="http://code.google.com/p/zendframeworknetflixservice/" target="_blank">Zend Service Netflix download package</a> (my version)</div>
<div><a href="http://developer.netflix.com/page" target="_blank">Netflix API URL</a></div>
<div><a href="http://framework.zend.com/" target="_blank">Zend Framework URL </a> <br />
 </div>
<div>OK, so it&#8217;s almost 2am and I spent all evening reading and coding.  I&#8217;ll have more updates tomorrow evening.</div>
<div>
<p>Armando Padilla<br />
PS.  Yes, I know, the code looks like ass..sue me (no don&#8217;t, that would suck).  I did all that in a few (3) hours. </p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/10/zend-framework-and-netflix-zend_service_netflix/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Gnosis &amp; SemanticHacker Review</title>
		<link>http://www.armando.ws/2008/05/gnosis-semantichacker-review/</link>
		<comments>http://www.armando.ws/2008/05/gnosis-semantichacker-review/#comments</comments>
		<pubDate>Tue, 20 May 2008 20:26:57 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=196</guid>
		<description><![CDATA[I started to work on a small project in my off time; it&#8217;s a semantic web project based on the use case below.  I started looking into applications that are currently available to the public and found 2 projects that attempt to do what I want to do but fall short of it. [...]]]></description>
			<content:encoded><![CDATA[<p>I started to work on a small project in my off time; it&#8217;s a semantic web project based on the use case below.  I started looking into applications that are currently available to the public and found 2 projects that attempt to do what I want to do but fall short of it. The idea, if you&#8217;re too lazy to read the post below: a document parser that will highlight interesting items (&#8216;interesting&#8217; is defined by key words for places, people, and things).  Once the &#8220;interesting&#8221; items are highlighted, the user can hover over a highlighted word and see a short description of the item, or be presented with a list of links that might be of some use.</p>
<p>Since I see no point in reinventing the wheel, I found <a href="https://addons.mozilla.org/en-US/firefox/addon/3999">Gnosis</a> and SemanticHacker.com.  Gnosis has a great natural language processor and can accomplish the requirement to find &#8220;interesting&#8221; items, but it doesn&#8217;t provide useful articles.  Gnosis instead sends you to a Google results page (pretty lame, but it&#8217;s a start).  On the other hand, SemanticHacker.com has come up with an API that basically does both the natural language processing and the relevant articles, but it has three drawbacks.  1.  The NLP sorta sucks; well, it&#8217;s not as good as the Gnosis one.  2.  Articles of interest are mined from Wikipedia, which might not be up-to-date or might be misleading.  3.  No Firefox plugin.</p>
<p>So what am I going to do now? Easy.  I will check out the API that SemanticHacker has released and work on two things.  1.  Create a Firefox plugin for it.  2.  Expand the knowledge base the system references: move from a Wikipedia-only setting to one that also draws on Flickr (look for relevant tagged images) and news RSS feeds.</p>
<p>Armando Padilla</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/05/gnosis-semantichacker-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantic Web. (use case)</title>
		<link>http://www.armando.ws/2008/05/semantic-web-use-case/</link>
		<comments>http://www.armando.ws/2008/05/semantic-web-use-case/#comments</comments>
		<pubDate>Thu, 15 May 2008 20:22:01 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=194</guid>
		<description><![CDATA[I was reading &#8220;Semantic Web Technologies &#8211; trends and research in ontology-based systems&#8221;, and then I started to read an article on CNN.  While reading I noticed that I wanted to learn more about the earthquakes that happened in China, but I didn&#8217;t want to go through Google again.
So I started to draw [...]]]></description>
			<content:encoded><![CDATA[<p>I was reading &#8220;Semantic Web Technologies &#8211; trends and research in ontology-based systems&#8221;, and then I started to read an article on CNN.  While reading I noticed that I wanted to learn more about the earthquakes that happened in China, but I didn&#8217;t want to go through Google again.</p>
<p>So I started to draw up some plans during lunch to create a plugin for Firefox that would run through a document opened in the browser.  It will then use a natural language processor to find key words: places, events, etc.  I then decided to use the API provided by these folks to find related articles on the subject.  I&#8217;m not sure if this has been done, but I have yet to see it.  Yes, I know there are keyword-based ad scripts on some sites, but I haven&#8217;t seen anything that would complement the articles I&#8217;m reading, the movies I&#8217;m watching, or basically anything out there.</p>
<p>Time to get back to work.<br />
Armando Padilla</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/05/semantic-web-use-case/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finally!</title>
		<link>http://www.armando.ws/2008/04/finally-2/</link>
		<comments>http://www.armando.ws/2008/04/finally-2/#comments</comments>
		<pubDate>Sat, 19 Apr 2008 01:22:20 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=178</guid>
		<description><![CDATA[It came to me like an &#8220;oh my god&#8221; moment today!  I know what my thesis will be about.  The semantic web.  Specifically, Semantic Web Services.  This is why I chose the topic to pursue; yeah, I know the topic is still too broad, but at least I have an understanding [...]]]></description>
			<content:encoded><![CDATA[<p>It came to me like an &#8220;oh my god&#8221; moment today!  I know what my thesis will be about.  The semantic web.  Specifically, Semantic Web Services.  This is why I chose the topic to pursue; yeah, I know the topic is still too broad, but at least I finally have an understanding of what I&#8217;m doing and why.</p>
<p>This is how it came about.  I was given a task at work, a prototype application using Flex and AIR, and I started to look into how these particular technologies work with XML and web services.  I started to look into RDF parsing and found that there is no clear RDF or OWL parser for ActionScript 3.  That got me thinking and soon had me looking over my class notes on the subject.  That then brought me to where I am now: how these technologies work with web services.</p>
<p>The technology to work with web services isn&#8217;t that difficult, but the technology to have web services find each other to serve each other&#8217;s needs captures my attention.  The underlying idea of self-discovery on the computational level is interesting.  How can you have a program locate and basically build itself to accomplish a task?  In other words, how can one person locate two people across town, who don&#8217;t know each other but possess the skills to complete a task, and bring them together? Before the advent of the web we used the phone book; now we do a quick Google search.  This is the idea behind the semantic web service, from what I&#8217;ve gathered so far. (I might retract this after I read a bit more on the subject.)</p>
<p>I&#8217;m going to focus, first, on the technology (RDF, OWL) and see what&#8217;s currently available and working in production; then I&#8217;m going to focus my attention on roadmaps for setting up a very simple semantic web service (when I say simple, I mean simple). Finally I&#8217;m going to look at SOA-type setups for SWS.  (This roadmap might change dramatically as I start to dig deeper.  I&#8217;ll keep the Research section updated with my findings though, and I&#8217;ll wrap up all the research-related tutorials I&#8217;ve started as well soon.)</p>
<p>Armando Padilla</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/04/finally-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantic Web &#8211; Possible Thesis.</title>
		<link>http://www.armando.ws/2008/04/semantic-web-possible-thesis/</link>
		<comments>http://www.armando.ws/2008/04/semantic-web-possible-thesis/#comments</comments>
		<pubDate>Sat, 12 Apr 2008 22:17:43 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=174</guid>
		<description><![CDATA[A quarter or two ago I took a graduate class where we had to choose a topic that we might enjoy looking into.  Ashok and I spent three months putting together a 58-page report on the Semantic Web: its current state, current technology, its challenges, and its future.
If you do a [...]]]></description>
			<content:encoded><![CDATA[<p>A quarter or two ago I took a graduate class where we had to choose a topic that we might enjoy looking into.  Ashok and I spent three months putting together a 58-page report on the Semantic Web: its current state, current technology, its challenges, and its future.</p>
<p>If you do a quick Google search on the semantic web, RDF, OWL, and some other XML-based junk will come up.  All that technology is just there to put meaning into the documents published on the net and the information that gadgets create, so that one day all that data can be combined to help machines help us. So the theory goes.</p>
<p>Here is the paper.<br />
<em><a href="http://www.armando.ws/wp-content/uploads/2008/04/web.doc">Semantic Web Paper &#8211; by Armando Padilla &amp; Ashok Sahu</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/04/semantic-web-possible-thesis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing Distance Formulas. Part III</title>
		<link>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/</link>
		<comments>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/#comments</comments>
		<pubDate>Wed, 13 Feb 2008 22:45:37 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[PHP Development]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[compatibility formulas]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[euclidean distance]]></category>
		<category><![CDATA[netflix challenge]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=112</guid>
		<description><![CDATA[This article will cover the Euclidean formula. I will cover the basic equation for the algorithm and provide running and working examples for both Ruby and PHP5, but will state the advantages and disadvantages of the algorithm after
covering every algorithm in the article.
Background &#8211; Euclidean Distance
One of the oldest and one of the most widely used distance [...]]]></description>
			<content:encoded><![CDATA[<p>This article will cover the Euclidean formula. I will cover the basic equation for the algorithm and provide running and working examples for both Ruby and PHP5, but will state the advantages and disadvantages of the algorithm after covering every algorithm in the article.</p>
<p><strong>Background &#8211; Euclidean Distance</strong><br />
One of the oldest and one of the most widely used distance formulas is the Euclidean formula. The formula takes two items and compares them attribute by attribute to determine how close (related) they are.  In other words, with the formula we can take two users, run through each of their characteristics (in our case movie ratings), and determine how similar they are.</p>
<p>But how is the value that the algorithm provides used?  Once the calculations are done between two users we are presented with a value between 0 and infinity.  The larger the value, the farther apart the two users are from one another (less similar).  A few examples of values that may be produced are 0.56, 2.89, and 4.  These are the values that will be used to determine how similar the users are to each other.</p>
<p><strong>The Math Equation</strong></p>
<p><img src="http://upload.wikimedia.org/math/6/c/6/6c667279ab399449a0d34e364d0129f6.png" height="60" width="508" /></p>
<p>The equation above gives us both the shorthand and the expanded version.  We&#8217;ll take a look at the shorthand. In shorthand, the equation takes the difference between the matching attributes of two objects, the (Pi - Qi) portion, and squares it. With a single dimension (attribute), the equation takes the square root of that one squared difference and is done. Fortunately for us, the equation allows more than one attribute: it sums the squared differences for every attribute until there are none left to compare, and only then takes the square root.</p>
<p><strong>Calculating Distances with an example.</strong><br />
Using the notation above, let&#8217;s take a simple example using our data set to make it clear what the equation is doing.  For this example we&#8217;ll use 3 users and 2 movies that all three users rated.</p>
<blockquote><p>1:<br />
30878,4,2005-12-26<br />
1481961,2,2005-05-24<br />
885013,4,2005-10-19</p>
<p>5:<br />
30878,3,2004-10-19<br />
1481961,2,2005-09-27<br />
885013,5,2005-05-15</p></blockquote>
<p>The data set we are using for this example has three users (ids 30878, 1481961, and 885013) rating two movies, one with id 1 and the other with id 5.  User 30878 gave movie 1 a rating of 4 and movie 5 a rating of 3.  User 1481961 gave movie 1 a rating of 2 and movie 5 a rating of 2. User 885013 gave movie 1 a rating of 4 and movie 5 a rating of 5.</p>
<p>To prepare the above data for the Euclidean formula, we treat each user as either P or Q.  The goal is to determine the distance between user 30878 and each of users 1481961 and 885013.  User 30878 will be represented by the letter P, and Q will be user 1481961 for the first distance calculation.  The second distance calculation simply changes Q to represent user 885013.  Also, since there are only two movies (dimensions), we know that the equation will be calculated over 2 dimensions.</p>
<p><em>Distance between user 30878 and  1481961</em></p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/image2.jpg" title="Euclidean User Distance 1"><img src="http://www.armando.ws/wp-content/uploads/2008/02/image2.jpg" alt="Euclidean User Distance 1" border="0" /></a></p>
<p><em>Distance between user 30878 and 885013</em></p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/image.jpg" title="Euclidean User Distance 2"><img src="http://www.armando.ws/wp-content/uploads/2008/02/image.jpg" alt="Euclidean User Distance 2" border="0" /></a><a href="http://www.armando.ws/wp-content/uploads/2008/02/image.jpg" title="Euclidean User Distance 2"> </a></p>
<p>After calculating the distance between two users, what does the final numerical value represent?  It simply tells us how far apart, and therefore how dissimilar, the two users are: for users 30878 and 1481961 the distance is roughly 2.24. The rule of thumb is that as the result decreases, the similarity between the users increases. A result of 3.5 or 2.89 would indicate less similar users, while a result of 0 or 0.2 would indicate extremely similar users.</p>
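<p>For readers who cannot see the images, the two calculations can be written out as follows. This is a restatement based only on the ratings listed above, with P as user 30878 at (4, 3); the subscripts on Q are just labels added here for the two users being compared.</p>

```latex
\[ d(P, Q_{1481961}) = \sqrt{(4 - 2)^2 + (3 - 2)^2} = \sqrt{5} \approx 2.24 \]
\[ d(P, Q_{885013}) = \sqrt{(4 - 4)^2 + (3 - 5)^2} = \sqrt{4} = 2 \]
```

<p>In each line the first squared term is the difference in ratings for movie 1 and the second is the difference for movie 5.</p>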
<p><strong>Visualizing Similarity with a Graph </strong><br />
Yet another way to view the data is graphically.  Plotting the users on an X/Y graph, where the Y-axis represents the ratings for movie 5 and the X-axis represents the ratings for movie 1, we can visually see how close the users are to one another.</p>
<p><a href="http://www.armando.ws/wp-content/uploads/2008/02/graph6.jpg" title="Euclidean Distance Graph"><img src="http://www.armando.ws/wp-content/uploads/2008/02/graph6.jpg" alt="Euclidean Distance Graph" height="431" width="561" /></a></p>
<p>User 30878 has the coordinates (4,3), user 1481961 has coordinates (2,2), and user 885013 has coordinates (4,5).  The graph shows how close (similar) each user is to user 30878.  User 885013, at a distance of 2, is slightly closer to 30878 than user 1481961, at a distance of about 2.24, making 885013 the more similar of the two.</p>
<p><strong>The Code<br />
</strong></p>
<p><em>PHP5 Class<br />
</em></p>
<blockquote><p><code>/**<br />
* Distance Formula Class<br />
*<br />
* @author Armando Padilla, mando81@prodigy.net<br />
* @copyright Armando Padilla.<br />
*<br />
*/<br />
class DistanceFormulas {</code></p>
<p>public function __construct(){}</p>
<p>public function euclidean(array $objectXAttribute, array $objectYAttribute){</p>
<p>$distanceFromXToY = 0;<br />
for($i=0; $i&lt;count($objectXAttribute); $i++){</p>
<p>$p = $objectXAttribute[$i];<br />
$q = $objectYAttribute[$i];</p>
<p>$distanceFromXToY += pow(($p-$q), 2);</p>
<p>}</p>
<p>//The Euclidean distance is the square root of the summed squares.<br />
return sqrt($distanceFromXToY);</p>
<p>}//End</p>
<p>}//End class</p></blockquote>
<p><em> PHP5 Test</em></p>
<blockquote><p>/**<br />
* Simple example using the DistanceFormula Class<br />
*<br />
* @author Armando Padilla, mando81@prodigy.net<br />
* @copyright Armando Padilla<br />
*/<br />
require_once("DistanceFormulas.php");</p>
<p>//List of users along with their ratings.<br />
//usually you would have this in a database and have more than 3 users :-)<br />
//<br />
// rating order: "The Abyss", "Hunt for Red October", "Goonies"<br />
$profiles = array(array("username" =&gt; 'Armando', "movieratings" =&gt;  array(5, 4, 4)),<br />
array("username" =&gt; 'Snoopy',  "movieratings" =&gt;  array(2,2,5)),<br />
array("username" =&gt; 'Pearl',   "movieratings" =&gt;  array(1,4,1)));</p>
<p>//Instantiate the class<br />
$DistanceFormulas = new DistanceFormulas();</p>
<p>//Let's get the distance from Armando to the other two users.<br />
$distance = "";<br />
for($i=1; $i&lt;count($profiles); $i++){</p>
<p>$distance .= "Distance from ".$profiles[0]['username']." to ".$profiles[$i]['username'].":: ";<br />
$distance .= $DistanceFormulas-&gt;euclidean($profiles[0]['movieratings'], $profiles[$i]['movieratings']);<br />
$distance .= "&lt;br&gt;";</p>
<p>}</p>
<p>//Print the collected results.<br />
echo $distance;</p></blockquote>
<blockquote></blockquote>
<p>Ruby Example &#8211; coming soon.</p>
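<p>Until the full Ruby write-up is ready, here is a minimal sketch of what a Ruby counterpart to the PHP class might look like. The class name, method name, and sample profiles are carried over from the PHP example; the length check and the two-decimal output format are assumptions added for illustration, and the method takes the square root as the equation above requires.</p>

```ruby
# Ruby sketch mirroring the PHP DistanceFormulas class above.
# The length guard is an added assumption, not part of the PHP version.
class DistanceFormulas
  # Returns the Euclidean distance between two equal-length rating arrays.
  def euclidean(object_x_attributes, object_y_attributes)
    unless object_x_attributes.length == object_y_attributes.length
      raise ArgumentError, "both users need the same number of ratings"
    end

    # Pair up the ratings, square each difference, and add them together.
    sum_of_squares = object_x_attributes.zip(object_y_attributes).sum { |p, q| (p - q)**2 }
    Math.sqrt(sum_of_squares)
  end
end

# rating order: "The Abyss", "Hunt for Red October", "Goonies"
profiles = [
  { username: "Armando", movieratings: [5, 4, 4] },
  { username: "Snoopy",  movieratings: [2, 2, 5] },
  { username: "Pearl",   movieratings: [1, 4, 1] }
]

distance_formulas = DistanceFormulas.new
reference = profiles.first

# Compare the first profile against every other profile.
report = profiles.drop(1).map do |other|
  distance = distance_formulas.euclidean(reference[:movieratings], other[:movieratings])
  format("Distance from %s to %s:: %.2f", reference[:username], other[:username], distance)
end

puts report
```

<p>With these sample ratings Snoopy ends up farther from Armando than the two-movie users were from each other, and Pearl farther still, which matches the rule of thumb that larger values mean less similar users.</p>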
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/02/comparing-distance-formulas-part-iii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing Distance Formulas. Part II</title>
		<link>http://www.armando.ws/2008/02/comparing-distance-formulas-part-ii/</link>
		<comments>http://www.armando.ws/2008/02/comparing-distance-formulas-part-ii/#comments</comments>
		<pubDate>Mon, 11 Feb 2008 19:59:35 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=110</guid>
		<description><![CDATA[Before diving into the algorithms I would like to step back and take a look at the
data we will be analyzing with our examples.  

Where did the data come from?
The data originated from the Netflix.com data mining challenge.  The challenge was created by Netflix in 2006 to allow data mining junkies the [...]]]></description>
			<content:encoded><![CDATA[<p>Before diving into the algorithms I would like to step back and take a look at the data we will be analyzing with our examples.</p>
<p><strong>Where did the data come from?</strong><br />
The data originated from the Netflix.com data mining challenge.  The challenge was created by Netflix in 2006 to allow data mining junkies to help the movie renting population find movies they might enjoy, based on their current rental selections and on users with similar renting habits.  Of course every junkie needs an incentive, so Netflix.com is offering $1,000,000 to anyone who can improve the recommendation engine&#8217;s success rate by 10%.  The challenge is ongoing and ends in 2010. More information can be found <a href="http://www.netflixprize.com/">here</a>.</p>
<p><strong>The Data</strong><br />
Focusing on the data, the files contain a list of movies, roughly 17,000, along with 547 user ratings per movie.  The format that Netflix has placed the data<br />
into is:</p>
<blockquote><p>1:<br />
1488844,3,2005-09-06<br />
822109,5,2005-05-13<br />
885013,4,2005-10-19<br />
30878,4,2005-12-26<br />
823519,3,2004-05-03<br />
893988,3,2005-11-17<br />
124105,4,2004-08-05<br />
1248029,3,2004-04-22</p></blockquote>
<p>Where 1 is the movie id. The movie id is a unique number which represents a movie title in their system.  For example, the movie id shown above, 1, can be associated with the movie <em>The Abyss</em>.  Since the movie id is unique to one and only one movie, we can safely assume that <em>The Abyss</em> will always be 1.</p>
<p>The following lines contain user rating information.  Taking one line as an example, we can break up the data by commas.  The first piece of the line is the unique id of a user, in this case a person using Netflix to rent a movie. Again, we can safely assume that one id is associated with one and only one user in the system.  As an example, the id 30878 is associated with the user Armando Padilla.</p>
<p>The next item on the line is the rating that the user has given to the movie.  In the data that Netflix.com has released, the ratings can be anywhere from 1 to 5, where 1 is a very low rating, indicating that the user hated the movie, while 5 is a very high rating, indicating that the user extremely enjoyed watching the movie.  Again with an example, the user 30878 has rated The Abyss with a 4, indicating that he enjoyed it a lot.</p>
<p>Finally, the last item on the line is the date the user rated the movie. The date format is year-month-dayofmonth.  The final reading of a line will be: &#8220;The user, Armando Padilla, with the id of 30878, rated the movie <em>The Abyss</em>, with the id of 1, a 4 on December 26th, 2005.&#8221;</p>
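<p>To make the format concrete, here is a rough Ruby sketch that reads a movie block and splits each rating line on commas, as described above. The helper name <em>parse_ratings</em>, the hash keys, and the inline sample are assumptions made for this illustration; the real training files are far larger.</p>

```ruby
require "date"

# Assumed sample, copied from the excerpt above.
sample = "1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n30878,4,2005-12-26\n"

# Hypothetical helper: turns the raw text into an array of rating hashes.
def parse_ratings(text)
  movie_id = nil
  ratings = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.end_with?(":")
      # A line such as "1:" starts a new movie block.
      movie_id = Integer(line.chomp(":"))
    else
      # Every other line is "user id,rating,date".
      user_id, rating, rated_on = line.split(",")
      ratings.push({
        movie_id: movie_id,
        user_id: Integer(user_id),
        rating: Integer(rating),
        rated_on: Date.strptime(rated_on, "%Y-%m-%d")
      })
    end
  end
  ratings
end

ratings = parse_ratings(sample)
mine = ratings.find { |r| r[:user_id] == 30878 }
puts "User #{mine[:user_id]} rated movie #{mine[:movie_id]} a #{mine[:rating]} on #{mine[:rated_on]}"
```

<p>Keeping the movie id on every rating hash means the ratings from several movie blocks can later be regrouped per user, which is the shape the distance formulas need.</p>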
<p><strong>How are we using the data for our examples?</strong><br />
As mentioned before, there are a little over 17,000 movies in the training data. The specific number of movies rated is 17,770, according to the &#8220;readme&#8221; file that accompanies the training data, and the total number of users with movie ratings is 480,189.</p>
<p>For our example we will use all 17,770 movies as attributes for all 480,189 users in the system.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/02/comparing-distance-formulas-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing Distance Formulas. Part I</title>
		<link>http://www.armando.ws/2008/02/comparing-distance-formulas-part-i/</link>
		<comments>http://www.armando.ws/2008/02/comparing-distance-formulas-part-i/#comments</comments>
		<pubDate>Fri, 08 Feb 2008 21:27:39 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=109</guid>
		<description><![CDATA[Match.com, Singles.com, Netflix.com, and countless other sites out there rely on comparing people based on the answers users submit on a daily basis.  Some, like Match.com, use a long questionnaire, while others, such as Netflix.com, use the renting habits of a user to determine if the user might like another product. [...]]]></description>
			<content:encoded><![CDATA[<p>Match.com, Singles.com, Netflix.com, and countless other sites out there rely on comparing people based on the answers users submit on a daily basis.  Some, like Match.com, use a long questionnaire, while others, such as Netflix.com, use the renting habits of a user to determine if the user might like another product.  In a nutshell, how similar two people are depends on how they behave during their visit to the application.</p>
<p>What I&#8217;ll cover in this first part will be a brief intro into the overall plan.  What algorithms will be covered and what data set I will use.</p>
<p><strong>Overall plan</strong><br />
I was given the task of creating a simple but effective, match.com-like, compatibility application.  The user would fill out a group of questions, and based on the answers the application would present the user&#8217;s overall compatibility with other users in the system. The application was a success and, to my surprise, withstood the load that 3,000 concurrent users put on it.  After launch, I had a lingering question in the back of my mind.  How good were those results?  Could I have made a better engine?</p>
<p>The goal will be to compare seven distance formulas and determine which formula produces the best results and which produces results efficiently enough for a web application with 500,000 users (small compared to other data sets out there).</p>
<p><strong>Algorithms to Cover </strong><br />
Distance formulas will be the key focus in this piece.  I will cover the more traditional distance formulas along with a few obscure ones.  The list of algorithms is:</p>
<ol>
<li>Euclidean</li>
<li>Manhattan</li>
<li>Mahalanobis</li>
<li>Jaccard</li>
<li>Hamming</li>
<li>Pearson</li>
<li>Sorensen</li>
</ol>
<p>Of course I will touch on the benefits of each algorithm and the disadvantages that each algorithm contains. I will also have small working Ruby and PHP code to accompany this article.</p>
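<p>As a small taste of how these formulas differ, here is a rough Ruby sketch of two of the simpler entries on the list, Manhattan and Hamming. The method names and the sample answers are made up for this illustration and are not taken from the application described above.</p>

```ruby
# Manhattan distance: sum of the absolute differences per attribute.
def manhattan(first_attributes, second_attributes)
  first_attributes.zip(second_attributes).sum { |a, b| (a - b).abs }
end

# Hamming distance: number of attributes where the two users disagree.
def hamming(first_attributes, second_attributes)
  first_attributes.zip(second_attributes).count { |a, b| a != b }
end

# Hypothetical answers from two users to five questions (scale 1 to 5).
first_user  = [5, 3, 4, 1, 2]
second_user = [4, 3, 1, 1, 5]

puts "Manhattan: #{manhattan(first_user, second_user)}"
puts "Hamming:   #{hamming(first_user, second_user)}"
```

<p>Manhattan cares about how far apart the answers are, while Hamming only counts how many answers differ, so the same pair of users can look close under one formula and distant under another.</p>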
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2008/02/comparing-distance-formulas-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fast Paced Development Model Part I</title>
		<link>http://www.armando.ws/2007/12/fast-paced-development-model-part-i/</link>
		<comments>http://www.armando.ws/2007/12/fast-paced-development-model-part-i/#comments</comments>
		<pubDate>Sun, 30 Dec 2007 06:10:17 +0000</pubDate>
		<dc:creator>Armando Padilla</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.armando.ws/?p=81</guid>
		<description><![CDATA[In an article I recently wrote, which was not published on this web site, I raised the question: how can a software development team of any size properly release software applications with a limited number of acceptable bugs? By acceptable bugs I mean bugs that do not hinder the ability of the user to [...]]]></description>
			<content:encoded><![CDATA[<p>In an article I recently wrote, which was not published on this web site, I raised the question: how can a software development team of any size properly release software applications with a limited number of acceptable bugs? By acceptable bugs I mean bugs that do not hinder the ability of the user to properly use the application.  Any takers?</p>
<p>The hypothetical situation I presented was:</p>
<ol>
<li>Team of 1-10 developers, all working on different projects at any given time, who return to one main application when the need is there.</li>
<li>Application development time is 1/4 of normal development time.</li>
<li>The development team and the company&#8217;s other development teams believe in documentation but never follow through.</li>
<li>No QA team. (developer is on the line to deliver on time and with a near flawless product)</li>
<li>No unit testing of any kind.</li>
</ol>
<p>I have, to date, presented this problem to the public, but no one has come to a conclusion that I think will work.  Would a TDD method work?  Would the Waterfall model work? Would a Spiral model work?  Or would Agile work in this case?</p>
<p>This will be my challenge. Find a software engineering model that works well with such a situation or propose one.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.armando.ws/2007/12/fast-paced-development-model-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
