<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>'pearson algorithm for finding similarity' Thread RSS Feed</title>
    <link>http://www.programmersheaven.com/</link>
    <description>Contains the latest posts from the thread 'pearson algorithm for finding similarity' posted on the 'Python' forum at Programmer's Heaven.</description>
    <language>en</language>
    <copyright>Copyright 2012 Programmers Heaven</copyright>
    <pubDate>Wed, 23 May 2012 23:59:23 -0700</pubDate>
    <lastBuildDate>Wed, 23 May 2012 23:59:23 -0700</lastBuildDate>
    <generator>Argotic Syndication Framework 2007.3.0.1, http://www.codeplex.com/Argotic</generator>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <ttl>360</ttl>
    <image>
      <url>http://www.programmersheaven.com/images/ph.gif</url>
      <title>Programmers Heaven</title>
      <link>http://www.programmersheaven.com/</link>
      <width>88</width>
      <height>31</height>
    </image>
    <item>
      <title>pearson algorithm for finding similarity</title>
      <link>http://www.programmersheaven.com/mb/python/414413/414413/pearson-algorithm-for-finding-similarity/</link>
      <description>We have used the Pearson correlation algorithm given below. Ideally it should return values between -1 and 1, but, perhaps because we have a large set of data, it is returning values such as 1.0327955589886444 and 1.1547005383792517. Can you suggest a solution to this problem, or another efficient algorithm for finding similarity between users?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The input is a dictionary of users, their choices, and the rank each user gave each choice.&lt;br /&gt;
e.g. dict1={user1:{choice1:rank1,choice2:rank1},user2:{choice1:rank1,choice2:rank1},user3:{choice1:rank1,choice2:rank1}}&lt;br /&gt;
In practice we are working with a very large dictionary in which the rankings vary.&lt;br /&gt;
&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def pearson(prefs,p1,p2):&lt;br /&gt;
  # Get the list of mutually rated items&lt;br /&gt;
  si={}&lt;br /&gt;
  for item in prefs[p1]: &lt;br /&gt;
    if item in prefs[p2]: si[item]=1&lt;br /&gt;
  # If there are no ratings in common, return 0&lt;br /&gt;
  if len(si)==0: return 0&lt;br /&gt;
  # Sum calculations&lt;br /&gt;
  n=len(si)&lt;br /&gt;
  # Sums of all the preferences&lt;br /&gt;
  sum1=sum([prefs[p1][it] for it in si])&lt;br /&gt;
  sum2=sum([prefs[p2][it] for it in si])&lt;br /&gt;
  # Sums of the squares&lt;br /&gt;
  sum1Sq=sum([pow(prefs[p1][it],2) for it in si])&lt;br /&gt;
  sum2Sq=sum([pow(prefs[p2][it],2) for it in si])&lt;br /&gt;
  # Sum of the products&lt;br /&gt;
  pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])&lt;br /&gt;
  # Calculate r (Pearson score)&lt;br /&gt;
  num=pSum-(sum1*sum2/n)&lt;br /&gt;
  den=math.sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))&lt;br /&gt;
  if den==0: return 0&lt;br /&gt;
  r=num/den&lt;br /&gt;
  return r&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/python/414413/414413/pearson-algorithm-for-finding-similarity/</guid>
      <pubDate>Wed, 10 Mar 2010 21:34:25 -0700</pubDate>
      <category>Python</category>
    </item>
    <item>
      <title>Re: pearson algorithm for finding similarity</title>
      <link>http://www.programmersheaven.com/mb/python/414413/415068/re-pearson-algorithm-for-finding-similarity/#415068</link>
      <description>The correlation coefficient is really just the cosine of the angle between the two arrays once each array has had its mean subtracted. Here is a link to some code for computing it:&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.dreamincode.net/code/snippet3042.htm"&gt;http://www.dreamincode.net/code/snippet3042.htm&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
If your values are outside of the range of -1 to +1, then either you are computing it incorrectly or you have some pretty profound round-off error.&lt;br /&gt;
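&lt;br /&gt;
One possible cause of the first kind, assuming the code runs under Python 2 and the ranks are integers: sum1*sum2/n and pow(sum1,2)/n then use integer division, which truncates and can push r outside the range. Below is a minimal sketch that forces floating-point division and clamps any remaining round-off. The name pearson_float and the sample ranks are made up for illustration and are not from the original post.&lt;br /&gt;

```python
import math

def pearson_float(prefs, p1, p2):
    """Pearson score for two users, using floating-point division throughout."""
    # Items rated by both users
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0.0
    n = float(len(shared))  # float, so every division below is true division
    x = [prefs[p1][item] for item in shared]
    y = [prefs[p2][item] for item in shared]
    sum1, sum2 = sum(x), sum(y)
    sum1_sq = sum(v * v for v in x)
    sum2_sq = sum(v * v for v in y)
    p_sum = sum(a * b for a, b in zip(x, y))
    num = p_sum - sum1 * sum2 / n
    var_prod = (sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n)
    # Zero variance (or a tiny negative value from round-off): no defined score
    if not var_prod > 0:
        return 0.0
    r = num / math.sqrt(var_prod)
    # Clamp so that small round-off cannot leave the interval [-1, 1]
    return max(-1.0, min(1.0, r))

# Hypothetical sample data in the same shape as dict1
sample = {
    "user1": {"choice1": 1, "choice2": 2, "choice3": 4},
    "user2": {"choice1": 2, "choice2": 3, "choice3": 4, "choice4": 5},
}
score = pearson_float(sample, "user1", "user2")
```

Converting n to a float is the smallest change to the posted code that has the same effect; the clamp at the end is an extra safeguard, not part of the formula.&lt;br /&gt;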
&lt;br /&gt;
You can see more about the formula in Wikipedia:&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/Correlation_and_dependence"&gt;http://en.wikipedia.org/wiki/Correlation_and_dependence&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
and more about cosine and correlation here:&lt;br /&gt;
&lt;a href="http://www.mega.nu/ampp/rummel/uc.htm"&gt;http://www.mega.nu/ampp/rummel/uc.htm&lt;/a&gt;&lt;br /&gt;
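&lt;br /&gt;
To illustrate the cosine connection: if you subtract each array's mean first, the cosine of the angle between the two arrays is the Pearson coefficient. Here is a small sketch of that calculation. The function name centered_cosine and the sample arrays are hypothetical, chosen only for this example.&lt;br /&gt;

```python
import math

def centered_cosine(x, y):
    """Cosine of the angle between x and y after subtracting each mean."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations from the mean
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    # A constant array has zero length after centering; the angle is undefined
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

# Hypothetical ranks given by two users to the same three choices
r = centered_cosine([1, 2, 4], [2, 3, 4])
```

Because a cosine can never leave the interval from -1 to +1, a result outside that interval points to the arithmetic rather than to the data.&lt;br /&gt;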
Herb&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/mb/python/414413/415068/re-pearson-algorithm-for-finding-similarity/#415068</guid>
      <pubDate>Wed, 31 Mar 2010 08:26:28 -0700</pubDate>
      <category>Python</category>
    </item>
  </channel>
</rss>
