Python

pearson algorithm for finding similarity Posted by nehasingh on 10 Mar 2010 at 9:34 PM
We have used the Pearson algorithm given below. Ideally the algorithm should return values between -1 and 1, but with our large data set it is giving values such as 1.0327955589886444 and 1.1547005383792517. Can you suggest a solution to this problem, or any other efficient algorithm for finding similarity between users?


The input is a dictionary of users, their choices and rankings,
e.g. dict1={user1:{choice1:rank1,choice2:rank1},user2:{choice1:rank1,choice2:rank1}, user3:{choice1:rank1,choice2:rank1}}
but we are working with a very large dictionary in which the rankings vary.
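
For illustration, a small dictionary in that shape might look like the sketch below. The user names, choice names and ranks are invented, and shared_items is a hypothetical helper that only shows how the mutually ranked choices can be picked out.

```python
# Hypothetical sample data in the shape user -> {choice -> rank}
prefs = {
    "user1": {"choice1": 5, "choice2": 3, "choice3": 4},
    "user2": {"choice1": 4, "choice2": 1, "choice4": 2},
    "user3": {"choice2": 2, "choice3": 5},
}

def shared_items(prefs, p1, p2):
    """Return the sorted list of choices that both users have ranked."""
    return sorted(set(prefs[p1]) & set(prefs[p2]))

print(shared_items(prefs, "user1", "user2"))
```

Only the choices returned by such a lookup enter the correlation, so two users with very few choices in common give a score based on very little data.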

import math

def pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # If there are no ratings in common, return 0
    if len(si) == 0:
        return 0
    # float(n) forces true division; with integer ranks, "/" in Python 2
    # truncates and can push r outside the range -1 to 1
    n = float(len(si))
    # Sums of all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sums of the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum of the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Calculate r (Pearson score)
    num = pSum - (sum1 * sum2 / n)
    den = math.sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    r = num / den
    return r
Re: pearson algorithm for finding similarity Posted by herbR on 31 Mar 2010 at 8:26 AM
The correlation coefficient is really just the cosine of the angle between the two arrays, once each array has had its mean subtracted. Here is a link to some code for computing it:

http://www.dreamincode.net/code/snippet3042.htm
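
To make the cosine view concrete, here is a minimal sketch that is not taken from the linked snippet. The function name centered_cosine is made up, and it assumes two equal-length lists of numbers. It subtracts each list's mean and then takes the cosine of the angle between the results, which is the Pearson coefficient.

```python
import math

def centered_cosine(x, y):
    """Cosine of the angle between x and y after subtracting each mean."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0:
        # One of the lists is constant, so the angle is undefined
        return 0.0
    return dot / norm

print(centered_cosine([1, 2, 3], [2, 4, 6]))
print(centered_cosine([1, 2, 3], [3, 2, 1]))
```

Because a cosine cannot exceed 1 in magnitude, a result noticeably outside -1 to 1 points at the arithmetic rather than at the data.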

If your values are outside of the range of -1 to +1, then either you are computing it incorrectly or you have some pretty profound round-off error.
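
One possible way to end up computing it incorrectly, assuming the ranks are stored as integers and the code runs under Python 2, is that "/" between two integers truncates. The sketch below uses made-up data and a hypothetical function pearson_from_lists. It mimics that truncation with "//" and compares it with true division.

```python
import math

def pearson_from_lists(x, y, truncate):
    """Pearson r from raw sums; truncate=True mimics Python 2 integer '/'."""
    n = len(x)
    if truncate:
        div = lambda a, b: a // b          # truncating division
    else:
        div = lambda a, b: a / float(b)    # true division
    sum1, sum2 = sum(x), sum(y)
    sum1_sq = sum(a * a for a in x)
    sum2_sq = sum(b * b for b in y)
    p_sum = sum(a * b for a, b in zip(x, y))
    num = p_sum - div(sum1 * sum2, n)
    den = math.sqrt((sum1_sq - div(sum1 ** 2, n)) * (sum2_sq - div(sum2 ** 2, n)))
    if den == 0:
        return 0.0
    return num / den

x = [1, 1, 2, 2]  # invented integer ranks
y = [1, 1, 2, 3]
print(pearson_from_lists(x, y, truncate=True))   # lands above 1
print(pearson_from_lists(x, y, truncate=False))  # stays within -1 to 1
```

With these inputs the truncating version works out to 2/sqrt(3), roughly 1.1547, which is one of the out-of-range values reported in the question.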

You can see more about the formula in Wikipedia:
http://en.wikipedia.org/wiki/Correlation_and_dependence

and more about cosine and correlation here:
http://www.mega.nu/ampp/rummel/uc.htm
Herb



 
