Pearson algorithm for finding similarity

We have used the Pearson algorithm given below. Ideally it should return values between -1 and 1, but with our large data set it returns values such as 1.0327955589886444 and 1.1547005383792517. Can you suggest a solution to this problem, or another efficient algorithm for finding similarity between users?


The input is a dictionary of users, their choices, and the rank each user gave to each choice,
e.g. dict1 = {user1: {choice1: rank1, choice2: rank1}, user2: {choice1: rank1, choice2: rank1}, user3: {choice1: rank1, choice2: rank1}}
However, we are working with a very large dictionary in which the rankings vary.

import math

def pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # If there are no ratings in common, return 0
    if len(si) == 0:
        return 0
    # Number of items rated by both users
    n = len(si)
    # Sums of all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sums of the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum of the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Calculate r (Pearson score)
    num = pSum - (sum1 * sum2 / n)
    den = math.sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    # A zero denominator means at least one user has no variance over the shared items
    if den == 0:
        return 0
    r = num / den
    return r
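
One possible cause, assuming the ranks are stored as integers and the code runs under Python 2: there, `/` between two integers truncates, so `sum1*sum2/n` and `pow(sum1,2)/n` can lose their fractional parts and push `r` outside the range -1 to 1. The size of the data set would then not be the cause. Below is a minimal sketch of a variant that converts the ranks to floats, works from centered sums, and clamps the result. The name `sample` and its contents are hypothetical, made up only for illustration.

```python
import math


def pearson(prefs, p1, p2):
    """Pearson correlation between two users over the items both have ranked."""
    # Items ranked by both users
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0

    # Convert to float so that no division below can truncate
    x = [float(prefs[p1][item]) for item in shared]
    y = [float(prefs[p2][item]) for item in shared]

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centered sums instead of the sum-of-squares shortcut
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    den = math.sqrt(var_x * var_y)
    if den == 0:
        # At least one user gave the same rank to every shared item
        return 0.0

    # Clamp to absorb any tiny floating-point overshoot past -1 or 1
    return max(-1.0, min(1.0, cov / den))


# Hypothetical sample data, for illustration only
sample = {
    "user1": {"choice1": 1, "choice2": 2, "choice3": 4},
    "user2": {"choice1": 2, "choice2": 3, "choice3": 5},
    "user3": {"choice1": 4, "choice2": 2, "choice3": 1},
}

if __name__ == "__main__":
    print(pearson(sample, "user1", "user2"))
    print(pearson(sample, "user1", "user3"))
```

If the original function is kept as it is, writing `float(n)` in place of `n` in the three divisions should have the same effect on the truncation, under the same assumption about integer ranks.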
