#Map-Reduce
##Objective
Compute the document frequency (DF) of every word in a collection of documents.

Document frequency: the number of documents in which a given word appears.
##Principle
Map: JSON document -> list of (key, value) pairs

Reduce: list of (key, value) pairs -> aggregation of the values for each key (e.g. addition)
##Example
Map: {A,A,B,C} -> (A,1),(A,1),(B,1),(C,1)

Reduce -> A:2 B:1 C:1
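
The example can be reproduced outside MongoDB with a small in-memory simulation. This is only a sketch in plain JavaScript: `mapReduce`, `mapFn` and `reduceFn` are hypothetical names chosen for illustration and are not part of any MongoDB API.

```javascript
// Minimal in-memory map-reduce: map every input to (key, value) pairs,
// group the values by key, then reduce each group to a single value.
function mapReduce(inputs, mapFn, reduceFn) {
    var groups = new Map();
    inputs.forEach(function (input) {
        mapFn(input, function emit(key, value) {
            if (!groups.has(key)) groups.set(key, []);
            groups.get(key).push(value);
        });
    });
    var result = {};
    groups.forEach(function (values, key) {
        result[key] = reduceFn(key, values);
    });
    return result;
}

// Map: each element becomes the pair (element, 1).
function mapFn(element, emit) {
    emit(element, 1);
}

// Reduce: add up all the values emitted for one key.
function reduceFn(key, values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
}

var counts = mapReduce(["A", "A", "B", "C"], mapFn, reduceFn);
console.log(counts); // { A: 2, B: 1, C: 1 }
```

The grouping step in the middle is what a map-reduce engine normally does on your behalf; only the two small functions are written by the user.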
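##Implementation

    function map() {
        // Split the comment into words; match() returns null when there are none.
        var text = this.comment;
        var words = text.match(/\w+/g);
        if (words == null)
            return;
        // Use an object as a set so that each word counts once per document.
        // Object.create(null) avoids clashes with inherited keys such as "constructor".
        var df = Object.create(null);
        for (var i = 0; i < words.length; i++) {
            df[words[i]] = 1;
        }
        // Emit one (word, {df: 1}) pair per distinct word of this document.
        for (var mot in df) {
            emit(mot, {df: 1});
        }
    }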
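
    function reduce(key, values) {
        // Sum the partial counts; the result has the same shape as the
        // emitted values, so it can be reduced again.
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i].df;
        }
        return {df: total};
    }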
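
To try the DF computation without a database, the map and reduce functions can be driven by a small local harness. This is a sketch under stated assumptions: the sample documents are invented, and the `emit` defined here is only a stand-in for the one MongoDB provides to map functions.

```javascript
// Hypothetical sample documents; only the `comment` field is read.
var docs = [
    { _id: 1, comment: "the cat sat on the mat" },
    { _id: 2, comment: "the dog" },
    { _id: 3, comment: "a cat and a dog" }
];

// key -> list of emitted values (no prototype, so any word is a safe key)
var emitted = Object.create(null);

// Stand-in for the emit() that MongoDB provides to map functions.
function emit(key, value) {
    if (!emitted[key]) emitted[key] = [];
    emitted[key].push(value);
}

function map() {
    var words = this.comment.match(/\w+/g);
    if (words == null) return;
    var seen = Object.create(null);
    for (var i = 0; i < words.length; i++) seen[words[i]] = 1;
    for (var mot in seen) emit(mot, { df: 1 });
}

function reduce(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) total += values[i].df;
    return { df: total };
}

// Run map once per document, with `this` bound to the document.
docs.forEach(function (doc) { map.call(doc); });

// Run reduce once per key.
var df = {};
Object.keys(emitted).forEach(function (key) {
    df[key] = reduce(key, emitted[key]).df;
});
console.log(df.the, df.cat, df.dog, df.mat); // 2 2 2 1
```

In the mongo shell the same two functions would instead be passed to `mapReduce`, for example `db.comments.mapReduce(map, reduce, { out: { inline: 1 } })`, where `comments` is a hypothetical collection name.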
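
Term Frequency: TF(w,d), the number of occurrences of word w in document d.

Relevance score: Value(d,w) := TF(w,d)*log(N/DF(w)), where N is the total number of documents (Google additionally adds PR(d), the PageRank of document d).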
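
As a worked example of the relevance score, the sketch below evaluates TF(w,d)*log(N/DF(w)) for made-up numbers. The function name `score` and the sample values are assumptions; the natural logarithm is used because the notes do not specify a base, and the PR(d) term is left out.

```javascript
// Value(d, w) = TF(w, d) * log(N / DF(w))
// tf: occurrences of the word in the document
// df: number of documents containing the word
// n:  total number of documents
function score(tf, df, n) {
    return tf * Math.log(n / df);
}

// A word appearing 3 times in a document, present in 10 of 1000 documents.
var rare = score(3, 10, 1000);
// A word appearing 3 times but present in every document scores 0.
var common = score(3, 1000, 1000);
console.log(rare.toFixed(3), common); // 13.816 0
```

The logarithm is what penalizes frequent words: the more documents contain a word, the closer N/DF(w) gets to 1 and the closer the score gets to 0.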
MySQL: 2 additional tables

word | document | tf

word | df
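
    function map() {
        // Split the comment into words; match() returns null when there are none.
        var text = this.comment;
        var words = text.match(/\w+/g);
        if (words == null)
            return;
        // Count the occurrences of each word in this document.
        // Object.create(null) avoids clashes with inherited keys such as "constructor".
        var tf = Object.create(null);
        for (var i = 0; i < words.length; i++) {
            if (tf[words[i]] === undefined)
                tf[words[i]] = 1;
            else
                tf[words[i]]++;
        }
        // Emit the same shape that reduce returns, so that reduce can safely
        // be re-applied to its own output.
        for (var mot in tf) {
            emit(mot, {word: mot, tfs: [{tf: tf[mot], doc: this._id}]});
        }
    }

    function reduce(key, values) {
        // Concatenate the per-document lists collected for this word.
        var tfs = [];
        for (var i = 0; i < values.length; i++) {
            tfs = tfs.concat(values[i].tfs);
        }
        return {word: key, tfs: tfs};
    }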
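
One possible way to feed the two MySQL tables from these results is to flatten each `{word, tfs}` value into rows. The sketch below only builds the rows in memory; `toRows` and the sample data are hypothetical, and the actual insertion into MySQL is left out.

```javascript
// Hypothetical reduce outputs, one per word, shaped like {word, tfs}.
var reduced = [
    { word: "cat", tfs: [{ doc: 1, tf: 2 }, { doc: 3, tf: 1 }] },
    { word: "dog", tfs: [{ doc: 2, tf: 1 }] }
];

// Flatten into one row per (word, document) pair for the tf table,
// and one row per word for the df table. Each document contributes at
// most one entry per word, so DF is the length of the tfs list.
function toRows(results) {
    var tfRows = [];
    var dfRows = [];
    results.forEach(function (r) {
        r.tfs.forEach(function (entry) {
            tfRows.push({ word: r.word, document: entry.doc, tf: entry.tf });
        });
        dfRows.push({ word: r.word, df: r.tfs.length });
    });
    return { tf: tfRows, df: dfRows };
}

var rows = toRows(reduced);
console.log(rows.tf.length, rows.df[0].df, rows.df[1].df); // 3 2 1
```

Deriving DF from the length of each list means the term-frequency job alone is enough to fill both tables.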