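I've been writing a Java library that I'd like to use to build Bayesian belief networks. I have classes for building a directed graph. The node: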
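import java.util.ArrayList;
import java.util.List;

public class Node {
    private String label;
    // Linked nodes; DirectedGraph.addEdge(here, there) adds "here" to the list of "there"
    private List<Node> adjacencyList = new ArrayList<Node>();
    // Marginal distribution of this node's own values (not conditioned on any parent)
    private Frequency<String> distribution = new Frequency<String>();

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    public List<Node> getAdjacencyList() {
        return adjacencyList;
    }

    public void addNeighbour(Node neighbour) {
        adjacencyList.add(neighbour);
    }

    // Adds every observed value to the distribution (appends; does not reset it)
    public void setDistribution(List<String> data) {
        for (String s : data) {
            distribution.addValue(s);
        }
    }

    // Relative frequency P(value) estimated from the observed data
    public double getDistributionValue(String value) {
        return distribution.getPct(value);
    }
}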
The graph:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectedGraph {
    Map<String, Node> graph = new HashMap<String, Node>();

    public void addVertex(String label) {
        Node vertex = new Node();
        vertex.setLabel(label);
        graph.put(label, vertex);
    }

    // Stores the edge on the "there" node: "here" is added to its adjacency list
    public void addEdge(String here, String there) {
        Node nHere = graph.get(here);
        Node nThere = graph.get(there);
        nThere.addNeighbour(nHere); // nThere is already in the map, so no put() is needed
    }

    public List<Node> getNeighbors(String vertex) {
        return graph.get(vertex).getAdjacencyList();
    }

    public int degree(String vertex) {
        return graph.get(vertex).getAdjacencyList().size();
    }

    public boolean hasVertex(String vertex) {
        return graph.containsKey(vertex);
    }

    public boolean hasEdge(String here, String there) {
        // The adjacency list holds Node objects, so compare against the Node for "here";
        // calling contains() with the String label itself would always return false
        return graph.get(there).getAdjacencyList().contains(graph.get(here));
    }
}
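The edge direction matters once probabilities are attached to the graph. Because `addEdge(here, there)` puts `here` into the adjacency list of `there`, calling `addEdge("Sex", "Height")` makes Height's adjacency list behave like a list of its parents. The standalone sketch below records the Sex → {Height, Weight, ShoeSize} structure from the question in that parent-list form. `NetworkStructure` is a hypothetical helper and is not part of the library above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the library above): maps each node label to its parents' labels.
class NetworkStructure {
    private final Map<String, List<String>> parents = new LinkedHashMap<>();

    void addNode(String label) {
        parents.putIfAbsent(label, new ArrayList<>());
    }

    // Directed edge parent -> child, recorded on the child so it can look up its parents.
    void addEdge(String parent, String child) {
        parents.get(child).add(parent);
    }

    List<String> getParents(String label) {
        return parents.get(label);
    }

    public static void main(String[] args) {
        NetworkStructure net = new NetworkStructure();
        net.addNode("Sex");
        for (String child : new String[] {"Height", "Weight", "ShoeSize"}) {
            net.addNode(child);
            net.addEdge("Sex", child);
        }
        System.out.println(net.getParents("Height")); // [Sex]
        System.out.println(net.getParents("Sex"));    // []
    }
}
```

Keeping parents on the child is convenient here, because a child's conditional distribution is indexed by its parents' values.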
I have a class for keeping track of the probability distribution of a dataset:
import com.google.common.collect.HashMultiset;
import com.google.common.collect.LinkedListMultimap;
import com.google.common.collect.Multimap;
import com.google.common.collect.Multiset;
import java.util.Set;

public class Frequency<T extends Comparable<T>> {
    // How many times each value has been seen
    private Multiset<T> event = HashMultiset.create();
    // Distinct values, kept in the order they were first added
    private Multimap<T, T> event2 = LinkedListMultimap.create();

    public void addValue(T data) {
        if (!event2.containsKey(data)) {
            event2.put(data, data);
        }
        event.add(data);
    }

    public void clear() {
        event.clear();
        event2.clear();
    }

    // Relative frequency of data; 0 for an empty distribution (instead of NaN from 0/0)
    public double getPct(T data) {
        int totalNumOfElements = event.size();
        if (totalNumOfElements == 0) {
            return 0.0;
        }
        return (double) event.count(data) / totalNumOfElements;
    }

    public int getNum(T data) {
        return event.count(data);
    }

    public int getSumFreq() {
        return event.size();
    }

    public int getUniqueCount() {
        return event.elementSet().size();
    }

    // Distinct values as strings, in first-seen order
    public String[] getKeys() {
        Set<T> keys = event2.keySet();
        String[] keysAsStrings = new String[keys.size()];
        int i = 0;
        for (T key : keys) {
            keysAsStrings[i++] = String.valueOf(key);
        }
        return keysAsStrings;
    }
}
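For comparison, the same counting idea can be sketched without Guava, using only `java.util`. `SimpleFrequency` is a hypothetical name. A `LinkedHashMap` keeps the keys in first-seen order, and an empty distribution returns 0 instead of NaN. Both versions produce only a marginal P(X = x), and that value says nothing about how X depends on another variable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical Guava-free counterpart of Frequency: counts values, keeps first-seen key order.
class SimpleFrequency<T> {
    private final Map<T, Integer> counts = new LinkedHashMap<>();
    private int total = 0;

    void addValue(T value) {
        counts.merge(value, 1, Integer::sum);
        total++;
    }

    int getNum(T value) {
        return counts.getOrDefault(value, 0);
    }

    // Marginal probability P(X = value); 0.0 for an empty distribution.
    double getPct(T value) {
        return total == 0 ? 0.0 : (double) getNum(value) / total;
    }

    Set<T> getKeys() {
        return counts.keySet();
    }

    public static void main(String[] args) {
        SimpleFrequency<String> sex = new SimpleFrequency<>();
        for (String s : new String[] {"male", "female", "male", "male"}) {
            sex.addValue(s);
        }
        System.out.println(sex.getPct("male")); // 0.75
        System.out.println(sex.getKeys());      // [male, female]
    }
}
```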
and another function I can use to calculate conditional probabilities:
// Member of the enclosing classifier class.
// Estimates P(interested = interestedClass | reducing = reducingClass) from two parallel columns.
public double conditionalProbability(List<String> interestedSet,
                                     List<String> reducingSet,
                                     String interestedClass,
                                     String reducingClass) {
    int numerator = 0;   // rows where both values match
    int denominator = 0; // rows where the conditioning value matches
    for (int i = 0; i < reducingSet.size(); i++) {
        // Count the denominator with the same case-insensitive match as the numerator,
        // so the ratio can never exceed 1
        if (reducingSet.get(i).equalsIgnoreCase(reducingClass)) {
            denominator++;
            if (interestedSet.get(i).equalsIgnoreCase(interestedClass)) {
                numerator++;
            }
        }
    }
    return denominator != 0 ? (double) numerator / denominator : 0.0;
}
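The argument order decides which conditional probability is computed. With the attribute column as `interestedSet` and the class column as `reducingSet`, the function estimates P(attribute | class). Swapping the two gives P(class | attribute). The standalone sketch below uses a hypothetical `ConditionalDemo` class and made-up data to show that the two directions generally differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone helper mirroring conditionalProbability above.
class ConditionalDemo {
    // Count-based estimate of P(A = a | B = b) from two parallel columns of observations.
    static double conditional(List<String> aColumn, List<String> bColumn, String a, String b) {
        int given = 0; // rows where B = b
        int joint = 0; // rows where B = b and A = a
        for (int i = 0; i < bColumn.size(); i++) {
            if (bColumn.get(i).equalsIgnoreCase(b)) {
                given++;
                if (aColumn.get(i).equalsIgnoreCase(a)) {
                    joint++;
                }
            }
        }
        return given == 0 ? 0.0 : (double) joint / given;
    }

    public static void main(String[] args) {
        // Made-up toy data, for illustration only.
        List<String> sex = Arrays.asList("male", "male", "female", "female", "female");
        List<String> height = Arrays.asList("tall", "tall", "tall", "short", "short");
        // P(Height = tall | Sex = male): both male rows are tall -> 1.0
        System.out.println(conditional(height, sex, "tall", "male"));
        // P(Sex = male | Height = tall): 2 of the 3 tall rows are male -> about 0.667
        System.out.println(conditional(sex, height, "male", "tall"));
    }
}
```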
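However, I'm still not sure how to tie everything together to perform classification.
I've been reading the paper "Comparing Bayesian Network Classifiers" to try to get a handle on it.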
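Say I'm trying to predict a person's sex from the attributes height, weight and shoe size. My understanding is that I would make Sex my parent/class node, and height, weight and shoe size would be its child nodes.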
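With Sex as the only parent, this structure corresponds to the following factorization of the joint distribution. Each child node therefore needs a distribution conditioned on Sex rather than a marginal one:

$$
P(\text{Sex}, \text{Height}, \text{Weight}, \text{ShoeSize}) = P(\text{Sex})\,P(\text{Height} \mid \text{Sex})\,P(\text{Weight} \mid \text{Sex})\,P(\text{ShoeSize} \mid \text{Sex})
$$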
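This is where I get confused. Each node only tracks the probability distribution of its own attribute, but I need conditional probabilities to perform classification.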
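One way to get those conditional probabilities is to give each child node a table of counts keyed by its parent's value, in place of (or alongside) the single marginal `Frequency`. In the existing classes this could be a hypothetical `Map<String, Frequency<String>>` field on `Node`. The standalone sketch below uses a hypothetical `ConditionalTable` class that handles a single parent only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical conditional probability table for a child node with one parent.
class ConditionalTable {
    // parent value -> (child value -> count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // parent value -> number of rows with that parent value
    private final Map<String, Integer> parentTotals = new HashMap<>();

    // Record one training row: (parent value, child value).
    void add(String parentValue, String childValue) {
        counts.computeIfAbsent(parentValue, k -> new HashMap<>())
              .merge(childValue, 1, Integer::sum);
        parentTotals.merge(parentValue, 1, Integer::sum);
    }

    // P(child = childValue | parent = parentValue); 0.0 if the parent value was never seen.
    double probability(String childValue, String parentValue) {
        int total = parentTotals.getOrDefault(parentValue, 0);
        if (total == 0) {
            return 0.0;
        }
        int joint = counts.get(parentValue).getOrDefault(childValue, 0);
        return (double) joint / total;
    }

    public static void main(String[] args) {
        ConditionalTable heightGivenSex = new ConditionalTable();
        heightGivenSex.add("male", "tall");
        heightGivenSex.add("male", "tall");
        heightGivenSex.add("male", "short");
        heightGivenSex.add("female", "short");
        System.out.println(heightGivenSex.probability("tall", "male"));    // about 0.667
        System.out.println(heightGivenSex.probability("short", "female")); // 1.0
    }
}
```

With this layout, the class node keeps its marginal distribution for the prior P(Sex), and each child answers P(child value | Sex value) directly.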
I have an old Naive Bayes implementation that I wrote:
public void naiveBayes(Data data, List<String> targetClass, BayesOption bayesOption, boolean headers) {
    // initialize variables
    int numOfClasses = data.getNumOfKeys();
    String[] keyNames = data.getKeys();
    String[] rClass;
    String priorName;
    iFrequency.clear();
    rFrequency.clear();
    if (bayesOption == BayesOption.TRAIN) {
        // class priors P(c)
        this.setInterestedFrequency(targetClass);
        this.targetClassKeys = Util.convertToStringArray(iFrequency.getKeys());
        for (int i = 0; i < this.targetClassKeys.length; i++) {
            priors.put(this.targetClassKeys[i], iFrequency.getPct(this.targetClassKeys[i]));
        }
    }
    // for each classification in the target class
    for (int i = 0; i < this.targetClassKeys.length; i++) {
        // restart the product for every class so scores don't leak from one class into the next
        double prob = 1.0;
        // get all of the different classes for that variable
        for (int j = 0; j < numOfClasses; j++) {
            String reducingKey = Util.convertToString(keyNames[j]);
            List<String> reducingClass = data.dataColumn(reducingKey, DataOption.GET, true);
            this.setReducingFrequency(reducingClass);
            rClass = Util.convertToStringArray(rFrequency.getKeys());
            for (int k = 0; k < rClass.length; k++) {
                // key "<class>|<value>" stores P(value | class), the likelihood naive Bayes multiplies
                priorName = this.targetClassKeys[i] + "|" + rClass[k];
                if (bayesOption == BayesOption.TRAIN) {
                    // attribute column first: P(attribute = rClass[k] | target = class), not the reverse
                    double conditionalProb = conditionalProbability(reducingClass, targetClass, rClass[k], this.targetClassKeys[i]);
                    priors.put(priorName, conditionalProb);
                }
                if (bayesOption == BayesOption.PREDICT) {
                    // values never seen in training have no entry; treat them as probability 0
                    Double likelihood = priors.get(priorName);
                    prob = prob * (likelihood == null ? 0.0 : likelihood);
                }
            }
            rFrequency.clear();
        }
        if (bayesOption == BayesOption.PREDICT) {
            prob = prob * priors.get(this.targetClassKeys[i]);
            Pair<String, Double> pred = new Pair<String, Double>(this.targetClassKeys[i], prob);
            this.predictions.add(pred);
        }
    }
    this.iFrequency.clear();
    this.rFrequency.clear();
}
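Putting the pieces together, classification takes the argmax over class values of the prior times one likelihood per attribute. The sketch below is a hypothetical `NaiveBayesSketch` class that keeps all counts in maps rather than in the graph classes, so the arithmetic is easy to follow. It assumes the continuous attributes (height, weight, shoe size) have already been binned into labels such as "tall" or "short". It applies no smoothing, so an attribute value never seen with a class drives that class's score to 0. Laplace smoothing is the usual remedy and is omitted here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: naive Bayes over categorical (already discretized) attributes.
class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // key "attributeIndex|value|class" -> number of training rows with that combination
    // (assumes values and class labels contain no '|')
    private final Map<String, Integer> jointCounts = new HashMap<>();
    private int rows = 0;

    void train(List<String[]> attributes, List<String> labels) {
        for (int r = 0; r < labels.size(); r++) {
            String c = labels.get(r);
            classCounts.merge(c, 1, Integer::sum);
            String[] row = attributes.get(r);
            for (int a = 0; a < row.length; a++) {
                jointCounts.merge(a + "|" + row[a] + "|" + c, 1, Integer::sum);
            }
            rows++;
        }
    }

    // Unnormalized score P(c) * product over attributes of P(x_a | c); no smoothing.
    double score(String[] instance, String c) {
        int nc = classCounts.getOrDefault(c, 0);
        if (nc == 0) {
            return 0.0;
        }
        double p = (double) nc / rows;
        for (int a = 0; a < instance.length; a++) {
            int joint = jointCounts.getOrDefault(a + "|" + instance[a] + "|" + c, 0);
            p *= (double) joint / nc;
        }
        return p;
    }

    // Returns the class with the highest score.
    String predict(String[] instance) {
        String best = null;
        double bestScore = -1.0;
        for (String c : classCounts.keySet()) {
            double s = score(instance, c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up toy rows: {height, weight, shoe size} already binned into labels.
        List<String[]> x = Arrays.asList(
                new String[] {"tall", "heavy", "large"},
                new String[] {"tall", "heavy", "large"},
                new String[] {"short", "light", "small"},
                new String[] {"short", "light", "small"},
                new String[] {"tall", "light", "small"});
        List<String> y = Arrays.asList("male", "male", "female", "female", "female");
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(x, y);
        System.out.println(nb.predict(new String[] {"tall", "heavy", "large"}));  // male
        System.out.println(nb.predict(new String[] {"short", "light", "small"})); // female
    }
}
```

In graph terms, `classCounts` plays the role of the class node's distribution, and `jointCounts` holds every child's conditional table in one map.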
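So I generally understand how the math works, but I'm not sure how to make it work with this particular architecture.
How do I compute the conditional probabilities?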
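For discrete (for example, binned) attribute values, the usual count-based maximum-likelihood estimates are given below. N(·) is the number of training rows matching the condition and N is the total number of rows. In the code above, the first estimate corresponds to `getPct` on the class column. The second corresponds to `conditionalProbability(attributeColumn, classColumn, x, c)`. The posterior over the class then follows from normalizing the naive Bayes product over all class values c′:

$$
\hat P(C = c) = \frac{N(C = c)}{N}, \qquad
\hat P(X_i = x \mid C = c) = \frac{N(X_i = x,\ C = c)}{N(C = c)}
$$

$$
P(C = c \mid x_1, \dots, x_n) = \frac{\hat P(C = c) \prod_{i} \hat P(X_i = x_i \mid C = c)}{\sum_{c'} \hat P(C = c') \prod_{i} \hat P(X_i = x_i \mid C = c')}
$$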
Can someone explain this difference to me?