Bayesian linear regression / categorical variables / Laplace prior

data-mining python statistics bayesian
2022-03-14 07:49:28

I am trying to do feature selection in Python within a Bayesian framework, using a Laplace prior, with the following code:

Code:

import numpy as np
import pandas as pd
import pymc3 as pm

# df is a DataFrame holding the predictors and the binary target 'left'
# nb_predictors = len(df.columns) - 1  # we remove the target variable
nb_predictors = 7
beta = list()

with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', mu=0, sd=1/25)

    for cpt in range(1, nb_predictors + 1):
        beta.append(pm.Laplace('beta_' + str(cpt), mu=0, b=np.sqrt(2)))

    # Define Likelihood    
    logit = intercept + beta[0] * df['satisfaction_level'] + beta[1] * df['last_evaluation'] \
                      + beta[2] * df['number_project'] + beta[3] * df['average_montly_hours'] \
                      + beta[4] * df['time_spend_company'] + beta[5] * df['Work_accident'] \
                      + beta[6] * df['promotion_last_5years']



    likelihood = pm.Bernoulli('left', pm.math.sigmoid(logit), observed=df['left'])

I would like to know what happens if I add a new categorical variable (one-hot encoded), such as this sales variable. Can I still use a Laplace prior and check whether its posterior density is concentrated near 0 (suggesting the variable is unrelated to the target), or does that make no sense for categorical variables? Does it only work for continuous variables?
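As a side note on why the "density near 0" check works at all: a minimal sketch (assuming SciPy is available) verifying that the Laplace log-prior penalizes a coefficient linearly in |beta|, i.e. MAP estimation under this prior behaves like L1 (lasso) regularization. Nothing in that penalty depends on whether the column is continuous or a 0/1 dummy:

```python
import numpy as np
from scipy.stats import laplace

b = np.sqrt(2)  # same scale as in the model above

# log p(beta) = -log(2b) - |beta| / b, so moving a coefficient away
# from 0 costs log-prior mass linearly in |beta| (an L1 penalty).
penalty = laplace.logpdf(0.0, loc=0, scale=b) - laplace.logpdf(0.5, loc=0, scale=b)
print(penalty)  # equals 0.5 / b, whether or not the column is a dummy
```

So the shrinkage-toward-zero behaviour applies equally to one-hot encoded columns; the caveat is that the penalty acts on each dummy separately rather than on the categorical variable as a whole.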

# Define Likelihood
# (nb_predictors must be raised to 9 so that beta[7] and beta[8] exist)
logit = intercept + beta[0] * df['satisfaction_level'] + beta[1] * df['last_evaluation'] \
                  + beta[2] * df['number_project'] + beta[3] * df['average_montly_hours'] \
                  + beta[4] * df['time_spend_company'] + beta[5] * df['Work_accident'] \
                  + beta[6] * df['promotion_last_5years'] + beta[7] * df['sales_low'] + beta[8] * df['sales_high']
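For reference, one-hot columns such as `sales_low` / `sales_high` above could be produced with `pd.get_dummies`. This is only a sketch assuming a `sales` column with levels like `'low'`, `'medium'`, `'high'`; the data here is made up for illustration:

```python
import pandas as pd

# Hypothetical data: a categorical 'sales' column with a few levels
df = pd.DataFrame({'sales': ['low', 'high', 'medium', 'low']})

# One-hot encode; prefix='sales' yields columns 'sales_low',
# 'sales_medium', 'sales_high'. Dropping one level (drop_first=True)
# avoids perfect collinearity with the intercept, which also makes the
# Laplace-shrunk coefficients easier to interpret.
dummies = pd.get_dummies(df['sales'], prefix='sales')
df = pd.concat([df, dummies], axis=1)
```

Each resulting 0/1 column then gets its own `pm.Laplace` coefficient, exactly like the continuous predictors.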
0 Answers

No replies found yet.