I am trying to do feature engineering on a categorical variable, following this paper. My code is below:
from __future__ import print_function
import random
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.contrib import layers
from tensorflow.contrib import learn
from sklearn.preprocessing import LabelEncoder
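My dataset is shown below. It has 2 independent variables ('X1' and 'X2') and 1 dependent variable ('label'). 'X1' is the categorical variable. I want to create an embedding vector for this variable and run a simple linear regression in TensorFlow to predict 'label'. I am open to any other method, but I am trying linear regression because it is the easiest to understand.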
df = pd.DataFrame({'X1': np.array(["A","A","B","C","B","C","B","C","C","B",
"A","B","A","C","A","A","C"]),'X2': np.array([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
7.042,10.791,5.313,7.997,5.654,9.27,3.1]),
'label': np.array([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
2.827,3.465,1.65,2.904,2.42,2.94,1.3])})
For the variable 'X1', I am creating integer levels:
encoder = LabelEncoder()
encoder.fit(df.X1.values)
X = encoder.transform(df.X1.values)
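To make clear what the encoding step produces, here is a small NumPy-only sketch. It uses `np.unique(..., return_inverse=True)` as a stand-in for `LabelEncoder`, on the assumption that both assign integer codes in sorted order of the labels; `labels`, `classes` and `codes` are illustrative names, not part of the code above.

```python
import numpy as np

# First few values of the categorical column X1 from the data above
labels = np.array(["A", "A", "B", "C", "B", "C"])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the index of each original value in that sorted array (its integer code)
classes, codes = np.unique(labels, return_inverse=True)

print(classes)  # ['A' 'B' 'C']
print(codes)    # [0 0 1 2 1 2]
```

With three distinct labels, `cardinality` in the hyperparameters below would be 3.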
Recreating the dependent variable as an array:
y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
2.827,3.465,1.65,2.904,2.42,2.94,1.3])
Setting the hyperparameters:
training_epochs = 5
learning_rate = 1e-3
cardinality = len(np.unique(X))
embedding_size = 2
input_X_size = 1
n_hidden = 10
Setting up the variables:
embeddings = tf.Variable(tf.random_uniform([cardinality, embedding_size], -1.0, 1.0))
h = tf.Variable(tf.truncated_normal((embedding_size + len(df.X1), n_hidden), stddev=0.1))
W_out = tf.get_variable(name='out_w', shape=[n_hidden],
initializer=tf.contrib.layers.xavier_initializer())
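Embedding:
# Placeholder for the integer codes of X1 (fed with X at run time)
x = tf.placeholder(tf.int32, shape=[None])
embedded_chars = tf.nn.embedding_lookup(embeddings, x)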
embedded_chars = tf.reshape(embedded_chars, [-1])
embedded_chars= embedded_chars + np.array([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
7.042,10.791,5.313,7.997,5.654,9.27,3.1])
Multiplying with the hidden layer:
layer_1 = tf.matmul(embedded_chars,h)
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, W_out)
# Define loss and optimizer
n_samples = len(y)
cost = tf.reduce_sum(tf.pow(out_layer-y, 2))/(2*n_samples)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
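For reference, the cost defined above is the sum of squared errors divided by twice the number of samples. A minimal NumPy sketch of the same formula follows; `half_mse`, `pred` and `target` are illustrative names, not part of my TensorFlow code.

```python
import numpy as np

def half_mse(pred, target):
    """Sum of squared errors divided by 2 * number of samples."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = target.shape[0]
    return np.sum((pred - target) ** 2) / (2 * n_samples)

# Errors are 0 and -2, so the cost is (0 + 4) / (2 * 2)
print(half_mse([1.0, 2.0], [1.0, 4.0]))  # 1.0
```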
Running the graph:
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        # y is a NumPy array already baked into the cost, so only x is fed
        _, c = sess.run([optimizer, cost],
                        feed_dict={x: X})
    print("Ran without Error")
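When I run the code, I get the following error:
ValueError: Shape must be rank 2 but is rank 1 for 'MatMul_1' (op: 'MatMul') with input shapes: [17], [19,10].
I am unable to combine the continuous variable with the embedding variable.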
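To show the shapes I think I need, here is a plain NumPy sketch (not TensorFlow, and not a confirmed fix for the graph above). It assumes the continuous column should be concatenated to each sample's embedding as an extra feature, so the hidden weight matrix has `embedding_size + 1` rows; the random `codes` and `x2` values are hypothetical stand-ins for my data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, cardinality, embedding_size, n_hidden = 17, 3, 2, 10

# Hypothetical stand-ins for the real inputs: integer codes for the
# categorical column and values for the continuous column
codes = rng.integers(0, cardinality, size=n_samples)         # shape (17,)
x2 = rng.uniform(2.0, 11.0, size=n_samples)                  # shape (17,)

# An embedding lookup is row indexing into the embedding matrix
embeddings = rng.uniform(-1.0, 1.0, size=(cardinality, embedding_size))
embedded = embeddings[codes]                                 # shape (17, 2)

# Append the continuous feature as an extra column instead of
# flattening and adding, so every sample keeps its own row
features = np.concatenate([embedded, x2[:, None]], axis=1)   # shape (17, 3)

# Hidden weights need one row per feature column (embedding_size + 1)
h = rng.normal(0.0, 0.1, size=(embedding_size + 1, n_hidden))
w_out = rng.normal(0.0, 0.1, size=(n_hidden, 1))

layer_1 = np.maximum(features @ h, 0.0)                      # ReLU, shape (17, 10)
out = (layer_1 @ w_out).ravel()                              # shape (17,)

print(features.shape, layer_1.shape, out.shape)
```

If this is the right idea, the TensorFlow equivalent would presumably be a concatenation along axis 1 (for example with `tf.concat`) in place of the reshape and addition, with both matrix multiplications operating on rank-2 tensors.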
Can anyone guide me on how to do this?
Thanks!