Question: I am trying to build a binary classification model for melanoma ("MEL") vs. nevus ("NV"). The data comes from the ISIC archive (ISIC 2019), which covers 8 different types of skin lesions, but as mentioned I only use these two classes (binary classification).
The two classes are imbalanced: 10000 images for 'NV' and 3000 for 'MEL'.
The dataset is split into training and validation sets. To address the imbalance I apply downsampling and image augmentation. The train folder now contains two subfolders, 'MEL' and 'NV', with 10000 images each, and the validation folder contains 904 images per class.
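A quick way to sanity-check these counts (a minimal sketch, assuming the base_dir/train_dir layout used in the augmentation code further down):

import os

# count images per class in the training directory
for cls in ['MEL', 'NV']:
    cls_dir = os.path.join('base_dir', 'train_dir', cls)  # path assumed from the augmentation code
    print(cls, len(os.listdir(cls_dir)))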
Using Keras, I fine-tune DenseNet201 and use DenseNet's own preprocessing function:
from keras.applications.densenet import DenseNet201, preprocess_input
densenet_model = DenseNet201(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
Then I add a few layers on top of the pretrained model:
from keras import layers
from keras.models import Model

# get the last layer of the pretrained network and its output shape
last_layer = densenet_model.get_layer('relu')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output
# Pool the feature maps down to a single vector per image
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(2, activation='sigmoid')(x)
# Configure and compile the model
model = Model(densenet_model.input, x)
I do not freeze any layers, so every DenseNet layer stays trainable.
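For reference, freezing the pretrained backbone would look roughly like the sketch below; this is not something I do in my setup.

for layer in densenet_model.layers:
    # keep the ImageNet weights fixed during training
    layer.trainable = False

Then I compile the model: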
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy', 'binary_accuracy'])

filepath = "densenet.h5"
# Declare a checkpoint to save the best version of the model
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
# Reduce the learning rate as the learning stagnates
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=2,
                              verbose=1, mode='max', min_lr=0.00001)
callbacks_list = [checkpoint, reduce_lr]

# Fit the model
history = model.fit_generator(train_batches,
                              steps_per_epoch=train_steps,
                              validation_data=val_batches,
                              validation_steps=val_steps,
                              epochs=20,
                              verbose=1,
                              callbacks=callbacks_list)
Results:
Epoch 1/20
1701/1701 [==============================] - 793s 466ms/step - loss: 0.4436 - acc: 0.7890 - binary_accuracy: 0.7890 - val_loss: 0.3416 - val_acc: 0.8404 - val_binary_accuracy: 0.8404
Epoch 00001: val_acc improved from -inf to 0.84043, saving model to densenet.h5
Epoch 2/20
1701/1701 [==============================] - 720s 423ms/step - loss: 0.3447 - acc: 0.8450 - binary_accuracy: 0.8450 - val_loss: 0.3564 - val_acc: 0.8446 - val_binary_accuracy: 0.8446
Epoch 00002: val_acc improved from 0.84043 to 0.84458, saving model to densenet.h5
Epoch 3/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.2718 - acc: 0.8835 - binary_accuracy: 0.8835 - val_loss: 0.3785 - val_acc: 0.8487 - val_binary_accuracy: 0.8487
Epoch 00003: val_acc improved from 0.84458 to 0.84873, saving model to densenet.h5
Epoch 4/20
1701/1701 [==============================] - 726s 427ms/step - loss: 0.2051 - acc: 0.9172 - binary_accuracy: 0.9172 - val_loss: 0.3779 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00004: val_acc improved from 0.84873 to 0.85813, saving model to densenet.h5
Epoch 5/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1529 - acc: 0.9403 - binary_accuracy: 0.9403 - val_loss: 0.3923 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00005: val_acc did not improve from 0.85813
Epoch 6/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1163 - acc: 0.9553 - binary_accuracy: 0.9553 - val_loss: 0.4813 - val_acc: 0.8498 - val_binary_accuracy: 0.8498
Epoch 00006: val_acc did not improve from 0.85813
Epoch 00006: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
1701/1701 [==============================] - 727s 427ms/step - loss: 0.0407 - acc: 0.9864 - binary_accuracy: 0.9864 - val_loss: 0.5726 - val_acc: 0.8667 - val_binary_accuracy: 0.8667
Epoch 00007: val_acc improved from 0.85813 to 0.86670, saving model to densenet.h5
Epoch 8/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0287 - acc: 0.9904 - binary_accuracy: 0.9904 - val_loss: 0.5919 - val_acc: 0.8711 - val_binary_accuracy: 0.8711
Epoch 00008: val_acc improved from 0.86670 to 0.87113, saving model to densenet.h5
Epoch 9/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0253 - acc: 0.9909 - binary_accuracy: 0.9909 - val_loss: 0.5453 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00009: val_acc improved from 0.87113 to 0.87196, saving model to densenet.h5
Epoch 10/20
1701/1701 [==============================] - 730s 429ms/step - loss: 0.0216 - acc: 0.9927 - binary_accuracy: 0.9927 - val_loss: 0.5498 - val_acc: 0.8706 - val_binary_accuracy: 0.8706
Epoch 00010: val_acc did not improve from 0.87196
Epoch 11/20
1701/1701 [==============================] - 729s 428ms/step - loss: 0.0145 - acc: 0.9954 - binary_accuracy: 0.9954 - val_loss: 0.6332 - val_acc: 0.8822 - val_binary_accuracy: 0.8822
Epoch 00011: val_acc improved from 0.87196 to 0.88219, saving model to densenet.h5
Epoch 12/20
1701/1701 [==============================] - 731s 430ms/step - loss: 0.0176 - acc: 0.9939 - binary_accuracy: 0.9939 - val_loss: 0.6256 - val_acc: 0.8756 - val_binary_accuracy: 0.8756
Epoch 00012: val_acc did not improve from 0.88219
Epoch 13/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0117 - acc: 0.9965 - binary_accuracy: 0.9965 - val_loss: 0.5959 - val_acc: 0.8838 - val_binary_accuracy: 0.8838
Epoch 00013: val_acc improved from 0.88219 to 0.88385, saving model to densenet.h5
Epoch 14/20
1701/1701 [==============================] - 736s 433ms/step - loss: 0.0132 - acc: 0.9958 - binary_accuracy: 0.9958 - val_loss: 0.7139 - val_acc: 0.8598 - val_binary_accuracy: 0.8598
Epoch 00014: val_acc did not improve from 0.88385
Epoch 15/20
1701/1701 [==============================] - 735s 432ms/step - loss: 0.0109 - acc: 0.9963 - binary_accuracy: 0.9963 - val_loss: 0.6139 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00015: val_acc did not improve from 0.88385
Epoch 00015: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 16/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0048 - acc: 0.9980 - binary_accuracy: 0.9980 - val_loss: 0.6759 - val_acc: 0.8764 - val_binary_accuracy: 0.8764
Epoch 00016: val_acc did not improve from 0.88385
Epoch 17/20
1701/1701 [==============================] - 733s 431ms/step - loss: 0.0028 - acc: 0.9992 - binary_accuracy: 0.9992 - val_loss: 0.7179 - val_acc: 0.8805 - val_binary_accuracy: 0.8805
Epoch 00017: val_acc did not improve from 0.88385
Epoch 00017: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 18/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0014 - acc: 0.9996 - binary_accuracy: 0.9996 - val_loss: 0.7525 - val_acc: 0.8816 - val_binary_accuracy: 0.8816
Epoch 00018: val_acc did not improve from 0.88385
Epoch 19/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0011 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7580 - val_acc: 0.8803 - val_binary_accuracy: 0.8803
Epoch 00019: val_acc did not improve from 0.88385
Epoch 00019: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 20/20
1701/1701 [==============================] - 733s 431ms/step - loss: 8.0918e-04 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7667 - val_acc: 0.8800 - val_binary_accuracy: 0.8800
Epoch 00020: val_acc did not improve from 0.88385
Classification report (from sklearn.metrics); a sketch of how a report like this is produced follows the table:
              precision    recall  f1-score   support

         MEL       0.48      0.47      0.48       904
          NV       0.48      0.49      0.49       904

    accuracy                           0.48      1808
   macro avg       0.48      0.48      0.48      1808
weighted avg       0.48      0.48      0.48      1808
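For reference, a report like this can be produced from the un-shuffled test_batches generator defined in the preprocessing code at the bottom; this is a generic sketch, not my exact evaluation cell:

import numpy as np
from sklearn.metrics import classification_report

# test_batches is the shuffle=False generator from the preprocessing code below
steps = int(np.ceil(test_batches.samples / test_batches.batch_size))
preds = model.predict_generator(test_batches, steps=steps)
y_pred = np.argmax(preds, axis=1)   # predicted class index per image
y_true = test_batches.classes       # true class indices, in directory order
print(classification_report(y_true, y_pred,
                            target_names=list(test_batches.class_indices.keys())))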
The problem
As you can see, the model is essentially garbage at test time. What do you think the problem is? Any suggestions are welcome; this has been going on for a month with no improvement. If you want to see any other code, I will post it.
Here is the augmentation code:
import os
import shutil
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

class_list = ['MEL', 'NV']
for item in class_list:
    # Create a temporary directory for the augmented images
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)
    # Create a directory within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)
    # Choose a class
    img_class = item
    # List all the images in the directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)
    # Copy images from the class train dir to the img_dir
    for fname in img_list:
        # source path to image
        src = os.path.join('base_dir/train_dir/' + img_class, fname)
        # destination path to image
        dst = os.path.join(img_dir, fname)
        # copy the image from the source to the destination
        shutil.copyfile(src, dst)
    # point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class
    # Create a data generator to augment the images in real time
    datagen = ImageDataGenerator(
        rotation_range=60,
        width_shift_range=0.1,
        height_shift_range=0.1,
        # zoom_range=0.1,
        shear_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
        brightness_range=(0.9, 1.1),
        fill_mode='nearest')
    batch_size = 50
    aug_datagen = datagen.flow_from_directory(path,
                                              save_to_dir=save_path,
                                              save_format='jpg',
                                              target_size=(224, 224),
                                              batch_size=batch_size)
    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 10000  # total number of images we want to have in each class
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted - num_files) / batch_size))
    # run the generator until roughly num_aug_images_wanted images exist in the class folder
    for i in range(0, num_batches):
        imgs, labels = next(aug_datagen)
    # delete the temporary directory with the raw image files
    shutil.rmtree('aug_dir')
Preprocessing code:
import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator

# Declare a few useful values
num_train_samples = train_len
num_val_samples = val_len
train_batch_size = 16
val_batch_size = 100
image_height = 224
image_width = 224

# Declare how many steps are needed in an iteration
train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)

# Set up generators
datagenr = ImageDataGenerator(
    preprocessing_function=keras.applications.densenet.preprocess_input)

train_batches = datagenr.flow_from_directory(
    train_path,
    target_size=(image_height, image_width),
    batch_size=train_batch_size)

val_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size)

# Note: shuffle=False keeps the test dataset in directory order
test_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size,
    shuffle=False)
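For completeness, the class-to-index mapping that flow_from_directory infers from the folder names can be inspected like this (a small check on the generators above):

# folders are mapped alphabetically, so this should print {'MEL': 0, 'NV': 1}
print(train_batches.class_indices)
print(test_batches.class_indices)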
I am new to this field, so if you think the question is not well posed, please don't just downvote it; leave a comment instead, which may help me and others improve and avoid asking similar questions.

