How do I load a dataset with a specific structure using the tfds library?

Tags: data-mining, neural-network, keras, tensorflow, dataset, cnn

2022-03-04 22:59:56

I have a dataset whose classes are arranged as follows:

/dataset/train/images/class1/

/dataset/train/images/class2/
.
.
.
/dataset/train/images/classN/

Does anyone know how to load this data into a variable train_ds with the help of the TFDS library?

1 Answer

This is a common folder structure for image classification, so many libraries (torchvision, fastai, tfds) provide a dataset class for it, usually called ImageFolder.

In TFDS, this is tfds.folder_dataset.ImageFolder:

tfds.folder_dataset.ImageFolder(
    root_dir: str,
    *,
    shape: Optional[type_utils.Shape] = None,
    dtype: Optional[tf.DType] = None
)

ImageFolder expects the following layout under root_dir:

  split_name/  # Ex: 'train'
    label1/  # Ex: 'airplane' or '0015'
      xxx.png
      xxy.png
      xxz.png
    label2/
      xxx.png
      xxy.png
      xxz.png
  split_name/  # Ex: 'test'
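
Before handing a directory to ImageFolder, it can help to check which folders would be read as splits and which as labels. The following is a minimal sketch using only the standard library; the helper name summarize_image_folder and the IMAGE_SUFFIXES set are illustrative assumptions, not part of TFDS, and the set of extensions TFDS actually accepts may differ.

```python
from pathlib import Path

# Assumption: extensions to count as images; adjust to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_image_folder(root_dir):
    """Return {split: {label: image_count}} for a root_dir/split/label/image tree."""
    summary = {}
    # First directory level under root_dir: candidate split names.
    for split_dir in sorted(p for p in Path(root_dir).iterdir() if p.is_dir()):
        labels = {}
        # Second directory level: candidate label names.
        for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            labels[label_dir.name] = sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        summary[split_dir.name] = labels
    return summary
```

For the tree in the question, which has an images/ level between train/ and the class folders, comparing summarize_image_folder("/dataset/") with summarize_image_folder("/dataset/train/") shows which directory puts the class folders at the label level.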

You can instantiate it like this:

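import tensorflow_datasets as tfds

# ImageFolder reads root_dir/<split>/<label>/<image>. The tree in the question has an extra
# images/ level, so root_dir points at /dataset/train/ and "images" becomes the split name.
# This is an absolute path; use "./dataset/train/" or "dataset/train/" if yours is relative.
builder = tfds.folder_dataset.ImageFolder(root_dir="/dataset/train/")

# If you want a tf.data.Dataset:
train_ds = builder.as_dataset(split="images")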

The arguments supported by the as_dataset method are:

(
    split: Optional[Union[str, tfds.core.ReadInstruction]] = None,
    batch_size: tfds.typing.Dim = None,
    shuffle_files: bool = False,
    decoders: Optional[TreeDict[decode.Decoder]] = None,
    read_config: Optional[tfds.ReadConfig] = None,
    as_supervised: bool = False
)
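
As an illustration of these arguments, here is a hedged sketch that requests (image, label) pairs with as_supervised=True and shuffles the input files. The function name load_train_ds, the 224x224 target size, and the batch size of 32 are assumptions chosen for the example. Images are resized before batching because files in a folder dataset may have different dimensions. The imports sit inside the function so the definition can be loaded even where TensorFlow is not installed.

```python
def load_train_ds(root_dir, split, image_size=(224, 224), batch_size=32):
    """Build a batched tf.data.Dataset of (image, label) pairs from an image folder."""
    import tensorflow as tf
    import tensorflow_datasets as tfds

    builder = tfds.folder_dataset.ImageFolder(root_dir=root_dir)
    ds = builder.as_dataset(
        split=split,          # name of a folder directly under root_dir
        shuffle_files=True,   # shuffle the order in which input files are read
        as_supervised=True,   # yield (image, label) tuples instead of dicts
    )
    # Resize every image to a common shape so that batching works.
    ds = ds.map(lambda image, label: (tf.image.resize(image, image_size), label))
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

With the directory tree from the question, a call might look like load_train_ds("/dataset/train/", split="images"), assuming the class folders sit under /dataset/train/images/.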

You can refer to the documentation for details.
