
Drive Groups


Drive Group specification

drive_group_default:
  target: '*'
  data_devices:
    device_spec: arg
  db_devices:
    device_spec: arg
  wal_devices:
    device_spec: arg
  osds_per_device: 1 # number of OSD daemons per device. To fully utilize NVMe devices, multiple OSDs are required.
  objectstore: bluestore # not implemented (defaults to bluestore; filestore is deprecated with Nautilus)
  encrypted: True # not implemented
  db_slots: 5  # not implemented
  wal_slots: 1 # if deploying on 3 devices, how many WAL volumes per DB device # not implemented
  # other ceph-volume flags like wal_underprovision_ratio are still being discussed
  # also, some keys are only valid when others are present ({wal,db}_devices for bluestore, journal_devices for filestore) # not implemented

where {device_spec} can be one of the following matchers:

Substring Matching:

# substring match on the ID_MODEL property of the drive
model: disk_model_name

# substring match on the ID_VENDOR property of the drive
vendor: disk_vendor_name
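
Several matchers can also be combined within one device_spec; later examples on this page combine vendor and size, for instance. A minimal, purely illustrative sketch (the group name and values are placeholders, not real hardware):

drive_group_example:
  target: '*'
  data_devices:
    vendor: disk_vendor_name
    model: disk_model_name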

Size Matching:

Please note the quotes around the values when using the delimiter notation; YAML would otherwise interpret the ':' as a new hash key.

Low:High
size: '10G:40G'

Size specification of the format LOW:HIGH. It can also take the form :HIGH, LOW: or an exact value (as reported by ceph-volume inventory).

Exact
size: '10G'

Maximum (:HIGH)
size: ':10G'

Minimum (LOW:)
size: '50G:'


Sizes don't have to be specified exclusively in Gigabytes (G). Supported units are Megabytes (M), Gigabytes (G) and Terabytes (T). Appending the (B) suffix for bytes is also supported: MB, GB, TB.
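
For example, the 2TB lower bound used later on this page could be written with either unit suffix (assuming, as stated above, that the trailing B makes no difference):

# both express 'at least 2 terabytes'
size: '2T:'
size: '2TB:'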

Equality Matcher:

# is the drive rotating or not (SSDs and NVMEs don't rotate)
rotates: 0

# if present, limit the number of matching drives to this number.
limit: 10
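
Combining the two, a hypothetical group that uses at most ten non-rotating drives as standalone data devices could look like this (the group name is a placeholder):

drive_group_ssd_limited:
  target: '*'
  data_devices:
    rotates: 0
    limit: 10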

The device_spec for data_devices may also simply be all instead of a YAML structure. This offers a convenient way to deploy a node using all available drives as standalone OSDs.

All Matcher:

all: true
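
Put into a full, minimal drive group, this shorthand consumes every available drive for standalone OSDs; a sketch, reusing the default group name from the specification above:

drive_group_default:
  target: '*'
  data_devices:
    all: true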

This new structure is proposed to serve as a declarative way to specify OSD deployments. On a per-host basis, OSD deployments are defined by the list of devices and their intended use (data, wal, db or journal) and a list of flags for the deployment tool (ceph-volume in this case). The Drive Group specification (dg) is intended to be created manually by a user and specifies a group of OSDs that are interrelated (hybrid OSDs that are deployed on solid-state drives and spinners) or share the same deployment options (identical standalone OSDs, i.e. same objectstore, same encryption option, ...). To avoid explicitly listing devices, we rely on a list of filter items. These correspond to a few selected fields of ceph-volume inventory reports. In the simplest case this could be the rotational flag (all solid-state drives become db_devices, all rotating ones data_devices), or something more involved like model strings, sizes or others. DeepSea will provide code that translates these drive groups into actual device lists for inspection by the user.

Example Drive Group Files

2 Nodes with the same setup:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

This is a common setup and can be described quite easily:

The simple case

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX

This is a simple and valid configuration, but it may not be future-proof. The user may add disks from different vendors later, which would not be matched by this configuration.

We can improve it by filtering on core properties of the drives:

drive_group_default:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    rotates: 0

Now all rotating devices are declared as 'data devices', and all non-rotating devices are used as shared devices (wal, db).

If you know that drives larger than 2TB will always be the slower data devices, you can also filter by size:

drive_group_default:
  target: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'

Forcing encryption on your OSDs is as simple as appending 'encrypted: True' to the drive group specification.

drive_group_default:
  target: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'
  encrypted: True

This was a rather simple setup. Following this approach you can also describe more sophisticated setups.

The advanced case

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 12 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we have two distinct setups:

20 HDDs should share 2 SSDs.

10 SSDs should share 2 NVMes.

This can be described with two layouts.

drive_group:
  target: '*'
  data_devices:
    rotates: 1
  db_devices:
    model: MC-55-44-ZX
  db_slots: 10 # How many OSDs per DB device

Setting db_slots: 10 ensures that only two of the twelve SSDs are used as DB devices (20 HDDs / 10 OSDs per DB device = 2 SSDs), leaving 10 SSDs for the next group.

followed by

drive_group_default:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    vendor: Samsung
    size: 256GB
  db_slots: 5 # How many OSDs per DB device

The advanced case (with non-uniform nodes)

The examples above assumed that all nodes have the same drives. That's however not always the case. Example:

Node1-5:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

Node6-10:

  • 5 NVMEs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 20 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

You can use the 'target' key in the layout to target certain nodes. Salt's targeting notation helps to keep things simple.

drive_group_node_one_to_five:
  target: 'node[1-5]'
  data_devices:
    rotates: 1
  db_devices:
    rotates: 0

followed by:

drive_group_node_six_to_ten:
  target: 'node[6-10]'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: SSD-123-foo

The expert case

All previous cases co-located the WALs with the DBs. It's however possible to deploy the WAL on a dedicated device as well (if it makes sense).

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 10
  wal_slots: 10

The very unlikely (but possible) case

Neither Ceph, DeepSea nor ceph-volume prevents you from making questionable decisions.

  • 23 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 10 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 1 NVMe
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we are trying to define:

  • 20 HDDs backed by 1 NVMe
  • 2 HDDs backed by 1 SSD (db) and 1 NVMe (wal)
  • 8 SSDs backed by 1 NVMe
  • 2 SSDs standalone (encrypted)
  • 1 HDD is spare and should not be deployed

drive_group_hdd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 20
drive_group_hdd_ssd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 2
  wal_slots: 2
drive_group_ssd_nvme:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 8
drive_group_ssd_standalone_encrypted:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  encrypted: True

One HDD will remain unused, because the file is parsed from top to bottom and the db_slots (the former ratios) are strictly enforced: the first two groups consume 20 and 2 of the 23 HDDs, leaving one spare.