Benchmarking Zoned Block Devices

The Flexible I/O Tester (fio) was originally written as a test tool for the kernel block I/O stack. Over the years, however, fio gained many features and detailed performance statistics output that turned this tool into a standard benchmark application for storage devices.

The fio source code is available on GitHub.

fio Zoned Block Device Support

Support for zoned block devices was added to fio with version 3.9. Earlier versions do not provide guarantees for write command ordering with host-managed zoned block devices. Executing workloads with these versions is still possible, but requires writing complex fio scripts.
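
The installed version can be checked with the fio --version option; the example outputs shown throughout this document were obtained with fio 3.13.

# fio --version
fio-3.13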

Command Line Options

The changes adding zoned block device support to fio include several new options that allow a user to control zoned block device compliant workloads. fio already implemented the --zonemode option, which allows defining workloads operating on disjoint ranges of blocks. This option was reused to define the new zbd zone mode.

When the zbd zone mode is used by an fio job, the --zonerange option is ignored and the --zonesize option is automatically set to the device zone size. Furthermore, the behavior of read and write commands is modified as follows.

  • Read and write commands are split when a zone boundary is crossed.

  • For sequential writes, the write stream is always started from a zone write pointer position. If the next zone to be written is not empty, the write stream "jumps" to that zone write pointer and resumes.

  • For random write workloads, write commands are always issued at the write pointer position of the target zone of the write command.

  • Any write command targeting a sequential zone that is full (entirely written) will trigger a reset of the zone write pointer before issuing the write I/O.

  • By default, all read commands target written sectors, that is, sectors between the start sector and the write pointer position of sequential write zones. This behavior can be disabled using the new --read_beyond_wp option, allowing read commands to be issued to any sector.

Additionally, finer control over workload execution is possible with the following new options.

  • --max_open_zones This option limits the number of zones that are concurrently being written by a workload. With this option, a random write workload cannot issue write commands targeting more zones than the set limit. Once a zone being written becomes full, another zone is chosen and writes are allowed to target it, keeping the number of zones being written at most equal to the max_open_zones limit at all times.

  • --zone_reset_threshold and --zone_reset_frequency These two options allow a user to emulate an application periodically issuing zone reset commands (a combined example follows this list).
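
As an illustration, the command below sketches a random write workload combining these three options. The option values are arbitrary, and /dev/sdd and the offset refer to the example disk used in the next section.

# fio --name=zbd-limit --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --max_open_zones=32 \
      --zone_reset_threshold=0.8 --zone_reset_frequency=1 \
      --ioengine=libaio --iodepth=8 --rw=randwrite --bs=256K --runtime=30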

In addition to these options, the zbd zone mode automatically enables job synchronization to ensure that a workload spanning multiple threads or processes can safely execute write I/Os targeting the same zone.

Restrictions

As discussed in the kernel support document, using direct write I/O is mandatory for zoned block devices. The zbd zone mode, when enabled, enforces this requirement by checking that the option --direct=1 is specified for any job executing write I/Os.

The --offset, --size and --zonesize options must specify values that are aligned to zone boundaries.
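
For instance, with the 256 MiB (268435456 B) zone size of the example disk used below, the zone alignment of a byte offset can be quickly verified with shell arithmetic: a zone-aligned value modulo the zone size must be zero.

# echo $((140660178944 % 268435456))
0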

Examples

This section provides various examples showing how to use fio's new zbd zone mode. In all examples, a 15 TB ZAC host-managed SATA disk is used. The disk zone size is 256 MiB, with 524 conventional zones starting at offset 0. The first sequential write required zone of the disk starts at sector 274726912 (512 B sector unit), that is, at byte offset 140660178944.
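
This byte offset follows directly from the disk geometry: the 524 conventional zones of 256 MiB each occupy 524 * 268435456 B, which matches the start sector of the first sequential zone converted to bytes.

# echo $((274726912 * 512))
140660178944
# echo $((524 * 268435456))
140660178944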

Sequential Write Workload

The following command sequentially writes the first 4 sequential zones of the disk using the libaio I/O engine with a queue depth of 8.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G \
      --ioengine=libaio --iodepth=8 --rw=write --bs=256K
zbc: (g=0): rw=write, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=239MiB/s][w=955 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=1): err= 0: pid=4124: Fri May 24 11:49:18 2019
  write: IOPS=938, BW=235MiB/s (246MB/s)(1024MiB/4365msec); 0 zone resets
    slat (nsec): min=5930, max=39068, avg=9729.06, stdev=2048.99
    clat (usec): min=783, max=55846, avg=8511.40, stdev=4079.36
     lat (usec): min=809, max=55854, avg=8521.19, stdev=4079.36
    clat percentiles (usec):
     |  1.00th=[ 3884],  5.00th=[ 7701], 10.00th=[ 8094], 20.00th=[ 8225],
     | 30.00th=[ 8225], 40.00th=[ 8225], 50.00th=[ 8225], 60.00th=[ 8291],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8356], 95.00th=[ 8356],
     | 99.00th=[30540], 99.50th=[45351], 99.90th=[55837], 99.95th=[55837],
     | 99.99th=[55837]
   bw (  KiB/s): min=224830, max=249357, per=99.49%, avg=239009.50, stdev=9032.95, samples=8
   iops        : min=  878, max=  974, avg=933.50, stdev=35.36, samples=8
  lat (usec)   : 1000=0.02%
  lat (msec)   : 4=2.91%, 10=95.70%, 20=0.20%, 50=0.78%, 100=0.39%
  cpu          : usr=0.73%, sys=0.64%, ctx=1045, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4096,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=235MiB/s (246MB/s), 235MiB/s-235MiB/s (246MB/s-246MB/s), io=1024MiB (1074MB), run=4365-4365msec

Disk stats (read/write):
  sdd: ios=0/984, merge=0/2943, ticks=0/8365, in_queue=7383, util=24.13%

The first four sequential write zones of the disk are now full.

# zbc_report_zones /dev/sdd
Device /dev/sdd:
    Vendor ID: ATA xxxx xxxxxxxxxxx xxxx
    Zoned block device interface, Host-managed zone model
    29297213440 512-bytes sectors
    3662151680 logical blocks of 4096 B
    3662151680 physical blocks of 4096 B
    15000.173 GB capacity
    Read commands are unrestricted
    Maximum number of open sequential write required zones: 128
    55880 zones from 0, reporting option 0x00
55880 / 55880 zones:
...
Zone 00524: type 0x2 (Sequential-write-required), cond 0xe (Full), reset recommended 0, non_seq 0, sector 274726912, 524288 sectors, wp 275251200
Zone 00525: type 0x2 (Sequential-write-required), cond 0xe (Full), reset recommended 0, non_seq 0, sector 275251200, 524288 sectors, wp 275775488
Zone 00526: type 0x2 (Sequential-write-required), cond 0xe (Full), reset recommended 0, non_seq 0, sector 275775488, 524288 sectors, wp 276299776
Zone 00527: type 0x2 (Sequential-write-required), cond 0xe (Full), reset recommended 0, non_seq 0, sector 276299776, 524288 sectors, wp 276824064
Zone 00528: type 0x2 (Sequential-write-required), cond 0x1 (Empty), reset recommended 0, non_seq 0, sector 276824064, 524288 sectors, wp 276824064
...

With the disk in this state, executing the same command again without the zbd zone mode enabled causes fio to attempt writing to full zones, resulting in I/O errors.

# fio --name=zbc --filename=/dev/sdd --direct=1 \
      --offset=140660178944 --size=1G \
      --ioengine=libaio --iodepth=8 --rw=write --bs=256K
zbc: (g=0): rw=write, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process
fio: io_u error on file /dev/sdd: Remote I/O error: write offset=140660178944, buflen=262144
fio: pid=4206, err=121/file:io_u.c:1791, func=io_u error, error=Remote I/O error

zbc: (groupid=0, jobs=1): err=121 (file:io_u.c:1791, func=io_u error, error=Remote I/O error): pid=4206: Fri May 24 12:34:27 2019
  cpu          : usr=1.22%, sys=0.00%, ctx=3, majf=0, minf=16
  IO depths    : 1=12.5%, 2=25.0%, 4=50.0%, 8=12.5%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,8,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):

Disk stats (read/write):
  sdd: ios=48/3, merge=0/5, ticks=13/35, in_queue=32, util=5.39%

With the zbd zone mode enabled, the same command executed again with the zones full succeeds.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G \
      --ioengine=libaio --iodepth=8 --rw=write --bs=256K
zbc: (g=0): rw=write, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=243MiB/s][w=973 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=1): err= 0: pid=4220: Fri May 24 12:37:29 2019
  write: IOPS=949, BW=237MiB/s (249MB/s)(1024MiB/4316msec); 4 zone resets
    slat (nsec): min=5937, max=40055, avg=9651.92, stdev=2104.34
    clat (usec): min=795, max=36562, avg=8243.26, stdev=2031.55
     lat (usec): min=818, max=36571, avg=8252.96, stdev=2031.50
    clat percentiles (usec):
     |  1.00th=[ 3884],  5.00th=[ 7701], 10.00th=[ 8160], 20.00th=[ 8225],
     | 30.00th=[ 8225], 40.00th=[ 8225], 50.00th=[ 8225], 60.00th=[ 8291],
     | 70.00th=[ 8291], 80.00th=[ 8356], 90.00th=[ 8356], 95.00th=[ 8356],
     | 99.00th=[10159], 99.50th=[27919], 99.90th=[33817], 99.95th=[36439],
     | 99.99th=[36439]
   bw (  KiB/s): min=234538, max=249357, per=99.93%, avg=242777.88, stdev=5032.30, samples=8
   iops        : min=  916, max=  974, avg=948.25, stdev=19.70, samples=8
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.07%, 4=2.78%, 10=96.02%, 20=0.51%, 50=0.59%
  cpu          : usr=1.27%, sys=0.67%, ctx=1051, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.2%, 4=0.4%, 8=99.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4096,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=237MiB/s (249MB/s), 237MiB/s-237MiB/s (249MB/s-249MB/s), io=1024MiB (1074MB), run=4316-4316msec

Disk stats (read/write):
  sdd: ios=4/998, merge=0/2985, ticks=81/8205, in_queue=7286, util=24.38%

Note that the fio output in this case indicates the number of zones that were reset prior to writing (4 zone resets).

Sequential Read Workload

With the disk's previous state preserved (the first four sequential write zones full), the previous command can be changed to read operations and will succeed in accessing the written zones.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G \
      --ioengine=libaio --iodepth=8 --rw=read --bs=256K
zbc: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=243MiB/s][r=970 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=1): err= 0: pid=4236: Fri May 24 12:40:18 2019
  read: IOPS=951, BW=238MiB/s (249MB/s)(1024MiB/4304msec)
    slat (usec): min=5, max=148, avg= 6.62, stdev= 5.68
    clat (usec): min=976, max=39536, avg=8394.44, stdev=1933.57
     lat (usec): min=1125, max=39543, avg=8401.12, stdev=1934.43
    clat percentiles (usec):
     |  1.00th=[ 6390],  5.00th=[ 6718], 10.00th=[ 6915], 20.00th=[ 7439],
     | 30.00th=[ 8094], 40.00th=[ 8291], 50.00th=[ 8356], 60.00th=[ 8356],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 9765], 95.00th=[10290],
     | 99.00th=[14615], 99.50th=[25560], 99.90th=[29492], 99.95th=[39584],
     | 99.99th=[39584]
   bw (  KiB/s): min=223808, max=249868, per=99.57%, avg=242586.38, stdev=8265.29, samples=8
   iops        : min=  874, max=  976, avg=947.50, stdev=32.35, samples=8
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.05%, 4=0.05%, 10=92.53%, 20=6.81%, 50=0.54%
  cpu          : usr=0.40%, sys=0.72%, ctx=4113, majf=0, minf=522
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4096,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=238MiB/s (249MB/s), 238MiB/s-238MiB/s (249MB/s-249MB/s), io=1024MiB (1074MB), run=4304-4304msec

Disk stats (read/write):
  sdd: ios=4031/0, merge=0/0, ticks=33836/0, in_queue=29809, util=57.74%

If the zones are reset before executing this command, no read I/O will be executed, as fio will be unable to find zones with written sectors. The blkzone command below resets the four zones written previously (4 x 524288 = 2097152 sectors starting at sector 274726912).

# blkzone reset -o 274726912 -l 2097152 /dev/sdd
# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G \
      --ioengine=libaio --iodepth=8 --rw=read --bs=256K
zbc: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process


Run status group 0 (all jobs):

Disk stats (read/write):
  sdd: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Forcing the execution of read I/Os targeting empty zones can be done using the --read_beyond_wp option.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G --read_beyond_wp=1 \
      --ioengine=libaio --iodepth=8 --rw=read --bs=256K
zbc: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=8
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [R(1)][-.-%][r=353MiB/s][r=1412 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=1): err= 0: pid=4322: Fri May 24 12:46:32 2019
  read: IOPS=1411, BW=353MiB/s (370MB/s)(1024MiB/2901msec)
    slat (usec): min=5, max=147, avg= 6.50, stdev= 5.72
    clat (usec): min=1978, max=8845, avg=5654.86, stdev=112.79
     lat (usec): min=2126, max=8851, avg=5661.41, stdev=111.86
    clat percentiles (usec):
     |  1.00th=[ 5604],  5.00th=[ 5604], 10.00th=[ 5604], 20.00th=[ 5669],
     | 30.00th=[ 5669], 40.00th=[ 5669], 50.00th=[ 5669], 60.00th=[ 5669],
     | 70.00th=[ 5669], 80.00th=[ 5669], 90.00th=[ 5669], 95.00th=[ 5735],
     | 99.00th=[ 5735], 99.50th=[ 5800], 99.90th=[ 7177], 99.95th=[ 7963],
     | 99.99th=[ 8848]
   bw (  KiB/s): min=360239, max=361261, per=99.81%, avg=360750.00, stdev=361.33, samples=5
   iops        : min= 1407, max= 1411, avg=1409.00, stdev= 1.41, samples=5
  lat (msec)   : 2=0.02%, 10=99.98%
  cpu          : usr=0.24%, sys=1.34%, ctx=4108, majf=0, minf=522
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4096,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=353MiB/s (370MB/s), 353MiB/s-353MiB/s (370MB/s-370MB/s), io=1024MiB (1074MB), run=2901-2901msec

Disk stats (read/write):
  sdd: ios=3869/0, merge=0/0, ticks=21868/0, in_queue=18002, util=81.20%

Note

The higher IOPS performance observed with this test compared to the previous one (IOPS=1411 vs. IOPS=951) results from the disk not physically executing any media access, as there is no data to read (no written sectors). The disk returns a fill pattern as data without seeking to the sectors specified by the read commands.

Random Read and Write Workloads

The following command randomly writes to the sequential write zones of the disk using 4 jobs, each operating at a queue depth of 4 (an overall queue depth of 16 for the disk). The run time is set to 30 seconds.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --numjobs=4 --group_reporting=1 --runtime=30 \
      --ioengine=libaio --iodepth=4 --rw=randwrite --bs=256K
zbc: (g=0): rw=randwrite, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=4
...
fio-3.13
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=33.0MiB/s][w=132 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=4): err= 0: pid=4425: Fri May 24 12:58:43 2019
  write: IOPS=160, BW=40.2MiB/s (42.2MB/s)(1209MiB/30064msec); 0 zone resets
    slat (nsec): min=7815, max=75710, avg=12304.05, stdev=3091.33
    clat (usec): min=1410, max=221285, avg=98856.71, stdev=29971.24
     lat (usec): min=1435, max=221295, avg=98869.07, stdev=29970.46
    clat percentiles (msec):
     |  1.00th=[   12],  5.00th=[   24], 10.00th=[   75], 20.00th=[   84],
     | 30.00th=[   89], 40.00th=[   94], 50.00th=[   99], 60.00th=[  105],
     | 70.00th=[  110], 80.00th=[  117], 90.00th=[  128], 95.00th=[  138],
     | 99.00th=[  194], 99.50th=[  203], 99.90th=[  218], 99.95th=[  220],
     | 99.99th=[  222]
   bw (  KiB/s): min=27092, max=138608, per=99.80%, avg=41096.13, stdev=3423.45, samples=240
   iops        : min=  103, max=  540, avg=159.05, stdev=13.38, samples=240
  lat (msec)   : 2=0.02%, 4=0.06%, 10=0.43%, 20=3.74%, 50=1.26%
  lat (msec)   : 100=47.27%, 250=47.21%
  cpu          : usr=0.09%, sys=0.15%, ctx=84537, majf=0, minf=261
  IO depths    : 1=0.1%, 2=0.2%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4836,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=40.2MiB/s (42.2MB/s), 40.2MiB/s-40.2MiB/s (42.2MB/s-42.2MB/s), io=1209MiB (1268MB), run=30064-30064msec

Disk stats (read/write):
  sdd: ios=0/4807, merge=0/0, ticks=0/474905, in_queue=470163, util=9.89%

zbc_report_zones can be used to explore the state of the disk at the end of this workload execution.

# zbc_report_zones -ro full -n /dev/sdd
Device /dev/sdd:
    Vendor ID: ATA xxxx xxxxxxxxxxx xxxx
    Zoned block device interface, Host-managed zone model
    29297213440 512-bytes sectors
    3662151680 logical blocks of 4096 B
    3662151680 physical blocks of 4096 B
    15000.173 GB capacity
    Read commands are unrestricted
    Maximum number of open sequential write required zones: 128
    0 zone from 0, reporting option 0x05
# zbc_report_zones -ro closed -n /dev/sdd
Device /dev/sdd:
    Vendor ID: ATA xxxx xxxxxxxxxxx xxxx
    Zoned block device interface, Host-managed zone model
    29297213440 512-bytes sectors
    3662151680 logical blocks of 4096 B
    3662151680 physical blocks of 4096 B
    15000.173 GB capacity
    Read commands are unrestricted
    Maximum number of open sequential write required zones: 128
    4498 zones from 0, reporting option 0x04
# zbc_report_zones -ro imp_open -n /dev/sdd
Device /dev/sdd:
    Vendor ID: ATA xxxx xxxxxxxxxxx xxxx
    Zoned block device interface, Host-managed zone model
    29297213440 512-bytes sectors
    3662151680 logical blocks of 4096 B
    3662151680 physical blocks of 4096 B
    15000.173 GB capacity
    Read commands are unrestricted
    Maximum number of open sequential write required zones: 128
    128 zones from 0, reporting option 0x02

This indicates that 4498+128=4626 zones were written to, with none of the sequential write zones fully written (no full zones). By switching the operation to read, the sectors written in this last test can be randomly read.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --numjobs=4 --group_reporting=1 --runtime=30 \
      --ioengine=libaio --iodepth=4 --rw=randread --bs=256K
...
fio-3.13
Starting 4 processes
Jobs: 4 (f=4): [r(4)][0.0%][r=31.0MiB/s][r=124 IOPS][eta 23d:02h:02m:24s]
zbc: (groupid=0, jobs=4): err= 0: pid=4494: Fri May 24 13:24:08 2019
  read: IOPS=118, BW=29.7MiB/s (31.1MB/s)(894MiB/30156msec)
    slat (usec): min=6, max=183, avg= 7.86, stdev= 8.75
    clat (usec): min=1252, max=1537.8k, avg=133931.63, stdev=123584.49
     lat (usec): min=1260, max=1537.8k, avg=133939.56, stdev=123584.42
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[   16], 10.00th=[   20], 20.00th=[   35],
     | 30.00th=[   53], 40.00th=[   72], 50.00th=[   96], 60.00th=[  127],
     | 70.00th=[  169], 80.00th=[  218], 90.00th=[  296], 95.00th=[  368],
     | 99.00th=[  558], 99.50th=[  625], 99.90th=[  869], 99.95th=[  919],
     | 99.99th=[ 1536]
   bw (  KiB/s): min=15855, max=50152, per=100.00%, avg=30364.05, stdev=1762.17, samples=240
   iops        : min=   60, max=  195, avg=117.22, stdev= 6.92, samples=240
  lat (msec)   : 2=0.22%, 4=0.06%, 10=1.40%, 20=8.44%, 50=18.26%
  lat (msec)   : 100=23.15%, 250=33.30%, 500=13.47%, 750=1.51%, 1000=0.17%
  cpu          : usr=0.06%, sys=0.12%, ctx=82815, majf=0, minf=1281
  IO depths    : 1=0.1%, 2=0.2%, 4=99.7%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3577,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=29.7MiB/s (31.1MB/s), 29.7MiB/s-29.7MiB/s (31.1MB/s-31.1MB/s), io=894MiB (938MB), run=30156-30156msec

Disk stats (read/write):
  sdd: ios=3565/0, merge=0/0, ticks=475774/0, in_queue=472274, util=11.93%

Resetting all sequential write zones of the disk and executing the random read workload again leads to results similar to the previous sequential read workload case, that is, no read I/O is executed.

# blkzone reset -o 274726912 /dev/sdd
# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --numjobs=4 --group_reporting=1 --runtime=30 \
      --ioengine=libaio --iodepth=4 --rw=randread --bs=256K
...
fio-3.13
Starting 4 processes


Run status group 0 (all jobs):

Disk stats (read/write):
  sdd: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Changing the read range to include the conventional zones of the disk leads to normal results, as conventional zones have no write pointer and can always be read.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=0 --size=140660178944 --numjobs=4 --group_reporting=1 \
      --runtime=30 --ioengine=libaio --iodepth=4 --rw=randread --bs=256K
zbc: (g=0): rw=randread, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=4
...
fio-3.13
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=58.0MiB/s][r=232 IOPS][eta 00m:00s]
zbc: (groupid=0, jobs=4): err= 0: pid=4570: Fri May 24 13:35:15 2019
  read: IOPS=215, BW=53.8MiB/s (56.4MB/s)(1617MiB/30079msec)
    slat (usec): min=6, max=143, avg= 7.29, stdev= 6.22
    clat (usec): min=1065, max=959229, avg=74313.80, stdev=83291.27
     lat (usec): min=1072, max=959237, avg=74321.15, stdev=83291.39
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[    9], 10.00th=[   12], 20.00th=[   17],
     | 30.00th=[   24], 40.00th=[   34], 50.00th=[   47], 60.00th=[   62],
     | 70.00th=[   82], 80.00th=[  113], 90.00th=[  174], 95.00th=[  239],
     | 99.00th=[  409], 99.50th=[  472], 99.90th=[  634], 99.95th=[  651],
     | 99.99th=[  961]
   bw (  KiB/s): min=27136, max=81920, per=99.97%, avg=55030.83, stdev=2685.56, samples=240
   iops        : min=  106, max=  320, avg=214.10, stdev=10.53, samples=240
  lat (msec)   : 2=0.02%, 10=8.23%, 20=16.22%, 50=28.40%, 100=23.61%
  lat (msec)   : 250=19.08%, 500=4.07%, 750=0.36%, 1000=0.03%
  cpu          : usr=0.02%, sys=0.07%, ctx=6483, majf=0, minf=1070
  IO depths    : 1=0.1%, 2=0.1%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=6468,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=53.8MiB/s (56.4MB/s), 53.8MiB/s-53.8MiB/s (56.4MB/s-56.4MB/s), io=1617MiB (1696MB), run=30079-30079msec

Disk stats (read/write):
  sdd: ios=6452/0, merge=0/0, ticks=479006/0, in_queue=472619, util=21.62%

Direct Access sg IO Engine

The SCSI generic direct access interface can also be used with the zbd zone mode, as long as the block device file (/dev/sdX) is used to specify the disk. The zbd zone mode will not be enabled if the SCSI generic node file (/dev/sgY) is used to specify the disk.

The example below illustrates the use of the sg I/O engine with 8 jobs executing a 64KB random write workload to sequential write zones.

# fio --name=zbc --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --offset=140660178944 --size=1G --numjobs=8 --group_reporting=1 \
      --ioengine=sg --rw=randwrite --bs=64K
...
fio-3.13
Starting 8 processes
zbc: (groupid=0, jobs=8): err= 0: pid=4792: Fri May 24 14:18:52 2019
  write: IOPS=2007, BW=125MiB/s (132MB/s)(8192MiB/65278msec); 32 zone resets
    clat (usec): min=148, max=327494, avg=1589.26, stdev=4060.13
     lat (usec): min=149, max=327497, avg=1590.55, stdev=4060.13
    clat percentiles (usec):
     |  1.00th=[   848],  5.00th=[   988], 10.00th=[  1090], 20.00th=[  1172],
     | 30.00th=[  1221], 40.00th=[  1287], 50.00th=[  1369], 60.00th=[  1434],
     | 70.00th=[  1500], 80.00th=[  1631], 90.00th=[  2180], 95.00th=[  2638],
     | 99.00th=[  3064], 99.50th=[  3392], 99.90th=[ 24773], 99.95th=[ 77071],
     | 99.99th=[291505]
   bw (  KiB/s): min=39001, max=178678, per=100.00%, avg=131457.71, stdev=3143.56, samples=1015
   iops        : min=  606, max= 2791, avg=2052.81, stdev=49.16, samples=1015
  lat (usec)   : 250=0.07%, 500=0.12%, 750=0.38%, 1000=4.89%
  lat (msec)   : 2=81.56%, 4=12.76%, 10=0.05%, 20=0.06%, 50=0.05%
  lat (msec)   : 100=0.03%, 250=0.02%, 500=0.01%
  cpu          : usr=0.14%, sys=0.20%, ctx=251460, majf=0, minf=139
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,131072,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=125MiB/s (132MB/s), 125MiB/s-125MiB/s (132MB/s-132MB/s), io=8192MiB (8590MB), run=65278-65278msec

Disk stats (read/write):
  sdd: ios=278/0, merge=0/0, ticks=1953/0, in_queue=1677, util=0.39%

Note

SCSI generic direct access bypasses the block layer I/O scheduler. For zoned block devices, this means that the deadline I/O scheduler zone write locking is unable to provide write command ordering guarantees. However, the zbd zone mode ensures mutual exclusion between jobs for write access to the same zone. Such synchronization is in essence identical to zone write locking and allows all write commands to execute without any error.

Zone Write Streams

A typical zoned block device compliant application will write zones sequentially until the zone is full, then switch to another zone and continue writing. Multiple threads may be operating in this manner, with each thread operating on a different zone.

Such typical behavior can be emulated using the --rw_sequencer option together with a number of I/O operations specified at the end of the --rw=randwrite argument. Below is an example script with 4 jobs sequentially writing zones until full using 512 KB write operations (that is, 512 I/Os per 256 MiB zone). The zones being written are chosen randomly within disjoint zone ranges for each job. This is controlled with the offset, size and rw arguments. The script file streams.fio implementing this workload is shown below.

#
# streams.fio: 4 write streams
#

[global]
ioengine=psync
direct=1
thread=1
bs=512K
continue_on_error=none
filename=/dev/sdd
group_reporting=1
zonemode=zbd

[stream1]
offset=140660178944
size=3714878275584
rw=randwrite:512
rw_sequencer=sequential
io_size=2G

[stream2]
offset=3855538454528
size=3714878275584
rw=randwrite:512
rw_sequencer=sequential
io_size=2G

[stream3]
offset=7570416730112
size=3714878275584
rw=randwrite:512
rw_sequencer=sequential
io_size=2G

[stream4]
offset=11285295005696
size=3714878275584
rw=randwrite:512
rw_sequencer=sequential
io_size=2G

The result of executing this script is shown below.

# fio streams.fio
stream1: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=psync, iodepth=1
stream2: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=psync, iodepth=1
stream3: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=psync, iodepth=1
stream4: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=psync, iodepth=1
fio-3.13
Starting 4 threads
Jobs: 1 (f=1): [_(3),w(1)][98.3%][w=159MiB/s][w=318 IOPS][eta 00m:01s]
stream1: (groupid=0, jobs=4): err= 0: pid=5161: Fri May 24 15:01:21 2019
  write: IOPS=285, BW=143MiB/s (150MB/s)(8192MiB/57362msec); 0 zone resets
    clat (usec): min=992, max=5788.2k, avg=12731.27, stdev=95315.03
     lat (usec): min=996, max=5788.2k, avg=12742.01, stdev=95315.03
    clat percentiles (usec):
     |  1.00th=[   1106],  5.00th=[   2024], 10.00th=[   2114],
     | 20.00th=[   2278], 30.00th=[   2769], 40.00th=[   3687],
     | 50.00th=[   3884], 60.00th=[   4047], 70.00th=[   4228],
     | 80.00th=[   4817], 90.00th=[   5342], 95.00th=[  50594],
     | 99.00th=[ 196084], 99.50th=[ 258999], 99.90th=[ 876610],
     | 99.95th=[2071987], 99.99th=[4731175]
   bw (  KiB/s): min= 9211, max=840925, per=100.00%, avg=185793.45, stdev=44491.80, samples=352
   iops        : min=   17, max= 1642, avg=361.42, stdev=86.97, samples=352
  lat (usec)   : 1000=0.23%
  lat (msec)   : 2=1.95%, 4=54.69%, 10=37.05%, 20=0.15%, 50=0.90%
  lat (msec)   : 100=2.34%, 250=2.16%, 500=0.42%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.04%, >=2000=0.06%
  cpu          : usr=0.15%, sys=0.12%, ctx=16505, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,16384,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=143MiB/s (150MB/s), 143MiB/s-143MiB/s (150MB/s-150MB/s), io=8192MiB (8590MB), run=57362-57362msec

Disk stats (read/write):
  sdd: ios=144/16333, merge=0/0, ticks=573/208225, in_queue=192356, util=28.64%
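
For quick experimentation, a single stream job from this script can also be expressed directly on the command line. The sketch below reuses the stream1 parameters from the script above (with a single job, group_reporting is unnecessary).

# fio --name=stream1 --filename=/dev/sdd --direct=1 --zonemode=zbd \
      --thread=1 --ioengine=psync --bs=512K --io_size=2G \
      --offset=140660178944 --size=3714878275584 \
      --rw=randwrite:512 --rw_sequencer=sequential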