checksum error with btrfs (lru+wb, fifo+ro too) #20

Open
d-a-v opened this issue Jul 24, 2017 · 13 comments

@d-a-v

d-a-v commented Jul 24, 2017

Hello,
On Ubuntu 16.04 (kernel 4.10, 32GiB RAM), with a 64GB SSD partition as cache
and a btrfs filesystem on a spinning hard drive,
running bonnie++ (creating 64GiB of data) to check performance, I get these errors:

[ 1004.048598] BTRFS warning (device sdb3): csum failed ino 258 off 66056749056 csum 1995617934 expected csum 3912343800
[ 1055.395836] BTRFS warning (device sdb3): csum failed ino 258 off 1221169152 csum 1152196041 expected csum 3153844041
[ 1073.145194] BTRFS warning (device sdb3): csum failed ino 258 off 1963663360 csum 2858193683 expected csum 1022787776
[ 1074.487867] BTRFS warning (device sdb3): csum failed ino 258 off 2022420480 csum 3031311257 expected csum 355173908
[ 1075.927695] BTRFS warning (device sdb3): csum failed ino 258 off 2076913664 csum 1105526511 expected csum 2993740284
...
[ 1295.325269] BTRFS warning (device sdb3): csum failed ino 258 off 10913980416 csum 2406927829 expected csum 1599545349
[ 1300.186300] __readpage_endio_check: 8419 callbacks suppressed
[ 1300.186303] BTRFS warning (device sdb3): csum failed ino 258 off 10913980416 csum 2406927829 expected csum 1599545349
...

...

# dmesg | grep 'sdb3.*csum failed' | wc -l
230
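
The kernel rate-limits these warnings (see the `callbacks suppressed` line), and the same offset can be reported more than once, so the raw line count is only a rough indicator. A minimal sketch that counts distinct (inode, offset) pairs instead, assuming the log format shown above with each warning on a single line; `count_distinct_csum_failures` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/bash
# Count distinct (inode, offset) pairs among btrfs "csum failed" warnings.
# Reads kernel log text on stdin; assumes one warning per line, in the
# format shown in this report.
count_distinct_csum_failures() {
    awk '
        /BTRFS warning/ && /csum failed/ {
            ino = ""; off = ""
            # The value follows the "ino" and "off" keywords.
            for (i = 1; i < NF; i++) {
                if ($i == "ino") ino = $(i + 1)
                if ($i == "off") off = $(i + 1)
            }
            if (ino != "" && off != "" && !((ino, off) in seen)) {
                seen[ino, off] = 1
                n++
            }
        }
        END { print n + 0 }
    '
}

# Usage (as root): dmesg | count_distinct_csum_failures
```

Suppressed messages never reach the log, so this is still a lower bound on the number of failed reads.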

and bonnie complains about corrupted files.

@d-a-v
Author

d-a-v commented Jul 24, 2017

configuration: lru + write-back.

@d-a-v d-a-v changed the title checksum error with btrfs checksum error with btrfs (lru+wb) Jul 24, 2017
@d-a-v
Author

d-a-v commented Jul 25, 2017

I ran further tests:

  • no error without enhanceio (spinning disk alone)
  • same error with (wb,fifo) (wt,fifo) (ro,fifo)

I guess (ro,fifo) is easier to debug.
I'd be happy to provide more relevant information or run more tests.

@d-a-v
Author

d-a-v commented Jul 27, 2017

Here is the stats log around an error. The three columns were sampled at T, T+10s and T+20s. The error (btrfs checksum failure) happens between columns 2 and 3. T is about 4000s after the start of the test (bonnie). eio's error file reports nothing (all counters are 0).

The configuration is (ro, fifo) with a 4GB zram block device as the SSD, and bonnie continuously restarting its tests with 14GB of data (bonnie -r 14000).
The host's memory was tested with memtest86 and the hard drive with smartctl; the drive works fine without eio.
The hard drive partition is empty before starting eio and bonnie.

src_name   /dev/sdb3
ssd_name   /dev/zram0
src_size   808998960
ssd_size   972544
set_size          256
block_size       4096
mode                2
eviction            1
num_sets         3799
num_blocks     972544
metadata        large
state        normal
flags      0x00000020
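
As a quick sanity check of this dump, the geometry is self-consistent. The sketch below copies the values from the config above and assumes that `num_blocks` counts blocks of `block_size` bytes, which the dump does not state explicitly:

```shell
#!/bin/bash
# Values copied from the config dump in this comment.
set_size=256
block_size=4096
num_sets=3799
num_blocks=972544

# Each set holds set_size blocks, so num_sets * set_size should match num_blocks.
echo "blocks from sets: $((num_sets * set_size))"   # prints 972544

# Cache capacity in bytes, assuming num_blocks is counted in block_size units.
echo "cache bytes: $((num_blocks * block_size))"    # prints 3983540224
```

The result, about 3.98 GB, is slightly below the 4GB zram device. The remainder is presumably taken by the cache's own metadata, but that is an assumption and not something the dump confirms.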

stats at T, T+10s, T+20s:

reads                     2864       2864       4134032
writes                    977538488  979759640  981952584
read_hits                 16         16         4121752
read_hit_pct              0          0          99
write_hits                0          0          0
write_hit_pct             0          0          0
dirty_write_hits          0          0          0
dirty_write_hit_pct       0          0          0
cached_blocks             267        267        1339
rd_replace                0          0          0
wr_replace                0          0          0
noroom                    0          0          107
cleanings                 0          0          0
md_write_dirty            0          0          0
md_write_clean            0          0          0
md_ssd_writes             0          0          0
do_clean                  0          0          0
nr_blocks                 972544     972544     972544
nr_dirty                  0          0          0
nr_sets                   3799       3799       3799
clean_index               0          0          0
uncached_reads            29         29         38
uncached_writes           509497     510592     511801
uncached_map_size         508895     509990     511198
uncached_map_uncacheable  602        602        603
disk_reads                2848       2848       12280
disk_writes               977538488  979759640  981952584
ssd_reads                 16         16         4121760
ssd_writes                2848       2848       11424
ssd_readfills             2848       2848       11424
ssd_readfill_unplugs      29         29         38
readdisk                  29         29         38
writedisk                 29         29         38
readcache                 2          2          515220
readfill                  356        356        1428
writecache                356        356        1428
readcount                 31         31         515258
writecount                509497     510592     511801
kb_reads                  1432       1432       2067020
kb_writes                 488769244  489879820  490976292
rdtime_ms                 184        184        3672
wrtime_ms                 639811952  641276356  642510548
unaligned_ios             0          0          0
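
To see at a glance which counters moved between two samples, the snapshots can be diffed. This is a minimal sketch, assuming the stats file holds one `name value` pair per line; `changed_counters` is a hypothetical helper name:

```shell
#!/bin/bash
# Print counters whose value differs between two snapshots of
# /proc/enhanceio/<cache>/stats (assumed: one "name value" pair per line).
# Output columns: name, old value, new value, signed difference.
changed_counters() {
    awk '
        NR == FNR { before[$1] = $2; next }       # first file: remember values
        ($1 in before) && before[$1] != $2 {      # second file: report changes
            printf "%-26s %12s %12s %+.0f\n", $1, before[$1], $2, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage, assuming the cache is named "test" as in this report:
#   cat /proc/enhanceio/test/stats > /tmp/stats.0
#   sleep 10
#   cat /proc/enhanceio/test/stats > /tmp/stats.1
#   changed_counters /tmp/stats.0 /tmp/stats.1
```

In the table above, the counters that jump between T+10s and T+20s are the read-side ones (reads, read_hits, ssd_reads, cached_blocks, noroom). That fits the checksum failure appearing once reads start being served from the cache, though the numbers alone do not prove it.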

I'm going to provide some more (hopefully useful) data.
Is any of this of interest to anyone?

@d-a-v d-a-v changed the title checksum error with btrfs (lru+wb) checksum error with btrfs (lru+wb, fifo+ro too) Jul 27, 2017
@lanconnected
Owner

Hi! Thanks for testing! I'll have a look at the issue once I'm back from vacation. As always, it is very helpful to have precise steps to reproduce the issue, i.e. the commands for cache creation, filesystem creation, the bonnie test, etc.

@d-a-v
Author

d-a-v commented Jul 28, 2017

For the record: same error happens with kernel 4.4.

@lanconnected Hi, thanks!

To reproduce the issue I use:

  • the currently opened pull request (mount/umount hooks)
  • the following line in /etc/fstab (set your own UUID or /dev/sdX[N], and adjust eiozramsize= to your host: I have 32GiB and ask for 4GB; or use an SSD with eiodev=/dev/sdY[M]):
/dev/disk/by-uuid/7b7e61d9-afec-4ace-b1d2-e0185d8ec537  /test     eio     noauto,defaults,helper=eio,eiozramsize=4000000000,eiopol=fifo,eiomode=ro,eioblksz=4096,eioname=test 0 0
  • a btrfs filesystem (at least 2 * RAM size, for bonnie; btrfs because of its integrated CRC checking) on a spinning drive, whose UUID is the first column of the fstab line
    (do not use a btrfs subvolume with enhanceio, it will break coherency)
  • the commands:
# cd /; mkdir -p /test; mount /test && ( cd /test && bonnie -u 0 )

The mount/umount hooks do not really work on CentOS 7 at the moment, because zramctl is not available there. I will replace the zram commands with something compatible with all OSes. In the meantime, let me know if you need the exact walkthrough with regular eio_cli commands.

@d-a-v
Author

d-a-v commented Aug 17, 2017

Hi,

Here are simple steps to reproduce the bug: only one unused partition is needed (here sdc1).
In the example, sdc1's FS is btrfs (with native crc check), of size greater than 2* host's ram.
bonnie/bonnie++ must be installed.
zramctl (unavailable in CentOS) is not needed.

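#!/bin/bash

device=/dev/sdc1        # eio's HD (btrfs, larger than 2x the host's RAM)
cache=/dev/zram0        # eio's ssd
cache_gb=1              # zram size used as eio's SSD

set -x
set -e

# create a zram block device that stands in for the SSD
modprobe zram
echo $((cache_gb * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize

# read-only cache with FIFO eviction, named "test"
eio_cli create -d "$device" -s "$cache" -p fifo -m ro -c test

more /proc/enhanceio/test/{config,stats}

# run bonnie on the cached device; csum errors then show up in dmesg
mount "$device" /mnt
cd /mnt
bonnie -u 0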
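
For repeated runs it helps to undo the setup between attempts. A minimal teardown sketch for the script above, assuming `eio_cli delete -c <name>` removes the cache and that the zram device is reset through its sysfs `reset` attribute; `run` and `teardown` are hypothetical helper names, and `DRY_RUN=1` only prints the commands:

```shell
#!/bin/bash
# Tear down the test setup created by the reproduction script.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: mount point, cache name, zram device name (e.g. zram0).
teardown() {
    run umount "$1"
    run eio_cli delete -c "$2"
    # Writing 1 to the reset attribute frees the zram device's memory.
    run sh -c "echo 1 > /sys/block/$3/reset"
}

# Usage (as root, after leaving the mount point with "cd /"):
#   teardown /mnt test zram0
```

Unmounting first keeps the filesystem from being accessed both through and around the cache while it is being removed.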

@d-a-v
Author

d-a-v commented Sep 18, 2017

@lanconnected Were you able to reproduce this issue? Is there a way I can help?

@lanconnected
Owner

Sorry for the delay, I'll get to it this week.

@dmytroleonenko

any updates? I'm not worried about btrfs but rather about data consistency

@libgradev

libgradev commented Apr 3, 2020

Tried this yesterday, still broken. Arch 5.5.13 kernel, EnhanceIO from git.

Default cache options in WriteThrough mode led to a ton of CSUM errors on the underlying BTRFS partition.

@hradec

hradec commented Mar 26, 2021

Hi there. I'm having the same issues with BTRFS and eio in read-only mode. Is anyone working on it, or has it been stale since last year?

@Ristovski

@hradec: Safe to assume it has been abandoned for well over a year. EnhanceIO does not even work on newer kernels anymore, due to a breaking change in the block IO subsystem.

@hradec

hradec commented Jun 27, 2021

@Ristovski: That's sad news! I really loved the simplicity and flexibility of EnhanceIO! Being able to add/remove ssd/nvme caching to a filesystem on the fly is an amazing feature!

If I had enough kernel IO knowledge, I would try to help keep this alive... but unfortunately I'm not there yet.

6 participants