Changes in version 2.5.0:
Summary:
- filesystem disaster recovery features
- new namespace management (new DB schema to properly handle hardlinks, renames...)
- scanning and changelog processing optimizations
- database optimizations (requests batching)
- many other changes, improvements and code cleaning...
Details:
- rbh-diff:
* new command to detect differences between the filesystem and the information
in robinhood database.
* option "--apply=fs" for disaster recovery purpose: restore the filesystem
metadata from robinhood DB.
* makes it possible to rebuild a Lustre MDT from scratch, or from an LVM snapshot
(see "Robinhood Lustre disaster recovery guide" for more details).
- database:
* new namespace implementation in database with new NAMES table (Cray contribution)
- fixes/improves hardlink support
- fixes/improves Lustre ChangeLog hardlink/rename/unlink support
- saves DB storage space
* database request batching: significantly increases database ingest rate.
No longer needs innodb_flush_log_at_tx_commit != 1 to speed up DB operations.
* additional information in DB that can help for disaster recovery:
symlink info, access rights, stripe object indexes, stripe order, nlink...
* set default commit behavior to transaction (prevents DB inconsistencies)
* optimized multi-table requests
* optimization: minimized attribute set in DB update operations
(don't update attributes that didn't change)
* Fix: deal with mysql case insensitivity for string matching
* triggers and stored procedures versioning mechanism
* prevent overflows for large INSERT requests, wide stripes...
* prevent DB deadlocks
- scanning:
* --partial-scan option is deprecated and replaced by an optional argument to --scan (e.g. --scan=/fs/subdir).
* better management of partial scans:
- better detection of removed entries vs. entries moved from one directory to another.
- partial scans can be used for initial DB population (even if the DB is initially empty).
* garbage collection of removed entries in DB is a long operation when terminating a scan (and even more so
when terminating a partial scan). Added --no-gc option to skip it (recommended for partial scans).
* automatically enables --no-gc if the DB is initially empty (e.g. for initial scan).
* optimization: use *at() functions (openat, fstatat) and readdir by chunk (using getdents) instead of POSIX lstat() and readdir_r().
* optimization: use NOATIME flag to access entries as much as possible
* optimizations of get_stripe and get_fid operations.
* new --diff option for robinhood --scan and --readlog: output detected changes in a diff-like format.
- Lustre changelogs:
* changelog batching (Cray contribution): to speed up changelog processing,
robinhood retains changelog records in memory a short time,
to aggregate similar/redundant Changelog records on the same entry before
updating its database.
* support multiple changelog readers (for DNE) as multiple threads (default)
or as multiple processes, possibly on different hosts, by giving a MDT index
to --readlog option.
* resilience to filesystem umount/mount.
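The changelog batching idea described above can be sketched in Python. This is a simplified illustration, not robinhood's code: the function name batch_records and the (entry_id, changed_attrs) record shape are hypothetical, and real changelog records (renames, unlinks...) carry ordering constraints that this sketch ignores.

```python
from collections import OrderedDict


def batch_records(records):
    """Merge records that target the same entry into one pending update.

    `records` is an iterable of (entry_id, changed_attrs) pairs, a
    simplified and hypothetical stand-in for Lustre changelog records.
    Returns one (entry_id, merged_attrs) pair per entry, in order of
    first appearance; later attribute values override earlier ones.
    """
    pending = OrderedDict()
    for entry_id, attrs in records:
        # Redundant records on the same entry collapse into one update.
        pending.setdefault(entry_id, {}).update(attrs)
    return list(pending.items())
```

With this shape, three records touching two entries result in two database updates instead of three.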
- rbh-report:
* new option --entry-info to get all the stored information about an entry
* option --dump-ost can now list multiple OSTs and support ranges notation (e.g. 3,5-8,12-23).
* --dump-ost output indicates if a file has data on a given OST (could be striped on the OST but have no data on it).
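The range notation accepted by --dump-ost (e.g. 3,5-8,12-23) can be illustrated with a minimal Python sketch. The helper name parse_ost_set is hypothetical and this is not robinhood's parser, which may accept or reject inputs differently; it only shows how such a string expands into a list of OST indexes.

```python
def parse_ost_set(spec):
    """Expand a string such as "3,5-8,12-23" into a sorted list of indexes.

    Hypothetical helper for illustration only.
    """
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty item in %r" % spec)
        if "-" in part:
            # "5-8" denotes the inclusive range 5, 6, 7, 8.
            low, high = part.split("-", 1)
            low, high = int(low), int(high)
            if low > high:
                raise ValueError("descending range %r" % part)
            indexes.update(range(low, high + 1))
        else:
            indexes.add(int(part))
    return sorted(indexes)
```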
- rbh-find:
* new option -crtime to filter entries on creation time.
* output ordering closer to find output
* added missing info in 'rbh-find -ls' output (nlink, mode, symlink info...)
- robinhood-backup:
* by default, use a built-in copy function to avoid the cost of forking copy commands.
* rbh-backup-rebind: tool to rebind an entry in the backend if its fid changed in the filesystem
for any reason (file copied to a new one to change its stripe, etc...)
* rbh-backup-recov new features and options:
--list (list information about entries to be recovered)
--ost <ost_set> to only recover entries for a given set of OSTs (support range notation):
the basic use-case is OST disaster recovery.
--since <time> to only recover entries modified since a given date:
the basic use case is after restoring an OST snapshot.
* symlinks archiving to backend made optional (new parameter 'archive_symlinks')
as they can now be restored using robinhood database information.
- configuration:
* can specify environment variables in config file (e.g. fs_path = $ROOT_DIR ;)
* prevent use of a wrong config file (Cray contribution):
- only check files in /etc/robinhood.d/<purpose>, no longer in the current directory
- fails if too many config files are available.
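As an illustration of environment variables in a config value (e.g. fs_path = $ROOT_DIR ;), the following Python sketch expands such references. The helper name expand_config_value is hypothetical, and failing on an undefined variable is an assumption of this sketch, not documented robinhood behavior.

```python
from string import Template


def expand_config_value(raw_value, environ):
    """Replace $NAME and ${NAME} references in a config value.

    Hypothetical helper: raises KeyError when a referenced variable is
    not defined in `environ`, rather than keeping the placeholder.
    """
    return Template(raw_value).substitute(environ)
```

Passing os.environ as the second argument would expand against the environment of the running process.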
Changes between version 2.4.2 and 2.4.3:
* [lustre] support of Lustre 2.4
- DNE not fully supported yet: if running multiple MDS,
run 1 instance of changelog reader per MDT.
- Detect file layout changes (new changelog record CL_LAYOUT).
* [lustre] added statistics about changelog processing speed.
* [policies] new parameter 'recheck_ignored_classes' to allow/avoid
rematching entries from ignored classes in migration and purge policies.
* [web ui] security patch to prevent SQL injection.
* [lustre] fix stack overflow when handling files with wide stripes.
* [DB] better handling of ER_QUERY_INTERRUPTED MySQL error.
* [DB] fixed DB connection leaks.
* Backup & HSM modes:
- [fix] fix segfault in import command when uid/gid can't be resolved.
- [rbh-report] fix bad display of total volume with -u or -g.
* Migration policy features and optimizations:
- [feature] new parameter 'lru_sort_attr' to select LRU sort criteria for policy application.
Previously based on last modification time, it can now be one of:
creation, last_archive, last_mod, last_access.
- [feature] special meaning for condition 'last_archive == 0':
matches entries that have never been archived.
- [feature] suspend migration if the copy error rate exceeds a threshold.
This is controlled by 'suspend_error_pct' and 'suspend_error_min' parameters.
- [stats] migration stats while migration is running: added skipped and error counters.
- [optim] avoid rechecking ignored entries at each pass
- [optim] smoother feeding of migration workers queue
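One possible reading of the 'suspend_error_pct' and 'suspend_error_min' parameters is sketched below in Python. The exact semantics are an assumption: this sketch treats suspend_error_min as a minimum number of failed copies before the percentage check applies, which may differ from what robinhood actually does. The function name should_suspend is hypothetical.

```python
def should_suspend(errors, attempts, suspend_error_pct, suspend_error_min):
    """Return True if migration should be suspended.

    Assumed semantics (not confirmed by the ChangeLog): suspend when at
    least `suspend_error_min` copies failed AND the failure percentage
    is strictly greater than `suspend_error_pct`.
    """
    if attempts == 0 or errors < suspend_error_min:
        return False
    return 100.0 * errors / attempts > suspend_error_pct
```

The minimum count keeps a single early failure (1 error out of 1 attempt, i.e. 100%) from suspending the whole run.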
* Code & environment:
- [build] can specify a path to alternative lustre source tree in ./configure
- [tests] allow specifying an alternative path to lfs command
Changes between version 2.4.1 and 2.4.2:
* [general] immediate exit on ctrl+C: don't process all queued operations, just finish current.
* [general] LSB compliance if daemon is already started.
* [DB] validation with MariaDB (replacement for MySQL in Fedora19).
* [config] can set default config file using RBH_CFG_DEFAULT environment variable.
* [config] more precise message if no config file is found.
* [rbh-find] added -not/-! option to rbh-find.
* [lustre] fix for 16 chars pool names.
* [bugfix] fixed memleak in rbh-find.
* [bugfix] fixed segfault if checking scan deadline occurred exactly when scan ended.
* [logs] display bandwidth and rate stats during migration run.
* [logs] fix: DB get operations were counted twice in stats.
* [cosmetic] removed "connection failed" warning for one shot commands.
* [cosmetic] fix typos in logs.
* [devel] port to automake 1.12 (since Fedora18).
Changes between version 2.4.0 and 2.4.1:
* [lustre] better file size change detection using CLOSE events
from MDT ChangeLog (requires Lustre 2.2 or later)
* [scan] optimization: using fstatat and getdents
* [rbh-find] added -atime/-amin options
* [logs] additional information in logs (DB operations, HSM_rm details)
* [logs] log to stderr if opening of the log file fails
* [fix] scan blocked if final DB operation failed
* [fix] avoid DB lock exhaustion for huge requests
* [backup] manage cross device rename in backend
* [backup] rebind an entry in backend after fid change (e.g. restripe)
New features in robinhood 2.4:
* rbh-du and rbh-find: "du" and "find" clones querying robinhood's database
Faster way to search for entries in a filesystem!
Performance comparison for a 1 million entries Lustre v2 filesystem
find /lustre -user foo -type f -size -32M -ls
(no possible criteria on OST index)
> 58m13s
lfs find /lustre -user foo -type f --obd lustre-OST0001
(no possible criteria on size)
> 20m46s
rbh-find /lustre -user foo -type f -size -32M -ost 1 -ls
> 1.2s
* Directory reporting: top directories per dirent count, per avg file size
(useful for small file hunting)
* File size profiling: global, per user, per group, per fileclass...
(+additional section in webUI)
* Sorting user/groups by size range (e.g. percentage of files < 1G)
(useful for small file hunting)
* Partial scans to update only a subset of the filesystem.
=> allow distributed scans by splitting the namespace into
partial scans running on multiple clients.
Other changes in 2.4.0:
* [packaging] rpm name 'robinhood-tmp_fs_mgr' changed to 'robinhood-tmpfs'
* [packaging] 'rbh-config' command moved to new RPM 'robinhood-adm'
* [report] refurbished rbh-report output format
* [policies] new criteria on file creation time
* [database] use innodb by default for MySQL engine
* [system] ability to detect "fake mtime" (mtime != actual modification time)
* [system] improved filesystem detection,
using fsname or devid as FS identifier (config driven)
* [scan] can trigger external completion command when a scan ends
* [misc.] can use short config name instead of full path
(e.g. "-f <name>" instead of "-f /full/path/to/name.conf")
* [backup] directory and symlink recovery
* [lustre] port to Lustre 2.2 and 2.3
* [lustre] support for new Changelog record struct (lu-1331)
* [fix] max_rm_count=0 resulted in no rm (instead of unlimited)
* [fix] segfault in realpath() on Ubuntu
* [fix] unsigned arithmetic issue with MySQL 5.5
Changes between version 2.3.3 and 2.3.4:
- Faster and safer shutdown on SIGINT/SIGTERM
- Can use short config name instead of full config file path.
E.g. "-f myconf" instead of "-f /etc/robinhood.d/tmp_fs/myconf.cfg"
- Consider all non-dirs for classinfo (instead of files only)
- Implemented max_rm_count in hsm remove policy
- clearer messages about DB connection and retries
- added lu543 configure option (must be enabled if this patch
is integrated into your Lustre distribution)
- Better block counting for purges
- backup/shook modes:
- import of existing files from backend
- entry state set to 'archive_running' during migration
- recovery for entries with 'release pending' or 'restore running' state on startup
(new parameter: check_purge_status_on_startup)
- enable DB rebuild if it is lost
- fix: symlink recovery
- improvements of rbhext_tool_clnt/svr (timeout, traces, ...)
- user.shook_state xattr changed to security.shook_state
(to prevent users from changing it)
- Fix: Don't consider 'released' entries for quota-like purge triggers
- Fix: migrate-group did migrate user
- Generate up-to-date template automatically at RPM installation
- 72 new regression tests (all policies conditions and config file parameters are tested)
Changes between version 2.3.2 and 2.3.3:
[webgui]
- added FS name to page title and page header
- added missing file in RPM (.htaccess)
[reports]
- new options for top-users/top-groups: --by-avgsize, --count-min, --reverse
- Lustre changelog stats in 'rbh-report -a'
[policies]
- fix: 'tree' condition must match root entry
- fix: migration class matching at scan time
[config]
- simpler parameter 'scan_interval'
- fix: don't reload config of disabled modules on SIGHUP
- fix: on SIGHUP, don't reload parameters specified on cmd line
[database]
- retry on connection failure
[stats]
- dump process stats on SIGUSR1
[backup]
- clean special chars in archive names
- fix issues in symlink archiving
[lustre]
- specific compilation option for jira's LU-543
[misc]
- code cleaning, sanity checks, improved traces...
Changes between version 2.3.1 and 2.3.2:
- [webgui] Web interface (beta)
- [quota/alerts] Implemented quota alerts on inode count (users and groups)
- [reporting] New option --by-count for --top-users, to sort users by entry count
- [database] Support of InnoDB MySQL engine
- [database] MySQL 4 compatibility fix
- [bugfix](minor) handling DB deadlock error
- [bugfix](tweak) added acct parameters to default and template outputs
- [testing] big tests with 1M entries
- [backup] about backup mode (beta):
- [bugfix](major) fixed error determining symlink status
- [bugfix](minor) don't consider 'new' entries in deferred removal
- [trace] display warning if mtime in FS < mtime in backend
Changes between version 2.3.0 and 2.3.1:
- [bugfix](major) Wrong accounting values if file owner changes
- [bugfix](major) SQL error for widely striped files
- [compat] Compatibility fix for MySQL servers between 5.0.0 and 5.0.32
Changes between version 2.2.3 and 2.3.0:
- [optim.] instant accounting reports (user/group usage, fs content summary, ...)
- [reporting] split user usage per group (--split-user-groups option)
- [reporting] split group usage per user (--split-user-groups option)
- [feature] new policy criteria for Lustre FileSystems: ost_index
- [reporting] detailed FS scan statistics in "rbh-report -a"
- [misc.] fast and clean abort on ctrl^c (during scan, migration and purge)
- [admin.] automatically disables features that are not defined in config file
- [admin.] "rbh-config backup_db" helper to create a robinhood DB backup
- [misc.] -V option displays Lustre version and release number
- [tweak] changed 'watermark' parameters to 'threshold'
- [tweak] changed 'notify_lw' and 'alert_hw' parameters to 'alert_low' and 'alert_high'
- [database] alternative port or socket file can be used for MySQL connection
- [database] limiting DB access rights for reporting command
- [bugfix](major) fixed inconsistent pool names
- [bugfix](minor) kill -HUP terminated the process if no trigger was defined
- [bugfix](minor) 'unknown' status not correctly filtered in '--dump-status' report
- [bugfix](tweak) added 'reload' in short help of SLES init script
- [misc.] code cleaning, error message cleaning, removed some obsolete code
- [feature] new robinhood flavor to track modifications in a Lustre v2 filesystem, and backup data to an external storage (current status: Alpha testing only).
As part of this feature:
- soft rm + command to retrieve removed files
- disaster recovery command
- "--migrate-file" option to archive a single file
- pre-maintenance mode to smoothly backup the whole filesystem content before a due date.
Changes between version 2.2.2 and 2.2.3:
- [feature] periodic purge trigger
- [feature] options for controlling trigger notifications
- [doc] pdf documentation updated
Changes between version 2.2.1 and 2.2.2:
- [bugfix] (major) fixed "duplicate key" errors
- [bugfix] (major) FS scan sometimes blocks on Lustre 2
- [misc.] integration to automatic testing suite (Hudson)
Changes between version 2.2.0 and 2.2.1:
- [feature] new purge command: --purge-class to apply purge policy on files in a given class
- [feature] new migration command: --migrate-class to apply migration policy on files in a given class
- [feature] support of syslog for logging
- [report cmd] Added summary line to all reports, with total number of entries and volume.
- [report cmd] Added '-q' option to hide headers and footers in reports.
- [optim.] changed primary key format to reduce DB requests
- [misc.] new command 'repair_db' in rbh-config, to fix tables after a MySQL server crash.
- [compat.] Support for Lustre MDT changelogs on Lustre v2.0 final
- [compat.] port to FreeBSD
- [admin] added 'reload' action to init.d script
- [misc.] a gap in OST index list should display a warning, not an error
- [pkg] common spec file for el4, el5 and el6
- [bugfix] handling large UNIX groups (>4k) and long lists of alt groups.
- [bugfix] retrieving Lustre pool fails with error "Unsupported Lustre magic number"
- [bugfix] wrong class matching on OST pools when scanning
- [bugfix] unescaped SQL strings caused error for filenames with single quotes
- [bugfix] error in init script when RBH_OPT contains several options
Changes between version 2.1.5 and 2.2.0:
- [feature] fileclass union/intersection/negation
- [feature] rbh-report displays last matched fileclass
- [feature] new reporting command '--class-info' generates fileclass summary
- [feature] new reporting option '--filter-class' to dump entries per fileclass
- [feature] alert batching: send a mail summary instead of 1 mail per matching entry
- [feature] alert improvements: named alerts, tweak changes
- [feature] special wildcard '**' in 'path' or 'tree' conditions matches any count of directory levels
- [feature] quota-like purge triggers fully implemented (on group or user)
- [feature] triggers on used inode count in filesystem
- [feature] '--check-triggers' option to check triggers without purging files
- [feature] notification can be sent when a high watermark is reached (for triggers)
- [feature] rbh-config helper now supports batch commands
- [feature] Lustre 2.0 ready
- [optim.] configurable fileclass periodic matching to reduce calls to filesystem
- [optim.] configurable attr/path periodic update in DB to reduce calls to filesystem
- [bugfix] explicit trace when readdir fails
- [bugfix] issue when filtering on fields with NULL values in DB
- [bugfix] check migration timeout on last effective action, not on last queued entry
- [bugfix] name-based conditions complaining about missing auto-generated fields
- [bugfix] race condition when applying policy led to handling the same entry several times
- [bugfix] removing removed directories from database for recursive rmdir policies
- [misc.] added documented file in /etc/sysconfig for robinhood service parameters
- [misc.] changing source directory layout
- [misc.] documentation update
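The '**' wildcard introduced in this release for 'path' and 'tree' conditions can be illustrated with a small Python sketch that translates a pattern into a regular expression. This is not robinhood's matcher: the function name is hypothetical, and the sketch assumes that a single '*' or '?' does not cross a '/' while '**' may span any number of directory levels.

```python
import re


def path_pattern_to_regex(pattern):
    """Translate a path pattern into a compiled regular expression.

    Assumed rules (for illustration only):
      '**' matches any characters, including '/'
      '*'  matches any characters except '/'
      '?'  matches one character except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            # Everything else is matched literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")
```

Under these assumptions, "/fs/**/tmp/*.log" matches "/fs/a/b/tmp/x.log" but not "/fs/a/tmp/sub/x.log".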
Changes between version 2.1.4 and 2.1.5:
- Major bug fix: incomplete database content after scan
Changes between version 2.1.3 and 2.1.4:
- New recursive rmdir policy (for TMP_FS_MGR purpose)
- changed default value for max_pending_operations
(unlimited value could result in excessive memory usage)
- removing useless fields and redundant information in database
- rh-* commands renamed to rbh-*, to avoid conflicts and confusions
with RedHat commands.
- check conflicting flags in configure
Changes between version 2.1.2 and 2.1.3:
- SQLite support (should only be used for testing purpose or small filesystems)
- Support of relative paths in 'path' and 'tree' conditions
- Migration timeout mechanism
- Prompting for database admin password in rh-config script
Changes between version 2.1.1 and 2.1.2:
- New reporting commands: Dump all files (--dump-all) and dump files
by status (--dump-status).
- New configuration helper script: "rh-config"
- Made RPM relocatable
- BUG FIX: wrong scan duration when using volume-based purge triggers
- Lustre-HSM: Checking previous migrations status when restarting
- Lustre-HSM: CL_TIME record support (bz 19505)
- Lustre-HSM: multi-archive support (archive_num)
- Lustre-HSM: new --sync option (immediately archive all modified files)
- Lustre-HSM: changed --handle-events action switch to --readlog
- Fixed SLES portability issues
Changes between version 2.1.0.beta2 and 2.1.1:
- Added new report options: --dump-ost, --dump-user, --dump-group
- Added --filter-path option to reporting tool.
- TMP FS MGR purpose ported to Lustre 2.0-alpha5 (including changelog
support).
- documentation updates (in doc/admin_guides)
- Each purpose has its own service and binary names,
to make it possible to install and run several robinhood with
different purposes on the same machine.
- Added '--disable-lustre' compilation switch for disabling Lustre specific
features
- Added '--disable-fid-support' compilation switch, to force addressing
entries by path, not by their Lustre fid.
- Integration of new purpose "SHERPA" (software suite for cache management)
- Generated RPM name includes lustre version it was built for.
- report command displays help if it is started without options.
Changes between 2.1.0.beta1 and 2.1.0.beta2:
- Extended attributes support in policy definition
- 32 bits platforms compatibility fixes
- Fixed bug when using mysql4
- Added parameter to force changelog polling
- Fixed minor compilation warning
- commands now search for config file in /etc/robinhood.d if no config file is
given on command line
Changes from v2.0.1 to 2.1.0.beta1:
- added '--dry-run' option instead of "simulation_mode" parameter in config file
- added '--once' option, to perform a single pass of a given policy or action
and exit (same as '--one-shot' option).
- Compatibility fixes for MySQL 4 and 5
- Fixed dependencies on lustre include files.
- Compatibility fixes for 32 bits platforms
Lustre-HSM specific features:
- Porting to the new changelog interface (handling changelog records as
structures instead of text, and using CHANGELOG_FLAG_FOLLOW
and CHANGELOG_FLAG_BLOCK options)
- Adapting to changes in changelog timestamp (secs+nano instead of jiffies)
- Use fid as primary key in database schema (for better performance)
- Added calls to llapi_hsm_request() to trigger migration, release, removal
in HSM.
- Customizable migration hints to be passed to the copytool
- Command line options to trigger manual migrations (by user, by OST...)
- Deferred removal in HSM
- Taking HSM file status into account (dirty, released, ...)
- HSM event support
- Changelog flag support (for UNLINK and HSM event)
- Added '--ignore-policies' option to perform migration/purge to all eligible
files without checking policy conditions.
Changes between v2.0-beta2 and v2.0.1:
- New policy definition semantics, using filesets
- Multiple fileset/policy associations
- Several changes in configuration syntax, to avoid confusions
- Support of OST pool names (on Lustre) for fileset definition and policies
- Optimizations of policy application
- Added features for Lustre-HSM