forked from drunkard/ceph-Chinese-doc
Updated to: "2016-04-12" Signed-off-by: Drunkard Zhang <[email protected]>
Showing 13 changed files with 1,460 additions and 835 deletions.
@@ -0,0 +1,115 @@
.. _Capabilities in CephFS:

======================
Capabilities in CephFS
======================
|
||
When a client wants to operate on an inode, it queries the MDS, which then
grants the client a set of **capabilities**. These give the client permission
to operate on the inode in specific ways. One of the major differences from
other network filesystems (e.g. NFS or SMB) is that the capabilities granted
are quite granular, and multiple clients can hold different capabilities on
the same inode.
|
||
.. _Types of Capabilities:

Types of Capabilities
---------------------
|
||
There are several "generic" capability bits. These denote what sort of ability
the capability grants.
|
||
.. code-block:: cpp

    /* generic cap bits */
    #define CEPH_CAP_GSHARED     1  /* client can read (s) */
    #define CEPH_CAP_GEXCL       2  /* client can read and update (x) */
    #define CEPH_CAP_GCACHE      4  /* (file) client can cache reads (c) */
    #define CEPH_CAP_GRD         8  /* (file) client can read (r) */
    #define CEPH_CAP_GWR        16  /* (file) client can write (w) */
    #define CEPH_CAP_GBUFFER    32  /* (file) client can buffer writes (b) */
    #define CEPH_CAP_GWREXTEND  64  /* (file) client can extend EOF (a) */
    #define CEPH_CAP_GLAZYIO   128  /* (file) client can perform lazy io (l) */

Each generic bit is then shifted left by a per-lock amount. The shift selects
the part of the inode's data or metadata on which the capability is granted:
|
||
.. code-block:: cpp

    /* per-lock shift */
    #define CEPH_CAP_SAUTH   2  /* A */
    #define CEPH_CAP_SLINK   4  /* L */
    #define CEPH_CAP_SXATTR  6  /* X */
    #define CEPH_CAP_SFILE   8  /* F */

Only certain generic cap types are ever granted for some of those "shifts",
however. In particular, only the FILE shift ever has more than the first two
bits. ::

    | AUTH | LINK | XATTR | FILE
    2      4      6       8
|
||
From the above, we get a number of constants that are generated by taking
each bit value and shifting it to the correct position in the word:

.. code-block:: cpp

    #define CEPH_CAP_AUTH_SHARED  (CEPH_CAP_GSHARED << CEPH_CAP_SAUTH)

These bits can then be OR'ed together to make a bitmask denoting a set of
capabilities.
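
The shift-and-OR composition described above can be sketched in a few lines.
This is only an illustration: the constant values are copied from the
definitions in this document, but the shortened names and the helpers
``make_cap`` and ``holds`` are hypothetical and are not part of the Ceph
headers.

.. code-block:: cpp

    #include <cstdint>

    // Generic cap bits and per-lock shifts, copied from the definitions above.
    constexpr std::uint32_t CAP_PIN     = 1;
    constexpr std::uint32_t CAP_GSHARED = 1;
    constexpr std::uint32_t CAP_GEXCL   = 2;
    constexpr std::uint32_t CAP_GRD     = 8;
    constexpr std::uint32_t CAP_GWR     = 16;
    constexpr int CAP_SAUTH = 2;
    constexpr int CAP_SFILE = 8;

    // Hypothetical helper: move a generic bit into the field chosen by a shift.
    constexpr std::uint32_t make_cap(std::uint32_t generic_bit, int shift) {
        return generic_bit << shift;
    }

    constexpr std::uint32_t AUTH_SHARED = make_cap(CAP_GSHARED, CAP_SAUTH);  // 1 << 2
    constexpr std::uint32_t FILE_RD     = make_cap(CAP_GRD, CAP_SFILE);      // 8 << 8
    constexpr std::uint32_t FILE_WR     = make_cap(CAP_GWR, CAP_SFILE);      // 16 << 8

    // OR the shifted bits together to describe a set of capabilities.
    constexpr std::uint32_t example_mask = CAP_PIN | AUTH_SHARED | FILE_RD | FILE_WR;

    // Hypothetical helper: true when every bit in `wanted` is present in `held`.
    constexpr bool holds(std::uint32_t held, std::uint32_t wanted) {
        return (held & wanted) == wanted;
    }

    static_assert(AUTH_SHARED == 4, "shared bit lands at bit 2");
    static_assert(holds(example_mask, FILE_RD), "mask includes file read");

Because each lock owns its own bit field, testing for a capability is a plain
mask comparison, which is what ``holds`` does here.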
|
||
There is one exception:

.. code-block:: cpp

    #define CEPH_CAP_PIN  1  /* no specific capabilities beyond the pin */

The "pin" just pins the inode into memory, without granting any other caps.
|
||
Graphically: ::

    +---+---+---+---+---+---+---+---+
    | p | _ |As   x |Ls   x |Xs   x |
    +---+---+---+---+---+---+---+---+
    |Fs   x   c   r   w   b   a   l |
    +---+---+---+---+---+---+---+---+

The second bit is currently unused.
|
||
.. _Abilities granted by each cap:

Abilities granted by each cap:
------------------------------

While that is how capabilities are granted (and communicated), what matters
is what they actually allow the client to do:
|
||
* PIN: this just pins the inode into memory. This is sufficient to allow the
  client to get to the inode number, as well as other immutable things like
  major or minor numbers in a device inode, or symlink contents.

* AUTH: this grants the ability to get to the authentication-related metadata,
  in particular the owner, group and mode. Note that doing a full permission
  check may require getting at ACLs as well, which are stored in xattrs.

* LINK: this grants access to the link count of the inode.

* XATTR: this grants the ability to access or manipulate xattrs. Note that
  since ACLs are stored in xattrs, it is also sometimes necessary to access
  them when checking permissions.

* FILE: this is the big one. These caps allow the client to access and
  manipulate file data. They also cover certain metadata relating to file
  data, in particular the size, mtime, atime and ctime.
|
||
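.. _Shorthand:

Shorthand
---------

Note that the client logs also express capabilities in a compact form, for
example: ::

    pAsLsXsFs

Here, ``p`` represents the pin, each capital letter corresponds to a shift
value, and the lowercase letters after it are the capabilities actually
granted at that shift.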
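
As a rough sketch of how such a string could be derived from a bitmask, the
function below walks the four shifted fields and emits one lowercase letter
per set bit. The name ``cap_shorthand`` and its helper are hypothetical and
this is not Ceph's own formatting routine; the shift values and letters are
taken from the definitions earlier in this document.

.. code-block:: cpp

    #include <string>

    // Pin bit and per-lock shifts, copied from the definitions above.
    constexpr unsigned CAP_PIN      = 1;
    constexpr unsigned SHIFT_AUTH   = 2;
    constexpr unsigned SHIFT_LINK   = 4;
    constexpr unsigned SHIFT_XATTR  = 6;
    constexpr unsigned SHIFT_FILE   = 8;

    // Append the section letter followed by one lowercase letter per set
    // generic bit; emit nothing when no bit in the field is set.
    static void append_section(std::string &out, char section,
                               unsigned bits, int nbits) {
        static const char letters[] = "sxcrwbal";  // generic bits, low to high
        if (bits == 0)
            return;
        out += section;
        for (int i = 0; i < nbits; ++i) {
            if (bits & (1u << i))
                out += letters[i];
        }
    }

    // Hypothetical helper: render a capability mask in the log shorthand.
    std::string cap_shorthand(unsigned mask) {
        std::string out;
        if (mask & CAP_PIN)
            out += 'p';
        // AUTH, LINK and XATTR only ever use the first two generic bits.
        append_section(out, 'A', (mask >> SHIFT_AUTH) & 0x3u, 2);
        append_section(out, 'L', (mask >> SHIFT_LINK) & 0x3u, 2);
        append_section(out, 'X', (mask >> SHIFT_XATTR) & 0x3u, 2);
        // FILE uses all eight generic bits.
        append_section(out, 'F', (mask >> SHIFT_FILE) & 0xffu, 8);
        return out;
    }

For instance, the mask 341 (1 + 4 + 16 + 64 + 256, i.e. the pin plus the
shared bit in each of the four fields) renders as ``pAsLsXsFs``.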
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@
.. _Experimental Features:

Experimental Features
=====================
|
||
CephFS includes a number of experimental features which are not fully stabilized
or qualified for users to turn on in real deployments. We generally do our best
to clearly demarcate these and fence them off so they can't be used by mistake.

Some of these features are closer to being done than others, though. We describe
each of them with an approximation of how risky it is and briefly describe
what is required to enable it. Note that enabling any of them *irrevocably*
marks the maps in the monitor as having once had that feature enabled; the
record exists to improve debugging and support processes.
|
||
|
||
.. _Directory Fragmentation:

Directory Fragmentation
-----------------------

CephFS directories are generally stored within a single RADOS object, but this
has certain negative results once they become large enough. The filesystem is
capable of "fragmenting" these directories into multiple objects. There are no
known bugs with doing so, but it is not sufficiently tested to support at this
time.

Directory fragmentation has always been off by default and required setting
``mds bal frag = true`` in the MDS' config file. It has been further protected
by requiring the user to set the "allow_dirfrags" flag for Jewel.
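
As an illustration only, the config option named above might appear in
``ceph.conf`` as follows. Placing it under the ``[mds]`` section is an
assumption about a conventional layout; the option itself is the one quoted
in the text. The fragment covers only the config option, and the
"allow_dirfrags" flag still has to be set separately.

.. code-block:: ini

    [mds]
    mds bal frag = true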
|
||
|
||
.. _Inline data:

Inline data
-----------

By default, all CephFS file data is stored in RADOS objects. The inline data
feature enables small files (generally <2KB) to be stored in the inode
and served out of the MDS. This may improve small-file performance but increases
load on the MDS. It is not sufficiently tested to support at this time, although
failures within it are unlikely to make non-inlined data inaccessible.

Inline data has always been off by default and requires setting
the "inline_data" flag.
|
||
|
||
.. _Multi-MDS filesystem clusters:

Multi-MDS filesystem clusters
-----------------------------

CephFS has been designed from the ground up to support fragmenting the metadata
hierarchy across multiple active metadata servers, to allow horizontal scaling
to arbitrary throughput requirements. Unfortunately, doing so requires a lot
more working code than having a single MDS which is authoritative over the
entire filesystem namespace.

Multiple active MDSes are generally stable under trivial workloads, but often
break in the presence of any failure, and do not have enough testing to offer
any stability guarantees. If a filesystem with multiple active MDSes does
experience failure, it will require (generally extensive) manual intervention.
There are serious known bugs.

Multi-MDS filesystems have always required explicitly increasing the "max_mds"
value and have been further protected with the "allow_multimds" flag for Jewel.
|
||
|
||
.. _`Mantle: Programmable Metadata Load Balancer`:

Mantle: Programmable Metadata Load Balancer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mantle is a programmable metadata balancer built into the MDS. The idea is to
protect the mechanisms for balancing load (migration, replication,
fragmentation) but stub out the balancing policies using Lua. For details, see
:doc:`/cephfs/mantle`.
|
||
.. _Snapshots:

Snapshots
---------

As with multiple active MDSes, CephFS was designed from the ground up to support
snapshotting of arbitrary directories. There are no known bugs at the time of
writing, but there is insufficient testing to provide stability guarantees, and
every expansion of testing has generally revealed new issues. If you do enable
snapshots and experience failure, manual intervention will be needed.

Snapshots are known not to work properly with multiple filesystems (below) in
some cases. Specifically, if you share a pool between multiple filesystems and
delete a snapshot in one of them, expect to lose snapshotted file data in any
other filesystem using snapshots. See the :doc:`/dev/cephfs-snapshots` page for
more information.

Snapshots are known not to work with multi-MDS filesystems.

Snapshotting was blocked off with the "allow_new_snaps" flag prior to Firefly.
|
||
|
||
.. _Multiple filesystems within a Ceph cluster:

Multiple filesystems within a Ceph cluster
------------------------------------------

Code was merged prior to the Jewel release which enables administrators
to create multiple independent CephFS filesystems within a single Ceph cluster.
These independent filesystems have their own set of active MDSes, cluster maps,
and data. But the feature required extensive changes to data structures which
are not yet fully qualified, and it has security implications which are neither
all apparent nor all resolved.

There are no known bugs, but any failures which do result from having multiple
active filesystems in your cluster will require manual intervention and, so far,
will not have been experienced by anybody else; knowledgeable help will be
extremely limited. You also probably do not have the security or isolation
guarantees you want or think you have upon doing so.

Note that snapshots and multiple filesystems are *not* tested in combination
and may not work together; see above.

Multiple filesystems were available starting in the Jewel release candidates
but were protected behind the "enable_multiple" flag before the final release.