doc: sync with mainline
Updated to: "2016-04-12"

Signed-off-by: Drunkard Zhang <[email protected]>
drunkard committed Mar 23, 2017
1 parent 0a29f1c commit 9aa69c9
Showing 13 changed files with 1,460 additions and 835 deletions.
115 changes: 115 additions & 0 deletions cephfs/capabilities.rst
@@ -0,0 +1,115 @@
.. _Capabilities in CephFS:

======================
Capabilities in CephFS
======================

When a client wants to operate on an inode, it will query the MDS in various
ways, which will then grant the client a set of **capabilities**. These
grant the client permissions to operate on the inode in various ways. One
of the major differences from other network filesystems (e.g. NFS or SMB) is
that the capabilities granted are quite granular, and it's possible that
multiple clients can hold different capabilities on the same inodes.

.. _Types of Capabilities:

Types of Capabilities
---------------------

There are several "generic" capability bits. These denote what sort of ability
the capability grants.

.. code-block:: cpp

    /* generic cap bits */
    #define CEPH_CAP_GSHARED    1   /* client can read (s) */
    #define CEPH_CAP_GEXCL      2   /* client can read and update (x) */
    #define CEPH_CAP_GCACHE     4   /* (file) client can cache reads (c) */
    #define CEPH_CAP_GRD        8   /* (file) client can read (r) */
    #define CEPH_CAP_GWR        16  /* (file) client can write (w) */
    #define CEPH_CAP_GBUFFER    32  /* (file) client can buffer writes (b) */
    #define CEPH_CAP_GWREXTEND  64  /* (file) client can extend EOF (a) */
    #define CEPH_CAP_GLAZYIO    128 /* (file) client can perform lazy io (l) */

These are then shifted by a particular number of bits. These denote a part of
the inode's data or metadata on which the capability is being granted:

.. code-block:: cpp

    /* per-lock shift */
    #define CEPH_CAP_SAUTH   2  /* A */
    #define CEPH_CAP_SLINK   4  /* L */
    #define CEPH_CAP_SXATTR  6  /* X */
    #define CEPH_CAP_SFILE   8  /* F */

Only certain generic cap types are ever granted for some of those "shifts",
however. In particular, only the FILE shift ever has more than the first two
bits. ::

    | AUTH | LINK | XATTR | FILE
    2      4      6       8

From the above, we get a number of constants that are generated by taking
each bit value and shifting to the correct bit in the word:

.. code-block:: cpp

    #define CEPH_CAP_AUTH_SHARED  (CEPH_CAP_GSHARED << CEPH_CAP_SAUTH)

These bits can then be or'ed together to make a bitmask denoting a set of
capabilities.
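
For illustration, here is a minimal, self-contained sketch of how such
composed constants are built and combined. The generic bits and shifts are
copied from the snippets above; the ``CEPH_CAP_FILE_*`` names are written out
in the same style as ``CEPH_CAP_AUTH_SHARED`` purely as an example:

.. code-block:: cpp

    #include <stdio.h>

    /* generic cap bits and per-lock shifts, as quoted above */
    #define CEPH_CAP_GSHARED  1
    #define CEPH_CAP_GRD      8
    #define CEPH_CAP_GWR      16
    #define CEPH_CAP_SAUTH    2
    #define CEPH_CAP_SFILE    8

    /* composed constants: one generic bit shifted into its lock's field */
    #define CEPH_CAP_AUTH_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SAUTH)
    #define CEPH_CAP_FILE_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SFILE)
    #define CEPH_CAP_FILE_RD     (CEPH_CAP_GRD     << CEPH_CAP_SFILE)
    #define CEPH_CAP_FILE_WR     (CEPH_CAP_GWR     << CEPH_CAP_SFILE)

    int main(void)
    {
        /* or the composed constants together into a capability mask */
        unsigned int caps = CEPH_CAP_AUTH_SHARED | CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;

        printf("cap mask: 0x%x\n", caps);                                  /* 0x1804 */
        printf("has FILE rd: %d\n", !!(caps & CEPH_CAP_FILE_RD));          /* 1 */
        printf("has FILE wr: %d\n", !!(caps & CEPH_CAP_FILE_WR));          /* 1 */
        printf("has FILE shared: %d\n", !!(caps & CEPH_CAP_FILE_SHARED));  /* 0 */
        return 0;
    }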

There is one exception:

.. code-block:: cpp

    #define CEPH_CAP_PIN  1  /* no specific capabilities beyond the pin */

The "pin" just pins the inode into memory, without granting any other caps.

Graphically, the layout is: ::

    +---+---+---+---+---+---+---+---+
    | p | _ |As    x|Ls    x|Xs    x|
    +---+---+---+---+---+---+---+---+
    |Fs   x   c   r   w   b   a   l |
    +---+---+---+---+---+---+---+---+

The second bit is currently unused.

.. _Abilities granted by each cap:

Abilities granted by each cap
-----------------------------

While that is how capabilities are granted (and communicated), the important
bit is what they actually allow the client to do:

* PIN: this just pins the inode into memory. This is sufficient to allow the
client to get to the inode number, as well as other immutable things like
major or minor numbers in a device inode, or symlink contents.

* AUTH: this grants the ability to get to the authentication-related metadata.
In particular, the owner, group and mode. Note that doing a full permission
check may require getting at ACLs as well, which are stored in xattrs.

* LINK: the link count of the inode

* XATTR: ability to access or manipulate xattrs. Note that since ACLs are
stored in xattrs, it's also sometimes necessary to access them when checking
permissions.

* FILE: this is the big one. These allow the client to access and manipulate
file data. It also covers certain metadata relating to file data -- the
size, mtime, atime and ctime, in particular.

.. _Shorthand:

Shorthand
---------

Note that capabilities are expressed in a compact form in the client logs, for
example: ::

    pAsLsXsFs

Here, p stands for pin, each capital letter corresponds to a shift, and the
lowercase letters following a shift are the capabilities actually granted for
that shift.
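
To make the notation concrete, here is a minimal, stand-alone decoder for a
cap word, written only for this page (it is not the formatting code used by
Ceph itself); the bit layout is the one described in the sections above:

.. code-block:: cpp

    #include <stdio.h>

    /* lowercase letters for the generic bits, GSHARED..GLAZYIO */
    static const char *gcap_letters = "sxcrwbal";

    static void print_cap_string(unsigned int caps)
    {
        /* bit 0 is the pin; AUTH, LINK, XATTR and FILE start at shifts
         * 2, 4, 6 and 8; only FILE ever uses more than two bits */
        const struct { char letter; int shift; int nbits; } fields[] = {
            { 'A', 2, 2 }, { 'L', 4, 2 }, { 'X', 6, 2 }, { 'F', 8, 8 },
        };

        if (caps & 1)                   /* CEPH_CAP_PIN */
            putchar('p');

        for (unsigned int i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
            unsigned int bits =
                (caps >> fields[i].shift) & ((1u << fields[i].nbits) - 1);
            if (!bits)
                continue;
            putchar(fields[i].letter);  /* the shift's capital letter */
            for (int b = 0; b < fields[i].nbits; b++)
                if (bits & (1u << b))
                    putchar(gcap_letters[b]);
        }
        putchar('\n');
    }

    int main(void)
    {
        /* pin plus the shared cap on every shift prints "pAsLsXsFs" */
        print_cap_string(1 | (1 << 2) | (1 << 4) | (1 << 6) | (1 << 8));
        return 0;
    }
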
117 changes: 117 additions & 0 deletions cephfs/experimental-features.rst
@@ -0,0 +1,117 @@
.. _Experimental Features:

Experimental Features
=====================

CephFS includes a number of experimental features which are not fully stabilized
or qualified for users to turn on in real deployments. We generally do our best
to clearly demarcate these and fence them off so they can't be used by mistake.

Some of these features are closer to being done than others, though. We describe
each of them with an approximation of how risky they are and briefly describe
what is required to enable them. Note that enabling a feature will *irrevocably*
flag the maps in the monitor as having once enabled it, which aids debugging
and support processes.


.. _Directory Fragmentation:

Directory Fragmentation
-----------------------

CephFS directories are generally stored within a single RADOS object. But this has
certain negative results once they become large enough. The filesystem is capable
of "fragmenting" these directories into multiple objects. There are no known bugs
with doing so but it is not sufficiently tested to support at this time.

Directory fragmentation has always been off by default and required setting
``mds bal frag = true`` in the MDS' config file. It has been further protected
by requiring the user to set the "allow_dirfrags" flag for Jewel.
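
As a sketch of what that gating looks like in practice, with the configuration
line quoted from the paragraph above (the ``ceph fs set`` invocation is an
assumption about the Jewel-era CLI and should be verified against your
release): ::

    # ceph.conf on the MDS hosts
    [mds]
        mds bal frag = true

    # Jewel additionally gates the feature behind a filesystem flag
    # (exact syntax assumed; check your release's documentation):
    ceph fs set <fs_name> allow_dirfrags true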


.. _Inline data:

Inline data
-----------

By default, all CephFS file data is stored in RADOS objects. The inline data
feature enables small files (generally <2KB) to be stored in the inode
and served out of the MDS. This may improve small-file performance but increases
load on the MDS. It is not sufficiently tested to support at this time, although
failures within it are unlikely to make non-inlined data inaccessible.

Inline data has always been off by default and requires setting
the "inline_data" flag.


.. _Multi-MDS filesystem clusters:

Multi-MDS filesystem clusters
-----------------------------

CephFS has been designed from the ground up to support fragmenting the metadata
hierarchy across multiple active metadata servers, to allow horizontal scaling
to arbitrary throughput requirements. Unfortunately, doing so requires a lot
more working code than having a single MDS which is authoritative over the
entire filesystem namespace.

Multiple active MDSes are generally stable under trivial workloads, but often
break in the presence of any failure, and do not have enough testing to offer
any stability guarantees. If a filesystem with multiple active MDSes does
experience failure, it will require (generally extensive) manual intervention.
There are serious known bugs.

Multi-MDS filesystems have always required explicitly increasing the "max_mds"
value and have been further protected with the "allow_multimds" flag for Jewel.
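
As a sketch, assuming the Jewel-era ``ceph fs set`` syntax (the "allow_multimds"
flag and "max_mds" setting are named above; verify the exact invocation against
your release): ::

    # acknowledge the experimental status and allow multiple active MDS ranks
    ceph fs set <fs_name> allow_multimds true --yes-i-really-mean-it

    # then raise the number of active metadata servers
    ceph fs set <fs_name> max_mds 2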


.. _`Mantle: Programmable Metadata Load Balancer`:

Mantle: Programmable Metadata Load Balancer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mantle is a programmable metadata balancer built into the MDS. The idea is to
protect the mechanisms for balancing load (migration, replication,
fragmentation) but stub out the balancing policies using Lua. For details, see
:doc:`/cephfs/mantle`.

.. _Snapshots:

Snapshots
---------

Like multiple active MDSes, CephFS is designed from the ground up to support
snapshotting of arbitrary directories. There are no known bugs at the time of
writing, but there is insufficient testing to provide stability guarantees and
every expansion of testing has generally revealed new issues. If you do enable
snapshots and experience failure, manual intervention will be needed.

Snapshots are known not to work properly with multiple filesystems (below) in
some cases. Specifically, if you share a pool for multiple FSes and delete
a snapshot in one FS, expect to lose snapshotted file data in any other FS using
snapshots. See the :doc:`/dev/cephfs-snapshots` page for more information.

Snapshots are known not to work with multi-MDS filesystems.

Snapshotting was blocked off with the "allow_new_snaps" flag prior to Firefly.


.. _Multiple filesystems within a Ceph cluster:

Multiple filesystems within a Ceph cluster
------------------------------------------

Code was merged prior to the Jewel release which enables administrators
to create multiple independent CephFS filesystems within a single Ceph cluster.
These independent filesystems have their own set of active MDSes, cluster maps,
and data. But the feature required extensive changes to data structures which
are not yet fully qualified, and has security implications which are not all
apparent nor resolved.

There are no known bugs, but any failures which do result from having multiple
active filesystems in your cluster will require manual intervention and, so far,
will not have been experienced by anybody else -- knowledgeable help will be
extremely limited. You also probably do not have the security or isolation
guarantees you want or think you have upon doing so.

Note that snapshots and multiple filesystems are *not* tested in combination
and may not work together; see above.

Multiple filesystems were available starting in the Jewel release candidates
but were protected behind the "enable_multiple" flag before the final release.
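
As a sketch, assuming the Jewel-era CLI (the "enable_multiple" flag is named
above; confirm the exact syntax for your release): ::

    # acknowledge the experimental status and allow more than one filesystem
    ceph fs flag set enable_multiple true --yes-i-really-mean-it

    # then additional filesystems can be created from their own pools
    ceph fs new <fs_name> <metadata_pool> <data_pool>
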
13 changes: 13 additions & 0 deletions cephfs/index.rst
@@ -78,6 +78,7 @@ A Ceph filesystem requires at least one :term:`Ceph Metadata

CephFS Administrative commands <administration>
POSIX compatibility <posix>
Experimental Features <experimental-features>
CephFS Quotas <quota>
Using Hadoop with CephFS <hadoop>
libcephfs <../../api/libcephfs-java/>
@@ -93,3 +94,15 @@ A Ceph filesystem requires at least one :term:`Ceph Metadata
.. raw:: html

</td></tr></tbody></table>

.. _For developers:

For developers
==============

.. toctree::
:maxdepth: 1

Client capabilities <capabilities>
libcephfs <../../api/libcephfs-java/>
Mantle <mantle>