diff --git a/README.md b/README.md
index ee2ed8aa..83b3ef43 100644
--- a/README.md
+++ b/README.md
@@ -522,7 +522,7 @@ Main Functions
 --------
 * The `jieba.cut` function accepts three input parameters: the first parameter is the string to be cut; the second parameter is `cut_all`, controlling the cut mode; the third parameter is to control whether to use the Hidden Markov Model.
 * `jieba.cut_for_search` accepts two parameter: the string to be cut; whether to use the Hidden Markov Model. This will cut the sentence into short words suitable for search engines.
-* The input string can be an unicode/str object, or a str/bytes object which is encoded in UTF-8 or GBK. Note that using GBK encoding is not recommended because it may be unexpectly decoded as UTF-8.
+* The input string can be an unicode/str object, or a str/bytes object which is encoded in UTF-8 or GBK. Note that using GBK encoding is not recommended because it may be unexpectedly decoded as UTF-8.
 * `jieba.cut` and `jieba.cut_for_search` returns an generator, from which you can use a `for` loop to get the segmentation result (in unicode).
 * `jieba.lcut` and `jieba.lcut_for_search` returns a list.
 * `jieba.Tokenizer(dictionary=DEFAULT_DICT)` creates a new customized Tokenizer, which enables you to use different dictionaries at the same time. `jieba.dt` is the default Tokenizer, to which almost all global functions are mapped.
diff --git a/jieba/__init__.py b/jieba/__init__.py
index 90f0bcd5..3c2177cd 100644
--- a/jieba/__init__.py
+++ b/jieba/__init__.py
@@ -382,7 +382,7 @@ def load_userdict(self, f):
         Load personalized dict to improve detect rate.
 
         Parameter:
-            - f : A plain text file contains words and their ocurrences.
+            - f : A plain text file contains words and their occurrences.
                   Can be a file-like object, or the path of the dictionary file,
                   whose encoding must be utf-8.
 
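
For context, a minimal usage sketch of the functions documented in the README hunk above. The example sentence and the user-dictionary path are illustrative, not taken from the patch; the calls follow the parameter descriptions in the README section being edited.

```python
# -*- coding: utf-8 -*-
import jieba

sentence = u"我来到北京清华大学"  # illustrative sentence, not from the patch

# cut(string, cut_all, HMM): cut_all selects full mode, HMM toggles the Hidden Markov Model
print("/".join(jieba.cut(sentence, cut_all=False, HMM=True)))  # accurate mode (default)
print("/".join(jieba.cut(sentence, cut_all=True)))             # full mode

# cut_for_search(string, HMM): shorter words suited to search-engine indexing
print("/".join(jieba.cut_for_search(sentence)))

# lcut / lcut_for_search return lists instead of generators
print(jieba.lcut(sentence))

# Tokenizer(dictionary=DEFAULT_DICT) builds an independent tokenizer;
# jieba.dt is the default instance behind the module-level functions
tk = jieba.Tokenizer()
print("/".join(tk.cut(sentence)))

# load_userdict(f): f is a UTF-8 plain text file (or file-like object)
# listing words and their occurrences, as described in the docstring hunk
# jieba.load_userdict("userdict.txt")  # hypothetical path
```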