Add string schema properties #587

Open: wants to merge 7 commits into base: master
Changes from 2 commits
78 changes: 77 additions & 1 deletion src/malli/core.cljc
@@ -570,6 +570,82 @@
(when-let [ns-name (some-> properties :namespace name)]
(fn [x] (= (namespace x) ns-name))))

;;
;; string schema helpers
;;

#?(:cljs (defn -numeric-char? [c] (and (< 47 c) (< c 58))))
Member

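These checks behave differently from the JVM versions: they only cover ASCII ranges, whereas Character/isDigit and Character/isLetter accept any Unicode digit or letter.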
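
A minimal sketch of that divergence, assuming JVM Clojure. `ascii-digit?` and `arabic-indic-three` are hypothetical names used only for illustration; they are not part of the PR. The helper mirrors the ASCII range check from the :cljs branch so both behaviours can be compared on the same input:

```clojure
;; Hypothetical helper: same range check as -numeric-char?, but taking a char.
(defn ascii-digit? [c]
  (let [i (int c)]
    (and (< 47 i) (< i 58))))

;; U+0663 ARABIC-INDIC DIGIT THREE, written as an escape to avoid encoding issues.
(def arabic-indic-three \u0663)

(ascii-digit? \7)                                ;; => true
(Character/isDigit (char \7))                    ;; => true
(ascii-digit? arabic-indic-three)                ;; => false
(Character/isDigit (char arabic-indic-three))    ;; => true
```

The same string could therefore pass `:charset :digit` on the JVM and fail in ClojureScript, which may or may not be acceptable depending on how strictly the two platforms are expected to agree.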

#?(:cljs (defn -upper-alpha-char? [c] (and (< 64 c) (< c 91))))
#?(:cljs (defn -lower-alpha-char? [c] (and (< 96 c) (< c 123))))
#?(:cljs (defn -letter? [c] (or (-lower-alpha-char? c) (-upper-alpha-char? c))))
#?(:cljs (defn -alphanumeric? [c] (or (-letter? c) (-numeric-char? c))))

(defn -charset-predicate
[o]
(case o
:digit #?(:clj #(Character/isDigit ^char %) :cljs -numeric-char?)
Member

By the way, there are both char and int versions of these predicates in Java:

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLetter(char)
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLetter(int)

The int version also supports supplementary Unicode characters (code points above 0xFFFF).

Member

Okay, this is probably fine. A char covers 0-65535, which is enough for the Unicode range 0x0000-0xFFFF.

Clojure character literals don't support supplementary characters either, though strings do. For example, with code point 0x2F81A:

\冬
=> Unsupported character: \冬

(.codePointAt "冬" 0)
=> 194586

(char (.codePointAt "冬" 0))
=> Value out of range for char: 194586

(Character/isLetter (int 0x2F81A))
=> true

(int (.charAt "冬" 0))
=> 55422 (in this case .charAt returns only the high surrogate, the first of the two UTF-16 code units)
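
If supplementary characters ever needed to be covered, one option would be to walk the string by code point instead of by char. The following is only a sketch under the assumption of JVM-only code; `every-code-point?` and `supplementary` are hypothetical names, not part of malli or of this PR:

```clojure
;; Hypothetical helper: true when pred holds for every Unicode code point in s.
(defn every-code-point?
  [pred ^String s]
  (let [n (.length s)]
    (loop [i 0]
      (if (>= i n)
        true
        (let [cp (.codePointAt s i)]
          (if (pred cp)
            ;; charCount is 2 for supplementary code points, 1 otherwise.
            (recur (+ i (Character/charCount cp)))
            false))))))

;; U+2F81A written as its surrogate pair (high surrogate 0xD87E = 55422).
(def supplementary "\uD87E\uDC1A")

(every-code-point? #(Character/isLetter (int %)) supplementary) ;; => true
;; Checking char by char sees two surrogates, neither of which is a letter.
(every? #(Character/isLetter (char %)) supplementary)           ;; => false
```

The cost is an extra `charCount` call per step and a predicate that takes an int, so it is a trade-off rather than a clear improvement for strings that stay within 0x0000-0xFFFF.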

Member

JS charCodeAt works the same as JVM charAt.

Contributor Author

The problem is that I'm using charAt, which returns a char, not an int.

Contributor

0x2F81A (which is \⾁) is supported under the 12161 unicode: (link)

(int \⾁)
=> 12161

Same with all other unicode characters:

(char 8809)
=> \≩
(char 508)
=> \Ǽ
(int \Ǽ)
=> 508
(int \≩)
=> 8809
(int \Θ)
=> 920
(char 33071)
=> \脯

:letter #?(:clj #(Character/isLetter ^char %) :cljs -letter?)
:letter-or-digit #?(:clj #(Character/isLetterOrDigit ^char %) :cljs -alphanumeric?)
:alphanumeric #?(:clj #(Character/isLetterOrDigit ^char %) :cljs -alphanumeric?)
:alphabetic #?(:clj #(Character/isAlphabetic (int %)) :cljs -letter?)
(cond
(set? o) (miu/-some-pred (mapv -charset-predicate o))
(char? o) #?(:clj #(= ^char o %) :cljs (let [i (.charCodeAt o 0)] #(= i %)))
(fn? o) o
(nil? o) nil
:else (throw (ex-info "Invalid string predicate" {:pred o})))))

(defn string-char-predicate
[p]
(fn charset-pred ^Boolean [^String s]
(let [n #?(:clj (.length s) :cljs (.-length s))]
(loop [i 0]
(if (= i n)
true
(if (p #?(:clj (.charAt s (unchecked-int i))
:cljs (.charCodeAt s (unchecked-int i))))
(recur (unchecked-inc i))
false))))))

#?(:clj
(defn find-blank-method
Member

Could we raise the minimum Java version to 11 and remove this?

Contributor Author

That's a huge breaking change for users. Sadly, there's still plenty of Java 8 in the world, and we must accommodate it.

Member @Deraen, Dec 6, 2021

@ikitommi Definitely. About half of our work projects are still on Java 8.

[]
(try
(.getMethod String "isBlank" (into-array Class []))
#(.isBlank ^String %)
(catch Exception _
(require 'clojure.string)
clojure.string/blank?))))

#?(:clj (def blank? (find-blank-method))
:cljs (defn blank? [^String s] (zero? (.-length (.trim s)))))

(defn -string-predicates
([{:keys [charset pattern non-blank]}]
(let [pattern
(when pattern
(let [pattern (re-pattern pattern)]
#?(:clj #(.find (.matcher ^Pattern pattern ^String %))
:cljs #(boolean (re-find pattern %)))))
charset
(when charset
(let [p (-charset-predicate charset)]
(string-char-predicate p)))
non-blank (when non-blank #(not (blank? %)))]
(-> non-blank
Member

Is :non-blank needed? It would be simpler to use :min:

[:string {:non-blank true}]
[:string {:min 1}]

Contributor Author

" " is a valid string for {:min 1}, but it is invalid for {:non-blank true}.

Contributor Author

This could be solved with transformers, but users may well want to specify a string that has a minimum length and is also not blank. Blank here is a superset of empty, which I almost missed at first.
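
The distinction can be sketched with plain predicates. `min-1?` and `non-blank?` are hypothetical stand-ins for the two properties, written against `clojure.string/blank?`; they are not malli's implementation:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for {:min 1}: only the length matters.
(defn min-1? [s] (<= 1 (count s)))

;; Hypothetical stand-in for {:non-blank true}: whitespace-only strings fail too.
(defn non-blank? [s] (not (str/blank? s)))

;; Each pair is [min-1? non-blank?] for the inputs "", " ", "a", " a ".
(mapv (juxt min-1? non-blank?) ["" " " "a" " a "])
;; => [[false false] [true false] [true true] [true true]]
```

Only the whitespace-only input separates the two: every non-blank string satisfies a minimum length of 1, but not the other way around.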

(miu/-maybe-and charset)
(miu/-maybe-and pattern)))))

(defn -string-property-pred
[]
(fn [properties]
(miu/-maybe-and
((-min-max-pred
#?(:clj #(.length ^String %)
:cljs #(.-length ^String %)))
properties)
(-string-predicates properties))))

;;
;; Schemas
;;
@@ -625,7 +701,7 @@

(defn -nil-schema [] (-simple-schema {:type :nil, :pred nil?}))
(defn -any-schema [] (-simple-schema {:type :any, :pred any?}))
- (defn -string-schema [] (-simple-schema {:type :string, :pred string?, :property-pred (-min-max-pred count)}))
+ (defn -string-schema [] (-simple-schema {:type :string, :pred string?, :property-pred (-string-property-pred)}))
(defn -int-schema [] (-simple-schema {:type :int, :pred int?, :property-pred (-min-max-pred nil)}))
(defn -double-schema [] (-simple-schema {:type :double, :pred double?, :property-pred (-min-max-pred nil)}))
(defn -boolean-schema [] (-simple-schema {:type :boolean, :pred boolean?}))
7 changes: 7 additions & 0 deletions src/malli/impl/util.cljc
@@ -65,3 +65,10 @@
(def ^{:arglists '([[& preds]])} -some-pred
#?(:clj (-pred-composer or 16)
:cljs (fn [preds] (fn [x] (boolean (some #(% x) preds))))))

(defn -maybe-and
[f g]
(cond
(and f g) #(and (f %) (g %))
f f
g g))
41 changes: 41 additions & 0 deletions test/malli/core_test.cljc
@@ -2622,3 +2622,44 @@
(is (= ["1"] (m/-vmap str (subvec [1 2] 0 1))))
(is (= ["1"] (m/-vmap str (lazy-seq [1]))))
(is (= ["1" "2"] (m/-vmap str [1 2]))))

(deftest string-test
(testing "pattern"
(let [s (m/schema [:string {:pattern "foo"}])]
(is (true? (m/validate s "foo")))
(is (true? (m/validate s "afoo")))
(is (true? (m/validate s "fooa")))
(is (false? (m/validate s "foao"))))
(let [s (m/schema [:string {:pattern "^foo"}])]
(is (true? (m/validate s "foo")))
(is (false? (m/validate s "afoo")))
(is (true? (m/validate s "fooa")))
(is (false? (m/validate s "foao")))))
(testing "charset"
(let [s (m/schema [:string {:charset :alphabetic}])]
(is (true? (m/validate s "foo")))
(is (false? (m/validate s "fo1o"))))
(let [s (m/schema [:string {:charset :letter}])]
(is (true? (m/validate s "foo")))
(is (false? (m/validate s "fo1o"))))
(let [s (m/schema [:string {:charset :letter-or-digit}])]
(is (true? (m/validate s "foo")))
(is (true? (m/validate s "fo0")))
(is (false? (m/validate s "f-1o"))))
(let [s (m/schema [:string {:charset #{\- :letter-or-digit}}])]
(is (true? (m/validate s "foo")))
(is (true? (m/validate s "fo0")))
(is (true? (m/validate s "f-1x")))
(is (false? (m/validate s "f?1o")))))
(testing "non blank"
(let [s (m/schema [:string {:non-blank true}])]
(is (true? (m/validate s "foo")))
(is (false? (m/validate s "")))
(is (false? (m/validate s " ")))))
(testing "Combined"
(let [s (m/schema [:string {:non-blank true :pattern "foo" :charset :letter-or-digit}])]
(is (true? (m/validate s "foo")))
(is (false? (m/validate s "")))
(is (false? (m/validate s " ")))
(is (false? (m/validate s " foo ")))
(is (true? (m/validate s "foo0"))))))