diff --git a/README.md b/README.md
index 274bced..e76f0cf 100755
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ There are several non-obvious privacy concerns you should keep in mind while usi
 - While this does prevent the redirect website from putting cookies on your browser and possibly gives it the false impression you clicked the link, it gives the website certainty you viewed the link.
 - In the hopefully never going to happen case of someone hijacking a supported redirect site, this could allow an attacker to reliably grab your IP by sending it in an email/DM.
 - While you can configure URL Cleaner to use a proxy to avoid the IP grabbing, it would still let them know when you're online.
-- For some websites URL Cleaner strips out more than just tracking stuff. I'm still not sure if or when this ever becomes a security issue.
+- For some websites, URL Cleaner strips out more than just tracking stuff. I'm still not sure if or when this ever becomes a security issue.
 
 If you are in any way using URL Cleaner in a life-or-death scenario, PLEASE always use the `no-network` flag and be extremely careful of people you even remotely don't trust sending you URLs.
 
@@ -180,15 +180,28 @@ On a mostly stock lenovo thinkpad T460S (Intel i5-6300U (4) @ 3.000GHz) running
 ```
 
 In practice, when using [URL Cleaner Site and its userscript](https://github.com/Scripter17/url-cleaner-site), performance is significantly (but not severely) worse.
-Often the first few cleanings will take a few hundred milliseconds each because the page is still loading. Subsequent cleanings should generally be in the 10ms-50ms range.
+Often the first few cleanings will take a few hundred milliseconds each because the page is still loading.
+However, because of the overhead of using HTTP (even if it's just to localhost), subsequent cleanings, for me, are basically always at least 10ms.
 Mileage varies wildly but as long as you're not spawning a new instance of URL Cleaner for each URL it should be fast enough.
 
-There is (currently still experimental) support for multithreading.
-In its default configuration, it's able to do 10k of the above amazon URL in 51 milliseconds on the same laptop, an almost 2x speedup on a computer with only 2 cores.
-On a i5-8500 (6) @ 4.100GHz, times can get as low as 17 milliseconds. If anyone wants to test this on 32+ cores I would be quite interested in the result.
-Additionally, spawning more threads than you have cores can be helpful in netowrk latency bound jobs, AKA redirects. What exactly the limits and side effects of that is is likely website-dependent.
-Also its effects on caching are yet to be figured out.
+Startup time also varies wildly. My laptop takes 5-6ms to start it, but every other computer I've tested takes 10ms. I'm really not sure why, because the other computers are massively faster.
+
+##### Parallelization
+
+There is (currently still experimental) support for parallelization.
+
+On the same laptop as the above benchmarks, the default settings make 10k of the amazon URL go from 95ms to 51ms.
+On my desktop with an Intel i5-8500 (6) @ 4.100GHz, that benchmark gets around 17ms, and one *hundred* thousand of the URL takes about 138ms.
+On my friend's desktop with an AMD Ryzen 9 7950X3D (32) @ 5.759GHz, doing the same 100k amazon URL benchmark takes about (TODO: REBENCH).
+
+Network requests and interacting with the cache have effects on performance that I haven't yet properly looked into.
+
+Please note that at this time parallelization has no effect on the library's API.
+It's not obvious how I would design it, so I'm waiting for inspiration to strike.
+
+Also please note that compiling with parallelization and then setting the thread count to 1 gives worse performance than not compiling with parallelization.
+Through very basic testing, using 2 threads seems to perform about the same as not compiling with parallelization.
 
 #### Credits
diff --git a/benchmarking/benchmark.sh b/benchmarking/benchmark.sh
index 739fe1e..18607b2 100755
--- a/benchmarking/benchmark.sh
+++ b/benchmarking/benchmark.sh
@@ -45,8 +45,8 @@ for arg in "$@"; do
         --only-massif)   if [ $an_only_is_set -eq 0 ]; then an_only_is_set=1; hyperfine=0; callgrind=0; cachegrind=0;           dhat=0; memcheck=0; else echo "Error: Multiple --only- flags were set."; exit 1; fi ;;
         --only-dhat)     if [ $an_only_is_set -eq 0 ]; then an_only_is_set=1; hyperfine=0; callgrind=0; cachegrind=0; massif=0;         memcheck=0; else echo "Error: Multiple --only- flags were set."; exit 1; fi ;;
         --only-memcheck) if [ $an_only_is_set -eq 0 ]; then an_only_is_set=1; hyperfine=0; callgrind=0; cachegrind=0; massif=0; dhat=0;             else echo "Error: Multiple --only- flags were set."; exit 1; fi ;;
-        --nums)     mode=nums    ; just_set_mode=1 ;;
         --urls)     mode=urls    ; just_set_mode=1 ;;
+        --nums)     mode=nums    ; just_set_mode=1 ;;
         --features) mode=features; just_set_mode=1 ;;
         --out-file) mode=out_file; just_set_mode=1 ;;
         --) break ;;
diff --git a/build.rs b/build.rs
index 6b3c9d4..1ed51c2 100644
--- a/build.rs
+++ b/build.rs
@@ -3,12 +3,19 @@
 use std::io::Write;
 
 fn main() {
+    let default_config = serde_json::from_str::<serde_json::Value>(&std::fs::read_to_string("default-config.json").expect("Reading the default config to work.")).expect("Deserializing the default config to work.");
+
+    if std::fs::exists("default-config.minified.json").expect("Checking the existence of default-config.minified.json to work") {
+        let maybe_old_minified_default_config = serde_json::from_str::<serde_json::Value>(&std::fs::read_to_string("default-config.minified.json").expect("Reading the minified default config to work.")).expect("Deserializing the minified default config to work.");
+        if default_config == maybe_old_minified_default_config {return;}
+    }
+
     std::fs::OpenOptions::new()
         .create(true)
         .write(true)
         .truncate(true)
         .open("default-config.minified.json")
         .expect("Opening default-config.minified.json to work.")
-        .write_all(serde_json::to_string(&serde_json::from_str::<serde_json::Value>(&std::fs::read_to_string("default-config.json").expect("Reading the default config to work.")).expect("Deserializing the default config to work.")).expect("Serializing the default config to work.").as_bytes())
+        .write_all(serde_json::to_string(&default_config).expect("Serializing the default config to work.").as_bytes())
         .expect("Writing the minified default config to work.");
 }
diff --git a/default-config.json b/default-config.json
index b5ed426..d4baa80 100755
--- a/default-config.json
+++ b/default-config.json
@@ -76,22 +76,21 @@
         "nerd.whatever.social", "z.opnxng.com"
     ],
     "redirect-host-without-www-dot-prefixes": [
-        "2kgam.es", "4.nbcla.com", "a.co", "ab.co", "abc7.la", "abc7ne.ws", "adobe.ly", "aje.io", "aje.io", "amzn.asia", "amzn.ew",
-        "amzn.to", "api.link.agorapulse.com", "apple.co", "b23.tv", "bbc.in", "bit.ly", "bitly.com", "bitly.com", "bityl.co", "blizz.ly",
-        "blockclubchi.co", "bloom.bg", "boxd.it", "buff.ly", "bzfd.it", "cbsn.ws", "cfl.re", "chn.ge", "chng.it", "clckhl.co", "cnb.cx",
-        "cnn.it", "cons.lv", "cos.lv", "cutt.ly", "db.tt", "dcdr.me", "depop.app.link", "dis.gd", "dlvr.it", "econ.st", "etsy.me", "fal.cn",
-        "fanga.me", "fb.me", "fdip.fr", "flip.it", "forms.gle", "g.co", "glo.bo", "go.bsky.app", "go.forbes.com", "go.microsoft.com",
-        "go.nasa.gov", "gofund.me", "goo.gl", "goo.su", "gum.co", "hmstr.fr", "hulu.tv", "ift.tt", "intel.ly", "interc.pt", "is.gd",
-        "iwe.one", "j.mp", "jbgm.es", "k00.fr", "katy.to", "kck.st", "kre.pe", "kre.pe", "l.leparisien.fr", "l.leparisien.fr", "lin.ee",
-        "link.animaapp.com", "linkr.it", "lnk.to", "loom.ly", "loom.ly", "lpc.ca", "m.sesame.org", "msft.it", "mzl.la", "n.pr", "nas.cr",
-        "nbc4i.co", "ninten.do", "ntdo.co.uk", "nvda.ws", "ny.ti", "nyer.cm", "nyp.st", "nyti.ms", "nyto.ms", "on.forbes.com", "on.ft.com",
-        "on.ft.com", "on.msnbc.com", "on.nyc.gov", "onl.bz", "onl.la", "onl.sc", "operagx.gg", "orlo.uk", "ow.ly", "peoplem.ag", "perfht.ml",
-        "pin.it", "pixiv.me", "play.st", "politi.co", "prn.to", "propub.li", "pulse.ly", "py.pl", "qr1.be", "rb.gy", "rb.gy", "rblx.co",
-        "rdbl.co", "redd.it", "reurl.cc", "reut.rs", "rzr.to", "s.goodsmile.link", "s.team", "s76.co", "share.firefox.dev", "shor.tf",
-        "shorturl.at", "sonic.frack.deals", "spoti.fi", "spr.ly", "spr.ly", "spr.ly", "sqex.to", "t.co", "t.ly", "theatln.tc", "thecut.io",
-        "thef.pub", "thr.cm", "thrn.co", "tiny.cc", "tmz.me", "to.pbs.org", "tps.to", "tr.ee", "trib.al", "u.jd.com", "unes.co", "unf.pa",
-        "uni.cf", "uniceflink.org", "visitlink.me", "w.wiki", "wlgrn.com", "wlo.link", "wn.nr", "wwdc.io", "x.gd", "xbx.ly", "xhslink.com",
-        "yrp.ca"
+        "2kgam.es", "4.nbcla.com", "a.co", "ab.co", "abc7.la", "abc7ne.ws", "adobe.ly", "aje.io", "aje.io", "amzn.asia", "amzn.ew", "amzn.to",
+        "api.link.agorapulse.com", "apple.co", "b23.tv", "bbc.in", "bit.ly", "bitly.com", "bitly.com", "bityl.co", "blizz.ly", "blockclubchi.co",
+        "bloom.bg", "boxd.it", "buff.ly", "bzfd.it", "cbsn.ws", "cfl.re", "chn.ge", "chng.it", "clckhl.co", "cnb.cx", "cnn.it", "cons.lv",
+        "cos.lv", "cutt.ly", "db.tt", "dcdr.me", "depop.app.link", "dis.gd", "dlvr.it", "econ.st", "etsy.me", "fal.cn", "fanga.me", "fb.me",
+        "fdip.fr", "flip.it", "forms.gle", "g.co", "glo.bo", "go.bsky.app", "go.forbes.com", "go.microsoft.com", "go.nasa.gov", "gofund.me",
+        "goo.gl", "goo.su", "gum.co", "hmstr.fr", "hulu.tv", "ift.tt", "intel.ly", "interc.pt", "is.gd", "iwe.one", "j.mp", "jbgm.es",
+        "k00.fr", "katy.to", "kck.st", "kre.pe", "kre.pe", "kre.pe", "l.leparisien.fr", "l.leparisien.fr", "lin.ee", "link.animaapp.com",
+        "linkr.it", "lnk.to", "loom.ly", "loom.ly", "lpc.ca", "m.sesame.org", "msft.it", "mzl.la", "n.pr", "nas.cr", "nbc4i.co", "ninten.do",
+        "ntdo.co.uk", "nvda.ws", "ny.ti", "nyer.cm", "nyp.st", "nyti.ms", "nyto.ms", "on.forbes.com", "on.ft.com", "on.ft.com", "on.msnbc.com",
+        "on.nyc.gov", "onl.bz", "onl.la", "onl.sc", "operagx.gg", "orlo.uk", "ow.ly", "peoplem.ag", "perfht.ml", "pin.it", "pixiv.me", "play.st",
+        "politi.co", "prn.to", "propub.li", "pulse.ly", "py.pl", "qr1.be", "rb.gy", "rb.gy", "rblx.co", "rdbl.co", "redd.it", "reurl.cc",
+        "reut.rs", "rzr.to", "s.goodsmile.link", "s.team", "s76.co", "share.firefox.dev", "shor.tf", "shorturl.at", "sonic.frack.deals",
+        "spoti.fi", "spr.ly", "spr.ly", "spr.ly", "sqex.to", "t.co", "t.ly", "theatln.tc", "thecut.io", "thef.pub", "thr.cm", "thrn.co",
+        "tiny.cc", "tmz.me", "to.pbs.org", "tps.to", "tr.ee", "trib.al", "u.jd.com", "unes.co", "unf.pa", "uni.cf", "uniceflink.org",
+        "visitlink.me", "w.wiki", "wlgrn.com", "wlo.link", "wn.nr", "wwdc.io", "x.gd", "xbx.ly", "xhslink.com", "yrp.ca"
     ],
     "redirect-not-subdomains": [
         "lnk.to", "visitlink.me", "goo.gl", "o93x.net", "pusle.ly"
@@ -1416,7 +1415,7 @@
                     "PartMap": {
"part": "Path", "map": { - "/search": {"AllowQueryParams": ["hl", "q", "tbm", "p", "udm", "filter"]}, + "/search": {"AllowQueryParams": ["hl", "q", "tbm", "p", "udm", "filter", "vsrid", "vsdim", "vsint", "ins_vfs"]}, "/setprefs": {"RemoveQueryParams": ["sa", "ved"]} } } diff --git a/src/glue/advanced_http.rs b/src/glue/advanced_http.rs index a444dd4..3e00226 100644 --- a/src/glue/advanced_http.rs +++ b/src/glue/advanced_http.rs @@ -8,7 +8,7 @@ use url::Url; use serde::{Deserialize, Serialize}; use reqwest::{Method, header::{HeaderName, HeaderValue, HeaderMap}}; use thiserror::Error; -#[allow(unused_imports, reason = "Used in a doc comment.")] +#[expect(unused_imports, reason = "Used in a doc comment.")] use reqwest::cookie::Cookie; use crate::types::*; diff --git a/src/glue/caching.rs b/src/glue/caching.rs index e43ed55..1ddcf1c 100644 --- a/src/glue/caching.rs +++ b/src/glue/caching.rs @@ -259,7 +259,7 @@ impl InnerCache { /// /// If unconnected, connect to the path then return the connection. /// - /// If the path is a file and doesn't exist, writes [`EMPTY_CACHE`] to the path. + /// If the path is a file and doesn't exist, makes the file. /// /// If the path is `:memory:`, the database is storeed ephemerally in RAM and not saved to disk. /// # Errors diff --git a/src/glue/command.rs b/src/glue/command.rs index 7cc6ee2..186e9cc 100644 --- a/src/glue/command.rs +++ b/src/glue/command.rs @@ -15,7 +15,6 @@ use thiserror::Error; use serde::{Serialize, Deserialize}; use which::which; -#[allow(unused_imports, reason = "Used in a doc comment.")] use crate::types::*; use crate::util::*; diff --git a/src/glue/headermap.rs b/src/glue/headermap.rs index 72b90ce..6f1d499 100644 --- a/src/glue/headermap.rs +++ b/src/glue/headermap.rs @@ -4,7 +4,7 @@ use std::collections::HashMap; use serde::{Deserialize, ser::{Serializer, Error as _}, de::{Deserializer, Error as _}}; use reqwest::header::HeaderMap; -#[allow(unused_imports, reason = "Used in a doc comment.")] // [`HeaderValue`] is imported for [`serialize`]'s documentation. +#[expect(unused_imports, reason = "Used in a doc comment.")] // [`HeaderValue`] is imported for [`serialize`]'s documentation. use reqwest::header::HeaderValue; /// Deserializes a [`HeaderMap`] diff --git a/src/glue/headervalue.rs b/src/glue/headervalue.rs index ddbc644..f13d238 100644 --- a/src/glue/headervalue.rs +++ b/src/glue/headervalue.rs @@ -1,7 +1,6 @@ //! Provides serialization and deserialization functions for [`HeaderValue`]. use serde::{Deserialize, ser::{Serializer, Error as _}, de::{Deserializer, Error as _}}; -#[allow(unused_imports, reason = "Used in a doc comment.")] use reqwest::header::HeaderValue; /// Deserializes a [`HeaderValue`] diff --git a/src/glue/proxy.rs b/src/glue/proxy.rs index 0dad8e0..0cad9c8 100644 --- a/src/glue/proxy.rs +++ b/src/glue/proxy.rs @@ -11,7 +11,7 @@ use reqwest::Proxy; use crate::util::is_default; -#[allow(unused_imports, reason = "Used in a doc comment.")] +#[expect(unused_imports, reason = "Used in a doc comment.")] use crate::glue::HttpClientConfig; /// Used by [`HttpClientConfig`] to detail how a [`reqwest::Proxy`] should be made. 
diff --git a/src/glue/regex/regex_parts.rs b/src/glue/regex/regex_parts.rs
index de262a7..d1d67eb 100644
--- a/src/glue/regex/regex_parts.rs
+++ b/src/glue/regex/regex_parts.rs
@@ -9,7 +9,7 @@ use std::str::FromStr;
 use serde::{Serialize, Deserialize};
 use regex::{Regex, RegexBuilder};
 use regex_syntax::{ParserBuilder, Parser, Error as RegexSyntaxError};
-#[allow(unused_imports, reason = "Used in a doc comment.")]
+#[expect(unused_imports, reason = "Used in a doc comment.")]
 use super::RegexWrapper;
 
 use crate::util::*;
diff --git a/src/lib.rs b/src/lib.rs
index 0dbf78d..c56a0b4 100755
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -43,11 +43,6 @@
 //! }
 //! ```
 
-#[allow(unused_imports, reason = "Used in the module's doc comment.")]
-use std::str::FromStr;
-#[allow(unused_imports, reason = "Used in the module's doc comment.")]
-use serde::Deserialize;
-
 pub mod glue;
 pub mod types;
 pub(crate) mod util;
diff --git a/src/main.rs b/src/main.rs
index 07fb6f2..5088a60 100755
--- a/src/main.rs
+++ b/src/main.rs
@@ -4,7 +4,6 @@ use std::path::PathBuf;
 use std::io::{self, IsTerminal};
 use std::borrow::Cow;
 use std::process::ExitCode;
-use std::collections::HashMap;
 use std::str::FromStr;
 
 use clap::{Parser, CommandFactory};
@@ -57,56 +56,19 @@ pub struct Args {
     #[cfg(not(feature = "default-config"))]
     #[arg(short , long)]
     pub config: PathBuf,
+    /// Overrides the config's [`Config::cache_path`].
+    #[cfg(feature = "cache")]
+    #[arg( long)]
+    pub cache_path: Option<PathBuf>,
     /// Output JSON. It is intended to be identical to URL Cleaner Site's output, so while some of the output is "redundant", it's important.
     #[arg(short , long)]
     pub json: bool,
     /// Additional ParamsDiffs to apply before the rest of the options.
     #[arg( long)]
     pub params_diff: Vec<PathBuf>,
-    /// Set flags.
-    #[arg(short , long, value_names = ["NAME"])]
-    pub flag  : Vec<String>,
-    /// Unset flags set by the config.
-    #[arg(short = 'F', long, value_names = ["NAME"])]
-    pub unflag: Vec<String>,
-    /// For each occurrence of this option, its first argument is the variable name and the second argument is its value.
-    #[arg(short , long, num_args(2), value_names = ["NAME", "VALUE"])]
-    pub var: Vec<Vec<String>>,
-    /// Unset variables set by the config.
-    #[arg(short = 'V', long, value_names = ["NAME"])]
-    pub unvar : Vec<String>,
-    /// For each occurrence of this option, its first argument is the set name and subsequent arguments are the values to insert.
-    #[arg( long, num_args(2..), value_names = ["NAME", "VALUE"])]
-    pub insert_into_set: Vec<Vec<String>>,
-    /// For each occurrence of this option, its first argument is the set name and subsequent arguments are the values to remove.
-    #[arg( long, num_args(2..), value_names = ["NAME", "VALUE"])]
-    pub remove_from_set: Vec<Vec<String>>,
-    /// For each occurrence of this option, its first argument is the map name, the second is the map key, and subsequent arguments are the values to insert.
-    #[arg( long, num_args(3..), value_names = ["NAME", "KEY1", "VALUE1"])]
-    pub insert_into_map: Vec<Vec<String>>,
-    /// For each occurrence of this option, its first argument is the map name, the second is the map key, and subsequent arguments are the values to remove.
-    #[arg( long, num_args(2..), value_names = ["NAME", "KEY1"])]
-    pub remove_from_map: Vec<Vec<String>>,
-    /// Overrides the config's [`Config::cache_path`].
-    #[cfg(feature = "cache")]
-    #[arg( long)]
-    pub cache_path: Option<PathBuf>,
-    /// Read stuff from caches. Default value is controlled by the config. Omitting a value means true.
- #[cfg(feature = "cache")] - #[arg( long, num_args(0..=1), default_missing_value("true"))] - pub read_cache : Option, - /// Write stuff to caches. Default value is controlled by the config. Omitting a value means true. - #[cfg(feature = "cache")] - #[arg( long, num_args(0..=1), default_missing_value("true"))] - pub write_cache: Option, - /// The proxy to use. Example: socks5://localhost:9150 - #[cfg(feature = "http")] - #[arg( long)] - pub proxy: Option, - /// Disables all HTTP proxying. - #[cfg(feature = "http")] - #[arg( long, num_args(0..=1), default_missing_value("true"))] - pub no_proxy: Option, + /// Stuff to make a [`ParamsDiff`] from the CLI. + #[command(flatten)] + pub params_diff_args: ParamsDiffArgParser, /// Print the parsed arguments for debugging. /// When this, any other `--print-...` flag, or `--test-config` is set, no URLs are cleaned. #[arg( long, verbatim_doc_comment)] @@ -132,11 +94,7 @@ pub struct Args { /// Zero gets the current CPU threads. #[cfg(feature = "experiment-parallel")] #[arg(long, default_value_t = 0)] - pub threads: usize, - /// Amount of jobs to do in each thread while waiting for other threads to return. - #[cfg(feature = "experiment-parallel")] - #[arg(long, default_value_t = 100)] - pub thread_queue: usize + pub threads: usize } /// The enum of all errors that can occur when using the URL Cleaner CLI tool. @@ -163,26 +121,6 @@ fn main() -> Result { let args = Args::parse(); - for invocation in args.insert_into_map.iter() { - if invocation.is_empty() { - Args::command() - .error(clap::error::ErrorKind::WrongNumberOfValues, "--insert-into-map needs a map to insert key-value pairs into.") - .exit(); - } - if invocation.len() % 2 != 1 { - Args::command() - .error(clap::error::ErrorKind::WrongNumberOfValues, "--insert-into-map found a key without a value at the end.") - .exit(); - } - } - - for invocation in args.remove_from_map.iter() { - if invocation.is_empty() { - Args::command() - .error(clap::error::ErrorKind::WrongNumberOfValues, "--remove-from-map needs a map to remove keys from.") - .exit(); - } - } let print_args = args.print_args; if print_args {println!("{args:?}");} @@ -192,45 +130,24 @@ fn main() -> Result { #[cfg(not(feature = "default-config"))] let mut config = Config::load_from_file(&args.config)?; - let mut params_diffs = args.params_diff + let mut params_diffs: Vec = args.params_diff .into_iter() .map(|path| serde_json::from_str(&std::fs::read_to_string(path).map_err(CliError::CantLoadParamsDiffFile)?).map_err(CliError::CantParseParamsDiffFile)) .collect::, _>>()?; - #[allow(unused_mut, reason = "Attributes on expressions WHEN. 
PLEASE.")] - let mut feature_flag_make_params_diff = false; - #[cfg(feature = "cache")] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || args.read_cache.is_some()}; - #[cfg(feature = "cache")] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || args.write_cache.is_some()}; - #[cfg(feature = "http" )] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || args.proxy.is_some()}; - if !args.flag.is_empty() || !args.unflag.is_empty() || !args.var.is_empty() || !args.unvar.is_empty() || !args.insert_into_set.is_empty() || !args.remove_from_set.is_empty() || !args.insert_into_map.is_empty() || !args.remove_from_map.is_empty() || feature_flag_make_params_diff { - params_diffs.push(ParamsDiff { - flags : args.flag .into_iter().collect(), // `impl::Item>> From for Y`? - unflags: args.unflag.into_iter().collect(), // It's probably not a good thing to do a global impl for, - vars : args.var .into_iter().map(|x| x.try_into().expect("Clap guarantees the length is always 2")).map(|[name, value]: [String; 2]| (name, value)).collect(), // Either let me TryFrom a Vec into a tuple or let me collect a [T; 2] into a HashMap. Preferably both. - unvars : args.unvar .into_iter().collect(), // but surely once specialization lands in Rust 2150 it'll be fine? - init_sets: Default::default(), - insert_into_sets: args.insert_into_set.into_iter().map(|mut x| (x.swap_remove(0), x)).collect(), - remove_from_sets: args.remove_from_set.into_iter().map(|mut x| (x.swap_remove(0), x)).collect(), - delete_sets : Default::default(), - init_maps : Default::default(), - insert_into_maps: args.insert_into_map.into_iter().map(|x| { - let mut values = HashMap::new(); - let mut args_iter = x.into_iter(); - let map = args_iter.next().expect("The validation to have worked."); - while let Some(k) = args_iter.next() { - values.insert(k, args_iter.next().expect("The validation to have worked.")); - } - (map, values) - }).collect::>(), - remove_from_maps: args.remove_from_map.into_iter().map(|mut x| (x.swap_remove(0), x)).collect::>(), - delete_maps : Default::default(), - #[cfg(feature = "cache")] read_cache : args.read_cache, - #[cfg(feature = "cache")] write_cache: args.write_cache, - #[cfg(feature = "http")] http_client_config_diff: Some(HttpClientConfigDiff { - set_proxies: args.proxy.map(|x| vec![x]), - no_proxy: args.no_proxy, - ..HttpClientConfigDiff::default() - }) - }); + if args.params_diff_args.does_anything() { + match args.params_diff_args.try_into() { + Ok(params_diff) => params_diffs.push(params_diff), + Err(e) => Args::command() + .error( + clap::error::ErrorKind::WrongNumberOfValues, + match e { + ParamsDiffArgParserValueWrong::InsertIntoMapNeedsAMap => "--insert-into-map needs a map to insert key-value pairs into.", + ParamsDiffArgParserValueWrong::InsertIntoMapNeedsAValue => "--insert-into-map found a key without a value at the end.", + ParamsDiffArgParserValueWrong::RemoveFromMapNeedsAMap => "--remove-from-map needs a map to remove keys from." 
+                    }
+                )
+                .exit()
+        }
     }
 
     let print_params_diffs = args.print_params_diffs;
@@ -258,8 +175,8 @@ fn main() -> Result<ExitCode, CliError> {
     {
         let mut threads = args.threads;
        if threads == 0 {threads = std::thread::available_parallelism().expect("To be able to get the available parallelism.").into();}
-        let (in_senders , in_recievers ) = (0..threads).map(|_| std::sync::mpsc::sync_channel::<Result<String, io::Error>>(args.thread_queue)).collect::<(Vec<_>, Vec<_>)>();
-        let (out_senders, out_recievers) = (0..threads).map(|_| std::sync::mpsc::sync_channel::<Result<Result<Url, DoJobError>, MakeJobError>>(args.thread_queue)).collect::<(Vec<_>, Vec<_>)>();
+        let (in_senders , in_recievers ) = (0..threads).map(|_| std::sync::mpsc::channel::<Result<String, io::Error>>()).collect::<(Vec<_>, Vec<_>)>();
+        let (out_senders, out_recievers) = (0..threads).map(|_| std::sync::mpsc::channel::<Result<Result<Url, DoJobError>, MakeJobError>>()).collect::<(Vec<_>, Vec<_>)>();
 
         let config_ref = &config;
         #[cfg(feature = "cache")]
@@ -268,10 +185,26 @@ fn main() -> Result<ExitCode, CliError> {
         let cache_ref = &cache;
 
         std::thread::scope(|s| {
+            s.spawn(move || {
+                let job_config_strings_source: Box<dyn Iterator<Item = Result<String, io::Error>>> = {
+                    let ret = args.urls.into_iter().map(Ok);
+                    if !io::stdin().is_terminal() {
+                        Box::new(ret.chain(io::stdin().lines()))
+                    } else {
+                        Box::new(ret)
+                    }
+                };
+
+                for (i, job_config_string) in job_config_strings_source.enumerate() {
+                    #[allow(clippy::arithmetic_side_effects, reason = "Whatever exactly the issue with `i % threads` is it will, at worst, give slightly worse load balancing around each multiple of usize::MAX jobs. I think that's fine.")]
+                    in_senders.get(i % threads).expect("The amount of senders to not exceed the count of senders to make.").send(job_config_string).expect("To successfully send the Job.");
+                }
+            });
+
             in_recievers.into_iter().zip(out_senders).map(|(ir, os)| {
                 s.spawn(move || {
                     while let Ok(maybe_job_config_string) = ir.recv() {
-                        os.send(match maybe_job_config_string {
+                        let ret = match maybe_job_config_string {
                             Ok(job_config_string) => JobConfig::from_str(&job_config_string)
                                 .map(|JobConfig{url, context}| Job {
@@ -284,7 +217,9 @@ fn main() -> Result<ExitCode, CliError> {
                                 )
                                 .map_err(MakeJobError::MakeJobConfigError),
                             Err(e) => Err(MakeJobError::MakeJobConfigError(MakeJobConfigError::IoError(e)))
-                        }).expect("The receiver to still exist.");
+                        };
+
+                        os.send(ret).expect("The receiver to still exist.");
                     }
                 });
             }).for_each(drop);
@@ -300,10 +235,8 @@ fn main() -> Result<ExitCode, CliError> {
                 let mut some_error_ref_lock = some_error_ref.lock().expect("No panics.");
 
                 print!("{{\"Ok\":{{\"urls\":[");
                 for or in out_recievers.iter().cycle() {
-                    let recieved = or.recv();
-
-                    match recieved {
+                    match or.recv() {
                         Ok(Ok(Ok(url))) => {
                             if !first_job {print!(",");}
                             print!("{{\"Ok\":{{\"Ok\":{}}}}}", str_to_json_str(url.as_str()));
@@ -323,7 +256,7 @@ fn main() -> Result<ExitCode, CliError> {
                            first_job = false;
                         },
                         Err(_) => {
-                            #[allow(clippy::arithmetic_side_effects, reason = "Can't happen.")]
+                            #[allow(clippy::arithmetic_side_effects, reason = "Can't even come close to usize::MAX threads and this is capped by thread count.")]
                             {disconnected += 1;}
                             if disconnected == threads {break;}
                         }
@@ -339,8 +272,7 @@ fn main() -> Result<ExitCode, CliError> {
                 let mut some_error_ref_lock = some_error_ref.lock().expect("No panics.");
 
                 for or in out_recievers.iter().cycle() {
-                    let recieved = or.recv();
-                    match recieved {
+                    match or.recv() {
                         Ok(Ok(Ok(url))) => {
                             println!("{url}");
                             *some_ok_ref_lock = true;
@@ -356,7 +288,7 @@ fn main() -> Result<ExitCode, CliError> {
                             *some_error_ref_lock = true;
                         }
                         Err(_) => {
-                            #[allow(clippy::arithmetic_side_effects, reason = "Can't happen.")]
+                            #[allow(clippy::arithmetic_side_effects, reason = "Can't even come close to usize::MAX threads and this is capped by thread count.")]
                             {disconnected += 1;}
                             if disconnected == threads {break;}
                         }
@@ -364,20 +296,6 @@ fn main() -> Result<ExitCode, CliError> {
                 }
             });
         }
-
-        let job_config_strings_source: Box<dyn Iterator<Item = Result<String, io::Error>>> = {
-            let ret = args.urls.into_iter().map(Ok);
-            if !io::stdin().is_terminal() {
-                Box::new(ret.chain(io::stdin().lines()))
-            } else {
-                Box::new(ret)
-            }
-        };
-        for (i, job_config_string) in job_config_strings_source.enumerate() {
-            #[allow(clippy::arithmetic_side_effects, reason = "Can't happen.")]
-            in_senders.get(i % threads).expect("The amount of senders to not exceet the count of senders to make.").send(job_config_string).expect("To successfuly send the Job.");
-        }
-
-        drop(in_senders);
     })
 }
@@ -401,7 +319,6 @@ fn main() -> Result<ExitCode, CliError> {
     print!("{{\"Ok\":{{\"urls\":[");
 
     let mut first_job = true;
-    #[cfg(not(feature = "experiment-parallel"))]
     for job in jobs.iter() {
         if !first_job {print!(",");}
         match job {
diff --git a/src/schema.rs b/src/schema.rs
deleted file mode 100644
index ca72806..0000000
--- a/src/schema.rs
+++ /dev/null
@@ -1,10 +0,0 @@
-// @generated automatically by Diesel CLI.
-
-diesel::table! {
-    cache (id) {
-        id -> Integer,
-        category -> Text,
-        k -> Text,
-        value -> Text,
-    }
-}
diff --git a/src/types/config/docs.rs b/src/types/config/docs.rs
index 48f4200..dc68753 100755
--- a/src/types/config/docs.rs
+++ b/src/types/config/docs.rs
@@ -4,7 +4,6 @@ use std::collections::HashMap;
 
 use serde::{Serialize, Deserialize};
 
-#[allow(unused_imports, reason = "Used in a doc comment.")]
 use crate::types::*;
 use crate::util::*;
diff --git a/src/types/config/params.rs b/src/types/config/params.rs
index 90834ae..3305fd3 100644
--- a/src/types/config/params.rs
+++ b/src/types/config/params.rs
@@ -4,7 +4,6 @@ use std::collections::{HashMap, HashSet};
 
 use serde::{Serialize, Deserialize};
 
-#[allow(unused_imports, reason = "Used in a doc comment.")]
 use crate::types::*;
 use crate::glue::*;
 use crate::util::*;
@@ -183,3 +182,128 @@ impl ParamsDiff {
         debug!(ParamsDiff::apply, self, old_to, to);
     }
 }
+
+/// Shared argument parser for generating [`ParamsDiff`]s from the CLI.
+///
+/// Used with the [`#[command(flatten)]`](https://docs.rs/clap/latest/clap/_derive/index.html#command-attributes) part of [`clap::Parser`]'s derive macro.
+#[derive(Debug, Clone, PartialEq, Eq, clap::Args)]
+pub struct ParamsDiffArgParser {
+    /// Set flags.
+    #[arg(short , long, value_names = ["NAME"])]
+    pub flag  : Vec<String>,
+    /// Unset flags set by the config.
+    #[arg(short = 'F', long, value_names = ["NAME"])]
+    pub unflag: Vec<String>,
+    /// For each occurrence of this option, its first argument is the variable name and the second argument is its value.
+    #[arg(short , long, num_args(2), value_names = ["NAME", "VALUE"])]
+    pub var: Vec<Vec<String>>,
+    /// Unset variables set by the config.
+    #[arg(short = 'V', long, value_names = ["NAME"])]
+    pub unvar : Vec<String>,
+    /// For each occurrence of this option, its first argument is the set name and subsequent arguments are the values to insert.
+    #[arg( long, num_args(1..), value_names = ["NAME", "VALUE1"])]
+    pub insert_into_set: Vec<Vec<String>>,
+    /// For each occurrence of this option, its first argument is the set name and subsequent arguments are the values to remove.
+    #[arg( long, num_args(1..), value_names = ["NAME", "VALUE1"])]
+    pub remove_from_set: Vec<Vec<String>>,
+    /// For each occurrence of this option, its first argument is the map name, the second is the map key, and subsequent arguments are the values to insert.
+    #[arg( long, num_args(2..), value_names = ["NAME", "KEY1", "VALUE1"])]
+    pub insert_into_map: Vec<Vec<String>>,
+    /// For each occurrence of this option, its first argument is the map name, and subsequent arguments are the keys to remove.
+    #[arg( long, num_args(1..), value_names = ["NAME", "KEY1"])]
+    pub remove_from_map: Vec<Vec<String>>,
+    /// Read stuff from caches. Default value is controlled by the config. Omitting a value means true.
+    #[cfg(feature = "cache")]
+    #[arg( long, num_args(0..=1), default_missing_value("true"))]
+    pub read_cache : Option<bool>,
+    /// Write stuff to caches. Default value is controlled by the config. Omitting a value means true.
+    #[cfg(feature = "cache")]
+    #[arg( long, num_args(0..=1), default_missing_value("true"))]
+    pub write_cache: Option<bool>,
+    /// The proxy to use. Example: socks5://localhost:9150
+    #[cfg(feature = "http")]
+    #[arg( long)]
+    pub proxy: Option<ProxyConfig>,
+    /// Disables all HTTP proxying.
+    #[cfg(feature = "http")]
+    #[arg( long, num_args(0..=1), default_missing_value("true"))]
+    pub no_proxy: Option<bool>
+}
+
+/// The errors that deriving [`clap::Parser`] can't catch.
+#[derive(Debug, Error)]
+pub enum ParamsDiffArgParserValueWrong {
+    /// --insert-into-map needs a map to insert key-value pairs into.
+    #[error("InsertIntoMapNeedsAMap")]
+    InsertIntoMapNeedsAMap,
+    /// --insert-into-map found a key without a value at the end.
+    #[error("InsertIntoMapNeedsAValue")]
+    InsertIntoMapNeedsAValue,
+    /// --remove-from-map needs a map to remove keys from.
+    #[error("RemoveFromMapNeedsAMap")]
+    RemoveFromMapNeedsAMap
+}
+
+impl TryFrom<ParamsDiffArgParser> for ParamsDiff {
+    type Error = ParamsDiffArgParserValueWrong;
+
+    fn try_from(value: ParamsDiffArgParser) -> Result<Self, Self::Error> {
+        for invocation in value.insert_into_map.iter() {
+            if invocation.is_empty() {
+                Err(ParamsDiffArgParserValueWrong::InsertIntoMapNeedsAMap)?;
+            }
+            if invocation.len() % 2 != 1 {
+                Err(ParamsDiffArgParserValueWrong::InsertIntoMapNeedsAValue)?;
+            }
+        }
+
+        for invocation in value.remove_from_map.iter() {
+            if invocation.is_empty() {
+                Err(ParamsDiffArgParserValueWrong::RemoveFromMapNeedsAMap)?;
+            }
+        }
+
+        Ok(ParamsDiff {
+            flags  : value.flag  .into_iter().collect(), // `impl::Item>> From for Y`?
+            unflags: value.unflag.into_iter().collect(), // It's probably not a good thing to do a global impl for,
+            vars   : value.var   .into_iter().map(|x| x.try_into().expect("Clap guarantees the length is always 2")).map(|[name, value]: [String; 2]| (name, value)).collect(), // Either let me TryFrom a Vec into a tuple or let me collect a [T; 2] into a HashMap. Preferably both.
+            unvars : value.unvar .into_iter().collect(), // but surely once specialization lands in Rust 2150 it'll be fine?
+            init_sets: Default::default(),
+            insert_into_sets: value.insert_into_set.into_iter().map(|mut x| (x.swap_remove(0), x)).collect(),
+            remove_from_sets: value.remove_from_set.into_iter().map(|mut x| (x.swap_remove(0), x)).collect(),
+            delete_sets     : Default::default(),
+            init_maps       : Default::default(),
+            insert_into_maps: value.insert_into_map.into_iter().map(|x| {
+                let mut values = HashMap::new();
+                let mut args_iter = x.into_iter();
+                let map = args_iter.next().expect("The validation to have worked.");
+                while let Some(k) = args_iter.next() {
+                    values.insert(k, args_iter.next().expect("The validation to have worked."));
+                }
+                (map, values)
+            }).collect::<HashMap<_, _>>(),
+            remove_from_maps: value.remove_from_map.into_iter().map(|mut x| (x.swap_remove(0), x)).collect::<HashMap<_, _>>(),
+            delete_maps     : Default::default(),
+            #[cfg(feature = "cache")] read_cache : value.read_cache,
+            #[cfg(feature = "cache")] write_cache: value.write_cache,
+            #[cfg(feature = "http")] http_client_config_diff: Some(HttpClientConfigDiff {
+                set_proxies: value.proxy.map(|x| vec![x]),
+                no_proxy: value.no_proxy,
+                ..HttpClientConfigDiff::default()
+            })
+        })
+    }
+}
+
+impl ParamsDiffArgParser {
+    /// Returns [`true`] if this would make a [`ParamsDiff`] that actually does anything.
+    ///
+    /// It's much faster to check this than make and apply the [`ParamsDiff`].
+    pub fn does_anything(&self) -> bool {
+        let mut feature_flag_make_params_diff = false;
+        #[cfg(feature = "cache")] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || self.read_cache.is_some()};
+        #[cfg(feature = "cache")] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || self.write_cache.is_some()};
+        #[cfg(feature = "http" )] #[allow(clippy::unnecessary_operation, reason = "False positive.")] {feature_flag_make_params_diff = feature_flag_make_params_diff || self.proxy.is_some()};
+        !self.flag.is_empty() || !self.unflag.is_empty() || !self.var.is_empty() || !self.unvar.is_empty() || !self.insert_into_set.is_empty() || !self.remove_from_set.is_empty() || !self.insert_into_map.is_empty() || !self.remove_from_map.is_empty() || feature_flag_make_params_diff
+    }
+}
diff --git a/src/types/rules.rs b/src/types/rules.rs
index a867759..a5beb55 100644
--- a/src/types/rules.rs
+++ b/src/types/rules.rs
@@ -289,12 +289,12 @@ impl Rule {
             } else {
                 Err(RuleError::FailedCondition)
             },
-            Self::PartMap {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
-            Self::PartRuleMap {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_deref()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
-            Self::PartRulesMap {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
-            Self::StringMap {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
-            Self::StringRuleMap {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)).or(r#else.as_deref()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
-            Self::StringRulesMap {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::PartMap        {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::PartRuleMap    {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_deref()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::PartRulesMap   {part , map, r#else} => Ok(map.get(&part.get(job_state.url).map(|x| x.into_owned())).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::StringMap      {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)           ).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::StringRuleMap  {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)           ).or(r#else.as_deref()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
+            Self::StringRulesMap {value, map, r#else} => Ok(map.get(&get_option_string!(value, job_state)           ).or(r#else.as_ref ()).ok_or(RuleError::ValueNotInMap)?.apply(job_state)?),
             Self::Repeat{rules, stop_loop_condition, limit} => {
                 // MAKE SURE THIS IS ALWAYS SYNCED UP WITH [`Rules::apply`]!!!
@@ -354,9 +354,9 @@ impl Rule {
     /// Internal method to make sure I don't accidentally commit Debug variants and other stuff unsuitable for the default config.
     pub(crate) fn is_suitable_for_release(&self, config: &Config) -> bool {
         assert!(match self {
-            Self::PartMap {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, mapper)| mapper.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
-            Self::PartRuleMap {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, rule)| rule.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
-            Self::PartRulesMap {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, rules)| rules.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
+            Self::PartMap        {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, mapper)| mapper.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
+            Self::PartRuleMap    {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, rule)| rule.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
+            Self::PartRulesMap   {part , map, r#else} => part.is_suitable_for_release(config) && map.iter().all(|(_, rules)| rules.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
             Self::StringMap      {value, map, r#else} => value.as_ref().is_none_or(|value| value.is_suitable_for_release(config)) && map.iter().all(|(_, mapper)| mapper.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
             Self::StringRuleMap  {value, map, r#else} => value.as_ref().is_none_or(|value| value.is_suitable_for_release(config)) && map.iter().all(|(_, rule)| rule.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
             Self::StringRulesMap {value, map, r#else} => value.as_ref().is_none_or(|value| value.is_suitable_for_release(config)) && map.iter().all(|(_, rules)| rules.is_suitable_for_release(config)) && r#else.as_ref().is_none_or(|x| x.is_suitable_for_release(config)),
diff --git a/src/types/string_modification.rs b/src/types/string_modification.rs
index 8ba8838..e930ced 100644
--- a/src/types/string_modification.rs
+++ b/src/types/string_modification.rs
@@ -6,7 +6,7 @@ use std::str::FromStr;
 use serde::{Serialize, Deserialize};
 use thiserror::Error;
 use percent_encoding::{percent_decode_str, utf8_percent_encode, NON_ALPHANUMERIC};
-#[allow(unused_imports, reason = "Used in a doc comment.")]
+#[expect(unused_imports, reason = "Used in a doc comment.")]
 #[cfg(feature = "regex")]
 use ::regex::Regex;
 #[cfg(feature = "base64")]
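For readers unfamiliar with the `#[command(flatten)]` mechanism that `ParamsDiffArgParser` relies on, here is a minimal, self-contained sketch of clap's derive-based flattening. The `SharedOpts` and `Cli` names are hypothetical, not from this codebase:

```rust
use clap::Parser;

/// A reusable group of options, analogous to `ParamsDiffArgParser`.
#[derive(Debug, Clone, clap::Args)]
struct SharedOpts {
    /// Flags to set.
    #[arg(short, long)]
    flag: Vec<String>,
}

/// The top-level CLI, analogous to URL Cleaner's `Args`.
#[derive(Debug, Parser)]
struct Cli {
    /// URLs to clean.
    urls: Vec<String>,

    /// The flattened group; its options parse as if declared directly on `Cli`.
    #[command(flatten)]
    shared: SharedOpts,
}

fn main() {
    // `--flag` lands in `cli.shared.flag` even though `Cli` never declares it.
    let cli = Cli::parse_from(["demo", "https://example.com", "--flag", "example"]);
    println!("urls: {:?}, flags: {:?}", cli.urls, cli.shared.flag);
}
```

Keeping the whole option group in one struct is what lets `TryFrom<ParamsDiffArgParser> for ParamsDiff` own the validation that previously lived as ad-hoc loops in `main`.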