Issue 382/ support X-Robots-Tag as a typed http header XRobotsTag #393
Open: hafihaf123 wants to merge 25 commits into plabayo:main from hafihaf123:issue-382/x-robots-tag
+539
−12
Changes from 14 commits
Commits
ce6e2b7 add XRobotsTag, initial implementation (hafihaf123)
ff26238 add value_string.rs (hafihaf123)
caefce6 add more context with comments (hafihaf123)
a7b8ebd add ValidDate, custom rules (hafihaf123)
f696c50 fix value_string.rs visibility issues (hafihaf123)
78c2ba6 rename Iterator to ElementIter (hafihaf123)
23c8fef fix visibility issues (hafihaf123)
36af384 change trait TryFrom<&[&str]> to private function from_iter (hafihaf123)
4dacfcb separate 'split_csv_str' function from 'from_comma_delimited' (hafihaf123)
a57a00b change bot_name field type to 'HeaderValueString' and indexing_rule f… (hafihaf123)
d4fa1ad implement FromStr for Element (hafihaf123)
e66d95b reformat with rustfmt (hafihaf123)
6d0cf14 todo/ fix XRobotsTag::decode() (hafihaf123)
6c350db Merge branch 'plabayo:main' into issue-382/x-robots-tag (hafihaf123)
2c2dcfa Merge branch 'plabayo:main' into issue-382/x-robots-tag (hafihaf123)
97230f5 add chrono crate to dependencies (hafihaf123)
881e70c Merge branch 'plabayo:main' into issue-382/x-robots-tag (hafihaf123)
e003827 Merge remote-tracking branch 'origin/issue-382/x-robots-tag' into iss… (hafihaf123)
2ea9085 rework API (hafihaf123)
707a209 fix chrono dependency placement (hafihaf123)
f280156 enhance code, add valid_date.rs (hafihaf123)
92cd0cc Merge branch 'plabayo:main' into issue-382/x-robots-tag (hafihaf123)
bd571c4 add x_robots_tag.rs (hafihaf123)
5933ec2 implement FromStr for ValidDate (hafihaf123)
f10e6df enhance code (hafihaf123)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
pub(crate) mod csv; | ||
/// Internal utility functions for headers. | ||
pub(crate) mod quality_value; | ||
|
||
pub(crate) mod value_string; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
use std::{ | ||
fmt, | ||
str::{self, FromStr}, | ||
}; | ||
|
||
use bytes::Bytes; | ||
use http::header::HeaderValue; | ||
|
||
use crate::headers::Error; | ||
|
||
/// A value that is both a valid `HeaderValue` and `String`. | ||
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub struct HeaderValueString { | ||
/// Care must be taken to only set this value when it is also | ||
/// a valid `String`, since `as_str` will convert to a `&str` | ||
/// in an unchecked manner. | ||
value: HeaderValue, | ||
} | ||
|
||
impl HeaderValueString { | ||
pub(crate) fn from_val(val: &HeaderValue) -> Result<Self, Error> { | ||
if val.to_str().is_ok() { | ||
Ok(HeaderValueString { value: val.clone() }) | ||
} else { | ||
Err(Error::invalid()) | ||
} | ||
} | ||
|
||
pub(crate) fn from_string(src: String) -> Option<Self> { | ||
// A valid `str` (the argument)... | ||
let bytes = Bytes::from(src); | ||
HeaderValue::from_maybe_shared(bytes) | ||
.ok() | ||
.map(|value| HeaderValueString { value }) | ||
} | ||
|
||
pub(crate) fn from_static(src: &'static str) -> HeaderValueString { | ||
// A valid `str` (the argument)... | ||
HeaderValueString { | ||
value: HeaderValue::from_static(src), | ||
} | ||
} | ||
|
||
pub(crate) fn as_str(&self) -> &str { | ||
// HeaderValueString is only created from HeaderValues | ||
// that have validated they are also UTF-8 strings. | ||
unsafe { str::from_utf8_unchecked(self.value.as_bytes()) } | ||
} | ||
} | ||
|
||
impl fmt::Debug for HeaderValueString { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
fmt::Debug::fmt(self.as_str(), f) | ||
} | ||
} | ||
|
||
impl fmt::Display for HeaderValueString { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
fmt::Display::fmt(self.as_str(), f) | ||
} | ||
} | ||
|
||
impl<'a> From<&'a HeaderValueString> for HeaderValue { | ||
fn from(src: &'a HeaderValueString) -> HeaderValue { | ||
src.value.clone() | ||
} | ||
} | ||
|
||
#[derive(Debug)] | ||
pub struct FromStrError(()); | ||
|
||
impl FromStr for HeaderValueString { | ||
type Err = FromStrError; | ||
|
||
fn from_str(src: &str) -> Result<Self, Self::Err> { | ||
// A valid `str` (the argument)... | ||
src.parse() | ||
.map(|value| HeaderValueString { value }) | ||
.map_err(|_| FromStrError(())) | ||
} | ||
} |
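The invariant documented on `HeaderValueString` (validate once at construction, then convert to `&str` without re-checking) can be illustrated without the `http` crate. The following is a minimal std-only sketch, not the PR's code: `VisibleAscii` and its methods are hypothetical names, and the accepted byte range (tab plus 0x20..=0x7E) is an assumption that only approximates what `HeaderValue::to_str` accepts. Unlike the diff above, the accessor uses the checked `str::from_utf8`, so a broken invariant would panic rather than cause undefined behavior.

```rust
/// Hypothetical stand-in for `HeaderValueString`: bytes that were validated
/// once, at construction, to be tab or visible ASCII (and therefore UTF-8).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VisibleAscii {
    bytes: Vec<u8>,
}

impl VisibleAscii {
    /// Returns `None` if any byte is outside tab / 0x20..=0x7E (assumed range).
    pub fn new(src: &[u8]) -> Option<Self> {
        let ok = src
            .iter()
            .all(|&b| b == b'\t' || (0x20..=0x7e).contains(&b));
        ok.then(|| Self { bytes: src.to_vec() })
    }

    /// Checked conversion; the constructor guarantees this never fails.
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.bytes).expect("validated in `new`")
    }
}
```

The private field is what makes the guarantee hold: no code outside the type can construct a value that skips the check in `new`.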
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
use crate::headers::util::csv::{fmt_comma_delimited, split_csv_str}; | ||
use crate::headers::util::value_string::HeaderValueString; | ||
use crate::headers::x_robots_tag::rule::Rule; | ||
use rama_core::error::{ErrorContext, OpaqueError}; | ||
use regex::Regex; | ||
use std::fmt::Formatter; | ||
use std::str::FromStr; | ||
|
||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct Element { | ||
bot_name: Option<HeaderValueString>, | ||
indexing_rules: Vec<Rule>, | ||
} | ||
|
||
impl Element { | ||
pub fn new() -> Self { | ||
Self { | ||
bot_name: None, | ||
indexing_rules: Vec::new(), | ||
} | ||
} | ||
|
||
pub fn with_bot_name(bot_name: HeaderValueString) -> Self { | ||
Self { | ||
bot_name: Some(bot_name), | ||
indexing_rules: Vec::new(), | ||
} | ||
} | ||
|
||
pub fn add_indexing_rule(&mut self, indexing_rule: Rule) { | ||
self.indexing_rules.push(indexing_rule); | ||
} | ||
|
||
pub fn bot_name(&self) -> Option<&HeaderValueString> { | ||
self.bot_name.as_ref() | ||
} | ||
|
||
pub fn indexing_rules(&self) -> &[Rule] { | ||
&self.indexing_rules | ||
} | ||
} | ||
|
||
impl std::fmt::Display for Element { | ||
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | ||
match self.bot_name() { | ||
None => fmt_comma_delimited(f, self.indexing_rules().iter()), | ||
Some(bot) => { | ||
write!(f, "{bot}: ")?; | ||
fmt_comma_delimited(f, self.indexing_rules().iter()) | ||
} | ||
} | ||
} | ||
} | ||
|
||
impl FromStr for Element { | ||
type Err = OpaqueError; | ||
|
||
fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
let regex = Regex::new(r"^\s*([^:]+?):\s*(.+)$") | ||
.context("Failed to compile a regular expression")?; | ||
Review comment (on the `Regex::new` line): There should be no need for a regex here, it's a pretty linear process, so you should be able to easily parse out rules. E.g. something like |
||
|
||
let mut bot_name = None; | ||
let mut rules_str = s; | ||
|
||
if let Some(captures) = regex.captures(s) { | ||
let bot_name_candidate = captures | ||
.get(1) | ||
.context("Failed to capture the target bot name")? | ||
.as_str() | ||
.trim(); | ||
|
||
if bot_name_candidate.parse::<Rule>().is_err() { | ||
bot_name = HeaderValueString::from_string(bot_name_candidate.to_owned()); | ||
rules_str = captures | ||
.get(2) | ||
.context("Failed to capture the indexing rules")? | ||
.as_str() | ||
.trim(); | ||
} | ||
} | ||
|
||
let indexing_rules = split_csv_str(rules_str) | ||
.collect::<Result<Vec<_>, _>>() | ||
.context("Failed to parse the indexing rules")?; | ||
|
||
Ok(Self { | ||
bot_name, | ||
indexing_rules, | ||
}) | ||
} | ||
} |
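The reviewer's own example for a regex-free parse is not included on this page. As one possible reading of that remark, here is a minimal std-only sketch that separates an optional bot name from the rules text using `str::split_once`. Everything in it is an assumption for illustration: `split_bot_name` and `is_known_rule` are hypothetical names, the rule list is a small subset, and the PR's actual `Rule` type is not used.

```rust
/// Hypothetical subset of rule names; the PR's `Rule` type is not shown here.
fn is_known_rule(name: &str) -> bool {
    const RULES: [&str; 6] = [
        "all",
        "noindex",
        "nofollow",
        "none",
        "noarchive",
        "unavailable_after",
    ];
    RULES.iter().any(|r| r.eq_ignore_ascii_case(name))
}

/// Splits `"bot: rule, rule"` into an optional bot name and the rules text.
/// A prefix that is itself a rule name (e.g. `unavailable_after: <date>`)
/// or that contains a comma is not treated as a bot name.
pub fn split_bot_name(s: &str) -> (Option<&str>, &str) {
    let s = s.trim();
    if let Some((head, tail)) = s.split_once(':') {
        let head = head.trim();
        if !head.is_empty() && !head.contains(',') && !is_known_rule(head) {
            return (Some(head), tail.trim());
        }
    }
    (None, s)
}
```

Splitting on the first colon only is deliberate: any later colons (for instance in a time such as `15:00:00`) stay inside the rules text and are left for the rule parser.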
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
use crate::headers::x_robots_tag::Element; | ||
|
||
#[derive(Debug, Clone)] | ||
/// An iterator over the `XRobotsTag` header's elements. | ||
pub struct ElementIter(std::vec::IntoIter<Element>); | ||
|
||
impl Iterator for ElementIter { | ||
type Item = Element; | ||
|
||
fn next(&mut self) -> Option<Self::Item> { | ||
self.0.next() | ||
} | ||
} | ||
|
||
impl ElementIter { | ||
pub fn new(elements: std::vec::IntoIter<Element>) -> Self { | ||
Self(elements) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
mod rule; | ||
|
||
mod element; | ||
|
||
mod element_iter; | ||
|
||
mod valid_date; | ||
|
||
// ----------------------------------------------- \\ | ||
|
||
use crate::headers::Header; | ||
use element::Element; | ||
use element_iter::ElementIter; | ||
use http::{HeaderName, HeaderValue}; | ||
use std::fmt::Formatter; | ||
use std::iter::Iterator; | ||
|
||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct XRobotsTag(Vec<Element>); | ||
|
||
impl Header for XRobotsTag { | ||
fn name() -> &'static HeaderName { | ||
&crate::header::X_ROBOTS_TAG | ||
} | ||
|
||
fn decode<'i, I>(values: &mut I) -> Result<Self, headers::Error> | ||
where | ||
Self: Sized, | ||
I: Iterator<Item = &'i HeaderValue>, | ||
{ | ||
todo!(); | ||
crate::headers::util::csv::from_comma_delimited(values).map(XRobotsTag) // wouldn't really work, need more complex logic | ||
} | ||
|
||
fn encode<E: Extend<HeaderValue>>(&self, values: &mut E) { | ||
use std::fmt; | ||
struct Format<F>(F); | ||
impl<F> fmt::Display for Format<F> | ||
where | ||
F: Fn(&mut Formatter<'_>) -> fmt::Result, | ||
{ | ||
fn fmt(&self, f: &mut Formatter) -> fmt::Result { | ||
self.0(f) | ||
} | ||
} | ||
let s = format!( | ||
"{}", | ||
Format(|f: &mut Formatter<'_>| { | ||
crate::headers::util::csv::fmt_comma_delimited(&mut *f, self.0.iter()) | ||
}) | ||
); | ||
values.extend(Some(HeaderValue::from_str(&s).unwrap())) | ||
} | ||
} | ||
|
||
impl FromIterator<Element> for XRobotsTag { | ||
fn from_iter<T>(iter: T) -> Self | ||
where | ||
T: IntoIterator<Item = Element>, | ||
{ | ||
XRobotsTag(iter.into_iter().collect()) | ||
} | ||
} | ||
|
||
impl IntoIterator for XRobotsTag { | ||
type Item = Element; | ||
type IntoIter = ElementIter; | ||
|
||
fn into_iter(self) -> Self::IntoIter { | ||
ElementIter::new(self.0.into_iter()) | ||
} | ||
} |
Depending on how you structure it, this actually has to be either:
or
because when a bot name is mentioned, it applies to all rules that follow it, until another bot name is mentioned.
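The two alternatives the reviewer proposed are not captured on this page. The grouping behavior described in the comment (a bot name applies to every following rule until the next bot name) can still be sketched in isolation. This is a std-only illustration under stated assumptions, not the PR's `decode`: `group_by_bot`, `takes_value`, and `VALUE_RULES` are hypothetical names, the list of value-taking rules is a subset, and dates that themselves contain commas are not handled.

```rust
/// Hypothetical subset of rule names that are written as `name: value`.
const VALUE_RULES: [&str; 4] = [
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
];

fn takes_value(name: &str) -> bool {
    VALUE_RULES.iter().any(|r| r.eq_ignore_ascii_case(name))
}

/// Groups comma-separated items by the most recently seen bot name.
/// Rules that appear before any bot name land in a group with `None`.
pub fn group_by_bot(value: &str) -> Vec<(Option<String>, Vec<String>)> {
    let mut groups: Vec<(Option<String>, Vec<String>)> = Vec::new();
    for item in value.split(',').map(str::trim).filter(|i| !i.is_empty()) {
        // `head: tail` starts a new group unless `head` is a value-taking rule.
        let (bot, rule) = match item.split_once(':') {
            Some((head, tail)) if !head.trim().is_empty() && !takes_value(head.trim()) => {
                (Some(head.trim()), tail.trim())
            }
            _ => (None, item),
        };
        if let Some(name) = bot {
            groups.push((Some(name.to_owned()), Vec::new()));
        } else if groups.is_empty() {
            groups.push((None, Vec::new()));
        }
        // The rule always belongs to the group opened last.
        if !rule.is_empty() {
            if let Some((_, rules)) = groups.last_mut() {
                rules.push(rule.to_owned());
            }
        }
    }
    groups
}
```

In the PR's terms, each `(Option<String>, Vec<String>)` pair would correspond to one `Element` (bot name plus indexing rules), which is one way a plain `from_comma_delimited` call falls short for this header.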