
Circulars Archive Group View #2617

Open
wants to merge 7 commits into
base: main

Conversation

Courey
Contributor

@Courey Courey commented Oct 10, 2024

This work is based off #2538 and will be turned into a non-draft PR once that work has been merged in and I rebase off main.

Description

This work includes:

  • a button to toggle from the index view to the group view
  • a drop down list of the circular titles when viewed from the circulars archive
  • a link to each circular group in the circulars archive group view
  • a route for a synonym group overview
  • full circular data in a drop down on the group overview page with a link to each individual circular
  • a feature flag hiding the synonyms work

Related Issue(s)

Resolves #2544

Testing

This has been thoroughly manually tested locally, but should be tested on dev WITHOUT the feature flag on to ensure that no existing functionality has broken.

Images

Screenshot 2024-10-10 at 3 26 08 PM Screenshot 2024-10-10 at 3 26 19 PM Screenshot 2024-10-10 at 3 26 34 PM Screenshot 2024-10-10 at 3 26 44 PM


codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 6.25000% with 105 lines in your changes missing coverage. Please review.

Project coverage is 6.17%. Comparing base (7c94cb4) to head (772831d).

Files with missing lines Patch % Lines
app/routes/circulars._archive._index/route.tsx 0.00% 34 Missing ⚠️
app/routes/circulars.group.$synonymId.tsx 0.00% 29 Missing ⚠️
app/routes/synonyms/synonyms.server.ts 29.16% 17 Missing ⚠️
app/routes/circulars.group/route.tsx 0.00% 9 Missing ⚠️
...es/circulars._archive._index/SynonymGroupIndex.tsx 0.00% 7 Missing ⚠️
app/components/pagination/Pagination.tsx 0.00% 2 Missing ⚠️
.../routes/circulars.$circularId.($version)/route.tsx 0.00% 2 Missing ⚠️
...es/circulars.edit.$circularId/CircularEditForm.tsx 0.00% 2 Missing ⚠️
app/components/circularDisplay/FrontMatter.tsx 0.00% 1 Missing ⚠️
...es/circulars.edit.$circularId/RichEditor/index.tsx 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@           Coverage Diff            @@
##            main   #2617      +/-   ##
========================================
- Coverage   6.21%   6.17%   -0.05%     
========================================
  Files        167     170       +3     
  Lines       4231    4308      +77     
  Branches     467     480      +13     
========================================
+ Hits         263     266       +3     
- Misses      3966    4040      +74     
  Partials       2       2              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Member

@lpsinger lpsinger left a comment

Please rebase.

@Courey Courey force-pushed the courey/grouping branch 2 times, most recently from e64cd81 to d91fa13 Compare October 16, 2024 15:00
@Courey Courey marked this pull request as ready for review October 16, 2024 15:46
@Courey Courey requested a review from lpsinger October 16, 2024 15:46
@Courey
Contributor Author

Courey commented Oct 16, 2024

Added "Open/Close All" button on group overview page as requested by @jracusin
Screenshot 2024-10-16 at 11 47 38 AM
Screenshot 2024-10-16 at 11 47 32 AM

@Courey
Contributor Author

Courey commented Oct 16, 2024

Please rebase.

That was the plan as I said in the original PR description:

This work is based off #2538 and will be turned into a non-draft PR once that work has been merged in and I rebase off main.

I rebased and took the PR out of draft status.

@Courey Courey requested a review from dakota002 October 16, 2024 15:54
Member

@lpsinger lpsinger left a comment

There are more conflicts. Please rebase again.

Member

@lpsinger lpsinger left a comment

First, some high-level UX feedback:

  • Please use numbered lists, not bulleted lists, for Circulars in the index view. The list number value should be the Circular number.
  • For now, don't have disclosure arrows on the per-group page. Just display all of the Circulars belonging to the group. We can fine-tune the UI to quickly navigate within a group in a future PR.

@Courey
Contributor Author

Courey commented Oct 22, 2024

Removed details element from group overview:
Screenshot 2024-10-22 at 09-36-40 GCN - Circulars

Made ul an ol with circularId values:
Screenshot 2024-10-22 at 9 36 13 AM

@Courey Courey requested a review from lpsinger October 22, 2024 13:44
@@ -29,6 +29,7 @@ export const AstroDataContext = createContext<AstroDataContextProps>({})
/**
* An Astro Flavored Markdown enriched link.
*/
// eslint-disable-next-line react/display-name
Member

Why was this addition necessary?

Contributor Author

when I moved the component into the components directory, it is judged as a component definition and causes this warning:
Screenshot 2024-10-22 at 1 58 18 PM

Contributor Author

I was trying to make the move have no changes to the code if possible

app/components/pagination/PaginationSelectionFooter.tsx Outdated Show resolved Hide resolved
@@ -53,6 +64,7 @@ export default function PaginationSelectionFooter({
page={page}
limit={limit}
totalPages={totalPages}
view={view}
Member

Why does the Pagination component need to know about the view?

Contributor Author

Because getPageLink only includes the view in the link it creates when the view is set as a search param. If the link doesn't include the view, the view defaults to index, and if it's always index, the pagination links will never work for groups. To show you what I mean, here is what happens when I remove the view from getPageLink: when I hover over the 2 button, the link does not include the view, so the page it navigates to renders the default index view.
pagination_view
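
A minimal sketch of the behavior described above. The real signature of getPageLink is not shown in this thread, so the option names and the query-string layout here are assumptions for illustration only.

```typescript
// Hypothetical inputs; the real Pagination props may differ.
interface PageLinkOptions {
  page: number
  limit?: number
  view?: string // e.g. 'index' (the default) or 'group'
}

// Build the href for a pagination button, carrying `view` along when set.
function getPageLink({ page, limit, view }: PageLinkOptions): string {
  const params: string[] = [`page=${page}`]
  if (limit !== undefined) params.push(`limit=${limit}`)
  // Dropping this line reproduces the problem: the target page has no
  // `view` param, so it renders the default index view.
  if (view) params.push(`view=${encodeURIComponent(view)}`)
  return `?${params.join('&')}`
}

const groupPageTwo = getPageLink({ page: 2, limit: 100, view: 'group' })
const indexPageTwo = getPageLink({ page: 2, limit: 100 })
```

With the view passed through, the link for page 2 of the group view keeps `view=group`; without it, the link is indistinguishable from an index-view link.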

Member

Ah, I see.

The last time that I looked at this component, I noticed that a lot of apparently separate concerns were leaking into it from pages that use it. At some point, I'd like to come back to this and try to refactor it.

Contributor Author

I think that's a good idea. It did feel odd to have to add view specific code to it.

app/routes/synonyms.$synonymId.tsx Outdated Show resolved Hide resolved
@Courey Courey requested a review from lpsinger October 22, 2024 18:03
@lpsinger
Member

Removed details element from group overview: Screenshot 2024-10-22 at 09-36-40 GCN - Circulars

Made ul an ol with circularId values: Screenshot 2024-10-22 at 9 36 13 AM

Instead of the boxes, could we just put a horizontal rule between Circulars for now?

app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
app/routes/group/route.tsx Outdated Show resolved Hide resolved
app/routes/synonyms/synonyms.server.ts Outdated Show resolved Hide resolved
@Courey Courey requested a review from lpsinger October 24, 2024 15:49
Member

@lpsinger lpsinger left a comment

  • Please place the new event pages under /circulars/group/ rather than /group/.
  • The synonym IDs make for URLs that are not very presentable. Instead, can we use a slugified form of the event name (any of the event names that belong to the group) as the path component?

app/routes/circulars._archive._index/SynonymGroupIndex.tsx Outdated Show resolved Hide resolved
app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
app/routes/group.$synonymId.tsx Outdated Show resolved Hide resolved
@Courey
Contributor Author

Courey commented Nov 12, 2024

The synonym IDs make for URLs that are not very presentable. Instead, can we use a slugified form of the event name (any of the event names that belong to the group) as the path component?

I have a couple of concerns about this if I'm understanding correctly.

  1. It doesn't seem very RESTful. The id of the group would be the synonymId, not the eventId (because sometimes it comprises more than one eventId). It is the synonym we are representing here, not the event.
  2. Let's say we had GRB 111224A, GRB 111224B, and GRB 111224C. If we slugified that into something like GRB_111224A-GRB_111224B-GRB_111224C, and a user bookmarked that URI, and later a moderator decided that GRB 111224C was actually an unrelated event and removed it from the group, then the user's bookmarked URI wouldn't work anymore. It is fragile.
  3. We could handle for that so it just picks one eventId and pulls up the record based on that, if the group that eventId was in changed, it would pull up a totally different group than originally. so circulars/group/GRB_111224C would now bring up just that record instead of the original group it belonged to before a moderator changing it. So it wouldn't go to the same group as when the bookmark/link was created.
  4. If we disregarded any concerns about uri fragility or possibly pulling up the wrong record than intended, it also changes how things work on the back end:
    • we would have to parse the slug which isn't a huge deal, but it's not as terse as just looking up by synonymId.
    • If we slugified all the eventIds in the group, we would be able to get the records based on those eventIds. doing it that way may include records from a different group as it's not specifying the synonymId.
    • if we try to handle that by checking to see if the records have the same synonymId, and we have two records and they both have different synonymIds, which synonymId wins out?
    • if we accepted just one eventId of the group, we'd have to make a db call to get the synonymId for that eventId, then using those results, we'd have to make another call to get all the other eventIds with that synonymId. So it increases queries.

Can you explain what about uuids feels not presentable to you?

@Courey Courey requested a review from lpsinger November 12, 2024 16:05
adds validation and error handling

fixes tests

adds modal warning prior to delete

changes removal button to words instead of only icons

code review change requests

removing unused className

formatting

autofill moderator synonym eventId selector

adding create sad path test

removing feature flag check

adding a 3 second debounce

Adds grouped view to circulars archive index

adds missing route and flag checks
@lpsinger
Member

The synonym IDs make for URLs that are not very presentable. Instead, can we use a slugified form of the event name (any of the event names that belong to the group) as the path component?

I have a couple of concerns about this if I'm understanding correctly.

  1. It doesn't seem very RESTful. The id of the group would be the synonymId, not the eventId (because sometimes it comprises more than one eventId). It is the synonym we are representing here, not the event.

  2. Let's say we had GRB 111224A, GRB 111224B, and GRB 111224C. If we slugified that into something like GRB_111224A-GRB_111224B-GRB_111224C, and a user bookmarked that URI, and later a moderator decided that GRB 111224C was actually an unrelated event and removed it from the group, then the user's bookmarked URI wouldn't work anymore. It is fragile.

  3. We could handle for that so it just picks one eventId and pulls up the record based on that, if the group that eventId was in changed, it would pull up a totally different group than originally. so circulars/group/GRB_111224C would now bring up just that record instead of the original group it belonged to before a moderator changing it. So it wouldn't go to the same group as when the bookmark/link was created.

  4. If we disregarded any concerns about uri fragility or possibly pulling up the wrong record than intended, it also changes how things work on the back end:

    • we would have to parse the slug which isn't a huge deal, but it's not as terse as just looking up by synonymId.
    • If we slugified all the eventIds in the group, we would be able to get the records based on those eventIds. doing it that way may include records from a different group as it's not specifying the synonymId.
    • if we try to handle that by checking to see if the records have the same synonymId, and we have two records and they both have different synonymIds, which synonymId wins out?
    • if we accepted just one eventId of the group, we'd have to make a db call to get the synonymId for that eventId, then using those results, we'd have to make another call to get all the other eventIds with that synonymId. So it increases queries.

I proposed using "any of the event names that belong to the group" as the path component. So /circulars/groups/GW170817, /circulars/groups/GRB170817A, /circulars/groups/GW170817+GRB170817A would all represent the same resource. (One would need to be reported as the canonical URL for SEO purposes; probably the version with the maximal set of terms.)

Another option (but not my preference) would be to do what Stack Overflow does: have an opaque UUID followed by a slug that is ignored. So for example, all of the following refer to the same resource:

although the first is canonical.

Can you explain what about uuids feels not presentable to you?

It's a usability issue: https://en.wikipedia.org/wiki/Clean_URL

@Courey
Contributor Author

Courey commented Nov 14, 2024

I proposed using "any of the event names that belong to the group" as the path component. So /circulars/groups/GW170817, /circulars/groups/GRB170817A, /circulars/groups/GW170817+GRB170817A would all represent the same resource. (One would need to be reported as the canonical URL for SEO purposes; probably the version with the maximal set of terms.)

That still does not solve for or answer the issues I brought up in the above comment. Take your example of /circulars/groups/GW170817, /circulars/groups/GRB170817A, /circulars/groups/GW170817+GRB170817A. Jane Doe is interested in this grouping that she discovered when searching for GRB170817A. She bookmarks /circulars/groups/GRB170817A so she can go back and look at this group. A moderator looks at the grouping and decides that while the events were very close together, they are actually different events. So the moderator removes GRB170817A from the group with GW170817. So now when Jane visits the website looking for the information from both GRB170817A and GW170817, the link she bookmarked would take her to the group that only has GRB170817A in it.

This request would be like removing the circularId from the circulars path and replacing it with the circular subject instead. But you shouldn't do that because the subject could be edited and if so, it would break links because it's fragile.

clean_url circular_path synonym_path

The only difference in these patterns is UUID vs integer id. But the integer ID isn't giving any additional information in a human readable format. I have no insight into what circular 12345 is about.

To make it restful, it should follow the pattern /{resource}/{id}

https://restfulapi.net/resource-naming/
Screenshot 2024-11-14 at 10 07 33 AM

@lpsinger
Member

I don't disagree that RESTfulness is a desirable property for URL routing, but an entity need not have a unique URL, does it? By analogy, files on a filesystem don't have unique paths.

It's true that we are using Circular IDs as URL path components, but Circular IDs are part of the publicly-visible bibliographic record of Circulars. UUIDs are an internal detail.

@Courey
Contributor Author

Courey commented Nov 14, 2024

I disagree that what you are proposing is "clean".
from your wiki link on clean urls:

Other reasons for using clean URLs include search engine optimization (SEO),[1] conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources

your request violates:

conforming to the representational state transfer (REST) style of software architecture

because calling a resource by one of its members is not restful. not restful != clean

and

ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources

because of the situations outlined above when the members of the group change. Say I have a group of GRB A, GRB B, GRB C, and GRB D. If I bookmark circulars/groups/GRB_D and GRB D is later removed from the group, the bookmark would take me to the new group for GRB D without any indication of why GRB A, GRB B, and GRB C, which were there before, are now missing. This is not a durable URI because it is based on changeable values. One would instead expect to visit the group bookmarked originally and see that GRB D was no longer there; then one could search for GRB D to see what new group it was part of. Just having it jump to the new group with no indication of why the new group is not the old group is not an anticipated behavior and it does not make for reliable bookmarking. Since the slugs would be based on changeable values, bookmarks would always be fragile, which is also not clean.

Additionally, is it really worth the complexity that it creates?
things that would have to be accounted for:

  1. slugging and de-slugging singular eventIds
  2. handling slugging and de-slugging more than 1 eventId
  3. looking for a synonym group by getting the synonym off the de-slugified eventId and then making a second query for the members of the group by the results of that query.
  4. keeping track of SEO for changing group members

@lpsinger
Member

In a URL like /circulars/group/GW170817, the "group" component is the resource and "GW170817" is the ID. I view the event names as permanent, stable IDs. The UUIDs are not.

Although this approach appears RESTful to me, I don't want to get bogged down in that argument, because this path is not an API endpoint and it is not all that important whether it is RESTful or not. It's more important that they are human-readable and predictable.

Here are some examples of style guidelines and advice related to human-readable URLs:

Additionally, is it really worth the complexity that it creates?
things that would have to be accounted for:

  1. slugging and de-slugging singular eventIds
  2. handling slugging and de-slugging more than 1 eventId
  3. looking for a synonym group by getting the synonym off the de-slugified eventId and then making a second query for the members of the group by the results of that query.
  4. keeping track of SEO for changing group members

That looks like a reasonable implementation plan. Is there a good open-source slugging library we could use?

@Courey
Contributor Author

Courey commented Nov 15, 2024

So if the group consists of events GRB 123456A, GRB 123456B, and GRB 123456C, is GRB 123456A the group? or is GRB 123456A a member of the group?

if we have a data structure that is like so:

{
  synonymId: 1,
  eventIds: ['GRB 123456A', 'GRB 123456B', 'GRB 123456C']
}

Which is how we are representing the concept of a synonym group. I acknowledge that how it is stored in dynamo does not make it apparent, but that is because we had to do some unusual things to use a document store relationally. But the general concept of a group is the above structure.

According to that record above, is the resource id of the group 1 or is it GRB 123456A?

If I remove GRB 123456C from the synonym group, thus making it a group unto itself I would have these two records:

{
  synonymId: 1,
  eventIds: ['GRB 123456A', 'GRB 123456B']
}

{
  synonymId: 2,
  eventIds: ['GRB 123456C']
}

Is the group the same for GRB 123456C as it was before? it used to be in group 1, now it is in group 2. Are group 1 & 2 the same resource?
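
To make the question concrete, here is a small sketch of the split described above. The SynonymGroup type and the splitEvent helper are hypothetical names used only for illustration, and numeric ids mirror the example records rather than the UUIDs used in the real table.

```typescript
// Conceptual shape of a synonym group, mirroring the records above.
interface SynonymGroup {
  synonymId: number
  eventIds: string[]
}

// Remove one event from a group and place it in a new group of its own.
// `newSynonymId` is supplied by the caller; this is an assumption, since
// the real code would presumably generate a fresh id.
function splitEvent(
  group: SynonymGroup,
  eventId: string,
  newSynonymId: number
): [SynonymGroup, SynonymGroup] {
  if (group.eventIds.indexOf(eventId) === -1) {
    throw new Error(`${eventId} is not a member of group ${group.synonymId}`)
  }
  const remaining: SynonymGroup = {
    synonymId: group.synonymId,
    eventIds: group.eventIds.filter((id) => id !== eventId),
  }
  const detached: SynonymGroup = { synonymId: newSynonymId, eventIds: [eventId] }
  return [remaining, detached]
}

const original: SynonymGroup = {
  synonymId: 1,
  eventIds: ['GRB 123456A', 'GRB 123456B', 'GRB 123456C'],
}
const [group1, group2] = splitEvent(original, 'GRB 123456C', 2)
```

After the split, group 1 keeps its id and two members, while GRB 123456C lives under a different id, which is the situation the question above is asking about.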

@lpsinger
Member

So if the group consists of events GRB 123456A, GRB 123456B, and GRB 123456C, is GRB 123456A the group? or is GRB 123456A a member of the group?

I would consider the group to be a collection of circulars. I don't care that there is a DynamoDB table of synonyms. Synonyms are not a user-visible aspect of our application (except to moderators).

In your example, I suggest that all of the following URLs resolve to the same resource:

  • /circulars/group/grb_123456a
  • /circulars/group/grb_123456b
  • /circulars/group/grb_123456c
  • /circulars/group/grb_123456a-grb_123456b
  • /circulars/group/grb_123456a-grb_123456c
  • /circulars/group/grb_123456b-grb_123456a
  • /circulars/group/grb_123456b-grb_123456c
  • /circulars/group/grb_123456c-grb_123456a
  • /circulars/group/grb_123456c-grb_123456b
  • /circulars/group/grb_123456a-grb_123456b-grb_123456c
  • /circulars/group/grb_123456a-grb_123456c-grb_123456b
  • /circulars/group/grb_123456b-grb_123456a-grb_123456c
  • /circulars/group/grb_123456b-grb_123456c-grb_123456a
  • /circulars/group/grb_123456c-grb_123456a-grb_123456b
  • /circulars/group/grb_123456c-grb_123456b-grb_123456a

The canonical URL should be /circulars/group/grb_123456a-grb_123456b-grb_123456c, which consists of all of the synonyms in lexical order.

To resolve the group for a given URL, take the entity ID and split on the separator (in the example above, a dash). Take just the first name in the group, and look up its synonyms. Return all Circulars for all synonyms belonging to that group.
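
The resolution steps above can be sketched as follows. This is only an illustration under stated assumptions: plain objects stand in for the database tables, and the group id 'group-1', the Circular numbers, and the name resolveGroup are all made up for the example.

```typescript
// In-memory stand-ins for the synonym and Circular lookups (assumptions).
const synonymIdBySlug: Record<string, string> = {
  grb_123456a: 'group-1',
  grb_123456b: 'group-1',
  grb_123456c: 'group-1',
}
const circularIdsBySlug: Record<string, number[]> = {
  grb_123456a: [101, 104],
  grb_123456b: [102],
  grb_123456c: [103],
}

// Split the path component, take only the first name, find its group,
// and return the Circulars of every name in that group.
function resolveGroup(pathComponent: string, separator = '-'): number[] {
  const first = pathComponent.toLowerCase().split(separator)[0] ?? ''
  const synonymId = synonymIdBySlug[first]
  if (synonymId === undefined) return []
  const circulars: number[] = []
  for (const slug of Object.keys(synonymIdBySlug)) {
    if (synonymIdBySlug[slug] === synonymId) {
      circulars.push(...(circularIdsBySlug[slug] ?? []))
    }
  }
  return circulars.sort((a, b) => a - b)
}

const viaTwoNames = resolveGroup('grb_123456c-grb_123456a')
const viaOneName = resolveGroup('grb_123456b')
```

Only the first name in the path drives the lookup, so any ordering or subset of names that starts with a member of the group resolves to the same set of Circulars.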

@Courey
Contributor Author

Courey commented Nov 15, 2024

Okay, so according to your example, I have bookmarked group:
/circulars/group/grb_123456c-grb_123456a
except between the time I bookmarked it and the time that I went back to look at it, a moderator removed grb_123456c from the group. Which do I go to? The group for grb_123456c or the group for grb_123456a? What if they bookmarked that link but were going back to look at grb_123456a, so if we just went to the first one in the list, it would send them to the wrong group.

@Courey
Contributor Author

Courey commented Nov 15, 2024

in the above case /circulars/group/grb_123456c-grb_123456a would take you to a group that didn't actually include grb_123456a. Is that expected?

@lpsinger
Member

in the above case /circulars/group/grb_123456c-grb_123456a would take you to a group that didn't actually include grb_123456a. Is that expected?

No, it would not. As I said:

To resolve the group for a given URL, take the entity ID and split on the separator (in the example above, a dash). Take just the first name in the group, and look up its synonyms. Return all Circulars for all synonyms belonging to that group.

@Courey
Contributor Author

Courey commented Nov 15, 2024

so you are saying that if they looked up /circulars/group/grb_123456c-grb_123456a and c had been removed from the group, then it would take them to group c while the bookmarked uri still says that it's taking them to something with c and a in it. why would a bookmark that tells me that it includes both c and a take me somewhere that didn't include a?

@lpsinger
Member

so you are saying that if they looked up /circulars/group/grb_123456c-grb_123456a and c had been removed from the group, then it would take them to group c while the bookmarked uri still says that it's taking them to something with c and a in it. why would a bookmark that tells me that it includes both c and a take me somewhere that didn't include a?

I'll refer you again to this algorithm:

To resolve the group for a given URL, take the entity ID and split on the separator (in the example above, a dash). Take just the first name in the group, and look up its synonyms. Return all Circulars for all synonyms belonging to that group.

The alternatives that I see are to raise a 404 error if the URL does not contain all of the event names, or to only return the circulars belonging to the event names in the URL. Both of those alternatives would be more surprising to the user.

@Courey
Contributor Author

Courey commented Nov 18, 2024

The alternatives that I see are to raise a 404 error if the URL does not contain all of the event names, or to only return the circulars belonging to the event names in the URL. Both of those alternatives would be more surprising to the user.

But what wouldn't be surprising to anyone in any case is if the user had a link to a group id instead of to the changing members of the group.

If you had the URI as /group/123, then since the resource is the group itself, not its members, the resource id would be the group id (in our case the synonymId).
synonyms_all_together_1

that way, when the editable field gets changed, the uri is not comprised of editable members, so it doesn't get confusing.
synonyms_structure_2

You would always go to the group and who the members of that group are can change freely without impacting anything while also following REST. That's why it's the convention. if the user bookmarks group/123 then no matter what was in that group, they know they would always be getting the intended group.

@lpsinger
Member

The URL structure does not need to reflect the database structure. The UUIDs are a bit of dirty laundry that I don't want users to see, ever.

@lpsinger
Member

We decided as a group that we would use the following style of URL: https://gcn.nasa.gov/circulars/event/grb_123456c. The final path component is an event name, and the resulting page will show all of the circulars belonging to that event name and all of its synonyms.

Consequently, an event consisting of GRB 123456A, GRB 123456B, GRB 123456C, would have the following URLs, which would all be equivalent:

  • https://gcn.nasa.gov/circulars/event/grb_123456a
  • https://gcn.nasa.gov/circulars/event/grb_123456b
  • https://gcn.nasa.gov/circulars/event/grb_123456c

@Courey, please note:

  • The page for the lexically first event name must be identified as the canonical URL for SEO purposes.
  • Please research and propose a slugging style for our feedback.
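
The canonical-URL rule in the first note can be sketched like this. The function names are hypothetical, the slugs are taken from the example above (the slugging style itself was still undecided at this point), and "lexically first" is interpreted here as plain string comparison.

```typescript
const EVENT_BASE_URL = 'https://gcn.nasa.gov/circulars/event'

// Pick the lexically first slug among all names of the event.
// Plain `<` compares UTF-16 code units, so the result does not depend on locale.
function canonicalSlug(slugs: string[]): string {
  if (slugs.length === 0) throw new Error('an event needs at least one name')
  return slugs.reduce((min, slug) => (slug < min ? slug : min))
}

// Every equivalent page would point at the same canonical URL.
function canonicalUrl(slugs: string[]): string {
  return `${EVENT_BASE_URL}/${canonicalSlug(slugs)}`
}

// Markup an equivalent page could emit in its <head>.
function canonicalLinkTag(slugs: string[]): string {
  return `<link rel="canonical" href="${canonicalUrl(slugs)}" />`
}

const canonical = canonicalUrl(['grb_123456c', 'grb_123456a', 'grb_123456b'])
```

Whichever of the three equivalent pages is requested, the canonical target is the grb_123456a page.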

@Courey
Contributor Author

Courey commented Dec 2, 2024

There are several characters we use in eventIds that are not alphanumeric. they are:
'/', '.', ' ', '-', '+'

Some of these are acceptable in URIs; others are reserved and cannot be used as-is.
'.' and '-' are acceptable, the rest are not.

Here are examples of the different characters that are used in eventIds:
HAWC-230527A
SGR 1935+2154
GRB 231115A
LIGO/Virgo S190425z
GRB 160225.81

  1. Since we are using the slug to look up the synonym, we need to ensure that any slugification can be de-slugified exactly. Typical slugging behavior is not reversible, which creates a challenge in our case. You cannot exactly de-slug them because they typically convert unsafe characters to dashes (or a specified character) and we would not be able to tell what the original characters were. Since we are using the eventId as an id for lookup, this is not acceptable.
    the slugified versions of the examples above would be as follows:
    HAWC-230527A
    SGR-1935-2154
    GRB-231115A
    LIGO-Virgo-S190425z
    GRB-160225-81

If we tried to use the slug to query the database, we would not be able to return them exactly to their original format, we would just know that reserved or unacceptable characters had been turned into dashes. This would have a similar impact to SEO as the other options where the keywords are not exact. (GRB-160225-81 would be GRB, 160225, and 81 instead of 160225.81)
If we used a relational database, we could use a series of wildcard where clauses to search for the appropriate record, but in DynamoDB this is not supported. We could use a table scan (which is expensive and not performant) and/or multiple queries with a begins with and contains. This is not exact and the results could be wrong in addition to it being not very performant and expensive.
Pure slugification is not an option in our case.
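
A small sketch of why a lossy slug cannot be reversed. The naiveSlug function is a hand-rolled stand-in for a typical slugger, not any particular library, and 'GRB-160225-81' is a hypothetical second name used only to show a collision.

```typescript
// Typical lossy slugging: every run of non-alphanumeric characters
// collapses into a single dash.
function naiveSlug(eventId: string): string {
  return eventId.replace(/[^A-Za-z0-9]+/g, '-')
}

const slugFromDot = naiveSlug('GRB 160225.81')
const slugFromDash = naiveSlug('GRB-160225-81') // hypothetical distinct name

// Two different inputs produce the same slug, so the slug alone cannot
// tell us which original eventId to query for.
const collides = slugFromDot === slugFromDash
```

Because the space, the dot, and the dash all map to the same character, there is no way to recover 'GRB 160225.81' from 'GRB-160225-81' without extra information.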

  2. The best option for reversibility would be to percent-encode.
    The cons of this approach are that it is ugly and not super human readable since it would have percent encoded values sprinkled in.
    Crawlers can understand them, but it sort of defeats the purpose of using something human readable.
    SEO impact would be that the keywords may be obscured which may impact searchability.

here are the examples above url encoded:
HAWC-230527A
SGR%201935%2B2154
GRB%20231115A
LIGO%2FVirgo%20S190425z
GRB%20160225.81
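
The encoded values above match what the standard encodeURIComponent function produces, and decodeURIComponent reverses them exactly. A minimal sketch using the example eventIds:

```typescript
const exampleEventIds = [
  'HAWC-230527A',
  'SGR 1935+2154',
  'GRB 231115A',
  'LIGO/Virgo S190425z',
  'GRB 160225.81',
]

// Percent-encode each eventId for use as a single path component.
const percentEncoded = exampleEventIds.map((id) => encodeURIComponent(id))

// Decoding restores the original eventId, so it can be used as a lookup key.
const roundTripped = percentEncoded.map((slug) => decodeURIComponent(slug))
```

Dashes and dots pass through unchanged, while the space, plus sign, and slash become %20, %2B, and %2F.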

  3. An approach that includes exact reversibility would be to create a custom encoder/decoder.
    This would be a function that replaces the reserved characters with a string representing each character.
    This would get us a moderately human readable slug that can be reversed.
    It would have meaningful separators, so while it is not as clean as a normal slug, it is still fairly SEO friendly.
    Keywords would be preserved, so SEO could still extract useful keywords.

Encoded with our custom encoder they would appear like so:
HAWC--dash--230527A
SGR--space--1935--plus--2154
GRB--space--231115A
LIGO--slash--Virgo--space--S190425z
GRB--space--160225--dot--81

It's unusual, but functional. It can be reversed exactly so the value can be used in a query and it would preserve keywords.
It would obscure parts of the id for SEO though. For example GRB--space--160225--dot--81 would have keywords GRB, 160225, and 81 instead of GRB and 160225.81
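
A minimal sketch of such a custom encoder/decoder. The function names and the token table are assumptions based on the encoded examples above; nothing here is taken from the actual codebase.

```typescript
// Each special character maps to a word token wrapped in double dashes.
const TOKENS: Record<string, string> = {
  '-': '--dash--',
  ' ': '--space--',
  '+': '--plus--',
  '/': '--slash--',
  '.': '--dot--',
}

// Encode in a single pass so that dashes introduced by one token are
// never re-encoded by a later replacement.
function encodeEventId(eventId: string): string {
  return eventId.replace(/[-. +\/]/g, (ch) => TOKENS[ch] ?? ch)
}

// Decode by mapping each token back to its original character.
function decodeEventId(slug: string): string {
  return slug.replace(/--(dash|space|plus|slash|dot)--/g, (token) => {
    const original = Object.keys(TOKENS).filter((ch) => TOKENS[ch] === token)[0]
    return original ?? token
  })
}

const customEncoded = encodeEventId('SGR 1935+2154')
const customDecoded = decodeEventId(customEncoded)
```

The single-pass replace matters: encoding the dash first and the other characters afterwards in separate passes would be fine, but encoding the dash last would mangle the tokens already written.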

None of the options are ideal and all will have some impact on SEO.

Option 1 is not an option.
Option 2 is not super human readable and has minimal SEO keyword impact with no impact to Crawlers
Option 3 is weird but reversible and preserves most keywords though it does have some minimal SEO impact similar to the pure slug option.

@Courey
Contributor Author

Courey commented Dec 2, 2024

- . _ ~ are the only non-alphanumeric unreserved characters
Screenshot 2024-12-02 at 1 03 21 PM

@Courey
Contributor Author

Courey commented Dec 2, 2024

An additional option would be to add a field to the synonyms table called slug and when an eventId is created or updated, we could replace all reserved characters with dashes. For example, the record would be:

{
  synonymId: 'uuid-uuid-uuid-uuid',
  eventId: 'LIGO/Virgo S190425z',
  slug: 'LIGO-Virgo-S190425z'
}

Then we would have to add an additional index to the slug field. That would change the lookup slightly from a pkey lookup to a GSI lookup.

We would need an additional backfill for the existing synonyms to create the slug.
It would still have some SEO impact, since the keywords would be separated differently than the eventId itself, but it would be the most human readable option.

@lpsinger
Member

lpsinger commented Dec 3, 2024

@Courey, if we want the slugs to be case insensitive (and I think we do), then we'll need to add the slugs to the database anyway because DynamoDB keys are case sensitive.

Given that, I think we can abandon the requirement that the slugs are reversible, and just replace anything that would require percent encoding with a dash.

There are a number of slugging libraries on NPM. I believe that we already have a dependency on https://www.npmjs.com/package/github-slugger.

@Courey
Contributor Author

Courey commented Dec 3, 2024

Additional work for this feature:

  • add slug field to dynamodb record (all lowercase with dashes)
  • add index to the slug field
  • add code to save slug in any place that a synonym could be created or updated (using github slugger package)
  • create function to get synonym by slug
  • update path to be /circulars/events/<slug>
  • create additional ticket to backfill any existing synonyms' slugs
  • update existing ticket and code for synonym backfill to include slug
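
A rough sketch of how the stored slug and the slug lookup in this plan might fit together. Everything here is an assumption for illustration: toSlug is a hand-rolled stand-in that is not github-slugger's algorithm (whose output may differ), the array stands in for the DynamoDB table and its slug index, and the ids 'uuid-1' and 'uuid-2' are placeholders.

```typescript
interface SynonymRecord {
  synonymId: string
  eventId: string
  slug: string
}

// Lowercase, then replace each run of characters outside the URI
// unreserved set (letters, digits, '-', '.', '_', '~') with a dash.
function toSlug(eventId: string): string {
  return eventId.toLowerCase().replace(/[^a-z0-9._~-]+/g, '-')
}

// The slug is computed whenever a synonym is created or updated.
function makeRecord(synonymId: string, eventId: string): SynonymRecord {
  return { synonymId, eventId, slug: toSlug(eventId) }
}

// In-memory stand-in for the synonyms table.
const synonymsTable: SynonymRecord[] = [
  makeRecord('uuid-1', 'GRB 123456A'),
  makeRecord('uuid-1', 'GRB 123456B'),
  makeRecord('uuid-2', 'LIGO/Virgo S190425z'),
]

// Stand-in for the index lookup: find the record whose slug matches.
function getSynonymBySlug(slug: string): SynonymRecord | undefined {
  const wanted = slug.toLowerCase() // slugs are case insensitive
  return synonymsTable.filter((record) => record.slug === wanted)[0]
}

// Resolve /circulars/events/<slug> to every eventId in the same group.
function getEventIdsForSlug(slug: string): string[] {
  const record = getSynonymBySlug(slug)
  if (!record) return []
  return synonymsTable
    .filter((other) => other.synonymId === record.synonymId)
    .map((other) => other.eventId)
}

const groupEventIds = getEventIdsForSlug('GRB-123456a')
```

Because the slug is stored rather than decoded, it does not need to be reversible: the lookup goes from slug to record, and the original eventId is read from the record.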

Successfully merging this pull request may close these issues.

Circulars Grouped by Synonyms index
2 participants