When annotating local documents, remove paths from public annotations #1661

Closed
csillag opened this issue Nov 10, 2014 · 15 comments

@csillag
Contributor

csillag commented Nov 10, 2014

See previous discussion starting here.

@dwhly
Member

dwhly commented Nov 10, 2014

Wondering if removing just the path is perhaps good enough. It might be useful to still show filenames.

UPDATED the issue name.

@csillag
Contributor Author

csillag commented Nov 10, 2014

Maybe we should retain the filenames but not expose them to anybody except the user who created the annotation? That would let that user open the document by clicking on the source link, while hiding it from others (who don't have access to the document either way).

@csillag
Contributor Author

csillag commented Nov 10, 2014

See also some other ideas in #1664.

@aron
Contributor

aron commented Nov 10, 2014

@csillag rather than linking to a previous discussion, could you update the description to include a summary? That way someone landing on this issue doesn't have to read through another discussion for context.

@dwhly dwhly changed the title from "When annotating local documents, remove file names from public annotations" to "When annotating local documents, remove paths from public annotations" Nov 18, 2014
@gergely-ujvari gergely-ujvari self-assigned this Nov 19, 2014
@gergely-ujvari
Contributor

Q: Is it necessary to keep the path for local files, for private annotations?
What I mean is that, since the edit flow can change the privacy from public to private and vice versa, it would require additional logic to remove the path when an annotation goes from private to public, and to reinsert it when the annotation goes from public back to private.

Of course it is doable; I'm only asking whether it is worth doing.
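
To make the extra logic concrete, here is a minimal TypeScript sketch of the remove-and-reinsert flow described above. Everything in it is an assumption made for illustration: the `StoredAnnotation` shape, the `changeVisibility` function and the in-memory `rememberedPaths` table are hypothetical and are not part of the actual client. The point it shows is that a stripped path has to be remembered somewhere if it is ever to be restored.

```typescript
type Visibility = "private" | "public";

// Hypothetical, simplified annotation: only the fields this sketch needs.
interface StoredAnnotation {
  id: string;
  visibility: Visibility;
  hrefs: string[];
}

// Assumption: paths removed on publish are kept in a client-side table,
// keyed by annotation id, so they can be reinserted later.
const rememberedPaths: { [id: string]: string[] } = {};

// True for hrefs that use the file:// scheme.
function isFileHref(href: string): boolean {
  return /^file:\/\//i.test(href);
}

// Return a copy of the annotation with the new visibility applied.
function changeVisibility(a: StoredAnnotation, next: Visibility): StoredAnnotation {
  if (next === "public") {
    // Going public: remember the local paths, then drop them.
    const local = a.hrefs.filter(isFileHref);
    if (local.length > 0) {
      rememberedPaths[a.id] = local;
    }
    return { ...a, visibility: next, hrefs: a.hrefs.filter((h) => !isFileHref(h)) };
  }
  // Going private: put back whatever was remembered for this annotation.
  const restored = rememberedPaths[a.id] || [];
  delete rememberedPaths[a.id];
  return { ...a, visibility: next, hrefs: restored.concat(a.hrefs) };
}
```

If the remembered table is lost (another browser, cleared storage), the path cannot be restored, which is part of why the round trip is awkward.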

@gergely-ujvari
Contributor

I've edited my last comment because I left out the key part of my question (that it refers to private annotations on local files).

@gergely-ujvari
Contributor

@dwhly: What do you think?

@dwhly
Member

dwhly commented Nov 20, 2014

Is it necessary to keep the path for local files, for private annotations?

Pros:

  • For private annotations on Local PDFs, keeping the path means that people could potentially click on the link of an annotation in their stream and fetch the PDF associated with it from their hard drive.

Cons:

  • Local file locations are probably fairly likely to change more often than things like URLs.
  • There's little chance that clicking on the link, even if it found the file, would actually fetch the PDF into the browser, since most OS file-type defaults are probably set for desktop PDF reader apps (Adobe Reader, Preview, etc).
  • As you suggest: What happens if they edit their annotation and change the visibility of it to public (or a future group mode), should the path be stripped? What then happens if they change the visibility back to private?
  • But mostly: If we have an issue with storing paths we probably shouldn't be storing them at all. Don't send them to our service.

I'd vote for stripping paths (not filenames). It keeps things simple.

gergely-ujvari added a commit that referenced this issue Nov 21, 2014
The formatter function given to the bridge plugin is now
guest.formatAnnotation() for better testability.

If annotation.document.link exists, it is filtered so that
no href starting with the file:// scheme gets through
to the sidebar. This way we do not store any local path.

Fix #1661
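
The filtering described in the commit message can be sketched roughly as follows. This is not the code from the commit: the type names and the `withoutLocalLinks` function are hypothetical, and the shapes are only modeled on the `document` JSON quoted in this thread.

```typescript
// Hypothetical shapes, modeled on the "document" JSON quoted in this thread.
interface LinkEntry {
  href: string;
}

interface DocumentInfo {
  link?: LinkEntry[];
  title?: string;
}

interface AnnotationData {
  document?: DocumentInfo;
}

// True for hrefs that use the file:// scheme (case-insensitive).
function isLocalFileHref(href: string): boolean {
  return /^file:\/\//i.test(href);
}

// Return a copy of the annotation whose document links exclude file:// hrefs.
// The input is left untouched; annotations without links are returned as-is.
function withoutLocalLinks(annotation: AnnotationData): AnnotationData {
  const doc = annotation.document;
  if (!doc || !doc.link) {
    return annotation;
  }
  const link = doc.link.filter((l) => !isLocalFileHref(l.href));
  return { ...annotation, document: { ...doc, link } };
}

const filtered = withoutLocalLinks({
  document: {
    link: [
      { href: "file:///home/ujvari/Let%C3%B6lt%C3%A9sek/pdf-test.pdf" },
      { href: "urn:x-pdf:c21f21ea44c1e2ed2581435fa5a2dcce" },
    ],
    title: "PDF Test Page",
  },
});
// filtered.document.link now holds only the urn:x-pdf entry.
```

Note that this drops the filename together with the path.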
@gergely-ujvari
Contributor

Another question here: where should we store the remaining filename?

Currently, the document part looks like this:

    "document": {
        "link": [
            {"href": "file:///home/ujvari/Let%C3%B6lt%C3%A9sek/pdf-test.pdf"},
            {"href": "urn:x-pdf:c21f21ea44c1e2ed2581435fa5a2dcce"}
        ],
        "title": "PDF Test Page"
    }

First option: we can replace the href that uses the file URI scheme with just the filename, {"href": "pdf-test.pdf"}; our store plugin will then try to load annotations for that filename. (This is not good for trivial filenames like test.pdf or downloaded-(x).pdf, which many unrelated files share.)

Second option: we can change the annotator.document plugin to have a new field (maybe 'filename') that collects the filenames of file URIs, and store it there. But does this seem useful for the Document plugin?

Third option: just store the filename in annotation.filename (a new field of our own). But it is a bit out of place.

Which of these do you think should be done?
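
For illustration, the second option could look roughly like the TypeScript sketch below. The `filename` field, the type names and both functions are hypothetical; this is not the Document plugin's actual API. It takes the last path segment of the first file:// link, percent-decodes it, stores it in a `filename` field and drops the file:// link itself.

```typescript
// Hypothetical shapes; the "filename" field is the proposed addition.
interface DocLink {
  href: string;
}

interface DocWithFilename {
  link: DocLink[];
  title?: string;
  filename?: string;
}

// Return the decoded last path segment of a file:// URI, or null when the
// href is not a file URI or has no filename part.
function filenameFromFileUri(href: string): string | null {
  if (!/^file:\/\//i.test(href)) {
    return null;
  }
  // Drop the scheme and authority, then any query string or fragment.
  const path = href.replace(/^file:\/\/[^\/]*/i, "").replace(/[?#].*$/, "");
  const last = path.substring(path.lastIndexOf("/") + 1);
  if (last === "") {
    return null;
  }
  try {
    return decodeURIComponent(last);
  } catch (_err) {
    return last; // malformed percent-encoding: keep the raw segment
  }
}

// Return a copy of the document with file:// links removed and the first
// recoverable filename kept in the "filename" field.
function moveFilenameOutOfLinks(doc: DocWithFilename): DocWithFilename {
  let filename = doc.filename;
  const link: DocLink[] = [];
  for (const l of doc.link) {
    if (/^file:\/\//i.test(l.href)) {
      const name = filenameFromFileUri(l.href);
      if (filename === undefined && name !== null) {
        filename = name;
      }
    } else {
      link.push(l);
    }
  }
  const result: DocWithFilename = { ...doc, link };
  if (filename !== undefined) {
    result.filename = filename;
  }
  return result;
}
```

With the document quoted above, this would leave only the urn:x-pdf link and set filename to "pdf-test.pdf". Because the filename is no longer an href, the store would not try to load annotations by it, which avoids the collision problem of the first option.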

@aron
Contributor

aron commented Nov 21, 2014

document.filename seems sensible to me.

@tilgovi
Contributor

tilgovi commented Nov 21, 2014

The discussion concluded with the resolution to remove the paths. Why do we need the filenames? Shouldn't the title of the PDF be enough for the user to identify it? Why store the filename at all?

@gergely-ujvari
Contributor

About the filename: @dwhly mentioned in one of his comments (I just can't find it right now) that storing the filename would make it possible to search for the PDF by name in the local file system.

Keeping only the title does not offer that.
(I have no strong opinion here.)

@tilgovi
Contributor

tilgovi commented Nov 22, 2014

As a case study, if I were a Mac user, the title would actually be sufficient. Spotlight will index the titles of PDFs, and I should be able to search for the title without ever knowing the filename.

I consider filenames an implementation detail of filesystems. It's really a tragedy that as users we are ever still required to even be aware of such a thing.

@csillag
Contributor Author

csillag commented Nov 22, 2014

As a case study, if I were a Mac user, the title would actually be sufficient.

I have seen PDF files with empty or null titles. Not even Spotlight can fix that...

@dwhly
Member

dwhly commented Nov 22, 2014

Why do we need the filenames? Shouldn't the title of the PDF be enough for the user to identify it?

Here are a few of my reasons:

  • The title, when present, would be helpful too. Having both is better.
  • The filename is a visual string that I probably recognize. It's short and symbolic.
  • The title may often be long. What's visible before our cards truncate it with an ellipsis may not be uniquely distinctive, or may be harder to distinguish quickly.
  • I, like many if not most folks probably, encode the filenames of journal articles that I save in a way that specifically records the name, year and sometimes title of the document, like an abbreviated bibliographic reference (if it's not already saved this way, which it often is), e.g. "Watson_and_Crick_1953.pdf". That info is not likely to be what the title metadata returns for the paper.
  • Sometimes titles aren't there (particularly with scanned PDFs, once we support image annotation). Filenames always are.
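
As a rough illustration of using both pieces of information, a card label could be chosen along these lines. This is a sketch only: the `DocLabels` shape, the `documentLabel` function and the "Untitled document" fallback text are all hypothetical and not taken from the project.

```typescript
// Hypothetical shape: either field may be missing, null or blank.
interface DocLabels {
  title?: string | null;
  filename?: string | null;
}

// Prefer showing both title and filename; fall back to whichever exists.
function documentLabel(doc: DocLabels, fallback: string = "Untitled document"): string {
  const title = (doc.title || "").trim();
  const filename = (doc.filename || "").trim();
  if (title && filename) {
    return `${title} (${filename})`;
  }
  return title || filename || fallback;
}

const both = documentLabel({ title: "PDF Test Page", filename: "pdf-test.pdf" });
// both === "PDF Test Page (pdf-test.pdf)"
const scanned = documentLabel({ title: null, filename: "Watson_and_Crick_1953.pdf" });
// scanned === "Watson_and_Crick_1953.pdf"
```

A PDF with an empty or null title then still gets a recognizable label as long as the filename was stored.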
