This repository has been archived by the owner on May 12, 2023. It is now read-only.

Strip HTML out of descriptions. #318

Open
rossjones opened this issue Feb 15, 2017 · 3 comments

Comments

@rossjones
Contributor

We have some descriptions that contain HTML. It's rendered safely, but we should probably remove it.

Related: should we auto-link anything that looks like a link?
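
To make the auto-linking question concrete, here is a minimal sketch in Python. It assumes descriptions are plain text by the time they are rendered, and that only bare `http://` or `https://` URLs count as "looks like a link". The names `autolink` and `URL_PATTERN` are made up for illustration and are not part of this project.

```python
import html
import re

# Assumption: only bare http(s) URLs are treated as links.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def autolink(text: str) -> str:
    """Escape plain text for HTML and wrap bare URLs in anchor tags."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Trailing punctuation is more likely sentence punctuation than URL.
        url = match.group(0).rstrip(".,;:!?)")
        end = match.start() + len(url)
        # Escape the text between the previous link and this one.
        parts.append(html.escape(text[last:match.start()]))
        safe = html.escape(url, quote=True)
        parts.append(f'<a href="{safe}">{safe}</a>')
        last = end
    # Escape whatever follows the final link.
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Everything outside the matched URLs is escaped, so the output stays safe to render even if a description still contains stray markup.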

@rossjones rossjones added this to the Alpha milestone Feb 17, 2017
@gtdata

gtdata commented Feb 21, 2017

@rossjones not sure what this means - will you show me pls?

@rossjones
Contributor Author

@gtdata http://alpha-find-data.herokuapp.com/dataset/leeds-inspired-api is a good example. The dataset was harvested from a system that uses HTML for descriptions, so the markup gets pulled through. In this case it's just a para tag, and we could remove those. Some datasets, though (I can't easily find one at the moment), use HTML for formatting and lists.

The existing system allows Markdown, which is stored as Markdown but rendered as HTML. Not all of the sources we harvest do this, though.

@rossjones rossjones self-assigned this Mar 22, 2017
@rossjones
Contributor Author

I think stripping HTML out of harvested datasets makes sense.

When we import legacy stuff, we can also strip it there.
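
As a rough illustration of what stripping could look like at harvest or import time, here is a minimal sketch using only Python's standard library. It is not the project's implementation. The names `strip_html`, `_TextExtractor`, and `BLOCK_TAGS` are hypothetical, and the choice to keep paragraph and list breaks as newlines (with a `- ` prefix for list items) is an assumption about what we would want to preserve.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags mark a line break in the plain-text output.
BLOCK_TAGS = {"p", "br", "div", "ul", "ol", "li", "h1", "h2", "h3", "h4"}


class _TextExtractor(HTMLParser):
    """Collects text content and turns block-level tags into line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Keep list structure readable as plain text.
            self._chunks.append("\n- ")
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Note: text inside <script> or <style> would also land here.
        self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of spaces and drop empty lines.
        lines = [re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def strip_html(description: str) -> str:
    """Return the description with tags removed and entities decoded."""
    parser = _TextExtractor()
    parser.feed(description)
    parser.close()
    return parser.text()
```

For a description like the Leeds one, `strip_html("<p>Events in Leeds.</p>")` returns the text without the para tag, and descriptions with no markup pass through unchanged.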
