If we do a search for content_type_ext:doc AND content_type:"application/msword" in the Danish Netarchive Search, we get the facet for content_type_norm:
other : 3577875
word : 17606
There seems to be a problem with deriving the normalised content type for Word documents?
Maybe a more general approach would be to search for all records that have other as the normalised content type, and facet on the different content type fields to see if there are more heavy hitters that are not handled?
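A rough sketch of what that broader check could look like as a Solr facet request. The endpoint URL and the helper name are made up for illustration, and the choice of content_type_ext, content_type and content_type_served as facet fields is an assumption about which fields are worth comparing; the request parameters themselves (rows, facet, facet.field, facet.limit, facet.mincount) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the select handler of the actual index.
SOLR_SELECT_URL = "http://localhost:8983/solr/netarchivebuilder/select"


def build_other_facet_url(base_url=SOLR_SELECT_URL, limit=20):
    """Build a URL that facets all 'other' records on the content type fields."""
    params = [
        ("q", "content_type_norm:other"),   # everything that was not normalised
        ("rows", "0"),                      # only the facet counts are of interest
        ("facet", "true"),
        ("facet.limit", str(limit)),        # top N values per field
        ("facet.mincount", "1"),
        # Assumed set of fields to compare; adjust to the schema in use.
        ("facet.field", "content_type_ext"),
        ("facet.field", "content_type"),
        ("facet.field", "content_type_served"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_other_facet_url())
```

Fetching the printed URL should return one facet list per field, so any extension or MIME type with a large count under other would stand out as a candidate for a missing normalisation rule.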
This may be related to #289, where at least part of the problem is that the content type does not fall back on content_type_served when format identification via Tika/DROID fails.
EDIT: Hmm, probably not.