✨ feat: Add author, tag, and category fields to the .nfo for UGC videos #132

lc4t · 2023-05-09T09:15:03Z

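Motivation

The generated .nfo metadata contains neither the author nor the tags, so scraped videos cannot be grouped by author;
Videos from the same uploader (UP) cannot be placed in a common directory automatically, so files with the same name from different uploaders cannot both be downloaded (the second one is either skipped or overwrites the first);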

close #15

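Solution

Note: this PR makes a major change to the metadata. The XML generated by dicttoxml cannot drop the <item> wrapper tags or emit repeated tags with the same name, so generation switches to dict2xml. Verified by scraping with Emby.

Modify the MetaData fields;
Extract more information from the basic video info API, and add an API call that fetches a video's tags;
For UGC videos the tp parameter can use {owner_uid}; for example, yutto 'https://www.bilibili.com/video/BV1vZ4y1M7mQ/' -d 'download' --with-metadata -tp='{owner_uid}/{name}' stores the files under the download/100969474 directory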
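
A minimal sketch of why {owner_uid} in the subpath template avoids the name collision described under Motivation. The helper render_subpath and the sample dictionaries are hypothetical and only illustrate the idea with str.format; this is not yutto's actual template code.

```python
from pathlib import PurePosixPath


def render_subpath(template: str, variables: dict[str, str]) -> str:
    """Fill a '{name}'-style template with the given variables (illustrative helper)."""
    return template.format(**variables)


# Two different uploaders publish a video with the same title.
video_a = {"name": "demo", "owner_uid": "100969474"}
video_b = {"name": "demo", "owner_uid": "42"}

flat_template = "{name}"
owner_template = "{owner_uid}/{name}"

# With the flat template both videos map to the same path ...
collide = render_subpath(flat_template, video_a) == render_subpath(flat_template, video_b)
# ... while the owner-prefixed template keeps them apart.
distinct = render_subpath(owner_template, video_a) != render_subpath(owner_template, video_b)

path_a = str(PurePosixPath("download") / render_subpath(owner_template, video_a))
print(collide, distinct, path_a)
```

Prefixing with the uploader id rather than the uploader name is convenient because the id does not change when an uploader renames their account.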

Type

SigureMo

Thanks for the contribution! There are a few details I'd like you to take a look at, though.

Also, please take note of the lint errors in CI.

SigureMo · 2023-05-09T12:54:23Z

yutto/api/ugc_video.py

+
+    actors: list[Actor] = []
+    if res_json_data.get("staff") and isinstance(res_json_data["staff"], list):
+        _index: int = 0
+        for staff in res_json_data["staff"]:
+            actors.append(
+                Actor(
+                    name=staff["name"],
+                    role=staff["title"],
+                    thumb=staff["face"],
+                    profile=f"https://space.bilibili.com/{staff['mid']}",
+                    order=_index,
+                )
+            )
+            _index += 1
+    elif res_json_data.get("owner") and isinstance(res_json_data["owner"], dict):
+        actors.append(
+            Actor(
+                name=res_json_data["owner"]["name"],
+                role="UP主",
+                thumb=res_json_data["owner"]["face"],
+                profile=f"https://space.bilibili.com/{res_json_data['owner']['mid']}",
+                order=0,
+            )
+        )
+    else:
+        Logger.warning(f"视频 {avid} 未找到演职人员信息")
+
+    genres: list[str] = []
+    if res_json_data.get("tname") and isinstance(res_json_data["tname"], str):
+        genres.append(res_json_data["tname"])
+
+    tags: list[str] = await get_ugc_video_tag(session, avid)
+


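Could this logic be extracted into a function?

It is a bit odd here. I hadn't looked closely at the granularity of your functions, so I put everything that builds the metadata in one place. If it looks too long, I can add a ugc_info function. Give me a moment.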
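
One possible shape for the extraction suggested above, as a sketch only. The function name extract_actors is hypothetical, and the Actor dataclass is a stand-in that mirrors the fields used in the diff; the real Actor type in the PR may be defined differently. The caller would log the warning when the returned list is empty.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Actor:
    # Stand-in mirroring the fields used in the diff; not the PR's actual type.
    name: str
    role: str
    thumb: str
    profile: str
    order: int


def extract_actors(data: dict[str, Any]) -> list[Actor]:
    """Build the actor list from 'staff' if present, otherwise from 'owner'."""
    staff = data.get("staff")
    if staff and isinstance(staff, list):
        # enumerate replaces the manual _index counter from the diff.
        return [
            Actor(
                name=member["name"],
                role=member["title"],
                thumb=member["face"],
                profile=f"https://space.bilibili.com/{member['mid']}",
                order=index,
            )
            for index, member in enumerate(staff)
        ]
    owner = data.get("owner")
    if owner and isinstance(owner, dict):
        return [
            Actor(
                name=owner["name"],
                role="UP主",
                thumb=owner["face"],
                profile=f"https://space.bilibili.com/{owner['mid']}",
                order=0,
            )
        ]
    # No staff and no owner: return an empty list and let the caller warn.
    return []
```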

SigureMo · 2023-05-09T12:54:53Z

yutto/api/ugc_video.py

@@ -45,6 +45,9 @@ class _UgcVideoInfo(TypedDict):
    pubdate: int
    description: str
    pages: list[_UgcVideoPageInfo]
+    genre: list[str]


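What is genre short for?

I saw the description in the commit message: genre means the category (分区), right? What an odd abbreviation 😂

To be honest, I find it odd too. When Emby scrapes, genre is interpreted as "style" (流派)... I handle UGC content as movies myself, and genre is indeed recognized there;
so I used genre for the category name;
If there is a better option it can be changed. I have not yet tested this nfo in Emby or Infuse, so I don't know whether genre will work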

SigureMo · 2023-05-09T12:57:13Z

yutto/utils/metadata.py

 class MetaData(TypedDict):
    title: str
    show_title: str
    plot: str
    thumb: str
    premiered: str
    dateadded: str
+    actor: list[Actor]
+    genre: list[str]
+    tag: list[str]


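metadata has new fields. Could Bangumi and Cheese be aligned with this? They can be left empty for now with a TODO.

I'm concerned that the MetaData formats for Bangumi and Cheese differ from UGC, so I'd rather not add them yet. Does leaving them out have any effect? If leaving them out makes bangumi raise errors, then I'll just add them... Shouldn't MetaData really be split into UGCMetaData and BangumiMetaData?

Does leaving them out have any effect

Leaving them out would affect the type hints; the linter (pyright) probably won't pass

Shouldn't MetaData really be split into UGCMetaData and BangumiMetaData?

I haven't used Metadata in depth, so I'm not clear on the details, but yes, it could be done that way

Lint didn't catch it locally... OK, I just took a look and I plan to add these: actor, genre, tag, website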
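
A sketch of the split floated in this thread, using TypedDict inheritance so that only UGC metadata is required to carry the new fields. The class names BaseMetaData, UgcMetaData and BangumiMetaData are hypothetical, and this is not what the PR ended up doing; it only illustrates how a type checker such as pyright could tell the variants apart.

```python
from typing import TypedDict


class Actor(TypedDict):
    name: str
    role: str
    thumb: str
    profile: str
    order: int


class BaseMetaData(TypedDict):
    # Fields shared by every kind of video (taken from the diff above).
    title: str
    show_title: str
    plot: str
    thumb: str
    premiered: str
    dateadded: str


class UgcMetaData(BaseMetaData):
    # Fields that only UGC videos can populate.
    actor: list[Actor]
    genre: list[str]
    tag: list[str]
    website: str


class BangumiMetaData(BaseMetaData):
    # TODO: add bangumi-specific fields once their format is settled.
    pass
```

With a single shared MetaData type, every producer has to fill all keys (possibly with empty lists), which is the simpler route the thread settles on.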

lc4t · 2023-05-09T13:08:00Z

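Hmm, hold on, there are a few more things I'd like to change

Add a website field to the nfo pointing to the video's page
Add owner_uid to the tp parameter: different uploaders have published videos with the same title, so they need separate folders, otherwise the second one cannot be downloaded (with overwrite it replaces the first; without overwrite it is skipped)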

SigureMo · 2023-05-09T15:20:00Z

yutto/extractor/common.py

@@ -147,6 +152,7 @@ async def extract_ugc_video_data(
            "series_title": UNKNOWN,
            "pubdate": UNKNOWN,
            "download_date": ugc_video_info["metadata"]["dateadded"],
+            "owner_uid": owner_uid,


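Please add this to cheese and bangumi as well; UNKNOWN is fine. Also, the "存放子路径模板" (subpath template) section of the docs (README.md) needs this field added, with a note in the table on when the field is available.

A bit later, possibly in a few days. I'll test the nfo in Emby first and adjust everything together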

SigureMo · 2023-05-09T15:22:20Z

yutto/extractor/common.py

@@ -139,6 +139,11 @@ async def extract_ugc_video_data(
        subtitles = await get_ugc_video_subtitles(session, avid, cid) if args.require_subtitle else []
        danmaku = await get_danmaku(session, cid, args.danmaku_format) if args.require_danmaku else EmptyDanmakuData
        metadata = ugc_video_info["metadata"] if args.require_metadata else None
+        owner_uid: str | None = (


Can this be None? If it can, how does it pass the type check for PathTemplateVariableDict? Strange...

Hmm, I'll change it to str. UGC content should in theory always have a uid, but non-UGC content may not
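
A sketch of how the value could be narrowed to a plain str so that it fits a str-typed template variable. The helper resolve_owner_uid, the shape of the info dictionary, and the value of UNKNOWN are all assumptions for illustration; the original expression in the diff is truncated and is not reconstructed here.

```python
from typing import Any

# Stand-in value; the diff references an UNKNOWN constant whose value is not shown.
UNKNOWN = "unknown"


def resolve_owner_uid(info: dict[str, Any]) -> str:
    """Return the uploader id as a string, or UNKNOWN when it is missing."""
    owner = info.get("owner")
    if isinstance(owner, dict) and owner.get("mid") is not None:
        return str(owner["mid"])
    return UNKNOWN
```

Returning a sentinel string instead of None keeps the template dictionary's value type uniform, so non-UGC extractors can pass UNKNOWN for the same key.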

lc4t · 2023-05-09T16:23:18Z

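When the "Mixed / Home content" library type is selected in Emby, everything ends up scraped as movie. Overall, the metadata format yutto generates does not quite match Emby's, so it is not recognized. It is not yet clear how other users consume the metadata yutto generates.

The main differences are:

Emby uses movie rather than episodedetails
Emby's premiered is %Y-%m-%d only
Emby's list format is <x>1</x><x>2</x> with no item tags. This is mainly caused by the library; its author has not merged the PR that addresses it: "fix issue #39 ability to not include item tags" quandyfactory/dicttoxml#44
Emby's actor has a fixed set of options
genre means style
Emby allows repeated tags with the same name. Likewise, the library offers no option for this, and the PR has not been merged either: "Make folding for lists optional" quandyfactory/dicttoxml#64

For reference, an nfo after Emby has finished scraping: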

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<movie>
  <plot><![CDATA[Synopsis goes here~~~~]]></plot>
  <outline />
  <lockdata>false</lockdata>
  <dateadded>2023-05-10 00:13:55</dateadded>
  <title>This is the title~~~</title>
  <actor>
    <name>Person 2</name>
    <type>Actor</type>
  </actor>
  <actor>
    <name>Person 3</name>
    <role>UP主</role>
    <type>Actor</type>
  </actor>
  <actor>
    <name>Person 1</name>
    <type>Producer</type>
  </actor>
  <director>Person 4</director>
  <year>2023</year>
  <sorttitle>This is also the title~~~~</sorttitle>
  <premiered>2023-05-08</premiered>
  <releasedate>2023-05-08</releasedate>
  <genre>Genre 1</genre>
  <genre>Genre 2</genre>
  <studio>Studio 1</studio>
  <studio>Studio 2</studio>
  <tag>Tag 1</tag>
  <tag>Tag 2</tag>
  <fileinfo>
    <streamdetails />
  </fileinfo>
  <show_title>This is also the title~~~</show_title>
  <source />
  <original_filename />
  <website>https://www.bilibili.com/video/BV</website>
</movie>
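
The two format points above (repeated sibling tags instead of item wrappers, and a date-only premiered) can be sketched with the standard library alone. This uses xml.etree.ElementTree rather than dict2xml, so it is not the PR's implementation; build_movie_nfo is a hypothetical helper, and formatting the timestamp in UTC is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_movie_nfo(title: str, pubdate: int, genres: list[str], tags: list[str]) -> str:
    """Render a tiny Emby-style <movie> document with repeated sibling tags."""
    root = ET.Element("movie")
    ET.SubElement(root, "title").text = title
    # Date-only value for <premiered>; UTC is an assumption made for this sketch.
    day = datetime.fromtimestamp(pubdate, tz=timezone.utc).strftime("%Y-%m-%d")
    ET.SubElement(root, "premiered").text = day
    # One <genre>/<tag> element per entry, with no <item> wrapper.
    for genre in genres:
        ET.SubElement(root, "genre").text = genre
    for tag in tags:
        ET.SubElement(root, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


nfo = build_movie_nfo("demo", 1683504000, ["g1", "g2"], ["t1", "t2"])
print(nfo)
```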

SigureMo · 2023-05-10T01:34:49Z

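When the "Mixed / Home content" library type is selected in Emby, everything ends up scraped as movie. Overall, the metadata format yutto generates does not quite match Emby's, so it is not recognized. It is not yet clear how other users consume the metadata yutto generates.

I'm not sure about this either. This feature was originally added by @WhileKing in #20. If @WhileKing is fine with it, these fields can be changed.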

SigureMo · 2023-05-10T16:15:28Z

Switch dicttoxml->dict2xml

~~@lc4t What is the reason for switching from dicttoxml to dict2xml? What is the difference between the two libraries?~~

Never mind, I see the reply above was edited 😂

lc4t · 2023-05-10T16:19:30Z

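@SigureMo See the comparison above between the nfo Emby produces and the nfo the old metadata code generated. The main issue: in the .nfo format Emby supports, list elements should not be wrapped in <item>; nodes with the same name are simply placed side by side. There are two PRs against dicttoxml that add this feature, but the author has not merged them, hence the switch to dict2xml

There is an assumption here: I believe users who need metadata all intend to scrape it, so conforming to Emby should be a fairly general solution.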

SigureMo

LGTM

README.md

Update video NFO info: category=genre, tag=tag, author=actor

2a5c018

SigureMo reviewed May 9, 2023


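SigureMo changed the title ~~🍱 feat: Add author, tag, and category fields to the .nfo for UGC videos~~ ✨ feat: Add author, tag, and category fields to the .nfo for UGC videos May 9, 2023

SigureMo changed the title ~~✨ feat: Add author, tag, and category fields to the .nfo for UGC videos~~ ✨ feat: Add author, tag, and category fields to the .nfo for UGC videos May 9, 2023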

Add website to nfo; support {owner_uid} in the tp parameter for UGC videos to avoid clashes between same-named videos

b8ea250

SigureMo reviewed May 9, 2023


lc4t added 2 commits May 10, 2023 22:37

Update docs, adjust owner_uid

9736331

Major metadata change, Emby recognition supported! Switch dicttoxml->dict2xml

87da904

SigureMo approved these changes May 10, 2023


add a whitespace

c74e366

SigureMo merged commit b42ff9f into yutto-dev:main May 10, 2023

lc4t deleted the update_nfo branch May 10, 2023 17:22

lc4t mentioned this pull request May 25, 2023

🐛 Metadata time precision #139

Closed

3 tasks
