✨ feat: Add author, tag, and category fields to the .nfo for UGC videos #132

Merged (5 commits) on May 10, 2023

@lc4t (Contributor) commented May 9, 2023

Motivation

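  1. The generated .nfo metadata contains neither the author nor the tags, so a scraper cannot group videos by author;
  2. Videos from the same uploader (UP主) cannot automatically be placed in one directory, so same-named files from different uploaders cannot both be downloaded (the second one is either skipped or overwrites the first);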

Closes #15

Solution

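Note: this PR makes major changes to the metadata. Because the XML generated by dicttoxml can neither omit the <item> wrapper tags nor emit repeated same-name tags, the XML is now generated with dict2xml instead; the output has been verified by scraping it in Emby.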

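  1. Modify the MetaData fields;
  2. Extract more information from the video basic-info API, and add an API call that fetches the video's tags;
  3. For UGC videos, the -tp parameter can use {owner_uid}. For example, yutto 'https://www.bilibili.com/video/BV1vZ4y1M7mQ/' -d 'download' --with-metadata -tp='{owner_uid}/{name}' stores the video under the download/100969474 directory.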
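To illustrate item 3, the subpath template can be thought of as Python `str.format`-style substitution joined under the download directory. This is a minimal sketch under that assumption, not yutto's actual template resolver; `resolve_subpath` and the second uid `12345` are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_subpath(download_dir: str, template: str, variables: dict[str, str]) -> str:
    """Fill a {field}-style subpath template and join it under download_dir.

    Hypothetical helper: it only shows how a template such as
    '{owner_uid}/{name}' separates same-named videos by uploader.
    """
    subpath = template.format(**variables)
    return str(PurePosixPath(download_dir) / subpath)


# Two uploaders publishing videos with the same title no longer collide.
a = resolve_subpath("download", "{owner_uid}/{name}", {"owner_uid": "100969474", "name": "Same Title"})
b = resolve_subpath("download", "{owner_uid}/{name}", {"owner_uid": "12345", "name": "Same Title"})
print(a)  # download/100969474/Same Title
print(b)  # download/12345/Same Title
```

Without `{owner_uid}` in the template, both calls would resolve to the same path, which is exactly the skip-or-overwrite conflict described in the motivation.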

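Type

  • ✨ feat: add a new feature
  • 🐛 fix: fix a bug
  • 📝 docs: documentation changes
  • ♻️ refactor: code refactoring (changes that neither add a feature nor fix a bug)
  • ⚡ perf: performance improvements
  • 🧑‍💻 dx: developer experience improvements
  • 🔨 workflow: workflow changes
  • 🏷️ types: type declaration changes
  • 🚧 wip: work in progress
  • ✅ test: add or modify test cases
  • 🔨 build: changes that affect the build system or external dependencies
  • 👷 ci: changes to CI configuration files and scripts
  • ❓ chore: other changes that touch neither source code nor tests
  • ⬆️ deps: dependency changes
  • 🔖 release: publish a new version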

@SigureMo (Member) left a review:

Thanks for the contribution! There are a few details I'd like you to look at, though.

Also, please note the lint errors in CI.

Comment on lines 98 to 131

```python
actors: list[Actor] = []
if res_json_data.get("staff") and isinstance(res_json_data["staff"], list):
    # Co-created video: every staff member becomes an actor, numbered in API order
    _index: int = 0
    for staff in res_json_data["staff"]:
        actors.append(
            Actor(
                name=staff["name"],
                role=staff["title"],
                thumb=staff["face"],
                profile=f"https://space.bilibili.com/{staff['mid']}",
                order=_index,
            )
        )
        _index += 1
elif res_json_data.get("owner") and isinstance(res_json_data["owner"], dict):
    # Single-uploader video: the owner is the only actor
    actors.append(
        Actor(
            name=res_json_data["owner"]["name"],
            role="UP主",
            thumb=res_json_data["owner"]["face"],
            profile=f"https://space.bilibili.com/{res_json_data['owner']['mid']}",
            order=0,
        )
    )
else:
    # Log message: "no cast/crew information found for video {avid}"
    Logger.warning(f"视频 {avid} 未找到演职人员信息")

# The category name (tname) is used as the genre
genres: list[str] = []
if res_json_data.get("tname") and isinstance(res_json_data["tname"], str):
    genres.append(res_json_data["tname"])

tags: list[str] = await get_ugc_video_tag(session, avid)
```

Member:

Could this logic be extracted into a function?

Contributor (author):

That part is a bit awkward: I hadn't looked closely at how finely you split functions yet, so I kept everything that builds the metadata together. If it looks too long, I can add a ugc_info function; one moment.
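Following the reviewer's suggestion, the actor and genre logic from the diff could be pulled into small pure functions that are easy to test without a network session. A minimal sketch: `extract_actors` and `extract_genres` are hypothetical names, and the `Actor` fields are inferred from the constructor calls in the diff rather than copied from yutto's source. It also replaces the manual `_index` counter with `enumerate`.

```python
from typing import Any, TypedDict


class Actor(TypedDict):
    # Field set inferred from the Actor(...) calls in the reviewed diff.
    name: str
    role: str
    thumb: str
    profile: str
    order: int


def extract_actors(res_json_data: dict[str, Any]) -> list[Actor]:
    """Build the actor list from a video-info API response (hypothetical helper)."""
    staff = res_json_data.get("staff")
    if staff and isinstance(staff, list):
        # Co-created video: every staff member becomes an actor, in API order.
        return [
            Actor(
                name=member["name"],
                role=member["title"],
                thumb=member["face"],
                profile=f"https://space.bilibili.com/{member['mid']}",
                order=index,
            )
            for index, member in enumerate(staff)
        ]
    owner = res_json_data.get("owner")
    if owner and isinstance(owner, dict):
        # Single-uploader video: the owner is the only actor.
        return [
            Actor(
                name=owner["name"],
                role="UP主",
                thumb=owner["face"],
                profile=f"https://space.bilibili.com/{owner['mid']}",
                order=0,
            )
        ]
    # No cast info: the caller decides whether to log a warning, as the diff does.
    return []


def extract_genres(res_json_data: dict[str, Any]) -> list[str]:
    """Use the category name (tname) as the single genre, if present."""
    tname = res_json_data.get("tname")
    return [tname] if tname and isinstance(tname, str) else []
```

Returning an empty list instead of logging inside the helper keeps it side-effect free; the calling code can still emit the warning when the result is empty.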

@@ -45,6 +45,9 @@ class _UgcVideoInfo(TypedDict):

```python
    pubdate: int
    description: str
    pages: list[_UgcVideoPageInfo]
    genre: list[str]
```
Member:

What is genre short for?

Member:

I saw the description in the commit message: so genre means the category? What an odd abbreviation 😂

Contributor (author):

To be honest, I find it odd too. When Emby scrapes, genre is interpreted as a genre in the stylistic sense... and since I treat UGC content as movies, genre is indeed recognized.
So I mapped the category name to genre.
If there is a better choice, I'm fine with changing it. I haven't tested this nfo in Emby or Infuse yet, so I don't know whether genre works there.

```python
class MetaData(TypedDict):
    title: str
    show_title: str
    plot: str
    thumb: str
    premiered: str
    dateadded: str
    actor: list[Actor]
    genre: list[str]
    tag: list[str]
```
Member:

Since metadata has new fields, could Bangumi and Cheese be aligned as well? They can stay empty for now; just leave a TODO.

Contributor (author):

I'm worried that the Bangumi and Cheese MetaData formats differ from the UGC one, so I'd rather not add the fields there for now. Does leaving them out cause any problems? If it would make Bangumi fail, then I'll add them. Really, shouldn't MetaData be split into UGCMetaData and BangumiMetaData?

Member:

> Does leaving them out cause any problems?

It affects the type hints; the linter (pyright) probably won't pass.

> Really, shouldn't MetaData be split into UGCMetaData and BangumiMetaData?

I haven't used the metadata feature in depth, so I'm not clear on the details, but yes, it could be done that way.

Contributor (author):

Lint didn't catch it locally... OK, I just took a look; I'll add these fields: actor, genre, tag, website.
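Because MetaData is a TypedDict (total by default), pyright reports a missing key wherever a Bangumi or Cheese builder omits one of the new fields. A minimal sketch of the "leave them empty with a TODO" approach: `bangumi_metadata` and its parameters are hypothetical, and `website` is included only because the author plans to add it.

```python
from typing import TypedDict


class Actor(TypedDict):
    name: str
    role: str
    thumb: str
    profile: str
    order: int


class MetaData(TypedDict):
    # Fields from the review excerpt, plus the planned `website`.
    title: str
    show_title: str
    plot: str
    thumb: str
    premiered: str
    dateadded: str
    actor: list[Actor]
    genre: list[str]
    tag: list[str]
    website: str


def bangumi_metadata(title: str, plot: str, thumb: str, premiered: str, dateadded: str) -> MetaData:
    # TODO: fill actor/genre/tag/website for Bangumi once the API fields are mapped.
    return MetaData(
        title=title,
        show_title=title,
        plot=plot,
        thumb=thumb,
        premiered=premiered,
        dateadded=dateadded,
        actor=[],
        genre=[],
        tag=[],
        website="",
    )
```

Splitting the type into separate UGC and Bangumi TypedDicts, as discussed above, would avoid the placeholders at the cost of a union type wherever metadata is consumed.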

@SigureMo changed the title 🍱 feat: Add author, tag, and category fields to the .nfo for UGC videos ✨ feat: Add author, tag, and category fields to the .nfo for UGC videos May 9, 2023
@lc4t (Contributor, author) commented May 9, 2023

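Hmm... one moment, there are a few more things I want to change:

  • Add a website element to the nfo that points to the video page
  • Add owner_uid to the -tp parameter: different uploaders have published videos with the same title, so they need separate folders; otherwise the second one cannot be downloaded (with overwrite enabled it overwrites the first, without it it is skipped)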

@@ -147,6 +152,7 @@ async def extract_ugc_video_data(

```python
    "series_title": UNKNOWN,
    "pubdate": UNKNOWN,
    "download_date": ugc_video_info["metadata"]["dateadded"],
    "owner_uid": owner_uid,
```
Member:

Please add this to cheese and bangumi as well; UNKNOWN is fine there. Also, the "subpath template" (存放子路径模板) section of the docs (README.md) needs this field, and its table should explain when the field is available.

Contributor (author):

A bit later, maybe in a few days. I'll test the nfo in Emby first and then adjust everything together.

@@ -139,6 +139,11 @@ async def extract_ugc_video_data(

```python
subtitles = await get_ugc_video_subtitles(session, avid, cid) if args.require_subtitle else []
danmaku = await get_danmaku(session, cid, args.danmaku_format) if args.require_danmaku else EmptyDanmakuData
metadata = ugc_video_info["metadata"] if args.require_metadata else None
owner_uid: str | None = (
```
Member:

Can this actually be None? If it can, how does it pass the PathTemplateVariableDict type check? Strange…

Contributor (author):

Hmm... I'll change it to str then. UGC content should in theory always have a uid, but non-UGC content doesn't necessarily.
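Typing the value as `str` and substituting a sentinel when no uploader is present keeps every template variable a string, which is what a typed template dict expects. A sketch under assumptions: `owner_uid_of`, `build_variables`, `PathTemplateVariables`, and the value of `UNKNOWN` are hypothetical stand-ins for yutto's `PathTemplateVariableDict` and `UNKNOWN`, whose definitions are not shown in this PR.

```python
from typing import Any, TypedDict

# Stand-in for yutto's UNKNOWN constant; its real value is not shown in this PR.
UNKNOWN = "UNKNOWN"


class PathTemplateVariables(TypedDict):
    # Hypothetical subset of the template variables discussed in the review.
    name: str
    owner_uid: str


def owner_uid_of(video_info: dict[str, Any]) -> str:
    """Return the uploader uid as a string, or UNKNOWN when it is absent."""
    owner = video_info.get("owner")
    if isinstance(owner, dict) and "mid" in owner:
        # The API returns mid as a number; the template needs a string.
        return str(owner["mid"])
    return UNKNOWN


def build_variables(video_info: dict[str, Any], name: str) -> PathTemplateVariables:
    # Both values are str, so no `str | None` leaks into the typed dict.
    return {"name": name, "owner_uid": owner_uid_of(video_info)}
```

Bangumi and Cheese builders can then set `owner_uid` to `UNKNOWN` directly, as the reviewer suggested.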

@lc4t (Contributor, author) commented May 9, 2023

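In Emby, whether the library type is mixed content or home videos, everything ends up scraped as movie. Overall, the metadata format yutto generates is somewhat inconsistent with what Emby expects, so it is not recognized. It is not yet clear how other users consume the metadata yutto generates.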

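The main differences are:

  1. Emby uses movie rather than episodedetails
  2. Emby's premiered is only %Y-%m-%d
  3. Emby's list format is <x>1</x><x>2</x>, without item tags. This is mainly caused by the library: its author has not merged the PR that addresses it, fix issue #39 ability to not include item tags quandyfactory/dicttoxml#44
  4. Emby's actor element has a fixed set of options
  5. genre means style
  6. Emby allows repeated same-name tags; again, the library provides no option for this, and the PR was not merged either: Make folding for lists optional quandyfactory/dicttoxml#64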
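Item 2 amounts to dropping the time component from premiered, while dateadded in the Emby sample keeps a full timestamp. A minimal sketch with the standard library: the function names are hypothetical, and rendering timestamps in UTC+8 is an assumption; the chosen timezone can change the calendar date, as the usage shows.

```python
from datetime import datetime, timedelta, timezone

# Assumption: publication timestamps are rendered in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def emby_premiered(pubdate: int, tz: timezone = CST) -> str:
    """Format a Unix timestamp as Emby's date-only premiered value."""
    return datetime.fromtimestamp(pubdate, tz).strftime("%Y-%m-%d")


def emby_dateadded(moment: datetime) -> str:
    """Format a datetime like the dateadded value in the Emby sample."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(emby_premiered(1683478800))                # 2023-05-08 (01:00 in UTC+8)
print(emby_premiered(1683478800, timezone.utc))  # 2023-05-07 (17:00 in UTC)
print(emby_dateadded(datetime(2023, 5, 10, 0, 13, 55)))  # 2023-05-10 00:13:55
```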

For reference, here is an nfo after Emby finished scraping:

```xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<movie>
  <plot><![CDATA[This is the plot~~~~]]></plot>
  <outline />
  <lockdata>false</lockdata>
  <dateadded>2023-05-10 00:13:55</dateadded>
  <title>This is the title~~~</title>
  <actor>
    <name>Person 2</name>
    <type>Actor</type>
  </actor>
  <actor>
    <name>Person 3</name>
    <role>UP主</role>
    <type>Actor</type>
  </actor>
  <actor>
    <name>Person 1</name>
    <type>Producer</type>
  </actor>
  <director>Person 4</director>
  <year>2023</year>
  <sorttitle>This is also the title~~~~</sorttitle>
  <premiered>2023-05-08</premiered>
  <releasedate>2023-05-08</releasedate>
  <genre>Style 1</genre>
  <genre>Style 2</genre>
  <studio>Studio 1</studio>
  <studio>Studio 2</studio>
  <tag>Tag 1</tag>
  <tag>Tag 2</tag>
  <fileinfo>
    <streamdetails />
  </fileinfo>
  <show_title>This is also the title~~~</show_title>
  <source />
  <original_filename />
  <website>https://www.bilibili.com/video/BV</website>
</movie>
```

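@SigureMo (Member):

> In Emby, whether the library type is mixed content or home videos, everything ends up scraped as movie. Overall, the metadata format yutto generates is somewhat inconsistent with what Emby expects, so it is not recognized. It is not yet clear how other users consume the metadata yutto generates.

I'm not sure either. This feature was originally added by @WhileKing in #20; if @WhileKing agrees, these fields can be changed.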

@SigureMo (Member) commented May 10, 2023

> Switch dicttoxml->dict2xml

@lc4t What is the reason for switching from dicttoxml to dict2xml? How do the two libraries differ?

Never mind, I see the reply above has been edited to explain 😂

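@lc4t (Contributor, author) commented May 10, 2023

@SigureMo See the comparison above between the nfo Emby produces and the nfo generated by the original metadata code. The main problem: in the .nfo format Emby supports, list elements should not be wrapped in <item> tags; same-name nodes are simply placed side by side. Two PRs to dicttoxml added this feature, but its author never merged them, so I switched to dict2xml.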

This rests on an assumption: users who want metadata are almost certainly going to scrape it, so conforming to Emby should be a fairly general solution.
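The difference between the two list layouts is easy to reproduce with the standard library. This sketch uses `xml.etree.ElementTree` rather than dict2xml, so it only reproduces the target shape described above (repeated sibling elements, no `<item>` wrapper), not yutto's actual serializer; `to_nfo` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Any


def to_nfo(root_tag: str, data: dict[str, Any]) -> str:
    """Serialize a dict into Emby-style XML: lists become repeated siblings, not <item> children."""
    root = ET.Element(root_tag)
    for key, value in data.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            # One element per list entry, all named after the dict key.
            child = ET.SubElement(root, key)
            if isinstance(item, dict):
                # Nested mappings such as <actor> get one sub-element per field.
                for sub_key, sub_value in item.items():
                    ET.SubElement(child, sub_key).text = str(sub_value)
            else:
                child.text = str(item)
    return ET.tostring(root, encoding="unicode")


print(to_nfo("movie", {"title": "T", "genre": ["g1", "g2"], "actor": [{"name": "a", "role": "UP主"}]}))
# <movie><title>T</title><genre>g1</genre><genre>g2</genre><actor><name>a</name><role>UP主</role></actor></movie>
```

ElementTree escapes special characters instead of emitting CDATA as the Emby sample does for plot; both forms are valid XML.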

@SigureMo (Member) left a review:

LGTM

README.md: review comment (outdated, resolved)
@SigureMo merged commit b42ff9f into yutto-dev:main May 10, 2023
@lc4t deleted the update_nfo branch May 10, 2023 17:22
@lc4t mentioned this pull request May 25, 2023