
The video reconstruction results have gray borders. #31

Open
EveningLin opened this issue Oct 29, 2024 · 13 comments

Comments

@EveningLin

[image: reconstructed video showing a gray border around the frame]
Currently, reconstructed videos from multiple inputs of different sizes show gray borders of varying sizes. What kind of input produces reconstructed videos without gray borders?

@nightsnack
Contributor

Could you share the code you are calling? Ours is 720x1280; why is yours square?

@EveningLin
Author

We are testing with our own images, which are 512x512. How do we avoid the gray borders for inputs of other sizes?

@nightsnack
Contributor

Uh, why are there images? We haven't released the ti2v model yet.

@nightsnack
Contributor

"How do we solve the gray border problem?" Would simply cropping it off work?

@EveningLin
Author

Haha, that's pretty brute force. I was wondering whether there is a range of scales that all work. I'm using the VAE reconstruction code.

@EveningLin
Author

The one you provide, haha: vae_inference.py

@nightsnack
Contributor

There are a few parameters in there that control the tile size. We cut 320-pixel patches, with some overlap in both the spatial and temporal dimensions, so inputs within a certain range are compatible. Feel free to adjust them; see the sketch below.
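For reference, a minimal sketch of what tuning those tiling knobs might look like. The attribute names (`tile_sample_min_size`, `tile_overlap_factor`) and the `enable_tiling()` call follow the convention of diffusers-style VAEs and are assumptions, not necessarily the actual parameter names in vae_inference.py:

```python
# Hypothetical sketch only: check vae_inference.py for the real parameter names.
import torch

def reconstruct_with_tiling(vae, video, tile_size=320, overlap=0.25):
    """Reconstruct a video tensor [B, C, T, H, W] in [-1, 1] through the VAE."""
    vae.enable_tiling()                    # encode/decode in overlapping patches
    vae.tile_sample_min_size = tile_size   # spatial patch size (320 in this repo)
    vae.tile_overlap_factor = overlap      # fractional overlap between neighboring patches
    with torch.no_grad():
        latents = vae.encode(video).latent_dist.sample()
        recon = vae.decode(latents).sample
    return recon
```

Inputs whose height and width tile cleanly with the chosen patch size and overlap should avoid the gray padding.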

@EveningLin
Author

OK, thanks a lot!!!

@EveningLin
Author

https://github.com/user-attachments/assets/862a2eeb-84a8-4636-a8bb-ec81452db3d0
I even managed to get an all-gray result, haha. It seems the smaller the input, the larger the gray area.

@nightsnack
Contributor

Just post-train on your own data; the VAE isn't that big.

@OliviaWang123456

> https://github.com/user-attachments/assets/862a2eeb-84a8-4636-a8bb-ec81452db3d0
> I even managed to get an all-gray result, haha. It seems the smaller the input, the larger the gray area.

For images, running only the spatial layers works (during forward, just skip any layer whose name contains 'temp'). Pay attention to the spatial tiling size and the overlap. For a 512x512 image, simply disabling tiling is fine. The VAE code we currently provide is designed for tiled compression and decompression of the current video tube; each tube size has its own tiling and overlap sizes, which you can also tune yourself. A sketch follows below.
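A minimal sketch of the simpler route for a single 512x512 image: feed it as a one-frame clip and skip tiling entirely, so no tile seams or gray padding can appear. The `disable_tiling()` name follows the diffusers convention and treating the image as a one-frame video is an assumption; the spatial-only alternative described above would instead filter out layers whose names contain 'temp' during the forward pass:

```python
import torch

def reconstruct_image(vae, image):
    """image: [B, 3, 512, 512] in [-1, 1]; returns the VAE reconstruction."""
    vae.disable_tiling()           # a 512x512 input fits in a single pass, no tiles needed
    frames = image.unsqueeze(2)    # treat the image as a 1-frame video: [B, 3, 1, H, W]
    with torch.no_grad():
        latents = vae.encode(frames).latent_dist.sample()
        recon = vae.decode(latents).sample
    return recon.squeeze(2)        # back to [B, 3, H, W]
```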

@HanLiii

HanLiii commented Nov 16, 2024

I ran into the same problem. Are there any requirements on the choice of w and h? Here is my code:
```python
# assuming the VideoSys package provides these classes, as in its Allegro example
from videosys import AllegroConfig, VideoSysEngine


def run_base():
    # num frames: 65 or 221
    # change num_gpus for multi-gpu inference
    config = AllegroConfig(model_path="rhymes-ai/Allegro",
                           cpu_offload=False,
                           num_gpus=2)
    engine = VideoSysEngine(config)

    positive_prompt = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""

    negative_prompt = """
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
"""

    user_prompt = "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats."
    num_step, cfg_scale, rand_seed = 50, 7.5, 42
    input_prompt = positive_prompt.format(user_prompt.lower().strip())

    height = 320
    width = 640
    video = engine.generate(
        input_prompt,
        negative_prompt=negative_prompt,
        num_frames=88,
        height=height,
        width=width,
        num_inference_steps=num_step,
        guidance_scale=cfg_scale,
        max_sequence_length=512,
        seed=rand_seed,
    ).video[0]

    engine.save_video(video, "./outputs/test.mp4")
```
test_320.mp4

@nightsnack
Contributor

@HanLiii w and h must be set strictly to the values provided in the model parameters, i.e. 720x1280. Any other values or aspect ratios will produce unpredictable results.
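In other words, only the resolution in the snippet above needs to change; the variables below are reused from that snippet, and its own header comment also suggests 65 or 221 frames rather than 88:

```python
height, width = 720, 1280        # the resolution the released checkpoint expects
video = engine.generate(
    input_prompt,
    negative_prompt=negative_prompt,
    num_frames=88,               # the snippet's header comment suggests 65 or 221 instead
    height=height,
    width=width,
    num_inference_steps=num_step,
    guidance_scale=cfg_scale,
    max_sequence_length=512,
    seed=rand_seed,
).video[0]
engine.save_video(video, "./outputs/test.mp4")
```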
