
About _C_neuron.ParametricLIF_hard_reset_fptt_with_grad #14

Open

Jee-King opened this issue Jul 15, 2022 · 13 comments
Labels
question Further information is requested

Comments

@Jee-King

class ParametricLIFMultiStep(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_seq, v, v_threshold, v_reset, alpha, detach_reset, grad_surrogate_function_index, reciprocal_tau, detach_input):
        if v_reset is None:
            raise NotImplementedError

        spike_seq, v_next, grad_s_to_h, grad_v_to_h, grad_h_to_rtau = _C_neuron.ParametricLIF_hard_reset_fptt_with_grad(x_seq, v, v_threshold, v_reset, alpha, detach_reset, grad_surrogate_function_index, reciprocal_tau, detach_input)
        ctx.save_for_backward(grad_s_to_h, grad_v_to_h, grad_h_to_rtau)
        ctx.reciprocal_tau = reciprocal_tau
        ctx.detach_input = detach_input

        return spike_seq, v_next

Hello, I have two questions I hope you can help me with:

  1. In this code, what is the internal data-processing flow of _C_neuron.ParametricLIF_hard_reset_fptt_with_grad, and what do the variables it returns mean?
  2. While debugging I found that the x_seq passed to _C_neuron.ParametricLIF_hard_reset_fptt_with_grad has shape [T,B,C,H,W]. Suppose there are two layers, conv1-LIFnode1-conv2-LIFnode2. My understanding of the normal data flow is that the data of each time step, [1,B,C,H,W], passes through conv1-LIFnode1-conv2-LIFnode2 in turn. But the current implementation seems to push all of the data [T,B,C,H,W] through conv1-LIFnode1 first, and only then through conv2-LIFnode2. Is my understanding correct, and is there any difference between the two approaches?
fangwei123456 added the question (Further information is requested) label Jul 15, 2022
@fangwei123456
Owner

This should be code from an old version. _C_neuron is a C++/CUDA extension, so you have to look at the original C++ functions to see the implementation. ParametricLIF_hard_reset_fptt_with_grad is the PLIF neuron: under hard reset it runs the multi-step forward pass and also computes the gradients of part of the data.
spike_seq is the neuron's output spike sequence.
grad_s_to_h is the derivative of the spike with respect to h; the others are analogous.

For the second question, see this tutorial:
https://spikingjelly.readthedocs.io/zh_CN/0.0.0.0.12/clock_driven/10_propagation_pattern.html
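
To make the comparison concrete, here is a minimal sketch (plain PyTorch with a hand-written LIF, not the SpikingJelly implementation) showing that the two orders are equivalent for a feedforward network: processing one time step at a time through the whole network ("step-by-step") gives the same output as pushing the whole [T, B, C, H, W] sequence through one layer at a time ("layer-by-layer"), because each neuron layer keeps its own membrane state and the conv layers are stateless.

import torch
import torch.nn as nn

class TinyLIF:
    # Minimal LIF with decay input and hard reset to 0, only for this equivalence check.
    def __init__(self, tau=2.0, v_th=1.0):
        self.tau, self.v_th, self.v = tau, v_th, 0.0
    def step(self, x):
        self.v = self.v + (x - self.v) / self.tau
        spike = (self.v >= self.v_th).float()
        self.v = self.v * (1.0 - spike)
        return spike
    def reset(self):
        self.v = 0.0

torch.manual_seed(0)
T, B, C, H, W = 4, 2, 3, 8, 8
x_seq = torch.rand(T, B, C, H, W)
conv1, conv2 = nn.Conv2d(C, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1)
lif1, lif2 = TinyLIF(), TinyLIF()

# Step-by-step: x[t] -> conv1 -> lif1 -> conv2 -> lif2, one time step at a time.
out_step = torch.stack([lif2.step(conv2(lif1.step(conv1(x_seq[t])))) for t in range(T)])

# Layer-by-layer: the whole sequence passes conv1 -> lif1 first, then conv2 -> lif2.
lif1.reset(); lif2.reset()
h_seq = torch.stack([lif1.step(conv1(x_seq[t])) for t in range(T)])
out_layer = torch.stack([lif2.step(conv2(h_seq[t])) for t in range(T)])

print(torch.allclose(out_step, out_layer))  # True: the two propagation patterns match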

@Jee-King
Author

Thank you very much for the explanation. One more thing: does the SEW code only work with the old version? The README rolls the downloaded spikingjelly back to an older version. Since I'm not very familiar with C++/CUDA, it would be great if the new version can be used :)

@fangwei123456
Owner

SEW works fine with the new version, and there is a tutorial:
https://spikingjelly.readthedocs.io/zh_CN/latest/activation_based/train_large_scale_snn.html

@fangwei123456
Owner

That tutorial uses the spiking ResNet, but it is easy to adapt it to the SEW ResNet, and the framework already provides the network definition:
https://github.com/fangwei123456/spikingjelly/blob/master/spikingjelly/activation_based/model/sew_resnet.py
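
For orientation, building and running a SEW ResNet-18 from that module would look roughly like the sketch below. The keyword names (cnf, spiking_neuron) and the neuron kwargs are taken from the linked file as I remember it, so treat them as assumptions and check the actual source.

import torch
from spikingjelly.activation_based import neuron, surrogate, functional
from spikingjelly.activation_based.model import sew_resnet

# cnf selects the element-wise function that connects the residual branches (e.g. 'ADD');
# spiking_neuron and the remaining kwargs configure the neurons inside the blocks.
net = sew_resnet.sew_resnet18(cnf='ADD', spiking_neuron=neuron.IFNode,
                              surrogate_function=surrogate.ATan(), detach_reset=True)
functional.set_step_mode(net, 'm')   # multi-step mode: the input is a [T, B, C, H, W] sequence

x_seq = torch.rand(4, 2, 3, 224, 224)
y_seq = net(x_seq)                   # [T, B, num_classes]
functional.reset_net(net)            # reset all neuron states before the next sample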

@Jee-King
Author

Great, thank you very much for the explanation!

@Jee-King
Author

Jee-King commented Jul 16, 2022

Hello, sorry to bother you again. Is there a way to return both the spikes of the current layer and the corresponding membrane potentials at the same time?

@fangwei123456
Owner

Since some version of the SJ framework, returning both v_seq and spike_seq is supported.
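
For reference, in recent versions the multi-step neurons can keep the membrane potential of every step next to the spike output. A rough sketch follows; the store_v_seq flag and the v_seq attribute are taken from memory, so verify them against your installed version.

import torch
from spikingjelly.activation_based import neuron, functional

# store_v_seq asks the neuron to keep the membrane potential after every time step.
lif = neuron.LIFNode(step_mode='m', store_v_seq=True)

x_seq = torch.rand(4, 2, 3, 32, 32)   # [T, B, C, H, W]
spike_seq = lif(x_seq)                # [T, B, C, H, W] output spikes
v_seq = lif.v_seq                     # [T, B, C, H, W] membrane potential at each step
functional.reset_net(lif)             # clear the state before the next sequence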

@Jee-King
Author

    def neuronal_charge(self, x: torch.Tensor):
        self.v_float_to_tensor(x)
        if self.decay_input:
            if self.v_reset is None or self.v_reset == 0.:
                self.v = self.neuronal_charge_decay_input_reset0(x, self.v, self.tau)
            else:
                self.v = self.neuronal_charge_decay_input(x, self.v, self.v_reset, self.tau)

        else:
            if self.v_reset is None or self.v_reset == 0.:
                self.v = self.neuronal_charge_no_decay_input_reset0(x, self.v, self.tau)
            else:
                self.v = self.neuronal_charge_no_decay_input(x, self.v, self.v_reset, self.tau)

    @staticmethod
    @torch.jit.script
    def neuronal_charge_decay_input_reset0(x: torch.Tensor, v: torch.Tensor, tau: float):
        v = v + (x - v) / tau
        return v

Hello, while using your code in my own project I ran into the error below. My input has shape [5,20,3,288,288] and I am using the LIF model. While debugging I found that multi_step_forward() calls y = self.single_step_forward(x_seq[t]) for every time step. When t=0 there is no error, but when t=1 the error below appears as soon as neuronal_charge() is called. I finally traced the failing line to the fifth line of the snippet above (the call to neuronal_charge_decay_input_reset0). Have you ever run into this problem?

Traceback (most recent call last):
  File "/home/iccd/cv23-spiking/pytracking/ltr/trainers/base_trainer.py", line 70, in train
    self.train_epoch()
  File "/home/iccd/cv23-spiking/pytracking/ltr/trainers/ltr_trainer.py", line 80, in train_epoch
    self.cycle_dataset(loader)
  File "/home/iccd/cv23-spiking/pytracking/ltr/trainers/ltr_trainer.py", line 66, in cycle_dataset
    loss.backward()
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/spikingjelly-0.0.0.0.13-py3.8.egg/spikingjelly/activation_based/surrogate.py", line 638, in backward
    return atan_backward(grad_output, ctx.saved_tensors[0], ctx.alpha)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void func_1(float* t0, float v1, float* t2, float* aten_mul_flat) {
{
  float v = __ldg(t2 + 18 * ((512 * blockIdx.x + threadIdx.x) / 18) + (512 * blockIdx.x + threadIdx.x) % 18);
  float v_1 = __ldg(t0 + 18 * ((512 * blockIdx.x + threadIdx.x) / 18) + (512 * blockIdx.x + threadIdx.x) % 18);
  aten_mul_flat[512 * blockIdx.x + threadIdx.x] = ((1.f / (v + 1.f)) * v1) * v_1;
}
}

@fangwei123456
Owner

In the new SJ framework, if you do not use the cupy backend, the multi-step forward pass is implemented by calling the single-step forward repeatedly, and the single-step implementation is pure Python. Your error, however, is a CUDA compilation problem:

RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

It is most likely caused by a custom CUDA extension you introduced somewhere whose CUDA code has a problem.
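
For context, the pure-python multi-step path mentioned above is essentially just a loop over the single-step forward, roughly like the simplified sketch below (names follow the thread; this is not the framework's exact source).

import torch

def multi_step_forward(node, x_seq: torch.Tensor) -> torch.Tensor:
    # x_seq: [T, B, ...]; each time step goes through the node's single-step forward.
    y_seq = []
    for t in range(x_seq.shape[0]):
        y_seq.append(node.single_step_forward(x_seq[t]))
    return torch.stack(y_seq)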

@Jee-King
Author

But the first time-step iteration runs without problems; the error only appears in the second iteration. I will keep debugging. Thank you for the explanation.

@Jee-King
Author

Jee-King commented Aug 2, 2022

Hi, have you ever tried training and testing SEW-ResNet18 on DVS Gesture?
I trained DVS Gesture with both the architecture used in your paper and with SEW-ResNet18 (the paper's architecture with your rolled-back version, and SEW-ResNet18 with the new spikingjelly). I can reproduce the results reported in your paper, but SEW-ResNet18 performs much worse: acc@1 is only about 85%, even though the training parameters are the same.
What could be the reason? Is the network sensitive to the training parameters?

@fangwei123456
Owner

SEW-ResNet18 is suited to ImageNet; it is far too large for DVS Gesture, which is exactly why the original paper designed a smaller network instead.

@Jee-King
Author

Jee-King commented Aug 2, 2022

Oh I see, thanks for the quick reply.
