Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to binding a custom iterator? #300

Closed
starwing opened this issue Jul 30, 2023 · 20 comments
Closed

How to binding a custom iterator? #300

starwing opened this issue Jul 30, 2023 · 20 comments

Comments

@starwing
Copy link

I'm trying to binding my API to Lua, but I have meet troubles when binding iterator, this is a minimual test for that:

fn test_iter() -> LuaResult<()> {
    use std::collections::BTreeSet;

    let lua = Lua::new();
    #[derive(Default)]
    struct MySet(BTreeSet<i32>);

    impl Deref for MySet {
        type Target = BTreeSet<i32>;

        fn deref(&self) -> &Self::Target {
            &self.0
        }
    }

    impl DerefMut for MySet {
        fn deref_mut(&mut self) -> &mut Self::Target {
            &mut self.0
        }
    }

    struct MyIter<'a>(std::collections::btree_set::Iter<'a, i32>);

    impl<'a> Iterator for MyIter<'a> {
        type Item = i32;

        fn next(&mut self) -> Option<Self::Item> {
            self.0.next().cloned()
        }
    }

    impl<'a> LuaUserData for MyIter<'a> {
        fn add_methods<'lua, M: LuaUserDataMethods<'lua, Self>>(methods: &mut M) {
            methods.add_meta_method_mut(LuaMetaMethod::Call, |_, iter, ()| Ok(iter.next()));
        }
    }

    impl LuaUserData for MySet {
        fn add_methods<'lua, M: LuaUserDataMethods<'lua, Self>>(methods: &mut M) {
            methods.add_method("iter", |lua, set, ()| {
                let iter = MyIter::<'lua>(set.iter());
                Ok(LuaValue::UserData(lua.create_userdata(iter)?)) // <--- error here
            });
        }
    }

    let mut set = MySet::default();
    set.insert(1);
    set.insert(2);
    set.insert(3);
    lua.globals().set("set", set)?;
    lua.load(
        r#"
        print(set)
        for v in set:iter() do
            print(v)
        end
        "#,
    )
    .exec()
}

it errors:

error[E0521]: borrowed data escapes outside of closure
  --> main.rs:57:39
   |
55 |             methods.add_method("iter", |lua, set, ()| {
   |                                              ---
   |                                              |
   |                                              `set` is a reference that is only valid in the closure body
   |                                              has type `&'1 MySet`
56 |                 let iter = MyIter::<'lua>(set.iter());
57 |                 Ok(LuaValue::UserData(lua.create_userdata(iter)?))
   |                                       ^^^^^^^^^^^^^^^^^^^^^^^^^
   |                                       |
   |                                       `set` escapes the closure body here
   |                                       argument requires that `'1` must outlive `'static`

error: lifetime may not live long enough
  --> main.rs:57:39
   |
54 |         fn add_methods<'lua, M: LuaUserDataMethods<'lua, Self>>(methods: &mut M) {
   |                        ---- lifetime `'lua` defined here
...
57 |                 Ok(LuaValue::UserData(lua.create_userdata(iter)?))
   |                                       ^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'lua` must outlive `'static`

For more information about this error, try `rustc --explain E0521`.

So how could I deal with it?

@KKould
Copy link

KKould commented Jan 3, 2024

I have the same question, how to deal with the iterator lifetime

@zerodegress
Copy link

Try this.
If you are single thread program, use Rc<RefCell>,if not,use Arc<Mutex>.
I suggest always let script engines run on single thread to reduce unnecessary hassles.

fn test_iter() -> mlua::Result<()> {
    use std::collections::BTreeSet;

    let lua = Lua::new();
    #[derive(Default, Clone)]
    struct MySet(Rc<RefCell<BTreeSet<i32>>>);

    impl Deref for MySet {
        type Target = RefCell<BTreeSet<i32>>;

        fn deref(&self) -> &Self::Target {
            &self.0
        }
    }

    struct MyIter{
        set: MySet,
        cur_index: usize,
    }

    impl MyIter {
        fn new(set: MySet) -> Self {
            MyIter {
                set,
                cur_index: 0,
            }
        }
    }

    impl Iterator for MyIter {
        type Item = i32;

        fn next(&mut self) -> Option<Self::Item> {
            let res = (*self.set).borrow().iter().nth(self.cur_index).map(|x| *x);
            self.cur_index += 1;
            res
        }
    }

    impl<'a> UserData for MyIter {
        fn add_methods<'lua, M: UserDataMethods<'lua, Self>>(methods: &mut M) {
            methods.add_meta_method_mut(MetaMethod::Call, |_, iter, ()| Ok(iter.next()));
        }
    }

    impl UserData for MySet {
        fn add_methods<'lua, M: UserDataMethods<'lua, Self>>(methods: &mut M) {
            methods.add_method("iter", |lua, set, ()| {
                let iter = MyIter::new(set.clone());
                Ok(Value::UserData(lua.create_userdata(iter)?)) // <--- error here
            });
        }
    }

    let mut set = MySet::default();
    set.borrow_mut().insert(1);
    set.borrow_mut().insert(2);
    set.borrow_mut().insert(3);
    lua.globals().set("set", set)?;
    lua.load(
        r#"
        print(set)
        for v in set:iter() do
            print(v)
        end
        "#,
    )
    .exec()
}

@starwing
Copy link
Author

starwing commented Jan 11, 2024

@zerodegress it makes iterating a BTreeSet O(n^2*log(n)) complexity.

@zerodegress
Copy link

@zerodegress it makes iterating a BTreeSet O(n^2*log(n)) complexity.

Unfortunately,you cannot predict that the set behind the iter is written as they are read,so you have to repeatedly iterate it to make sure sequentially read the iter while the set owned by other.
Better solution is to copy the whole set when iterate it if you can afford the costs of copy.

@khvzak
Copy link
Member

khvzak commented Jan 13, 2024

Sorry for the delayed response. There are few ways how to bind a custom iterator, eg. using scoped userdata, or tricks with self-referential types.

An example of using a self-referential struct that holds iterator to BTreeSet (any other container would work too):

use std::collections::{btree_set, BTreeSet};
use std::rc::Rc;

use mlua::{AnyUserData, Function, Lua, Result, UserDataMethods, UserDataRefMut};
use ouroboros::self_referencing;

#[self_referencing]
struct BTreeSetIter {
    set: Rc<BTreeSet<String>>,
    #[borrows(set)]
    #[covariant]
    iter: btree_set::Iter<'this, String>,
}

fn main() -> Result<()> {
    let lua = Lua::new();

    lua.register_userdata_type::<Rc<BTreeSet<String>>>(|reg| {
        reg.add_method("iter", |_, this, ()| {
            let next = Function::wrap(|lua, mut it: UserDataRefMut<BTreeSetIter>| {
                it.with_iter_mut(|iter| match iter.next() {
                    Some(value) => Ok(Some(lua.create_string(value)?)),
                    None => Ok(None),
                })
            });
            let iter_ud = AnyUserData::wrap(
                BTreeSetIterBuilder {
                    set: this.clone(),
                    iter_builder: |set| set.iter(),
                }
                .build(),
            );
            Ok((next, iter_ud))
        });
    })?;

    let my_set = Rc::new(
        ["foo", "bar", "baz"]
            .iter()
            .map(ToString::to_string)
            .collect::<BTreeSet<_>>(),
    );
    lua.globals().set("my_set", AnyUserData::wrap(my_set))?;

    lua.load(
        r#"
            for value in my_set:iter() do
                print(value)
            end
        "#,
    )
    .exec()
}

It prints

bar
baz
foo

when called and has the same complexity as iterating directly from Rust.

@starwing
Copy link
Author

Thanks for the reply!

Could you possibly provide some examples of how scoped userdata is used?

@khvzak
Copy link
Member

khvzak commented Jan 15, 2024

After running few experiments I can say thay putting iterator to a scope to create nonstatic userdata is not the best idea / not ergonomic. Basically you will need to have an extra function with own scope to obtain a valid iterator that can be used as non-static userdata.

@starwing
Copy link
Author

Okay, thanks for explain! :-)

@sxyazi
Copy link

sxyazi commented Jan 20, 2024

Hey @khvzak, thanks for your answer. I've been watching this issue for a while, and the code above is very close to what I want to achieve.

However, my data is scoped userdata, not a string. Simply changing lua.create_string() to lua.create_any_userdata() doesn't make it work. So my question is, is there any way to create userdata within the register?

Currently, I'm using a clumsy method where I create all the needed userdata at once during the initialization of the scope, then store them using set_named_user_value, and later retrieve them in register_userdata_type. However, this limits its use case - I can't apply it to a larger amount of data - the complete data.

@khvzak
Copy link
Member

khvzak commented Jan 23, 2024

@sxyazi would be possible to provide an example please, that I can use to run/fix?

In your code, is the window method problematic? Am I right that you want to return iterable object from window method?

@sxyazi
Copy link

sxyazi commented Jan 25, 2024

@khvzak Yeah of course. Here it is:

use mlua::{AnyUserData, IntoLua, Lua, MetaMethod, UserDataFields, UserDataMethods, Variadic};

const LUA_CODE: &str = r#"
  for _, file in pairs(folder) do
    print(file.name)
  end
"#;

struct Folder {
  files: Vec<File>,
}

#[derive(Clone)]
struct File {
  name: String,
  // ...other fields
}

fn main() -> anyhow::Result<()> {
  let lua = Lua::new();
  let folder = Folder { files: (0..100).map(|i| File { name: i.to_string() }).collect() };

  lua.register_userdata_type::<Folder>(|reg| {
    reg.add_meta_function(MetaMethod::Pairs, |lua, me: AnyUserData| {
      let iter = lua.create_function(|lua, (me, i): (AnyUserData, usize)| {
        let folder = me.borrow::<Folder>()?;
        let i = i + 1;
        Ok(if i > folder.files.len() {
          Variadic::new()
        } else {
          Variadic::from_iter([
            i.into_lua(lua)?,
            // This doesn't work.
            // lua.create_any_userdata_ref(&folder.files[i - 1]),

            // This `clone()` is not necessary, what I want is to return a reference to the file,
            // but I don't know how to do that. Since the `Folder` and `File` are not native types,
            // I don't want to add `Rc` or `Arc` to them.
            lua.create_any_userdata(folder.files[i - 1].clone())?.into_lua(lua)?,
          ])
        })
      })?;
      Ok((iter, me, 0))
    });
  })?;

  lua.register_userdata_type::<File>(|reg| {
    reg.add_field_method_get("name", |lua, me| lua.create_string(&me.name));
  })?;

  Ok(lua.scope(|scope| {
    lua.globals().set("folder", scope.create_any_userdata_ref(&folder)?)?;

    lua.load(LUA_CODE).exec()?;

    Ok(())
  })?)
}

And this is my current approach: initializing all the necessary file data at once in the scope:

use mlua::{Lua, UserDataFields, Value};

const LUA_CODE: &str = r#"
  for _, file in ipairs(folder.window) do
    print(file.name)
  end
"#;

struct Folder {
  files: Vec<File>,
}

#[derive(Clone)]
struct File {
  name: String,
  // ...other fields
}

fn main() -> anyhow::Result<()> {
  let lua = Lua::new();
  let folder = Folder { files: (0..100).map(|i| File { name: i.to_string() }).collect() };

  lua.register_userdata_type::<Folder>(|reg| {
    reg.add_field_function_get("window", |_, me| me.named_user_value::<Value>("partial_files"));
  })?;

  lua.register_userdata_type::<File>(|reg| {
    reg.add_field_method_get("name", |lua, me| lua.create_string(&me.name));
  })?;

  Ok(lua.scope(|scope| {
    let partial_files: Vec<_> = folder
      .files
      .iter()
      .take(30)
      .map(|file| scope.create_any_userdata_ref(file))
      .filter_map(|result| result.ok())
      .collect();

    let folder = scope.create_any_userdata_ref(&folder)?;
    folder.set_named_user_value("partial_files", partial_files)?;
    lua.globals().set("folder", folder)?;

    lua.load(LUA_CODE).exec()?;

    Ok(())
  })?)
}

However, this limits the use cases - I cannot initialize a large number (let's say 100,000) of files via create_any_userdata_ref(), it's very slow.

But by using an iterator, when the conditions are met, I can break in Lua to avoid the subsequent create_any_userdata_ref()s overhead.

@sxyazi
Copy link

sxyazi commented Feb 7, 2024

I want to add a create_any_userdata_ref to Lua, but after reading some of the source code, and it seems difficult to achieve without causing breaking changes, because register_userdata_type cannot access the scope's lifecycle.

@khvzak Would you accept adding an unsafe create_any_userdata_ref? We can use a different name to avoid confusion, so I only need to ensure the data's safety myself. If this is acceptable, I am willing to create a PR for it :)

@khvzak
Copy link
Member

khvzak commented Feb 7, 2024

The Scope::create_any_userdata_ref is already exist, maybe you meant something else?

@khvzak
Copy link
Member

khvzak commented Feb 7, 2024

so I only need to ensure the data's safety myself.

If you want to go to unsafe territory, you probably don't need a scope too.

Your example can be easily solved with just one unsafe:

struct PointerCell<T>(*const T);

impl<T> Deref for PointerCell<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.0 }
    }
}

fn main() -> anyhow::Result<()> {
    let lua = Lua::new();
    let folder = Folder {
        files: (0..100)
            .map(|i| File {
                name: i.to_string(),
            })
            .collect(),
    };

    lua.register_userdata_type::<PointerCell<Folder>>(|reg| {
        reg.add_meta_function(MetaMethod::Pairs, |lua, me: AnyUserData| {
            let iter = lua.create_function(
                |lua, (folder, i): (UserDataRef<PointerCell<Folder>>, usize)| {
                    let i = i + 1;
                    if i > folder.files.len() {
                        ().into_lua_multi(lua)
                    } else {
                        (i, AnyUserData::wrap(PointerCell(&folder.files[i - 1])))
                            .into_lua_multi(lua)
                    }
                },
            )?;
            Ok((iter, me, 0))
        });
    })?;

    lua.register_userdata_type::<PointerCell<File>>(|reg| {
        reg.add_field_method_get("name", |lua, me| lua.create_string(&me.name));
    })?;

    lua.globals()
        .set("folder", AnyUserData::wrap(PointerCell(&folder)))?;

    lua.load(LUA_CODE).exec()?;

    Ok(())
}

It would work fine, you just need to ensure that reference to a &File will never be used after drop

@sxyazi
Copy link

sxyazi commented Feb 7, 2024

The Scope::create_any_userdata_ref is already exist, maybe you meant something else?

Yeah I mean to add a Lua::create_any_userdata_ref() next to Lua::create_any_userdata() but with an unsafe keyword - maybe we can make better name for it, since I can only access the Lua instead of Scope in the register_userdata_type() block, but I know that all my data comes from the scope, and I have a large scope block in the outer.

@sxyazi
Copy link

sxyazi commented Feb 7, 2024

so I only need to ensure the data's safety myself.

If you want to go to unsafe territory, you probably don't need a scope too.

Your example can be easily solved with just one unsafe:

struct PointerCell<T>(*const T);

impl<T> Deref for PointerCell<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        unsafe { &*self.0 }
    }
}

fn main() -> anyhow::Result<()> {
    let lua = Lua::new();
    let folder = Folder {
        files: (0..100)
            .map(|i| File {
                name: i.to_string(),
            })
            .collect(),
    };

    lua.register_userdata_type::<PointerCell<Folder>>(|reg| {
        reg.add_meta_function(MetaMethod::Pairs, |lua, me: AnyUserData| {
            let iter = lua.create_function(
                |lua, (folder, i): (UserDataRef<PointerCell<Folder>>, usize)| {
                    let i = i + 1;
                    if i > folder.files.len() {
                        ().into_lua_multi(lua)
                    } else {
                        (i, AnyUserData::wrap(PointerCell(&folder.files[i - 1])))
                            .into_lua_multi(lua)
                    }
                },
            )?;
            Ok((iter, me, 0))
        });
    })?;

    lua.register_userdata_type::<PointerCell<File>>(|reg| {
        reg.add_field_method_get("name", |lua, me| lua.create_string(&me.name));
    })?;

    lua.globals()
        .set("folder", AnyUserData::wrap(PointerCell(&folder)))?;

    lua.load(LUA_CODE).exec()?;

    Ok(())
}

It would work fine, you just need to ensure that reference to a &File will never be used after drop

Thanks for your pointer! I never thought it would be so simple!

Out of curiosity, are there any other safer ways to achieve #300 (comment)? If not, I'll go with this approach ;)

Edit to provide more details: the file list is just a part of the data I need to initialize. I have a lot of other information that needs to be initialized as well, so I need to "scope" it.

@khvzak
Copy link
Member

khvzak commented Feb 8, 2024

Out of curiosity, are there any other safer ways to achieve #300 (comment)? If not, I'll go with this approach ;)

I'm thinking about providing access to a current scope from userdata methods/etc. I need to run few experiments to see soundness of this approach.
It seems would solve many problems.

@sxyazi
Copy link

sxyazi commented Feb 8, 2024

Thanks for your response @khvzak!

Yesterday, I made some attempts and drew inspiration from your pointer. I successfully converted Yazi to unsafe mode in sxyazi/yazi@c12a735 and sxyazi/yazi@25c9655, and I'm posting it here hope it will be helpful for your experiments.

First, I created a separate scope function where I called Lua's scope, then stored &'static Scope<'static, 'static> in a global read-only variable SCOPE using unsafe { mem::transmute(scope) }.

Afterwards, I created a File structure which holds a pointer to a foreign type of yazi_core::folder::Folder and implemented Deref. Whenever I need to return a reference to a File in an iterator, I call its make method to invoke the SCOPE.create_any_userdata() method then returns an AnyUserData<'static>, here 'static is used to trick the compiler. (SCOPE.create_any_userdata_ref() doesn't work at this point because it requires a static lifetime)

I made some tests, and it works well. Even if the memory pointed to by File becomes invalid, it doesn't produce undefined behavior; instead, it throws an error indicating that the UserData has been destroyed, which is thanks to seal_userdata.

@khvzak
Copy link
Member

khvzak commented Feb 10, 2024

Unfortunately I don't see any good way to expose Scope into userdata methods or functions. It's simply unsound for some edge cases. Also there was an earlier discussion about this in mlua-rs/rlua#158

You approach is neat and should work well. The 'static lifetime for scoped references is indeed a problem and requires more unsafe to make references 'static.

@Joakker
Copy link

Joakker commented Apr 13, 2024

Is there a way to do this but with hashmaps?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants