ready for production? #23

Open
ychakiris opened this issue Feb 3, 2019 · 16 comments

@ychakiris

This looks very interesting. How close is it to "production ready"?

@kraison kraison self-assigned this Feb 3, 2019
@kraison
Owner

kraison commented Feb 3, 2019

Yes, I have been using it in production environments for quite a few years. Are there specific features that would make it production-ready in your mind?

@ychakiris
Author

ychakiris commented Feb 3, 2019

Mainly reliability and performance. I realize that these terms don't have one-dimensional definitions, so I mean them in a fairly simple-minded way: "reliability" = not losing data, plus availability; "performance" = fast enough to be usable, and one can find the things that were put in there.

I am a research scientist who works at an elementary school. We are trying to optimize the learning environments for children, and we use quite a bit of home-grown tech to do it: for example, cameras in all the classrooms, lots of WhatsApp texting, and some other tools like Google apps. I like to work in Common Lisp, and it would be nice to have a graph-type DB to keep the texts in. We have quite a bit of material in WhatsApp.

@ychakiris
Author

Perhaps you can tell me more about how you have used it in production?

@kraison
Owner

kraison commented Feb 5, 2019

So, VG is ACID compliant and pretty darn fast. Zach and I built the transaction system around an optimistic concurrency control model. Data is stored in memory-mapped files and in a transaction log. There is also a primary/secondary replication scheme built in. You can also replay transaction logs to create a new instance if you wish. Snapshotting is also available. As far as getting data out, you can use the available Lisp methods or Prolog; see example.lisp.
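Roughly, a session looks like the sketch below. Caveat: I am writing this from memory, so the exact forms may differ; treat example.lisp as the authoritative API, and the names here (`def-vertex`, `def-edge`, `make-graph`, `with-transaction`) as assumptions.

```lisp
;; HYPOTHETICAL sketch -- these forms are from-memory assumptions, not
;; the verified VG API; see example.lisp in the repository for real usage.
(defpackage :vg-demo (:use :cl :graph-db))
(in-package :vg-demo)

(def-vertex person (name) :vg-demo)         ; a typed node with one slot
(def-edge knows () :vg-demo)                ; a typed, slot-less edge

(make-graph :vg-demo "/var/tmp/vg-demo/")   ; backed by memory-mapped files

;; Writes go through the optimistic-concurrency transaction layer, so
;; either both vertices and the edge commit, or none of them do.
(with-transaction ()
  (let ((alice (make-person :name "Alice"))
        (bob   (make-person :name "Bob")))
    (make-knows :from alice :to bob)))
```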

I have used VG as an online catalog for millions of products, as the back end for a complex, adaptable VoIP-based IVR, as the data store for several complex big-data analysis systems, and as the engine for two recommender systems.

The main bottleneck in VG is data serialization and deserialization; the system makes heavy use of caching to overcome this. Memory maps make going to and from disk quite fast, but the Lisp data structures must be pickled in order to be written to disk. I have investigated a feature of some older Lisps that once had user-definable memory areas; these memory areas were extensible in much the way CLOS is. Using such a technology could allow for writing Lisp data structures unadulterated to disk, which would eliminate the need for serialization; however, it is a big task and I have had neither the time nor the funding to make it happen. That said, VG has been fast enough for my purposes. Please run some benchmarks if you like; I would be happy to hear about your experiences.
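To make that cost concrete, every write pays for something like the following generic pickling step (the helper names are mine for illustration, not VG's actual serializer; it assumes the babel encoding library from Quicklisp):

```lisp
;; Generic sketch of the pickling step described above, NOT VG's code:
;; a Lisp object must be flattened into an octet buffer before it can
;; be copied into the memory-mapped file, and reads reverse the copy.
(ql:quickload :babel)

(defun pickle-string (s)
  "Encode S as a length-prefixed octet vector (4-byte big-endian length)."
  (let* ((payload (babel:string-to-octets s :encoding :utf-8))
         (len (length payload))
         (buf (make-array (+ 4 len) :element-type '(unsigned-byte 8))))
    (dotimes (i 4)
      (setf (aref buf i) (ldb (byte 8 (* 8 (- 3 i))) len)))
    (replace buf payload :start1 4)
    buf))

(defun unpickle-string (buf)
  "Decode a vector produced by PICKLE-STRING back into a string."
  (let ((len (loop for i below 4
                   sum (ash (aref buf i) (* 8 (- 3 i))))))
    (babel:octets-to-string buf :start 4 :end (+ 4 len) :encoding :utf-8)))
```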

The project is also looking for contributors, as it lacks sufficient docs and has a few warts (see the other issues here on GitHub) that need addressing.

@arademaker

Hi @kraison, do you have any documentation or a 'quick start' guide? I was looking for CL libs for RDF/OWL and recently made some small improvements to Wilbur (https://github.com/arademaker/incf-wilbur). I didn't know that VG was so mature; maybe I could try to play with it and contribute.

@kraison
Owner

kraison commented Feb 5, 2019

Please see the wiki for a very basic tutorial.

@kraison
Owner

kraison commented Feb 5, 2019

It looks like people are actually interested in the project, so I will make an effort to provide more documentation when I am home from traveling next week.

@ychakiris
Author

@kraison Thanks for the information!!

It definitely intrigues me enough to load some data into it and experiment to see whether it fits my use case, and also to see how easy it is to use with no documentation (other than the source code).

@ychakiris
Author

ychakiris commented Feb 5, 2019

> The main bottleneck in VG is data serialization and deserialization; the system makes heavy use of caching to overcome this. Memory maps make going to and from disk quite fast, but the Lisp data structures must be pickled in order to be written to disk. I have investigated a feature of some older Lisps that once had user-definable memory areas; these memory areas were extensible in much the way CLOS is. Using such a technology could allow for writing Lisp data structures unadulterated to disk, which would eliminate the need for serialization; however, it is a big task and I have had neither the time nor the funding to make it happen. That said, VG has been fast enough for my purposes. Please run some benchmarks if you like; I would be happy to hear about your experiences.

For user-defined memory areas you might want to take a look at two Common Lisp projects: cl-mpi and static-vectors. I saw the cl-mpi project in a YouTube video on high-performance computing; the presenter mentioned that for MPI to work properly (via CFFI) one needs large memory regions that don't move, and it seems he uses static-vectors for that purpose.

I'm not sure this is fully relevant, but it might be worth a look. The video is interesting in its own right.
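Concretely, something like this minimal sketch (using static-vectors' documented `with-static-vector` and `static-vector-pointer`) is what I mean by memory the GC won't move:

```lisp
;; Minimal static-vectors sketch: a buffer allocated outside the Lisp
;; heap, so the GC never moves it and a foreign pointer stays valid.
(ql:quickload :static-vectors)

(static-vectors:with-static-vector (buf 4096 :element-type '(unsigned-byte 8))
  (let ((ptr (static-vectors:static-vector-pointer buf)))
    ;; PTR could be handed to foreign code (an MPI call, an mmap'd write)
    ;; while BUF remains usable as an ordinary Lisp vector on this side.
    (setf (aref buf 0) 42)
    (format t "foreign address: ~a, first byte: ~d~%" ptr (aref buf 0))))
```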

@kraison
Owner

kraison commented Feb 6, 2019

@ychakiris do check out the GitHub wiki for the project for some usage examples.

@ychakiris
Author

ychakiris commented Feb 6, 2019

Very good!! I will work through it.

I will be modelling parts of what I will call the "behavioral ecology" of a Montessori (hybrid) elementary school. Let's say that at the lowest level of modeling there are what I will call "actors" and "events." Actors can be both human and non-human (à la Bruno Latour), and events are simply changes in the configuration of actors.

Some examples:

  1. A child is sitting at a table in a classroom, doing some work on a worksheet. The actors would be the child, the table, chairs, worksheet, the other parts of the classroom, etc. Events would occur for each change in this configuration (e.g., doing a problem, or a friend stopping by to talk).
  2. At "circle time" the teacher and students are all sitting around the circumference of a large rug, listening to a lesson that uses Montessori materials. Actors include the children, teacher, rug, etc.
  3. There are four cameras in the classroom continuously recording. This system is also made up of actors and events.
  4. Teachers and staff members have smartphones and are constantly using WhatsApp to record comments and discussion about the classroom ecology. The actors here are all the messages, phones, teachers, staff members, etc.

Looking at actors and events, and classifying them according to the ecology via their behavioral analytics (behavioral history, reinforcing events, etc.), is the most natural way to model this and store it in a database of some sort. The events that represent interactions between actors are the most interesting part.

Since each actor has a behavioral history (all the events that occurred to them), this is clearly immutable data. Once an event occurs, it will never be changed. However, the interpretations of that event can certainly change.

Seems to me there will be a lot of immutable data in this.
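As a storage-agnostic strawman (the struct and function names below are just mine for illustration), the append-only shape of that data might look like this in plain Common Lisp:

```lisp
;; Strawman sketch of the actor/event model above, independent of any
;; particular database. Events are written once and never mutated;
;; interpretations would live in a separate, replaceable layer.
(defstruct actor
  (id (gensym "ACTOR") :read-only t)
  (kind nil :read-only t)       ; :child, :teacher, :camera, :message ...
  (label "" :read-only t))

(defstruct event
  (id (gensym "EVENT") :read-only t)
  (timestamp (get-universal-time) :read-only t)
  (actors '() :read-only t)     ; the configuration of actors involved
  (description "" :read-only t))

(defvar *event-log* '()
  "Append-only history; an actor's behavioral history is a filter over it.")

(defun record-event (actors description)
  (let ((e (make-event :actors actors :description description)))
    (push e *event-log*)        ; the log only grows; entries never change
    e))

(defun behavioral-history (actor)
  "All events in which ACTOR participated, newest first."
  (remove-if-not (lambda (e) (member actor (event-actors e))) *event-log*))
```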

@ychakiris
Author

Some questions:

  1. Let's say we have 50K text messages of varying length, averaging about 1 KB each. How much space would you estimate that will take in your database? (A rough raw-size baseline is sketched below.)
  2. Is there a way of using the FSet library with your graph DB to handle immutable data?
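For scale, the raw payload in question 1 is modest on its own; what I really can't estimate is VG's indexing and per-node overhead on top of it:

```lisp
;; Back-of-envelope for question 1 -- raw message text only, before any
;; of VG's indexing or per-node storage overhead:
(* 50000 1024) ; => 51200000 bytes, i.e. roughly 50 MB
```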

@fiddlerwoaroof

Jumping in: I'd be very interested in this if there were better documentation. I did some experiments with it a while ago, but couldn't figure out a reasonable way to handle things like unique constraints.

@joshcho

joshcho commented Feb 4, 2023

Why is VivaceGraph so fast? I have been comparing it with a SQL-based approach and with Neo4j, and VivaceGraph is much, much faster.

@kraison
Owner

kraison commented Feb 9, 2023

Through a combination of linear hash tables, skip lists for indexing, and MCAS (multi-word compare-and-swap) for updates. I'm in the Donbas right now, so apologies for not having time to explain in more depth.
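As a quick generic sketch of the skip-list half (illustrative only, not VG's actual code): each node carries a tower of forward pointers, so a search starts at the sparsest level and drops down, skipping most of the list for expected O(log n) lookups with no rebalancing.

```lisp
;; Generic skip-list search, for illustration only (not VG's code).
;; Assumes numeric keys and a HEAD node whose forward vector spans all
;; levels; forward[i] points to the next node whose tower height > i.
(defstruct sl-node
  key
  value
  (forward #() :type simple-vector))

(defun sl-search (head key max-level)
  "Return the node with KEY, or NIL if it is absent."
  (let ((node head))
    ;; Walk right while the next key is still smaller, then drop a level.
    (loop for level from (1- max-level) downto 0
          do (loop for next = (aref (sl-node-forward node) level)
                   while (and next (< (sl-node-key next) key))
                   do (setf node next)))
    (let ((hit (aref (sl-node-forward node) 0)))
      (when (and hit (= (sl-node-key hit) key))
        hit))))
```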

@gwangjinkim

@kraison I just now saw VivaceGraph. I am convinced graph databases are in many ways more flexible and useful than relational databases. I love Lisp; I did not know about VivaceGraph before.

I read in the issue comments that you are volunteering in Ukraine. May God bless you and protect you!
