ROADMAP 2024 #338

Open
77 of 95 tasks
writinwaters opened this issue Dec 21, 2023 · 10 comments

@writinwaters
Contributor

writinwaters commented Dec 21, 2023

v0.6.0 planning

Core

  • Unify the memory management of index and data
  • Use memory usage to decide when to flush the memory index
  • Switch for near-real-time and real-time index
  • Performance improvement of 'long text'
  • Supports DiskANN index. [Feature Request]: DiskAnn unify #1953
  • Supports system-level data backup and recovery
  • Cluster failover

Tools

v0.5.0

Core:

Integration

Tools

v0.4.0

Core:

Integration

API

Tools

v0.3.0

Core:

v0.2.0

v0.1.0

Backlog

Core

  • Native support for macOS (M1) and Windows
  • Supports authentication with default roles.
  • Use a KV store as the metadata store.

Integration

  • Supports NFS.
  • Integrates with LangChain
  • Integrates with LlamaIndex
  • Embedding function.

Tools

@yuzhichang
Member

yuzhichang commented Dec 21, 2023

CI improvement: post Infinity logs when CI fails; use Ubuntu 20.04 as the base of the dev image.
Fuzz testing of Infinity.

@cjkbjhb

cjkbjhb commented Dec 22, 2023

Secordary index on structured data type.
--->
Secondary index on structured data types.

There is a spelling error here.

@JinHai-CN
Contributor

JinHai-CN commented Dec 22, 2023

  • Secondary

Fixed, thank you.

@yuzhichang
Member

yuzhichang commented Jan 8, 2024

compatibility testing

image               | tags                               | reference
centos              | 7, 8                               | https://hub.docker.com/_/centos/
ubuntu              | 20.04, 22.04, 24.04                | https://hub.docker.com/_/ubuntu https://releases.ubuntu.com/
debian              | 8, 9, 10, 11, 12                   | https://hub.docker.com/_/debian https://www.debian.org/releases/
opensuse/leap       | 15.0, 15.1, 15.2, 15.3, 15.4, 15.5 | https://hub.docker.com/r/opensuse/leap
openeuler/openeuler | 20.03, 22.03                       | https://hub.docker.com/r/openeuler/openeuler
openanolis/anolisos | 8.6, 23                            | https://hub.docker.com/r/openanolis/anolisos
openkylin/openkylin | 1.0                                | https://hub.docker.com/r/openkylin/openkylin

@Kelvinyu1117

I would like to contribute to this project. Which issue would be a good place to start?

@JinHai-CN
Contributor

@Kelvinyu1117
We do have a couple of issues that might work for contributors new to this project.

  1. Add minmax information to blocks/segments in the current datastore. This information is primarily used for data filtering. (Minmax of column data. #448)
  2. Implement a bloomfilter for the blocks/segments to enhance point queries. ([Feature Request]: Add bloomfilter to segment/block column. #467)
  3. Currently, query results are stored in memory in a columnar format. However, the client expects the results in Apache Arrow format. At the moment, the format conversion is executed on the Python client, which hurts performance, so we plan to convert the results to Apache Arrow format on the server side before sending them to the client.
  4. There are several optimizer rules to implement, such as constant folding and simplification of arithmetic expressions, which are not yet on the roadmap. Feel free to work on them if interested.
  5. We have additional complicated tasks not listed here. For instance, the current executor operates with one thread per CPU. We're considering using coroutines to improve efficiency, but we don't have a solid solution yet. If you have experience in this area, you are very welcome to propose your solution.
  6. We understand you're interested in contributing C++ code. However, if that's not the case, there's also unimplemented Python code, such as test cases and the Python SDK API.
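To make items 1 and 2 more concrete, here is a minimal Python sketch of how per-block minmax metadata and a bloom filter can let a scan skip blocks. It is an illustration only and does not reflect Infinity's actual C++ datastore: the `Block` class, `range_scan`, `point_lookup`, and the bloom filter sizing constants are all hypothetical names and values chosen for this example, and blocks are assumed to be non-empty.

```python
import hashlib

# Hypothetical bloom filter parameters chosen for the example.
BLOOM_BITS = 1024
BLOOM_HASHES = 3


def _bloom_positions(value):
    """Derive BLOOM_HASHES bit positions for a value from one SHA-256 digest."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % BLOOM_BITS
        for i in range(BLOOM_HASHES)
    ]


class Block:
    """Hypothetical non-empty block of one column with skip metadata."""

    def __init__(self, values):
        self.values = list(values)
        # Minmax metadata, computed once when the block is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)
        # Bloom filter stored as one big integer used as a bit set.
        self.bloom = 0
        for v in self.values:
            for pos in _bloom_positions(v):
                self.bloom |= 1 << pos

    def may_overlap(self, low, high):
        """False means no value in the block can fall in [low, high]."""
        return self.max_value >= low and self.min_value <= high

    def may_contain(self, value):
        """False means the value is definitely absent; True means 'maybe'."""
        if not self.min_value <= value <= self.max_value:
            return False
        return all((self.bloom >> pos) & 1 for pos in _bloom_positions(value))


def range_scan(blocks, low, high):
    """Return matching values and the number of blocks actually scanned."""
    hits, scanned = [], 0
    for block in blocks:
        if not block.may_overlap(low, high):
            continue  # pruned by minmax, values never touched
        scanned += 1
        hits.extend(v for v in block.values if low <= v <= high)
    return hits, scanned


def point_lookup(blocks, value):
    """Return whether the value exists and how many blocks were scanned."""
    found, scanned = False, 0
    for block in blocks:
        if not block.may_contain(value):
            continue  # pruned by minmax or bloom filter
        scanned += 1
        if value in block.values:
            found = True
    return found, scanned
```

With three blocks holding `[1, 5, 9]`, `[20, 25, 30]`, and `[100, 200, 300]`, a range scan for `[22, 120]` only needs to read the last two blocks. The bloom filter can report false positives but never false negatives, so a pruned block is always safe to skip.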

@abdullah-alnahas

Your work is exceptional! I would like to propose that, considering the current landscape, incorporating binary quantization and ColBERT-like ranking would be crucial for any vector database.
Apologies for commenting on the road map issue instead of creating a separate feature request.
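For context, binary quantization in its simplest form keeps only the sign of each vector component and compares vectors by Hamming distance. The following Python sketch shows that idea under stated assumptions: the function names are hypothetical, the zero threshold is one common choice among several, and nothing here describes how Infinity does or would implement it.

```python
def binary_quantize(vector):
    """Pack the sign of each component into an integer (bit i set if vector[i] > 0)."""
    code = 0
    for i, x in enumerate(vector):
        if x > 0:
            code |= 1 << i
    return code


def hamming_distance(a, b):
    """Count the bit positions where two codes differ."""
    return bin(a ^ b).count("1")


def search(query, codes, top_k):
    """Return indices of the top_k codes closest to the query in Hamming distance."""
    q = binary_quantize(query)
    ranked = sorted(range(len(codes)), key=lambda i: hamming_distance(q, codes[i]))
    return ranked[:top_k]


vectors = [
    [0.5, -0.2, 0.1, -0.9],
    [-0.5, 0.2, -0.1, 0.9],
    [0.4, -0.1, -0.3, -0.8],
]
codes = [binary_quantize(v) for v in vectors]
nearest = search([0.6, -0.3, 0.2, -0.7], codes, top_k=2)
```

Each float vector shrinks to one bit per dimension, so candidates can be compared with XOR and a bit count. In practice the coarse ranking is often followed by a rescoring step on the full-precision vectors.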

@JinHai-CN
Contributor

Your work is exceptional! I would like to propose that, considering the current landscape, incorporating binary quantization and ColBERT-like ranking would be crucial for any vector database. Apologies for commenting on the road map issue instead of creating a separate feature request.

Nice, we will put this request into the v0.2.0 release.

@niebayes

niebayes commented May 10, 2024

@JinHai-CN Hi, I have experience developing a database using Arrow. Is the task of converting query results to Arrow format still open? I'd like to take it.

@JinHai-CN
Contributor

@niebayes Issue #1198 has been created, and we can discuss the requirements there.


8 participants
@yuzhichang @abdullah-alnahas @niebayes @Kelvinyu1117 @JinHai-CN @cjkbjhb @writinwaters and others