Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent f3fda40, commit c8e86e8
Showing 31 changed files with 5,284 additions and 2,204 deletions.
@@ -0,0 +1,30 @@
ARG PYTHON_VERSION=3.11
ARG BEAM_VERSION=2.49.0

FROM apache/beam_python${PYTHON_VERSION}_sdk:${BEAM_VERSION}

ENV PYTHONPATH src:

WORKDIR /app/src/dense-retrieval

COPY src/dense-retrieval/pyproject.toml pyproject.toml
COPY src/dense-retrieval/poetry.lock poetry.lock
COPY src/dense-retrieval/src src

WORKDIR /app/src/amazon-product-search

COPY src/amazon-product-search/pyproject.toml pyproject.toml
COPY src/amazon-product-search/poetry.lock poetry.lock
COPY src/amazon-product-search/src src

WORKDIR /app/src/indexing

COPY src/indexing/pyproject.toml pyproject.toml
COPY src/indexing/poetry.lock poetry.lock
COPY src/indexing/src src

RUN pip install --upgrade pip && \
pip install -U poetry --no-cache-dir
RUN poetry config virtualenvs.create false && \
poetry install --without dev --no-interaction --no-ansi
RUN python -m unidic download
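
Reviewer note: the two `ARG` defaults in this hunk are substituted into the `FROM` line, so the base image can be changed at build time with `docker build --build-arg`. The sketch below only mimics that substitution in Python to show which image reference results. The helper name `resolve_base_image` and the constants are hypothetical and not part of the commit. It does not check that the resulting tag is actually published.

```python
from string import Template
from typing import Dict, Optional

# Image reference as written on the FROM line of the hunk.
FROM_TEMPLATE = Template("apache/beam_python${PYTHON_VERSION}_sdk:${BEAM_VERSION}")

# Defaults declared by the two ARG instructions in the hunk.
DEFAULT_ARGS = {"PYTHON_VERSION": "3.11", "BEAM_VERSION": "2.49.0"}


def resolve_base_image(overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the image reference after substituting ARG values.

    Hypothetical helper: `overrides` plays the role of --build-arg values.
    """
    args = {**DEFAULT_ARGS, **(overrides or {})}
    return FROM_TEMPLATE.substitute(args)
```

With no overrides this yields `apache/beam_python3.11_sdk:2.49.0`. Passing `{"PYTHON_VERSION": "3.10"}` corresponds to building with `--build-arg PYTHON_VERSION=3.10`.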
@@ -0,0 +1,34 @@
# Indexing - Amazon Product Search

## Installation

```shell
$ pyenv install 3.11.8
$ pyenv local 3.11.8
$ pip install poetry
$ poetry env use python
$ poetry install
```

The following libraries are necessary for Japanese text processing.

```shell
# For macOS
$ brew install mecab mecab-ipadic
$ poetry run python -m unidic download
```

## Index Products

This project involves indexing products into search engines. If you'd like to test it on your own machine, you can start by launching Elasticsearch or Vespa locally. Then, execute the document indexing pipeline against the created index.

```shell
$ docker compose --profile elasticsearch up
$ poetry run inv es.create-index --index-name=products_jp
$ poetry run inv indexing.feed \
--index-name=products_jp \
--locale=jp \
--dest=es \
--dest-host=http://localhost:9200 \
--nrows=10
```
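
Reviewer note: after running the feed command from the README hunk, one way to sanity-check the result is to ask Elasticsearch how many documents the index holds. This is a minimal sketch that uses only the standard library and Elasticsearch's `_count` endpoint. The function names are hypothetical and not part of the commit. The host and index name are taken from the README example. It assumes the local instance accepts unauthenticated HTTP requests.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def count_url(host: str, index_name: str) -> str:
    """Build the URL of the _count endpoint for one index."""
    return f"{host.rstrip('/')}/{quote(index_name, safe='')}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count JSON response body."""
    return int(json.loads(body)["count"])


def fetch_count(host: str, index_name: str, timeout: float = 5.0) -> int:
    """Query a running Elasticsearch instance (requires network access)."""
    with urlopen(count_url(host, index_name), timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count("http://localhost:9200", "products_jp")` against the instance started by `docker compose` returns the number of documents currently searchable in that index. Recently fed documents may not be counted until the index has refreshed.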