
# QPoland Global Quantum Hackathon 2024 | QVTM

We implemented a novel Quantum Variable Topic Model (QVTM) built from variational quantum circuits in an encoder-decoder-like structure, trained with a KL-divergence loss. Given a corpus of $N$ documents $D_1, D_2, \ldots, D_N$ represented by an input tensor $X_{N,V}$ (where $N$ is the number of documents and $V$ is the vocabulary size) holding the Bag-of-Words (BoW) representation of the documents and their corresponding word distributions, the encoder maps this input to a probability distribution $p_{enc}(\theta|D)$ via a parameterized amplitude embedding. The input tensor $X = X_{\mathrm{real}} + iX_{\mathrm{imaginary}}$ consists of a real part $X_{\mathrm{real}}$ and an imaginary part $X_{\mathrm{imaginary}}$, both produced by a classical encoder with 2 layers. This feature engineering is crucial for taking advantage of amplitude embedding, where both the phase and the amplitude of the state are used to represent data.
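As a sketch of what such a complex-valued feature encoder might look like (the layer widths, activation, and the use of PyTorch are our assumptions; the repository does not spell out the architecture):

```python
import torch
import torch.nn as nn

class ComplexBoWEncoder(nn.Module):
    """Hypothetical 2-layer classical encoder producing X = X_real + i*X_imag."""

    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        # Separate 2-layer MLPs for the real and imaginary parts of X.
        self.real = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Tanh(),
                                  nn.Linear(hidden, vocab_size))
        self.imag = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Tanh(),
                                  nn.Linear(hidden, vocab_size))

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        # Combine into complex features so both amplitude and phase carry data.
        x = torch.complex(self.real(bow), self.imag(bow))
        # L2-normalize each document so it is a valid amplitude embedding.
        return x / torch.linalg.vector_norm(x, dim=-1, keepdim=True)
```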

The wavefunction for the encoder is given by $$|\phi_\alpha(X)\rangle = V(\alpha_L^\phi)\cdots V(\alpha_2^\phi)V(\alpha_1^\phi)|X\rangle,$$ where the initial quantum state $|X\rangle$ (representing the input tensor $X$) evolves by passing through the $L$ variational layers $V(\alpha_i^\phi)$. We utilize the following numerically stable Gaussian kernel to minimize the distance between the encoder distribution and the Dirichlet prior $p(\theta)$, allowing the encoder to properly match the prior: $$k(x,y)=\exp\!\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)$$
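One common way to use such a kernel is as an MMD-style penalty between encoder samples and samples drawn from the Dirichlet prior; the sketch below assumes that usage (the bandwidth $\sigma$, sample counts, and concentration parameters are placeholders):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), evaluated pairwise.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(theta_enc, theta_prior, sigma=1.0):
    # Biased MMD^2 estimate between encoder samples and prior samples.
    k_xx = gaussian_kernel(theta_enc, theta_enc, sigma).mean()
    k_yy = gaussian_kernel(theta_prior, theta_prior, sigma).mean()
    k_xy = gaussian_kernel(theta_enc, theta_prior, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Usage: K-dimensional topic weights against a symmetric Dirichlet(1) prior;
# theta_enc would come from the encoder's measured probabilities.
rng = np.random.default_rng(0)
theta_prior = rng.dirichlet(np.ones(8), size=256)
```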

After taking measurement probabilities over all possible quantum states, we utilize a decoder to output the topic distribution $p'(D_t|\theta)$ for a given dataset. As with the encoder, the wavefunction for the decoder is given by $$|\mu_\alpha(W)\rangle = V(\alpha_L^\mu)\cdots V(\alpha_2^\mu)V(\alpha_1^\mu)|W\rangle.$$
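A hypothetical PennyLane sketch of this circuit pattern, amplitude-embedding a state, applying $L$ variational layers, and reading out the full probability vector over all $2^n$ basis states (PennyLane itself and the `StronglyEntanglingLayers` ansatz are our assumptions; the repository does not name its framework or layer template):

```python
import pennylane as qml
from pennylane import numpy as pnp

n_dec = 10                       # log2(V) qubits for an assumed vocabulary of 1024
L = 3                            # assumed number of variational layers
dev = qml.device("default.qubit", wires=n_dec)

@qml.qnode(dev)
def decoder(state, alpha):
    # Prepare |W>: a normalized (complex) input state over 2**n_dec amplitudes.
    qml.AmplitudeEmbedding(state, wires=range(n_dec), normalize=True)
    # Apply V(alpha_L) ... V(alpha_1): entangling variational layers.
    qml.StronglyEntanglingLayers(alpha, wires=range(n_dec))
    # Probabilities over all 2**n_dec basis states, used as the output distribution.
    return qml.probs(wires=range(n_dec))

shape = qml.StronglyEntanglingLayers.shape(n_layers=L, n_wires=n_dec)
alpha = pnp.random.random(shape)                  # trainable layer parameters

state = pnp.random.random(2 ** n_dec, requires_grad=False) \
        + 1j * pnp.random.random(2 ** n_dec, requires_grad=False)
probs = decoder(state, alpha)    # length-2**n_dec vector summing to 1
```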

The parameterized layers utilize the entire Hilbert space, corresponding to $2^n$ quantum states (where $n$ is the number of qubits). By extracting the probabilities over all possible quantum states of the system (instead of simply taking expectation values), we are able to take full advantage of the rich representation offered by the Hilbert space. For normalization and computational purposes, the number of qubits used for the encoder is $n_{enc}=\log_2(K)$ (where $K$ represents the desired number of topics) and the number of qubits used for the decoder is $n_{dec}=\log_2(V)$ (where $V$ represents the total vocabulary size of the dataset).
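For illustration, the qubit counts follow directly from $K$ and $V$ when both are powers of two (the example values here are hypothetical):

```python
import math

K, V = 8, 1024                         # assumed topic count and vocabulary size
n_enc = int(math.log2(K))              # encoder qubits: log2(8) = 3
n_dec = int(math.log2(V))              # decoder qubits: log2(1024) = 10
assert 2 ** n_enc == K and 2 ** n_dec == V, "K and V must be powers of two"
```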
