diff --git a/notebooks/algorithms/link_prediction/Jaccard-Similarity.ipynb b/notebooks/algorithms/link_prediction/Jaccard-Similarity.ipynb index 1e6cd032650..b5f09c0c145 100755 --- a/notebooks/algorithms/link_prediction/Jaccard-Similarity.ipynb +++ b/notebooks/algorithms/link_prediction/Jaccard-Similarity.ipynb @@ -8,19 +8,7 @@ "# Jaccard Similarity\n", "----\n", "\n", - "In this notebook we will explore the Jaccard vertex similarity metrics available in cuGraph. cuGraph supports:\n", - "- Jaccard Similarity (also called the Jaccard Index)\n", - "- Weight Jaccard\n", - "\n", - "Similarity can be between neighboring vertices (default) or second hop neighbors\n", - "\n", - "\n", - "\n", - "| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n", - "| --------------|------------|------------------|-----------------|-----------------------|\n", - "| Brad Rees | 10/14/2019 | created | 0.14 | GV100 32 GB, CUDA 10.2 |\n", - "| Don Acosta | 07/20/2022 | tested/updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n", - "| Ralph Liu | 06/29/2023 | updated | 23.08 nightly | DGX Tesla V100 CUDA 12.0" + "In this notebook we will explore the Jaccard vertex similarity metrics available in cuGraph." ] }, { @@ -28,27 +16,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Introduction - Common Neighbor Similarity \n", + "## Introduction\n", "\n", - "One of the most common types of vertex similarity is to evaluate the neighborhood of vertex pairs and looks at the number of common neighbors. TThat type of similar comes from statistics and is based on set comparison. Both Jaccard and the Overlap Coefficient operate on sets, and in a graph setting, those sets are the list of neighboring vertices.
\n", - "For those that like math: The neighbors of a vertex, _v_, is defined as the set, _U_, of vertices connected by way of an edge to vertex v, or _N(v) = {U} where v ∈ V and ∀ u ∈ U ∃ edge(v,u)∈ E_.\n", - "\n", - "For the rest of this introduction, set __A__ will equate to _A = N(i)_ and set __B__ will quate to _B = N(j)_. That just make the rest of the text more readable." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Jaccard Similarity" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ "The Jaccard similarity between two sets is defined as the ratio of the volume of their intersection divided by the volume of their union. \n", "\n", "The Jaccard Similarity can then be expressed as\n", @@ -63,93 +32,32 @@ "\n", "Returns:\n", "\n", - " df: cudf.DataFrame with three names columns:\n", + " df: cudf.DataFrame with three columns:\n", " df[\"first\"]: The first vertex id of each pair.\n", - " df[\"second\"]: The second vertex i of each pair.\n", + " df[\"second\"]: The second vertex id of each pair.\n", " df[\"jaccard_coeff\"]: The jaccard coefficient computed between the vertex pairs.\n", "
\n", "\n", "__References__ \n", - "- https://research.nvidia.com/publication/2017-11_Parallel-Jaccard-and" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "### Weighted Jaccard" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Weighted Jaccard is similar to the Jaccard Similarity but takes into account vertex weights placed. \n", - "\n", - "given:\n", - "The neighbors of a vertex, v, is defined as the set, U, of vertices connected by way of an edge to vertex v, or N(v) = {U} where v ∈V and ∀ u∈U ∃ edge(v,u)∈E.\n", - "and\n", - "wt(i) is the weight on vertex i\n", - " \n", - "we can now define weight summing function as
\n", - "$WT(U) = \\sum_{v \\in U} {wt(v)}$\n", - "\n", - "$WtJaccard(i, j) = \\frac{WT(N(i) \\cap N(j))}{WT(N(i) \\cup N(j))}$\n", - "\n", - "To compute the weighted Jaccard similarity between each pair of vertices connected by an edge in cuGraph use:
\n", - "\n", - "__df = cugraph.jaccard_w(input_graph, vect_weights_ptr)__\n", + "- https://research.nvidia.com/publication/2017-11_Parallel-Jaccard-and \n", "\n", - " input_graph: A cugraph.Graph object\n", - " vect_weights_ptr: An array of vertex weights\n", - "\n", - "Returns: \n", - "\n", - " df: cudf.DataFrame with three names columns:\n", - " df['first']: The first vertex id of each pair.\n", - " df['second']: The second vertex id of each pair.\n", - " df['jaccard_coeff']: The weighted jaccard coefficient computed between the vertex pairs.\n", - " \n", - "\n", - "__Note:__ For this example we will be using PageRank as the edge weights. Please review the PageRank notebook if you have any questions about running PageRank\n" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Additional Reading\n", + "__Additional Reading__ \n", "- [Wikipedia: Jaccard](https://en.wikipedia.org/wiki/Jaccard_index)\n" ] }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Some notes about vertex IDs...\n", - "* cuGraph will automatically renumber graphs to an internal format consisting of a contiguous series of integers starting from 0, and convert back to the original IDs when returning data to the caller. If the vertex IDs of the data are already a contiguous series of integers starting from 0, the auto-renumbering step can be skipped for faster graph creation times.\n", - " * To skip auto-renumbering, set the `renumber` boolean arg to `False` when calling the appropriate graph creation API (eg. `G.from_cudf_edgelist(gdf_r, source='src', destination='dst', renumber=False)`).\n", - " * For more advanced renumbering support, see the examples in `structure/renumber.ipynb` and `structure/renumber-2.ipynb`\n" - ] - }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Test Data\n", - "We will be using the Zachary Karate club dataset \n", + "We will be using the Zachary Karate club dataset.\n", "*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of\n", "Anthropological Research 33, 452-473 (1977).*\n", "\n", - "\n", + "\n", "\n", - "This is a small graph which allows for easy visual inspection to validate results. " + "This is a small graph which allows for easy visual inspection to validate results." 
] }, { @@ -172,136 +80,10 @@ "# Import needed libraries\n", "import cugraph\n", "import cudf\n", - "from collections import OrderedDict" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "----\n", - "### Define some Print functions\n", - "(the `del` are not needed since going out of scope should free memory, just good practice)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# define a function for printing the top most similar vertices\n", - "def print_most_similar_jaccard(df):\n", - " d_local = df.query('first != second')\n", - " jmax = d_local['jaccard_coeff'].max()\n", - " dm = d_local.query('jaccard_coeff >= @jmax') \n", - " \n", - " #find the best\n", - " for i in range(len(dm)): \n", - " print(\"Vertices \" + str(dm['first'].iloc[i]) + \" and \" + \n", - " str(dm['second'].iloc[i]) + \" are most similar with score: \" \n", - " + str(dm['jaccard_coeff'].iloc[i]))\n", - " del jmax\n", - " del dm" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# define a function for printing jaccard similar vertices based on a threshold\n", - "def print_jaccard_threshold(_d, limit):\n", - " \n", - " filtered = _d.query('jaccard_coeff > @limit')\n", - " \n", - " for i in range(len(filtered)):\n", - " print(\"Vertices \" + str(filtered['first'].iloc[i]) + \" and \" + \n", - " str(filtered['second'].iloc[i]) + \" are similar with score: \" + \n", - " str(filtered['jaccard_coeff'].iloc[i]))" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Read the CSV datafile using cuDF\n", - "data file is actually _tab_ separated, so we need to set the delimiter" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Test file \n", - "from cugraph.datasets import karate\n", - "gdf = karate.get_edgelist(download=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Let's look at the DataFrame. There should be two columns and 156 records\n", - "gdf.shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Look at the first few data records - the output should be two columns: 'src' and 'dst'\n", - "gdf.head()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Graph" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a Graph \n", - "G = karate.get_graph()\n", - "G = G.to_undirected()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# How many vertices are in the graph? Remember that Graph is zero based\n", - "G.number_of_vertices()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "_The test graph has only 34 vertices, so why is the Graph listing 35?_\n", "\n", - "As mentioned above, cuGraph vertex numbering is zero-based, meaning that the first vertex ID starts at zero. The test dataset is 1-based. Because of that, the Graph object adds an extra isolated vertex with an ID of zero. Hence the difference in vertex count. 
\n", - "We could have run _renumbering_ on the data, or updated the value of each element _gdf['src'] = gdf['src'] - 1_ \n", - "for now, we will just state that vertex 0 is not part of the dataset and can be ignored" + "# The cugraph.datasets package contains several common graph datasets useful\n", + "# for testing and demonstrations.\n", + "from cugraph.datasets import karate" ] }, { @@ -309,20 +91,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "--- \n", - "# Jaccard " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#%%time\n", - "# Call cugraph.nvJaccard\n", - "jdf = cugraph.jaccard(G)\n", - "jdf.head(5)" + "### Create the Graph object" ] }, { @@ -331,9 +100,9 @@ "metadata": {}, "outputs": [], "source": [ - "# Which two vertices are the most similar?\n", - "\n", - "print_most_similar_jaccard(jdf)\n" + "# Create a cugraph.Graph object from the karate dataset. Download the karate\n", + "# dataset if not already present on disk.\n", + "G = karate.get_graph(download=True)" ] }, { @@ -341,9 +110,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The Most similar should be 33 and 34.\n", - "Vertex 33 has 12 neighbors, vertex 34 has 17 neighbors. They share 10 neighbors in common:\n", - "$jaccard = 10 / (10 + (12 -10) + (17-10)) = 10 / 19 = 0.526$" + "### Run `jaccard`" ] }, { @@ -352,24 +119,9 @@ "metadata": {}, "outputs": [], "source": [ - "### let's look at all similarities over a threshold\n", - "print_jaccard_threshold(jdf, 0.4)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Since it is a small graph we can print all scores, notice that only vertices that are neighbors are being compared\n", - "#\n", - "# Before printing, let's get rid of the duplicates (x compared to y is the same as y compared to x). We will do that\n", - "# by performing a query. 
Then let's sort the data by score\n", - "\n", - "jdf_s = jdf.query('first < second').sort_values(by='jaccard_coeff', ascending=False)\n", - "\n", - "print_jaccard_threshold(jdf_s, 0.0)" + "# Compute Jaccard coefficients for all pairs of vertices that are part of the\n", + "# two-hop neighborhood for each vertex.\n", + "jaccard_coeffs = cugraph.jaccard(G)" ] }, { @@ -377,8 +129,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", - "# Expanding vertex pairs similarity scoring to 2-hop vertex pair" + "### Analyze the results" ] }, { @@ -387,8 +138,11 @@ "metadata": {}, "outputs": [], "source": [ - "# get all two-hop vertex pairs\n", - "p = G.get_two_hop_neighbors()" + "# Remove redundancies (remove (b, a) if (a, b) is present) and pairs consisting\n", + "# of the same vertices (a, a) from the results, then sort from most similar to\n", + "# least.\n", + "jaccard_coeffs = jaccard_coeffs.query(\"first < second\")\n", + "jaccard_coeffs = jaccard_coeffs.sort_values(\"jaccard_coeff\", ascending=False)" ] }, { @@ -397,26 +151,8 @@ "metadata": {}, "outputs": [], "source": [ - "# Let's look at the Jaccard score\n", - "j2 = cugraph.jaccard(G, vertex_pair=p)\n", - "j2.query('first == 14')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print_most_similar_jaccard(j2)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---" + "# Show the top-20 most similar vertices.\n", + "jaccard_coeffs.head(20)" ] }, { @@ -424,7 +160,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Weighted Jaccard" + "We can see that several pairs have a coefficient of 1.0, meaning they have\n", + "the same set of neighbors. This can be easily verified in the plot above." ] }, { @@ -432,7 +169,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For graph weights, we are going to use the PageRank scores. If you are unfamillar with PageRank please see the notebook on PageRank" + "We have to specify vertices in a DataFrame to see their similarity if they\n", + "are not part of the same two-hop neighborhood." ] }, { @@ -441,18 +179,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Call PageRank on the graph to get weights to use:\n", - "pr_df = cugraph.pagerank(G)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# take a peek at the PageRank values\n", - "pr_df.head()" + "cugraph.jaccard(G, cudf.DataFrame([(16, 33)]))" ] }, { @@ -460,27 +187,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Now compute the Weighted Jaccard " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pr_df.rename(columns={'pagerank': 'weight'}, inplace=True)\n", - "# Call weighted Jaccard using the PageRank scores as weights:\n", - "wdf = cugraph.jaccard_w(G, pr_df)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print_most_similar_jaccard(wdf)" + "As expected, the coefficient is 0.0 because vertices 16 and 33 do not share any\n", + "neighbors." ] }, { @@ -491,7 +199,7 @@ "---\n", "### It's that easy with cuGraph\n", "\n", - "Copyright (c) 2019-2023, NVIDIA CORPORATION.\n", + "Copyright (c) 2019-2024, NVIDIA CORPORATION.\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n", "\n", @@ -502,7 +210,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.9.13 ('cugraph_dev')", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -516,7 +224,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.13" }, "vscode": { "interpreter": { diff --git a/notebooks/img/karate_similarity.png b/notebooks/img/karate_similarity.png index d8a67e7733c..94df87b69ac 100644 Binary files a/notebooks/img/karate_similarity.png and b/notebooks/img/karate_similarity.png differ diff --git a/python/cugraph/cugraph/__init__.py b/python/cugraph/cugraph/__init__.py index f635d215696..ba7e23df800 100644 --- a/python/cugraph/cugraph/__init__.py +++ b/python/cugraph/cugraph/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. +# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -80,9 +80,6 @@ overlap_coefficient, sorensen, sorensen_coefficient, - jaccard_w, - overlap_w, - sorensen_w, ) from cugraph.traversal import ( @@ -100,9 +97,6 @@ from cugraph.utilities import utils -from cugraph.experimental import strong_connected_component -from cugraph.experimental import find_bicliques - from cugraph.linear_assignment import hungarian, dense_hungarian from cugraph.layout import force_atlas2 diff --git a/python/cugraph/cugraph/dask/link_prediction/jaccard.py b/python/cugraph/cugraph/dask/link_prediction/jaccard.py index 5362c7a9e1e..3b8edc8daa5 100644 --- a/python/cugraph/cugraph/dask/link_prediction/jaccard.py +++ b/python/cugraph/cugraph/dask/link_prediction/jaccard.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022-2023, NVIDIA CORPORATION. +# Copyright (c) 2022-2024, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -66,38 +66,13 @@ def jaccard(input_graph, vertex_pair=None, use_weight=False): of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection - between vertices based on the relative similarity of their neighbors. If - first is specified but second is not, or vice versa, an exception will be - thrown. - - NOTE: If the vertex_pair parameter is not specified then the behavior - of cugraph.jaccard is different from the behavior of - networkx.jaccard_coefficient. + between vertices based on the relative similarity of their neighbors. cugraph.dask.jaccard, in the absence of a specified vertex pair list, will compute the two_hop_neighbors of the entire graph to construct a vertex pair list and will return the jaccard coefficient for those vertex pairs. This is not advisable as the vertex_pairs can grow exponentially with respect to the - size of the datasets - - networkx.jaccard_coefficient, in the absence of a specified vertex - pair list, will return an upper triangular dense matrix, excluding - the diagonal as well as vertex pairs that are directly connected - by an edge in the graph, of jaccard coefficients. 
Technically, networkx - returns a lazy iterator across this upper triangular matrix where - the actual jaccard coefficient is computed when the iterator is - dereferenced. Computing a dense matrix of results is not feasible - if the number of vertices in the graph is large (100,000 vertices - would result in 4.9 billion values in that iterator). - - If your graph is small enough (or you have enough memory and patience) - you can get the interesting (non-zero) values that are part of the networkx - solution by doing the following: - - But please remember that cugraph will fill the dataframe with the entire - solution you request, so you'll need enough memory to store the 2-hop - neighborhood dataframe. - + size of the datasets. Parameters ---------- diff --git a/python/cugraph/cugraph/experimental/__init__.py b/python/cugraph/cugraph/experimental/__init__.py index 7e8fd666972..b0d1f4f1e90 100644 --- a/python/cugraph/cugraph/experimental/__init__.py +++ b/python/cugraph/cugraph/experimental/__init__.py @@ -11,59 +11,41 @@ # See the License for the specific language governing permissions and # limitations under the License. -from cugraph.utilities.api_tools import experimental_warning_wrapper -from cugraph.utilities.api_tools import deprecated_warning_wrapper -from cugraph.utilities.api_tools import promoted_experimental_warning_wrapper +from pylibcugraph.utilities.api_tools import ( + experimental_warning_wrapper, + promoted_experimental_warning_wrapper, +) + +# Passing in the namespace name of this module to the *_wrapper functions +# allows them to bypass the expensive inspect.stack() lookup. +_ns_name = __name__ from cugraph.structure.property_graph import EXPERIMENTAL__PropertyGraph -PropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__PropertyGraph) +PropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__PropertyGraph, _ns_name) from cugraph.structure.property_graph import EXPERIMENTAL__PropertySelection -PropertySelection = experimental_warning_wrapper(EXPERIMENTAL__PropertySelection) +PropertySelection = experimental_warning_wrapper( + EXPERIMENTAL__PropertySelection, _ns_name +) from cugraph.dask.structure.mg_property_graph import EXPERIMENTAL__MGPropertyGraph -MGPropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__MGPropertyGraph) +MGPropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__MGPropertyGraph, _ns_name) from cugraph.dask.structure.mg_property_graph import EXPERIMENTAL__MGPropertySelection -MGPropertySelection = experimental_warning_wrapper(EXPERIMENTAL__MGPropertySelection) - -# FIXME: Remove experimental.triangle_count next release -from cugraph.community.triangle_count import triangle_count - -triangle_count = promoted_experimental_warning_wrapper(triangle_count) +MGPropertySelection = experimental_warning_wrapper( + EXPERIMENTAL__MGPropertySelection, _ns_name +) from cugraph.experimental.components.scc import EXPERIMENTAL__strong_connected_component strong_connected_component = experimental_warning_wrapper( - EXPERIMENTAL__strong_connected_component -) - -from cugraph.experimental.structure.bicliques import EXPERIMENTAL__find_bicliques - -find_bicliques = deprecated_warning_wrapper( - experimental_warning_wrapper(EXPERIMENTAL__find_bicliques) + EXPERIMENTAL__strong_connected_component, _ns_name ) from cugraph.gnn.data_loading import BulkSampler -BulkSampler = promoted_experimental_warning_wrapper(BulkSampler) - - -from cugraph.link_prediction.jaccard import jaccard, jaccard_coefficient - -jaccard = promoted_experimental_warning_wrapper(jaccard) 
-jaccard_coefficient = promoted_experimental_warning_wrapper(jaccard_coefficient) - -from cugraph.link_prediction.sorensen import sorensen, sorensen_coefficient - -sorensen = promoted_experimental_warning_wrapper(sorensen) -sorensen_coefficient = promoted_experimental_warning_wrapper(sorensen_coefficient) - -from cugraph.link_prediction.overlap import overlap, overlap_coefficient - -overlap = promoted_experimental_warning_wrapper(overlap) -overlap_coefficient = promoted_experimental_warning_wrapper(overlap_coefficient) +BulkSampler = promoted_experimental_warning_wrapper(BulkSampler, _ns_name) diff --git a/python/cugraph/cugraph/experimental/gnn/__init__.py b/python/cugraph/cugraph/experimental/gnn/__init__.py index 9c366a2ee28..558a4c9d1e0 100644 --- a/python/cugraph/cugraph/experimental/gnn/__init__.py +++ b/python/cugraph/cugraph/experimental/gnn/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2023, NVIDIA CORPORATION. +# Copyright (c) 2023-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -12,6 +12,10 @@ # limitations under the License. from cugraph.gnn.data_loading import BulkSampler -from cugraph.utilities.api_tools import promoted_experimental_warning_wrapper +from pylibcugraph.utilities.api_tools import promoted_experimental_warning_wrapper -BulkSampler = promoted_experimental_warning_wrapper(BulkSampler) +# Passing in the namespace name of this module to the *_wrapper functions +# allows them to bypass the expensive inspect.stack() lookup. +_ns_name = __name__ + +BulkSampler = promoted_experimental_warning_wrapper(BulkSampler, _ns_name) diff --git a/python/cugraph/cugraph/link_prediction/__init__.py b/python/cugraph/cugraph/link_prediction/__init__.py index a8517ee7c0f..38c8b9a2d3b 100644 --- a/python/cugraph/cugraph/link_prediction/__init__.py +++ b/python/cugraph/cugraph/link_prediction/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. +# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -11,26 +11,9 @@ # See the License for the specific language governing permissions and # limitations under the License. - -from cugraph.utilities.api_tools import deprecated_warning_wrapper from cugraph.link_prediction.jaccard import jaccard from cugraph.link_prediction.jaccard import jaccard_coefficient - from cugraph.link_prediction.sorensen import sorensen from cugraph.link_prediction.sorensen import sorensen_coefficient - from cugraph.link_prediction.overlap import overlap from cugraph.link_prediction.overlap import overlap_coefficient - -# To be deprecated -from cugraph.link_prediction.wjaccard import jaccard_w - -jaccard_w = deprecated_warning_wrapper(jaccard_w) - -from cugraph.link_prediction.woverlap import overlap_w - -overlap_w = deprecated_warning_wrapper(overlap_w) - -from cugraph.link_prediction.wsorensen import sorensen_w - -sorensen_w = deprecated_warning_wrapper(sorensen_w) diff --git a/python/cugraph/cugraph/link_prediction/jaccard.py b/python/cugraph/cugraph/link_prediction/jaccard.py index 27bfa58e6b0..f114b4a6d03 100644 --- a/python/cugraph/cugraph/link_prediction/jaccard.py +++ b/python/cugraph/cugraph/link_prediction/jaccard.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. 
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -56,7 +56,6 @@ def ensure_valid_dtype(input_graph, vertex_pair): def jaccard( input_graph: Graph, vertex_pair: cudf.DataFrame = None, - do_expensive_check: bool = False, # deprecated use_weight: bool = False, ): """ @@ -66,43 +65,13 @@ def jaccard( of their intersection divided by the volume of their union. In the context of graphs, the neighborhood of a vertex is seen as a set. The Jaccard similarity weight of each edge represents the strength of connection - between vertices based on the relative similarity of their neighbors. If - first is specified but second is not, or vice versa, an exception will be - thrown. - - NOTE: If the vertex_pair parameter is not specified then the behavior - of cugraph.jaccard is different from the behavior of - networkx.jaccard_coefficient. + between vertices based on the relative similarity of their neighbors. cugraph.jaccard, in the absence of a specified vertex pair list, will compute the two_hop_neighbors of the entire graph to construct a vertex pair list and will return the jaccard coefficient for those vertex pairs. This is not advisable as the vertex_pairs can grow exponentially with respect to the - size of the datasets - - networkx.jaccard_coefficient, in the absence of a specified vertex - pair list, will return an upper triangular dense matrix, excluding - the diagonal as well as vertex pairs that are directly connected - by an edge in the graph, of jaccard coefficients. Technically, networkx - returns a lazy iterator across this upper triangular matrix where - the actual jaccard coefficient is computed when the iterator is - dereferenced. Computing a dense matrix of results is not feasible - if the number of vertices in the graph is large (100,000 vertices - would result in 4.9 billion values in that iterator). - - If your graph is small enough (or you have enough memory and patience) - you can get the interesting (non-zero) values that are part of the networkx - solution by doing the following: - - >>> from cugraph.datasets import karate - >>> input_graph = karate.get_graph(download=True, ignore_weights=True) - >>> pairs = input_graph.get_two_hop_neighbors() - >>> df = cugraph.jaccard(input_graph, pairs) - - But please remember that cugraph will fill the dataframe with the entire - solution you request, so you'll need enough memory to store the 2-hop - neighborhood dataframe. - + size of the datasets. Parameters ---------- @@ -121,21 +90,11 @@ def jaccard( current implementation computes the jaccard coefficient for all adjacent vertices in the graph. - do_expensive_check : bool, optional (default=False) - Deprecated. - - This option added a check to ensure integer vertex IDs are sequential - values from 0 to V-1. That check is now redundant because cugraph - unconditionally renumbers and un-renumbers integer vertex IDs for - optimal performance, therefore this option is deprecated and will be - removed in a future version. - use_weight : bool, optional (default=False) Flag to indicate whether to compute weighted jaccard (if use_weight==True) or un-weighted jaccard (if use_weight==False). 'input_graph' must be weighted if 'use_weight=True'. 
- Returns ------- df : cudf.DataFrame @@ -161,13 +120,6 @@ def jaccard( >>> df = jaccard(input_graph) """ - if do_expensive_check: - warnings.warn( - "do_expensive_check is deprecated since vertex IDs are no longer " - "required to be consecutively numbered", - FutureWarning, - ) - if input_graph.is_directed(): raise ValueError("Input must be an undirected Graph.") @@ -220,7 +172,6 @@ def jaccard( def jaccard_coefficient( G: Union[Graph, "networkx.Graph"], ebunch: Union[cudf.DataFrame, Iterable[Union[int, str, float]]] = None, - do_expensive_check: bool = False, # deprecated ): """ For NetworkX Compatability. See `jaccard` @@ -244,14 +195,6 @@ def jaccard_coefficient( pairs. Otherwise, the current implementation computes the overlap coefficient for all adjacent vertices in the graph. - do_expensive_check : bool, optional (default=False) - Deprecated. - This option added a check to ensure integer vertex IDs are sequential - values from 0 to V-1. That check is now redundant because cugraph - unconditionally renumbers and un-renumbers integer vertex IDs for - optimal performance, therefore this option is deprecated and will be - removed in a future version. - Returns ------- df : cudf.DataFrame @@ -277,13 +220,6 @@ def jaccard_coefficient( >>> df = jaccard_coefficient(G) """ - if do_expensive_check: - warnings.warn( - "do_expensive_check is deprecated since vertex IDs are no longer " - "required to be consecutively numbered", - FutureWarning, - ) - vertex_pair = None G, isNx = ensure_cugraph_obj_for_nx(G) diff --git a/python/cugraph/cugraph/link_prediction/wjaccard.py b/python/cugraph/cugraph/link_prediction/wjaccard.py deleted file mode 100644 index ec538bbc0ed..00000000000 --- a/python/cugraph/cugraph/link_prediction/wjaccard.py +++ /dev/null @@ -1,139 +0,0 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from cugraph.link_prediction import jaccard -import cudf -import warnings - -from cugraph.structure import Graph -from cugraph.utilities.utils import import_optional - -# FIXME: the networkx.Graph type used in type annotations is specified -# using a string literal to avoid depending on and importing networkx. -# Instead, networkx is imported optionally, which may cause a problem -# for a type checker if run in an environment where networkx is not installed. -networkx = import_optional("networkx") - - -# FIXME: Move this function to the utility module so that it can be -# shared by other algos -def ensure_valid_dtype(input_graph, vertex_pair): - - vertex_dtype = input_graph.edgelist.edgelist_df.dtypes[0] - vertex_pair_dtypes = vertex_pair.dtypes - - if vertex_pair_dtypes[0] != vertex_dtype or vertex_pair_dtypes[1] != vertex_dtype: - warning_msg = ( - "Jaccard requires 'vertex_pair' to match the graph's 'vertex' type. " - f"input graph's vertex type is: {vertex_dtype} and got " - f"'vertex_pair' of type: {vertex_pair_dtypes}." 
- ) - warnings.warn(warning_msg, UserWarning) - vertex_pair = vertex_pair.astype(vertex_dtype) - - return vertex_pair - - -def jaccard_w( - input_graph: Graph, - weights: cudf.DataFrame = None, # deprecated - vertex_pair: cudf.DataFrame = None, - do_expensive_check: bool = False, # deprecated -): - """ - Compute the weighted Jaccard similarity between each pair of vertices - connected by an edge, or between arbitrary pairs of vertices specified by - the user. Jaccard similarity is defined between two sets as the ratio of - the volume of their intersection divided by the volume of their union. In - the context of graphs, the neighborhood of a vertex is seen as a set. The - Jaccard similarity weight of each edge represents the strength of - connection between vertices based on the relative similarity of their - neighbors. If first is specified but second is not, or vice versa, an - exception will be thrown. - - NOTE: This algorithm doesn't currently support datasets with vertices that - are not (re)numebred vertices from 0 to V-1 where V is the total number of - vertices as this creates isolated vertices. - - Parameters - ---------- - input_graph : cugraph.Graph - cuGraph Graph instance , should contain the connectivity information - as an edge list (edge weights are not used for this algorithm). The - adjacency list will be computed if not already present. - - weights : cudf.DataFrame - Specifies the weights to be used for each vertex. - Vertex should be represented by multiple columns for multi-column - vertices. - - weights['vertex'] : cudf.Series - Contains the vertex identifiers - weights['weight'] : cudf.Series - Contains the weights of vertices - - vertex_pair : cudf.DataFrame, optional (default=None) - A GPU dataframe consisting of two columns representing pairs of - vertices. If provided, the jaccard coefficient is computed for the - given vertex pairs, else, it is computed for all vertex pairs. - - do_expensive_check : bool, optional (default=False) - Deprecated. - This option added a check to ensure integer vertex IDs are sequential - values from 0 to V-1. That check is now redundant because cugraph - unconditionally renumbers and un-renumbers integer vertex IDs for - optimal performance, therefore this option is deprecated and will be - removed in a future version. - - Returns - ------- - df : cudf.DataFrame - GPU data frame of size E (the default) or the size of the given pairs - (first, second) containing the Jaccard weights. The ordering is - relative to the adjacency list, or that given by the specified vertex - pairs. - - df['first'] : cudf.Series - The first vertex ID of each pair. - df['second'] : cudf.Series - The second vertex ID of each pair. - df['jaccard_coeff'] : cudf.Series - The computed weighted Jaccard coefficient between the first and the - second vertex ID. 
- - Examples - -------- - >>> import random - >>> from cugraph.datasets import karate - >>> G = karate.get_graph(download=True) - >>> # Create a dataframe containing the vertices with their - >>> # corresponding weight - >>> weights = cudf.DataFrame() - >>> # Sample 10 random vertices from the graph and drop duplicates if - >>> # there are any to avoid duplicates vertices with different weight - >>> # value in the 'weights' dataframe - >>> weights['vertex'] = G.nodes().sample(n=10).drop_duplicates() - >>> # Reset the indices and drop the index column - >>> weights.reset_index(inplace=True, drop=True) - >>> # Create a weight column with random weights - >>> weights['weight'] = [random.random() for w in range( - ... len(weights['vertex']))] - >>> df = cugraph.jaccard_w(G, weights) - - """ - warning_msg = ( - "jaccard_w is deprecated. To compute weighted jaccard, please use " - "jaccard(input_graph, vertex_pair=False, use_weight=True)" - ) - warnings.warn(warning_msg, FutureWarning) - return jaccard(input_graph, vertex_pair, do_expensive_check, use_weight=True) diff --git a/python/cugraph/cugraph/link_prediction/woverlap.py b/python/cugraph/cugraph/link_prediction/woverlap.py deleted file mode 100644 index 5f43ad0670b..00000000000 --- a/python/cugraph/cugraph/link_prediction/woverlap.py +++ /dev/null @@ -1,122 +0,0 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from cugraph.link_prediction import overlap -import cudf -import warnings - -from cugraph.structure import Graph -from cugraph.utilities.utils import import_optional - -# FIXME: the networkx.Graph type used in type annotations is specified -# using a string literal to avoid depending on and importing networkx. -# Instead, networkx is imported optionally, which may cause a problem -# for a type checker if run in an environment where networkx is not installed. -networkx = import_optional("networkx") - - -def overlap_w( - input_graph: Graph, - weights: cudf.DataFrame = None, # deprecated - vertex_pair: cudf.DataFrame = None, - do_expensive_check: bool = False, # deprecated -): - """ - Compute the weighted Overlap Coefficient between each pair of vertices - connected by an edge, or between arbitrary pairs of vertices specified by - the user. Overlap Coefficient is defined between two sets as the ratio of - the volume of their intersection divided by the smaller of their volumes. - In the context of graphs, the neighborhood of a vertex is seen as a set. - The Overlap Coefficient weight of each edge represents the strength of - connection between vertices based on the relative similarity of their - neighbors. If first is specified but second is not, or vice versa, an - exception will be thrown. - - NOTE: This algorithm doesn't currently support datasets with vertices that - are not (re)numebred vertices from 0 to V-1 where V is the total number of - vertices as this creates isolated vertices. 
- - Parameters - ---------- - input_graph : cugraph.Graph - cuGraph Graph instance, should contain the connectivity information - as an edge list (edge weights are not used for this algorithm). The - adjacency list will be computed if not already present. - - weights : cudf.DataFrame - Specifies the weights to be used for each vertex. - Vertex should be represented by multiple columns for multi-column - vertices. - - weights['vertex'] : cudf.Series - Contains the vertex identifiers - - weights['weight'] : cudf.Series - Contains the weights of vertices - - vertex_pair : cudf.DataFrame, optional (default=None) - A GPU dataframe consisting of two columns representing pairs of - vertices. If provided, the overlap coefficient is computed for the - given vertex pairs, else, it is computed for all vertex pairs. - - do_expensive_check : bool, optional (default=False) - Deprecated. - This option added a check to ensure integer vertex IDs are sequential - values from 0 to V-1. That check is now redundant because cugraph - unconditionally renumbers and un-renumbers integer vertex IDs for - optimal performance, therefore this option is deprecated and will be - removed in a future version. - - Returns - ------- - df : cudf.DataFrame - GPU data frame of size E (the default) or the size of the given pairs - (first, second) containing the overlap coefficients. The ordering is - relative to the adjacency list, or that given by the specified vertex - pairs. - - df['first'] : cudf.Series - The first vertex ID of each pair. - - df['second'] : cudf.Series - The second vertex ID of each pair. - - df['overlap_coeff'] : cudf.Series - The computed weighted Overlap coefficient between the first and the - second vertex ID. - - Examples - -------- - >>> import random - >>> from cugraph.datasets import karate - >>> G = karate.get_graph(download=True) - >>> # Create a dataframe containing the vertices with their - >>> # corresponding weight - >>> weights = cudf.DataFrame() - >>> # Sample 10 random vertices from the graph and drop duplicates if - >>> # there are any to avoid duplicates vertices with different weight - >>> # value in the 'weights' dataframe - >>> weights['vertex'] = G.nodes().sample(n=10).drop_duplicates() - >>> # Reset the indices and drop the index column - >>> weights.reset_index(inplace=True, drop=True) - >>> # Create a weight column with random weights - >>> weights['weight'] = [random.random() for w in range( - ... len(weights['vertex']))] - >>> df = cugraph.overlap_w(G, weights) - """ - warning_msg = ( - " overlap_w is deprecated. To compute weighted overlap, please use " - "overlap(input_graph, vertex_pair=False, use_weight=True)" - ) - warnings.warn(warning_msg, FutureWarning) - return overlap(input_graph, vertex_pair, do_expensive_check, use_weight=True) diff --git a/python/cugraph/cugraph/link_prediction/wsorensen.py b/python/cugraph/cugraph/link_prediction/wsorensen.py deleted file mode 100644 index ff502b36837..00000000000 --- a/python/cugraph/cugraph/link_prediction/wsorensen.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright (c) 2021-2023, NVIDIA CORPORATION. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from cugraph.link_prediction import sorensen -import cudf -import warnings - -from cugraph.structure import Graph -from cugraph.utilities.utils import import_optional - -# FIXME: the networkx.Graph type used in type annotations is specified -# using a string literal to avoid depending on and importing networkx. -# Instead, networkx is imported optionally, which may cause a problem -# for a type checker if run in an environment where networkx is not installed. -networkx = import_optional("networkx") - - -def sorensen_w( - input_graph: Graph, - weights: cudf.DataFrame = None, # deprecated - vertex_pair: cudf.DataFrame = None, - do_expensive_check: bool = False, # deprecated -): - """ - Compute the weighted Sorensen similarity between each pair of vertices - connected by an edge, or between arbitrary pairs of vertices specified by - the user. Sorensen coefficient is defined between two sets as the ratio of - twice the volume of their intersection divided by the volume of each set. - - NOTE: This algorithm doesn't currently support datasets with vertices that - are not (re)numebred vertices from 0 to V-1 where V is the total number of - vertices as this creates isolated vertices. - - Parameters - ---------- - input_graph : cugraph.Graph - cuGraph Graph instance, should contain the connectivity information - as an edge list (edge weights are not used for this algorithm). The - adjacency list will be computed if not already present. - - weights : cudf.DataFrame - Specifies the weights to be used for each vertex. - Vertex should be represented by multiple columns for multi-column - vertices. - - weights['vertex'] : cudf.Series - Contains the vertex identifiers - - weights['weight'] : cudf.Series - Contains the weights of vertices - - vertex_pair : cudf.DataFrame, optional (default=None) - A GPU dataframe consisting of two columns representing pairs of - vertices. If provided, the sorensen coefficient is computed for the - given vertex pairs, else, it is computed for all vertex pairs. - - do_expensive_check : bool, optional (default=False) - Deprecated. - This option added a check to ensure integer vertex IDs are sequential - values from 0 to V-1. That check is now redundant because cugraph - unconditionally renumbers and un-renumbers integer vertex IDs for - optimal performance, therefore this option is deprecated and will be - removed in a future version. - - Returns - ------- - df : cudf.DataFrame - GPU data frame of size E (the default) or the size of the given pairs - (first, second) containing the Sorensen weights. The ordering is - relative to the adjacency list, or that given by the specified vertex - pairs. - - df['first'] : cudf.Series - The first vertex ID of each pair. - - df['second'] : cudf.Series - The second vertex ID of each pair. - - df['sorensen_coeff'] : cudf.Series - The computed weighted Sorensen coefficient between the first and the - second vertex ID. 
- - Examples - -------- - >>> import random - >>> from cugraph.datasets import karate - >>> G = karate.get_graph(download=True) - >>> # Create a dataframe containing the vertices with their - >>> # corresponding weight - >>> weights = cudf.DataFrame() - >>> # Sample 10 random vertices from the graph and drop duplicates if - >>> # there are any to avoid duplicates vertices with different weight - >>> # value in the 'weights' dataframe - >>> weights['vertex'] = G.nodes().sample(n=10).drop_duplicates() - >>> # Reset the indices and drop the index column - >>> weights.reset_index(inplace=True, drop=True) - >>> # Create a weight column with random weights - >>> weights['weight'] = [random.random() for w in range( - ... len(weights['vertex']))] - >>> df = cugraph.sorensen_w(G, weights) - - """ - warning_msg = ( - "sorensen_w is deprecated. To compute weighted sorensen, please use " - "sorensen(input_graph, vertex_pair=False, use_weight=True)" - ) - warnings.warn(warning_msg, FutureWarning) - return sorensen(input_graph, vertex_pair, use_weight=True) diff --git a/python/cugraph/cugraph/tests/community/test_louvain.py b/python/cugraph/cugraph/tests/community/test_louvain.py index 5441998fb46..44b403b2b7c 100644 --- a/python/cugraph/cugraph/tests/community/test_louvain.py +++ b/python/cugraph/cugraph/tests/community/test_louvain.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. +# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -13,7 +13,6 @@ import gc -import time import pytest import networkx as nx @@ -48,11 +47,7 @@ def cugraph_call(graph_file, edgevals=False, directed=False): G = graph_file.get_graph( create_using=cugraph.Graph(directed=directed), ignore_weights=not edgevals ) - # cugraph Louvain Call - t1 = time.time() parts, mod = cugraph.louvain(G) - t2 = time.time() - t1 - print("Cugraph Time : " + str(t2)) return parts, mod @@ -62,13 +57,8 @@ def networkx_call(M): Gnx = nx.from_pandas_edgelist( M, source="0", target="1", edge_attr="weight", create_using=nx.Graph() ) - # Networkx louvain Call - print("Solving... ") - t1 = time.time() parts = community.best_partition(Gnx) - t2 = time.time() - t1 - print("Networkx Time : " + str(t2)) return parts diff --git a/python/cugraph/cugraph/tests/community/test_triangle_count.py b/python/cugraph/cugraph/tests/community/test_triangle_count.py index a4d267719ba..449df32b52a 100644 --- a/python/cugraph/cugraph/tests/community/test_triangle_count.py +++ b/python/cugraph/cugraph/tests/community/test_triangle_count.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. +# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at @@ -163,11 +163,3 @@ def test_triangles_directed_graph(): with pytest.raises(ValueError): cugraph.triangle_count(G) - - -# FIXME: Remove this test once experimental.triangle count is removed -@pytest.mark.sg -def test_experimental_triangle_count(input_combo): - G = input_combo["G"] - with pytest.warns(Warning): - cugraph.experimental.triangle_count(G) diff --git a/python/cugraph/cugraph/tests/link_prediction/test_jaccard.py b/python/cugraph/cugraph/tests/link_prediction/test_jaccard.py index 7ce7d263eda..3691ad5a8c9 100644 --- a/python/cugraph/cugraph/tests/link_prediction/test_jaccard.py +++ b/python/cugraph/cugraph/tests/link_prediction/test_jaccard.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020-2023, NVIDIA CORPORATION. +# Copyright (c) 2020-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -23,7 +23,6 @@ from cugraph.datasets import netscience from cugraph.testing import utils, UNDIRECTED_DATASETS from cudf.testing import assert_series_equal -from cudf.testing.testing import assert_frame_equal SRC_COL = "0" DST_COL = "1" @@ -177,35 +176,20 @@ def test_jaccard(read_csv, gpubenchmark, use_weight): cu_src, cu_dst, cu_coeff = cugraph_call( gpubenchmark, graph_file, input_df=M_cu, use_weight=use_weight ) - if not use_weight: - nx_src, nx_dst, nx_coeff = networkx_call(M) - # Calculating mismatch - err = 0 - tol = 1.0e-06 + nx_src, nx_dst, nx_coeff = networkx_call(M) - assert len(cu_coeff) == len(nx_coeff) - for i in range(len(cu_coeff)): - if abs(cu_coeff[i] - nx_coeff[i]) > tol * 1.1: - err += 1 + # Calculating mismatch + err = 0 + tol = 1.0e-06 - print("Mismatches: %d" % err) - assert err == 0 - else: - G = graph_file.get_graph() - res_w_jaccard = cugraph.jaccard_w(G, vertex_pair=M_cu[[SRC_COL, DST_COL]]) - res_w_jaccard = res_w_jaccard.sort_values( - [VERTEX_PAIR_FIRST_COL, VERTEX_PAIR_SECOND_COL] - ).reset_index(drop=True) - res_jaccard = cudf.DataFrame() - res_jaccard[VERTEX_PAIR_FIRST_COL] = cu_src - res_jaccard[VERTEX_PAIR_SECOND_COL] = cu_dst - res_jaccard[JACCARD_COEFF_COL] = cu_coeff - assert_frame_equal( - res_w_jaccard, res_jaccard, check_dtype=False, check_like=True - ) + assert len(cu_coeff) == len(nx_coeff) + for i in range(len(cu_coeff)): + if abs(cu_coeff[i] - nx_coeff[i]) > tol * 1.1: + err += 1 - # FIXME: compare weighted jaccard results against resultset api + print("Mismatches: %d" % err) + assert err == 0 @pytest.mark.sg diff --git a/python/cugraph/cugraph/tests/link_prediction/test_overlap.py b/python/cugraph/cugraph/tests/link_prediction/test_overlap.py index e24deaa61ac..11ef0047b63 100644 --- a/python/cugraph/cugraph/tests/link_prediction/test_overlap.py +++ b/python/cugraph/cugraph/tests/link_prediction/test_overlap.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2023, NVIDIA CORPORATION. +# Copyright (c) 2019-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at @@ -21,7 +21,6 @@ import cugraph from cugraph.testing import utils, UNDIRECTED_DATASETS from cudf.testing import assert_series_equal -from cudf.testing.testing import assert_frame_equal SRC_COL = "0" DST_COL = "1" @@ -63,13 +62,10 @@ def cugraph_call(benchmark_callable, graph_file, pairs, use_weight=False): create_using=cugraph.Graph(directed=False), ignore_weights=not use_weight ) # cugraph Overlap Call - df = benchmark_callable(cugraph.overlap, G, pairs) + df = benchmark_callable(cugraph.overlap, G, pairs, use_weight=use_weight) df = df.sort_values(by=[VERTEX_PAIR_FIRST_COL, VERTEX_PAIR_SECOND_COL]).reset_index( drop=True ) - if use_weight: - res_w_overlap = cugraph.overlap_w(G, vertex_pair=pairs) - assert_frame_equal(res_w_overlap, df, check_dtype=False, check_like=True) return df[OVERLAP_COEFF_COL].to_numpy() diff --git a/python/cugraph/cugraph/tests/link_prediction/test_sorensen.py b/python/cugraph/cugraph/tests/link_prediction/test_sorensen.py index 6b4074fce30..8806f135302 100644 --- a/python/cugraph/cugraph/tests/link_prediction/test_sorensen.py +++ b/python/cugraph/cugraph/tests/link_prediction/test_sorensen.py @@ -1,4 +1,4 @@ -# Copyright (c) 2021-2023, NVIDIA CORPORATION. +# Copyright (c) 2021-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -21,7 +21,6 @@ from cugraph.testing import utils, UNDIRECTED_DATASETS from cugraph.datasets import netscience from cudf.testing import assert_series_equal -from cudf.testing.testing import assert_frame_equal SRC_COL = "0" DST_COL = "1" @@ -58,37 +57,29 @@ def compare_sorensen_two_hop(G, Gnx, use_weight=False): # print(f'G = {G.edgelist.edgelist_df}') - df = cugraph.sorensen(G, pairs) + df = cugraph.sorensen(G, pairs, use_weight=use_weight) df = df.sort_values(by=[VERTEX_PAIR_FIRST_COL, VERTEX_PAIR_SECOND_COL]).reset_index( drop=True ) - if not use_weight: - nx_pairs = list(pairs.to_records(index=False)) + nx_pairs = list(pairs.to_records(index=False)) - # print(f'nx_pairs = {len(nx_pairs)}') + # print(f'nx_pairs = {len(nx_pairs)}') - preds = nx.jaccard_coefficient(Gnx, nx_pairs) + preds = nx.jaccard_coefficient(Gnx, nx_pairs) - # FIXME: Use known correct values of Sorensen for few graphs, - # hardcode it and compare to Cugraph Sorensen to get a more robust test + # FIXME: Use known correct values of Sorensen for few graphs, + # hardcode it and compare to Cugraph Sorensen to get a more robust test - # Conversion from Networkx Jaccard to Sorensen - # No networkX equivalent + # Conversion from Networkx Jaccard to Sorensen + # No networkX equivalent - nx_coeff = list(map(lambda x: (2 * x[2]) / (1 + x[2]), preds)) + nx_coeff = list(map(lambda x: (2 * x[2]) / (1 + x[2]), preds)) - assert len(nx_coeff) == len(df) - for i in range(len(df)): - diff = abs(nx_coeff[i] - df[SORENSEN_COEFF_COL].iloc[i]) - assert diff < 1.0e-6 - else: - # FIXME: compare results against resultset api - res_w_sorensen = cugraph.sorensen_w(G, vertex_pair=pairs) - res_w_sorensen = res_w_sorensen.sort_values( - [VERTEX_PAIR_FIRST_COL, VERTEX_PAIR_SECOND_COL] - ).reset_index(drop=True) - assert_frame_equal(res_w_sorensen, df, check_dtype=False, check_like=True) + assert len(nx_coeff) == len(df) + for i in range(len(df)): + diff = abs(nx_coeff[i] - df[SORENSEN_COEFF_COL].iloc[i]) + assert diff < 1.0e-6 def cugraph_call(benchmark_callable, graph_file, input_df=None, use_weight=False): 
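
The test changes above, together with the removal of wjaccard.py, woverlap.py, and wsorensen.py, replace the deprecated jaccard_w/overlap_w/sorensen_w wrappers with the use_weight flag on the unified similarity APIs, as the deprecation messages in the deleted files indicate. The following is a minimal migration sketch, not code added by this PR; it assumes the karate dataset loads with edge weights attached and reuses only calls that appear elsewhere in this diff (karate.get_graph, get_two_hop_neighbors, cugraph.jaccard).

import cugraph
from cugraph.datasets import karate

# Previously: cugraph.jaccard_w(G, weights_df) with a separate weights DataFrame.
# Now: keep the weights on the graph itself and pass use_weight=True.
G = karate.get_graph(download=True)            # weighted edge list
pairs = G.get_two_hop_neighbors()              # candidate vertex pairs
wdf = cugraph.jaccard(G, vertex_pair=pairs, use_weight=True)
print(wdf.sort_values("jaccard_coeff", ascending=False).head())

The same pattern applies to overlap and sorensen, which the updated tests exercise via use_weight=use_weight.
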
diff --git a/python/cugraph/cugraph/utilities/api_tools.py b/python/cugraph/cugraph/utilities/api_tools.py deleted file mode 100644 index 195a5885818..00000000000 --- a/python/cugraph/cugraph/utilities/api_tools.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import pylibcugraph.utilities.api_tools as api_tools - -experimental_prefix = "EXPERIMENTAL" - - -def experimental_warning_wrapper(obj): - return api_tools.experimental_warning_wrapper(obj) - - -def promoted_experimental_warning_wrapper(obj): - return api_tools.promoted_experimental_warning_wrapper(obj) - - -def deprecated_warning_wrapper(obj): - return api_tools.deprecated_warning_wrapper(obj) diff --git a/python/pylibcugraph/pylibcugraph/__init__.py b/python/pylibcugraph/pylibcugraph/__init__.py index 1d02498ea30..ab518e24cae 100644 --- a/python/pylibcugraph/pylibcugraph/__init__.py +++ b/python/pylibcugraph/pylibcugraph/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2021-2023, NVIDIA CORPORATION. +# Copyright (c) 2021-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -15,8 +15,6 @@ strongly_connected_components, ) -from pylibcugraph import experimental - from pylibcugraph.graphs import SGGraph, MGGraph from pylibcugraph.resource_handle import ResourceHandle diff --git a/python/pylibcugraph/pylibcugraph/experimental/__init__.py b/python/pylibcugraph/pylibcugraph/experimental/__init__.py deleted file mode 100644 index 6194ace5956..00000000000 --- a/python/pylibcugraph/pylibcugraph/experimental/__init__.py +++ /dev/null @@ -1,90 +0,0 @@ -# Copyright (c) 2022-2023, NVIDIA CORPORATION. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -""" -The "experimental" package contains packages, functions, classes, etc. that -are ready for use but do not have their API signatures or implementation -finalized yet. This allows users to provide early feedback while still -permitting bigger design changes to take place. - -ALL APIS IN EXPERIMENTAL ARE SUBJECT TO CHANGE OR REMOVAL. - -Calling experimental objects will raise a PendingDeprecationWarning warning. - -If an object is "promoted" to the public API, the experimental namespace will -continue to also have that object present for at least another release. 
A -different warning will be output in that case, indicating that the experimental -API has been promoted and will no longer be importable from experimental much -longer. -""" - -from pylibcugraph.utilities.api_tools import ( - experimental_warning_wrapper, - promoted_experimental_warning_wrapper, -) - -# experimental_warning_wrapper() wraps the object in a function that provides -# the appropriate warning about using experimental code. - -# promoted_experimental_warning_wrapper() is used instead when an object is present -# in both the experimental namespace and its final, public namespace. - -# The convention of naming functions with the "EXPERIMENTAL__" prefix -# discourages users from directly importing experimental objects that don't have -# the appropriate warnings, such as what the wrapper and the "experimental" -# namespace name provides. - -from pylibcugraph.graphs import SGGraph - -SGGraph = promoted_experimental_warning_wrapper(SGGraph) - -from pylibcugraph.graphs import MGGraph - -MGGraph = promoted_experimental_warning_wrapper(MGGraph) - -from pylibcugraph.resource_handle import ResourceHandle - -ResourceHandle = promoted_experimental_warning_wrapper(ResourceHandle) - -from pylibcugraph.graph_properties import GraphProperties - -GraphProperties = promoted_experimental_warning_wrapper(GraphProperties) - -from pylibcugraph.pagerank import pagerank - -pagerank = promoted_experimental_warning_wrapper(pagerank) - -from pylibcugraph.sssp import sssp - -sssp = promoted_experimental_warning_wrapper(sssp) - -from pylibcugraph.hits import hits - -hits = promoted_experimental_warning_wrapper(hits) - -from pylibcugraph.node2vec import node2vec - - -# from pylibcugraph.jaccard_coefficients import EXPERIMENTAL__jaccard_coefficients - -# jaccard_coefficients = experimental_warning_wrapper(EXPERIMENTAL__jaccard_coefficients) - -# from pylibcugraph.overlap_coefficients import EXPERIMENTAL__overlap_coefficients - -# overlap_coefficients = experimental_warning_wrapper(EXPERIMENTAL__overlap_coefficients) - -# from pylibcugraph.sorensen_coefficients import EXPERIMENTAL__sorensen_coefficients - -# sorensen_coefficients = experimental_warning_wrapper( -# EXPERIMENTAL__sorensen_coefficients -# ) diff --git a/python/pylibcugraph/pylibcugraph/utilities/api_tools.py b/python/pylibcugraph/pylibcugraph/utilities/api_tools.py index b0b1e9b893e..94b51cea974 100644 --- a/python/pylibcugraph/pylibcugraph/utilities/api_tools.py +++ b/python/pylibcugraph/pylibcugraph/utilities/api_tools.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022, NVIDIA CORPORATION. +# Copyright (c) 2022-2024, NVIDIA CORPORATION. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at @@ -19,7 +19,7 @@ experimental_prefix = "EXPERIMENTAL" -def experimental_warning_wrapper(obj): +def experimental_warning_wrapper(obj, obj_namespace_name=None): """ Wrap obj in a function or class that prints a warning about it being "experimental" (ie. it is in the public API but subject to change or @@ -31,6 +31,13 @@ def experimental_warning_wrapper(obj): public API so it can remain hidden while it is still experimental, but have a public name within the experimental namespace so it can be easily discovered and used. + + obj_namespace_name can be passed in to make the warning message + clearer. 
For example, if the obj is the function sssp and is accessed as + part of a package like "pylibcugraph.experimental.sssp", obj_namespace_name + should be "pylibcugraph.experimental". If obj_namespace_name is not + passed, the namespace will be found using inspect.stack(), which can be + expensive. """ obj_type = type(obj) if not callable(obj): @@ -40,16 +47,17 @@ def experimental_warning_wrapper(obj): obj_name = obj_name.lstrip(experimental_prefix) obj_name = obj_name.lstrip("__") - # Assume the caller of this function is the module containing the - # experimental obj and try to get its namespace name. Default to no - # namespace name if it could not be found. - call_stack = inspect.stack() - calling_frame = call_stack[1].frame - ns_name = calling_frame.f_locals.get("__name__") - dot = "." if ns_name is not None else "" + if obj_namespace_name is None: + # Assume the caller of this function is the module containing the + # experimental obj and try to get its namespace name. Default to no + # namespace name if it could not be found. + call_stack = inspect.stack() + calling_frame = call_stack[1].frame + obj_namespace_name = calling_frame.f_locals.get("__name__") + dot = "." if obj_namespace_name is not None else "" warning_msg = ( - f"{ns_name}{dot}{obj_name} is experimental and will " + f"{obj_namespace_name}{dot}{obj_name} is experimental and may " "change or be removed in a future release." ) @@ -73,7 +81,7 @@ def __init__(self, *args, **kwargs): else: self = obj(*args, **kwargs) - WarningWrapperClass.__module__ = ns_name + WarningWrapperClass.__module__ = obj_namespace_name WarningWrapperClass.__qualname__ = obj_name WarningWrapperClass.__name__ = obj_name WarningWrapperClass.__doc__ = obj.__doc__ @@ -88,7 +96,7 @@ def warning_wrapper_function(*args, **kwargs): warnings.warn(warning_msg, PendingDeprecationWarning) return obj(*args, **kwargs) - warning_wrapper_function.__module__ = ns_name + warning_wrapper_function.__module__ = obj_namespace_name warning_wrapper_function.__qualname__ = obj_name warning_wrapper_function.__name__ = obj_name warning_wrapper_function.__doc__ = obj.__doc__ @@ -96,7 +104,7 @@ def warning_wrapper_function(*args, **kwargs): return warning_wrapper_function -def promoted_experimental_warning_wrapper(obj): +def promoted_experimental_warning_wrapper(obj, obj_namespace_name=None): """ Wrap obj in a function of class that prints a warning about it being close to being removed, prior to calling obj and returning its value. @@ -106,6 +114,13 @@ def promoted_experimental_warning_wrapper(obj): same object. This wrapper is applied to the one with the "private" name, urging the user to instead use the one in the public API, which does not have the experimental namespace. + + obj_namespace_name can be passed in to make the warning message + clearer. For example, if the obj is the function sssp and is accessed as + part of a package like "pylibcugraph.experimental.sssp", obj_namespace_name + should be "pylibcugraph.experimental". If obj_namespace_name is not + passed, the namespace will be found using inspect.stack(), which can be + expensive. """ obj_type = type(obj) if not callable(obj): @@ -115,13 +130,17 @@ def promoted_experimental_warning_wrapper(obj): obj_name = obj_name.lstrip(experimental_prefix) obj_name = obj_name.lstrip("__") - call_stack = inspect.stack() - calling_frame = call_stack[1].frame - ns_name = calling_frame.f_locals.get("__name__") - dot = "." 
if ns_name is not None else "" + if obj_namespace_name is None: + # Assume the caller of this function is the module containing the + # experimental obj and try to get its namespace name. Default to no + # namespace name if it could not be found. + call_stack = inspect.stack() + calling_frame = call_stack[1].frame + obj_namespace_name = calling_frame.f_locals.get("__name__") + dot = "." if obj_namespace_name is not None else "" warning_msg = ( - f"{ns_name}{dot}{obj_name} has been promoted out of " + f"{obj_namespace_name}{dot}{obj_name} has been promoted out of " "experimental. Use the non-experimental version instead, " "as this one will be removed in a future release." ) @@ -139,7 +158,7 @@ def __init__(self, *args, **kwargs): else: self = obj(*args, **kwargs) - WarningWrapperClass.__module__ = ns_name + WarningWrapperClass.__module__ = obj_namespace_name WarningWrapperClass.__qualname__ = obj_name WarningWrapperClass.__name__ = obj_name @@ -150,33 +169,43 @@ def warning_wrapper_function(*args, **kwargs): warnings.warn(warning_msg, DeprecationWarning) return obj(*args, **kwargs) - warning_wrapper_function.__module__ = ns_name + warning_wrapper_function.__module__ = obj_namespace_name warning_wrapper_function.__qualname__ = obj_name warning_wrapper_function.__name__ = obj_name return warning_wrapper_function -def deprecated_warning_wrapper(obj): +def deprecated_warning_wrapper(obj, obj_namespace_name=None): """ Wrap obj in a function or class that prints a warning about it being deprecated (ie. it is in the public API but will be removed or replaced by a refactored version), prior to calling obj and returning its value. + + obj_namespace_name can be passed in to make the warning message + clearer. For example, if the obj is the function sssp and is accessed as + part of a package like "pylibcugraph.experimental.sssp", obj_namespace_name + should be "pylibcugraph.experimental". If obj_namespace_name is not + passed, the namespace will be found using inspect.stack(), which can be + expensive. """ obj_type = type(obj) if not callable(obj): raise TypeError("obj must be a class or a function type, got " f"{obj_type}") obj_name = obj.__name__ - call_stack = inspect.stack() - calling_frame = call_stack[1].frame - ns_name = calling_frame.f_locals.get("__name__") - dot = "." if ns_name is not None else "" - + if obj_namespace_name is None: + # Assume the caller of this function is the module containing the + # deprecated obj and try to get its namespace name. Default to no + # namespace name if it could not be found. + call_stack = inspect.stack() + calling_frame = call_stack[1].frame + obj_namespace_name = calling_frame.f_locals.get("__name__") + + dot = "." if obj_namespace_name is not None else "" warning_msg = ( - f"{ns_name}{dot}{obj_name} has been deprecated and will " - "be removed next release. If an experimental version " - "exists, it may replace this version in a future release." + f"{obj_namespace_name}{dot}{obj_name} has been deprecated and will " + "be removed in a future release." 
) if obj_type is type: @@ -192,7 +221,7 @@ def __init__(self, *args, **kwargs): else: self = obj(*args, **kwargs) - WarningWrapperClass.__module__ = ns_name + WarningWrapperClass.__module__ = obj_namespace_name WarningWrapperClass.__qualname__ = obj_name WarningWrapperClass.__name__ = obj_name @@ -203,7 +232,7 @@ def warning_wrapper_function(*args, **kwargs): warnings.warn(warning_msg, DeprecationWarning) return obj(*args, **kwargs) - warning_wrapper_function.__module__ = ns_name + warning_wrapper_function.__module__ = obj_namespace_name warning_wrapper_function.__qualname__ = obj_name warning_wrapper_function.__name__ = obj_name
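The api_tools.py changes above add an optional obj_namespace_name argument to all three wrappers, letting callers name the namespace used in the warning text up front instead of recovering it with an inspect.stack() walk at wrap time. A rough usage sketch under that assumption; the module and function names below are hypothetical and not part of cuGraph:

    from pylibcugraph.utilities.api_tools import (
        deprecated_warning_wrapper,
        experimental_warning_wrapper,
    )

    def EXPERIMENTAL__fast_path(x):
        """Hypothetical experimental function."""
        return x + 1

    # Passing obj_namespace_name skips the inspect.stack() fallback; the emitted
    # PendingDeprecationWarning reads "mypkg.experimental.fast_path is experimental ...".
    fast_path = experimental_warning_wrapper(
        EXPERIMENTAL__fast_path, obj_namespace_name="mypkg.experimental"
    )

    def legacy_call(x):
        return x - 1

    # Same idea for a deprecated entry point; the DeprecationWarning names "mypkg.legacy_call".
    legacy_call = deprecated_warning_wrapper(legacy_call, obj_namespace_name="mypkg")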