Updated CLI for Cambridge #4

Open
wants to merge 31 commits into main

Conversation

@jdm010 commented Aug 29, 2023

The CLI downloads new releases for a chosen subject into a chosen directory as a .tsv file.

jdm010 added 16 commits August 4, 2023 16:14
Fetches files for relevant subjects from Cambridge University Press
File containing functions that will be called from other files
Retrieves data and modifies it, so that the CLI can compare its newly fetched files
CLI to fetch new releases.
Structure of code changed
CLI for Cambridge University Press.
get_test_data.py must be run first to fetch and store files locally. These files are then randomly modified, so we can check for new releases.
cambridge_cli.py can be used to fetch new releases for the subject specified by the user.

Future versions of the code will be higher-level (where the user can specify the publisher in the CLI), and also be able to fetch releases for all subjects.
        return response.text
    except requests.exceptions.HTTPError as err:
        print(f"HTTP Error: Failed to fetch the webpage ({err})")
        return None

No need to return None. A function without a return statement returns None by default:
https://realpython.com/python-return-statement/#implicit-return-statements

        response.raise_for_status()  # Raise an HTTPError if the response status code indicates an error
        return response.text
    except requests.exceptions.HTTPError as err:
        print(f"HTTP Error: Failed to fetch the webpage ({err})")

You can also change the f-string prints to structlog. Here you can find a step-by-step example: https://www.structlog.org/en/stable/bound-loggers.html#step-by-step-example

From the docs you can see that there are different log methods: debug(), info(), warning(), error(), and critical(). In this case we need error().
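
For illustration, a minimal sketch of get_page_content with both suggestions applied (no explicit return None, structlog's error() instead of print); the log message and the key-value fields are just examples:

import requests
import structlog

logger = structlog.get_logger()

def get_page_content(url):
    try:
        response = requests.get(url)
        response.raise_for_status()
        return response.text
    except requests.exceptions.HTTPError as err:
        # no explicit "return None": falling off the end of the function returns None implicitly
        logger.error("Failed to fetch the webpage", url=url, error=str(err))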

jdm010 added 6 commits August 31, 2023 11:49
Improved tests
Added an exception for get_page_content. The function now returns response rather than response.text. Replaced f-string with structlog.
Minor change to accommodate changes with utils.py
Utils for Springer
Created CLI for Springer
Tests for Springer
    response = get_page_content(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    found_links = find_links_and_tags(soup, ['computer science'], 'cambridge ebooks and partner presses: 2023 ')
    assert not len(found_links) == 0
@ErnestaP commented Aug 31, 2023

Please assert how many links are actually found instead of not len(...); that makes the test more precise.

Also, can you add a test where 0 links are found? You can simulate that response by reading it from a file: copy the XML to a file, remove the tags you are looking for, and then use this file as the test input.
Or you can try to find an XML that doesn't have one and use that.
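
As a sketch, the existing assertion could pin the exact number, e.g. assert len(found_links) == 3 (the 3 is hypothetical), and the zero-links case could look roughly like this; the inline markup stands in for the stripped-down fixture file, and find_links_and_tags is assumed to be importable from the PR's utils module:

from bs4 import BeautifulSoup

from utils import find_links_and_tags  # assumed import path within this PR

def test_found_links_empty():
    # markup containing none of the expected link text, standing in for the fixture file
    soup = BeautifulSoup('<html><body><p>nothing relevant here</p></body></html>', 'html.parser')
    found_links = find_links_and_tags(soup, ['computer science'], 'cambridge ebooks and partner presses: 2023 ')
    assert len(found_links) == 0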

def get_page_content(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an HTTPError if the response status code indicates an error

I don't think we need a comment; the function name raise_for_status() is quite informative.

@@ -0,0 +1,36 @@
import requests
import structlog
logger = structlog.get_logger()

Maybe separate the imports and the variable with a blank line? It improves readability.

for subject in subjects:
    target_word = prefix + subject
    for tag in soup.find_all(string=lambda text: text and target_word in text.lower()):
        parent_tag = tag.parent

What happens if the tag doesn't have a parent, say when the tag is the root? Does it return None or crash?

@jdm010 (Author)

So should we have something like

            else:
                return None

or

            else:
                continue

To check, please write a simple function and see what happens when you:

  1. return None
  2. use else: continue

What is the difference?

For my question, there are a few ways to check it: reading the docs or trying it manually.
Please try to run a small function that reads a super small piece of HTML and look up a parent that doesn't exist:

soup = BeautifulSoup("<p>Some</p>", "html.parser")
for tag in soup.find_all('p'):
    tag.parent
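
For reference, a runnable version of that experiment; with html.parser the top-level <p> tag's parent is the soup object itself, and only the document root has no parent:

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Some</p>", "html.parser")
for tag in soup.find_all("p"):
    print(tag.parent.name)  # "[document]": the parent of the top-level <p> is the soup object
print(soup.parent)          # None: the document root has no parent, and no exception is raised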

@jdm010 ⬆️

def test_download_file(tmp_path):
    url = 'https://www.cambridge.org/core/services/aop-cambridge-core/kbart/create/bespoke/717854B1C18FD5D0B882344E83E6F52B'
    desired_filename = 'computer science'
    target_filepath = str(tmp_path) + '/'

Are you sure that you need the + "/"? Since I see you use os.path.join in line 30, it's not needed, because os.path.join inserts the separator between the path components you pass it.

With your solution, expected_filepath ends up with two slashes after the value of target_filepath.
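
For illustration, a quick check of how os.path.join treats the separator (the tmp value below is only a stand-in for str(tmp_path)):

import os

tmp = '/tmp/pytest-0/test_download_file0'   # stand-in for str(tmp_path)
filename = 'computer science.tsv'

print(os.path.join(tmp, filename))          # /tmp/pytest-0/test_download_file0/computer science.tsv
print(os.path.join(tmp + '/', filename))    # same path: join does not add a second separator
print(tmp + '/' + filename)                 # plain concatenation needs the separator written by hand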

    response = requests.get(url)
    if response.status_code == 200:
        filename = f"{desired_filename}.tsv"
        with open(target_filepath + filename, 'wb') as file:

you can use os.path.join for constructing the file path, instead of using +

def test_download_file(tmp_path):
    url = 'https://adminportal.springernature.com/metadata/kbart/Springer_Global_Springer_Computer_Science_eBooks_2023_English+International_2023-08-01.txt'
    desired_filename = 'computer science'
    target_filepath = str(tmp_path) + '/'

You don't need to add the trailing slash if you use os.path.join in download_file.

    response = requests.get(url)
    if response.status_code == 200:
        filename = f"{desired_filename}.tsv"
        with open(target_filepath + filename, 'wb') as file:

Can you use os.path.join here as well?

def get_page_content(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an HTTPError if the response status code indicates an error

Same as above: I don't think we need a comment; the function name raise_for_status() is quite informative.

Utils for Taylor & Francis
CLI for Taylor & Francis
Improved readability, added else statement for find_links_and_tags
Improved test_found_links and added case with 0 links found
Better readability, improved find_links_and_tags
Improved test_found_links, added case with no links found.
Added a higher-level CLI that gives the option of choosing the publisher
file_path = os.path.join(target_filepath, filename)
with open(file_path, 'wb') as file:
    file.write(response.content)
print(f'Successfully downloaded {filename}')

Instead of all the prints, please use the logger.
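
A sketch of how download_file could look with the logger, reconstructed from the snippets above; the else branch and the key-value field names are illustrative:

import os

import requests
import structlog

logger = structlog.get_logger()

def download_file(url, desired_filename, target_filepath):
    response = requests.get(url)
    if response.status_code == 200:
        filename = f"{desired_filename}.tsv"
        file_path = os.path.join(target_filepath, filename)
        with open(file_path, 'wb') as file:
            file.write(response.content)
        logger.info("Successfully downloaded file", filename=filename, path=file_path)
    else:
        logger.error("Download failed", url=url, status_code=response.status_code)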
