Merge dtq-dev updated to lindat (#829)
* UFAL/Removed duplicate bitstreams in the cmdi (#766)

* Removed duplicate bitstreams in the cmdi.

* Fixed checkstyle violation

* used lindat code instead of vanilla.

* Ufal/Preview issues (#764)

* Ensure the content preview doesn't exceed the maximum length of the database column, and encode the input stream as UTF-8.

* Do not store HTML content in the database because it could be longer than the limit of the database column.

* UFAL/Encoded the UTF-8 characters from the redirect URL to UTF (#758)

* Encoded the UTF-8 characters from the redirect URL to UTF

* Moved ClarinUtils into Utils class

* Added a new `dq` package into ComponentScan

* Moved dq.Utils into DSpace utils.Utils because components with the same name cause conflicts.

* Removed *.dq component scan from the App

* Merge pull request DSpace#9790 from DSpace/backport-9775-to-dspace-7_x (#769)

[Port dspace-7_x] Make statistics autocommit much more frequently

Co-authored-by: Tim Donohue <[email protected]>

* test for bitstream with null value of sizebytes

* Update README.md

* UFAL/Shibboleth - load more net-id headers e.g. persistent-id (#772)

* Load netid from more than one header. authentication-shibboleth.netid-header could be list, not only single value

* Shibboleth login - sort the emails passed in the shibboleth email header and get the first one.

* The user is redirected to the login page when trying to update the eperson email to one that is already assigned to another eperson.

* Sorting emails is moved into specific method and ShibbolethLoginFilter is updated following the ShibAuthentication changes

* Fixed failing tests

* ClarinShibbolethLoginFilter and ClarinShibAuthentication had duplicate code; moved it into a static method.

* Propagate the verification token to the DB after the email is successfully sent. (#786)

* UFAL/Enhanced type-bind feature (#762)

* type bind is correctly rendered in the FE, but BE is still not working

* Synchronized the `submission-forms_cs.xml`

* Added doc into `submission-forms` about enhanced type-bind `field`

* Updated `local.cfg` for tests - added type-bind property

* Updated docs for the customized type-bind configuration property.

* Updated MetadataValidation following the type-bind customization.

* Added isAllowed function for multiple type-bind definitions

* Added some docs for the new method

* The values of the input weren't loaded.

* Allowed fields could be empty when they should have values.

* Used isEmpty function and created constant for the `=>`.

* create preview content for tar files (#759)

* create preview content for tar files

* Added right logs

* divided the extractFile function into several smaller functions

* added comment and removed empty line

* added empty lines and removed unwanted comments

* removed empty line

* used consts

* try incorrect identification level

* log errors and removed unneeded consts

---------

Co-authored-by: milanmajchrak <[email protected]>

* Internal/fix failing Clarin integration test (#796)

* Initial commit

* Ignore the test class from where the tests have started failing.

* Ignored half of tests in the ClarinShibbolethLoginFilterIT file

* Ignored all tests

* unignore some tests

* 3 tests ignored 3 allowed

* Maybe the problematic test is hidden between 3 unignored tests

* two candidates

* The last candidate

* Ignore just the wrong test; all other tests should pass

* Clean up object created in the test.

* Removed unused import.

* Check the user which is going to be deleted is not null.

* Rest api for handle resolution with metadata

* decoded rawvalues and response json modification

* used static extractMetadata funct in HandlePlugin

* return dict:

* removed property for test from local

* Add default licenses - from ZCU update (#801)

* Added flyway file to insert default licenses with license labels and mappings

* Added required header

* UFAL/share submission by email (#780)

* Updated table workspace with share token, created endpoint to generate share token and it is sent via email

* Added method to get workspaceitem via share token.

* Added an endpoint for changing the submission's owner.

* Added license headers

* Added test for fetching item with share Token

* Added tests to check the owner is changed

* Added better explanation why the BE must return Page object in the search endpoint

* Validate the user in the SubmissionController, it cannot be null

* Updated email - some values are fetched directly from the configuration property

* Updated preAuthorization method to ADD instead of WRITE (write is used for controlling authorization for modifying the Item) and updated shareURL

* Authorize the submitter who is trying to take over the shared item via shareToken.

* Update integration test following the feature update

* Import default licenses only if the license tables are empty. (#808)

* Oai elg crosswalk (#798)

* problem with language code

* fix amount and sizeUnitOther rest-tests errors

* added language coding

* new language and funding project mishmash array position

* added isoCodes

* removed handle from item submission (#812)

* UFAL/Shibboleth - netid-header should use getArrayProperty everywhere (#807)

* Fetch netid as array from the cfg. Now netid as array is used everywhere. Added integration test to ask for an email when the user sends only persistent-id in the shib header.

* Fixed checkstyle issue

* The user is not signed in without using the link with the verification token from the email (#809)

* UFAL/Shibboleth - show error in the UI when shibboleth authentication is failed (#810)

* The user is not signed in without using the link with the verification token from the email

* Send a redirect to UI with specific parameter that the Shibboleth authorization wasn't successful

* UFAL/Autocomplete enhancement (#768)

* Added solr index `handle_title_ac` and `_comp` for the Item

* Added support for searching results from specific solr indexes.

1. Updated submission-forms autocomplete definition to specify a specific index. 2. Updated configuration is provided via REST API. 3. Create a new `/suggestions` endpoint for searching values from custom solr index - it returns VocabularyEntry page.

* Supported searching Item byHandle when passed a handle as parameter without handle canonical prefix.

* Added autocompleteCustom `solr-subject_ac` and `handle_title_ac`.

* Added autocompleteCustom `solr-publisher_ac`.

* Added cfg property to define a separator from the solr value to get only display value.

* Added autocompleteCustom `solr-dataProvider_ac`

* Refactored code and created integration test for the SuggestionRestController

* Updated suggestion integration tests because they conflicted with another IT

* Added doc and changed `autocomplete.custom.format` to `autocomplete.custom.separator` for proper naming.

* Added support for loading suggestions from the json file - need to refactor.

* Refactored and added docs.

* Created tests and fixed failing ones due to updated solr definition

* Synchronized submission-forms_cs.xml with the original-english one

* Added docs about magic constants

* Added doc why the handle is updated to canonical form in the searchbyHandle endpoint

* Allow searching only within the solr indexes or JSON files permitted by the configuration.

* Removed normalization of handle prefix because there could be more prefixes. Expect only handle value.

* Fixed if condition and config property default value.

* Fixed integration tests. Allowed autocomplete custom was missing in the test cfg.

* The suggestion endpoint is allowed only for authorized users

* Refactored method for the normalizing the query for the discoverQuery and added unit tests.

* UFAL/Changed position of rows in submission-forms.xml following v5 (#802)

* Changed position of rows in submission-forms.xml following v5

* Fixed the alignment of some text

* Removed rows which are not in the v5

* Removed license selector from the `teaching` and `clariah-data` collections

* Updated input differences in the submission-forms.xml following the v5.

---------

Co-authored-by: Juraj Roka <[email protected]>
Co-authored-by: milanmajchrak <[email protected]>

* Show db connection statistics in the log file or the `dbstatistics` endpoint (#815)

* Show db statistics in the log file or the `dbstatistics` endpoint

* Finding out why GitHub checks are failing - undo hibernate.cfg

* Disabled automatic logging

* Use scheduled CRON job instead of PostConstruct

* hibernate generating property true

---------

Co-authored-by: Paurikova2 <[email protected]>

* Translation of submission-forms to _cs (#816)

* Translation of submission-forms to _cs

* Translated bitstream metadata and complex input fields

* Translated the rest of submission-forms_cs.xml

* Fixed regex... it must contain regex value, not the message.

---------

Co-authored-by: Juraj Roka <[email protected]>
Co-authored-by: milanmajchrak <[email protected]>

* Updated cfg to prettify the author suggestions (#819)

* crosswalk-embargo (#821)

* added fn for embargo

* using of res policy end_date and added comments

* fix string format problem with %s

* integration tests are failing

* checkstyle violations

* removed findHandle duplicity

* added deleted line

* checkstyle violations

* For now the complex input field is without autocomplete for the size and contact person (#823)

* Send the custom type bind `field` to the FE configuration (#822)

* fix date converting to string (#825)

* fix date converting to string

* made const from format

* checkstyle

* cherry-pick clarin v7 into dtq dev (#820)

* cherry-picked DataCite related changes from customer/uk

* Add a script that adds a file from url to an item

intended for large file workflows

* Add ways to influence the bitstream name

* add more options to specify an item

* Expose resourceId (DSpace#1134)

A BE part of DSpace#1127 - this exposes the resourceId so it can be used in the handle mgmt table

* fixes ufal#1135 - findEpersonByNetId should stop searching when it finds an eperson

- moved the `return eperson` inside the for cycle (after eperson non null
check).
- removed the eperson param (both callers were passing in `null`)

* Test release without db logs (#827)

* UFAL/Matomo statistics with dimension (#813)

* Updated the version of matomo dependency and tried to change request from Custom Variables to Dimension

* Added a custom dimension with item's handle URL

* Send custom dimension also in oai tracker

* Use only IPv4 address, the Matomo tracker has a problem with IPv6

* Do not change custom dimension when the Item is null

* First custom dimension should have ID '1'.

* Use a valid URL for Matomo tracker in the IT

* Configure handle custom dimension ID in the clarin-dspace.cfg

* Refactored ipv4 method to be more readable - return null

---------

Co-authored-by: Juraj Roka <[email protected]>
Co-authored-by: milanmajchrak <[email protected]>
Co-authored-by: milanmajchrak <[email protected]>

* The `dspace.name.short` is not working properly in the email, use `dspace.shortname` instead (#828)

---------

Co-authored-by: Tim Donohue <[email protected]>
Co-authored-by: Paurikova2 <[email protected]>
Co-authored-by: Jozef Misutka <[email protected]>
Co-authored-by: jurinecko <[email protected]>
Co-authored-by: Juraj Roka <[email protected]>
Co-authored-by: Paurikova2 <[email protected]>
Co-authored-by: Ondřej Košarko <[email protected]>
8 people authored Dec 6, 2024
1 parent c1b95e9 commit 83b295f
Showing 91 changed files with 13,510 additions and 1,458 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -128,4 +128,7 @@ The full license is available in the [LICENSE](LICENSE) file or online at http:/

DSpace uses third-party libraries which may be distributed under different licenses. Those licenses are listed
in the [LICENSES_THIRD_PARTY](LICENSES_THIRD_PARTY) file.


# Additional notes

This project is tested with BrowserStack.
4 changes: 2 additions & 2 deletions dspace-api/pom.xml
@@ -339,8 +339,8 @@
<dependencies>
<dependency>
<groupId>org.piwik.java.tracking</groupId>
<artifactId>matomo-java-tracker</artifactId>
<version>2.0</version>
<artifactId>matomo-java-tracker-java11</artifactId>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
229 changes: 229 additions & 0 deletions dspace-api/src/main/java/org/dspace/administer/FileDownloader.java
@@ -0,0 +1,229 @@
/**
* The contents of this file are subject to the license and copyright
* detailed in the LICENSE and NOTICE files at the root of the source
* tree and available online at
*
* http://www.dspace.org/license/
*/
package org.dspace.administer;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.SQLException;
import java.util.List;
import java.util.UUID;
import java.util.stream.Stream;

import org.apache.commons.cli.ParseException;
import org.dspace.authorize.AuthorizeException;
import org.dspace.content.Bitstream;
import org.dspace.content.BitstreamFormat;
import org.dspace.content.Bundle;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.factory.ContentServiceFactory;
import org.dspace.content.service.BitstreamFormatService;
import org.dspace.content.service.BitstreamService;
import org.dspace.content.service.ItemService;
import org.dspace.content.service.WorkspaceItemService;
import org.dspace.core.Context;
import org.dspace.eperson.EPerson;
import org.dspace.eperson.factory.EPersonServiceFactory;
import org.dspace.eperson.service.EPersonService;
import org.dspace.identifier.IdentifierNotFoundException;
import org.dspace.identifier.IdentifierNotResolvableException;
import org.dspace.identifier.factory.IdentifierServiceFactory;
import org.dspace.identifier.service.IdentifierService;
import org.dspace.scripts.DSpaceRunnable;
import org.dspace.scripts.configuration.ScriptConfiguration;
import org.dspace.utils.DSpace;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class FileDownloader extends DSpaceRunnable<FileDownloaderConfiguration> {

private static final Logger log = LoggerFactory.getLogger(FileDownloader.class);
private boolean help = false;
private UUID itemUUID;
private int workspaceID;
private String pid;
private URI uri;
private String epersonMail;
private String bitstreamName;
private EPersonService epersonService;
private ItemService itemService;
private WorkspaceItemService workspaceItemService;
private IdentifierService identifierService;
private BitstreamService bitstreamService;
private BitstreamFormatService bitstreamFormatService;
private final HttpClient httpClient = HttpClient.newBuilder()
.followRedirects(HttpClient.Redirect.NORMAL)
.build();

/**
* This method will return the Configuration that the implementing DSpaceRunnable uses
*
* @return The {@link ScriptConfiguration} that this implementing DspaceRunnable uses
*/
@Override
public FileDownloaderConfiguration getScriptConfiguration() {
return new DSpace().getServiceManager().getServiceByName("file-downloader",
FileDownloaderConfiguration.class);
}

/**
* This method has to be included in every script and handles the setup of the script by parsing the CommandLine
* and setting the variables
*
* @throws ParseException If something goes wrong
*/
@Override
public void setup() throws ParseException {
log.debug("Setting up {}", FileDownloader.class.getName());
if (commandLine.hasOption("h")) {
help = true;
return;
}

if (!commandLine.hasOption("u")) {
throw new ParseException("No URL option has been provided");
}

if (!commandLine.hasOption("i") && !commandLine.hasOption("w") && !commandLine.hasOption("p")) {
throw new ParseException("No item id option has been provided");
}

if (getEpersonIdentifier() == null && !commandLine.hasOption("e")) {
throw new ParseException("No eperson option has been provided");
}


this.epersonService = EPersonServiceFactory.getInstance().getEPersonService();
this.itemService = ContentServiceFactory.getInstance().getItemService();
this.workspaceItemService = ContentServiceFactory.getInstance().getWorkspaceItemService();
this.bitstreamService = ContentServiceFactory.getInstance().getBitstreamService();
this.bitstreamFormatService = ContentServiceFactory.getInstance().getBitstreamFormatService();
this.identifierService = IdentifierServiceFactory.getInstance().getIdentifierService();

try {
uri = new URI(commandLine.getOptionValue("u"));
} catch (URISyntaxException e) {
throw new ParseException("The provided URL is not a valid URL");
}

if (commandLine.hasOption("i")) {
itemUUID = UUID.fromString(commandLine.getOptionValue("i"));
} else if (commandLine.hasOption("w")) {
workspaceID = Integer.parseInt(commandLine.getOptionValue("w"));
} else if (commandLine.hasOption("p")) {
pid = commandLine.getOptionValue("p");
}

epersonMail = commandLine.getOptionValue("e");

if (commandLine.hasOption("n")) {
bitstreamName = commandLine.getOptionValue("n");
}
}

/**
* This method has to be included in every script and this will be the main execution block for the script that'll
* contain all the logic needed
*
* @throws Exception If something goes wrong
*/
@Override
public void internalRun() throws Exception {
log.debug("Running {}", FileDownloader.class.getName());
if (help) {
printHelp();
return;
}

Context context = new Context();
context.setCurrentUser(getEperson(context));

//find the item by the given id
Item item = findItem(context);
if (item == null) {
throw new IllegalArgumentException("No item found for the given ID");
}

HttpRequest request = HttpRequest.newBuilder()
.uri(uri)
.build();

HttpResponse<InputStream> response = httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());

if (response.statusCode() >= 400) {
throw new IllegalArgumentException("The provided URL returned a status code of " + response.statusCode());
}

// name precedence: the provided value, then the Content-Disposition filename, then the last segment of the URI path
if (bitstreamName == null) {
bitstreamName = response.headers().firstValue("Content-Disposition")
.filter(value -> value.contains("filename=")).flatMap(value -> Stream.of(value.split(";"))
.filter(v -> v.contains("filename="))
.findFirst()
.map(fvalue -> fvalue.replaceFirst("filename=", "").replaceAll("\"", "")))
.orElse(uri.getPath().substring(uri.getPath().lastIndexOf('/') + 1));
}

try (InputStream is = response.body()) {
saveFileToItem(context, item, is, bitstreamName);
}

context.commit();
}

private Item findItem(Context context) throws SQLException {
if (itemUUID != null) {
return itemService.find(context, itemUUID);
} else if (workspaceID != 0) {
return workspaceItemService.find(context, workspaceID).getItem();
} else {
try {
DSpaceObject dso = identifierService.resolve(context, pid);
if (dso instanceof Item) {
return (Item) dso;
} else {
throw new IllegalArgumentException("The provided identifier does not resolve to an item");
}
} catch (IdentifierNotFoundException | IdentifierNotResolvableException e) {
throw new IllegalArgumentException(e);
}
}
}

private void saveFileToItem(Context context, Item item, InputStream is, String name)
throws SQLException, AuthorizeException, IOException {
log.debug("Saving file to item {}", item.getID());
List<Bundle> originals = item.getBundles("ORIGINAL");
Bitstream b;
if (originals.isEmpty()) {
b = itemService.createSingleBitstream(context, is, item);
} else {
Bundle bundle = originals.get(0);
b = bitstreamService.create(context, bundle, is);
}
b.setName(context, name);
//now guess format of the bitstream
BitstreamFormat bf = bitstreamFormatService.guessFormat(context, b);
b.setFormat(context, bf);
}

private EPerson getEperson(Context context) throws SQLException {
if (getEpersonIdentifier() != null) {
return epersonService.find(context, getEpersonIdentifier());
} else {
return epersonService.findByEmail(context, epersonMail);
}
}
}
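
The bitstream-name fallback in `internalRun()` can be tried in isolation. The following is a minimal sketch, not the committed code: the class name `BitstreamNameSketch` and the method `resolveName` are hypothetical, and the `trim()` call is an addition that is not in the diff (splitting `attachment; filename="x"` on `;` appears to leave a leading space in the extracted name otherwise).

```java
import java.net.URI;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper class; it only mirrors the precedence used in FileDownloader.
class BitstreamNameSketch {

    /**
     * Returns the explicit name if given, else the filename parameter of the
     * Content-Disposition header, else the last segment of the URI path.
     */
    static String resolveName(String explicitName, Optional<String> contentDisposition, URI uri) {
        if (explicitName != null) {
            return explicitName;
        }
        String path = uri.getPath();
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return contentDisposition
                .filter(value -> value.contains("filename="))
                .flatMap(value -> Stream.of(value.split(";"))
                        .filter(part -> part.contains("filename="))
                        .findFirst()
                        // trim() is an assumption added here to drop the space left after ';'
                        .map(part -> part.replaceFirst("filename=", "").replaceAll("\"", "").trim()))
                .orElse(lastSegment);
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://example.org/files/data.tar.gz");
        System.out.println(resolveName(null, Optional.of("attachment; filename=\"report.pdf\""), uri));
        System.out.println(resolveName(null, Optional.empty(), uri));
        System.out.println(resolveName("custom.bin", Optional.empty(), uri));
    }
}
```

The sketch does not handle the RFC 5987 `filename*=` form or opaque URIs without a path; both would need extra care in real use.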

@@ -0,0 +1,73 @@
/**
* The contents of this file are subject to the license and copyright
* detailed in the LICENSE and NOTICE files at the root of the source
* tree and available online at
*
* http://www.dspace.org/license/
*/
package org.dspace.administer;

import org.apache.commons.cli.OptionGroup;
import org.apache.commons.cli.Options;
import org.dspace.scripts.configuration.ScriptConfiguration;

public class FileDownloaderConfiguration extends ScriptConfiguration<FileDownloader> {

private Class<FileDownloader> dspaceRunnableClass;

/**
* Generic getter for the dspaceRunnableClass
*
* @return the dspaceRunnableClass value of this ScriptConfiguration
*/
@Override
public Class<FileDownloader> getDspaceRunnableClass() {
return dspaceRunnableClass;
}

/**
* Generic setter for the dspaceRunnableClass
*
* @param dspaceRunnableClass The dspaceRunnableClass to be set on this FileDownloaderConfiguration
*/
@Override
public void setDspaceRunnableClass(Class<FileDownloader> dspaceRunnableClass) {
this.dspaceRunnableClass = dspaceRunnableClass;
}

/**
* The getter for the options of the Script
*
* @return the options value of this ScriptConfiguration
*/
@Override
public Options getOptions() {
if (options == null) {

Options options = new Options();
OptionGroup ids = new OptionGroup();

options.addOption("h", "help", false, "help");

options.addOption("u", "url", true, "source url");
options.getOption("u").setRequired(true);

options.addOption("i", "uuid", true, "item uuid");
options.addOption("w", "wsid", true, "workspace id");
options.addOption("p", "pid", true, "item pid (e.g. handle or doi)");
ids.addOption(options.getOption("i"));
ids.addOption(options.getOption("w"));
ids.addOption(options.getOption("p"));
ids.setRequired(true);

options.addOption("e", "eperson", true, "eperson email");
options.getOption("e").setRequired(false);

options.addOption("n", "name", true, "name of the file/bitstream");
options.getOption("n").setRequired(false);

super.options = options;
}
return options;
}
}
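
The three item options (`-i`, `-w`, `-p`) are checked in `FileDownloader.setup()` in that order, and the first one present wins. In the diff as shown, the `ids` group is populated but no `addOptionGroup` call is visible, so the precedence check in `setup()` seems to carry the validation. The sketch below illustrates that precedence with a plain `Map` standing in for the parsed command line; `ItemSelectorSketch`, `ItemSelector`, and `fromOptions` are hypothetical names, not part of the commit.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the -i / -w / -p precedence; a Map stands in for CommandLine.
class ItemSelectorSketch {

    /** Hypothetical holder: exactly one of the three fields is non-null. */
    static final class ItemSelector {
        final UUID uuid;
        final Integer workspaceId;
        final String pid;

        ItemSelector(UUID uuid, Integer workspaceId, String pid) {
            this.uuid = uuid;
            this.workspaceId = workspaceId;
            this.pid = pid;
        }
    }

    /** Picks the first identifier present, in the order uuid, workspace id, pid. */
    static ItemSelector fromOptions(Map<String, String> opts) {
        if (opts.containsKey("i")) {
            return new ItemSelector(UUID.fromString(opts.get("i")), null, null);
        } else if (opts.containsKey("w")) {
            return new ItemSelector(null, Integer.parseInt(opts.get("w")), null);
        } else if (opts.containsKey("p")) {
            return new ItemSelector(null, null, opts.get("p"));
        }
        throw new IllegalArgumentException("No item id option has been provided");
    }

    public static void main(String[] args) {
        // Both -w and -p supplied: the workspace id takes precedence, the pid is ignored.
        ItemSelector selector = fromOptions(Map.of("w", "42", "p", "11234/1-5"));
        System.out.println(selector.workspaceId + " " + selector.pid);
    }
}
```

Using a nullable `Integer` here, rather than the `int` field with a `!= 0` check seen in `findItem`, is a design choice of the sketch: it lets "not provided" be told apart from a workspace id of 0.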
@@ -9,6 +9,7 @@

import java.sql.SQLException;
import java.text.MessageFormat;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Objects;
import javax.servlet.http.HttpServletRequest;
@@ -69,7 +70,6 @@ public ClarinMatomoBitstreamTracker() {
@Override
protected void preTrack(Context context, MatomoRequest matomoRequest, Item item, HttpServletRequest request) {
super.preTrack(context, matomoRequest, item, request);

matomoRequest.setSiteId(siteId);
log.debug("Logging to site " + matomoRequest.getSiteId());
String itemIdentifier = getItemIdentifier(item);
@@ -82,6 +82,11 @@ protected void preTrack(Context context, MatomoRequest matomoRequest, Item item,
}
try {
matomoRequest.setPageCustomVariable(new CustomVariable("source", "bitstream"), 1);
// Add the Item handle into the request as a custom dimension
LinkedHashMap<Long, Object> handleDimension = new LinkedHashMap<>();
handleDimension.put(configurationService.getLongProperty("matomo.custom.dimension.handle.id",
1L), item.getHandle());
matomoRequest.setDimensions(handleDimension);
} catch (MatomoException e) {
log.error(e);
}
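
The hunk above builds a one-entry map from a configurable dimension ID (default `1`) to the item handle. A minimal standalone sketch of that map construction follows; it leaves out the Matomo and DSpace classes entirely, and `HandleDimensionSketch`, `handleDimension`, and the `configuredId` parameter are hypothetical stand-ins for the `matomo.custom.dimension.handle.id` lookup. Skipping a null handle is an assumption loosely based on the commit note "Do not change custom dimension when the Item is null".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; no Matomo or DSpace types are involved.
class HandleDimensionSketch {

    /**
     * Builds the custom-dimension map: configured ID (or 1 when unset) mapped to the handle.
     * Returns an empty map when there is no handle to report.
     */
    static Map<Long, Object> handleDimension(Long configuredId, String handle) {
        Map<Long, Object> dimensions = new LinkedHashMap<>();
        if (handle == null) {
            return dimensions; // assumption: nothing is sent when no handle is available
        }
        long id = configuredId != null ? configuredId : 1L; // default mirrors the 1L fallback in the hunk
        dimensions.put(id, handle);
        return dimensions;
    }

    public static void main(String[] args) {
        System.out.println(handleDimension(null, "11234/1-5"));
        System.out.println(handleDimension(3L, "11234/1-5"));
    }
}
```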