Scraping GBFS Validator results from Python #165

iaguerri · 2024-01-04T10:33:34Z

If you are new to the GBFS Validator, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

I'm working on a MaaS application, and I need to validate the GBFS feeds that public operators give me.

What is the issue and why is it an issue?

I'm trying to request the result of a validation from Python (https://gbfs-validator.mobilitydata.org/validator?url=https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json). I have also tried it from Postman.

The problem is that the response status is 200 (OK), but the information can't be extracted (not even by scraping) because the body only says "We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue".

The code used:

```python
import requests
from bs4 import BeautifulSoup

url_validator = "https://gbfs-validator.mobilitydata.org/validator"

# Test JSONs
json_main_full_brusels = "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json"  # Valid JSON
json_main_nolastupdated_brusels = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoLastUpdated.json"  # Invalid JSON (no last_updated)
json_main_vehiclyType_nolastupdated = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasVehiclyTypeCorrupted.json"  # Invalid JSON: vehicle_types feed without last_updated
json_main_nofeed_systeminformation = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoSysteminformationfeed.json"  # Invalid JSON: no system_information feed

params = {
    "url": json_main_nolastupdated_brusels
}

# Build the full request URL (validator page + encoded ?url= query) to inspect it
url_completa = requests.Request("GET", url_validator, params=params).prepare().url
print("Request URL:", url_completa)

# APPROACH 1: read the response body directly
respuesta = requests.get(url_validator, params=params, timeout=30)

if respuesta.status_code == 200:
    datos_respuesta = respuesta.text
    print("Validator response:", datos_respuesta)
else:
    print("Request error. Status code:", respuesta.status_code)
    print("Response content:", respuesta.text)

# APPROACH 2: parse the returned HTML with BeautifulSoup
soup = BeautifulSoup(respuesta.content, "html.parser")

# data-v-7c2075bd is a Vue scoped-style attribute, not a CSS class, so match
# elements that carry the attribute instead of using class_=.
# Note: the static HTML returned above is the unrendered page, so this finds nothing.
for div_element in soup.find_all("div", attrs={"data-v-7c2075bd": True}):
    # Extract the text content of the div element
    div_text = div_element.get_text(strip=True)

    # Print the extracted text
    print("Div text:", div_text)
```
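The quoted sentence matches the `<noscript>` fallback that the Vue CLI `index.html` template ships with. That suggests the 200 response is only the unrendered page shell and that the report is built in the browser by JavaScript. Here is a minimal heuristic sketch for detecting that case before trying to parse anything. The function name `is_unrendered_spa_shell` is hypothetical.

```python
import re

# Matches the contents of every <noscript>...</noscript> block in the HTML
NOSCRIPT_PATTERN = re.compile(r"<noscript[^>]*>(.*?)</noscript>", re.IGNORECASE | re.DOTALL)


def is_unrendered_spa_shell(html: str) -> bool:
    """Heuristic (hypothetical helper): True if the HTML contains a <noscript>
    block asking for JavaScript, i.e. it looks like an app shell rather than
    a rendered validation report."""
    for match in NOSCRIPT_PATTERN.finditer(html):
        if "javascript" in match.group(1).lower():
            return True
    return False


if __name__ == "__main__":
    shell = (
        "<html><body><noscript><strong>We're sorry but my-project doesn't work "
        "properly without JavaScript enabled.</strong></noscript><div></div></body></html>"
    )
    print(is_unrendered_spa_shell(shell))  # True
    print(is_unrendered_spa_shell("<html><body><div>report</div></body></html>"))  # False
```

This check only makes sense on raw HTTP responses such as `respuesta.text` above. HTML captured from a real browser may still contain the `<noscript>` tag even after the page has rendered.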

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

I don't know why the HTML isn't loaded afterwards, but maybe enabling JavaScript would make it possible to get this information.
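The "enable JavaScript" idea can be sketched with Selenium driving headless Chrome, assuming Selenium 4 and a local Chrome installation. The helper names and the `ready_selector` parameter are hypothetical. You have to pick the CSS selector that signals the report has finished rendering by inspecting the page in the browser's developer tools. Automated clients may still be blocked by the hosting platform.

```python
from urllib.parse import urlencode

VALIDATOR_PAGE = "https://gbfs-validator.mobilitydata.org/validator"


def build_validator_page_url(feed_url: str) -> str:
    """Hypothetical helper: the validator page URL with the feed passed as ?url=."""
    return f"{VALIDATOR_PAGE}?{urlencode({'url': feed_url})}"


def render_validator_page(feed_url: str, ready_selector: str, timeout: float = 60) -> str:
    """Load the page in headless Chrome so its JavaScript runs, wait until an
    element matching ready_selector exists, and return the rendered HTML.

    ready_selector is a placeholder chosen by inspecting the rendered report.
    """
    # Imported here so the URL helper above works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_validator_page_url(feed_url))
        # Block until the report element appears (raises TimeoutException otherwise)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then go through the BeautifulSoup loop from approach 2, because by that point the Vue components have been rendered into the DOM.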

Thanks!!


davidgamez · 2024-01-04T14:49:39Z

Hi @iaguerri, the GBFS Validator is currently deployed on Netlify. Judging by the error message you are getting, Netlify is detecting and blocking a bot client. You can search online for ways to avoid user-agent detection. However, if you want to get the validation report for specific feeds, I suggest using the undocumented, not-yet-stable API endpoint. Unfortunately, we don't offer a stable API endpoint yet. Issue #95 contains information on how to access the API. If you would like to follow the development of the stable API, see issue #129.
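Calling that API from Python could look roughly like the sketch below. It uses only the standard library. Issue #95 is the authoritative source for the endpoint. The path `/.netlify/functions/validator` and the `{"url": ...}` JSON payload are assumptions to verify there. The helper names are hypothetical, and the response is returned as parsed JSON without assuming its structure.

```python
import json
import urllib.request

# ASSUMPTION: endpoint path and payload shape; confirm both in issue #95.
VALIDATOR_API = "https://gbfs-validator.mobilitydata.org/.netlify/functions/validator"


def build_validation_request(feed_url: str, api_url: str = VALIDATOR_API) -> urllib.request.Request:
    """Hypothetical helper: a POST request whose JSON body carries the feed URL."""
    body = json.dumps({"url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate_feed(feed_url: str, timeout: float = 60) -> object:
    """Send the request and return the decoded JSON response as-is."""
    request = build_validation_request(feed_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `validate_feed("https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json")` would return whatever report the endpoint produces. Because the endpoint is unstable, it is worth keeping this call isolated in one function so it is easy to update later.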
