In this Sprint, we created Dashboards with mock data as a proof-of-concept of our analytics pipeline --------- Co-authored-by: David Santiago Ortiz Almanza <[email protected]> Co-authored-by: Gotty <[email protected]> Co-authored-by: Laudarias <[email protected]> Co-authored-by: jdcastellanosb73 <[email protected]>
1 parent fdf5273, commit 8d32f40
Showing 16 changed files with 249 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,32 @@ | ||
# Analytics | ||
# Analytics Pipeline | ||
|
||
![Frame 71](https://github.com/ISIS3510-202410-Team-13/Analytics/assets/68788933/1aa76bc1-12e5-40ed-8cac-b1e0252c6e89) | ||
|
||
The Analytics Pipeline for UniSchedule encompasses dashboards aimed at providing insightful reports with visualizations to answer critical business questions. These dashboards serve as a tool for gaining valuable insights into user behavior, app usage patterns, and other relevant metrics. The pipeline is designed to enable informed decision-making and improve overall user experience within the UniSchedule ecosystem. | ||
|
||
## Approach | ||
|
||
Our approach focuses on creating reports with visualizations that address specific business questions. The current implementation uses mock data; the ultimate goal is to connect the UniSchedule application to the pipeline and analyze data gathered in real time. This proof of concept demonstrates how data analytics can support decision-making and improve the user experience within the UniSchedule ecosystem. The mock data was carefully examined to ensure that every source of information it relies on will be available from the architecture designed for the application. We are also defining how the data will flow once the application is ready to be connected to the pipeline. | ||
|
||
## Dashboards | ||
|
||
The `dashboards` directory contains PDF files exported from the reports generated by the analytics pipeline. These reports include visualizations and insights derived from the processed data. | ||
|
||
## Excel Mockups | ||
|
||
The `excel_mockups` directory houses the data used to create each report. These mockups simulate the sources of data that will be processed through the ETL (Extract, Transform, Load) process in the future. The mock data represents various aspects of user interactions, app usage patterns, and other relevant metrics. | ||
|
||
## Generators | ||
|
||
The `generators` directory contains Python scripts responsible for creating the mock data used in the Excel mockups. These scripts generate synthetic data to simulate real-world scenarios and enable the creation of comprehensive reports. | ||
|
||
|
||
## Access to Live Reports | ||
|
||
The following links provide access to the reports developed in `looker-studio` (service provided by Google): | ||
|
||
* BQ 2.1 - [Evaluate User Satisfaction Score for space booking feature](https://lookerstudio.google.com/reporting/1af52a7c-94e9-4970-ad3f-76ac91c16c24) | ||
* BQ 2.3 - [Integration of social features, user retention, daily active usage](https://lookerstudio.google.com/reporting/9281694c-2631-46f5-8aab-0b23cb568ee9) | ||
* BQ 3.4 - [Schedule customization underutilized features](https://lookerstudio.google.com/reporting/d9c878b6-38cf-4186-ac6f-20c4ac3192ce) | ||
* BQ 4.1 - [Average weekly usage time by section for advertisement placement](https://lookerstudio.google.com/reporting/fb3c5144-378b-4239-a3df-c5327648dbe9) | ||
* BQ 5.2 - [Preferred times and locations for meetings](https://lookerstudio.google.com/reporting/cfd155a3-f15c-40dc-8a40-ac8b716ae167) |
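The README describes the mockups as stand-ins for sources that will later pass through an ETL process. The flow could look roughly like the dependency-free sketch below. The sample rows, the helper names (`extract`, `transform`, `load`), the `AverageMinutes` header, and the CSV target are illustrative assumptions and not part of the repository; only the `Seccion` and `TiempoEnSeccion (minutos)` column names are borrowed from the generator scripts in this commit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows standing in for one sheet of an Excel mockup.
MOCK_ROWS = [
    {"Seccion": "Calendario", "TiempoEnSeccion (minutos)": 30},
    {"Seccion": "Tareas", "TiempoEnSeccion (minutos)": 10},
    {"Seccion": "Calendario", "TiempoEnSeccion (minutos)": 50},
]


def extract():
    """Return raw rows; a real pipeline would read them from the app's data store."""
    return list(MOCK_ROWS)


def transform(rows):
    """Average the time spent in each section."""
    minutes_by_section = defaultdict(list)
    for row in rows:
        minutes_by_section[row["Seccion"]].append(row["TiempoEnSeccion (minutos)"])
    return {section: sum(v) / len(v) for section, v in minutes_by_section.items()}


def load(averages):
    """Serialize the aggregate as CSV text, sorted by section name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Seccion", "AverageMinutes"])  # hypothetical output header
    for section in sorted(averages):
        writer.writerow([section, averages[section]])
    return buffer.getvalue()


report_csv = load(transform(extract()))
```

Keeping the three stages as separate functions means the mock `extract` can later be swapped for a real source without touching the aggregation.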
Binary files not shown (10 files).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
import pandas as pd | ||
import numpy as np | ||
from faker import Faker | ||
fake = Faker() | ||
|
||
# Create fake data for the table | ||
np.random.seed(0) # For reproducibility of the NumPy draws | ||
Faker.seed(0) # Seed Faker's shared generator as well | ||
num_rows = 100 | ||
user_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
session_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
dates = [fake.date_this_year() for _ in range(num_rows)] | ||
start_times = [fake.time() for _ in range(num_rows)] | ||
end_times = [fake.time() for _ in range(num_rows)] | ||
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows) | ||
events = np.random.choice(['add_friend', 'chat', 'share_schedule'], num_rows) | ||
|
||
|
||
# Compute session durations and the time spent in each section | ||
durations = np.random.randint(5, 120, num_rows) # Session duration in minutes, 5 to 119 (upper bound is exclusive) | ||
retention = np.random.randint(0, 100, num_rows) # User retention as a percentage | ||
app_opens = np.random.randint(5, 10, num_rows) # Times the user opened the app; renamed so it no longer overwrites durations (not exported yet) | ||
time_in_section = np.random.randint(1, durations) # Time in a section is always shorter than the session duration | ||
interactions = np.random.randint(1, 20, num_rows) # Number of interactions per session | ||
|
||
# Create the DataFrame | ||
df = pd.DataFrame({ | ||
'UserID': user_ids, | ||
'SessionID': session_ids, | ||
'Fecha': dates, | ||
'HoraInicio': start_times, | ||
'HoraFin': end_times, | ||
'DuracionSesion (minutos)': durations, | ||
'Seccion': sections, | ||
'TiempoEnSeccion (minutos)': time_in_section, | ||
'Interacciones': interactions, | ||
'Evento': events, | ||
'retencion': retention, | ||
}) | ||
|
||
# Define the file path where you want to save the Excel file | ||
file_path = 'BQ2.3.xlsx' | ||
|
||
df.to_excel(file_path, index=False) |
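In the script above, `HoraInicio`, `HoraFin`, and the session duration are drawn independently, so the end time in a row need not equal the start time plus the duration. If a dashboard ever compares those columns, one way to keep them consistent is sketched below. The `make_session` helper, the anchor date, and the choice to keep every session within a single day are assumptions made for illustration, not the repository's method.

```python
import random
from datetime import datetime, timedelta


def make_session(rng):
    """Return (start, end, duration_minutes) with end = start + duration."""
    duration = rng.randint(5, 119)  # same 5..119 range as np.random.randint(5, 120)
    # Pick a start minute early enough that the session ends on the same day.
    start_minute = rng.randint(0, 24 * 60 - 1 - duration)
    start = datetime(2024, 1, 1) + timedelta(minutes=start_minute)  # arbitrary anchor date
    end = start + timedelta(minutes=duration)
    return start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S"), duration


rng = random.Random(0)  # local generator so the sketch is reproducible
sessions = [make_session(rng) for _ in range(100)]
```

Each tuple could then fill the `HoraInicio`, `HoraFin`, and `DuracionSesion (minutos)` columns of one row.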
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
import pandas as pd | ||
import numpy as np | ||
|
||
np.random.seed(42) | ||
num_rows = 100 | ||
user_ids = np.arange(1, num_rows + 1) | ||
user_types = np.random.choice(['Nuevo', 'Frecuente', 'Ocasional'], size=num_rows) | ||
|
||
booking_attempt_dates = pd.date_range(start="2024-01-01", end="2024-03-31", freq='8H')[:num_rows] | ||
|
||
|
||
buildings = ['ML', 'SD', 'O', 'W', 'RGD'] | ||
rooms = ['102', '202', '301', '302'] | ||
space_ids = [np.random.choice(buildings) + np.random.choice(rooms) for _ in range(num_rows)] | ||
|
||
ease_of_use_scores = np.random.randint(1, 6, size=num_rows) | ||
availability_scores = np.random.randint(1, 6, size=num_rows) | ||
overall_satisfaction_scores = np.random.randint(1, 6, size=num_rows) | ||
feedback_comments = np.random.choice(['Todo bien', 'Necesita mejoras', 'Excelente', 'Frustrante', 'Confuso'], size=num_rows) | ||
|
||
df_updated = pd.DataFrame({ | ||
'UserID': user_ids, | ||
'UserType': user_types, | ||
'BookingAttemptDate': booking_attempt_dates, | ||
'SpaceID': space_ids, | ||
'EaseOfUseScore': ease_of_use_scores, | ||
'AvailabilityScore': availability_scores, | ||
'OverallSatisfactionScore': overall_satisfaction_scores, | ||
'FeedbackComments': feedback_comments | ||
}) |
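The script above builds `df_updated` from three 1-to-5 scores and a `SpaceID` made of a building code plus a room number. Before such rows are exported for a report, a small sanity check can catch values outside those ranges. The sketch below is one possible shape for that check; `validate_row` and the sample rows are hypothetical, while the building, room, and column names are copied from the script.

```python
BUILDINGS = ("ML", "SD", "O", "W", "RGD")
ROOMS = ("102", "202", "301", "302")
SCORE_COLUMNS = ("EaseOfUseScore", "AvailabilityScore", "OverallSatisfactionScore")
VALID_SPACE_IDS = {building + room for building in BUILDINGS for room in ROOMS}


def validate_row(row):
    """Return a list of problems found in one mock row (empty if the row is valid)."""
    problems = []
    for column in SCORE_COLUMNS:
        if not 1 <= row[column] <= 5:
            problems.append(f"{column} out of range: {row[column]}")
    if row["SpaceID"] not in VALID_SPACE_IDS:
        problems.append(f"unknown SpaceID: {row['SpaceID']}")
    return problems


# Hypothetical rows: one valid, one with two bad scores and an unknown space.
good = {"SpaceID": "ML102", "EaseOfUseScore": 4,
        "AvailabilityScore": 5, "OverallSatisfactionScore": 3}
bad = {"SpaceID": "ZZ999", "EaseOfUseScore": 6,
       "AvailabilityScore": 1, "OverallSatisfactionScore": 0}
```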
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
import pandas as pd | ||
import numpy as np | ||
from faker import Faker | ||
fake = Faker() | ||
|
||
# Create fake data for the table | ||
np.random.seed(0) # For reproducibility of the NumPy draws | ||
Faker.seed(0) # Seed Faker's shared generator as well | ||
num_rows = 100 | ||
user_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
session_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
dates = [fake.date_this_year() for _ in range(num_rows)] | ||
start_times = [fake.time() for _ in range(num_rows)] | ||
end_times = [fake.time() for _ in range(num_rows)] | ||
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows) | ||
events = np.random.choice(['Ver', 'Editar', 'Crear'], num_rows) | ||
customization = np.random.choice(['BackGround_image', 'ChangeColor_Box', 'user_Icon'], num_rows) | ||
|
||
# Compute session durations and the time spent in each section | ||
durations = np.random.randint(5, 120, num_rows) # Session duration in minutes, 5 to 119 (upper bound is exclusive) | ||
time_in_section = np.random.randint(1, durations) # Time in a section is always shorter than the session duration | ||
interactions = np.random.randint(1, 20, num_rows) # Number of interactions per session | ||
|
||
# Create the DataFrame | ||
df = pd.DataFrame({ | ||
'UserID': user_ids, | ||
'SessionID': session_ids, | ||
'Fecha': dates, | ||
'HoraInicio': start_times, | ||
'HoraFin': end_times, | ||
'DuracionSesion (minutos)': durations, | ||
'Seccion': sections, | ||
'TiempoEnSeccion (minutos)': time_in_section, | ||
'Interacciones': interactions, | ||
'Evento': events | ||
}) | ||
|
||
df.to_excel("BQ3.4.xlsx", index=False) # pandas needs the .xlsx extension to pick an Excel writer |
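The report this script feeds (BQ 3.4 in the README) is about underutilized schedule-customization features, and the script lists three such features. A rough sketch of how "underutilized" might be computed from a column of feature names follows. The `underused` helper, the sample usage list, and the 20% threshold are assumptions for illustration; the repository does not define them.

```python
from collections import Counter

# Feature names as they appear in the generator script.
FEATURES = ["BackGround_image", "ChangeColor_Box", "user_Icon"]


def underused(usage, threshold):
    """Return the features whose share of all recorded uses is below `threshold`."""
    counts = Counter(usage)
    total = sum(counts.values())
    if total == 0:
        # With no recorded uses at all, every feature counts as underused.
        return sorted(FEATURES)
    return sorted(f for f in FEATURES if counts[f] / total < threshold)


# Hypothetical usage log: shares are 60%, 30% and 10%.
sample_usage = ["BackGround_image"] * 6 + ["ChangeColor_Box"] * 3 + ["user_Icon"]
flagged = underused(sample_usage, threshold=0.2)
```

Because `np.random.choice` draws the features uniformly in the mock data, a real skew would only appear once the application supplies actual usage events.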
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import pandas as pd | ||
import numpy as np | ||
from faker import Faker | ||
fake = Faker() | ||
|
||
# Create fake data for the table | ||
np.random.seed(0) # For reproducibility of the NumPy draws | ||
Faker.seed(0) # Seed Faker's shared generator as well | ||
num_rows = 100 | ||
user_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
session_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
dates = [fake.date_this_year() for _ in range(num_rows)] | ||
start_times = [fake.time() for _ in range(num_rows)] | ||
end_times = [fake.time() for _ in range(num_rows)] | ||
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows) | ||
events = np.random.choice(['Ver', 'Editar', 'Crear'], num_rows) | ||
|
||
# Compute session durations and the time spent in each section | ||
durations = np.random.randint(5, 120, num_rows) # Session duration in minutes, 5 to 119 (upper bound is exclusive) | ||
time_in_section = np.random.randint(1, durations) # Time in a section is always shorter than the session duration | ||
interactions = np.random.randint(1, 20, num_rows) # Number of interactions per session | ||
|
||
# Create the DataFrame | ||
df = pd.DataFrame({ | ||
'UserID': user_ids, | ||
'SessionID': session_ids, | ||
'Fecha': dates, | ||
'HoraInicio': start_times, | ||
'HoraFin': end_times, | ||
'DuracionSesion (minutos)': durations, | ||
'Seccion': sections, | ||
'TiempoEnSeccion (minutos)': time_in_section, | ||
'Interacciones': interactions, | ||
'Evento': events | ||
}) | ||
|
||
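# The original script never defines file_path; the name below is an assumed placeholder | ||
file_path = 'BQ4.1.xlsx' | ||
df.to_excel(file_path, index=False) |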
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
import pandas as pd | ||
import numpy as np | ||
from faker import Faker | ||
import random | ||
|
||
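np.random.seed(0) | ||
random.seed(0) # The generators below use the stdlib random module, which NumPy's seed does not affect | ||
|
||
fake = Faker() | ||
Faker.seed(0) # Seed Faker's shared generator as well | ||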
|
||
# Generate random data for the dataset | ||
num_rows = 1000 | ||
|
||
# Generate UserIDs | ||
user_ids = [fake.unique.uuid4() for _ in range(num_rows)] | ||
|
||
# Generate UserType: Nuevo, Ocasional, Frecuente | ||
user_types = ['Nuevo', 'Ocasional', 'Frecuente'] | ||
user_type = [random.choice(user_types) for _ in range(num_rows)] | ||
|
||
# Generate UserSemester: Values between 1 and 10 | ||
user_semester = [random.randint(1, 10) for _ in range(num_rows)] | ||
|
||
# Generate UserCareer: ISIS, IIND, MATE, IBIO, IELE, IMEC, IQUI, ICYA, LITE, PSIC, MEDI | ||
user_careers = ["ISIS", "IIND", "MATE", "IBIO", "IELE", "IMEC", "IQUI", "ICYA", "LITE", "PSIC", "MEDI"] | ||
user_career = [random.choice(user_careers) for _ in range(num_rows)] | ||
|
||
# Generate MeetingDate | ||
meeting_dates = [fake.date_this_year() for _ in range(num_rows)] | ||
|
||
# Generate DayOfWeek from MeetingDate so the two columns agree (weekday() returns 0 for Monday) | ||
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"] | ||
day_of_week = [days_of_week[d.weekday()] for d in meeting_dates] | ||
|
||
# Generate MeetingStartTime: values from 8:00 to 16:30 in 30-minute steps | ||
meeting_start_times = [f"{random.randint(8, 16)}:{random.choice(['00', '30'])}" for _ in range(num_rows)] | ||
|
||
# Generate MeetingDuration: 15 to 240 minutes (up to 4 hours) | ||
meeting_duration = [random.randint(15, 240) for _ in range(num_rows)] | ||
|
||
# Generate MeetingBuilding: ML, SD, W, R, O, C, LL, B, RGD, AU | ||
meeting_buildings = ["ML", "SD", "W", "R", "O", "C", "LL", "B", "RGD", "AU"] | ||
meeting_building = [random.choice(meeting_buildings) for _ in range(num_rows)] | ||
|
||
# Generate MeetingPurpose: Class, Leisure, Group Project, Other | ||
meeting_purposes = ["Class", "Leisure", "Group Project", "Other"] | ||
meeting_purpose = [random.choice(meeting_purposes) for _ in range(num_rows)] | ||
|
||
# Generate OverallSatisfactionScore: Values between 1 and 5 | ||
overall_satisfaction_score = [random.randint(1, 5) for _ in range(num_rows)] | ||
|
||
# Create the DataFrame | ||
df = pd.DataFrame({ | ||
'UserID': user_ids, | ||
'UserType': user_type, | ||
'UserSemester': user_semester, | ||
'UserCareer': user_career, | ||
'MeetingDate': meeting_dates, | ||
'DayOfWeek': day_of_week, | ||
'MeetingStartTime': meeting_start_times, | ||
'MeetingDuration': meeting_duration, | ||
'MeetingBuilding': meeting_building, | ||
'MeetingPurpose': meeting_purpose, | ||
'OverallSatisfactionScore': overall_satisfaction_score | ||
}) | ||
|
||
# Display the DataFrame | ||
print(df.head()) | ||
|
||
file_path = "./meeting_data.xlsx" | ||
df.to_excel(file_path, index=False) |
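Most columns in this script come from the standard-library `random` module, whose state is separate from NumPy's, so `np.random.seed` alone does not make those draws repeatable. The sketch below shows the idea with a locally seeded `random.Random` instance; the `sample_purposes` helper is hypothetical and only reuses the meeting purposes from the script.

```python
import random


def sample_purposes(seed, n=5):
    """Draw n meeting purposes from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, unaffected by NumPy's seed
    purposes = ["Class", "Leisure", "Group Project", "Other"]
    return [rng.choice(purposes) for _ in range(n)]


# Two runs with the same seed yield the same sequence.
first = sample_purposes(0)
second = sample_purposes(0)
```

A dedicated `random.Random` instance also avoids sharing global state with other code that calls `random` directly.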