Sprint 2 Release - Analytics (#16)
In this sprint, we created dashboards with mock data as a
proof of concept of our analytics pipeline.

---------

Co-authored-by: David Santiago Ortiz Almanza <[email protected]>
Co-authored-by: Gotty <[email protected]>
Co-authored-by: Laudarias <[email protected]>
Co-authored-by: jdcastellanosb73 <[email protected]>
5 people authored Mar 23, 2024
1 parent fdf5273 commit 8d32f40
Showing 16 changed files with 249 additions and 1 deletion.
33 changes: 32 additions & 1 deletion README.md
@@ -1 +1,32 @@
# Analytics
# Analytics Pipeline

![Frame 71](https://github.com/ISIS3510-202410-Team-13/Analytics/assets/68788933/1aa76bc1-12e5-40ed-8cac-b1e0252c6e89)

The Analytics Pipeline for UniSchedule encompasses dashboards aimed at providing insightful reports with visualizations to answer critical business questions. These dashboards serve as a tool for gaining valuable insights into user behavior, app usage patterns, and other relevant metrics. The pipeline is designed to enable informed decision-making and improve overall user experience within the UniSchedule ecosystem.

## Approach

Our approach for the analytics pipeline focuses on creating reports with visualizations that address specific business questions. While the current implementation uses mock data, the ultimate goal is to connect the UniSchedule application to the pipeline so that it analyzes data gathered in real time. This proof of concept demonstrates how data analytics can support decision-making and improve the user experience within the UniSchedule ecosystem. We designed the mock data carefully so that every field it contains corresponds to a source of information that the application's planned architecture will provide. In doing so, we are also defining how data will flow once the application is ready to be connected to the pipeline.
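One way to make the check between mock data and planned sources repeatable is a small data-contract test. The sketch below is an assumption, not part of the repository: `EXPECTED_BQ21` and `check_contract` are hypothetical names, the column names come from `generators/bq2.1.py`, and the expected dtype kinds are our guess at what the live source would emit.

```python
import pandas as pd

# Hypothetical data contract for BQ 2.1. Column names come from
# generators/bq2.1.py; the dtype kinds ('i' integer, 'O' object/string,
# 'M' datetime64) are assumptions about what the live source would emit.
EXPECTED_BQ21 = {
    "UserID": "i",
    "UserType": "O",
    "BookingAttemptDate": "M",
    "SpaceID": "O",
    "EaseOfUseScore": "i",
    "AvailabilityScore": "i",
    "OverallSatisfactionScore": "i",
    "FeedbackComments": "O",
}


def check_contract(df: pd.DataFrame, expected: dict) -> list:
    """Return a list of problems; an empty list means the frame conforms."""
    problems = []
    for column, kind in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif df[column].dtype.kind != kind:
            problems.append(f"{column}: expected kind {kind!r}, got {df[column].dtype.kind!r}")
    # Columns the contract does not know about are flagged as well
    problems.extend(f"unexpected column: {c}" for c in df.columns if c not in expected)
    return problems
```

Running the same check against the mockups today and against real exports later would show early whether the planned sources still cover every field.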

## Dashboards

The `dashboards` directory contains PDF files exported from the reports generated by the analytics pipeline. These reports include visualizations and insights derived from the processed data.

## Excel Mockups

The `excel_mockups` directory houses the data used to create each report. These mockups simulate the sources of data that will be processed through the ETL (Extract, Transform, Load) process in the future. The mock data represents various aspects of user interactions, app usage patterns, and other relevant metrics.
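As a rough sketch of where these mockups could plug into a future ETL process, the functions below split the work into extract, transform, and load steps. The function names, the cleaning rules, and the CSV output target are assumptions; the column names match the session-style generators (`BQ2.3.py`, `bq3.4.py`, `bq4.1.py`).

```python
import pandas as pd

# Columns shared by the session-style mockups (an assumption for this sketch)
SESSION_COLUMNS = ["Fecha", "Seccion", "DuracionSesion (minutos)", "TiempoEnSeccion (minutos)"]
MINUTE_COLUMNS = ["DuracionSesion (minutos)", "TiempoEnSeccion (minutos)"]


def extract(path: str) -> pd.DataFrame:
    """Read one Excel mockup; pandas needs openpyxl installed to read .xlsx."""
    return pd.read_excel(path)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the needed columns, fix types, and drop rows that break basic rules."""
    df = raw[SESSION_COLUMNS].copy()
    df["Fecha"] = pd.to_datetime(df["Fecha"])
    for col in MINUTE_COLUMNS:
        df[col] = pd.to_numeric(df[col], errors="coerce")  # unparseable values become NaN
    df = df.dropna()
    # Time spent in a section should never exceed the whole session
    valid = df["TiempoEnSeccion (minutos)"] <= df["DuracionSesion (minutos)"]
    return df[valid].reset_index(drop=True)


def load(df: pd.DataFrame, path: str) -> None:
    """Write the cleaned table as CSV for a dashboarding tool to ingest."""
    df.to_csv(path, index=False)
```

Usage would be `load(transform(extract("excel_mockups/bq4.1.xlsx")), "bq4.1_clean.csv")`; once the app is connected, only `extract` should need to change.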

## Generators

The `generators` directory contains Python scripts responsible for creating the mock data used in the Excel mockups. These scripts generate synthetic data to simulate real-world scenarios and enable the creation of comprehensive reports.
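Two properties worth keeping in synthetic session data are reproducibility and internal consistency, such as an end time that always follows its start time. A minimal sketch, assuming only NumPy and pandas (no Faker), uses a single seeded `numpy.random.Generator` and derives each end time from its start time plus the session duration. `generate_sessions` is a hypothetical helper, not one of the repository's scripts.

```python
import numpy as np
import pandas as pd


def generate_sessions(num_rows: int = 100, seed: int = 0) -> pd.DataFrame:
    """Hypothetical reproducible session generator."""
    rng = np.random.default_rng(seed)  # one seeded RNG drives every random column
    sections = ["Calendario", "Tareas", "Notificaciones", "Configuración"]

    # Random start timestamps within the first 90 days of 2024, at minute resolution
    base = pd.Timestamp("2024-01-01")
    starts = base + pd.to_timedelta(rng.integers(0, 90 * 24 * 60, size=num_rows), unit="min")

    durations = rng.integers(5, 121, size=num_rows)  # 5 to 120 minutes inclusive
    ends = starts + pd.to_timedelta(durations, unit="min")  # end is always after start
    time_in_section = rng.integers(1, durations + 1)  # 1 to the session duration, inclusive

    return pd.DataFrame({
        "SessionID": np.arange(1, num_rows + 1),
        "Fecha": starts.date,
        "HoraInicio": starts,  # full timestamps, so ordering survives midnight
        "HoraFin": ends,
        "DuracionSesion (minutos)": durations,
        "Seccion": rng.choice(sections, size=num_rows),
        "TiempoEnSeccion (minutos)": time_in_section,
    })
```

Calling `generate_sessions(100, seed=0)` twice yields identical frames, which keeps dashboards built on the mockups stable between runs.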


## Access to Live Reports

The following links provide access to the reports developed in Looker Studio (a Google service):

* BQ 2.1 - [Evaluate User Satisfaction Score for space booking feature](https://lookerstudio.google.com/reporting/1af52a7c-94e9-4970-ad3f-76ac91c16c24)
* BQ 2.3 - [Integration of social features, user retention, daily active usage](https://lookerstudio.google.com/reporting/9281694c-2631-46f5-8aab-0b23cb568ee9)
* BQ 3.4 - [Schedule customization underutilized features](https://lookerstudio.google.com/reporting/d9c878b6-38cf-4186-ac6f-20c4ac3192ce)
* BQ 4.1 - [Average weekly usage time by section for advertisement placement](https://lookerstudio.google.com/reporting/fb3c5144-378b-4239-a3df-c5327648dbe9)
* BQ 5.2 - [Preferred times and locations for meetings](https://lookerstudio.google.com/reporting/cfd155a3-f15c-40dc-8a40-ac8b716ae167)
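To give a sense of the computation behind a report such as BQ 4.1, here is a hedged pandas sketch that averages weekly minutes per section. It assumes a frame shaped like the output of `generators/bq4.1.py` (`Fecha`, `Seccion`, `TiempoEnSeccion (minutos)`). The function name and the choice to count a week without activity in a section as zero minutes are our assumptions; the Looker Studio report may aggregate differently.

```python
import pandas as pd


def weekly_avg_by_section(df: pd.DataFrame) -> pd.Series:
    """Average minutes per week spent in each section (sketch for BQ 4.1)."""
    week = pd.to_datetime(df["Fecha"]).dt.to_period("W")  # Monday to Sunday weeks
    weekly_totals = (
        df.groupby([week, df["Seccion"]])["TiempoEnSeccion (minutos)"]
        .sum()
        .unstack(fill_value=0)  # a section unused in a given week counts as 0 minutes
    )
    return weekly_totals.mean().sort_values(ascending=False)
```

Sections at the top of the result are the ones where an advertisement would be seen for the longest time per week.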
Binary file added dashboards/BQ2.3.pdf
Binary file not shown.
Binary file added dashboards/bq2.1.pdf
Binary file not shown.
Binary file added dashboards/bq3.4.pdf
Binary file not shown.
Binary file added dashboards/bq4.1.pdf
Binary file not shown.
Binary file added dashboards/bq5.2.pdf
Binary file not shown.
Binary file added excel_mockups/BQ2.3.xlsx
Binary file not shown.
Binary file added excel_mockups/bq2.1.xlsx
Binary file not shown.
Binary file added excel_mockups/bq3.4.xlsx
Binary file not shown.
Binary file added excel_mockups/bq4.1.xlsx
Binary file not shown.
Binary file added excel_mockups/bq5.2.xlsx
Binary file not shown.
44 changes: 44 additions & 0 deletions generators/BQ2.3.py
@@ -0,0 +1,44 @@
import pandas as pd
import numpy as np
from faker import Faker

fake = Faker()

# Create fake data for the table
np.random.seed(0)  # For reproducibility (NumPy draws)
Faker.seed(0)  # Faker has its own random generator, so seed it as well
num_rows = 100
user_ids = [fake.unique.uuid4() for _ in range(num_rows)]
session_ids = [fake.unique.uuid4() for _ in range(num_rows)]
dates = [fake.date_this_year() for _ in range(num_rows)]
start_times = [fake.time() for _ in range(num_rows)]
end_times = [fake.time() for _ in range(num_rows)]
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows)
events = np.random.choice(['add_friend', 'chat', 'share_schedule'], num_rows)

# Compute session durations and the time spent in each section
durations = np.random.randint(5, 120, num_rows)  # Session duration: 5 to 119 minutes
retention = np.random.randint(0, 100, num_rows)  # User retention as a percentage (0 to 99)
app_visits = np.random.randint(5, 10, num_rows)  # Number of times the user opened the app (5 to 9)
time_in_section = np.random.randint(1, durations)  # Time in a section is always shorter than the session
interactions = np.random.randint(1, 20, num_rows)  # Number of interactions per session

# Create the DataFrame
df = pd.DataFrame({
    'UserID': user_ids,
    'SessionID': session_ids,
    'Fecha': dates,
    'HoraInicio': start_times,
    'HoraFin': end_times,
    'DuracionSesion (minutos)': durations,
    'Seccion': sections,
    'TiempoEnSeccion (minutos)': time_in_section,
    'Interacciones': interactions,
    'Evento': events,
    'retencion': retention,
    'VecesIngreso': app_visits,
})

# Path of the Excel file to write (writing .xlsx requires the openpyxl package)
file_path = 'BQ2.3.xlsx'

df.to_excel(file_path, index=False)
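For BQ 2.3, a report on social features would typically summarize the generated rows by event. The sketch below is hypothetical (the function names are ours, not the repository's) and assumes the column names produced by `generators/BQ2.3.py` (`Evento`, `SessionID`, `UserID`, `Fecha`, `retencion`). Because `retencion` is a random per-row value in the mock data, its mean here only illustrates the aggregation, not a real retention rate.

```python
import pandas as pd


def social_feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Sessions, distinct users, and mean retention per social event (sketch for BQ 2.3)."""
    return (
        df.groupby("Evento")
        .agg(
            Sesiones=("SessionID", "count"),
            Usuarios=("UserID", "nunique"),
            RetencionMedia=("retencion", "mean"),
        )
        .sort_values("Sesiones", ascending=False)
    )


def daily_active_users(df: pd.DataFrame) -> pd.Series:
    """Number of distinct users with at least one session on each date."""
    return df.groupby("Fecha")["UserID"].nunique()
```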
30 changes: 30 additions & 0 deletions generators/bq2.1.py
@@ -0,0 +1,30 @@
import pandas as pd
import numpy as np

np.random.seed(42)  # For reproducibility
num_rows = 100
user_ids = np.arange(1, num_rows + 1)
user_types = np.random.choice(['Nuevo', 'Frecuente', 'Ocasional'], size=num_rows)

# One booking attempt every 8 hours from January to March 2024, truncated to num_rows rows
booking_attempt_dates = pd.date_range(start="2024-01-01", end="2024-03-31", freq='8h')[:num_rows]

# A space ID is a building code followed by a room number, e.g. "ML102"
buildings = ['ML', 'SD', 'O', 'W', 'RGD']
rooms = ['102', '202', '301', '302']
space_ids = [np.random.choice(buildings) + np.random.choice(rooms) for _ in range(num_rows)]

# Scores on a 1-5 scale (the upper bound of randint is exclusive)
ease_of_use_scores = np.random.randint(1, 6, size=num_rows)
availability_scores = np.random.randint(1, 6, size=num_rows)
overall_satisfaction_scores = np.random.randint(1, 6, size=num_rows)
feedback_comments = np.random.choice(['Todo bien', 'Necesita mejoras', 'Excelente', 'Frustrante', 'Confuso'], size=num_rows)

df_updated = pd.DataFrame({
    'UserID': user_ids,
    'UserType': user_types,
    'BookingAttemptDate': booking_attempt_dates,
    'SpaceID': space_ids,
    'EaseOfUseScore': ease_of_use_scores,
    'AvailabilityScore': availability_scores,
    'OverallSatisfactionScore': overall_satisfaction_scores,
    'FeedbackComments': feedback_comments
})

# Save the mockup (writing .xlsx requires the openpyxl package)
df_updated.to_excel('bq2.1.xlsx', index=False)
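A minimal sketch of how the BQ 2.1 mockup could be summarized, assuming the columns produced by `generators/bq2.1.py`. The function name and the top-2-box share (the fraction of ratings that are 4 or 5, a common survey metric) are our choices, not necessarily what the Looker Studio report computes.

```python
import pandas as pd

SCORE_COLUMNS = ["EaseOfUseScore", "AvailabilityScore", "OverallSatisfactionScore"]


def satisfaction_by_user_type(df: pd.DataFrame) -> pd.DataFrame:
    """Mean 1-5 scores per user type plus the share of 4-5 overall ratings (sketch for BQ 2.1)."""
    summary = df.groupby("UserType")[SCORE_COLUMNS].mean()
    # Top-2-box: fraction of overall ratings that are 4 or 5, per user type
    summary["Top2Box"] = df["OverallSatisfactionScore"].ge(4).groupby(df["UserType"]).mean()
    return summary.round(2)
```

Comparing `Top2Box` across `Nuevo`, `Ocasional`, and `Frecuente` users shows whether satisfaction with space booking depends on how familiar users are with the app.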
37 changes: 37 additions & 0 deletions generators/bq3.4.py
@@ -0,0 +1,37 @@
import pandas as pd
import numpy as np
from faker import Faker

fake = Faker()

# Create fake data for the table
np.random.seed(0)  # For reproducibility (NumPy draws)
Faker.seed(0)  # Faker has its own random generator, so seed it as well
num_rows = 100
user_ids = [fake.unique.uuid4() for _ in range(num_rows)]
session_ids = [fake.unique.uuid4() for _ in range(num_rows)]
dates = [fake.date_this_year() for _ in range(num_rows)]
start_times = [fake.time() for _ in range(num_rows)]
end_times = [fake.time() for _ in range(num_rows)]
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows)
events = np.random.choice(['Ver', 'Editar', 'Crear'], num_rows)
customization = np.random.choice(['BackGround_image', 'ChangeColor_Box', 'user_Icon'], num_rows)  # Customization option used

# Compute session durations and the time spent in each section
durations = np.random.randint(5, 120, num_rows)  # Session duration: 5 to 119 minutes
time_in_section = np.random.randint(1, durations)  # Time in a section is always shorter than the session
interactions = np.random.randint(1, 20, num_rows)  # Number of interactions per session

# Create the DataFrame
df = pd.DataFrame({
    'UserID': user_ids,
    'SessionID': session_ids,
    'Fecha': dates,
    'HoraInicio': start_times,
    'HoraFin': end_times,
    'DuracionSesion (minutos)': durations,
    'Seccion': sections,
    'TiempoEnSeccion (minutos)': time_in_section,
    'Interacciones': interactions,
    'Evento': events,
    'Personalizacion': customization,
})

# pandas chooses the Excel writer from the file extension, so ".xlsx" is required
df.to_excel("bq3.4.xlsx", index=False)
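For BQ 3.4, one simple way to flag underutilized customization features is to compare each option's share of sessions against a threshold. This is a sketch under assumptions: it expects the customization choice (the `'BackGround_image'`, `'ChangeColor_Box'`, and `'user_Icon'` values drawn in the script above) to be stored in a column whose default name, `Personalizacion`, is hypothetical, and the 20% threshold is arbitrary.

```python
import pandas as pd

# Option labels drawn by generators/bq3.4.py
OPTIONS = ["BackGround_image", "ChangeColor_Box", "user_Icon"]


def underused_options(df: pd.DataFrame, options: list, column: str = "Personalizacion",
                      threshold: float = 0.2) -> pd.Series:
    """Options whose share of sessions falls below `threshold`, least used first."""
    shares = (
        df[column]
        .value_counts(normalize=True)
        .reindex(options, fill_value=0.0)  # options nobody used still appear, with share 0
    )
    return shares[shares < threshold].sort_values()
```

Passing the full option list matters: `value_counts` alone would silently omit a feature that no one used, which is exactly the case this question cares about most.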
36 changes: 36 additions & 0 deletions generators/bq4.1.py
@@ -0,0 +1,36 @@
import pandas as pd
import numpy as np
from faker import Faker

fake = Faker()

# Create fake data for the table
np.random.seed(0)  # For reproducibility (NumPy draws)
Faker.seed(0)  # Faker has its own random generator, so seed it as well
num_rows = 100
user_ids = [fake.unique.uuid4() for _ in range(num_rows)]
session_ids = [fake.unique.uuid4() for _ in range(num_rows)]
dates = [fake.date_this_year() for _ in range(num_rows)]
start_times = [fake.time() for _ in range(num_rows)]
end_times = [fake.time() for _ in range(num_rows)]
sections = np.random.choice(['Calendario', 'Tareas', 'Notificaciones', 'Configuración'], num_rows)
events = np.random.choice(['Ver', 'Editar', 'Crear'], num_rows)

# Compute session durations and the time spent in each section
durations = np.random.randint(5, 120, num_rows)  # Session duration: 5 to 119 minutes
time_in_section = np.random.randint(1, durations)  # Time in a section is always shorter than the session
interactions = np.random.randint(1, 20, num_rows)  # Number of interactions per session

# Create the DataFrame
df = pd.DataFrame({
    'UserID': user_ids,
    'SessionID': session_ids,
    'Fecha': dates,
    'HoraInicio': start_times,
    'HoraFin': end_times,
    'DuracionSesion (minutos)': durations,
    'Seccion': sections,
    'TiempoEnSeccion (minutos)': time_in_section,
    'Interacciones': interactions,
    'Evento': events
})

# Path of the Excel file to write (writing .xlsx requires the openpyxl package)
file_path = 'bq4.1.xlsx'
df.to_excel(file_path, index=False)
70 changes: 70 additions & 0 deletions generators/bq5.2.py
@@ -0,0 +1,70 @@
import random

import pandas as pd
from faker import Faker

# Seed every random source used below so the output is reproducible
random.seed(0)
Faker.seed(0)

fake = Faker()

# Generate random data for the dataset
num_rows = 1000

# Generate UserIDs
user_ids = [fake.unique.uuid4() for _ in range(num_rows)]

# Generate UserType: Nuevo, Ocasional, Frecuente
user_types = ['Nuevo', 'Ocasional', 'Frecuente']
user_type = [random.choice(user_types) for _ in range(num_rows)]

# Generate UserSemester: values between 1 and 10
user_semester = [random.randint(1, 10) for _ in range(num_rows)]

# Generate UserCareer: ISIS, IIND, MATE, IBIO, IELE, IMEC, IQUI, ICYA, LITE, PSIC, MEDI
user_careers = ["ISIS", "IIND", "MATE", "IBIO", "IELE", "IMEC", "IQUI", "ICYA", "LITE", "PSIC", "MEDI"]
user_career = [random.choice(user_careers) for _ in range(num_rows)]

# Generate MeetingDate
meeting_dates = [fake.date_this_year() for _ in range(num_rows)]

# Derive DayOfWeek from MeetingDate so both columns agree (date.weekday() returns 0 for Monday)
days_of_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_of_week = [days_of_week[d.weekday()] for d in meeting_dates]

# Generate MeetingStartTime: 08:00 to 16:30 in 30-minute steps, zero-padded so times sort correctly
meeting_start_times = [f"{random.randint(8, 16):02d}:{random.choice(['00', '30'])}" for _ in range(num_rows)]

# Generate MeetingDuration: 15 minutes to 4 hours, in minutes
meeting_duration = [random.randint(15, 240) for _ in range(num_rows)]

# Generate MeetingBuilding: ML, SD, W, R, O, C, LL, B, RGD, AU
meeting_buildings = ["ML", "SD", "W", "R", "O", "C", "LL", "B", "RGD", "AU"]
meeting_building = [random.choice(meeting_buildings) for _ in range(num_rows)]

# Generate MeetingPurpose: Class, Leisure, Group Project, Other
meeting_purposes = ["Class", "Leisure", "Group Project", "Other"]
meeting_purpose = [random.choice(meeting_purposes) for _ in range(num_rows)]

# Generate OverallSatisfactionScore: values between 1 and 5
overall_satisfaction_score = [random.randint(1, 5) for _ in range(num_rows)]

# Create the DataFrame
df = pd.DataFrame({
    'UserID': user_ids,
    'UserType': user_type,
    'UserSemester': user_semester,
    'UserCareer': user_career,
    'MeetingDate': meeting_dates,
    'DayOfWeek': day_of_week,
    'MeetingStartTime': meeting_start_times,
    'MeetingDuration': meeting_duration,
    'MeetingBuilding': meeting_building,
    'MeetingPurpose': meeting_purpose,
    'OverallSatisfactionScore': overall_satisfaction_score
})

# Display the first rows of the DataFrame
print(df.head())

# Save the dataset (writing .xlsx requires the openpyxl package)
file_path = "./meeting_data.xlsx"
df.to_excel(file_path, index=False)
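For BQ 5.2, preferred times and locations can be read off frequency counts. The hedged sketch below assumes the column names written by `generators/bq5.2.py`; `top_slots`, `day_time_matrix`, and `DAYS` are hypothetical names, and the crosstab is only one possible input for a heatmap-style chart.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def top_slots(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Most frequent (day, start time, building) combinations (sketch for BQ 5.2)."""
    return (
        df.groupby(["DayOfWeek", "MeetingStartTime", "MeetingBuilding"])
        .size()
        .rename("Meetings")
        .sort_values(ascending=False, kind="stable")
        .head(n)
        .reset_index()
    )


def day_time_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Meeting counts with days as rows and start times as columns."""
    return pd.crosstab(df["DayOfWeek"], df["MeetingStartTime"]).reindex(DAYS, fill_value=0)
```

Reindexing on `DAYS` keeps the rows in calendar order and keeps days with no meetings visible as zero rows instead of dropping them.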
