Accurate electric power load forecasting is critical for power utility companies because it increases control over the relevant infrastructure, resulting in significant improvements in energy management and scheduling. However, point forecasting does not give these companies enough information to prepare for worst-case scenarios. This paper proposes an encoder-decoder model that takes advantage of the expressiveness of transformer-based encoders to produce probabilistic forecasts, i.e., a distribution over future predictions. Two real-world datasets are used to evaluate the performance of the proposed framework on two different types of data: hourly load data from the power supply company of the city of Johor in Malaysia and hourly load consumption data from one of Grenoble Institute of Technology's buildings. The former is aggregated data, which makes identifying patterns and trends easier, whereas the latter comes from a single building (non-aggregated), which makes forecasting more difficult. The model's performance is discussed across multiple time horizons, including 24-hour, 1-week, and 1-month predictions. The framework achieved notable improvements over the baseline, Amazon DeepAR. For 24-hour-ahead forecasting, accuracy improved from 87.2 percent to 96.2 percent on the Malaysian data and from 52.3 percent to 68.2 percent on the Grenoble data; for 1-month-ahead forecasting, it improved from 84.7 percent to 89.7 percent on the Malaysian data and from 45.5 percent to 57.2 percent on the Grenoble data.