Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am working on a project building the ETL process and dashboard to control some KPI metrics. I have created a table in BigQuery where, once a month, I save some key values calculated by aggregating data extracted from other table. I am measuring emails sent by employees, so in order to calculate one of those key values I need to read from two different tables and perform a left join to match, from each of the company's working areas existing in the aggregation (left table), how many employees that area has (right join).

This is a simplification of my tables:

Sent emails, grouped by area

|  Area Id  |  Service  |  Bad employees  | ...
|     1     |   Gmail   |      3416       | ...
|     2     |   Gmail   |     10782       | ...
|     2     |   Groups  |      9267       | ...

Total number of employees, grouped by area

|  Area Id  |  Total employees  | ...
|     1     |       34124       | ...
|     2     |       82561       | ...
|     3     |       49472       | ...

The problem comes here: as you can see, the first table (sent emails) has a field which does not appear on the second one; I am talking about Service. For this reason, when I join both tables I will get duplicated values for the Total employees field:

Joined table

|  Area Id  |  Service  |  Bad employees  |  Total employees  |
|     1     |   Gmail   |      3416       |       34124       |
|     2     |   Gmail   |     10782       |       82561       |
|     2     |   Groups  |      9267       |       82561       |

This final table will be used to create a report in Data Studio. I want to keep the Service field in my final table as I want to give the users the option of filtering by it. I can not edit the employees table schema and add a Service field to its entries because that information is unique from the emails table, it represents the service from which the email was sent and has nothing to do with the employees table.

I am struggling to get a valid data modeling option for this problem; if go with this solution and I want to represent on Data Studio, let's say, the Total number of employees per selected areas, I will get the wrong value for those areas containing multiple services:

  • Total employees area 1: 34.124
  • Total employees area 2: 82.561 + 82.561 = 165.122
  • Total employees: 34.124 + 165.122 = 199.246
  • Expected value: 34.124 + 82.561 = 116.685

This will affect any metric using the total employees value.

How can I keep the Service field of my joined table and still represent on Data Studio the correct value for Total employees?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
159 views
Welcome To Ask or Share your Answers For Others

1 Answer

Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...