I have data such as the following:
start_date | end_date | project_sales | project_category | project_code |
---|---|---|---|---|
2015-08-03 | 2015-08-06 | 1683 | CatA | 1 |
2015-08-02 | 2015-08-04 | 6500 | CatB | 2 |
I would suggest checking out this Stack Overflow question; I think you will find the answer you are looking for there: Python Pandas counting and summing specific conditions
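To give a rough idea of what that kind of conditional counting and summing looks like in pandas, here is a minimal sketch built on the two sample rows from the question. The condition itself (projects active on a given day) and the variable names `day`, `active`, and `per_category` are my own assumptions for illustration, since I am not sure exactly what aggregation you need.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "start_date": pd.to_datetime(["2015-08-03", "2015-08-02"]),
    "end_date": pd.to_datetime(["2015-08-06", "2015-08-04"]),
    "project_sales": [1683, 6500],
    "project_category": ["CatA", "CatB"],
    "project_code": [1, 2],
})

# Assumed condition (not from the question): the project is active on `day`
day = pd.Timestamp("2015-08-03")
active = (df["start_date"] <= day) & (df["end_date"] >= day)

# Count the rows that match the condition and sum their sales
active_count = int(active.sum())
active_sales = int(df.loc[active, "project_sales"].sum())

# The same count and sum, broken down by category
per_category = (
    df[active]
    .groupby("project_category")["project_sales"]
    .agg(["count", "sum"])
)

print(active_count, active_sales)
print(per_category)
```

A boolean mask keeps the condition in one place, so you can swap in whatever rule you actually need without touching the aggregation.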
As for the high volume of data, I would recommend looking at parallel data processing frameworks such as Dask, Modin, or Vaex. You can check the differences between these frameworks in this blog.
Disclaimer: I do not own any of the websites above. I just think those resources can answer the question better than I can.