Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

tl;dr: I want to generate a dates table in Redshift in order to make a report easier to generate. Preferable without needing large tables already in redshift, needing to upload a csv file.

long version: I am working on a report where I have to average new items created per day of the week. The date range could span months or more, so there could be, say, 5 Mondays but only 4 Sundays, which can make the math a little tricky. Also, I am not guaranteed an instance of a single item per day, especially once a user starts slicing the data. Which, this is tripping up the BI tool.

The best way to tackle this problem is most likely a dates table. However, most of the tutorials for dates tables use SQL commands that are not available or not fully supported by Redshift (I'm looking at you, generate_series).

Is there an easy way to generate a dates table in Redshift?

The code I was attempting to use: (based on this also-not-working recommendation: http://elliot.land/post/building-a-date-dimension-table-in-redshift )

CREATE TABLE facts.dates (
  "date_id"              INTEGER                     NOT NULL PRIMARY KEY,

  -- DATE
  "full_date"            DATE                        NOT NULL,

  -- YEAR
  "year_number"          SMALLINT                    NOT NULL,
  "year_week_number"     SMALLINT                    NOT NULL,
  "year_day_number"      SMALLINT                    NOT NULL,

  -- QUARTER
  "qtr_number"           SMALLINT                    NOT NULL,

  -- MONTH
  "month_number"         SMALLINT                    NOT NULL,
  "month_name"           CHAR(9)                     NOT NULL,
  "month_day_number"     SMALLINT                    NOT NULL,

  -- WEEK
  "week_day_number"      SMALLINT                    NOT NULL,

  -- DAY
  "day_name"             CHAR(9)                     NOT NULL,
  "day_is_weekday"       SMALLINT                    NOT NULL,
  "day_is_last_of_month" SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;


INSERT INTO facts.dates
(
   "date_id"
  ,"full_date"
  ,"year_number"
  ,"year_week_number"
  ,"year_day_number"

  -- QUARTER
  ,"qtr_number"

  -- MONTH
  ,"month_number"
  ,"month_name"
  ,"month_day_number"

  -- WEEK
  ,"week_day_number"

  -- DAY
  ,"day_name"
  ,"day_is_weekday"
  ,"day_is_last_of_month"
)
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,

    -- DATE
    datum                                                         AS full_date,

    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,

    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,

    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,

    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,

    -- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for 81 years starting from 2000.
    (
      SELECT
        '2000-01-01' :: DATE + generate_series AS datum,
        generate_series                        AS seq
      FROM generate_series(0,81 * 365 + 20,1)
    ) DQ
  ORDER BY 1;

Which throws this error

[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
1 statement failed.

... because, I assume, INSERT and generate_series are not allowed in the same command in Redshift

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
494 views
Welcome To Ask or Share your Answers For Others

1 Answer

As a workaround, you can spin Postgres instance on your local machine, run the code there, export to CSV, then run CREATE TABLE portion only in Redshift and load data from CSV. Since this is a one-time operation it's ok to do, this is what I'm actually doing for new Redshift deployments.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...