Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Problem

How to read a dataframe in polars from mysql. Docs are silent on the issue. Currently probably there is only support for parquet, json, ipc, etc, and no direct support for sql as mentioned here.

Regardless what would be an appropriate method to read in data using libraries like: sqlx or mysql

Current Approach

Currently I am following this approach as provided in this answer:

  1. Read in a Vec<Struct> using sqlx
  2. Convert it into a tuple of vecs (Vec<T>, Vec<T>) using the code below
  3. Convert (Vec<T>, Vec<T>) into (Series, Series)
  4. Create a dataframe using: DataFrame::new(vec![s0, s1]); where s0 and s1 are Series
struct A(u8, i8);

fn main() {
    let v = vec![A(1, 4), A(2, 6), A(3, 5)];

    let result = v.into_iter()
        .fold((vec![], vec![]), |(mut u, mut i), item| {
            u.push(item.0);
            i.push(item.1);
            (u, i)
        });
    
    dbg!(result);

    // `result` is just a tuple of vectors
    // let (unsigneds, signeds): (Vec<u8>, Vec<i8>) = result;
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
192 views
Welcome To Ask or Share your Answers For Others

1 Answer

Same answer as in this question, seems quite duplicate IMO.

You could use the builders for that or collect from iterators. Collecting from iterators is often fast, but in this case it requires you to loop the Vec<Country> twice, so you should benchmark.

Below is an example function for both the solutions shown.

use polars::prelude::*;

struct Country {
    country: String,
    count: i64,
}

fn example_1(values: &[Country]) -> (Series, Series) {
    let ca_country: Utf8Chunked = values.iter().map(|v| &*v.country).collect();
    let ca_count: NoNull<Int64Chunked> = values.iter().map(|v| v.count).collect();
    let mut s_country: Series = ca_country.into();
    let mut s_count: Series = ca_count.into_inner().into();
    s_country.rename("country");
    s_count.rename("country");
    (s_count, s_country)
}

fn example_2(values: &[Country]) -> (Series, Series) {
    let mut country_builder = Utf8ChunkedBuilder::new("country", values.len(), values.len() * 5);
    let mut count_builder = PrimitiveChunkedBuilder::<Int64Type>::new("count", values.len());

    values.iter().for_each(|v| {
        country_builder.append_value(&v.country);
        count_builder.append_value(v.count)
    });

    (
        count_builder.finish().into(),
        country_builder.finish().into(),
    )
}

Once you've got the Series, you can use DataFrame::new(columns) where columns: Vec<Series> to create a DataFrame.

Btw, if you want maximum performance, I really recommend connector-x. It has got polars and arrow integration and has got insane performance.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...