Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm trying to run a query of about 50,000 records using ActiveRecord's find_each method, but it seems to be ignoring my other parameters like so:

Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }

Instead of stopping at 50,000 I'd like and sorting by created_at, here's the resulting query that gets executed over the entire dataset:

Thing Load (198.8ms)  SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000

Is there a way to get similar behavior to find_each but with a total max limit and respecting my sort criteria?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
330 views
Welcome To Ask or Share your Answers For Others

1 Answer

The documentation says that find_each and find_in_batches don't retain sort order and limit because:

  • Sorting ASC on the PK is used to make the batch ordering work.
  • Limit is used to control the batch sizes.

You could write your own version of this function like @rorra did. But you can get into trouble when mutating the objects. If for example you sort by created_at and save the object it might come up again in one of the next batches. Similarly you might skip objects because the order of results has changed when executing the query to get the next batch. Only use that solution with read only objects.

Now my primary concern was that I didn't want to load 30000+ objects into memory at once. My concern was not the execution time of the query itself. Therefore I used a solution that executes the original query but only caches the ID's. It then divides the array of ID's into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects because the sort order is kept in memory.

Here is a minimal example similar to what I did:

batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order(:created_at) with your own scope
ids.each_slice(batch_size) do |chunk|
    Thing.find(chunk, :order => "field(id, #{chunk.join(',')})").each do |thing|
      # Do things with thing
    end
end

The trade-offs to this solution are:

  • The complete query is executed to get the ID's
  • An array of all the ID's is kept in memory
  • Uses the MySQL specific FIELD() function

Hope this helps!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...