Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm trying to copy large set of files from one S3 to another S3, using asynchronous method. To achieve the same, the large set of files is broken into batches and each batch is handed over to a list of async method. The issue is, each async method is not processing more than 1 file in the batch, whereas each batch contains more than 1k files, not sure why async doesn't go back to process the remaining files.

Here is the code:

public void CreateAndExecuteSpawn(string srcBucket, List<List<string>> pdfFileList, IAmazonS3 s3client)
{
    int i = 0;
    List<Action> actions = new List<Action>();
    LambdaLogger.Log("PDF Set count: " + pdfFileList.Count.ToString());
    foreach (var list in pdfFileList)
        actions.Add(() => RenameFilesAsync(srcBucket, list, s3client));

    foreach (var method in actions)
    {
        method.Invoke();
        LambdaLogger.Log("Mehtod invoked: "+ i++.ToString());
    }
}
            
public async void RenameFilesAsync(string srcBucket, List<string> pdfFiles, IAmazonS3 s3client)
{
    LambdaLogger.Log("In RenameFileAsync method");
    CopyObjectRequest copyRequest = new CopyObjectRequest
    {
        SourceBucket = srcBucket,
        DestinationBucket = srcBucket
    };
    try
    {
        foreach (var file in pdfFiles)
        {
            if (!file.Contains("index.xml"))
            {
                string[] newFilename = file.Split('{');
                string[] destKey = file.Split('/');
                copyRequest.SourceKey = file;
                copyRequest.DestinationKey = destKey[0] + "/" + destKey[1] + "/Renamed/" + newFilename[1];
                LambdaLogger.Log("About to rename File: " + file);
                //Here after copying one file, function doesn't return to foreach loop
                CopyObjectResponse response = await s3client.CopyObjectAsync(copyRequest);
                //await s3client.CopyObjectAsync(copyRequest);
                LambdaLogger.Log("Rename done: ");
            }
        }
    }
    catch(Exception ex)
    {
        LambdaLogger.Log(ex.Message);
        LambdaLogger.Log(copyRequest.DestinationKey);
    }
}
        
public void FunctionHandler(S3Event evnt, ILambdaContext context)
{
    //Some code here
    CreateAndExecuteSpawn(bucket, pdfFileSet, s3client);
}


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
199 views
Welcome To Ask or Share your Answers For Others

1 Answer

First you need to fix the batch so that it will process the batches one at a time. Avoid async void; use async Task instead:

public async Task CreateAndExecuteSpawnAsync(string srcBucket, List<List<string>> pdfFileList, IAmazonS3 s3client)
{
    int i = 0;
    List<Func<Task>> actions = new();
    LambdaLogger.Log("PDF Set count: " + pdfFileList.Count.ToString());
    foreach (var list in pdfFileList)
        actions.Add(() => RenameFilesAsync(srcBucket, list, s3client));

    foreach (var method in actions)
    {
        await method();
        LambdaLogger.Log("Mehtod invoked: "+ i++.ToString());
    }
}
            
public async Task RenameFilesAsync(string srcBucket, List<string> pdfFiles, IAmazonS3 s3client)

Then you can add asynchronous concurrency within each batch. The current code is just a foreach loop, so of course it only processes one at a time. You can change this to be asynchronously concurrent by Selecting the tasks to run and then doing a Task.WhenAll at the end:

LambdaLogger.Log("In RenameFileAsync method");
CopyObjectRequest copyRequest = new CopyObjectRequest
{
    SourceBucket = srcBucket,
    DestinationBucket = srcBucket
};
try
{
  var tasks = pdfFiles
      .Where(file => !file.Contains("index.xml"))
      .Select(async file =>
      {
         string[] newFilename = file.Split('{');
         string[] destKey = file.Split('/');
         copyRequest.SourceKey = file;
         copyRequest.DestinationKey = destKey[0] + "/" + destKey[1] + "/Renamed/" + newFilename[1];
         LambdaLogger.Log("About to rename File: " + file);
         CopyObjectResponse response = await s3client.CopyObjectAsync(copyRequest);
         LambdaLogger.Log("Rename done: ");
      })
      .ToList();
  await Task.WhenAll(tasks);
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...