In our application, we need to search inside the content of our blobs. I have already looked at Azure Cognitive Search, but the maximum blob size it can index is 256 MB, and we have blobs larger than that. I searched for other alternatives that support indexing and searching such huge blobs, but couldn't find any. Is there something we can use? Thanks


1 Answer

Typically, in cases where you have blobs this large, I think it is best to pre-process them. This also has the advantage of staging the content in case you ever need to geo-replicate it or quickly restore from a backup. For example, Azure Functions has blob triggers that fire your code whenever a blob is added or changed. In such a function, you could leverage Apache Tika to extract the text from the files and store it in a separate blob container, then have Cognitive Search pick up the extracted text from there. Please note that extracting this much text from files this large can be quite compute- and memory-intensive, so your pre-processing might actually need a higher compute/memory tier. A sketch of such a function follows below.
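As a rough, minimal sketch of that pre-processing step (not the original poster's code): it assumes the TikaOnDotNet NuGet package and hypothetical container names raw-docs and extracted-text, which are placeholders you would swap for your own.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using TikaOnDotNet.TextExtraction;

public static class ExtractText
{
    // Fires whenever a blob lands in the (hypothetical) "raw-docs" container,
    // extracts its text with Apache Tika, and writes the plain text to a
    // separate "extracted-text" container for Cognitive Search to index.
    [FunctionName("ExtractText")]
    public static void Run(
        [BlobTrigger("raw-docs/{name}")] Stream input,
        [Blob("extracted-text/{name}.txt", FileAccess.Write)] Stream output,
        string name,
        ILogger log)
    {
        // Buffering the whole blob in memory keeps the sketch simple, but it
        // is exactly the memory cost noted above for multi-hundred-MB files.
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            var result = new TextExtractor().Extract(buffer.ToArray());
            using (var writer = new StreamWriter(output))
            {
                writer.Write(result.Text);
            }
            log.LogInformation($"Extracted {result.Text.Length} characters from {name}");
        }
    }
}
```

For very large files you would want to run this on a Functions plan with enough memory, e.g. a Premium or Dedicated plan rather than the Consumption plan.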

The code is a little older now, but this example of using TikaDotNet in an Azure Function might also help: https://github.com/liamca/AzureSearch-AzureFunctions-CognitiveServices/blob/master/ApacheTika/run.csx

Please note, though, that I have never tried this code on a file that large.
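Once the plain-text blobs are in place, the remaining step is pointing a Cognitive Search blob indexer at the output container. Here is a minimal sketch using the Azure.Search.Documents SDK, assuming an existing index named docs-index and the same hypothetical extracted-text container (service URL, keys, and all names are placeholders):

```csharp
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

class Program
{
    static void Main()
    {
        var indexerClient = new SearchIndexerClient(
            new Uri("https://<your-service>.search.windows.net"),
            new AzureKeyCredential("<admin-key>"));

        // Data source pointing at the container the Function writes to.
        var dataSource = new SearchIndexerDataSourceConnection(
            "extracted-text-ds",
            SearchIndexerDataSourceType.AzureBlob,
            "<storage-connection-string>",
            new SearchIndexerDataContainer("extracted-text"));
        indexerClient.CreateOrUpdateDataSourceConnection(dataSource);

        // Indexer that feeds the plain-text blobs into the existing index.
        var indexer = new SearchIndexer(
            name: "extracted-text-indexer",
            dataSourceName: dataSource.Name,
            targetIndexName: "docs-index");
        indexerClient.CreateOrUpdateIndexer(indexer);
    }
}
```

Because the indexer only ever sees the extracted plain text, the original blobs' 256 MB limit no longer applies to them.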

