select top 1 * from c returns (number of partitions x actual result count) rows in azure-cosmosdb-spark

Stack Overflow Asked by You Hock Tan on January 1, 2022

I am using the azure-cosmosdb-spark library for Scala and was trying to run the following query:

select top 1 * from c

but the final DataFrame has a count of 8 instead of 1. I suspect that CosmosDBRDDIterator splits the query across multiple partitions (8 in this case) and runs it on each one.

The result count is always 8x the actual result count, regardless of the select query executed.

Is there any way I can avoid this and get my actual count as 1?
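To illustrate the suspected behaviour, here is a minimal sketch in plain Scala with no Spark or Cosmos DB dependency. It is a toy model, not the connector's actual code: `Doc`, `topPerPartition` and `fanOut` are hypothetical names, and the data is made up. It assumes each partition evaluates `top n` on its own and the partial results are simply concatenated.

```scala
// Toy model of a per-partition fan-out; NOT the azure-cosmosdb-spark implementation.
object PartitionedTopDemo {
  // Hypothetical document type used only for this illustration.
  final case class Doc(id: String)

  // Stand-in for running "select top n * from c" against a single partition.
  def topPerPartition(partition: Seq[Doc], n: Int): Seq[Doc] =
    partition.take(n)

  // Run the query on every partition and concatenate the partial results,
  // with no global "top n" applied after the merge.
  def fanOut(partitions: Seq[Seq[Doc]], n: Int): Seq[Doc] =
    partitions.flatMap(p => topPerPartition(p, n))

  def main(args: Array[String]): Unit = {
    // 8 partitions holding 3 made-up documents each.
    val partitions: Seq[Seq[Doc]] =
      (0 until 8).map(p => (0 until 3).map(i => Doc(s"p$p-doc$i")))

    val merged = fanOut(partitions, n = 1)
    println(merged.size)         // 8: one row per partition
    println(merged.take(1).size) // 1: limiting again after the merge
  }
}
```

Under this assumption, the extra rows come from the missing limit after the merge. On a real DataFrame, Spark's `limit(1)` would play the role of `take(1)` here, though whether that is the right fix for this connector is not confirmed by anything in this thread.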

One Answer

Have you tried limit instead of top?

select * from c limit 1

Answered by Adrian Bona on January 1, 2022
