I am exploring options for doing parallel reads with the SingleStore Spark connector. When I use the `forced` option, the Spark job frequently fails because it starts before the minimum number of executors required for a parallel read is available. I have therefore switched to `automatic` and am using the parallel read feature.
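For reference, this is roughly how I am configuring the read. This is a sketch based on the connector's documented options (`enableParallelRead`, `parallelRead.Features`, `parallelRead.maxNumPartitions`); the endpoint, credentials, and table name are placeholders, and the option names should be verified against the connector version in use.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parallel-read").getOrCreate()

val df = spark.read
  .format("singlestore")
  .option("ddlEndpoint", "singlestore-host:3306") // placeholder endpoint
  .option("user", "app_user")                     // placeholder credentials
  // "automatic" falls back to a non-parallel read if the chosen
  // parallel read feature cannot be used, instead of failing the job.
  .option("enableParallelRead", "automatic")
  // Ordered list of parallel read strategies the connector may try.
  .option("parallelRead.Features", "ReadFromAggregators,ReadFromLeaves")
  // Cap the partition fan-out so a full-table scan does not
  // overload a shared cluster.
  .option("parallelRead.maxNumPartitions", "32")
  .load("db.my_table")                            // placeholder table
```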
Does it store the output of the query (currently about 30 million records, and likely to grow substantially over time) in memory on the database side?
I need help understanding the impact this feature will have on the database. I am using a shared cluster and don't want to affect other jobs.
FYI: the query fetches all records in the table and doesn't include any nested queries.