Optimize SCALAR_VECTOR_MAX UDF to Smooth Precipitation Data Vectors

I have vectors of precipitation data: one year of hourly measurements, in mm of rainfall during each hour, stored as vectors of doubles. In other words, vector length = 365 * 24 = 8760.

Because of various faults in the data collection process, I need to smooth the data so that no element of the vector exceeds a max value. Effectively, cap each element at a scalar, i.e. apply MIN(vector[i], scalar) to each element of the vector. It needs to be implemented as a UDF. The brute-force approach of JSON-unpacking the data and iterating over the vector in a UDF works well when the number of rows is small, but it does not scale when the number of rows (one per geographic location where measurements were taken) becomes very large.

Any pointers on how to do this more efficiently?

Hi @michael.arthur,

The best way to do this efficiently is to build a Wasm-based UDF. We recently created this sample that raises every element of a vector to a specified power:

You can take this source code, adapt it to do the MAX, build a Wasm function, and install it. If you do, it would be awesome if you could publish it to GitHub under an Apache 2.0 license and share it here.
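As a rough idea of what the adapted core might look like, here is a minimal sketch in C. It assumes the vector has already been handed to the function as a contiguous array of doubles; the name `vector_cap` and its signature are hypothetical, and the glue code that wit-bindgen generates around it is not shown. Since the goal is to keep elements from exceeding a ceiling, it keeps the smaller of each element and the scalar; flip the comparison if you really want an element-wise maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the sample: caps every element of v
 * at `ceiling`, in place. Runs in a single pass over the n elements. */
void vector_cap(double *v, size_t n, double ceiling)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* A comparison with NaN is false, so NaN elements are left as-is. */
        if (v[i] > ceiling) {
            v[i] = ceiling;
        }
    }
}
```

For a one-year hourly vector, n would be 8760. Working in place avoids a second allocation per row, which should matter when the number of rows is large.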

Is building such a UDF something you feel you can or will do?

If Rust (and therefore installing wit-bindgen) is not an option, are there any suggestions for generating the WIT bindings other than crafting them by hand?

I’m not sure I understand your question. The source code is in C/C++. Why can’t you use wit-bindgen?

wit-bindgen is used to create extension.c and extension.h, but Rust is required to build wit-bindgen. If Rust is not an option, extension.h and extension.c could be modified by hand if need be, but my question was whether there is a third option that Googling didn’t turn up.

Why can’t you use Rust?