Rules to keep in mind when building Fused UDFs

Geospatial UDFs

Always use .estimate_utm_crs() when doing distance or area operations (buffering, getting length, calculating perimeter, etc.)
If vector output is missing, think about data validation. Check for invalid geometries with gdf.geometry.is_valid and fix them with gdf.geometry.buffer(0) or shapely.make_valid()
geopandas usually works best with HTTPS URLs rather than S3 paths, so use the HTTPS form even when I ask you to load a vector file from an S3:// (or similar cloud storage) path with geopandas
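As a rough illustration of the S3-to-HTTPS rule, the sketch below rewrites an s3:// path into an HTTPS URL before it is handed to geopandas. The helper name s3_to_https and the example bucket are hypothetical, and the sketch assumes a publicly readable bucket reachable through the generic bucket.s3.amazonaws.com endpoint; buckets behind other endpoints, or keys that need URL encoding, would need different handling.

```python
from urllib.parse import urlparse


def s3_to_https(url: str) -> str:
    """Rewrite s3://bucket/key as a virtual-hosted-style HTTPS URL.

    Assumption: the bucket is public and served from bucket.s3.amazonaws.com.
    Anything that is not an s3:// URL is returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        return url  # already HTTP(S), or a local path
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}"


# Hypothetical bucket and key, for illustration only
print(s3_to_https("s3://example-bucket/data/roads.geojson"))
```

Inside a UDF this would typically be used as gpd.read_file(s3_to_https(path)).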

Returning data back to Workbench

Re-use existing code first

First always check public/common/utils.py to see if there are functions you could reuse
Do not change files in common; only use them.
Use: common = fused.load('https://github.com/fusedio/udfs/tree/7918aff/public/common/').utils first
If you're loading any other public UDF or utils.py function from any other UDF in the udfs/public/ repo always do so with this commit hash: 7918aff, even if the code you see uses another commit hash

Fused specific

Writing UDFs - interacting with Workbench

Every time you write a UDF from the Agent, you have to save it to a directory to be able to re-use it: udf.to_directory(). Otherwise you won't be able to run it again later. To save it to Workbench, the web-based IDE of Fused, you need to do udf.to_fused(). You can pass it a name if you want: if the name is udf_name, call udf.to_fused(udf_name).

Don't try to make code inline (do not do `python -c "import fused ...[rest of udf code]"`, because this is not reusable). Write the proper script as a file every time first, then load it, save it to directory and then save it to Workbench.
- This also means you'll be able to save the file and retrieve it later, whereas doing everything in the terminal in one go isn't reusable.
- So always make a proper file like my_udf.py with the code you are writing.
Once I give you a query you should do the following, in this order:
1. Write the code you want to execute to a file, like my_udf.py
2. Test this code with a small area (if it's a UDF that takes bounds) or good defaults. Use fused.run(my_udf, engine='remote') so you're testing the file as close as possible to how it would run in Workbench. There is no interest in running it with the local engine.
3. Once you have some code that works well, save the UDF to a local directory first (and add the suffix _local to the end of the name so that the local directory name doesn't conflict with the Fused server name):
```
my_udf = fused.load('my_udf.py') # Load code you wrote first
my_udf.to_directory("my_udf_local") # Save it as directory so it has all the proper files. Don't try to write this yourself. Use existing functions
my_udf.to_fused("my_udf") # Once you're happy, save the UDF to fused without the _local suffix
```
If you want to update a UDF you already wrote in Workbench you can do so with udf.to_fused(overwrite=True).
If you get an error that says the UDF already exists, you need to load it first, change it locally then overwrite it.
After you write a UDF to Workbench, always share the link with me so I can go try it out in the product directly. You can do so with common.get_catalog_url(udf_python_object).
To ensure UDFs have proper metadata with fused:id for catalog URLs, the workflow should be:
1. Create the UDF and save it to a directory, using the suffix _local: udf.to_directory("UDF_Name_local")
2. Load the directory-saved version: local_udf = fused.load("UDF_Name_local")
3. Save this locally loaded version (which now has the proper metadata attached) to Workbench: local_udf.to_fused('UDF_Name')
4. Load it back from the server: server_udf = fused.load('UDF_Name')
5. Try to get the catalog URL inside a try block: common.get_catalog_url(server_udf)
The server-loaded version will have the complete metadata, including the fused:id needed for catalog URL generation.
If you're working with a UDF already on Workbench, don't load it from directory. Always load it from the Fused server first with udf = fused.load("UDF_NAME"); otherwise you'll have conflicts, find that the UDF already exists in the backend, and be unable to edit it.

Don't print the result of saving the UDF to the Fused server. You've done this in the past: result = udf.to_fused('udf_name'), but it's very verbose. Just do udf.to_fused('udf_name'). Keep the output & prints minimal and clean so I can more easily read what you're doing. If you print too much stuff, I won't be able to easily keep track. No need to print the code you just wrote again.

Make your best guess as to what type of UDF you are creating each time. Before saving udf you need to set some metadata:

For vectors (i.e. lines, points, polygons, anything with geometries. Like dataframe objects):

For UDFs where you want to be able to pan around the map in Workbench as you go you should use: udf.metadata={'fused:udfType': 'vector_tile'}
For UDFs you want to run just 1 time, i.e. loading a static file in 1 place you can set it to: udf.metadata={'fused:udfType': 'vector_single'}

For rasters (i.e. satellite images, elevation models, anything with images. Usually numpy array like objects):

For UDFs where you want to be able to pan around the map in Workbench as you go you should use: udf.metadata={'fused:udfType': 'raster_tile'}
For UDFs you want to run just 1 time, i.e. loading a static file in 1 place you can set it to: udf.metadata={'fused:udfType': 'raster_single'}
You can also edit the style of the layers & UDFs you make by taking inspiration from the Fused docs. For a nice visualization, update 'fused:vizConfig' in the metadata before saving the UDF to the Fused server
Don't use present property in the visualization config, it doesn't seem to work too well.
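The four fused:udfType values listed above follow a simple kind-plus-mode pattern. The helper below, udf_metadata, is a hypothetical convenience (not part of fused or common) that just builds the same metadata dict from two choices, as a sketch of how the decision could be made explicit in a script.

```python
def udf_metadata(kind: str, tiled: bool) -> dict:
    """Build the metadata dict carrying the fused:udfType value.

    kind:  "vector" (geometries / dataframes) or "raster" (images / arrays)
    tiled: True to re-run while panning the map, False to run a single time
    """
    if kind not in ("vector", "raster"):
        raise ValueError(f"kind must be 'vector' or 'raster', got {kind!r}")
    suffix = "tile" if tiled else "single"
    return {"fused:udfType": f"{kind}_{suffix}"}


# A raster UDF that should refresh as the map is panned
print(udf_metadata("raster", tiled=True))
```

The result would be assigned with udf.metadata = udf_metadata("raster", tiled=True) before calling udf.to_fused(...).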

Working with bounds / tiles

Prefer working with bounds: fused.types.Bounds as inputs for UDFs most of the time. There might be exceptions, but you'd want to make sure to have a good reason.
The preferred way to move from bounds to a gdf (that you can plot properly) is to use common.get_tiles(bounds). Check the common code to see exactly what the params are. Specifically, target_num_tiles is helpful when you want to make sub-tiles, and clip=True clips the tiles to exactly the bounds extent. Otherwise it uses just mercator tiles.

Profiling info

Sometimes I will send you profiler info back from Cursor. If so, the values are in nanoseconds. Do the math to convert them to seconds (divide by 1,000,000,000)
You might not have the correct line index, especially if I give you this back from a UDF you wrote locally and then saved to Workbench, where a line might be off. Use your best judgement to figure out what might be the slowest part of the code and shift the alignment up or down 1-3 lines to match the profiler values. If it's not obvious, ask me
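The nanosecond conversion can be sketched as follows. The input shape (a dict mapping line numbers to nanoseconds) and the sample numbers are assumptions made for illustration; the actual profiler output may be laid out differently.

```python
NS_PER_SECOND = 1_000_000_000


def slowest_lines(timings_ns: dict, top: int = 3) -> list:
    """Return the `top` slowest (line_number, seconds) pairs, slowest first."""
    ranked = sorted(timings_ns.items(), key=lambda item: item[1], reverse=True)
    return [(line, ns / NS_PER_SECOND) for line, ns in ranked[:top]]


# Made-up timings: line number -> nanoseconds spent on that line
timings = {12: 2_500_000_000, 13: 40_000_000, 20: 750_000_000}
for line, seconds in slowest_lines(timings, top=2):
    print(f"line {line}: {seconds:.2f} s")
```

Because the reported line index may be off by a few lines, the ranked line numbers are best read as a neighbourhood to inspect rather than an exact location.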

Running fused UDFs

Calling multiple UDFs

UDF inputs always need to be typed & with defaults.
If you need to pass a bounding box, pass bounds: fused.types.Bounds. If there is no sensible default, always use None
Some UDFs will call other UDFs with fused.run('fsh_****') (using shared token) or fused.run('my_udf_name') using directly the UDF name.
- When you're not sure what a UDF called like this does, you can load its code with udf = fused.load('fsh_***'); udf.code will show what that UDF does.

Common pitfalls

common.read_tiff and common.mosaic_tiff both take bounds as a named argument, but it doesn't actually take a list of 4 values; it expects a tile gdf. For both, you should first do something like: tile = common.get_tiles(bounds, target_num_tiles=4, clip=True)

Performance improvements

Queries on slow file formats like .csv, .geojson, etc. can be wrapped in a @fused.cache function. This makes fetching the data faster because it uses the Fused cache

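import fused
import geopandas as gpd

@fused.cache
def get_data(path):
    # Result is cached by Fused, so repeated calls with the same path skip the slow read
    return gpd.read_file(path)

file = get_data(path)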

You might also want to use caching when reading files from buckets or servers, to avoid being rate limited

UI working with me

You need to first rename the local directory of a UDF to be able to get the catalog URL. This is because if you have made a UDF called my_udf and saved it locally before sending to fused you'll have my_udf/ locally and a UDF on Fused server called my_udf. But the fused.load(my_udf) logic calls the local file before the fused server one. So rename the directory locally to my_udf_local/ before doing fused.load() when giving the Catalog URL of the UDF.
Whenever you give me links, make sure to write them as https://url/to/page so I can click on them when you show them to me
Give me a highlight of the UDF you built, with these broad categories:
- 🔧 Key Features:
- 📋 UDF Details:
- 🔗 Catalog URL:
- 🎯 How It Works:

MaxLenormand/Fused_Cursor_Rules.md
