Rules to keep in mind when building Fused UDFs
- Always use `.estimate_utm_crs()` when doing distance or area operations (buffering, getting length, calculating perimeter, etc.).
- If the output is missing and is vector, think about data validation. Check for invalid geometries with `gdf.geometry.is_valid`. Fix them with `gdf.geometry.buffer(0)` or `shapely.make_valid()`.
- `geopandas` usually works best with HTTPS URLs rather than S3 paths, so use the HTTPS format even when I ask you to load a vector file from an `S3://` path (or similar cloud server) with `geopandas`.
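As a minimal sketch of the HTTPS-over-S3 rule, the helper below rewrites an `s3://bucket/key` path into the virtual-hosted-style HTTPS form. The function name `s3_to_https` is hypothetical, and the sketch assumes a publicly readable bucket reachable through the generic `s3.amazonaws.com` endpoint; buckets that need a region-specific endpoint or credentials are out of scope.

```python
from urllib.parse import quote, urlparse


def s3_to_https(path: str) -> str:
    """Rewrite s3://bucket/key as https://bucket.s3.amazonaws.com/key.

    Hypothetical helper: assumes a public bucket served from the generic
    S3 endpoint. Non-S3 paths are returned unchanged.
    """
    parsed = urlparse(path)
    if parsed.scheme.lower() != "s3":
        return path
    # Percent-encode the key but keep "/" so the folder structure survives.
    key = quote(parsed.path.lstrip("/"), safe="/")
    return f"https://{parsed.netloc}.s3.amazonaws.com/{key}"
```

The returned URL can then be passed to `geopandas.read_file(...)` in place of the original S3 path.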
- First always check `public/common/utils.py` to see if there are functions you could reuse.
  - Don't change files in `common`, only use them.
  - Use `common = fused.load('https://github.com/fusedio/udfs/tree/7918aff/public/common/').utils` first.
  - If you're loading any other public UDF or `utils.py` function from any other UDF in the `udfs/public/` repo, always do so with this commit hash: `7918aff`, even if the code you see uses another commit hash.
Every time you write a UDF from the Agent, you have to save it to a directory to be able to reuse it: `udf.to_directory()`. Otherwise you won't be able to run it again later.
To save it to Workbench, the web-based IDE of Fused, you need to call `udf.to_fused()`. You can pass it a name if you want: if the name is `udf_name`, call `udf.to_fused(udf_name)`.
- Don't try to run code inline (do not do `python -c "import fused ...[rest of udf code]"`, because this is not reusable). Write the proper script as a file every time first, then load it, save it to directory, and then save it to Workbench.
  - This also means you'll be able to save the file and retrieve it later, whereas doing everything in the terminal in one go isn't reusable.
  - So always make a proper file like `my_udf.py` with the code you are writing.
- Once I give you a query you should do the following, in this order:
  - Write the code you want to execute to a file, like `my_udf.py`.
  - Test this code with a small area (if it's a UDF that takes `bounds`) or good defaults. Use `fused.run(my_udf, engine='remote')` so you're testing the file as close as possible to how it would run in Workbench. There is no interest in running it on the local engine.
  - Once you have some code that works well, save the UDF to a local directory first (add the suffix `_local` to the end of the name so that the local directory name doesn't conflict with the Fused server name):

        my_udf = fused.load('my_udf.py')  # Load the code you wrote first
        my_udf.to_directory("my_udf_local")  # Save it as a directory so it has all the proper files. Don't try to write this yourself; use existing functions
        my_udf.to_fused("my_udf")  # Once you're happy, save the UDF to Fused without the _local suffix
- If you want to update a UDF you already wrote in Workbench, you can do so with `udf.to_fused(overwrite=True)`.
- If you get an error that says the UDF already exists, you need to load it first, change it locally, then overwrite it.
- After you have written a UDF to Workbench, always share the link with me so I can go try it out in the product directly. You can do so with `common.get_catalog_url(udf_python_object)`.
- To ensure UDFs have proper metadata with `fused:id` for catalog URLs, the workflow should be:
  - Create the UDF and save it to a directory, using the suffix `_local`: `udf.to_directory("UDF_Name_local")`
  - Load the directory-saved version: `local_udf = fused.load("UDF_Name_local")`
  - Use this locally loaded version (now that it has the proper metadata attached) to save to Workbench: `local_udf.to_fused('UDF_Name')`
  - Load it back from the server: `server_udf = fused.load('UDF_Name')`
  - Try to get the catalog URL, inside a `try` block: `common.get_catalog_url(server_udf)`

  The server-loaded version will have the complete metadata, including the `fused:id` needed for catalog URL generation.
- If you're working with a UDF already on Workbench, don't load it from a directory. Always load it from the Fused server first with `udf = fused.load("UDF_NAME")`, otherwise you'll have conflicts: the UDF already exists in the backend and you won't be able to edit it.
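The metadata workflow above can be sketched as one function. This is only a sketch: `publish_udf` is a hypothetical name, and the `fused` module and the loaded `common` utilities are passed in as arguments (rather than imported) so the steps can be read in isolation. It uses only the calls named in the rules above and leaves error handling to the caller.

```python
def publish_udf(fused, common, udf, name: str) -> str:
    """Save `udf` to Workbench and return its catalog URL.

    Hypothetical helper. `fused` is the imported fused module and `common`
    is the utils object loaded from public/common.
    """
    local_dir = f"{name}_local"         # _local suffix avoids a name clash
    udf.to_directory(local_dir)         # 1. save to a local directory
    local_udf = fused.load(local_dir)   # 2. reload so metadata is attached
    local_udf.to_fused(name)            # 3. save to Workbench, result not printed
    server_udf = fused.load(name)       # 4. load back from the server
    return common.get_catalog_url(server_udf)  # 5. relies on fused:id metadata
```

A call would look like `url = publish_udf(fused, common, my_udf, "my_udf")`, followed by sharing `url` as a full `https://` link.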
Don't print the result of saving the UDF to the Fused server. You've done this in the past: `result = udf.to_fused('udf_name')`, but it's very verbose. Just do `udf.to_fused('udf_name')`. Keep the output and prints minimal and clean so I can more easily read what you're doing. If you print too much, I won't be able to keep track. There is no need to print the code you just wrote again.
Make your best guess as to what type of UDF you are creating each time. Before saving the UDF you need to set some metadata:
For vectors (i.e. lines, points, polygons, anything with geometries, like dataframe objects):
- For UDFs where you want to be able to pan around the map in Workbench as you go, use: `udf.metadata={'fused:udfType': 'vector_tile'}`
- For UDFs you want to run just once, e.g. loading a static file in one place, set it to: `udf.metadata={'fused:udfType': 'vector_single'}`
For rasters (i.e. satellite images, elevation models, anything with images, usually numpy-array-like objects):
- For UDFs where you want to be able to pan around the map in Workbench as you go, use: `udf.metadata={'fused:udfType': 'raster_tile'}`
- For UDFs you want to run just once, e.g. loading a static file in one place, set it to: `udf.metadata={'fused:udfType': 'raster_single'}`
- You can also edit the style of the layers and UDFs you make by taking inspiration from the Fused docs. Give the UDF a nice visualization by updating `fused:vizConfig` in the metadata before saving the UDF to the Fused server.
- Don't use the `present` property in the visualization config; it doesn't seem to work too well.
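The four `fused:udfType` values above follow a simple pattern, sketched below. The helper `udf_type_metadata` and its parameters are hypothetical; only the metadata key and the four values come from the rules above.

```python
def udf_type_metadata(kind: str, tiled: bool) -> dict:
    """Build the udfType metadata dict for a UDF.

    kind:  "vector" (geometries, dataframes) or "raster" (images, arrays).
    tiled: True if the UDF should re-run as you pan around the map,
           False if it runs once (e.g. loading one static file).
    """
    if kind not in ("vector", "raster"):
        raise ValueError(f"kind must be 'vector' or 'raster', got {kind!r}")
    suffix = "tile" if tiled else "single"
    return {"fused:udfType": f"{kind}_{suffix}"}
```

It would be used as `udf.metadata = udf_type_metadata("raster", tiled=True)` before saving the UDF.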
- Prefer working with `bounds: fused.types.Bounds` as the input for UDFs most of the time. There might be exceptions, but you'd want to have a good reason.
- The preferred way to move from `bounds` to a `gdf` (that you can plot properly) is `common.get_tiles(bounds)`. Check the `common` code to see exactly what the params are. Specifically, `target_num_tiles` is helpful when you want to make sub-tiles, and `clip=True` clips the tiles to exactly the `bounds` extent. Otherwise it uses plain mercator tiles.
- Sometimes I will send you profiler info back from Cursor. If so, the values are in nanoseconds. Do the math to convert them to seconds.
  - You might not have the correct line index, especially if I give you this back from a UDF you wrote locally and then saved to Workbench; it might be off by a line. Use your best judgement to figure out what might be the slowest part of the code and shift it up or down 1-3 lines to align with the profiler values. If it's not obvious, ask me.
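The conversion is a division by 10^9. A minimal sketch, assuming the profiler output has already been parsed into a mapping of line number to nanoseconds (the `timings_ns` values below are made up for illustration):

```python
NS_PER_SECOND = 1_000_000_000


def ns_to_seconds(ns: int) -> float:
    """Convert a profiler duration from nanoseconds to seconds."""
    return ns / NS_PER_SECOND


# Made-up example values: line number -> time spent in nanoseconds.
timings_ns = {12: 2_500_000_000, 18: 40_000_000, 25: 750_000}

# Sort slowest first so the hot spot is obvious.
for line, ns in sorted(timings_ns.items(), key=lambda kv: kv[1], reverse=True):
    print(f"line {line}: {ns_to_seconds(ns):.4f} s")
```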
- UDF inputs always need to be typed and have defaults.
- If you need to pass a bounding box, pass `bounds: fused.types.Bounds`. If there is no default, always give `None`.
- Some UDFs will call other UDFs with `fused.run('fsh_****')` (using a shared token) or `fused.run('my_udf_name')` (using the UDF name directly).
  - When not sure what a UDF called like this does, you can load its code with `udf = fused.load('fsh_***')`, and `udf.code` will show what that UDF does.
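One way to check the "typed and with defaults" rule before saving is to inspect the plain function's signature. This is a sketch using only the standard library; `untyped_or_undefaulted` and the example function `my_udf` are hypothetical, and a plain `tuple` annotation stands in for `fused.types.Bounds` so the snippet runs without Fused installed.

```python
import inspect


def untyped_or_undefaulted(func) -> list:
    """Return names of parameters missing a type annotation or a default."""
    empty = inspect.Parameter.empty
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.annotation is empty or param.default is empty
    ]


# Stand-in for a UDF body; in a real UDF `bounds` would be annotated with
# fused.types.Bounds instead of tuple. `path` has no default and `year`
# has no type, so both break the rule.
def my_udf(path: str, bounds: tuple = None, year=2020):
    return path, bounds, year


print(untyped_or_undefaulted(my_udf))
```

An empty list means every input is both typed and has a default.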
- `common.read_tiff` and `common.mosaic_tiff` both take `bounds` as a named argument, but it doesn't actually take a list of 4 values; it takes a `tile` `gdf`. For both, you should first do something like: `tile = common.get_tiles(bounds, target_num_tiles=4, clip=True)`
- Queries on slow file formats like .csv, .geojson, etc. can be wrapped in a `@fused.cache` function. This makes fetching the data faster because repeated calls are served from the Fused cache:

      import fused
      import geopandas as gpd

      @fused.cache
      def get_data(path):
          # Read the file once; later calls with the same path hit the cache
          return gpd.read_file(path)

      file = get_data(path)

- You might also want to use caching when calling files on buckets or servers, to avoid being rate limited.
- You need to first rename the local directory of a UDF to be able to get the catalog URL. If you made a UDF called `my_udf` and saved it locally before sending it to Fused, you'll have `my_udf/` locally and a UDF on the Fused server also called `my_udf`. But `fused.load("my_udf")` resolves the local directory before the Fused server one. So rename the local directory to `my_udf_local/` before calling `fused.load()` when getting the catalog URL of the UDF.
- Whenever you give me links, make sure to write them as https://url/to/page so I can click on them when you show them to me.
- Give me a highlight of the UDF you built, organized under these broad categories:
  - 🔧 Key Features:
  - 📋 UDF Details:
  - 🔗 Catalog URL:
  - 🎯 How It Works: