lmmx/dc8f01157c97ff8bf6ef1f7ecc5d995f
Last active November 6, 2025 00:57
Loaded node types into Parquet via polars-genson (see https://gist.github.com/lmmx/ed3dd70ea7997f27efa1ff31b625c0b1).
Author
Looking more closely at that function_item:
funcs_types = funcs.explode("types").rename({"type":"symbol"}).with_columns(pl.col("types").struct.rename_fields(["field_type","named_field"])).unnest("types")
funcs_types.rename({"multiple":"_multiple","required":"_required"}).filter(pl.col("symbol") == "function_item").unnest("children")
funcs_types.rename({"multiple":"_multiple","required":"_required"}).filter(pl.col("symbol") == "function_item").unnest("children").explode("types")
funcs_types.rename({"multiple":"_multiple","required":"_required"}).filter(pl.col("symbol") == "function_item").unnest("children").explode("types").unnest("types")
We can look into that more easily if we rename the struct fields (to field_type and named_field), so that the field's type doesn't clash with the symbol's type.
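The rename-then-unnest steps in the polars chain can be mirrored in plain Python to make the name clash concrete. This is a minimal sketch assuming the input follows tree-sitter's node-types.json layout (a list of entries with type, named and a fields mapping of field name to {multiple, required, types}); the NODE_TYPES sample and the flatten_fields helper are hypothetical and abbreviated, not the gist's actual data or code.

```python
import json

# Hypothetical, abbreviated sample in the shape of a tree-sitter node-types.json
# file (normally loaded with json.load); the entries are illustrative only.
NODE_TYPES = json.loads("""
[
  {"type": "function_item", "named": true,
   "fields": {
     "name": {"multiple": false, "required": true,
              "types": [{"type": "identifier", "named": true},
                        {"type": "metavariable", "named": true}]},
     "body": {"multiple": false, "required": true,
              "types": [{"type": "block", "named": true}]}
   }}
]
""")


def flatten_fields(node_types):
    """Return one row per (symbol, field, field type), like explode + unnest.

    The inner {"type", "named"} keys are renamed to field_type / named_field
    so they cannot collide with the outer node's own "type" and "named".
    """
    rows = []
    for node in node_types:
        for field_name, spec in node.get("fields", {}).items():
            for t in spec["types"]:  # explode: one row per possible type
                rows.append({
                    "symbol": node["type"],  # outer "type" renamed to symbol
                    "field": field_name,
                    "multiple": spec["multiple"],
                    "required": spec["required"],
                    "field_type": t["type"],
                    "named_field": t["named"],
                })
    return rows


for row in flatten_fields(NODE_TYPES):
    print(row)
```

Without the rename, each row would need two different values under the single key "type" (the symbol and the field's type), which is exactly the collision polars reports when unnesting.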
If we look at the values, they're basically one-to-one with the key name of the unpacked field key-value pair (i.e. the field name of the symbol), with some exceptions.
There is obviously more here.
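One way to check the one-to-one observation is to collect, for each (symbol, field name) pair, the set of field types it can hold, and report the pairs that hold more than one. A minimal sketch, reading "one-to-one" as one field type per field name; the ROWS sample and the field_type_map / exceptions helpers are hypothetical, and the rows only mimic the shape of the flattened table rather than reproducing real output.

```python
from collections import defaultdict

# Hypothetical flattened rows (symbol, field name, field type), standing in
# for the exploded/unnested table; illustrative values, not real output.
ROWS = [
    ("function_item", "name", "identifier"),
    ("function_item", "name", "metavariable"),
    ("function_item", "parameters", "parameters"),
    ("function_item", "body", "block"),
]


def field_type_map(rows):
    """Map each (symbol, field name) pair to the set of field types it allows."""
    mapping = defaultdict(set)
    for symbol, field, field_type in rows:
        mapping[(symbol, field)].add(field_type)
    return dict(mapping)


def exceptions(mapping):
    """Keep only the pairs that are not one-to-one, i.e. allow several types."""
    return {key: sorted(types) for key, types in mapping.items() if len(types) > 1}


print(exceptions(field_type_map(ROWS)))
# prints {('function_item', 'name'): ['identifier', 'metavariable']}
```

Keying on (symbol, field) rather than the bare field name keeps a field like name from looking ambiguous just because different symbols give it different types.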
In terms of reliable targets, I would expect fields with required: true to be useful, because we could always find them when looking at some semantic object (e.g. here the block field under the body key). We could then write a program that conditionally extends our match range to the other, optional, parts of the AST, based on a check for them in the surrounding nodes.
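The idea of anchoring on a required field and then conditionally widening the match can be sketched as below. It assumes field specs in the node-types.json fields shape (trimmed to the required flag) and a hypothetical present mapping from field name to (start, end) byte offsets for the fields actually found on one parsed node; FUNCTION_ITEM_FIELDS, split_fields and match_range are illustrative names, not part of tree-sitter or any other library.

```python
# Hypothetical field specs for one symbol, in the node-types.json "fields"
# shape but trimmed to the "required" flag; illustrative values only.
FUNCTION_ITEM_FIELDS = {
    "name": {"required": True},
    "type_parameters": {"required": False},
    "parameters": {"required": True},
    "return_type": {"required": False},
    "body": {"required": True},
}


def split_fields(fields):
    """Partition field names into (required, optional), keeping their order."""
    required = [name for name, spec in fields.items() if spec["required"]]
    optional = [name for name, spec in fields.items() if not spec["required"]]
    return required, optional


def match_range(fields, present, anchor):
    """Start from a required anchor field's span, then widen it to cover any
    optional fields that are actually present on the node.

    `present` maps field name -> (start, end) byte offsets for the fields found
    on one parsed node (a hypothetical input shape).
    """
    if not fields[anchor]["required"]:
        raise ValueError(f"{anchor!r} is optional, so it is not a reliable anchor")
    if anchor not in present:
        raise ValueError(f"required field {anchor!r} not found on this node")
    start, end = present[anchor]
    _, optional = split_fields(fields)
    for name in optional:
        if name in present:  # conditional extension: only if the part exists
            s, e = present[name]
            start, end = min(start, s), max(end, e)
    return start, end


print(match_range(FUNCTION_ITEM_FIELDS, {"body": (16, 30)}, "body"))  # (16, 30)
print(match_range(FUNCTION_ITEM_FIELDS, {"body": (16, 30), "return_type": (12, 15)}, "body"))  # (12, 30)
```

Anchoring on a required field means the lookup should not fail for a well-formed node, while the optional fields can only ever widen the range, never break the match.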