Config:
ROOT_DIRECTORY: "BLANK" (e.g., "/" or "_site/")
OTHER_FILE_TYPES: "BLANK" (e.g., ".txt, .rtf, .bibtex")
You are an AI assistant tasked with creating an llms.txt file for an academic researcher's website.
Objective:
Your goal is to generate a single Markdown file that will serve as a structured guide for other Large Language Models (LLMs). This file, named llms.txt, helps them efficiently discover and understand the most important professional and academic content on the site.
Background on llms.txt:
Think of llms.txt as a "treasure map" for AI. It is a proposed standard that provides a curated list of key resources on a website. Unlike robots.txt, which tells crawlers which paths to stay out of, llms.txt is for guidance. It is typically a Markdown file containing:
- An H1 heading with the site's title.
- A blockquote with a concise, high-level summary.
- A bulleted list of links to the most valuable pages, each with a short description.
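The three elements listed above can be checked mechanically. The following is a minimal sketch, not part of the llms.txt proposal itself: the function name `check_llms_txt` and the exact patterns are assumptions made for illustration, and the check only looks for the H1, the blockquote, and at least one bulleted link.

```python
from __future__ import annotations

import re

# Hypothetical helper: matches "- [Title](url)" with an optional ": description".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)]+\)(: .+)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is present."""
    # Ignore blank lines so spacing between elements does not matter.
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line is not an H1 heading")
    if not any(line.startswith("> ") for line in lines):
        problems.append("no blockquote summary found")
    if not any(LINK_ITEM.match(line) for line in lines):
        problems.append("no bulleted link items found")
    return problems
```

An empty return value only means the three elements are present; it says nothing about whether the links or descriptions are accurate.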
Instructions:
- Analyze the Source:
  - Thoroughly scan all relevant files within the [ROOT_DIRECTORY] and all of its subdirectories. This analysis should primarily target .html files and any plain-text source files used to generate the site's content, such as [OTHER_FILE_TYPES, e.g., .md, .qmd].
  - Take the values of ROOT_DIRECTORY and OTHER_FILE_TYPES from the Config section at the top of the prompt.
  - If ROOT_DIRECTORY or OTHER_FILE_TYPES is "BLANK", you MUST STOP and tell the user to fill out the Config section at the top of the prompt.
- Generate the llms.txt Content:
  - Your output must be a single Markdown code block, ready to be saved as llms.txt.
- Follow this Structure and Content Guidance:
- H1 Title:
  - Infer the researcher's full name and primary affiliation (e.g., university or lab).
  - Format it as:
    # Dr. Jane Doe, Example University
- Summary Blockquote:
  - Synthesize a one-sentence summary describing the researcher's main field and specific area of focus (e.g., computational linguistics or protein folding).
  - Format it as:
    > Research focuses on [field] with an emphasis on [specifics].
- Curated Link List:
  - From the files you scanned, identify the most important pages. Prioritize content that is most relevant to a researcher's professional identity.
  - Organize the links under the following categories: Key Documents, Research Focus, Publications, and Resources.
  - For each page you select, create a list item with the page's title, its relative URL from the _site root, and a brief description.
  - Prioritization Guide for Selecting Pages:
    - Key Documents (Highest Priority): Find pages for the researcher's Curriculum Vitae (CV) or Resume, and their primary contact information.
    - Research Focus: Find pages describing their main research statement, interests, or specific ongoing projects.
    - Publications: Find the main page that lists published papers, articles, or conference proceedings.
    - Resources: Find pages linking to open-source software, code repositories (like GitHub), or public datasets they have released.
- Example Output Format:
# Dr. Alán Varela, Institute for Computational Science
> Research focuses on generative models and reinforcement learning with applications in natural language understanding.
## Key Documents
- [Curriculum Vitae](/cv.html): A complete list of publications, grants, teaching, and professional service.
- [Contact Information](/contact/): How to get in touch for collaboration or inquiries.
## Research Focus
- [Research Statement](/research/): An overview of my long-term research agenda and methodologies.
- [Active Projects](/projects/): Details on current work, including the 'Cognate' and 'Lexi-Sim' projects.
## Publications
- [Publications List](/publications/): A filterable list of all peer-reviewed papers and pre-prints.
## Resources
- [GitHub Profile](https://github.com/user-example): Open-source code for projects and research tools.
- [Hugging Face Models](https://huggingface.co/user-example): Publicly available pre-trained models and datasets.
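
For readers who want to reproduce the steps above in a script rather than through a prompt, the following is a rough sketch under stated assumptions. The names `Link`, `require_config`, `find_source_files`, and `render_llms_txt` are hypothetical; always including .html files, skipping empty categories, and omitting blank lines between elements (as in the example output) are choices made for this illustration, not requirements stated in the prompt. Selecting and describing the pages is still left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Category order taken from the prompt's list of categories.
SECTION_ORDER = ["Key Documents", "Research Focus", "Publications", "Resources"]


@dataclass
class Link:
    title: str
    url: str
    description: str


def require_config(root_directory: str, other_file_types: str) -> None:
    """Stop early when either Config value is still the "BLANK" placeholder."""
    config = (("ROOT_DIRECTORY", root_directory), ("OTHER_FILE_TYPES", other_file_types))
    for name, value in config:
        if value.strip() == "BLANK":
            raise ValueError(f"{name} is BLANK; fill out the Config section first.")


def find_source_files(root_directory: str, other_file_types: str) -> list[Path]:
    """List .html files plus the configured extra types under the root, recursively."""
    extra = {s.strip().lower() for s in other_file_types.split(",") if s.strip()}
    suffixes = {".html"} | extra  # assumption: .html is always scanned
    root = Path(root_directory)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes)


def render_llms_txt(title: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render the H1, the blockquote, and one H2 section per non-empty category."""
    out = [f"# {title}", f"> {summary}"]
    for name in SECTION_ORDER:
        links = sections.get(name, [])
        if not links:
            continue  # assumption: categories with no pages are left out
        out.append(f"## {name}")
        out.extend(f"- [{link.title}]({link.url}): {link.description}" for link in links)
    return "\n".join(out) + "\n"
```

Calling `require_config` before `find_source_files` mirrors the instruction to stop when the Config section has not been filled out.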