API Agent Reference

This module connects BioChatter to external software tools by letting the LLM parameterise the API calls.

Base classes

The abstract base classes

Abstract base classes for API interaction components.

Provides base classes for query builders, fetchers, and interpreters used in API interactions and result processing.

BaseAPIModel

Bases: BaseModel

A base class for all API models.

Includes the default field uuid. The docstring also lists method_name, but the source shown below defines only uuid.

Source code in biochatter/api_agent/base/agent_abc.py
class BaseAPIModel(BaseModel):
    """A base class for all API models.

    Includes default fields `uuid` and `method_name`.
    """

    uuid: str | None = Field(
        None,
        description="Unique identifier for the model instance",
    )
    model_config = ConfigDict(arbitrary_types_allowed=True)

BaseFetcher

Bases: ABC

Abstract base class for fetchers.

A fetcher submits queries (in systems where submission and fetching are separate) and fetches and saves the query results. It has to implement a fetch_results() method, which can wrap a multi-step procedure to submit and retrieve. It should also implement a retry mechanism to account for connectivity issues or processing times.
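The retry requirement described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not BioChatter code: `BaseFetcher` here is a local stand-in that mirrors the abstract fetch_results signature documented in this section, and `EchoFetcher`, `_send`, and the dictionary-based queries are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseFetcher(ABC):
    """Local stand-in mirroring the abstract interface (not the real import)."""

    @abstractmethod
    def fetch_results(self, query_models: list, retries: Optional[int] = 3):
        """Submit the queries and return the results."""


class EchoFetcher(BaseFetcher):
    """Hypothetical fetcher whose transport fails a fixed number of times."""

    def __init__(self, failures_before_success: int = 2):
        self.failures_left = failures_before_success
        self.attempts = 0

    def _send(self, query: dict) -> str:
        # Stand-in for a network call; raises until the failures are used up.
        self.attempts += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated connectivity issue")
        return f"results for {query['term']}"

    def fetch_results(self, query_models: list, retries: Optional[int] = 3) -> str:
        last_error = None
        # Treat None (or 0) as a single attempt.
        for _ in range(retries or 1):
            try:
                return self._send(query_models[0])
            except ConnectionError as error:
                last_error = error  # a real fetcher would wait here before retrying
        raise TimeoutError(f"gave up after {retries} attempts") from last_error


fetcher = EchoFetcher(failures_before_success=2)
result = fetcher.fetch_results([{"term": "TP53"}], retries=3)
```

With two simulated failures and three allowed attempts, the third call succeeds; lowering `retries` to 2 would raise `TimeoutError` instead.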

Source code in biochatter/api_agent/base/agent_abc.py
class BaseFetcher(ABC):
    """Abstract base class for fetchers.

    A fetcher is responsible for submitting queries (in systems where
    submission and fetching are separate) and fetching and saving results of
    queries. It has to implement a `fetch_results()` method, which can wrap a
    multi-step procedure to submit and retrieve. Should implement retry method to
    account for connectivity issues or processing times.
    """

    @abstractmethod
    def fetch_results(
        self,
        query_models: list[BaseModel],
        retries: int | None = 3,
    ):
        """Fetch results by submitting a query.

        Can implement a multi-step procedure if submitting and fetching are
        distinct processes (e.g., in the case of long processing times as in the
        case of BLAST).

        Args:
        ----
            query_models: list of Pydantic models describing the parameterised
                queries

        """

fetch_results(query_models, retries=3) abstractmethod

Fetch results by submitting a query.

Can implement a multi-step procedure if submitting and fetching are distinct processes (e.g., when processing times are long, as with BLAST).

Args:
query_models: list of Pydantic models describing the parameterised queries
retries: maximum number of retries (defaults to 3)
Source code in biochatter/api_agent/base/agent_abc.py
@abstractmethod
def fetch_results(
    self,
    query_models: list[BaseModel],
    retries: int | None = 3,
):
    """Fetch results by submitting a query.

    Can implement a multi-step procedure if submitting and fetching are
    distinct processes (e.g., in the case of long processing times as in the
    case of BLAST).

    Args:
    ----
        query_models: list of Pydantic models describing the parameterised
            queries

    """

BaseInterpreter

Bases: ABC

Abstract base class for result interpreters.

The interpreter is aware of the nature and structure of the results and can extract and summarise information from them.

Source code in biochatter/api_agent/base/agent_abc.py
class BaseInterpreter(ABC):
    """Abstract base class for result interpreters.

    The interpreter is aware of the nature and structure of the results and can
    extract and summarise information from them.
    """

    @abstractmethod
    def summarise_results(
        self,
        question: str,
        conversation_factory: Callable,
        response_text: str,
    ) -> str:
        """Summarise an answer based on the given parameters.

        Args:
        ----
            question (str): The question that was asked.

            conversation_factory (Callable): A function that creates a
                BioChatter conversation.

            response_text (str): The response.text returned from the request.

        Returns:
        -------
            A summary of the answer.

        Todo:
        ----
            Genericise (remove file path and n_lines parameters, and use a
            generic way to get the results). The child classes should manage the
            specifics of the results.

        """

summarise_results(question, conversation_factory, response_text) abstractmethod

Summarise an answer based on the given parameters.


Args:
question (str): The question that was asked.
conversation_factory (Callable): A function that creates a BioChatter conversation.
response_text (str): The response.text returned from the request.

Returns:
A summary of the answer.

Todo:
Genericise (remove file path and n_lines parameters, and use a generic way to get the results). The child classes should manage the specifics of the results.
Source code in biochatter/api_agent/base/agent_abc.py
@abstractmethod
def summarise_results(
    self,
    question: str,
    conversation_factory: Callable,
    response_text: str,
) -> str:
    """Summarise an answer based on the given parameters.

    Args:
    ----
        question (str): The question that was asked.

        conversation_factory (Callable): A function that creates a
            BioChatter conversation.

        response_text (str): The response.text returned from the request.

    Returns:
    -------
        A summary of the answer.

    Todo:
    ----
        Genericise (remove file path and n_lines parameters, and use a
        generic way to get the results). The child classes should manage the
        specifics of the results.

    """

BaseQueryBuilder

Bases: ABC

An abstract base class for query builders.

Source code in biochatter/api_agent/base/agent_abc.py
class BaseQueryBuilder(ABC):
    """An abstract base class for query builders."""

    @property
    def structured_output_prompt(self) -> ChatPromptTemplate:
        """Define a structured output prompt template.

        This provides a default implementation for an API agent that can be
        overridden by subclasses to return a ChatPromptTemplate-compatible
        object.
        """
        return ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    "You are a world class algorithm for extracting information in structured formats.",
                ),
                (
                    "human",
                    "Use the given format to extract information from the following input: {input}",
                ),
                ("human", "Tip: Make sure to answer in the correct format"),
            ],
        )

    @abstractmethod
    def create_runnable(
        self,
        query_parameters: "BaseModel",
        conversation: "Conversation",
    ) -> Callable:
        """Create a runnable object for executing queries.

        Must be implemented by subclasses. Should use the LangChain
        `create_structured_output_runnable` method to generate the Callable.

        Args:
        ----
            query_parameters: A Pydantic data model that specifies the fields of
                the API that should be queried.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """

    @abstractmethod
    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list[BaseModel]:
        """Parameterise a query object.

        Parameterises a Pydantic model with the fields of the API based on the
        given question using a BioChatter conversation instance. Must be
        implemented by subclasses.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The BioChatter conversation object containing the LLM
                that should parameterise the query.

        Returns:
        -------
            A list containing one or more parameterised instance(s) of the query
            object (Pydantic BaseModel).

        """

structured_output_prompt property

Define a structured output prompt template.

This provides a default implementation for an API agent that can be overridden by subclasses to return a ChatPromptTemplate-compatible object.

create_runnable(query_parameters, conversation) abstractmethod

Create a runnable object for executing queries.

Must be implemented by subclasses. Should use the LangChain create_structured_output_runnable method to generate the Callable.


Args:
query_parameters: A Pydantic data model that specifies the fields of the API that should be queried.
conversation: A BioChatter conversation object.

Returns:
A Callable object that can execute the query.
Source code in biochatter/api_agent/base/agent_abc.py
@abstractmethod
def create_runnable(
    self,
    query_parameters: "BaseModel",
    conversation: "Conversation",
) -> Callable:
    """Create a runnable object for executing queries.

    Must be implemented by subclasses. Should use the LangChain
    `create_structured_output_runnable` method to generate the Callable.

    Args:
    ----
        query_parameters: A Pydantic data model that specifies the fields of
            the API that should be queried.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """

parameterise_query(question, conversation) abstractmethod

Parameterise a query object.

Parameterises a Pydantic model with the fields of the API based on the given question using a BioChatter conversation instance. Must be implemented by subclasses.


Args:
question (str): The question to be answered.
conversation: The BioChatter conversation object containing the LLM that should parameterise the query.

Returns:
A list containing one or more parameterised instance(s) of the query object (Pydantic BaseModel).
Source code in biochatter/api_agent/base/agent_abc.py
@abstractmethod
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list[BaseModel]:
    """Parameterise a query object.

    Parameterises a Pydantic model with the fields of the API based on the
    given question using a BioChatter conversation instance. Must be
    implemented by subclasses.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The BioChatter conversation object containing the LLM
            that should parameterise the query.

    Returns:
    -------
        A list containing one or more parameterised instance(s) of the query
        object (Pydantic BaseModel).

    """

BaseTools

Abstract base class for tools.

Source code in biochatter/api_agent/base/agent_abc.py
class BaseTools:
    """Abstract base class for tools."""

    def make_pydantic_tools(self) -> list[BaseAPIModel]:
        """Uses pydantics create_model to create a list of pydantic tools from a dictionary of parameters"""
        tools = []
        for func_name, tool_params in self.tools_params.items():
            tools.append(create_model(func_name, **tool_params, __base__=BaseAPIModel))
        return tools

make_pydantic_tools()

Uses Pydantic's create_model to create a list of Pydantic tools from a dictionary of parameters (the instance's tools_params attribute).

Source code in biochatter/api_agent/base/agent_abc.py
def make_pydantic_tools(self) -> list[BaseAPIModel]:
    """Uses pydantics create_model to create a list of pydantic tools from a dictionary of parameters"""
    tools = []
    for func_name, tool_params in self.tools_params.items():
        tools.append(create_model(func_name, **tool_params, __base__=BaseAPIModel))
    return tools
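make_pydantic_tools reads `self.tools_params`, which the class shown above does not define itself, so a subclass is expected to provide it. The sketch below illustrates the shape of that dictionary with a standard-library analogue: `dataclasses.make_dataclass` is swapped in for Pydantic's `create_model`, and `BaseToolModel`, `ExampleTools`, `make_tools`, and both tool definitions are hypothetical.

```python
from dataclasses import dataclass, field, make_dataclass
from typing import Optional


@dataclass
class BaseToolModel:
    """Stand-in for BaseAPIModel: every generated tool inherits `uuid`."""

    uuid: Optional[str] = None


class ExampleTools:
    """Hypothetical tool collection following the `tools_params` pattern."""

    def __init__(self):
        # Tool name -> {field name: (type, default)}
        self.tools_params = {
            "Normalise": {"target_sum": (float, 1e4)},
            "Cluster": {"resolution": (float, 1.0), "n_iterations": (int, -1)},
        }

    def make_tools(self) -> list:
        tools = []
        for func_name, tool_params in self.tools_params.items():
            # One generated class per tool, analogous to create_model(...).
            spec = [
                (name, typ, field(default=default))
                for name, (typ, default) in tool_params.items()
            ]
            tools.append(make_dataclass(func_name, spec, bases=(BaseToolModel,)))
        return tools


tools = ExampleTools().make_tools()
cluster = tools[1](resolution=0.5)  # unspecified fields keep their defaults
```

Each generated class carries the tool's name and parameters plus the inherited `uuid` field, which is the same idea as passing `__base__=BaseAPIModel` in the source.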

The API Agent

Base API agent module.

APIAgent

Source code in biochatter/api_agent/base/api_agent.py
class APIAgent:
    def __init__(
        self,
        conversation_factory: Callable,
        query_builder: "BaseQueryBuilder",
        fetcher: "BaseFetcher",
        interpreter: "BaseInterpreter",
    ):
        """API agent class to interact with a tool's API for querying and fetching
        results.  The query fields have to be defined in a Pydantic model
        (`BaseModel`) and used (i.e., parameterised by the LLM) in the query
        builder. Specific API agents are defined in submodules of this directory
        (`api_agent`). The agent's logic is implemented in the `execute` method.

        Attributes
        ----------
            conversation_factory (Callable): A function used to create a
                BioChatter conversation, providing LLM access.

            query_builder (BaseQueryBuilder): An instance of a child of the
                BaseQueryBuilder class.

            result_fetcher (BaseFetcher): An instance of a child of the
                BaseFetcher class.

            result_interpreter (BaseInterpreter): An instance of a child of the
                BaseInterpreter class.

        """
        self.conversation_factory = conversation_factory
        self.query_builder = query_builder
        self.fetcher = fetcher
        self.interpreter = interpreter
        self.final_answer = None

    def parameterise_query(self, question: str) -> list[BaseModel] | None:
        """Use LLM to parameterise a query (a list of Pydantic models) based on the given
        question using a BioChatter conversation instance.
        """
        try:
            conversation = self.conversation_factory()
            return self.query_builder.parameterise_query(question, conversation)
        except Exception as e:
            print(f"Error generating query: {e}")
            return None

    def fetch_results(self, query_models: list[BaseModel]) -> str | None:
        """Fetch the results of the query using the individual API's implementation
        (either single-step or submit-retrieve).

        Args:
        ----
            query_models: list of parameterised query Pydantic models

        """
        try:
            return self.fetcher.fetch_results(query_models, 100)
        except Exception as e:
            print(f"Error fetching results: {e}")
            return None

    def summarise_results(
        self,
        question: str,
        response_text: str,
    ) -> str | None:
        """Summarise the retrieved results to extract the answer to the question."""
        try:
            return self.interpreter.summarise_results(
                question=question,
                conversation_factory=self.conversation_factory,
                response_text=response_text,
            )
        except Exception as e:
            print(f"Error extracting answer: {e}")
            return None

    def execute(self, question: str) -> str | None:
        """Wrapper that uses class methods to execute the API agent logic. Consists
        of 1) query generation, 2) query submission, 3) results fetching, and
        4) answer extraction. The final answer is stored in the final_answer
        attribute.

        Args:
        ----
            question (str): The question to be answered.

        """
        # Generate query
        try:
            query_models = self.parameterise_query(question)
            if not query_models:
                raise ValueError("Failed to generate query.")
        except ValueError as e:
            print(e)

        # Fetch results
        try:
            response_text = self.fetch_results(
                query_models=query_models,
            )
            if not response_text:
                raise ValueError("Failed to fetch results.")
        except ValueError as e:
            print(e)

        # Extract answer from results
        try:
            final_answer = self.summarise_results(question, response_text)
            if not final_answer:
                raise ValueError("Failed to extract answer from results.")
        except ValueError as e:
            print(e)

        self.final_answer = final_answer
        return final_answer

    def get_description(self, tool_name: str, tool_desc: str):
        return f"This API agent interacts with {tool_name}'s API for querying and fetching results. {tool_desc}"

__init__(conversation_factory, query_builder, fetcher, interpreter)

API agent class to interact with a tool's API for querying and fetching results. The query fields have to be defined in a Pydantic model (BaseModel) and used (i.e., parameterised by the LLM) in the query builder. Specific API agents are defined in submodules of this directory (api_agent). The agent's logic is implemented in the execute method.

Attributes
conversation_factory (Callable): A function used to create a BioChatter conversation, providing LLM access.
query_builder (BaseQueryBuilder): An instance of a child of the BaseQueryBuilder class.
fetcher (BaseFetcher): An instance of a child of the BaseFetcher class.
interpreter (BaseInterpreter): An instance of a child of the BaseInterpreter class.
Source code in biochatter/api_agent/base/api_agent.py
def __init__(
    self,
    conversation_factory: Callable,
    query_builder: "BaseQueryBuilder",
    fetcher: "BaseFetcher",
    interpreter: "BaseInterpreter",
):
    """API agent class to interact with a tool's API for querying and fetching
    results.  The query fields have to be defined in a Pydantic model
    (`BaseModel`) and used (i.e., parameterised by the LLM) in the query
    builder. Specific API agents are defined in submodules of this directory
    (`api_agent`). The agent's logic is implemented in the `execute` method.

    Attributes
    ----------
        conversation_factory (Callable): A function used to create a
            BioChatter conversation, providing LLM access.

        query_builder (BaseQueryBuilder): An instance of a child of the
            BaseQueryBuilder class.

        result_fetcher (BaseFetcher): An instance of a child of the
            BaseFetcher class.

        result_interpreter (BaseInterpreter): An instance of a child of the
            BaseInterpreter class.

    """
    self.conversation_factory = conversation_factory
    self.query_builder = query_builder
    self.fetcher = fetcher
    self.interpreter = interpreter
    self.final_answer = None

execute(question)

Wrapper that uses the class methods to execute the API agent logic. It consists of 1) query generation, 2) query submission, 3) results fetching, and 4) answer extraction. The final answer is stored in the final_answer attribute and returned. Each stage prints its error on failure rather than raising, so the result can be None.

Args:
question (str): The question to be answered.
Source code in biochatter/api_agent/base/api_agent.py
def execute(self, question: str) -> str | None:
    """Wrapper that uses class methods to execute the API agent logic. Consists
    of 1) query generation, 2) query submission, 3) results fetching, and
    4) answer extraction. The final answer is stored in the final_answer
    attribute.

    Args:
    ----
        question (str): The question to be answered.

    """
    # Generate query
    try:
        query_models = self.parameterise_query(question)
        if not query_models:
            raise ValueError("Failed to generate query.")
    except ValueError as e:
        print(e)

    # Fetch results
    try:
        response_text = self.fetch_results(
            query_models=query_models,
        )
        if not response_text:
            raise ValueError("Failed to fetch results.")
    except ValueError as e:
        print(e)

    # Extract answer from results
    try:
        final_answer = self.summarise_results(question, response_text)
        if not final_answer:
            raise ValueError("Failed to extract answer from results.")
    except ValueError as e:
        print(e)

    self.final_answer = final_answer
    return final_answer
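The flow of execute can be tried out with stub components. Everything in this sketch is a hypothetical stand-in written against the standard library: `StubQueryBuilder`, `StubFetcher`, `StubInterpreter`, and `run_pipeline` are illustrative names, and unlike the source above the sketch returns early as soon as a stage yields nothing.

```python
from typing import Callable, Optional


class StubQueryBuilder:
    def parameterise_query(self, question: str, conversation) -> list:
        # A real builder would ask the LLM to fill in a Pydantic model.
        return [{"query": question.upper()}]


class StubFetcher:
    def fetch_results(self, query_models: list, retries: int = 3) -> str:
        return f"raw response for {query_models[0]['query']}"


class StubInterpreter:
    def summarise_results(
        self, question: str, conversation_factory: Callable, response_text: str
    ) -> str:
        return f"Summary of '{response_text}'"


def run_pipeline(
    question: str,
    conversation_factory: Callable,
    builder,
    fetcher,
    interpreter,
) -> Optional[str]:
    # 1) query generation
    query_models = builder.parameterise_query(question, conversation_factory())
    if not query_models:
        return None
    # 2) and 3) query submission and results fetching
    response_text = fetcher.fetch_results(query_models)
    if not response_text:
        return None
    # 4) answer extraction
    return interpreter.summarise_results(question, conversation_factory, response_text)


answer = run_pipeline(
    "atgc",
    conversation_factory=lambda: None,  # the stubs do not need an LLM
    builder=StubQueryBuilder(),
    fetcher=StubFetcher(),
    interpreter=StubInterpreter(),
)
```

Returning early is a design choice of this sketch: it avoids passing an empty query or response on to the next stage.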

fetch_results(query_models)

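Fetch the results of the query using the individual API's implementation (either single-step or submit-retrieve). The call is delegated to the fetcher with retries set to 100; on error, the message is printed and None is returned.

Args:
query_models: list of parameterised query Pydantic models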
Source code in biochatter/api_agent/base/api_agent.py
def fetch_results(self, query_models: list[BaseModel]) -> str | None:
    """Fetch the results of the query using the individual API's implementation
    (either single-step or submit-retrieve).

    Args:
    ----
        query_models: list of parameterised query Pydantic models

    """
    try:
        return self.fetcher.fetch_results(query_models, 100)
    except Exception as e:
        print(f"Error fetching results: {e}")
        return None

parameterise_query(question)

Uses the LLM to parameterise a query (a list of Pydantic models) based on the given question, using a BioChatter conversation instance. On error, the message is printed and None is returned.

Source code in biochatter/api_agent/base/api_agent.py
def parameterise_query(self, question: str) -> list[BaseModel] | None:
    """Use LLM to parameterise a query (a list of Pydantic models) based on the given
    question using a BioChatter conversation instance.
    """
    try:
        conversation = self.conversation_factory()
        return self.query_builder.parameterise_query(question, conversation)
    except Exception as e:
        print(f"Error generating query: {e}")
        return None

summarise_results(question, response_text)

Summarise the retrieved results to extract the answer to the question.

Source code in biochatter/api_agent/base/api_agent.py
def summarise_results(
    self,
    question: str,
    response_text: str,
) -> str | None:
    """Summarise the retrieved results to extract the answer to the question."""
    try:
        return self.interpreter.summarise_results(
            question=question,
            conversation_factory=self.conversation_factory,
            response_text=response_text,
        )
    except Exception as e:
        print(f"Error extracting answer: {e}")
        return None

Web APIs

The BLAST tool

Module for handling BLAST API interactions.

Provides functionality for building queries, fetching results, and interpreting BLAST (Basic Local Alignment Search Tool) sequence alignment data.

BlastFetcher

Bases: BaseFetcher

A class for retrieving API results from BLAST.

Retrieves results from BLAST given a parameterised BlastQuery.

TODO add a limit of characters to be returned from the response.text?

Source code in biochatter/api_agent/web/blast.py
class BlastFetcher(BaseFetcher):
    """A class for retrieving API results from BLAST.

    Retrieves results from BLAST given a parameterised BlastQuery.

    TODO add a limit of characters to be returned from the response.text?
    """

    def _submit_query(self, request_data: BlastQueryParameters) -> str:
        """POST the BLAST query and retrieve the RID.

        The method submits the structured BlastQuery object and returns the RID.

        Args:
        ----
            request_data: BlastQuery object containing the BLAST query
                parameters.

        Returns:
        -------
            str: The Request ID (RID) for the submitted BLAST query.

        """
        data = {
            "CMD": request_data.cmd,
            "PROGRAM": request_data.program,
            "DATABASE": request_data.database,
            "QUERY": request_data.query,
            "FORMAT_TYPE": request_data.format_type,
            "MEGABLAST": request_data.megablast,
            "HITLIST_SIZE": request_data.max_hits,
        }
        # Include any other_params if provided
        if request_data.other_params:
            data.update(request_data.other_params)
        # Make the API call
        query_string = urlencode(data)
        # Combine base URL with the query string
        full_url = f"{request_data.url}?{query_string}"
        # Print the full URL
        request_data.full_url = full_url
        print("Full URL built by retriever:\n", request_data.full_url)
        response = requests.post(request_data.url, data=data, timeout=10)
        response.raise_for_status()
        # Extract RID from response
        print(response)
        match = re.search(r"RID = (\w+)", response.text)
        if match:
            return match.group(1)

        msg = "RID not found in BLAST submission response."
        raise ValueError(msg)

    def _fetch_results(
        self,
        rid: str,
        question_uuid: str,
        retries: int = 10000,
    ) -> str:
        """Fetch BLAST query data given RID.

        The second function to be called for a BLAST query.
        """
        base_url = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
        check_status_params = {
            "CMD": "Get",
            "FORMAT_OBJECT": "SearchInfo",
            "RID": rid,
        }
        get_results_params = {
            "CMD": "Get",
            "FORMAT_TYPE": "XML",
            "RID": rid,
        }

        # Check the status of the BLAST job
        for attempt in range(retries):
            status_response = requests.get(base_url, params=check_status_params, timeout=10)
            status_response.raise_for_status()
            status_text = status_response.text
            print("evaluating status")
            if "Status=WAITING" in status_text:
                print(f"{question_uuid} results not ready, waiting...")
                time.sleep(15)
            elif "Status=FAILED" in status_text:
                msg = "BLAST query FAILED."
                raise RuntimeError(msg)
            elif "Status=UNKNOWN" in status_text:
                msg = "BLAST query expired or does not exist."
                raise RuntimeError(msg)
            elif "Status=READY" in status_text:
                if "ThereAreHits=yes" in status_text:
                    print(f"{question_uuid} results are ready, retrieving.")
                    results_response = requests.get(
                        base_url,
                        params=get_results_params,
                        timeout=10,
                    )
                    results_response.raise_for_status()
                    return results_response.text
                return "No hits found"
            if attempt == retries - 1:
                msg = "Maximum attempts reached. Results may not be ready."
                raise TimeoutError(msg)
        return None

    def fetch_results(
        self,
        query_models: list[BlastQueryParameters],
        retries: int = 20,
    ) -> str:
        """Submit request and fetch results from BLAST API.

        Wraps individual submission and retrieval of results.

        Args:
        ----
            query_models: list of Pydantic models of the queries
            retries: the number of maximum retries

        Returns:
        -------
            str: the result from the BLAST API

        """
        # For now, we only use the first query in the list
        query = query_models[0]
        rid = self._submit_query(request_data=query)
        return self._fetch_results(
            rid=rid,
            question_uuid=query.question_uuid,
            retries=retries,
        )
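_submit_query URL-encodes the request parameters for logging and extracts the request ID (RID) from the response with a regular expression. The sketch below reproduces those two steps offline with the standard library. The parameter values and the sample response text are made up for illustration, `extract_rid` is a hypothetical helper, and no request is sent.

```python
import re
from urllib.parse import urlencode

# Illustrative parameter values; a real query is filled in by the LLM.
data = {
    "CMD": "Put",
    "PROGRAM": "blastn",
    "DATABASE": "nt",
    "QUERY": "ATGCGTACGTAGC",
}
base_url = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
full_url = f"{base_url}?{urlencode(data)}"

# Made-up stand-in for the text of a submission response.
sample_response = "QBlastInfoBegin\n    RID = ABC123XYZ\n    RTOE = 30\nQBlastInfoEnd"


def extract_rid(text: str) -> str:
    """Return the request ID, using the same pattern as `_submit_query`."""
    match = re.search(r"RID = (\w+)", text)
    if match:
        return match.group(1)
    raise ValueError("RID not found in BLAST submission response.")


rid = extract_rid(sample_response)
```

Keeping the parsing in a small function like this makes the RID extraction testable without contacting NCBI.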

fetch_results(query_models, retries=20)

Submit request and fetch results from BLAST API.

Wraps individual submission and retrieval of results. For now, only the first query in the list is used.

Args:
query_models: list of Pydantic models of the queries
retries: the maximum number of retries

Returns:
str: the result from the BLAST API
Source code in biochatter/api_agent/web/blast.py
def fetch_results(
    self,
    query_models: list[BlastQueryParameters],
    retries: int = 20,
) -> str:
    """Submit request and fetch results from BLAST API.

    Wraps individual submission and retrieval of results.

    Args:
    ----
        query_models: list of Pydantic models of the queries
        retries: the number of maximum retries

    Returns:
    -------
        str: the result from the BLAST API

    """
    # For now, we only use the first query in the list
    query = query_models[0]
    rid = self._submit_query(request_data=query)
    return self._fetch_results(
        rid=rid,
        question_uuid=query.question_uuid,
        retries=retries,
    )
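The status handling used when polling for BLAST results can be exercised without network access by separating the loop from the HTTP call. In this sketch, `poll_until_ready` and `get_status` are hypothetical names, the status strings are the ones matched in the BlastFetcher source, and the wait time is a parameter so that the example runs instantly.

```python
import time
from typing import Callable


def poll_until_ready(
    get_status: Callable[[], str],
    retries: int = 20,
    wait_seconds: float = 15.0,
) -> bool:
    """Poll until the job is ready; return True if there are hits.

    `get_status` is a hypothetical callable returning the status text.
    """
    for _ in range(retries):
        status_text = get_status()
        if "Status=WAITING" in status_text:
            time.sleep(wait_seconds)
        elif "Status=FAILED" in status_text:
            raise RuntimeError("BLAST query FAILED.")
        elif "Status=UNKNOWN" in status_text:
            raise RuntimeError("BLAST query expired or does not exist.")
        elif "Status=READY" in status_text:
            return "ThereAreHits=yes" in status_text
    raise TimeoutError("Maximum attempts reached. Results may not be ready.")


# Simulated sequence of status responses: two waits, then ready with hits.
responses = iter(["Status=WAITING", "Status=WAITING", "Status=READY\nThereAreHits=yes"])
has_hits = poll_until_ready(lambda: next(responses), retries=5, wait_seconds=0)
```

Injecting the status source keeps the retry budget and the wait time easy to test; a real fetcher would pass a function that performs the status request.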

BlastInterpreter

Bases: BaseInterpreter

A class for interpreting BLAST results.

Source code in biochatter/api_agent/web/blast.py
class BlastInterpreter(BaseInterpreter):
    """A class for interpreting BLAST results."""

    def summarise_results(
        self,
        question: str,
        conversation_factory: Callable,
        response_text: str,
    ) -> str:
        """Extract the answer from the BLAST results.

        Args:
        ----
            question (str): The question to be answered.
            conversation_factory: A BioChatter conversation object.
            response_text (str): The response.text returned by NCBI.

        Returns:
        -------
            str: The extracted answer from the BLAST results.

        """
        prompt = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    "You are a world class molecular biologist who knows everything about NCBI and BLAST results.",
                ),
                ("user", "{input}"),
            ],
        )
        summary_prompt = BLAST_SUMMARY_PROMPT.format(
            question=question,
            context=response_text,
        )
        output_parser = StrOutputParser()
        conversation = conversation_factory()
        chain = prompt | conversation.chat | output_parser
        return chain.invoke({"input": {summary_prompt}})

summarise_results(question, conversation_factory, response_text)

Extract the answer from the BLAST results.


Args:
question (str): The question to be answered.
conversation_factory (Callable): A function that creates a BioChatter conversation.
response_text (str): The response.text returned by NCBI.

Returns:
str: The extracted answer from the BLAST results.
Source code in biochatter/api_agent/web/blast.py
def summarise_results(
    self,
    question: str,
    conversation_factory: Callable,
    response_text: str,
) -> str:
    """Extract the answer from the BLAST results.

    Args:
    ----
        question (str): The question to be answered.
        conversation_factory: A BioChatter conversation object.
        response_text (str): The response.text returned by NCBI.

    Returns:
    -------
        str: The extracted answer from the BLAST results.

    """
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a world class molecular biologist who knows everything about NCBI and BLAST results.",
            ),
            ("user", "{input}"),
        ],
    )
    summary_prompt = BLAST_SUMMARY_PROMPT.format(
        question=question,
        context=response_text,
    )
    output_parser = StrOutputParser()
    conversation = conversation_factory()
    chain = prompt | conversation.chat | output_parser
    return chain.invoke({"input": {summary_prompt}})
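The summary prompt is assembled with `str.format` before it is handed to the chain. The sketch below shows that step in isolation, assuming a template with `question` and `context` placeholders; `SUMMARY_TEMPLATE` and its wording are hypothetical and not the actual BLAST_SUMMARY_PROMPT. It also shows that, in Python, writing `{summary_prompt}` as a dictionary value creates a one-element set, so the sketch passes the string itself.

```python
# Hypothetical template; the real BLAST_SUMMARY_PROMPT is defined elsewhere.
SUMMARY_TEMPLATE = (
    "Answer the question using only the results below.\n"
    "Question: {question}\n"
    "Results: {context}"
)

summary_prompt = SUMMARY_TEMPLATE.format(
    question="Which organism does this sequence come from?",
    context="(XML text returned by BLAST)",
)

# Pass the formatted string itself as the chain input.
chain_input = {"input": summary_prompt}

# Wrapping the variable in braces builds a set containing the string instead.
wrapped = {"input": {summary_prompt}}
```

Whether the set form affects the final prompt depends on how the prompt template renders its input, which this sketch does not cover.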

BlastQueryBuilder

Bases: BaseQueryBuilder

A class for building a BlastQuery object.

Source code in biochatter/api_agent/web/blast.py
class BlastQueryBuilder(BaseQueryBuilder):
    """A class for building a BlastQuery object."""

    def create_runnable(
        self,
        query_parameters: "BlastQueryParameters",
        conversation: "Conversation",
    ) -> Callable:
        """Create a runnable object for executing queries.

        Creates a runnable using the LangChain
        `create_structured_output_runnable` method.

        Args:
        ----
            query_parameters: A Pydantic data model that specifies the fields of
                the API that should be queried.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """
        return create_structured_output_runnable(
            output_schema=query_parameters,
            llm=conversation.chat,
            prompt=self.structured_output_prompt,
        )

    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list[BlastQueryParameters]:
        """Generate a BlastQuery object.

        Generates the object based on the given question, prompt, and
        BioChatter conversation. Uses a Pydantic model to define the API fields.
        Creates a runnable that can be invoked on LLMs that are qualified to
        parameterise functions.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The conversation object used for parameterising the
                BlastQuery.

        Returns:
        -------
            list[BlastQueryParameters]: a single-element list containing the
                parameterised query object (Pydantic model)

        """
        runnable = self.create_runnable(
            query_parameters=BlastQueryParameters,
            conversation=conversation,
        )
        blast_call_obj = runnable.invoke(
            {"input": f"Answer:\n{question} based on:\n {BLAST_QUERY_PROMPT}"},
        )
        blast_call_obj.question_uuid = str(uuid.uuid4())
        return [blast_call_obj]

create_runnable(query_parameters, conversation)

Create a runnable object for executing queries.

Creates a runnable using the LangChain create_structured_output_runnable method.


Args:

query_parameters: A Pydantic data model that specifies the fields of the API that should be queried.

conversation: A BioChatter conversation object.

Returns:

A Callable object that can execute the query.

Source code in biochatter/api_agent/web/blast.py
def create_runnable(
    self,
    query_parameters: "BlastQueryParameters",
    conversation: "Conversation",
) -> Callable:
    """Create a runnable object for executing queries.

    Creates a runnable using the LangChain
    `create_structured_output_runnable` method.

    Args:
    ----
        query_parameters: A Pydantic data model that specifies the fields of
            the API that should be queried.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """
    return create_structured_output_runnable(
        output_schema=query_parameters,
        llm=conversation.chat,
        prompt=self.structured_output_prompt,
    )

parameterise_query(question, conversation)

Generate a BlastQuery object.

Generates the object based on the given question, prompt, and BioChatter conversation. Uses a Pydantic model to define the API fields. Creates a runnable that can be invoked on LLMs that are qualified to parameterise functions.


Args:

question (str): The question to be answered.

conversation: The conversation object used for parameterising the BlastQuery.

Returns:

list[BlastQueryParameters]: a single-element list containing the parameterised query object (Pydantic model)

Source code in biochatter/api_agent/web/blast.py
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list[BlastQueryParameters]:
    """Generate a BlastQuery object.

    Generates the object based on the given question, prompt, and
    BioChatter conversation. Uses a Pydantic model to define the API fields.
    Creates a runnable that can be invoked on LLMs that are qualified to
    parameterise functions.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The conversation object used for parameterising the
            BlastQuery.

    Returns:
    -------
        list[BlastQueryParameters]: a single-element list containing the
            parameterised query object (Pydantic model)

    """
    runnable = self.create_runnable(
        query_parameters=BlastQueryParameters,
        conversation=conversation,
    )
    blast_call_obj = runnable.invoke(
        {"input": f"Answer:\n{question} based on:\n {BLAST_QUERY_PROMPT}"},
    )
    blast_call_obj.question_uuid = str(uuid.uuid4())
    return [blast_call_obj]
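
The control flow of `parameterise_query` (invoke the runnable, stamp a fresh UUID, wrap the result in a list) can be followed offline with stand-ins. The sketch below is only an illustration: `DemoQueryParameters` and `StubRunnable` are hypothetical names that replace the Pydantic model and the LLM-backed runnable, and the sequence is an arbitrary string.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DemoQueryParameters:
    """Hypothetical stand-in for a Pydantic parameter model."""

    program: str = "blastn"
    database: str = "nt"
    query: Optional[str] = None
    question_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


class StubRunnable:
    """Hypothetical stand-in for the structured-output runnable.

    A real runnable would ask the LLM to fill in the fields; this stub
    returns fixed values so the flow can be run without a model.
    """

    def invoke(self, inputs: dict) -> DemoQueryParameters:
        # Pretend the LLM extracted the sequence from the prompt text.
        sequence = inputs["input"].split("SEQUENCE:")[-1].strip()
        return DemoQueryParameters(query=sequence)


def parameterise_query(question: str, runnable: StubRunnable) -> list:
    """Mirror the documented flow: invoke, stamp a UUID, wrap in a list."""
    call_obj = runnable.invoke({"input": f"Answer:\n{question}"})
    call_obj.question_uuid = str(uuid.uuid4())
    return [call_obj]


queries = parameterise_query(
    "Which organism is this? SEQUENCE: ATGCGT",
    StubRunnable(),
)
print(queries[0].program, queries[0].query)
```

The list return type is what lets a fetcher accept several queries at once, even though only one is produced per question here.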

BlastQueryParameters

Bases: BaseModel

Pydantic model for the parameters of a BLAST query request.

The class is used for configuring and sending a request to the NCBI BLAST query API. The fields are dynamically configured by the LLM based on the user's question.

Source code in biochatter/api_agent/web/blast.py
class BlastQueryParameters(BaseModel):
    """Pydantic model for the parameters of a BLAST query request.

    The class is used for configuring and sending a request to the NCBI BLAST
    query API. The fields are dynamically configured by the LLM based on the
    user's question.

    """

    url: str | None = Field(
        default="https://blast.ncbi.nlm.nih.gov/Blast.cgi?",
        description="ALWAYS USE DEFAULT, DO NOT CHANGE",
    )
    cmd: str | None = Field(
        default="Put",
        description="Command to execute, 'Put' for submitting query, 'Get' for retrieving results.",
    )
    program: str | None = Field(
        default="blastn",
        description=(
            "BLAST program to use, e.g., 'blastn' for nucleotide-nucleotide BLAST, "
            "'blastp' for protein-protein BLAST."
        ),
    )
    database: str | None = Field(
        default="nt",
        description=(
            "Database to search, e.g., 'nt' for nucleotide database, 'nr' for "
            "non redundant protein database, 'pdb' the Protein Data Bank "
            "database, which is used specifically for protein structures, "
            "'refseq_rna' and 'refseq_genomic': specialized databases for "
            "RNA sequences and genomic sequences"
        ),
    )
    query: str | None = Field(
        None,
        description=(
            "Nucleotide or protein sequence for the BLAST or blat query, "
            "make sure to always keep the entire sequence given."
        ),
    )
    format_type: str | None = Field(
        default="Text",
        description="Format of the BLAST results, e.g., 'Text', 'XML'.",
    )
    rid: str | None = Field(
        None,
        description="Request ID for retrieving BLAST results.",
    )
    other_params: dict | None = Field(
        default={"email": "user@example.com"},
        description="Other optional BLAST parameters, including user email.",
    )
    max_hits: int | None = Field(
        default=15,
        description="Maximum number of hits to return in the BLAST results.",
    )
    sort_by: str | None = Field(
        default="score",
        description="Criterion to sort BLAST results by, e.g., 'score', 'evalue'.",
    )
    megablast: str | None = Field(
        default="on",
        description="Set to 'on' for human genome alignments",
    )
    question_uuid: str | None = Field(
        default_factory=lambda: str(uuid.uuid4()),
        description="Unique identifier for the question.",
    )
    full_url: str | None = Field(
        default="TBF",
        description="Full URL to be used to submit the BLAST query",
    )
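
The model holds field values only. As a rough sketch of how such values could end up in a request, the example below assumes that the fields map onto the upper-case parameter names of the NCBI BLAST URL API (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `FORMAT_TYPE`, `MEGABLAST`, `HITLIST_SIZE`). The helper `build_submission_url`, the mapping, and the example sequence are illustrative assumptions and not the fetcher's implementation.

```python
from urllib.parse import urlencode

# Field values as they might come out of a parameterised query.
# The sequence is an arbitrary illustrative string, not real data.
params = {
    "url": "https://blast.ncbi.nlm.nih.gov/Blast.cgi?",
    "cmd": "Put",
    "program": "blastn",
    "database": "nt",
    "query": "ATGCGTACGTTAGC",
    "format_type": "Text",
    "megablast": "on",
    "max_hits": 15,
}


def build_submission_url(p: dict) -> str:
    """Assemble a submission URL from the model fields (hypothetical helper)."""
    query_string = urlencode(
        {
            "CMD": p["cmd"],
            "PROGRAM": p["program"],
            "DATABASE": p["database"],
            "QUERY": p["query"],
            "FORMAT_TYPE": p["format_type"],
            "MEGABLAST": p["megablast"],
            "HITLIST_SIZE": p["max_hits"],
        }
    )
    # The default `url` already ends in "?", so the query string is appended directly.
    return p["url"] + query_string


print(build_submission_url(params))
```

Using `urlencode` rather than manual string joining keeps sequences with special characters safely escaped.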

The OncoKB tool

OncoKB API agent.

OncoKBFetcher

Bases: BaseFetcher

A class for retrieving API results.

Retrieve from OncoKB given a parameterized OncoKBQuery.

Source code in biochatter/api_agent/web/oncokb.py
class OncoKBFetcher(BaseFetcher):
    """A class for retrieving API results.

    Retrieve from OncoKB given a parameterized OncoKBQuery.
    """

    def __init__(self, api_token="demo"):
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Accept": "application/json",
        }
        self.base_url = "https://demo.oncokb.org/api/v1"

    def fetch_results(
        self,
        request_data: list[OncoKBQueryParameters],
        retries: int | None = 3,
    ) -> str:
        """Submit the OncoKB query and fetch the results directly.

        No multi-step procedure, thus no wrapping of submission and retrieval in
        this case.

        Args:
        ----
            request_data: List of OncoKBQuery objects (Pydantic models)
                containing the OncoKB query parameters.

            retries: The number of retries to fetch the results.

        Returns:
        -------
            str: The results of the OncoKB query.

        """
        # For now, we only use the first query in the list
        query = request_data[0]

        # Submit the query and get the URL
        params = query.dict(exclude_unset=True)
        endpoint = params.pop("endpoint")
        params.pop("question_uuid")
        full_url = f"{self.base_url}/{endpoint}"
        response = requests.get(full_url, headers=self.headers, params=params, timeout=30)
        response.raise_for_status()

        # Fetch the results from the URL
        results_response = requests.get(response.url, headers=self.headers, timeout=30)
        results_response.raise_for_status()

        return results_response.text

fetch_results(request_data, retries=3)

Submit the OncoKB query and fetch the results directly.

No multi-step procedure, thus no wrapping of submission and retrieval in this case.


Args:

request_data: List of OncoKBQuery objects (Pydantic models) containing the OncoKB query parameters.

retries: The number of retries to fetch the results.

Returns:

str: The results of the OncoKB query.

Source code in biochatter/api_agent/web/oncokb.py
def fetch_results(
    self,
    request_data: list[OncoKBQueryParameters],
    retries: int | None = 3,
) -> str:
    """Submit the OncoKB query and fetch the results directly.

    No multi-step procedure, thus no wrapping of submission and retrieval in
    this case.

    Args:
    ----
        request_data: List of OncoKBQuery objects (Pydantic models)
            containing the OncoKB query parameters.

        retries: The number of retries to fetch the results.

    Returns:
    -------
        str: The results of the OncoKB query.

    """
    # For now, we only use the first query in the list
    query = request_data[0]

    # Submit the query and get the URL
    params = query.dict(exclude_unset=True)
    endpoint = params.pop("endpoint")
    params.pop("question_uuid")
    full_url = f"{self.base_url}/{endpoint}"
    response = requests.get(full_url, headers=self.headers, params=params, timeout=30)
    response.raise_for_status()

    # Fetch the results from the URL
    results_response = requests.get(response.url, headers=self.headers, timeout=30)
    results_response.raise_for_status()

    return results_response.text
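
`fetch_results` accepts a `retries` argument, but the listing above does not use it. One way a retry loop could be layered on top is sketched below. `with_retries` and `flaky_fetch` are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the HTTP client raises (with `requests`, that would more likely be `requests.RequestException`).

```python
import time
from typing import Callable


def with_retries(action: Callable[[], str], retries: int = 3, delay: float = 0.0) -> str:
    """Call `action` up to `retries` times and re-raise the last error."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except ConnectionError as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < retries:
                time.sleep(delay)
    raise last_error


# Simulated endpoint that fails twice before answering.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return '{"status": "ok"}'


result = with_retries(flaky_fetch, retries=3)
print(result, calls["count"])
```

Passing the request as a zero-argument callable keeps the retry logic independent of any particular API.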

OncoKBInterpreter

Bases: BaseInterpreter

Source code in biochatter/api_agent/web/oncokb.py
class OncoKBInterpreter(BaseInterpreter):
    def summarise_results(
        self,
        question: str,
        conversation_factory: Callable,
        response_text: str,
    ) -> str:
        """Extract the answer from the OncoKB results.

        Args:
        ----
            question (str): The question to be answered.
            conversation_factory: A callable that returns a BioChatter
                conversation object.
            response_text (str): The response.text returned by OncoKB.

        Returns:
        -------
            str: The extracted answer from the OncoKB results.

        """
        prompt = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    "You are a world class molecular biologist who knows "
                    "everything about OncoKB and cancer genomics. Your task is "
                    "to interpret results from OncoKB API calls and summarise "
                    "them for the user.",
                ),
                ("user", "{input}"),
            ],
        )
        summary_prompt = ONCOKB_SUMMARY_PROMPT.format(
            question=question,
            context=response_text,
        )
        output_parser = StrOutputParser()
        conversation = conversation_factory()
        chain = prompt | conversation.chat | output_parser
        answer = chain.invoke({"input": summary_prompt})
        return answer

summarise_results(question, conversation_factory, response_text)

Extract the answer from the OncoKB results.

Args:

question (str): The question to be answered.

conversation_factory: A callable that returns a BioChatter conversation object.

response_text (str): The response.text returned by OncoKB.

Returns:

str: The extracted answer from the OncoKB results.

Source code in biochatter/api_agent/web/oncokb.py
def summarise_results(
    self,
    question: str,
    conversation_factory: Callable,
    response_text: str,
) -> str:
    """Extract the answer from the OncoKB results.

    Args:
    ----
        question (str): The question to be answered.
        conversation_factory: A callable that returns a BioChatter
            conversation object.
        response_text (str): The response.text returned by OncoKB.

    Returns:
    -------
        str: The extracted answer from the OncoKB results.

    """
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a world class molecular biologist who knows "
                "everything about OncoKB and cancer genomics. Your task is "
                "to interpret results from OncoKB API calls and summarise "
                "them for the user.",
            ),
            ("user", "{input}"),
        ],
    )
    summary_prompt = ONCOKB_SUMMARY_PROMPT.format(
        question=question,
        context=response_text,
    )
    output_parser = StrOutputParser()
    conversation = conversation_factory()
    chain = prompt | conversation.chat | output_parser
    answer = chain.invoke({"input": summary_prompt})
    return answer
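
The value bound to `input` is interpolated into the `{input}` slot of the user message, so it is worth passing a plain string. The sketch below shows how a string and a one-element set differ when interpolated. It uses plain `str.format` as a stand-in for the prompt template, which is an assumption: the real template class may format values differently. `SUMMARY_TEMPLATE` and the example question and context are made up for illustration.

```python
# Hypothetical stand-in for a summary prompt template.
SUMMARY_TEMPLATE = "Question: {question}\nContext: {context}"

summary_prompt = SUMMARY_TEMPLATE.format(
    question="Is BRAF V600E oncogenic?",
    context='{"oncogenic": "Oncogenic"}',  # made-up response text
)

# A plain string is inserted unchanged.
as_string = "{input}".format(input=summary_prompt)

# Wrapping the value in braces creates a set; its repr is inserted instead,
# which adds surrounding braces and quotes and escapes the newline.
as_set = "{input}".format(input={summary_prompt})

print(as_string == summary_prompt)
print(as_set == summary_prompt)
print(as_set[:2], as_set[-2:])
```

With the set, the model would receive the Python representation of the prompt rather than the prompt itself.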

OncoKBQueryBuilder

Bases: BaseQueryBuilder

A class for building an OncoKBQuery object.

Source code in biochatter/api_agent/web/oncokb.py
class OncoKBQueryBuilder(BaseQueryBuilder):
    """A class for building an OncoKBQuery object."""

    def create_runnable(
        self,
        query_parameters: "OncoKBQueryParameters",
        conversation: "Conversation",
    ) -> Callable:
        """Creates a runnable object for executing queries using the LangChain
        `create_structured_output_runnable` method.

        Args:
        ----
            query_parameters: A Pydantic data model that specifies the fields of
                the API that should be queried.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """
        return create_structured_output_runnable(
            output_schema=query_parameters,
            llm=conversation.chat,
            prompt=self.structured_output_prompt,
        )

    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list[OncoKBQueryParameters]:
        """Generate an OncoKBQuery object.

        Generate based on the given question, prompt, and BioChatter
        conversation. Uses a Pydantic model to define the API fields. Creates a
        runnable that can be invoked on LLMs that are qualified to parameterise
        functions.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The conversation object used for parameterising the
                OncoKBQuery.

        Returns:
        -------
            OncoKBQueryParameters: the parameterised query object (Pydantic model)

        """
        runnable = self.create_runnable(
            query_parameters=OncoKBQueryParameters,
            conversation=conversation,
        )
        oncokb_call_obj = runnable.invoke(
            {"input": f"Answer:\n{question} based on:\n {ONCOKB_QUERY_PROMPT}"},
        )
        oncokb_call_obj.question_uuid = str(uuid.uuid4())
        return [oncokb_call_obj]

create_runnable(query_parameters, conversation)

Creates a runnable object for executing queries using the LangChain create_structured_output_runnable method.


Args:

query_parameters: A Pydantic data model that specifies the fields of the API that should be queried.

conversation: A BioChatter conversation object.

Returns:

A Callable object that can execute the query.

Source code in biochatter/api_agent/web/oncokb.py
def create_runnable(
    self,
    query_parameters: "OncoKBQueryParameters",
    conversation: "Conversation",
) -> Callable:
    """Creates a runnable object for executing queries using the LangChain
    `create_structured_output_runnable` method.

    Args:
    ----
        query_parameters: A Pydantic data model that specifies the fields of
            the API that should be queried.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """
    return create_structured_output_runnable(
        output_schema=query_parameters,
        llm=conversation.chat,
        prompt=self.structured_output_prompt,
    )

parameterise_query(question, conversation)

Generate an OncoKBQuery object.

Generate based on the given question, prompt, and BioChatter conversation. Uses a Pydantic model to define the API fields. Creates a runnable that can be invoked on LLMs that are qualified to parameterise functions.


Args:

question (str): The question to be answered.

conversation: The conversation object used for parameterising the OncoKBQuery.

Returns:

OncoKBQueryParameters: the parameterised query object (Pydantic model)

Source code in biochatter/api_agent/web/oncokb.py
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list[OncoKBQueryParameters]:
    """Generate an OncoKBQuery object.

    Generate based on the given question, prompt, and BioChatter
    conversation. Uses a Pydantic model to define the API fields. Creates a
    runnable that can be invoked on LLMs that are qualified to parameterise
    functions.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The conversation object used for parameterising the
            OncoKBQuery.

    Returns:
    -------
        OncoKBQueryParameters: the parameterised query object (Pydantic model)

    """
    runnable = self.create_runnable(
        query_parameters=OncoKBQueryParameters,
        conversation=conversation,
    )
    oncokb_call_obj = runnable.invoke(
        {"input": f"Answer:\n{question} based on:\n {ONCOKB_QUERY_PROMPT}"},
    )
    oncokb_call_obj.question_uuid = str(uuid.uuid4())
    return [oncokb_call_obj]

The bio.tools API

Module for interacting with the bio.tools API.

BioToolsFetcher

Bases: BaseFetcher

A class for retrieving API results from BioTools.

Retrieves API results given a parameterized BioToolsQuery.

Source code in biochatter/api_agent/web/bio_tools.py
class BioToolsFetcher(BaseFetcher):
    """A class for retrieving API results from BioTools.

    Retrieves API results given a parameterized BioToolsQuery.
    """

    def __init__(self, api_token: str = "demo") -> None:  # noqa: S107
        """Initialise the BioToolsFetcher.

        Args:
        ----
            api_token: The API token for the BioTools API.

        """
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Accept": "application/json",
        }
        self.base_url = "https://bio.tools/api"

    def fetch_results(
        self,
        request_data: list[BioToolsQueryParameters],
        retries: int | None = 3,  # noqa: ARG002
    ) -> str:
        """Submit the BioTools query and fetch the results directly.

        No multi-step procedure, thus no wrapping of submission and retrieval in
        this case.

        Args:
        ----
            request_data: List of BioToolsQuery objects (Pydantic models)
                containing the BioTools query parameters.

            retries: The number of retries to fetch the results.

        Returns:
        -------
            str: The results of the BioTools query.

        """
        # For now, we only use the first query in the list
        query = request_data[0]

        # Submit the query and get the URL
        params = query.dict(exclude_unset=True)
        endpoint = params.pop("endpoint")
        params.pop("question_uuid")
        full_url = f"{self.base_url}/{endpoint}"
        response = requests.get(full_url, headers=self.headers, params=params, timeout=30)
        response.raise_for_status()

        # Fetch the results from the URL
        results_response = requests.get(response.url, headers=self.headers, timeout=30)
        results_response.raise_for_status()

        return results_response.text

__init__(api_token='demo')

Initialise the BioToolsFetcher.


Args:

api_token: The API token for the BioTools API.

Source code in biochatter/api_agent/web/bio_tools.py
def __init__(self, api_token: str = "demo") -> None:  # noqa: S107
    """Initialise the BioToolsFetcher.

    Args:
    ----
        api_token: The API token for the BioTools API.

    """
    self.headers = {
        "Authorization": f"Bearer {api_token}",
        "Accept": "application/json",
    }
    self.base_url = "https://bio.tools/api"

fetch_results(request_data, retries=3)

Submit the BioTools query and fetch the results directly.

No multi-step procedure, thus no wrapping of submission and retrieval in this case.


Args:

request_data: List of BioToolsQuery objects (Pydantic models) containing the BioTools query parameters.

retries: The number of retries to fetch the results.

Returns:

str: The results of the BioTools query.

Source code in biochatter/api_agent/web/bio_tools.py
def fetch_results(
    self,
    request_data: list[BioToolsQueryParameters],
    retries: int | None = 3,  # noqa: ARG002
) -> str:
    """Submit the BioTools query and fetch the results directly.

    No multi-step procedure, thus no wrapping of submission and retrieval in
    this case.

    Args:
    ----
        request_data: List of BioToolsQuery objects (Pydantic models)
            containing the BioTools query parameters.

        retries: The number of retries to fetch the results.

    Returns:
    -------
        str: The results of the BioTools query.

    """
    # For now, we only use the first query in the list
    query = request_data[0]

    # Submit the query and get the URL
    params = query.dict(exclude_unset=True)
    endpoint = params.pop("endpoint")
    params.pop("question_uuid")
    full_url = f"{self.base_url}/{endpoint}"
    response = requests.get(full_url, headers=self.headers, params=params, timeout=30)
    response.raise_for_status()

    # Fetch the results from the URL
    results_response = requests.get(response.url, headers=self.headers, timeout=30)
    results_response.raise_for_status()

    return results_response.text
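
To make the URL handling in `fetch_results` concrete, the sketch below replays the dictionary steps with the standard library only. The `params` dictionary is a stand-in for `query.dict(exclude_unset=True)`, with illustrative values, and `urlencode` approximates what the HTTP client does with the `params` argument.

```python
from urllib.parse import urlencode

base_url = "https://bio.tools/api"

# Stand-in for `query.dict(exclude_unset=True)`; the values are illustrative.
params = {
    "endpoint": "t/",
    "name": "blast",
    "question_uuid": "00000000-0000-0000-0000-000000000000",
}

# The endpoint becomes part of the path, and the bookkeeping UUID is dropped,
# so only genuine search fields remain as query parameters.
endpoint = params.pop("endpoint")
params.pop("question_uuid")

full_url = f"{base_url}/{endpoint}"
request_url = f"{full_url}?{urlencode(params)}"
print(request_url)
```

Removing `endpoint` and `question_uuid` before the request keeps them from being sent to the API as search filters.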

BioToolsInterpreter

Bases: BaseInterpreter

A class for interpreting BioTools results.

Source code in biochatter/api_agent/web/bio_tools.py
class BioToolsInterpreter(BaseInterpreter):
    """A class for interpreting BioTools results."""

    def summarise_results(
        self,
        question: str,
        conversation_factory: Callable,
        response_text: str,
    ) -> str:
        """Extract the answer from the bio.tools results.

        Args:
        ----
            question (str): The question to be answered.
            conversation_factory: A callable that returns a BioChatter
                conversation object.
            response_text (str): The response.text returned by bio.tools.

        Returns:
        -------
            str: The extracted answer from the bio.tools results.

        """
        prompt = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    "You are a world class bioinformatician who knows "
                    "everything about bio.tools packages and the "
                    "bioinformatics ecosystem. Your task is to interpret "
                    "results from BioTools API calls and summarise "
                    "them for the user.",
                ),
                ("user", "{input}"),
            ],
        )
        summary_prompt = BIOTOOLS_SUMMARY_PROMPT.format(
            question=question,
            context=response_text,
        )
        output_parser = StrOutputParser()
        conversation = conversation_factory()
        chain = prompt | conversation.chat | output_parser
        return chain.invoke({"input": summary_prompt})

summarise_results(question, conversation_factory, response_text)

Extract the answer from the bio.tools results.

Args:

question (str): The question to be answered.

conversation_factory: A callable that returns a BioChatter conversation object.

response_text (str): The response.text returned by bio.tools.

Returns:

str: The extracted answer from the bio.tools results.

Source code in biochatter/api_agent/web/bio_tools.py
def summarise_results(
    self,
    question: str,
    conversation_factory: Callable,
    response_text: str,
) -> str:
    """Extract the answer from the bio.tools results.

    Args:
    ----
        question (str): The question to be answered.
        conversation_factory: A callable that returns a BioChatter
            conversation object.
        response_text (str): The response.text returned by bio.tools.

    Returns:
    -------
        str: The extracted answer from the bio.tools results.

    """
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a world class bioinformatician who knows "
                "everything about bio.tools packages and the "
                "bioinformatics ecosystem. Your task is to interpret "
                "results from BioTools API calls and summarise "
                "them for the user.",
            ),
            ("user", "{input}"),
        ],
    )
    summary_prompt = BIOTOOLS_SUMMARY_PROMPT.format(
        question=question,
        context=response_text,
    )
    output_parser = StrOutputParser()
    conversation = conversation_factory()
    chain = prompt | conversation.chat | output_parser
    return chain.invoke({"input": summary_prompt})

BioToolsQueryBuilder

Bases: BaseQueryBuilder

A class for building a BioToolsQuery object.

Source code in biochatter/api_agent/web/bio_tools.py
class BioToolsQueryBuilder(BaseQueryBuilder):
    """A class for building a BioToolsQuery object."""

    def create_runnable(
        self,
        query_parameters: "BioToolsQueryParameters",
        conversation: "Conversation",
    ) -> Callable:
        """Create a runnable object for executing queries.

        Create runnable using the LangChain `create_structured_output_runnable`
        method.

        Args:
        ----
            query_parameters: A Pydantic data model that specifies the fields of
                the API that should be queried.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """
        return create_structured_output_runnable(
            output_schema=query_parameters,
            llm=conversation.chat,
            prompt=self.structured_output_prompt,
        )

    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list[BioToolsQueryParameters]:
        """Generate a BioToolsQuery object.

        Generate a BioToolsQuery object based on the given question, prompt,
        and BioChatter conversation. Uses a Pydantic model to define the API
        fields.  Creates a runnable that can be invoked on LLMs that are
        qualified to parameterise functions.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The conversation object used for parameterising the
                BioToolsQuery.

        Returns:
        -------
            BioToolsQueryParameters: the parameterised query object (Pydantic
                model)

        """
        runnable = self.create_runnable(
            query_parameters=BioToolsQueryParameters,
            conversation=conversation,
        )
        biotools_call_obj = runnable.invoke(
            {
                "input": f"Answer:\n{question} based on:\n {BIOTOOLS_QUERY_PROMPT}",
            },
        )
        biotools_call_obj.question_uuid = str(uuid.uuid4())
        return [biotools_call_obj]

create_runnable(query_parameters, conversation)

Create a runnable object for executing queries.

Create runnable using the LangChain create_structured_output_runnable method.


Args:

query_parameters: A Pydantic data model that specifies the fields of the API that should be queried.

conversation: A BioChatter conversation object.

Returns:

A Callable object that can execute the query.

Source code in biochatter/api_agent/web/bio_tools.py
def create_runnable(
    self,
    query_parameters: "BioToolsQueryParameters",
    conversation: "Conversation",
) -> Callable:
    """Create a runnable object for executing queries.

    Create runnable using the LangChain `create_structured_output_runnable`
    method.

    Args:
    ----
        query_parameters: A Pydantic data model that specifies the fields of
            the API that should be queried.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """
    return create_structured_output_runnable(
        output_schema=query_parameters,
        llm=conversation.chat,
        prompt=self.structured_output_prompt,
    )

parameterise_query(question, conversation)

Generate a BioToolsQuery object.

Generate a BioToolsQuery object based on the given question, prompt, and BioChatter conversation. Uses a Pydantic model to define the API fields. Creates a runnable that can be invoked on LLMs that are qualified to parameterise functions.

Args:

question (str): The question to be answered.

conversation: The conversation object used for parameterising the BioToolsQuery.

Returns:

BioToolsQueryParameters: the parameterised query object (Pydantic model)

Source code in biochatter/api_agent/web/bio_tools.py
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list[BioToolsQueryParameters]:
    """Generate a BioToolsQuery object.

    Generate a BioToolsQuery object based on the given question, prompt,
    and BioChatter conversation. Uses a Pydantic model to define the API
    fields.  Creates a runnable that can be invoked on LLMs that are
    qualified to parameterise functions.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The conversation object used for parameterising the
            BioToolsQuery.

    Returns:
    -------
        BioToolsQueryParameters: the parameterised query object (Pydantic
            model)

    """
    runnable = self.create_runnable(
        query_parameters=BioToolsQueryParameters,
        conversation=conversation,
    )
    biotools_call_obj = runnable.invoke(
        {
            "input": f"Answer:\n{question} based on:\n {BIOTOOLS_QUERY_PROMPT}",
        },
    )
    biotools_call_obj.question_uuid = str(uuid.uuid4())
    return [biotools_call_obj]

BioToolsQueryParameters

Bases: BaseModel

Parameters for querying the bio.tools API.

Source code in biochatter/api_agent/web/bio_tools.py
class BioToolsQueryParameters(BaseModel):
    """Parameters for querying the bio.tools API."""

    base_url: str = Field(
        default="https://bio.tools/api/",
        description="Base URL for the BioTools API.",
    )
    endpoint: str = Field(
        ...,
        description="Specific API endpoint to hit. Example: 't/' for listing tools.",
    )
    biotoolsID: str | None = Field(  # noqa: N815
        None,
        description="Search for bio.tools tool ID (usually quoted - to get exact match)",
    )
    name: str | None = Field(
        None,
        description="Search for tool name (quoted as needed: quoted for exact match, unquoted for fuzzy search)",
    )
    homepage: str | None = Field(
        None,
        description="Exact search for tool homepage URL (**must** be quoted)",
    )
    description: str | None = Field(
        None,
        description="Search over tool description (quoted as needed)",
    )
    version: str | None = Field(
        None,
        description="Exact search for tool version (**must** be quoted)",
    )
    topic: str | None = Field(
        None,
        description="Search for EDAM Topic (term) (quoted as needed)",
    )
    topicID: str | None = Field(  # noqa: N815
        None,
        description="Exact search for EDAM Topic (URI): **must** be quoted",
    )
    function: str | None = Field(
        None,
        description="Fuzzy search over function (input, operation, output, note and command)",
    )
    operation: str | None = Field(
        None,
        description="Fuzzy search for EDAM Operation (term) (quoted as needed)",
    )
    operationID: str | None = Field(  # noqa: N815
        None,
        description="Exact search for EDAM Operation (ID) (**must** be quoted)",
    )
    dataType: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over input and output for EDAM Data (term) (quoted as needed)",
    )
    dataTypeID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over input and output for EDAM Data (ID) (**must** be quoted)",
    )
    dataFormat: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over input and output for EDAM Format (term) (quoted as needed)",
    )
    dataFormatID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over input and output for EDAM Format (ID) (**must** be quoted)",
    )
    input: str | None = Field(
        None,
        description="Fuzzy search over input for EDAM Data and Format (term) (quoted as needed)",
    )
    inputID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over input for EDAM Data and Format (ID) (**must** be quoted)",
    )
    inputDataType: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over input for EDAM Data (term) (quoted as needed)",
    )
    inputDataTypeID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over input for EDAM Data (ID) (**must** be quoted)",
    )
    inputDataFormat: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over input for EDAM Format (term) (quoted as needed)",
    )
    inputDataFormatID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over input for EDAM Format (ID) (**must** be quoted)",
    )
    output: str | None = Field(
        None,
        description="Fuzzy search over output for EDAM Data and Format (term) (quoted as needed)",
    )
    outputID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over output for EDAM Data and Format (ID) (**must** be quoted)",
    )
    outputDataType: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over output for EDAM Data (term) (quoted as needed)",
    )
    outputDataTypeID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over output for EDAM Data (ID) (**must** be quoted)",
    )
    outputDataFormat: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over output for EDAM Format (term) (quoted as needed)",
    )
    outputDataFormatID: str | None = Field(  # noqa: N815
        None,
        description="Exact search over output for EDAM Format (ID) (**must** be quoted)",
    )
    toolType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool type",
    )
    collectionID: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool collection (normally quoted)",
    )
    maturity: str | None = Field(
        None,
        description="Exact search for tool maturity",
    )
    operatingSystem: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool operating system",
    )
    language: str | None = Field(
        None,
        description="Exact search for programming language",
    )
    cost: str | None = Field(
        None,
        description="Exact search for cost",
    )
    license: str | None = Field(
        None,
        description="Exact search for software or data usage license (quoted as needed)",
    )
    accessibility: str | None = Field(
        None,
        description="Exact search for tool accessibility",
    )
    credit: str | None = Field(
        None,
        description="Fuzzy search over credit (name, email, URL, ORCID iD, type of entity, type of role and note)",
    )
    creditName: str | None = Field(  # noqa: N815
        None,
        description="Exact search for name of credited entity",
    )
    creditTypeRole: str | None = Field(  # noqa: N815
        None,
        description="Exact search for role of credited entity",
    )
    creditTypeEntity: str | None = Field(  # noqa: N815
        None,
        description="Exact search for type of credited entity",
    )
    creditOrcidID: str | None = Field(  # noqa: N815
        None,
        description="Exact search for ORCID iD of credited entity (**must** be quoted)",
    )
    publication: str | None = Field(
        None,
        description=(
            "Fuzzy search over publication (DOI, PMID, PMCID, publication type and tool version) (quoted as needed)"
        ),
    )
    publicationID: str | None = Field(  # noqa: N815
        None,
        description="Exact search for publication ID (DOI, PMID or PMCID) (**must** be quoted)",
    )
    publicationType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for publication type",
    )
    publicationVersion: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool version associated with a publication (**must** be quoted)",
    )
    link: str | None = Field(
        None,
        description="Fuzzy search over general link (URL, type and note) (quoted as needed)",
    )
    linkType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for type of information found at a link",
    )
    documentation: str | None = Field(
        None,
        description="Fuzzy search over documentation link (URL, type and note) (quoted as needed)",
    )
    documentationType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for type of documentation",
    )
    download: str | None = Field(
        None,
        description="Fuzzy search over download link (URL, type, version and note) (quoted as needed)",
    )
    downloadType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for type of download",
    )
    downloadVersion: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool version associated with a download (**must** be quoted)",
    )
    otherID: str | None = Field(  # noqa: N815
        None,
        description="Fuzzy search over alternate tool IDs (ID value, type of ID and version)",
    )
    otherIDValue: str | None = Field(  # noqa: N815
        None,
        description="Exact search for value of alternate tool ID (**must** be quoted)",
    )
    otherIDType: str | None = Field(  # noqa: N815
        None,
        description="Exact search for type of alternate tool ID",
    )
    otherIDVersion: str | None = Field(  # noqa: N815
        None,
        description="Exact search for tool version associated with an alternate ID (**must** be quoted)",
    )
    question_uuid: str | None = Field(
        default_factory=lambda: str(uuid.uuid4()),
        description="Unique identifier for the question.",
    )
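
The fields above appear intended to be sent as query-string filters against the chosen endpoint. As a minimal sketch of how a populated parameter set might be turned into a request URL, the following uses a plain dict in place of the Pydantic model. The helper name `build_biotools_url` and the choice of which fields to exclude are assumptions made for illustration; this is not BioChatter's fetcher.

```python
from urllib.parse import urlencode

# Fields that describe the request itself rather than a search filter
# (an assumption made for this sketch).
NON_FILTER_FIELDS = {"base_url", "endpoint", "question_uuid"}


def build_biotools_url(params: dict) -> str:
    """Build a URL from a dict shaped like BioToolsQueryParameters (hypothetical helper)."""
    base_url = params.get("base_url", "https://bio.tools/api/")
    # Keep only search filters that were actually set.
    filters = {
        key: value
        for key, value in params.items()
        if key not in NON_FILTER_FIELDS and value is not None
    }
    url = f"{base_url}{params['endpoint']}"
    return f"{url}?{urlencode(filters)}" if filters else url


example = {
    "endpoint": "t/",
    "name": '"signalp"',  # quoted for an exact match
    "topic": None,        # unset filters are dropped
    "question_uuid": "1234",
}
print(build_biotools_url(example))
```

Dropping `None` values keeps the URL limited to the filters the LLM actually chose, and `urlencode` takes care of escaping the quotes that mark exact-match searches.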

Python APIs

Generic Python API ingestion

Module for ingesting any Python module and generating a query builder.

GenericQueryBuilder

Bases: BaseQueryBuilder

A class for building a generic query using LLM tools.

The query builder works by ingesting a Python module and generating a list of Pydantic classes for each callable in the module. It then uses these classes to parameterise a query using LLM tool binding.

Source code in biochatter/api_agent/python/generic_agent.py
class GenericQueryBuilder(BaseQueryBuilder):
    """A class for building a generic query using LLM tools.

    The query builder works by ingesting a Python module and generating a list
    of Pydantic classes for each callable in the module. It then uses these
    classes to parameterise a query using LLM tool binding.
    """

    def create_runnable(
        self,
        query_parameters: list["BaseAPIModel"],
        conversation: Conversation,
    ) -> Callable:
        """Create a runnable object for the query builder.

        Args:
        ----
            query_parameters: The list of Pydantic classes to be used for the
                query.

            conversation: The conversation object used for parameterising the
                query.

        Returns:
        -------
            The runnable object for the query builder.

        """
        runnable = conversation.chat.bind_tools(query_parameters, tool_choice="required")
        return runnable | PydanticToolsParser(tools=query_parameters)

    def parameterise_query(
        self,
        question: str,
        prompt: str,
        conversation: "Conversation",
        module: ModuleType,
        generated_classes: list[BaseAPIModel] | None = None,
    ) -> list[BaseAPIModel]:
        """Parameterise tool calls for any Python module.

        Generate a list of parameterised BaseModel instances based on the given
        question, prompt, and BioChatter conversation. Uses a Pydantic model
        to define the API fields.

        Uses LangChain's `bind_tools` method to let the LLM parameterise the
        function call, based on the functions available in the module.

        Relies on defined structure and annotation of the passed module.

        Args:
        ----
            question (str): The question to be answered.

            prompt (str): The prompt to be used for the query, instructing the
                LLM of its task and the module context.

            conversation: The conversation object used for parameterising the
                query.

            module: The Python module to be used for the query.

            generated_classes: The list of Pydantic classes to be used for the
                query. If not provided, the classes will be generated from the
                module. Allows for external injection of classes for testing
                purposes.

        Returns:
        -------
            list[BaseAPIModel]: the parameterised query object (Pydantic
                model)

        """
        # Use injected classes when provided; otherwise generate them from
        # the module.
        tools = generated_classes
        if tools is None:
            tools = generate_pydantic_classes(module)

        runnable = self.create_runnable(
            conversation=conversation,
            query_parameters=tools,
        )

        query = [
            ("system", prompt),
            ("human", question),
        ]

        return runnable.invoke(query)

create_runnable(query_parameters, conversation)

Create a runnable object for the query builder.


Args:

query_parameters: The list of Pydantic classes to be used for the query.

conversation: The conversation object used for parameterising the query.

Returns:

The runnable object for the query builder.
Source code in biochatter/api_agent/python/generic_agent.py
def create_runnable(
    self,
    query_parameters: list["BaseAPIModel"],
    conversation: Conversation,
) -> Callable:
    """Create a runnable object for the query builder.

    Args:
    ----
        query_parameters: The list of Pydantic classes to be used for the
            query.

        conversation: The conversation object used for parameterising the
            query.

    Returns:
    -------
        The runnable object for the query builder.

    """
    runnable = conversation.chat.bind_tools(query_parameters, tool_choice="required")
    return runnable | PydanticToolsParser(tools=query_parameters)
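
The runnable returned here pipes the model's tool calls into a parser that instantiates the matching class. The idea can be sketched without LangChain. The function `parse_tool_calls`, the `Sqrt` and `Pow` dataclasses, and the hard-coded tool call below are stand-ins invented for illustration, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Sqrt:
    """Stand-in tool class (hypothetical)."""

    x: float


@dataclass
class Pow:
    """Stand-in tool class (hypothetical)."""

    x: float
    y: float


def parse_tool_calls(tool_calls: list[dict], tools: list[type]) -> list:
    """Instantiate the tool class whose name matches each tool call."""
    by_name = {tool.__name__: tool for tool in tools}
    return [by_name[call["name"]](**call["args"]) for call in tool_calls]


# What a tool-calling model might emit for "raise 2 to the power of 10".
fake_llm_output = [{"name": "Pow", "args": {"x": 2, "y": 10}}]
parsed = parse_tool_calls(fake_llm_output, tools=[Sqrt, Pow])
print(parsed)
```

Because the result is a list, a single question can yield several parameterised objects when the model decides to call more than one tool.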

parameterise_query(question, prompt, conversation, module, generated_classes=None)

Parameterise tool calls for any Python module.

Generate a list of parameterised BaseModel instances based on the given question, prompt, and BioChatter conversation. Uses a Pydantic model to define the API fields.

Uses LangChain's bind_tools method to let the LLM parameterise the function call, based on the functions available in the module.

Relies on defined structure and annotation of the passed module.


Args:

question (str): The question to be answered.

prompt (str): The prompt to be used for the query, instructing the LLM of its task and the module context.

conversation: The conversation object used for parameterising the query.

module: The Python module to be used for the query.

generated_classes: The list of Pydantic classes to be used for the query. If not provided, the classes will be generated from the module. Allows for external injection of classes for testing purposes.

Returns:

list[BaseAPIModel]: the parameterised query object (Pydantic model)
Source code in biochatter/api_agent/python/generic_agent.py
def parameterise_query(
    self,
    question: str,
    prompt: str,
    conversation: "Conversation",
    module: ModuleType,
    generated_classes: list[BaseAPIModel] | None = None,
) -> list[BaseAPIModel]:
    """Parameterise tool calls for any Python module.

    Generate a list of parameterised BaseModel instances based on the given
    question, prompt, and BioChatter conversation. Uses a Pydantic model
    to define the API fields.

    Uses LangChain's `bind_tools` method to let the LLM parameterise the
    function call, based on the functions available in the module.

    Relies on defined structure and annotation of the passed module.

    Args:
    ----
        question (str): The question to be answered.

        prompt (str): The prompt to be used for the query, instructing the
            LLM of its task and the module context.

        conversation: The conversation object used for parameterising the
            query.

        module: The Python module to be used for the query.

        generated_classes: The list of Pydantic classes to be used for the
            query. If not provided, the classes will be generated from the
            module. Allows for external injection of classes for testing
            purposes.

    Returns:
    -------
        list[BaseAPIModel]: the parameterised query object (Pydantic
            model)

    """
    # Use injected classes when provided; otherwise generate them from the
    # module.
    tools = generated_classes
    if tools is None:
        tools = generate_pydantic_classes(module)

    runnable = self.create_runnable(
        conversation=conversation,
        query_parameters=tools,
    )

    query = [
        ("system", prompt),
        ("human", question),
    ]

    return runnable.invoke(query)

AutoGenerate Pydantic classes for each callable.

This module provides a function to generate Pydantic classes for each callable (function/method) in a given module. It extracts parameter descriptions from docstrings using docstring-parser and creates Pydantic models with fields corresponding to the parameters. If a parameter name conflicts with BaseModel attributes, it is aliased.

Examples

>>> import scanpy as sc
>>> generated_classes = generate_pydantic_classes(sc.tl)
>>> for model in generated_classes:
...     print(model.schema())

generate_pydantic_classes(module)

Generate Pydantic classes for each callable.

Generates one class for each public function in a given module. Parameter descriptions are extracted from docstrings using docstring-parser; the parameters themselves come from the function signature. Each generated class has fields corresponding to the parameters of the function. If a parameter name conflicts with BaseModel attributes, it is aliased.

Parameters

module : ModuleType
    The Python module from which to extract functions and generate models.

Returns

list[type[BaseAPIModel]]
    A list of Pydantic model classes corresponding to each function found in module.

Notes

  • For now, all parameter types are set to Any to avoid complications with complex or external classes that are not easily JSON-serializable.
  • Optional parameters (those with a None default) are represented as Optional[Any].
  • Required parameters (no default) use ... to indicate that the field is required.
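
The distinction between required and optional parameters in the notes above comes from the function signature. A stdlib-only sketch of that step follows, assuming a made-up function `cluster` and a hypothetical helper `describe_parameters`; the real function builds Pydantic fields rather than a dict.

```python
import inspect


def cluster(adata, resolution=1.0, key_added=None, **kwargs):
    """Made-up function used only to demonstrate signature inspection."""


def describe_parameters(func) -> dict:
    """Map each named parameter to 'required' or to its default value."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        # Variadic parameters (*args, **kwargs) have no single value to expose.
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if param.default is inspect.Parameter.empty:
            described[name] = "required"
        else:
            described[name] = param.default
    return described


print(describe_parameters(cluster))
```

Checking `param.kind` rather than the parameter's name catches variadic parameters regardless of what they are called.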
Source code in biochatter/api_agent/python/autogenerate_model.py
def generate_pydantic_classes(module: ModuleType) -> list[type[BaseAPIModel]]:
    """Generate Pydantic classes for each callable.

    Generates one class for each public function in a given module. Parameter
    descriptions are extracted from docstrings using docstring-parser; the
    parameters themselves come from the function signature. Each generated
    class has fields corresponding to the parameters of the function. If a
    parameter name conflicts with BaseModel attributes, it is aliased.

    Parameters
    ----------
    module : ModuleType
        The Python module from which to extract functions and generate models.

    Returns
    -------
    list[type[BaseAPIModel]]
        A list of Pydantic model classes corresponding to each function found in
        `module`.

    Notes
    -----
    - For now, all parameter types are set to `Any` to avoid complications with
      complex or external classes that are not easily JSON-serializable.
    - Optional parameters (those with a None default) are represented as
      `Optional[Any]`.
    - Required parameters (no default) use `...` to indicate that the field is
      required.

    """
    base_attributes = set(dir(BaseAPIModel))
    classes_list = []

    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private/internal functions (e.g., _something)
        if name.startswith("_"):
            continue

        # Parse docstring for parameter descriptions
        doc = inspect.getdoc(func) or ""
        parsed_doc = parse(doc)
        doc_params = {p.arg_name: p.description or "No description available." for p in parsed_doc.params}

        sig = inspect.signature(func)
        fields = {}

        for param_name, param in sig.parameters.items():
            # Skip variadic parameters (*args and **kwargs) for now,
            # whatever they are named
            if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
                continue

            # Fetch docstring description or fallback
            description = doc_params.get(param_name, "No description available.")

            # Determine default value
            # If no default, we use `...` indicating a required field
            if param.default is not inspect.Parameter.empty:
                default_value = param.default

                # Convert MappingProxyType to a dict for JSON compatibility
                if isinstance(default_value, MappingProxyType):
                    default_value = dict(default_value)

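                # Handle non-JSON-compliant float values (inf, -inf, nan) by
                # converting to string; `x != x` is true only for NaN, which
                # a plain membership test would miss
                if isinstance(default_value, float) and (
                    default_value != default_value or default_value in (float("inf"), float("-inf"))
                ):
                    default_value = str(default_value)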
            else:
                default_value = ...  # No default means required

            # For now, all parameter types are Any
            annotation = Any

            # Append the original annotation as a note in the description if
            # available
            if param.annotation is not inspect.Parameter.empty:
                description += f"\nOriginal type annotation: {param.annotation}"

            # A default of None means the parameter is optional, so it is
            # annotated as Optional[Any]
            if default_value is None:
                annotation = Any | None

            # Prepare field kwargs
            field_kwargs = {"description": description, "default": default_value}

            # If field name conflicts with BaseModel attributes, alias it
            field_name = param_name
            if param_name in base_attributes:
                alias_name = param_name + "_param"
                field_kwargs["alias"] = param_name
                field_name = alias_name

            fields[field_name] = (annotation, Field(**field_kwargs))

        # Create the Pydantic model

        tl_parameters_model = create_model(
            name,
            **fields,
            __base__=BaseAPIModel,
        )
        classes_list.append(tl_parameters_model)
    return classes_list
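
Non-finite float defaults need special care because strict JSON has no representation for them and NaN never compares equal to itself, so it cannot be found with a membership test. A small sketch of a check that covers all three cases, where `json_safe_default` is a hypothetical helper name:

```python
import json
import math


def json_safe_default(value):
    """Return `value`, or its string form if it is a non-finite float."""
    if isinstance(value, float) and not math.isfinite(value):
        return str(value)
    return value


defaults = [0.5, float("inf"), float("-inf"), float("nan"), None, "auto"]
safe = [json_safe_default(v) for v in defaults]
print(safe)

# Strict JSON serialisation succeeds once the non-finite values are strings.
print(json.dumps(safe, allow_nan=False))
```

The `isinstance` guard also keeps the check from being applied to defaults such as lists or arrays, for which an equality-based test may not return a single boolean.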

Scanpy modules

Module for generating anndata queries using LLM tools.

AnnDataIOQueryBuilder

Bases: BaseQueryBuilder

A class for building an AnnDataIO query object.

Source code in biochatter/api_agent/python/anndata_agent.py
class AnnDataIOQueryBuilder(BaseQueryBuilder):
    """A class for building an AnnDataIO query object."""

    def create_runnable(
        self,
        query_parameters: list["BaseAPIModel"],
        conversation: "Conversation",
    ) -> Callable:
        """Create a runnable object for executing queries.

        Binds the tool classes to the chat model with LangChain's `bind_tools`
        and parses the resulting tool calls with `PydanticToolsParser`.

        Args:
        ----
            query_parameters: A list of Pydantic classes, one per callable
                that the LLM may parameterise.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """
        runnable = conversation.chat.bind_tools(query_parameters, tool_choice="required")
        return runnable | PydanticToolsParser(tools=query_parameters)

    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list["BaseModel"]:
        """Generate a list of parameterised AnnData IO query objects.

        Generates the objects based on the given question and BioChatter
        conversation. Uses Pydantic models to define the API fields. Creates
        a runnable that can be invoked on LLMs that are qualified to
        parameterise functions.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The conversation object used for parameterising the
                query.

        Returns:
        -------
            list[BaseModel]: the parameterised query objects (Pydantic
                models), one per tool call made by the LLM.

        """
        tools = [
            ReadCSV,
            ReadExcel,
            ReadH5AD,
            ReadHDF,
            ReadLoom,
            ReadMTX,
            ReadText,
            ReadZarr,
            ConcatenateAnnData,
            MapAnnData,
        ]
        runnable = self.create_runnable(
            conversation=conversation,
            query_parameters=tools,
        )
        query = [
            ("system", ANNDATA_IO_QUERY_PROMPT),
            ("human", f"{question}"),
        ]
        return runnable.invoke(
            query,
        )

create_runnable(query_parameters, conversation)

Create a runnable object for executing queries.

Binds the tool classes to the chat model with LangChain's bind_tools and parses the resulting tool calls with PydanticToolsParser.

Args:

query_parameters: A list of Pydantic classes, one per callable that the LLM may parameterise.

conversation: A BioChatter conversation object.

Returns:

A Callable object that can execute the query.
Source code in biochatter/api_agent/python/anndata_agent.py
def create_runnable(
    self,
    query_parameters: list["BaseAPIModel"],
    conversation: "Conversation",
) -> Callable:
    """Create a runnable object for executing queries.

    Binds the tool classes to the chat model with LangChain's `bind_tools`
    and parses the resulting tool calls with `PydanticToolsParser`.

    Args:
    ----
        query_parameters: A list of Pydantic classes, one per callable that
            the LLM may parameterise.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """
    runnable = conversation.chat.bind_tools(query_parameters, tool_choice="required")
    return runnable | PydanticToolsParser(tools=query_parameters)

parameterise_query(question, conversation)

Generate a list of parameterised AnnData IO query objects.

Generates the objects based on the given question and BioChatter conversation. Uses Pydantic models to define the API fields. Creates a runnable that can be invoked on LLMs that are qualified to parameterise functions.

Args:

question (str): The question to be answered.

conversation: The conversation object used for parameterising the query.

Returns:

list[BaseModel]: the parameterised query objects (Pydantic models), one per tool call made by the LLM.
Source code in biochatter/api_agent/python/anndata_agent.py
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list["BaseModel"]:
    """Generate a list of parameterised AnnData IO query objects.

    Generates the objects based on the given question and BioChatter
    conversation. Uses Pydantic models to define the API fields. Creates a
    runnable that can be invoked on LLMs that are qualified to parameterise
    functions.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The conversation object used for parameterising the
            query.

    Returns:
    -------
        list[BaseModel]: the parameterised query objects (Pydantic models),
            one per tool call made by the LLM.

    """
    tools = [
        ReadCSV,
        ReadExcel,
        ReadH5AD,
        ReadHDF,
        ReadLoom,
        ReadMTX,
        ReadText,
        ReadZarr,
        ConcatenateAnnData,
        MapAnnData,
    ]
    runnable = self.create_runnable(
        conversation=conversation,
        query_parameters=tools,
    )
    query = [
        ("system", ANNDATA_IO_QUERY_PROMPT),
        ("human", f"{question}"),
    ]
    return runnable.invoke(
        query,
    )
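
Each tool class carries a `method_name` such as `io.read_csv` alongside its arguments. How those are executed is not shown in this section. One plausible approach is to walk the dotted path with `getattr` and pass the remaining non-`None` fields as keyword arguments. The sketch below uses the standard library's `os` module as a stand-in for `anndata`, a plain dict as a stand-in for the model instance, and the hypothetical helpers `resolve_dotted` and `call_from_model`.

```python
import os
from functools import reduce


def resolve_dotted(root, dotted_name: str):
    """Follow a dotted attribute path such as 'path.basename' from `root`."""
    return reduce(getattr, dotted_name.split("."), root)


def call_from_model(root, model: dict):
    """Call the function named by `method_name` with the non-None fields."""
    func = resolve_dotted(root, model["method_name"])
    kwargs = {
        key: value
        for key, value in model.items()
        if key != "method_name" and value is not None
    }
    return func(**kwargs)


# A dict standing in for a parameterised model instance.
model = {"method_name": "path.basename", "p": "/data/placeholder.csv"}
print(call_from_model(os, model))
```

Filtering out `None` values lets the called function fall back to its own defaults for every option the LLM left unset.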

ConcatenateAnnData

Bases: BaseAPIModel

Concatenate AnnData objects along an axis.

Source code in biochatter/api_agent/python/anndata_agent.py
class ConcatenateAnnData(BaseAPIModel):
    """Concatenate AnnData objects along an axis."""

    method_name: str = Field(default="anndata.concat", description="NEVER CHANGE")
    adatas: list | dict = Field(
        ...,
        description=(
            "The objects to be concatenated. "
            "Either a list of AnnData objects or a mapping of keys to AnnData objects."
        ),
    )
    axis: str = Field(
        default="obs",
        description="Axis to concatenate along. Can be 'obs' (0) or 'var' (1). Default is 'obs'.",
    )
    join: str = Field(
        default="inner",
        description="How to align values when concatenating. Options: 'inner' or 'outer'. Default is 'inner'.",
    )
    merge: str | Callable | None = Field(
        default=None,
        description=(
            "How to merge elements not aligned to the concatenated axis. "
            "Strategies include 'same', 'unique', 'first', 'only', or a callable function."
        ),
    )
    uns_merge: str | Callable | None = Field(
        default=None,
        description="How to merge the .uns elements. Uses the same strategies as 'merge'.",
    )
    label: str | None = Field(
        default=None,
        description="Column in axis annotation (.obs or .var) to place batch information. Default is None.",
    )
    keys: list | None = Field(
        default=None,
        description=(
            "Names for each object being concatenated. "
            "Used for column values or appended to the index if 'index_unique' is not None. "
            "Default is None."
        ),
    )
    index_unique: str | None = Field(
        default=None,
        description="Delimiter for making the index unique. When None, original indices are kept.",
    )
    fill_value: Any | None = Field(
        default=None,
        description="Value used to fill missing indices when join='outer'. Default behavior depends on array type.",
    )
    pairwise: bool = Field(
        default=False,
        description="Include pairwise elements along the concatenated dimension. Default is False.",
    )
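
The `join` option decides which variables survive when the concatenated objects do not share the same set. A toy illustration with plain Python sets, assuming two made-up batches concatenated along `obs`: 'inner' keeps the intersection of variable names and 'outer' keeps the union. The function `aligned_variables` is hypothetical and only models the alignment step.

```python
def aligned_variables(var_names: list[set], join: str = "inner") -> set:
    """Return the variable names kept after alignment."""
    if join == "inner":
        return set.intersection(*var_names)
    if join == "outer":
        return set.union(*var_names)
    raise ValueError(f"join must be 'inner' or 'outer', got {join!r}")


batch_a = {"CD3E", "CD19", "MS4A1"}
batch_b = {"CD3E", "MS4A1", "NKG7"}

print(sorted(aligned_variables([batch_a, batch_b], join="inner")))
print(sorted(aligned_variables([batch_a, batch_b], join="outer")))
```

With 'outer', entries that exist in only one batch have to be filled in for the other, which is where `fill_value` comes into play.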

MapAnnData

Bases: BaseAPIModel

Apply mapping functions to elements of AnnData.

Source code in biochatter/api_agent/python/anndata_agent.py
class MapAnnData(BaseAPIModel):
    """Apply mapping functions to elements of AnnData."""

    method_name: str = Field(
        default="anndata.obs|var['annotation_name'].map",
        description=(
            "ALWAYS ALWAYS ALWAYS REPLACE THE anndata BY THE ONE GIVEN BY THE INPUT. "
            "Specifies the AnnData attribute and operation being performed. "
            "For example, 'obs.map' applies a mapping function or dictionary to the specified column in `adata.obs`. "
            "This must always include the AnnData component and the `.map` operation. "
            "Adapt the component (e.g., 'obs', 'var', etc.) to the specific use case."
        ),
    )
    dics: dict | None = Field(default=None, description="Dictionary to map over.")
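
The `.map` operation referred to here replaces each value in an annotation column by looking it up in a dictionary. A plain-Python sketch of that behaviour, with made-up cluster labels and a hypothetical helper `map_column`; values absent from the dictionary become `None` here, where pandas would use NaN.

```python
def map_column(values: list, mapping: dict) -> list:
    """Look up every value in `mapping`, yielding None for unknown keys."""
    return [mapping.get(value) for value in values]


clusters = ["0", "1", "0", "2"]
cell_types = {"0": "T cell", "1": "B cell"}

print(map_column(clusters, cell_types))
```

This is the typical use for the `dics` field: translating cluster identifiers into human-readable annotations.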

ReadCSV

Bases: BaseAPIModel

Read .csv file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadCSV(BaseAPIModel):
    """Read .csv file."""

    method_name: str = Field(default="io.read_csv", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.csv",
        description="Path to the .csv file",
    )
    delimiter: str | None = Field(
        None,
        description="Delimiter used in the .csv file",
    )
    first_column_names: bool | None = Field(
        None,
        description="Whether the first column contains names",
    )

ReadExcel

Bases: BaseAPIModel

Read .xlsx (Excel) file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadExcel(BaseAPIModel):
    """Read .xlsx (Excel) file."""

    method_name: str = Field(default="io.read_excel", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.xlsx",
        description="Path to the .xlsx file",
    )
    sheet: str | None = Field(None, description="Sheet name or index to read from")
    dtype: str | None = Field(
        None,
        description="Data type for the resulting dataframe",
    )

ReadH5AD

Bases: BaseAPIModel

Read .h5ad-formatted hdf5 file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadH5AD(BaseAPIModel):
    """Read .h5ad-formatted hdf5 file."""

    method_name: str = Field(default="io.read_h5ad", description="NEVER CHANGE")
    filename: str = Field(default="dummy.h5ad", description="Path to the .h5ad file")
    backed: str | None = Field(
        default=None,
        description="Mode to access file: None, 'r' for read-only",
    )
    as_sparse: str | None = Field(
        default=None,
        description="Convert to sparse format: 'csr', 'csc', or None",
    )
    as_sparse_fmt: str | None = Field(
        default=None,
        description="Sparse format if converting, e.g., 'csr'",
    )
    index_unique: str | None = Field(
        default=None,
        description="Make index unique by appending suffix if needed",
    )

ReadHDF

Bases: BaseAPIModel

Read .h5 (hdf5) file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadHDF(BaseAPIModel):
    """Read .h5 (hdf5) file."""

    method_name: str = Field(default="io.read_hdf", description="NEVER CHANGE")
    filename: str = Field(default="placeholder.h5", description="Path to the .h5 file")
    key: str | None = Field(None, description="Group key within the .h5 file")

ReadLoom

Bases: BaseAPIModel

Read .loom-formatted hdf5 file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadLoom(BaseAPIModel):
    """Read .loom-formatted hdf5 file."""

    method_name: str = Field(default="io.read_loom", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.loom",
        description="Path to the .loom file",
    )
    sparse: bool | None = Field(None, description="Whether to read data as sparse")
    cleanup: bool | None = Field(None, description="Clean up invalid entries")
    X_name: str | None = Field(None, description="Name to use for X matrix")
    obs_names: str | None = Field(
        None,
        description="Column to use for observation names",
    )
    var_names: str | None = Field(
        None,
        description="Column to use for variable names",
    )

ReadMTX

Bases: BaseAPIModel

Read .mtx file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadMTX(BaseAPIModel):
    """Read .mtx file."""

    method_name: str = Field(default="io.read_mtx", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.mtx",
        description="Path to the .mtx file",
    )
    dtype: str | None = Field(None, description="Data type for the matrix")

ReadText

Bases: BaseAPIModel

Read .txt, .tab, .data (text) file.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadText(BaseAPIModel):
    """Read .txt, .tab, .data (text) file."""

    method_name: str = Field(default="io.read_text", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.txt",
        description="Path to the text file",
    )
    delimiter: str | None = Field(None, description="Delimiter used in the file")
    first_column_names: bool | None = Field(
        None,
        description="Whether the first column contains names",
    )

ReadZarr

Bases: BaseAPIModel

Read from a hierarchical Zarr array store.

Source code in biochatter/api_agent/python/anndata_agent.py
class ReadZarr(BaseAPIModel):
    """Read from a hierarchical Zarr array store."""

    method_name: str = Field(default="io.read_zarr", description="NEVER CHANGE")
    filename: str = Field(
        default="placeholder.zarr",
        description="Path or URL to the Zarr store",
    )
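
The reader models above share one pattern: `method_name` identifies the function to call, and most other fields are optional keyword arguments that default to `None`. A minimal sketch of how such a model could be reduced to the keyword arguments of a call follows. It uses a standard-library dataclass as a stand-in for the Pydantic model, and the helper `to_call_kwargs` is hypothetical, not part of BioChatter.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ReadLoomSketch:
    """Dataclass stand-in mirroring a few fields of the ReadLoom model."""

    method_name: str = "io.read_loom"
    filename: str = "placeholder.loom"
    sparse: Optional[bool] = None
    cleanup: Optional[bool] = None
    X_name: Optional[str] = None


def to_call_kwargs(model) -> dict:
    """Drop `method_name` and unset (None) fields; keep the rest as kwargs."""
    return {
        key: value
        for key, value in asdict(model).items()
        if key != "method_name" and value is not None
    }


params = ReadLoomSketch(filename="cells.loom", sparse=True)
kwargs = to_call_kwargs(params)
print(params.method_name, kwargs)
```

Leaving unset fields out of the call lets the target function fall back to its own defaults.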

Module for interacting with the scanpy API for plotting (pl).

ScanpyPlDrawGraphQueryParameters

Bases: BaseModel

Parameters for querying the Scanpy pl.draw_graph API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlDrawGraphQueryParameters(BaseModel):
    """Parameters for querying the Scanpy `pl.draw_graph` API."""

    method_name: str = Field(
        default="sc.pl.draw_graph",
        description="The name of the method to call.",
    )
    question_uuid: str | None = Field(
        default=None,
        description="Unique identifier for the question.",
    )
    adata: str = Field(
        ...,
        description="Annotated data matrix.",
    )
    color: str | list[str] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes.",
    )
    gene_symbols: str | None = Field(
        default=None,
        description="Column name in `.var` DataFrame that stores gene symbols.",
    )
    use_raw: bool | None = Field(
        default=None,
        description="Use `.raw` attribute of `adata` for coloring with gene expression.",
    )
    sort_order: bool = Field(
        default=True,
        description=(
            "For continuous annotations used as color parameter, "
            "plot data points with higher values on top of others."
        ),
    )
    edges: bool = Field(
        default=False,
        description="Show edges.",
    )
    edges_width: float = Field(
        default=0.1,
        description="Width of edges.",
    )
    edges_color: str | list[float] | list[str] = Field(
        default="grey",
        description="Color of edges.",
    )
    neighbors_key: str | None = Field(
        default=None,
        description="Where to look for neighbors connectivities.",
    )
    arrows: bool = Field(
        default=False,
        description="Show arrows (deprecated in favor of `scvelo.pl.velocity_embedding`).",
    )
    arrows_kwds: dict[str, Any] | None = Field(
        default=None,
        description="Arguments passed to `quiver()`.",
    )
    groups: str | list[str] | None = Field(
        default=None,
        description="Restrict to a few categories in categorical observation annotation.",
    )
    components: str | list[str] | None = Field(
        default=None,
        description="For instance, ['1,2', '2,3']. To plot all available components use components='all'.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot.",
    )
    legend_loc: str = Field(
        default="right margin",
        description="Location of legend.",
    )
    legend_fontsize: int | float | str | None = Field(
        default=None,
        description="Numeric size in pt or string describing the size.",
    )
    legend_fontweight: int | str = Field(
        default="bold",
        description="Legend font weight.",
    )
    legend_fontoutline: int | None = Field(
        default=None,
        description="Line width of the legend font outline in pt.",
    )
    colorbar_loc: str | None = Field(
        default="right",
        description="Where to place the colorbar for continuous variables.",
    )
    size: float | list[float] | None = Field(
        default=None,
        description="Point size. If None, is automatically computed as 120000 / n_cells.",
    )
    color_map: str | Any | None = Field(
        default=None,
        description="Color map to use for continuous variables.",
    )
    palette: str | list[str] | Any | None = Field(
        default=None,
        description="Colors to use for plotting categorical annotation groups.",
    )
    na_color: str | tuple[float, ...] = Field(
        default="lightgray",
        description="Color to use for null or masked values.",
    )
    na_in_legend: bool = Field(
        default=True,
        description="If there are missing values, whether they get an entry in the legend.",
    )
    frameon: bool | None = Field(
        default=None,
        description="Draw a frame around the scatter plot.",
    )
    vmin: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the lower limit of the color scale.",
    )
    vmax: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the upper limit of the color scale.",
    )
    vcenter: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the center of the color scale.",
    )
    norm: Any | None = Field(
        default=None,
        description="Normalization for the colormap.",
    )
    add_outline: bool = Field(
        default=False,
        description="Add a thin border around groups of dots.",
    )
    outline_width: tuple[float, ...] = Field(
        default=(0.3, 0.05),
        description="Width of the outline as a fraction of the scatter dot size.",
    )
    outline_color: tuple[str, ...] = Field(
        default=("black", "white"),
        description="Colors for the outline: border color and gap color.",
    )
    ncols: int = Field(
        default=4,
        description="Number of panels per row.",
    )
    hspace: float = Field(
        default=0.25,
        description="Height of the space between multiple panels.",
    )
    wspace: float | None = Field(
        default=None,
        description="Width of the space between multiple panels.",
    )
    return_fig: bool | None = Field(
        default=None,
        description="Return the matplotlib figure.",
    )
    show: bool | None = Field(
        default=None,
        description="Show the plot; do not return axis.",
    )
    save: str | bool | None = Field(
        default=None,
        description="If `True` or a `str`, save the figure.",
    )
    ax: Any | None = Field(
        default=None,
        description="A matplotlib axes object.",
    )
    layout: str | None = Field(
        default=None,
        description="One of the `draw_graph()` layouts.",
    )
    kwargs: dict[str, Any] | None = Field(
        default=None,
        description="Additional arguments passed to `matplotlib.pyplot.scatter()`.",
    )

ScanpyPlPcaQueryParameters

Bases: BaseModel

Parameters for querying the scanpy pl.pca API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlPcaQueryParameters(BaseModel):
    """Parameters for querying the scanpy `pl.pca` API."""

    method_name: str = Field(
        default="sc.pl.pca",
        description="The name of the method to call.",
    )
    question_uuid: str | None = Field(
        default=None,
        description="Unique identifier for the question.",
    )
    adata: str = Field(
        ...,
        description="Annotated data matrix.",
    )
    color: str | list[str] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes.",
    )
    components: str | list[str] = Field(
        default="1,2",
        description="For example, ['1,2', '2,3']. To plot all available components use 'all'.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot.",
    )
    legend_loc: str = Field(
        default="right margin",
        description="Location of legend.",
    )
    legend_fontsize: int | float | str | None = Field(
        default=None,
        description="Font size for legend.",
    )
    legend_fontweight: int | str | None = Field(
        default=None,
        description="Font weight for legend.",
    )
    color_map: str | None = Field(
        default=None,
        description="String denoting matplotlib color map.",
    )
    palette: str | list[str] | dict | None = Field(
        default=None,
        description="Colors to use for plotting categorical annotation groups.",
    )
    frameon: bool | None = Field(
        default=None,
        description="Draw a frame around the scatter plot.",
    )
    size: int | float | None = Field(
        default=None,
        description="Point size. If `None`, is automatically computed as 120000 / n_cells.",
    )
    show: bool | None = Field(
        default=None,
        description="Show the plot, do not return axis.",
    )
    save: str | bool | None = Field(
        default=None,
        description="If `True` or a `str`, save the figure.",
    )
    ax: str | None = Field(
        default=None,
        description="A matplotlib axes object.",
    )
    return_fig: bool = Field(
        default=False,
        description="Return the matplotlib figure object.",
    )
    marker: str | None = Field(
        default=".",
        description="Marker symbol.",
    )
    annotate_var_explained: bool = Field(
        default=False,
        description="Annotate the percentage of explained variance.",
    )
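
The `components` field takes strings such as '1,2' that name one-based component numbers. A minimal sketch of how such strings could be turned into zero-based index tuples is shown below. It is an illustration only, not scanpy's own parsing code; `parse_components` is a hypothetical helper, and handling of 'all' is left out.

```python
def parse_components(components) -> list:
    """Convert '1,2' or ['1,2', '2,3'] into zero-based index tuples."""
    if isinstance(components, str):
        components = [components]
    return [
        tuple(int(part) - 1 for part in spec.split(","))
        for spec in components
    ]


print(parse_components("1,2"))
print(parse_components(["1,2", "2,3"]))
```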

ScanpyPlQueryBuilder

Bases: BaseQueryBuilder

A class for building a ScanpyPlQuery object.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlQueryBuilder(BaseQueryBuilder):
    """A class for building a ScanpyPlQuery object."""

    def create_runnable(
        self,
        query_parameters: list["BaseAPIModel"],
        conversation: "Conversation",
    ) -> Callable:
        """Create a runnable object for executing queries.

        Binds the given Pydantic tool models to the conversation's chat model
        with `bind_tools` and pipes the result into a `PydanticToolsParser`.

        Args:
        ----
            query_parameters: A list of Pydantic data models, each specifying
                the fields of one API method that can be queried.

            conversation: A BioChatter conversation object.

        Returns:
        -------
            A Callable object that can execute the query.

        """
        runnable = conversation.chat.bind_tools(query_parameters)
        return runnable | PydanticToolsParser(tools=query_parameters)

    def parameterise_query(
        self,
        question: str,
        conversation: "Conversation",
    ) -> list["BaseModel"]:
        """Generate a ScanpyPlQuery object.

        Generates the object based on the given question, prompt, and
        BioChatter conversation. Uses a Pydantic model to define the API fields.
        Creates a runnable that can be invoked on LLMs that are qualified to
        parameterise functions.

        Args:
        ----
            question (str): The question to be answered.

            conversation: The conversation object used for parameterising the
                ScanpyPlQuery.

        Returns:
        -------
            list[BaseModel]: the parameterised query objects (Pydantic models)

        """
        tools = [
            ScanpyPlScatterQueryParameters,
            ScanpyPlPcaQueryParameters,
            ScanpyPlTsneQueryParameters,
            ScanpyPlUmapQueryParameters,
            ScanpyPlDrawGraphQueryParameters,
            ScanpyPlSpatialQueryParameters,
        ]
        runnable = self.create_runnable(conversation=conversation, query_parameters=tools)
        return runnable.invoke(question)

create_runnable(query_parameters, conversation)

Create a runnable object for executing queries.

Binds the given Pydantic tool models to the conversation's chat model with bind_tools and pipes the result into a PydanticToolsParser.

Args:

query_parameters: A list of Pydantic data models, each specifying the fields of one API method that can be queried.

conversation: A BioChatter conversation object.

Returns:

A Callable object that can execute the query.
Source code in biochatter/api_agent/python/scanpy_pl_full.py
def create_runnable(
    self,
    query_parameters: list["BaseAPIModel"],
    conversation: "Conversation",
) -> Callable:
    """Create a runnable object for executing queries.

    Binds the given Pydantic tool models to the conversation's chat model
    with `bind_tools` and pipes the result into a `PydanticToolsParser`.

    Args:
    ----
        query_parameters: A list of Pydantic data models, each specifying
            the fields of one API method that can be queried.

        conversation: A BioChatter conversation object.

    Returns:
    -------
        A Callable object that can execute the query.

    """
    runnable = conversation.chat.bind_tools(query_parameters)
    return runnable | PydanticToolsParser(tools=query_parameters)
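
Conceptually, binding tools lets the model answer with one or more tool calls (a tool name plus arguments), and the parser then turns each call into an instance of the matching class. The standard-library sketch below mimics only that second step. `FakeUmapParams`, `FakePcaParams`, `parse_tool_calls`, and the shape of `tool_calls` are hypothetical stand-ins chosen for illustration, not LangChain or BioChatter code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeUmapParams:
    """Stand-in for a tool model such as ScanpyPlUmapQueryParameters."""

    adata: str
    color: Optional[str] = None
    method_name: str = "sc.pl.umap"


@dataclass
class FakePcaParams:
    """Stand-in for a tool model such as ScanpyPlPcaQueryParameters."""

    adata: str
    color: Optional[str] = None
    method_name: str = "sc.pl.pca"


def parse_tool_calls(tool_calls: list, tools: list) -> list:
    """Instantiate the class whose name matches each tool call."""
    by_name = {tool.__name__: tool for tool in tools}
    return [by_name[call["name"]](**call["args"]) for call in tool_calls]


# Shape assumed for illustration: one dict per tool call chosen by the model.
tool_calls = [
    {"name": "FakeUmapParams", "args": {"adata": "adata", "color": "louvain"}},
]
parsed = parse_tool_calls(tool_calls, [FakeUmapParams, FakePcaParams])
print(parsed[0])
```

Because the result is a list, a single question can in principle yield several parameterised models, one per tool call.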

parameterise_query(question, conversation)

Generate a ScanpyPlQuery object.

Generates the object based on the given question, prompt, and BioChatter conversation. Uses Pydantic models to define the API fields. Creates a runnable that can be invoked on LLMs that are qualified to parameterise functions.

Args:

question (str): The question to be answered.

conversation: The conversation object used for parameterising the ScanpyPlQuery.

Returns:

list[BaseModel]: the parameterised query objects (Pydantic models)
Source code in biochatter/api_agent/python/scanpy_pl_full.py
def parameterise_query(
    self,
    question: str,
    conversation: "Conversation",
) -> list["BaseModel"]:
    """Generate a ScanpyPlQuery object.

    Generates the object based on the given question, prompt, and
    BioChatter conversation. Uses a Pydantic model to define the API fields.
    Creates a runnable that can be invoked on LLMs that are qualified to
    parameterise functions.

    Args:
    ----
        question (str): The question to be answered.

        conversation: The conversation object used for parameterising the
            ScanpyPlQuery.

    Returns:
    -------
        list[BaseModel]: the parameterised query objects (Pydantic models)

    """
    tools = [
        ScanpyPlScatterQueryParameters,
        ScanpyPlPcaQueryParameters,
        ScanpyPlTsneQueryParameters,
        ScanpyPlUmapQueryParameters,
        ScanpyPlDrawGraphQueryParameters,
        ScanpyPlSpatialQueryParameters,
    ]
    runnable = self.create_runnable(conversation=conversation, query_parameters=tools)
    return runnable.invoke(question)
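
`parameterise_query` returns a list of parameterised models, each carrying the dotted name of the function to call in `method_name`. How that name is resolved and executed is not shown in this section. One possible approach is to walk the dotted path with `getattr`, sketched here with a `SimpleNamespace` standing in for the `scanpy` module so that the example runs without scanpy installed. `resolve_method` and `fake_umap` are hypothetical names.

```python
from functools import reduce
from types import SimpleNamespace


def fake_umap(adata, color=None):
    """Stand-in for sc.pl.umap; returns a description instead of plotting."""
    return f"umap({adata}, color={color})"


# Stand-in namespace shaped like `scanpy.pl`; not the real library.
sc = SimpleNamespace(pl=SimpleNamespace(umap=fake_umap))


def resolve_method(method_name: str, namespace: dict):
    """Resolve a dotted name such as 'sc.pl.umap' against known root objects."""
    root, *attrs = method_name.split(".")
    return reduce(getattr, attrs, namespace[root])


func = resolve_method("sc.pl.umap", {"sc": sc})
result = func("adata", color="louvain")
print(result)
```

Restricting lookup to an explicit `namespace` dictionary keeps the resolver from reaching arbitrary globals.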

ScanpyPlScatterQueryParameters

Bases: BaseModel

Parameters for querying the scanpy pl.scatter API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlScatterQueryParameters(BaseModel):
    """Parameters for querying the scanpy `pl.scatter` API."""

    method_name: str = Field(
        default="sc.pl.scatter",
        description="The name of the method to call.",
    )
    question_uuid: str = Field(
        default_factory=lambda: str(uuid.uuid4()),
        description="Unique identifier for the question.",
    )
    adata: str = Field(description="Annotated data matrix.")
    x: str | None = Field(default=None, description="x coordinate.")
    y: str | None = Field(default=None, description="y coordinate.")
    color: str | tuple[float, ...] | list[str | tuple[float, ...]] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes, or a hex color specification.",
    )
    use_raw: bool | None = Field(
        default=None,
        description="Whether to use raw attribute of adata. Defaults to True if .raw is present.",
    )
    layers: str | list[str] | None = Field(
        default=None,
        description="Layer(s) to use from adata's layers attribute.",
    )
    basis: str | None = Field(
        default=None,
        description="String that denotes a plotting tool that computed coordinates (e.g., 'pca', 'tsne', 'umap').",
    )
    sort_order: bool = Field(
        default=True,
        description="For continuous annotations used as color parameter, plot data points with higher values on top.",
    )
    groups: str | list[str] | None = Field(
        default=None,
        description="Restrict to specific categories in categorical observation annotation.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot ('2d' or '3d').",
    )
    legend_loc: str | None = Field(
        default="right margin",
        description="Location of legend ('none', 'right margin', 'on data', etc.).",
    )
    size: int | float | None = Field(
        default=None,
        description="Point size. If None, automatically computed as 120000 / n_cells.",
    )
    color_map: str | None = Field(
        default=None,
        description="Color map to use for continuous variables (e.g., 'magma', 'viridis').",
    )
    show: bool | None = Field(
        default=None,
        description="Show the plot, do not return axis.",
    )
    save: str | bool | None = Field(
        default=None,
        description="If True or a str, save the figure. String is appended to default filename.",
    )
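
The `size` description gives the default point size as 120000 / n_cells. As a quick worked example of that formula (the helper name `default_point_size` is made up for illustration):

```python
def default_point_size(n_cells: int) -> float:
    """Default point size as stated in the field description: 120000 / n_cells."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return 120000 / n_cells


for n_cells in (1_000, 3_000, 60_000):
    print(n_cells, default_point_size(n_cells))
```

Larger datasets therefore get smaller points by default, which reduces overplotting.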

ScanpyPlSpatialQueryParameters

Bases: BaseModel

Parameters for querying the Scanpy pl.spatial API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlSpatialQueryParameters(BaseModel):
    """Parameters for querying the Scanpy `pl.spatial` API."""

    method_name: str = Field(
        default="sc.pl.spatial",
        description="The name of the method to call.",
    )
    question_uuid: str | None = Field(
        default=None,
        description="Unique identifier for the question.",
    )
    adata: str = Field(
        ...,
        description="Annotated data matrix.",
    )
    color: str | list[str] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes.",
    )
    gene_symbols: str | None = Field(
        default=None,
        description="Column name in `.var` DataFrame that stores gene symbols.",
    )
    use_raw: bool | None = Field(
        default=None,
        description="Use `.raw` attribute of `adata` for coloring with gene expression.",
    )
    layer: str | None = Field(
        default=None,
        description="Name of the AnnData object layer to plot.",
    )
    library_id: str | None = Field(
        default=None,
        description="Library ID for Visium data, e.g., key in `adata.uns['spatial']`.",
    )
    img_key: str | None = Field(
        default=None,
        description=(
            "Key for image data, used to get `img` and `scale_factor` from "
            "'images' and 'scalefactors' entries for this library."
        ),
    )
    img: Any | None = Field(
        default=None,
        description="Image data to plot, overrides `img_key`.",
    )
    scale_factor: float | None = Field(
        default=None,
        description="Scaling factor used to map from coordinate space to pixel space.",
    )
    spot_size: float | None = Field(
        default=None,
        description="Diameter of spot (in coordinate space) for each point.",
    )
    crop_coord: tuple[int, ...] | None = Field(
        default=None,
        description="Coordinates to use for cropping the image (left, right, top, bottom).",
    )
    alpha_img: float = Field(
        default=1.0,
        description="Alpha value for image.",
    )
    bw: bool = Field(
        default=False,
        description="Plot image data in grayscale.",
    )
    sort_order: bool = Field(
        default=True,
        description=(
            "For continuous annotations used as color parameter, plot data points "
            "with higher values on top of others."
        ),
    )
    groups: str | list[str] | None = Field(
        default=None,
        description="Restrict to specific categories in categorical observation annotation.",
    )
    components: str | list[str] | None = Field(
        default=None,
        description="For example, ['1,2', '2,3']. To plot all available components, use 'all'.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot.",
    )
    legend_loc: str = Field(
        default="right margin",
        description="Location of legend.",
    )
    legend_fontsize: int | float | str | None = Field(
        default=None,
        description="Numeric size in pt or string describing the size.",
    )
    legend_fontweight: int | str = Field(
        default="bold",
        description="Legend font weight.",
    )
    legend_fontoutline: int | None = Field(
        default=None,
        description="Line width of the legend font outline in pt.",
    )
    colorbar_loc: str | None = Field(
        default="right",
        description="Where to place the colorbar for continuous variables.",
    )
    size: float = Field(
        default=1.0,
        description="Point size. If None, automatically computed as 120000 / n_cells.",
    )
    color_map: str | Any | None = Field(
        default=None,
        description="Color map to use for continuous variables.",
    )
    palette: str | list[str] | Any | None = Field(
        default=None,
        description="Colors to use for plotting categorical annotation groups.",
    )
    na_color: str | tuple[float, ...] | None = Field(
        default=None,
        description="Color to use for null or masked values.",
    )
    na_in_legend: bool = Field(
        default=True,
        description="If there are missing values, whether they get an entry in the legend.",
    )
    frameon: bool | None = Field(
        default=None,
        description="Draw a frame around the scatter plot.",
    )
    vmin: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the lower limit of the color scale.",
    )
    vmax: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the upper limit of the color scale.",
    )
    vcenter: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="The value representing the center of the color scale.",
    )
    norm: Any | None = Field(
        default=None,
        description="Normalization for the colormap.",
    )
    add_outline: bool = Field(
        default=False,
        description="Add a thin border around groups of dots.",
    )
    outline_width: tuple[float, ...] = Field(
        default=(0.3, 0.05),
        description="Width of the outline as a fraction of the scatter dot size.",
    )
    outline_color: tuple[str, ...] = Field(
        default=("black", "white"),
        description="Colors for the outline: border color and gap color.",
    )
    ncols: int = Field(
        default=4,
        description="Number of panels per row.",
    )
    hspace: float = Field(
        default=0.25,
        description="Height of the space between multiple panels.",
    )
    wspace: float | None = Field(
        default=None,
        description="Width of the space between multiple panels.",
    )
    return_fig: bool | None = Field(
        default=None,
        description="Return the matplotlib figure.",
    )
    show: bool | None = Field(
        default=None,
        description="Show the plot; do not return axis.",
    )
    save: str | bool | None = Field(
        default=None,
        description="If `True` or a `str`, save the figure.",
    )
    ax: Any | None = Field(
        default=None,
        description="A matplotlib axes object.",
    )
    kwargs: dict[str, Any] | None = Field(
        default=None,
        description="Additional arguments passed to `matplotlib.pyplot.scatter()`.",
    )

ScanpyPlTsneQueryParameters

Bases: BaseModel

Parameters for querying the Scanpy pl.tsne API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlTsneQueryParameters(BaseModel):
    """Parameters for querying the Scanpy `pl.tsne` API."""

    method_name: str = Field(
        default="sc.pl.tsne",
        description="The name of the method to call.",
    )
    question_uuid: str | None = Field(
        default=None,
        description="Unique identifier for the question.",
    )
    adata: str = Field(
        ...,
        description="Annotated data matrix.",
    )
    color: str | list[str] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes.",
    )
    gene_symbols: str | None = Field(
        default=None,
        description="Column name in `.var` DataFrame that stores gene symbols.",
    )
    use_raw: bool | None = Field(
        default=None,
        description="Use `.raw` attribute of `adata` for coloring with gene expression.",
    )
    sort_order: bool = Field(
        default=True,
        description="Plot data points with higher values on top for continuous annotations.",
    )
    edges: bool = Field(
        default=False,
        description="Show edges.",
    )
    edges_width: float = Field(
        default=0.1,
        description="Width of edges.",
    )
    edges_color: str | list[float] | list[str] = Field(
        default="grey",
        description="Color of edges.",
    )
    neighbors_key: str | None = Field(
        default=None,
        description="Key for neighbors connectivities.",
    )
    arrows: bool = Field(
        default=False,
        description="Show arrows (deprecated in favor of `scvelo.pl.velocity_embedding`).",
    )
    arrows_kwds: dict[str, Any] | None = Field(
        default=None,
        description="Arguments passed to `quiver()`.",
    )
    groups: str | None = Field(
        default=None,
        description="Restrict to specific categories in categorical observation annotation.",
    )
    components: str | list[str] | None = Field(
        default=None,
        description="Components to plot, e.g., ['1,2', '2,3']. Use 'all' to plot all available components.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot ('2d' or '3d').",
    )
    legend_loc: str = Field(
        default="right margin",
        description="Location of legend.",
    )
    legend_fontsize: int | float | str | None = Field(
        default=None,
        description="Font size for legend.",
    )
    legend_fontweight: int | str = Field(
        default="bold",
        description="Font weight for legend.",
    )
    legend_fontoutline: int | None = Field(
        default=None,
        description="Line width of the legend font outline in pt.",
    )
    size: float | list[float] | None = Field(
        default=None,
        description="Point size. If `None`, computed as 120000 / n_cells.",
    )
    color_map: str | Any | None = Field(
        default=None,
        description="Color map for continuous variables.",
    )
    palette: str | list[str] | Any | None = Field(
        default=None,
        description="Colors for plotting categorical annotation groups.",
    )
    na_color: str | tuple[float, ...] = Field(
        default="lightgray",
        description="Color for null or masked values.",
    )
    na_in_legend: bool = Field(
        default=True,
        description="Include missing values in the legend.",
    )
    frameon: bool | None = Field(
        default=None,
        description="Draw a frame around the scatter plot.",
    )
    vmin: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="Lower limit of the color scale.",
    )
    vmax: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="Upper limit of the color scale.",
    )
    vcenter: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="Center of the color scale, useful for diverging colormaps.",
    )
    norm: Any | None = Field(
        default=None,
        description="Normalization for the colormap.",
    )
    add_outline: bool = Field(
        default=False,
        description="Add a thin border around groups of dots.",
    )
    outline_width: tuple[float, ...] = Field(
        default=(0.3, 0.05),
        description="Width of the outline as a fraction of the scatter dot size.",
    )
    outline_color: tuple[str, ...] = Field(
        default=("black", "white"),
        description="Colors for the outline: border color and gap color.",
    )
    ncols: int = Field(
        default=4,
        description="Number of panels per row.",
    )
    hspace: float = Field(
        default=0.25,
        description="Height of the space between multiple panels.",
    )
    wspace: float | None = Field(
        default=None,
        description="Width of the space between multiple panels.",
    )
    return_fig: bool | None = Field(
        default=None,
        description="Return the matplotlib figure.",
    )
    show: bool | None = Field(
        default=None,
        description="Show the plot; do not return axis.",
    )
    save: str | bool | None = Field(
        default=None,
        description="If `True` or a `str`, save the figure.",
    )
    ax: Any | None = Field(
        default=None,
        description="A matplotlib axes object.",
    )
    kwargs: dict[str, Any] | None = Field(
        default=None,
        description="Additional arguments passed to `matplotlib.pyplot.scatter()`.",
    )

ScanpyPlUmapQueryParameters

Bases: BaseModel

Parameters for querying the Scanpy pl.umap API.

Source code in biochatter/api_agent/python/scanpy_pl_full.py
class ScanpyPlUmapQueryParameters(BaseModel):
    """Parameters for querying the Scanpy `pl.umap` API."""

    method_name: str = Field(
        default="sc.pl.umap",
        description="The name of the method to call.",
    )
    question_uuid: str | None = Field(
        default=None,
        description="Unique identifier for the question.",
    )
    adata: str = Field(
        ...,
        description="Annotated data matrix.",
    )
    color: str | list[str] | None = Field(
        default=None,
        description="Keys for annotations of observations/cells or variables/genes.",
    )
    mask_obs: str | None = Field(
        default=None,
        description="Mask for observations.",
    )
    gene_symbols: str | None = Field(
        default=None,
        description="Column name in `.var` DataFrame that stores gene symbols.",
    )
    use_raw: bool | None = Field(
        default=None,
        description="Use `.raw` attribute of `adata` for coloring with gene expression.",
    )
    sort_order: bool = Field(
        default=True,
        description="Plot data points with higher values on top for continuous annotations.",
    )
    edges: bool = Field(
        default=False,
        description="Show edges.",
    )
    edges_width: float = Field(
        default=0.1,
        description="Width of edges.",
    )
    edges_color: str | list[float] | list[str] = Field(
        default="grey",
        description="Color of edges.",
    )
    neighbors_key: str | None = Field(
        default=None,
        description="Key for neighbors connectivities.",
    )
    arrows: bool = Field(
        default=False,
        description="Show arrows (deprecated in favor of `scvelo.pl.velocity_embedding`).",
    )
    arrows_kwds: dict[str, Any] | None = Field(
        default=None,
        description="Arguments passed to `quiver()`.",
    )
    groups: str | list[str] | None = Field(
        default=None,
        description="Restrict to one or more categories of the categorical observation annotation.",
    )
    components: str | list[str] | None = Field(
        default=None,
        description="Components to plot, e.g., ['1,2', '2,3']. Use 'all' to plot all available components.",
    )
    dimensions: tuple[int, int] | list[tuple[int, int]] | None = Field(
        default=None,
        description="0-indexed dimensions of the embedding to plot as integer pairs, e.g., [(0, 1), (1, 2)].",
    )
    layer: str | None = Field(
        default=None,
        description="Name of the AnnData object layer to plot.",
    )
    projection: str = Field(
        default="2d",
        description="Projection of plot ('2d' or '3d').",
    )
    scale_factor: float | None = Field(
        default=None,
        description="Scale factor for the plot.",
    )
    color_map: str | Any | None = Field(
        default=None,
        description="Color map for continuous variables.",
    )
    cmap: str | Any | None = Field(
        default=None,
        description="Alias for `color_map`.",
    )
    palette: str | list[str] | Any | None = Field(
        default=None,
        description="Colors for plotting categorical annotation groups.",
    )
    na_color: str | tuple[float, ...] = Field(
        default="lightgray",
        description="Color for null or masked values.",
    )
    na_in_legend: bool = Field(
        default=True,
        description="Include missing values in the legend.",
    )
    size: float | list[float] | None = Field(
        default=None,
        description="Point size. If `None`, computed as 120000 / n_cells.",
    )
    frameon: bool | None = Field(
        default=None,
        description="Draw a frame around the scatter plot.",
    )
    legend_fontsize: int | float | str | None = Field(
        default=None,
        description="Font size for legend.",
    )
    legend_fontweight: int | str = Field(
        default="bold",
        description="Font weight for legend.",
    )
    legend_loc: str = Field(
        default="right margin",
        description="Location of legend.",
    )
    legend_fontoutline: int | None = Field(
        default=None,
        description="Line width of the legend font outline in pt.",
    )
    colorbar_loc: str = Field(
        default="right",
        description="Location of the colorbar.",
    )
    vmax: str | float | Any | list[str | float | Any] | None = Field(
        default=None,
        description="Upper limit of the color scale.",
    )
    vmin: str | float | Any | list[str | float | Any] | None = Field(
        default=None,