read_trees

Reads tree edges from a file and organizes them into a list of trees.

Each tree is represented as a list of tuples, where each tuple contains two integers representing a parent-child relationship. Trees are separated by empty lines or comment lines (lines starting with #) in the file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | str | The path to the file containing tree edge data. Each data line should hold two integers, parent then child, separated by sep. | required |
| sep | str | The delimiter between parent and child in the input file. | ' ' |

Returns:

| Type | Description |
| --- | --- |
| list of list of tuple of int | A list of trees, where each tree is a list of tuples. Each tuple contains two integers representing a parent-child relationship. |
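
To make the shape of the return value concrete, here is a minimal sketch that regroups each tree's edge list into a parent-to-children mapping. The helper name `to_children_map` is hypothetical and not part of the library; the input is a hand-written literal with the same shape as the documented return value.

```python
from collections import defaultdict


def to_children_map(tree):
    """Group child nodes under their parent (hypothetical helper, not part of the library)."""
    children = defaultdict(list)
    for parent, child in tree:
        children[parent].append(child)
    return dict(children)


# Same shape as the documented return value: a list of trees, each a list of (parent, child) edges.
trees = [[(1, 2), (1, 3)], [(1, 2), (2, 3)]]
maps = [to_children_map(tree) for tree in trees]
print(maps)  # [{1: [2, 3]}, {1: [2], 2: [3]}]
```

Each tree stays independent, so node IDs can repeat across trees without colliding.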

Notes
  • Empty lines and lines starting with # are treated as separators between trees.
  • If the file ends without a separator, the last tree is still included in the output.
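
The separator rules above can be sketched on an in-memory string, which is convenient for quick checks without touching the file system. The function `parse_trees` below is a hypothetical stand-in that follows the documented rules; it is not the library's own code, and the comma-delimited input is only there to illustrate a non-default `sep`.

```python
def parse_trees(text, sep=" "):
    """Hypothetical in-memory stand-in that follows the documented separator rules."""
    trees, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # A separator closes the current tree, but only if it has edges,
            # so consecutive separators never produce empty trees.
            if current:
                trees.append(current)
                current = []
            continue
        parts = line.split(sep)
        if len(parts) == 2:
            current.append((int(parts[0]), int(parts[1])))
    if current:  # the last tree needs no trailing separator
        trees.append(current)
    return trees


# A blank line followed by a comment line still yields exactly two trees.
text = "# Tree 1\n1,2\n1,3\n\n# Tree 2\n1,2\n2,3"
result = parse_trees(text, sep=",")
print(result)  # [[(1, 2), (1, 3)], [(1, 2), (2, 3)]]
```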

Examples:

Given a file path/to/file with the following content:

# Tree 1
1 2
1 3
# Tree 2
1 2
2 3
>>> read_trees("path/to/file")
[[(1, 2), (1, 3)], [(1, 2), (2, 3)]]
Source code in src/arborist/utils.py
def read_trees(file_path, sep=" "):
    """
    Reads tree edges from a file and organizes them into a list of trees.

    Each tree is represented as a list of tuples, where each tuple contains
    two integers representing a parent-child relationship. Trees are separated
    by empty lines or comment lines (lines starting with `#`) in the file.

    Parameters
    ----------
    file_path : str
        The path to the file containing tree edge data. Each data line should
        hold two integers, parent then child, separated by `sep`.
    sep : str, optional
        The delimiter between parent and child in the input file (default is ``" "``).

    Returns
    -------
    list of list of tuple of int
        A list of trees, where each tree is a list of tuples. Each tuple
        contains two integers representing a parent-child relationship.

    Notes
    -----
    - Empty lines and lines starting with `#` are treated as separators
      between trees.
    - If the file ends without a separator, the last tree is still included
      in the output.

    Examples
    --------
    Given a file `path/to/file` with the following content:

    ```
    # Tree 1
    1 2
    1 3
    # Tree 2
    1 2
    2 3
    ```



    >>> read_trees("path/to/file")
    [[(1, 2), (1, 3)], [(1, 2), (2, 3)]]
    """
    trees = []
    current_tree = []

    with open(file_path, "r") as file:
        for line in file:
            line = line.strip()
            if not line or line.startswith("#"):  # Empty lines and comment lines separate trees
                if current_tree:  # Store previous tree before starting a new one
                    trees.append(current_tree)
                    current_tree = []
                continue

            # Split on `sep` and convert to a (parent, child) tuple;
            # lines that do not have exactly two fields are skipped
            parts = line.split(sep)
            if len(parts) == 2:
                current_tree.append((int(parts[0]), int(parts[1])))

    if current_tree:  # Append last tree if exists
        trees.append(current_tree)

    return trees