generate_synthetic_fastq_buffer

Mojo function

def generate_synthetic_fastq_buffer(num_reads: Int, min_length: Int, max_length: Int, min_phred: Int, max_phred: Int, quality_schema: String, gc_bias: Float32 = 0.5) -> List[Byte]

Generate a contiguous in-memory FASTQ buffer with configurable read length and quality distribution.

Read lengths are chosen deterministically in [min_length, max_length] (inclusive). Per-base Phred scores follow a positional decay model (high quality at 5’ end, degrading toward 3’ end), mimicking real Illumina quality profiles. Base composition follows a configurable GC content model with pseudorandom distribution.
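The two models described above can be illustrated with a short Python sketch. It is not the library's implementation: the linear decay curve, the helper names `decayed_phred` and `sample_bases`, and the even split within G/C and A/T are all assumptions chosen only to show the idea.

```python
import random


def decayed_phred(pos: int, length: int, min_phred: int, max_phred: int) -> int:
    """Assumed linear decay: max_phred at the 5' end, min_phred at the 3' end."""
    if length <= 1:
        return max_phred
    frac = pos / (length - 1)  # 0.0 at the first base, 1.0 at the last
    return round(max_phred - frac * (max_phred - min_phred))


def sample_bases(length: int, gc_bias: float, rng: random.Random) -> str:
    """Draw each base as G/C with probability gc_bias, otherwise A/T."""
    bases = []
    for _ in range(length):
        if rng.random() < gc_bias:
            bases.append(rng.choice("GC"))
        else:
            bases.append(rng.choice("AT"))
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the output is reproducible
    length = 10
    print(sample_bases(length, 0.6, rng))
    print([decayed_phred(i, length, 2, 40) for i in range(length)])
```

A fixed seed keeps the pseudorandom base composition reproducible between runs, which is usually what a synthetic benchmark buffer needs.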

Args:

  • num_reads (Int): Number of FASTQ records to generate.
  • min_length (Int): Minimum sequence length per read (inclusive).
  • max_length (Int): Maximum sequence length per read (inclusive).
  • min_phred (Int): Minimum Phred score per base (inclusive) — used as the floor for 3’ end quality.
  • max_phred (Int): Maximum Phred score per base (inclusive) — used as the ceiling for 5’ end quality.
  • quality_schema (String): Schema name (e.g. “sanger”, “solexa”, “illumina_1.8”, “generic”).
  • gc_bias (Float32): Target GC fraction in [0.0, 1.0]. Default 0.5 (50% GC). Values above 0.5 increase G/C frequency; below 0.5 increase A/T frequency.
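As background for quality_schema, the following Python sketch shows how Phred scores are commonly written as ASCII characters. The offsets are the conventional ones (33 for Sanger and Illumina 1.8+, 64 for Solexa); how this function maps each schema name, and what "generic" means, is not stated here, so the table and the helper `encode_quality` are assumptions.

```python
# Conventional ASCII offsets; the mapping used by the library itself may differ.
ASCII_OFFSET = {
    "sanger": 33,
    "illumina_1.8": 33,
    "solexa": 64,  # Solexa also uses a different score scale; only the offset is shown
}


def encode_quality(scores: list[int], schema: str) -> str:
    """Encode integer quality scores as a FASTQ quality string."""
    offset = ASCII_OFFSET[schema]
    return "".join(chr(q + offset) for q in scores)


if __name__ == "__main__":
    print(encode_quality([40, 30, 2], "sanger"))  # prints I?#
```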

Returns:

List[Byte]: A buffer of valid 4-line FASTQ records; pass it to MemoryReader for parsing.
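For reference, a 4-line FASTQ record consists of a header line starting with "@", the sequence, a "+" separator, and a quality string of the same length as the sequence. The Python sketch below builds and counts such records in a byte buffer; `format_record` and `count_records` are hypothetical helpers, not part of this API, and the exact header text the generator emits is not specified here.

```python
def format_record(name: str, seq: str, qual: str) -> bytes:
    """Serialize one record as the four FASTQ lines."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have equal length")
    return f"@{name}\n{seq}\n+\n{qual}\n".encode("ascii")


def count_records(buf: bytes) -> int:
    """Count records in a buffer that holds only complete 4-line records."""
    lines = buf.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("buffer does not contain whole 4-line records")
    return len(lines) // 4


if __name__ == "__main__":
    buf = format_record("read_1", "ACGT", "IIII") + format_record("read_2", "GG", "##")
    print(count_records(buf))  # prints 2
```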

Raises:

Error: If num_reads < 0, min_length > max_length, or min_phred > max_phred.
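The three error conditions can be mirrored in a small pre-flight check. This Python sketch only restates the documented conditions; the helper name `validate_args` and the message wording are assumptions, and the Mojo function raises its own Error type.

```python
def validate_args(
    num_reads: int,
    min_length: int,
    max_length: int,
    min_phred: int,
    max_phred: int,
) -> None:
    """Raise ValueError for any argument combination the docs list as invalid."""
    if num_reads < 0:
        raise ValueError("num_reads must be >= 0")
    if min_length > max_length:
        raise ValueError("min_length must be <= max_length")
    if min_phred > max_phred:
        raise ValueError("min_phred must be <= max_phred")


if __name__ == "__main__":
    validate_args(1000, 50, 150, 2, 40)  # valid: returns without raising
    try:
        validate_args(1000, 150, 50, 2, 40)
    except ValueError as err:
        print(err)  # prints min_length must be <= max_length
```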