part 3: applying llms in parallel with gemini
in the previous sections, we talked about leveraging llms to accelerate our code review process by essentially brute-forcing the inspection of code primitives in bulk. now let's dive into a concrete example: a python script that uses google's generative ai library for gemini to parallelize those queries. this script sets the stage for part 4, where i'll show a working demonstration.
```python
#!/usr/bin/env python3
"""
Send parallel requests to the Gemini API with up to three code files appended
to a user-supplied prompt. The list of code file paths is read from a text
file containing line-separated file paths.

Usage:
    python gemini_parallel.py --prompt "Your prompt here" \
        --paths_file code_files.txt \
        --api_key YOUR_API_KEY \
        --limit 3 \
        [--model gemini-1.5-flash] \
        [--output_file results.txt]

Arguments:
    --prompt: Required. The text prompt to be sent to the Gemini API.
    --paths_file: Required. A text file with one file path per line.
    --api_key: Optional. The Gemini API key. If not set, ensure
        $API_KEY is in your environment.
    --limit: Optional. Maximum number of concurrent requests (default=2).
    --model: Optional. Name of the Gemini model to use (default="gemini-1.5-flash").
    --output_file: Optional. File path to write results to. Outputs will still
        be printed to stdout.
"""
import argparse
import asyncio
import os

import google.generativeai as genai


def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Send parallel requests to the Gemini API with code file contents appended."
    )
    parser.add_argument(
        "--prompt",
        type=str,
        required=True,
        help="The prompt to be used as the basis for generating content."
    )
    parser.add_argument(
        "--paths_file",
        type=str,
        required=True,
        help="Path to a text file containing line-separated full paths to code files."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        default=None,
        help="Gemini API key. If not set, must be present in environment as $API_KEY."
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=2,
        help="Maximum number of concurrent requests to send (default=2)."
    )
    parser.add_argument(
        "--model",
        type=str,
        default="gemini-1.5-flash",
        help="Name of the Gemini model to use (default='gemini-1.5-flash')."
    )
    parser.add_argument(
        "--output_file",
        type=str,
        default=None,
        help="File path to write results to. Outputs will also be printed to stdout."
    )
    return parser.parse_args()


def chunk_list(items, chunk_size=3):
    """Yield successive chunks of `items` of size `chunk_size`."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]


async def generate_with_code_files(semaphore, model, user_prompt, file_paths):
    """
    Appends up to three code file contents (with filenames) to the user_prompt,
    then calls the Gemini API. The call is wrapped in a semaphore for
    concurrency control and uses to_thread for I/O-bound concurrency.
    """
    # Read code files
    code_contents = []
    for path in file_paths:
        try:
            with open(path, "r", encoding="utf-8") as f:
                code_contents.append(f.read())
        except Exception as e:
            code_contents.append(f"[Error reading file {path}: {e}]")

    # Construct the combined prompt: user prompt + filename + code
    prompt_sections = []
    for path, content in zip(file_paths, code_contents):
        prompt_sections.append(f"Filename: {path}\n{content}")
    combined_prompt = user_prompt + "\n\n" + "\n\n".join(prompt_sections)

    # Make the API call inside a semaphore (limit concurrency)
    async with semaphore:
        # The model.generate_content(...) call is synchronous, so wrap it in to_thread
        response = await asyncio.to_thread(model.generate_content, combined_prompt)
        return response.text


async def main():
    args = parse_arguments()

    # Ensure API key is configured
    if args.api_key:
        os.environ["API_KEY"] = args.api_key
    if "API_KEY" not in os.environ:
        raise ValueError("No API key found. Set --api_key or $API_KEY in your environment.")

    # Configure the Gemini library
    genai.configure(api_key=os.environ["API_KEY"])

    # Create a Gemini model instance using the specified --model value
    model = genai.GenerativeModel(args.model)

    # Read all code file paths from the specified file
    with open(args.paths_file, "r", encoding="utf-8") as f:
        all_paths = [line.strip() for line in f if line.strip()]

    # Prepare to run tasks in parallel
    tasks = []
    semaphore = asyncio.Semaphore(args.limit)  # concurrency limit

    # Break file paths into chunks of up to three
    for chunk in chunk_list(all_paths, 3):
        tasks.append(
            asyncio.create_task(
                generate_with_code_files(semaphore, model, args.prompt, chunk)
            )
        )

    # Gather results
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # If an output file was specified, open it in write mode
    if args.output_file:
        with open(args.output_file, "w", encoding="utf-8") as out_file:
            for i, result in enumerate(results, start=1):
                out_file.write(f"\n=== Response #{i} ===\n")
                if isinstance(result, Exception):
                    out_file.write(f"Error: {result}\n")
                else:
                    out_file.write(f"{result}\n")

    # Print to stdout regardless of whether output_file was specified
    for i, result in enumerate(results, start=1):
        print(f"\n=== Response #{i} ===")
        if isinstance(result, Exception):
            print(f"Error: {result}")
        else:
            print(result)


if __name__ == "__main__":
    asyncio.run(main())
```
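the script expects `code_files.txt` to already exist, with one path per line. here's a minimal companion sketch (not part of the script above; the default extensions and output name are just illustrative assumptions) that walks a checkout and writes that list for you:

```python
#!/usr/bin/env python3
"""Collect candidate source files into a line-separated paths file.

Illustrative helper sketch: the default extensions and output file name
are assumptions, not something the main script requires.
"""
import argparse
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(description="Write a line-separated list of source file paths.")
    parser.add_argument("repo_dir", help="Root directory of the checkout to scan.")
    parser.add_argument("--ext", nargs="+", default=[".py", ".rb"], help="File extensions to include.")
    parser.add_argument("--out", default="code_files.txt", help="Output paths file.")
    args = parser.parse_args()

    # Recursively collect files whose suffix matches one of the requested extensions.
    paths = sorted(
        str(p) for p in Path(args.repo_dir).rglob("*")
        if p.is_file() and p.suffix in args.ext
    )
    Path(args.out).write_text("\n".join(paths) + "\n", encoding="utf-8")
    print(f"Wrote {len(paths)} paths to {args.out}")


if __name__ == "__main__":
    main()
```

from there, something like `python gemini_parallel.py --prompt "..." --paths_file code_files.txt --limit 3 --output_file results.txt` kicks off the parallel scan.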
the big picture
we're automating the scanning of code files by passing them, in small batches, to an llm alongside a prompt that describes what we're looking for. the idea is straightforward: if you suspect a certain function or pattern is dangerous, you don't want to read thousands of lines manually to confirm its usage. instead, you let the llm read for you. the script above offloads that reading to gemini, letting you check multiple files in parallel.
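to make that concrete, the prompt you pass with `--prompt` might look something like the sketch below. `dangerous_primitive` is a placeholder, not a real target; the point is that you describe the pattern in plain language and tell the model exactly what to report.

```python
# An illustrative prompt; dangerous_primitive() stands in for whatever
# function or pattern you actually care about.
PROMPT = (
    "You are reviewing source code for security issues. For each file below, "
    "list every call to dangerous_primitive(), quote the surrounding lines, and "
    "state whether the arguments appear to come from user-controlled input. "
    "If you find nothing, reply with 'NO FINDINGS'."
)
```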
how it works
- prompt + file contents: the script combines a user-supplied prompt with the contents of up to three files (or however many you choose) at a time, then calls gemini's `generate_content` method. you'd typically shape your prompt to say something like, "analyze this code for xyz pattern" or "find calls to `dangerous_primitive` and describe how they're used."
- concurrency with asyncio: manual calls to an llm for each file could be painfully slow, especially if you have hundreds of files. this script uses `asyncio` plus a semaphore to control concurrency, letting you send multiple requests simultaneously without overwhelming the api. if you have an api rate limit or just don't want to hammer the service, you can adjust `--limit` to control how many requests happen at once.
- batching approach: the `chunk_list` function groups file paths into smaller lists. each chunk is appended to the user prompt, and then the script calls gemini. the result is that each "task" involves a small batch of files, which helps keep the context size manageable (rather than a single monstrous prompt). once gemini sends back the text, you can grep it for the patterns you care about.
- structured output: by printing or writing results to a file, you can keep track of which batch of files triggered which response. if gemini's output says, "found suspicious usage at line 45," you can pivot directly to the relevant file. this structure is the key to bridging automated scanning with manual verification: let the llm flag suspicious bits, then decide for yourself whether it's a real vulnerability. a short post-processing sketch follows this list.
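as a rough illustration of that last point, here's a sketch that splits a saved `results.txt` on the `=== Response #N ===` markers the script writes and surfaces anything that looks worth a human look. the `NO FINDINGS` convention is an assumption about how you worded your prompt, not something gemini does on its own.

```python
import re
import sys

# Assumption: your prompt told the model to answer "NO FINDINGS" when a batch
# is clean, so any response missing that marker deserves a manual look.
CLEAN_MARKER = "NO FINDINGS"


def flag_responses(results_path="results.txt"):
    with open(results_path, "r", encoding="utf-8") as f:
        text = f.read()
    # The main script writes each batch under a "=== Response #N ===" header;
    # re.split with a capture group keeps the response numbers in the output.
    blocks = re.split(r"=== Response #(\d+) ===", text)
    for number, body in zip(blocks[1::2], blocks[2::2]):
        if CLEAN_MARKER not in body:
            print(f"response #{number} may need review:")
            print(body.strip()[:500])  # preview only
            print("-" * 60)


if __name__ == "__main__":
    flag_responses(sys.argv[1] if len(sys.argv) > 1 else "results.txt")
```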
why this approach matters
- scale: you can throw large codebases at the llm without getting bogged down in manual line-by-line reviews.
- speed: concurrency via `asyncio` dramatically cuts the total wall-clock time compared to sending each request one at a time.
- flexibility: unlike standard sast tools that rely on formal patterns, you can craft your prompt in plain language or direct the model to look for higher-level logic issues.
- human-like context: llms are good at spotting context-dependent issues that older scanning tools might miss. they effectively mimic a human reading code, which can be invaluable when you’re searching for subtle usage patterns.
code walkthrough
here’s a high-level summary of the main bits in the script:
- argument parsing: we parse command-line arguments (prompt, paths file, api key, concurrency limit, model, output file). these let you tweak how the script runs without modifying the source.
- reading file paths: each line in `--paths_file` is treated as a code file location. after a quick strip of whitespace, you've got a list of code files to feed into gemini.
- chunking: the `chunk_list` function yields subsets of file paths, letting you decide how many files to examine in each batch. the default chunk size here is three, but you could adjust it to four or more.
- parallel requests: `generate_with_code_files` reads each file in the chunk, appends its content to the user prompt, then calls gemini. a semaphore ensures you only run a fixed number of these calls at once (a retry sketch for transient api errors follows this list). once gemini returns a response, the script either prints it or stores it in an output file.
- results: after all tasks complete, you'll have a compiled set of responses, one per chunk. from there, it's up to you to parse or just visually inspect them. you might grep for "found vulnerability," "calling X function," or any keyword that indicates a potential problem.
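one thing the script deliberately doesn't do is retry on transient api errors, which you may start hitting as you raise `--limit`. here's a minimal sketch of how the call inside `generate_with_code_files` could be wrapped with backoff. it treats any exception as retryable, which is an oversimplification; in practice you'd restrict it to the rate-limit and transient errors raised by `google.generativeai`.

```python
import asyncio
import random


async def generate_with_retry(model, combined_prompt, max_attempts=3):
    """Retry the synchronous generate_content call with exponential backoff.

    Sketch only: it retries on any exception; a real version should catch only
    transient or rate-limit errors from the Gemini client.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            response = await asyncio.to_thread(model.generate_content, combined_prompt)
            return response.text
        except Exception as exc:  # assumption: everything is retryable
            if attempt == max_attempts:
                raise
            # Exponential backoff plus jitter so parallel tasks don't retry in lockstep.
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
```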
what’s next?
in part 4, i’ll show a real-world example using this script. we’ll take an open-source repo like gitlab (or a subset of it) and show how quickly we can zero in on suspicious patterns using an llm, all without manually reading reams of code. this is where theory meets practice—demonstrating that yes, you really can offload bulk code scanning to an llm and speed up your bug-hunting process.
stay tuned.