📄 Unstructured Data Parsing: Process Invoices with AI & PDF Plugins

One of the biggest headaches in enterprise workflows is manually extracting data from variously formatted PDFs, invoices, or research reports.

By combining MindLogic's built-in Native PDF Extractor plugin with its LLM Structured Extraction capabilities, we can turn unstructured documents into structured JSON that downstream systems can read.

Scenario Overview

In this workflow, we implement:

  1. Input File: Pass in a local or remote PDF file path containing a report/invoice.
  2. Text Parsing: Use the PDF plugin to extract all text as a long string.
  3. Field Extraction: An LLM reads the long string and is forced to extract the amount, date, and client name into a specific JSON format.
  4. Data Validation: A custom node script parses the JSON to determine if the amount exceeds a threshold.

Node Orchestration Steps

Step 1: Configure the PDF Extractor Plugin

Create a node named [Invoice Reader]. Select the Native PDF Extractor in the plugin panel. Configure the input URL/Path; it can be a local path (e.g., file:///Users/.../invoice.pdf) or a cloud link. Upon execution, the plain text of the PDF will be stored in node.outputs['pdf_text'].

Step 2: LLM Structured Extraction

Create a node named [Data Extraction], connected from upstream.

  1. Select the LLM plugin.
  2. System Prompt (Crucial for forcing format):
    You are an invoice data extraction bot. Please extract information from the text below and strictly output valid JSON format. Do not output any explanatory text other than the JSON.
    Format requirements:
    {
       "company_name": "Company Name",
       "date": "YYYY-MM-DD",
       "total_amount": Number
    }
    
  3. User Prompt: {{ node.inputs['pdf_text'] }}

After execution, the LLM should output a clean JSON string, which the next node reads as node.inputs['response'].
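Even with a strict system prompt, a model may occasionally wrap its answer in a Markdown code fence or add a stray sentence. The following is a minimal sketch of a defensive parser in plain JavaScript. Both extractJsonObject and isValidInvoice are hypothetical helpers written for illustration, not MindLogic APIs; the field names come from the format in the system prompt above.

```javascript
// Hypothetical helper (not a MindLogic API): pull a JSON object out of an
// LLM reply that may have code fences or stray prose around it.
function extractJsonObject(reply) {
    if (typeof reply !== 'string') {
        throw new TypeError('LLM reply must be a string');
    }
    // Keep everything from the first "{" to the last "}".
    const start = reply.indexOf('{');
    const end = reply.lastIndexOf('}');
    if (start === -1 || end <= start) {
        throw new SyntaxError('No JSON object found in LLM reply');
    }
    return JSON.parse(reply.slice(start, end + 1));
}

// Hypothetical check that the parsed object matches the requested format:
// a string company_name, a YYYY-MM-DD date, and a finite numeric total_amount.
function isValidInvoice(data) {
    return data !== null &&
        typeof data === 'object' &&
        typeof data.company_name === 'string' &&
        typeof data.date === 'string' &&
        /^\d{4}-\d{2}-\d{2}$/.test(data.date) &&
        typeof data.total_amount === 'number' &&
        Number.isFinite(data.total_amount);
}
```

This approach assumes the reply contains a single JSON object. If the surrounding prose itself contains braces, the slice will not be valid JSON and JSON.parse will throw, which the downstream script can treat as a parse failure.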

Step 3: Business Logic Audit via Script

Create a node named [Business Logic Audit]. This step does not use a plugin. Instead, handle the JSON passed from upstream directly in the Node Script at the bottom.

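// The JSON string produced by the upstream LLM node
let jsonStr = node.inputs['response'];
try {
    let invoiceData = JSON.parse(jsonStr);
    let amount = invoiceData.total_amount;

    // A missing or non-numeric amount must not fall through to auto-approval
    if (typeof amount !== 'number' || !Number.isFinite(amount)) {
        throw new Error("total_amount is missing or not a number");
    }

    // If the amount exceeds 10000, change the node title and color as an alert
    if (amount > 10000) {
        node.title = "⚠️ Director Approval Needed";
        node.color = "red";
    } else {
        node.title = "✅ Auto-Approved";
        node.color = "green";
    }
} catch (e) {
    // Surface invalid JSON or a bad amount on the node itself
    node.title = "❌ Parse Failed";
    node.outputs['__scriptError'] = e.toString();
}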
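To check the approval rule outside the canvas, the decision can be pulled into a pure function and run against a stand-in node object. This is a sketch under assumptions: auditInvoice and mockNode are hypothetical names, the mock only imitates the inputs, outputs, title, and color fields used by the script above, and the sketch treats a missing or non-numeric total_amount as a failure.

```javascript
// Hypothetical pure version of the audit rule: takes the LLM's JSON string
// and returns the title and color the node should display.
function auditInvoice(jsonStr, threshold = 10000) {
    try {
        const invoiceData = JSON.parse(jsonStr);
        const amount = invoiceData.total_amount;
        // Reject a missing or non-numeric amount instead of auto-approving it.
        if (typeof amount !== 'number' || !Number.isFinite(amount)) {
            throw new Error('total_amount is missing or not a number');
        }
        return amount > threshold
            ? { title: '⚠️ Director Approval Needed', color: 'red' }
            : { title: '✅ Auto-Approved', color: 'green' };
    } catch (e) {
        // Invalid JSON and bad amounts both end up here.
        return { title: '❌ Parse Failed', error: e.toString() };
    }
}

// Hypothetical stand-in for the node object (an assumption, not the real runtime).
const mockNode = {
    inputs: { response: '{"total_amount": 12000}' },
    outputs: {},
    title: '',
    color: ''
};

const result = auditInvoice(mockNode.inputs['response']);
mockNode.title = result.title;
if (result.color) mockNode.color = result.color;
if (result.error) mockNode.outputs['__scriptError'] = result.error;
console.log(mockNode.title, mockNode.color);
```

Keeping the rule in a function with an explicit threshold parameter makes boundary cases easy to exercise: an amount of exactly 10000 does not exceed the threshold, so it is auto-approved.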

Results & Value

This is a typical "Hybrid Node Automation" pattern: the LLM handles the fuzzy work of reading the document, and traditional code enforces the business rule with full rigor. With MindLogic, a service that used to require programmers to write hundreds of lines of Python and deploy a server becomes a visual mini-program built just by connecting nodes on a canvas.