File Processor Module

Convert PDFs, images, documents, and media files to AI-ready formats.

Overview

The File Processor is NeuroGen's core document conversion tool. It handles:

  • PDFs - Extract text, tables, and structure
  • Images - OCR text extraction
  • Documents - Word, Excel, PowerPoint conversion
  • Media - Audio/video transcription

Getting Started

  1. Navigate to Dashboard > File Processor
  2. Upload files by drag-and-drop or file browser
  3. Configure processing options
  4. Click Process
  5. View results and download outputs

Supported File Types

Category        Formats
Documents       PDF, DOCX, DOC, ODT, RTF, TXT
Spreadsheets    XLSX, XLS, CSV
Presentations   PPTX, PPT, ODP
Images          PNG, JPG, JPEG, TIFF, BMP, GIF
Audio           MP3, WAV, M4A, FLAC, OGG
Video           MP4, MOV, AVI, MKV, WEBM
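
If you batch-prepare files before uploading, it can help to check each one against the Supported File Types list first. The sketch below is only an illustration: the `categorize` helper and the `SUPPORTED` map are hypothetical names, not part of NeuroGen, and the extensions are copied from the list above.

```python
from pathlib import Path
from typing import Optional

# Extension-to-category map copied from the Supported File Types list.
# SUPPORTED and categorize() are hypothetical names, not a NeuroGen API.
SUPPORTED = {
    "Documents": {"pdf", "docx", "doc", "odt", "rtf", "txt"},
    "Spreadsheets": {"xlsx", "xls", "csv"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif"},
    "Audio": {"mp3", "wav", "m4a", "flac", "ogg"},
    "Video": {"mp4", "mov", "avi", "mkv", "webm"},
}


def categorize(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the extension is not listed."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None
```

The check is case-insensitive and looks only at the extension, so a mislabeled file would still pass it.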

Processing Options

PDF Processing

  • Extract text - Full text extraction with formatting
  • Extract tables - Convert tables to structured data
  • OCR mode - For scanned/image PDFs
  • Page range - Process specific pages only

Image OCR

  • Language - Select OCR language(s)
  • Enhancement - Auto-enhance before OCR
  • Output format - Plain text or structured

Audio/Video

  • Transcription - Speech-to-text conversion
  • Language - Specify audio language
  • Speaker diarization - Identify speakers (premium)

Output Formats

Format        Use Case
JSON          Structured data with metadata
JSONL         Training data for LLMs
Markdown      Readable documents
CSV           Spreadsheet data
Plain Text    Simple text extraction
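
JSONL differs from JSON in that each line is a complete JSON object of its own, which is why it suits training pipelines that stream records one at a time. The sketch below shows the format in general terms. The `page` and `text` field names are assumptions made for illustration; the actual fields in NeuroGen's JSONL output are not documented here.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dicts as JSONL: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical records; the field names are illustrative only.
records = [
    {"page": 1, "text": "First page text."},
    {"page": 2, "text": "Second page text."},
]
jsonl = to_jsonl(records)
```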

Session Output

Each processing job creates a session containing:

  • output.json - Full structured output
  • text_content.txt - Extracted text
  • metadata.json - Processing metadata
  • Additional format-specific files
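
Once a session has been downloaded, its three standard files can be read with a few lines of Python. This is a minimal sketch that assumes the files sit together in one local folder. `load_session` is a hypothetical helper, and the code makes no assumption about the keys inside `output.json` or `metadata.json`.

```python
import json
from pathlib import Path


def load_session(session_dir):
    """Read the three standard session files from a local folder.

    Hypothetical helper: assumes output.json, text_content.txt, and
    metadata.json were downloaded into the same directory.
    """
    root = Path(session_dir)
    return {
        "output": json.loads((root / "output.json").read_text(encoding="utf-8")),
        "text": (root / "text_content.txt").read_text(encoding="utf-8"),
        "metadata": json.loads((root / "metadata.json").read_text(encoding="utf-8")),
    }
```

A missing file raises `FileNotFoundError`, which is a quick way to spot an incomplete download.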

Tips for Best Results

  1. Clean PDFs work best - Native PDFs process faster than scanned ones
  2. Use OCR for scans - Enable OCR mode for image-based PDFs
  3. Check language settings - Set the correct language for OCR
  4. Preview before sharing - Verify output quality

Example: Convert PDF for Custom GPT

  1. Upload your PDF document
  2. Select JSONL output format
  3. Process the file
  4. Go to My Files > Sessions
  5. Click Share on the output file
  6. Select Full Access permission
  7. Use the share link in your Custom GPT

Troubleshooting

"Processing failed"

  • Check that the file isn't corrupted
  • Verify that the file size is within the limit (50 MB)
  • Try a different output format
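
The first two checks above can be run locally before uploading. The sketch below is a hypothetical pre-flight helper, not a NeuroGen feature. It assumes the 50 MB limit means 50 × 1024 × 1024 bytes, which the source does not specify, and it can only catch empty or missing files, not every kind of corruption.

```python
from pathlib import Path

# Assumption: the documented 50 MB limit is read as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def preflight(path, max_bytes=MAX_BYTES):
    """Return a list of problems found before upload; an empty list means none found."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")
    return problems
```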

"No text extracted"

  • Enable OCR mode for scanned documents
  • Check the document contains readable text
  • Try with image enhancement enabled

"Slow processing"

  • Large files take longer
  • OCR is slower than native text extraction
  • Audio/video transcription can take several minutes

Files automatically appear in My Files > Sessions after processing.
