File Processor Module

Convert PDFs, images, documents, and media files to AI-ready formats.

Overview

The File Processor is NeuroGen's core document conversion tool. It handles:

PDFs - Extract text, tables, and structure
Images - OCR text extraction
Documents - Word, Excel, PowerPoint conversion
Media - Audio/video transcription

Getting Started

Navigate to Dashboard > File Processor
Upload files by drag-and-drop or file browser
Configure processing options
Click Process
View results and download outputs

Supported File Types

Category	Formats
Documents	PDF, DOCX, DOC, ODT, RTF, TXT
Spreadsheets	XLSX, XLS, CSV
Presentations	PPTX, PPT, ODP
Images	PNG, JPG, JPEG, TIFF, BMP, GIF
Audio	MP3, WAV, M4A, FLAC, OGG
Video	MP4, MOV, AVI, MKV, WEBM
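
To check files before uploading a batch, the table above can be expressed as a lookup. This is a minimal sketch: the extensions are copied from the table, but `SUPPORTED_FORMATS` and `categorize` are hypothetical names used for illustration and are not part of NeuroGen.

```python
from pathlib import Path
from typing import Optional

# Extension-to-category map transcribed from the Supported File Types table.
SUPPORTED_FORMATS = {
    "Documents": {"pdf", "docx", "doc", "odt", "rtf", "txt"},
    "Spreadsheets": {"xlsx", "xls", "csv"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif"},
    "Audio": {"mp3", "wav", "m4a", "flac", "ogg"},
    "Video": {"mp4", "mov", "avi", "mkv", "webm"},
}


def categorize(filename: str) -> Optional[str]:
    """Return the table category for a filename, or None if unsupported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

The check is based on the file extension only, so a renamed or mislabeled file can still fail during processing.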

Processing Options

PDF Processing

Extract text - Full text extraction with formatting
Extract tables - Convert tables to structured data
OCR mode - For scanned/image PDFs
Page range - Process specific pages only
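
The page-range syntax the File Processor accepts is not stated here. As an illustration only, the sketch below assumes a comma-separated form such as `1-3, 5` with 1-based page numbers; `parse_page_range` is a hypothetical helper, not a NeuroGen function.

```python
def parse_page_range(spec: str, page_count: int) -> list:
    """Expand an assumed '1-3, 5' style spec into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Validating the range against the page count up front avoids submitting a job that selects no pages.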

Image OCR

Language - Select OCR language(s)
Enhancement - Auto-enhance before OCR
Output format - Plain text or structured

Audio/Video

Transcription - Speech-to-text conversion
Language - Specify audio language
Speaker diarization - Identify speakers (premium)

Output Formats

Format	Use Case
JSON	Structured data with metadata
JSONL	Training data for LLMs
Markdown	Readable documents
CSV	Spreadsheet data
Plain Text	Simple text extraction
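
JSONL stores one JSON object per line, which is why it suits training data: each line is an independent record. The sketch below shows the format itself using only the Python standard library. The exact fields the File Processor writes are not documented here, so the `text` and `page` keys in the usage note are placeholders.

```python
import json


def to_jsonl(records) -> str:
    """Serialize an iterable of dicts to JSONL: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text: str) -> list:
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

For example, `to_jsonl([{"text": "Intro", "page": 1}, {"text": "Summary", "page": 2}])` produces two lines, and `from_jsonl` restores the original list.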

Session Output

Each processing job creates a session containing:

output.json - Full structured output
text_content.txt - Extracted text
metadata.json - Processing metadata
Additional format-specific files
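
After downloading a session's files, they can be loaded together for inspection. This is a sketch that assumes the three files listed above sit in one local folder. The internal structure of `output.json` and `metadata.json` is not specified here, so the code parses them generically. `load_session` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_session(session_dir) -> dict:
    """Load the standard session files from a local folder.

    Missing files map to None so partial downloads can still be inspected.
    """
    root = Path(session_dir)
    session = {}
    for name in ("output.json", "metadata.json"):
        path = root / name
        session[name] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
        )
    text_path = root / "text_content.txt"
    session["text_content.txt"] = (
        text_path.read_text(encoding="utf-8") if text_path.exists() else None
    )
    return session
```

A `None` value for `text_content.txt` is a quick signal that nothing was extracted or that the file was not downloaded.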

Tips for Best Results

Clean PDFs work best - Native PDFs process faster than scanned ones
Use OCR for scans - Enable OCR mode for image-based PDFs
Check language settings - Set correct language for OCR
Preview before sharing - Verify output quality

Example: Convert PDF for Custom GPT

Upload your PDF document
Select JSONL output format
Process the file
Go to My Files > Sessions
Click Share on the output file
Select Full Access permission
Use the share link in your Custom GPT

Troubleshooting

"Processing failed"

Check that the file isn't corrupted
Verify that the file size is within the limit (50 MB)
Try a different output format
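
The first two checks can be run locally before uploading. The sketch below is a rough pre-flight check, not a NeuroGen feature. It assumes the 50 MB limit means 50 × 1024 × 1024 bytes, which is not confirmed here, and it only catches missing or empty files, not every form of corruption. `preflight` is a hypothetical name.

```python
import os

# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def preflight(path: str, max_bytes: int = MAX_BYTES) -> list:
    """Return a list of problems found; an empty list means the file looks OK."""
    if not os.path.isfile(path):
        return ["file not found"]
    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    if size > max_bytes:
        problems.append("file exceeds size limit")
    return problems
```

An empty result does not guarantee successful processing. It only rules out the most common upload problems.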

"No text extracted"

Enable OCR mode for scanned documents
Check the document contains readable text
Try with image enhancement enabled

"Slow processing"

Large files take longer
OCR is slower than native text extraction
Audio/video transcription can take several minutes

Processed files automatically appear in My Files > Sessions.