APSIS Construction Drawings Analysis API

AI-powered tools for processing, analyzing, and comparing construction drawings and specification documents.

Introduction

The APSIS Construction Drawings Analysis API provides AI-powered tools for processing, analyzing, and comparing construction drawings and specification documents. The system combines computer vision, document processing, and large language models to extract metadata, track revisions, compare versions, and enable semantic search across your construction documentation.

Key Capabilities:

Metadata Extraction: Automatically extract drawing numbers, revisions, dates, authors, scales, and titles from PDF drawings and specifications
Revision Detection: Identify previous versions and predict next revision numbers using metadata matching or AI-powered similarity
Visual Comparison: Highlight changes between drawing revisions (in development; new iteration estimated for end of Nov 2025)
AI-Powered Analysis: Get natural language summaries and visual diffs of changes between specification versions
Semantic Search: Query your documents using natural language and retrieve relevant sections

Target Audience: This documentation is intended for APSIS developers integrating the API into their applications, internal tools, or workflows.

Available Features

Feature	Status	Description
Extract Data	✅ Available	Extract metadata from drawings and specifications
Detect Revision	✅ Available	Track document versions automatically
Compare Specifications	✅ Available	Visual diff + AI analysis of changes
Document Search	✅ Available	Semantic search using natural language
Drawing Compare	🚧 In Development	Visual comparison (Est. end of Nov 2025)

Production Environment

Base URL:

https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io

API Documentation (Swagger):

https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/swagger/

Admin Dashboard:

https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/admin/

Purpose: View task history, manage users, debug failed extractions
Access: Contact Agency AI for admin credentials
Audience: APSIS internal developers

Quick Start (5 Minutes)

New to the API? Follow this minimal example to extract metadata from your first drawing:

Authenticate:

az login --tenant 1b2933b9-9fb0-4b10-a4ac-ac16b30953b8 --allow-no-subscriptions

Get Token:

TOKEN=$(az account get-access-token \
  --resource api://377cc639-f3a7-436a-9a21-14b643f34dda \
  --query accessToken \
  --output tsv)

Upload & Extract: See Feature 1: Extract Data for the complete workflow

For production applications, see Authentication for service principal setup.

Authentication

All API endpoints require authentication using Azure Active Directory (Azure AD) JWT tokens. Currently, the API validates tokens from the Agency AI UK tenant.

Authentication Flow

Obtain an Azure AD Access Token from your application’s authentication flow
Include the token in every API request using the Authorization header
Token format: Bearer <access_token>

Azure AD Configuration

Tenant: 1b2933b9-9fb0-4b10-a4ac-ac16b30953b8 (agencyaiuk.com)

API Resource ID: api://377cc639-f3a7-436a-9a21-14b643f34dda

Method 1: Azure CLI (Testing & Development)

For testing and development, use the Azure CLI to obtain a token:

# Step 1: Login to Azure with the Agency AI tenant
az login --tenant 1b2933b9-9fb0-4b10-a4ac-ac16b30953b8 --allow-no-subscriptions

# Expected output:
# "No subscriptions found for <your-email>."
# This is normal - you don't need subscriptions to get API tokens

# Step 2: Get access token for the APSIS API
TOKEN=$(az account get-access-token \
  --resource api://377cc639-f3a7-436a-9a21-14b643f34dda \
  --query accessToken \
  --output tsv)

# Step 3: Use the token in API requests
curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/tasks/270" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json"

Important Notes:

No Azure subscription required: You may see “No subscriptions found” - this is expected and normal
Use the --allow-no-subscriptions flag: This lets the CLI sign in to a tenant in which your account has no subscriptions
Your user account must exist in the agencyaiuk.com tenant
Your account must have been granted API access permissions

Method 2: Service Principal (Production Applications)

For production applications, use a service principal with client credentials flow. This enables server-to-server authentication without user interaction.

API Credentials

Provided Credentials:

Contact Agency AI UK administrators to receive your:

Client ID: Your application’s unique client identifier
Client Secret: Your application’s secret key
Tenant ID: 1b2933b9-9fb0-4b10-a4ac-ac16b30953b8

Usage

Complete workflow with automatic token extraction:

# Set credentials (replace with your actual values)
CLIENT_ID="your-client-id"
CLIENT_SECRET="your-client-secret"

# Step 1: Request access token and extract it automatically
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/1b2933b9-9fb0-4b10-a4ac-ac16b30953b8/oauth2/v2.0/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "grant_type=client_credentials" \
  --data-urlencode "client_id=${CLIENT_ID}" \
  --data-urlencode "client_secret=${CLIENT_SECRET}" \
  --data-urlencode "scope=api://377cc639-f3a7-436a-9a21-14b643f34dda/.default" \
  | jq -r '.access_token')

# Verify token was retrieved
echo "Token: ${TOKEN:0:50}..."  # Show first 50 characters

# Step 2: Use the token to get a drawing by ID
curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/files/674" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json"

Important Notes:

Use /.default scope to request all granted application permissions
Tokens are valid for about 1 hour (expires_in: 3599 seconds in the token response)
Implement token caching to avoid requesting new tokens for every API call
The token payload will include the claim "roles": ["api.apsis"]

Example: Using the Token in API Requests

# All API requests must include the Authorization header
curl -X POST https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/extract \
  -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGc..." \
  -H "Content-Type: application/json" \
  -d '{"drawing_file_ids": [123, 124]}'

Token Expiration

Default expiration: 1 hour (the exact lifetime is reported in the token response's expires_in field, typically 3599 seconds)
Handling expiration: Implement token refresh logic in your application
HTTP 401 response: Indicates expired or invalid token - refresh and retry
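
One way to follow the caching and refresh advice is to store the token next to its expiry time and reuse it until shortly before it expires. The sketch below is a minimal illustration, not part of the API: the cache file location (APSIS_TOKEN_CACHE), the 60-second safety margin, and the function names are assumptions. fetch_token simply wraps the Azure CLI command from Method 1.

```shell
#!/bin/sh
# Hypothetical cache location and safety margin (assumptions, not API settings).
CACHE_FILE="${APSIS_TOKEN_CACHE:-${TMPDIR:-/tmp}/apsis_token_cache}"
TOKEN_LIFETIME=3600   # seconds; tokens are valid for about 1 hour
SAFETY_MARGIN=60      # refresh this many seconds before expiry

# Fetch a fresh token with the Azure CLI (same command as in Method 1).
fetch_token() {
  az account get-access-token \
    --resource api://377cc639-f3a7-436a-9a21-14b643f34dda \
    --query accessToken \
    --output tsv
}

# Succeeds if the cache file holds a token that is not about to expire.
# Cache format: line 1 = expiry as Unix time, line 2 = token.
cache_is_fresh() {
  [ -r "$CACHE_FILE" ] || return 1
  expires_at=$(sed -n '1p' "$CACHE_FILE")
  now=$(date +%s)
  [ -n "$expires_at" ] && [ $((expires_at - SAFETY_MARGIN)) -gt "$now" ]
}

# Print a valid token, requesting a new one only when the cache is stale.
get_token() {
  if ! cache_is_fresh; then
    new_token=$(fetch_token) || return 1
    # Write the cache with owner-only permissions.
    ( umask 077
      printf '%s\n%s\n' "$(( $(date +%s) + TOKEN_LIFETIME ))" "$new_token" > "$CACHE_FILE" )
  fi
  sed -n '2p' "$CACHE_FILE"
}
```

With these helpers, a request can use -H "Authorization: Bearer $(get_token)" and a new token is only requested when the cached one is close to expiry. On an HTTP 401, deleting the cache file forces a refresh on the next call.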

Authentication Errors

401 Unauthorized

{
  "detail": "Authentication credentials were not provided."
}

Solution: Ensure the Authorization header is present and correctly formatted.

403 Forbidden

{
  "detail": "You do not have permission to perform this action."
}

Solution: Confirm that your user account or service principal has been granted API access permissions. If it has not, contact the Agency AI UK administrators.

Common Patterns

Status Codes

The API uses standard HTTP status codes:

200 OK: Request succeeded
201 Created: Resource created successfully
400 Bad Request: Invalid request parameters
401 Unauthorized: Missing or invalid authentication token
403 Forbidden: Insufficient permissions
404 Not Found: Resource does not exist
500 Internal Server Error: Server-side error

Polling

Many API operations are asynchronous and require polling to check completion status.

Best Practices:

Poll every 3-5 seconds for task status
Set a reasonable timeout (e.g., 5 minutes for extraction tasks)
Check both task-level and file-level status codes
Individual files in a batch may complete at different times

Example:

# Initial request returns task_id
TASK_ID=456
TIMEOUT=300   # stop polling after 5 minutes
ELAPSED=0

# Poll until status is 2 (SUCCESS) or 3 (FAILED), or until the timeout is reached
while [ "$ELAPSED" -lt "$TIMEOUT" ]; do
  STATUS=$(curl -s "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/tasks/$TASK_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')

  # Use "=" rather than "==" so the test also works in plain sh
  if [ "$STATUS" = "2" ] || [ "$STATUS" = "3" ]; then
    echo "Task completed with status: $STATUS"
    break
  fi

  echo "Task still processing (status: $STATUS)..."
  sleep 5
  ELAPSED=$((ELAPSED + 5))
done

Error Handling

Retry Strategy:

Implement exponential backoff for 5xx server errors
Don’t retry 4xx client errors (fix the request instead)
Maximum 3 retry attempts recommended

Common Errors:

401 Unauthorized: Token expired - refresh and retry
404 Not Found: Resource doesn’t exist - check IDs
500 Internal Server Error: Temporary issue - retry with backoff
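
The retry strategy above can be sketched in shell as follows. This is a minimal illustration under stated assumptions: the helper names (should_retry, backoff_delay, get_with_retry) and the 1-second base delay are hypothetical, and only 5xx responses are retried, with at most 3 retries after the first attempt.

```shell
#!/bin/sh
MAX_RETRIES=3   # retries after the first attempt, per the recommendation above

# Succeeds only for 5xx status codes; 4xx errors should be fixed, not retried.
should_retry() {
  case "$1" in
    5[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Delay in seconds before retry number $1 (1-based): 1, 2, 4, ...
backoff_delay() {
  delay=1
  i=1
  while [ "$i" -lt "$1" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  echo "$delay"
}

# GET the URL in $1, retrying 5xx responses with exponential backoff.
# Prints the final HTTP status code; the response body is written to file $2.
get_with_retry() {
  url=$1
  out=$2
  attempt=0
  while :; do
    code=$(curl -s -o "$out" -w '%{http_code}' \
      -H "Authorization: Bearer ${TOKEN}" "$url")
    if ! should_retry "$code" || [ "$attempt" -ge "$MAX_RETRIES" ]; then
      echo "$code"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$(backoff_delay "$attempt")"
  done
}
```

A 401 is deliberately not retried here: the caller should refresh the token first and then repeat the request once.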

File Upload Architecture

Required Upload Method

⚠️ Important: The API requires Shared Access Signature (SAS) URLs for all PDF uploads. Direct file upload through the API (multipart/form-data) is not supported.

Why SAS URLs Are Required

The system architecture requires files to be in Azure Blob Storage before processing:

Architecture Reason:

The API does not store files - only references to blobs in Azure Storage
Celery background workers read files directly from Azure Storage for processing
This enables horizontal scaling across multiple worker containers (4-15 replicas)

Performance Benefits:

Direct upload path: Your application → Azure Storage (bypasses API entirely)
Large file handling: Construction PDFs (10-100MB) upload faster without API proxy
Parallel uploads: Multiple files upload simultaneously to Azure
Reduced latency: No intermediary server for file transfers

Cost Efficiency:

API containers don’t handle large file bandwidth (reduces costs)
Worker containers access files directly from storage (no download through API)

Security:

Time-limited access (SAS URLs expire, typically 1 hour after they are issued)
No storage account credentials exposed
Per-file write permissions only

Backend/Script Usage

Even for backend-to-backend or script-based integrations, you must use the SAS URL workflow. This is not UI-specific complexity - it’s how the system is architected.

Upload Workflow

POST /api/drawings/upload           → API returns SAS URLs (metadata only)
PUT {sas_url} with PDF binary       → Upload directly to Azure Blob Storage
POST /api/drawings/files            → Register blob_path with API
POST /api/drawings/extract          → Background workers process from Azure Storage

Learn more: Azure Storage SAS Overview

Scale + Load

The production environment uses autoscaling on both API and worker containers to handle variable workloads efficiently.

Architecture

API Container: apsis-container-api
- Autoscaling: 1-10 replicas
- Trigger: HTTP concurrent requests (≥10 concurrent requests)
Worker Containers: apsis-container-docanalyser
- Autoscaling: 4-15 replicas
- Trigger: Redis queue depth (≥3 jobs in cpu_q queue)
- Processes background tasks via Celery
Message Broker: Redis - Job queue for distributing work to workers

Autoscaling Configuration

Worker Autoscaling (apsis-container-docanalyser):

Min replicas: 4
Max replicas: 15
Scaling trigger: Redis queue length on cpu_q queue
Threshold: Scales up when queue length ≥ 3 jobs
Each batch extraction request creates one background job
Multiple smaller jobs enable better horizontal scaling across worker replicas
Large single jobs can saturate workers and slow down the entire system

Best Practices

⚠️ DO NOT send more than 10 files per batch

Why? One batch = one job. Large batches prevent effective autoscaling and slow down the entire system.

Recommended:

✅ Send 5 batches of 10 files (50 files total)
❌ Don’t send 1 batch of 50 files

How it works:

5 separate jobs → Workers scale and process in parallel
1 large job → Single worker processes serially, no scaling

See detailed batch size recommendations in Feature 1: Extract Data.
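
To illustrate the batching advice, the sketch below splits a list of file IDs into request bodies of at most 10 IDs each. The function name make_batches is hypothetical; the body format matches the drawing_file_ids field used by the extract endpoint.

```shell
#!/bin/sh
BATCH_SIZE=10   # recommended maximum number of files per extraction request

# Print one JSON request body per batch of at most BATCH_SIZE file IDs.
# Usage: make_batches 101 102 103 ...
make_batches() {
  batch=""
  count=0
  for id in "$@"; do
    batch="${batch:+$batch, }$id"
    count=$((count + 1))
    if [ "$count" -eq "$BATCH_SIZE" ]; then
      printf '{"drawing_file_ids": [%s]}\n' "$batch"
      batch=""
      count=0
    fi
  done
  # Emit the final, partially filled batch (if any).
  if [ -n "$batch" ]; then
    printf '{"drawing_file_ids": [%s]}\n' "$batch"
  fi
}

# Example: submit each batch as its own extraction job.
# make_batches 101 102 103 | while IFS= read -r body; do
#   curl -X POST "$BASE_URL/api/drawings/extract" \
#     -H "Authorization: Bearer ${TOKEN}" \
#     -H "Content-Type: application/json" \
#     -d "$body"
# done
```

Each printed line becomes one background job, so 50 file IDs yield 5 jobs that the workers can process in parallel. BASE_URL in the commented example stands for the production base URL.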

Feature 1: Extract Data

Overview

Automatically extracts metadata from construction drawings and specification documents using computer vision and AI. The system identifies drawing regions using object detection, then uses an LLM to extract structured metadata.

Extracted Fields

For Drawings:

Author
Drawing Number
Revision
Status (e.g., “For Construction”, “Approved”)
Date
Scale (e.g., “1:100”)
Drawing Title

For Specifications:

Author
Document Title
Revision
Date

Workflow

The extraction process involves 5 steps:

Step 1: Get Upload URL

Request a pre-signed SAS URL to upload your PDF to Azure Blob Storage.

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/upload" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "file_names": ["drawing1.pdf", "drawing2.pdf"]
  }'

Important - Filenames vs. File Paths:

The file_names array contains display names only - these are NOT file paths from your computer. The API uses these names to track your files (stored as original_name in the database) and generates unique storage paths automatically.

Example:

Step 1 (this request): Send just the filename → "file_names": ["FloorPlan.pdf"]
Step 2 (actual upload): Use your local file path → --data-binary "@/Users/yourname/Documents/FloorPlan.pdf"

The API creates its own storage paths (e.g., user/2024/11/abc123.pdf) and returns SAS URLs pointing to those locations.

Response:

{
  "blobs": [
    {
      "url": "https://apsisdocumentblobstorage.blob.core.windows.net/drawings/user/2024/11/abc123.pdf?sv=2021-08-06&se=2024-11-22T11%3A00%3A00Z&sr=b&sp=cw&sig=...",
      "name": "abc123.pdf",
      "blob_path": "user/2024/11/abc123.pdf",
      "original_name": "drawing1.pdf"
    },
    {
      "url": "https://apsisdocumentblobstorage.blob.core.windows.net/drawings/user/2024/11/def456.pdf?sv=2021-08-06&se=2024-11-22T11%3A00%3A00Z&sr=b&sp=cw&sig=...",
      "name": "def456.pdf",
      "blob_path": "user/2024/11/def456.pdf",
      "original_name": "drawing2.pdf"
    }
  ]
}

URL Format Explained:

Base: https://apsisdocumentblobstorage.blob.core.windows.net/drawings/
Path: user/2024/11/abc123.pdf (unique file path)
SAS parameters (appended automatically):
- sv: Storage version
- se: Expiration time (typically 1 hour from request)
- sr: Resource type (blob)
- sp: Permissions (create + write)
- sig: Signature (authentication token)
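
Because the expiry time is carried in the se parameter, a client can check whether a SAS URL is still usable before uploading. The following sketch is illustrative only: the helper names are hypothetical, and the decoding step handles just the %3A (colon) escape that appears in the example URLs.

```shell
#!/bin/sh
# Print the raw (still percent-encoded) value of query parameter $2 in URL $1.
sas_param() {
  printf '%s\n' "$1" | cut -s -d '?' -f 2- | tr '&' '\n' | sed -n "s/^$2=//p"
}

# Print the expiry timestamp from the "se" parameter, e.g. 2024-11-22T11:00:00Z.
sas_expiry() {
  sas_param "$1" se | sed 's/%3[Aa]/:/g'
}

# Succeeds if the SAS URL in $1 has not expired yet.
# $2 optionally overrides "now" (UTC, same ISO 8601 format) for testing.
sas_is_valid() {
  expiry=$(sas_expiry "$1")
  now=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  # ISO 8601 UTC timestamps of equal length sort chronologically as strings.
  [ -n "$expiry" ] && expr "$now" \< "$expiry" > /dev/null
}
```

If sas_is_valid fails for a URL, request a new one from /api/drawings/upload rather than attempting the PUT.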

Step 2: Upload PDF to Blob Storage

Upload your PDF file directly to Azure Blob Storage using the exact URL from the response (including all SAS parameters).

# Upload first file using the URL from blobs[0].url
curl -X PUT "https://apsisdocumentblobstorage.blob.core.windows.net/drawings/user/2024/11/abc123.pdf?sv=2021-08-06&se=2024-11-22T11%3A00%3A00Z&sr=b&sp=cw&sig=<signature>" \
  -H "Content-Type: application/pdf" \
  -H "x-ms-blob-type: BlockBlob" \
  --data-binary "@/path/to/drawing1.pdf"

# Upload second file using the URL from blobs[1].url
curl -X PUT "https://apsisdocumentblobstorage.blob.core.windows.net/drawings/user/2024/11/def456.pdf?sv=2021-08-06&se=2024-11-22T11%3A00%3A00Z&sr=b&sp=cw&sig=<signature>" \
  -H "Content-Type: application/pdf" \
  -H "x-ms-blob-type: BlockBlob" \
  --data-binary "@/path/to/drawing2.pdf"

Important:

Use the complete URL from the API response - do not modify it
The sig=<signature> parameter is a long authentication token generated by the API
Replace /path/to/drawing1.pdf with the actual path to your PDF file

Note: The SAS URL is time-limited (typically 1 hour). Upload immediately after receiving the URL.

Step 3: Create DrawingFile Records

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/files" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "blob_path": "user/2024/11/abc123.pdf",
    "file_name": "drawing1.pdf",
    "task_id": null
  }'

Response:

{
  "drawing_file": {
    "id": 123,
    "file_name": "drawing1.pdf",
    "blob_path": "user/2024/11/abc123.pdf",
    "status": 0,
    "num_pages": 5,
    "info": null,
    "spec_info": null,
    "created_at": "2024-11-22T10:00:00Z"
  }
}

Save the id value (123 in this example) - you’ll need it for the extraction request.

Repeat for each uploaded file to get all file IDs.

Step 4: Trigger Extraction

Start the metadata extraction task.

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/extract" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "drawing_file_ids": [123, 124],
    "detect_revision": false,
    "use_embedding_revision": false
  }'

Parameters:

drawing_file_ids: Array of file IDs from Step 3
detect_revision: Set to true to enable automatic revision detection (see Feature 2)
- Drawings are matched by drawing_number field
- Specifications are matched by AI similarity (embeddings)
use_embedding_revision: Reserved for future use (currently ignored)

⚠️ IMPORTANT: Batch Size Limits

DO NOT send more than 10 files per batch. Send multiple smaller batches instead.

✅ Send 5 batches of 10 files (optimal autoscaling)
❌ Don’t send 1 batch of 50 files (saturates workers)

See Scale + Load for detailed explanation of how autoscaling works and why this matters.

Response:

{
  "task_id": 456,
  "drawing_file_ids": [123, 124]
}

Step 5: Poll Task Status

Check the extraction progress and retrieve results.

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/tasks/456" \
  -H "Authorization: Bearer ${TOKEN}"

Response (Processing):

{
  "id": 456,
  "status": 1,
  "files": [
    {
      "id": 123,
      "file_name": "drawing1.pdf",
      "status": 1,
      "info": null,
      "spec_info": null
    }
  ]
}

Response (Success):

{
  "id": 456,
  "status": 2,
  "files": [
    {
      "id": 123,
      "file_name": "drawing1.pdf",
      "status": 2,
      "info": {
        "author": "John Smith",
        "drawing_number": "A-001",
        "revision": "P01",
        "status": "For Construction",
        "date": "2024-01-15",
        "scale": "1:100",
        "drawing_title": "Ground Floor Plan"
      },
      "spec_info": null,
      "error_message": null
    },
    {
      "id": 124,
      "file_name": "spec1.pdf",
      "status": 2,
      "info": null,
      "spec_info": {
        "author": "Jane Doe",
        "document_title": "Structural Specifications",
        "revision": "Rev A",
        "date": "2024-01-10"
      },
      "error_message": null
    }
  ]
}

Status Values:

0 = PENDING (queued but not started)
1 = PROCESSING (currently being analyzed)
2 = SUCCESS (extraction completed)
3 = FAILED (error occurred - check error_message)
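
A small lookup makes these numeric codes easier to read in scripts and logs. The sketch below assumes nothing beyond the four values listed above; the function names are hypothetical, and report_files expects one "id status" pair per line on standard input.

```shell
#!/bin/sh
# Translate a numeric status code into its name.
status_name() {
  case "$1" in
    0) echo "PENDING" ;;
    1) echo "PROCESSING" ;;
    2) echo "SUCCESS" ;;
    3) echo "FAILED" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Succeeds when the status is final (SUCCESS or FAILED), i.e. polling can stop.
is_terminal() {
  [ "$1" = "2" ] || [ "$1" = "3" ]
}

# Read "id status" pairs from stdin and print a readable per-file summary.
report_files() {
  while read -r id status; do
    printf 'file %s: %s\n' "$id" "$(status_name "$status")"
  done
}
```

For example, the per-file statuses of a task response can be piped in with jq -r '.files[] | "\(.id) \(.status)"' | report_files, which covers the file-level check recommended for batches.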

📝 Note on Automatic Embedding: Documents are automatically chunked and embedded during extraction, making them ready for semantic search. See Feature 4: Document Search to query your documents using natural language.

Polling Recommendations

Poll every 3-5 seconds until status is 2 (SUCCESS) or 3 (FAILED)
Set a timeout (e.g., 5 minutes) to avoid infinite polling
For batch uploads, individual files may complete at different times

Processing Limits

Maximum 50 pages per PDF
Maximum 50 PDFs per batch (hard limit; 10 or fewer per batch is recommended, see Scale + Load)
First 3 pages used for metadata extraction
Files exceeding 1M tokens will fail classification

Error Handling

Example Failed Response:

{
  "id": 456,
  "status": 3,
  "files": [
    {
      "id": 123,
      "file_name": "drawing1.pdf",
      "status": 3,
      "info": null,
      "spec_info": null,
      "error_message": "PDF exceeds maximum page limit (50 pages)"
    }
  ]
}

Feature 2: Detect Revision (Version Tracking)

Overview

Automatically identifies previous versions of a document and predicts the next revision number. The system uses different matching strategies based on document type.

How Matching Works

The system automatically selects the appropriate matching method based on the document type:

For Drawings

Metadata-Based Matching:

Matches by drawing_number field
Finds most recent drawing with same number but different revision
Fast and deterministic
Requires a valid drawing_number to be extracted
If drawing_number is missing, revision detection is skipped

Best for: Standard construction drawings with consistent numbering schemes

For Specifications

AI Similarity Matching:

Generates vector embedding from first 5 pages of PDF
Uses L2 distance to find most similar previous specification
Distance threshold: 0.30 (lower distance = more similar)
Works even without structured metadata fields

Best for: Specification documents that may lack consistent numbering schemes

Revision Prediction

Once a previous version is found, the system auto-increments the revision to predict the next version:

Supported formats:

Numeric: “Rev 0” → “Rev 1”, “Issue 5” → “Issue 6”
Alpha: “Rev A” → “Rev B”, “Z” → “AA”
Ordinal: “1st Issue” → “2nd Issue”
Alphanumeric: “A1” → “A2”, “R9” → “R10”
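
The increment rules can be illustrated with a short sketch. This is not the service's implementation: it is a simplified approximation that covers revisions ending in digits (keeping any zero padding) or in a single upper-case letter, and it deliberately leaves out ordinal formats such as "1st Issue". The function name next_revision is hypothetical.

```shell
#!/bin/sh
# Print the predicted next revision for "$1", or return 1 if the format
# is not covered by this sketch (e.g. ordinals such as "1st Issue").
next_revision() {
  rev=$1

  # Case 1: the revision ends in digits ("Rev 0", "R9", "A1", "P03").
  digits=${rev##*[!0-9]}
  if [ -n "$digits" ]; then
    prefix=${rev%"$digits"}
    stripped=$(printf '%s' "$digits" | sed 's/^0*//')   # avoid octal parsing
    next=$(( ${stripped:-0} + 1 ))
    # Restore zero padding so that "P03" becomes "P04", not "P4".
    while [ "${#next}" -lt "${#digits}" ]; do next="0$next"; done
    printf '%s%s\n' "$prefix" "$next"
    return 0
  fi

  # Case 2: the revision ends in a single upper-case letter ("Rev A", "Z").
  last=${rev#"${rev%?}"}      # last character
  head=${rev%?}               # everything before it
  prev=${head#"${head%?}"}    # character preceding the last one
  case "$prev" in
    [[:alpha:]]) return 1 ;;  # letter is part of a word, not a revision code
  esac
  if [ "$last" = "Z" ]; then
    printf '%sAA\n' "$head"
    return 0
  fi
  next=$(printf '%s' "$last" | LC_ALL=C tr 'A-Y' 'B-Z')
  if [ -n "$last" ] && [ "$next" != "$last" ]; then
    printf '%s%s\n' "$head" "$next"
    return 0
  fi
  return 1
}
```

For instance, next_revision "Rev 0" prints Rev 1, next_revision "R9" prints R10, and next_revision "Z" prints AA, matching the examples listed above.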

Workflow

Follow Steps 1-3 from Feature 1 (Extract Data), then enable revision detection:

Trigger Extraction with Revision Detection

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/extract" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "drawing_file_ids": [125],
    "detect_revision": true
  }'

The system will automatically:

Use drawing_number matching for drawings
Use AI similarity (embeddings) for specifications

Poll for Results

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/tasks/457" \
  -H "Authorization: Bearer ${TOKEN}"

Response with Revision Detection:

{
  "id": 457,
  "status": 2,
  "files": [
    {
      "id": 125,
      "file_name": "A-001-P03.pdf",
      "status": 2,
      "info": {
        "author": "John Smith",
        "drawing_number": "A-001",
        "revision": "P03",
        "status": "For Construction",
        "date": "2024-02-01",
        "scale": "1:100",
        "drawing_title": "Ground Floor Plan",
        "previous_file_name": "A-001-P02.pdf",
        "previous_revision": "P02",
        "predicted_revision": "P04"
      }
    }
  ]
}

Revision Fields Explained:

revision: Current revision extracted from this document
previous_file_name: Name of the matched previous version
previous_revision: Revision from the matched previous version
predicted_revision: Auto-incremented next revision (e.g., P02 → P03 → P04)

No Match Found

If no previous version is found:

{
  "info": {
    "drawing_number": "A-001",
    "revision": "P01",
    "previous_file_name": null,
    "previous_revision": null,
    "predicted_revision": null
  }
}

📝 Note on Automatic Embedding: Documents are automatically chunked and embedded during revision detection, making them ready for semantic search. See Feature 4: Document Search for details.

Feature 3: Compare Specifications

Overview

Advanced comparison that combines visual diff with AI-powered text analysis. As agreed with APSIS, this feature separates visual comparison from AI analysis to optimize cost and functionality.

Architecture:

Visual Diff - Uses the open-source pdf-diff library
- Generates pixel-perfect visual comparisons
- Outputs PNG/JPG images with strike-through and underline markup
- Open Source - Zero licensing costs (avoids expensive tools like Adobe Acrobat SDK)
- See pdf-diff documentation for technical details
AI Text Analysis - Uses an LLM for semantic understanding
- Extracts text from both PDFs
- Generates natural language summary of changes
- Identifies change types, impact levels, and locations
- The AI performs text comparison and summarization only; it does not produce the visual diff

Outputs:

Visual diff image (PNG/JPG): Generated by pdf-diff showing deletions (strikethrough) and additions (underline)
AI analysis (Markdown): Generated by LLM summarizing what changed and why it matters
Downloadable report: Image file with visual markup

Why open-source for visual comparison?

The visual diff uses the open-source pdf-diff library rather than commercial solutions to:

Avoid licensing fees: Adobe Acrobat SDK and similar tools cost thousands of dollars annually
Ensure portability: Open-source code can be deployed anywhere without vendor lock-in

Why image output instead of PDF?

The pdf-diff tool output is rendered as a raster image (PNG/JPG) rather than a PDF because:

Precise visual comparison: Renders both PDFs to images at the pixel level for exact comparison
Visual markup: Overlays strike-through and underline styling directly on the rendered pages
Simplicity: Easier to display in web browsers and applications
Format independence: Works regardless of PDF complexity or embedded fonts

Limitations:

Output is a raster image, not a vector PDF
Text cannot be selected or copied from the diff image
File size may be larger for multi-page documents
Resolution limited by the result_width parameter (default: 900px)

Use Cases

Best for:

Specification document version comparison (text-heavy PDFs)
Change review workflows where visual markup is helpful
Generating executive summaries of document changes
Audit trails showing “what changed” between versions

The combination of visual diff + LLM analysis provides:

Visual proof: Image showing exactly what changed with strike-through/underline markup
Natural language summary: AI-generated explanation of changes in plain English
Web-friendly format: PNG/JPG images display easily in browsers and applications

Workflow

Create PDF Diff Task

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "prev_file_id": 123,
    "new_file_id": 125,
    "visual_diff_params": {
      "style": "strike,underline",
      "format": "png",
      "top_margin": 0.0,
      "bottom_margin": 100.0,
      "result_width": 1200
    },
    "llm_diff_params": {
      "context_length": 5
    }
  }'

visual_diff_params:

Controls how the visual diff image is generated using the pdf-diff library.

style (string, default: "strike,underline"): How to mark changes visually
- Format: "deletion_style,addition_style"
- "strike" - Strikethrough text for deletions
- "underline" - Underline text for additions
- "strike,underline" - Combined (recommended): deletions struck through, additions underlined
format (string, default: "png"): Output image format
- "png" - Lossless compression, larger file size, best for text clarity
- "jpg" - Lossy compression, smaller file size, may have artifacts on text
top_margin (float, default: 0.0): Crop margin from top of page (%)
- 0.0 = No cropping from top
- 10.0 = Crop top 10% of page
- Useful for removing headers/title blocks
bottom_margin (float, default: 100.0): Crop margin from bottom of page (%)
- 100.0 = No cropping from bottom
- 90.0 = Crop bottom 10% of page
- Useful for removing footers
result_width (integer, default: 900): Output image width in pixels
- Recommended: 900 - 1200 for web display
- Higher values = better quality but larger file size
- Lower values = faster processing but may lose detail

llm_diff_params:

Controls how the AI analyzes changes between document versions.

context_length (integer, default: 5): Number of unchanged lines to show around each change in the diff
- How it works: The system extracts text from both PDFs, creates a unified diff (like git diff), then sends it to the LLM for analysis
- Lower values (1-3): Focused on changes only, uses fewer tokens, may miss broader context
- Medium values (5-10): Balanced context, good for most use cases (recommended)
- Higher values (15-20): Maximum context for understanding complex interdependent changes, uses more tokens

Example: With context_length: 3, the LLM sees up to 3 unchanged lines before and after each change. An abbreviated hunk:

--- Previous Version
+++ New Version
@@ -45,4 +45,4 @@
 Foundation depth: 1.2m below ground level
 Concrete strength: C25/30
-Reinforcement: 12mm diameter bars @ 200mm centers
+Reinforcement: 16mm diameter bars @ 150mm centers
 Foundation width: 600mm

With context_length: 0, only the changed line is shown (less context for understanding the change’s impact).

Response:

{
  "pdf_diff_id": 234,
  "status_visual_diff": "INITIATED",
  "status_llm_diff": "INITIATED"
}

Poll PDF Diff Status

The PDF diff has two separate statuses that complete independently:

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs/234" \
  -H "Authorization: Bearer ${TOKEN}"

Response (Processing):

{
  "id": 234,
  "status_visual_diff": 1,
  "status_llm_diff": 1,
  "prev_file": { "id": 123, "file_name": "A-001-P01.pdf" },
  "new_file": { "id": 125, "file_name": "A-001-P02.pdf" },
  "blob_uuid": null,
  "llm_summary": null
}

Response (Success):

{
  "id": 234,
  "status_visual_diff": 2,
  "status_llm_diff": 2,
  "prev_file": {
    "id": 123,
    "file_name": "A-001-P01.pdf",
    "info": { "drawing_number": "A-001", "revision": "P01" }
  },
  "new_file": {
    "id": 125,
    "file_name": "A-001-P02.pdf",
    "info": { "drawing_number": "A-001", "revision": "P02" }
  },
  "blob_uuid": "abc-def-ghi-jkl",
  "llm_summary": "The main changes between P01 and P02 include:\n\n- Updated revision from P01 to P02 in title block\n- Moved north wall 2 meters east (Grid Line 3 → Grid Line 4)\n- Added emergency exit door on west elevation\n- Updated ceiling height from 3.0m to 3.5m in Room 101\n- Modified HVAC duct routing in mechanical room",
  "error_message_visual_diff": null,
  "error_message_llm_diff": null,
  "created_at": "2024-11-22T10:00:00Z",
  "updated_at": "2024-11-22T10:05:00Z"
}
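
Because the two statuses complete independently, a polling loop should stop only when both are final. The sketch below shows one way to combine them. It assumes the numeric codes follow the same scheme as extraction tasks (2 = SUCCESS, 3 = FAILED), which the examples above suggest but do not state outright. The function names and the IN_PROGRESS and PARTIAL labels are hypothetical.

```shell
#!/bin/sh
# Succeeds when a status code is final: 2 (SUCCESS) or 3 (FAILED).
is_terminal() {
  [ "$1" = "2" ] || [ "$1" = "3" ]
}

# Succeeds once both the visual diff ($1) and the LLM diff ($2) have finished.
diff_finished() {
  is_terminal "$1" && is_terminal "$2"
}

# Summarize the combined state of status_visual_diff ($1) and status_llm_diff ($2).
diff_outcome() {
  if ! diff_finished "$1" "$2"; then
    echo "IN_PROGRESS"
  elif [ "$1" = "2" ] && [ "$2" = "2" ]; then
    echo "SUCCESS"
  elif [ "$1" = "3" ] && [ "$2" = "3" ]; then
    echo "FAILED"
  else
    echo "PARTIAL"   # one part succeeded, the other failed
  fi
}
```

A PARTIAL outcome means one result is still usable: for example, the visual diff can be downloaded even if the LLM summary failed, in which case error_message_llm_diff should explain why.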

Download Visual Diff Image

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs/234/download" \
  -H "Authorization: Bearer ${TOKEN}"

Response:

{
  "download_url": "https://apsisdocumentblobstorage.blob.core.windows.net/.../comparison.png",
  "filename": "diff_A-001_P01_vs_P02.png",
  "content_type": "image/png"
}

Use the download_url to download the visual diff image.

List PDF Diffs

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs?status=SUCCESS&limit=20&offset=0" \
  -H "Authorization: Bearer ${TOKEN}"

Query Parameters:

status: Filter by status (INITIATED, PROCESSING, SUCCESS, FAILED)
limit: Number of results per page (default: 20)
offset: Pagination offset

Response:

{
  "count": 42,
  "next": "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs?limit=20&offset=20",
  "previous": null,
  "results": [
    {
      "id": 234,
      "status_visual_diff": 2,
      "status_llm_diff": 2,
      "initiated_by_name": "John Doe",
      "prev_file_name": "A-001-P01.pdf",
      "new_file_name": "A-001-P02.pdf",
      "blob_uuid": "abc-def-ghi-jkl",
      "llm_summary": "The main changes include...",
      "created_at": "2024-11-22T10:00:00Z"
    }
  ]
}
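
To walk through every page of results, a client can either follow the next link until it is null or compute the offsets from count. The sketch below takes the second approach; the function name page_offsets is hypothetical.

```shell
#!/bin/sh
# Print the offset of every page needed to list $1 results, $2 per page
# (the page size defaults to 20, the API's default limit).
page_offsets() {
  count=$1
  limit=${2:-20}
  [ "$limit" -gt 0 ] || return 1   # guard against an endless loop
  offset=0
  while [ "$offset" -lt "$count" ]; do
    echo "$offset"
    offset=$((offset + limit))
  done
}

# Example: fetch all pages for a list that reported "count": 42.
# for offset in $(page_offsets 42 20); do
#   curl -s "$BASE_URL/api/drawings/pdf-diffs?limit=20&offset=$offset" \
#     -H "Authorization: Bearer ${TOKEN}"
# done
```

With count 42 and limit 20 this yields the offsets 0, 20, and 40, so three requests retrieve the full list. BASE_URL in the commented example stands for the production base URL.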

Delete PDF Diff

curl -X DELETE "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/pdf-diffs/234" \
  -H "Authorization: Bearer ${TOKEN}"

Response:

{
  "message": "PDF diff deleted successfully"
}

📝 Note on Automatic Embedding: Documents are automatically chunked and embedded when creating PDF diffs, making them ready for semantic search. See Feature 4: Document Search to query your compared specifications.

Feature 4: Document Search (RAG)

Overview

Semantic search across construction documents using Retrieval-Augmented Generation (RAG). The system:

Chunks documents into searchable text segments
Generates vector embeddings for each chunk
Performs similarity search based on your natural language query
Automatically generates an AI answer using GPT based on retrieved chunks
Returns the answer, source references, and raw chunks

Automatic Embedding: Documents are chunked and embedded automatically when you use other API features (Extract Data, Detect Revision, Compare Specifications). In most cases, your documents will already be ready for search without any additional setup.

What You Get:

AI-generated answer: Natural language response synthesized from relevant document chunks
Source references: Chunk IDs showing which parts of documents were used
Raw chunks: Full text of retrieved chunks for transparency and citation

Use Cases

“What are the structural requirements for the foundation?”
“Find all references to fire safety specifications”
“What materials are specified for the exterior walls?”

Workflow

Step 1: Chunk and Embed Documents (Often Already Done)

⚠️ IMPORTANT: Automatic Embedding

Documents are automatically chunked and embedded by default when using any API feature (Extract Data, Detect Revision, Compare Specifications).

You only need to run this step if:

The document was uploaded but never processed through other features

You want to manually trigger re-embedding of an existing document

You want to upload and query an entirely new document

To check if a document is already embedded, see Step 2 below and check the embedding_status field.

Before searching, documents must be processed and embedded. This is typically a one-time operation per document.

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/chunk-and-embed" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "drawing_file_ids": [123, 124, 125]
  }'

Response:

{
  "enqueued": 3,
  "file_ids": [123, 124, 125]
}

The embedding process runs asynchronously. It typically takes 1-3 minutes per document depending on length.

Step 2: Check Embedding Status

Monitor the embedding progress for each file:

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/files/123" \
  -H "Authorization: Bearer ${TOKEN}"

Response:

{
  "id": 123,
  "file_name": "structural-specs.pdf",
  "embedding_status": 3,
  "status": 2,
  "num_pages": 45,
  "created_at": "2024-11-22T10:00:00Z"
}

Embedding Status Values:

0 = PENDING (queued but not started)
1 = PROCESSING (currently being chunked and embedded)
2 = FAILED (error during embedding)
3 = COMPLETED (ready for search)
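The status values above map naturally onto an enum and a polling loop. This is a Python sketch under stated assumptions: fetch_file is a hypothetical callable that performs the GET shown above and returns the decoded JSON, and the 10-second interval and 10-minute timeout are illustrative defaults, not documented limits.

```python
import time
from enum import IntEnum
from typing import Any, Callable, Dict


class EmbeddingStatus(IntEnum):
    PENDING = 0
    PROCESSING = 1
    FAILED = 2
    COMPLETED = 3


def wait_for_embedding(
    fetch_file: Callable[[int], Dict[str, Any]],  # assumption: wraps GET /api/drawings/files/{id}
    file_id: int,
    interval_s: float = 10.0,   # assumed polling interval
    timeout_s: float = 600.0,   # assumed upper bound on waiting
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until embedding_status is COMPLETED; raise on FAILED or timeout."""
    waited = 0.0
    while True:
        record = fetch_file(file_id)
        status = EmbeddingStatus(record["embedding_status"])
        if status is EmbeddingStatus.COMPLETED:
            return record
        if status is EmbeddingStatus.FAILED:
            raise RuntimeError(f"Embedding failed for file {file_id}")
        if waited >= timeout_s:
            raise TimeoutError(f"File {file_id} still {status.name} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Note that FAILED is 2 and COMPLETED is 3, so a check such as embedding_status >= 2 would wrongly treat failures as finished work; comparing against the named values avoids that mistake.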

Step 3: List Embedded Documents

Get all documents ready for search:

curl -X GET "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/users/embedded-drawing-files" \
  -H "Authorization: Bearer ${TOKEN}"

Response:

{
  "files": [
    {
      "id": 123,
      "file_name": "structural-specs.pdf",
      "created_at": "2024-11-22T10:00:00Z"
    },
    {
      "id": 124,
      "file_name": "architectural-specs.pdf",
      "created_at": "2024-11-22T09:30:00Z"
    }
  ]
}

Step 4: Perform RAG Query

Search across your embedded documents using natural language:

curl -X POST "https://apsis-container-api.thankfultree-84692b9d.uksouth.azurecontainerapps.io/api/drawings/rag" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the structural requirements for the foundation?",
    "top_k": 5,
    "drawing_file_ids": [123, 124]
  }'

Parameters:

query: Natural language search query
top_k: Number of most relevant chunks to return (default: 5, max: 20)
drawing_file_ids: Optional array to limit search to specific documents. Omit to search all embedded documents.
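A client can validate these parameters before sending the request. The Python sketch below builds the JSON body using the documented default and maximum for top_k. The lower bound of 1, the rejection of empty queries, and treating an empty ID list as "omit" are client-side assumptions, not documented server behaviour.

```python
from typing import Any, Dict, Optional, Sequence

DEFAULT_TOP_K = 5   # documented default
MAX_TOP_K = 20      # documented maximum


def build_rag_payload(
    query: str,
    top_k: int = DEFAULT_TOP_K,
    drawing_file_ids: Optional[Sequence[int]] = None,
) -> Dict[str, Any]:
    """Build the request body for POST /api/drawings/rag."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")  # assumed client-side rule
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    payload: Dict[str, Any] = {"query": query, "top_k": top_k}
    if drawing_file_ids:
        # Only include the filter when IDs are given; omitting it
        # searches all embedded documents.
        payload["drawing_file_ids"] = list(drawing_file_ids)
    return payload
```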

Response (Success):

{
  "answer": "The structural requirements for the foundation include: minimum depth of 1.2m below ground level, concrete strength of C30/37, and high yield steel reinforcement to BS 4449. The foundation width must be determined by a structural engineer based on the ground bearing capacity, which is confirmed at 150 kN/m² at foundation level. Foundation design assumes uniform bearing pressure, with settlement calculations to be verified by the structural engineer. Additionally, foundation waterproofing must use Type A barrier protection with minimum 1200 gauge DPM, with all joints heat welded and the DPM continuous under all ground-bearing slabs.",
  "source_chunk_ids": [567, 568, 789],
  "retrieved_chunks": [
    {
      "id": 567,
      "drawing_file_id": 123,
      "drawing_file_name": "structural-specs.pdf",
      "text": "Foundation Design Requirements:\n\nAll foundations shall be designed in accordance with BS 8004. Minimum depth: 1.2m below ground level. Concrete strength: C30/37. Reinforcement: High yield steel to BS 4449. Foundation width shall be determined by structural engineer based on ground bearing capacity.",
      "position": 12,
      "created_at": "2024-11-22T10:05:00Z"
    },
    {
      "id": 568,
      "drawing_file_id": 123,
      "drawing_file_name": "structural-specs.pdf",
      "text": "Ground Bearing Capacity:\n\nSafe bearing capacity confirmed at 150 kN/m² at foundation level. Foundation design assumes uniform bearing pressure. Settlement calculations to be verified by structural engineer.",
      "position": 15,
      "created_at": "2024-11-22T10:05:00Z"
    },
    {
      "id": 789,
      "drawing_file_id": 124,
      "drawing_file_name": "architectural-specs.pdf",
      "text": "Foundation waterproofing shall be Type A barrier protection with minimum 1200 gauge DPM. All joints to be heat welded. DPM to be continuous under all ground-bearing slabs.",
      "position": 8,
      "created_at": "2024-11-22T09:45:00Z"
    }
  ]
}

Response Fields:

answer (string): AI-generated comprehensive answer synthesized from the retrieved chunks
source_chunk_ids (array): List of chunk IDs that were used to generate the answer, ordered by relevance (most relevant first)
retrieved_chunks (array): Full chunk objects with the following fields:
- id: Unique chunk identifier
- drawing_file_id: Source document ID
- drawing_file_name: Source document filename
- text: The actual text content of the chunk
- position: Chunk position in the document (sequential)
- created_at: When the chunk was created

Response (No Results):

If no relevant chunks are found, you’ll receive a 404 response:

{
  "detail": "No relevant chunks found for the query"
}
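Since "no results" arrives as a 404 rather than an empty result set, client code may want to separate it from real failures. A minimal Python sketch, assuming the caller already has the HTTP status code and decoded JSON body; it treats every 404 from this endpoint as "no relevant chunks", which is a simplification.

```python
from typing import Any, Dict, Optional


def parse_rag_response(status_code: int, body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the body on success, None on the 404 'no results' case, raise otherwise."""
    if 200 <= status_code < 300:
        return body
    if status_code == 404:
        # Assumption: any 404 here means no relevant chunks were found.
        return None
    raise RuntimeError(f"RAG query failed with HTTP {status_code}: {body.get('detail', body)}")
```

Returning None lets the caller show a "nothing found" message without wrapping the request in exception handling.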

Using the Response

The API returns a complete answer along with source information:

answer - AI-generated response synthesized from relevant chunks (ready to display to users)
source_chunk_ids - Array of chunk IDs used to generate the answer (ordered by relevance)
retrieved_chunks - Full text of all matching chunks with metadata (file name, position, text content)

You can display the answer directly to users and optionally show source references by matching source_chunk_ids with chunks in retrieved_chunks.
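The matching step can be sketched in Python as follows. The function names are illustrative, and the citation format is only one possible presentation. IDs in source_chunk_ids that have no counterpart in retrieved_chunks are skipped.

```python
from typing import Any, Dict, List


def cited_chunks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the retrieved chunks referenced by source_chunk_ids, in that order."""
    by_id = {chunk["id"]: chunk for chunk in response.get("retrieved_chunks", [])}
    return [by_id[cid] for cid in response.get("source_chunk_ids", []) if cid in by_id]


def format_citations(response: Dict[str, Any]) -> List[str]:
    """Render one numbered reference line per cited chunk."""
    return [
        f"[{n}] {chunk['drawing_file_name']} (chunk {chunk['id']}, position {chunk['position']})"
        for n, chunk in enumerate(cited_chunks(response), start=1)
    ]
```

Building the lookup from retrieved_chunks first keeps the citations in the relevance order given by source_chunk_ids rather than the order of the chunk array.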

Best Practices

Query formulation:

Use specific terminology from the construction domain
Ask focused questions rather than broad topics
Include context (e.g., “foundation requirements” vs “requirements”)

Chunk count (top_k):

Start with top_k: 5 for most queries
Increase to top_k: 10-15 for broader topics
Higher values increase context but may include less relevant chunks

Document filtering:

Use drawing_file_ids to search specific documents when context is known
Omit for project-wide searches across all documents

Embedding optimization:

Embed documents immediately after upload for best user experience
Re-embed documents if content changes significantly
Monitor embedding_status to ensure documents are ready

Feature 5: Drawing Compare (Visual Comparison)

⚠️ IN DEVELOPMENT

This feature is being reimplemented using a new agentic approach based on discussions with APSIS.

Estimated Availability: End of November 2025
