Where documents land
The SharePoint folder taxonomy
Every document filed by DIH Hub lives under /DIH/… in one of five primary objects — Deals,
Entities, Counterparties, Investors, Functions. The five mirror how DIH operates day-to-day. The second level is the
lifecycle or category bucket. The third level is the specific deal, entity, firm, investor, or function. The fourth is
the document type (Origination, DD, IC, Constitutional, Engagement, etc.). One canonical home per document, with metadata
carrying the cross-references.
| Primary object | Buckets (level 2) | Per-item sub-folders | Routing trigger (first match wins) |
|---|---|---|---|
| A · Deals: investment initiatives, origination to exit | Active, Archive, Evaluation | Origination, DD, IC, Execution, Closing | Deal codename in subject, e.g. [HEIRLOOM], [SUNFLOWER]. Killed / Exited deals move one-way to /Archive. |
| B · Entities: legal entities in the DIH perimeter | Group, Opcos, SPVs, Wind-down | Constitutional, Board, Accounts, Regulator, Intercompany | No codename; entity match-keyword in subject (e.g. TASC-Infra, DIH-Holdings). |
| C · Counterparties: external service-providers under contract | Banks, Law, Audit, Consult, Vendors | Engagement, Invoices, KYC | No codename; no entity match; sender domain matches a registered counterparty (e.g. @kpmg.com, @morganlewis.com). |
| D · Investors: external capital, group or deal-level | LPs, Co-Investors | Commitment, KYC, Reports, Capital | Sender or recipient domain matches a registered investor (e.g. @blackrock.com, LP list). |
| E · Functions: cross-cutting internal disciplines | Finance, Legal, IR, Ops, IT, HR | Tax, Treasury, Policies, Templates, Comms, Insurance, Employment, Payroll, Incentives, + more | No codename, no entity, no counterparty/investor match; function keyword in subject (e.g. VAT → Finance/Tax). |
The full tree
/DIH
├── /Deals
│   ├── /Active/<codename>/{Origination, DD, IC, Execution, Closing}
│   ├── /Archive/<codename>/...
│   └── /Evaluation
│
├── /Entities
│   ├── /Group        DIH-Holdings, intermediate holds
│   ├── /Opcos        TASC-Infra, TASC-Towers
│   ├── /SPVs         deal-specific SPVs
│   └── /Wind-down    dormant or dissolving
│       └── <entity>/{Constitutional, Board, Accounts, Regulator, Intercompany}
│
├── /Counterparties
│   ├── /Banks        Citi, UniCredit, PWP
│   ├── /Law          Morgan-Lewis, Dentons
│   ├── /Audit        KPMG, PKF-Littlejohn, SG-LLP
│   ├── /Consult      Detecon
│   └── /Vendors      IT, payroll, brokers, other
│       └── <firm>/{Engagement, Invoices, KYC}
│
├── /Investors
│   ├── /LPs/<name>/{Commitment, KYC, Reports, Capital}
│   └── /Co-Investors/<name>/{Commitment, KYC, Reports}
│
└── /Functions
    ├── /Finance/{Tax, Treasury, Audit-Coord}
    ├── /Legal/{Templates, Policies, KYC-Framework}
    ├── /IR/{Comms, Fundraising}
    ├── /Ops/{Insurance, Vendors-Master, Asset-Reports}
    ├── /IT/{Policies, Access, DataRoom-Admin}
    └── /HR/{Employment, Payroll, Training, Incentives}
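The four item-based branches of the tree can be expressed as data and used to assemble a canonical path. The following is a minimal sketch, not DIH Hub's implementation: `TAXONOMY` and `build_path` are hypothetical names, /Functions and /Deals/Evaluation are left out because they have a different shape, and the Archive sub-folders (elided in the tree) are assumed to mirror Active.

```python
# Hypothetical sketch: folder names come from the tree above;
# TAXONOMY and build_path are illustrative names only.
DEAL_TYPES = ("Origination", "DD", "IC", "Execution", "Closing")
ENTITY_TYPES = ("Constitutional", "Board", "Accounts", "Regulator", "Intercompany")
FIRM_TYPES = ("Engagement", "Invoices", "KYC")

# primary object -> bucket -> allowed document-type folders
TAXONOMY = {
    # Assumption: Archive mirrors Active (the tree elides its sub-folders).
    "Deals": {"Active": DEAL_TYPES, "Archive": DEAL_TYPES},
    "Entities": {b: ENTITY_TYPES for b in ("Group", "Opcos", "SPVs", "Wind-down")},
    "Counterparties": {b: FIRM_TYPES
                       for b in ("Banks", "Law", "Audit", "Consult", "Vendors")},
    "Investors": {
        "LPs": ("Commitment", "KYC", "Reports", "Capital"),
        "Co-Investors": ("Commitment", "KYC", "Reports"),
    },
}


def build_path(primary: str, bucket: str, item: str, doc_type: str) -> str:
    """Assemble /DIH/<primary>/<bucket>/<item>/<doc_type>, rejecting unknown folders."""
    buckets = TAXONOMY.get(primary)
    if buckets is None:
        raise ValueError(f"unknown primary object: {primary}")
    allowed = buckets.get(bucket)
    if allowed is None:
        raise ValueError(f"unknown bucket for {primary}: {bucket}")
    if doc_type not in allowed:
        raise ValueError(f"{doc_type} is not a folder under {primary}/{bucket}")
    return f"/DIH/{primary}/{bucket}/{item}/{doc_type}"
```

For instance, `build_path("Deals", "Active", "Heirloom", "Execution")` yields the path used in the worked example, while a Capital folder under Co-Investors is rejected because the tree does not list one.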
Worked example. An email from @morganlewis.com with subject
"[HEIRLOOM] SPA mark-up v3" lands at /DIH/Deals/Active/Heirloom/Execution.
Tier 1 (deal codename) fires first, so the Morgan Lewis sender domain is never tested at Tier 3. The counterparty
relationship is preserved as metadata on the document, making it findable from the Morgan Lewis angle without
duplication. One canonical home per document.
Step 06 · Decide
Deterministic routing — the precedence chain
The DeterministicRoutingEngine walks five tiers in strict order, short-circuiting on the first hit.
A hit yields AutoFile at confidence 1.0, with no LLM cost. Most volume should land here once the registers
(deal codenames, entity keywords, counterparty and investor domains, function routes) are populated.
01 · Deal codename → /DIH/Deals/…
Explicit [CODENAME] in subject, or whole-word match in subject + filename + first 1 KB of text. Minimum 4 chars.
02 · Entity keyword → /DIH/Entities/…
Match against Entity.MatchKeywords[].
03 · Counterparty domain → /DIH/Counterparties/…
Sender email domain matches a registered counterparty (e.g. @kpmg.com).
04 · Investor domain → /DIH/Investors/…
Parallel check: sender or recipient domain matches an investor.
05 · Function keyword → /DIH/Functions/…
Subject keyword matches a FunctionRoute (e.g. "VAT" → Finance/Tax).
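The tier 01 matching rule is the most detailed of the five, so a small sketch may help. This is an illustration only: `match_codenames` is a hypothetical helper, and several details are assumptions not stated in the text (case-insensitive matching, "1 KB" read as 1024 characters, "Minimum 4 chars" read as codename length, and an explicit bracketed tag taking priority over a whole-word hit).

```python
import re

MIN_CODENAME_LEN = 4   # assumption: "Minimum 4 chars" refers to codename length
TEXT_WINDOW = 1024     # assumption: "first 1 KB" read as 1024 characters


def match_codenames(subject, filename, text, codenames):
    """Return the set of registered codenames found in the document."""
    usable = [c for c in codenames if len(c) >= MIN_CODENAME_LEN]

    # 1) Explicit [CODENAME] tags in the subject line.
    tagged = {tag.upper() for tag in re.findall(r"\[([A-Za-z0-9_-]+)\]", subject)}
    explicit = {c for c in usable if c.upper() in tagged}
    if explicit:
        return explicit

    # 2) Whole-word match over subject + filename + the first part of the text.
    haystack = " ".join((subject, filename, text[:TEXT_WINDOW]))
    return {
        c for c in usable
        if re.search(rf"\b{re.escape(c)}\b", haystack, re.IGNORECASE)
    }
```

One consequence of using `\b` for the whole-word test: an underscore counts as a word character, so a filename such as `heirloom_spa.pdf` would not match HEIRLOOM in this sketch.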
What about ambiguity?
Within any tier, one match → route. Multiple matches → the engine refuses to guess and sends the document straight to
Triage with reason AmbiguousDeterministic. Zero matches → fall through to the next tier.
Zero matches across all five tiers → on to the LLM.
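The one / many / zero rule can be sketched as a loop over tier functions. This is a hedged illustration, not the DeterministicRoutingEngine itself: `Decision`, `route`, and the two toy tiers are hypothetical, only two of the five tiers are shown, and choosing the document-type folder is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Decision:
    action: str                        # "AutoFile", "Triage", or "LLM"
    path: Optional[str] = None
    confidence: Optional[float] = None
    reason: Optional[str] = None


Tier = Callable[[dict], Iterable[str]]  # a tier returns candidate paths


def route(doc: dict, tiers: Sequence[Tier]) -> Decision:
    """Walk tiers in order; stop at the first tier that matches anything."""
    for tier in tiers:
        matches = sorted(set(tier(doc)))
        if len(matches) == 1:          # one match: route
            return Decision("AutoFile", matches[0], 1.0)
        if len(matches) > 1:           # several matches: refuse to guess
            return Decision("Triage", reason="AmbiguousDeterministic")
        # zero matches: fall through to the next tier
    return Decision("LLM")             # nothing matched in any tier


# Toy tiers with hard-coded registers, for illustration only.
def codename_tier(doc: dict) -> list:
    return [f"/DIH/Deals/Active/{c.title()}"
            for c in ("HEIRLOOM", "SUNFLOWER") if f"[{c}]" in doc["subject"]]


def counterparty_tier(doc: dict) -> list:
    domains = {"morganlewis.com": "/DIH/Counterparties/Law/Morgan-Lewis"}
    domain = doc["sender"].rsplit("@", 1)[-1].lower()
    return [domains[domain]] if domain in domains else []
```

With `tiers = [codename_tier, counterparty_tier]`, a Morgan Lewis email tagged [HEIRLOOM] is filed under /DIH/Deals because the codename tier returns first, which is the behaviour the worked example describes.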
Plus a learning layer
Once a path has accumulated enough learned signals from past human triage decisions, the engine
will route to it directly — without paying for an LLM call. This is a reinforcement layer on top of the precedence
chain, not a sixth tier; it grows as operators resolve triage items and bakes their judgments back into the routing.
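How such a layer might accumulate signals can be sketched with a simple counter. Everything here is an assumption made for illustration: the text does not say what a "signal" is keyed on or how many are "enough", so `LearnedRoutes`, the `signal_key` (for example a sender domain), and the threshold of 3 are all hypothetical.

```python
from collections import Counter
from typing import Optional


class LearnedRoutes:
    """Counts human triage decisions and routes once a path has enough of them."""

    def __init__(self, min_signals: int = 3):  # assumption: threshold is illustrative
        self.min_signals = min_signals
        self._signals = Counter()              # (signal_key, path) -> count

    def record(self, signal_key: str, path: str) -> None:
        """Call when an operator resolves a triage item to `path`."""
        self._signals[(signal_key, path)] += 1

    def lookup(self, signal_key: str) -> Optional[str]:
        """Return a path only if exactly one path has reached the threshold."""
        qualified = [
            path for (key, path), count in self._signals.items()
            if key == signal_key and count >= self.min_signals
        ]
        return qualified[0] if len(qualified) == 1 else None
```

Returning `None` when two paths both qualify keeps the sketch in line with the engine's stated habit of refusing to guess.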
Step 07 · Decide
LLM fallback — two models in series
When the rules can't decide, the document is handed to the LLM tier. Two models, both invoked through the
Azure AI Foundry inference SDK:
Mini · gpt-4o-mini
Cheap and fast. Always called first. Returns structured JSON:
{ primary_object, sub_path, confidence_score, alternative_candidates[], reasoning, is_new_proposal }.
Escalation · gpt-4.1 or claude-sonnet-4-6
Configurable per environment. Called only when mini's confidence is in the uncertain band. Foundry routes non-OpenAI
families (claude-, llama-, mistral-, phi-) through the inference
endpoint by deployment-name prefix.
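The series arrangement can be sketched with plain callables standing in for the two deployments. This is not the Azure AI Foundry SDK: `classify` and `uses_inference_endpoint` are hypothetical helpers, the band limits 0.5 and 0.85 are placeholder assumptions (the text does not give the band), and what happens to results below the band is not shown.

```python
from typing import Callable

Classifier = Callable[[str], dict]   # stand-in for a model call returning parsed JSON

NON_OPENAI_PREFIXES = ("claude-", "llama-", "mistral-", "phi-")


def uses_inference_endpoint(deployment_name: str) -> bool:
    """True when the deployment name carries a non-OpenAI family prefix."""
    return deployment_name.lower().startswith(NON_OPENAI_PREFIXES)


def classify(prompt: str, mini: Classifier, escalation: Classifier,
             low: float = 0.5, high: float = 0.85) -> dict:
    """Call the mini model first; escalate only inside the uncertain band."""
    first = mini(prompt)
    if low <= first["confidence_score"] < high:   # assumption: band is [low, high)
        return escalation(prompt)
    return first
```

Passing the models in as callables keeps the escalation rule testable without network access; in a real deployment each callable would wrap an SDK call.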
What the LLM is actually asked
The prompt embeds DIH context (active deals, archive deals, counterparties, entities, investors, functions), the document's
metadata (source mailbox, sender, recipients, subject, filename, date), and up to 16,000 characters of sanitised extracted
text. The model returns a single sub_path under /DIH/<bucket>/… with a minimum of three
segments, plus alternative candidates.
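The limits stated above lend themselves to a small guard on each side of the model call. A minimal sketch under stated assumptions: `clip_text` and `parse_response` are hypothetical helpers, the required keys are the fields of the mini model's JSON schema, and the three-segment minimum is read here as counting the leading DIH segment.

```python
import json

MAX_TEXT_CHARS = 16_000
PRIMARY_OBJECTS = {"Deals", "Entities", "Counterparties", "Investors", "Functions"}
REQUIRED_KEYS = {
    "primary_object", "sub_path", "confidence_score",
    "alternative_candidates", "reasoning", "is_new_proposal",
}


def clip_text(text: str) -> str:
    """Keep at most 16,000 characters of extracted text for the prompt."""
    return text[:MAX_TEXT_CHARS]


def parse_response(raw: str) -> dict:
    """Parse the model's JSON and check the shape of sub_path."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    segments = [s for s in data["sub_path"].split("/") if s]
    # Assumption: the three-segment minimum includes the leading "DIH".
    if len(segments) < 3 or segments[0] != "DIH" or segments[1] not in PRIMARY_OBJECTS:
        raise ValueError(f"invalid sub_path: {data['sub_path']}")
    return data
```

Rejecting a malformed path before filing means a bad model answer surfaces as an error rather than as a stray folder.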