AI Documentation Strategy: University of Nebraska
Evaluating AI documentation tools and recommending RAG architecture for an 8-campus university system
Context
The University of Nebraska System supports 8 campuses and state colleges. Our documentation was scattered across wikis, Confluence pages, and tribal knowledge. Leadership wanted to explore AI-powered documentation tools that could help users find answers without manually searching through dozens of outdated pages.
I was asked to evaluate whether an AI documentation assistant was feasible, and if so, what architecture would work best.
My Roles & Responsibilities
As the lead evaluator on this initiative, I was responsible for:
- Defining the evaluation criteria and testing methodology
- Building a proof-of-concept with Mistral 7B
- Benchmarking RAG vs. fine-tuning approaches against 50 real user queries
- Presenting findings and a data-backed recommendation to senior IT leadership
The Challenge
The core question wasn't "can AI do this?" It was "which approach gives us accurate answers we can trust?"
Two main architectural approaches existed:
- Fine-tuning: Train a model on our documentation corpus. Potentially more accurate for our specific domain, but expensive and hard to update.
- RAG (Retrieval-Augmented Generation): Keep documents in a vector database and retrieve relevant chunks at query time. Easier to update but depends heavily on retrieval quality.
For a university system, source fidelity was critical. If the AI told a student the wrong deadline or policy, the consequences were real.
What I Did
Built a Proof-of-Concept with Mistral 7B
I chose Mistral 7B as the base model because it was open-source, ran on commodity hardware, and had strong performance benchmarks for its size.
The POC had two components:
- A RAG pipeline that ingested our documentation, chunked it, embedded it into a vector store, and retrieved relevant context at query time
- A fine-tuned version trained on a subset of our documentation for comparison
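The ingestion side of the pipeline was the chunking step: splitting documents into pieces small enough to embed and retrieve. A minimal sketch of fixed-size chunking with overlap (the window sizes are illustrative defaults, not the POC's exact settings):

```python
def chunk(text, size=500, overlap=50):
    # Split a document into overlapping character windows so a fact that
    # straddles a boundary still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then embedded and written to the vector store; when a document changes, only its chunks need re-indexing, which is what makes RAG cheap to keep current.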
Benchmarked Both Approaches
I tested both against a set of 50 questions drawn from real user queries (policy questions, procedural how-tos, deadline lookups). I measured:
- Factual accuracy: Did the answer match the source document?
- Source attribution: Could the system point to where the answer came from?
- Latency: How fast did it respond?
- Maintenance burden: How hard was it to update when documents changed?
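The first three metrics can be scored with a simple harness. A sketch of the evaluation loop, assuming each system exposes an `answer_fn` that returns an answer plus a cited source (or `None`), and each test case carries a grading function (names and structure here are illustrative, not the POC's actual code):

```python
import statistics
import time

def benchmark(answer_fn, test_set):
    # Run every benchmark question through a system and collect per-metric scores.
    results = []
    for case in test_set:
        start = time.perf_counter()
        answer, source = answer_fn(case["question"])
        latency = time.perf_counter() - start
        results.append({
            "accurate": case["grade"](answer),   # does the answer match the source doc?
            "attributed": source is not None,    # can the system cite where it came from?
            "latency_s": latency,
        })
    n = len(results)
    return {
        "accuracy": sum(r["accurate"] for r in results) / n,
        "attribution_rate": sum(r["attributed"] for r in results) / n,
        "p50_latency_s": statistics.median(r["latency_s"] for r in results),
    }
```

Running the same 50-question set through both the RAG pipeline and the fine-tuned model made the comparison apples-to-apples; the maintenance-burden criterion was assessed qualitatively rather than scored by the harness.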
Presented Findings to Stakeholders
I compiled the results into a presentation for senior IT leadership. The recommendation was clear: RAG outperformed fine-tuning on source fidelity because it retrieved and cited actual document chunks rather than generating from learned patterns. Fine-tuning produced more fluent responses but occasionally hallucinated policy details, which was unacceptable for a university system.
The Recommendation
I recommended RAG architecture for three reasons:
| Factor | RAG | Fine-tuning |
|---|---|---|
| Source fidelity | High: cites actual documents | Medium: can hallucinate details |
| Update speed | Fast: re-index changed docs | Slow: requires retraining |
| Cost | Lower: no GPU training needed | Higher: compute-intensive |
| Maintenance | Team can update docs, system adapts | Requires ML expertise to retrain |
The recommendation was accepted by leadership.
Takeaways
- The best technical solution isn't always the most impressive one. Fine-tuning felt more sophisticated, but RAG was the right choice because it prioritized accuracy over fluency, which is what mattered for our users.
- PMs need to translate technical trade-offs into business language. Stakeholders didn't care about embedding dimensions or chunk sizes. They cared about "will it give wrong answers?" and "how much will it cost to maintain?"
- Build the POC before the PowerPoint. Having a working prototype with real performance data made the recommendation credible. A slide deck alone wouldn't have been convincing.
Skills Applied
- AI architecture evaluation (RAG vs. fine-tuning)
- Proof-of-concept development
- Performance benchmarking and data analysis
- Stakeholder communication and technical translation
- Decision-making under uncertainty