
Key takeaways
• AI is already drafting the first 70–80% of educational content. Lesson plans, quizzes, slide decks, voice-overs, captions, alt text, translations and per-student paths are the workflows where edtech teams ship real productivity gains today — not in theory, in production.
• The LLMs are commodities; the moat is the retrieval, the human review and the compliance harness. A buy-versus-build decision in 2026 is not “which model?” — it is “who builds the RAG layer, the teacher-review workflow, the SCORM/xAPI export and the audit trail?”
• Adoption is past the hype phase. Around 60% of K-12 teachers use AI for lesson planning; ~84% of high school students use it for schoolwork; the global AI-in-education market is on a 30%+ CAGR through 2035. If your edtech product does not have an AI authoring story by 2027, it will look like a 2014 product.
• Quality, bias and compliance are the real cost drivers. Hallucinations, biased examples, FERPA/COPPA/GDPR exposure, and the EU AI Act’s “high-risk” classification of educational AI mean every shipped feature needs a documented review path, not just a prompt.
• Bloom’s Taxonomy is the prompt framework most teams skip. Without deliberate scaffolding, AI defaults to Remember/Understand. Higher-order learning objectives (Apply, Analyse, Evaluate, Create) need explicit prompt patterns, grounded sources and rubrics — that is the difference between “an AI lesson” and a real one.
• This guide is what we use internally when scoping AI authoring features for edtech clients. We have shipped AI-assisted features into platforms like BrainCert and built bespoke learning products from scratch. Below is the full playbook — tools, architecture, KPIs, pitfalls and a five-question decision framework.
Why Fora Soft wrote this playbook
Fora Soft has been building education and learning software since 2005. We grew the e-learning platform BrainCert into an LMS used by global enterprises, shipped InstaClass, The Language Chef, Tabsera, Talensy, Scholarly, Career Point and language-learning products like Input Logger. Across all of them, AI-assisted authoring — lesson generation, assessment, voice, captioning, personalisation — is now the most-requested feature category we get briefs on.
This playbook is the internal cheat sheet we use on the first call: which workflows actually move metrics, which tools to integrate, where the architecture splits, and what compliance posture you need before a school district will sign. We publish it here so the next founder reading it does not have to spend three months learning what we already know.
Adding AI authoring to your learning product?
Bring the workflow you want to automate — lesson plans, quizzes, captions, translations, per-student paths. We will return with a reference architecture, a build plan, and an honest number for what it costs.
The nine content workflows AI is already replacing
When we audit an edtech product for AI fit, these are the candidates we test first. Each one has delivered real time savings and quality wins for clients in the last 18 months.
| Workflow | What AI does | Typical time saved |
|---|---|---|
| Lesson plan drafting | Generates standards-aligned plans from a topic + grade | 60–80% |
| Assessment generation | Pulls concepts from source PDF, generates questions across Bloom levels | 70–90% |
| Slide decks & visuals | Drafts slide structure + image prompts from a lesson plan | 50–70% |
| Voice-over & dubbing | TTS narration with consistent voice; optional voice cloning | 90%+ |
| Captions & transcripts | Whisper-class ASR + alignment; human spot-check | 80–95% |
| Translation & localisation | LLM translation + native review; locale-aware examples | 70–85% |
| Accessibility (alt text, descriptions) | Auto alt text, audio descriptions, dyslexia-friendly rewrites | 60–80% |
| Source-material summarisation | Distils teacher-uploaded PDFs into key concepts & outlines | 70%+ |
| Per-student learning paths | Adaptive sequencing using grades + retrieval over curriculum | N/A — it lifts outcomes, not just speed |
The pattern across the table: AI eliminates the cold-start cost of every artefact and shifts the human contribution from drafting to review. The product implication is that authoring tools designed for “type from blank” are about to be replaced by tools designed for “review and approve”. The two UX paradigms are different products.
Reference architecture for AI-assisted authoring
Five layers, in this order, each with a clear contract.
Layer 1 — Source of truth. Curriculum standards, approved textbooks, your existing question banks, and any teacher-uploaded materials. Indexed in a vector database (Pinecone, Weaviate, Qdrant or pgvector) with strict per-tenant isolation.
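A minimal sketch of the Layer 1 contract, assuming PostgreSQL with the pgvector extension; the table and column names are illustrative, not taken from any specific product. The point it makes is that tenant isolation is enforced in the query itself, not left to application discipline.

```python
# Layer 1 sketch: a per-tenant pgvector index. Assumes PostgreSQL with the
# pgvector extension installed; table and column names are illustrative.
import psycopg2

SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS curriculum_chunks (
    id        BIGSERIAL PRIMARY KEY,
    tenant_id UUID   NOT NULL,        -- strict per-tenant isolation
    source    TEXT   NOT NULL,        -- standard, textbook, teacher upload
    chunk     TEXT   NOT NULL,
    embedding VECTOR(1536) NOT NULL   -- match your embedding model's dimension
);
CREATE INDEX IF NOT EXISTS curriculum_tenant_idx ON curriculum_chunks (tenant_id);
"""  # run once at provisioning time

RETRIEVE = """
SELECT chunk, source
FROM curriculum_chunks
WHERE tenant_id = %s                  -- isolation enforced in the query itself
ORDER BY embedding <=> %s::vector     -- pgvector cosine-distance operator
LIMIT %s;
"""

def retrieve_passages(conn, tenant_id: str, query_embedding: list[float], k: int = 5):
    """Return the k most relevant passages for this tenant, and only this tenant."""
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(RETRIEVE, (tenant_id, vector_literal, k))
        return cur.fetchall()

# Usage (placeholder DSN):
# conn = psycopg2.connect("postgresql://user:pass@localhost/edtech")
# passages = retrieve_passages(conn, tenant_id, embed(query), k=5)
```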
Layer 2 — Generation. A retrieval-augmented LLM call: pull the relevant passages from layer 1, attach them to a Bloom-aware prompt template, run on a frontier model for high-stakes content (Claude, GPT-class, Gemini) or an open-weight model for cost-sensitive bulk work (Llama, Mistral, Qwen).
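A sketch of the Layer 2 call, using the OpenAI Python SDK purely as an illustration; the prompt wording, model name and the retrieve_passages helper from the previous sketch are assumptions to be tuned per subject and grade band.

```python
# Layer 2 sketch: a retrieval-grounded, Bloom-aware generation call.
from openai import OpenAI

BLOOM_TEMPLATE = """You are drafting educational content for grade {grade}.
Target Bloom level: {bloom_level}.
Use ONLY the source passages below and cite the passage number you used
for every factual claim, e.g. [2].

Source passages:
{passages}

Task: {task}
"""

def draft_lesson_item(client: OpenAI, passages: list[str], *,
                      grade: int, bloom_level: str, task: str) -> str:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = BLOOM_TEMPLATE.format(
        grade=grade, bloom_level=bloom_level, passages=numbered, task=task
    )
    response = client.chat.completions.create(
        model="gpt-4o",              # frontier model for high-stakes content
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,             # keep drafts close to the sources
    )
    return response.choices[0].message.content
```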
Layer 3 — Human review. Teacher dashboard showing the draft alongside the source passages it cites. One-click approve, edit, regenerate, or escalate to a subject-matter expert. This is where compliance lives.
Layer 4 — Distribution. Approved content lands in your CMS, exports as SCORM 2004 or xAPI for LMS interoperability, and as QTI for assessments. Without these formats, your content does not travel.
Layer 5 — Analytics & feedback. Learner records (xAPI Learning Record Store) feed back into layer 1, closing the loop on what works. This is what turns “an AI lesson generator” into adaptive learning.
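To make layers 4 and 5 concrete, here is a hedged sketch of posting an xAPI completion statement to a Learning Record Store with the requests library. The LRS URL, credentials and activity IRI are placeholders; the statement shape follows the xAPI 1.0.3 specification.

```python
# Sketch: emit an xAPI statement when a learner completes an AI-drafted lesson.
import requests

def record_completion(lrs_url: str, auth: tuple[str, str],
                      learner_email: str, lesson_id: str, score: float):
    statement = {
        "actor": {"mbox": f"mailto:{learner_email}", "objectType": "Agent"},
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        "object": {
            "id": f"https://example.com/lessons/{lesson_id}",  # placeholder IRI
            "definition": {"name": {"en-US": "AI-drafted lesson"}},
            "objectType": "Activity",
        },
        "result": {"score": {"scaled": score}, "completion": True},
    }
    resp = requests.post(
        f"{lrs_url}/statements",
        json=statement,
        auth=auth,                                       # Basic auth for the LRS
        headers={"X-Experience-API-Version": "1.0.3"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()   # the LRS returns the stored statement id(s)
```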
Reach for this architecture when: a stakeholder asks “why can’t we just call the OpenAI API directly?”. Direct calls work for prototypes; layers 1, 3 and 4 are why production edtech survives a procurement review.
The tool matrix: what to use for which job
| Job | Default tools | When to use |
|---|---|---|
| Reasoning, lesson + quiz drafting | Frontier LLMs (Claude, GPT, Gemini) | Anything graded or shown to learners |
| High-volume bulk drafts | Open-weight models (Llama, Mistral, Qwen) with batch APIs | Cost-sensitive content factories |
| Voice-over & dubbing | ElevenLabs, Azure Neural TTS, Google TTS | Multi-language narration, voice consistency |
| Captioning, transcripts | Whisper, Deepgram, AssemblyAI | Video lessons, podcasts, live classes |
| Image generation | DALL-E, Stable Diffusion, FLUX | Diagrams, illustrations — never copyrighted likenesses |
| Avatar / talking-head video | HeyGen, Synthesia, D-ID | Repeatable instructor presence at scale |
| Adaptive learning paths | Custom RAG over learner records + curriculum | When you have learner-progress data to feed in |
| Off-the-shelf authoring | MagicSchool, Eduaide, Diffit, Curipod, Quizizz AI | When buying for end-teachers, not building a platform |
Personalisation: where the real outcome lift lives
AI-generated content that is the same for everyone is interesting; AI-generated content that adapts to the individual learner is what moves test scores and retention metrics. Three patterns are doing the heavy lifting.
Adaptive sequencing. Knewton-style prescriptive recommendation: based on prior performance, what is the next concept this learner is likely to master? Years of replicated studies show double-digit gains in test results when paths adapt rather than march in lockstep.
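A deliberately naive sketch of the sequencing idea: pick the weakest concept whose prerequisites are already mastered. The mastery threshold and data shapes are assumptions; production recommenders use proper knowledge-tracing models, but the contract is the same.

```python
# Sketch: mastery-gated next-concept selection for adaptive sequencing.
MASTERY_THRESHOLD = 0.8   # assumed cut-off; tune against your outcome data

def next_concept(mastery: dict[str, float],
                 prerequisites: dict[str, list[str]]) -> str | None:
    """mastery: concept -> estimated probability of mastery (0..1).
    prerequisites: concept -> concepts that must be mastered first."""
    candidates = [
        concept for concept, score in mastery.items()
        if score < MASTERY_THRESHOLD
        and all(mastery.get(p, 0.0) >= MASTERY_THRESHOLD
                for p in prerequisites.get(concept, []))
    ]
    # Work on the weakest eligible concept first; None means nothing is eligible.
    return min(candidates, key=lambda c: mastery[c]) if candidates else None
```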
Socratic tutoring. Khan Academy’s Khanmigo and Carnegie Learning’s tutors push back on wrong answers with a question, not the right answer. Dramatic gains in conceptual understanding compared with answer-first tutoring.
Log-contextualised RAG. The current research frontier: feed the LLM not just the curriculum, but a window of the learner’s recent answers, mistakes, and interaction logs. The result is responses calibrated to that learner’s level — not the median student.
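A sketch of what "a window of the learner's recent answers" can look like inside the prompt; the field names and window size are assumptions, and the block is simply appended to the grounded prompt from layer 2.

```python
# Sketch: build a learner-context block from recent interaction logs.
from dataclasses import dataclass

@dataclass
class Attempt:
    question: str
    learner_answer: str
    correct: bool

def learner_context(recent: list[Attempt], window: int = 5) -> str:
    lines = []
    for attempt in recent[-window:]:           # only the most recent attempts
        outcome = "correct" if attempt.correct else "incorrect"
        lines.append(f"- Q: {attempt.question}\n  A: {attempt.learner_answer} ({outcome})")
    return "Recent work by this learner:\n" + "\n".join(lines)

# The block is appended to the grounded prompt, e.g.
# prompt = BLOOM_TEMPLATE.format(...) + "\n" + learner_context(attempts)
```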
Quality assurance: why hallucinations are an architecture problem
An LLM left alone will confidently invent facts, citations and named figures. In educational content, that is unacceptable: the whole product proposition is “trust this”. Five practices, layered, keep the failure rate low.
1. Retrieval grounding. Never let the model answer from its weights alone. Pull a passage from your curated source, attach it to the prompt, and require the answer to cite it.
2. Rubric-based evaluation harness. Run every release’s outputs through a fixed set of test prompts; score for factuality, alignment to learning objective, reading level, bias and tone. Fail builds when scores drift (a minimal harness sketch follows this list).
3. Bias and fairness checks. Generated examples should not lean on stereotypes (gendered jobs, cultural defaults). Lightweight bias linters catch the obvious cases; periodic human audits catch the subtle ones.
4. Plagiarism and originality screens. Especially important for assessments. Tools like Copyleaks have wide LLM coverage, but do not trust them blindly: false-positive rates around 5% mean every flag still needs human adjudication.
5. Human-in-the-loop sign-off. A teacher or SME approves before publish. The interface should make “approve” one click and “edit” cheap. If editing is expensive, the AI is not saving anyone time.
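The evaluation-harness sketch referenced in practice 2. Fixture format, scorers and thresholds are assumptions; real harnesses mix cheap deterministic checks like these with rubric-scoring LLM calls and human-labelled spot checks.

```python
# Sketch: a release-gating evaluation harness with fixed test prompts.
import json

THRESHOLDS = {"factuality": 0.95, "reading_level": 0.90, "bias": 0.99}

def load_cases(path: str = "eval_prompts.json") -> list[dict]:
    # Each case: {"prompt": ..., "expected_citations": ["[1]", ...], ...}
    with open(path) as f:
        return json.load(f)

def score_output(output: str, case: dict) -> dict[str, float]:
    # Deliberately cheap, deterministic scorers; swap in a readability metric,
    # a bias linter and rubric-scoring calls as they come online.
    cites_sources = all(c in output for c in case["expected_citations"])
    return {"factuality": 1.0 if cites_sources else 0.0,
            "reading_level": 1.0,
            "bias": 1.0}

def gate_release(generate) -> None:
    """`generate` is the Layer 2 call (prompt -> draft text). Raises when any
    metric falls below its floor, which is what fails the build."""
    cases = load_cases()
    totals = {metric: 0.0 for metric in THRESHOLDS}
    for case in cases:
        for metric, value in score_output(generate(case["prompt"]), case).items():
            totals[metric] += value / len(cases)
    for metric, floor in THRESHOLDS.items():
        if totals[metric] < floor:
            raise AssertionError(f"{metric} regressed: {totals[metric]:.2f} < {floor}")
```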
Reach for this stack when: shipping AI authoring to regulated buyers (school districts, governments, healthcare training). The five practices together are what passes a procurement audit; any one of them alone usually does not.
Need an AI authoring layer with a real review pipeline?
We build retrieval, evaluation harnesses and teacher review UIs designed for FERPA, COPPA, GDPR and the EU AI Act. Tell us your product; we will tell you the smallest production-grade slice.
Bloom’s Taxonomy as a prompt framework
Most teams skip this and end up with AI content that hovers at the “Remember” and “Understand” levels — the part of pedagogy that is least valuable. Treating Bloom levels as named prompt patterns lifts the cognitive ceiling of generated content significantly.
| Level | Prompt pattern | AI reliability |
|---|---|---|
| Remember | List, define, recall from a grounded source | Excellent |
| Understand | Explain to a [age] learner using [analogy] | Excellent |
| Apply | Build a scenario that requires [concept]; ask for a solution | Good |
| Analyse | Compare X and Y; cite differences from grounded sources | Good with retrieval |
| Evaluate | Judge [argument] against [criteria]; cite evidence | Fair — needs rubric |
| Create | Design [artefact] following [constraints] + [rubric] | Fair — needs SME review |
A practical lesson is to expose the Bloom level as a first-class control in your authoring UI. Teachers learn quickly to ask for “Apply-level questions on photosynthesis” rather than “more questions”.
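A sketch of what that control can look like under the hood: Bloom levels mapped to named prompt patterns. The template wording paraphrases the table above and is an assumption to be tuned per subject and grade band.

```python
# Sketch: Bloom levels exposed as named prompt patterns behind a UI control.
BLOOM_PATTERNS = {
    "remember":   "List and define the key terms for {topic}, using only the source passages.",
    "understand": "Explain {topic} to a {age}-year-old using the analogy of {analogy}.",
    "apply":      "Write a realistic scenario that requires {topic} to solve, then ask for the solution.",
    "analyse":    "Compare {topic} with {contrast}, citing at least three differences from the sources.",
    "evaluate":   "Judge the following argument about {topic} against these criteria: {criteria}. Cite evidence.",
    "create":     "Design a {artefact} about {topic} that satisfies these constraints: {constraints}.",
}

def bloom_task(level: str, **fields) -> str:
    """Build the task line that feeds the grounded generation prompt."""
    return BLOOM_PATTERNS[level.lower()].format(**fields)

# Example: the teacher picks "Apply" in the UI and the topic "photosynthesis".
task = bloom_task("apply", topic="photosynthesis")
```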
Reach for Bloom-level controls when: teachers complain that AI-generated quizzes feel shallow. Nine times out of ten the prompt was too generic; the level dial fixes it without retraining anything.
Accessibility: WCAG 2.2 AA without the manual labour
AI does the boring 80% of accessibility work that teams skip when they are racing to ship. The trick is treating accessibility as a generation task, not a remediation task.
Alt text on every image. A vision-language model produces a draft alt text (and a flag for “decorative”) at upload time. Teachers can edit, but the floor is no longer empty alt attributes.
Captions and transcripts on every video. Whisper-class ASR plus a forced-alignment step produces synced captions; transcripts are a free byproduct. Manual review remains for high-stakes content (assessment, regulated training).
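A minimal captioning sketch using the open-source Whisper package (pip install openai-whisper). The hand-rolled SRT formatting is illustrative; a production pipeline would add forced alignment and the human spot-check mentioned above.

```python
# Sketch: transcribe a video lesson and emit SRT captions with Whisper.
import whisper

def srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

def transcribe_to_srt(video_path: str, model_size: str = "base") -> str:
    model = whisper.load_model(model_size)
    result = model.transcribe(video_path)
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)   # the transcript is the same segments, joined as text
```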
Audio descriptions for video. An LLM drafts a script of visual events from the original transcript; a TTS voice reads it. Production teams used to spend hours on this manually.
Dyslexia-friendly rewrites and reading-level adjustment. A button on every paragraph: “simplify to grade 5”, “break long sentences”, “add definitions”. The cost of differentiation drops to seconds per artefact.
Multi-language at parity. An LLM-driven localisation pass can produce a credible draft of every artefact in the target language; a native-speaker pass closes the last 5%.
Compliance: FERPA, COPPA, GDPR and the EU AI Act
FERPA (US). Education records cannot be shared with vendors without an explicit data-processing agreement. Many AI tools are “school officials” under FERPA only if you sign one. Build the contract requirement into your buyer-onboarding flow.
COPPA (US, under-13). No commercial profiling, parental notice, deletion rights. The default for K-12 products is to assume COPPA applies and treat any deviation as legal review.
GDPR (EU). Lawful basis for processing learner data, data minimisation, the right to deletion, cross-border transfer rules. AI inference on learner data is processing — document it.
EU AI Act. AI used to assess learning outcomes, to determine educational placement or to monitor behaviour is classified as high-risk in Annex III. Expect to need a risk-management plan, human oversight, transparency documentation and a fundamental-rights impact assessment by the time enforcement bites.
Practical defaults we ship. Per-tenant data isolation, no learner data in foundation-model training pipelines, retrieval-only access to district data, audit log of every AI generation, role-based approval before publish, opt-in consent for under-18 user data. None of this is optional in 2026.
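As one concrete example, here is a hedged sketch of the kind of record an audit log of AI generations can hold; the field names are illustrative assumptions. The point is that prompt, grounding sources, model, reviewer and decision are all reconstructable later for a district or a regulator.

```python
# Sketch: a structured audit record attached to every AI generation.
import hashlib
from datetime import datetime, timezone

def audit_record(*, tenant_id: str, user_id: str, model: str,
                 prompt: str, source_ids: list[str], output: str,
                 reviewer_id: str | None = None,
                 decision: str = "pending") -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "requested_by": user_id,
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "grounding_source_ids": source_ids,   # which passages grounded the draft
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "reviewed_by": reviewer_id,
        "review_decision": decision,          # pending / approved / rejected
    }

# Append-only storage (an immutable table or a WORM bucket) is the usual home
# for these records; hashing keeps learner content out of the log itself.
```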
Mini case: layering AI authoring into BrainCert
When we engaged on BrainCert, the platform was already a mature LMS with virtual classrooms, courses and assessments. The brief was to add AI-assisted authoring without compromising the platform’s enterprise-grade compliance posture.
The shape of the work matched the architecture above. A retrieval layer wraps the customer’s existing course materials, so generated content is grounded in their curriculum, not a generic web crawl. A teacher review surface sits between the LLM and the published lesson, with one-click approve, edit and regenerate. Compliance was not bolted on at the end — per-tenant isolation, audit logs and explicit consent flows lived in the spec from day one.
The pattern repeats on most of our edtech engagements. The Web 2.0 LMS does not change; the Web 4.0 brain on top is what unlocks the productivity story for teachers and the differentiation story for the platform. We described the broader version of this approach in our AI integration playbook.
Market reality — where adoption actually is
The commonly cited figures put AI-in-education on a 30%+ CAGR through the next decade, growing from a low-single-digit-billions market in 2025 toward triple-digit billions by 2035. That is the sector tailwind. The more useful numbers are adoption rates by segment.
K-12 teachers. Roughly 60% used AI for lesson planning in the 2024–25 school year (RAND), a number that has since increased. The product implication is that an authoring tool that does not include AI feels archaic to its primary user.
K-12 students. Around 84% of high school students use AI for some part of their schoolwork. Whether this is “cheating” or a new literacy depends on whether your platform turns it into a teaching moment or pretends it is not happening.
Higher education. Faculty adoption lags student usage; ~49% of faculty use AI in their teaching practice; ~86% of students use it for research and brainstorming. The opportunity for a platform is on the teacher-side workflow.
Corporate L&D. The fastest-shifting category. Compliance training, onboarding, sales enablement and certification are all migrating to AI-drafted content with human SME review. The buyer is the chief learning officer; the metric is throughput.
Cost shape and unit economics
Three line items dominate the run-rate of an AI authoring product, and getting them right early is the difference between healthy gross margin and an OpenAI bill that eats your business.
1. Inference. Per-call LLM and TTS costs. Route easy work to small or open models; reserve frontier models for high-stakes generation. Cache aggressively (a minimal caching sketch follows this list). Use batch APIs for non-interactive jobs: major providers offer roughly half price for batch.
2. Storage and retrieval. Embedding indexes, vector databases, learner-progress logs. Start with pgvector to avoid a third-party bill; graduate to a dedicated vector DB only when you can measure the latency or scale problem.
3. Human-review labour. Often forgotten. Teacher-review minutes per piece of content are real costs whether the teacher is your customer (their time) or your reviewer (your payroll). Tracking review minutes per artefact is one of the most underrated KPIs in the category.
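The caching sketch referenced in item 1: a minimal wrapper keyed on a hash of the normalised request. Redis and the one-week TTL are assumptions; any shared key-value store works.

```python
# Sketch: cache generated content keyed on model + normalised prompt.
import hashlib
import json
import redis

cache = redis.Redis()                 # assumed local Redis for the sketch
TTL_SECONDS = 7 * 24 * 3600           # one week; tune to your content churn

def cache_key(model: str, prompt: str) -> str:
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return "gen:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_generate(model: str, prompt: str, generate) -> str:
    key = cache_key(model, prompt)
    hit = cache.get(key)
    if hit is not None:
        return hit.decode()           # cache hit: no inference cost at all
    output = generate(model, prompt)  # your Layer 2 call
    cache.setex(key, TTL_SECONDS, output)
    return output
```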
A small caveat on price guidance: we are deliberately not putting hard dollar figures here. Our Agent-Engineering practice means the build cost is shorter and lighter than traditional agencies, and we would rather quote precisely after seeing a brief than commit to round numbers in public.
A decision framework — five questions before you start
1. What is the buyer’s actual pain? “We want AI” is not a brief. “Our teachers spend 5 hours per week on quiz authoring” is. Quantify the workflow first; pick the AI feature second.
2. What is the data you can ground on? Curriculum standards, your own question banks, customer-uploaded materials. If the data is thin, the AI will hallucinate; build the data foundation before the generation features.
3. What does the human reviewer’s minute look like? If approving a draft takes longer than writing it from scratch, the AI is making things worse. Design the review UI before the generation prompt.
4. Which jurisdictions and audiences are in scope? Under-13 + EU is the strictest combination. Design for that and the rest comes for free.
5. What is your model strategy? Single-vendor (fast to start, vendor-lock risk), multi-vendor (fallback resilience, more ops), open-weight (cost and privacy, more engineering). Pick deliberately at sprint zero.
Five pitfalls we see edtech teams repeat
1. Skipping retrieval. Calling the LLM directly without grounding seems faster; in regulated content it ships hallucinations to learners and regulators. Retrieval is not optional.
2. Treating AI as a writer rather than an editor. The strongest authoring UX is “here is a draft, edit it” — not “type a prompt”. Wireframe the review flow first.
3. No evaluation harness. Models change; vendor APIs change; without an automated test suite for your prompts, you ship regressions and find out from teachers on Twitter.
4. Ignoring SCORM/xAPI/QTI. An AI-generated lesson that cannot be exported to a customer’s LMS does not count as authoring. Treat the export format as a first-class feature, not a footnote.
5. Compliance left to legal review at the end. Compliance is an architecture decision — per-tenant isolation, no training on customer data, audit logs, consent flows. Bolt-on compliance after the fact is the most expensive way to ship.
Already shipped an AI authoring feature and getting hallucination complaints?
We run rescue audits. Bring the prompts, the architecture and the failing cases. Within a week we will produce a fix plan that gets the failure rate under 2%.
KPIs that prove the AI layer is working
Quality KPIs. Hallucination rate under 2% on factual questions. Bias-flag rate under 1% on generated examples. Teacher edit-distance per draft under 20% (i.e., teachers accept the draft mostly as-is).
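The edit-distance KPI is cheap to instrument. A minimal sketch using difflib as a proxy; swap in a token-level edit distance if you need finer granularity, and compute it at the moment a teacher publishes the reviewed draft.

```python
# Sketch: "teacher edit-distance" — how much of the AI draft survives review.
from difflib import SequenceMatcher

def edit_distance_pct(draft: str, published: str) -> float:
    """0.0 = published as-is, 100.0 = completely rewritten."""
    similarity = SequenceMatcher(None, draft, published).ratio()
    return round((1.0 - similarity) * 100, 1)

# Target from the KPI above: keep the average under roughly 20%.
assert edit_distance_pct("Photosynthesis converts light into chemical energy.",
                         "Photosynthesis converts light into chemical energy.") == 0.0
```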
Productivity KPIs. Mean time to publish a finished lesson reduced by 50%+ vs. baseline. Lesson plans generated per teacher per week trending up. Reviewer minutes per artefact trending down.
Outcome KPIs. Learner engagement (sessions, time-on-task) on AI-generated content vs. legacy content. Test-score lift for cohorts on personalised paths vs. control cohorts. Drop-off rate per artefact — if AI-generated content has higher drop-off, your generation prompts are not working.
Buy, integrate or build — how to decide
Buy off-the-shelf authoring (MagicSchool, Eduaide, Diffit, Curipod, Quizizz AI). Right when you are an end-user buyer, not a platform vendor; you want value this quarter, not a moat.
Integrate model APIs and build the application layer. The default for nearly every edtech platform vendor in 2026. The differentiation is in your retrieval layer, your review UX, your compliance posture, your LMS integrations — not in the model itself.
Train or fine-tune your own model. Right when you have a defensible proprietary corpus and the model needs to be cheaper, smaller or specialised in ways APIs do not handle. The bar is high and the timeline is months, not weeks.
Reach for “integrate” when: you are a platform with a customer base. The leverage is in the application layer, not the foundation model. Custom training is the right call later, not first.
How to evaluate a software partner for this build
Ask for a real reference architecture diagram. Not a sales slide — the boxes, arrows and naming conventions they use in production. If they cannot draw it on a whiteboard, they have not done it.
Ask how they manage prompts. Versioned in source control, tested by an evaluation harness, scored before deploy. “We tweak the prompt in the UI” is a maintenance disaster waiting to happen.
Ask about compliance posture. They should have shipped FERPA-bound and GDPR-bound products. Ask which clients, even if anonymised. Our AI integration practice can show that pattern across edtech, healthcare and enterprise products.
Ask for the LMS export story. SCORM, xAPI, QTI. If they look blank, they have not shipped to schools at scale.
FAQ
Should we use a single LLM vendor or multiple?
For an MVP, single is faster. For production, plan for multi-vendor fallback — not for cost, but for reliability and to limit single-vendor lock-in. We typically design with one primary and one secondary frontier model, plus an open-weight option for cost-sensitive bulk work.
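A minimal sketch of what "one primary, one secondary" looks like in code; the provider wrappers and the blanket exception handling are assumptions to illustrate the shape, and in production you would catch provider-specific errors and add retries.

```python
# Sketch: ordered provider fallback behind one generation interface.
def generate_with_fallback(prompt: str, providers: list) -> str:
    """providers: ordered list of callables, each taking a prompt and returning
    text, raising on failure (timeout, rate limit, outage)."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:      # narrow to provider-specific errors in production
            last_error = exc
            continue                  # fall through to the next provider
    raise RuntimeError("All providers failed") from last_error

# Usage: generate_with_fallback(prompt, [call_primary, call_secondary]),
# where each callable wraps one vendor's SDK behind the same signature.
```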
How do we stop the AI from inventing facts in lessons?
Retrieval-augmented generation (RAG) over a curated source library, plus an evaluation harness that flags responses without grounded citations, plus a human review step before publish. Any one of those alone is insufficient; the three together get hallucination rates under 2% on most subject areas.
Is AI-generated content allowed in regulated assessments?
It depends on the regulator and the use. Under the EU AI Act, AI used to assess learning outcomes is high-risk: allowed, but with extra documentation and human oversight. In the US, FERPA and COPPA apply at the data level. The defensible posture is human-in-the-loop review of every AI-generated assessment item before it enters a graded context.
Will AI replace teachers or instructional designers?
No, but it changes their job. The teacher of 2027 spends less time drafting and more time reviewing, personalising and coaching. The instructional designer spends less time on first-draft content and more time on curriculum architecture, evaluation rubrics and pedagogy. Platforms that respect that shift will win procurement; ones that do not will be branded “cheat tools” and lose districts.
How do we keep AI authoring affordable at scale?
Three levers in order of impact: route easy tasks to smaller or open-weight models; cache aggressively (the same lesson topic gets generated thousands of times across your customers); use batch APIs for asynchronous jobs at roughly half the price of synchronous. Most cost surprises in production come from skipping caching, not from model price changes.
What happens when models are updated mid-product?
Quality changes — usually for the better, sometimes not. The defence is an evaluation harness: a fixed set of test prompts and expected outputs that you run on every model upgrade. If scores regress, hold back. Without this, you find out from your customers, which is the most expensive form of QA.
Do we need a custom model for our subject domain?
Almost never as a first step. A frontier model with retrieval over your domain corpus outperforms a fine-tuned small model in most cases. Custom training makes sense when you have a defensible proprietary corpus, predictable workloads, and a serving cost that genuinely justifies the investment.
How fast can Fora Soft ship a production-grade AI authoring layer?
For an MVP layered on an existing platform — retrieval, prompt library, teacher review UI, evaluation harness — we typically plan a 3–5 week discovery and 10–14 weeks of build. Adding multilingual, voice and analytics adds 4–8 weeks. Our Agent-Engineering practice compresses both phases by roughly 15–20% compared with traditional agency timelines.
What to Read Next
Edtech AI
Customisable AI lesson content creation
A deeper dive into the lesson-generation workflow specifically — templates, scaffolding, and per-class adaptation.
AI integration
How we improve software products with AI features
The general playbook for adding AI to a Web 2.0 product without rebuilding it.
Context
Web 1.0 to Web 4.0 for software founders
Where AI authoring fits in the broader stack of web eras and capabilities.
Planning
Wireframing in software development
The discovery discipline that decides whether your AI feature ships on plan.
AI + content
The essential guide to AI-powered SEO
A companion read on how AI changes the inbound side of an edtech product.
Ready to ship AI authoring that earns trust?
AI-assisted educational content is no longer the future of edtech — it is the present. The teams that win the next two years will not be the ones with the cleverest prompts. They will be the ones with the best retrieval, the cleanest review UX, the boring compliance plumbing and the honest analytics that prove the AI layer is improving outcomes, not just speeding up output.
If you are building or extending an edtech product and want to skip the months of trial-and-error, bring the brief and the constraints. We will return with a reference architecture, a build plan, and an honest cost shape that reflects Agent-Engineering speed.
Let’s build the AI authoring layer your learners deserve.
One 30-minute call, three answers: what to ship first, what the architecture looks like, and what an honest cost-and-timeline shape is — with Agent Engineering built in.

