Merging Multiple Content Sources into One API: Lessons Learned

Jason Guo · 2026-06-19

How I stopped data from living in silos, and what I learned along the way

A few weeks ago, I ran into a surprisingly common problem: I had content scattered across two completely different systems, and my website only knew how to talk to one of them.

I was writing articles in my admin dashboard (personal essays, notes, creative writing), and none of it was showing up on the blog page. Not a single post. Meanwhile, the blog was happily displaying content from a third-party CMS platform called InMind, blissfully unaware that a whole other world of content existed in my local database.

The fix sounds simple on paper: just show content from both places. But the moment you try to actually do it, you realize the two systems speak completely different languages. This is the story of how I untangled that mess.

---

The Problem: Two Systems, Zero Cooperation

My website runs on Cloudflare's infrastructure, a platform that lets you host a website and a small database right at the edge of the internet, close to your visitors. The backend is powered by Hono, a lightweight web framework that handles all incoming requests.

Content was coming from two places:

InMind CMS: a third-party content platform I was already using. When you ask it for an article, it hands back a fully formatted, ready-to-display chunk of HTML.

A local database (D1): Cloudflare's SQLite-based serverless database. This is where my admin panel stored articles and writings. The content was saved as Markdown, which is plain text with some formatting symbols.

The blog page fetched content from a single API endpoint, `GET /api/articles`. The problem? That endpoint only ever talked to InMind. It had never been wired up to the local database at all. So every article I wrote myself? Lost. Invisible. Gone from the reader's perspective.

---

Why Simply "Adding Both" Isn't Simple

You might think the fix is just: fetch from both, combine, done. But when you actually sit down to do it, a handful of thorny details show up uninvited.
The two systems don't agree on naming conventions. InMind sends data with underscores in field names, such as `published_at` and `title_zh`. My local database, accessed through Drizzle ORM, exposes camelCase properties: `publishedAt` and `titleZh`. Same information, different spellings. If you mix them carelessly, the frontend code quietly breaks in ways that are hard to debug.

The content itself is in different formats. InMind delivers pre-rendered HTML, the web's native language, ready to drop into a page. My local database stores Markdown, a lightweight writing format that has to be converted before display. You can't treat them the same way.

My local database has no "excerpt" field. InMind articles come with a tidy summary. Local articles don't; they just have a full body of text. To show a preview card on the blog listing, I had to generate that summary by stripping out the Markdown symbols and taking the first 200 characters of plain text.

My "writings" table has a different sense of time. I keep a separate table for creative writing, such as poems and essays. For those, the date that matters is when I wrote the piece, not when I uploaded it to the database. A poem written in 2019 but uploaded last week should show its 2019 date. That meant reading writings from a different date field than the one used for regular articles.

Slug collisions are theoretically possible. Every article has a slug, the identifier that appears in the URL, as in `/articles/my-first-post`. Slugs are unique within each source, but nothing stops the same slug from existing in both InMind and my local database. If that happened, the merged list would contain duplicates. I needed a clear rule for who wins.

---

The Solution: A Universal Translator

The core insight was this: instead of changing how content is stored, I'd change how it's presented. I wrote what I call a "normalizer" for each data source, a small function that acts like a universal translator.
No matter what format the raw data arrives in, the normalizer converts it into one standard, consistent shape before it ever leaves the API. The frontend code never has to worry about where content came from.

Here's what that looks like for a regular article from the local database. The output deliberately uses InMind's snake_case field names, so that articles from every source share one shape:

```typescript
// extractPlainText and parseTags are small helpers not shown here.
function normalizeLocalArticle(row, detailed = false) {
  return {
    id: `local-${row.id}`,
    slug: row.slug,
    title: row.title,
    title_zh: row.titleZh ?? null,
    excerpt: extractPlainText(row.content).slice(0, 200) || null,
    feature_image: row.coverUrl ?? null,
    published_at: row.createdAt ?? null,
    tags: parseTags(row.tags),
    source: "local",
    html: null, // signals: this is Markdown, not HTML
    // Full bodies are only included for the detail endpoint.
    ...(detailed
      ? {
          content: row.content ?? "",
          content_zh: row.contentZh ?? "",
        }
      : {}),
  }
}
```

Notice the `html: null` field. It's a deliberate signal to the frontend: this content is Markdown, so render it accordingly. InMind articles, by contrast, have an `html` field that holds actual content. The frontend checks that field and picks the right rendering method.

---

Fetching Everything at Once (Without Slowing Down)

Once the normalizers were in place, the new listing endpoint became straightforward. The key was fetching all three sources (InMind, local articles, and local writings) simultaneously, not one after another.

Think of it like ordering from three different restaurants at the same time, instead of waiting for the first delivery before calling the second. The total wait time becomes whatever the slowest restaurant takes, not the sum of all three.

```typescript
// All three requests are in flight at once; the await resolves when all three have resolved.
const [inmindResult, localRows, writingRows] = await Promise.all([
  fetchFromInMind("/content/posts"),
  fetchLocalArticles(db),
  fetchLocalWritings(db),
])
```

After everything arrives, the results get merged. The rule for handling duplicate slugs is explicit and documented: local articles win over writings, and writings win over InMind. My own content always takes priority.
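One caveat about the snippet above: `Promise.all` rejects as soon as any one of its promises rejects. Without a guard, a single failing source would fail the entire listing. Below is a minimal sketch of guarding each source individually, so that a failure degrades to an empty list instead. It assumes each source is wrapped in a zero-argument loader function. `fetchAllSources` is a hypothetical name, not an API from Hono, D1, or InMind.

```typescript
// Sketch (hypothetical helper): run every content-source loader concurrently
// and turn any failure into an empty list, so one broken source cannot take
// down the whole listing.
async function fetchAllSources<T>(
  loaders: ReadonlyArray<() => Promise<T[]>>,
): Promise<T[][]> {
  return Promise.all(
    loaders.map((load) =>
      // Calling load inside .then() also converts a synchronous throw
      // into a rejection, which the .catch() below then handles.
      Promise.resolve()
        .then(load)
        .catch((err: unknown): T[] => {
          console.warn("content source failed; continuing without it", err);
          return [];
        }),
    ),
  );
}
```

In the listing endpoint, each loader would wrap one of the three fetches above. Ideally each loader would return already-normalized articles, so that all three lists share one element type. The built-in `Promise.allSettled` is an alternative. It never rejects and reports a `{ status: "fulfilled", value }` or `{ status: "rejected", reason }` object for each promise, which is useful if you want to record which source failed.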
The merge itself is a single pass over the three normalized lists in priority order. It keeps only the first article it sees for each slug:

```typescript
// Priority: local articles > writings > InMind. Earlier entries win on slug collisions.
// localArticles, localWritings and inmindArticles are the normalized fetch results.
const seen = new Set()
const merged = []
for (const article of [...localArticles, ...localWritings, ...inmindArticles]) {
  // Articles without a slug are skipped.
  if (article.slug && !seen.has(article.slug)) {
    seen.add(article.slug)
    merged.push(article)
  }
}

// Newest first. A null published_at becomes new Date(null), i.e. 1970, so undated items sink to the end.
merged.sort((a, b) =>
  new Date(b.published_at).getTime() - new Date(a.published_at).getTime()
)
```

Finally, everything gets sorted by date, newest first, and returned as a single clean list.

---

The Detail Page: Local First, InMind as Fallback

The individual article page works differently. Rather than fetching all sources simultaneously and merging them, it checks sources in order of priority and stops as soon as it finds a match.

Local database first. If the article is found there, it is returned immediately, and there's no need to call InMind at all.

```typescript
// Check both local tables at the same time
const [articleRows, writingRows] = await Promise.all([
  db.articles.findBySlug(slug),
  db.writings.findBySlug(slug),
])

// Same priority as the list: a published local article beats a published writing
if (articleRows[0]?.published) {
  return normalizeLocalArticle(articleRows[0], true)
}
if (writingRows[0]?.published) {
  return normalizeWriting(writingRows[0], true)
}

// Only reach out to InMind if nothing published was found locally
return await fetchFromInMind(`/content/posts/${slug}`)
```

This matters for performance. InMind is an external network call: it takes time and counts against my rate limit. There's no reason to make that call if the content already exists locally.

---

Bilingual Content, Handled Gracefully

My site serves readers in both English and Chinese. The local database has separate fields for each language: `title` and `content` for English, and `titleZh` and `contentZh` for Chinese.

The normalizer maps these to a consistent shape (`title_zh`, `content_zh`). The frontend checks which language is active and picks the right field. If a Chinese translation doesn't exist yet, it falls back to English.

```typescript
const displayTitle = lang === "zh" ? article.title_zh || article.title : article.title
```

It's a small thing, but it means writers can add translations incrementally without anything breaking.
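The same fallback applies to article bodies, and there is one subtlety that makes a shared helper worthwhile. `normalizeLocalArticle` fills a missing `contentZh` with an empty string, not `null`. The check therefore has to use `||`, which treats `""` as missing, rather than `??`, which only catches `null` and `undefined`. Here is a minimal sketch. `pickLocalized`, `Lang`, and `BilingualArticle` are hypothetical names, and the fields follow the normalized shape.

```typescript
type Lang = "en" | "zh";

// Subset of the normalized article shape that carries bilingual fields.
interface BilingualArticle {
  title: string;
  title_zh: string | null;
  content?: string;
  content_zh?: string;
}

// Hypothetical helper: prefer the Chinese variant when the reader chose
// Chinese and a non-empty translation exists; otherwise fall back to English.
function pickLocalized(
  article: BilingualArticle,
  field: "title" | "content",
  lang: Lang,
): string {
  const english = (field === "title" ? article.title : article.content) ?? "";
  if (lang !== "zh") return english;
  const chinese = field === "title" ? article.title_zh : article.content_zh;
  // `||` (not `??`) so an empty-string translation also falls back.
  return chinese || english;
}
```

For example, `pickLocalized(article, "content", lang)` returns the Markdown body in the active language, ready to hand to the renderer.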
---

Rendering: Two Formats, One Component

The last piece was making the article detail page smart enough to handle both HTML and Markdown. InMind content arrives as HTML, so it can go straight onto the page. Local content arrives as Markdown, so it needs a renderer. The signal I embedded earlier (`html: null` vs `html: "<p>..."`) makes this decision clean:

```tsx
{article.html ? (
  // InMind content: already HTML, render directly.
  // React does not sanitize this string, so it must come from a trusted source.
  <article dangerouslySetInnerHTML={{ __html: article.html }} />
) : (
  // Local content: Markdown, converted by ReactMarkdown.
  // markdownContent holds the Markdown body for the active language.
  <article>
    <ReactMarkdown>{markdownContent}</ReactMarkdown>
  </article>
)}
```

No special cases in the routing, no content-type headers to parse. The data carries its own instructions.

---

Five Things I'd Tell My Past Self

1. Normalize at the boundary, not the storage layer. Don't try to force every content source into the same database format. Let each source stay what it naturally is (InMind's HTML, my Markdown) and do the translation work at the API layer. Consumers never need to care.

2. Build in fault tolerance from the start. Adding `.catch(() => [])` to every source fetch, including each database query and the InMind call, means a single failing source can't take down the entire page. A broken InMind connection still shows local articles. A corrupt local table still shows InMind content. Resilience is a design choice, not an afterthought.

3. Make priority rules explicit and visible. Implicit ordering is a future bug waiting to happen. Write a comment that says "local wins over writings, writings win over InMind" and put it right next to the merge code. Your future self will thank you.

4. Concurrent requests are almost always worth it. Sequential calls cost the sum of their latencies. If each of three calls takes 200–300 ms, running them one after another adds 600–900 ms to a page load. Three concurrent calls take roughly as long as the slowest one. For a content-heavy page, this difference is real and user-visible.

5. The detail endpoint has different rules than the list endpoint. The list page needs everything at once, so parallel makes sense. The detail page needs just one article, so check the cheap source first (the local database) and the expensive one second (the external API). Different problems, different shapes.

---

Wrapping Up

What started as a bug (why aren't my articles showing up?) turned into a small lesson in API design. The real challenge wasn't the code. It was resisting the temptation to restructure how content was stored. Instead, I had to find the right place to absorb the complexity: the normalizer layer, right at the boundary between the data sources and the rest of the application.

The site now pulls from three sources (InMind, local articles, and local writings), speaks two languages, and renders two content formats. The frontend code has no idea any of this is happening. That invisibility is, I think, the best possible outcome.

---

Tags: Cloudflare, Hono, D1, TypeScript, Full-Stack, API Design