Micro-App Code Challenge: Build a Restaurant Recommender Using Only Public APIs and a Small LLM
Timed micro-app challenge: build a lightweight restaurant recommender using public APIs, caching, rate-limits, and a small LLM.
Stop spinning your wheels deciding where to eat — build a tiny recommender in a timed challenge
Decision fatigue is real. Teams, friend groups, and diners waste minutes (hours cumulatively) arguing over restaurants. If you're a developer who wants to build something small, useful, and demonstrative of modern skills — API integration, caching, rate-limit handling, and prompt engineering for a small LLM — this timed micro-app challenge is for you.
The elevator pitch
In this code challenge you'll build a lightweight micro-app that recommends restaurants using only public place-data providers (Foursquare, Yelp, and OpenStreetMap-derived services) and a small LLM for re-ranking and personalization. The constraints mimic real-world limits: strict API rate limits, token budgets for LLM calls, and a tiny deployment footprint suitable for a hackathon, pair-programming session, or interview exercise.
Why this challenge matters in 2026
Micro-apps and vibe-coding went mainstream after 2023; by late 2025 and into 2026 the shift accelerated as:
- On-device and small LLMs (quantized 4-bit models like popular 7B variants) became robust enough for short-context tasks.
- Public place-data providers (Foursquare, Yelp, and OpenStreetMap-derived services) stabilized free tiers that support lightweight apps.
- Edge computing and serverless functions made low-latency, cheap micro-app deployments trivial.
That means in 2026 you can build something meaningful in a few hours and ship it to a personal audience — just like Rebecca Yu's Where2Eat — while demonstrating key engineering skills recruiters and teams look for.
Challenge brief (90–180 minute timed edition)
Goal: Build a micro-app that returns 5 personalized restaurant recommendations for a user-specified location and preferences, using only public APIs for place data and a small LLM for light personalization and ranking.
Requirements
- Use at least one public place-data API (Foursquare, Yelp, Google Places — note API keys required) or OpenStreetMap alternatives.
- Query the place API for candidates (up to 20 results) and then call a small LLM to re-rank and summarize the top 5.
- Implement caching with TTL and stale-while-revalidate semantics.
- Respect API rate limits: implement exponential backoff and a simple token-bucket or leaky-bucket rate limiter.
- Keep LLM prompts small and deterministic; require the LLM to return structured JSON (no free-text parsing). An example shape follows this list.
- Implement minimal UI: CLI, server-rendered page, or single-page app — keep the scope micro. See notes on micro-event landing pages for ideas about lean frontends and conversion flows.
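For reference, one possible shape for that structured output (the ids and text below are placeholders; the re-ranker example later in this post uses the same id, score, and reason fields):
// example LLM output the app should be able to parse (placeholder values)
[
  { "id": "fsq_abc123", "score": 92, "reason": "Casual Korean BBQ, $$, walkable" },
  { "id": "fsq_def456", "score": 87, "reason": "Highly rated bibimbap, quick lunch" }
]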
Scoring rubric
- Functionality (40%): Returns consistent recommendations and handles errors.
- API & caching (20%): Proper caching, TTL, and rate limit handling. See serverless vs dedicated patterns for caching and infra trade-offs: Serverless vs Dedicated Crawlers.
- Prompt engineering (20%): Small, effective prompt; structured output; token efficiency.
- Code quality & tests (10%): Readability, comments, and at least one unit test or integration test using mocks.
- UX & docs (10%): Clear README with run instructions and a short demo gif or screenshots. If you need inspiration for quick demos and creative assets, check this roundup: Free creative assets for venues.
Architecture: micro, cheap, and testable
Keep it minimal. A simple architecture that works well for the challenge:
- Frontend: minimal SPA or server-rendered HTML to accept location + preferences
- Backend (serverless or small container):
  - /search -> fetch candidates from place API (cached)
  - /rank -> call small LLM to re-rank and summarize (cache final responses)
- Cache: Redis (preferred), or in-memory LRU for timeboxed demo
- Rate limiter: token-bucket per external API
Why split search and rank?
Separating raw candidate retrieval (cheap, cached, many results) from LLM-based ranking (costly, token-limited) lets you maximize value while minimizing LLM calls. Use search cache aggressively and run rank only when necessary (e.g., new user preferences or cache miss). This pattern mirrors how creators and local sellers avoid unnecessary remote work — see the From Pop-Up to Platform playbook for similar separation of concerns in event flows.
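To make the split concrete, here is a minimal Express wiring sketch. It assumes the searchPlaces, cached, and rankPlaces helpers shown in the implementation section below; endpoint names, cache keys, and TTLs are illustrative:
// server/index.js (sketch; names assume the modules shown below)
const express = require('express');
const { searchPlaces } = require('./api/foursquare');
const { cached } = require('./cache');
const { rankPlaces } = require('./llm');

const app = express();
app.use(express.json());

// Cheap path: candidates are cached aggressively (10 min TTL)
app.get('/search', async (req, res) => {
  const { ll, query = 'restaurant' } = req.query;
  try {
    const places = await cached(`search:${ll}:${query}`, 600, () => searchPlaces({ ll, query }));
    res.json({ places });
  } catch (err) {
    res.status(502).json({ error: err.message });
  }
});

// Costly path: the LLM runs only on a cache miss for this (location, prefs) pair
app.post('/rank', async (req, res) => {
  const { ll, query = 'restaurant', userPref = {} } = req.body;
  const key = `rank:${ll}:${query}:${JSON.stringify(userPref)}`;
  try {
    const top5 = await cached(key, 1800, async () => {
      const places = await cached(`search:${ll}:${query}`, 600, () => searchPlaces({ ll, query }));
      return rankPlaces({ userPref, places });
    });
    res.json({ top5 });
  } catch (err) {
    res.status(502).json({ error: err.message });
  }
});

app.listen(process.env.PORT || 3000);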
Practical implementation — Node.js + Express + Redis (starter)
The examples below are compact and practical for a timed session. Replace API keys and provider placeholders as needed.
1) Fetch candidates from Foursquare (example)
// server/api/foursquare.js
const fetch = require('node-fetch');
const FSQ_KEY = process.env.FSQ_KEY; // set in env

async function searchPlaces({ ll, query, limit = 20 }) {
  const url = new URL('https://api.foursquare.com/v3/places/search');
  url.searchParams.set('ll', ll); // "lat,lng"
  url.searchParams.set('query', query);
  url.searchParams.set('limit', limit);
  const res = await fetch(url.toString(), {
    headers: { 'Accept': 'application/json', 'Authorization': FSQ_KEY }
  });
  if (!res.ok) throw new Error(`Foursquare error ${res.status}`);
  const data = await res.json();
  return data.results || [];
}

module.exports = { searchPlaces };
2) Cache wrapper with Redis and stale-while-revalidate
// server/cache.js
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

async function cached(key, ttlSeconds, fetcher) {
  const raw = await redis.get(key);
  if (raw) {
    // Serve the cached value; if it is within 30s of expiry, refresh in the
    // background. (Strictly this is refresh-ahead rather than full
    // stale-while-revalidate, since Redis evicts the key at TTL.)
    const meta = JSON.parse(raw);
    if (Date.now() - meta.fetchedAt > (ttlSeconds - 30) * 1000) {
      fetcher()
        .then(value => redis.set(key, JSON.stringify({ value, fetchedAt: Date.now() }), 'EX', ttlSeconds))
        .catch(() => {});
    }
    return meta.value;
  }
  const value = await fetcher();
  await redis.set(key, JSON.stringify({ value, fetchedAt: Date.now() }), 'EX', ttlSeconds);
  return value;
}

module.exports = { cached };
3) Rate limiter (token bucket)
// server/rateLimiter.js
class TokenBucket {
  constructor(capacity, refillIntervalMs, refillAmount) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillIntervalMs = refillIntervalMs;
    setInterval(() => {
      this.tokens = Math.min(this.capacity, this.tokens + refillAmount);
    }, refillIntervalMs);
  }
  tryRemove(count = 1) {
    if (this.tokens >= count) { this.tokens -= count; return true; }
    return false;
  }
}

// Example: 60 requests per minute -> capacity 60, refill every 1000ms with 1
const fsqBucket = new TokenBucket(60, 1000, 1);

module.exports = { fsqBucket };
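One way to use the bucket: check it before every outbound Foursquare call and fail fast when tokens run out, so callers can fall back to cached data. A small sketch (module paths assume the layout above):
// server/api/guardedSearch.js (sketch: gate Foursquare calls behind the bucket)
const { fsqBucket } = require('../rateLimiter');
const { searchPlaces } = require('./foursquare');

async function guardedSearch(params) {
  if (!fsqBucket.tryRemove()) {
    // Out of tokens: surface a retryable error so callers can serve cached data
    const err = new Error('Local rate limit reached for Foursquare');
    err.retryable = true;
    throw err;
  }
  return searchPlaces(params);
}

module.exports = { guardedSearch };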
4) LLM re-ranker: compact prompt and structured JSON output
Small LLMs are fast but token-limited. The trick: send only essential context, use a strict system message, and require a concise JSON output. Always enforce a token cap at the API call level.
// server/llm.js
const fetch = require('node-fetch');
const LLM_ENDPOINT = process.env.LLM_ENDPOINT; // e.g., your small-model API
const LLM_KEY = process.env.LLM_KEY;

async function rankPlaces({ userPref, places }) {
  // Build a compact prompt: strict system message, minimal candidate metadata
  const system = `You are a concise restaurant recommender. Given user preferences and candidate places, return a JSON array with top 5 ranked items. Use only the fields: id, score (0-100), reason (max 80 chars). Do not add extra text.`;
  const input = {
    userPref: userPref, // e.g., { cuisine: 'korean', budget: '$$', vibe: 'casual' }
    places: places.map(p => ({
      id: p.fsq_id || p.id,
      name: (p.name || '').slice(0, 40),
      categories: (p.categories || []).map(c => c.name).slice(0, 3)
    }))
  };
  const body = {
    model: 'small-llm-7b',
    max_tokens: 256,
    temperature: 0.1,
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: JSON.stringify(input) }
    ]
  };
  const res = await fetch(LLM_ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${LLM_KEY}` },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error('LLM error: ' + await res.text());
  const j = await res.json();
  // Provider-dependent: extract the assistant's content
  const content = j.choices?.[0]?.message?.content || j.output || '';
  try {
    return JSON.parse(content);
  } catch (e) {
    throw new Error('LLM returned non-JSON');
  }
}

module.exports = { rankPlaces };
Prompt engineering under constraints (practical tips)
Prompting an LLM in 2026 often means working with smaller models or quantized weights. This requires a shift from verbose few-shot prompts to compact, structured instructions.
- Use strict system messages. Tell the LLM exactly what JSON fields to return and ban free text.
- Compress inputs. Send only required fields (IDs, short names, categories). Pre-process long names and descriptions.
- Use few-shot examples sparingly. They can help but cost tokens; favor a single, tiny example if necessary.
- Enforce output formats. Parseable JSON avoids brittle string parsing in your code (a validation sketch appears just below).
- Set temperature low. For near-deterministic rankings, 0.0–0.2 works well.
Small LLMs + strict prompts = deterministic, low-cost personalization suitable for micro-apps.
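To keep the "enforce output formats" rule honest, parse the LLM response defensively and reject anything that does not match the expected shape. A minimal sketch, assuming the id / score / reason fields from the re-ranker above:
// server/validateRanking.js (sketch: defensive parse of the LLM's JSON output)
function parseRanking(content, maxItems = 5) {
  let parsed;
  try {
    parsed = JSON.parse(content);
  } catch (e) {
    return null; // caller falls back to heuristic ranking
  }
  if (!Array.isArray(parsed)) return null;
  const valid = parsed.filter(item =>
    item && typeof item.id === 'string' &&
    typeof item.score === 'number' && item.score >= 0 && item.score <= 100 &&
    typeof item.reason === 'string' && item.reason.length <= 80
  );
  return valid.length ? valid.slice(0, maxItems) : null;
}

module.exports = { parseRanking };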
Handling rate limits and failures
When you depend on public APIs, you must treat limits and outages as first-class citizens:
- Token-bucket per API: Protect your quota and smooth bursts. For backend patterns built for low-latency, edge-heavy workloads, see Designing Resilient Edge Backends.
- Exponential backoff with jitter: For transient 429/503 responses, retry with randomized delays. These techniques are common in serverless vs dedicated setups: Serverless vs Dedicated Crawlers.
- Circuit breaker: If an API consistently fails, temporarily disable calls and serve cached data or a degraded experience.
- Graceful fallback: If the LLM fails, fall back to a heuristic ranking (distance + rating) from the place API; a minimal sketch follows the retry example below.
// simple retry pattern (fetch here is node-fetch, as in the earlier snippets)
async function fetchWithRetry(url, opts, retries = 3) {
  for (let i = 0; i <= retries; i++) {
    const res = await fetch(url, opts);
    if (res.ok) return res;
    // Only retry transient statuses; anything else is a hard failure
    if (![429, 502, 503, 504].includes(res.status)) throw new Error(await res.text());
    if (i === retries) break; // no point sleeping after the last attempt
    const jitter = Math.random() * 300;
    await new Promise(r => setTimeout(r, Math.pow(2, i) * 100 + jitter));
  }
  throw new Error('Failed after retries');
}
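And if the LLM call itself fails (or its output fails validation), a cheap heuristic over fields the place API already returns keeps the app usable. A minimal sketch, assuming candidates may carry a rating and a distance in meters (availability depends on the provider and the fields you request):
// server/heuristicRank.js (sketch: fallback ranking when the LLM is unavailable)
// Treats `rating` (0-10) and `distance` (meters) as optional fields.
function heuristicRank(places, limit = 5) {
  return places
    .map(p => {
      const rating = typeof p.rating === 'number' ? p.rating : 5;        // neutral default
      const distance = typeof p.distance === 'number' ? p.distance : 2000;
      // Simple blend: higher rating is better, shorter distance is better
      const score = rating * 10 - Math.min(distance, 5000) / 100;
      return {
        id: p.fsq_id || p.id,
        score: Math.round(Math.max(0, Math.min(100, score))),
        reason: 'Heuristic: rating + distance'
      };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

module.exports = { heuristicRank };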
Testing, reproducibility, and demos
For a short challenge you still want deterministic behavior for reviewers:
- Mock place API responses in tests (nock for Node, responses for Python); a test sketch follows this list.
- Mock LLM outputs with canned JSON to verify parsing and UI rendering.
- Record a 30–60s demo (gif or mp4) that shows the flow: search → cache hit/miss → rank → UI. If you need compact camera or demo gear, this PocketCam Pro review is a practical reference.
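Here is a minimal test sketch for the place-API side using nock and Node's built-in test runner; the LLM client can be covered the same way by intercepting LLM_ENDPOINT with a canned JSON body (file paths and fixture values are illustrative):
// test/search.test.js (sketch: mock the Foursquare endpoint and verify parsing)
const test = require('node:test');
const assert = require('node:assert');
const nock = require('nock');
const { searchPlaces } = require('../server/api/foursquare');

test('searchPlaces returns candidates from a mocked response', async () => {
  nock('https://api.foursquare.com')
    .get('/v3/places/search')
    .query(true) // accept any query params
    .reply(200, { results: [{ fsq_id: 'fsq_abc123', name: 'Test Kitchen', categories: [] }] });

  const places = await searchPlaces({ ll: '40.7,-74.0', query: 'korean' });
  assert.strictEqual(places.length, 1);
  assert.strictEqual(places[0].name, 'Test Kitchen');
});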
Deployment: keep it tiny
Deploy to a free or low-cost tier to keep the demo lightweight:
- Vercel/Netlify edge functions for frontend + serverless rank endpoint
- Cloud Run / Fly.io for a containerized service that needs Redis
- Local demo: run with Docker Compose (app + redis) and include a script to generate an env file; a sketch of that script follows this list
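The env-file script can be as simple as writing placeholders for the variables used throughout this post (FSQ_KEY, LLM_ENDPOINT, LLM_KEY, REDIS_URL). A minimal sketch; the placeholder values are illustrative:
// scripts/gen-env.js (sketch: write a starter .env with placeholder values)
const fs = require('fs');

const template = [
  'FSQ_KEY=replace-with-your-foursquare-key',
  'LLM_ENDPOINT=http://localhost:8080/v1/chat/completions',
  'LLM_KEY=replace-with-your-llm-key',
  'REDIS_URL=redis://localhost:6379',
].join('\n') + '\n';

if (!fs.existsSync('.env')) {
  fs.writeFileSync('.env', template);
  console.log('.env created with placeholder values');
} else {
  console.log('.env already exists; leaving it alone');
}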
Advanced tweaks and 2026 trends to try (if you have extra time)
- Client-side personalization: Store user preference vectors locally and use an on-device small LLM to re-rank without remote calls. This mirrors how neighborhood and short-form food creators experiment with local personalization: Neighborhood pop-ups & food creator economy.
- Vector search for menu vibes: Embed short descriptions and run a lightweight vector similarity in Redis Vector or Weaviate to surface vibe matches; a similarity sketch follows this list. Small food brands increasingly rely on listings and packaging signals — see how local listings matter.
- Privacy-first mode: Keep user preferences on-device and only send anonymized queries to APIs.
- Quantized model inference at the edge: Experiment with 4-bit quantized Llama-family models running in an edge container for super low-latency ranking. Edge backend patterns and low-latency designs are covered in this playbook: Designing Resilient Edge Backends.
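For the vector-search idea, the core operation is just similarity over short embeddings. A minimal in-process sketch you could later swap for Redis Vector or Weaviate; the embed() function is a placeholder for whatever embedding endpoint or on-device model you choose:
// server/vibes.js (sketch: in-process cosine similarity over embeddings)
// embed() is a placeholder: plug in your embedding endpoint or on-device model.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function vibeMatches(queryText, places, embed, limit = 5) {
  const queryVec = await embed(queryText);
  const scored = await Promise.all(places.map(async p => ({
    place: p,
    score: cosine(queryVec, await embed(`${p.name} ${(p.categories || []).map(c => c.name).join(' ')}`)),
  })));
  return scored.sort((a, b) => b.score - a.score).slice(0, limit).map(s => s.place);
}

module.exports = { vibeMatches };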
Example challenge constraints (for judges)
Make the evaluation fair by standardizing constraints:
- Time limit: 90–180 minutes.
- Allowed libraries: any HTTP client, Redis client, and an HTTP server framework. No paid data sources beyond free-tier API keys.
- LLM budget: 500–1000 tokens per submission (enforced via the max_tokens param or a mocked endpoint; a mock-server sketch follows this list).
- Must run locally with docker compose or via a one-line deploy.
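For the token-budget constraint, judges (and contestants without a model endpoint) can point LLM_ENDPOINT at a tiny mock server that returns canned, well-formed JSON. A minimal sketch; the route path is an assumption and should match whatever your llm.js calls:
// tools/mock-llm.js (sketch: local stand-in for the small-LLM endpoint)
// Returns a canned, well-formed ranking so the pipeline can be exercised offline.
const express = require('express');

const app = express();
app.use(express.json());

app.post('/v1/chat/completions', (req, res) => {
  const canned = [
    { id: 'fsq_abc123', score: 91, reason: 'Matches cuisine and budget' },
    { id: 'fsq_def456', score: 84, reason: 'Close by, casual vibe' },
  ];
  // Shape loosely follows the chat-completions style the llm.js example expects
  res.json({ choices: [{ message: { content: JSON.stringify(canned) } }] });
});

app.listen(process.env.MOCK_LLM_PORT || 8080, () => console.log('mock LLM listening'));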
What to look for in solutions — practical rubric for reviewers
- Does the app avoid unnecessary LLM calls by caching candidate searches?
- Are prompts compact and deterministic, and does the app handle malformed LLM responses safely?
- Is rate-limiting implemented per external API, and are retries bounded with jitter?
- Is the app reproducible locally and documented clearly in the README? Look for tidy demo assets and short-form demo guidance like in this roundup: Free creative assets for venues.
Real-world case study: a glance at Rebecca Yu's Where2Eat (inspiration, not a clone)
Rebecca Yu's week-long vibe-coding approach highlights an important lesson: build what you need, iterate fast, and ship. Her Where2Eat app prioritized speed over completeness — minimal dataset, immediate feedback loops, and iterating UI with friends. For this challenge, replicate that ethos: prioritize a working pipeline (search → cache → rank → display) over exhaustive features. If you're demoing live or streaming a quick walkthrough, consider low-latency live stacks and edge-first coverage patterns: Live Streaming Stack.
Common pitfalls and how to avoid them
- Overloading the LLM: Don’t send full descriptions for 20 places — compress. Use IDs and short metadata.
- No caching: Causes obvious rate-limit issues. Cache aggressively and use stale-while-revalidate.
- Unparseable LLM output: Enforce JSON; validate and fallback to heuristics if parsing fails.
- Hard-coded rate limits: Use configuration per API so you can adapt during the challenge.
Deliverables checklist (what to submit)
- Source repo with README and run instructions
- Docker compose or one-line deploy script
- Short demo gif (30–60s)
- Tests showing mocked API and mocked LLM behavior
- Note (1 paragraph) describing design trade-offs
Wrapping up: why this micro-app is a great portfolio piece
This project signals practical skills employers want in 2026: API integration, resilient engineering under rate limits, caching patterns that scale, and prompt engineering for smaller models. It also demonstrates product sense: you shipped a small, useful app under constraints.
Actionable takeaways
- Split candidate search from LLM ranking to minimize cost and latency.
- Cache aggressively and use stale-while-revalidate for freshness without extra LLM calls.
- Make LLM prompts compact and deterministic, and require JSON output to ensure robustness.
- Implement token-bucket rate limiting and exponential backoff with jitter to survive public API hiccups. For patterns and trade-offs, see Serverless vs Dedicated Crawlers and Designing Resilient Edge Backends.
- Keep the app deployable in a single command — judges love reproducibility. If you need inspiration on moving from pop-up demos to repeatable revenue flows, check From Pop-Up to Platform.
Call to action
Ready to try the timed challenge? Fork a starter repo, set up one public API key (Foursquare or Yelp) and an LLM endpoint (use a free small model or a mocked local server). Build your micro-app in 90–180 minutes, record a quick demo, and share it with the thecoding.club community. Post your repo link and demo — we'll review, give feedback, and highlight creative solutions. Need ideas for demo gear or quick camera kits? See this practical review: PocketCam Pro & community kit.
Related Reading
- Designing Resilient Edge Backends for Live Sellers (2026)
- Serverless vs Dedicated Crawlers: Cost & Performance
- Micro-Event Landing Pages for Hosts: Advanced CRO & Speed
- The Rise of Micro‑Feasts: Intimate Pop‑Ups and the New Economics of Food
- How Small Food Brands Use Local Listings and Packaging to Win in 2026
- Pop-Ups & Partnerships: How to Pitch Your Abaya Capsule to Department Stores
- Music Release Mini-Series Template: From Teasers to Tour Announcements
- Designing High‑ROI Microstays: A 2026 Field Guide for Small‑Group Tour Operators
- From Test Pot to Global Shelf: Sustainability Lessons for Small-Batch Beverage Makers
- MagSafe & Qi: Design an Announcement Card for Wireless Charging Accessories