Before you start

What you need

An AI tool that can read files on disk and sustain long sessions (the examples in this guide assume Claude Code), your file archive, and several hours of AI processing time for a large corpus.

Organise first (a little)

You don't need a perfectly organised archive, but the pipeline works better if your files have some folder structure. If everything is in one flat directory, spend 30 minutes creating rough groupings — by topic, time period, or source. The pipeline processes folders as batches, so structure helps.
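
If your archive is flat, a quick way to see what rough groupings already exist is to count files per top-level folder. This is a Python sketch under my own naming; its output also doubles as the folder tree with counts that the Stage 1 triage prompt asks you to paste.

```python
from collections import Counter
from pathlib import Path

def folder_counts(root: str) -> Counter:
    """Count files under each top-level folder of the archive."""
    counts: Counter = Counter()
    root_path = Path(root)
    for path in root_path.rglob("*"):
        if path.is_file():
            parts = path.relative_to(root_path).parts
            # Files sitting directly in the root get their own bucket.
            counts[parts[0] if len(parts) > 1 else "(root)"] += 1
    return counts

if __name__ == "__main__":
    for folder, n in folder_counts(".").most_common():
        print(f"{folder}: {n} files")
```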

Time and cost

Processing a large corpus through an LLM takes time and tokens. For a ~30k file vault, expect the full pipeline to take several hours of AI processing time. You can reduce this significantly by being aggressive at the triage stage — the less you process, the faster and cheaper it is. Start with your highest-value content and expand from there.

Stage 0

Folder Annotation

This is the most important preparatory step, and the one most people skip. Without it, every downstream agent is guessing about what it's reading.

The problem

A folder of markdown files could be your original writing, imported articles, course notes, client deliverables, or reference material. A folder called notes might contain your personal reflections or someone else's lecture notes you saved. Without annotation, AI agents will misattribute content — treating saved articles as your opinions, or reference material as your original work.

The technique

Walk through your folder structure and, for each major folder (or subfolder where the purpose isn't obvious from the name), create a short annotation file explaining:

- what the folder contains
- whether it's your original writing or imported/reference material
- the rough time period it covers
- its relevance to building a personal profile (high/medium/low/skip)

You can do this manually, or have AI sample files and propose annotations for you to confirm. The key is that you — the human — verify the annotations, because only you know the provenance of your files.

Sample prompt

I'm annotating my personal file archive so that AI agents can correctly
interpret the content in later processing stages.

For this folder, sample 5-10 files and tell me:
1. What this folder appears to contain
2. Whether it seems like my original writing or imported/reference material
3. The rough time period (if detectable from dates in filenames or content)
4. Relevance to building a personal profile (high/medium/low/skip)

Folder path: [path]

After reviewing the AI's assessment, correct anything wrong and save the annotation. If you use Claude Code, a CLAUDE.md file in the folder works perfectly — Claude reads these automatically.
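
If you'd rather not create every stub by hand, a small script can drop a template into each top-level folder for you to fill in and verify. A Python sketch, assuming the CLAUDE.md convention; the template fields mirror the sample prompt above:

```python
from pathlib import Path

TEMPLATE = """\
# Folder annotation (human-verified)
Contents: TODO
Original writing or imported/reference material: TODO
Rough time period: TODO
Profile relevance (high/medium/low/skip): TODO
"""

def write_annotation_stubs(root: str, filename: str = "CLAUDE.md") -> list:
    """Create an annotation template in every immediate subfolder of
    root that doesn't already have one; return the paths created."""
    created = []
    for folder in sorted(Path(root).iterdir()):
        stub = folder / filename
        if folder.is_dir() and not stub.exists():
            stub.write_text(TEMPLATE)
            created.append(stub)
    return created
```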

Output: An annotation file in each significant folder, explaining what's in it and whose content it is.

Stage 1

Inventory & Triage

Not all files contribute equally to a personal context document. A journal entry about a life decision is worth a hundred utility bills. This stage builds a prioritised manifest of what to process.

The technique

Crawl your file structure and classify every folder (or file group) into tiers:

- Tier 1 (process thoroughly): rich in personal context
- Tier 2 (extract facts): contains useful biographical or professional facts
- Tier 3 (skim): minor relevance, might contain a useful detail
- Tier 4 (skip): no personal-context value

What to skip

Be aggressive here. Anything with no personal-context value belongs in Tier 4; the less you process, the faster and cheaper the pipeline runs. The appendix at the end of this guide can help you decide.

Sample prompt

I'm building a personal context document — a comprehensive profile of
who I am, distilled from my personal files. I need to triage my file
archive by relevance.

Here's my folder structure with file counts:
[paste folder tree with counts]

For each folder, classify as:
- TIER 1 (process thoroughly): Rich in personal context
- TIER 2 (extract facts): Contains useful biographical/professional facts
- TIER 3 (skim): Minor relevance, might contain a useful detail
- TIER 4 (skip): No personal context value

Also flag any folders where you'd need to see sample files before
classifying.

Output: A prioritised manifest — a list of every folder with its tier classification and processing order.
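
One workable shape for the manifest is a small CSV, ordered so Tier 1 folders come first and Tier 4 never enters the queue. A Python sketch; the Entry type and column layout are my own choices:

```python
import csv
from dataclasses import dataclass

@dataclass
class Entry:
    folder: str
    tier: int      # 1 = process thoroughly ... 4 = skip
    note: str = ""

def processing_order(entries):
    """Drop Tier 4 folders and return the rest, highest priority first."""
    return sorted((e for e in entries if e.tier < 4), key=lambda e: e.tier)

def save_manifest(entries, path: str) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["folder", "tier", "note"])
        for e in processing_order(entries):
            writer.writerow([e.folder, e.tier, e.note])
```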

Stage 2

Folder-Level Extraction

This is where the bulk of the work happens. Each folder (or subfolder) is processed as an independent batch, and an AI agent extracts every personal-context fact it can find.

The technique

Process each folder from your manifest as a separate AI session. For large folders, break them into sub-batches that fit comfortably in a context window. The key is that each batch is independent — which means you can parallelise aggressively.
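
Sub-batching can be mechanical: accumulate files until their combined size reaches a rough budget, using bytes on disk as a cheap proxy for tokens. A Python sketch; the 200,000-byte default is an assumption you should tune to your model's context window:

```python
from pathlib import Path

def sub_batches(folder: str, budget: int = 200_000):
    """Group a folder's files so each batch stays under a rough
    size budget (bytes, as a cheap proxy for context tokens)."""
    batches, batch, size = [], [], 0
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        n = path.stat().st_size
        if batch and size + n > budget:
            # Current batch is full; start a new one.
            batches.append(batch)
            batch, size = [], 0
        batch.append(path)
        size += n
    if batch:
        batches.append(batch)
    return batches
```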

For Tier 1 content (thorough processing)

Have the agent read every file and extract everything that reveals personal context: facts, opinions, decisions, relationships, skills, timeline events, emotional states, patterns of thought.

For Tier 2 content (fact extraction)

Pull out concrete facts: dates, roles, company names, qualifications, locations, tools used, people mentioned. Don't summarise the prose — just extract the data points.

For Tier 3 content (skim)

Look at file names, folder structure, and a sample of content. What topics recur? What did you choose to save? This reveals interests without needing to read every file.

Parallelisation

If your AI tool supports sub-agents (Claude Code does), launch one agent per folder simultaneously. This is the single biggest time-saver in the pipeline. Instead of processing 15 top-level folders sequentially (hours), process them in parallel (the time of the slowest one).
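
The fan-out itself is a few lines in Python. process_folders runs one agent per folder concurrently; claude_agent is a hypothetical runner that assumes Claude Code's headless mode (claude -p), so substitute whatever CLI or API your tool actually exposes:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def process_folders(folders, run_agent, max_workers: int = 8) -> dict:
    """Run one extraction agent per folder concurrently.
    run_agent(folder) must block until that folder's summary is
    produced; returns {folder: summary}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs correctly.
        return dict(zip(folders, pool.map(run_agent, folders)))

def claude_agent(folder: str) -> str:
    # Assumes Claude Code's headless mode; the prompt is abbreviated here.
    prompt = f"Read every file in {folder} and extract personal context."
    result = subprocess.run(["claude", "-p", prompt],
                            capture_output=True, text=True)
    return result.stdout
```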

Sample prompt

I'm building a personal context document about myself. You are processing
one folder from my personal file archive.

Folder: [folder name]
Context: [paste the folder's annotation from Stage 0]

Read every file in this folder and extract everything that reveals
personal context about me:

- Biographical facts (birth, education, locations, family)
- Professional history (roles, companies, what I actually did)
- Projects I've worked on and their outcomes
- Skills, tools, and technologies I've used (with rough proficiency)
- Beliefs, values, opinions, principles
- Interests and passions (especially recurring ones)
- Key relationships and collaborations
- Timeline events with dates (moves, career changes, milestones)
- How I communicate and think (voice, style, decision-making patterns)
- Emotional states, struggles, turning points

Rules:
- Be specific. Include names, dates, places, and concrete details.
- Distinguish between MY writing/views and reference material I saved.
- If a file is clearly someone else's work, note what it reveals about
  my interests (I chose to save it) but don't attribute the views to me.
- Preserve chronological information — note when things happened.
- Include direct quotes where they capture my voice or a strong opinion.
- Flag any contradictions or evolution you notice within this folder.

Output: One summary document per folder (or sub-batch), containing all extracted personal-context facts. Expect 50–100 of these across a large archive.

Stage 3

Thematic Synthesis

The folder summaries from Stage 2 are organised by source, but your me.md needs to be organised by theme. This stage reorganises, deduplicates, and merges.

The problem

The same fact will appear in multiple folder summaries. A project might be mentioned in your journals, your project folder, your CV, and your annual review. A career move might appear in admin records (change of address), personal notes (the decision process), and project files (the work itself). Simply concatenating summaries would produce a repetitive, disorganised mess.

The technique

Take all folder summaries and synthesise them into thematic sections. The suggested themes match the target document structure: timeline, career and professional history, projects, skills and tools, beliefs and values, interests, relationships, voice and thinking style, and current context.

Key challenges

Deduplication (the same fact surfaces in many summaries), contradictions between sources that must be resolved or flagged, and keeping chronology straight when a topic spans years.

Sample prompt

I'm building a personal context document. Below are summaries extracted
from different folders of my personal file archive. There is significant
overlap and repetition across these summaries.

Synthesise all of these into a coherent draft for the following section:
[SECTION NAME, e.g. "Timeline" or "Career & Professional History"]

Rules:
- Merge duplicate information. Don't repeat the same fact twice.
- When the same topic appears at different time periods, show the
  evolution chronologically.
- Prefer specific details (dates, names, numbers) over vague statements.
- Preserve direct quotes that capture my voice or strong opinions.
- Flag any contradictions you can't resolve so I can clarify.
- Aim for [X] words for this section.
- Prioritise information from journals and personal reflections over
  admin records for subjective content (beliefs, interests, emotions).

Source summaries:
[paste all folder summaries]

Run this once per thematic section. You can parallelise these too — each section synthesis is independent.

Output: 8–12 thematic section drafts, each covering one aspect of your personal context with information merged from all sources.

Stage 4

Final Composition

Assemble the thematic sections into a single, coherent document and compress it to fit your context budget.

The technique

Feed all section drafts to AI and have it compose the final document. This isn't just concatenation — it's editing for flow, removing cross-section redundancy, and enforcing the word budget.

Sample prompt

Below are thematic section drafts for my Personal Context Document
(me.md). Assemble these into a single, coherent document.

Guidelines:
- Target length: 4,000–6,000 words. This is a hard constraint.
- Every sentence must earn its place. If removing it wouldn't change
  how an AI assists me, cut it.
- Use clear markdown structure with ## headers for major sections
  and ### for subsections.
- Lead each section with the most important information.
- The Timeline section should read as a chronological narrative,
  not a list of bullet points.
- Don't editorialize or add interpretation. Stick to facts and
  direct quotes from my own writing.
- End with a "Current Context" section covering my present situation,
  active projects, and priorities — this is the most immediately
  useful section for AI interactions.
- Preserve specific details (dates, names, places, numbers) —
  these are what make the document useful rather than generic.
- Where my views or situation have evolved over time, show the
  trajectory, not just the current state.

Section drafts:
[paste all thematic section drafts]

Output: Your me.md — a single markdown document, 4,000–6,000 words, containing a comprehensive personal context profile distilled from your entire digital life.

After the pipeline

Review and correct

Read your me.md carefully. With multiple summarisation layers, facts can drift. Check that:

- dates, names, and places are accurate
- views attributed to you are actually yours, not drawn from reference material you saved
- any contradictions the synthesis flagged are resolved or explained

Edit directly. The pipeline gives you a strong first draft; your knowledge of your own life makes it accurate.

Where to put it

Put it wherever your AI tools will actually read it: referenced from a project-level instruction file (such as a CLAUDE.md for Claude Code), or pasted into the custom instructions of the assistants you use.

Keeping it current

Your me.md will decay. The "Current Context" section goes stale within weeks, so set yourself a refresh schedule.

You don't need to re-run the full pipeline for routine updates — just edit the relevant sections. Only re-run the pipeline if you've accumulated a large amount of new source material (e.g., a year of journals).

Writing principles for your prompts

These principles should guide the AI at every stage of the pipeline. Include them in your prompts or in a project-level instruction file.

Specificity over generality

Vague

"Experienced with several programming languages."

Useful

"Primary stack: Go and PostgreSQL (7 years). Frontend: Svelte (3 years, previously React). Comfortable with Python for scripting."

Specific details let the AI calibrate. Vague statements tell it nothing.

The why, not just the what

Flat

"Moved to Berlin in 2019."

Revealing

"Moved to Berlin in 2019 — wanted to be closer to the startup ecosystem after years of remote work."

Motivation reveals values and decision-making patterns.

Evolution, not just current state

Static

"I believe in test-driven development."

Shows growth

"Converted to TDD around 2020 after a production incident that tests-first would have caught. Now non-negotiable for anything that handles money."

Evolution shows how you think and learn, not just what you currently believe.

Honest proficiency levels

Don't let the AI inflate your expertise. If you're intermediate at something, the document should say so. An AI that thinks you're an expert will skip explanations you need.

Tensions and contradictions

Real people are contradictory. You might value minimalism but hoard side projects. You might preach work-life balance but work 60-hour weeks when excited. Instruct the AI to preserve these tensions — they give a more accurate model of how you actually behave.

Write for an AI audience

The document will be consumed by a system that uses it to calibrate responses. That means density matters more than polish: concrete, specific statements an AI can act on beat narrative written to entertain a human reader.

Appendix: source material by value

Use this to guide your triage decisions in Stage 1.

High-value (Tier 1)

Journals, personal reflections, notes on decisions, and any other original writing about your life.

Medium-value (Tier 2)

CVs, annual reviews, project records, and client deliverables, all dense in concrete biographical and professional facts.

Low-value (Tier 3)

Saved articles, course and lecture notes, and other reference material. The content isn't yours, but what you chose to save reveals your interests.

Skip (Tier 4)

Utility bills and similar routine paperwork with no personal-context value.