me.md
A step-by-step guide to the distillation pipeline. This is the process I used to compress ~30,000 files spanning 20 years of my digital life into a single 4,000-word personal context document using Claude.
me.md will be. This works best with hundreds or thousands of files, but even a few dozen will produce useful results.
You don't need a perfectly organised archive, but the pipeline works better if your files have some folder structure. If everything is in one flat directory, spend 30 minutes creating rough groupings — by topic, time period, or source. The pipeline processes folders as batches, so structure helps.
Processing a large corpus through an LLM takes time and tokens. For a ~30k file vault, expect the full pipeline to take several hours of AI processing time. You can reduce this significantly by being aggressive at the triage stage — the less you process, the faster and cheaper it is. Start with your highest-value content and expand from there.
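Before committing, a quick size estimate per folder helps you decide where to be aggressive. The Python sketch below makes two assumptions. It uses the common rule of thumb of about four characters per token for English prose, which is not a real tokenizer. It also only looks at `.md` and `.txt` files. Adjust both for your archive.

```python
from pathlib import Path

# Assumption: ~4 characters per token, a rule of thumb for English prose.
# This is not a real tokenizer; actual counts vary by model and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def estimate_folder_tokens(root: str, suffixes: tuple[str, ...] = (".md", ".txt")) -> int:
    """Sum estimated tokens over every matching text file under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total
```

Run `estimate_folder_tokens` on each top-level folder to see where the bulk of the cost sits. Those are the folders where aggressive triage pays off most.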
This is the most important preparatory step, and the one most people skip. Without it, every downstream agent is guessing about what it's reading.
A folder of markdown files could be your original writing, imported articles, course notes, client deliverables, or reference material. A folder called "notes" might contain your personal reflections or someone else's lecture notes you saved. Without annotation, AI agents will misattribute content — treating saved articles as your opinions, or reference material as your original work.
Walk through your folder structure and for each major folder (or subfolder where the purpose isn't obvious from the name), create a short annotation file explaining:
You can do this manually, or have AI sample files and propose annotations for you to confirm. The key is that you — the human — verify the annotations, because only you know the provenance of your files.
I'm annotating my personal file archive so that AI agents can correctly interpret the content in later processing stages. For this folder, sample 5-10 files and tell me:
1. What this folder appears to contain
2. Whether it seems like my original writing or imported/reference material
3. The rough time period (if detectable from dates in filenames or content)
4. Relevance to building a personal profile (high/medium/low/skip)
Folder path: [path]
After reviewing the AI's assessment, correct anything wrong and save the annotation. If you use Claude Code, a CLAUDE.md file in the folder works perfectly — Claude reads these automatically.
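If you store annotations as CLAUDE.md files, a small script can show which folders still need one. This is a sketch that assumes one `CLAUDE.md` per folder. It treats only the top two levels as "major" folders; the `max_depth` default is arbitrary and hypothetical.

```python
from pathlib import Path


def unannotated_folders(root: str, annotation_name: str = "CLAUDE.md",
                        max_depth: int = 2) -> list[str]:
    """Return folders (relative to root, up to max_depth levels deep)
    that do not yet contain an annotation file."""
    root_path = Path(root)
    missing = []
    for folder in sorted(p for p in root_path.rglob("*") if p.is_dir()):
        rel = folder.relative_to(root_path)
        if len(rel.parts) > max_depth:
            continue  # only report major folders, not every nested subfolder
        if not (folder / annotation_name).is_file():
            missing.append(rel.as_posix())
    return missing
```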
Not all files contribute equally to a personal context document. A journal entry about a life decision is worth a hundred utility bills. This stage builds a prioritised manifest of what to process.
Crawl your file structure and classify every folder (or file group) into tiers:
Be aggressive here. Common things to skip entirely:
.DS_Store, node_modules
I'm building a personal context document — a comprehensive profile of who I am, distilled from my personal files. I need to triage my file archive by relevance.
Here's my folder structure with file counts:
[paste folder tree with counts]
For each folder, classify as:
- TIER 1 (process thoroughly): Rich in personal context
- TIER 2 (extract facts): Contains useful biographical/professional facts
- TIER 3 (skim): Minor relevance, might contain a useful detail
- TIER 4 (skip): No personal context value
Also flag any folders where you'd need to see sample files before classifying.
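The triage prompt asks for a folder tree with file counts. One way to produce it is sketched below in Python. It assumes two levels of depth are enough, so files deeper than that are rolled up into their level-two ancestor. `SKIP_NAMES` holds only the two examples named in the skip list.

```python
import os
from collections import Counter

# From the skip list above; extend with anything else you never want counted.
SKIP_NAMES = {".DS_Store", "node_modules"}


def folder_counts(root: str, max_depth: int = 2) -> Counter:
    """Count files per folder; files deeper than max_depth are rolled up
    into their ancestor at max_depth."""
    counts: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_NAMES]
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        key = "/".join(parts[:max_depth]) or "."
        counts[key] += sum(1 for f in filenames if f not in SKIP_NAMES)
    return counts


def format_tree(counts: Counter) -> str:
    """Render counts as an indented tree to paste into the triage prompt."""
    lines = []
    # Sort by path components (root first) so children follow their parent.
    for key in sorted(counts, key=lambda k: (k != ".", k.split("/"))):
        depth = 0 if key == "." else key.count("/") + 1
        name = key.rsplit("/", 1)[-1]
        lines.append(f"{'  ' * depth}{name}/ ({counts[key]} files)")
    return "\n".join(lines)
```

`print(format_tree(folder_counts("/path/to/archive")))` gives text you can paste straight into the prompt. The path is a placeholder.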
This is where the bulk of the work happens. Each folder (or subfolder) is processed as an independent batch, and an AI agent extracts every personal-context fact it can find.
Process each folder from your manifest as a separate AI session. For large folders, break them into sub-batches that fit comfortably in a context window. The key is that each batch is independent — which means you can parallelise aggressively.
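Sub-batching can be as simple as greedy packing. This is a minimal sketch that assumes two things: you already have a per-file token estimate (for instance from a characters-divided-by-four heuristic), and you have chosen a budget that leaves headroom for the prompt and the model's output. `Doc` and `make_batches` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    tokens: int  # estimated token count for this file


def make_batches(docs: list[Doc], budget: int) -> list[list[Doc]]:
    """Greedily pack files, in their given order, into batches of at most
    `budget` tokens. A single file larger than the budget gets a batch of
    its own and would need splitting separately."""
    batches: list[list[Doc]] = []
    current: list[Doc] = []
    used = 0
    for doc in docs:
        # Close the current batch if this file would push it over budget.
        if current and used + doc.tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += doc.tokens
    if current:
        batches.append(current)
    return batches
```

Packing files in their existing order (for example, sorted by date) keeps chronologically adjacent material in the same batch. That makes it easier for the agent to notice evolution within a folder.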
Have the agent read every file and extract everything that reveals personal context: facts, opinions, decisions, relationships, skills, timeline events, emotional states, patterns of thought.
Pull out concrete facts: dates, roles, company names, qualifications, locations, tools used, people mentioned. Don't summarise the prose — just extract the data points.
Look at file names, folder structure, and a sample of content. What topics recur? What did you choose to save? This reveals interests without needing to read every file.
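For skim-level folders, the file names alone can surface recurring topics. The rough sketch below counts words in file names without opening any file. The stopword list is a hypothetical starting point that you would tune to your own naming habits.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical stopwords: generic words that say nothing about interests.
STOPWORDS = {"the", "and", "notes", "copy", "final", "draft"}


def filename_topics(paths: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count recurring words in file names (not contents) to surface interests."""
    counts: Counter = Counter()
    for p in paths:
        stem = Path(p).stem.lower()
        # Split on anything that is not a letter: hyphens, underscores, digits.
        for word in re.findall(r"[a-z]+", stem):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)
```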
If your AI tool supports sub-agents (Claude Code does), launch one agent per folder simultaneously. This is the single biggest time-saver in the pipeline. Instead of processing 15 top-level folders sequentially (hours), process them in parallel (the time of the slowest one).
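If you script the pipeline against a model API instead of using Claude Code sub-agents, the same fan-out pattern looks roughly like the sketch below. `summarise_folder` is a hypothetical placeholder for whatever sends a folder's batches to the model. Threads suit this job because most of the time is spent waiting on network calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_in_parallel(
    folders: list[str],
    summarise_folder: Callable[[str], str],
    max_workers: int = 8,
) -> dict[str, str]:
    """Run one independent summarisation job per folder, concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with folders.
        summaries = pool.map(summarise_folder, folders)
        return dict(zip(folders, summaries))
```

With at least as many workers as folders, wall-clock time approaches that of the slowest folder. In practice your API's rate limits will probably cap `max_workers` well before your folder count does.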
I'm building a personal context document about myself. You are processing one folder from my personal file archive.
Folder: [folder name]
Context: [paste the folder's annotation from Stage 0]
Read every file in this folder and extract everything that reveals personal context about me:
- Biographical facts (birth, education, locations, family)
- Professional history (roles, companies, what I actually did)
- Projects I've worked on and their outcomes
- Skills, tools, and technologies I've used (with rough proficiency)
- Beliefs, values, opinions, principles
- Interests and passions (especially recurring ones)
- Key relationships and collaborations
- Timeline events with dates (moves, career changes, milestones)
- How I communicate and think (voice, style, decision-making patterns)
- Emotional states, struggles, turning points
Rules:
- Be specific. Include names, dates, places, and concrete details.
- Distinguish between MY writing/views and reference material I saved.
- If a file is clearly someone else's work, note what it reveals about my interests (I chose to save it) but don't attribute the views to me.
- Preserve chronological information — note when things happened.
- Include direct quotes where they capture my voice or a strong opinion.
- Flag any contradictions or evolution you notice within this folder.
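When you run this prompt for many folders, filling the placeholders by hand gets tedious. The small sketch below assumes the Stage 0 annotation lives in a `CLAUDE.md` inside each folder. It also assumes the template text contains the two placeholders exactly as written in the prompt.

```python
from pathlib import Path

FOLDER_PLACEHOLDER = "[folder name]"
CONTEXT_PLACEHOLDER = "[paste the folder's annotation from Stage 0]"


def build_extraction_prompt(template: str, folder: str,
                            annotation_name: str = "CLAUDE.md") -> str:
    """Fill the folder-name and annotation placeholders in the extraction prompt."""
    annotation_file = Path(folder) / annotation_name
    if annotation_file.is_file():
        annotation = annotation_file.read_text(encoding="utf-8").strip()
    else:
        # Flag missing annotations instead of silently sending no context.
        annotation = "(no annotation yet: treat authorship as uncertain)"
    return (template
            .replace(FOLDER_PLACEHOLDER, Path(folder).name)
            .replace(CONTEXT_PLACEHOLDER, annotation))
```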
The folder summaries from Stage 2 are organised by source, but your me.md needs to be organised by theme. This stage reorganises, deduplicates, and merges.
The same fact will appear in multiple folder summaries. A project might be mentioned in your journals, your project folder, your CV, and your annual review. A career move might appear in admin records (change of address), personal notes (the decision process), and project files (the work itself). Simply concatenating summaries would produce a repetitive, disorganised mess.
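Exact and near-verbatim repeats can be collapsed mechanically before synthesis, which shrinks the input the model has to merge. This sketch assumes each folder summary has already been split into one fact per string. It only catches facts that match after lowercasing and stripping punctuation, so paraphrased duplicates still need the AI synthesis step.

```python
import re


def normalise(fact: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    fact = re.sub(r"[^\w\s]", "", fact.lower())
    return " ".join(fact.split())


def merge_facts(summaries: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Merge facts that are identical after normalisation, keeping every
    source folder each fact appeared in (first wording wins)."""
    merged: dict[str, tuple[str, list[str]]] = {}
    for source, facts in summaries.items():
        for fact in facts:
            key = normalise(fact)
            if key in merged:
                merged[key][1].append(source)
            else:
                merged[key] = (fact, [source])
    return list(merged.values())
```

The sketch keeps the list of sources for each merged fact on purpose. A fact that appears in journals, the CV, and admin records is well corroborated, and the mix of sources hints at whether it is subjective or administrative.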
Take all folder summaries and synthesise them into thematic sections. The suggested themes match the target document structure:
I'm building a personal context document. Below are summaries extracted from different folders of my personal file archive. There is significant overlap and repetition across these summaries.
Synthesise all of these into a coherent draft for the following section: [SECTION NAME, e.g. "Timeline" or "Career & Professional History"]
Rules:
- Merge duplicate information. Don't repeat the same fact twice.
- When the same topic appears at different time periods, show the evolution chronologically.
- Prefer specific details (dates, names, numbers) over vague statements.
- Preserve direct quotes that capture my voice or strong opinions.
- Flag any contradictions you can't resolve so I can clarify.
- Aim for [X] words for this section.
- Prioritise information from journals and personal reflections over admin records for subjective content (beliefs, interests, emotions).
Source summaries:
[paste all folder summaries]
Run this once per thematic section. You can parallelise these too — each section synthesis is independent.
Assemble the thematic sections into a single, coherent document and compress it to fit your context budget.
Feed all section drafts to AI and have it compose the final document. This isn't just concatenation — it's editing for flow, removing cross-section redundancy, and enforcing the word budget.
Below are thematic section drafts for my Personal Context Document (me.md). Assemble these into a single, coherent document.
Guidelines:
- Target length: 4,000–6,000 words. This is a hard constraint.
- Every sentence must earn its place. If removing it wouldn't change how an AI assists me, cut it.
- Use clear markdown structure with ## headers for major sections and ### for subsections.
- Lead each section with the most important information.
- The Timeline section should read as a chronological narrative, not a list of bullet points.
- Don't editorialize or add interpretation. Stick to facts and direct quotes from my own writing.
- End with a "Current Context" section covering my present situation, active projects, and priorities — this is the most immediately useful section for AI interactions.
- Preserve specific details (dates, names, places, numbers) — these are what make the document useful rather than generic.
- Where my views or situation have evolved over time, show the trajectory, not just the current state.
Section drafts:
[paste all thematic section drafts]
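Models tend to be imprecise about their own word counts, so it is worth checking the budget yourself. The sketch below counts words in the assembled markdown and breaks the total down by `## ` section, which shows you where to cut. The counting rule is an assumption and will differ slightly from other tools: any whitespace-separated token containing a letter or digit counts as a word.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens containing a letter or digit, so
    markdown markers such as '##' or '-' are not counted as words."""
    return sum(1 for tok in text.split() if any(c.isalnum() for c in tok))


def section_word_counts(markdown: str) -> dict[str, int]:
    """Words per '## ' section; '###' subsections roll into their parent.
    Lines starting with '## ' open a new section and are not themselves counted."""
    counts: dict[str, int] = {}
    current = "(preamble)"
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            counts.setdefault(current, 0)
        else:
            counts[current] = counts.get(current, 0) + word_count(line)
    return counts


def within_budget(markdown: str, low: int = 4000, high: int = 6000) -> tuple[int, bool]:
    """Return the total word count and whether it falls inside [low, high]."""
    total = word_count(markdown)
    return total, low <= total <= high
```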
me.md — a single markdown document, 3,000–8,000 words, containing a comprehensive personal context profile distilled from your entire digital life.
Read your me.md carefully. With multiple summarisation layers, facts can drift. Check that:
Edit directly. The pipeline gives you a strong first draft; your knowledge of your own life makes it accurate.
CLAUDE.md file.
Your me.md will decay. The "Current Context" section goes stale within weeks. A refresh schedule:
You don't need to re-run the full pipeline for routine updates — just edit the relevant sections. Only re-run the pipeline if you've accumulated a large amount of new source material (e.g., a year of journals).
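To decide when enough new material has piled up, you can compare file modification times against me.md. The sketch below assumes modification times in your archive are trustworthy; some sync and copy tools reset them. It also counts only `.md` and `.txt` files as new source material. What counts as "a large amount" is left to you.

```python
import time
from pathlib import Path


def new_material_since(me_md: str, archive_root: str,
                       suffixes: tuple[str, ...] = (".md", ".txt")) -> list[str]:
    """List archive files modified after me.md was last written."""
    cutoff = Path(me_md).stat().st_mtime
    root = Path(archive_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes and p.stat().st_mtime > cutoff
    )


def days_since_update(me_md: str) -> float:
    """How many days ago me.md was last modified."""
    return (time.time() - Path(me_md).stat().st_mtime) / 86400
```

One way to use this: hand-edit Current Context whenever `days_since_update` creeps past a few weeks. Consider a full re-run only once `new_material_since` returns something like a year's worth of journal entries.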
These principles should guide the AI at every stage of the pipeline. Include them in your prompts or in a project-level instruction file.
"Experienced with several programming languages."
"Primary stack: Go and PostgreSQL (7 years). Frontend: Svelte (3 years, previously React). Comfortable with Python for scripting."
Specific details let the AI calibrate. Vague statements tell it nothing.
"Moved to Berlin in 2019."
"Moved to Berlin in 2019 — wanted to be closer to the startup ecosystem after years of remote work."
Motivation reveals values and decision-making patterns.
"I believe in test-driven development."
"Converted to TDD around 2020 after a production incident that tests-first would have caught. Now non-negotiable for anything that handles money."
Evolution shows how you think and learn, not just what you currently believe.
Don't let the AI inflate your expertise. If you're intermediate at something, the document should say so. An AI that thinks you're an expert will skip explanations you need.
Real people are contradictory. You might value minimalism but hoard side projects. You might preach work-life balance but work 60-hour weeks when excited. Instruct the AI to preserve these tensions — they give a more accurate model of how you actually behave.
The document will be consumed by a system that uses it to calibrate responses. That means:
Use this to guide your triage decisions in Stage 1.
.DS_Store, node_modules