Building a tiny GDPR troublemaker

I promised a post about Mailcow. It will happen. I have notes, screenshots, and containers that have behaved just enough to be dangerous. Then I had an idea that would not let go. A small program that exercises GDPR for real. Not as a policy page. As a thing you run on a Tuesday night with coffee and a dog on the floor.

This post is about why GDPR is good, why it is underused, and how I plan to test it at scale with a stubborn little Python script.

Why GDPR is good

GDPR is not a pop-up about cookies. It is a power shift. It puts me in the driver's seat and forces companies to explain themselves. What they store. Why they store it. How long it lives. Who they share it with. Where it goes. When it must be deleted. What happens when something leaks. That is not fluff. That is accountability you can act on.

People do not talk about this enough. We argue about banners and forget the part where you can ask for a copy of your data, get it electronically, correct it, erase it, pause processing, move it elsewhere, say no to profiling and ads, and ask for a human when a machine is making decisions about you. There is also a real authority that can tell companies to fix things and, when needed, fine them. In Sweden that is Integritetsskyddsmyndigheten (Swedish Authority for Privacy Protection).

Rights that sit in a drawer are not rights. So I am taking them out.

Small but important nuance: when I ask electronically, they should answer electronically where possible, in a commonly used format. The first copy is free. A controller may charge a reasonable fee only for additional copies, or refuse/charge if a request is manifestly unfounded or excessive. That’s the line I won’t cross. I want to test the law, not abuse it.

What GDPR actually gives me

Short list. All of these are real and usable.

  • Access. I can ask for a copy of my data. I also get the context. Purpose, categories, recipients, transfers, retention, source, and whether automated decisions are involved. First copy is free. If I ask electronically, I should get an electronic answer in a common format.
  • Rectification. Wrong becomes right. If something is incorrect, they have to fix it and tell relevant recipients.
  • Erasure. Data must go when it is no longer needed, when I withdraw consent, when the processing is unlawful, or when I have a valid objection. There are legal exceptions like accounting.
  • Restriction. Freeze the use of my data while accuracy or legality is disputed. You can keep it, you do not get to keep using it.
  • Portability. Give me the data I provided in a structured and machine readable format. To me or directly to another service if feasible.
  • Objection. I can say no to direct marketing. Full stop. I can also object to processing based on legitimate interest. Then you need strong reasons to continue.
  • Automated decisions. I can refuse being subject to a decision made only by a machine that has legal or similar effects. I can ask for a person and an explanation.
  • Breach duties. If things spill, there are timelines and sometimes a duty to tell me.

If you only remember one thing: this is not a favor. It is mandatory.

What companies must actually do

There is a lot of talk. Here is the do.

  • Answer within one month. If more time is needed, tell me within that month and why.
  • Provide the first copy for free. They can only charge for additional copies, or refuse/charge if a request is manifestly unfounded or excessive.
  • Answer electronically where possible when I ask electronically. Use a commonly used format (PDF/CSV), not a dead portal.
  • Keep identity checks proportionate. No passport selfie if an email confirmation is enough.
  • Fix, erase, restrict when conditions are met; inform recipients where relevant.
  • Stop direct marketing when I object. No debate.
  • Data minimisation and sane defaults. Collect less, keep it for less time, show your homework.
  • Respect the authority. Integritetsskyddsmyndigheten can order changes and impose penalties.

Again. Not optional.

The idea that grabbed me

I want to see how much data companies actually have about me. Not the slogan. The rows. The fields. The log entries. I want to see if processes work when a normal person uses them. The answer is to automate the boring parts and keep the control in my hands.

So I am writing a small local program. Think of it as a transparency bot that lives on my laptop.

Reality check: I just started. I have a game plan, a repo skeleton, and Mailcow IMAP talking to me, and that’s about it. This might never turn into a polished tool. I’ve built similar “scan-my-mail-then-ask-AI” things before. One of them was a spam classifier that confidently labeled everything as spam. I’m not a programmer. I’m a guy with ideas and a Labrador. We’ll see if this one becomes real.

I won’t spam controllers. One request per controller every 6–12 months (unless something material changes). The goal is transparency, not denial-of-service.

What it does

  1. Discover
    It scans my own IMAP mailbox locally. Receipts. Order confirmations. Support threads. It extracts company names and domains. It resolves them to a clean list of controllers. I review and approve. Nothing leaves my machine. Messages are synced incrementally by UID. Raw .eml and a SHA-256 digest are stored locally so I can prove what was sent and received. If it crashes on the first “order confirmation,” that’s expected at this stage. Pipes first, polish later.
  2. Request
    It generates the right letter in Swedish or English. It asks for an electronic answer in a common format. It includes a short appendix with direct quotes from the law. Not my opinion. The text. At the bottom. Outgoing mail is DKIM-signed and sent from a domain with SPF and DMARC set up (via Mailcow). Identity data is the minimum necessary and explained; if more is needed, they must say why.
  3. Forward to me
    Every answer is forwarded to me in the exact shape I requested. My script watches the mailbox for replies and attachments. It checks that delivery and format match what was promised. The bot validates attachments (PDF/CSV), rejects passworded ZIPs without a safe pass hand-off, and flags “please log in to our portal” as portal-only.
  4. Be annoying on purpose
    If a reply is late or incomplete, the program nudges. If the nudge is ignored, it prepares a complaint to Integritetsskyddsmyndigheten with a full bundle. Timeline. Headers. Letters. Attachments. I click approve. Off it goes. Escalation is never automatic. Low-confidence cases pause for human review. Complaints include a tidy bundle: originals, headers, timelines, and hashes.
  5. Do the rest of the rights
    Not just access. It can send rectification, erasure, restriction, objection, and portability requests. Same pattern. Clear letter. Law at the bottom. Deadlines tracked. Evidence bundled. Each right has its own checklist and template. The bot shows which legal elements were satisfied and which are missing before I send anything.
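Step 1's local archive (raw .eml plus a SHA-256 digest, synced incrementally by UID) could be sketched like this. This is a minimal illustration, not the bot's actual code: the function names and the one-file-per-digest layout are assumptions, and the IMAP connection itself is left out.

```python
import hashlib
from pathlib import Path


def store_raw(raw: bytes, root: Path) -> str:
    """Write one raw message to root/<sha256>.eml and return the hex digest."""
    digest = hashlib.sha256(raw).hexdigest()
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{digest}.eml"
    if not path.exists():  # same bytes, same name: re-syncing is harmless
        path.write_bytes(raw)
    return digest


def uid_search_criterion(last_uid: int) -> str:
    """IMAP search criterion for everything after the last UID already stored."""
    return f"UID {last_uid + 1}:*"


def only_new(uids: list[int], last_uid: int) -> list[int]:
    """Drop UIDs that are already archived.

    An IMAP range like "101:*" still matches the newest message even when
    its UID is below 101, so the result has to be filtered client-side.
    """
    return sorted(u for u in uids if u > last_uid)
```

Naming files by digest means the hash doubles as the proof of what was received: if the bytes change, the name no longer matches.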

Example of output:

# =========================================================
# GDPR Transparency Bot — Sample Letters (all-in-one)
# Bracketed fields [like this] are placeholders.
# =========================================================

# Access request (Art. 15) + marketing objection (Art. 21)

Subject: GDPR Art. 15 access request (electronic) + Art. 21 objection to direct marketing

Hello,

I am exercising my GDPR rights using a local transparency tool that tracks deadlines and reply quality.
Please provide access to my personal data under Article 15 and stop processing for direct marketing,
including any related profiling, under Article 21. Please confirm suppression of my contact details from all
marketing lists.

Please reply electronically in a commonly used format (PDF or CSV). The first copy is free.

Identification: [full name], [email], [address]. If you need more to verify identity, please state the minimum required and why.

The deadline is one month from receipt. If the deadline is missed or the reply is incomplete, I will file a complaint
with Integritetsskyddsmyndigheten.

Legal excerpt: Art. 12(3), 12(5), 15(1)-(3), 21(2)-(3).

# ---------------------------------------------------------

# Friendly reminder (pre-deadline)

Subject: Reminder — GDPR Art. 15 access request

Hello,

On [date] I sent an access request. The one-month deadline is approaching.
Please reply electronically in a commonly used format.

For clarity, this request is tracked by my local transparency tool.

Legal excerpt: Art. 12(3).

# ---------------------------------------------------------

# Overdue notice + intent to complain

Subject: Overdue — GDPR Art. 15 access request

Hello,

More than one month has passed since my request on [date]. Please deliver the reply immediately or explain any extension,
which should have been communicated within one month.

If a complete reply does not arrive, I will file a complaint with Integritetsskyddsmyndigheten.

Legal excerpt: Art. 12(3), 77.

# ---------------------------------------------------------

# Completion request (incomplete reply)

Subject: Completion requested — GDPR Art. 15

Hello,

Thank you for your reply. It is missing mandatory elements under Article 15, including one or more of:
- purposes of processing
- categories of personal data
- recipients and any transfers outside the EU
- retention periods
- source where not collected from me
- existence of automated decisions or profiling
- a copy of my personal data under processing

Please complete these items and provide the copy electronically.

Legal excerpt: Art. 15(1), 15(3).

# ---------------------------------------------------------

# Rectification (Art. 16)

Subject: GDPR Art. 16 rectification request

Hello,

Please correct the following inaccurate data: [describe]. Where applicable, inform relevant recipients of the rectification
and confirm completion.

Legal excerpt: Art. 16, 19.

# ---------------------------------------------------------

# Erasure (Art. 17)

Subject: GDPR Art. 17 erasure request

Hello,

Please erase my personal data. Legal ground: [no longer necessary] / [consent withdrawn] / [unlawful processing] /
[successful Art. 21 objection]. If an exception applies, please specify it clearly. Confirm erasure and notify relevant recipients.

Legal excerpt: Art. 17(1), 19.

# ---------------------------------------------------------

# Restriction (Art. 18)

Subject: GDPR Art. 18 restriction request

Hello,

Please restrict processing while accuracy or lawfulness is assessed. Store but do not use the data until resolved.
Confirm activation and lifting of the restriction.

Legal excerpt: Art. 18.

# ---------------------------------------------------------

# Objection — direct marketing (Art. 21)

Subject: GDPR Art. 21 objection — direct marketing

Hello,

I object to processing for direct marketing, including related profiling. Stop this processing and confirm suppression
of my contact details from marketing lists.

Legal excerpt: Art. 21(2)-(3).

# ---------------------------------------------------------

# Data portability (Art. 20)

Subject: GDPR Art. 20 data portability request

Hello,

Please provide the personal data I have provided to you in a structured, commonly used and machine-readable format,
electronically to this address. If feasible, also transmit it directly to: [recipient].

Legal excerpt: Art. 20(1)-(2).

# ---------------------------------------------------------

# Proportional identity verification

Subject: Identity verification — proportionality request

Hello,

You requested [type of ID]. Please explain why this level of identification is necessary for this specific request and confirm
the minimum information you require. I can provide a less intrusive proof if sufficient.

Legal excerpt: Art. 5(1)(c), 12(6).

# ---------------------------------------------------------

# Complaint to Integritetsskyddsmyndigheten (IMY)

Subject: Complaint — failure to comply with GDPR request

To Integritetsskyddsmyndigheten,

I am submitting a complaint regarding [controller name]. I sent a GDPR request on [date] and the controller has
[failed to reply within one month / provided an incomplete reply / refused to answer electronically /
requested disproportionate identification / refused to act on erasure/restriction/objection].

Timeline:
- [date]: original request sent (copy attached)
- [date]: reminder sent (copy attached)
- [date]: overdue notice sent (copy attached)
- [date]: reply received (if any) — summary: [brief]
- Today: complaint submitted

Evidence attached: original request, reminders, overdue notice, any replies and attachments, headers, and metadata.
I ask IMY to review compliance and order corrective action where appropriate.

Regards,
[name], [email], [phone], [address]
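The bracketed placeholders in the letters above need to be filled before anything is sent. One way to do that, sketched under assumptions (the `fill_template` name and the refuse-if-unfilled behaviour are illustrative, not the bot's real code), is to substitute known values and raise if any bracket is left:

```python
import re

# Matches [anything without nested brackets]
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace [name] placeholders; raise if any are left unfilled."""
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key in values:
            return values[key]
        missing.append(key)
        return match.group(0)  # leave the bracket in place for the error report

    text = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise ValueError(f"unfilled placeholders: {sorted(set(missing))}")
    return text
```

Letters that offer alternatives in brackets, like the erasure grounds, would fail this check until a human has picked one and edited the draft. That fits the "nothing is sent until I click" idea.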

Under the hood (plan + current chaos)

First, honesty: it does not work yet. I have a plan, some pipes, and a crash log that says my 03:00 regexes were optimistic.
I am still sketching and wiring. This is how it will work, and where it face-plants today.

What comes in

  1. IMAP fetch (TLS). Incremental by UID. Store raw .eml + SHA-256.
  2. Normalise: PDF→text (pdfminer), OCR (Tesseract), DOCX/HTML/CSV→text.
  3. Flag passworded ZIPs and “portal-only” replies early.
  4. Language guess: sv/en; others = manual review.
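The early flags in step 3 can be checked with the standard library alone. A rough sketch, with assumptions stated: the phrase list is a made-up starting point, and "portal-only" is taken to mean a portal phrase in the body plus zero attachments.

```python
import io
import zipfile

# Hypothetical phrase list; a real one would grow from actual replies.
PORTAL_PHRASES = (
    "log in to our portal",
    "log in to your account",
    "logga in på mina sidor",
)


def zip_is_encrypted(data: bytes) -> bool:
    """True if any member of the archive has the encryption flag set."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Bit 0 of the general purpose flags marks an encrypted member.
        return any(info.flag_bits & 0x1 for info in archive.infolist())


def looks_portal_only(body: str, attachment_count: int) -> bool:
    """True if the reply has no attachments and points at a login portal."""
    text = " ".join(body.lower().split())  # normalise case and whitespace
    return attachment_count == 0 and any(p in text for p in PORTAL_PHRASES)
```

A corrupt archive raises `zipfile.BadZipFile` here. The sketch lets that propagate so the caller can send the case to manual review.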

What it looks for in replies

A checklist mapped to the rights. Each item becomes hit, weak, or missing.

  • Purposes of processing
  • Categories of data
  • Recipients and any third-country transfers
  • Retention periods
  • Source if not collected from me
  • Automated decisions or profiling
  • A real copy of my data (rows/fields/IDs), not just a description or narrative
  • Delivery is electronic and in a commonly used format
  • Deadline handled and any extension explained in time
  • For marketing: explicit suppression confirmed (exact wording captured)
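The hit / weak / missing idea can be sketched with two patterns per checklist item: one for strong evidence, one for a mere mention. The patterns below are illustrative assumptions covering four of the items, English only. They are not the bot's real rules.

```python
import re

FLAGS = re.I | re.S

# Hypothetical patterns: item -> (strong evidence, weak evidence)
CHECKS: dict[str, tuple[str, str]] = {
    "purposes": (r"purposes? of (the )?processing", r"\bpurposes?\b"),
    "retention": (r"\b\d+\s+(day|month|year)s?\b", r"as long as necessary|retention"),
    "recipients": (r"recipients?\s*(are|include|:)", r"\brecipients?\b|third part(y|ies)"),
    "marketing_suppression": (
        r"(suppressed|removed|unsubscribed).{0,60}marketing",
        r"\bmarketing\b",
    ),
}


def evaluate(text: str) -> dict[str, str]:
    """Label each checklist item as 'hit', 'weak', or 'missing'."""
    result = {}
    for item, (strong, weak) in CHECKS.items():
        if re.search(strong, text, FLAGS):
            result[item] = "hit"
        elif re.search(weak, text, FLAGS):
            result[item] = "weak"
        else:
            result[item] = "missing"
    return result
```

A "weak" label is where a second opinion or a human look is most useful: the topic is mentioned, but nothing concrete was found.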

Rules first, local AI second

  • Rules catch the obvious
    Regex, keyword lists, table detection, CSV sanity checks, portal links, passworded files.
  • Local LLM (Llama) helps when text is messy
    It classifies and extracts, e.g.
    • Is there an actual data copy or only a summary
    • Is “as long as necessary” a valid retention period or fluff
    • Did they honour an Art. 21 marketing objection
    • Is the ID check proportionate for this request
  • Decision = rules + a small LLM adjustment
    The model can only nudge the rules score by ±10 points. No free rein. It cannot override hard fails (e.g., no attachment, portal-only).

Output with reasons

Each case gets a status and confidence, plus short pointers to the lines that triggered it.

Every flag links to the exact line in the reply that triggered it. No black boxes, just receipts.

[controller=ACME AB] reply=2025-08-13T10:14Z
format=PDF ok | data_copy=present | purposes=present | recipients=missing
retention=present("36 months") | source=missing | profiling=none stated
marketing=confirmed suppressed | decision=INCOMPLETE | confidence=0.91
reason: missing recipients + missing source | action: send completion letter

If confidence drops below ~0.90, it pauses for me to decide before any escalation.

It will not be foolproof (by design)

Expected pain:

  • Bad OCR on scanned images
  • Passworded ZIP or “please log in to our portal”
  • Replies in a third language or with vague wording
  • CSV that is not personal data at all
  • Over-the-top ID demands

When that happens, or whenever the bot can’t be confident, it stops and asks. I click. I own the decision, not the model. Human in the loop before complaints.

Local Llama plan

  • Runs via llama.cpp or Ollama on my machine
  • The model runs locally and only sees extracted text. No mail credentials or raw inbox are passed to the model.
  • Tasks are small: classify, extract, label
  • Boring prompts on purpose

Example prompt it will use:

System: You label GDPR replies for specific items.
User: Given this text, answer JSON with keys:
{ "has_data_copy": true|false,
  "retention_ok": true|false,
  "recipients_present": true|false,
  "marketing_suppressed": true|false,
  "notes": "short reason" }
Text:
<<<controller reply text here>>>
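Model output should be treated as untrusted input. A small validator, sketched here with the key names from the prompt above (the function name and the reject-on-anything-odd policy are assumptions), accepts only the exact JSON shape and returns None otherwise, so a chatty or malformed answer never reaches the scoring step:

```python
import json
from typing import Optional

# Keys and types taken from the prompt; anything else is rejected.
EXPECTED = {
    "has_data_copy": bool,
    "retention_ok": bool,
    "recipients_present": bool,
    "marketing_suppressed": bool,
    "notes": str,
}


def parse_label(raw: str) -> Optional[dict]:
    """Return the label dict if it matches the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # prose around the JSON, truncated output, etc.
    if not isinstance(data, dict) or set(data) != set(EXPECTED):
        return None  # missing or extra keys
    for key, kind in EXPECTED.items():
        if not isinstance(data[key], kind):
            return None  # e.g. "yes" or 1 instead of true
    return data
```

A None result would simply mean "no LLM adjustment for this case", which leaves the rules score untouched.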

Signals and planned weights

+20  attachment looks like CSV or table-like PDF
+15  explicit “we have suppressed your contact for marketing”
+15  retention includes a concrete period or rule
+15  recipients list names at least one party
+10  “copy of your personal data” plus unique rows/IDs
 -5  vague “as long as necessary” with no rule
-10  portal link required to view data
-15  passworded ZIP without pass transfer method
-20  no attachment and no data fields in body
Decision = rules sum + LLM adjustment in [-10, +10]

# Planned JSON label from the model
{
  "has_data_copy": true,
  "recipients_present": false,
  "retention_ok": true,
  "marketing_suppressed": true,
  "notes": "missing recipients; copy present in CSV"
}
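The planned weights translate almost directly into a lookup table. The sketch below uses the numbers from the list above. The signal names, the function name, and the choice of which signals count as hard fails are assumptions for illustration, and the mapping from score to a final decision is left open because the plan does not fix it yet.

```python
# Weights copied from the planned list; the signal names are made up here.
RULE_WEIGHTS = {
    "csv_or_table_attachment": 20,
    "marketing_suppression_explicit": 15,
    "retention_concrete": 15,
    "recipients_named": 15,
    "data_copy_with_ids": 10,
    "retention_vague": -5,
    "portal_link_required": -10,
    "passworded_zip_no_handoff": -15,
    "no_attachment_no_fields": -20,
}

# Assumed hard fails, based on "no attachment, portal-only".
HARD_FAILS = {"portal_link_required", "no_attachment_no_fields"}


def score(signals: set[str], llm_adjustment: int) -> tuple[int, bool]:
    """Return (rules sum + clamped LLM nudge, hard_fail flag)."""
    unknown = signals - set(RULE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    rules = sum(RULE_WEIGHTS[s] for s in signals)
    nudge = max(-10, min(10, llm_adjustment))  # the model can only nudge
    hard_fail = bool(signals & HARD_FAILS)     # the nudge never clears this flag
    return rules + nudge, hard_fail
```

For example, `score({"csv_or_table_attachment", "retention_concrete", "data_copy_with_ids"}, 25)` gives `(55, False)`: the rules sum to 45 and the oversized adjustment is clamped to +10.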

Security and privacy

  • Everything lives under ~/.gdprbot/, encrypted at rest
  • No analytics, no telemetry
  • Attachments opened in a network-isolated sandbox; no external fetches
  • No automatic link following
  • Local vault purges bundles after 12 months by default (configurable)

UI plan

Three views. Inbox to pick messages, Checklist to see hits and misses, Next step to send the right letter.

Export ICS for deadlines.
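A deadline export needs two pieces: a date one month after receipt, and a calendar file. The sketch below assumes "one month" means the same day in the next month, clamped to the month's last day. The exact legal counting rule may differ, so treat the date as a reminder, not as a legal calculation. The PRODID string and function names are made up, and escaping of commas or semicolons in the summary is left out.

```python
import calendar
from datetime import date


def one_month_after(received: date) -> date:
    """Same day next month, clamped to that month's last day."""
    year = received.year + (received.month // 12)
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def deadline_ics(controller: str, received: date, uid: str) -> str:
    """Build a minimal all-day iCalendar event for the reply deadline."""
    due = one_month_after(received)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//gdprbot//deadlines//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{received:%Y%m%d}T000000Z",
        f"DTSTART;VALUE=DATE:{due:%Y%m%d}",
        f"SUMMARY:GDPR reply due: {controller}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF
```

A request received on 31 January 2025 would get 28 February 2025 as its reminder date under this rule.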

Dry-run mode: generate drafts only, nothing is sent until I click.

Escalation to Integritetsskyddsmyndigheten always needs a manual click.

Progress today (yes, it already talks to Mailcow)

imap: connecting to mail.example.tld:993 (TLS)  # Mailcow, I know, that post is coming
auth: OK as [email protected]
mailbox: INBOX selected (34219 messages)
scan: reading the latest 500…

hit: "Order confirmation" — candidate controller => ACME AB <[email protected]>

Traceback (most recent call last):
  File "gdprbot/worker.py", line 214, in process_message
    controller = match.group("controller").strip()
AttributeError: 'NoneType' object has no attribute 'group'

status: connected ✅  parsed ✅  crashed on first hit ❌
todo: stop trusting 03:00 regexes; write a real extractor; add try/except + fallback to Llama.
note: last time I let an LLM classify my inbox, it decided 100% of messages were spam.
plan: guard rails first, models second.
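The traceback above is the classic "regex did not match, so the match object is None" crash. A guarded version could look like the sketch below. The pattern is a stand-in, since the real one in gdprbot/worker.py is not shown here, and returning None for "no idea" is an assumed convention that lets the caller pick a fallback or pause for review.

```python
import re
from typing import Optional

# Hypothetical pattern: display name in a From header, e.g. "ACME AB <...>".
CONTROLLER_RE = re.compile(r"^From:\s*(?P<controller>[^<\r\n]+?)\s*<", re.M)


def extract_controller(raw_headers: str) -> Optional[str]:
    """Return the sender's display name, or None if the pattern does not match."""
    match = CONTROLLER_RE.search(raw_headers)
    if match is None:
        return None  # no crash: the caller decides what happens next
    return match.group("controller").strip() or None
```

Checking for None explicitly is cheaper to reason about than a broad try/except, which would also hide unrelated bugs in the same function.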

Next fixes

  • Replace brittle regex with NER + domain catalog
  • Wrap extractor in try/except and fall back to Llama label
  • Add unit tests with synthetic PDFs/CSVs
  • Add DKIM signing + bounce handling

Summary

This is not legal advice, not a name-and-shame engine, and not a SaaS. It’s a local tool for ordinary people to use ordinary rights, with receipts.

This is plumbing, not magic. Rules do the heavy lifting. A small local Llama cleans up fuzzy language. The bot shows its work and asks before doing anything loud. When it actually runs, it should make GDPR rights boring to use. Boring is good. Boring gets processed.

Tone and ethics

Polite. Firm. Predictable. I am not naming and shaming support teams. I want repeatable process and better defaults. If you handle GDPR well, perfect. If you do not, the authority gets a neat package and you get a clear signal to fix your flow.

Why build this

Because the law is only useful if people use it. Because I would like proof rather than slogans. Because it should not take a lawyer and a week of your life to ask for your own data. Because I want to measure what actually happens when you press the buttons. Do the right mailboxes work. Do the formats make sense. Do companies understand what a copy of personal data is. Do they erase when they should.

Also because I like automation and I like dogs. Anouk is my Labrador and Chief Security Officer. She barks at blinking USB sticks. She approves of boring compliance that arrives on time.

What happens next

I’m not a programmer by trade. I tinker. If you’re reading this and you build better extractors, safer parsers, or nicer UIs, feel free to steal, fork, or correct me loudly when I put this on git.

I will run ten cases end to end. Access, rectification, erasure, restriction, objection, portability. I will fix the tool based on what breaks. Then I will run ten more. I will write about patterns, not people.

Mailcow friends. I have not forgotten you. That post lands when the recipes are solid and survive a messy Sunday.

Me. A laptop. Some law. A transparency bot that gets the answers I am owed. And if it does not, I snitch. Because a law that is not followed is just text on a web page.

MVP Sprint (if this ever ships)

Day 1–2 IMAP UID sync + raw .eml + hashing, push the repo to GitHub
Day 3–4 Normalizers (PDF/OCR/CSV/HTML)
Day 5–6 Controller catalog + domain resolution
Day 7–8 Art. 15/21 templates + Mailcow DKIM
Day 9–10 Rules engine v1 + confidence + UI (Inbox/Checklist/Next)
Day 11–12 Bundle exporter (ZIP+PDF) + ICS deadlines
Day 13–14 Test corpus + crash fixes + write-up