dealing with duplicates

rm -f "$HOME/.msgid.cache"

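This rule uses formail to drop emails whose Message-ID has already passed through the system. formail -D keeps an 8K (8192-byte) cache of the Message-IDs it has seen. The “W” flag waits for the exit code from formail and filters the message if appropriate, without logging a "Program failure" when formail reports no duplicate. The “h” flag means pipe the headers only. The recipe uses a user-defined lockfile (msgid.lock) for this task.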

:0 Wh: msgid.lock
| formail -D 8192 $HOME/.msgid.cache

mbox re-processing

the mbox has to be split into individual messages with formail -s, which runs procmail once per message

mv -i mbox mbox.tmp
formail -s procmail < mbox.tmp && rm -f mbox.tmp

maildir re-processing

no need for formail here, each Maildir file already holds exactly one message

find folder/tmp/ -type f #should be empty
procmail < folder/new/message && rm -f folder/new/message
procmail < folder/cur/message && rm -f folder/cur/message
rm -rf folder/
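
Since `message` above stands for each file in turn, the two procmail lines can be wrapped in a loop. This is a minimal sketch, assuming a Maildir at folder/. The function name reprocess_maildir and its optional delivery-command argument (procmail by default) are made up for illustration; they are not part of procmail.

```shell
#!/bin/sh
# reprocess_maildir DIR [COMMAND...]   (hypothetical helper)
# Feed every message under DIR/new and DIR/cur to COMMAND (default: procmail)
# and delete the message only if COMMAND exited successfully.
reprocess_maildir() {
    dir=$1
    shift
    # Default to procmail when no delivery command was given.
    [ "$#" -gt 0 ] || set -- procmail
    for msg in "$dir"/new/* "$dir"/cur/*; do
        # Skip unmatched globs and anything that is not a regular file.
        [ -f "$msg" ] || continue
        "$@" < "$msg" && rm -f "$msg"
    done
}
```

Usage would be `reprocess_maildir folder`. Messages that procmail rejects stay in place, so it is worth checking that `find folder/ -type f` prints nothing before running rm -rf folder/. The same loop should also cover the formail case, e.g. `reprocess_maildir folder sh -c 'formail | procmail'`.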


maildir2mbox conversion

an mbox needs a leading "From " line (^From ) on every message, and formail adds it much better than procmail -f - does

find folder/tmp/ -type f #should be empty
formail < folder/new/message | procmail && rm -f folder/new/message
formail < folder/cur/message | procmail && rm -f folder/cur/message
rm -rf folder/

mbox2split conversion

mbox to Maildir or MH

formail -s procmail < mbox && rm -f mbox


splitting files,

grep '^Return-Path: ' brokenmbox | wc -l

that number minus 2 (think about it) is the repeat count for csplit,

csplit -n 4 -s brokenmbox '/^Return-Path:/' '{901}'
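
If csplit is the GNU coreutils one, the counting step can probably be skipped: `{*}` repeats the pattern as often as the input allows, and -z drops empty pieces. Both are GNU extensions. A sketch on a made-up three-message stand-in for brokenmbox:

```shell
#!/bin/sh
# Build a small stand-in for brokenmbox: three messages, each starting
# with a Return-Path: header (sample data, made up for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    printf 'Return-Path: <user%s@example.org>\nSubject: test %s\n\nbody %s\n\n' "$n" "$n" "$n"
done > brokenmbox

# {*} repeats the pattern until the input runs out (GNU extension),
# -z removes empty pieces, such as one in front of the very first match.
csplit -n 4 -s -z brokenmbox '/^Return-Path:/' '{*}'

# One piece per message; xx is the default csplit output prefix.
ls xx*
```

Without -k, csplit removes the pieces it has written when it hits an error, so a wrong repeat count in the counted variant leaves nothing behind to clean up.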

find files above 10MB,

find . -type f -size +$((1024 * 1024 * 10))c
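
The `+` means strictly larger than, and the `c` suffix counts bytes. A small sketch to check the boundary, using two made-up sparse files created with truncate (from coreutils):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
limit=$((1024 * 1024 * 10))             # 10485760 bytes

truncate -s "$limit" exactly-10M        # not listed: size is not above the limit
truncate -s "$((limit + 1))" just-over  # listed: one byte above the limit

find . -type f -size +"${limit}"c
```

Only ./just-over is printed; a file of exactly 10MB does not match.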

move all matching files to another folder,

find .lists.* -type f -print0 | xargs -0 -I file mv file tosort/
rm -rf .lists.*
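
A slightly more careful variant, as a sketch only: it moves the files in batches with find -exec and removes the source folders only when nothing is left in them. The sample tree is made up, and mv -t is a GNU option.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Made-up sample tree standing in for the .lists.* Maildir folders.
mkdir -p .lists.foo/new .lists.bar/cur tosort
echo one > .lists.foo/new/msg1
echo two > .lists.bar/cur/msg2

# Move every regular file, batching arguments (-t is a GNU mv option).
find .lists.* -type f -exec mv -t tosort/ {} +

# Remove the source folders only if no file was left behind.
if [ -z "$(find .lists.* -type f)" ]; then
    rm -rf .lists.*
fi
ls tosort
```

Files with the same name in different folders would overwrite each other in tosort/, which Maildir's unique file names normally prevent.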

