Re: (ITS#5256) encoding of guide.sdf



hyc@symas.com writes:
> I can't think of a grep regex that will detect 8-bit characters, but
> you need to find them all and get rid of them.

grep -v '[^ -~<tab>]'

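where you type ^V tab to the shell to get the <tab>. With -v this prints
only the lines made up entirely of printable ASCII and tabs; drop the -v
to list the offending lines instead.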
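
The same bracket expression can be built without typing ^V tab. This is a rough sketch, not a tested recipe for every grep: the helper name find_non_ascii is hypothetical, and LC_ALL=C is an added assumption so that the range " -~" is taken byte-wise rather than by the locale's collation order.

```shell
# Sketch only: find_non_ascii is a hypothetical helper name, and
# LC_ALL=C is an added assumption so the range " -~" is byte-wise.
tab=$(printf '\t')   # a literal tab, without typing ^V tab

find_non_ascii() {
    # -n prefixes each hit with its line number; "$@" passes file
    # names through (stdin if none are given).
    LC_ALL=C grep -n "[^ -~$tab]" "$@"
}

# Sample input: line 2 contains the 8-bit bytes 303 251 (octal).
printf 'plain\ncaf\303\251\n\ttabbed\n' | find_non_ascii
```

Here the -v is left out, so the lines that contain 8-bit or control characters are the ones printed.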

To search for probably-UTF-8 chars (a lead byte followed by a
continuation byte):

perl -ne '/[\300-\377][\200-\277]/ && print "$ARGV:$_"'

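To search for probably-not-UTF-8 8-bit chars (a lead byte with no
continuation byte after it, or a continuation byte at the start of a
line or right after a 7-bit char):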

perl -ne '/[\300-\377](?![\200-\277])|(^|[^\200-\377])[\200-\277]/ && print "$ARGV:$_"'

-- 
Regards,
Hallvard