Preventing People from Posting Personally Identifying Information and Private System Information

We are concerned about people posting personally identifying information (full names, addresses, email addresses, phone numbers, etc.) and information about their systems that might lead to a security vulnerability (passwords, usernames, IP addresses, file system details, etc.).

I feel we have converged on a reasonable system, explained below, but I'd like to open a discussion about it here.

In particular, I'd love to hear any advice on using Discourse's automated features to auto-flag PII and security information we should be concerned about.

System Information that Might Lead to a Security Vulnerability

Currently, we mostly rely on people's own good judgment not to reveal information about their systems that could lead to a security vulnerability. RStudio commercial customers are encouraged to work with RStudio's excellent direct support team, and most know what not to share on a public forum.
However, mistakes happen. When they do, many community members spot and flag these posts so they can be hidden.

Personally Identifiable Information (PII)

Our privacy policy is listed here: Privacy - Posit Community
One key point is that posting PII in topic threads and replies is against policy. If you want to share PII, please do so in your profile or in (semi-private) direct messages.

The idea is that if you later wish to reconsider how much PII you have on this site, your profile is a single point of control, whereas going back over all your previous posts is tedious.

Discourse features

Discourse: Actions on Watched Words
We can set up watched words and regular expressions that are automatically censored, or that automatically require review before becoming public.

We currently censor many curse words, for example ■■■■, ■■■■, and ■■■■.

We also auto-hide posts for review when they match the following regular expressions:

IP: ([0-9]{1,3}\.?){4}
Phone: \d{3}-\d{4}
email: [\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+
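One way to sanity-check these patterns is to run them against a few sample posts. The sketch below uses Python's `re` module, which may differ in small details from the regex engine Discourse uses for watched words. The sample strings are made up, and `STRICT_IP` is a hypothetical tighter alternative, not something currently configured on the forum.

```python
import re

# The three patterns from the list above, copied verbatim
# (raw strings so the backslashes reach the regex engine unchanged).
PATTERNS = {
    "ip": r"([0-9]{1,3}\.?){4}",
    "phone": r"\d{3}-\d{4}",
    "email": r"[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+",
}

# Hypothetical stricter IP pattern: three literal dots plus word boundaries.
STRICT_IP = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"


def flagged(text, patterns=PATTERNS):
    """Return the sorted names of the patterns that match anywhere in text."""
    return sorted(name for name, pat in patterns.items() if re.search(pat, text))


# Made-up example posts.
samples = [
    "My server is at 192.168.0.12",
    "Call me at 555-1234",
    "Email jane.doe@example.com for the data",
    "I upgraded to R 4.3.1 in 2023",
]

for s in samples:
    print(f"{s!r}: loose={flagged(s)} strict_ip={bool(re.search(STRICT_IP, s))}")
```

On these samples the IP pattern also fires on the phone number and on the year 2023: because every `\.?` is optional, any run of four or more digits matches. The stricter variant requires three literal dots and word boundaries, so it only matches the dotted address (it still accepts out-of-range values such as 999.999.999.999).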

We would love to help people use this forum safely, so let me know if you have advice on how to improve the current policy, or if you'd suggest other regular expressions to auto-hide.


I don't know if it is practical (or a likely mistake), but what about RSA private keys? I'm also not sure whether multi-line regexps are a good idea, or whether I've actually written a usable one.

RSA: -----BEGIN RSA PRIVATE KEY-----.*-----END RSA PRIVATE KEY-----
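A quick check in Python's `re` module suggests why the multi-line question matters: by default `.` does not match a newline, so the pattern above misses a key pasted across several lines. Python's `re.DOTALL` flag changes that; whether Discourse's watched-words feature honors an equivalent flag is an open assumption here. The `HEADER` pattern is a hypothetical alternative that only looks for the opening line, which avoids multi-line matching altogether. The key body below is fake and truncated.

```python
import re

# The pattern proposed above, verbatim.
RSA_SPAN = r"-----BEGIN RSA PRIVATE KEY-----.*-----END RSA PRIVATE KEY-----"

# A made-up, truncated key; real keys span many lines.
post = """Here is my config:
-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAfakefakefake
-----END RSA PRIVATE KEY-----
"""

# "." stops at newlines by default, so the span pattern finds nothing.
print(bool(re.search(RSA_SPAN, post)))             # False
# With re.DOTALL, "." also matches newlines and the whole block is found.
print(bool(re.search(RSA_SPAN, post, re.DOTALL)))  # True

# Hypothetical alternative: flag the header line alone. It needs no
# multi-line matching and also catches headers such as
# "BEGIN OPENSSH PRIVATE KEY" or "BEGIN PRIVATE KEY".
HEADER = r"-----BEGIN [A-Z ]*PRIVATE KEY-----"
print(bool(re.search(HEADER, post)))               # True
```

Matching only the header trades precision for simplicity: a post quoting just the header line would be held for review too, which seems acceptable for an auto-hide rule.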