Commit
Merge pull request #11 from logmanager-oss/implement_data_generation_…
…and_regexp_scanning implement regex scan and data generation
Showing 19 changed files with 268 additions and 54 deletions.
@@ -63,10 +63,12 @@ Usage of ./logveil:

 ### How it works

+**This is only a simplified example and does not match 1:1 with how anonymization is actually implemented.**
+
 Consider the log line below. It is formatted in a common `key:value` format.

 ```
-{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"[email protected]", "organization":"TESTuser.test.com"}
+{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"[email protected]", "organization":"TESTuser.test.com", "mac": "71:e5:41:18:cb:3e"}
 ```

 First, LogVeil will load anonymization data from the supplied directory (`-d example_anon_data/`). Each file in that folder should be named after the key whose values it will mask. For example, let's assume we have the following directory structure:

@@ -80,32 +82,44 @@ Next, LogVeil will go over each log line in supplied input and extract `key:valu
 2. `"src_ip":"89.239.31.49"`
 3. `"username":"[email protected]"`
 4. `"organization":"TESTuser.test.com"`
+5. `"mac": "71:e5:41:18:cb:3e"`
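The extraction step listed above can be sketched in Go as follows. This is a minimal illustration that assumes each log line is a flat JSON object with string values; `extractPairs` is a hypothetical helper name, not a function taken from LogVeil.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// extractPairs decodes one JSON log line into a flat key -> value map.
// Assumption: only string values are of interest; other types are skipped.
func extractPairs(line string) (map[string]string, error) {
	var raw map[string]interface{}
	if err := json.Unmarshal([]byte(line), &raw); err != nil {
		return nil, err
	}
	pairs := make(map[string]string, len(raw))
	for k, v := range raw {
		if s, ok := v.(string); ok {
			pairs[k] = s
		}
	}
	return pairs, nil
}

func main() {
	line := `{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "mac": "71:e5:41:18:cb:3e"}`
	pairs, err := extractPairs(line)
	if err != nil {
		panic(err)
	}
	// Sort the keys so the printed order is stable between runs.
	keys := make([]string, 0, len(pairs))
	for k := range pairs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%q:%q\n", k, pairs[k])
	}
}
```

Go maps have no defined iteration order, which is why the keys are sorted before printing.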

 Then, LogVeil will try to match the extracted pairs to the anonymization data it loaded in the previous step. Two pairs should be matched:

-1. `"username":"[email protected]"` with `username.txt`
-2. `"organization":"TESTuser.test.com"` with `organization.txt`
+1. `"src_ip":"89.239.31.49"` with `src_ip.txt`
+2. `"username":"[email protected]"` with `username.txt`
+3. `"organization":"TESTuser.test.com"` with `organization.txt`

+And one pair should be matched by regular expression scanning:
+
-Now LogVeil will grab random values from the files whose names matched the keys and replace the original values with them. The outcome should look like this:
+1. `"mac": "71:e5:41:18:cb:3e"`

-1. `"username":"ladislav.dosek"`
-2. `"organization":"Apple"`
+Now LogVeil will grab values (randomly) from the files whose names matched the keys, generate a new value for the `mac` key, and create a replacement map in the format `"original_value":"new_value"`:

-And that's it. Now the anonymized log can be written to output along with the anonymization proof:
+1. `"89.239.31.49":"10.20.0.53"`
+2. `"[email protected]":"ladislav.dosek"`
+3. `"TESTuser.test.com":"Apple"`
+4. `"71:e5:41:18:cb:3e": "0f:da:68:92:7f:2b"`

+Now each element of the list above is compared against the log line. Wherever `original_value` is found, it is replaced with `new_value`. The outcome should look like this:
+
 ```
-{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"ladislav.dosek", "organization":"Apple"}
+{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"10.20.0.53", "username":"ladislav.dosek", "organization":"Apple", "mac": "0f:da:68:92:7f:2b"}
 ```

 ```
+{"original": "89.239.31.49", "new": "10.20.0.53"},
 "{"original":"[email protected]","new":"ladislav.dosek"}"
 "{"original":"TESTuser.test.com","new":"Apple"}"
+{"original": "71:e5:41:18:cb:3e", "new": "0f:da:68:92:7f:2b"},
 ```
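The replacement step described above can be sketched as follows. This is only an illustration under stated assumptions: `replacement` and `applyReplacements` are hypothetical names, and an ordered slice is used instead of a map so the result does not depend on Go's randomized map iteration order.

```go
package main

import (
	"fmt"
	"strings"
)

// replacement pairs an original value with its anonymized substitute.
type replacement struct {
	original   string
	anonymized string
}

// applyReplacements substitutes every occurrence of each original value
// in line, processing the replacements in slice order.
func applyReplacements(line string, reps []replacement) string {
	for _, r := range reps {
		line = strings.ReplaceAll(line, r.original, r.anonymized)
	}
	return line
}

func main() {
	line := `{"src_ip":"89.239.31.49", "organization":"TESTuser.test.com", "mac": "71:e5:41:18:cb:3e"}`
	reps := []replacement{
		{"89.239.31.49", "10.20.0.53"},
		{"TESTuser.test.com", "Apple"},
		{"71:e5:41:18:cb:3e", "0f:da:68:92:7f:2b"},
	}
	fmt.Println(applyReplacements(line, reps))
}
```

Because the substitution works on the whole line rather than on individual fields, a value that also appears elsewhere in the line (for example inside a `raw` field) is replaced there too.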

 ### Anonymization data

 Each `key:value` pair that you want to anonymize must have its equivalent in the anonymization data folder.

+If anonymization data does not exist for a given `key:value` pair, LogVeil will attempt to use regular expressions to match and replace common values such as IPv4 addresses, IPv6 addresses, MAC addresses, emails, and URLs.
+
 For example, if you want to anonymize values in the `organization` and `username` keys, you need two files with those names in the anonymization folder, each containing some random data.
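One way to picture the folder convention is a loader that maps each file name (without its extension) to the lines of that file. The sketch below is an assumption about how such loading could work, not LogVeil's actual code; `loadAnonData` and `demo` are hypothetical names, and one value per line is assumed.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadAnonData reads every regular file in dir and returns a map from the
// file name without its extension (e.g. "username" for username.txt) to the
// non-empty lines of that file.
func loadAnonData(dir string) (map[string][]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	data := make(map[string][]string)
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		f, err := os.Open(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		key := strings.TrimSuffix(e.Name(), filepath.Ext(e.Name()))
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			if v := strings.TrimSpace(sc.Text()); v != "" {
				data[key] = append(data[key], v)
			}
		}
		f.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return data, nil
}

// demo builds a throwaway folder with two sample files and loads it,
// so the sketch runs without any pre-existing data.
func demo() (map[string][]string, error) {
	dir, err := os.MkdirTemp("", "anon_data")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	samples := map[string]string{
		"organization.txt": "Microsoft\nApple\n",
		"username.txt":     "miloslav.illes\nladislav.dosek\n",
	}
	for name, content := range samples {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			return nil, err
		}
	}
	return loadAnonData(dir)
}

func main() {
	data, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(data["organization"], data["username"])
}
```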

 ### Output

@@ -1,8 +1,10 @@
 package anonymizer

 import (
+	"math/rand"
 	"testing"

+	"github.com/go-faker/faker/v4"
 	"github.com/logmanager-oss/logveil/internal/config"
 	"github.com/logmanager-oss/logveil/internal/proof"
 	"github.com/stretchr/testify/assert"

@@ -18,8 +20,18 @@ func TestAnonimizer_AnonymizeData(t *testing.T) {
 		{
 			name: "Test AnonymizeData",
 			anonymizingDataDir: "../../tests/data/anonymization_data",
-			input: map[string]string{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip": "10.10.10.1", "username": "miloslav.illes", "organization": "Microsoft", "raw": "2024-06-05T14:59:27.000+00:00, 10.10.10.1, miloslav.illes, Microsoft"},
-			expectedOutput: "2024-06-05T14:59:27.000+00:00, 10.20.0.53, ladislav.dosek, Apple",
+			input: map[string]string{
+				"@timestamp": "2024-06-05T14:59:27.000+00:00",
+				"src_ip": "10.10.10.1",
+				"src_ipv6": "7f1d:64ed:536a:1fd7:fe8e:cc29:9df4:7911",
+				"mac": "71:e5:41:18:cb:3e",
+				"email": "[email protected]",
+				"url": "https://www.testurl.com",
+				"username": "miloslav.illes",
+				"organization": "Microsoft",
+				"raw": "2024-06-05T14:59:27.000+00:00, 10.10.10.1, 7f1d:64ed:536a:1fd7:fe8e:cc29:9df4:7911, miloslav.illes, Microsoft, 71:e5:41:18:cb:3e, [email protected], https://www.testurl.com",
+			},
+			expectedOutput: "2024-06-05T14:59:27.000+00:00, 10.20.0.53, 8186:39ac:48a4:c6af:a2f1:581a:8b95:25e2, ladislav.dosek, Apple, 0f:da:68:92:7f:2b, [email protected], http://soqovkq.com/NfkcUjG.php",
 		},
 	}


@@ -31,6 +43,7 @@ func TestAnonimizer_AnonymizeData(t *testing.T) {
 			}
 			// Disabling randomization so we know which values to expect
 			anonymizer.SetRandFunc(func(int) int { return 1 })
+			faker.SetRandomSource(rand.NewSource(1))
 			output := anonymizer.Anonymize(tt.input)

 			assert.Equal(t, tt.expectedOutput, output)
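
The test above pins randomness in two places so the expected output is stable. The idea behind an injectable index function can be sketched as below; `picker`, `newPicker`, `setRandFunc`, and `pick` are hypothetical stand-ins written for this illustration, and the value list is made up rather than taken from the test data.

```go
package main

import (
	"fmt"
	"math/rand"
)

// picker is a hypothetical stand-in for the anonymizer: it selects one
// replacement value from a list using an injectable index function.
type picker struct {
	randFunc func(n int) int
}

// newPicker uses math/rand by default, so picks vary between calls.
func newPicker() *picker {
	return &picker{randFunc: rand.Intn}
}

// setRandFunc swaps the index function, which lets a test force a fixed index.
func (p *picker) setRandFunc(f func(int) int) {
	p.randFunc = f
}

// pick returns the value at the index chosen by randFunc,
// or an empty string when there is nothing to choose from.
func (p *picker) pick(values []string) string {
	if len(values) == 0 {
		return ""
	}
	return values[p.randFunc(len(values))]
}

func main() {
	p := newPicker()
	// Always choose index 1, so the result is predictable.
	p.setRandFunc(func(int) int { return 1 })
	fmt.Println(p.pick([]string{"Microsoft", "Apple", "Google"})) // prints "Apple"
}
```

A fixed index function that returns 1 only works when every list has at least two entries, which is worth keeping in mind when preparing test data.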
@@ -0,0 +1,27 @@
+package generator
+
+import (
+	"github.com/go-faker/faker/v4"
+)
+
+type Generator struct{}
+
+func (g *Generator) GenerateRandomIPv4() string {
+	return faker.IPv4()
+}
+
+func (g *Generator) GenerateRandomIPv6() string {
+	return faker.IPv6()
+}
+
+func (g *Generator) GenerateRandomMac() string {
+	return faker.MacAddress()
+}
+
+func (g *Generator) GenerateRandomEmail() string {
+	return faker.Email()
+}
+
+func (g *Generator) GenerateRandomUrl() string {
+	return faker.URL()
+}
@@ -0,0 +1,9 @@
+package lookup
+
+const (
+	Ipv4Pattern  = "((25[0-5]|(2[0-4]|1\\d|[1-9]|)\\d)\\.?\\b){4}" // \\b is the regex word boundary; a bare \b in a Go string literal is a backspace character
+	Ipv6Pattern  = "(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))"
+	MacPattern   = "([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})"
+	EmailPattern = "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
+	UrlPattern   = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{1,256}\\.[a-zA-Z0-9()]{1,6}\\b([-a-zA-Z0-9()@:%_\\+.~#?&//=]*)"
+)
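
Escaping deserves care when regular expressions live in interpreted (double-quoted) Go string literals: `\b` is the string escape for a backspace character, so the regex word boundary has to be written as `\\b` (or `\b` inside a raw backtick literal). The short sketch below demonstrates the difference; `matches` is a hypothetical helper and the patterns are simplified examples, not the constants above.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern finds a match anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	escaped := "\\d+\\b"  // regex: digits followed by a word boundary
	unescaped := "\\d+\b" // regex: digits followed by a literal backspace (0x08)
	raw := `\d+\b`        // raw literal, identical to escaped

	fmt.Println(matches(escaped, "port 8080"))   // prints "true"
	fmt.Println(matches(unescaped, "port 8080")) // prints "false"
	fmt.Println(escaped == raw)                  // prints "true"
}
```

The compiler accepts both spellings because `\b` is a valid string escape, so this kind of mistake only shows up when the pattern silently fails to match.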
@@ -0,0 +1,23 @@
+package lookup
+
+import (
+	"regexp"
+)
+
+type Lookup struct {
+	ValidIpv4  *regexp.Regexp
+	ValidIpv6  *regexp.Regexp
+	ValidMac   *regexp.Regexp
+	ValidEmail *regexp.Regexp
+	ValidUrl   *regexp.Regexp
+}
+
+func New() *Lookup {
+	return &Lookup{
+		ValidIpv4:  regexp.MustCompile(Ipv4Pattern),
+		ValidIpv6:  regexp.MustCompile(Ipv6Pattern),
+		ValidMac:   regexp.MustCompile(MacPattern),
+		ValidEmail: regexp.MustCompile(EmailPattern),
+		ValidUrl:   regexp.MustCompile(UrlPattern),
+	}
+}
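
A set of precompiled patterns like this can be used to decide what kind of value a string holds. The sketch below is a hedged illustration: `matcher`, `newMatchers`, and `classify` are hypothetical names, the MAC pattern is copied from the constants in this commit, and the IPv4 pattern is a deliberately simplified assumption. Anchoring the patterns to the whole value is also an assumption, since this diff does not show whether LogVeil matches whole values or scans for substrings.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher pairs a label with a compiled pattern.
type matcher struct {
	kind string
	re   *regexp.Regexp
}

// newMatchers compiles each pattern once, anchored so the whole value must match.
func newMatchers() []matcher {
	anchored := func(p string) *regexp.Regexp {
		return regexp.MustCompile(`^(?:` + p + `)$`)
	}
	return []matcher{
		{"mac", anchored(`([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})`)},
		// Simplified IPv4 pattern for illustration; it does not range-check octets.
		{"ipv4", anchored(`(\d{1,3}\.){3}\d{1,3}`)},
	}
}

// classify returns the label of the first matching pattern, or "" if none match.
func classify(ms []matcher, value string) string {
	for _, m := range ms {
		if m.re.MatchString(value) {
			return m.kind
		}
	}
	return ""
}

func main() {
	ms := newMatchers()
	for _, v := range []string{"71:e5:41:18:cb:3e", "89.239.31.49", "TESTuser.test.com"} {
		fmt.Printf("%q -> %q\n", v, classify(ms, v))
	}
}
```

Compiling the patterns once up front, as `New()` does with `regexp.MustCompile`, avoids recompiling them for every log line.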