Selecting columns for a list breaks multiple matches #2

Open
bobpaul opened this issue Jul 11, 2017 · 3 comments

@bobpaul
Contributor

bobpaul commented Jul 11, 2017

Maybe this can already be done and I'm just not getting it, but here's a contrived example to illustrate.

Let's say I have some output of ps aux which looks like this:

$ ps aux 
message+   792  0.0  0.0  42892  3672 ?        Ss   11:33   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       839  0.0  0.1 274488  5924 ?        Ssl  11:33   0:00 /usr/lib/accountsservice/accounts-daemon
daemon     846  0.0  0.0  26044  2064 ?        Ss   11:33   0:00 /usr/sbin/atd -f
root      1003  0.0  0.0  13376   168 ?        Ss   11:33   0:00 /sbin/mdadm --monitor --pid-file /run/mdadm/monitor.pid --daemonise --scan --syslog
bobpaul     1318  0.0  0.1  21516  5224 pts/0    Ss   11:37   0:00 -bash
bobpaul     1339  0.0  4.5 676188 183092 ?       Ssl  11:38   0:18 emacs --daemon
bobpaul     1499  0.0  0.1  21568  5504 pts/1    Ss+  11:48   0:00 -bash
bobpaul     1512  0.0  0.1  21480  5420 pts/2    Ss+  11:48   0:00 -bash
bobpaul     2635  0.0  0.0  12944   936 pts/0    R+   19:03   0:00 grep --color=auto -e daemon -e bash
bobpaul     2636  0.0  0.0  21516  2104 pts/0    D+   19:03   0:00 -bash
$ 

Now, for all lines that contain bash I want to print the 5th column. For all lines that contain daemon I want to print the 2nd column. This can be done in awk like:

$ ps aux | awk '/daemon/ { print $2 } /bash/ { print $5 }'
792
839
846
1003
21516
1339
21568
21480
2635
12944
21516
$ 

So I try to incrementally build the command with pyp... I start by matching both conditions, which, after a bit of messing around, I figured out I could do with 'or'. (Maybe this is already abusive.)

$ ps aux | pyp "p.re('.*daemon.*').split() or p.re('.*bash.*').split()"
[[0]message+[1]792[2]0.0[3]0.0[4]42892[5]3672[6]?[7]Ss[8]11:33[9]0:00[10]/usr/bin/dbus-daemon[11]--system[12]--address=systemd:[13]--nofork[14]--nopidfile[15]--systemd-activation]
[[0]root[1]839[2]0.0[3]0.1[4]274488[5]5924[6]?[7]Ssl[8]11:33[9]0:00[10]/usr/lib/accountsservice/accounts-daemon]
[[0]daemon[1]846[2]0.0[3]0.0[4]26044[5]2064[6]?[7]Ss[8]11:33[9]0:00[10]/usr/sbin/atd[11]-f]
[[0]root[1]1003[2]0.0[3]0.0[4]13376[5]168[6]?[7]Ss[8]11:33[9]0:00[10]/sbin/mdadm[11]--monitor[12]--pid-file[13]/run/mdadm/monitor.pid[14]--daemonise[15]--scan[16]--syslog]
[[0]bobpaul[1]1318[2]0.0[3]0.1[4]21516[5]5224[6]pts/0[7]Ss[8]11:37[9]0:00[10]-bash]
[[0]bobpaul[1]1339[2]0.0[3]4.5[4]676188[5]183092[6]?[7]Ssl[8]11:38[9]0:18[10]emacs[11]--daemon]
[[0]bobpaul[1]1499[2]0.0[3]0.1[4]21568[5]5504[6]pts/1[7]Ss+[8]11:48[9]0:00[10]-bash]
[[0]bobpaul[1]1512[2]0.0[3]0.1[4]21480[5]5420[6]pts/2[7]Ss+[8]11:48[9]0:00[10]-bash]
[[0]bobpaul[1]2635[2]0.0[3]0.0[4]12944[5]936[6]pts/0[7]R+[8]19:03[9]0:00[10]grep[11]--color=auto[12]-e[13]daemon[14]-e[15]bash]
[[0]bobpaul[1]2636[2]0.0[3]0.0[4]21516[5]2104[6]pts/0[7]D+[8]19:03[9]0:00[10]-bash]
$

Good so far. And grab the columns (remember awk is 1 indexed, python is 0):

$ ps aux | pyp "p.re('.*daemon.*').split()[1] or p.re('.*bash.*').split()[4]"
792
839
846
1003
1339
2635
$ 

Wait, that's not enough results. It only shows the columns for daemon matches. I think what's happening is that the [1] selector causes the first part to evaluate to True in cases where the regex didn't match (returned None). (None[1] would raise an exception, so part of the exception handling routine must make it always return True.)

This becomes apparent if we remove the column selector from the daemon regex:

$ ps aux | pyp "p.re('.*daemon.*').split() or p.re('.*bash.*').split()[4]"
[[0]message+[1]792[2]0.0[3]0.0[4]42892[5]3672[6]?[7]Ss[8]11:33[9]0:00[10]/usr/bin/dbus-daemon[11]--system[12]--address=systemd:[13]--nofork[14]--nopidfile[15]--systemd-activation]
[[0]root[1]839[2]0.0[3]0.1[4]274488[5]5924[6]?[7]Ssl[8]11:33[9]0:00[10]/usr/lib/accountsservice/accounts-daemon]
[[0]daemon[1]846[2]0.0[3]0.0[4]26044[5]2064[6]?[7]Ss[8]11:33[9]0:00[10]/usr/sbin/atd[11]-f]
[[0]root[1]1003[2]0.0[3]0.0[4]13376[5]168[6]?[7]Ss[8]11:33[9]0:00[10]/sbin/mdadm[11]--monitor[12]--pid-file[13]/run/mdadm/monitor.pid[14]--daemonise[15]--scan[16]--syslog]
21516
[[0]bobpaul[1]1339[2]0.0[3]4.5[4]676188[5]183092[6]?[7]Ssl[8]11:38[9]0:18[10]emacs[11]--daemon]
21568
21480
[[0]bobpaul[1]2635[2]0.0[3]0.0[4]12944[5]936[6]pts/0[7]R+[8]19:03[9]0:00[10]grep[11]--color=auto[12]-e[13]daemon[14]-e[15]bash]
21516
$ 

Now it's returning both matches again, but only selecting columns on the second match.
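One plain-Python reading of the two results above, offered as a sketch and not as a description of pyp's internals: `a or b` only falls through to `b` if `a` evaluates to something falsy, and indexing an empty list raises before `or` gets that chance. The helper `matched` below is hypothetical and only stands in for a regex filter that yields an empty string on no match; how pyp itself handles the exception is not modeled here.

```python
import re

def matched(pattern, line):
    """Hypothetical stand-in for a regex filter: the line if it matches, else ''."""
    return line if re.search(pattern, line) else ""

bash_line = "bobpaul 1318 0.0 0.1 21516 5224 pts/0 Ss 11:37 0:00 -bash"

# No index on the left: ''.split() is [], which is falsy, so `or` falls through.
no_index = matched("daemon", bash_line).split() or matched("bash", bash_line).split()[4]

# Index on the left: [][1] raises IndexError before `or` is ever evaluated,
# so the right-hand side never runs for this line.
try:
    with_index = matched("daemon", bash_line).split()[1] or matched("bash", bash_line).split()[4]
except IndexError:
    with_index = None

print(no_index, with_index)
```

If pyp swallows that exception and prints nothing for the line, that would be consistent with the missing bash rows, but that part is an assumption.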

Am I just approaching this problem the wrong way, or is it not currently possible to replicate the awk code that outputs a different column depending on what within the line matched?

@zenlc2000
Owner

zenlc2000 commented Jul 12, 2017

My pyp is a little rusty - I don't get to use it as much as I'd like. I know how I'd do it with vanilla python:

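import sys

# Print a different column depending on which keyword the line contains
for line in sys.stdin:
    fields = line.split()  # split on any run of whitespace, like awk
    if "bash" in line:
        print(fields[4])
    elif "daemon" in line:
        print(fields[1])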

I'll have to spend a bit playing with pyp again to see how it would work.

@zenlc2000
Owner

I get a little bit closer if I only keep lines containing the strings you want. Now I get blanks for the second re.

$ cat pyp_test.txt | ./pyp3 "'bash' in p or 'daemon' in p" | ./pyp3 "p.re('.*daemon.*').split()[1] or p.re('.*bash.*').split()[4]"
792
839
846
1003

1339

2635

I still kind of think this can be done as-is. Just need to think it through a bit more.

@bobpaul
Contributor Author

bobpaul commented Jul 12, 2017

Oh, you gave me an idea and I got very close:

$ cat ps.txt | python2 pyp3 "keep('daemon') or keep('bash') | p.split()[1] if 'daemon' in p else p.split()[4] if 'bash' in p else ''"
792
839
846
1003
21516
1339
21568
21480
2635
21516
$ cat ps | awk '/daemon/ { print $2 } /bash/ { print $5 }'
792
839
846
1003
21516
1339
21568
21480
2635
12944
21516

The difference is the one line (the grep process) that contains both bash and daemon. Awk performs 2 independent IFs, so it prints both columns for that line, whereas the pyp statement above is an if-else and prints only the first.
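For comparison, here is a minimal vanilla-Python sketch of the awk behaviour, using two independent `if` checks so that a line matching both words contributes two values. The function name `select_columns` and the shortened `sample` list are made up for illustration; this is not pyp code.

```python
def select_columns(lines):
    """Mimic awk '/daemon/ { print $2 } /bash/ { print $5 }' on an iterable of lines."""
    out = []
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default field splitting
        # Two independent checks (not if/elif), so one line can contribute twice.
        # Unlike awk, lines with too few fields are skipped instead of printing blanks.
        if "daemon" in line and len(fields) > 1:
            out.append(fields[1])  # awk $2
        if "bash" in line and len(fields) > 4:
            out.append(fields[4])  # awk $5
    return out

sample = [
    "daemon     846  0.0  0.0  26044  2064 ?        Ss   11:33   0:00 /usr/sbin/atd -f",
    "bobpaul     1318  0.0  0.1  21516  5224 pts/0    Ss   11:37   0:00 -bash",
    "bobpaul     2635  0.0  0.0  12944   936 pts/0    R+   19:03   0:00 grep --color=auto -e daemon -e bash",
]
print("\n".join(select_columns(sample)))
```

Passing `sys.stdin` instead of `sample` would let it sit at the end of a pipe the same way the awk command does.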
