Add an option to show first N lines (--head=N) #359

Open
alexm opened this issue Jun 14, 2022 · 21 comments

@alexm

alexm commented Jun 14, 2022

A common use of ack is filtering the output of commands like ps, kubectl, etc. that show column names on the first line:

$ ps -a
    PID TTY          TIME CMD
   8079 tty2     00:02:53 Xorg
   8256 tty2     00:00:00 gnome-session-b
  22672 pts/0    00:00:00 ps

Sometimes it would be useful to show the column names (usually the first line is enough) when the meaning of the output is not obvious. This can be easily achieved by adding a match for the first line, e.g.:

$ ps -a | ack '^\s*PID|gnome'
    PID TTY          TIME CMD
   8256 tty2     00:00:00 gnome-session-b

However, an option to always show the first line would be very useful, e.g.

  • ps -a | ack --column-names gnome
  • ps -a | ack --first-line gnome
@petdance
Collaborator

petdance commented Jun 14, 2022

This falls outside of what ack is intended to do. ack doesn't know anything about the text it's searching, and parsing that would certainly be the case. Ignore this, I thought you were talking about parsing the heading line.

For something like this, why even use ack vs. grep? There are no advantages to ack over grep, other than the Perl regexes.

If all you really want is the headings from ps, you could call ps twice and use head to get the headings the first time, and then grep your results from the second.

$ ps -a | head -n1 ; ps -a | grep cronolog
  PID TTY          TIME CMD
 8239 pts/14   00:00:00 cronolog
23339 pts/3    00:00:00 cronolog
31527 pts/2    00:00:00 cronolog

@alexm
Author

alexm commented Jun 16, 2022

I'm not sure why ack should need to know anything about the text to always show the first line, i.e. print the first line (whatever it contains) and then proceed to filter from line 2 onwards.
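As a rough illustration of that behaviour with standard tools (not ack itself), here is a minimal sketch that reads the input once, prints line 1 whatever it contains, and filters from line 2 onwards. The function name first_line_then_filter is made up for this example, and the pattern is an awk extended regex rather than a Perl regex.

```shell
# first_line_then_filter PATTERN
# Hypothetical helper: reads stdin once, prints line 1 unconditionally,
# then only the later lines that match PATTERN (awk extended regex).
first_line_then_filter() {
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Sample input standing in for `ps -a` output.
printf '%s\n' \
    '    PID TTY          TIME CMD' \
    '   8079 tty2     00:02:53 Xorg' \
    '   8256 tty2     00:00:00 gnome-session-b' |
    first_line_then_filter 'gnome'
```

With real data this would be used as ps -a | first_line_then_filter gnome.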

Since ack has so many more features than grep, I thought that this could be nice to have too, but I understand your reluctance to add this feature.

Cheers, and thanks for your excellent work!

@alexm alexm closed this as completed Jun 16, 2022
@petdance
Collaborator

petdance commented Jun 16, 2022

Hmm, I think I've been mixing up two things. Let's explore this idea.

@petdance petdance reopened this Jun 16, 2022
@petdance
Collaborator

So say we have a call like

ack foo --head=1

That means that ack will show the first line of the input, plus any lines that match /foo/. Questions:

  • Do we show the first N lines only if there is a match in the file?
  • Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?
  • Do we do the --head=1 rule on every file? The scenario you're describing is good for piped-in data, but ack does more than that.

@petdance petdance changed the title Add an option to show column names (i.e. first line) Add an option to show first N lines (--head=N) Jun 16, 2022
@petdance
Collaborator

Also, back to my original question: Why use ack at all for this? Why not do:

ps -a | head -n1 ; ps -a | grep cronolog

@n1vux
Contributor

n1vux commented Jun 16, 2022

Why use ack at all for this?

because i want Perl RE not egrep RE ?

(I use ps -a | perl -nlE 'say if 1..1 or /cronolog/' for such cases, but that's me.)

@petdance
Collaborator

because i want Perl RE not egrep RE ?

I know why you might, but I was asking @alexm.

@alexm
Author

alexm commented Jun 16, 2022

Do we show the first N lines only if there is a match in the file?

Yes, only if there's a match in the file.

Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?

Nothing, --head=1 would mean that the first N lines are skipped from filtering, as would happen with ps -a | head -n1 ; ps -a | ack regex, thus making --head=1 a convenient shortcut for that snippet.

Do we do the --head=1 rule on every file? The scenario you're describing is good for piped-in data, but ack does more than that.

Yes. For instance (see the notes below):

$ ps -a > ps.txt
$ kubectl get pods > kubectl.txt
$ ack --head=1 foobar *.txt
ps.txt
------
    PID TTY          TIME CMD
   6727 tty2     00:00:51 foobar

kubectl.txt
-----------
NAME                              READY   STATUS      RESTARTS   AGE
foobar-backend-7d8965b74b-wx76t   1/1     Running     0          2d20h

Notes:

  • the line number is not shown to keep the column layout aligned with the header
  • there is an additional line below the filename to show more clearly where the output starts, but it's only a suggestion
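The answers above can be approximated with head, tail and grep. This is only a sketch of the semantics as described at this point in the thread (the first N lines are not searched, and a file is shown only if something after them matches); head_filter is a hypothetical name, the pattern uses grep -E syntax, and the optional line of dashes is left out.

```shell
# head_filter N PATTERN FILE...
# Hypothetical helper, not part of ack. For each FILE that has a match for
# PATTERN after its first N lines, print the file name, the first N lines
# as-is, and then the matching lines. Files without a match are skipped.
head_filter() {
    n=$1
    pattern=$2
    shift 2
    for file in "$@"; do
        # Search only from line N+1 onwards; skip the file if nothing matches.
        body_matches=$(tail -n "+$((n + 1))" -- "$file" | grep -E -- "$pattern") || continue
        printf '%s\n' "$file"
        head -n "$n" -- "$file"
        printf '%s\n\n' "$body_matches"
    done
}
```

For the example above this would be called as head_filter 1 foobar *.txt.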

Also, back to my original question: Why use ack at all for this?

I use ack more often than grep, even though grep now has the -P option that lets you use Perl regexes. I prefer ack for several reasons:

  • Being able to put a .ackrc file in my projects (and home) to ignore node_modules, vendor, dist, etc.
  • Using predefined or custom file types in filtering
  • It feels faster than grep
  • Does recursive search by default
  • Ignoring certain files and directories, like .git, etc.
  • Is shorter to type
  • And probably many more that I can't remember now 😉

@petdance
Collaborator

I get all those reasons for using ack over grep (I've preached them :-)). I just meant in the case of filtering output from ps.

@alexm
Author

alexm commented Jun 16, 2022

In ps -a | head -n1 ; ps -a | grep expr the output from both ps commands could theoretically be different.

Then, I imagined myself adding a new option to ack (which would be easier for me than doing it for grep). I'm even willing to send a pull request if I find enough round tuits.

Other than these, I can't say there's any other particular reason to prefer ack over grep to filter ps output.
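One way to keep the two-step shape while avoiding the risk of differing outputs is to run the command once and reuse its captured output. A minimal sketch, assuming the output fits comfortably in a shell variable; header_and_matches is a hypothetical name and the pattern uses grep -E syntax.

```shell
# header_and_matches PATTERN COMMAND [ARG...]
# Hypothetical helper: runs COMMAND once, keeps its output in a variable,
# then prints the first line followed by the later lines matching PATTERN.
header_and_matches() {
    pattern=$1
    shift
    snapshot=$("$@")                                  # single run of the command
    printf '%s\n' "$snapshot" | head -n 1             # header line
    printf '%s\n' "$snapshot" | tail -n +2 | grep -E -- "$pattern"
}
```

For instance: header_and_matches cronolog ps -a.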

@petdance
Collaborator

petdance commented Jun 16, 2022

The lines of dashes you've shown would be a new feature, right?

Right now if you don't want the grouping/line numbers, you have -h. We don't yet have an option to just turn off line numbers, although it's a feature request that has been around a while and one I wouldn't be opposed to. See #142

I wouldn't want --head=1 to change any behaviors on how things get output. If you were doing an ack of multiple files and you wanted --head=1, you would probably also have to have a --no-line-number argument as well.

Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?

Nothing, --head=1 would mean that the first N lines are skipped from filtering

When you say "skipped from filtering" here, do you mean "skipped from being searched"?

If so, then I'm not sure I'm OK with that, but will think. If not, please say more about what you mean by "filtering"?

(And thanks for taking the time to work through these questions. This is the tough part of figuring out features.)

@petdance
Collaborator

petdance commented Jun 16, 2022

Some things I'm thinking about: You're talking about using this to show the first line of a stream because you know it's a command like ps and you want the headings. I see the use of this being broader than just that.

For example, I might go acking through a tree of source and do

ack salestax src/ --head=5

because it's helpful to see the first 5 lines of the file that I'm getting results for, even though they aren't a "heading" like in the ps example. Maybe my results look like:

whatever.py
1: # whatever.py
2: # This program does the dingdong doodle.
3: # Created by ....
4: ...
5: ...
78: salestax = calculate_tax()
168: print(salestax)

Having those first 5 lines helps give me context for the actual matches. And that said, I think that if "salestax" appears in the first 5 lines, then it should be highlighted like any other ack match.
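To make that output shape concrete, here is a hedged awk sketch, not ack's implementation: when a file contains a match anywhere (head lines included), it prints the first N lines and every later matching line, each prefixed with its line number. head_context is a hypothetical name, the pattern is an awk extended regex, and highlighting is left out.

```shell
# head_context N PATTERN FILE
# Hypothetical sketch: if FILE contains a match for PATTERN, print
# "lineno: text" for the first N lines and for every later matching line.
head_context() {
    awk -v n="$1" -v pat="$2" '
        # Buffer the head lines and remember whether any of them matched.
        NR <= n  { head[NR] = $0; if ($0 ~ pat) found = 1; next }
        $0 ~ pat {
            if (!flushed) {        # first match after the head: emit the head
                for (i = 1; i <= n && (i in head); i++) print i ": " head[i]
                flushed = 1
            }
            print NR ": " $0
        }
        END {
            # The only matches were inside the head lines: still show them.
            if (!flushed && found)
                for (i = 1; i <= n && (i in head); i++) print i ": " head[i]
        }
    ' "$3"
}
```

A call such as head_context 5 salestax whatever.py would produce output in the same layout as the listing above, minus the file name line.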

Another thought: How does --head=N interact with --output?

@alexm
Author

alexm commented Jun 17, 2022

(I use ps -a | perl -nlE 'say if 1..1 or /cronolog/' for such cases, but that's me.)

Wow! Didn't know that trick and I like it a lot 😄

I'm assuming that the if 1..1 uses $. implicitly, but I can't find where this is documented. Any pointers?

Thanks, @n1vux!

@n1vux
Contributor

n1vux commented Jun 17, 2022

@alexm, yes, the scalar range operator implicitly compares a constant integer operand against $. (the current input line number).
This goes back to the early days when Larry was blending the best of shell, libc, sed, and awk into one language, Perl 1 or 2ish, iirc.
Great for -e one-liners, a little too cryptic for a maintainable script and useless in a reusable module.

On the theory of making simple things simple and hard things easier, --head=9 is a good addition for ack .

(The Perl range op is more flexible: either operand can also be a regex, as in /^start\b/i .. /^end\b/i, or a logical expression, as in a() .. b(), meaning from the first line where a() is true to the first line where b() is true. And you can mix and match.
The Range op in list context is DWIMish magic for list constants.)

https://perldoc.perl.org/perlop#Range-Operators

@alexm
Author

alexm commented Jun 20, 2022

The lines of dashes you've shown would be a new feature, right?

Right, but it was just an example of what could be done to highlight the filename without breaking the column layout for commands like ps and the like.

I wouldn't want --head=1 to change any behaviors on how things get output. If you were doing an ack of multiple files and you wanted --head=1, you would probably also have to have a --no-line-number argument as well.

Makes sense. What I'm sensing is that --head=1 has its own place and that some other option --column-names (or whatever) could use --head=1 and -h et al. to achieve what I really was looking for in the beginning.

When you say "skipped from filtering" here, do you mean "skipped from being searched"?

Yes, I meant that, but after reading the case you made later about showing the first N lines of the source files that match a pattern, I guess it makes more sense to search there too.

@alexm
Author

alexm commented Jun 20, 2022

because it's helpful to see the first 5 lines of the file that I'm getting results for, even though they aren't a "heading" like in the ps example

Agreed.

Having those first 5 lines helps give me context for the actual matches. And that said, I think that if "salestax" appears in the first 5 lines, then it should be highlighted like any other ack match.

I changed my mind, you're right.

Another thought: How does --head=N interact with --output?

Good question. My feeling is that when somebody combines both options, it is because they expect both to be performed. Otherwise, one of them should be removed. Taking the example of the first 5 lines to add context:

  • show the first 5 lines with text highlighted if there's any match, and then
  • show the remaining matches as --output dictates.

@petdance
Collaborator

I just realized, maybe --output and --head should be mutually exclusive, and it solves that problem. If you're specifying your own output, then you probably don't want the --head option anyway.

@alexm
Author

alexm commented Jun 20, 2022

I just realized, maybe --output and --head should be mutually exclusive, and it solves that problem. If you're specifying your own output, then you probably don't want the --head option anyway.

That was my first thought 😄

Is there any other option that is mutually exclusive with --output? i.e. to be coherent regarding its intent.

@petdance
Collaborator

Yes, there are many mutually exclusive options. See the mutex_options function.

@n1vux
Contributor

n1vux commented Jul 18, 2022

I don't see a statement of the default N; maybe I missed it skimming through.
I would suggest that --head without a specific N (as opposed to, e.g., --head=7 for N=7) should mean N=1, as that's the single most common depth of headers.
(and of course --no-head is the default value.)
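The suggested defaults could be sketched as follows. This is hypothetical option handling written in plain shell for illustration, not ack's actual option parsing; parse_head is an invented name, and 0 stands for "feature off".

```shell
# parse_head ARG...
# Hypothetical sketch: prints the number of head lines implied by the
# arguments. 0 means the feature is off, which is the default.
parse_head() {
    head_lines=0
    for arg in "$@"; do
        case $arg in
            --head)    head_lines=1 ;;               # bare flag: N defaults to 1
            --head=*)  head_lines=${arg#--head=} ;;  # explicit N
            --no-head) head_lines=0 ;;               # explicit off
        esac
    done
    printf '%s\n' "$head_lines"
}
```

In this sketch the last relevant option wins, so parse_head --head=7 --no-head prints 0.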

@n1vux
Contributor

n1vux commented Jul 18, 2022

Can we set flags in .ackrc for only certain file types?
I could see value in type=csv → head=1 as a personal option.
I might even set it so, were it possible.
(Getting a line of bad data instead of a header would provide a nasty, implicit warning when a CSV does NOT have a header line!)
(It would be wrong as a drop-in replacement for grep, of course. GNU grep 3.7 does not have this feature. Yet.)
