Text manipulation - cut, awk and sed

Tokenize strings using cut

If you know the exact delimiter (e.g. a tab), you can use cut. The format is cut -ddelimiter -ffield_number filename, where delimiter is the single character that separates the fields, field_number is the number of the field you want (counting from 1) and filename is the name of the file containing the text. For example, to get the third field of some text whose fields are separated by commas, use cut -d, -f3. So if you've got the following in a file called file.txt:

   1,2,3,4,5
   6,7,8,9,10
   11,12,13,14,15
   

cut -d, -f3 file.txt will output:

   3
   8
   13
   

Of course you can also pipe text into the command. So echo "1,2,3" | cut -d, -f2 will output 2.
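
The -f option also accepts a comma-separated list of fields and ranges such as 2-4. The following is a minimal sketch that recreates the sample data in a temporary file; the variable names tmpfile, first_and_third and two_to_four are made up for illustration:

```shell
# Recreate the sample data from file.txt in a temporary file.
tmpfile=$(mktemp)
printf '1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n' > "$tmpfile"

# Fields 1 and 3 of every line; cut keeps the delimiter between them.
first_and_third=$(cut -d, -f1,3 "$tmpfile")
printf '%s\n' "$first_and_third"

# Fields 2 through 4 of every line.
two_to_four=$(cut -d, -f2-4 "$tmpfile")
printf '%s\n' "$two_to_four"

rm -f "$tmpfile"
```

The first command should print 1,3 then 6,8 then 11,13, and the second 2,3,4 then 7,8,9 then 12,13,14.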

Note that cut treats every single occurrence of the delimiter as a field boundary and does not collapse runs of delimiters. So you can't specify a single space as the delimiter, give cut text in which the fields are separated by 2 spaces, and expect it to pick out the fields accurately: each extra space produces an empty field. E.g. when the fields are separated by 2 spaces, echo "1  2  3" | cut -d' ' -f2 will output an empty line, and echo "1  2  3" | cut -d' ' -f3 will output 2.
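
One common workaround, shown here as a sketch rather than the only option, is to squeeze runs of spaces into a single space with tr -s before handing the text to cut. The variable names are illustrative only:

```shell
# Input with two spaces between the fields.
line='1  2  3'

# Without squeezing, field 2 is the empty string between the two spaces.
raw_field=$(printf '%s\n' "$line" | cut -d' ' -f2)

# tr -s ' ' collapses each run of spaces into one space,
# so cut sees exactly one delimiter between fields.
squeezed_field=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf 'raw: [%s]\n' "$raw_field"
printf 'squeezed: [%s]\n' "$squeezed_field"
```

Note that leading spaces on a line would still be squeezed to one space and so would still produce an empty first field.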

Tokenize strings using awk

If you've got some text where the fields are separated by a known character, but the number of such characters between fields is unknown, you can use awk. For example, the output of the df command separates the fields using spaces:

   Filesystem           1K-blocks      Used Available Use% Mounted on
   /dev/hda6             10080488   7564840   2003580  80% /
   none                    256964         0    256964   0% /dev/shm
   /dev/hda5             10231392   9026744   1204648  89% /mnt/store
   

Because the number of delimiting spaces differs on each line, and cannot be guaranteed, cut cannot be used. awk, by default, splits each line on runs of whitespace, so awk '{print $n}' can be used to output the nth field. E.g. df | awk '{print $3}' will output:

   Used
   7564840
   0
   9026744
   

See The awk programming language for some useful awk resources.
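
Two small variations on the awk usage above, as a sketch: NR > 1 skips the header line, and -F sets an explicit field separator. The df output is pasted into a variable (df_output, a name chosen for this example) so the result does not depend on the machine it runs on:

```shell
# Sample df output copied from the listing above.
df_output='Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda6             10080488   7564840   2003580  80% /
none                    256964         0    256964   0% /dev/shm
/dev/hda5             10231392   9026744   1204648  89% /mnt/store'

# NR > 1 skips the first (header) line; $3 is the "Used" column.
used=$(printf '%s\n' "$df_output" | awk 'NR > 1 {print $3}')
printf '%s\n' "$used"

# -F sets the field separator explicitly, here a comma.
second=$(echo "1,2,3" | awk -F, '{print $2}')
printf '%s\n' "$second"
```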

Remove a line from some text using sed

You can use the sed command for this. sed 'nd' will remove the nth line. So df | sed '2d' will remove the second line from the output of df given above, leaving:


   Filesystem           1K-blocks      Used Available Use% Mounted on
   none                    256964         0    256964   0% /dev/shm
   /dev/hda5             10231392   9026744   1204648  89% /mnt/store
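
The address in front of d can also be a range or the last line. A short sketch, again using the df output pasted into an illustrative df_output variable:

```shell
# Sample df output copied from the listing above.
df_output='Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda6             10080488   7564840   2003580  80% /
none                    256964         0    256964   0% /dev/shm
/dev/hda5             10231392   9026744   1204648  89% /mnt/store'

# 2,3d deletes lines 2 through 3, leaving the header and the last line.
without_2_3=$(printf '%s\n' "$df_output" | sed '2,3d')
printf '%s\n' "$without_2_3"

# $d deletes the last line; single quotes stop the shell expanding the $.
without_last=$(printf '%s\n' "$df_output" | sed '$d')
printf '%s\n' "$without_last"
```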
  

Escape spaces with sed

To escape all space characters in a file by preceding them with a backslash:

sed "s/ /\\\ /g" filename

where filename is the name of the file whose contents contain spaces. The result is written to stdout; the file itself is not modified.
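
A quick way to check the substitution is to run it on a throwaway file. This sketch uses single quotes and the g flag, so the shell leaves the backslashes alone and sed sees s/ /\\ /g; the temporary file and variable names are made up for the example:

```shell
# Create a throwaway file containing a line with spaces.
tmpfile=$(mktemp)
printf 'my file name.txt\n' > "$tmpfile"

# In the replacement, \\ is one literal backslash, followed by a space.
# The g flag applies the substitution to every space on each line.
escaped=$(sed 's/ /\\ /g' "$tmpfile")
printf '%s\n' "$escaped"

rm -f "$tmpfile"
```

This should print my\ file\ name.txt.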

Last modified: 11/05/07 13:41:02