Tokenize strings using cut
If you know the exact delimiter (e.g. a tab), you can use cut. The format is cut -ddelimiter -ffield_number filename, where delimiter is the single-character delimiter, field_number is the number of the field you want, and filename is the name of the file containing the text. For example, to get the third field in some text where the fields are separated by commas, use cut -d, -f3. So if you've got the following in a file called file.txt:
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
cut -d, -f3 file.txt will output:
3
8
13
Of course you can also pipe text into the command. So echo "1,2,3" | cut -d, -f2 will output 2.
Note that cut treats every single occurrence of the delimiter as a field boundary: the delimiter is always exactly one character, and consecutive delimiters are not merged. So you can't specify a single space as the delimiter, give cut a file in which the fields are separated by 2 spaces, and expect it to pick out the fields accurately. For example, when the fields are separated by 2 spaces, echo "1  2  3" | cut -d' ' -f2 will output an empty field (a blank line), and echo "1  2  3" | cut -d' ' -f3 will output 2.
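One common workaround is to squeeze each run of spaces down to a single space with tr -s before calling cut. The following is a minimal sketch, assuming the fields themselves never contain spaces and the line has no leading spaces; the variable names are made up for illustration:

```shell
#!/bin/sh
# Sample line with two spaces between fields (illustrative data).
line='1  2  3'

# Plain cut: the two consecutive spaces create an empty field,
# so field 2 comes back empty.
plain=$(printf '%s\n' "$line" | cut -d' ' -f2)

# tr -s ' ' collapses each run of spaces into one space before cut sees it.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf 'plain: [%s]\n' "$plain"
printf 'squeezed: [%s]\n' "$squeezed"
```

With the sample line, the first result is empty and the second is 2.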
Tokenize strings using awk
If you've got some text where the fields are separated by a known character, but the number of such characters between fields is unknown, you can use awk. For example, the output of the df command separates the fields using spaces:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda6             10080488   7564840   2003580  80% /
none                    256964         0    256964   0% /dev/shm
/dev/hda5             10231392   9026744   1204648  89% /mnt/store
Because the number of delimiting spaces differs on each line, and cannot be guaranteed, cut cannot be used. Instead, awk '{print $n}' can be used to output the nth field, because by default awk splits each line on runs of whitespace. For example, df | awk '{print $3}' will output:
Used
7564840
0
9026744
See The awk programming language for some useful awk resources.
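As a further illustration, awk can also skip the header line and print more than one field at a time. This is only a sketch: it feeds awk a copy of the sample df output from this page (stored in a hypothetical df_output variable) rather than running df, so the result is the same on any machine.

```shell
#!/bin/sh
# Sample df-style output, held in a variable so the sketch is reproducible.
df_output='Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda6   10080488 7564840   2003580  80% /
none          256964       0    256964   0% /dev/shm
/dev/hda5   10231392 9026744   1204648  89% /mnt/store'

# NR > 1 skips the header line; $1 and $3 are the first and third fields.
# The comma in print puts a single space between the two values.
used=$(printf '%s\n' "$df_output" | awk 'NR > 1 {print $1, $3}')

printf '%s\n' "$used"
```

For the sample data this prints each filesystem name followed by its Used value, one pair per line.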
Remove a line from some text using sed
You can use the sed command for this. sed 'nd' will remove the nth line. So df | sed '2d' will remove the second line from the output of df given above, leaving:
Filesystem           1K-blocks      Used Available Use% Mounted on
none                    256964         0    256964   0% /dev/shm
/dev/hda5             10231392   9026744   1204648  89% /mnt/store
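The same d command also accepts a range of lines or the special address $ for the last line. A short sketch using made-up four-line input (the variable names are illustrative only):

```shell
#!/bin/sh
# Illustrative four-line input.
text='one
two
three
four'

# Delete only the second line.
without_second=$(printf '%s\n' "$text" | sed '2d')

# Delete a range: lines 2 through 3.
without_range=$(printf '%s\n' "$text" | sed '2,3d')

# $ addresses the last line, whatever its number.
without_last=$(printf '%s\n' "$text" | sed '$d')

printf '%s\n---\n%s\n---\n%s\n' "$without_second" "$without_range" "$without_last"
```

Note that the single quotes around '$d' matter: they stop the shell from trying to expand $d as a variable.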
Escape spaces with sed
To escape all space characters in a file by preceding them with a backslash:
sed 's/ /\\ /g' filename
where filename is the name of the file that contains spaces. It writes to stdout.
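The same substitution works on piped text. A minimal sketch, using a hypothetical file name as the input string:

```shell
#!/bin/sh
# Hypothetical file name containing spaces.
name='my holiday photo.jpg'

# Single quotes keep the shell from touching the backslashes, so sed sees
# s/ /\\ /g: replace every space with a backslash followed by a space.
escaped=$(printf '%s\n' "$name" | sed 's/ /\\ /g')

printf '%s\n' "$escaped"
```

The g flag at the end of the expression makes sed replace every space on the line rather than only the first one.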
