
Tuesday, February 02, 2010

Slide Title Extractor Part 1 of 3

This is a three-part series about how I went about solving a very small computing problem.

I wanted to extract the text inside the \frametitle{} command in Beamer (a LaTeX class for making slides).

Example:


\frame{
\frametitle{Notes}
\begin{itemize}
\item Note 1
\end{itemize}
}

From this, I would pull out the title "Notes".

My first attempt was a bash script using sed, tr, and more sed.


#!/bin/bash

#remove comments and get mostly frametitles
sed -n -f ste.sed "$1" > "$1.1"

#remove line endings
tr "\n" " " < "$1.1" > "$1.2"

#make new line endings at each closing brace
tr "}" "\n" < "$1.2" > "$1.3"

#keep only the lines that still contain a \frametitle
sed -n '/\\frametitle{/p' "$1.3" > "$1.4"

#fix spacing
sed 's/[ ]* / /g' "$1.4" > "$1.5"

#remove everything up to and including \frametitle{ and end each title with a period
sed 's/^.*\\frametitle{//;s/$/./' "$1.5" > "$1.6"

#remove line endings
tr "\n" " " < "$1.6" > "$1.7"

cp "$1.7" "${1}_summary.txt"

#remove only the numbered temporary files
rm "$1".[1-7]


where ste.sed was:

/^%/d
/\\frametitle/{
N
s/.*\n}/ /
/\\frametitle.*}/p
}
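
A quick way to see what this first pass hands on to the rest of the script is to run it alone on the example frame. The sketch below assumes GNU sed; the scratch directory and the file name slides.tex are made up for illustration.

```shell
#!/bin/bash
# Recreate ste.sed and the example frame in a scratch directory.
# (The directory and the name slides.tex are hypothetical.)
workdir=$(mktemp -d)

cat > "$workdir/ste.sed" <<'EOF'
/^%/d
/\\frametitle/{
N
s/.*\n}/ /
/\\frametitle.*}/p
}
EOF

cat > "$workdir/slides.tex" <<'EOF'
\frame{
\frametitle{Notes}
\begin{itemize}
\item Note 1
\end{itemize}
}
EOF

# Run only the first sed pass and show what it keeps.
first_pass=$(sed -n -f "$workdir/ste.sed" "$workdir/slides.tex")
printf '%s\n' "$first_pass"

rm -r "$workdir"
```

As far as I can tell, this prints the \frametitle{Notes} line together with the \begin{itemize} line that N pulled in, which is why the later tr and sed steps are needed to throw the extra line away.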


I thought this approach was inelegant, so I decided to rewrite it. The rewrites are parts 2 and 3 of this series.

If anyone knows the regex to get it to work without all of my gymnastics, I would love to see it.
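
One possible starting point, offered only as a sketch: if each title sits on a single line, contains no nested braces, and there is at most one \frametitle per line, a single sed substitution may be enough. The function name extract_titles is made up for illustration, and multi-line titles would still need something more.

```shell
#!/bin/bash
# Hypothetical helper: print the text inside each \frametitle{...}, one per line.
# Assumptions: the whole title is on one line, it has no nested braces,
# and there is at most one \frametitle per line.
extract_titles() {
    # First expression: drop LaTeX comment lines.
    # Second expression: keep only the captured title text and print it.
    sed -n -e '/^[[:space:]]*%/d' \
           -e 's/.*\\frametitle{\([^}]*\)}.*/\1/p' "$@"
}

# Try it on the example frame.
printf '%s\n' '\frame{' '\frametitle{Notes}' '\begin{itemize}' \
    '\item Note 1' '\end{itemize}' '}' | extract_titles
```

Piping the result through sed 's/$/./' | tr '\n' ' ' should give roughly the same period-separated summary line as the script.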
