Align sequences and visually compare sequence alignment

ASPL User's Guide v 1.00
© 2024 by SetSphere.COM

ASPL Sequence Alignment Operations

Set operators whose mnemonic has the symbol % as its second character are called sequence operators; in ASPL these are termed seqops. Seqops do not return a value. They only display the alignment between datasets, which lets the user compare and analyze the data by observation.

ASPL adopts the symbol % as the sequence alignment symbol. Although this symbol is sometimes used as a quotient symbol in older mathematical notation, ASPL uses it as the sequence operator that aligns two sequences derived from their corresponding datasets. The symbol appears as the second character in every sequence operator. A sequence operator starts with one of the three characters f d g (denoting the subject on which the operation takes place), followed by the alignment symbol %, followed by a symbol denoting the type of operation, one of & U \

    +------ % is the sequence alignment symbol between two datasets
    |
    v
   f%&
   ^ ^
   | |
   | \___ operation is the intersect
   \_____ subject f is the element

Think of % as a circled dot above the solidus (/) symbol and another circled dot below it, where the upper dot corresponds to the first set and the lower dot corresponds to the second set. ASPL then displays the alignment with the elements of the first set on the upper line and the elements of the second set on the lower line.

These three-letter seqops are typically followed by two set variables, and they display the sequence alignment of the first two variables that follow them. When a seqop is issued by itself at the command prompt, with no operands, the operation is performed on the object at the top of the stack.

For example, to compare two directories:

aspl> ggdir(dir,/tmp/aa2)

aspl> ggdir(dir,/tmp/aa1)

aspl> ans

aspl> f%&

A three-letter seqop can be extended by suffixing it with one of the symbols & U \ to form a four-letter seqop. Four-letter seqops are called mediated-seqops and are followed by three set variables. The alignment is always performed on the first two datasets, after they have been mediated with the third dataset using the mediating operation (denoted by the last symbol). For example:

aspl> f%&U a1 a2 a3

aspl> f%&& a1 a2 a3

It is also possible to follow a three-letter seqop with three set variables, in which case the mediating operation is implied by the third character of the seqop. For example, f%& a1 a2 a3 is equivalent to f%&& a1 a2 a3, and f%U a1 a2 a3 is equivalent to f%UU a1 a2 a3.
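The composition rules for seqop mnemonics can be illustrated with a small Python sketch. This is not part of ASPL: the names SeqOp and parse_seqop are hypothetical, and the sketch only decodes a mnemonic into its parts (subject, operation, mediator) under the rules described in this section. It does not perform any alignment or mediation.

```python
from typing import NamedTuple, Optional

SUBJECTS = {"f", "d", "g"}       # subject characters listed in this guide
OPERATIONS = {"&", "U", "\\"}    # operation symbols listed in this guide


class SeqOp(NamedTuple):
    """Decoded parts of a seqop mnemonic (hypothetical structure)."""
    subject: str              # first character
    operation: str            # third character
    mediator: Optional[str]   # fourth character, explicit or implied


def parse_seqop(mnemonic: str, operand_count: int = 2) -> SeqOp:
    """Split a three- or four-character seqop into its parts.

    Assumption: a three-character seqop given three operands takes its
    mediating operation from its own third character, as the guide states.
    """
    if len(mnemonic) not in (3, 4):
        raise ValueError("a seqop has three or four characters")
    subject, align, operation = mnemonic[0], mnemonic[1], mnemonic[2]
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject {subject!r}")
    if align != "%":
        raise ValueError("the second character must be %")
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation {operation!r}")

    mediator = mnemonic[3] if len(mnemonic) == 4 else None
    if mediator is not None and mediator not in OPERATIONS:
        raise ValueError(f"unknown mediating operation {mediator!r}")
    if mediator is None and operand_count == 3:
        mediator = operation  # implied mediation
    return SeqOp(subject, operation, mediator)
```

Under these assumptions, parse_seqop("f%U", 3) and parse_seqop("f%UU", 3) decode to the same parts, which mirrors the equivalence stated above.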

Sequence alignment only works with the regular set variables. These are the variables that you can display using the v command.

Example 8.1.1

f%U to align two sequences based on the union of their elements

In this example we will compare the files in two directories by displaying their sequence alignment. We will start ASPL by loading the WS1 workspace, then we will issue the command f%U to display the alignment of the elements (which are the files) between the two set variables a1 and a2.

Operation 8.1.1

f%U to align two sequences based on the union of their elements

# aspl WS1
(start ASPL, loading the sample workspace WS1)

① aspl> sequencing
(show which sequence alignment algorithm is in use)

② aspl> sequencing ssa
(set the sequence alignment algorithm to sequence similarity analysis)

③ aspl> f%U a1 a2
(show the alignment of the elements between a1 and a2)

④ aspl> sequencing lcs
(set the sequence alignment algorithm to longest common subsequence)

⑤ aspl> f%U a1 a2
(show the alignment of the elements between a1 and a2)
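
To give a feel for what an lcs-style alignment produces, here is a minimal Python sketch of a textbook longest-common-subsequence alignment of two ordered lists of names, rendered as an upper and a lower line. It is an illustration only: ASPL's actual algorithm and output format are not documented in this section, and the function names, the sample file names, and the "-" gap marker are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def lcs_align(a: Sequence[str], b: Sequence[str]) -> List[Pair]:
    """Align two sequences on a longest common subsequence.

    Returns (upper, lower) pairs; None marks a gap on that line.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    pairs: List[Pair] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((a[i], b[j]))   # element present in both
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            pairs.append((a[i], None))   # only in the first sequence
            i += 1
        else:
            pairs.append((None, b[j]))   # only in the second sequence
            j += 1
    pairs.extend((x, None) for x in a[i:])
    pairs.extend((None, y) for y in b[j:])
    return pairs


def render(pairs: List[Pair], gap: str = "-") -> Tuple[str, str]:
    """Format the pairs as two lines of equal-width columns."""
    cells = [x for pair in pairs for x in pair if x is not None]
    width = max([len(c) for c in cells] + [len(gap)])
    upper = " ".join((x if x is not None else gap).ljust(width) for x, _ in pairs)
    lower = " ".join((y if y is not None else gap).ljust(width) for _, y in pairs)
    return upper, lower


if __name__ == "__main__":
    first = ["a.txt", "b.txt", "c.txt"]    # hypothetical file names
    second = ["a.txt", "c.txt", "d.txt"]
    for line in render(lcs_align(first, second)):
        print(line)
```

With the sample lists, a.txt and c.txt line up in both rows, b.txt is paired with a gap on the lower line, and d.txt is paired with a gap on the upper line.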