Introduction

Processing information always starts with getting the input data. That usually comes down to exploring some directory (or repository) containing different sets of information, and then “doing something” with it. Those repositories tend to be massive, so the most effective approach is to automate the process. We don’t need superpowers to achieve this; a few pointers are enough to make the goal reachable.

As an example of this process, I’ll explain how to recover all the content from a tree of folders containing PDF files in a very simple way. In order to sort this out, I’ll divide the problem into three main issues:

  • Getting the file content of a single PDF file.
  • Getting all the folders and sub-folders paths.
  • Creating a loop so we retrieve the text from all the PDF files.

Process

1. Processing the PDFs

One of the usual file formats we get is the Adobe Acrobat PDF (Portable Document Format). This format was created with the intent of being independent from application software, hardware and operating system, by storing not only the text and graphics, but the whole information about the layout and fonts. There are multiple readers, such as the Adobe Acrobat Reader, Evince or Foxit Reader.

Of course, not everything is so pretty: some Adobe PDF files contain XFDF (XML Forms Data Format) and are therefore only properly rendered in proprietary Adobe programs for now. I have faith in the open source community eventually solving this issue, as it defeats the purpose of the format.

I would also like to point out that, while PDF may be a standard for documents meant to be printed, it is not “screen-friendly”: PDFs cannot adapt properly to be read comfortably on e-readers, tablets and smartphones, as they are not able to adjust the content size to the screen. My advice is that, if you are publishing a document, you may want to consider the EPUB format for the digital edition.

Every document processing application starts with getting some source data, and in this case we are going to use the Apache PDFBox package, an open source library which provides several different functions, such as:

  • Create new PDF documents.
  • Extract the contents from existing documents.
  • Manipulate a given document.
  • Digitally sign PDF files.
  • Print a PDF file using the standard Java printing API.
  • Save PDFs as image files, such as PNG or JPEG.
  • Validate PDF files against the PDF/A-1b standard.
  • Encrypt/decrypt a PDF document.

In this example I am only going to work with plain text, as this is an excerpt from a program where I intended to make the text indexable in order to create search relationships between different documents; bear in mind that PDFBox can do much more than that.

So let’s get down to business: if we are using Maven, the very first step is adding the required dependency to the pom.xml file, so we can get the library.

❕This was the stable version when the post was originally written.

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.6</version>
</dependency>

Now we can write a very short and simple snippet that reads the text content from a PDF file and stores it in a String, so we can do the heavy processing work afterwards, such as using Lucene to index said content and create search functions that improve access to the information.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfReader {

    public static String extractTextFromPdf(String path) throws IOException {
        System.out.println("Parsing a PDF");

        File f = new File(path);
        if (!f.isFile()) {
            System.err.println("The file " + path + " doesn't exist.");
            return null;
        }

        // In PDFBox 2.x, PDDocument.load replaces the manual
        // PDFParser/COSDocument handling from the 1.x API
        PDDocument pdDoc = PDDocument.load(f);
        try {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            return pdfStripper.getText(pdDoc);
        } finally {
            // always release the document resources
            pdDoc.close();
        }
    }

}

2. The Spider and its web

When we go through a folder or directory, we may find not only files but also sub-folders, which may in turn contain more files or more subdirectories, and so on. As a consequence, we need a way to walk this whole hierarchical structure, using a recursive function. This idea is the core of the “Spider” which will crawl the “web” of files:

+ Directory_1
|-- File_1.pdf
|-- File_2.pdf
|-+ Directory_2
  |-- File_3.pdf
  |-- File_4.pdf

The “Spider” will detect all the files (File_1.pdf, File_2.pdf, File_3.pdf and File_4.pdf) thanks to a recursive structure, instead of getting stuck with only the first level of the tree (File_1.pdf and File_2.pdf).

This can be summarized in the following algorithm structure:

1.- Start the loop over each element
2.- Is it a base case?
    – Yes: solve the base case
    – No: call the recursive function
3.- End the loop
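Before touching the file system, that skeleton can be sketched over a simple in-memory tree; the Node class below is just a hypothetical stand-in for folders and files, so the base case and the recursive case are easy to spot:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the base-case / recursive-case skeleton, applied to a
// hypothetical in-memory Node tree instead of real folders.
public class RecursionSketch {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();
        Node(String name) { this.name = name; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Walks the tree: leaves are the base case, folders trigger recursion
    public static List<String> collectLeaves(Node node) {
        List<String> result = new ArrayList<String>();
        for (Node child : node.children) {
            if (child.isLeaf()) {
                result.add(child.name);              // base case: solve it
            } else {
                result.addAll(collectLeaves(child)); // recursive call
            }
        }
        return result;
    }

    // Builds the Directory_1/Directory_2 example from the diagram above
    public static Node exampleTree() {
        Node root = new Node("Directory_1");
        Node sub = new Node("Directory_2");
        root.children.add(new Node("File_1.pdf"));
        root.children.add(new Node("File_2.pdf"));
        root.children.add(sub);
        sub.children.add(new Node("File_3.pdf"));
        sub.children.add(new Node("File_4.pdf"));
        return root;
    }

    public static void main(String[] args) {
        // prints all four file names, at both levels of the tree
        System.out.println(collectLeaves(exampleTree()));
    }
}
```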

We can achieve this in Java by relying only on the java.io and java.util libraries, which are included in every Java Development Kit (JDK).

import java.io.File;
import java.util.LinkedList;
import java.util.List;

public class Spider {

    /*
     * Lists the files only on that level
     */
    public List<String> listPaths(String path) {
        File f = new File(path);

        List<String> l = new LinkedList<String>();
        if (f.exists()) {
            File[] fileArray = f.listFiles();
            for (int i = 0; i < fileArray.length; i++) {
                l.add(fileArray[i].getAbsolutePath());
            }
        } else {
            System.err.println("The path " + path + " is incorrect");
        }
        return l;
    }

    /*
     * Also lists the sub-directories content
     */
    public List<String> listPathsRecursive(String path) {
        File f = new File(path);
        List<String> l = new LinkedList<String>();
        if (f.exists()) {
            File[] fileArray = f.listFiles();
            for (int i = 0; i < fileArray.length; i++) {
                // check the sub-directories
                if (fileArray[i].isDirectory()) {
                    List<String> l1 = listPathsRecursive(
                            fileArray[i].getAbsolutePath());
                    l.addAll(l1);
                } else {
                    // isValidFormat will check the file extensions
                    // e.g. fileNameString.endsWith(".pdf")
                    if (ClasificadorDeFicheros.isValidFormat(fileArray[i]
                            .getAbsolutePath())) {
                        l.add(fileArray[i].getAbsolutePath());
                    }
                }
            }
        } else {
            System.err.println("The path " + path + " is incorrect");
        }
        return l;
    }
}
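As a quick sanity check, here is a self-contained variant of the recursive listing that builds the example tree from the diagram in a temporary folder; an inline endsWith(".pdf") check stands in for the ClasificadorDeFicheros helper:

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class SpiderDemo {

    // Same recursive idea as listPathsRecursive, with the extension
    // check inlined so the sketch has no external dependencies
    public static List<String> listPdfPaths(File dir) {
        List<String> result = new LinkedList<String>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return result; // not a directory, or an I/O error
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                result.addAll(listPdfPaths(entry)); // recursive case
            } else if (entry.getName().endsWith(".pdf")) {
                result.add(entry.getAbsolutePath()); // base case
            }
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            // build the Directory_1/Directory_2 tree in a temp folder
            File root = new File(System.getProperty("java.io.tmpdir"),
                    "spider_demo_" + System.nanoTime());
            File sub = new File(root, "Directory_2");
            if (!sub.mkdirs()) {
                throw new IOException("could not create " + sub);
            }
            new File(root, "File_1.pdf").createNewFile();
            new File(root, "File_2.pdf").createNewFile();
            new File(sub, "File_3.pdf").createNewFile();
            new File(sub, "File_4.pdf").createNewFile();

            List<String> paths = listPdfPaths(root);
            System.out.println("Found " + paths.size() + " PDF files"); // 4
            if (paths.size() != 4) {
                throw new AssertionError("expected 4 PDFs, got " + paths.size());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```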

3. Loop using the methods from steps 1 and 2

Finally we get to the easiest part: we just need some basic Java iteration to finish our epic SpiderPdf.java. We get all the file paths with the method from the second step, and process each of them by invoking the code written in the first step.

// excerpt from MapFolderOfPdfs.java
Iterator<String> it = spider.listPathsRecursive(mainFolderPath).iterator();
Map<String, String> mapContent = new HashMap<String, String>();
while (it.hasNext()) {
    String currentPath = it.next();
    mapContent.put(currentPath, PdfReader.extractTextFromPdf(currentPath));
}

❗️ I would recommend working with iterators when handling collections: you may later change the underlying structure to a new one which optimizes access or storage time, and you will not have to rewrite that piece of code. A HashMap is probably one of the best structures to access information we want to classify, but it will not get the best time to store the content. If you get to work with an increasing amount of information, you may consider a TreeMap.
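As a sketch of that advice: by programming against the Map interface, we can swap HashMap for TreeMap without touching the indexing code (the method and keys below are made up for the example):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoiceDemo {

    // By depending only on the Map interface, the concrete
    // implementation can be swapped without rewriting this code.
    public static Map<String, String> index(Map<String, String> target) {
        target.put("doc_b.pdf", "content of b");
        target.put("doc_a.pdf", "content of a");
        return target;
    }

    public static void main(String[] args) {
        // fast inserts/lookups, no ordering guarantees
        Map<String, String> fast = index(new HashMap<String, String>());
        // sorted keys, at the cost of O(log n) operations
        Map<String, String> sorted = index(new TreeMap<String, String>());
        System.out.println(sorted.keySet().iterator().next()); // doc_a.pdf
        System.out.println(fast.size()); // 2
    }
}
```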

Introduction

Most programmers know the concept of unit tests and have dabbled with the Junit framework while learning the beauty of automated tests. However, few have met Mockito, which is in fact one of the best frameworks for unit testing.

We may use a layered structure on a server to split the elements according to their functionality, and following that train of thought we can modularize the code in logical layers. That’s where Mockito comes in.

graph LR;
  A[Service layer];
  B[Business layer];
  C[Data access layer];
  D((Database));
  A-->B;
  B-->C;
  C-->D;

By using a “mockers system” we can substitute a whole dependent component class with a behavioural emulation, following the behaviour-driven development paradigm. The best way to explain this task is with an example, so let’s suppose we have an interface called IDocumentAccessDao, implemented by the class DocumentAccessDao. This data access object performs some database accesses using Jdbc, and while we intend to create tests covering all of its instructions, it makes no sense to actually connect to the database: it may not be available and would make our tests fail (and that would also be an integration test, not a unit test).

Process: How do we drink this?

1. Setting up the Maven dependencies

The first step is getting the testing dependencies into our project, and that’s something we can do via Maven by adding them to the pom.xml file.

❕These were the stable versions when the post was originally written

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <version>1.9.5</version>
  <scope>test</scope>
</dependency>

2. Mocking the Data Access class

❗️If we are using a component which may also be used in other classes (e.g. JDBC or JPA implementations to handle the connections to databases), it is a good idea to apply inheritance to those components, as they are highly reusable.

@Local
public interface IDocumentAccessDao extends IJdbcTemplateDao {
    List<Document> getCollaborateDocumentStatus() throws GenericDAOException;
}

Let’s start by creating the test class, which we will call DocumentAccessDaoTest. Don’t forget that if you are using Spring, you may want to load the mocks from the context.xml file.

<!-- DATAACCESS-DAO -->
<context:component-scan base-package="org.mydomain.app.dataaccess.dao.impl" />
<!-- DATAACCESS-DAO-MOCKITO -->
<context:component-scan base-package="org.mydomain.app.dataaccess.mocked" />

Now let’s check the class we are going to test:

@Repository
@Qualifier("DocumentAccessDAO")
@Stateless(name = "DocumentAccessDAO")
@TransactionAttribute(TransactionAttributeType.REQUIRED)
@Interceptors({ SpringBeanAutowiringInterceptor.class })
public class DocumentAccessDAO extends AbstractJdbcTemplateDAO implements
        IDocumentAccessDAO {

    @EJB
    private DocumentDAO documentDAO;
    private DocumentAccessRowMapper dmRowMapper = new DocumentAccessRowMapper();
    private DocumentAccessPreparedStatementSetter dPreparedStatementSetter
            = new DocumentAccessPreparedStatementSetter();

    @Override
    public List<Document> getCollaborateDocumentStatus() throws GenericDAOException {
        List<Document> listDocs = new LinkedList<Document>();
        try {
            List<Map<String, Object>> resultSet = this.jdbcTemplate.queryForList(
                    SqlDocumentManager.COLL_DOCUMENT_STATUS);
            if (!CollectionUtils.isEmpty(resultSet)) {
                for (Map<String, Object> data : resultSet) {
                    String docId = String.valueOf(data.get("DOCID"));
                    int version = documentDAO.getLastDocumentVersion(docId);
                    Document document = documentDAO.getDocumentVersion(docId, version, null);
                    listDocs.add(document);
                }
            }
        } catch (DataAccessException e) {
            throw new GenericDAOException(e);
        }
        return listDocs;
    }
}

We can see that it calls DocumentDAO and JdbcTemplate methods, so we need to mock those calls to avoid running code from other classes. Therefore, we will use the following 3 attributes in our DocumentAccessDAOTest class:

  • documentDAO: the entity we will test.
  • jdbcTemplate: the mocked database connection.
  • documentDAOMock: we intend to execute only the code in DocumentAccessDAO, so we will simulate this dependency by returning default dummy values for every method invoked on this object.

The code in initMock will follow this structure:

  • Initialize the mocks: we need to define the results expected from the different calls to mocked objects. The syntax for these methods looks like initMockForMethod(inputParameters, resultExpected), and will be detailed later.
  • Call the method we want to test.
  • Check that the result obtained is the one we expected by using assert instructions. If we expect an exception, we should use the “expected” annotation.
@TransactionAttribute(TransactionAttributeType.REQUIRED)
@Interceptors(SpringBeanAutowiringInterceptor.class)
@ContextConfiguration(locations = "/appDao-testConfiguration.xml")
@RunWith(SpringJUnit4ClassRunner.class)
public class DocumentAccessDAOTest {
    // all the mocks will be injected into this instance
    @InjectMocks
    private DocumentAccessDAO documentDAO;
    // initialize the mocks via annotation
    @Mock
    private JdbcTemplate jdbcTemplate;
    @Mock
    private DocumentDAO documentDAOMock;

    @Before
    public void initMock() {
        // initialize generic behaviour
        Mockito.when(jdbcTemplate.queryForList(Matchers.anyString()))
                .thenReturn(createResultList());
        Mockito.when(documentDAOMock.getLastDocumentVersion(Matchers.anyString()))
                .thenReturn(1);
        Mockito.when(documentDAOMock.getDocumentVersion(Matchers.anyString(),
                Matchers.anyInt(), Matchers.anyString())).thenReturn(
                createDummyDocument());
    }

    @Test
    public void getCollaborateDocumentStatusReturnsValidResultExpected()
            throws GenericDAOException {
        // method to call
        List<Document> listDocument = documentDAO.getCollaborateDocumentStatus();
        // check result
        Assert.assertTrue(!listDocument.isEmpty() && listDocument.size() == 4
                && listDocument.get(0).getDocId().equals(MockedDocumentValues.MY_DOCUMENT_ID));
    }

    // the exception is managed through the annotation
    @Test(expected = GenericDAOException.class)
    public void getCollaborateDocumentStatusThrowsExceptionExpected()
            throws GenericDAOException {
        // initialise non-generic mocked methods for this test
        Mockito.when(jdbcTemplate.queryForList(Matchers.anyString()))
                .thenThrow(new RecoverableDataAccessException(MockedValues
                        .GENERIC_DAO_EXCEPTION_TEXT));
        // method to call
        documentDAO.getCollaborateDocumentStatus();
    }

    private Document createDummyDocument() {
        Document document = new Document();
        document.setVersion(1);
        document.setDocId(MockedDocumentValues.MY_DOCUMENT_ID);
        return document;
    }
}

As you can see, we use the class MockedDocumentValues.java to generate the dummy values for some parameters. This class belongs to a set of common classes named Mocked*Values in the Junit auxiliary project, which avoids duplicated values among the test cases.

Cookbook for more complex cases

I’ll do a quick syntax rundown of all the odd situations I have found:

1. How to mock a standard method

The simplest version of the syntax would be:

  • Mockito.when(method).thenReturn(valueExpected)

Mockito.when(documentDAOMock.getLastDocumentVersion(sql)).thenReturn(1);

2. Throwing an exception

If the method returns a value the syntax would be:

  • Mockito.when(method).thenThrow(new Exception())

Mockito.when(this.template.update(sql, status, docId))
    .thenThrow(new RecoverableDataAccessException(ERROR_CONNECTION_GENERATOR_STRING));

However, if the method returns void, the syntax is different, so be careful with the parentheses in the last part of the statement:

  • Mockito.doThrow(Exception.class).when(instance).method(parameter)

Mockito.doThrow(IOException.class).when(this.em)
    .persist(Matchers.any(Object.class));

3. Using matchers

  • When we want to return a value independently of the parameters’ content, we can use Matchers.any(Object.class) (where Object may be any custom class we prefer, or if we use one of the classical Java types we may use their own methods: anyString(), anyInt(), anyList() …).
    Mockito.when(documentDAOMock.getLastDocumentVersion(
        Matchers.anyString())).thenReturn(1);
  • If we want to do something similar, mixing parameters whose content we don’t mind with other values that are important, we should combine Matchers.any(Object.class) and Matchers.eq(instance).
    Mockito.when(this.template.update(
        Matchers.eq(SqlDocumentManager.INSERT_DOC_AUTH_ACTION),
        Matchers.any(PreparedStatementSetter.class))).thenThrow(
            new RecoverableDataAccessException(MockedValues.GENERIC_DAO_EXCEPTION_TEXT));
  • Another useful method is Matchers.isA(class). When we have a series of em.persist(object) calls and have to find which one of them we actually need, we can determine it by pointing out the class of the instance it belongs to.
    public <T> void initMockForFindFails(Class<T> entityClass, Object primaryKey) {
        Mockito.when(this.em.find(
            Matchers.isA(InvalidPersistanceClass.class),
            Matchers.eq(primaryKey))).thenThrow(new NoResultException());
    }

4. Mocking a procedure with possible input/output parameters (persist method)

Sometimes, we have to check the new primary key of an Object after it has been inserted on a database via entityManager.persist(instanceObject). When this happens, we have to mock the method to simulate the answer received, as we do in this example.

/**
 * Mocks the update done when persisting a LegalDoc in the database
 */
public void initMockForPersistLegalDoc() {
    Mockito.doAnswer(new AssignIdToLegalDocAnswer(LEGAL_ID)).when(em).persist(
            Matchers.any(LegalDoc.class));
}

private class AssignIdToLegalDocAnswer implements Answer<Void> {

    private int legalDocId;

    public AssignIdToLegalDocAnswer(int legalDocId) {
        this.legalDocId = legalDocId;
    }

    @Override
    public Void answer(InvocationOnMock invocation) throws Throwable {
        LegalDoc legalDoc = (LegalDoc) invocation.getArguments()[0];
        legalDoc.setLegalDocId(legalDocId);
        return null;
    }

}

Another complex example using the doAnswer method: through an answer we can define “on the fly” not only changes to the output or return statement, but also the behaviour of input/output parameters.

public void initMockForMyProcedure(MyInputOutputObject object1) {
    Mockito.doAnswer(new Answer<MyCustomOutputObject>() {
        @Override
        public MyCustomOutputObject answer(final InvocationOnMock invocation)
                throws Throwable {
            // the original argument may be changed, only for this function
            final MyInputOutputObject originalArgument =
                    (MyInputOutputObject) invocation.getArguments()[0];
            // we define the output parameter value here
            final MyCustomOutputObject returnedValue = new MyCustomOutputObject();
            returnedValue.setValueOutput(new MyCustomOutput());

            return returnedValue;
        }
    }).when(myService).myProcedure(Matchers.any(MyInputOutputObject.class));
}

5. Mocking a JPA query-response method as a single method

This avoids problems when several pairs of “named queries”/“getResults” are used in a single method, so the results of each one of them don’t get mixed.

public <T> void initMockForCreateNamedQueryGetSingleResult(String sqlQuery,
        T returnedObject, boolean shouldFail) {
    Query mockedQuery = Mockito.mock(Query.class);
    if (shouldFail) {
        Mockito.when(mockedQuery.getSingleResult()).thenThrow(new NoResultException());
    } else {
        Mockito.when(mockedQuery.getSingleResult()).thenReturn(returnedObject);
    }
    Mockito.when(this.em.createNamedQuery(sqlQuery)).thenReturn(mockedQuery);
}

public <T> void initMockForCreateNamedQueryGetResultList(String sqlQuery,
        List<T> returnedObject, boolean shouldFail) {
    Query mockedQuery = Mockito.mock(Query.class);
    if (shouldFail) {
        Mockito.when(mockedQuery.getResultList()).thenThrow(new NoResultException());
    } else {
        Mockito.when(mockedQuery.getResultList()).thenReturn(returnedObject);
    }
    Mockito.when(this.em.createNamedQuery(sqlQuery)).thenReturn(mockedQuery);
}

6. Mocking an abstract class

Abstract classes can be instantiated like normal ones by mocking them with CALLS_REAL_METHODS.

public void initMockers() {
    dao = Mockito.mock(AbstractDocumentDAOImpl.class, Mockito.CALLS_REAL_METHODS);
    dao.setEntityManager(jpaMocker.getEntityManager());
}

7. Mocking a ‘?’ parameter

These should be mocked with the doReturn instruction, similarly to throwing exceptions from methods which don’t have return statements.

public void initMockForGetMap(Map<String, ?> expectedValue) {
    Mockito.doReturn(expectedValue).when(getter).getMap(Matchers.anyString(),
            Matchers.anyString());
}

8. Mocking an ‘Object…’ parameter
These are called varargs parameters. E.g. to mock something like JdbcTemplate.queryForList(String sql, Object… args), we need to use Matchers.<Object>anyVararg().

public void initMockForQueryForList(List<String> expectedValue) {
    Mockito.when(this.template.queryForList(Matchers.anyString(),
            Matchers.<Object>anyVararg())).thenReturn(expectedValue);
}

Mockito limitations

  1. Mockito can’t mock final classes
  2. Mockito can’t mock static methods
  3. Mockito can’t mock final methods

In case you need to mock legacy code containing any of these issues, you should use PowerMock, taking into account that not all releases of Mockito are totally compatible with PowerMock.

❗️ When there is an evil static class in your application and you can’t get rid of it without messing up half the code base, you may consider using a singleton pattern to avoid this issue.

public final class SendMailHelper {
    // this way we make sure we have only one instance
    private static SendMailHelper instance;

    // no one external can create a new instance
    private SendMailHelper() {
    }

    // we control the instance creation here
    public static SendMailHelper getInstance() {
        if (instance == null) {
            instance = new SendMailHelper();
        }
        return instance;
    }

    // just in case we need to set a mocker
    public void setMailHelper(SendMailHelper helper) {
        instance = helper;
    }

}

Then, in the classes that used to call SendMailHelper.method(), we add an attribute declaration, and when needed we can set it for the tests (in the initMock() method).

SendMailHelper sendMailHelper = SendMailHelper.getInstance();
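To illustrate the injection trick, here is a trimmed, hypothetical re-run of the pattern; final is dropped on the nested class so a handwritten fake can play the role of the Mockito mock a real test would pass to the setter:

```java
public class SingletonDemo {

    // same shape as SendMailHelper above, trimmed to the essentials
    static class MailHelper {
        private static MailHelper instance;
        private MailHelper() { }
        public static MailHelper getInstance() {
            if (instance == null) {
                instance = new MailHelper();
            }
            return instance;
        }
        // the test hook: lets a test replace the shared instance
        public void setMailHelper(MailHelper helper) {
            instance = helper;
        }
        public String send() { return "real mail sent"; }
    }

    // the fake we inject instead of the real helper
    static class FakeMailHelper extends MailHelper {
        @Override
        public String send() { return "fake mail recorded"; }
    }

    public static void main(String[] args) {
        MailHelper real = MailHelper.getInstance();
        real.setMailHelper(new FakeMailHelper());
        // every later caller now receives the fake
        System.out.println(MailHelper.getInstance().send()); // fake mail recorded
    }
}
```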

Advice about the best Junit practices

1. Test only one code unit at a time
When we try to test a unit, it may have multiple use cases. We should always test each use case in a separate test case. For example, if we are writing a test case for a function which takes two parameters and should return a value after doing some processing, the different use cases might be:

  • First parameter can be null. It should throw an InvalidParameterException.
  • Second parameter can be null. It should throw an InvalidParameterException.
  • Both can be null. It should throw an InvalidParameterException.
  • Finally, test the valid output of function. It should return valid predetermined output.

This helps when you make code changes or refactor: running the test cases should be enough to check that the functionality is not broken. Also, if you change any behaviour, you will need to change some test cases.
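A minimal sketch of that split, using a hypothetical join function; the methods are plain Java instead of @Test methods so the example runs without JUnit on the classpath, and IllegalArgumentException stands in for the InvalidParameterException mentioned above:

```java
public class OneCaseAtATime {

    // hypothetical unit under test: joins two non-null strings
    public static String join(String a, String b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("parameters must not be null");
        }
        return a + b;
    }

    // In a real project each of these would be a separate @Test method.
    static void firstParamNullShouldThrow() {
        try {
            join(null, "x");
            throw new AssertionError("expected an exception");
        } catch (IllegalArgumentException expected) { }
    }

    static void secondParamNullShouldThrow() {
        try {
            join("x", null);
            throw new AssertionError("expected an exception");
        } catch (IllegalArgumentException expected) { }
    }

    static void bothParamsNullShouldThrow() {
        try {
            join(null, null);
            throw new AssertionError("expected an exception");
        } catch (IllegalArgumentException expected) { }
    }

    static void validInputShouldPass() {
        if (!join("foo", "bar").equals("foobar")) {
            throw new AssertionError("wrong output");
        }
    }

    public static void main(String[] args) {
        firstParamNullShouldThrow();
        secondParamNullShouldThrow();
        bothParamsNullShouldThrow();
        validInputShouldPass();
        System.out.println("all cases pass");
    }
}
```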

2. Make each test independent to all the others
Don’t create a chain of unit test cases. It will prevent you from identifying the root cause of test failures, and you will have to spend time debugging the code. Also, it creates dependency: if you have to change one test case, then you need to make changes in multiple test cases unnecessarily.

3. Mock out all external services
Otherwise, the behaviour of those external services overlaps multiple tests, and different unit tests can influence each other’s results.

We have to be sure each test resets the relevant statics to a known state before it runs. We should avoid dependencies between tests and systems, so running them in a different order won’t affect the outcome.

4. Name your unit tests clearly and consistently
This is the most important point to keep in mind. We must name our test cases according to what they actually do and test. A naming convention based on class and method names is never a good idea: every time you change a method or class name, you will end up updating a lot of test cases as well.

But if our test case names are logical, i.e. based on operations, you will need almost no modification, because the application logic will most likely remain the same.

E.g. Test case names should be like (supposing EmployeeTest is our Junit class):

EmployeeTest.createWithNullIdShouldThrowException();
EmployeeTest.createWithNegativeIdShouldThrowException();
EmployeeTest.createWithDuplicateIdShouldThrowException();
EmployeeTest.createWithValidIdShouldPass();

5. Aim for each unit test method to perform exactly one assertion
We should try to test only one thing per test case, so use one single assertion per test case. This way, if a test case fails, you know exactly what went wrong.

6. Create unit tests that target exceptions
For test cases which expect exceptions to be thrown from the application, use the “expected” attribute. Try to avoid catching the exception in a catch block and using the fail or assert methods to conclude these tests.

7. Do not print anything out in unit tests
If you are correctly following all the guidelines, then you will never need to add any print statement in your test cases. If you feel like you need one, revisit your test case(s).

8. Extend from generic classes to avoid rewriting code
Use generic abstract classes (e.g. JdbcTemplateDAO and JpaDAO) as much as you can when you are mocking database connections.

9. Check that the mocked values you are going to create don’t exist already
When mocking values related to the most used entities, check that they don’t already exist in the auxiliary classes.

10. Create a Junit suite when testing classes which implement more than one interface
Our test design is interface oriented, so when a class implements more than one interface, we can create suites in order to see the coverage more easily, as in this example using the @Suite.SuiteClasses annotation.

@RunWith(Suite.class)
@Suite.SuiteClasses({ DocumentIDocumentLocalTest.class, DocumentIDocumentTest.class })
public class DocumentBeanTestSuite {
}

Definition

The Open Systems Interconnection model (OSI model) is a conceptual model that characterises and standardises the communication functions of a telecommunication or computing system without regard to its underlying internal structure and technology.

Number  Name                Protocol data unit (PDU)
7       Application Layer   Data
6       Presentation Layer  Data
5       Session Layer       Data
4       Transport Layer     Segment, Datagram
3       Network Layer       Packet
2       Data Link Layer     Frame
1       Physical Layer      Symbol

Abstract Layers

  • Layer 1: Physical Layer
    • unstructured raw data between a device and a physical transmission medium
    • converts the digital bits into electrical, radio, or optical signals
  • Layer 2: Data Link Layer
    • provides node-to-node data transfer
    • detects and possibly corrects errors that may occur in the physical layer
    • defines the protocol to establish and terminate a connection between 2 physically connected devices
    • defines the protocol for flow control between them
  • Layer 3: Network Layer
    • provides the functional and procedural means of transferring variable length data sequences (called packets) from one node to another connected in “different networks”
  • Layer 4: Transport Layer
    • provides the functional and procedural means of transferring variable-length data sequences from a source to a destination host, while maintaining the quality of service functions.
  • Layer 5: Session Layer
    • controls the dialogues (connections) between computers
    • establishes, manages and terminates the connections between the local and remote application
  • Layer 6: Presentation Layer
    • establishes context between application-layer entities, in which the application-layer entities may use different syntax and semantics if the presentation service provides a mapping between them
  • Layer 7: Application Layer
    • layer closest to the end user
    • interacts with software applications that implement a communicating component

Configure tooling

Configure user information for all local repositories

  • Set the name you want attached to your commit transactions
    git config --global user.name "[name]"
  • Set the email you want attached to your commit transactions
    git config --global user.email "[email address]"
  • Enable helpful colorization of command line output
    git config --global color.ui auto

Create repositories

Start a new repository or obtain one from an existing URL

  • Creates a new local repository with the specified name
    git init [project-name]
  • Downloads a project and its entire version history
    git clone [url]

Make changes

Review edits and craft a commit transaction

  • Lists all new or modified files to be committed
    git status
  • Snapshots the file in preparation for versioning
    git add [file]
  • Unstages the file, but preserves its contents
    git reset [file]
  • Shows file differences not yet staged
    git diff
  • Shows file differences between staging and the last file version
    git diff --staged
  • Records file snapshots permanently in version history
    git commit -m "[descriptive message]"
  • Adds more changes and replaces the commit message
    git commit --amend -m "[new descriptive message]"
  • Reverts a commit in a new commit
    git revert [commit]

Group changes

Name a series of commits and combine completed efforts

  • Lists all local branches in the current repository
    git branch
  • Creates a new branch
    git branch [branch-name]
  • Switches to the specified branch and updates the working directory
    git checkout [branch-name]
  • Combines the specified branch’s history into the current branch
    git merge [branch]
  • Deletes the specified branch
    git branch -d [branch-name]

Synchronize changes

Register a repository bookmark and exchange version history

  • Downloads all history from the repository bookmark
    git fetch [bookmark]
  • Combines bookmark’s branch into current local branch
    git merge [bookmark]/[branch]
  • Uploads all local branch commits to GitHub
    git push [alias] [branch]
  • Downloads bookmark history and incorporates changes
    git pull

Refactor filenames

Relocate and remove versioned files

  • Deletes the file from the working directory and stages the deletion
    git rm [file]
  • Removes the file from version control but preserves the file locally
    git rm --cached [file]
  • Changes the file name and prepares it for commit
    git mv [file-original] [file-renamed]

Save fragments

Shelve and restore incomplete changes

  • Temporarily stores all modified tracked files
    git stash
  • Lists all stashed changesets
    git stash list
  • Restores the most recently stashed files
    git stash pop
    # or alternatively
    git stash apply
  • Discards the most recently stashed changeset
    git stash drop

Supress tracking

Exclude temporary files and paths

  • A text file named .gitignore suppresses accidental versioning of files and paths matching the specified patterns
    *.log
    build/
    temp-*
  • Lists all the ignored files in this project
    git ls-files --other --ignored --exclude-standard
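You can verify which pattern is catching a given path with `git check-ignore`. A small demonstration in a throwaway repository, using the example patterns above (the file names are invented):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
printf '*.log\nbuild/\ntemp-*\n' > .gitignore
mkdir build && touch build/out.o app.log notes.txt
git check-ignore -v app.log build/out.o             # shows which pattern matched each path
git ls-files --other --ignored --exclude-standard   # lists every ignored file
```

Here `app.log` matches `*.log` and `build/out.o` matches `build/`, while `notes.txt` matches nothing and stays trackable.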

Redo commits

Erase mistakes and craft replacement history

  • Undoes all commits after [commit], preserving changes locally
    git reset [commit]
  • Discards all history and changes back to the specified commit
    git reset --hard [commit]
  • Removes new/untracked files and directories
    git clean -f
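The difference between the two reset forms is easy to see side by side. This sketch, run in a throwaway repository with invented file contents, first undoes a commit while keeping the edit, then discards the edit too:

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name You
echo one > file.txt && git add file.txt && git commit -qm "first"
echo two >> file.txt && git commit -qam "second"
git reset -q HEAD~1       # undo the last commit; the edit stays in the working tree
grep two file.txt         # "two" is still there, just no longer committed
git reset -q --hard HEAD  # now discard the local change as well
cat file.txt              # back to "one" only
```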

Review history

Browse and inspect the evolution of project files

  • Lists version history for the current branch
    git log
  • Lists version history for a file, including renames
    git log --follow [file]
  • Shows content differences between two branches
    git diff [first-branch]...[second-branch]
  • Outputs metadata and content changes of the specified commit
    git show [commit]

Why learning to code feels like having superpowers, and why I love free/libre software and open source code

Originally posted in Spanish on May 27th 2015

I have always loved coding: it is a world based on maths and physics, where logic allows you to understand causes and consequences. There were already several software geeks among my favourite characters in the comic books I read as a teenager. Of course, they were secondary or tertiary characters, like the X-Men’s Sage or Planetary’s The Drummer. Their main skill was the ability to understand how things work, so they were able to see things that were hidden in plain sight from the rest of the world. What I could not guess was that in the future, by studying computer science, I would also feel as if I had that kind of superpower.

The so-called computer revolution started back in the 90s, but nowadays computer science is entangled with our personal lives through “Smart Technology”: we have SmartPhones, SmartBands, SmartGlass, SmartWatches… The Quantified Self, the habit of measuring our daily information such as pulse, calories, expenses or web positioning, leads to a huge amount of information about any one of us. The big question is: do you know how that data is handled? That is the skill you get by learning how to code: you can read the source and discover what it does and how it does it. The best part is that you will be able to make informed and responsible decisions; the worst is that your friends may look at you as if you were some weirdo when they ask why you do not use a certain app and you answer: “I have seen the code… AND IT IS HORRIBLE!”.

Furthermore, you may be able to edit that code and improve it. You may adapt the system to suit your taste in design, or make it more efficient and safer, for example by preventing it from sharing some of your data which you may not want anyone else to have. You may even repair it when you run into problems. It is great to be able to create new systems which handle a lot of information for you, so you do not have to spend time on tedious work: you can devote that time to other matters, or manage more information with less effort. This train of thought will lead you to love the free/libre software and open source movements, which will provide you access to the code for free. Some days you may feel like a Shadowrun character in their endless fight against the megacorps, but it is worth it.

  • So, why is learning how to code so interesting?

Sometimes you may access the source, audit it, or even play with it. In other cases, software patents may prevent you from learning how it works, and that can lead to all kinds of absurd situations. One example is “smart home appliances”. There are washing machines that connect to your home wifi to send a notification to your phone when their program is about to finish. There are also fridges which detect that you do not have enough eggs and can ask the nearest grocery to send a dozen to your home. The need for these systems is arguable: my favourite ridiculous “smart” object is a food dispenser for cats which opens its container when you send a tweet mentioning its Twitter account, and yes, that thing existed back in 2011.

Now imagine that the system is poorly programmed and has “Russian roulette code” like this:

int min = 1;
int max = 6;
// random integer from min to max, both inclusive: simulates a 6-sided die
int value = (int) (Math.random() * (max - min + 1)) + min;
if (value == min) {
    die(); // one chance in six of crashing
} else {
    // your code goes here
}

This is a very simple case which simulates throwing a six-sided die. Computers do not have the ability to generate truly random values, so they usually derive pseudo-random results from sources such as the last digits of the internal clock. Hence, if you get a number between 2 and 6 everything will work as expected, but if you get a 1 the system will crash and die. In a silly way, this leads to planned obsolescence in its worst sense: going back to the smart fridge I mentioned before, it might keep buying tomatoes until judgement day, which would really hurt your wallet. Fixing it would be as simple as removing the whole if clause surrounding the “your code goes here” line, making the program both more efficient (you would be running seven lines of code less) and more secure; but if this were proprietary code you would not be able to (or, more precisely, you should not) do it.

This kind of situation is a setback, and sometimes it is even part of a twisted business model designed to damage your right to repair. That is the reason why I will always prefer a system which lets me see what is happening under the hood. People can review it and improve it, making it better for everyone. This does not mean that creators should lose all control over their work: there are different licenses, similar to Creative Commons, which can be adapted to each business model, letting you manage the different rights. You always keep the authorship and moral rights, and you may (or may not) restrict economic remuneration and derivative works by choosing the model which fits you best. Making a modification does not mean doing something ill-intentioned; it may solve a dangerous issue and avoid many future problems.
