Accession No.: 20210416
Title: Deriving sentiments from 10-K reports using Term Frequency (TF) analysis.
Authors/Creators: Felicia Tay Sue Ching (TP044602)

The purpose of this study is to develop a step-by-step approach for analyzing 10-K reports by comparing term frequency (TF) using the bag-of-words method. A change in TF can be informative to the 10-K report reader because it reflects the corporation's action in amending the report, which might be due to changes in operations or shifts in corporate expectations. Nevertheless, the reason behind each change in TF is subject to the reader's interpretation. The main contribution of this study is in detailing the methodology of TF comparison through analysis of null terms, identification of the terms with the highest TF counts, and validation of these terms against the 10-K report.
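The comparison described above can be illustrated with a minimal Python sketch. It is not the code used in the thesis: the tokenizer, the function names and the two sample texts are hypothetical, and "null terms" is assumed here to mean terms that occur in one year's report but have a count of zero in the other.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Bag-of-words counts: lowercase alphabetic tokens, word order ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def compare_tf(prev_text, curr_text):
    """Return the per-term TF change and the null terms on each side."""
    prev_tf = term_frequencies(prev_text)
    curr_tf = term_frequencies(curr_text)
    # TF change for every term seen in either year; a Counter yields 0
    # for a term it has never seen, so no special-casing is needed.
    delta = {t: curr_tf[t] - prev_tf[t] for t in set(prev_tf) | set(curr_tf)}
    # Assumed meaning of "null terms": present in one year, absent in the other.
    dropped = sorted(t for t in prev_tf if t not in curr_tf)
    added = sorted(t for t in curr_tf if t not in prev_tf)
    return delta, dropped, added


# Hypothetical excerpts standing in for two consecutive 10-K reports.
prev_year = "The company expects growth in retail. Retail stores expanded."
curr_year = "The company expects growth in online sales. Online sales expanded."

delta, dropped, added = compare_tf(prev_year, curr_year)

# Terms with the largest increase (+TF) and decrease (-TF).
top_increase = sorted(delta, key=delta.get, reverse=True)[:2]
top_decrease = sorted(delta, key=delta.get)[:2]
```

A real report would also need stop-word removal and handling of numbers and hyphenated terms, which this sketch leaves out.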

Several analytical tools were used to aid the analysis and to produce visualizations for a clearer understanding of the data. The implementation has two stages: the first tests the current methodology of using a sentiment wordlist to derive sentiment scores from the 10-K reports, while the second consists of the steps in TF comparison. The results of Stage 1 show that sentiment scores can be confusing if taken at face value. This is because the scores did not show any clear relationship with the financial performance of the corporations analysed, and they tended to move in step with the length of the corpus, which might distort their interpretation. The Stage 2 implementation drills down to a microscopic view of the 10-K report through term frequencies, which are filtered through analytical tools to enable comparison of null terms and +TF/-TF terms.
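The length effect noted for Stage 1 can be seen in a small Python sketch. The wordlists, function names and sample text below are hypothetical and are not the wordlist or scoring rule used in the thesis. The length-normalized score is included only for contrast and is not something the abstract describes.

```python
import re

# Hypothetical miniature sentiment wordlists, for illustration only.
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "risk"}


def sentiment_counts(text):
    """Count positive and negative wordlist hits and the total token count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, len(tokens)


def raw_score(text):
    """Positive hits minus negative hits; grows with the length of the text."""
    pos, neg, _ = sentiment_counts(text)
    return pos - neg


def normalized_score(text):
    """Raw score divided by the token count, so length cancels out."""
    pos, neg, n = sentiment_counts(text)
    return (pos - neg) / n if n else 0.0


short_report = "Strong gain offsets one loss."
# The same sentence repeated ten times: identical tone, ten times the length.
long_report = " ".join([short_report] * 10)

raw_short, raw_long = raw_score(short_report), raw_score(long_report)
norm_short, norm_long = normalized_score(short_report), normalized_score(long_report)
```

Here the raw score of the longer text is ten times that of the shorter one although the wording is the same, which is the kind of face-value reading the paragraph above cautions against.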

These filtered terms are then validated by identifying the sentences in which they appear, in order to derive the insights behind the TF differences. The results of Stage 2 show that the comparative approach enables the reader of the report to understand year-on-year changes better, as the TF differences are often indicative of changes in corporate direction or operational decisions during the year.
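The validation step can be sketched as a simple sentence lookup. This is a minimal sketch under stated assumptions: the sentence splitter is a naive regular expression, and the function name and sample text are hypothetical rather than taken from the thesis.

```python
import re


def sentences_with_term(text, term):
    """Return the sentences of `text` that contain `term` as a whole word."""
    # Naive split after ., ! or ? followed by whitespace; abbreviations
    # such as "Inc." would need a proper sentence tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


# Hypothetical excerpt standing in for a 10-K report.
report = "We closed two retail stores. Online sales grew. Retail margins fell."
hits = sentences_with_term(report, "retail")
```

Reading the matched sentences for a term whose TF changed sharply is what lets the reader judge whether the change reflects an operational decision or only a change in wording.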

Supervisor: Tan Chye Cheah, Dr.
Institution: Asia Pacific University of Technology and Innovation (APU)
School: Graduate School of Technology
No. of pages: 63
Date type: Submission
Refereed: Yes, this version has been refereed
Additional Information

A thesis submitted in fulfillment of the requirement of Asia Pacific University of Technology and Innovation for the award of the degree of MSc in Data Science and Business Analytics (UCMP1701DSBA).

  • Data structures (Computer science)
  • Technology (General)

Term Frequency (TF) ; 10-K report ; Data processing ; Data Structure ; Corporate annual report ; Text mining ; Business keyword ; Financial performance ; Keyword trends ; Word cloud ; Sentiment analysis ; Correlation coefficient ; Hierarchical clustering.
