AI-Powered Learning: Enhancing Student Attention: Part 1

Introduction

One of the persistent challenges in education is measuring students' attention, especially in a virtual environment, where there is little closeness between students and the educator. Various research efforts have shown that students in virtual environments engage and participate less than those in a physical classroom. As a consequence, they find the learning experience more challenging, and their learning is negatively affected.

One way to mitigate this problem is for the teacher to implement “active learning” strategies, which consist of different activities that stimulate and engage the students:

  • Proposing debates

  • Holding Q&A sessions

  • Asking for students’ opinions

  • Leading group discussions

  • Facilitating student-to-student interactions

However, to implement these strategies properly, an educator needs to balance their explanations with the active learning activities. One important input for striking that balance is each student’s “engagement level”, which is very challenging to measure in a virtual environment.

Measuring Attention Level

In 2020, a group of researchers from Georgia Southern University presented an approach for identifying the attentiveness of students. They conducted an experiment in which students’ faces were recorded during class, producing a dataset of face images that were labeled as attentive or inattentive under expert supervision (human labeling). They then trained a machine learning model on this dataset to identify the attention level of a student’s face.

With a working model for identifying attention level, they correlated its output with the emotions identified by AWS Rekognition for the same faces. From that correlation, they derived the following formula, which estimates the attention level of a face from the emotions AWS Rekognition detects:

Image that shows the formula for calculating the attention level from 7 emotions: each emotion is multiplied by a “beta” coefficient, and an error (intercept) term is added at the end

Taken from Tabassum, T., Allen, A. A., & De, P. (2020). Non-intrusive Identification of Student Attentiveness and Finding Their Correlation with Detectable Facial Emotions. ACM Southeast Conference - ACMSE 2020 - Session 1, 127-134.

The coefficient values used in the formula can be found in the following correlation table:

Image that shows a table of correlation coefficients and standard errors used in the calculation of the attentiveness level. Each emotion is a row with a coefficient and a standard error. The last row is an “Intercept”, which is also used in the calculation of the attention level

Taken from Tabassum, T., Allen, A. A., & De, P. (2020). Non-intrusive Identification of Student Attentiveness and Finding Their Correlation with Detectable Facial Emotions. ACM Southeast Conference - ACMSE 2020 - Session 1, 127-134.
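
To make the formula concrete, here is a minimal TypeScript sketch of a calculateAttentionLevel helper (the same name is used in the Lambda code later in this article). The β coefficients and intercept below are placeholders only; the real values come from the paper’s correlation table, and the emotion shape mirrors what AWS Rekognition returns per face:

// Shape of a single emotion reading as returned by AWS Rekognition (DetectFaces)
type EmotionReading = { Type?: string; Confidence?: number }

// Placeholder coefficients: the real β values and intercept come from the
// correlation table in Tabassum et al. (2020); these numbers are illustrative only.
const EMOTION_COEFFICIENTS: Record<string, number> = {
  CALM: 0.8,
  HAPPY: 0.3,
  SURPRISED: 0.2,
  CONFUSED: -0.1,
  SAD: -0.3,
  ANGRY: -0.4,
  DISGUSTED: -0.5,
}
const INTERCEPT = 0.1 // placeholder for the intercept (error) term

// attentionLevel = Σ (βᵢ × emotionᵢ) + intercept, returned as a fraction
// that the Lambda later converts to a percentage
export const calculateAttentionLevel = (emotions: EmotionReading[] = []): number =>
  emotions.reduce((attention, { Type, Confidence }) => {
    const beta = EMOTION_COEFFICIENTS[Type ?? ''] ?? 0
    return attention + beta * ((Confidence ?? 0) / 100) // Rekognition confidences are 0-100
  }, INTERCEPT)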

Disclaimer

Using this formula we can estimate the attention level of a student’s face, but it’s important to note that this is an approximation based on a correlation between facial emotions and attention level. It is an academic exercise and should not be taken as the ground truth of a student’s engagement with the class; it’s an approximation that serves as a starting point for analyzing trends and adapting teaching strategies accordingly. It’s critical for educators and technologists to use these tools as supplements, not replacements, in a live education setting.

Architecture

We want to create a system that measures attention and gives recommendations to the teacher as the virtual class progresses. In this post, I present an approach using an event-driven, serverless architecture to accomplish the following tasks:

  • Capture screenshots of students’ faces as they participate in a virtual class

  • Automatically identify and analyze all students’ faces, and get emotion confidence scores using AWS Rekognition

  • Determine the attention level of each student and the overall attention level for the class

  • Recommend an activity to the teacher

These tasks map to a few modules and services, as shown in the following high-level overview:

Image that shows the high-level overview architecture of the solution. There are two icons representing people: one is the “students” and the other is the “teacher”. Both have arrows pointing to a “virtual class with video feed”. The teacher also has an arrow going to the “Screenshot Analyzer”, which starts the app and takes screenshots periodically; it has two arrows, one going to the Rekognition API and the other to the “Teacher Recommender”. The teacher also has an arrow going directly to the “Teacher Recommender” to check for any recommendations

As marked in the diagram, this blog post will cover the “Screenshot Analyzer” system. The “Teacher Recommender” will be covered in a future blog post.

Expanding on the “Screenshot Analyzer” system, we can delve into the detailed design with the following diagram that shows the AWS resources and services that we’ll be using to build the solution:

Image that shows the detailed architecture of the Screenshot Analyzer. It starts with the virtual class with its video feed, which goes into a screenshot Node.js app that uploads screenshots to a “Class Screenshot Bucket” in S3. That bucket triggers a Lambda function on “Object Created”. The Lambda, called “Screenshot Analyzer”, calls AWS Rekognition and writes to an Analysis Results DB

In the image, we can see that the idea is to have a Node.js console application take screenshots of the virtual class video at a set time interval. Each screenshot gets uploaded to an S3 bucket, which triggers a Lambda function every time a new screenshot is created. This Lambda is responsible for analyzing the screenshot, extracting the faces, and posting the results to a DynamoDB table.

Solution

Screenshot App

For the screenshot application, we can use the npm package “screenshot-desktop” to take screenshots from our Node.js application. Using a simple “setInterval” in JavaScript, we can set a recurring timer that takes a screenshot and then uploads the image buffer to an S3 bucket using the AWS SDK for JavaScript (v3), as shown in the following code snippet:


setInterval(async () => {
  const screenshotTime = getTimestamp()
  // For example:
  // - /classes/2022-01-01/classid-123456/screenshotid-9518945.jpg
  // - /classes/2022-01-01/classid-123456/screenshotid-3159841.jpg
  const screenshotFilePath = `${basePath}/screenshotid-${screenshotTime}.${SCREENSHOTS_FORMAT}`
  try {
    // Capture the screen as an image buffer in the configured format
    const imgBuffer = await screenshot({ format: SCREENSHOTS_FORMAT })
    // S3 object metadata travels with the screenshot so the Lambda can read it later
    const metadata = {
      classId: classId.toString(),
      classStartedDateTimestamp: getTimestamp(startDate).toString(),
      screenshotTime: screenshotTime.toString(),
      screenshotBasePath: basePath,
    }
    await aws.s3.uploadToS3(BUCKET_NAME, screenshotFilePath, imgBuffer, metadata)
  } catch (error) {
    logMsg('Error taking and saving screenshot.', error, { screenshotTime, screenshotFilePath })
  }
}, SCREENSHOT_INTERVAL_SECONDS * 1000)
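
The aws.s3.uploadToS3 call above is a small helper from the repo. As a reference, here is a minimal sketch of what such a helper might look like with the AWS SDK for JavaScript (v3); the signature matches the snippet above, while the implementation details (content type, client configuration) are assumptions:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const s3Client = new S3Client({})

// Uploads the screenshot buffer to S3 along with the class metadata,
// so the analyzer Lambda can read it back later.
export const uploadToS3 = async (
  bucket: string,
  key: string,
  body: Buffer,
  metadata: Record<string, string>
): Promise<void> => {
  await s3Client.send(new PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: 'image/jpeg', // assumes SCREENSHOTS_FORMAT is 'jpg'
    Metadata: metadata, // note: S3 stores user-defined metadata keys in lowercase
  }))
}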

(Full code repo can be found in the References section at the end of the article)

Screenshot Analyzer

After a new screenshot is added to the S3 bucket, we “hook up” a Lambda function to the bucket so that it gets triggered automatically whenever a new screenshot is uploaded. Using the Serverless Framework, this can be configured as shown in the following snippet:


events: [
  {
    s3: {
      bucket: `${process.env.BUCKET_NAME || '${env:BUCKET_NAME}'}-${process.env.STAGE || '${env:STAGE}'}`,
      event: 's3:ObjectCreated:*',
      existing: true,
      rules: [{ suffix: '.jpg' }],
      forceDeploy: true, // Used to force CloudFormation to update the trigger no matter what
    },
  },
]

The bucket name is stored in an environment variable for easier management across different stages and deployments.

In the Lambda code, the following steps are needed to complete the analysis:

  • Download the screenshot from S3

  • Identify each face in the screenshot, along with its bounding box and emotions

  • Use the “sharp” npm package to extract and save each individual face as its own image

  • Upload each cropped face into another S3 bucket for future reference

  • Save the results into a DynamoDB Table for other consumers to use

The following code snippet shows the Lambda handler processor, which orchestrates those tasks:


const processEvent = async (eventRecord: S3EventRecord): Promise<void> => {
  const {
    bucket: { name: imageBucketName },
    object: { key: imageObjectKey },
  } = eventRecord.s3
  try {
    // Detect faces and calculate the attention level for each one
    const faceAnalysisResults = await AttentionService.analyzeImage(imageObjectKey)

    // Download the screenshot and read its metadata (S3 lowercases user metadata keys)
    const { Body, Metadata: screenshotMetadata } = await aws.s3.getObject(imageBucketName, imageObjectKey)
    const {
      classid: classId,
      classstarteddatetimestamp: classStartedDateTimestamp,
      screenshottime: screenshotTime,
      screenshotbasepath: screenshotBasePath,
    } = screenshotMetadata

    // Crop each detected face out of the screenshot and upload it for future reference
    const completeFaceAnalysisResults = await extractFaces(faceAnalysisResults, Body as Readable, screenshotTime, classId)

    // Persist the analysis results so other consumers (e.g. the Teacher Recommender) can use them
    await AttentionService.publishResults(
      completeFaceAnalysisResults,
      classId,
      classStartedDateTimestamp,
      screenshotTime,
      screenshotBasePath,
      imageObjectKey
    )
  } catch (ex) {
    console.error('Error processing s3 event record', ex, inspect(eventRecord))
  }
}
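
AttentionService.publishResults is where the results end up in the DynamoDB table. Its implementation isn’t shown here, but the following sketch illustrates what the write could look like using the AWS SDK v3 Document Client; the table name, environment variable, and key schema are assumptions for illustration:

import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb'

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}))

// Hypothetical table name: the real one lives in the repo's configuration
const RESULTS_TABLE_NAME = process.env.RESULTS_TABLE_NAME ?? 'analysis-results'

export const publishResults = async (
  faceAnalysisResults: unknown[],
  classId: string,
  classStartedDateTimestamp: string,
  screenshotTime: string,
  screenshotBasePath: string,
  screenshotObjectKey: string
): Promise<void> => {
  await docClient.send(new PutCommand({
    TableName: RESULTS_TABLE_NAME,
    Item: {
      classId, // partition key (assumed)
      screenshotTime, // sort key (assumed)
      classStartedDateTimestamp,
      screenshotBasePath,
      screenshotObjectKey,
      faces: faceAnalysisResults, // per-face emotions and attention levels
    },
  }))
}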

When analyzing the image, we use AWS Rekognition to detect the faces and get the confidence level of each emotion per face, as shown in the code snippet below. The attention level of each face is then calculated from those emotions using the formula explained in the sections above.


const analyzeImage = async (s3ObjectKey: string) => {
  try {
    const results = await aws.rekognition.detectFaces(BUCKET_NAME, s3ObjectKey)
    const createdAt = getTimestamp()
    return results.FaceDetails
      // .filter(face => !shouldDiscardReading(face)) // (TODO): uncomment this line to discard faces with low confidence
      .map((face, i) => ({
        ...face,
        index: i + 1,
        analysisCreatedAt: createdAt,
        Landmarks: undefined, // drop the facial landmarks, they are not needed downstream
        attentionLevel: roundNum(calculateAttentionLevel(face.Emotions) * 100, 2), // convert to percentage
        isUncertainAnalysis: isUncertainAnalysis(face.Emotions),
      }))
  } catch (ex) {
    console.error('Error calculating screenshot raw attention', ex, s3ObjectKey)
  }
}
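
The aws.rekognition.detectFaces wrapper used above boils down to a single DetectFacesCommand call. A minimal sketch is shown below; note that Attributes: ['ALL'] is needed, since the default attribute set does not include the Emotions array that the attention formula depends on:

import { RekognitionClient, DetectFacesCommand } from '@aws-sdk/client-rekognition'

const rekognitionClient = new RekognitionClient({})

// Asks Rekognition to detect all faces in an image stored in S3,
// requesting ALL attributes so each FaceDetail includes its Emotions.
export const detectFaces = async (bucket: string, objectKey: string) => {
  const command = new DetectFacesCommand({
    Image: { S3Object: { Bucket: bucket, Name: objectKey } },
    Attributes: ['ALL'],
  })
  return rekognitionClient.send(command)
}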

For extracting and cropping the faces, we use the “sharp” npm package, which has an easy-to-use API for this kind of operation (as long as we have the bounding box from the AWS Rekognition analysis results), as shown in the code snippet below:


const { BoundingBox } = faceAnalysisResult;
const screenshot = await streamToBuffer(screenshotStream);
const {
  height: screenshotHeight,
  width: screenshotWidth,
} = await sharp(screenshot).metadata();

// Rekognition bounding boxes are ratios of the image size, so convert them to
// pixels and add a small offset of padding around the face
const faceHeight = Math.round(BoundingBox.Height * screenshotHeight) + (pixelsOffset * 2);
const faceWidth = Math.round(BoundingBox.Width * screenshotWidth) + (pixelsOffset * 2);
const faceTop = Math.round(BoundingBox.Top * screenshotHeight) - pixelsOffset;
const faceLeft = Math.round(BoundingBox.Left * screenshotWidth) - pixelsOffset;

// Clamps the extraction region so it stays inside the screenshot
const top = Math.max(faceTop, 0);
const left = Math.max(faceLeft, 0);
const height = Math.min(faceHeight, screenshotHeight - top);
const width = Math.min(faceWidth, screenshotWidth - left);

// Extracts the face from the screenshot
const faceBuffer = await sharp(screenshot).extract({
  height,
  width,
  top,
  left,
}).toBuffer();
return faceBuffer;
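
The streamToBuffer helper used at the top of the snippet simply drains the S3 object stream into a single Buffer so sharp can work with it. A minimal sketch:

import { Readable } from 'stream'

// Collects all chunks from a readable stream into one Buffer
export const streamToBuffer = async (stream: Readable): Promise<Buffer> => {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}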

(Full code repo can be found in the References section at the end of the article)

Future Work

While this article gave an overview of the solution as a whole, it only delved into the “Screenshot Analyzer” component. In future articles, we will explore the other components, especially the “Teacher Recommender” system, which provides tailored activity suggestions based on the data collected by the screenshot analysis.

Stay tuned for future updates where we will dive deeper into these components!

Conclusion

In this article, we covered a “learning recommendation system” that aims to address the challenge of measuring and enhancing student engagement in virtual learning environments. By using the machine learning capabilities built into AWS Rekognition, we can detect and extract the faces of students in a virtual classroom, analyze them, and save the results for later use by a teacher recommender app that makes tailor-made recommendations to improve the learning experience.

Using a serverless architecture, we designed an effective and efficient system for analyzing and calculating student engagement in a virtual classroom.

Stay tuned for the teacher recommender system in a future article!

References