Posts
Search
Contact
Cookies
About
RSS

Paginating and filtering a composite data source with Django

Added 4 Jun 2023, 9:21 a.m. edited 18 Jun 2023, 1:12 a.m.

In an ideal world, all your data for your Django project is in the same database that you have full access to… However, there is often the case where you’re not in this position, for example you might need to access a list of team members from a foreign database (or even server) where you only have read access, additionally because you need to record other types of data against this team data you might have a second table using a key from the read only table.

The issue comes when you want to filter and paginate the composite data, you can’t “join” between tables on different servers. Fortunately, one of the requirements namely pagination is where we can leverage the power of Django to provide the view that we’re after.

Because a view typically might only show a small number of records at a time (a dozen or so as opposed to 100’s or 1000’s) we can process just the page that is required, rather than attempting to filter and join over a whole table.

Let’s look at the get and post handler first:

def get(self, request):
    # render the default view page 1 no filters
    return self.render_index(request, '', '', '', '1')

def post(self, request):
    if 'pagination' in request.POST:
        # get the filters from the session, the page from the pagination form
        user_name_filter = request.session.get('team:user_name_filter', '')
        station_filter = request.session.get('team:station_filter', '')
        alloc_filter = request.session.get('team:alloc_filter', '')
        page = request.POST.get('page', 1)
    else:
        # filter changed
        user_name_filter = request.POST.get('user_name', '')
        station_filter = request.POST.get('station', '')
        alloc_filter = request.POST.get('alloc', '')
        page = 1  # reset page
    # Update the page parameter in the session
    request.session['team:page'] = page
    return self.render_index(request, user_name_filter, station_filter, alloc_filter, page)

The first thing to note is that much of the interaction with this view is via POST requests, even the pagination is done with a set of small forms. There is no requirement for URL that can be shared and keeping the various variables out of sight just helps keep things neat looking. I’ve always thought that raw values in a URL is a positive encouragement to tamper, while POST variables can be as easily tampered with, it's somewhat easier to mitigate this.

Any button on a pagination form is given a name of ‘pagination’ this tells the view whether it’s a pagination POST or a change of the filters.

Both the POST and GET handlers rely on the method ‘render_index’ as the main functionality of the view.

The first thing that this method does is to store the filter values in the session, notice that the session value names are prefaced with the name of the view.

def render_index(self, request, user_name_filter, station_filter, alloc_filter, page):
    # Update the filter values in the session
    request.session['team:user_name_filter'] = user_name_filter
    request.session['team:station_filter'] = station_filter
    request.session['team:alloc_filter'] = alloc_filter
    # Build the filter conditions based on the provided filter values
    filter_conditions = Q()
    if user_name_filter:
        user_ids = list(TblUser.objects.using('MES').filter(strEmployeeDisplayName__icontains=user_name_filter)
            .values_list('intEmployeeId', flat=True)
        )
        filter_conditions &= Q(userId__in=user_ids)
    if station_filter:
        filter_conditions &= Q(station=station_filter)
    if alloc_filter:
        filter_conditions &= Q(alloc=alloc_filter)
    tms = TeamMember.objects.filter(filter_conditions)
    paginator = Paginator(tms, self.paginate_by)
    try:
        page_obj = paginator.page(page)
    except EmptyPage:  # Handle the case where the page number is out of range
        page_obj = paginator.page(1)
    foreign_key_values = [item.userId for item in page_obj]
    # For the current "page," manually "join" two tables from foreign databases
    tbl_usrs = TblUser.objects.filter(intEmployeeId__in=foreign_key_values)
    related_data_mapping = {item.intEmployeeId: item for item in tbl_usrs}
    combined = []
    available_row_ids = []
    for item in page_obj:
        available_row_ids.append(str(item.id))
        related_data = related_data_mapping.get(item.userId)
        combined_dict = {**model_to_dict(item), **model_to_dict(related_data)}
        combined_dict['status_concatenated'] = ', '.join(
           tmStatus.objects.filter(teamMember=item).values_list('status__status', flat=True) )
        combined.append(combined_dict)
    signed_ids = signing.dumps(available_row_ids)  # Mitigate form tampering
    context = {
        'page_obj': page_obj,
        'combined': combined,
        'signed_ids': signed_ids,
        'user_name_filter': user_name_filter,
        'station_filter': station_filter,
        'alloc_filter': alloc_filter,
    }
    return render(request, self.template_name, context)

While building the filter with the ‘Q’ object is straightforward for the fields in the ‘local’ table the filter on the ‘external’ table requires us to do a query on that table, fortunately even 1 or 2 characters makes this a rather modest quantity. Given a set of keys this “foreign” query can be done on the ‘local’ table. Once we have a page of filtered data, in this case just a dozen rows it’s a simple process to iterate each row, building a dictionary of values representing the “joined” page data.

You might notice that the ID’s indexing each record are signed, this is to check that the detail view is only being called with an ID from the specific page that is being displayed. The principle being that the user should only be able to do those actions that they are specifically invited to do, and that “clever” people shouldn’t be able to short circuit any process.

@method_decorator(login_required, name='dispatch')
class Details(View):
    template_name = "TeamMembers/details.html"

    def post(self, request):
        page = request.POST.get('page')
        submitted_id = request.POST.get('id')
        signed_ids = request.POST.get('signed_ids')
        try:
            available_row_ids = signing.loads(signed_ids)
        except (signing.BadSignature, TypeError):
            messages.error(request, 'Invalid request ' + str(signed_ids))
            return redirect('TeamMembers:index')
        if submitted_id not in available_row_ids:
            messages.error(request, 'Invalid ID')
            return redirect('TeamMembers:index')
        context = {
            'id': submitted_id,
            'signed': signed_ids,
            'page': page
        }
        return render(request, self.template_name, context)

You can see in the detail view that actually checking the ID is trivial

It should be noted that while signing the ID’s does stop trivial tampering, in theory it isn’t full proof, as to whether it would be practical or even worth working round this mitigation well that’s another matter!

While Django is a higher level framework, it's clearly been written in a manner that allows you not just the simple expediency of using as much or as little of it as you need, but not only that can be directly used to implement scenarios outside its perceived use case.